--- title: "Quick start guide" author: "Martin Westgate" date: "2025-04-30" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick start guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Ecological Metadata Language (EML) is a "comprehensive vocabulary and a readable XML markup syntax for documenting research data" ([Jones et al. 2019](https://doi.org/10.5063/F11834T2)). Supplying metadata in this format is a requirement of several data-sharing institutions, notably including the Global Biodiversity Information Facility ([GBIF](https://gbif.org)) and its partner nodes. While the [EML package](https://cran.r-project.org/package=EML) already exists to support building and manipulating EML within the workspace, there is no comparable system for writing these documents in a more human-readable format, such as R Markdown or Quarto markdown. `delma` provides this capability. The default method for using `delma` is to: 1. Create a metadata statement template with `use_metadata_template()` 2. Edit metadata in your chosen IDE 3. Read the data into R with `read_md()` 4. Write the data to an EML file with `write_eml()` EML is quite stringent in the types of data that are allowed, as well as their order and placement in the hierarchy. Getting these points right can be challenging, and we suggest using `check_metadata()` in combination with the [EML schema documentation](https://eml.ecoinformatics.org) to get it right. ## Formatting your metadata statement There are several points that are unusual about Rmarkdown documents formatted using `delma`, which we discuss below ### Document structure Create a metadata statement template using `use_metadata_template()` ```{r} #| eval: FALSE use_metadata_template("my-metadata-statement.Rmd") ``` Header levels in markdown-formatted metadata statements determine the nested structure of the resulting xml. For example, this text in markdown... ``` # EML ## Dataset ### Title My title ``` ...parses to this in xml. ``` My title ``` ### Attributes Attributes can be added to a particular EML tag by including them in a `list` within a code block, the `label` of which is used by `delma` to link tags to their attributes. To add attributes to the `userId` field, for example, you would add the following code under the `## userId` heading: ````markdown `r ''````{r} #| label: 'userId' #| include: false list(directory = "https://orcid.org") ``` ```` The `include: false` tag is added so that this content isn't displayed when the document is knit. ### Setting a unique ID Every EML document must open with the tag `eml`, and the attributes of that element **must** contain a unique identifier in the `packageId` field, as well as a link to the system within which that key is unique. A logical example might be a DOI: ````markdown `r ''````{r} #| label: 'eml' #| include: false list(packageId = "https://doi.org/10.32614/CRAN.package.galah", system = "https://doi.org") ``` ```` A valid alternative might be a GitHub release: ````markdown `r ''````{r} #| label: 'eml' #| include: false list(packageId = "https://github.com/AtlasOfLivingAustralia/galah-R/releases/download/v2.1.0/galah_2.1.0.tar.gz", system = "https://github.com") ``` ```` Note that the `eml` tag is unusual in `delma` in that it is added automatically if not supplied. Where this occurs, all tag levels are also incremented by one to account for this change. ### Dynamic content `delma` will call `rmarkdown::render()` internally whenever `read_md()` or `render_metadata()` is used, meaning that it is possible to add dynamic content to your metadata statements. The boilerplate statement that ships with delma uses this feature to automatically populate the `Title` and `Pubdate` fields from the `YAML` section, for example: ````markdown `r ''````{r} #| echo: false #| results: 'asis' # NOTE: This is set using the yaml above; do not edit by hand cat(rmarkdown::metadata$date) ``` ```` You could also implement dynamic content using inline code. For example: ```{r, eval=FALSE} This data contains `r readr::read_csv("my_data.csv") |> nrow()` rows. ``` ## Reading, writing, and format conversion Internally, `delma` uses the [lightparser](https://cran.r-project.org/package=lightparser) package to convert markdown files to `tibble`s, and the [xml2](https://cran.r-project.org/package=xml2) package to convert `list`s to xml and write them to file. Between these two packages, we have written functions to convert between tibble and list versions of EML-formatted markdown. Under the hood, `read_md()`, which does a few things: - calls `rmarkdown::render()` on your chosen file, meaning any code blocks or inline code is executed properly. - appends code blocks whose `label` matches an EML tag to the `attributes` of that EML tag; this allows quite complex attribute addition without affecting rendered text. - 'cleans' the imported tibble to a small number of required columns. This tibble can then be rendered to EML using `write_eml()`. The inverse operation is accomplished by calling `read_eml()` followed by `write_md()`.