Research Data Management: Metadata
At a Glance
Metadata is another form of documentation and is simply ‘data about data’.
- It is related to the broader contextual information that describes your data, but is usually more structured in that it conforms to set standards and is machine readable.
- Metadata standards provide specific data fields or elements to be used in describing data for a particular use.
- Some research fields have predefined metadata standards.
- Dublin Core is used here as an example of a metadata schema, to illustrate both the types of metadata you might need to capture and how metadata can be captured in a standardised way using controlled vocabularies.
Find a Metadata Standard for your Discipline
- FAIRsharing - FAIRsharing is a curated, informative and educational resource on data and metadata standards, linked to databases and data policies.
- DCC list of discipline-specific metadata standards - A detailed list of discipline-specific metadata standards compiled by the Digital Curation Centre (DCC).
- Research Data Alliance Metadata Standards Directory - The RDA Metadata Standards Directory contains widely used metadata standards in the Arts & Humanities, Engineering, Life Sciences, Physical Sciences & Mathematics, Social & Behavioral Sciences, and General Research Data.
Controlled Vocabularies & Ontologies
- Getty Thesaurus of Geographic Names (TGN) - The Getty Thesaurus of Geographic Names (TGN) includes names and associated information about places.
- Library of Congress Subject Headings (LCSH) - Library of Congress Subject Headings (LCSH) comprise a thesaurus, or controlled vocabulary, of subject headings maintained by the United States Library of Congress.
- W3C Note on Date and Time Formats (W3CDTF) - This document defines a profile of ISO 8601, the International Standard for the representation of dates and times. ISO 8601 describes a large number of date/time formats; to reduce the scope for error and the complexity of software, it is useful to restrict the supported formats to a small number. This profile defines a few date/time formats, likely to satisfy most requirements.
- ISO 8601 Date and Time Format - An internationally accepted way to represent dates and times using numbers, with dates written as YYYY-MM-DD. For example, September 27, 2012 is represented as 2012-09-27.
- DCMI Type Vocabulary - The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource.
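Dates recorded in these formats are easy to check programmatically. The Python sketch below assumes you only want to confirm that a Date value has the *shape* of one of the six W3CDTF profiles of ISO 8601 (year; year-month; full date; full date plus hours and minutes, seconds, or fractional seconds, each with a time zone designator). The `W3CDTF` pattern and the `is_w3cdtf` helper are illustrative names, not part of either standard, and the check does not confirm that months, days or hours are in range.

```python
import re
from datetime import date

# Shape-only pattern for the W3CDTF profiles of ISO 8601.
# When a time is present, a time zone designator (Z or +hh:mm / -hh:mm) is required.
W3CDTF = re.compile(
    r"\d{4}"                            # YYYY
    r"(-\d{2}"                          # -MM
    r"(-\d{2}"                          # -DD
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"   # Thh:mm, optional :ss and .s
    r"(Z|[+-]\d{2}:\d{2}))?)?)?"        # time zone designator
)

def is_w3cdtf(value: str) -> bool:
    """Return True if value has the shape of a W3CDTF date/time (ranges not checked)."""
    return W3CDTF.fullmatch(value) is not None

# Python's date.isoformat() already produces the ISO 8601 YYYY-MM-DD form.
print(date(2012, 9, 27).isoformat())   # 2012-09-27
print(is_w3cdtf("2012-09-27"))         # True
print(is_w3cdtf("27/09/2012"))         # False
```

Validating against a pattern like this before records are deposited catches the most common problem, locally formatted dates such as 27/09/2012, which are ambiguous between day-first and month-first conventions.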
2 Documentation and data quality
2b What metadata will accompany the data?
Points to consider:
- Indicate if you will be using a metadata standard to help others identify and discover the data. Use disciplinary metadata standards where these are in place.
Metadata
The UK Data Service explain: "metadata can describe the content, context and provenance of datasets in a standardised and structured manner, typically describing the purpose, origin, temporal characteristics, geographic location, authorship, access conditions and terms of use of a dataset."
Rich metadata (elements which describe the data) enhance the findability, interoperability and reusability of your data. To comply with the FAIR Principles, metadata should be accessible wherever possible, even if the data aren’t.
The quality of the descriptive information (metadata and documentation) about the data has a profound impact on their reusability, so the more documentation and metadata you can provide, the better.
Your chosen data repository or archive may have a metadata template you can complete or a required standard you must use. If not, you should follow the relevant disciplinary standards.
National Archives of Australia - Meta... What? Metadata!
Dublin Core
Dublin Core consists of 15 “core” metadata elements. It is one of the simplest and most widely used metadata schemas. The name "Dublin" comes from its origin at a 1995 invitational workshop in Dublin, Ohio; unfortunately, it has nothing to do with Dublin, Ireland. Originally developed to describe web resources, Dublin Core has since been used to describe a variety of physical and digital resources.
Built into the Dublin Core standard are definitions of each metadata element that state what kinds of information should be recorded where and how. Associated with many of the data elements are suggested controlled vocabularies.
All elements are optional and repeatable.
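Because every element is optional and repeatable, one convenient way to hold a simple Dublin Core record in code is a mapping from element name to a list of values. The Python sketch below assumes that representation (it is a convenience, not something the standard prescribes) and uses a hypothetical `check_record` helper to flag element names outside the 15 core elements and Type values that are not terms of the DCMI Type Vocabulary.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Terms of the DCMI Type Vocabulary, suggested for the Type element.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

def check_record(record: dict[str, list[str]]) -> list[str]:
    """Return a list of problems in a simple Dublin Core record.

    Every element is optional and repeatable, so each key maps to a list of
    values and missing elements are not reported.
    """
    problems = []
    for element in record:
        if element not in DC_ELEMENTS:
            problems.append(f"unknown element: {element}")
    for value in record.get("type", []):
        if value not in DCMI_TYPES:
            problems.append(f"type not in DCMI Type Vocabulary: {value}")
    return problems

# Two creators are allowed because elements are repeatable.
print(check_record({
    "title": ["A hypothetical survey dataset"],
    "creator": ["Smith, Jane", "Ahmed, Ali"],
    "type": ["Dataset"],
}))                                   # []
print(check_record({"type": ["Book"]}))
# ['type not in DCMI Type Vocabulary: Book']
```

A check like this is where controlled vocabularies pay off: a free-text value such as "Book" cannot be matched reliably across systems, whereas "Text" from the DCMI Type Vocabulary can.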
Dublin Core Element | Definition | Suggested Controlled Vocabulary |
---|---|---|
Title | A name given to the resource. Typically, a Title will be a name by which the resource is formally known. | |
Creator | An entity primarily responsible for making the resource. Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity. | |
Date | A point or period of time associated with an event in the lifecycle of the resource. Date may be used to express temporal information at any level of granularity. | W3C Note on Date and Time Formats (W3CDTF), ISO 8601 |
Description | An account of the resource. Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource. | |
Rights | Information about rights held in and over the resource. Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights. | |
Type | The nature or genre of the resource. To describe the file format, physical medium, or dimensions of the resource, use the Format element. | DCMI Type Vocabulary |
Language | A language of the resource. | RFC 4646, ISO 639 |
Contributor | An entity responsible for making contributions to the resource. Examples of a Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity. | |
Relation | A related resource. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. | |
Source | A related resource from which the described resource is derived. The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system. | |
Coverage | The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant. Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. Temporal topic may be a named period, date, or date range. A jurisdiction may be a named administrative entity or a geographic place to which the resource applies. Where appropriate, named places or time periods can be used in preference to numeric identifiers such as sets of coordinates or date ranges. | Thesaurus of Geographic Names (TGN) |
Subject | The topic of the resource. Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. | Library of Congress Subject Headings (LCSH) |
Identifier | An unambiguous reference to the resource within a given context. Recommended best practice is to identify the resource by means of a string conforming to a formal identification system. | |
Format | The file format, physical medium, or dimensions of the resource. Examples of dimensions include size and duration. | Internet Media Types (MIME) |
Publisher | An entity responsible for making the resource available. Examples of a Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. | |
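Part of what makes Dublin Core machine readable is that each element has a fixed name in a published namespace, `http://purl.org/dc/elements/1.1/`. The Python sketch below serialises a record (held as element name to list of values) as XML with `dc:` prefixed elements using the standard library's `xml.etree.ElementTree`. The `to_dc_xml` helper, the plain `metadata` wrapper element and the record values are all hypothetical; a repository will usually dictate its own wrapper and schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 Dublin Core elements (DCMI Metadata Element Set 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise as dc:title, dc:creator, ...

def to_dc_xml(record: dict[str, list[str]], root_tag: str = "metadata") -> str:
    """Serialise a {element: [values]} record as XML under a hypothetical wrapper."""
    root = ET.Element(root_tag)
    for element, values in record.items():
        for value in values:  # repeatable elements become repeated XML elements
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record values for illustration only.
record = {
    "title": ["A hypothetical survey dataset"],
    "creator": ["Smith, Jane", "Ahmed, Ali"],
    "date": ["2012-09-27"],
    "type": ["Dataset"],
    "format": ["text/csv"],
}
print(to_dc_xml(record))
```

The same record could equally be written as JSON or embedded in HTML `<meta>` tags; XML is shown here only because it is a common way of exchanging simple Dublin Core records.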
- Dublin Core Metadata Generator - A tool for generating Dublin Core code, designed to be easy to use, flexible with regard to adding and removing tags, and updated to the most recent Dublin Core standards.
- Dublin Core Metadata Initiative: Creating Metadata (2019) - Guidance on how to create content for Dublin Core metadata.
Metadata Example: Book
The example below uses basic Dublin Core metadata elements to describe the book using Title, Creator, Date, Description and Type.
Notice how the author's name is formatted. This is a standard way of formatting names in libraries and facilitates alphabetical sorting by author. Additional metadata elements that could be added to increase the richness of the metadata include the ISBN, information about the publisher, the location of this specific book within the library, some subject terms, the language of the book, the physical size of the book (helpful to know when shelving), the number of pages, etc.
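The benefit of recording names as "Surname, Forename" shows up as soon as records are sorted. The Python sketch below uses a hypothetical `invert_name` helper and made-up author names to compare sorting names in natural order with sorting them in the inverted form used for the Creator element.

```python
def invert_name(given: str, family: str) -> str:
    """Return a personal name in 'Family, Given' order, as used in library catalogues."""
    return f"{family}, {given}"

# Hypothetical authors, as (given name, family name) pairs.
authors = [("Jane", "Smith"), ("Ali", "Ahmed"), ("Tom", "Baker")]

# Sorting natural-order names orders them by given name...
print(sorted(f"{g} {f}" for g, f in authors))
# ['Ali Ahmed', 'Jane Smith', 'Tom Baker']

# ...whereas sorting the inverted names orders them by surname.
print(sorted(invert_name(g, f) for g, f in authors))
# ['Ahmed, Ali', 'Baker, Tom', 'Smith, Jane']
```

Real names are rarely this tidy (compound surnames, particles such as "van" or "de", single names), which is one reason to record the inverted form deliberately at the point of entry rather than derive it later.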
Metadata Example: Digital Image
The example below uses basic Dublin Core metadata elements to describe a photograph using Title, Creator, Date, Description and Type.
Additional metadata that could be captured about the photo includes the location where it was taken, the copyright owner, licence information, information about the camera used to take the photo, and technical specifications such as image resolution.
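Several of these additions map directly onto Dublin Core elements: the location onto Coverage, the copyright owner and licence onto Rights, and the file type onto Format using an Internet Media Type. The Python sketch below builds such a record with a hypothetical `photo_record` helper; it guesses the media type from the file extension with the standard library's `mimetypes.guess_type`, and all values shown are made up. Camera details and resolution have no dedicated simple Dublin Core element and are left out of this sketch.

```python
import mimetypes

def photo_record(filename: str, title: str, creator: str, date: str,
                 rights: str, coverage: str) -> dict[str, list[str]]:
    """Build a simple Dublin Core record for a digital photograph."""
    mime_type, _ = mimetypes.guess_type(filename)  # e.g. .jpg -> image/jpeg
    return {
        "title": [title],
        "creator": [creator],
        "date": [date],
        "type": ["StillImage"],                            # DCMI Type Vocabulary term
        "format": [mime_type or "application/octet-stream"],  # fallback if unknown
        "rights": [rights],                                # copyright owner and licence
        "coverage": [coverage],                            # where the photo was taken
    }

# Hypothetical values for illustration only.
rec = photo_record(
    "IMG_0042.jpg",
    title="Hypothetical harbour photograph",
    creator="Smith, Jane",
    date="2019-05-04",
    rights="Copyright Jane Smith. Licensed under CC BY 4.0.",
    coverage="Dublin, Ireland",
)
print(rec["format"])
```

Guessing the media type from the extension is only a convenience; if a file is misnamed, the guess will be wrong, so a curated value is preferable where accuracy matters.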
Metadata Example: Interview
The example below uses basic Dublin Core metadata elements to describe a piece of research data, in this case an interview, using Title, Creator, Date, Description and Type.
For a single interview, several 'digital objects' are likely to exist, for example the audio recording, a text file with the transcript and perhaps an NVivo analysis file. These can all be linked using metadata. Other metadata elements that might be useful to record include the topics covered in the interview, some demographic information about the interviewee, etc.
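In simple Dublin Core, the Identifier and Relation elements are one way to express these links: each file gets its own record and identifier, and each record's Relation values point at the other files. The Python sketch below assumes a hypothetical local identifier scheme (`interview-07/...`) and a hypothetical `linked_records` helper; as the Relation definition recommends, a real deposit would use identifiers from a formal identification system instead.

```python
import mimetypes

def linked_records(base_id: str, filenames: list[str]) -> list[dict[str, list[str]]]:
    """Build one record per file; each Relation lists the other files' identifiers."""
    ids = [f"{base_id}/{name}" for name in filenames]  # hypothetical identifier scheme
    records = []
    for name, ident in zip(filenames, ids):
        mime_type, _ = mimetypes.guess_type(name)
        records.append({
            "identifier": [ident],
            "format": [mime_type or "application/octet-stream"],  # fallback for unknown types
            "relation": [other for other in ids if other != ident],
        })
    return records

# Hypothetical interview and file names.
interview = linked_records(
    "interview-07",
    ["recording.mp3", "transcript.txt", "analysis.nvp"],
)
for rec in interview:
    print(rec["identifier"][0], "->", rec["relation"])
```

Listing every sibling in each record keeps the links navigable from any file; a collection-level record that lists all three files is a common alternative when the number of files grows.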
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License