Research Data Management: Documentation and Metadata
Good documentation makes research material understandable, verifiable and reusable, whether by you or by others. It ensures that:
- Data can be understood during a research project and in the future
- Anyone re-using the data can interpret it correctly
- Data can be searched for and retrieved efficiently by users of data centres and repositories
John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the importance of good documentation in secondary data analysis:
‘Metadata’ is another form of documentation and is simply ‘data about data’. It is related to the broader contextual information that describes your data, but is usually more structured in that it conforms to set standards and is machine readable. One typical use of metadata is to create a catalogue record for a dataset held in an archive. By using a standard set of tags, an automatic system can tell where to locate the information about the title, creator, description, etc. This in turn helps to raise the visibility of your research by making it easier for others to learn about it (e.g. via a search engine or online catalogue), cite it and use it.
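As a minimal sketch of how a standard set of tags lets an automatic system locate the title, creator and description, the example below builds a small catalogue record and reads one field back. The text above does not name a standard; Dublin Core element names are used here as an assumption, and the `record` wrapper element, the helper functions and the dataset details are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, chosen here only as an illustrative standard.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Return a catalogue record as an XML element with one tag per field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


def read_field(record, name):
    """Look up a field by its standard tag, as an automatic system would."""
    element = record.find(f"{{{DC_NS}}}{name}")
    return element.text if element is not None else None


# Hypothetical dataset details, for illustration only.
record = build_record({
    "title": "Example survey dataset",
    "creator": "A. Researcher",
    "description": "Responses to an example questionnaire.",
})

print(ET.tostring(record, encoding="unicode"))
print(read_field(record, "title"))
```

Because every record uses the same tag names, a catalogue or search tool can pull out the same fields from any dataset without knowing anything else about it.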
You need to consider how you will create or capture these metadata, what form the metadata will take, to what extent the metadata creation will be automated, and which metadata standard you will use.
Types of Documentation
Embedded documentation is information about a file or dataset that is included within the data or document itself. For digital datasets, the documentation can be integrated into the data file(s), as a header or at specified locations in the file, or it can sit in separate files (for example text files) kept with the data. Examples of embedded documentation include:
- Code, field and label descriptions
- Descriptive headers or summaries
- Information recorded in the Document Properties function of a file (Microsoft)
You can use the fields in Document Properties to add contextual information to your MS Office documents. Not only will this help you keep your files organised and interpretable, but it will also allow you to sort folders by the properties you have added and to search for documents with particular properties.
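One way a descriptive header can be embedded in a data file is as comment lines at the top of a CSV file. The sketch below separates such a header from the data rows. The `#` marker, the `split_header` helper and the sample content are assumptions made for illustration, not a convention prescribed by this guide.

```python
import csv
import io


def split_header(text, marker="#"):
    """Separate leading comment lines (the embedded documentation) from the data rows."""
    header, data = [], []
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith(marker):
            header.append(line[len(marker):].strip())
        else:
            # The first non-comment line ends the header; the rest is data.
            in_header = False
            data.append(line)
    return header, data


# Hypothetical file content, for illustration only.
sample = """\
# Title: Example temperature readings
# Units: temp_c is degrees Celsius
# Missing values are coded as -99
site,temp_c
A,12.5
B,-99
"""

header, data = split_header(sample)
rows = list(csv.DictReader(io.StringIO("\n".join(data))))

print(header)
print(rows)
```

Keeping the field descriptions and missing-value codes in the same file means they cannot be separated from the data by accident, at the cost of needing a reader that knows to skip the header lines.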
Supporting documentation is information in separate files that accompanies data in order to provide context, explanation, or instructions on confidentiality and data use or reuse. Examples of supporting documentation include:
- Information about the project and data creators
- Working papers or laboratory notebooks
- Questionnaires or interview guides
- Details on how the data were created, analysed, anonymised etc.
- Final project reports and publications
Lab notebooks, whether in print or electronic form, are a critical component of tracking and recording research. Consistent documentation of your research methods, calculations, and results is important not only for your personal use but also when you publish or otherwise share research, and when others want to reproduce what you have done.
Creating Useful Documentation
Documentation is best created alongside the data, while the project is under way: it is easier to capture details as you go than to reconstruct them from memory at a later stage. Make sure that there are strong links between your data and the associated documentation, e.g.:
- Include information within the data or document itself, e.g. in the document properties function of a file or the file header;
- Keep a database of metadata with links to files;
- Store a readme.txt file alongside the data which provides basic explanatory details;
- Record relevant context in lab notebooks or associated papers and reports;
- Link to websites or web pages which explain the context of the research.
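As an illustration of the readme.txt option in the list above, the sketch below writes a short plain-text readme into the same folder as the data and lists the files it describes. The `write_readme` helper, the field names and the example file are hypothetical; which explanatory details to record is a decision for your own project.

```python
from pathlib import Path
import tempfile


def write_readme(data_dir, details):
    """Write a plain-text readme.txt into the same folder as the data files."""
    lines = [f"{key}: {value}" for key, value in details.items()]
    lines.append("")
    lines.append("Files in this folder:")
    for path in sorted(Path(data_dir).iterdir()):
        if path.name != "readme.txt":
            lines.append(f"- {path.name}")
    readme = Path(data_dir) / "readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# A temporary folder stands in for a real project data folder.
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file, for illustration only.
    (Path(tmp) / "survey_2020.csv").write_text("id,answer\n1,yes\n", encoding="utf-8")
    readme = write_readme(tmp, {
        "Title": "Example survey dataset",
        "Creator": "A. Researcher",
        "Description": "Responses to an example questionnaire.",
    })
    readme_text = readme.read_text(encoding="utf-8")

print(readme_text)
```

Storing the readme in the same folder keeps the link between data and documentation intact whenever the folder is copied, archived or shared.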
Types of Metadata
There are three broad categories of metadata:
- Descriptive - common fields such as title, author, abstract, keywords which help users to discover online sources through searching and browsing.
- Administrative - preservation, rights management, and technical metadata about formats.
- Structural - how different components of a set of associated data relate to one another, such as a schema describing relations between tables in a database.
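The three categories can be pictured as three groups of fields within one metadata record. The sketch below is only an illustration under that assumption: the `DatasetMetadata` class, the field names, the table names and the `unresolved_relations` check are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DatasetMetadata:
    """One record holding the three broad categories of metadata."""
    descriptive: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)


def unresolved_relations(meta):
    """Return relations that refer to a table not listed in the structural metadata."""
    tables = set(meta.structural.get("tables", []))
    problems = []
    for relation in meta.structural.get("relations", []):
        for end in (relation["from"], relation["to"]):
            # Each end is written as "table.column"; check the table part.
            if end.split(".")[0] not in tables:
                problems.append(relation)
                break
    return problems


# Hypothetical values, for illustration only.
metadata = DatasetMetadata(
    descriptive={
        "title": "Example survey dataset",
        "author": "A. Researcher",
        "keywords": ["survey", "example"],
    },
    administrative={"format": "CSV", "rights": "CC BY 4.0"},
    structural={
        "tables": ["respondents", "answers"],
        "relations": [
            {"from": "answers.respondent_id", "to": "respondents.id"},
        ],
    },
)

print(json.dumps(asdict(metadata), indent=2))
print(unresolved_relations(metadata))
```

Here the structural part plays the role of a simple schema: it records which tables exist and how they relate, so the relations can be checked mechanically.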
Metadata standards specify the data fields or elements to be used when describing data for a particular purpose.
Some research fields have predefined metadata standards. See further resources below to find a suitable metadata standard for your data.