Research Data Management: Data Description

Bringing together University resources and services to support researchers in producing high-quality data.

At a Glance

In a research context, examples of data include:

  • statistics
  • results of experiments
  • measurements
  • observations resulting from fieldwork
  • survey results
  • interview recordings
  • images

Legal Definition of Research Data

‘research data’ means documents in a digital form, other than scientific publications, which are collected or produced in the course of scientific research activities and are used as evidence in the research process, or are commonly accepted in the research community as necessary to validate research findings and results;

Article 2(9), Directive (EU) 2019/1024

Examples of Research Data

The following are examples of the types of research data you may need to manage throughout the research lifecycle and beyond:

  • Interviews, diaries, anthropological field notes, focus groups, answers to survey questions
  • Transcribed test responses
  • Coded numerical responses to surveys
  • Digital audio or video recordings
  • Digital images
  • Database contents
  • Digital models, algorithms or scripts
  • Maps & geospatial data
  • Ephemera
  • Archival material
  • Text documents, notes
  • Numerical data
  • Questionnaires, surveys, survey results
  • Audio and video recordings, photos
  • Database content (video, audio, text, images)
  • Mathematical models, algorithms
  • Software (scripts, input files ...)
  • Results of computer simulations
  • Laboratory protocols, methodological descriptions

Research Records

While not strictly research data, the following research records may also be important to manage throughout the research lifecycle and beyond:

  • Correspondence (electronic mail and paper‐based correspondence)
  • Project files
  • Grant applications
  • Ethics applications
  • Technical reports
  • Research reports
  • Master lists
  • Signed consent forms

What are Research Data?

This LibGuide is designed to help researchers effectively manage their research data throughout the research lifecycle and beyond. But what exactly do we mean when we refer to research data? The answer depends on the discipline in question, but at a basic level research data can be described as "the data needed to validate the results presented in scientific publications" or "the evidence used to inform or support research conclusions" (University of Sheffield).

In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. 

The following classification of data was originally compiled by the Research Information Network and highlights the wide range of types that can exist:

  • Observational: data captured in real time that is usually unique and irreplaceable. For example, remote sensing data, survey data, field recordings, sample data
  • Experimental: data captured from lab equipment that is often reproducible, but can be expensive.  For example, gene sequences, chromatograms, magnetic field data
  • Models or simulations: data generated from test models where the model and metadata may be more important than output data from the model. For example, climate models, economic models
  • Derived or compiled: resulting from processing or combining ‘raw’ data, often reproducible, but may be expensive. For example, text and data mining, compiled databases, 3D models
  • Reference or canonical: a static or organic conglomeration or collection of datasets, probably published and curated. For example, gene sequence databanks, collection of letters or archive of historical images

File Formats & Standards

When choosing file formats for research data it's important to consider whether the format is: 

  • Open & non-proprietary
  • Ubiquitous
  • Uncompressed or lossless

File formats that are open or non-proprietary have a better chance of remaining accessible, even if the software that created them is no longer available. Specialised proprietary formats used only by a niche set of users may present problems for future use. Formats that are ubiquitous or have become the default standard within a discipline, whether proprietary or not, are also more likely to be maintained into the future. This is important whether you plan on sharing and archiving your data at the end of your research project or simply want the data to remain accessible to yourself and other researchers in your department.

  • Proprietary format: Photoshop .psd file
  • Open format: .tiff image file

Formats that use 'lossy' compression are often smaller in file size, but some of the data are thrown away as part of the encoding process and cannot be recovered afterwards.

  • Lossy formats: .mp3 audio file, .jpeg image file
  • Lossless formats: .wav audio file, .tiff image file
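The difference between lossless and lossy encoding can be illustrated with a short Python sketch. This is a toy example, not how .mp3 or .jpeg actually work: the sample values are made up, `zlib` stands in for a lossless format, and dropping the low bits of each value stands in for a lossy one.

```python
import zlib

# Hypothetical 8-bit values standing in for audio samples or pixel intensities.
original = bytes([12, 13, 15, 200, 201, 203, 90, 91] * 4)

# Lossless: compress, then decompress, and every byte comes back unchanged.
restored = zlib.decompress(zlib.compress(original))
lossless_ok = restored == original

# Lossy (toy stand-in): keep only the top 4 bits of each value.
# The discarded low bits cannot be reconstructed later.
quantised = bytes(b & 0xF0 for b in original)
lossy_ok = quantised == original

print("lossless round trip identical:", lossless_ok)
print("lossy round trip identical:", lossy_ok)
```

The lossless round trip reproduces the input exactly, while the lossy version no longer matches it, which is why lossy formats are a poor choice for master copies.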

 

Things to consider when choosing a file format:

  • How you plan to analyse your data
  • Which software and file formats you and your colleagues have used in the past
  • Any discipline specific norms or technical standards
  • Whether file formats are at risk of obsolescence because of their dependence on a particular technology
  • Which formats are best to use for the long-term preservation of data
  • Whether important information might be lost by converting between different formats
  • The possibility of embedding metadata that describes content within the file itself, e.g. creator information, variable names and labels

Sometimes it is useful to store your data in one format for data collection and analysis and also in a more open or accessible format for sharing or archiving once your project is complete. If you intend to share your data, your chosen archive or repository will likely have recommended file formats based on best practice within the disciplines it supports.
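As an illustration of keeping an open copy alongside a working format, the following Python sketch writes tabular data to a CSV file together with a small JSON codebook describing each variable. The function name `export_open_copy`, the file naming pattern, and the example survey data are all hypothetical; a repository may ask for a different metadata format.

```python
import csv
import json
import tempfile
from pathlib import Path

def export_open_copy(rows, codebook, out_dir, stem):
    """Write rows to <stem>.csv and the codebook to <stem>_codebook.json."""
    out_dir = Path(out_dir)
    csv_path = out_dir / f"{stem}.csv"
    meta_path = out_dir / f"{stem}_codebook.json"
    # Column order follows the order of variables in the codebook.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(codebook))
        writer.writeheader()
        writer.writerows(rows)
    meta_path.write_text(json.dumps(codebook, indent=2), encoding="utf-8")
    return csv_path, meta_path

# Made-up example data: two survey responses and their variable labels.
rows = [{"resp_id": 1, "q1": 4}, {"resp_id": 2, "q1": 2}]
codebook = {
    "resp_id": "Respondent identifier",
    "q1": "Satisfaction, 1 (low) to 5 (high)",
}

with tempfile.TemporaryDirectory() as tmp:
    csv_path, meta_path = export_open_copy(rows, codebook, tmp, "survey")
    csv_text = csv_path.read_text(encoding="utf-8")
    saved_codebook = json.loads(meta_path.read_text(encoding="utf-8"))

print(csv_text)
```

Keeping the variable labels in a separate plain-text file means the CSV remains readable by any software while the descriptive information is not lost.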

 

Choosing a File Format: Useful Resources

If you aren't aware of any standards within your discipline, the following is a good reference point:

  • Textual data: eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml), Plain text data, ASCII (.txt), PDF/A (.pdf, Archival PDF)
  • Tabular data with extensive metadata: Delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information
  • Tabular data with minimal metadata (including spreadsheets): Comma-separated values (CSV) file (.csv)
  • Databases: eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml), Comma-separated values (CSV) file (.csv)
  • Images: TIFF version 6 uncompressed (.tif), JPEG (.jpeg, .jpg) (note: JPEG is a 'lossy' format that loses information each time a file is re-saved, so only use it if you are not concerned about image quality)
  • Audio: Free Lossless Audio Codec (FLAC) (.flac), Waveform Audio Format (WAV) (.wav), MPEG-1 Audio Layer 3 (.mp3) but only if created in this format
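The list above could be turned into a simple check for the files in a project folder. The Python sketch below is one possible approach under stated assumptions: the extensions come from the list, but the category names, the `review_format` function, and the wording of the notes are illustrative only, and a file extension alone cannot confirm, for example, that a .pdf is really PDF/A.

```python
from pathlib import Path

# Extensions taken from the reference list above; category names are illustrative.
RECOMMENDED = {
    "textual": {".xml", ".txt", ".pdf"},
    "tabular": {".csv"},
    "image": {".tif", ".jpeg", ".jpg"},
    "audio": {".flac", ".wav", ".mp3"},
}

# Caveats the list attaches to particular formats.
CAVEATS = {
    ".jpeg": "lossy: only if image quality is not a concern",
    ".jpg": "lossy: only if image quality is not a concern",
    ".mp3": "lossy: only if created in this format",
    ".pdf": "should be PDF/A (not verifiable from the extension)",
}

def review_format(filename):
    """Return (category, note) for a file name based on its extension."""
    ext = Path(filename).suffix.lower()
    for category, extensions in RECOMMENDED.items():
        if ext in extensions:
            return category, CAVEATS.get(ext, "")
    return None, "not in the reference list"

for name in ["scan.TIF", "talk.mp3", "figure.psd"]:
    print(name, review_format(name))
```

A result of `None` does not mean a format is unsuitable, only that it is not in this short list; discipline-specific standards or repository guidance may still recommend it.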