Active Research Data Management

Introduction

During the lifetime of a research project, different types of data and information are generated. Examples include images acquired with a microscope, genome sequences, observations made during a field study, responses to questionnaires or interviews, and numerical results obtained from simulations. Furthermore, primary data are typically processed and analysed to generate derived data sets, which form the basis of scientific publications (journal papers, theses etc.).

Active research data management (ARDM) refers to the skills and tools required to keep data and related information organised and safely backed up during the lifetime of a research project. If a project involves large amounts of data acquired by different researchers, possibly in different locations, ARDM can pose significant challenges.
 

What data need to be managed?

For research to be reproducible, data management needs to extend beyond primary (‘raw’) data to also include processed data, metadata, analysis results, experimental descriptions and notes, as well as materials, samples and protocols. Primary and processed data can either be stored in a conventional file-based system or ingested directly into a data management system (see Resources for ARDM). ETHZ provides services for active research data management to all its research labs.

Metadata provide important information about the content of the data and how they were generated, allowing others (including yourself in the future) to use and interpret the data. Metadata should be added as early as possible, ideally at the time of data acquisition. Some file formats support metadata tags (e.g. in file headers). Where they do not, metadata can be added as separate files (in plain text or using file formats such as JSON or XML).
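As an illustration of metadata kept in a separate file, the following minimal Python sketch writes a JSON file next to a data file at the time of acquisition. The function name write_sidecar, the ".meta.json" suffix and the field names are assumptions made for this example only; they are not a standard and not part of any ETHZ service.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata for data_file to a JSON file stored next to it.

    The file naming and the field names are hypothetical; adapt them to
    the conventions of your own lab or data management system.
    """
    record = {
        "data_file": data_file.name,
        # Time at which the metadata record was written (UTC, ISO 8601).
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        # User-supplied fields, e.g. instrument, operator, sample ID.
        **metadata,
    }
    # image_001.tif -> image_001.tif.meta.json in the same folder
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling write_sidecar(Path("image_001.tif"), {"instrument": "microscope A", "operator": "J. Doe"}) would create image_001.tif.meta.json alongside the image. Keeping the metadata in a plain-text format such as JSON means it stays readable without special software and can be backed up and versioned together with the data.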

Descriptions of experimental procedures, materials, samples and analysis workflows are typically text-based and can be kept on paper or in electronic format (see Resources for ARDM). With increasing data volumes and complexity in modern research laboratories, paper-based tracking of information has become difficult. Electronic tracking is preferable because it allows for easy backup as well as user access control and version control (see Resources for ARDM). Moreover, it makes it easier to link this information to the research data, which are mostly in digital format.
