Data management planning

Data management and planning are integral parts of responsible research conduct. Please see below for University of Helsinki instructions and recommendations.
Careful planning minimises risks

A data management plan (DMP) describes the management of research datasets throughout their lifecycle. The plan helps you identify and anticipate risks involved in data management, take into account legal requirements, and ensure sufficient data protection and information security. When writing a data management plan, you must determine how the use and preservation of data as well as their authorship will be agreed.

A data management plan is part of a research plan

Whereas a research plan describes, for example, what data will be analysed and how, a data management plan explains how data will be managed and their further use enabled. A data management plan is a dynamic document updated throughout the research project. At its best, it is a practical and easy-to-understand guide ensuring the quality and integrity of data.

University of Helsinki instructions for data management planning

Write your data management plan using DMPTuuli, an open tool designed for this purpose. We recommend that you follow the instructions below, as they contain more detailed information on University of Helsinki practices than the general Finnish DMP guidance. To avoid overlaps, your data management plan can refer to your research plan, and vice versa. Please note the following:

  • Follow the requirements of your organisation or funder.
  • Answer at least the main questions. If a question is not applicable to your research, explain why.
  • Include all relevant background information in your plan, including applicant names, project titles, numerical codes and version details.
1. Research data

Discuss the following in your answer:

  • What kind of data will be used and produced in the project? If you use sensitive data, see the section: Guidance for Sensitive Data
  • In which file formats will the data be?
  • Approximately how much data will the project have, e.g. in gigabytes or the number of samples?
  • Will you use or develop special software?

Tips for best practices

List your data in the following way using bullet points or a table. The plan is based on the described data types. If you use categorization or abbreviations to describe the data, it will be easier for you to refer to the specific dataset in the rest of the plan.

Example of datatypes in a list format:

1. Data collected in this project

  • Questionnaire x, file format .pdf, size 5 GB
  • DNA samples (clarify the origin, human or another organism), physical sample, size n=500
  • Pictures/videos about x, file format .jpg, .avi, size 1 TB

2. Data produced as an outcome of the process

  • Analyses of questionnaire x, .pdf, .xlsx, 2 GB
  • DNA sequences/analyses, FASTA, .txt, .xlsx, 2 TB
  • Documentation of the data (survey form, codebook, laboratory notebooks, readme files)

3. Previously collected existing data reused in this project

  • Samples from the Biobank
  • Data from Statistics Finland, database, 10 GB
  • Survey data from Finnish Social Science database Aila
  • Interviews or language corpus from the Language Bank of Finland
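
The list above can also be kept as a small structured inventory, so that each dataset has a short code to refer to in the rest of the plan. The following is a minimal sketch in Python; the codes (Q1, A1, S1), the field names and the `inventory_table` helper are hypothetical and only illustrate one way to produce a table for the plan.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    code: str         # hypothetical abbreviation used to refer to the dataset in the plan
    origin: str       # "collected", "produced" or "reused"
    description: str
    file_format: str
    size: str


def inventory_table(datasets):
    """Render the datasets as a plain-text table with aligned columns."""
    header = ("Code", "Origin", "Description", "Format", "Size")
    rows = [header] + [
        (d.code, d.origin, d.description, d.file_format, d.size) for d in datasets
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


datasets = [
    Dataset("Q1", "collected", "Questionnaire x", ".pdf", "5 GB"),
    Dataset("A1", "produced", "Analyses of questionnaire x", ".pdf, .xlsx", "2 GB"),
    Dataset("S1", "reused", "Data from Statistics Finland", "database", "10 GB"),
]
print(inventory_table(datasets))
```

Later sections of the plan can then refer to, for example, "Q1" instead of repeating the full description.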

It is essential to identify sensitive data types, as data management planning includes recognizing and managing the risks involved with such data. If your data contains personal data, you need to identify the controller.  More information can be found in the Data protection guide for researchers (Flamma) and in Additional instructions for managing sensitive data.

Sensitive data is information that could cause damage if revealed. Such data are:

  • Personal data: 
    • Personal data includes all identifiers from which a person is identifiable directly or indirectly.
    • Direct identifiers: name, phone number, social security number, picture, voice, fingerprint, dental chart, etc.
    • Indirect identifiers: gender, age, education, profession, nationality, work history, system log history, marital status, residence information, car license number, opinion, psychological or physical feature etc.
  • Sensitive personal data:
    • Special categories of personal data:
      • Data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership 
      • Genetic data 
      • Biometric data processed for the purpose of uniquely identifying a person 
      • Data concerning health
      • Data concerning a person’s sex life or sexual orientation
    • Other sensitive personal data: 
      • Data describing the economic or social status
      • Location data
      • Communication data
      • Behaviour
      • Other data that is particularly personal e.g., notes and diaries
  • Sensitive information about species, such as endangered animals, plants, nature conservation areas, or biosafety (FIN).
  • Other confidential information, such as patents, military information, organizational information, or trade secrets.

Discuss the risks involved in controlling data integrity and quality, as well as how they are managed. Notice that data quality and the quality of research methods are two distinct issues.

Tips for best practices

Describe the following practices, if they are or will be in use:

  • What tools for data management do you use, e.g. electronic lab notebook or digital survey form?
  • How is the research team familiarised with research data management (RDM) practices?
  • Are the methods validated, or are quality control pipelines in use?
  • Are transcriptions of audio or video interviews checked by someone other than the transcriber?
  • Are checksums used?
  • Digitizing of analogue or physical material should be done with sufficient accuracy.
  • In all conversions, maintaining the original information content should be ensured.
  • Discuss how minimisation, pseudonymisation, and anonymisation affect data quality.
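
As an illustration of the checksum practice mentioned above, the following Python sketch records a SHA-256 checksum for every file in a data directory and later reports files whose content has changed. The manifest name `checksums.sha256` and the function names are assumptions made for this example, not a University of Helsinki convention.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_name="checksums.sha256"):
    """Write one 'checksum  relative/path' line per file in data_dir."""
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path != manifest:
            relative = path.relative_to(data_dir).as_posix()
            lines.append(f"{sha256_of_file(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(data_dir, manifest_name="checksums.sha256"):
    """Return the relative paths that are missing or whose checksum has changed."""
    data_dir = Path(data_dir)
    changed = []
    text = (data_dir / manifest_name).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. in an empty manifest
        expected, relative = line.split("  ", 1)
        target = data_dir / relative
        if not target.is_file() or sha256_of_file(target) != expected:
            changed.append(relative)
    return changed
```

Running `write_manifest` after data collection and `verify_manifest` after each transfer or conversion gives a simple check that the files have not changed unintentionally.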
2. Ethical and Legal Compliance

Discuss the following in your answer:

  • Does your data include personal data?
  • Are there any aspects of your data covered by general or discipline-specific research ethics guidelines?
  • Do you need an ethical review for your study? If a review is required, it is always carried out before data collection.
  • What kind of research permits do you need, e.g. a data permit from Findata or a research permit from the organisation being researched?
  • Do you work with other confidential or sensitive data than described above (e.g., endangered species (FIN), conservation areas, military information)?
  • Are there Intellectual Property Rights (IPR) such as copyright or patent right matters involved with the data?

Describe how you will maintain high ethical standards and comply with relevant legislation when managing your research data. Describe the risks involved and how they are managed.

  • Justify why you have the right to collect, handle, and preserve data that involves ethical issues. For example, explain if you have passed an ethical review, and describe how you will ask for informed consent from the potential research subjects to participate in the study.
  • If you handle personal data:
    • Specify the legal basis for processing personal data (research carried out in the public interest).
    • Inform your research subjects (openness towards the subjects/participants in the research project).
    • Describe what kind of personal data you need and why.
    • Describe how you will secure the processed data and the privacy of the data subjects (examinees) and, if needed, how you will perform data anonymisation or pseudonymisation.
    • Assess the risk that the processing of their data may cause to the data subjects.
    • Assess the need for a data protection impact assessment (with the help of the "Preliminary assessment for data protection" form).
    • Perform the data protection impact assessment if needed.
    • Assess how you will fulfil the data subject rights (including informing).
    • Describe how you will take care of the erasure of unnecessary data and what happens to the data after your research.
    • If you transfer or disclose data for processing outside the EU, describe how the legality of the transfer and the data subject rights will be ensured. Data transfers outside the EU include, for example, disclosing personal data to partners located outside the EU or processing data in a cloud service whose servers are located outside the EU.
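
One possible way to pseudonymise direct identifiers is to replace them with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an illustration under stated assumptions, not a prescribed University of Helsinki method. The key, the "P-" prefix, the 16-character truncation and the example names are hypothetical. Pseudonymised data remain personal data, and the key must be stored separately from the dataset.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Return a stable pseudonym for a direct identifier using HMAC-SHA256."""
    # Normalise so that trivial differences in spelling map to the same pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return "P-" + digest[:16]


# Hypothetical key for illustration only. A real key would be generated randomly
# (for example with secrets.token_bytes) and kept apart from the research data.
key = b"replace-with-a-randomly-generated-key"

records = [
    {"name": "Maija Meikäläinen", "answer": "yes"},
    {"name": "Matti Meikäläinen", "answer": "no"},
]

# The name is dropped and replaced by the pseudonym; other fields are kept.
pseudonymised = [
    {"participant": pseudonymise(record["name"], key), "answer": record["answer"]}
    for record in records
]
```

The same participant always receives the same pseudonym as long as the key is unchanged, which keeps records linkable within the project. Indirect identifiers in the remaining fields still need to be assessed separately.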

Additional information

Other requirements also apply, for example to informing participants and documenting the processing of personal data. You can find more information about them in the links below.

Data protection guide for researchers (Flamma)
Informing research participants (Data management guidelines, FSD)
Research ethics on Flamma and on the university's external website

Describe what has been agreed about data usage rights. Consider if there are rights belonging to a third party. Anticipate what licenses will be used when data is opened.

Guidance about data ownership and licenses

  • Ensure that the necessary agreements (data ownership, authorship and results) have been made at the beginning of the project.
  • Use a license when opening your data for reuse (e.g. research data, code, software).
    • UH recommends a license that is as open as possible. Good scientific practice still stipulates that the author be mentioned. Alternative licenses:
    • UH Library Open Access Creative Commons license guide
    • GNU or MIT licenses and other software licenses.

Additional information
University of Helsinki research data policy

Responsible conduct of research (RCR)

3. Documentation and metadata

Metadata is the documentation and description of research data.

Metadata standards are uniform models of data documentation.

The documentation describes who collected the data and how it was collected. When, where, and by which means was it collected? How has it been processed? Metadata can include information about test arrangements, methods of analysis or research environments.

“The documentation and metadata associated with research data should follow discipline-specific standards to enable the reuse and further enrichment of the data in future research projects. Metadata associated with research data must be published whenever possible, either in national or international metadata services.” (quoted from Research Data Policy)

The documentation during the research project includes, e.g. explanations of variables and codes (data dictionary, codebooks) and readme-files. Documentation also includes file naming conventions, version control and directory structure.

After the project, the research data is published, archived, or listed in a data repository or in a metadata catalogue. For this, the data needs to be described as a whole or, e.g., by data types.

Tips for best practices

  • Please note that the documentation – file names, variables, and other metadata – might include personal or sensitive data.
  • Get acquainted with the University of Helsinki’s documentation guide.
  • Plan documentation prior to data collection. Begin documentation and metadata creation at the start of the project and continue with it until the end of the project.
  • Data repositories often require the use of a specific metadata standard. Check if there is a discipline-specific metadata standard or metadata model.
  • If there is no suitable metadata standard, you can create a readme-type metadata file.
  • Use generally accepted vocabularies to describe the data. You can find suitable and accepted terms from Finto or EMBL-EBI Ontology lookup service.
  • For documenting codes and algorithms, use the university’s GitLab-based version control service.
  • Open the metadata in a way others can access it, e.g. in Etsin. You can check the requirements for metadata in Etsin from Qvain-tool’s instructions.
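
Where no suitable metadata standard exists, a readme-type metadata file can be generated from a fixed set of fields so that nothing is forgotten. The following Python sketch is only an example: the field list in `README_FIELDS` is a hypothetical minimum chosen for illustration, not an official template, and should be adapted to the discipline and repository.

```python
from datetime import date

# Hypothetical minimum set of fields; adapt to your discipline and repository.
README_FIELDS = [
    "Title",
    "Creators",
    "Description",
    "Collection period",
    "Methods",
    "File formats",
    "License",
    "Contact",
]


def build_readme(metadata):
    """Return readme text, refusing to build it if a field is missing or empty."""
    missing = [field for field in README_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    lines = [f"{field}: {metadata[field]}" for field in README_FIELDS]
    lines.append(f"Readme last updated: {date.today().isoformat()}")
    return "\n".join(lines) + "\n"
```

Because the function raises an error for missing fields, it can be run whenever the data changes to check that the documentation is still complete. Remember that the readme itself must not contain personal or sensitive data.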

4. Storing data and access control

Consider the following questions:

  • Where will your data be stored and backed up during the project?
  • Who is responsible for backups?
  • Make a plan with your partners and ensure secure data transfer.

Opening, publishing, and archiving data after the project will be described in section five.

Tips for best practices

Make sure your project has sufficient storage space. If not, please contact Helpdesk at tel. +358 (0)2 941 55555 or helpdesk@helsinki.fi.

  • If you work with sensitive data:
    • Be sure that your storage is safe enough for the data, e.g. a dedicated UH or CSC secure storage space (Umpio, storage server, ePouta...).
    • Encryption: If needed, encrypt devices before use, particularly mobile devices and portable or external storage devices, e.g. with Cryptomator.
    • Please contact datasupport@helsinki.fi if you are unsure about data protection.

Additional information
UH Research data management services

Answer the following questions:

  • Who is responsible for controlling access to the data?
  • How will access control be carried out? Is there an IT solution (e.g. password protection or usage logs) or a physical solution (e.g. a file cabinet) in use?
  • Who in the research group has access to which data?
  • Why has each access right (editing, viewing, deleting) been granted?
  • Describe how information security and the risks related to sensitive data have been taken into account. Will sensitive data be stored in an encrypted form? More tips are below.

Tips for best practices

  • If you use a personal or shared network drive, you can easily control access rights.
  • Access control of sensitive data should be well considered. Data handling and transfer need to be in line with permissions.
  • Access control: There must be a list of users and all rights granted, and a procedure for withdrawing rights.
  • Monitoring: How will data usage be monitored during the study: can the technical equipment log who used, when, and what data? Ask IT services what kind of automatic usage and change log is provided.
  • Security of the premises: Check the locking options of workspaces, safe lockers and cabinets, and camera and access surveillance.
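
The access control and monitoring points above call for a list of users and rights, a procedure for withdrawing rights, and a log of changes. The Python sketch below shows one way such a register could be modelled. It is a conceptual illustration only: the class `AccessRegister`, the right names and the user names are hypothetical, and in practice the rights are enforced by the storage service rather than by a script like this.

```python
from datetime import datetime, timezone

ALLOWED_RIGHTS = {"view", "edit", "delete"}


class AccessRegister:
    """Keeps the current rights per user and dataset plus a log of every change."""

    def __init__(self):
        self.rights = {}  # (user, dataset) -> set of granted rights
        self.log = []     # chronological record of grants and withdrawals

    def _record(self, action, user, dataset, right, reason):
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "user": user,
            "dataset": dataset,
            "right": right,
            "reason": reason,
        })

    def grant(self, user, dataset, right, reason):
        if right not in ALLOWED_RIGHTS:
            raise ValueError(f"Unknown right: {right}")
        self.rights.setdefault((user, dataset), set()).add(right)
        self._record("grant", user, dataset, right, reason)

    def withdraw(self, user, dataset, right, reason):
        self.rights.get((user, dataset), set()).discard(right)
        self._record("withdraw", user, dataset, right, reason)

    def is_allowed(self, user, dataset, right):
        return right in self.rights.get((user, dataset), set())


register = AccessRegister()
register.grant("researcher_a", "interviews", "edit", "transcribes the interviews")
register.grant("researcher_b", "interviews", "view", "analyses the transcripts")
register.withdraw("researcher_a", "interviews", "edit", "left the project")
```

Recording a reason with every grant and withdrawal also answers the question of why each access right has been given.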

Additional information

National Cyber Security Centre: Ohje lokitietojen tallentamiseen ja hyödyntämiseen (Instructions for storing and using log data, FIN)

UH Helpdesk at tel. +358 (0)2 941 55555 or helpdesk@helsinki.fi

5. Opening data and long-term preservation after the research project

Please answer the following questions. You can refer to the table in section General Description of Research Data, if you used one:

  • What part of the data will be opened/published?
  • Where will the data be opened? Name the repositories - if possible.
  • When can the data be opened/published?
  • Will some part of the data be destroyed?
  • Do some parts of the data require anonymisation or pseudonymisation before opening?

If your data cannot be opened:

  • Why can the data not be opened?
  • Where will the project metadata be opened?
  • Even if data including personal data cannot be opened, its metadata, which does not contain sensitive information, should be opened.

Tips for opening data containing personal data

  • Opening and sharing personal data depend, for example, on the following:
    • How have the research participants been informed about opening the data (if the data has been gathered directly from them)?
    • If the research participants are directly identifiable, their explicit consent is required for opening the data.
    • Is further use of the data restricted by administrative authorisations?
    • When opening the data, you must ensure that it is properly protected, pseudonymised or anonymised.
    • If you are planning to share personal data, contact the University research lawyers (researchlawyers@helsinki.fi) if necessary.

Tips for best practices

  • Choose suitable repositories for sharing and opening your data already at the beginning of the project. Check that your data fulfills the repository requirements.
    • You can search for suitable repositories from: re3data.org
    • Prefer discipline-specific data repositories. If you cannot find any, you can use general repositories such as IDA, Zenodo, Dryad or Figshare.
  • If you cannot open the data, open your metadata, e.g., in Zenodo or in the national Etsin service.

Additional information

UH Research data policy
UH principles of open publishing
Tutkijan muistilista tutkimusdatan julkaisemiseen (Researcher's checklist for publishing research data, in Finnish; Responsible Research)

Discuss where data with long-term value is archived and for how long.

  • What part of the data is archived?
  • Where will it be archived?
  • For how long will the data be preserved?
  • Are there costs related to archiving? Who takes care of them?

An archiving plan is part of research quality and transparency.

Tips for best practices

  • When data is created, it is important to consider how long it will be preserved.
  • Check publisher's requirements for data preservation, if you plan to publish in a journal that demands opening your data.
  • Check discipline-specific and funder's requirements for data preservation.
  • Personal data may also be archived.
    • The intended life cycle of the data, including the possible further use of the data after the end of the research project, must be openly communicated to the research subjects. This also applies to pseudonymised and anonymised data.
    • Identifiable data should be removed unless there are proper grounds for archiving them.
    • The appropriate protection of personal data, i.e. who has access to the data and why, must continue to be taken into account when archiving the material.
    • UH offers guidance on safe data storage and preservation solutions; contact datasupport@helsinki.fi.
  • Biological samples can be stored in biobanks (if the study subjects have granted permission to do so).
  • Even if the project does not generate data requiring long-term preservation, the data should be kept for the duration of the verification of the results, which varies from discipline to discipline (generally at least 5 years).
  • The Databank of the University of Helsinki offers a location for 5 to 15 years for digital research datasets produced at the University. 
  • The Databank of the University of Helsinki is a preservation service for nationally valuable data for dozens or hundreds of years. 

Links to general guides and additional information

Five steps in deciding what data to keep (DCC, UK)
UH Archiving plan (Flamma)
Data disposal (Data management guidelines, FSD)

6. Data management responsibilities and resources

Summarise and describe the roles and responsibilities here. Answer the following questions:

  • Are data management responsibilities allocated to one person, or is the whole research group involved?
  • How do you share tasks between different parties if you are working in a research consortium?
  • Who is responsible for data protection?
  • If the data contains personal data, list the persons and organizations allowed to handle personal data and their different roles.
  • Who is responsible for data protection and access to data?

Tips for best practices

  • When managing the data, follow shared practices (documentation, metadata, storing and sharing):
    • Make sure that everyone is trained in the necessary practices and that everyone follows the same practices.  
    • Note who is responsible for updating the DMP document when you make decisions or changes in your practice.
    • List the persons or organizations responsible for different data management tasks. 
    • You need to name the group members, colleagues or other persons allowed to handle personal data and their different roles (controller, joint controller, processor).
    • Which one of you or the organizations involved will be responsible for the data after the project?

Describe what resources (time and costs) are needed for data management. Thorough planning at the start of and during the project means less work at the end, when the data is prepared for opening and preservation.

Tips for best practices

  • Estimate whether expert help or an assistant is needed for data management, data preservation, and data sharing tasks.
  • Give an estimate of how much time is needed for data documentation and cleaning to prepare the data (not results) to be opened: 1–2 h weekly, one day per month, 1–2 weeks before publishing, or some other time estimate.
    • Data documentation and cleaning means, for example, producing metadata, anonymising sensitive data, arranging data, transferring data etc.
    • It is recommended to keep documentation up to date throughout the project life cycle.
  • Also allocate time and funds if you need to anonymise, protect or destroy sensitive data.

  • Specify your data management costs in the budget according to funder requirements.
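
To compare the alternative time estimates mentioned above, they can be converted to total hours over the project. The short Python sketch below does this arithmetic; the 36-month project length, the 7.5-hour working day and the 5-day working week are assumptions chosen for illustration and should be replaced with your own figures.

```python
def effort_in_hours(project_months, hours_per_week=0.0, days_per_month=0.0,
                    final_weeks=0.0, hours_per_day=7.5):
    """Total data management effort in hours over the whole project."""
    weeks = project_months * 52 / 12                       # project length in weeks
    recurring_weekly = weeks * hours_per_week              # e.g. 1-2 h every week
    recurring_monthly = project_months * days_per_month * hours_per_day
    final_push = final_weeks * 5 * hours_per_day           # assumes 5 working days per week
    return recurring_weekly + recurring_monthly + final_push


# Three alternative estimates for a hypothetical 36-month project.
weekly = effort_in_hours(36, hours_per_week=1.5)   # 1.5 h every week
monthly = effort_in_hours(36, days_per_month=1)    # one day per month
final = effort_in_hours(36, final_weeks=2)         # two weeks before publishing

print(f"weekly: {weekly:.0f} h, monthly: {monthly:.0f} h, final: {final:.0f} h")
```

Multiplying the chosen total by an hourly cost gives a figure that can be entered in the budget according to the funder's requirements.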