Data management planning

Data management and planning are integral parts of responsible research conduct. Please see below for University of Helsinki instructions and recommendations.

Careful planning minimises risks

A data management plan (DMP) describes the management of research datasets throughout their lifecycle. The plan helps you identify and anticipate risks involved in data management, take into account legal requirements, and ensure sufficient data protection and information security. When writing a data management plan, you must determine how the use, preservation, and authorship of the data will be agreed.

A data management plan is part of a research plan

Whereas a research plan describes, for example, what data will be analysed and how, a data management plan explains how data will be managed and their further use enabled. A data management plan is a dynamic document updated throughout the research project. At its best, it is a practical and easy-to-understand guide ensuring the quality and integrity of data.

University of Helsinki instructions for data management planning

Write your data management plan using , an open tool designed for this purpose. We recommend that you follow the instructions below, as they contain more detailed information on University of Helsinki practices than the . To avoid overlaps, your data management plan can refer to your research plan, and vice versa. Please note the following:

Follow the requirements of your organisation or funder.
Answer at least the main questions. If a question is not applicable to your research, explain why.
Include all relevant background information in your plan, including applicant names, project titles, numerical codes and version details.

1. Research data

Discuss the following in your answer:

What kind of data will be used and produced in the project? If you use sensitive data, see the section: Guidance for Sensitive Data
Which file formats will the data be in?
Approximately how much data will the project have, e.g. in gigabytes or the number of samples?
Will you use or develop special software?

Tips for best practices

List your data in the following way using bullet points or a table. The plan is based on the described data types. If you use categorization or abbreviations to describe the data, it will be easier for you to refer to the specific dataset in the rest of the plan.

Example of data types in list format:

1. Data collected in this project

Questionnaire x, file format .pdf, size 5 GB
DNA samples (clarify the origin, human or another organism), physical sample, size n=500
Pictures/videos about x, file format .jpg, .avi, size 1 TB

2. Data produced as an outcome of the process

Analyses of questionnaire x, .pdf, .xlsx, 2 GB
DNA sequences/analyses, FASTA, .txt, .xlsx, 2 TB
Documentation of the data (survey form, codebook, laboratory notebooks, readme files)

3. Previously collected existing data reused in this project

Samples from the Biobank
Data from Statistics Finland, database, 10 GB
Survey data from Finnish Social Science database Aila
Interviews or language corpus from the Language Bank of Finland
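The same kind of inventory can also be kept in a structured form, so that each dataset has a short code you can refer to in the rest of the plan. The following is only a sketch in Python: the codes (D1, D2, ...), the field names, and the sensitivity flags are illustrative assumptions, not part of the University of Helsinki instructions.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    code: str          # hypothetical short label for referring to the dataset
    description: str
    category: str      # "collected", "produced" or "reused"
    file_formats: str
    size: str
    sensitive: bool    # illustrative flag; assess your own data case by case


# Example entries adapted from the list above; the flags are assumptions.
INVENTORY = [
    Dataset("D1", "Questionnaire x", "collected", ".pdf", "5 GB", True),
    Dataset("D2", "Pictures/videos about x", "collected", ".jpg, .avi", "1 TB", False),
    Dataset("D3", "Analyses of questionnaire x", "produced", ".pdf, .xlsx", "2 GB", False),
    Dataset("D4", "Data from Statistics Finland", "reused", "database", "10 GB", True),
]


def as_table(datasets):
    """Render the inventory as a plain-text table for pasting into the plan."""
    header = ("Code", "Description", "Category", "Formats", "Size", "Sensitive")
    rows = [header] + [
        (d.code, d.description, d.category, d.file_formats, d.size,
         "yes" if d.sensitive else "no")
        for d in datasets
    ]
    # Pad every column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


print(as_table(INVENTORY))
```

Keeping the sensitivity flag next to each dataset makes it harder to overlook a dataset that needs the sensitive-data guidance.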

It is essential to identify sensitive data types, as data management planning includes recognizing and managing the risks involved with such data. If your data contains personal data, you need to identify the . More information can be found in the (Flamma) and in .

Sensitive data is information that could cause damage if revealed. Such data include:

Personal data:
- Personal data includes all identifiers from which a person is identifiable directly or indirectly.
- Direct identifiers: name, phone number, social security number, picture, voice, fingerprint, dental chart, etc.
- Indirect identifiers: gender, age, education, profession, nationality, work history, system log history, marital status, residence information, car license number, opinion, psychological or physical feature, etc.
Sensitive personal data:
- Special categories of personal data:
  - Data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership
  - Genetic data
  - Biometric data processed for the purpose of uniquely identifying a person
  - Data concerning health
  - Data concerning a person’s sex life or sexual orientation
- Other sensitive personal data:
  - Data describing the economic or social status
  - Location data
  - Communication data
  - Behaviour
  - Other data that is particularly personal e.g., notes and diaries
Sensitive information about species, such as (FIN).
Other confidential information, such as patents, military information, organizational information, or trade secrets.

Discuss the risks involved in controlling data integrity and quality, as well as how they are managed. Notice that data quality and the quality of research methods are two distinct issues.

Tips for best practices

Describe the following practices, if they are or will be in use:

What tools for data management do you use, e.g. electronic lab notebook or digital survey form?
How is the research team familiarized with the RDM practices?
Are the methods validated, or are quality control pipelines in use?
Are transcriptions of audio or video interviews checked by someone other than the transcriber?
Are checksums used?
Digitizing of analogue or physical material should be done with sufficient accuracy.
In all conversions, maintaining the original information content should be ensured.
Discuss how affect data quality.
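If you use checksums, one possible approach is to record a checksum for every file when the data is first stored and to recompute the checksums after each transfer or conversion. The sketch below uses SHA-256 from the Python standard library. The function names and the idea of a per-directory manifest are assumptions made for illustration, not a University requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's relative path to its checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir, manifest):
    """List files that were added, removed or altered since the manifest was made."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

Store the manifest separately from the data, for example next to the documentation, so that accidental changes to the data do not also change the reference values.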

2. Ethical and Legal Compliance

Discuss the following in your answer:

Does your data include personal data?
Are there any aspects of your data covered by general or discipline-specific research ethics guidelines?
Do you need an for your study, which is always carried out prior to data collection, if necessary?
What kind of research permits do you need? E.g. , research permit of the organisation being researched?
Do you work with other confidential or sensitive data than described above (e.g., (FIN), conservation areas, military information)?
Are there Intellectual Property Rights (IPR) such as copyright or patent right matters involved with the data?
Are there any aspects of your research that might be subject to ?

Describe how you will maintain high ethical standards and comply with relevant legislation when managing your research data. Describe the risks involved and how they are managed.

Justify why you have the right to collect, handle, and preserve data that involves ethical issues. For example, explain if you have passed an ethical review, and describe how you will ask for informed consent from the potential research subjects to participate in the study.
If you handle
- Specify the legal basis for processing of personal data (research carried out in the public interest)
- Inform your research subjects (Openness towards the subjects/participants in the research project.)
- Describe what kind of personal data you need and why
- Describe how you will secure the processed data and the privacy of the data subjects (examinees) and if needed, how you perform data .
- Assess the risk to data subjects that the processing of their data may cause
- Assess the need for data protection impact assessment (with the help of )
- Perform the if needed
- Assess how you will fulfill (including informing)
- Describe how you will take care of the erasure of unnecessary data and
- If you transfer or disclose data for processing outside of the EU, please describe how the legality of the transfer and data subject rights will be taken care of. Data transfers outside the EU mean, e.g., that you disclose personal data to partners that are located outside the EU, or that you use a cloud service whose servers are located outside the EU for data processing.

Additional information

You can use the app to store all the documents related to your research, including those required for ethical review.

There are different requirements for informing participants and documenting the processing of personal data. You can find out more about these in the links below:

(Flamma)
(Data management guidelines, FSD)
Research ethics on and on the

Describe what has been agreed about data usage rights. Consider if there are rights belonging to a third party. Anticipate what licenses will be used when data is opened.

Guidance about data ownership and licenses

Ensure that the necessary agreements have been made at the beginning of the project (data ownership, authorship and results).
- Many funding agencies (Research Council of Finland, EU) require data ownership to be transferred to the university.
- (Flamma)
Use a license when opening your data for reuse (e.g. research data, code, software).
- The UH recommends license which is as open as possible. Good scientific practice still stipulates that the author be mentioned. Alternative licenses:
- UH Library Open Access
- or licenses and other software licenses.

Additional information

In the you can store all documents related to your research project, for example various agreements and research permits.

(RCR)

(UH Data Support)

3. Documentation and metadata

Metadata is the documentation and description of research data.

Metadata standards are uniform models of data documentation.

The documentation describes who collected the data and how it was collected. When, where, and by which means was it collected? How has it been processed? Metadata can include information about test arrangements, methods of analysis or research environments.

“The documentation and metadata associated with research data should follow discipline-specific standards to enable the reuse and further enrichment of the data in future research projects. Metadata associated with research data must be published whenever possible, either in national or international metadata services.” (quoted from )

Documentation during the research project includes, e.g., explanations of variables and codes (data dictionaries, codebooks) and readme files. Documentation also covers file naming conventions, version control and directory structure.
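A file naming convention is easier to follow when it can be checked automatically. The sketch below assumes a hypothetical pattern, `project_dataset_YYYYMMDD_vNN.ext`, chosen only for illustration; agree on your own convention within the research group and keep personal identifiers out of file names.

```python
import re
from datetime import datetime

# Hypothetical convention: <project>_<dataset>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<dataset>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject impossible dates such as month 13.
        datetime.strptime(parts["date"], "%Y%m%d")
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


for name in ["proj1_D1_20240131_v02.csv", "final_FINAL2.xlsx"]:
    print(name, "->", "ok" if parse_name(name) else "does not follow the convention")
```

Putting the date and version in the name complements version control; it does not replace it.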

After the project, the research data is published, archived, or listed in a data repository or in a metadata catalogue. For this, the data needs to be described as a whole or, e.g., by data types.

Tips for best practices

Please note that the documentation – file names, variables, and other metadata – might include personal or sensitive data.
Get acquainted with .
Plan documentation prior to data collection. Begin documentation and metadata creation at the start of the project and continue with it until the end of the project.
Data repositories often require the use of a specific metadata standard. if there is a discipline-specific metadata standard or metadata model.
If there is no suitable metadata standard, you can create a readme-type metadata file.
Use generally accepted vocabularies to describe the data. You can find suitable and accepted terms from or lookup service.
For documenting codes and algorithms, use the university’s .
Open the metadata in a way others can access it, e.g. in Etsin. You can check the requirements for metadata in Etsin from .
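If you create a readme-type metadata file, a small script can help keep it complete and consistent across datasets. The sketch below is one possible approach. The field list is an assumption for illustration and is not a metadata standard; adapt it to your discipline and to the repository you plan to use.

```python
# Illustrative minimum set of fields; not taken from any metadata standard.
REQUIRED_FIELDS = (
    "Title",
    "Creator",
    "Date collected",
    "Collection method",
    "File formats",
    "License",
    "Contact",
)


def render_readme(metadata):
    """Return readme text, refusing to produce a file with empty required fields."""
    missing = [
        field for field in REQUIRED_FIELDS
        if not str(metadata.get(field, "")).strip()
    ]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    # Required fields first, then any extra fields in the order they were given.
    ordered = list(REQUIRED_FIELDS) + [k for k in metadata if k not in REQUIRED_FIELDS]
    return "".join(f"{name}: {metadata[name]}\n" for name in ordered)


example = {
    "Title": "Questionnaire x responses",
    "Creator": "Research group (placeholder)",
    "Date collected": "2024-01-31",
    "Collection method": "Digital survey form",
    "File formats": ".csv",
    "License": "to be decided",
    "Contact": "contact address (placeholder)",
}
print(render_readme(example))
```

Check the generated text before publishing it, because the metadata itself might include personal or sensitive information.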

Additional information:

FSD’s .

4. Storing data and access control

Consider the following questions:

Where will your data be stored and during the project?
Who is responsible for backups?
Make a plan with your partners and ensure secure data transfer.

Opening, publishing, and archiving data after the project will be described in section five.

Tips for best practices

Use the IT Services provided and maintained by the University of Helsinki: .
- More information: & for data storage.
- For example, prefer the personal and group storage spaces provided by the UH Centre for Information Technology.
- is convenient for sharing and collaboration if the data is not sensitive or confidential.
- Do not use the computer's hard drive or USB drives as the main storage option.

Make sure your project has sufficient storage space. If not, please contact at tel. +358 (0)2 941 55555 or .

If you work with sensitive data:
- Be sure that your storage is safe enough for the data, e.g. a dedicated UH or CSC secure storage space (Umpio, storage server, ePouta...).
- Encryption: If needed, particularly mobile devices, portable and external storage devices should be encrypted for use, e.g., .
- Please, be in contact with if you are unsure about data protection.

Additional information

Answer the following questions:

Who is responsible for controlling access to the data?
How will the access control be carried out? Is there an IT solution (e.g., password protection or usage logs) or some physical solution (e.g., a file cabinet) in use?
Who in the research group has access to which data?
Why has each access right (viewing, editing, deleting) been granted?
Describe how information security and the risks from sensitive data have been taken into account. Will sensitive data be stored in an encrypted form? More tips are below.

Tips for best practices

If you use a personal or shared drive, you can easily control access rights.
Access control of sensitive data should be well considered. Data handling and transferring need to be in line with permissions.
Access control: There must be a list of users and all rights granted, and a procedure for withdrawing rights.
Monitoring: How will data usage be monitored during the study? Can the technical equipment log who used which data and when? Ask IT services what kind of automatic usage and change log is provided.
Security of the premises: Check the locking options of workspaces, safe lockers and cabinets, and camera and access surveillance.
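The list of users and rights, together with a withdrawal procedure, can be kept as a simple register. The sketch below is a hypothetical illustration of such a register in Python. It only documents the decisions; the actual access rights must still be set in the storage service itself.

```python
from datetime import date

VALID_RIGHTS = {"view", "edit", "delete"}


class AccessRegister:
    """Record who may do what with which dataset, and why (illustrative only)."""

    def __init__(self):
        self._grants = {}  # (user, dataset) -> {"rights": set, "reason": str}
        self.log = []      # chronological record of grants and withdrawals

    def grant(self, user, dataset, rights, reason, when=None):
        rights = set(rights)
        unknown = rights - VALID_RIGHTS
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        self._grants[(user, dataset)] = {"rights": rights, "reason": reason}
        self.log.append(
            (when or date.today(), "grant", user, dataset, sorted(rights), reason)
        )

    def withdraw(self, user, dataset, when=None):
        """Remove all rights of a user to a dataset; return True if any existed."""
        removed = self._grants.pop((user, dataset), None)
        if removed is not None:
            self.log.append(
                (when or date.today(), "withdraw", user, dataset,
                 sorted(removed["rights"]), "")
            )
        return removed is not None

    def rights_of(self, user, dataset):
        entry = self._grants.get((user, dataset))
        return set(entry["rights"]) if entry else set()

    def listing(self):
        """Current grants as sorted rows: user, dataset, rights, reason."""
        return sorted(
            (user, dataset, sorted(grant["rights"]), grant["reason"])
            for (user, dataset), grant in self._grants.items()
        )
```

Recording the reason for each grant answers the question of why each right was given, and the log shows when rights were withdrawn, e.g. when someone leaves the project.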

Additional information

National Cyber Security Centre:

at tel. +358 (0)2 941 55555 or

5. Opening data and long-term preservation after the research project

Please answer the following questions. You can refer to the table in section General Description of Research Data, if you used one:

What part of the data will be opened/published?
Where will the data be opened? Name the repositories - if possible.
When can the data be opened/published?
Will some part of the data be destroyed?
Do some parts of the data require anonymisation or pseudonymisation before opening?

If your data cannot be opened:

Why can the data not be opened?
Where will the project metadata be opened?
Even though data including personal data cannot be opened, its , which do not contain sensitive information, should be opened.

Tips for opening data containing personal data

Opening and sharing personal data are governed, for example, by the following:
- How were the research participants informed about opening the data (if the data has been gathered directly from research participants)?
- If the research participants are directly identifiable, their explicit consent is required for opening the data.
- Is further use of the data restricted by administrative authorisations?
- When opening, you must ensure that the data is properly protected, pseudonymized or anonymized.
- If you are planning on sharing personal data, please contact the University research lawyers (), if necessary.
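As an illustration of pseudonymisation, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256) and drops other direct identifiers. This is one common technique, not a method prescribed by the University, and the function and field names are assumptions. The result is pseudonymised, not anonymised: anyone holding the key can link records back to identifiers, and indirect identifiers left in the data may still identify a person.

```python
import hashlib
import hmac


def pseudonym(identifier, secret_key, length=12):
    """Derive a stable pseudonym from an identifier using a secret key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def pseudonymise_rows(rows, id_field, secret_key, drop_fields=()):
    """Replace id_field with a pseudonym and remove the listed direct identifiers."""
    result = []
    for row in rows:
        cleaned = {
            key: value
            for key, value in row.items()
            if key != id_field and key not in drop_fields
        }
        cleaned["pseudonym"] = pseudonym(str(row[id_field]), secret_key)
        result.append(cleaned)
    return result


# The key below is a placeholder; keep the real key secret and apart from the data.
rows = [
    {"participant_id": "A1", "name": "Example Person", "answer": 3},
    {"participant_id": "A2", "name": "Another Person", "answer": 4},
]
for row in pseudonymise_rows(rows, "participant_id", b"replace-with-a-secret-key",
                             drop_fields=("name",)):
    print(row)
```

Because the same identifier always yields the same pseudonym under one key, records of one participant can still be combined within the project.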

Tips for best practices

Choose suitable repositories for sharing and opening your data already at the beginning of the project. Check that your data fulfills the repository requirements.
- You can search for suitable repositories from: ,
- Prefer discipline-specific data repositories. If you cannot find any, you can use general repositories such as , , .
If you cannot open the data, open your metadata, e.g., in or in the national .

Additional information

(Responsible Research)

Discuss where data with long-term value is archived and for how long.

What part of the data is archived?
Where will it be archived?
For how long will the data be preserved?
Are there costs related to archiving? Who takes care of them?

An archiving plan is part of research quality and transparency.

Tips for best practices

When data is created, it is important to consider how long it will be preserved.
Check the publisher's requirements for data preservation, if you plan to publish in a journal that demands opening your data.
Check discipline-specific and funder's requirements for data preservation.
Personal data may also be archived.
- The intended life cycle of the data, including the possible further use of the data after the end of the research project, must be openly communicated to the research subjects. This also applies to pseudonymised and anonymised data.
- Identifiable data should be removed unless there are proper grounds for archiving them.
- The appropriate protection of personal data, i.e. who has access to the data and why, must continue to be taken into account when archiving the material.
- The UH offers guidance on safe data storage and preservation solutions contact .
Biological samples can be stored in biobanks (if the study subjects have granted permission to do so).
Even if the project does not generate data requiring long-term preservation, the data should be kept for the duration of the verification of the results, which varies from discipline to discipline (generally at least 5 years).
offers a location for 5 to 15 years for digital research datasets produced at the University.
is a digital preservation service for nationally valuable data for dozens or hundreds of years.

Links to general guides and additional information
(Zenodo)
(DCC, UK)
(Flamma)
(Data management guidelines, FSD)

6. Data management responsibilities and resources

Summarise and describe the roles and responsibilities here. Answer the following questions:

Are data management responsibilities allocated to one person, or is the whole research group involved?
How do you share tasks between different parties if you are working in a research consortium?
Who is responsible for data protection?
If the data contains personal data, list the persons and organizations allowed to handle personal information and their different roles
Who is responsible for data protection and access to data?

Tips for best practices

When managing the data, follow shared practices (documentation, metadata, storing and sharing).
- Make sure that everyone is trained in the necessary practices and that everyone follows the same practices.
- Note who is responsible for updating the DMP document when you make decisions or changes in your practice.
- List the persons or organizations responsible for different data management tasks.
- You need to name the group members / colleagues / persons allowed to handle personal data and their different roles ()
- Which one of you or the organizations involved will be responsible for the data after the project?

Describe what resources (time and costs) are needed for data management. Thorough planning at the start and during the project means less work at the end when the data is prepared for opening and preservation.

Tips for best practices

Estimate whether expert help or an assistant is needed for data management, data preservation, and data sharing tasks.
Give an estimate of how much time is needed for data documentation and cleaning to prepare the data (not results) to be opened: 1–2 h weekly, one day per month, 1–2 weeks before publishing, or some other time estimate.
- Data documentation and cleaning means, for example, producing metadata, anonymising sensitive data, arranging data, transferring data etc.
- It is recommended to keep documentation up to date throughout the project life cycle.
Allocate time and funds also if you need to anonymize, protect or destroy sensitive data.
Specify your data management costs in the budget according to funder requirements.
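A time estimate can be turned into a budget line with simple arithmetic. The sketch below assumes a hypothetical three-year project (156 weeks), 1.5 hours of documentation work per week, 60 extra hours before publishing, and an hourly cost of 40. All of these figures are placeholders; use your own estimates and your funder's cost model.

```python
def documentation_hours(project_weeks, weekly_hours, final_push_hours=0.0):
    """Total hours: regular weekly work plus extra effort before publishing."""
    return project_weeks * weekly_hours + final_push_hours


def documentation_cost(hours, hourly_cost):
    """Cost of the estimated hours at a given hourly cost."""
    return hours * hourly_cost


# Placeholder figures for illustration only.
hours = documentation_hours(project_weeks=156, weekly_hours=1.5, final_push_hours=60)
print(f"Estimated documentation effort: {hours:.0f} h")
print(f"Estimated cost: {documentation_cost(hours, hourly_cost=40):.0f}")
```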