A Data Management Plan (DMP) should describe how data is managed during as well as after the active phase of the research project. The plan should be updated as the research project evolves.
The DMP is part of a research plan. To avoid overlap between the DMP and the research plan, you can refer from one document to the other. Introduce data analysis and other methods in your research plan.
In the DMP data is understood as a broad term including:
You can use DMPTuuli, an online tool, to create your data management plan. The list and content below serve as a basic guideline to the University of Helsinki guidance for data management plans. Open DMPs from UH researchers can be found on Zenodo.
What data will be used and produced in the project? In which file formats will the data be? Approximately how much data will the project have? Will you use or develop special software?
If you use sensitive data, see the recommendations below (Guidance for sensitive data). Categorise your data in the following way using bullet points or a table, and use the same categorisation in all phases of your plan. Your answer to this question forms a general structure for the rest of the plan.
Data collected for this project
Data produced as an outcome of the process
Previously collected existing data reused in this project
Data type, Source, File format, Sensitivity, Size
Questionnaire X, Gathered data, docx; txt, No/Yes, 1 GB
Analysis for the questionnaire, Data produced, xlsx; tif, 100 MB
DNA samples, Data reused from Biobank
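The same categorisation can also be kept in machine-readable form, so that it can be reused unchanged in every phase of the plan. The following is a minimal Python sketch and not part of the University of Helsinki guidance: the field names mirror the example table above, and the `DataEntry` class and `to_csv` helper are hypothetical names chosen for illustration. Cells that the example leaves open stay empty.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataEntry:
    """One row of the data inventory (hypothetical structure)."""
    data_type: str
    source: str
    file_format: str = ""   # left empty when not yet known
    sensitivity: str = ""
    size: str = ""


def to_csv(entries):
    """Serialise the inventory to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DataEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Rows taken from the example table; the second row's sensitivity
# is not given in the example, so it is left empty here.
inventory = [
    DataEntry("Questionnaire X", "Gathered data", "docx; txt", "No/Yes", "1 GB"),
    DataEntry("Analysis for the questionnaire", "Data produced", "xlsx; tif", "", "100 MB"),
    DataEntry("DNA samples", "Data reused from Biobank"),
]

print(to_csv(inventory))
```

Keeping the inventory in one file like this makes it easier to refer to the same data types later, when describing storage, documentation and opening.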
It is important to identify sensitive data types, as planning data management includes recognising and managing the risks involved with such data. If you work with personal information, identify the controller. More information can be found in the Data protection guide for researchers (Flamma).
Sensitive data is information that could cause damage if revealed.
What risks are involved in controlling data integrity and quality, and how are the risks managed? Note that data quality and the quality of research methods are two distinct things.
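One common way to manage the risk of unnoticed changes or corruption is to record a checksum for every data file and compare the checksums later. The guidance itself does not prescribe this technique; the sketch below is an illustration using Python's standard `hashlib` module, and the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """List files that were modified, added or removed between manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "questionnaire_x.txt"
    data_file.write_text("answer 1\n", encoding="utf-8")
    before = build_manifest(tmp)

    data_file.write_text("answer 1 (edited)\n", encoding="utf-8")
    after = build_manifest(tmp)

    print(changed_files(before, after))
```

A manifest stored alongside the data, and checked again after transfers or restores from backup, gives a simple record that the files are still the ones originally collected.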
Does your data include personal information? Does your work with animals require an ethical permit? Do you work with confidential or sensitive data other than that described above (e.g., endangered species, conservation areas, military information)?
Describe how you will maintain high ethical standards and comply with relevant legislation when managing your research data. What are the risks involved, and how are they managed?
NB: Most of the links below require logging in with the UH user account.
Describe what has been agreed about data usage rights. Consider if there are rights belonging to a third party. Anticipate what licenses will be used when data is opened.
Documentation means describing the data: these documents explain what data the project has and where the data originates.
Documentation includes data dictionaries (explaining variables and codes) and readme files. Other important issues include file naming conventions, version control, and directory structure. There are standard methods available for documentation called metadata standards, which should be used if suitable for the data. These will increase the value of the data by making it easier to reuse.
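A file naming convention is easiest to follow when it can be generated and checked automatically. The sketch below assumes a purely hypothetical convention, `<project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>`; it is not a University of Helsinki standard, and the names `make_name` and `parse_name` are invented for this example. Adapt the pattern to whatever convention your project agrees on.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_name(project, description, day, version, ext):
    """Build a file name that follows the convention above."""
    return f"{project}_{description}_{day.isoformat()}_v{version:02d}.{ext}"


def parse_name(name):
    """Return the parts of a conforming name, or None if it does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


name = make_name("surveyx", "raw-responses", date(2024, 3, 1), 2, "txt")
print(name)                               # surveyx_raw-responses_2024-03-01_v02.txt
print(parse_name(name)["version"])        # 02
print(parse_name("final_FINAL(2).docx"))  # None
```

Putting the date and a zero-padded version number in the name keeps files sortable and makes it clear which version a readme or data dictionary refers to.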
Where will your data be stored and backed up during the project? Who is responsible for backups? Make a plan with your partners and ensure secure data transfer.
What part of the data will be opened / published? Where will the data be opened? Name the repositories. When will the data be available? Will some part of the data be destroyed?
If your data cannot be opened, explain why, and state where the project metadata will be opened.
Tips for opening data containing personal information
Tips for best practices
Long-term preservation means that data is preserved for more than 25 years. If your data has long-term value:
An archiving plan is part of research quality and transparency.
Who is responsible for data management tasks? Who is responsible for data protection and information security as well as controlling them?
Describe what resources (time and workload) are needed for data management. The better data management is planned at the beginning of the project, the less work is needed when the data is opened and preserved.