Research data management

We provide assistance in research data management throughout the research life cycle, including data organization, storage and sharing. On this page you can find guidance on these topics and on how to answer the general questions of a Data Management Plan (DMP).

Every member of the University of Helsinki community is responsible for good data management. The University provides research data infrastructures that include tools and services for supporting the management, use, findability and sharing of data, as well as capacity for storage, preservation, computing and processing.

Data Support at the University of Helsinki assists researchers in the management of research data. Data Support is a network of experts from the university library, IT Services, Central Archives, Research Affairs, Personnel Services, and Legal Affairs. You can contact us by email: datasupport@helsinki.fi

On this page you can find University of Helsinki guidance for research data management.

[Figure: RDM services at the University of Helsinki. CC BY 4.0 University of Helsinki]

A Data Management Plan (DMP) should describe how data is managed during as well as after the active phase of the research project. The plan should be updated as the research project evolves.

The DMP is part of a research plan. To avoid overlap between the DMP and the research plan, you can refer from one document to the other. Introduce data analysis and other methods in your research plan.

In the DMP, data is understood as a broad term that includes:

  • data collected by various methods (such as surveys, interviews, measurements, and imaging techniques),
  • data produced during the research (such as analysis results),
  • research sources (such as archive material),
  • notes and field notes, and
  • source code and software.

You can use DMPTuuli, an online tool, to create your data management plan. Open DMPs from UH researchers can be found on Zenodo.

[Figure: DMP: what's in it for me? CC BY 4.0 University of Helsinki]

General description of research data

What data will be used and produced in the project? In which file formats will the data be? Approximately how much data will the project have? Will you use or develop special software?

Tips for best practices

If you use sensitive data, see the recommendations below the examples. Categorise your data in the following way using bullet points or a table, and use the same categorisation in all phases of your plan. Your answer to this question forms a general structure for the rest of the plan.

Example 1:

Data collected for this project

  • Questionnaire x, file format .pdf, size 5 GB
  • DNA samples (n=500)
  • Pictures/videos about x, file format .jpg, .avi, size 1 TB

Data produced as an outcome of the process

  • Analyses of questionnaire x, .pdf, .xlsx, 1 GB
  • DNA sequences/analyses, FASTA, .txt, .xlsx, 2 TB
  • Documentation of the data (readme files, data dictionary, laboratory notebooks)

Previously collected existing data reused in this project

  • Samples from the Biobank
  • Data from Statistics Finland, database, 10 GB
  • Survey data from Finnish Social Science database Aila
  • Interviews or language corpus from the Language Bank of Finland
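The format and size figures in an inventory like Example 1 can be estimated from an existing data folder. The following is a minimal sketch in Python, not a University of Helsinki tool: the function names are hypothetical, and reporting sizes in decimal units (1 kB = 1000 bytes) is an assumption.

```python
import os
from collections import defaultdict


def summarise_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Group files case-insensitively by extension.
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            size = os.path.getsize(os.path.join(dirpath, name))
            totals[ext][0] += 1
            totals[ext][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


def human_size(num_bytes):
    """Format a byte count with decimal units (assumption: 1 kB = 1000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"
```

Calling `summarise_by_extension` on a project folder gives one entry per file format, which can be copied into the plan after formatting the byte counts with `human_size`.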

Example 2:

| Data type | Source | File format | Sensitivity | Size |
|---|---|---|---|---|
| Questionnaire x | data collected | .csv, .txt, .docx | no/yes | 1 GB |
| Analysis of questionnaire x | data produced | .xlsx, .tif | | 100 MB |
| DNA samples | data reused from Biobank | | | |

Guidance for sensitive data

It is important to identify sensitive data types, as planning data management includes recognising and managing the risks involved with such data. If you work with personal information, identify the controller. More information can be found in the Data protection guide for researchers (Flamma).

Sensitive data is information that could cause damage if revealed.

  • Personal data: 
    • Personal information includes all identifiers from which a person is identifiable directly or indirectly.
    • Direct identifiers: name, phone number, social security number, picture, voice, fingerprint, dental chart, etc.
    • Indirect identifiers: gender, age, education, profession, nationality, work history, system log history, marital status, residence information, car license number, etc.
  • Sensitive personal information:
    • Special categories of personal data:
      • Data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership 
      • Genetic data 
      • Biometric data processed for the purpose of uniquely identifying a person 
      • Data concerning health
      • Data concerning a person’s sex life or sexual orientation
    • Other sensitive personal data: 
      • Data describing economic or social status
      • Location data
      • Communication data
      • Behaviour
      • Other data that is particularly personal, e.g., notes and diaries
  • Sensitive information about species, such as endangered animals, plants, nature conservation areas, or biosafety (FIN).
  • Other confidential information, such as patents, military information, organizational information, or trade secrets.

Consistency and quality of data

What risks are involved in controlling data integrity and quality, and how are the risks managed? Note that data quality and the quality of research methods are two distinct things.

Tips for best practices

  • Do you use data management tools, such as a database for data collection?
  • Has everyone handling the data been introduced to the best practices?
  • Are the methods validated, or are quality control pipelines in use?
  • Are transcriptions of audio or video interviews checked by someone other than the transcriber?
  • Are checksums used (software)?
  • Digitizing of analog or physical material should be done with sufficient accuracy.
  • In all conversions, maintaining the original information content should be ensured.
  • Discuss how minimisation, pseudonymisation, and anonymisation affect data quality.
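As an illustration of the checksum tip above, the sketch below records a SHA-256 checksum for every file in a data folder and later reports files that are missing or have changed. It uses only the Python standard library; the function names and the choice of SHA-256 are assumptions made for this example, not a University of Helsinki requirement.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changes(data_dir, manifest):
    """Return files that are missing or whose content no longer matches."""
    current = build_manifest(data_dir)
    missing = sorted(set(manifest) - set(current))
    altered = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return {"missing": missing, "altered": altered}
```

A manifest built right after data collection can be stored next to the data and compared again after each transfer or conversion.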

Ethical issues related to data management

Does your data include personal information? Does your work with animals require an ethical permit? Do you work with other confidential or sensitive data than described above (e.g., endangered species (FIN), conservation areas, military information)?

Describe how you will maintain high ethical standards and comply with relevant legislation when managing your research data.  What are the risks involved, and how are they managed?

Tips for best practices

  • Justify why you have the right to collect, handle, and preserve data that involves ethical issues, for example, that you have passed an ethical review.
  • If you handle personal information (more tips behind the link):
    • State which parties process the data and who or which organisation is the controller (or whether the parties are joint controllers).
    • Describe what kind of personal data you need and why.
    • Specify the legal basis for processing personal data (e.g., research carried out in the public interest, or consent).
    • Assess the risk that the processing may cause to data subjects.
    • Assess the need for a data protection impact assessment (with the help of the Preliminary assessment for data protection form).
    • Perform the data protection impact assessment if needed.
    • Describe how you will secure the processed data and the privacy of the data subjects (examinees) and, if needed, how you will anonymise or pseudonymise the data.
    • Assess how you will fulfil the data subjects' rights (including informing them).
    • Describe how you will erase unnecessary data and what happens to the data after your research.
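One common way to pseudonymise direct identifiers is to replace them with a keyed hash. The sketch below is an illustration under stated assumptions, not a University of Helsinki procedure: the function names, the use of HMAC-SHA256, and the 16-character pseudonym length are choices made for this example. The secret key must be stored separately from the data, and the result is still pseudonymised (not anonymous) data, because indirect identifiers remain and anyone holding the key can re-link records.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key, length=16):
    """Derive a stable pseudonym from a direct identifier using HMAC-SHA256."""
    # Normalise so that trivial spelling differences map to the same pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymise_records(records, id_field, secret_key):
    """Return copies of the records with id_field replaced by a pseudonym."""
    output = []
    for record in records:
        cleaned = dict(record)  # leave the original record untouched
        cleaned[id_field] = pseudonymise(record[id_field], secret_key)
        output.append(cleaned)
    return output
```

Because the same identifier always yields the same pseudonym under a given key, records from different files can still be linked without exposing the original identifier.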

Legal compliance related to data management 

Describe what has been agreed about data usage rights. Consider if there are rights belonging to a third party. Anticipate what licenses will be used when data is opened.

Guidance about data ownership and licenses

  • Data ownership rights depend on research funding. Ensure that the necessary agreements (data ownership and authorship) are made at the beginning of the project.
  • Use a license when opening your data for reuse (e.g., research data, code, software).
    • The UH recommends the CC0 license for research data, with which you waive your rights to the work. You retain your moral rights, and good scientific practice still stipulates that the author be mentioned.
    • Alternative licenses:
      • UH Library Open Access Creative Commons license guide
      • GNU or MIT licenses

Documentation means describing the data, i.e., these documents explain what data the project has and where the data originates from.

Documentation includes data dictionaries (explaining variables and codes) and readme files. Other important issues include file naming conventions, version control, and directory structure. There are standard methods available for documentation called metadata standards, which should be used if suitable for the data. These will increase the value of the data by making it easier to reuse.
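A data dictionary is most useful when it stays in step with the data. As a minimal sketch, assuming the dictionary is kept in a machine-readable form, the example below compares a CSV file with its dictionary and lists undocumented variables and codes. The variable names, codes, and function name are hypothetical.

```python
import csv
import io

# Hypothetical data dictionary: variable name -> description and allowed codes.
DATA_DICTIONARY = {
    "participant_id": {"description": "Pseudonymous participant code", "codes": None},
    "age_group": {
        "description": "Age group of respondent",
        "codes": {"1": "18-29", "2": "30-49", "3": "50+"},
    },
    "q1_satisfaction": {
        "description": "Satisfaction, item 1",
        "codes": {"1": "low", "2": "medium", "3": "high"},
    },
}


def check_against_dictionary(csv_text, dictionary):
    """Return a list of problems found when comparing a CSV to the dictionary."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    for name in columns:
        if name not in dictionary:
            problems.append(f"undocumented variable: {name}")
    for name in dictionary:
        if name not in columns:
            problems.append(f"documented variable missing from data: {name}")
    for row_number, row in enumerate(reader, start=2):  # header is line 1
        for name, value in row.items():
            codes = dictionary.get(name, {}).get("codes")
            if codes is not None and value not in codes:
                problems.append(
                    f"line {row_number}: {name} has undocumented code {value!r}"
                )
    return problems
```

An empty result means every column and coded value in the file is explained in the dictionary.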

Tips for best practices

  • Metadata standards: Many storage services require data to be saved using a standard. If you know where you will open your data, check that service's standard requirements.
  • Data management software, e.g., databases and an electronic laboratory notebook
  • Data dictionaries, which explain variables, or code books, which gather all of the codes and calculations used.
  • File naming conventions
  • Directory structure: Remember that if metadata, i.e., file, directory, or variable names, include sensitive data or personal information, they need to be handled accordingly.
  • Readme file(s) provide information about data files to ensure they are interpreted correctly.
  • Version control
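File naming conventions and simple version numbering can be checked automatically. The sketch below assumes a hypothetical convention, `<project>_<content>_<YYYYMMDD>_v<NN>.<ext>`, which is only one possible scheme and is not prescribed by the University. It parses conforming names and picks the latest version of each file.

```python
import re
from datetime import datetime

# Hypothetical convention: <project>_<content>_<YYYYMMDD>_v<NN>.<ext>
# e.g. "lakes_survey_20240115_v02.csv"
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def parse_file_name(name):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        datetime.strptime(parts["date"], "%Y%m%d")  # reject impossible dates
    except ValueError:
        return None
    parts["version"] = int(parts["version"])
    return parts


def latest_versions(names):
    """Pick the highest version of each (project, content, date, ext) group."""
    latest = {}
    for name in names:
        parts = parse_file_name(name)
        if parts is None:
            continue  # skip names that do not follow the convention
        key = (parts["project"], parts["content"], parts["date"], parts["ext"])
        if key not in latest or parts["version"] > latest[key][0]:
            latest[key] = (parts["version"], name)
    return sorted(name for _version, name in latest.values())
```

Names that return `None` from `parse_file_name` are candidates for renaming before the data is shared. For source code and text files, a version control system is usually a better fit than version numbers in file names.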

Storage and backup during the research project

Where will your data be stored and backed up during the project? Who is responsible for backups? Make a plan with your partners and ensure secure data transfer.

Tips for best practices

  • Use the IT services provided and maintained by the University of Helsinki (see the storage solutions Excel sheet).
    • More information: Helpdesk CSC services for data storage
    • For example, use personal or group storage spaces, which are maintained and backed up (every hour) by the UH Centre for Information Technology.
    • Cloud storage options: Use the UH OneDrive for Business or the Teams cloud instead of commercial services (e.g., Google Drive or Dropbox).
    • Do NOT use external hard drives as your main storage.

Does your project have sufficient storage space? If not, please contact Helpdesk at tel. +358 (0)2 941 55555 or helpdesk@helsinki.fi.

  • If you work with sensitive data:
    • Make sure that your storage is secure enough for the data, e.g., a dedicated UH or CSC secure storage space (Umpio, storage server, ePouta...).
    • Do not use cloud storage, because its data protection is insufficient for sensitive data!
    • Encryption: Where needed, encrypt devices before use, particularly mobile devices and portable or external storage devices, e.g., with Cryptomator.
    • Please contact datasupport@helsinki.fi if you are unsure about data protection.

Access control

  1. Who is responsible for controlling access to the data?
  2. How will the access control be carried out? Is there an IT solution (e.g., password protection or usage logs) or some physical solution (e.g., a file cabinet) in use?
  3. Who in the research group has access to which data?
  4. Why has each access right (editing, viewing, deleting) been granted?
  5. Describe how information security and the risks from sensitive data have been taken into account. Will sensitive data be stored in an encrypted form? More tips are below.

Tips for best practices

  • If you use a personal or shared network drive, you can easily control access rights.
  • Access control of sensitive data should be well considered. Data handling and transfer need to be in line with permissions.
  • Access control: There must be a list of users and all rights granted, and a procedure for withdrawing rights.
  • Monitoring: How will data usage be monitored during the study? Can the technical equipment log who used which data and when? Ask IT services what kind of automatic usage log is provided.
  • Security of the premises: Check the locking options of workspaces, safe lockers and cabinets, and camera and access surveillance.
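The list of users, granted rights, withdrawal procedure, and usage log described above can be pictured as a small register. The sketch below is illustrative only: the class and method names are hypothetical, and it records decisions without enforcing them. Actual enforcement belongs to the permissions of the storage service in use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_RIGHTS = {"view", "edit", "delete"}


@dataclass
class AccessRegister:
    """Tracks who holds which rights to which dataset, and why."""

    grants: dict = field(default_factory=dict)  # (user, dataset) -> {right: reason}
    log: list = field(default_factory=list)     # chronological audit trail

    def _record(self, action, user, dataset, detail):
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, action, user, dataset, detail))

    def grant(self, user, dataset, right, reason):
        """Grant one right and record the justification for it."""
        if right not in ALLOWED_RIGHTS:
            raise ValueError(f"unknown right: {right}")
        self.grants.setdefault((user, dataset), {})[right] = reason
        self._record("grant", user, dataset, f"{right}: {reason}")

    def revoke_all(self, user):
        """Withdraw every right held by a user, e.g. when they leave the project."""
        for key in [k for k in self.grants if k[0] == user]:
            del self.grants[key]
            self._record("revoke", user, key[1], "all rights withdrawn")

    def is_allowed(self, user, dataset, right):
        """Check a right and log the check, whether allowed or denied."""
        allowed = right in self.grants.get((user, dataset), {})
        outcome = "allowed" if allowed else "denied"
        self._record("check", user, dataset, f"{right}: {outcome}")
        return allowed
```

Storing a reason with every grant answers the question of why each right was given, and the log shows who was checked against which data and when.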

 


Opening data

What part of the data will be opened / published? Where will the data be opened? Name the repositories. When will the data be available? Will some part of the data be destroyed?

If your data cannot be opened, explain why, and state where the project metadata will be opened.

Tips for opening data containing personal information

  • Whether personal data can be opened or shared depends, for example, on what the research participants were told when the data were collected, whether they have given explicit consent, and in what form and for what purposes the information is to be opened or shared.
    • When opening data, you must ensure that the information is properly protected and, where possible, pseudonymise or anonymise it.
    • The consent of the research participants is required for opening material from which they are directly identifiable.
    • In some cases, the material may be shared for the originally intended purpose. If you are planning to share personal data, please contact the University research lawyers (researchlawyers@helsinki.fi).
  • Even when data that includes personal information cannot be opened, its metadata should be opened, provided that the metadata contains no sensitive information.

Tips for best practices

  • Choose suitable repositories for sharing and opening your data already at the beginning of the project. Check that your data fulfills the repository requirements.
  • “As a rule, research data produced under the auspices of the University of Helsinki and related to published research results are open and available for shared use. The discoverability and citability of research data must be ensured.” (University of Helsinki research data policy)
  • Where to open data?
    • Check the recommendations of the publishers, learned societies, and funders of your own field.
    • Where have you or your colleagues in the same field published data?
    • Look for repositories: re3data.org.
    • General repositories: IDA, Zenodo, Dryad, Figshare.
  • If you cannot open the data, open the metadata about your project data, e.g., in Zenodo or in the national Etsin service.
  • Choose repositories using persistent identifiers (DOI, URN).
  • Remember to give your data a license (see Legal compliance related to data management).

Long-term preservation of data

Long-term preservation means that data is preserved for more than 25 years. If your data has long-term value:

  1. What part of the data is archived?
  2. Where will it be archived?
  3. How long will the data be preserved?
  4. Are there costs related to archiving? Who covers them?
  5. Will some part of the data be destroyed?

An archiving plan is part of research quality and transparency.

Tips for best practices

 

  • When data is created, it is important to consider how long it will be preserved.
  • Check publisher-related preservation time requirements if you plan to publish in a journal that requires you to open your data.
  • Check discipline-specific and funder-related preservation time requirements.
  • Personal data may also be archived. When transferring research material containing personal data to an archive, remove information that identifies individuals as far as possible, unless the nature of the data gives proper grounds for archiving it. The subjects must also be informed of the archiving and the grounds for it. The appropriate protection of personal data, i.e., who has access to the data and why, must continue to be taken into account when archiving the material.
    • The UH offers guidance on safe preservation methods. If you are preserving sensitive personal information, contact datasupport@helsinki.fi.
  • Biological samples can be stored in biobanks.
  • Fairdata-PAS is a preservation service for nationally valuable data, intended to preserve it for tens or hundreds of years.
    • More information is available about the UH service queue for Fairdata-PAS.

Data management responsibilities

Who is responsible for data management tasks? Who is responsible for data protection and information security as well as controlling them? 

Tips for best practices

  • Are data management responsibilities allocated to one person, or is the whole research group involved?
  • Who is responsible for ensuring that everyone has received the necessary training and that everyone follows the same practices?

Data management resources

Describe what resources (time and workload) are needed for data management. The better you plan data management at the beginning of the project, the less work is needed when data is opened and preserved.

  • Estimate whether expert help/an assistant is needed for data management, data preservation, and data sharing tasks.
  • Give an estimate of how much time is needed for data documentation and cleaning to prepare the data (not results) to be opened: 1–2 h weekly, one day per month, 1–2 weeks before publishing, or some other time estimate.
    • Data documentation and cleaning mean, for example, producing metadata (see the documentation guidance above), anonymising sensitive data, arranging data, and transferring data.
    • It is recommended to keep documentation up to date throughout the project life cycle.
  • Specify your data archiving, opening, and publishing costs in the budget according to funder requirements.
