Research data management

We provide assistance in research data management throughout the research life cycle, including organising, storing and sharing data. On this page you can find guidance on these topics and on how to answer the general questions of a Data Management Plan (DMP).

Every member of the University of Helsinki community is responsible for good data management. The University provides research data infrastructures that include tools and services for supporting the management, use, findability and sharing of data, as well as capacity for storage, preservation, computing and processing.

Data Support at the University of Helsinki assists researchers in the management of research data. Data Support is a network of experts from the university library, IT Services, Central Archives, Research Affairs, Personnel Services, and Legal Affairs. You can contact us by email: datasupport@helsinki.fi

On this page you can find University of Helsinki guidance for research data management.

A Data Management Plan (DMP) should describe how data is managed during as well as after the active phase of the research project. The plan should be updated as the research project evolves.

The DMP is part of a research plan. To avoid overlap between the DMP and the research plan, you can refer from one document to the other. Introduce data analysis and other methods in your research plan.

In the DMP, data is understood as a broad term that includes:

  • data collected by various methods (such as surveys, interviews, measurements, and imaging techniques),
  • data produced during the research (such as analysis results),
  • research sources (such as archive material), and
  • source code and software.

You can use DMPTuuli, an online tool, to create your data management plan. Open DMPs from UH researchers can be found in Zenodo.

General description of research data

What data will be used and produced in the project? In which file formats will the data be? Approximately how much data will the project have? Will you use or develop special software?

Tips for best practices

If you use sensitive data, see the recommendations below the examples. Categorise your data in the following way using bullet points or a table, and use the same categorisation in all phases of your plan. Your answer to this question forms a general structure for the rest of the plan.

Example 1:

Data collected for this project

  • Questionnaire x, file format .pdf, size 5 GB
  • DNA samples (n=500)
  • Pictures/videos about x, file format .jpg, .avi, size 1 TB

Data produced as an outcome of the process

  • Analyses of questionnaire x, .pdf, .xlsx, 1 GB
  • DNA sequences/analyses, FASTA, .txt, .xlsx, 2 TB
  • Documentation of the data (readme files, data dictionary, laboratory notebooks)

Previously collected existing data reused in this project

  • Samples from the Biobank
  • Data from Statistics Finland, database, 10 GB

Example 2:

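Data type                   | Source                   | File format       | Sensitivity | Size
----------------------------|--------------------------|-------------------|-------------|-------
Questionnaire x             | data collected           | .csv, .txt, .docx | no/yes      | 1 GB
Analysis of questionnaire x | data produced            | .xlsx, .tif       |             | 100 MB
DNA samples                 | data reused from Biobank |                   |             |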

Guidance for sensitive data

It is important to identify sensitive data types, as planning data management includes recognising and managing the risks involved with such data. If you work with personal information, identify the controller. More information can be found in the Data protection guide for researchers (Flamma).

Sensitive data is information that could cause damage if revealed.

  1. Sensitive personal information: it is impossible to make an exhaustive list of what sensitive personal information includes. The researcher is responsible for identifying any data that, if revealed, could harm the data subjects. Sensitive information can include health information, the risk of disease, sexual orientation, ethnic origin, trade union membership, religion, or genetic information.
  2. Sensitive information about species (FIN), such as endangered animals, plants, nature conservation areas, or biosafety.
  3. Other confidential information, such as patents, military information, organisational information, or trade secrets.

Personal information includes all identifiers from which the person is identifiable directly or indirectly.

  • Direct identifiers: name, phone number, social security number, picture, voice, fingerprint, dental chart
  • Indirect identifiers: gender, age, education, profession, nationality, work history, system log history, marital status, residence information, car license number

 

Consistency and quality of data

What risks are involved in controlling data integrity and quality, and how are the risks managed? Note that data quality and the quality of research methods are two distinct things.

Tips for best practices

  • Do you use data management tools, e.g., a database for data collection?
  • Has everyone handling the data been introduced to the best practices?
  • Are the methods validated, or are quality control pipelines in use?
  • Are transcriptions of audio or video interviews checked by someone other than the transcriber (“double blinding”)?
  • Are checksums used to verify that files have not changed (software)?
  • Digitalisation of analog or physical material should be done with sufficient accuracy.
  • In all conversions, ensure that the original information content is preserved.
  • Discuss how minimisation, pseudonymisation, and anonymisation affect data quality.
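
The checksum question above can be illustrated with a short sketch. It assumes SHA-256 digests and a hypothetical manifest file that lists one digest per data file; the function names and the manifest layout are illustrative choices, not a tool provided by the University of Helsinki.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Write one 'digest  relative/path' line per file under data_dir."""
    lines = []
    for file_path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        # Skip the manifest itself if it is stored inside data_dir.
        if file_path.resolve() == manifest.resolve():
            continue
        relative = file_path.relative_to(data_dir).as_posix()
        lines.append(f"{sha256_of_file(file_path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(data_dir: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    changed = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = data_dir / relative
        if not target.is_file() or sha256_of_file(target) != expected:
            changed.append(relative)
    return changed
```

Writing the manifest when the data is first stored and verifying it after each transfer or conversion gives a simple record that the files are still identical to the originals.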

Ethical issues related to data management

Does your data include personal information? Does your work with animals require an ethical permit? Do you work with confidential or sensitive data other than that described above (e.g., endangered species (FIN), conservation areas, military information)?

Describe how you will maintain high ethical standards and comply with relevant legislation when managing your research data. What are the risks involved, and how are they managed?

Tips for best practices

  • Justify why you have the right to collect, handle, and preserve data that involves ethical issues, for example, by stating that you have passed an ethical review.
  • If you handle personal information:

 

Legal compliance related to data management 

What has been agreed about data usage rights? Are there rights belonging to a third party? What kind of data-sharing agreements do you plan to make with your research partners? Are there Intellectual Property Rights (IPR) such as copyrights involved with the data? What license will be used when/if data is opened?

Tips for best practices

Guidance about data ownership and licenses:

  • Describe who owns the data, whether rights will be transferred and whether ownership issues have been agreed upon with partners outside of the university. Agreements about authorship should be made at the beginning of the project. By doing so, you prevent possible conflicts about rights of use.
    • Remember that many funding agencies (Academy of Finland, EU) require data ownership to be transferred to the university. Ensure that the necessary agreements have been made.
    • The University of Helsinki does not automatically own a researcher’s data if this has not been agreed.  Principal investigators are responsible for concluding contracts on the ownership and user rights of research data at as early a stage as possible. [University of Helsinki research data policy]
    • Instructions on concluding an agreement (Flamma)
  • Which license will be used for opening the data? It is recommended to make all research data, code and software created within a research project available for reuse. The UH recommends the CC0 license, with which you waive your rights to the work. You retain your moral rights, and good scientific practice still stipulates that the author be mentioned. Alternative licenses: Creative Commons (Information on licenses), GNU or MIT licenses.

Documentation means describing the data, i.e., explaining what data the project has and where the data originates from.

Documentation includes data dictionaries (explaining variables and codes) and readme files. Other important issues include file naming conventions, version control, and directory structure. There are standard methods available for documentation called metadata standards, which should be used if suitable for the data. These will increase the value of the data by making it easier to reuse.

Tips for best practices

  • Metadata standards: Many storage services require data to be saved using a standard. Hence, if you know where you will open your data, check that service's metadata requirements.
  • Data management software, e.g., databases and electronic laboratory notebooks
  • Data dictionaries, which explain variables, or code books, which gather all of the codes and calculations used.
  • File naming conventions
  • Directory structure: Remember that if metadata, e.g., file, directory, or variable names, include sensitive data or personal information, they need to be handled accordingly.
  • Readme file(s) provide information about data files to ensure they are interpreted correctly.
  • Version control
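
A file naming convention is easiest to follow when it can be checked automatically. The sketch below assumes a hypothetical pattern, project_content_YYYYMMDD_vNN.ext; the pattern, the function names and the example values are illustrative only and are not a University of Helsinki standard. Note that none of the name parts should contain personal or otherwise sensitive information.

```python
import re
from datetime import date

# Hypothetical convention: project_content_YYYYMMDD_vNN.ext (lower case only).
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)"
    r"_(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project: str, content: str, day: date, version: int, ext: str) -> str:
    """Build a file name and reject it if it breaks the convention."""
    name = (
        f"{project.lower()}_{content.lower()}"
        f"_{day:%Y%m%d}_v{version:02d}.{ext.lower().lstrip('.')}"
    )
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def parse_name(name: str) -> dict[str, str]:
    """Split a conforming file name into its labelled parts."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return match.groupdict()
```

For example, build_name("proj", "questionnaire-x", date(2024, 1, 31), 1, ".csv") produces proj_questionnaire-x_20240131_v01.csv, and parse_name recovers the project, content, date, version and extension from that name. Putting the date and version in fixed positions keeps files sortable and makes the version history visible even without a version control system.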

Storing and backing up during the research project

Where will your data be stored and backed up during the project? Opening, publishing, and archiving data after the project will be described in section five. Who is responsible for backups? Make a plan with your partners and ensure secure data transfer.

Tips for best practices

Use the IT Services provided and maintained by the University of Helsinki.

  • Personal / group storage space, which is maintained and backed up (every hour) by the UH Centre for Information Services
  • Other UH storage options, e.g., virtual server, dedicated physical server
  • CSC services for data storage
  • Cloud storage options: Use the UH OneDrive for Business or the team cloud instead of other services (e.g., Google Drive/Dropbox).
  • Does your project have sufficient storage space? If not, please contact Helpdesk at tel. +358 (0)2 941 55555 or helpdesk@helsinki.fi.

If you work with sensitive data:

  • Be sure that your storage is safe enough for the data, e.g., UMPIO (UH), a virtual storage server (UH), a private storage server (UH), NetApp storage cluster (UH), ePouta (CSC).
  • Do not use cloud storage due to its insufficient data protection!
  • Encryption: If needed, devices should be encrypted before use, particularly mobile devices and portable or external storage devices, e.g., with Cryptomator.
  • Please contact datasupport@helsinki.fi if you are unsure about data protection.
  • Do NOT use external hard drives as the main storage option.

 

Access control

  1. Who is responsible for controlling access to the data?
  2. How will the access control be carried out? Is there an IT solution (e.g., password protection or usage logs) or some physical solution (e.g., a file cabinet) in use?
  3. Who in the research group has access to which data?
  4. Why has each access right (editing, viewing, deleting) been granted?
  5. Describe how information security and the risks from sensitive data have been taken into account. Will sensitive data be stored in an encrypted form? More tips are below.

Tips for best practices

  • If you use a personal or shared network drive, you can easily control access rights.
  • Access control of sensitive data should be well considered. Data handling and transfer need to be in line with permissions.
  • Access control: There must be a list of users and all rights granted, and a procedure for withdrawing rights.
  • Monitoring: How will data usage be monitored during the study: can the technical equipment log who used which data and when? Ask IT services what kind of automatic usage log is provided.
  • Security of the premises: Check the locking options of workspaces, safe lockers and cabinets, and camera and access surveillance.
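
The list of users, the rights granted and the withdrawal procedure mentioned above can be kept in a simple register. The following is a minimal sketch under stated assumptions: the class name, the three rights (view, edit, delete) and the example user are hypothetical, and a real project would normally rely on the access management of the storage service itself.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumed set of rights; adjust to whatever the storage service supports.
ALLOWED_RIGHTS = {"view", "edit", "delete"}


@dataclass
class AccessRegister:
    """Current rights per user plus a history of every grant and withdrawal."""

    rights: dict[str, set[str]] = field(default_factory=dict)
    history: list[tuple[str, str, str, str, str]] = field(default_factory=list)

    def grant(self, user: str, right: str, reason: str, day: date) -> None:
        """Give a user one right and record why and when it was given."""
        if right not in ALLOWED_RIGHTS:
            raise ValueError(f"unknown right: {right}")
        self.rights.setdefault(user, set()).add(right)
        self.history.append((day.isoformat(), "grant", user, right, reason))

    def withdraw(self, user: str, right: str, reason: str, day: date) -> None:
        """Remove one right from a user and record the withdrawal."""
        if right not in self.rights.get(user, set()):
            raise ValueError(f"{user} does not hold the right: {right}")
        self.rights[user].remove(right)
        if not self.rights[user]:
            del self.rights[user]
        self.history.append((day.isoformat(), "withdraw", user, right, reason))

    def withdraw_all(self, user: str, reason: str, day: date) -> None:
        """Remove every right a user holds, e.g., when they leave the project."""
        for right in sorted(self.rights.get(user, set())):
            self.withdraw(user, right, reason, day)
```

Recording a reason with every grant answers the question of why each access right exists, and the history doubles as a record of when rights were withdrawn.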

 

Opening data

What part of the data will be opened / published? Where will the data be opened? Name the repositories. When will the data be available? Will some part of the data be destroyed?

If your data cannot be opened, explain why, and tell where the project metadata will be opened.

  • Data with personal information can only be published anonymised. Pseudonymised data is still personal data, and hence, it cannot be opened without explicit consent for that purpose.
  • Personal information can be shared subject to a license, if the original processing purpose allows it. If you plan to share data which includes personal information, contact UH’s research lawyers (tutkimuksenjuristit@helsinki.fi).
  • It should still be possible to open the metadata of data that holds personal information, even though the actual data cannot be opened.
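
The difference between pseudonymised and anonymised data can be illustrated with a short sketch. It assumes one common pseudonymisation technique, a keyed hash (HMAC-SHA256) of a direct identifier; the function names, the key and the identifiers are hypothetical, and the sketch is not legal guidance or a university-endorsed method.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a direct identifier with a keyed-hash pseudonym."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def relink(pseudonym: str, candidates: list[str], secret_key: bytes) -> list[str]:
    """Show that whoever holds the key can map a pseudonym back to a person."""
    return [
        candidate
        for candidate in candidates
        if pseudonymise(candidate, secret_key, len(pseudonym)) == pseudonym
    ]
```

Because the same key always produces the same pseudonym for the same person, anyone who holds the key and a list of candidate identifiers can re-identify the data subjects. This is why pseudonymised data is still treated as personal data, whereas anonymisation requires that such re-linking is no longer possible.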

Tips for best practices

  • Choose suitable repositories for sharing and opening your data at the very beginning of the project. Check that your data fulfils the repository requirements. Choose repositories that use persistent identifiers (DOI, URN).
  • “As a rule, research data produced under the auspices of the University of Helsinki and related to published research results are open and available for shared use. The discoverability and citability of research data must be ensured.” [University of Helsinki research data policy]
  • Where to open data: Check the recommendations of the publishers, learned societies, and funders of your own field of science. Where have you or your colleagues in the same field published data?
    • Specific repositories for one data type can be found in re3data.org.
    • General repositories are, e.g., IDA, Zenodo, Dryad, and Figshare
    • If you cannot open the data, open your metadata about your project data, e.g., at Zenodo or the national Etsin.

 

Long-term preservation of data

Long-term preservation means that data is preserved for more than 25 years. If your data has long-term value:

  1. What part of the data is archived?
  2. Where will it be archived?
  3. How long will the data be preserved?
  4. Are there any costs related to archiving? Who covers them?
  5. Will some part of the data be destroyed?

An archiving plan is part of research quality and transparency.

Tips for best practices

  • When data is created, it is important to consider how long it will be preserved.
  • Remember to check if the publisher has requirements for the length of time for preservation regarding data related to a publication.
  • Check discipline-specific and funder-related preservation time requirements.
  • Traditionally, researchers have been advised to destroy special categories of personal data (sensitive data) when the project ends. However, the GDPR does not require the destruction of data, but it does require that participants be informed about data preservation and the basis for its duration. The university offers guidance on safe preservation methods. If you are preserving sensitive personal information, contact datasupport@helsinki.fi.
  • Biological samples can be stored in biobanks.
  • Fairdata-PAS is a preservation service for nationally valuable data, intended for storage over dozens or hundreds of years. More information is available about the UH service queue for Fairdata-PAS.

Who is responsible for data management tasks? Who is responsible for data protection and information security as well as controlling them?  What resources (time and workload) are needed for data management?

The better the planning for data management at the beginning of the project, the less work is needed when data is opened and preserved.

Tips for best practices

  • Are data management responsibilities allocated to one person, or is the whole research group involved?
  • Estimate whether expert help or an employee is needed for data management, data preservation, and data sharing tasks.
  • Give an estimate of how much time is needed for data documentation and cleaning to prepare the data (not results) to be opened: 1–2 h weekly, one day per month, 1–2 weeks before publishing, or some other time estimate.
    • Data documentation and cleaning mean, for example, producing metadata (section 3.1.), anonymising sensitive data, arranging data, and transferring data.
    • It is recommended to keep documentation up to date throughout the project life cycle.
  • Specify your data archiving, opening, and publishing costs in the budget.

 
