Take care of your data management skills. Data management skills are fundamental for researchers. Together with data management planning, they help researchers identify and manage risks related to data handling (e.g., data protection, data security, data access rights, and data storage). The University of Helsinki's Data Support provides free data management training for researchers, as well as guidance and tools for data management planning.
In this guide, survey data refers to data collected through surveys, that is, questionnaires that are sent or handed out to a selected group of people or administered to them by telephone or in face-to-face interviews. The data can be quantitative or qualitative, depending on the questions or fields in the survey form. In quantitative research, survey questions typically take the form of a statement or a question followed by a measurement scale, such as a Likert scale or a yes/no scale. The survey can be administered to participants once or several times at different intervals: for example, several times a day over a short period in experience sampling studies, or a few times over several years in longitudinal studies.
Plan the data collection, storage, and processing in advance, as well as the archiving or disposal of the data after the study. When preparing a data management plan, pay attention to the type and quality of the questions (for example, whether the questions address intimate topics or whether responses could reveal respondents' personal data), the data collection method used (such as the security of an online survey form), and the practices for storing and sharing survey data.
The typical lifecycle of a research project is about 3–5 years. If no time is allocated for data management, or it is not planned in advance, data management often takes a backseat to publishing. As a result, even data considered valuable for research may remain unorganized and scattered across different locations, making its reuse difficult or impossible. Consider data reuse already when planning data management (e.g., obtaining permission for reuse and producing metadata).
The processing of personal data, including its collection, is regulated by the General Data Protection Regulation (GDPR). If you collect respondents' personal data, such as name, address, phone number, or personal identity code, you must have a lawful basis for processing it. The recommended lawful basis in scientific research is the performance of a task carried out in the public interest.
Survey data may contain identifiable personal data. Remove identifiable personal data from the dataset as soon as the research allows, and inform respondents about how their personal data will be processed. Consider which personal data your research actually needs. Do not collect unnecessary information "just in case"; always follow the principle of data minimization in data collection.
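Data minimization applies first and foremost to what you ask in the survey. The same idea can also be applied when you process an export: instead of deleting the columns you do not need, keep only the columns you have decided you need. The following Python sketch assumes a CSV export; the column names and values are hypothetical and depend on your survey tool.

```python
import csv
import io

# Hypothetical allow-list: only the variables the research questions need.
NEEDED_COLUMNS = ["respondent_id", "age_group", "q1", "q2", "q3_open"]


def minimize(rows, needed=NEEDED_COLUMNS):
    """Keep only the allow-listed columns of each exported row.

    Any column not listed (names, e-mail addresses, IP addresses the tool
    may log, ...) is dropped. A missing column raises KeyError, so a
    changed export format is noticed instead of silently producing blanks.
    """
    return [{col: row[col] for col in needed} for row in rows]


# Usage with a small in-memory export; the column names and values are made up.
export = io.StringIO(
    "respondent_id,name,email,age_group,q1,q2,q3_open\n"
    "r001,Maija Meikäläinen,maija@example.org,25-34,4,5,No comment\n"
)
clean = minimize(csv.DictReader(export))
```

An allow-list is safer than a list of columns to remove, because any column the tool adds on its own (for example, after a software update) is excluded by default. Note that an open-text column such as `q3_open` can still contain identifying information.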
Open-ended responses can compromise the anonymity of a survey. A survey can be anonymous if it does not collect personal data, if the collected data cannot be linked to personal data obtained by other means, and if it contains no open-text fields. Open-text fields do not in themselves make the dataset identifiable or sensitive, but remember that respondents can write anything in them, including information that reveals their identity or is sensitive. If open-ended responses contain information that could identify a respondent, treat the survey data as containing identifiable personal data.
When combining survey data with registry data, you must also follow the ethical and legal principles for processing registry data. See the registry data guidelines.
If you need legal assistance, be prepared to wait several weeks for a response: ask well in advance and use the waiting time to move other tasks forward. The contact address for legal services at the University of Helsinki is tutkimuksenjuristit@helsinki.fi.
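Because respondents can type anything into open-text fields, a simple automated screen can help you find responses that need manual review. The Python sketch below is a rough aid under stated assumptions: the patterns are hypothetical and only catch obvious direct identifiers (e-mail addresses, Finnish-style phone numbers, and strings shaped like Finnish personal identity codes). Indirect identifiers, such as a rare job title combined with a small municipality, will not be caught, so reading the responses remains necessary.

```python
import re

# Hypothetical screening patterns. They catch only obvious direct identifiers
# and are a starting point for manual review, not a replacement for it.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Finnish personal identity code: DDMMYY, century sign, three digits,
    # check character (loose match; the check character is not validated).
    "personal_identity_code": re.compile(r"\b\d{6}[-+A-FU-Y]\d{3}[0-9A-Y]\b"),
    # Finnish-style phone numbers starting with +358 or 0.
    "phone": re.compile(r"(?<!\w)(?:\+358|0)[\d\s-]{6,12}\d\b"),
}


def flag_identifiers(responses):
    """Yield (index, pattern_name) for each response matching a pattern."""
    for i, text in enumerate(responses):
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                yield i, name


# Usage with made-up responses.
responses = [
    "Nothing to add.",
    "Contact me at maija@example.org",
    "My code is 010190-123A",
    "call me 040 123 4567",
]
flags = list(flag_identifiers(responses))
```

Flagged responses can then be reviewed and, where necessary, edited or removed before the data is shared.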
When using survey data, the general ethical principles of research involving human subjects apply.
Remember that you need a research ethics committee statement for your survey study if:
See research ethics pre-evaluation.
If any of the above conditions may apply to your survey study, carefully assess the potential effects of your research on participants and plan how to minimize any harm they may experience. Sometimes this requires removing or modifying parts of the survey. Consider respondents' possible vulnerability, for example due to belonging to a minority group, and the possibility that respondents may not have privacy when answering.
You will need a formal research permit if you recruit respondents through an organization that requires one; apply for the permit from that organization. For example, if you survey patients recruited through HUS (Helsinki University Hospital), you need a research permit from HUS. To apply for a research permit, you need either a statement from a research ethics committee or an explanation of why such a statement is not required. If you recruit respondents from sources such as student lists, no research permit is needed, but you may still need a research ethics committee statement.
Survey studies must be based on informing respondents and obtaining their consent. How do you inform participants? "The purpose of informing participants under the General Data Protection Regulation (GDPR) is to provide clear and understandable information to the data subject about how their personal data will be processed in the research. The information should be concise and to the point, written in a way that ensures the research target group understands what data processing entails. Particular attention should be paid to using clear language when the study involves children, seniors, or other individuals in vulnerable positions. Participants must be informed before they decide to take part in the study (so they can give informed consent to participate). This can be done, for example, by providing a data protection notice to participants in advance or by including a link to the data protection notice alongside the survey. It is also advisable to keep the data protection notice accessible on the research project's possible website." See research data protection matters.
Make use of existing datasets. Before collecting new data, check whether existing data could be used instead. In many countries, large survey datasets are collected in a coordinated, centralized manner and made available to researchers. Examples include the World Values Survey and the European Social Survey. The Open Science Framework community maintains a directory of openly available datasets in the field of psychology.
Some datasets, such as Germany's SOEP longitudinal dataset, are available under specific agreements.
Finnish survey datasets can be found, for example, in the Finnish Social Science Data Archive.
These collections are well documented in all necessary respects, from the details of data collection to the data itself. The data collectors' websites often let you download the data or run analyses without downloading the data to your own computer. If you use data collected by someone else, cite it in accordance with good scientific practice.
Data collection can also be outsourced to a commercial provider. In that case, pay attention to aspects such as procurement processes and data security. Instruct the provider precisely on data collection, data storage, and data delivery, and make sure you know what is included in the price. As part of the procurement contract, make a separate data management agreement that specifies, among other things, how the data is stored during collection and how it is transferred to the research team.
Survey data is collected by distributing or sending the survey to participants or by interviewing them. The survey can be a paper questionnaire, a telephone interview, or an online form accessible via a link, which can be distributed in various ways, such as by redirecting respondents from a website.
The features of the survey tool affect the workflow. Before starting to use a survey tool, check whether it includes the functionalities you need. For example, can the survey or the tool’s interface be translated into another language? Does it support multilingual functionality? If the survey needs to be available in multiple languages, consider this when planning the resources required for data management (time, budget). If the study requires measuring response time, ensure that the tool has the necessary features for this.
Where is the survey data stored during collection? It is crucial to verify where the data collected through the tool is stored; check this for commercial survey tools as well. Personal data must not be stored in cloud services, and the General Data Protection Regulation (GDPR) restricts transferring or disclosing data outside the EU. The purpose of these restrictions is to protect data subjects' privacy and to ensure that their data is processed as carefully outside the EU as within it (see considerations for international transfers and disclosures). Under the GDPR, storing personal data outside the EU/EEA counts as a transfer and is allowed only when the GDPR's transfer requirements are met, so prefer storage locations within the EU. If you share your data on the Open Science Framework website, select a German server as the storage location when creating the project so that the data stays within the EU.
Determine whether you need to grant others access rights to the survey and whether the chosen tool allows this. How is access managed; for example, does it require institutional credentials?
Consider the time required to learn the tool if using it for the first time. Take into account aspects such as:
At the University of Helsinki, all users have access to the E-lomake and Redcap software. Check with your faculty or unit to see if a survey tool has been procured there.
E-lomake is not suitable for collecting sensitive data: "Data stored in E-lomake is securely located on the University of Helsinki’s own servers, but the system does not have all the logging features required by GDPR for processing personal data. Therefore, sensitive personal data should not be stored there at all (as has never been intended). This requirement is now emphasized in the guidelines" (see E-lomake data protection [the website is in Finnish]). Redcap, on the other hand, can be used to collect sensitive data. Redcap support is available at redcap-support@helsinki.fi.
Data is typically not analyzed within the survey tool itself but with various statistical software programs. For analysis, the data is transferred from the survey tool to another environment. During this transfer, data security must be ensured. The security of the data processing software (e.g., statistical software) must also be considered—ensure, for example, that unauthorized persons do not have access to the data. If the data contains sensitive personal information, transfer it from the survey tool to a secure storage environment (such as Umpio at the University of Helsinki) and conduct the analysis using software available in that environment.
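One way to confirm that an exported file was not altered or truncated during the transfer is to compare checksums: compute a checksum of the file before the transfer and again after it, and check that the two values match. The Python sketch below uses SHA-256 from the standard library; it assumes you can run a checksum tool in both locations, and in a secure environment you would use whatever tool that environment provides. A checksum verifies integrity only; it does not protect confidentiality, so the transfer itself still needs a secure channel.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use low even for large exports.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original, copy):
    """Return True if the copied file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)
```

For example, compute `sha256_of("survey_export.csv")` (a hypothetical file name) before the transfer, record the value, and compare it with the value computed from the copy in the secure storage environment.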
Are you able to analyze surveys conducted in different languages yourself, or do you need translation services? A contract must be made with the translation service, and participants must also be informed about its use.
A survey tool can generate documentation; for example, the Redcap software automatically produces a data dictionary and/or a codebook that explains the variables and codes used in the survey. A readme file stored alongside the data is the minimum requirement for data documentation. Without documentation, the dataset may, after some time, become incomprehensible even to the people who collected it. Maintain documentation throughout the project, because documenting data retroactively can be practically impossible.
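If your survey tool does not produce a codebook, or you want the variable descriptions in the readme file itself, a short script can render them as plain text. Keeping the definitions in one place and regenerating the text also keeps the documentation in step with the data. A minimal Python sketch; the variable names, question texts, and value labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One survey variable: its name, question text, and value labels."""
    name: str
    question: str
    codes: dict = field(default_factory=dict)  # coded value -> label


def codebook_text(title, variables):
    """Render variable definitions as a plain-text codebook for a readme."""
    lines = [title, "=" * len(title), ""]
    for var in variables:
        lines.append(f"{var.name}: {var.question}")
        for value, label in var.codes.items():
            lines.append(f"  {value} = {label}")
    return "\n".join(lines) + "\n"


# Hypothetical variables for illustration.
variables = [
    Variable("q1", "I am satisfied with my studies.",
             {1: "Strongly disagree", 5: "Strongly agree", -1: "Not answered"}),
    Variable("q3_open", "Other comments (open text)"),
]
text = codebook_text("Codebook: example survey", variables)
```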
Project documentation includes file naming and folder structure. Design a folder structure that suits your project. A structure that is too deep, with many subfolders, may make it difficult to locate the correct file or folder. On the other hand, an overly simple structure, where all files are in a single folder, can also make finding the right file challenging. It is essential to keep raw data separate from processed data and to freeze raw data to prevent any unintentional modifications during processing.
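The same folder structure can be created for every project with a small script, and raw data can be "frozen" by making its files read-only. A minimal Python sketch; the folder names are an assumption for illustration, not a University of Helsinki standard. Read-only permissions guard against accidental changes, not deliberate ones (the file owner can restore write access), and on network drives the effective permissions may also depend on the server's settings.

```python
import stat
from pathlib import Path

# Hypothetical folder layout; adapt the names to your project.
FOLDERS = ["data/raw", "data/processed", "scripts", "docs/admin", "output"]


def create_structure(root):
    """Create the project folders under root; existing folders are kept."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)


def freeze(folder):
    """Remove write permission from every file under folder (read-only)."""
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    for file in Path(folder).rglob("*"):
        if file.is_file():
            # S_IMODE keeps only the permission bits of the current mode.
            file.chmod(stat.S_IMODE(file.stat().st_mode) & no_write)
```

Run `freeze` on the raw data folder once the export has been saved there; all further processing then writes to the processed data folder.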
Administrative documents related to the project, such as consent forms and information sheets for respondents, are also part of project documentation and should be stored in a separate folder. It is also advisable to have dedicated folders for application documents and article versions. If the project has a shared folder accessible to multiple researchers, the folder structure and documentation practices should be agreed upon within the project team (see the University of Helsinki's Data Support documentation guide).
Metadata plays a key role in implementing the FAIR principles. If you want your data to be findable, accessible, interoperable, and reusable, it must be well documented and described with metadata. The better your dataset follows the FAIR principles, the more effectively others can use it. In practice, this means publishing the data or its metadata in a service that assigns a persistent identifier and lets you choose a license, and providing metadata detailed enough to make reuse possible.
Even if the dataset itself cannot be made openly available, metadata can usually be published. However, it is crucial to ensure that metadata does not contain any sensitive information. Metadata can be published, for example, in Etsin.
The survey tool is not intended for data storage. Transfer the data out of the tool you are using once data collection is complete. When choosing a storage location for the data, consider who needs access to which data—for example, if access is only needed for finalized analyses, the folder containing raw data can be set to be more restricted. Someone in the project should be responsible for managing access permissions.
Make use of university services for data storage. During the research project, data can be stored in a personal home directory (if used individually) or a group directory. Guidelines for different storage solutions suitable for various purposes can be found in the table on the University of Helsinki's Data Support wiki page. Document for yourself and your team where the data is stored, so that you can, for example, delete any data you have committed to destroying.
The home directories and group directories at the University of Helsinki are backed up every hour and function on Windows, Mac, and Linux operating systems. These directories are located on the university's own servers. Every university member has access to a home directory (Z-drive on Windows computers). Instructions for obtaining a group directory. If the data is sensitive, a suitable storage location at the University of Helsinki is Umpio.
An external hard drive may be necessary, for example, during fieldwork, but it is not suitable as the sole storage location, because you must then remember to make backups manually, and hard drives can break or get lost. If you store data on a hard drive, especially sensitive data, encrypt it using a tool such as Cryptomator. Instructions for using Cryptomator at the University of Helsinki. Transfer the data from the hard drive to university-provided storage locations as soon as possible.
Storing sensitive data. Particularly sensitive data can be stored in Umpio, but Umpio is only suitable for temporary storage and processing during research. If the dataset has users outside the University of Helsinki, it is advisable to use CSC’s services for storing sensitive data. If data must be temporarily stored on an external hard drive or USB stick, it should be password-protected and preferably also encrypted (see Cryptomator instructions on the Helpdesk website). Sensitive data must not be sent as an email attachment.
"Measures to protect data include pseudonymization and anonymization, data encryption, data aggregation, transmission via encrypted connection, instructions for data handlers, collection of log data, restriction of access rights and user permissions, usage monitoring, and agreements." (See research data protection). For example, the Redcap software provides log data.
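Pseudonymization replaces direct identifiers with codes. One common way to do this, shown here as an illustration rather than a method this guide prescribes, is a keyed hash (HMAC): the same identifier always gets the same pseudonym, so responses from several survey rounds can still be linked, but the pseudonym cannot be reversed without the secret key. The Python sketch below assumes the identifier is, for example, an e-mail address. Keep the key separately from the data in a restricted location, and remember that pseudonymized data is still personal data under the GDPR, because whoever holds the key can re-identify respondents.

```python
import hashlib
import hmac
import secrets


def new_key():
    """Generate a random 256-bit secret key. Store it apart from the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier, key):
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The identifier is normalized first (an assumption that suits e-mail
    addresses) so that case or surrounding spaces do not produce different
    pseudonyms. The digest is shortened to 16 hex characters (64 bits),
    which keeps collisions negligible for survey-sized datasets.
    """
    normalized = identifier.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```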
If your survey data is linked to registry data, you must follow the guidelines for storing registry data. See the registry data guidelines.
Sharing during the project
Within the project team, it is advisable to agree on who is responsible for each data management task. If a role or task only requires access to analyzed data, there is no need to grant access to raw data, and so on.
Sharing data becomes more complicated if the project includes individuals from outside your own organization. In such cases, CSC’s storage solutions can be utilized, as they also offer options suitable for sensitive data. The situation becomes even more complex if the project involves individuals conducting research outside Finland. In such cases, it is recommended to contact Data Support for assistance in finding a suitable solution.
When making data openly available and archiving it, follow what you have told the participants. In Finland, a suitable archive for survey research data is typically the Finnish Social Science Data Archive (Tietoarkisto), which only accepts anonymous data. See the archive's Data Management Handbook for guidance.
You can also find a suitable repository for your data using the Re3data.org service. When selecting an archive, favor curated archives that let you choose a license (preferably an open one) for the data. A curated archive ensures the long-term preservation of the data; CoreTrustSeal certification, for example, is one indication of such an archive. Also choose an archive that assigns a persistent identifier (PID) to the data, so that the data remains reliably findable and citable, and check how long the repository commits to retaining the data.
Make the data available in an open, non-proprietary file format (e.g., CSV), and possibly in several formats, so that more researchers can access and use it. Do not provide it only as an SPSS file, for example, since SPSS is proprietary, commercial software.
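When the data is already available as rows (for example, exported from the survey tool or read from another format), a CSV copy can be written with Python's standard library. A minimal sketch; the file and column names are hypothetical, and reading SPSS files, which requires third-party software, is not shown. CSV does not carry value labels or variable descriptions, so publish a codebook alongside the CSV file.

```python
import csv


def write_open_csv(rows, path, fieldnames):
    """Write survey rows (dicts) to a UTF-8 encoded CSV file.

    The csv module quotes values containing commas, quotes, or line
    breaks, which matters for open-ended responses.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Stating the encoding explicitly (UTF-8) keeps characters such as ä and ö intact when the file is opened on another system.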
If you have informed participants that the data will be destroyed after a certain period, you must follow through with this. If you have stated that the data will be retained for the verification period of the research and then destroyed, set a calendar reminder for yourself to ensure you remember to delete the data.
Remove your survey data from the survey tool and save the necessary survey templates for future projects. Check in what format and how the project's data can be exported from the tool. Also, check the data retention policy of the tool—how long you can keep your data there.
If you wish to use the collected data for another study, participants must be informed about this, or permission for future research must be obtained at the time of the original consent.
Boynton, P. M., & Greenhalgh, T. (2004). Selecting, designing, and developing your questionnaire. BMJ, 328(7451), 1312–1315. https://doi.org/10.1136/bmj.328.7451.1312
Gideon, L. (2012). Handbook of Survey Methodology for the Social Sciences. Springer.
Plutzer, E. (2019). Privacy, sensitive questions, and informed consent: Their impacts on total survey error, and the future of survey research. Public Opinion Quarterly, 83(S1), 169–184. https://doi.org/10.1093/poq/nfz017
Vehkalahti, K. (2014). Kyselytutkimuksen mittarit ja menetelmät [Measures and methods of survey research]. Finn Lectura. https://helda.helsinki.fi/server/api/core/bitstreams/bc1c2c8a-0eb8-4881-ba8f-510ce386b810/content
Data protection:
Data protection guide for researchers in Flamma
Data Protection Principles in Flamma
Scientific research and data protection
Research ethics:
https://tenk.fi/sites/default/files/2021-01/Ihmistieteiden_eettisen_ennakkoarvioinnin_ohje_2020.pdf
https://www.aka.fi/tutkimusrahoitus/vastuullinen-tiede/tutkimusetiikka/
Open data:
https://www.europeansocialsurvey.org/
https://www.worldvaluessurvey.org/wvs.jsp
Creating surveys:
https://www.fsd.tuni.fi/fi/palvelut/menetelmaopetus/kvanti/kyselylomake/laatiminen/
https://www.pewresearch.org/writing-survey-questions/
https://projectredcap.org/resources/videos/
Data storage during the project:
https://wiki.helsinki.fi/x/kgV5FQ
https://helpdesk.it.helsinki.fi/help/10813
https://helpdesk.it.helsinki.fi/help/10672
Data archiving: