Take care of your data management skills. Data management skills are fundamental for researchers. Together with data management planning, they ensure that researchers can identify and manage risks related to data handling (e.g., data protection, data security, data access rights, and data storage). The University of Helsinki's Data Support provides free data management training and guidance for researchers, as well as tools for data management planning.
In this guide, video and audio recording data refer to videos and audio recordings in which people appear and which are recorded for research purposes.
Plan in advance the collection, storage, and processing of data during the research, as well as the archiving or disposal of the data after the study. If time is not allocated for data management or if it is not planned in advance, data management often takes a backseat to publication. As a result, even data considered valuable for research may remain unorganized in different locations, making its reuse difficult or impossible. Reuse should be considered as early as the data management planning phase (e.g., reuse permissions and metadata production). For example, Kielipankki provides a concise checklist for data management planning (in Finnish) as well as a more comprehensive and concrete guide for data producers (in Finnish).
A privacy notice must be prepared for the collection of personal data. A person's image and voice are considered personal data, so collecting video and audio recordings requires a privacy notice. It is advisable to provide the notice to participants in advance and to keep it available on the research project's website.
Minimize the collection of personal data. According to the data protection regulation, "research should be conducted without personal data whenever possible". The necessity of personal data should be assessed as early as possible in the research process, and the collected personal data "must be adequate, relevant, and necessary for the purpose of processing" (see, for example, this Data Protection Guide for Researchers and this Data Protection).
Agree on ownership, usage, and copyright of the data also within the research group and with the institution conducting the research. Before data collection, it is important to establish agreements with research project partners regarding at least data ownership and usage rights, processing, storage, and potential openness. These agreements can be refined as the research progresses. Regarding copyright and related rights, Kielipankki provides a concise information package titled "Copyright and Related Rights of Original Data" (in Finnish).
Participant consent is required for data collection. Whenever participants are recorded on audio or video, it must be ensured that they understand their participation in the research and give permission for the recordings. A signed consent form is not required; it is sufficient that participants are informed about the purpose of the study.
Data collection may require ethical review or prior research permission. If data is collected in a school, for example, research permission must be obtained from the municipality’s education department or another relevant department. An ethical review may be required for data involving participants under 15 years old.
Ensure permissions before starting data collection. Any necessary research permissions and ethical reviews must be obtained before beginning data collection. If the data is intended to be deposited in a service specialized in the responsible preservation or publication of research data after the project, it is essential to secure sufficient rights during data collection to allow further sharing with third parties. Obtaining these permissions afterward can be difficult or impossible.
Utilize university-provided services for data collection. If you are collecting data in the field, you can borrow equipment from HSSH’s Equipment Library. The library can provide guidance in selecting a suitable device when possible. Video and audio recordings can also be collected at HSSH’s Interlab – Multimodal intra- and interindividual research laboratory, which offers facilities for audiovisual work (video recording and interviews) and provides optimal conditions for high-quality recording.
Ensure that the data is reusable. For future use, attention should be paid to the technical quality and storage formats of the data to ensure compatibility with its intended long-term storage location. These aspects should be considered during data collection, not afterward. The recommended formats are those that are widely used and supported by multiple software programs. (See, for example, Kielipankki's guide on choosing the technical format of data [in Finnish] and Kielipankki's guide on data collection [in Finnish].)
Consider the metadata perspective already during data collection. Data collection goes hand in hand with the production of descriptive information, or metadata. During the project, it is important to document, among other things, explanations of variables and codes (data dictionaries, codebooks) as well as readme files.
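As a minimal sketch of what such documentation might look like in practice, the following Python snippet writes a small codebook (as CSV) and a plain-text readme file into a dataset folder. The variable names, file names, and descriptions are hypothetical examples, not a format required by any of the services mentioned in this guide.

```python
import csv
from pathlib import Path

# Hypothetical codebook entries: (variable, description, allowed values).
CODEBOOK = [
    ("participant_id", "Pseudonymous participant code", "P01-P99"),
    ("session", "Recording session number", "integer, starting at 1"),
    ("media_type", "Type of recording", "audio | video"),
]


def write_documentation(folder: Path, project: str) -> tuple[Path, Path]:
    """Write codebook.csv and README.txt into `folder` and return their paths."""
    folder.mkdir(parents=True, exist_ok=True)

    # The codebook explains every variable and code used in the dataset.
    codebook_path = folder / "codebook.csv"
    with codebook_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["variable", "description", "values"])
        writer.writerows(CODEBOOK)

    # The readme gives a short, human-readable overview of the folder.
    readme_path = folder / "README.txt"
    readme_path.write_text(
        f"Project: {project}\n"
        "Contents: audio and video recordings collected for research.\n"
        "Variables and codes are explained in codebook.csv.\n",
        encoding="utf-8",
    )
    return codebook_path, readme_path
```

Keeping the codebook in a plain, widely supported format such as CSV means it can be opened without special software when the dataset is later reused.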
Utilize university-provided services for data processing and analysis. HSSH’s Interlab – Multimodal intra- and interindividual research laboratory offers tools for processing and analyzing audiovisual data, including transcription and annotation. Guidelines for transcription and annotation can be found on Kielipankki's website (in Finnish). In addition to these resources, Transana software, which is well-suited for conversation analysis, is also available at HSSH’s Interlab.
The description of datasets must be planned and resourced. Describing datasets enhances their reuse. The resources required for documentation should be allocated in the project’s budget and schedule. Sufficiently detailed metadata should be collected for the dataset; the metadata can be published, for example, in the COMEDI service provided by the CLARINO Bergen CLARIN Centre.
High-quality documentation also includes explanations of file naming conventions, version control, and folder structure. It is important to describe the methods, sources, and locations from which the data is intended to be collected. Basic principles for producing metadata can be found in the Data Management Guide, and more detailed instructions are available in the Finnish Social Science Data Archive (FSD) guide.
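A file naming convention is easier to document and follow when it can be checked automatically. The sketch below assumes a hypothetical convention of the form `<project>_<participant>_<YYYYMMDD>_s<session>.<ext>`; the pattern and the list of file extensions are illustrative assumptions and should be adapted to the project's own documented convention.

```python
import re

# Hypothetical convention: <project>_<participant>_<YYYYMMDD>_s<session>.<ext>
# Example: "demo_P01_20240315_s02.wav". The extensions listed are illustrative only.
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<participant>P\d{2})_"
    r"(?P<date>\d{8})_"          # eight digits; not checked for calendar validity
    r"s(?P<session>\d{2})"
    r"\.(?P<ext>wav|flac|mp4|mkv)$"
)


def parse_filename(name: str):
    """Return the name's components as a dict, or None if it breaks the convention."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None
```

Running `parse_filename` over every file in a folder gives a quick list of names that do not follow the convention, and the parsed components can be reused as metadata fields.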
Plan version control in advance. During research, different versions of the dataset are usually created. Version control is important both during the research process and afterward if the dataset is relevant for reuse. The Kielipankki website provides information on version control (in Finnish), particularly from the perspective of data openness and reuse, including the creation of new versions.
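One simple way to keep track of dataset versions, sketched here as an assumption rather than a method prescribed by Kielipankki or the University, is to record a checksum for every file and compare the resulting manifests between versions. The function names below are hypothetical; only Python's standard `hashlib` and `pathlib` modules are used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its SHA-256 checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Storing a manifest alongside each released version makes it possible to state exactly which files differ between versions and to verify later that archived copies are intact.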
Utilize university services for data storage. During the research project, data can be stored in a personal home directory (if used individually) or a group directory. Guidelines for different storage solutions suitable for various purposes can be found in the Library Data Management Guide. It is important to keep in mind that video datasets require a considerable amount of storage space, which may also lead to additional costs.
Storage of sensitive data. Particularly sensitive data can be stored in Umpio, but Umpio is only suitable for temporary storage during research. Data processing in Umpio is not possible. If the dataset has users outside the University of Helsinki, it is advisable to use CSC's sensitive data storage services. If data needs to be temporarily stored on an external hard drive or USB stick, it must be password-protected and preferably also encrypted (see, for example, the use of the Cryptomator program on the Helpdesk website).
Kielipankki is a good choice of storage location. Compared to many other fields, linguistics has well-established storage solutions. The most central for data openness is Kielipankki (see corpora), which collects text, audio, and video datasets. Kielipankki can be used for storing and accessing data not only in linguistics but also in other fields. It is maintained by the FIN-CLARIN consortium, which consists of Finnish universities, CSC, and the Institute for the Languages of Finland. For example, the Research Council of Finland and several foundations recommend that the long-term preservation of language data be arranged through FIN-CLARIN. The University of Helsinki is responsible for acquiring and receiving datasets offered through Kielipankki, developing tools, and training activities, while CSC handles technical maintenance. The producer or owner of a language dataset must ensure sufficient rights and permissions (in Finnish) for the use and reuse of the data and be prepared to sign a storage agreement with FIN-CLARIN (see guidelines for data producers [in Finnish]). It is also important to consider potential costs associated with making the dataset available in Kielipankki (see FIN-CLARIN's offered services [in Finnish]).
New services provided by the University of Helsinki are Databank and Data Archive.
Data to be deleted must be securely destroyed. When disposing of sensitive datasets, the deletion process must follow the guidelines provided by Helpdesk. Physical storage media, such as external hard drives or CDs, can be physically destroyed; for example, a broken CD cannot be repaired. IT support can also be asked to handle the destruction process if there is uncertainty or if the data is highly sensitive. Detailed information on IT recycling can be found in Flamma.
Versions of the dataset when making data available. Guidelines on version control from the perspective of data openness and reuse are available on Kielipankki’s website (in Finnish).
Kielipankki 2015-21. Aineistonhallintasuunnitelma [Data management plan] (in Finnish).
Kielipankki 2015-21. Aineistojen tuottajan ohjeet ja muistilista [Instructions and checklist for data producers] (in Finnish).
Kielipankki: "Henkilötiedot ja tutkittavien henkilöiden yksityisyys" [Personal data and the privacy of research participants] (in Finnish), which gives guidance on, among other things, determining the data controller and processing personal data.
Tietosuoja: tiedonkeruun minimointi [Data protection: minimizing data collection] (in Finnish).
Information on data protection and the University of Helsinki's notice templates can be found in Flamma.
The collection and processing of personal data registers are governed by the EU General Data Protection Regulation (GDPR).
The national Data Protection Act (1050/2018), which specifies provisions relating to research.
Information on prior ethical review is available on the University of Helsinki's website.
A PDF version of the guide will be added soon.