Privacy Notice

Privacy Notice of the DigiTala project.

Short web address:

The research project DigiTala (financed by the Academy of Finland 2019-2023 and Svenska Folkskolans vänner 2015-2017) involves the processing of personal data. The purpose of this notice is to provide information on the personal data that has been and will be processed, the source of the data, and how the data will be used in the study. For more information on the rights of data subjects, please see the end of this notice.

Participation in the study is voluntary. There will be no negative consequences for you if you choose not to participate in the study or if you withdraw from the study. Participation in the study does not affect other language assessments (e.g. your course grade, YKI test results).

1. Data Controller

  • University of Helsinki, Address: P.O. Box 3 (Fabianinkatu 33), 00014 University of Helsinki, Finland
  • University of Jyväskylä, Address: Seminaarinkatu 15, P.O. Box 35, 40014 University of Jyväskylä, Finland
  • Aalto University Foundation sr, Address: P.O. Box 11000, FI-00076 Aalto, Finland

2. Contact person and principal investigator

Contact person in matters concerning the project / Principal investigator:

  • Name: Raili Hildén
  • Institution/Faculty/Department: University of Helsinki / Faculty of Educational Sciences
  • Address: P.O. Box 9 (Siltavuorenpenger 3A), 00014 University of Helsinki
  • Tel.: +358504482514
  • E-mail:

3. Contact details of the Data Protection Officer

The Data Protection Officer of the University of Helsinki can be reached at , the Data Protection Officer of the University of Jyväskylä at and the Data Protection Officer of Aalto University at

4. Description of the study and the purposes of processing personal data

The DigiTala project develops applications that use automatic speech recognition (ASR) to assess language skills. The aim is to find out how second and foreign language skills can be assessed in large-scale and high-stakes language tests such as the Finnish Matriculation Examination. The applications also allow students to practice pronunciation and speaking on their own.

The first versions of the applications are designed for Finnish upper secondary school students studying Swedish or Finnish as a second national language. In addition to upper secondary school students, data is also collected from participants of the YKI (National Certificates of Language Proficiency) tests in Finnish and Swedish. Furthermore, data is collected from teachers and human raters who listen to the students’ speech samples and evaluate their performance.

The DigiTala project employs researchers from the University of Helsinki, Aalto University, and the University of Jyväskylä. The University of Helsinki is responsible for the pedagogical content of the applications and for collaboration with schools and other partners. Aalto University is responsible for developing automatic speech recognition and automatic feedback. Moreover, Aalto University is responsible for storing research data. The University of Jyväskylä is responsible for analyzing speech samples and training raters.

5. Personal data included in the research data

A) Data being collected from upper secondary schools in the second phase of the project (2019-2023):

  • Students’ names, dates of birth, schools, contact information, consent
  • Students’ recordings (speech, possibly including videos that show their faces)
  • Students’ test performances (answers, points, grades)
  • Students’ background information (e.g., how many courses they have taken, possibly questions on language background, language learning, oral language skills, gender, first language)
  • Students’ self-assessments of their language skills / test performance
  • Students’ opinions on the exam (questionnaire, interview)
  • Student performance assessment (automatic assessment)
  • Teachers’ names and contact information
  • Teachers’ background information (e.g., gender, age, first language, language skills, questions on teaching methods)
  • Teachers’ evaluations of students’ test performances
  • Teachers’ perceptions and opinions about the tool that is being developed (questionnaire, interview)

B) Data being collected from human raters in the second phase of the project (2019-2023):

  • Human raters’ names, contact information, and consent
  • Human raters’ background information (e.g. first language, language skills, experience)
  • Human raters’ assessment of students’ performance (individual dimensions, overall rating)
  • Feedback from raters on the tool that is being developed (questionnaire, interview)

C) Data being collected from participants and interviewers of the YKI tests (in Finnish and Swedish) in the second phase of the project (2019-2023):

  • Participants’ names, dates of birth, consents
  • Participants’ test performances (spoken and written performances) and grades
  • Participants’ background information (first language, age, gender)
  • Interviewers’ names, consents, and recordings (audio/video)

D) Data being collected from other stakeholders in the second phase of the project (2019-2023):

  • Feedback and comments on the assessment tool from language test designers of the Matriculation Examination
  • Feedback and comments on the test tasks from the Finnish National Agency for Education

E) The data collected in the first phase of the project (2015-2017) contains the answers of upper secondary school students to an oral Swedish language test. In the test, the students responded to the test questions using a computer. Their answers were audio-recorded, and some of the answers were then evaluated and/or transcribed.

F) Data being collected from adult learners in the second phase of the project (2019-2023):

  • Participants’ contact information and consent
  • Participants’ background information (age, gender, first language, other language skills)
  • Participants’ opinions on the exam (questionnaire)
  • Participants’ recordings (speech)

6. Sources of personal data

During the second phase of the project, we are planning tests that measure oral language skills in Finnish and Swedish. In the tests, upper secondary school students respond to tasks using a computer. We collect speech samples from students during the test (test answers). In addition, students respond to a questionnaire that elicits information on their background, their views on the test they have taken, and their self-assessments of their language skills. Some students are also interviewed for feedback on the application and development suggestions.

Moreover, YKI test-takers’ answers to the YKI test in Finnish / Swedish and their grades will be used in this research. After the examination day, the University of Jyväskylä will send a questionnaire to the YKI participants to ask for their background information and for their consent to the transfer of their test performances. The University of Jyväskylä will also ask the YKI interviewers for their consent to the use of recordings in which they appear.

In addition to students and YKI participants, data is collected from teachers and human raters who listen to and evaluate students’ and YKI participants’ speech samples. Both teachers and raters fill in the background questionnaire. Some of them are also interviewed to gather feedback and suggestions for improvement. In addition, feedback and comments are collected from language test designers of the Matriculation Examination and Counsellors of Education at the Finnish National Agency for Education.

We also want to compare the upper secondary school students’ performances in this oral skills test with each student’s result in the Matriculation Examination. We will ask for the students’ consent for this. Information on the students’ results in the Matriculation Examination will be requested from the Matriculation Examination Board.

Due to the COVID-19 pandemic, the data will partly be collected online. We use the Zoom video conferencing tool of the University of Helsinki and the University of Jyväskylä. In Zoom, the video and audio traffic is handled within the Nordic countries. Video conferences will be stored on the researcher’s computer, not in cloud storage.

In the first phase of the project (2015-2017) the speech data was collected from upper secondary school students who took a computer-mediated Swedish speaking test. Later, the utterances were transcribed and evaluated by human raters.

7. Sensitive personal data

No data considered as special category data under Article 9 of the General Data Protection Regulation will be processed in the study.

Test assignments, as well as questionnaire and interview questions, are designed so that they will not disclose any sensitive personal information (such as information related to ethnic origin, political opinions or health).

8. Lawful basis of processing

Personal data is processed on the following lawful basis under Article 6(1) of the General Data Protection Regulation: scientific research purposes or statistical purposes carried out in the public interest.

9. Recipients of the personal data

Known recipients at the time of writing the Privacy Notice:

  • Project researchers (all data)
  • Service providers related to the data collection:
      • Zoom video conferencing tool of the University of Helsinki and Jyväskylä Funet Miitti (online connections, interviews, feedback discussions, trainings)
      • Webropol tool for making surveys (surveys for consent and background information / Webropol of the Aalto University; for YKI test-related surveys we use Webropol of the University of Jyväskylä)
  • Matriculation Examination Board (speech samples that are published for illustrative purposes)
  • The Federation of Foreign Language Teachers in Finland (speech samples that are published for illustrative purposes)
  • Teacher in-service training (speech samples that are published for illustrative purposes)
  • Teachers participating in the study (speech samples)
  • Human raters recruited by the project (speech samples)
  • Transcribers recruited by the project (speech samples, interview materials)
  • The Language Bank of Finland: once the project is complete, material from participants who have given their consent for this purpose will be stored there (names, dates of birth, school details, and contact information will not be stored in the Language Bank)
  • Project’s research partners at the University of Eastern Finland (Joensuu, Finland)

10. Transfer of personal data to countries outside the EU/European Economic Area

No personal data will be transferred to recipients outside the European Economic Area.

11. Automated decisions

No automated decisions with significant effects on the participants are made in the study.

12. Safeguards to protect the personal data

The personal data is processed and stored in such a way that only persons who need the data for research purposes can access it. The personal data register including personal information on the participants will be stored separately from the collected speech samples and ratings (in different systems).

Personal data processed in IT systems is protected in the following ways: username and password, registration of use/logging, access control, encryption.

Manual data (e.g., paper-based data or data in another physical form) is protected in the following ways: it is kept in a locked locker that only the project leaders at the University of Helsinki and at the University of Jyväskylä (Raili Hildén and Mikko Kuronen) can access.

Processing of direct identifiers: Direct identifiers will be removed in the analysis phase and will be stored separately from the research material being analysed. Students’ speech samples are referred to with research identifiers instead of names.

13. Duration of processing

The criteria for determining the storage period of research data containing personal data are based on good scientific practice. In scientific research, the aim is to store the research data so that the research results can be verified and previously collected data can be used for further scientific research on the same subject or for scientific research in other fields.

Personal data collected in the second phase (2019-2023) will be processed for five years after the project has been completed in order to complete research-related publications.

The pseudonymized data collected in the first phase of the project (2015-2017) will be stored for 15 years.

14. Retention of personal data after the completion of the study

The research material will be deleted from the storage devices used by the project researchers five years after the end of the project (2019-2023).

The research material will be archived for later, compatible scientific research in accordance with the Privacy Policy: speech samples and research identifiers will be included, but not direct identifiers.

The storage of research material is based on Articles 5(1) (b) and (e) of the Data Protection Regulation. Prior to new research use, the Language Bank of Finland will ensure that the new research use is compatible with the original use of the material in accordance with the regulation requirements. No new privacy notice will be sent to the data subject regarding the new use of the research material, as the data controller will no longer be able to identify the data subjects without unreasonable effort.

Where will the material be archived and for how long: the Language Bank of Finland, permanently. Names and any other identifiers referring directly to the participant will be removed from the material stored in the Language Bank of Finland, but in theory, it will still be possible to identify subjects by their voice. The material stored in the Language Bank will be labeled as material with the highest level of protection (RESTRICTED). Restricted materials can be accessed only for personal research upon application.

15. Your rights as a data subject, and exceptions to these rights

The contact person in matters concerning the rights of the participant is the person mentioned in section 2 of this notice.

Rights of data subjects

According to the General Data Protection Regulation (GDPR), data subjects have the right

  • to access their data
  • to rectification of their data
  • to the erasure of their data and to be forgotten
  • to restrict the processing of their data
  • to data portability
  • to object to the processing of their data
  • not to be subject to a decision based solely on automated processing.

Not all of these rights can be exercised in all situations, depending on factors such as the basis for the processing of personal data.

For more information on the rights of data subjects in different situations, please see the Data Protection Ombudsman’s website:

Applicability of rights

If the processing of personal data in scientific research does not require the identification of the data subject and the data controller is unable to identify the data subject, the rights of access to data, rectification of data, erasure of data, restriction of processing, notification obligation and data portability are not applicable unless the data subject provides additional identifying information (Article 11 of the Data Protection Regulation).

Exceptions to data subject rights

Under the General Data Protection Regulation and the Finnish Data Protection Act, certain exceptions to the rights of data subjects can be made when personal data is processed in scientific research, and fulfilling the rights would render impossible or seriously impair the achievement of the objectives of the processing (in this case, scientific research).

Right to lodge a complaint

You have the right to lodge a complaint with the Data Protection Ombudsman’s Office if you think your personal data has been processed in violation of applicable data protection laws. Contact details:

  • Data Protection Ombudsman’s Office (Tietosuojavaltuutetun toimisto)
  • Address: Ratapihantie 9, 6th floor, 00520 Helsinki
  • Postal address: P.O. Box 800, 00521 Helsinki
  • Tel. (switchboard): 029 56 66700
  • Fax: 029 56 66735
  • E-mail: tietosuoja(at)