University of Helsinki gives recognition to promoters of open and reusable research data

The Open Science Award of 2021 was granted to the Language Bank of Finland and research coordinator Kati Lassila-Perini

The annual University of Helsinki Open Science Award is granted in recognition of exceptional work in promoting open science. The theme of the 2021 award is the accessibility and reusability of research data, and its goal is to highlight the importance of such data to science and to the academic community.

Nominations were requested from University units and from the University community via the Flamma intranet. Nominations were sought for research projects or research infrastructures that have significantly promoted the accessibility and reusability of research data in their own fields.

The nominations were assessed by an award jury consisting of Vice-Rector Paula Eerola, University Librarian Kimmo Tuominen, IT Manager Minna Harjuniemi, Director of the Finnish Museum of Natural History Aino Juslén, and Senior Advisors Tiina Käkelä and Marko Peura.

The jury decided to grant the award to two nominees, both of whom represent long-term grassroots work in enabling and promoting the use of valuable research data. The award was given to the Language Bank of Finland, especially for its Donate Speech data, and to research coordinator Kati Lassila-Perini for her work in utilising the open data of particle physics in research and education.

The University of Helsinki Open Science Award was presented at Think Corner on 29 October at the event Open Science Afternoon: Open Data Matters. The event took place during the international Open Access Week, 25–31 October.


Publicly accessible language resources

Founded in 1995, the Language Bank of Finland is today a wide-ranging service for researchers using language resources. It provides a wide variety of text and speech corpora, plus tools and training for the use, analysis, and management of its research data. Some resources are openly available to everyone, some are accessible by registering with an organisational user account, and some require personal access rights.

The language resources are collected by the national FIN-CLARIN consortium formed by Finnish universities and other research organisations. The Language Bank is coordinated by the University of Helsinki, and its technical services are provided by CSC (IT Center for Science). FIN-CLARIN is part of the international CLARIN ERIC research infrastructure, and the whole CLARIN community’s language resources are accessible to the users of the Language Bank.

One of the most prominent recent projects of the Language Bank has been the Donate Speech campaign, organised together with the Finnish broadcasting company YLE, the Finnish Climate Fund, and Solita. The project has so far collected around 4,000 hours of casual speech in Finnish from all over Finland. Donate Speech material can be used in academic research as well as commercially, in accordance with the Data Protection Regulation. Of this material, 1,500 hours have been transcribed manually, and the rest will be transcribed using automatic speech recognition. The material covers a range of topics, especially the impact of the coronavirus pandemic on the everyday life and work of Finns, which also makes it interesting from the point of view of the social sciences and humanities. In October 2021, Donate Speech also received the Prix Europa, awarded by European broadcasting companies, in the category of digital audio projects.

The Language Bank has also promoted the reusability of its material by organising methodological courses online. The courses are designed especially for graduate and doctoral students in the humanities, social sciences and behavioural sciences. The Language Bank's bilingual course Puheen analyysin perusteet – Introduction to Speech Analysis received the Teaching with CLARIN Award for open educational resources in 2021.


CERN experiment data for research and training

The work of research coordinator Kati Lassila-Perini to make the data of particle physics experiments openly available shows that the commitment of a single person to promoting open science can have a global impact. Lassila-Perini works at CERN, the European Organization for Nuclear Research. In 2011, she took part in creating a policy for the preservation, reuse, and open access of data from the CMS experiment at CERN. This policy was the first of its kind, and Lassila-Perini has coordinated the "Data Preservation and Open Access" project at CERN, which has put the principles of the policy into everyday practice. As a result, a major part of the data from the CMS experiment is now freely available.

The CMS experiment detects high-energy particle collisions and the transformations the particles undergo during and after the collisions. The datasets produced in the experiment are large and complex, which makes opening them up demanding as well. Handling the datasets requires specific software, and for the data to be reusable, that handling must be clear and easy enough for researchers from other fields to utilise them. The opening of CMS datasets has also promoted multidisciplinarity at the University of Helsinki. Moreover, the long-term preservation of the data from the CMS experiment makes its future use possible, which represents sustainable development in science.

In the Helsinki Institute of Physics project "Education and Open Data", led by Lassila-Perini, the CMS datasets have also been used for science education in Finnish high schools. With the support of the Finnish National Agency for Education, around 20 groups of Finnish high school students visit CERN each year. The project has also organised further training for high school teachers on the use of open data and developed tools for handling large open datasets in education.