In the Language Bank: Mikko Kurimo

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Mikko Kurimo tells us about his research on automatic speech recognition.

Who are you?

I am a Professor in Speech and Language Processing and leader of the at the of Aalto University.

What is your research topic?

For my PhD dissertation 25 years ago, I developed neural network algorithms to make automatic speech recognition more accurate and more robust. Training statistical models to recognize speech sounds requires large amounts of speech material in which the sounds are aligned with the corresponding text. At that time, very few such corpora were available, so the research team had to collect and process the data themselves. Once we had developed automatic methods for aligning speech and text, it became possible to use larger data sets, such as audiobooks and radio and television news (e.g., ), to train the Finnish speech recognizer.
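Aligning speech with its text is commonly done by forced alignment. The transcript fixes the order of the speech sounds, so the aligner only has to decide where each sound starts and ends. The Python sketch below shows the generic dynamic-programming (Viterbi-style) form of this idea. The interview does not say which alignment method the Aalto team used, so this is not their method. The function name `forced_align` and the frame scores are hypothetical. In a real recognizer, the scores would come from an acoustic model.

```python
import math
from typing import List

def forced_align(loglik: List[List[float]]) -> List[int]:
    """Assign each frame to one transcript unit, keeping the transcript order.

    loglik[t][n] is the log-likelihood of frame t under unit n (hypothetical
    scores; a real system would get them from an acoustic model). Every unit
    must cover at least one consecutive frame.
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per unit")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 1 = entered unit n from unit n-1 at frame t
    score[0][0] = loglik[0][0]  # the first frame must belong to the first unit
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else NEG
            back[t][n] = 1 if advance > stay else 0
            score[t][n] = max(stay, advance) + loglik[t][n]
    # Trace back from the last unit at the last frame.
    path, n = [], N - 1
    for t in range(T - 1, -1, -1):
        path.append(n)
        n -= back[t][n]
    return path[::-1]

# Toy example: three units (say, the phones of a short word) over six frames.
ll = [[-1, -5, -5], [-1, -4, -5], [-5, -1, -5],
      [-5, -1, -4], [-5, -5, -1], [-5, -5, -1]]
print(forced_align(ll))  # → [0, 0, 1, 1, 2, 2]
```

Because the transcript order is fixed, each frame can only stay in the current unit or advance to the next one. This constraint keeps the search linear in the number of frames times the number of units.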

However, modeling individual speech sounds alone cannot give sufficient accuracy, because the sounds do not occur in isolation: in practice they are modified to fit their word and sentence context. The speech recognizer must therefore also be given a model of the language in question. Based on this language model, the recognizer decides which words and sentences the observed sequences of speech sounds represent. Training a language model requires huge quantities of text, and that text should also cover a wide variety of language use. For training the Finnish speech recognizer, we have used, e.g., the .
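The division of labour described above is usually written as the standard decision rule of statistical speech recognition. The recognizer picks the word sequence W that is most probable given the observed acoustic features O. Because p(O) does not depend on W, it can be dropped. This textbook formulation is offered only as an illustration. The interview does not state exactly which models Aalto-ASR combines.

```latex
\begin{align*}
\hat{W} &= \arg\max_{W} P(W \mid O) \\
        &= \arg\max_{W} \frac{p(O \mid W)\,P(W)}{p(O)} \\
        &= \arg\max_{W} \underbrace{p(O \mid W)}_{\text{acoustic model}}\;\underbrace{P(W)}_{\text{language model}}
\end{align*}
```

Here p(O | W) scores how well the speech sounds match a candidate transcript. P(W), estimated from large text collections, scores how plausible that word sequence is in the language.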

Once read-aloud speech and dictation can be converted into text automatically with sufficient accuracy, the technology can be used in dictation services and in many other useful applications, such as transcribing prepared speeches or respeaking presentations and television programmes. However, I am even more interested in the natural, spontaneous speech that we all use in everyday conversation and storytelling. Since free speech is the most efficient means of communication for humans, it is of utmost importance to have an automatic speech recognizer that can understand this kind of speech when developing artificial intelligence systems that are meant to communicate with people.

The challenges in training models of conversational speech lie in the huge variation in speech and in the limited availability of carefully transcribed natural speech suitable for training recognizers. Because written language differs from spoken language in many ways, in practice the text resources have to be created by first transcribing speech.

How is your research related to Kielipankki?

When training the first conversational speech recognizer, we used the corpus in addition to the corpus we had collected ourselves. The language models were trained on selected portions of written conversations that, when compared against these spoken corpora, were found to resemble spoken language.
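One common way to find written text that resembles spoken language is cross-entropy difference selection (Moore and Lewis). Each candidate sentence is scored by how much better a model trained on spoken transcripts predicts it than a model trained on general text, and the most spoken-like sentences are kept. The interview does not say that this exact criterion was used, so treat the Python sketch below as an illustration of the general idea only. The add-one smoothed unigram models, the function names and the toy Finnish sentences are all hypothetical. Real systems would use n-gram or neural models trained on large corpora.

```python
import math
from collections import Counter
from typing import Callable, List, Set

def train_unigram(sentences: List[str], vocab: Set[str]) -> Callable[[str], float]:
    """Add-one smoothed unigram model returning log P(word) (hypothetical toy LM).

    One extra probability slot is reserved for words outside `vocab`.
    """
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab) + 1

    def logprob(word: str) -> float:
        return math.log((counts[word] + 1) / total)

    return logprob

def cross_entropy(sentence: str, logprob: Callable[[str], float]) -> float:
    """Average negative log-probability per word, in nats."""
    words = sentence.split()
    return -sum(logprob(w) for w in words) / max(len(words), 1)

def select_spoken_like(candidates: List[str], spoken: List[str],
                       general: List[str], threshold: float = 0.0) -> List[str]:
    """Keep candidates that the spoken-language model predicts better than the
    general-text model, i.e. whose cross-entropy difference is below `threshold`."""
    vocab = {w for s in spoken + general for w in s.split()}
    spoken_lm = train_unigram(spoken, vocab)
    general_lm = train_unigram(general, vocab)
    scored = sorted(
        (cross_entropy(s, spoken_lm) - cross_entropy(s, general_lm), s)
        for s in candidates
    )
    return [s for score, s in scored if score < threshold]

# Hypothetical toy data: colloquial (spoken-style) vs. standard written Finnish.
spoken = ["mä en tiiä", "no joo mä meen kotiin", "se on ihan kiva"]
general = ["minä en tiedä", "hän menee kotiin", "se on erittäin hyvä"]
candidates = ["mä meen kauppaan", "minä menen kauppaan"]
print(select_spoken_like(candidates, spoken, general))  # → ['mä meen kauppaan']
```

The threshold controls the trade-off between how much text is kept and how closely it matches spoken language. In practice it would be tuned, for example, by the perplexity of the resulting language model on held-out transcripts.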

We are currently preparing two new corpora of free speech for publication: an extension of the and the speech material collected in the Donate Speech campaign. Both corpora contain approximately 4,000 hours of speech, which is clearly more than all previously published Finnish speech corpora suitable for training automatic speech recognizers combined. I am confident that the new data will enable us to significantly improve the automatic speech recognizer developed at Aalto University (Aalto-ASR), whose most recent version () is currently available via the Language Bank of Finland.

Publications related to Kielipankki

Mikko Kurimo (1997). Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models. PhD thesis, Helsinki University of Technology, Espoo, Finland.

Mikko Kurimo, Vesa Siivola, Teemu Hirsimäki, Janne Pylkkönen, Reima Karhila, Peter Smit, Seppo Enarvi, André Mansikkaniemi, Matti Varjokallio, Ulpu Remes, Heikki Kallasjoki, Sami Keronen, Katri Leino, Ville T. Turunen & Kalle Palomäki (author names are in no particular order, except that the project leader is listed first) (2000–2016). AaltoASR – an open-source automatic recognizer for unlimited-vocabulary continuous speech. Aalto University.

Seppo Enarvi & Mikko Kurimo (2013). . In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT), Heidelberg, Germany, pp. 256–263.

André Mansikkaniemi, Peter Smit & Mikko Kurimo (2017). . In Proceedings of Interspeech 2017, Vol. 8, pp. 3762–3766.

Juho Leinonen, Sami Virpioja & Mikko Kurimo (2021). . In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press.

Peter Smit, Sami Virpioja & Mikko Kurimo (2021). . Computer Speech & Language, Vol. 66.

More information on the aforementioned resources in Kielipankki

(Lahjoita Puhetta)

The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides the language materials and tools for the research community.

22.11.2021

Mikko Kurimo
