Donate speech to help artificial intelligence understand dialects

From June onward, the Finnish public broadcasting company Yle has been prominently marketing the Donate Speech campaign through its channels. Among the parties contributing to the project is the University of Helsinki.

The goal of the Finnish language project, which is carried out in cooperation with the University, is to collect a 10,000-hour corpus of natural, contemporary and diverse speech into the Language Bank of Finland. With permission, the corpus can be safely utilised in research as well as by companies developing technical and Finnish-language AI solutions. The corpus is being compiled from different Finnish dialects, while artificial intelligence is being developed to recognise colloquial speech, with its dialects and hesitations. Also contributing to the project is the Finnish State Development Company.

By now, over 2,600 hours' worth of speech in a range of dialects has been accumulated. Research Director Krister Lindén from the Department of Digital Humanities and the FIN-CLARIN consortium says that, to begin with, only young men were expected to become interested in the campaign and to install the Donate Speech application, or to contribute to the campaign through its website. In reality, 60% of the participants have been women. The age distribution is heterogeneous, with people of all ages involved. The goal is for artificial intelligence to be able to recognise dialects as well as the speech and speaking style of people of different ages.

A project with international significance

Donate Speech is an internationally unique project, where data protection has been taken into consideration and people can anonymously donate their everyday speech. The donors are aware of the purpose for which they are donating their speech, and the companies involved in AI development gain no other data associated with them outside the speech. The project consolidates collaboration between the humanities at the University of Helsinki and the business sector.

The project is aimed at making services accessible throughout Finland, as it became apparent that the tools needed for AI development, and accessible by all, are lacking in Finnish.

“AI solutions must be developed so that they understand ordinary speech spoken anywhere in Finland, not only standard speech,” Lindén says.

“There is also social pressure to develop such solutions,” adds Mietta Lennes, project planning officer at the Language Bank of Finland.

“Artificial intelligence helps provide personalised services that also take people with special needs into consideration.”

For example, care robots must be able to understand their patients’ speech, literally. By the tone of voice, robots can identify the moods of their patients. Artificial intelligence must also be developed to understand Finnish spoken by immigrants, which may not be perfect. Services must be reliable so that customers are not afraid to speak to machines and are also understood.

Self-service for large groups of people and service applications for special needs groups require user interfaces that can be reliably operated by text and speech in the native language of the user. With the help of the Donate Speech campaign, an up-to-date speech and linguistic resource is being established for the benefit of everyone, also resulting in the development of smoothly functioning applications and services operable by speech.

Cooperation with Yle provides widespread visibility

Yle, the University’s cooperation partner, is monitoring the amount of data accumulated and concurrently boosting the campaign in those regions in Finland where donation activity appears to be dropping. Among the factors being monitored is the share of participants belonging to different age groups.

“In the summer, much of the speech collected naturally revolved around the coronavirus pandemic, but also around animals and sports. There are several themes to speak about in the project,” Lennes says. More themes have been added in the autumn.

Dear University community members, alumni and partners, donate your speech to researchers anonymously, free of charge and with your personal data in secure hands.

Partners in the Donate Speech campaign

The Donate Speech campaign is carried out collaboratively by Yle, the Finnish State Development Company and the University of Helsinki. Also participating in the speech collection project are FIN-CLARIN, a consortium coordinated by Finnish universities, CSC – IT Centre for Science Ltd., and the Institute for the Languages of Finland, which, among other things, helps researchers in the humanities utilise and further process research datasets. The Language Bank of Finland, a collection of services provided by the consortium, offers corpora and tools for research purposes. The University of Helsinki is responsible for the acquisition of corpora, tools and related education, while CSC is responsible for technical administration. Specialists from Aalto University and the University of Turku have also contributed to the project.