High-performing automatic speech recognition (ASR) is vital in the automatic assessment of spoken language skills, but the speech of language learners can be difficult to recognize because of, for example, mispronunciations and grammatical errors. Moreover, traditional ASR systems require large amounts of transcribed speech from the target language in order to function properly.
Developing ASR is more challenging for small (or low-resourced) languages, such as Finnish and Swedish, than for English. Low-resourced languages have considerably fewer speakers, so less training data is available. DigiTala's ASR studies focus on the speech of Finnish and Swedish language learners, groups for which very little data exists.
Target data used only for fine-tuning
Instead of relying on traditional training data, that is, speech paired with text, the researchers first pre-trained the ASR on large amounts of speech without text. The pre-training was unsupervised, meaning that the systems learn from the audio without human-made transcriptions.
Both the Finnish and Swedish ASRs were pre-trained on a large multilingual dataset containing speech in 23 languages. The Swedish ASR was additionally trained on a large Swedish speech dataset without text, but no such data was available for Finnish.
The ASR systems were then fine-tuned on Finnish and Swedish language learners' speech with the corresponding text. Both fine-tuning datasets were collected in the DigiTala project. The performance of the new ASRs was compared with that of ASRs trained only on the small target data.
Self-learnt machines were more accurate
The pre-trained ASRs proved more accurate than ASRs trained only on the small amount of transcribed speech in the target language. This was also the case for the multilingually pre-trained ASR compared with a monolingual ASR without pre-training.
In other words, a self-learnt machine can learn a new language from a relatively small amount of target data, even without prior "knowledge" of the target language. This makes developing ASR for small languages and other small speaker groups, such as language learners, more efficient.
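ASR accuracy comparisons like the ones described here are commonly reported as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. This text does not state which metric the DigiTala studies used, so the sketch below is only an illustration under the assumption that WER is the measure. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / number of reference words.

    Assumption: accuracy is measured as WER (not stated in the original text).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (starts with the empty reference).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical learner utterance and two hypothetical recognizer outputs.
reference = "jag bor i helsingfors"
print(word_error_rate(reference, "jag bor helsingfors"))    # one deleted word
print(word_error_rate(reference, "jag bo i helsing fors"))  # substitutions and an insertion
```

A lower WER means a more accurate recognizer. Because insertions are counted, WER can exceed 1.0 when the output contains many extra words.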
Read more about the development of ASRs for non-native speech in Yaroslav Getman's Master's thesis.
The article on the development of L2 Swedish ASR is available in the Interspeech proceedings.