Contact Information

Institute of Behavioural Sciences
Speech Sciences
Siltavuorenpenger 1 A
PO Box 9
00014 University of Helsinki
tel. 02941 29342

Research on speech databases

Thanks to technological advances, speech scientists can handle ever larger amounts of speech recordings and can analyze video material as well. In Finland, a great deal of research and development relies on large speech corpora.

Normally, researchers gather their own speech material and annotate or transcribe the units they wish to study. However, speech corpora could be used more efficiently if a set of general technical principles for processing and sharing the material were available and widely accepted. Different speech corpora could then be combined into a common speech database that researchers could search.

Speech Sciences at the Institute of Behavioural Sciences is involved in developing a common speech database system in Finland. Principles for collecting and annotating speech corpora are also under development. This work is carried out together with other linguistics departments at the University of Helsinki, the Laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology, and CSC (the Finnish IT center for science).


2002-2004: Integrated resources for speech technology and spoken language research (funded by the Academy of Finland)

Other results and deliverables

A guide for annotating (i.e., labeling) speech (currently in Finnish only)

A general conceptual framework for annotating speech (RDF Schema), published by CSC

