University of Helsinki Department of Speech Sciences - Faculty of Behavioural Sciences
 
Samples of Glott-HMM synthesis
 
Project members:
Tuomo Raitio
Paavo Alku
Helsinki University of Technology
Department of Signal Processing and Acoustics
P.O. Box 3000
FI-02015 TKK
Finland

Antti Suni
Martti Vainio
University of Helsinki
Department of Speech Sciences
Siltavuorenpenger 20 A
PO Box 9
00014 University of Helsinki
Finland

Simple4All project 2011-2014
 We participated in an EU-funded project, Simple4All, whose purpose was to develop synthesis technology for low-resource languages using minimal supervision. Typically, text analysis requires expert knowledge for inferring letter-phoneme correspondence, syllable structure, part-of-speech classes, pronunciation of abbreviations, etc. Here, only raw UTF-8 text and the corresponding audio are needed. Our unsupervised system was one of the top entries in Blizzard 2014, where the task was to build various Indian voices. Check samples here.

Vocal Effort Continuum with New Voices 2012 (LISTA)
 Samples used in testing voices built from two new voice databases, including adapted breathy and Lombard styles.

male original

soft
normal
lombard

male synthesis

soft
normal
lombard

female original

soft
normal
lombard

female synthesis

soft
normal
lombard

Vocal Effort Continuum with Pulse Library 2012
 Humans adapt their speech to the auditory environment in order to get the message across without using unnecessary effort. Depending on the context, natural speech may vary from whispering to shouting. This vocal effort continuum is an integral part of human communication, but it has not been utilized in machine-to-human communication, mainly because the technology is immature and synthesizing such a continuum requires an inconveniently large amount of training data.

Here, we demonstrate our experiments on the subject, using HMM adaptation and interpolation techniques together with small pulse libraries (5-10 sentences) built from selected points on the effort continuum. The training data consist of 600 sentences of normal speech, plus 200 sentences of Lombard speech used for adaptation. The remaining points on the continuum have been extrapolated from these two voices. While the extrapolated samples do not quite reach the quality of the normal voice, they reproduce the intended degrees of effort well.

very soft
(extrap -0.8)
soft
(extrap -0.5)
normal
(trained 0.0)
lombard
(adapted 1.0)
lombard
(extrap 1.5)
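
The factors attached to the samples above (-0.8, -0.5, 0.0, 1.0, 1.5) can be read as positions on a line through the normal voice (0.0) and the Lombard-adapted voice (1.0). The following is only a minimal sketch of that idea, assuming plain linear interpolation and extrapolation of parameter means; the page does not describe the actual procedure applied to the HMM parameters, and the function name and all numeric values are hypothetical.

```python
from typing import List, Sequence


def scale_effort(normal: Sequence[float],
                 lombard: Sequence[float],
                 alpha: float) -> List[float]:
    """Move along the normal -> Lombard axis (assumed linear).

    alpha = 0.0 returns the normal parameters, alpha = 1.0 the Lombard
    parameters; values outside [0, 1] extrapolate beyond the two voices.
    """
    if len(normal) != len(lombard):
        raise ValueError("parameter vectors must have the same length")
    return [n + alpha * (l - n) for n, l in zip(normal, lombard)]


# Hypothetical mean vectors (made-up numbers, not taken from the voices).
normal_mean = [120.0, 0.30]
lombard_mean = [160.0, 0.50]

# Effort factors as labelled on the samples.
factors = [("very soft", -0.8), ("soft", -0.5), ("normal", 0.0),
           ("lombard", 1.0), ("lombard, extrapolated", 1.5)]

for label, alpha in factors:
    print(label, scale_effort(normal_mean, lombard_mean, alpha))
```

Negative factors push the parameters away from the Lombard voice, which under this assumption corresponds to the softer end of the continuum.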

For modelling various degrees of effort, even HMM adaptation may not be strictly necessary. Below, we have simply adjusted the means of the synthesis source parameters to match the means calculated from pulse libraries built from 7 sentences at each of three different degrees of effort. The samples come from direct analysis; no statistical modelling was performed. All three versions of each utterance were generated from the normal version of the utterance.

soft
normal
lombard
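
The mean adjustment described above can be illustrated with a small sketch. It assumes the adjustment is a constant offset applied to a per-frame parameter track so that its mean equals the mean measured from a pulse library; the exact parameters and the form of the adjustment are not specified on this page, so the function name and the numbers are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def shift_to_target_mean(track: Sequence[float],
                         target_mean: float) -> List[float]:
    """Shift a parameter track by a constant so its mean equals target_mean.

    The frame-to-frame shape of the track is left unchanged; only its
    overall level moves.
    """
    if not track:
        raise ValueError("track must not be empty")
    offset = target_mean - fmean(track)
    return [value + offset for value in track]


# Hypothetical source-parameter track analysed from a normal utterance.
normal_track = [1.0, 2.0, 3.0]

# Hypothetical means measured from pulse libraries at three effort levels.
library_means = {"soft": 1.5, "normal": 2.0, "lombard": 5.0}

for style, mean in library_means.items():
    print(style, shift_to_target_mean(normal_track, mean))
```

Because only the mean is moved, each generated version keeps the contour of the normal utterance, which matches the statement that all three versions were derived from the normal one.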


Blizzard Challenge 2011
 Again this year, our group participated in the annual speech synthesis evaluation, the Blizzard Challenge. In this international event, participants are given the task of building various synthetic voices within a limited time frame, from databases provided by the organizers. This year's task was to build an American female voice from a large, well-annotated database.

The main novelties of our entry were the use of a pulse library in a female voice and a new parameterization scheme based on stabilized weighted linear prediction (SWLP). Despite some artefacts, our voice performed well among the parametric systems. The best concatenative systems, however, were significantly better, owing to the large database designed for unit-selection synthesis. Below are some samples of our entry.

 
HMM-Based Lombard Speech Synthesis, 2011

Following our successful entry in the Blizzard 2010 speech-in-noise task, we decided to investigate the subject further. Several voices were built and compared to human speech in terms of intelligibility, naturalness and appropriateness under different noise conditions.

human speech
normal
Blizzard
Lombard Adapted
Lombard Extrapolated
Human Lombard
 
Older samples
 Additional samples available here.
Questions
 If you have any questions or comments concerning the demo samples or our research on HMM-synthesis, please send them to
antti.suni (at) helsinki (dot) fi or tuomo.raitio (at) hut (dot) fi