University of Helsinki Department of Speech Sciences - Faculty of Behavioural Sciences
 
Samples of Glott-HMM synthesis
 
Project members:
Tuomo Raitio
Paavo Alku
Helsinki University of Technology
Department of Signal Processing and Acoustics
P.O. Box 3000
FI-02015 TKK
Finland

Antti Suni
Martti Vainio
University of Helsinki
Department of Speech Sciences
Siltavuorenpenger 20 A
PO Box 9
00014 University of Helsinki
Finland

Experiments on glottal pulse selection, 2010
 Our current excitation method is based on a single inverse-filtered glottal pulse, which is modified in the spectral domain to fit the context. While this method provides reasonably natural voice quality, our experiments for the Blizzard Challenge highlighted two weaknesses of the approach: modelling irregularly voiced sounds and retaining the voice characteristics of the target speaker.

Thus, we have started experimenting with a full unit-selection framework for the voice source.
In this framework, thousands of glottal pulses are extracted from the training data to build a glottal pulse library. Then, in synthesis, the most appropriate pulse is selected for each pitch period, with the aim of retaining the smoothness of the parametric synthesis framework.
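As a rough illustration of per-pitch-period selection from a pulse library, the following minimal sketch runs a Viterbi-style search over a target cost and a join cost. Everything in it is an assumption made for illustration, not a description of the Glott-HMM implementation: the function names, the feature-vector representation of pulses, the squared-distance costs, and the `join_weight` parameter are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_pulses(targets, library, join_weight=1.0):
    """Pick one library pulse index per pitch period with a Viterbi search.

    targets: one target feature vector per pitch period (hypothetical features).
    library: one feature vector per stored glottal pulse.
    join_weight: assumed trade-off between matching the target (low values)
                 and keeping consecutive pulses similar (high values).
    """
    if not targets or not library:
        return []
    n = len(library)
    # Join cost: penalise switching between dissimilar pulses (assumed measure).
    join = [[join_weight * sq_dist(library[i], library[j]) for j in range(n)]
            for i in range(n)]
    # Accumulated cost of ending in each pulse after the first pitch period.
    cost = [sq_dist(targets[0], pulse) for pulse in library]
    back = []  # back[t][j] = best predecessor of pulse j at period t + 1
    for target in targets[1:]:
        new_cost, ptr = [], []
        for j in range(n):
            best_i = min(range(n), key=lambda i: cost[i] + join[i][j])
            new_cost.append(cost[best_i] + join[best_i][j]
                            + sq_dist(target, library[j]))
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path backwards from the best final pulse.
    j = min(range(n), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path
```

With `join_weight=0.0` the search reduces to picking the closest pulse independently for each pitch period; larger values favour staying with similar pulses, which is one plausible way to trade target accuracy against smoothness.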

The samples below, from our initial experiments, use pulse libraries built from 100 utterances each.

For our Finnish voice, the differences are quite small. Pay attention to the quality of /h/ phones and nasals, as well as utterance-final creaky voice.

One pulse | Pulse library

For English, the transitions between unvoiced and voiced sounds are improved, as is the naturalness of the voice quality, at the expense of some overall smoothness.

One pulse | Pulse library

Blizzard Challenge 2010
 This year, our group participated in the annual speech synthesis evaluation, the Blizzard Challenge. In this international event, participants are tasked with building various synthetic voices within a limited time frame, from databases provided by the organizers. The target languages were English and Mandarin Chinese.

In one task, participants were asked to build a voice suitable for listening in the presence of noise. With modifications to various aspects of the voice source, our system placed first, being more intelligible than even the human speakers. In tight competition, our system ranked among the average systems on most other tasks. Due to time constraints, we had to submit somewhat rough versions of the voices. Below are some samples of our entry.

English (EH1) | EH1 modified

Mandarin (MH1)
Glott-HMM in male speech
 The following samples (from late 2008) were part of the material used to compare our method with two widely used HMM-synthesis techniques: basic HTS with 25 mel-cepstrum coefficients and simple excitation, and STRAIGHT, a high-quality vocoder used in state-of-the-art HMM-based speech synthesis systems. The training data consisted of 600 sentences of Finnish male speech.
mcep+simple | STRAIGHT | Glott-HMM | Real speech

A clear majority of listeners preferred Glott-HMM over the other synthesis methods.

The prominence of the words was manually annotated, so the prosody of the synthesizers is closer to the original than in text-to-speech systems. Still, much remains to be done to achieve the fluency and clarity of real speech.

Female voices - work in progress
 Female speech is notoriously difficult to model in parametric synthesis. The breathiness, or softness, is difficult to reproduce convincingly. Our preliminary attempt at female voices was built from the SLT Arctic database, around 1000 sentences, using the CMU-Festival front-end. The comparison samples have been collected from web demos and are not directly comparable due to differences in voice building and post-filtering methods. However, they nicely illustrate the difficulties of modelling female speech and the progress made in HMM-based speech synthesis in recent years.

Original speaker:

method | year | source
simple excitation - MCEP | 2003 | Festvox
mixed excitation - MGCEP | 2007? | DFKI - Mary TTS
mixed excitation - STRAIGHT | 2006? | CSTR
Glott-HMM | 2009 |
Questions
 If you have any questions or comments concerning the demo samples or our research on HMM-synthesis, please send them to
antti.suni (at) helsinki (dot) fi or tuomo.raitio (at) hut (dot) fi