This presentation describes a new model for Russian TTS that differs, both conceptually and technically, from the previously developed rule-based Russian concatenative TTS. Rather than predicting intonation contours, it selects natural contours from recorded speech examples, so the prosodic information is extracted from natural speech.
Appropriate prosody requires models that convey the linguistic functions of intonation: sentence-delimiting, sentence-forming, distinctive, and expressive (attitudinal). Most theories of intonation distinguish between falling and rising tones.
Our labeling scheme consists of:
- 14 basic contour types, with up to 3 subtypes for each of them;
- 6 break levels.
Contours include those that end with a final rising tone (used in non-final units and questions) and those that end with a fall (used in declaratives, wh-questions, imperatives and most exclamations). We have also introduced a level tone for contours.
The inventory of basic falling contour types includes:
- 3 patterns for finality;
- 1 pattern for giving emphasis;
- 1 pattern for imperatives;
- 1 pattern for wh-questions;
- 1 pattern for exclamations.
The inventory of basic rising contour types includes:
- 4 patterns for non-finality (low (falling-)rising, rising-falling, high-rising, level);
- 2 patterns for yes-no questions (rising-falling, low (falling-)rising);
- 1 pattern for exclamations.
The inventory of level contour types includes:
- 1 pattern for non-finality;
- 1 pattern for parentheses.
Patterns for finality, for example, are differentiated by the degree of completeness or finality they express. Thus, the end of a paragraph is marked by a deeper falling tone than the one used at the end of a sentence within the text, or the one that ends a phrase. The proposed set of contour types for non-final units makes it possible to generate a variety of intonation contours for syntactically or grammatically incomplete units (dependent phrases and clauses that lead on to something else in the sentence). At the same time, these contour types are stylistically different, which allows a certain freedom of choice in selecting the appropriate contour type. Moreover, it makes it possible to capture individual preferences in different contexts: conversational speech or reading.
The recorded speech corpus was manually transcribed by trained phoneticians. Intonation transcription involves marking a melodic pattern (from a pre-set inventory) and a break type for each intonation unit.
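The pre-set inventory described above could be held as a small lookup table, so that a selection step can ask which contour families serve a given function. The following is only a minimal sketch under stated assumptions: the names `ContourPattern`, `INVENTORY`, `candidates` and `patterns_per_direction` are hypothetical and not part of the described system, and the counts are copied as listed in the text. Since a level pattern for non-finality is named in both the rising and the level lists, the rows may overlap and should not be summed to obtain the number of distinct basic types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContourPattern:
    """One row of the inventory (hypothetical structure, for illustration)."""
    direction: str  # "falling", "rising" or "level"
    function: str   # linguistic function served by the patterns
    count: int      # number of basic patterns listed in the text


# Counts copied from the inventory lists in the text.
INVENTORY = (
    ContourPattern("falling", "finality", 3),
    ContourPattern("falling", "emphasis", 1),
    ContourPattern("falling", "imperative", 1),
    ContourPattern("falling", "wh-question", 1),
    ContourPattern("falling", "exclamation", 1),
    ContourPattern("rising", "non-finality", 4),
    ContourPattern("rising", "yes-no question", 2),
    ContourPattern("rising", "exclamation", 1),
    ContourPattern("level", "non-finality", 1),
    ContourPattern("level", "parenthesis", 1),
)


def candidates(function):
    """Return every inventory row that can realize the given function."""
    return [row for row in INVENTORY if row.function == function]


def patterns_per_direction():
    """Sum the listed pattern counts for each contour direction."""
    totals = {}
    for row in INVENTORY:
        totals[row.direction] = totals.get(row.direction, 0) + row.count
    return totals
```

For instance, `candidates("non-finality")` returns the rising and the level rows, which reflects the freedom of choice among stylistically different contours for non-final units.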
In our system, the 6 break options are (starting with the longest pause):
/p6/ - the end of a paragraph;
/p5/ - the end of a sentence;
/p4/ - the end of a clause (in a compound sentence);
/p3/ - the end of a phrase (an NP, for example);
/p2/ - the end of a very short phrase (1 word);
/p1/ - a break within an intonation unit; it is NOT an intonation unit boundary.
If there is no physical pause between intonation units, which is often the case in both read and spontaneous speech, the prosodic unit boundary is marked by a slash "/" (a "zero" pause). Assigning a break level from punctuation marks alone is too simplistic and does not always reflect the real situation. In this respect, syntactic-semantic information is more reliable. At the same time, break options can be tied to a particular contour: for example, in our system, falling contours that realize emphatic or logical stress, in either a grammatically incomplete or a complete unit, require a longer pause than a similar unit with no logical or emphatic stress.
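The break labels lend themselves to a simple ordered type. The sketch below is illustrative only: the names `BreakLevel` and `parse_break` are hypothetical, and placing the zero pause at value 0, below /p1/, is an assumption made so that larger values correspond to longer pauses.

```python
import re
from enum import IntEnum


class BreakLevel(IntEnum):
    """Break labels from the text; larger values mean longer pauses."""
    ZERO = 0  # "/"   : unit boundary with no physical pause (assumed rank)
    P1 = 1    # /p1/ : break within an intonation unit, not a boundary
    P2 = 2    # /p2/ : end of a very short phrase (1 word)
    P3 = 3    # /p3/ : end of a phrase
    P4 = 4    # /p4/ : end of a clause
    P5 = 5    # /p5/ : end of a sentence
    P6 = 6    # /p6/ : end of a paragraph

    @property
    def is_unit_boundary(self):
        """Every label except /p1/ closes an intonation unit."""
        return self is not BreakLevel.P1


def parse_break(label):
    """Convert a transcription label such as '/p4/' or '/' to a BreakLevel."""
    label = label.strip()
    if label == "/":
        return BreakLevel.ZERO
    match = re.fullmatch(r"/p([1-6])/", label)
    if match is None:
        raise ValueError(f"unknown break label: {label!r}")
    return BreakLevel(int(match.group(1)))
```

Keeping the labels ordered makes a contour-dependent rule, such as requiring a longer pause after an emphatic falling contour, expressible as a plain comparison between two `BreakLevel` values.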
The presentation covers the basic principles for selecting intonation models, which will be described and justified from both the phonetic (acoustic) and the functional points of view. Statistical data on the frequency of the intonation models and their acoustic parameters (F0) will be presented.