Last updated: 11.10.2001
A few general principles...
I am a phonetician interested in Finnish spontaneous speech. For my work, I need to collect large amounts of conversational speech data of high acoustic quality, and for the most part, the material has to be segmented and transcribed manually. So far, I have done a lot of the labeling work myself. Labeling is also quite error-prone, and it is easy to "forget" or change your own segmentation principles. People tend to disagree on segment boundary placement and on the phonetic labels for segments, and yet they are not very consistent themselves either. This page documents the principles that I have been using (or those that I think I have been using). Primarily, it is a notebook for myself, but I hope it will also help others to learn the exhausting secrets of labeling work... Many of the principles I state below have been either inspired by or jointly agreed upon with my friend Nina Alarotu, who undoubtedly is still the number one speech labeler in Finland today, considering the number of hours she spent labeling the Finnish speech database! The symbol lists are based on the Worldbet references that are available on the web, and I have made only slight modifications in order to have a useful and easy-to-remember symbol set for my purposes.
This page is constantly under construction: I will elaborate on the different aspects and problems of transcription as I go along. Please feel free to comment!
Mietta Lennes
mietta.lennes@helsinki.fi
I use Praat for the segmentation and labeling of my sound files.
Praat can handle very long sound files, up to 2 gigabytes in size. Such sounds can be opened as LongSound objects: the sound file is not fully loaded into memory, but it can still be viewed and accessed together with a corresponding TextGrid object. TextGrids are plain text files, so you can use long conversational speech files from CD-ROMs or other read-only media and edit and save only the small TextGrid file each time you label the speech file. You can easily share TextGrids with others, and each person can modify her own copy of the TextGrid, as long as the sound file itself is not changed.
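Because a TextGrid is plain text, its tiers can also be read outside Praat, for example by an analysis script. The following is a minimal Python sketch, not Praat's own parser. It assumes the TextGrid was saved in Praat's default ("long") text format and has already been decoded to a string, since files may be saved as UTF-8 or UTF-16. It reads only IntervalTiers and skips point tiers. The function name `read_interval_tiers` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# One interval: (start time in s, end time in s, label)
Interval = Tuple[float, float, str]

# A tier header in the long text format: class = "..." followed by name = "..."
_TIER_RE = re.compile(r'class = "(\w+)"\s*name = "((?:[^"]|"")*)"')
# One interval block; inside text, Praat writes a literal quote as ""
_INTERVAL_RE = re.compile(
    r'xmin = ([-\d.eE+]+)\s*xmax = ([-\d.eE+]+)\s*text = "((?:[^"]|"")*)"'
)


def read_interval_tiers(textgrid: str) -> Dict[str, List[Interval]]:
    """Return {tier name: [(xmin, xmax, label), ...]} for every IntervalTier.

    Simplified sketch: assumes the long text format, skips other tier types.
    """
    tiers: Dict[str, List[Interval]] = {}
    headers = list(_TIER_RE.finditer(textgrid))
    for i, header in enumerate(headers):
        tier_class = header.group(1)
        name = header.group(2).replace('""', '"')
        if tier_class != "IntervalTier":
            continue
        # A tier's body runs from its header to the next header (or the end).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(textgrid)
        body = textgrid[header.end():end]
        # The tier's own xmin/xmax are not followed by "text", so they are
        # not mistaken for an interval.
        tiers[name] = [
            (float(a), float(b), text.replace('""', '"'))
            for a, b, text in _INTERVAL_RE.findall(body)
        ]
    return tiers
```

Regular expressions keep the sketch short. A robust reader would also handle the short text format and Praat's binary format.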
In Praat, you can use many IntervalTiers for different levels of transcription. I have tried to make full use of this possibility, since many different levels allow many kinds of search criteria later on. Here, I discuss the levels that I think are useful. I also point out some segmentation and transcription problems and give my solutions to them (for the time being; I may change my mind...).
Phone level: My transcription method is rather phonetic: I try to find the closest possible Worldbet symbol for each speech sound I find in the signal, adding all necessary diacritics. I determine the phone boundaries and labels on an acoustic-phonetic basis: partly by inspecting spectrograms, intensity curves, etc., and partly by listening to shorter and longer stretches of speech. A phonetic transcription is supposed to describe what is actually produced, so I refer as little as possible to phonemes or to "what there should be".
Phoneme level: I do not use this level, because it has turned out to be too difficult and frustrating to find the speech segments that correspond to a Finnish phoneme string. The phonemic structure of a word in casual Finnish speech cannot always be defined in a straightforward way, since the corresponding written word forms may not exist. Moreover, we do have an orthographic transcription of the utterances, and it is quite simple to convert the orthographic symbol strings into phoneme-like strings, since Finnish orthography is said to follow the phonemic structure rather closely. However, I am working on a Praat script that would automatically convert my narrow phonetic segmentation and labeling into a "quasi-phonemic" segmentation and labeling for situations where such a representation might be needed.
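The orthography-to-phoneme conversion can be illustrated with a short Python sketch. It works from the spelling and is not the Praat script mentioned above, which works from the phonetic segmentation. The letter-to-symbol mapping is a hypothetical assumption:

- ⟨a⟩ and ⟨ä⟩ map to Worldbet A and @, as in the vowel table on this page.
- ⟨ö⟩ is assumed to map to 7, by analogy with ⟨e⟩ mapping to e.
- ⟨ng⟩ is treated as a long velar nasal, as in native words.
- ⟨nk⟩ is treated as a velar nasal followed by k.
- All other letters are passed through unchanged, and doubled letters simply yield two symbols.

```python
from typing import List

# Hypothetical mapping: a/ä follow the vowel table on this page; ö -> 7 is an
# assumption; every other letter is assumed to be its own Worldbet symbol.
LETTER_TO_SYMBOL = {"a": "A", "ä": "@", "ö": "7"}


def quasi_phonemic(word: str) -> List[str]:
    """Convert one orthographic Finnish word into a list of symbols."""
    word = word.lower()
    symbols: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair == "ng":        # <ng>: long velar nasal (native words)
            symbols += ["N", "N"]
            i += 2
            continue
        if pair == "nk":        # <n> before <k> is velar
            symbols += ["N", "k"]
            i += 2
            continue
        letter = word[i]
        if letter.isalpha():    # skip hyphens, apostrophes, etc.
            symbols.append(LETTER_TO_SYMBOL.get(letter, letter))
        i += 1
    return symbols
```

For example, `" ".join(quasi_phonemic("kenkä"))` gives `k e N k @`. Loanwords such as "tango" would need exceptions.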
Syllable level: I have attempted to mark all syllable boundaries. This should be done after segmenting and labeling the phone level, perhaps at the same time as marking word boundaries. The problems with this level are similar to those with word boundaries: in Finnish spontaneous speech, the definition of a syllable and its boundaries often relies entirely on the intuition of a native speaker. Syllable boundaries in the middle of geminates are marked in the same way as geminates at word boundaries.
A boundary on the syllable level does not have to coincide with a boundary on the phone level, but word boundaries must coincide with syllable boundaries.
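Boundary-alignment rules can be checked automatically. Examples are "every word boundary is also a syllable boundary" and "utterance boundaries coincide with intonation phrase boundaries". The following is a minimal Python sketch. It assumes each tier is a list of (start, end, label) tuples exported from a TextGrid. It also assumes a tolerance of 1 ms, because boundary times are stored as floating-point numbers. Both the tolerance and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]


def boundaries(tier: List[Interval]) -> List[float]:
    """All distinct boundary times of an interval tier, sorted."""
    return sorted({t for start, end, _ in tier for t in (start, end)})


def unmatched_boundaries(
    upper: List[Interval], lower: List[Interval], tol: float = 0.001
) -> List[float]:
    """Boundaries on `upper` (e.g. words) that have no boundary on `lower`
    (e.g. syllables) within `tol` seconds. An empty list means all is well."""
    lower_times = boundaries(lower)
    return [
        t for t in boundaries(upper)
        if not any(abs(t - s) <= tol for s in lower_times)
    ]
```

The check runs in one direction only. Syllable boundaries inside a word are allowed, so only word boundaries are tested against the syllable tier.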
Words at the end of an utterance: If there is, e.g., an [h]-like sound at the end of the last word, I take it as belonging to the utterance, but not to the word. However, where this [h]-sound is clearly coloured by the final vowel of the utterance-final word, it belongs to the word as well, and should be phonetically labeled as the voiceless counterpart of the vowel (e.g., "A_0").
Accent level: The boundaries on this level should be identical to the word-level boundaries. I have marked with an 'x' all words that to me are clearly prominent (bear a sentence accent). As with any perceptual prominence judgment, my decisions are not the ultimate truth, but I consider it useful to have even a preliminary and subjective indication of accentuation. At this point, no distinction is made between different sentence accent types etc. An utterance may contain any number of accented words, including none.
Thus, (sentence) accent is currently marked roughly as a property of a whole word, and no indication of the more specific domain of accent is given. In Finnish, there is no lexical stress (lexical items are generally not distinguished by accent placement), but when a word receives a (sentence) accent, it is usually the first syllable that is perceived as most prominent. So, although the real situation is more complicated than that, we may assume that the first syllable of an accented word is the prominent one.
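Under that simplifying assumption, the prominent syllables can be located from the accent, word and syllable tiers. The following is a minimal Python sketch, with these assumptions:

- All tiers are lists of (start, end, label) tuples.
- The accent tier has exactly the same intervals as the word tier, as required above, so the two can be paired by position.
- A word's first syllable starts at the word's start time, within a hypothetical 1 ms tolerance.

The function names are illustrative only.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, str]


def first_syllable(word: Interval, syllables: List[Interval],
                   tol: float = 0.001) -> Optional[Interval]:
    """Return the syllable that begins where the word begins, if any."""
    for syl in syllables:
        if abs(syl[0] - word[0]) <= tol:
            return syl
    return None


def prominent_syllables(accents: List[Interval], words: List[Interval],
                        syllables: List[Interval]) -> List[Interval]:
    """First syllable of every word whose accent interval is marked 'x'."""
    result: List[Interval] = []
    # Pairing by position relies on identical accent and word boundaries.
    for accent, word in zip(accents, words):
        if accent[2].strip() == "x":
            syl = first_syllable(word, syllables)
            if syl is not None:
                result.append(syl)
    return result
```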
Intonation Phrase level (IP): This level is not very well developed. I have just been trying to cut utterances into smaller phrases that would be somewhat coherent prosodically. No precise definitions so far.
Utterance level: An utterance is a stretch of speech by a single speaker between two pauses. A pause is any period of silence where the speaker is not articulating a speech sound. The labels of utterance level intervals are the orthographic transcriptions of the utterances. Boundaries must coincide with intonation phrase boundaries.
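With this definition, candidate utterance boundaries for a single speaker can be proposed automatically from the phone tier. The following is a minimal Python sketch. It assumes, as a hypothetical convention, that pauses appear on the phone tier as intervals with an empty label and that every other interval is speech. The resulting spans would still need an orthographic label and a manual check.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, str]


def utterance_spans(phones: List[Interval]) -> List[Tuple[float, float]]:
    """Merge consecutive non-pause phone intervals into (start, end) spans.

    Assumption: a pause is a phone-tier interval whose label is empty.
    """
    spans: List[Tuple[float, float]] = []
    current: Optional[Tuple[float, float]] = None  # utterance being built
    for start, end, label in phones:
        if label.strip() == "":          # pause: close any open utterance
            if current is not None:
                spans.append(current)
                current = None
        elif current is None:            # first phone after a pause
            current = (start, end)
        else:                            # extend the open utterance
            current = (current[0], end)
    if current is not None:              # speech running to the end of tier
        spans.append(current)
    return spans
```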
Nonspeech level: This level describes long-term "non-speech" properties that may overlap with speech sound production: breathing (in or out; ingressive speech can occur in short utterances in Finnish), laughter, or external sounds, like different background noises. Phenomena that cannot overlap with speech articulation can be marked on the utterance level or the phone level (the Worldbet symbols for these features always begin with a dot '.', so they can be distinguished from phones).
Actually, I have not started to use this level yet, but I plan to...
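Because the Worldbet non-speech symbols always start with a dot, a script can separate them from phone labels with a simple prefix test. The following is a minimal Python sketch; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_nonspeech(label: str) -> bool:
    """Worldbet non-speech symbols (.br, .laugh, .fp, ...) begin with '.'."""
    return label.startswith(".")


def split_labels(labels: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split labels into (phones, non-speech events); empty labels are skipped."""
    phones: List[str] = []
    events: List[str] = []
    for label in labels:
        label = label.strip()
        if not label:
            continue
        (events if is_nonspeech(label) else phones).append(label)
    return phones, events
```

For the made-up label sequence `["k", "y", ".bri", "l", "@", ".laugh"]`, the phones are `k y l @` and the events are `.bri .laugh`.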
Topic level: A rather loosely defined tier, where I mark any big changes in the discourse topic, e.g., "Grandma's new house", "New job", etc. I mark the topics in English, to make the TextGrid file more legible to an international audience.
"Ready?" level: I label with 'ok' all the stretches of the sound file which have been segmented, labeled and checked to the full. This way, when running analysis scripts, I can only select good and checked data.
For the phonetic transcription of speech, I use Worldbet, which is an ASCII version of the International Phonetic Alphabet (IPA). There are a few basic Worldbet documentation sources on the web (please let me know if you have spotted better ones):
James L. Hieronymus, ASCII Phonetic Symbols for the World's Languages: Worldbet. Technical report, Bell Labs, 1993. (PostScript document, 294 K. The download site of the above file is the Center for Spoken Language Understanding at OGI.)
This is the basic paper on Worldbet! Unfortunately, the printing quality
is not very high.
Chen Tao and Guo Shuang: Adaptation of IPA to World Languages
IPA, Worldbet and OGIbet (by Ed Kaiser)
However, these sources are not quite consistent with regard to symbol definitions, and I have had to make a few slight modifications and additions. Below, I will try to explain each Worldbet symbol and convention that I use for Finnish. I will also refer to the corresponding IPA symbols.
In Worldbet, the base symbol is one ASCII character or sometimes a combination of two characters (see Worldbet symbols). Additional features or feature changes are indicated with diacritics, in the same manner as in IPA. Diacritics are separated from the base symbol with an underscore '_'. I have not hesitated to use diacritics whenever necessary. When I need to give several diacritics for a sound, I add an underscore in front of each diacritic. This way, a script or a program can easily find a given diacritic even if the diacritics have been written in a different order.
It seems to me that a given phonetic impression (e.g., a centralized vowel quality, or the voiced-voiceless distinction in consonants) can often be indicated with two or even more different symbol combinations (a base symbol, or a base symbol plus diacritics). I often prefer to use a common and simple base symbol and to add more diacritics. Since we are not making specific claims about phonemic structure, and thus not "deriving" the phones from phonemes, it does not matter which symbol is used as the base, as long as the base symbol and the diacritics combined imply the same set of perceived phonetic features. (Moreover, different transcribers will use slightly different symbols anyway!)
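With one underscore before every diacritic, a script can split a label into its base symbol and a set of diacritics. Two labels whose diacritics were typed in a different order then compare as equal. The following is a minimal Python sketch. It assumes that base symbols never contain an underscore themselves, which holds for the symbols listed on this page. The function names are hypothetical.

```python
from typing import FrozenSet, Tuple


def parse_label(label: str) -> Tuple[str, FrozenSet[str]]:
    """Split e.g. 'A_0' into ('A', {'0'}); each diacritic follows its own '_'."""
    base, *diacritics = label.strip().split("_")
    return base, frozenset(d for d in diacritics if d)


def same_label(a: str, b: str) -> bool:
    """True if both labels have the same base symbol and the same diacritics,
    regardless of the order in which the diacritics were written."""
    return parse_label(a) == parse_label(b)


def has_diacritic(label: str, diacritic: str) -> bool:
    """E.g. has_diacritic('A_0', '0') is True (a voiceless A)."""
    return diacritic in parse_label(label)[1]
```

The sketch does not unify alternative spellings that the tables list side by side, such as t[ versus t_[. A small lookup table mapping each alternative to one preferred form could be applied before comparison.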
The symbols I have more or less systematically used for Finnish are marked
in bold.
(Unfortunately, the corresponding IPA symbols are still missing. I'm working
on it...)
| IPA | Worldbet | Description |
|  | i | front high unrounded |
|  | y | front high rounded |
|  | I | front high unrounded centralized |
|  | Y | front high rounded centralized |
|  | e | front mid-high unrounded |
|  | 7 | front mid-high rounded |
|  | E | front mid-low unrounded |
|  | 8 | front mid-low rounded |
|  | @ | front low unrounded, between mid-low and low (the Finnish /ä/ as reference) |
|  | a | front low unrounded |
|  | 6 | front low rounded |
|  | ix | (the Russian /i/ as reference) |
|  | ux | (the Finland-Swedish /u/ as reference) |
|  | & | schwa = central vowel |
|  | ox, &_w | rounded schwa = rounded central vowel |
|  | 3 | central mid-low unrounded |
|  | ax | central low/mid-low unrounded |
|  | 4 | back high unrounded |
|  | u | back high rounded |
|  | U, u_x | back high rounded centralized |
|  | 2 | back mid-high unrounded |
|  | o | back mid-high rounded |
|  |  | back mid-low unrounded |
|  | > | back mid-low rounded |
|  | A | back low unrounded (the Finnish /a/ as reference) |
|  | 5 | back low rounded |
Stops
Nasals
Fricatives
Affricates
Approximants
Laterals
Flaps/ taps
Trills
Ejectives
Implosives
Clicks
| IPA | Worldbet | Description |
|  | p | voiceless bilabial stop |
|  | t[, t_[ | voiceless dental stop |
|  | t | voiceless (apico)alveolar stop |
|  | tl | voiceless lateral alveolar stop |
|  | tr | voiceless retroflex stop |
|  | tn, t_n | voiceless nasalized stop |
|  | c | voiceless palatal stop |
|  | cp | voiceless labial palatal stop |
|  | k | voiceless velar stop |
|  | kp | voiceless labial velar stop |
|  | q | voiceless uvular stop |
|  | ? | glottal stop |
|  | b | voiced bilabial stop |
|  | d[, d_[ | voiced dental stop |
|  | d | voiced (apico)alveolar stop |
|  | dl | voiced lateral alveolar stop |
|  | dr | voiced retroflex stop |
|  | dn, d_n | voiced nasalized stop |
|  | J | voiced palatal stop |
|  | Jb | voiced labial palatal stop |
|  | g | voiced velar stop |
|  | gb | voiced labial velar stop |
|  | Q | voiced uvular stop |
|  | ph | voiceless bilabial aspirated stop |
|  | t[h | voiceless dental aspirated stop |
|  | th | voiceless (apico)alveolar aspirated stop |
|  | ch | voiceless palatal aspirated stop |
|  | kh | voiceless velar aspirated stop |
|  | qh | voiceless uvular aspirated stop |
|  | bh | voiced bilabial aspirated stop |
|  | d[h | voiced dental aspirated stop |
|  | dh | voiced (apico)alveolar aspirated stop |
|  | Jh | voiced palatal aspirated stop |
|  | gh | voiced velar aspirated stop |
|  | Qh | voiced uvular aspirated stop |
|  | pH | voiceless bilabial hyperaspirated stop |
|  | tH | voiceless (apico)alveolar hyperaspirated stop |
|  | tR | voiceless retroflex hyperaspirated stop |
|  | tN | voiceless nasalized hyperaspirated stop |
|  | kH | voiceless velar hyperaspirated stop |
|  | bH | voiced bilabial hyperaspirated stop |
|  | dH | voiced (apico)alveolar hyperaspirated stop |
|  | dR | voiced retroflex hyperaspirated stop |
|  | dN | voiced (nasalized) hyperaspirated stop |
|  | gH | voiced velar hyperaspirated stop |
| IPA | Worldbet | Description |
|  | m | bilabial nasal |
|  | M | labiodental nasal |
|  | n[ | dental nasal |
|  | n | (apico)alveolar nasal |
|  | nl | lateral alveolar nasal |
|  | nr | retroflex nasal |
|  | nj | palatal nasal |
|  | N | velar nasal |
|  | nm | labial velar nasal |
|  | Nq | uvular nasal |
| IPA | Worldbet | Description |
|  | F | voiceless bilabial fricative |
|  | f | voiceless labiodental fricative |
|  | T | voiceless dental fricative |
|  | s | voiceless (apico)alveolar fricative |
|  | s{ | voiceless laminoalveolar fricative |
|  | hl | voiceless lateral alveolar fricative |
|  | sr | voiceless retroflex fricative |
|  | S | voiceless postalveolar fricative |
|  | S{ | voiceless laminopalatoalveolar fricative |
|  | C | voiceless palatal fricative |
|  | x | voiceless velar fricative |
|  | W | voiceless labial velar fricative |
|  | X | voiceless uvular fricative |
|  | HH | voiceless pharyngeal fricative |
|  | h | voiceless glottal fricative |
|  | H | voiceless epiglottal fricative |
...under construction...sorry...
Under construction...
| IPA | Worldbet | Description |
|  | _0 | voiceless |
|  | _v | voiced |
|  | _h | aspirated |
...under construction...sorry...
| Worldbet | Description |
| .fp | filled pause |
| .tcl, .tc | tongue click |
| .ls | lip smack |
| .br | breath noise (.bri = breath in, .bro = breath out) |
| .glot | glottalization |
| .vs | squeak, voice crack |
| .laugh | laugh |
| .ct | clear throat |
| .cough | cough |
| .sniff | sniff |
| .sneeze | sneeze |
| .yawn | yawn |
| .burp | burp |
| .uu | unintelligible speech |
| .ns | human, not speech |
| .bn | background noise |
| .ln | line noise |