(Hands-on) Course material - Mietta Lennes
All the lectures will include some examples and demonstrations and a chance to try things out on your own computer!
Lecture 1: Annotation and labeling
Try the script save_conversation_tiers_as_text_file.praat on the TextGrid file F1_F2_excerpt.TextGrid. (See instructions)
Lecture 2: How to speed up your annotation project using Praat scripts
- Insert potential utterance/pause boundaries automatically (by detecting silent intervals according to intensity): mark_pauses.praat
- Make sure you have a text file called labels.txt in the same directory as the next two scripts (you can make one up if you don't have one). The lines of this text file will be inserted as labels into the intervals of the (topmost) tier in the TextGrid.
- Insert labels for written sentences (in the case of read-aloud text): label_sentences_from_text_file.praat
- Insert labels for the actual utterances (make a copy of the written sentence tier and rename it "utterance"): label_utterances_from_text_file.praat
- Add word and syllable tiers according to the transcript in the utterance tier (the resulting syllables will only make sense in Finnish...): add_syllable_and_word_tiers.praat
- Add an initial phone tier (works for Finnish): generate_phone_tier_from_words_and_syllables.praat
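The silence detection behind the first step in the list above can be sketched as follows. This is a minimal Python illustration, not the code of mark_pauses.praat: it assumes the intensity contour is already available as a list of dB values sampled at a fixed time step, and the function name find_silent_intervals, the threshold and the minimum pause duration are all hypothetical choices.

```python
def find_silent_intervals(intensity_db, time_step, threshold_db, min_pause):
    """Return (start, end) times of runs where the intensity stays below
    threshold_db for at least min_pause seconds.

    All parameter names and values are illustrative assumptions.
    """
    pauses = []
    run_start = None  # index of the first frame of the current silent run
    for i, value in enumerate(intensity_db):
        if value < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The silent run ended at frame i; keep it only if long enough.
            if (i - run_start) * time_step >= min_pause:
                pauses.append((run_start * time_step, i * time_step))
            run_start = None
    # Handle a silent run that lasts until the end of the contour.
    if run_start is not None:
        n = len(intensity_db)
        if (n - run_start) * time_step >= min_pause:
            pauses.append((run_start * time_step, n * time_step))
    return pauses


if __name__ == "__main__":
    contour = [60, 60, 30, 30, 30, 60, 30, 60]  # made-up dB values
    print(find_silent_intervals(contour, 0.25, 45, 0.5))
```

The returned time pairs are only candidates for utterance/pause boundaries; they would still need to be checked by hand, as with any automatic boundary placement.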
Some instructions for using the scripts mentioned above
Lecture 3: Descriptive systems (defining the annotation units; principles and pitfalls)
Get the annotation status of your corpus: annotation_status.praat
Instructions: Save the script, put your sound and TextGrid files into a subdirectory called corpus/, and run the script. It produces a text file called annotation_status.txt, which should give you a summary of the annotation tiers and the number of labeled intervals or points in each of them.
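To illustrate the kind of summary described above, here is a minimal Python sketch. It does not reproduce annotation_status.praat or read TextGrid files: it assumes each tier has already been loaded as a list of label strings, and the function name annotation_status and the tier names in the example are hypothetical.

```python
def annotation_status(tiers):
    """Count labeled (non-empty) items per tier.

    tiers: dict mapping a tier name to a list of label strings
           (an assumed in-memory representation, not the TextGrid format).
    Returns a dict mapping tier name to (labeled_count, total_count).
    """
    summary = {}
    for name, labels in tiers.items():
        labeled = sum(1 for label in labels if label.strip())
        summary[name] = (labeled, len(labels))
    return summary


if __name__ == "__main__":
    example = {
        "utterance": ["hello there", "", "how are you"],
        "word": ["hello", "there", "", "", ""],
    }
    for tier, (labeled, total) in annotation_status(example).items():
        print(f"{tier}: {labeled} of {total} labeled")
```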
A script for marking the prominence of syllables in a sound file: mark_prominence.praat
You need a sound file that has been annotated with utterance, word and syllable tiers. The script inserts a point tier with a point at the midpoint of each syllable. The user then sees and hears one utterance at a time and judges which syllables are prominent (you could use, e.g., an ordinal scale: 0 or empty = not prominent, 1, 2 = most prominent). You can continue working on the same sound file over several sessions.
NB: This is not a real experimental setup.
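The point placement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of mark_prominence.praat: syllable intervals are assumed to be (start, end, label) tuples, empty intervals are assumed to be skipped, and the function name syllable_midpoints is hypothetical.

```python
def syllable_midpoints(syllables):
    """Return the midpoint time of every labeled syllable interval.

    syllables: list of (start, end, label) tuples in seconds.
    Intervals with an empty label are skipped (an assumption made here).
    """
    return [
        (start + end) / 2
        for start, end, label in syllables
        if label.strip()
    ]


if __name__ == "__main__":
    tier = [(0.0, 0.5, "ta"), (0.5, 1.0, ""), (1.0, 1.5, "lo")]
    print(syllable_midpoints(tier))
```

Each returned time would then receive one point, whose label holds the prominence judgement for that syllable.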
Lecture 4: Exploiting the annotation (analysing your speech corpus with
- Sound files for the "pre-annotated" example corpora from the site of the Handbook of the IPA
Please download the Hindi package, unzip it, and copy the sound files in the subdirectory Narratives to a suitable place on your computer.
- (Exercises part 1: Getting organized, creating orthographic annotation, a script for drawing pitch curves for utterances)
- (Exercises part 2: Using two annotation tiers: scripts for calculating segmental durations and formants)
- The Thursday version of the vowel analysis script that was written by you with Paul and continued with Mietta
- Search (and analyse) your corpus: search_corpus.praat
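The duration calculation mentioned in Exercises part 2 can be sketched as follows. This is a minimal Python illustration rather than the course scripts themselves: it assumes the segment tier is available as (start, end, label) tuples, and the function name mean_durations is hypothetical. Formant measurement needs the audio signal and is not covered here.

```python
from collections import defaultdict


def mean_durations(intervals):
    """Return the mean duration in seconds for each segment label.

    intervals: list of (start, end, label) tuples; unlabeled intervals
    are ignored (an assumption made for this sketch).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start, end, label in intervals:
        if label.strip():
            totals[label] += end - start
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


if __name__ == "__main__":
    segments = [(0.0, 0.25, "a"), (0.25, 0.5, "k"), (0.5, 1.0, "a"), (1.0, 1.5, "")]
    print(mean_durations(segments))
```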
Exercises and downloads for the course
Other tools for you to try out (these will not be further discussed during the lectures):
a multi-platform annotation tool that supports the Annotation Graph formalism (by Steven Bird and Mark Liberman), redeveloped by Bertin Technologies
a multi-platform annotation tool that supports hierarchical annotation layers; part of the LAT system by MPI
Links and additional information
Metadata for speech corpora
If you wish, you can take a look at the webpage for the IPR and Metadata Workshop that is held in Helsinki this week. The page contains an e-form for collecting a set of the most important metadata elements for speech corpora (find the link to "metadata for audio corpora"). However, please do not press the Submit button in the e-form, because it might confuse the workshop organizers ;-)
References mentioned during the lectures
- S. Bird and M. Liberman, A formal framework for linguistic annotation. Speech Communication, vol. 33, no. 1-2, pp. 23-60, 2001.
- Handbook of the IPA
- R. J. J. H. van Son, D. Binnenpoorte, H. van den Heuvel, and L. C. W. Pols, The IFA corpus: a phonemically segmented Dutch "open source" speech database. In Proceedings of Eurospeech 2001, Aalborg, Denmark (P. Dalsgaard, B. Lindberg, H. Benner, and Z. Tan, eds.), 2001.