Transcription & annotation

Transcription conventions

Transcription conventions are particularly important in a project like VOICE. They need to reconcile three main requirements:

  1. they need to capture the reality of spoken interactions as precisely as possible,
  2. they need to be replicable, i.e. the scheme must be usable without further explanation by other researchers,
  3. they need to make sure that the resulting transcriptions are computer-readable.

The VOICE transcription conventions (version 2.1) are the result of the project team’s extensive experience in applying these criteria to a wide range of ELF data. They are of two kinds: mark-up and spelling. The VOICE mark-up conventions are specifically designed to reflect what seem to be the most significant features of ELF interactions. The VOICE spelling conventions are designed to render the diversity of ELF speech in a standardized way. The transcription conventions are made available primarily to help readers understand VOICE transcripts, but other (ELF) researchers are also invited to use them in their own research.

The recommended citation for the VOICE Transcription Conventions is:

VOICE Project. 2007. VOICE Transcription Conventions [2.1]. (date of last access).

Alternatively, you may refer to the mark-up and spelling conventions separately:

VOICE Project. 2007. "Mark-up conventions". VOICE Transcription Conventions [2.1]. (date of last access).

VOICE Project. 2007. "Spelling conventions". VOICE Transcription Conventions [2.1]. (date of last access).


The VOICE Mark-up Conventions provide for various forms of annotation in the corpus. Most of the annotation concerns features of spoken discourse. For instance, the mark-up provides for information on emphasis, overlaps, variations in pronunciation and coinages, as well as non-English speech. For full details of how these (and other) features are implemented, please refer to the Mark-up Conventions.
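
To illustrate why computer-readable mark-up matters, the following is a minimal Python sketch of how annotations of this kind might be pulled out of a transcribed utterance automatically. The sample utterance, the tag names (`1` for an overlap, `pvc` for a pronunciation variation or coinage, `L1de` for non-English speech), the use of capital letters for emphasis, and all function names are assumptions made for illustration only; they may not match the actual notation, for which the Mark-up Conventions remain the authoritative source.

```python
import re

# Hypothetical utterance. The tag names and the use of capitals for
# emphasis are illustrative assumptions, not a statement of the
# official VOICE notation.
SAMPLE = (
    "S1: i think <1>we should</1> <pvc>examinate</pvc> it "
    "and <L1de>genau</L1de> that is VERY important"
)

# Matches a simple, non-nested paired tag such as <pvc>...</pvc>.
TAG_PATTERN = re.compile(r"<(?P<tag>[A-Za-z0-9]+)>(?P<text>.*?)</(?P=tag)>")


def extract_annotations(utterance):
    """Return (tag, text) pairs for every paired tag in the utterance."""
    return [(m.group("tag"), m.group("text"))
            for m in TAG_PATTERN.finditer(utterance)]


def strip_markup(utterance):
    """Remove paired tags but keep the words they enclose."""
    return TAG_PATTERN.sub(lambda m: m.group("text"), utterance)


def emphasized_words(utterance):
    """Return words written entirely in capitals (assumed emphasis)."""
    plain = strip_markup(utterance)
    return [w for w in plain.split()
            if len(w) > 1 and w.isalpha() and w.isupper()]


if __name__ == "__main__":
    print(extract_annotations(SAMPLE))
    print(strip_markup(SAMPLE))
    print(emphasized_words(SAMPLE))
```

The sketch handles only flat, non-nested tags; real transcripts may nest or combine annotations, which would call for a proper parser rather than a single regular expression.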

In addition, the project team is currently considering extending the VOICE mark-up with part-of-speech (POS) tagging. (Ruth Osimk)