Basic structure

Project and sampling description

One of the main uses of English today is as a lingua franca, i.e. as a means of communication among speakers with different first-language backgrounds. Nevertheless, linguistic descriptions have so far focused almost entirely on English as it is spoken and written by its native speakers. The VOICE project seeks to redress the balance by providing the first general corpus capturing naturally occurring, non-scripted face-to-face interactions in English as a lingua franca (ELF).

The sampling unit for VOICE is the speech event. Speech events are included in their entirety as far as practicalities allowed. They were selected for inclusion in the corpus on the basis of a set of seven external, i.e. non-linguistic, criteria, which thereby define the target population. Accordingly, VOICE captures speech events that fulfill the following criteria:

  • English as a lingua franca
  • Spoken
  • Naturally occurring
  • Interactive
  • Face-to-face
  • Non-scripted
  • Self-selected participation (i.e. the speakers decided for themselves that they are capable of using ELF to accomplish specific participant roles in the speech event they are taking part in)
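
As an illustration only, the seven criteria can be read as a conjunctive filter over candidate speech events: an event belongs to the target population only if every criterion holds. The minimal Python sketch below assumes a hypothetical SpeechEvent record with one Boolean flag per criterion. The class, its field names, and the function are illustrative assumptions and are not part of the VOICE metadata or tooling.

```python
from dataclasses import dataclass


# Hypothetical record: the field names are illustrative assumptions,
# not the actual VOICE metadata scheme.
@dataclass(frozen=True)
class SpeechEvent:
    elf: bool                  # English used as a lingua franca
    spoken: bool
    naturally_occurring: bool
    interactive: bool
    face_to_face: bool
    non_scripted: bool
    self_selected: bool        # speakers chose their participant roles themselves


def in_target_population(event: SpeechEvent) -> bool:
    """Return True only if all seven external criteria hold."""
    return all((
        event.elf,
        event.spoken,
        event.naturally_occurring,
        event.interactive,
        event.face_to_face,
        event.non_scripted,
        event.self_selected,
    ))


# A candidate meeting every criterion, and one that fails a single criterion.
meeting = SpeechEvent(True, True, True, True, True, True, True)
read_aloud_talk = SpeechEvent(True, True, True, True, True, False, True)

eligible = [e for e in (meeting, read_aloud_talk) if in_target_population(e)]
```

Because the criteria are external (non-linguistic), such a check could be applied before any transcription takes place; here only the first candidate would pass.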

The ELF interactions recorded cover a range of different speech events in terms of domain (professional, educational, leisure), function (exchanging information, enacting social relationships), and participant roles and relationships (acquainted vs. unacquainted, symmetrical vs. asymmetrical).

Corpus statistics & corpus structure

Total numbers


Speech event types

For more detailed definitions used to delimit speech event types, please refer to


First languages of speakers

Given the nature of VOICE as a corpus of English as a lingua franca, speakers of 49 different first languages are represented in the corpus. VOICE 1.0 Online focuses mainly, though not exclusively, on European ELF speakers. Exact statistics on speakers' first languages are available at

Gender and age