The Diachronic Corpus of Present-Day Spoken English (DCPSE)

The Diachronic Corpus of Present-Day Spoken English (DCPSE) is a parsed corpus of spoken English. It contains more than 400,000 words from ICE-GB (collected in the early 1990s) and 400,000 words from the London-Lund Corpus (late 1960s–early 1980s). The texts were selected with the aim of balancing the two components by spoken 'genre', so that each contains, for example, a similar number of words of telephone conversation. The orthographic transcriptions have been normalised and annotated according to the same criteria. ICE-GB was used as a gold standard for the parsing of DCPSE, and the parsing has been corrected by a variety of methods to make the result as accurate as possible.

Project leader: Professor Bas Aarts
Time of compilation: 2002–2004
Size: c. 800,000 words
Language: English
Number of texts/samples: 280
Period: 1958–1992
Funding: Economic & Social Research Council
Project home page:


A 'Getting Started: Second Edition' guide (PDF) is available from the project website.


Principal Investigator: Professor Bas Aarts

Senior Research Fellow: Sean Wallis

Research Assistants: Dr Dirk Bury, Lesley Kirk, Yordanka Kostadinova-Kavalova, Dr Ann Law, Gabriel Ozón


Available on CD-ROM with ICECUP 3.1.

Technical information

Searchable with ICECUP 3.1.