The Diachronic Corpus of Present-Day Spoken English (DCPSE)
The Diachronic Corpus of Present-Day Spoken English is a parsed corpus of spoken English. It contains more than 400,000 words from ICE-GB (collected in the early 1990s) and 400,000 words from the London-Lund Corpus (late 1960s–early 1980s). The samples were selected to balance the two subcorpora by spoken 'genre', so that, for example, they contain similar numbers of words of telephone conversation. The orthographic transcriptions have been normalised and annotated according to the same criteria. ICE-GB was used as a gold standard for the parsing of DCPSE. The parsing has been corrected by a variety of methods to achieve the highest possible quality.
Project leader: Professor Bas Aarts
Time of compilation: 2002–2004
Size: c. 800,000 words
Number of texts/samples: 280
Funding: Economic & Social Research Council
Project home page: http://www.ucl.ac.uk/english-usage/projects/dcpse
A 'Getting Started: Second Edition' PDF file is available from the project website.
Principal Investigator: Professor Bas Aarts
Senior Research Fellow: Sean Wallis
Research Assistants: Dr Dirk Bury, Lesley Kirk, Yordanka Kostadinova-Kavalova, Dr Ann Law, Gabriel Ozón
Available on CD-ROM with ICECUP 3.1.
Searchable with ICECUP 3.1.
CoRD Entry submitted on March 1, 2010 by Sean Wallis.
Information for the entry was edited and approved by Sean Wallis and Bas Aarts.