The Penn Parsed Corpus of Modern British English (PPCMBE)
The Penn Parsed Corpus of Modern British English, which contains just under one million words, is part of a larger ongoing project at the University of Pennsylvania and the University of York to produce syntactically annotated corpora for all stages of the history of English. Its genre composition has been kept as close as possible to that of the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME).
Project leader: Anthony Kroch
Size: c. 1 million words (948,895)
Language: Modern British English
Number of texts/samples: 101
Funding: The National Science Foundation
Project home page: http://www.ling.upenn.edu/hist-corpora/
Reference lines and copyright
Kroch, Anthony, Beatrice Santorini and Ariel Diertani. 2010. Penn Parsed Corpus of Modern British English. http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/index.html
Santorini, Beatrice. 2010. Annotation manual for the Penn Historical Corpora and the PCEEC. Release 2. http://www.ling.upenn.edu/hist-corpora/annotation/index.htm
Compilers: Professor Anthony Kroch, Dr Beatrice Santorini and Ariel Diertani (University of Pennsylvania)
Each text in the corpus comes in three different formats: text (.txt), part-of-speech (POS) tagged (.pos) and parsed (.psd).
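As a rough illustration of the three-format layout, the following Python sketch groups a list of file names by text and reports any text that lacks one of the formats. It assumes that the three files for a given text share a base name and differ only in extension; the entry does not state this, and the file names used here are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# The three per-text formats named in the entry.
FORMATS = (".txt", ".pos", ".psd")


def group_by_text(filenames):
    """Map each text's base name to the set of formats present for it."""
    groups = defaultdict(set)
    for name in filenames:
        path = Path(name)
        # Ignore anything that is not one of the three corpus formats.
        if path.suffix in FORMATS:
            groups[path.stem].add(path.suffix)
    return dict(groups)


def incomplete_texts(groups):
    """Return the base names that are missing at least one format."""
    return sorted(stem for stem, found in groups.items() if found != set(FORMATS))


# Hypothetical file names, for illustration only.
example = ["sample-a.txt", "sample-a.pos", "sample-a.psd", "sample-b.psd", "README"]
groups = group_by_text(example)
```

Because grouping uses only the base name, the same check works whether the formats sit in one directory or in separate ones.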
The Penn corpora are distributed with the search program CorpusSearch 2, written by Beth Randall and released as open-source software.
The corpus may be purchased on CD-ROM using the corpus order form.
Related corpora
The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)
Penn-Helsinki Parsed Corpus of Middle English, 2nd edition (PPCME2)
York-Helsinki Parsed Corpus of Old English Poetry
York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)
Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
Parsed Corpus of Early English Correspondence (PCEEC)
The information in this entry was edited by Prof. Anthony Kroch.