Parsed Corpus of Early English Correspondence (PCEEC)

Published in 2006, the Parsed Corpus of Early English Correspondence (PCEEC) is based on the original CEEC, with two major differences. First, whereas the original is plain text only, the PCEEC is distributed in three file types: plain text files, part-of-speech tagged files, and syntactically parsed files. Second, because of copyright restrictions, the PCEEC contains somewhat less material than the original (unpublished) CEEC; see the tables on the front page for a numerical comparison.

The annotation scheme for the PCEEC is the same as that used by the Penn-Helsinki Parsed Corpus of Middle English (2nd edition) and the Penn-Helsinki Parsed Corpus of Early Modern English. In addition, the PCEEC includes metadata about the letters (date, etc.) and the correspondents (name, date of birth, etc.). This metadata partly overlaps with the parameter coding in the original CEEC but is more extensive; a separate 'associated information file' (AIF) provides further information on the letters and correspondents. For details, see the online manual.
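As a rough illustration of what working with the parsed files might involve, the following Python sketch reads one labelled-bracket tree of the general kind used in the Penn-Helsinki corpora and lists its tag/word pairs. The sample sentence and its labels are invented for illustration and are not taken from the PCEEC; the function names are hypothetical, and the sketch assumes that words themselves contain no parentheses. The online manual remains the authority on the actual file layout and tag set.

```python
def tokenize(text):
    """Split a bracketed parse into '(', ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build nested lists from the tokens; each node is [label, child, ...]."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets: unexpected ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced brackets: missing ')'")
    return stack[0]


def leaves(node):
    """Collect (tag, word) pairs from the terminal nodes of a tree."""
    if len(node) == 2 and all(isinstance(x, str) for x in node):
        return [(node[0], node[1])]
    pairs = []
    for child in node:
        if isinstance(child, list):
            pairs.extend(leaves(child))
    return pairs


# Invented sample tree (an assumption for illustration, not corpus data).
SAMPLE = (
    "(IP-MAT (NP-SBJ (PRO I)) (VBP thank) (NP-OB1 (PRO you)) "
    "(PP (P for) (NP (PRO$ your) (N letter))))"
)

tree = parse(tokenize(SAMPLE))[0]
tagged = leaves(tree)
for tag, word in tagged:
    print(f"{word}\t{tag}")
```

Keeping the tree as plain nested lists avoids any dependency beyond the standard library; dedicated tools for querying parsed corpora would be a better fit for serious work.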

The part-of-speech tagging of the PCEEC was carried out by Arja Nurmi (University of Helsinki) on an Academy of Finland junior fellowship, and the syntactic annotation by Ann Taylor (University of York), funded by the Arts and Humanities Research Council of the United Kingdom. The sociolinguistic information for each correspondent was provided by the Helsinki team and, at York, by Ann Taylor, assisted by Joanne Close. Copyright clearance was handled by Mikko Laitinen and Terttu Nevalainen (Helsinki). The corpus is distributed by the Oxford Text Archive.