Corpora of Early English Correspondence

The Corpus of Early English Correspondence (CEEC) has been compiled to facilitate sociolinguistic research into the history of English. The project was originally set up to test how methods developed by sociolinguists of present-day languages could be applied to historical data. The CEEC family of corpora currently covers four hundred years from 1400 to 1800, and consists of five daughter corpora. The original corpus, which spans the decades from 1410 to 1680, was completed in 1998, and its sampler version (CEECS) was made publicly available the same year. Based on the original, the Parsed Corpus of Early English Correspondence (PCEEC) was released in 2006. The 18th-century extension (CEECE) and the supplement to the original (CEECSU), together with their attendant sender and letter databases, are nearing completion. The ultimate aim of the compilers is to combine these subcorpora into one structured whole, which will amount to over 5 million running words.

Project leader: Terttu Nevalainen, University of Helsinki
Co-founder of project: Helena Raumolin-Brunberg, University of Helsinki
Time of compilation: 1993– (ongoing)
Size: 5.1 million words
Language: English (Late Middle, Early Modern, Late Modern)
Number of letter collections: 188
Number of letter writers: c. 1,200
Number of letters: c. 12,000
Period: 1403–1800
Released: CEECS 1998, PCEEC 2006
Funding: Academy of Finland: 1.9.1993–31.12.1995; University of Helsinki: 1.1.1996–30.06.1998; Academy of Finland, University of Helsinki: 1.1.2000– (National Centre of Excellence funding for the VARIENG Research Unit)
Project home page: http://www.helsinki.fi/varieng/domains/CEEC.html

Table 1. The CEEC family.

              CEEC           CEECE       CEECSU      TOTALS
words         2,597,795      2,219,422   442,484     5,259,701
collections   96             77          19          192
letters       5,961          4,923       829         11,713
writers       778            308         94          1,180
time span     c. 1410–1681   1653–1800   1402–1663   1402–1800

Table 2. Published versions of CEEC corpora.

              CEECS       PCEEC
words         450,085     2,159,132
collections   23          84
letters       1,123       4,970
writers       194         666
time span     1418–1680   1410–1681

Poster by Samuli Kaislaniemi

Figure 1. The CEEC family of corpora, with special reference to CEECE and CEECSU.

Reference lines and copyright

CEEC = Corpus of Early English Correspondence. 1998. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin at the Department of Modern Languages, University of Helsinki.

CEECS = Corpus of Early English Correspondence Sampler. 1998. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin at the Department of Modern Languages, University of Helsinki.

PCEEC:

Parsed Corpus of Early English Correspondence, parsed version. 2006. Annotated by Ann Taylor, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

Parsed Corpus of Early English Correspondence, text version. 2006. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin, with additional annotation by Ann Taylor. Helsinki: University of Helsinki and York: University of York. Distributed through the Oxford Text Archive.

CEECE = Corpus of Early English Correspondence Extension. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio at the Department of Modern Languages, University of Helsinki.

CEECSU = Corpus of Early English Correspondence Supplement. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio at the Department of Modern Languages, University of Helsinki.

SCEEC = Standardised-spelling Corpora of Early English Correspondence. 2012. Compiled by Terttu Nevalainen, Helena Raumolin-Brunberg, Samuli Kaislaniemi, Jukka Keränen, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio. Standardised by Mikko Hakala, Minna Palander-Collin and Minna Nevala. Department of English / Department of Modern Languages, University of Helsinki.

Manual

CEECS: Nurmi, Arja (ed.) (1998) Manual for the Corpus of Early English Correspondence Sampler CEECS. Department of Modern Languages. University of Helsinki. Available at http://khnt.hit.uib.no/icame/manuals/ceecs/.

PCEEC: Taylor, Ann and Beatrice Santorini (2006). Available at http://www-users.york.ac.uk/~lang22/PCEEC-manual/

Compilers

Project leader: Terttu Nevalainen
Senior scholar: Helena Raumolin-Brunberg
CEEC: Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin.
CEECS: Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin.
PCEEC: Jukka Keränen, Minna Nevala, Arja Nurmi and Minna Palander-Collin.
CEECE: Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio.
CEECSU: Samuli Kaislaniemi, Mikko Laitinen, Minna Nevala, Arja Nurmi, Minna Palander-Collin, Tanja Säily and Anni Sairio.

Student assistants

CEEC, CEECS, PCEEC: Kirsi Heikkonen, Alistair Melville-Smith, Taru Nurmi, Arja-Liisa Rossi, Reza Sanatnama, Heli Tissari and Anne Virolainen.
CEECSU, CEECE: Maarit Alanko, Annemieke Bijkerk, Teo Juvonen, Emma Murros, Tuuli Tahko and Eero Timoskainen.

Annotation

PCEEC: Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk and Terttu Nevalainen.

File format

The coding system is based on the set of ASCII codes (96 printable characters). The names of the files follow the published letter editions that form the basis of the corpus structure and are explained in more detail in the manual.
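
Because the corpus files are described as ASCII-based, users who process them with their own tools may want to confirm that a file contains no characters outside the ASCII range before searching it. The following is a minimal Python sketch of such a check. The function name find_non_ascii and the sample string are hypothetical and are not part of the corpus distribution or its manual.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character.

    Hypothetical helper, not part of the corpus tools. Line and column
    numbers are 1-based. ASCII code points run from 0 to 127.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    # Invented sample text, not taken from the corpus.
    sample = "Right worshipful sir\nna\u00efve"
    for line_no, col, ch in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
```

An empty result means the text stays within the ASCII range. Any reported positions would point to characters introduced after distribution, for example by an editor or a conversion step.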

Availability

ICAME CD-ROM (CEECS)

The Oxford Text Archive (CEECS, PCEEC)