Sociolinguistic coverage of the Corpora of Early English Correspondence
The CEEC was compiled to provide a corpus suitable for testing whether sociolinguistic methods can be applied to historical data. Personal letters were chosen because they were known to share a number of linguistic features with colloquial spoken language, which makes them a genre well suited to diachronic sociolinguistic research. Correspondence was also considered suitable because it was possible to trace details of the social backgrounds of the corpus informants and to store them in a separate database to facilitate sociolinguistic analyses.
Owing to the limited literacy of the lower ranks and of women, most of the letters come from the literate section of society, i.e. male members of the gentry and clergy, and professionals such as lawyers and military officers. Female informants make up about 20% of the total. Among the male informants, the lower ranks likewise account for approximately 20%, whereas letters by lower-ranking women amount to only 5% of the data written by women.
Although the corpus material comes from different parts of England, priority was given to four regions, which together contribute about half of the letters: London (the City and Southwark), East Anglia, the North (counties north of Lincolnshire) and the Court. The importance of the language of the rapidly growing capital was self-evident, and the North was considered equally significant because it was known to be where many medieval changes originated. East Anglia was chosen because of the ample data available, in particular the continuity provided by two very large collections: the Paston letters from the fifteenth century and the Bacon letters from the sixteenth. The idea of a separate region, 'the Court', arose when a unit was needed for correspondents who worked for the central administration of the country or stayed at court in various capacities. Many of these people lived in Westminster.
Figure 1. Map of CEEC regions (outline kindly provided by Simo Ahava).
While the CEEC may not represent the entire language community of the fifteenth to seventeenth centuries in all respects, it has proved to be a reliable sample of the informal language used by that community, or at least by the literate writing community of Tudor and Stuart England.