Background and history

The compilation of a corpus of Scottish correspondence was motivated by the compiler's awareness that royal, official and family letters are a data source with unique properties for research that seeks to reconstruct both past language use and past social and cultural practices (Reconstruction of text languages). Correspondence is a unique source in that it offers both linguists and historians a wide range of informants representing different degrees of linguistic, stylistic and socio-cultural literacy; their idiolects and group-lects also reflect the influence of community type as well as of geographical and social distance and mobility (Letters as a data source; Dimensions of space, time and social milieu).

The creation of the Corpus of Scottish Correspondence (CSC) draws on a long-term exchange of ideas between researchers compiling diachronic corpora in two scholarly communities: the International Computer Archive of Modern and Medieval English (ICAME) and the Research Unit for Variation, Contacts and Change in English. In the late 1980s and early 1990s, I compiled the Helsinki Corpus of Older Scots (HCOS; a revised bibliography) from extracts of texts representing fifteen different genres, all taken from early printed texts or, for the most part, from 19th- and 20th-century editions. As a user of the HCOS, I soon became aware of the problems caused by normalisation and modernisation, editorial practices frequently adopted in the editions selected for the corpus. It was evident that findings based on this corpus might sometimes reflect the history of varying editorial principles and practices rather than the history of the Scots language (on the influence of modernised punctuation, see for example Meurman-Solin (2007); on the editing of contracted forms, see section 2.3.4 in Transcription and digitization). Since the prefaces and introductions to editions usually give too little information about the editors’ decisions, the validity of edition-based data should always be checked against the manuscripts (see Lass 2004).

In contrast with the HCOS, the Corpus of Scottish Correspondence applies the principles and practices of philological computing to the transcription and digitization of the manuscript originals of the letters in order to ensure the authenticity and validity of the data. In other words, the text of the original manuscript has been reproduced faithfully, and no emendation, tacit expansion of contracted forms, modernisation or normalisation has been permitted. For example, question marks signal ambiguous readings, and cancellations, insertions and non-linguistic features of the manuscript originals, such as visual prosody, are indicated (Transcription and digitization; Visual prosody).

As regards its theoretical and methodological approach, the CSC closely resembles the corpora that serve as databases for the historical atlases created at the Institute for Historical Dialectology, University of Edinburgh: the Linguistic Atlas of Early Medieval English (LAEME), compiled by Margaret Laing and Roger Lass and covering the period c. 1150 to c. 1300, and the Linguistic Atlas of Older Scots (LAOS), compiled by Keith Williamson, whose phase 1 covers c. 1380 to c. 1500. The digitisation and annotation of the texts would not have been possible without the expertise of my colleagues Laing, Lass and Williamson, and the software created by Williamson.

A number of other factors influenced decision-making during the creation of the CSC (Electronic data sources for Older Scots). Since three geographical areas, East Anglia, London and the North of England, are well represented in the Corpus of Early English Correspondence (CEEC), a focus on Scotland seemed particularly appropriate. (In the CEEC, the Court has been defined as a fourth area, social rather than geographical. For more information on the CEEC, see Nevalainen and Raumolin-Brunberg (1996) and Nevalainen and Raumolin-Brunberg (2003).) In order to trace the diachronic development and diffusion of numerous linguistic features in the history of English, it is important that we now have directly comparable data originating from the various areas of Scotland.


Lass, Roger. 2004. "Ut custodiant litteras: Editions, corpora and witnesshood". Methods and Data in English Historical Dialectology (Studies in Language and Communication, 16), ed. by Marina Dossena & Roger Lass, 21-48. Bern: Peter Lang.

Meurman-Solin, Anneli. 2007. "Relatives as sentence-level connectives". Connectives in the History of English (Current Trends in Linguistic Theory, 283), ed. by Ursula Lenker & Anneli Meurman-Solin, 255-287. Amsterdam/Philadelphia: John Benjamins.

Nevalainen, Terttu & Helena Raumolin-Brunberg, eds. 1996. Sociolinguistics and Language History: Studies Based on the Corpus of Early English Correspondence. Amsterdam/Atlanta, GA: Rodopi.

Nevalainen, Terttu & Helena Raumolin-Brunberg. 2003. Historical Sociolinguistics. London: Longman.