2.2 Selection of data for the CSC

The manuscripts of the letters included in the Corpus of Scottish Correspondence are held in the National Archives of Scotland, the National Library of Scotland, and the British Library. References to the catalogues of these libraries are given at the beginning of each document (for information on the references, see the discussion of the symbol %MS in Section 2.4 Language-external information in the text files). Since the compiler lives outside Britain, it has been necessary to restrict the focus of the corpus to these major collections of Scottish letters. The present version of the CSC contains only a small proportion of these important collections. For a corpus of this size, it was unnecessary to search for more material kept in various local archives. Of course, some early correspondence is still in private hands, and less accessible than the family archives deposited in public record offices. As regards material to which access is limited, the compiler has contacted the trustees of such documents, and has been granted permission to make her transcripts of the documents available as part of the web-based CSC.

The present CSC has been designed to provide as much information about sixteenth- and seventeenth-century correspondence as possible. The very earliest Scottish letters extant in the archives date from circa 1400 (see Section 1.3.1). Since these are included in the ECOS-Phase 1 database, which is an important manuscript-based source for Scottish documents dating from the fifteenth century compiled by Keith Williamson, the CSC is restricted to post-1500 letters. Despite continued browsing through the archives, the proportion of fifteenth-century letters in the corpus will remain small. While the number of fifteenth-century autograph letters cannot be expected to increase to any great extent, there is no lack of sources in the genre of correspondence for the later periods.

The present version of the corpus also comprises some letters dating from the first decades of the eighteenth century. The focus on the early part of the eighteenth century is primarily due to the fact that later letters reflect a widening of contacts with English writers, and it seemed important to examine epistolary prose in primarily Scottish networks before extending the corpus to include letters by informants regularly commuting between Scotland and England.

The designation of a writer as of Scottish origin is based on information available in biographical sources of various kinds. Thus, Scottishness is exclusively defined by language-external criteria, the family background being a primary source of information in the categories of the nobility and the gentry. In the 2007 version of the CSC it has not been possible to provide conclusive evidence of the informants' geographical mobility. The auxiliary database currently in preparation will hopefully make it possible for the user to define the co-ordinates of geographical and social mobility for the majority of individual informants. However, some writers will unavoidably remain unlocalized.

The expanded versions of the CSC are designed to be diachronically representative up to 1800. Marina Dossena, at the University of Bergamo, Italy, will cover the nineteenth century in her corpus of correspondence which chiefly contains Scottish business letters and letters written by emigrants (Dossena, Marina 2004. 'Towards a corpus of nineteenth-century Scottish correspondence', Linguistica e Filologia, 18: 195-214).

The balance between the sixteenth and the seventeenth century has not yet been fully achieved: the proportion of sixteenth-century letters is smaller and represents fewer informants, especially for the period 1500-1550. Besides letters by Archibald, 6th Earl of Angus, and Gavin Douglas, dating from 1515-1528, the most valuable data in the first half of the sixteenth century have been extracted from the collection catalogued as SP2 in the National Archives of Scotland (NAS), which contains letters written by geographically diverse correspondents to Mary of Lorraine, Queen Dowager (widow of James V), and dating from a relatively short period (1542-60). Only the letters identified as autograph in the earlier edition of these letters have been included. For more information, see Section 1.3 Dimensions of space, time and social milieu.

The main criteria in the selection of data for the CSC are as follows:

Only original manuscripts of letters have been included; there are no letters which have been indicated to be later copies in the catalogues or the actual documents, or have been detected to be such by the compiler according to criteria such as type of handwriting and paper quality.

Priority has been given to autograph letters by a single writer, those by two or more writers being exceptions in the CSC (see Index of Sources). However, it has not always been possible to find conclusive evidence for a particular letter being by the hand of the person who signed that letter. Comparison of hands is not always possible, since the archives have had to limit the number of documents a reader is allowed to examine simultaneously. There may be other reasons that a hand remains unidentified; for example, we may have only one single letter in a particular hand.

Among the sixteenth-century letters in particular, there are letters written by two different hands. In these, the most frequent pattern is that the body of the letter is in secretary hand and the signature, sometimes also the letter-closing formula, and, even less frequently, the initial term of address, are in a different hand, mostly resembling italic or a variety of the more rounded styles. In letters of this kind, the section in secretary is assumed to be non-autograph, while the signature (and the formulae) are considered autograph. The two hands are indicated by positioning the comment 'hand 1>' before the autograph sections and 'hand 2>' before the non-autograph ones (for information on the file-initial commentary, see %HD1 and %HD2 in Section 2.4 Language-external information in the text files).

As discussed in Section 1.3 Dimensions of space, time and social milieu, while the chief goal has been to achieve diachronic, diatopic and diastratic representativeness, close attention has also been paid to ensuring that the proportion of letters written by and addressed to women does not remain too small.

The selection process has also been conditioned by a number of pragmatic issues. There may be a tendency to include more numerous documents from carefully catalogued compilations which are easy to access at the archives. A particularly important factor in the decision-making has been the physical condition of the documents. Badly damaged documents have usually been excluded, especially those in which the folio is either torn or worn out at the margins, or where it has stayed folded for centuries. Since the keepers of the documents have been obliged to disallow their reproduction by photocopying, some of these documents, as well as those in which the ink is very pale, have been transcribed in situ. The transcriptions of these letters have usually been rechecked during another visit to the archives. However, the majority of the CSC letters have been transcribed from photocopies or photographs ordered by the compiler. In the case of imperfect copies, the originals have been re-consulted. For information about the transcription process and the CSC archive containing the copies, see %ST in Section 2.4 Language-external information in the text files.

Sample size, indicated by the number of words, varies between the different informants for two reasons. Letters differ from one another considerably as regards their length; for example, letters by legal or financial advisers are usually much longer than the rather formal letters written by newly-married women to new relatives as a polite gesture. Since it has not been possible to regulate sample size, frequencies will have to be normalized in the presentation of the results of quantitative analysis. Word counts are provided in %WC at the beginning of each file. The file 'Data arranged by time and space, with word counts' also gives word counts for all the letters by a particular writer, together with totals for all letters localized to a particular region. However, it should be remembered that letters representing a particular geographical area may sometimes be linguistically too heterogeneous to permit the interpretation of the findings with reference to dialectal preferences.
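Because the samples differ in size, raw counts from different informants are comparable only after normalization. The following is a minimal sketch of one common approach, a rate per 1,000 words. The function name, the base of 1,000, and the sample figures are illustrative assumptions and are not taken from the CSC; in practice the word totals would come from the %WC values.

```python
def normalized_frequency(raw_count: int, word_count: int, per: int = 1000) -> float:
    """Return occurrences per `per` words (hypothetical helper, not part of the CSC)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Multiply before dividing so that round figures stay exact.
    return raw_count * per / word_count


# Invented figures for two informants; "words" stands in for a %WC total.
samples = {
    "writer_a": {"hits": 12, "words": 3000},
    "writer_b": {"hits": 5, "words": 500},
}

rates = {
    name: normalized_frequency(s["hits"], s["words"])
    for name, s in samples.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 words")
```

In this invented example the second writer has fewer raw occurrences but the higher normalized rate, which is the kind of distortion that normalization is meant to remove.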