1.3 Dimensions of space, time and social milieu

This section provides general information about the variables of time, space and social milieu for the user to consider in assessing the representativeness of the CSC corpus (for a very important discussion of these variables, see also Laing 2004).

Even though some basic language-external information is provided about each digitized letter (for a detailed account of the language-external information presented at the beginning of each text file, see Section 2.4 Language-external information in the text files), the main principle is to keep the texts separate from the knowledge bank in which information about the texts and their writers has been deposited. The rationale is that some of this knowledge is hypothetical in nature and may even prove incorrect later. With an increased understanding of a particular textual history, it may become necessary to rewrite its description. Thus the principle of proteanism (see Section 2.1 Protean corpora: multidimensionality, flexibility, and transparency) is applied to how the auxiliary databases attached to the CSC have been compiled and constructed. In other words, in exactly the same way as the database of manuscript texts, the auxiliary databases (CSC Informants and Index of Sources) will be subject to revision and expansion biannually.

As discussed in Meurman-Solin (2001b, 2003, 2004a), new users of corpora which have been carefully structured according to language-external variables may sometimes apply such variables as interpretative tools rather uncritically. For example, genre is often considered the primary factor not only in assessing representativeness but also in interpreting linguistic findings. Apart from the more general problem of it being 'difficult for the analyst to separate out the effects of diachrony from the effects of genre' (Herring et al. 2000: 3), there are problems caused by the fact that genres are often unevenly represented over time, space and social milieu. 'Since there are gaps in historical data in the earlier periods in particular, the claim that genre is the primary conditioning factor in the introduction and spread of a specific linguistic feature is valid only in so far as the database the evidence is extracted from can be considered fully representative' (Meurman-Solin 2003: 172). Hopefully, a separate knowledge bank will motivate such users to reconceptualize and redefine these variables in accordance with both their theoretical and methodological approach and their particular research question.

In my view, the most important task at this stage of reconstructing and describing variation and change in Older Scots is to stress that conclusions should be data-driven and data-oriented. It would be unwise to interpret the findings with reference to sociolinguistically defined variables, for instance, before other factors - such as the conventions of epistolary discourse, partly borrowed or recontextualized from other discourses, or the influence of models, both British and European, on letter-writing - have also been thoroughly studied. In the genre of letters, the practices of polite society influence linguistic and stylistic preferences in an important way (cf. Palander-Collin 1999, Nevala 2004, Meurman-Solin and Nurmi 2004, Palander-Collin and Nevala 2005).

1.3.1 Time

The very earliest Scottish letters in the archives date from circa 1400 (those by George March in 1400 and James Douglas in 1405). Despite continued browsing through the family archives kept in libraries, record offices and private collections, the size of the part of the corpus dating from the fifteenth century, and the proportion of autographs in particular, will remain small. In a corpus of letters, a certain degree of imbalance between the quantity and quality of pre-Reformation evidence and data from later periods is unavoidable (a similar problem occurs in the CEEC; see Nevalainen and Raumolin-Brunberg 1996, 2003, Nurmi 2002), while the situation is quite different as regards early legal and administrative documents and literary texts. These form the core of the first phase of the Edinburgh Corpus of Older Scots (ECOS), compiled by Keith Williamson.

Beside letters by Archibald, Earl of Angus and Gavin Douglas, dating from 1515-1528 and primarily stored in the British Library, the most valuable data in the first half of the sixteenth century have been extracted from deposit SP2 in the National Archives of Scotland (NAS), which contains letters written by geographically diverse correspondents to Mary of Lorraine, Queen Dowager (widow of James V). A significant proportion of these letters are autographs and, since they date from a relatively short period (1542-1560), they also provide useful evidence for a synchronic study of diatopic and diastratic variation (Meurman-Solin 2000a). As regards pre-1560 texts, it has not been possible to achieve this degree of representativeness in other twenty-year periods (the periodization adopted in the Corpus of Early English Correspondence; see Nevalainen and Raumolin-Brunberg 1996). In contrast, there is no lack of manuscript sources in the genre of correspondence in the later periods. The CSC is designed to be diachronically representative up to circa 1730, the seventeenth century being the most densely covered period in the present version of the corpus. A supplementary, although somewhat different, corpus of Scottish correspondence is being compiled by Marina Dossena, University of Bergamo (Dossena 2004). This corpus will chiefly contain nineteenth-century business letters and letters written by emigrants. For information about how the variable of time is integrated in the text files (the parameter %DA), see Section 2.4 Language-external information in the text files.

1.3.2 Space

The following geographical areas are quite well represented in the present version of the corpus: Ross and Cromarty, Moray, Aberdeenshire, Angus, Perthshire, Fife, Lothian, the Border counties, Lanarkshire, Ayrshire, Argyllshire and Dumfries and Galloway. Neither the Orkney and Shetland Islands nor the Western Isles have been included. The database permits the reconstruction of the dialect continuum in Scots, especially for the seventeenth and the latter half of the sixteenth centuries.

The place of origin of the writers and, if specified in the original, the place of writing of a given letter are indicated as part of the set of file-initial parameters as well as in the various auxiliary files. More detailed information about the geographical origin of the writer, tracing his or her geographical mobility, will be recorded in the database CSC Informants, but this resource has not yet been completed. The scarcity of prosopographical information about some informants, women in particular, may make their localisation by language-external factors difficult. As a result of marriage (or marriages), women may have moved from one place to another. Moreover, since we do not usually know where and when a female informant became literate, it may be impossible to tell which area the spelling practices she learned belong to and how they relate to her pronunciation.

If an informant cannot be localized using language-external criteria, it may be possible to position his or her idiolect in a particular geographical area according to linguistic criteria, by applying the principles of the "fit technique" (see Laing and Williamson 2004, for instance). For information about how the variable of space is integrated in the text files (the parameter %LC), see Section 2.4 Language-external information in the text files.

1.3.3 Social milieu

The user may decide to interpret linguistic findings with reference to various factors, including the writer's and the addressee's gender, age, social rank, social mobility and education. The user may also try to reconstruct the informants' social networks, including the geographical spread of such networking. However, in the case of numerous women and younger sons in particular, this is only possible to the extent that such information is provided by their correspondence. Approximately 20 per cent of the informants in the present version of the CSC are female.

The age range is relatively representative, with the exception of very young informants. Patrick Waus' letters to his parents, written when he was of school age (circa 1540), have not been included, since it has not been possible to check them against the manuscripts. The fate of the Waus Correspondence is unknown; according to the NAS, these letters may have been destroyed in the fire at Barnbarroch House in 1941. The user may find it interesting to examine the editions of Patrick Waus' letters in the Helsinki Corpus of Older Scots (see also Meurman-Solin 1999).

The data bank CSC Informants, which records language-external information, is being designed using standard reference works such as the Scottish Peerage and memoirs of some of the most renowned Scottish families, collected and edited by Sir William Fraser, as well as drawing on the continuously improving catalogues of the libraries and archives in Scotland. Since the inclusion of more recent historical, sociological and genealogical research will require an interdisciplinary team of researchers, this auxiliary resource remains incomplete.

My earlier studies drawing on corpora of Older Scots have shown that, in general, no straightforward correlation between linguistic variation and sociolinguistically defined conditioning factors is evident. The spread of the relative pronoun WHO in sixteenth-century Scots reflects developments in how it is used as a reference signal, not only in noun-phrase structures but also as a sentence-level constituent. Since the early instances are frequently attested in formulae (typically the final formula '[as knows] god who keep/preserve [the addressee of the letter] eternally'), the history of WHO in Scots can be related to the spread of stylistic literacy (Meurman-Solin 2000b). Early Scottish women's writing skills have been illustrated in Meurman-Solin (2001a), and social milieu rather than formal education explains the higher degree of stylistic literacy of some of the female informants. Meurman-Solin and Nurmi (2004) examines the use of circumstantial adverbial clauses introduced by seeing and considering. These topic-forming clauses are skilfully used by numerous letter-writers to provide background information of various kinds (see also Meurman-Solin and Pahta 2006). Meurman-Solin (2002) shows that the progressive is more frequent in two specific environments, in which its use can be shown to be conditioned by text type: these are narratives and speech-based texts, depositions of witnesses in trial proceedings being examples of the latter. Thus, the frequencies and distributions of particular linguistic features can be related to various discourse properties.

Johnston (1997: 51) draws our attention to a more general concern related to stylistic competence, claiming that more mobile people would use a "watered-down" style to communicate with people from other regions; apart from this better ability to differentiate between speech styles, the upper classes and professionals, in general the more mobile people, would act as the main guardians of a "Standard Scots style". By this logic, 'the town vernacular might well be different from the countryside ones just outside the walls, given that cities did tend to have a more varied population and would attract people from around their whole hinterland, which could extend over more than one dialect group.'

The user may find it appropriate to redefine the concept of space, extending it to cover dimensions other than geographical area. My earlier corpus-based research suggests that "distance" as a social, economic and cultural construct, rather than as a concept defined purely geographically, is a significant conditioning factor in the variation and change attested in the history of Scots (Meurman-Solin 1999, 2000a and b, 2001a).

In the present approach, diastratic variation can be said to complement our understanding of the spread of linguistic features over time and space. Thus, the primary goal of the present research is to create variationist typologies of linguistic systems by drawing on minutely detailed inventories of data. In this approach, membership of a variationist typology is strictly limited to items that have been attested as genuine alternatives in a pattern of variation at a particular level of analysis, whether structural, syntactic, or related to communicative or text-structuring functions.

1.3.4 Community type

In addition to time, space and social stratification, the representativeness of a database can be assessed with reference to three community types: "speech community", "discourse community" and "text community". As suggested in Meurman-Solin (2004b: 28), 'the best informants for reconstructing practices of a speech community can be found in texts written in private settings by non-professional, preferably less trained and relatively inexperienced writers.' Phonetic spellings and other features reflecting spoken varieties recorded in letters by these informants are not attested in texts influenced by shared scribal practices or, in the case of early printed works not available in manuscript, by the preferences of printers. For information on recorded phonetic spellings, see Meurman-Solin (1999 and 2005).

As argued in Meurman-Solin (2004b), while letters written by informants defined as representatives of speech communities permit us to identify idiolectal grammars, those by members of discourse communities also reflect grouplectal preferences. Language use in texts of the latter kind is affected by the conventionalised practices of professional coalitions, writers sharing similar communicative goals and applying similar genre-specific rules of writing, or groups who strictly follow a specific prescriptivist trend. Texts created by members of a particular discourse community can no longer be exclusively examined with reference to the variables of time, space and social milieu, since at least some of their linguistic choices have been influenced by 'inherited, borrowed, or recontextualized discourses, English or foreign' (Meurman-Solin 2004b: 28). As Keith Williamson (p.c.) has pointed out, a discourse community could also be defined as referring to a set of coeval writers who produce similar kinds of texts with similar objectives.

The term "text community" refers to literate people in a particular place and time who share a particular range of written texts. The identification of a text community is based on information about the consumption of literary texts and texts representing religious instruction. Another method for reconstructing text communities is to browse through bundles of documents put together in the archive of a particular family, administrative body, or some other institution. Such bundles will typically contain legal documents and letters to officials or friends and relatives, but also more or less unedited reports, notes, pro memoria-type documents, diaries and memoirs. Text communities have tended to be defined on the basis of edited texts, and many texts, despite their integral social and communicative function in their historical context, have tended to be marginalised, as they do not have the status of texts or genres traditionally included in the canon.

The conceptual framework provided by the three community types permits us to assess the relevance and validity of databases more reliably. We will become aware, for instance, of the extent to which the range of texts varies between communities. Sixteenth-century Scottish women 'mainly used their writing skills for writing letters to their relatives, and, somewhat later, for keeping accounts and summarizing the daily events in their personal diaries. In this case, language use can be assumed to be essentially conditioned by the restricted social functions of writing' (Meurman-Solin 2001a: 16). For information on Scottish women's literacy and education, see Marshall (1983) and Houston (1985).

A diachronic database representing a text community would in theory comprise the full range of once-functional texts relevant to the expression of the communicative purposes of the various discourse communities in a given geographical area. In practice, this full range will remain beyond recovery and it is therefore necessary to provide at least some direct or indirect evidence of what the major gaps are (see Section 1.3.1 and Section 1.3.2).


