1.1 Reconstruction of text languages

The term "Older Scots" is used to refer to language varieties in Scotland in the period from the mid-fourteenth century up to the end of the seventeenth century (see Further reading on Older Scots). The established periodization is as follows:

Pre-literary Scots: to 1375
Early Scots
Middle Scots
  Early Middle Scots
  Late Middle Scots

Thus Older Scots is a "text language". Fleischman (2000: 34) suggests that

[t]he term "text language" is intended to reflect the fact that the linguistic activity of such languages is amenable to scrutiny only insofar as it has been constituted in the form of extant texts, which we might think of as its "native speakers", even if we can't interrogate them in quite the same way as we can native speakers of living languages. Another crucial difference between text languages and living languages is that the data corpus of a text language is finite; new data only become available when previously unknown documents are discovered, whether in the form of manuscripts, printed texts, tablets, etc.

As stressed in Meurman-Solin (2004b), recent advances in corpus linguistics no longer justify the polarization of the two approaches to the reconstruction of the languages of the past, namely 'the essentially data-driven and data-oriented' approach and 'the theoretical approach', which 'extrapolates from the data in order to identify general principles and mechanisms of language change' (cf. Fleischman 2000: 34). Instead, with major advances in historical corpus linguistics, the integration of the two approaches permits us both to provide a comprehensive description of linguistic features and to identify patterns reflecting systemic developments, and ultimately to model language variation and change in a theoretically relevant way.

As with many other text languages before 1500, Older Scots suffers from a scarcity of data representative of a sufficiently wide range of language use. As Johnston (1997: 48) points out, until recently the evidence scholars had to rely on was mostly 'documents written by a small, unrepresentative section of Older Scots society, often a specialized, 'set-piece' type of text such as a will, a deed, a public record or a literary work'. With the creation of the Edinburgh Corpus of Older Scots database (c. 1380 to c. 1500), compiled to create the Linguistic Atlas of Older Scots (LAOS), the situation has improved considerably, but it is not possible to create a fully balanced corpus of this variety for the pre-1500 period (see Section 1.3 Dimensions of space, time and social milieu). There are also significant gaps in the data for the first half of the sixteenth century.

The CSC aims to permit the user to detect a wide range of variation and variability in Older Scots. It does so by including idiolects of professional writers and other writers with a university education, as well as less-trained and inexperienced writers, and by rejecting all practices of normalization and standardization in transcribing and digitizing the manuscript texts. Aitken (1971) sees a high degree of variation as an inherent feature of Older Scots, and his view has been amply supported by recent corpus-based research. The reconstruction of variation and change has also benefited from the fact that, in corpora of Scots, the proportion of texts written by women has increased (Meurman-Solin 2001b, 2005).

As discussed in Meurman-Solin (2004a), the reification, or objectification, of the Scottish variety and its description as part of a hierarchized system of varieties has tended to divert our attention from the more challenging task of providing a comprehensive description of variation and change in the various areas of Scotland. However, diatopically representative data selected from manuscript evidence make it possible to examine the history of Scots without reference to standardization – or Anglicization (Devitt 1989) – or indeed a preconceived language system. In the present, emphatically data-driven approach, a comprehensive description can be presented either more traditionally, by illustrating the attested patterns of continued variation, or by resorting to methods made possible by new technology (for a description of the attached software, see Software).

In my view, the reconstruction of the history of Scots has been negatively affected by three tendencies in earlier research on such primarily geographically and politically defined varieties of English as Scots (Meurman-Solin 2004a): a tendency to objectify or reify regional varieties, assuming they form relatively homogeneous – perhaps even relatively self-contained – entities or systems; a tendency to emphasize socio-political rather than linguistic factors in order to legitimize the naming and describing of regional varieties in a certain way; and a tendency to create hierarchies, leading to the analysis of a regional variety exclusively with reference to a standardized variety (cf. Milroy 1999). Quantitatively and qualitatively improved data are necessary for an unbiased, comprehensive account of language varieties which until now have been described less fully than the standardized varieties used by wider speech and text communities.

In addition to the negative implications of reification, hierarchization and historicization, the categorization of texts in a database may influence the way in which we interpret the findings. As pointed out in Meurman-Solin (2001a), language-external variables used in structuring electronic databases may lead to a compartmentalization of texts into subcategories which, when examined more closely, are internally quite complex and heterogeneous. In the CSC, no categorization into subgenres of correspondence has been provided. Instead, the user is invited to proceed from the idiolectal level to the local, regional, supraregional and national levels, reconstructing variation and change over time and space before examining the conditioning of other language-external variables (see Section 1.3 Dimensions of space, time and social milieu). Earlier research has shown that factors such as social status and networks play a major role (e.g., Meurman-Solin 2001b), but style- and discourse-related variables reflecting contemporary politeness strategies can also be shown to influence the choice of linguistic features in various ways (see Meurman-Solin 1993, 2002, Meurman-Solin and Nurmi 2004, Meurman-Solin and Pahta 2006).
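The bottom-up procedure described above, moving from the idiolect to progressively wider groupings, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record fields (writer, locality, region), the invented sample data and the function names are hypothetical and do not reflect the actual structure of the CSC or its software.

```python
from collections import Counter, defaultdict

# Hypothetical records: one attested spelling variant per row.
# Writers, places and counts are invented purely for illustration.
RECORDS = [
    {"writer": "W1", "locality": "Locality A", "region": "Region X", "variant": "quh-"},
    {"writer": "W1", "locality": "Locality A", "region": "Region X", "variant": "quh-"},
    {"writer": "W1", "locality": "Locality A", "region": "Region X", "variant": "wh-"},
    {"writer": "W2", "locality": "Locality A", "region": "Region X", "variant": "wh-"},
    {"writer": "W3", "locality": "Locality B", "region": "Region X", "variant": "quh-"},
    {"writer": "W4", "locality": "Locality C", "region": "Region Y", "variant": "wh-"},
]

# Idiolectal, local and regional levels; None pools every record.
LEVELS = ("writer", "locality", "region", None)


def variant_counts(records, level=None):
    """Count variants per group at one level; level=None pools all records."""
    counts = defaultdict(Counter)
    for rec in records:
        key = rec[level] if level is not None else "all"
        counts[key][rec["variant"]] += 1
    return dict(counts)


def proportions(counter):
    """Convert raw counts into relative frequencies."""
    total = sum(counter.values())
    return {variant: n / total for variant, n in counter.items()}


if __name__ == "__main__":
    for level in LEVELS:
        print(level or "pooled")
        for group, counter in sorted(variant_counts(RECORDS, level).items()):
            print("  ", group, proportions(counter))
```

Keeping the per-writer counts as the starting point means that internal heterogeneity remains visible at every wider level, rather than being hidden by a predefined category.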

The reconstruction of past language use through historical documents encounters problems of various kinds. In addition to the above-mentioned gaps in the evidence caused by scarcity of texts, the witnesses we do have differ in their degree of validity and relevance. Depending on the type of research question, the texts in a corpus must be categorized into primary and secondary witnesses on the basis of what is known about their history, their writers, and the circumstances of their production and distribution (cf. Nevalainen and Raumolin-Brunberg 1996: 43 and the discussion in Meurman-Solin 2001a).

As pointed out in Meurman-Solin (2004b), for a text to function as a primary witness or an anchor text, its history must be reliably recoverable, and there must also be sufficient information about its writer. Even relatively well-known anchor texts may be complex, in the sense that no straightforward claims about correlation between language-external factors and linguistic features can be made. For example, legal texts, especially public documents, usually offer good prima facie evidence for localization in “space” (Williamson 2000). However, even when the date of production is precise, defining the variable of “time” as a conditioning factor requires investigating the role of conventions and formulae in a text, as such fixed expressions increase the general degree of conservatism in legalese. In contrast, the date of a letter allows us to specify a particular point of time in a person's idiolect in a particular communicative situation, but the definition of the variable of “space” calls for a scalar system of parameter values when applied to texts by geographically and socially mobile writers. Localization is often difficult in the case of women who, as a result of marriage (or marriages), moved from one place to another. Moreover, as we do not usually know where and when a female informant became literate, it may be impossible to tell which area the spelling practices she learned belong to and how they relate to her pronunciation.
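The two conditions for an anchor text given above, together with the caveats concerning “time” and “space”, can be made explicit as a simple check. The following Python sketch is a hypothetical illustration and not part of the CSC: the `Witness` fields, the two flags and the function names are assumptions introduced here, and in practice the categorization would also depend on the research question.

```python
from dataclasses import dataclass


@dataclass
class Witness:
    """Hypothetical metadata record for one text (all fields are assumptions)."""
    text_id: str
    history_recoverable: bool    # can the history of the text be reliably recovered?
    writer_documented: bool      # is there sufficient information about the writer?
    formulaic: bool = False      # e.g. a legal text relying on conventions and formulae
    mobile_writer: bool = False  # writer geographically or socially mobile


def classify(witness):
    """Return 'primary' only if both conditions for an anchor text are met."""
    if witness.history_recoverable and witness.writer_documented:
        return "primary"
    return "secondary"


def caveats(witness):
    """List reasons why 'time' or 'space' may not map directly onto the text."""
    notes = []
    if witness.formulaic:
        notes.append("time: check the role of conventions and formulae")
    if witness.mobile_writer:
        notes.append("space: a single localization may not fit a mobile writer")
    return notes


if __name__ == "__main__":
    deed = Witness("deed-01", True, True, formulaic=True)
    letter = Witness("letter-01", True, False, mobile_writer=True)
    for w in (deed, letter):
        print(w.text_id, classify(w), caveats(w))
```

Recording the caveats separately from the primary/secondary label reflects the point that even a well-documented anchor text may not support straightforward claims about time or space.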

In my earlier research (Meurman-Solin 2000a-c, 2002) I have also discussed a number of other factors which complicate the interpretation of linguistic findings. One is the important role of different degrees of linguistic and stylistic competence (for illustrations, see Meurman-Solin 2001b, Meurman-Solin and Nurmi 2004). Another is that, although the network of family castles scattered across the map of Scotland may give the impression of places on the periphery or in isolation, their distance from administrative centres varies depending on a particular family's or family member's role in national politics, the economy or culture.

In addition to the geographical distance between the various places of origin of the texts and their writers, I have found it useful to apply the concepts of economic and social distance in commenting on differences between members of self-contained, tightly knit speech communities and those regularly in contact with people originating from various other areas within a rather diffusely patterned administrative and economic framework (Meurman-Solin 2000a-c). For information on the concepts of speech community, discourse community and text community, see Section 1.3 Dimensions of space, time and social milieu.

To sum up, the CSC database provides new information for a comprehensive descriptive account of the continuum from idiolectal and local to regional and national varieties of Scottish English in the period 1500-1730. As regards individual informants, our ability to relate our linguistic findings to language-external factors depends on how successfully we have managed to define features such as degree of geographical and social mobility, which can be estimated by drawing on information provided by social, economic and cultural history as well as demography. In addition to basic information about literacy in Scotland (see Marshall 1983, Houston 1985), it is useful to examine a particular idiolect with reference to its position on the cline of linguistic, stylistic and social literacy.


