Areal and regional variation: Mapping out new territory

Terttu Nevalainen
Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki

One of the topics addressed at the Helsinki Corpus Festival was the areal and regional coverage of English historical corpora. Resources like the Helsinki Corpus of Older Scots (HCOS) and A Representative Corpus of English Historical Registers (ARCHER), which make areal comparisons possible, are still relatively rare, and an International Corpus of English on historical principles is yet to be compiled. There are, however, many new resources, ranging from corpora composed of late medieval manuscript material to nineteenth- and twentieth-century grammar and dictionary databases, which make significant contributions to the study of variation and change in English from a variety of geographical and linguistic perspectives. The five articles in this section all show the value of data-source triangulation, that is, comparing multiple data sources in order to evaluate the findings based on new materials and place them in a wider research context.

In her paper, “The consonantal element (th) in some Late Middle English Yorkshire texts”, Vibeke Jensen discusses a corpus of 43 texts consisting of religious prose and legal documents which A Linguistic Atlas of Late Mediaeval English (LALME) localizes to the West Riding of Yorkshire and the City of York. Adopting two parallel methods to cross-validate her data, Jensen applies the LALME questionnaire to short texts in their entirety and to large portions of longer texts, and also studies 3,000-word digitized samples of the texts by means of corpus tools. In these Yorkshire texts the (th) element has markedly different realizations depending on whether it appears in a lexical word (<th>) or a function word (<y/þ>). This distinction, which reflects a pronunciation difference, is quite systematic and supports the idea of a “Northern system”. Jensen’s work forms part of the Middle English Grammar Project (MEG) launched by the Universities of Stavanger and Glasgow.

Merja Stenroos and Kjetil V. Thengs’s contribution “Two Staffordshires: Real and linguistic space in the study of Late Middle English dialects” introduces two different approaches to medieval dialect geography. Making use of the Middle English Grammar Corpus (MEG-C), the authors compare the information provided by these texts, localized on linguistic grounds following the LALME method, with their Staffordshire data included in the Middle English Local Documents Corpus (MELD), organized according to their known regional provenance. Mapping the spelling variants of ten frequent items in these two data sources, the authors conclude that “geographical space” and “linguistic space” reveal partly different trends, providing complementary evidence on regional variation. Some of the maps give very similar results for the two kinds of “space”, but some MELD maps show a higher proportion of supralocal, levelled forms, and yet others display more markedly local forms. MELD and MEG-C both form part of a long-term research programme at the University of Stavanger.

Moving on to the sixteenth century, Mel Evans illustrates how in Early Modern English regional spelling systems gave way to arrangements which were marked by social and idiolectal rather than primarily regional indexicalities. Her article “A sociolinguistics of Early Modern spelling? An account of Queen Elizabeth I’s correspondence” concentrates on the overall consistency and the graph forms and conventions that constituted the spelling practices of a highly educated public figure. Comparing a corpus that covers the Queen’s lifespan with her contemporaries sampled in the Corpus of Early English Correspondence (CEEC), Evans finds that the Queen’s practice shows an increase in consistency over time compared to her own earlier usage, possibly reflecting her localized “spelling contact”, preferences she shared with her relations. Evans concludes that a more extensive sociolinguistic analysis of spelling variation would help us better appreciate the social significance of the written mode in Early Modern English.

Lieselotte Anderwald’s study “Throve, pled, shrunk: The evolution of American English in the 19th century between language change and prescriptive norms” focuses on the potential influence of normative grammars on the evolution of English irregular verb forms (u/a verbs, thrive, plead, and have gotten). The author compares recommendations in 272 grammars of English, mostly published in the 19th century, with the actual usage of these forms in the 400-million-word Corpus of Historical American English (COHA). The Collection of Nineteenth-century Grammars (CNG) is a resource that Anderwald has compiled for charting the ways in which prescriptions emerge over time both in Britain and in the United States. Correlating prescription with corpus findings by decade, she comes to the conclusion that, when there is variation, 19th-century American grammars tend to reflect actual usage, albeit with some notable time lag, rather than leading the development. Prescription hence changes over time and appears to have exerted less influence on irregular verb forms than is commonly assumed.

In their article “Balanced corpora and quotation databases: Taking shortcuts or expanding methodological scope?”, Laurel Brinton, Stefan Dollinger and Margery Fee introduce a new resource for regional dialect research developed at the University of British Columbia. They report on the process of updating A Dictionary of Canadianisms on Historical Principles, as a result of which the citations of headwords in context from the first edition of the dictionary have been combined with the newly collected historical and present-day quotations to form the 2.3 million-word Bank of Canadian English (BCE). Comparing findings on changes in the use of deontic modal auxiliaries and the progressive passive based on this new resource with data that can be retrieved from balanced corpora (e.g. COHA) and the Oxford English Dictionary, the authors find that a structured database such as the BCE can perform at least as well as a balanced historical corpus when it comes to diachronic dialect research. They also note that a quotation database comprises a more diverse range of authors than a traditional text corpus. Digitized reference sources should therefore be included in the resources that mark the way forward in the historical study of areal and regional variation.

The two PowerPoint presentations in this section come from opposite ends of the chronological continuum. Anne-Christine Gardner discusses “Measures of productivity and lexical diversity: Suffixation (c.1150–1350)” using the Linguistic Atlas of Early Middle English (LAEME) as her primary data source. The study forms part of her PhD project “The development of abstract noun derivations in different regions and text types (c.1150–1700)”.

In their presentation, “Animacy ‘down under’”, Marianne Hundt and Benedikt Szmrecsanyi explore regional variation in animacy effects in genitives and progressives. They compare data from the latter half of the 19th century in A Corpus of Early New Zealand English (CENZE, 1840–), which consists of material provided by the New Zealand Electronic Text Centre, the National Library of New Zealand and the Internet, with parallel British and American data drawn from the ARCHER corpus (see note 1).


1. Published version: Marianne Hundt and Benedikt Szmrecsanyi. 2012. “Animacy in Early New Zealand English.” English World-Wide 33: 3, 241–263. [doi 10.1075/eww.33.3.01hun].


