Studies in Variation, Contacts and Change in English
Volume 7

How to Deal with Data: Problems and Approaches to the Investigation of the English Language over Time and Space

Edited by Terttu Nevalainen¹ & Susan M. Fitzmaurice²

¹ Research Unit for Variation, Contacts and Change in English, University of Helsinki
² School of English Literature, Language and Linguistics, The University of Sheffield

Publication date: 2011


Ahava, Simo
Compiling the Helsinki Archive of Regional English Speech (HARES): The problem of content description

This paper introduces the Helsinki Archive of Regional English Speech (HARES) and the protocols the HARES team chose for its compilation. Transcribing rural speech and compiling it into a fully functional archive or corpus is riddled with problems and pitfalls that must be acknowledged during corpus compilation (see e.g. Beal et al. 2007). In this paper, I comment on the problem of content description, which I previously examined in my unpublished Master's thesis (Ahava 2010). The question is whether transcriptions combined with a sufficiently versatile annotation schema (XML with TEI specifications in the case of HARES) can solve the paradoxical problem of using text to describe speech. This problem is especially persistent when the grammaticality of a transcribed utterance is under scrutiny. I therefore also introduce an intermediate variant in the past BE paradigm of Cambridgeshire dialect speakers (Ahava 2010, see also Richards 2010) and illustrate how it characterises the problem of content description, which anyone transcribing speech or compiling a spoken-language corpus must resolve.

Beal, Joan C. & Karen P. Corrigan
Inferring syntactic variation and change from the Newcastle Electronic Corpus of Tyneside English (NECTE) and the Corpus of Sheffield Usage (CSU)

Taking up the point made by Laurie Bauer (2002: 107-108) that "we do not have public electronic corpora that would allow us to investigate differences in the syntax of Newfoundland and Vancouver Englishes, or of Cornish and Tyneside Dialects", this paper demonstrates ways in which corpora derived from previously conducted sociolinguistic surveys can be used to make such comparisons. In particular, we report on research which examined data from the NECTE corpus and the Corpus of Sheffield Usage in order to investigate diachronic and/or diatopic variation in relative markers, adverbials and quotatives.

Burnes, Susan
Metaphors of conflict in press reports of elections

Metaphors of conflict are so ingrained in the language we use to discuss political elections that it is difficult not to use them: we fight election campaigns, we suffer heavy losses, we emerge victorious. This paper presents a case study of English and French press reports on the 2008 elections in Pakistan, asking why this metaphorical source domain is so entrenched, what the consequences of its use are, and whether the patterns of use observed occur across languages. It discusses the reasons for using metaphors of conflict to describe peacetime situations, drawing on conceptual metaphor theory and pragmatics in an account that also considers the contextual notion of newsworthiness. A cross-linguistic perspective facilitates the exploration of similarities and dissimilarities between metaphors in the two languages. The paper concludes by discussing the metaphors that seem to characterise conflict situations.

Fitzmaurice, Susan M.
Talking politics across transnational space: Researching linguistic practices in the Zimbabwe Diaspora

This paper situates the communicative practices used in the diaspora in the Zimbabwean political and historical context. The data for analysis consist of internet commentary produced by individuals in response to op-ed articles posted on the internet in web-based newspapers such as The Zimbabwe Times and the Zimbabwe Independent. The principal matter for consideration is the extent to which the readers’ posts reveal evidence that they observe conventionally prescribed or more local vernacular language norms. The content of these posts is explicitly political in nature and commentators conventionally state their patriotic interest in matters they regard as directly relevant to their own diasporic identities on the one hand and to the lives of their families in Zimbabwe on the other. Linguistic analysis of these materials indicates that readers rarely adhere to conventional discourse practices appropriate to public genres such as the ‘letter to the editor’ and instead adopt more idiosyncratic strategies such as code-switching and ‘textspeak’, the abbreviated code used in text-messaging. These strategies serve several functions. The choice of specifically Zimbabwean cultural and linguistic reference points might be taken to assert the writer’s familiarity with and proximity to the interests and concerns of a specific national and patriotic group. However, the choice of languages such as Shona or Ndebele within an English language context could be construed as marking the commentator not only as an interested party, but as an authentic member of the Zimbabwe nation on which the news focuses.

Montgomery, Chris
Starburst charts: Methods for investigating the geographical perception of and attitudes towards speech samples

This article looks at ways in which researchers working within the field of perceptual dialectology might investigate the perception of speech. Despite misgivings about the appropriateness of speech perception tasks within the wider field of folk linguistics (Niedzielski & Preston 2003), it is argued that such research is important when seeking a full understanding of linguistic perception. The article moves on to discuss what type of speech perception task is best suited to a perceptual dialectology framework. Data processing methods are discussed in detail, including a step-by-step guide to the creation of ‘starburst charts’. The utility of such charts is demonstrated using an example from fieldwork undertaken in the north of England and possible future developments using the method are introduced.

Keywords: Language attitudes; perceptual dialectology; voice sample placement; geographical perception; dialectological methods; starburst charts

Shaw, Philip
Coin epigraphy and early Old English variation: Explorations in the Corpus of Early Medieval Coin Finds

This piece provides an introduction to the potential uses as a linguistic corpus of two important interlinked databases of coins hosted and maintained by the Fitzwilliam Museum, Cambridge. These databases provide a valuable way to search and assemble data not present in the Dictionary of Old English Corpus, as well as allowing easy cross-referencing of this linguistic data with other information about the coins, such as their find sites. As the databases are designed primarily with numismatic study in mind, however, linguists need to be aware of the ways in which transcription protocols may differ from those they might themselves adopt. The possibilities for using XML to mark up linguistic transcriptions of Anglo-Saxon coin inscriptions are briefly discussed, followed by three brief case studies conducted using the Fitzwilliam's databases. The first of these discusses evidence for devoicing of final /d/ in Old English, comparing data from coin inscriptions against personal name forms in Libri Vitae. The second examines the variety of forms of the name of the Northumbrian moneyer Cuthheard and raises the possibility that practical considerations involved in the administration of a currency may prompt experimentation with varying spellings. The final case study considers the evidence provided by early Anglo-Saxon coin inscriptions for vowel epenthesis in Old English, a feature that also appears sporadically in early manuscript sources.

Siirtola, Harri, Terttu Nevalainen, Tanja Säily & Kari-Jouko Räihä
Visualisation of text corpora: A case study of the PCEEC

Information visualisation methods currently play a relatively modest role in corpus linguistics. There are good reasons for this: many past information visualisation tools were targeted mainly at technical users, making them unusable for non-specialists; and the publishing tradition in corpus linguistics has favoured textual and tabular data presentations over graphics. However, we believe that the information visualisation community has a lot to offer corpus linguistics if only the domain were better understood.

To support our claim, we survey information visualisation as a cognitive tool for corpus linguists and present a selection of text corpus visualisations. We use the Parsed Corpus of Early English Correspondence (PCEEC) as our material to demonstrate techniques of text corpus visualisation.

Yáñez-Bouza, Nuria
Mapping 18th-century grammar writers in the British Isles (and beyond)

The aim of this paper is two-fold: to unearth the origins and whereabouts of 18th-century grammar writers, the codifiers of the English language in the age of prescriptivism, and to show the value of the Eighteenth-Century English Grammars database as a new tool for qualitative and quantitative studies in the field of the 18th-century grammatical tradition. More precisely, I will look into two locative factors little explored hitherto, namely the place of birth of the authors and the place where they lived, taught and/or wrote, each of which is analysed by country, county and city in the fullest detail possible. It is hoped that the research presented here will shed light on the backstage of English grammar-writing practices and will lay the ground for further investigations in 18th-century English studies.


Healey, Antonette di Paolo (ed.). 2004. The Dictionary of Old English Corpus in Electronic Form. 16 Mar. 2009.

ECEG = Eighteenth-Century English Grammars database, 2010. Compiled by María E. Rodríguez-Gil (University of Las Palmas de Gran Canaria, Spain) and Nuria Yáñez-Bouza (The University of Manchester, UK).

eXtensible Markup Language (XML)

Helsinki Archive of Regional English Speech (HARES)

PCEEC = Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk & Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.

Text Encoding Initiative (TEI)


Ahava, Simo. 2010. Intermediate Past BE: A Paradigm Reshaped With Data Drawn From HARES. MA thesis, University of Helsinki. 20 Oct. 2010.

Bauer, Laurie. 2002/2003. "Inferring variation and change from public corpora". The Handbook of Language Variation and Change, ed. by Jack K. Chambers, Peter Trudgill & Natalie Schilling-Estes, 97-114. Oxford: Blackwell.

Beal, Joan C., Karen P. Corrigan, Nicholas Smith & Paul Rayson. 2007. "Writing the Vernacular: Transcribing and Tagging the Newcastle Electronic Corpus of Tyneside English (NECTE)". Annotating Variation and Change (Studies in Variation, Contacts and Change in English 1), ed. by Anneli Meurman-Solin & Arja Nurmi. 20 Oct. 2010.

Niedzielski, Nancy & Dennis R. Preston. 2003. Folk Linguistics. Berlin: Mouton de Gruyter.

Richards, Hazel. 2010. "Preterite be: a new perspective?" English World-Wide 31(1): 62-81.