Placing the Helsinki Corpus Middle English Section Introduction into Context

Irma Taavitsainen and Päivi Pahta

The Helsinki Corpus (HC) was planned, and the compilation of its sections, including Middle English, carried out, in the mid-1980s. HC was released in 1991 as a general-purpose historical corpus aimed at giving an overall picture of extant language data from Old English up to 1710. The selection of texts reflects the early practices of corpus planning and the state of knowledge in the field more than twenty years ago. HC is based on editions that had come out by that time, and new editions have been published since then. The idea of basing corpus texts directly on manuscript sources has been presented more recently, and even now the special features of Middle English pose restrictions (see Taavitsainen, Pahta and Mäkinen 2005, Pahta and Taavitsainen 2004). The principles of preparing manuscript texts for print have changed over the history of editing medieval texts since the nineteenth century. Editorial approaches range from the construction of hypothetical "originals" to faithful transcriptions with minimum editorial intervention. The latest trend is towards electronic editions with multiple layers, from transcription to a parallel normalised text version, to enable the use of advanced corpus software (see the Digital Editions for Corpus Linguistics (DECL) homepage).

The article introducing the Helsinki Corpus Middle English Section was written in 1990-91 and must be read as a historical document dated to those years. At the time of writing, the four-volume A Linguistic Atlas of Late Mediaeval English (1986, LALME) had just come out. A major international project, The Index of Middle English Prose (1984-, IMEP), had been going on for some years, but only a few volumes had been completed. The Index of Printed Middle English Prose (1985, IPMEP), with bibliographical references to Middle English texts, was available.

Our knowledge of the underlying manuscript reality has improved greatly in recent years. The late medieval period saw the vernacularisation of genres and registers of writing, from the Bible to scientific and utilitarian texts, and the transfer of literary genres (see Wogan-Browne et al. 1999). Vernacularisation was a pan-European phenomenon and took place on a large scale in various countries in the fourteenth and fifteenth centuries (see Crossgrove et al. 1998). For example, by the end of the medieval period, texts on scientific and technological subjects were becoming increasingly common in vernacular languages all over Europe. The extent of the process in English was unknown to us until Scientific and Medical Writings in Old and Middle English: An Electronic Reference (eVK), compiled by Linda Voigts and Patricia Kurtz, was completed in 2000. Today, this electronic database is an indispensable source of information about the underlying manuscript reality (see Introduction to MEMT). Another important recent source of information in the area is George Keiser's volume A Manual of the Writings in Middle English 1050-1500, Vol. 10, Works of Science and Information (1998). Significant new discoveries about the manuscript reality of the Early Middle English period have been made with the progress of the A Linguistic Atlas of Early Middle English (LAEME) project (see e.g. Laing 2000 and the LAEME website). Other, more general major improvements in Middle English research have come with the launch of electronic dictionaries and databases. The Middle English Dictionary (1956-2001, MED) was earlier available only in print, but it is now accessible online as part of the Middle English Compendium, together with a HyperBibliography of Middle English Prose and Verse (based on the MED bibliographies). The Compendium also includes a Corpus of Middle English Prose and Verse.

New lines of development have been inspired by HC, and the present situation is very different from that described in the 1990-91 article. For example, the Late Middle English part of HC contains 2000-word samples of a few medical texts. The need for a larger corpus became evident with pilot studies on scientific writing in HC (Taavitsainen 1994), and the corpus of Middle English Medical Texts (MEMT) grew out of this need. It consists primarily of editions of medical treatises from c. 1375 to c. 1500 and an appendix of texts written c. 1330, but it also contains some new transcriptions (see MEMT Introduction and Catalogue). In a parallel way, the Corpus of Early English Correspondence (CEEC) took its inspiration from the letters included in HC (see Nevalainen and Raumolin-Brunberg 2003).

The article discussed above should be read as a historical text dating from 1990 (its most recent references are to 1990), and we refer the reader to the reference works mentioned above for more recent knowledge on the topic.
Crossgrove, William, Margaret Schleissner and Linda Ehrsam Voigts (eds) (1998). Early Science and Medicine: A Journal for the Study of Science, Technology and Medicine in the Pre-modern Period 3/2, Special issue: The Vernacularization of Science, Medicine, and Technology in Late Medieval Europe. Leiden: Brill.

Edwards, A.S.G. (ed.) (1984). Middle English Prose: A Critical Guide to Major Authors and Genres. New Brunswick, New Jersey: Rutgers University Press.

eVK = Scientific and Medical Writings in Old and Middle English: An Electronic Reference (2000). Voigts, Linda Ehrsam and Patricia Deery Kurtz (compilers). CD-ROM. Ann Arbor: University of Michigan Press.

IMEP = The Index of Middle English Prose (1984-). Vols. 1-. Cambridge: D.S. Brewer.

IPMEP = The Index of Printed Middle English Prose (1985). Lewis, R.E., Norman F. Blake and A.S.G. Edwards. (Garland Reference Library of the Humanities 537.) New York: Garland.

Keiser, George R. (1998). A Manual of the Writings in Middle English 1050-1500, Vol. 10: Works of Science and Information. New Haven: The Connecticut Academy of Arts and Sciences.

LAEME = A Linguistic Atlas of Early Middle English 1150-1325 (2007). Compiled by Margaret Laing and Roger Lass (Edinburgh: The University of Edinburgh). Online at

Laing, Margaret (2000). 'Never the twain shall meet': Early Middle English - the East-West divide. Taavitsainen, Nevalainen, Pahta and Rissanen (eds), 97-124.

LALME = A Linguistic Atlas of Late Mediaeval English (1986). McIntosh, Angus, M.L. Samuels and Michael Benskin, with the assistance of Margaret Laing and Keith Williamson. 4 vols. Aberdeen: Aberdeen University Press.

MED = Middle English Dictionary (1956-2001). Kurath, Hans, Sherman M. Kuhn and Robert E. Lewis (eds). Ann Arbor: University of Michigan Press. Online at

Middle English Compendium. Online at

Nevalainen, Terttu and Helena Raumolin-Brunberg (2003). Historical Sociolinguistics: Language Change in Tudor and Stuart England. London: Longman.

OED = The Oxford English Dictionary (1989). Murray, James A.H., Henry Bradley, W.A. Craigie and C.T. Onions (eds). 2nd edition. Oxford: Clarendon Press. Online at

Pahta, Päivi and Irma Taavitsainen (2004). Vernacularisation of Scientific and Medical Writing in Its Sociohistorical Context. Taavitsainen and Pahta (eds), 1-18.

Taavitsainen, Irma (1994). On the evolution of scientific writings from 1375 to 1675: Repertoire of emotive features. Fernandez, Francisco, Miguel Fuster and Juan José Calvo (eds), English Historical Linguistics 1992. Amsterdam: John Benjamins. 329-342.

Taavitsainen, Irma, Terttu Nevalainen, Päivi Pahta and Matti Rissanen (eds) (2000). Placing Middle English in Context. (Topics in English Linguistics 35.) Berlin and New York: Mouton de Gruyter.

Taavitsainen, Irma and Päivi Pahta (1998). Vernacularization of medical writing in English: A corpus-based study of scholasticism. Early Science and Medicine 3: 157-185.

Taavitsainen, Irma and Päivi Pahta (eds) (2004). Medical and Scientific Writing in Late Medieval English. (Studies in English Language.) Cambridge: Cambridge University Press.

Taavitsainen, Irma, Päivi Pahta and Martti Mäkinen (compilers) (2005). Middle English Medical Texts. CD-ROM with MEMT Presenter software by Raymond Hickey. Amsterdam and Philadelphia: John Benjamins.

Wogan-Browne, Jocelyn, Nicholas Watson, Andrew Taylor and Ruth Evans (eds) (1999). The Idea of the Vernacular: An Anthology of Middle English Literary Theory, 1280-1520. (Exeter Medieval Texts and Studies.) Exeter: University of Exeter Press.