Placing the Helsinki Corpus Middle English Section Introduction into Context

Irma Taavitsainen and Päivi Pahta

The Helsinki Corpus (HC) was planned, and its sections, including Middle English, were compiled in the mid-1980s. HC was released in 1991 as a general-purpose historical corpus aimed at giving an overall picture of extant language data from Old English up to 1710. The selection of texts reflects the early practices of corpus planning and the state of knowledge in the field more than twenty years ago. HC is based on editions that had come out by that time, and new editions have been published since then. The idea of basing corpus texts directly on manuscript sources has been presented more recently, and even now the special features of Middle English pose restrictions (see Taavitsainen, Pahta and Mäkinen 2005; Pahta and Taavitsainen 2004). The principles of preparing manuscript texts for print have changed over the history of editing medieval texts since the nineteenth century. Approaches range from the construction of hypothetical "originals" to faithful transcriptions with minimal editorial intervention. The latest trend is towards electronic editions with multiple layers, from transcription to a parallel normalised text version, to enable the use of advanced corpus software (see the Digital Editions for Corpus Linguistics (DECL) homepage).

The article introducing the Helsinki Corpus Middle English Section was written in 1990-91 and must be read as a historical document dated to those years. At the time of writing, the four-volume A Linguistic Atlas of Late Mediaeval England (1986, LALME) had just come out. A major international project, The Index of Middle English Prose (1984-, IMEP), had been under way for some years, but only a few volumes had been completed. The Index of Printed Middle English Prose (1985, IPMEP), with bibliographical references to Middle English texts, was available.

Our knowledge of the underlying manuscript reality has improved greatly in recent years. The late medieval period was the time of vernacularisation of genres and registers of writing, from the Bible to scientific and utilitarian texts, and of the transfer of literary genres (see Wogan-Browne et al. 1999). Vernacularisation was a pan-European phenomenon and took place on a large scale in various countries in the fourteenth and fifteenth centuries (see Crossgrove et al. 1998). For example, by the end of the medieval period, texts on scientific and technological subjects were becoming increasingly common in vernacular languages all over Europe. The extent of the process in English was unknown to us until Scientific and Medical Writings in Old and Middle English: An Electronic Reference (eVK), compiled by Linda Voigts and Patricia Kurtz, was completed in 2000. Today, this electronic database is an indispensable source of information about the underlying manuscript reality (see Introduction to MEMT). Another important, more recent source of information in the area is George Keiser's volume A Manual of the Writings in Middle English 1050-1500, Vol. 10, Works of Science and Information, from 1998. Significant new discoveries about the manuscript reality of the Early Middle English period have been made with the progress of the A Linguistic Atlas of Early Middle English (LAEME) project (see e.g. Laing 2000 and the LAEME website). Other, more general major improvements in Middle English research have come with the launch of electronic dictionaries and databases. The Middle English Dictionary (1956-2001, MED) was earlier available only in paper copies, but it is now accessible on-line as part of the Middle English Compendium, together with a HyperBibliography of Middle English Prose and Verse (based on the MED bibliographies). The Compendium also includes a Corpus of Middle English Prose and Verse.

New lines of development have been inspired by HC, and at present the situation is very different from that described in the 1990-91 article. For example, the Late Middle English part of HC contains 2000-word samples of a few medical texts. The need for a larger corpus became evident with pilot studies on scientific writing in HC (Taavitsainen 1994). The corpus of Middle English Medical Texts (MEMT) grew out of this need. It consists primarily of editions of medical treatises from c. 1375 to c. 1500 and an appendix of texts written c. 1330, but it also contains some new transcriptions (see MEMT Introduction and Catalogue). In a parallel way, the Corpus of Early English Correspondence (CEEC) drew its inspiration from the letters included in HC (see Nevalainen and Raumolin-Brunberg 2003).

The article discussed above should be read as a historical text dated 1990 (its most recent references are to 1990), and we refer the reader to the reference works mentioned above for more recent knowledge on the topic.


