The Middle English Grammar Corpus (MEG-C)

The Middle English Grammar Corpus (MEG-C) consists of samples of Middle English texts, transcribed from manuscript or facsimile reproduction. Shorter texts are included in their entirety, and longer ones in 3000-word samples. In the first instance, we include texts localised in the Linguistic Atlas of Late Mediaeval English, from the period 1350–1500. However, the Corpus will eventually also cover earlier texts, as well as texts showing non-regional varieties of Middle English.

Project leader: Merja Stenroos, University of Stavanger
Time of compilation: 2008 (version 1.0); 2009 (version 2009.1)
Size: 450,000 words (version 2009.1)
Language: English
Number of texts/samples: 320 (version 2009.1)
Period: 1350–1500
Funding: Norwegian Research Council; University of Stavanger; AHRB; University of Glasgow
Reference lines and copyright

"MEG-C Base, version 2009.1", The Middle English Grammar Corpus, Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith (compilers), December 2009, University of Stavanger, accessed [date], < _project/meg-c_base/>.


Stenroos, Merja & Martti Mäkinen. 2009. MEG-C Corpus Manual – version 2009.1.

Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith

The following people have taken part in transcribing and proofreading the texts for MEG-C:

Vibeke Jensen
Christina Jerez Delgado
Simon Horobin
Mari Munthe Landsnes
Eleanor Lawson
Martti Mäkinen
Nedelina Vasileva Naydenova
Cerwyss O’Hare
Jeremy J. Smith
Merja Stenroos
Hildegunn Støle
Kjetil Vikhamar Thengs
Judith Youngson


The Corpus is provided in three different formats. The "base" version, MEG-C Base, preserves our own coding and commentary and gives the fullest information about the text as it appears in the manuscript. MEG-C Html is published in two forms, as html files and as pdf files, both designed for easier reading and browsing. Both the text files of MEG-C Base and the pdf files of MEG-C Html may be downloaded as zip archives.

MEG-C is still in preparation, but in the meantime the corpus is being made available to the research community in its unannotated form. A preliminary version was made available on the project web site in December 2007, and MEG-C version 1.0 was launched in April 2008. The corpus will be updated regularly as more texts are added; however, each published version will remain available. Apart from the present Manual, the corpus is accompanied by a Catalogue of Sources, available on the corpus web site, which will also be updated for each version.

Technical information

There is no search function implemented on the web site yet. We recommend downloading the text files and using them with text processing or corpus software of one's choice. The downloadable files of MEG-C Base are UTF-8 encoded, and the end-of-line coding follows the UNIX format. However, the files are ASCII compatible: we use only the first 128 characters of the UTF-8 set (code points 0–127), and those are identical with the characters of the basic ASCII set. The text files are therefore suitable for any concordancing program that can read ASCII, such as AntConc or WordSmith. As the transcription methods distinguish between upper and lower case letters for several purposes, we advise that the chosen program support case sensitivity.
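For users who prefer a script to a concordancing program, the properties described above can be checked before processing. The following is a minimal Python sketch and not part of the MEG-C distribution: the function names are hypothetical, the sample line is invented for illustration and is not taken from the corpus, and splitting on whitespace is only an assumed stand-in for real tokenisation, since it ignores the corpus coding conventions described in the Manual.

```python
from collections import Counter


def check_plain_text(data: bytes) -> dict:
    """Report whether raw file bytes are pure ASCII with UNIX (LF) line endings."""
    return {
        # Every byte below 128 means the file is ASCII as well as valid UTF-8.
        "ascii_only": all(b < 128 for b in data),
        # UNIX line endings use LF only, so no carriage return should occur.
        "unix_line_endings": b"\r" not in data,
    }


def count_tokens(text: str, case_sensitive: bool = True) -> Counter:
    """Count whitespace-delimited tokens, keeping case distinctions by default."""
    if not case_sensitive:
        text = text.lower()
    return Counter(text.split())


# Invented sample line (assumption), standing in for the bytes of a downloaded file.
sample = b"The kyng and THE quene and the knyght\n"

report = check_plain_text(sample)
counts = count_tokens(sample.decode("ascii"))

print(report)
print(counts["the"], counts["The"], counts["THE"])
```

With case sensitivity on, "the", "The" and "THE" are counted as three separate forms. Passing case_sensitive=False would merge them, which loses whatever information the transcription encodes through case.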

Associated projects

Middle English Grammar Project (MEG)