The Middle English Grammar Corpus (MEG-C)
The Middle English Grammar Corpus (MEG-C) consists of samples of Middle English texts, transcribed from manuscript or facsimile reproduction. Shorter texts are included in their entirety, and longer ones in 3000-word samples. In the first instance, we include texts localised in the Linguistic Atlas of Late Mediaeval English, from the period 1350–1500. However, the Corpus will eventually also cover earlier texts, as well as texts showing non-regional varieties of Middle English.
Project leader: Merja Stenroos, University of Stavanger
Time of compilation:
Size: 450,000 words (version 2009.1)
Number of texts/samples: 320 (version 2009.1)
Released: 2008 (version 1.0); 2009 (version 2009.1)
Funding: Norwegian Research Council; University of Stavanger; AHRB; University of Glasgow
Project home page: http://www.uis.no/research-and-phd-studies/research-areas/history-languages-and-literature/the-middle-english-scribal-texts-programme/meg-c/
Reference lines and copyright
"MEG-C Base, version 2009.1", The Middle English Grammar Corpus, Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith (compilers), December 2009, University of Stavanger, accessed [date], <http://www.uis.no/research/culture/the_middle_english_grammar
Stenroos, Merja & Martti Mäkinen. 2009. MEG-C Corpus Manual – version 2009.1.
PDF available at
Merja Stenroos, Martti Mäkinen, Simon Horobin, Jeremy Smith
The following people have taken part in transcribing and proofreading the texts for MEG-C:
Christina Jerez Delgado
Mari Munthe Landsnes
Nedelina Vasileva Naydenova
Jeremy J. Smith
Kjetil Vikhamar Thengs
Free to download from the project website.
The Corpus is provided in three different formats. The "base" version, MEG-C Base, preserves our own coding and commentary and gives the fullest information about the text as it appears in the manuscript. MEG-C Html is published in two versions, as html and as pdf files, both designed for easier reading and browsing. Both the text files of MEG-C Base and the pdf files of MEG-C Html may be downloaded as zip archives.
MEG-C is still in preparation, but in the meantime, the corpus is made available to the research community in its unannotated form. A preliminary version was made available on the project website in December 2007, and MEG-C version 1.0 was launched in April 2008. The corpus will be updated regularly as more texts are added; however, each published version will remain available. Apart from the present Manual, the corpus is accompanied by a Catalogue of Sources, available on the corpus web site, which will also be updated for each version.
There is no search function implemented on the web site yet. The recommendation is that the text files are downloaded and then used with text processing or corpus software of one's choice. The downloadable files of MEG-C Base are UTF-8 encoded and the end-of-line coding follows the UNIX format. However, the files are ASCII compatible: we use only the first 128 characters of the UTF-8 set (code points 0 to 127), and those are identical with the basic ASCII set. Therefore the text files are suitable for any concordancing program that can read ASCII, e.g. AntConc or WordSmith. As the transcription methods distinguish between upper and lower case letters for several purposes, we advise that the chosen program support case sensitivity.
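For readers who prefer a script to a dedicated concordancer, the points above (ASCII-only content, UNIX line endings, case-sensitive matching) can be sketched in Python. This is only an illustration under stated assumptions: the function names check_encoding and kwic are hypothetical, the sample string is invented and does not follow the MEG-C transcription conventions, and nothing here is part of the corpus distribution.

```python
import re
from typing import Dict, List


def check_encoding(raw: bytes) -> Dict[str, bool]:
    """Report whether the bytes are pure ASCII and use UNIX (LF-only) line endings."""
    return {
        "ascii_only": all(b < 128 for b in raw),  # code points 0 to 127 only
        "unix_eol": b"\r" not in raw,             # no CR, so no DOS or old Mac endings
    }


def kwic(text: str, keyword: str, width: int = 20) -> List[str]:
    """Return case-sensitive keyword-in-context lines for whole-word matches."""
    # No re.IGNORECASE flag: upper and lower case are kept distinct on purpose.
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Flatten line breaks so each hit prints on one line.
        hits.append((left + "[" + m.group() + "]" + right).replace("\n", " "))
    return hits


if __name__ == "__main__":
    # Invented sample bytes standing in for a downloaded text file (assumption).
    raw = b"The kyng and the quene\nand THE lord of the toun\n"
    print(check_encoding(raw))
    for line in kwic(raw.decode("ascii"), "the"):
        print(line)
```

In practice the bytes would be read from a downloaded file with open(path, "rb").read(), where path is whatever local name the file was saved under. Decoding with "ascii" rather than "utf-8" is a deliberate choice here: it fails loudly if a file turns out to contain anything outside the ASCII range.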
Middle English Grammar Project (MEG)
CoRD Entry submitted on April 9, 2010 by ?.
Information for the entry was edited by ?.