The Malaga Corpus of Late Middle English Scientific Prose

The corpus contains the results of the project "Electronic Corpus of Mediaeval English Manuscripts: Scientific and Technical Texts", developed at the University of Málaga in collaboration with researchers from the universities of Murcia, Oviedo, Jaén and Glasgow. The project has a twofold objective: a) the electronic edition of hitherto unedited late Middle English scientific manuscripts housed in the Hunterian Collection (Glasgow University Library) and the Wellcome Collection (The Wellcome Library, London), displaying the digitized images along with the corresponding diplomatic transcription; and b) the compilation of an annotated corpus of late Middle English Fachprosa from this material, the annotation comprising mark-ups, lemmas and POS-tags.

Two versions of each text are available:

  • A plain text version
  • An annotated and POS-tagged version (Excel spreadsheets)

Project leaders: Javier Calle-Martín, Antonio Miranda-García (until 2012)
Time of compilation: 2012-2015 (estimated)
Size: Approx. 1 million words
Language: Late Middle English
Period: 14th - 15th Century
Project home page:
Funding: The Spanish Ministry of Science and Innovation (reference FFI2011-26492)


Available on project homepage.


Javier Calle-Martín, University of Málaga
Antonio Miranda-García, University of Málaga
David Moreno-Olalla, University of Málaga
Juan Camilo Conde-Silvestre, University of Murcia
Laura Esteban-Segura, University of Murcia
Teresa Marqués-Aguado, University of Murcia
Graham D. Caie, University of Glasgow
Santiago González Fernández-Corugedo, University of Oviedo
Alejandro Alcaraz-Sintes, University of Jaén


Open access on request. Work still in progress.

Available online at

Reference line and copyright

© Antonio Miranda García, Javier Calle Martín, David Moreno Olalla, Santiago González Fernández Corugedo, Graham D. Caie, 2007-15. All rights reserved.


Esteban-Segura, Laura and Teresa Marqués-Aguado. 2013. "New Software Tools for the Analysis of Computerized Historical Corpora: GUL MSS Hunter 509 and 513 in the Light of TexSEn". In Vincent Gillespie and Anne Hudson (eds.). Probable Truth. Editing Medieval Texts from Britain in the Twenty-First Century. Turnhout: Brepols. 405-426.

Calle-Martín, Javier and Antonio Miranda-García. 2012. "Compiling the Malaga Corpus of Late Middle English Scientific Prose". In Nila Vázquez (ed.). Creation and Use of Historical English Corpora in Spain. Newcastle upon Tyne: Cambridge Scholars Publishing. 51-65.

Calle-Martín, Javier et al. 2012. "The Reference Corpus of Late Middle English Scientific Prose". Proceedings of KONVENS 2012 - The 11th Conference on Natural Language Processing. Vienna: University of Vienna. 424-432.

Calle-Martín, Javier and Antonio Miranda-García. 2011. "From the Manuscript to the Screen: Implementing Electronic Editions of Mediaeval Handwritten Material". Studia Anglica Posnaniensia 46.3: 3-20.

Moreno-Olalla, David and Antonio Miranda-García. 2009. "An Annotated Corpus of Middle English Scientific Prose: Aims and Features". In Javier Díaz Vera and Rosario Caballero Rodríguez (eds.). Textual Healing: Studies in Middle English Medical, Scientific and Technical Texts. Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford and Wien: Peter Lang. 123-141.