Series title: Studies in Variation, Contacts and Change in English
Volume 14 – Principles and Practices for the Digital Editing and Annotation of Diachronic Data
Publication date: 2013

The Corpus of Early English Medical Writing (1375–1800) – a register-specific diachronic corpus for studying the history of scientific writing

Irma Taavitsainen and Päivi Pahta
Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki


This article deals with the Corpus of Early English Medical Writing (CEEM), which consists of Middle English Medical Texts (MEMT), Early Modern English Medical Texts (EMEMT), and Late Modern English Medical Texts 1700–1800 (LMEMT).

CEEM timeline

CEEM is a register-specific diachronic corpus, designed to serve as research material for the diachronic study of English as the special language of science and medicine in its disciplinary embedding. The corpus contributes to the new digital infrastructure for linguistic research by making available a carefully selected body of texts. The first two sub-corpora have been publicly available for a number of years, and have already proved useful not only for linguistic research but also for philological, medical and cultural study of the register. CEEM reflects a broad view of science and medicine with fuzzy borders, as it takes the medieval, early modern and late modern views into account. MEMT contains a great deal of material that was previously unknown to researchers and pushes the timeline of English scientific writing back to the large pan-European boom of the vernacularization of science that began in the late medieval period (c. 1375– ); traditionally the beginnings of English scientific writing were placed in the Royal Society period in the seventeenth century. EMEMT introduces several novel features to corpus design and makes contextualization and multimodal approaches possible. It also includes normalized versions of texts for the application of advanced corpus linguistic tools. In the late modern period the volume of available material increases considerably. This poses an interdisciplinary challenge, and collaboration with medical historians is necessary to ensure the representativeness of the corpus data. LMEMT will follow the principles set by earlier work and provide a representative corpus of c. 2 million words.

1. Introduction

New computerized databases have revolutionized our ideas about language, and empirical work on authentic source material of the early stages of scientific writing has proved older views one-sided and even false. Language histories tend to place the beginnings of scientific writing in the seventeenth century and emphasize the role of the Royal Society in the formation of writing conventions, paying tribute to the breakthrough of English as the language of science in England c. 1700. [1] The following quotation from 1993 illustrates this:

The seventeenth century saw the triumph of the scientific outlook in England, and the sciences have had a pervasive influence on the language and the way it has been used in the past three hundred years. As we have seen, Latin gave way to English as the language of science and scholarship. The rise of scientific writing in English helped to establish a simple referential kind of prose as the central kind in Modern English. … and what we may call the plain style became central, the background against which other kinds of prose were read. The plain style is not of course confined to science, but is found in all kinds of expository writing … (emphasis original; Barber 1993: 214–215)

The large pan-European boom of vernacularization of scientific writing in the late medieval period (1375– ) has been discussed more extensively in linguistically-oriented research only from the 1990s onwards. Yet the old views are persistent. As late as 2008, David Banks writes:

If one were to select a moment as being the time when scientific English first came into being, then the late seventeenth century would be the primary candidate. Up to that point, virtually all scientific writing had been in Latin, the academic lingua franca of the age, but from that moment on, more and more scientific writing was published in the vernaculars of the period. (Banks 2008: 23)

He continues with the view presented by Halliday in 1988, attributing credit to Isaac Newton as “the person who first gave scientific English its letters of noblesse…” and mentioning Chaucer’s Treatise on the Astrolabe as a rare representative of earlier scientific writing. [2]

But scientific texts were in fact written in English much earlier, often in a style that cannot be called “plain”, as shown by the following passage from a fifteenth-century vernacular translation of a Latin theoretical text:

Example 1
[}Incipit Liber Cerebri}]
[\f. 40v\] The brayn naturaly is cold and moist to
suscepcioun or takyng and therfor as of dyuers lightly it
turnyth, and as to moevyng membris leuyth and yevith the
mobilite, and as to calefaccion and drie spirite to the
hede of temperat breth bryngith in of whos myngis is
cold, drie and thikke. Withyn whom forsoth bien divisions
or departynges. In the first is saide fantastic, the
secund racional, in the thridde memorial. Withyn
fantastic and racional is a cloth sumwhat cold and drie
and thikker in so moche, whiche dividith racional and
memorial havyng in hym a litel weike and thyn flessh. Of
memorial forsoth proceden 2 weike canals and moist as the
mary in the spindel of the bak, whiche percen al the
compact made, and comen vnto the fantastic testule bi
the whiche fantastic spirite and racional willith to
commende memory, and eftsones of memorial to leede to
racional and fantastical.

The text is known as De humana natura, and the extract in example 1 deals with the anatomy and physiology of the brain. The English translator of the Latin text had difficulties in rendering the ideas in the vernacular, as English did not have the terminology or stylistic and grammatical conventions necessary for expressing sophisticated theoretical concepts and their relationships. In the early phases of vernacularization these conventions were often adopted directly from Latin, and thus, in many cases, some familiarity with the Latin source text is necessary for interpreting meanings in the vernacular version.

The catalogue entry of De humana natura in the corpus of Middle English Medical Texts (2005) gives the following description of the text:

MS: Trinity College Cambridge R.14.52, ff. 40v-44r.
Carrillo Linares, M.J.: An Edition with an Introduction, Notes and a Glossary of the Middle English Translation of the Latin Treatise De humana natura.
Unpublished Licentiate Thesis.
University of Sevilla, 1993.
pp. 54-91.
3,546 words.

eVK: 6635
Keiser: --

Description: De humana natura is a literal and heavy-handed Middle English translation of a Latin text known as De humana natura or De natura humana. The original text has been ascribed to various authors, including Galen, Hippocrates and Constantinus Africanus. The text deals with the main organs of the body and their development in a fetus. The Middle English version survives in a unique late fifteenth-century copy; in the manuscript, the text bears the title Liber cerebri. The manuscript was written by a scribe known as the "Hammond scribe", whose hand can be identified in at least 15 manuscripts or parts of manuscripts… Two other texts from the same manuscript and by the same translator are contained in MEMT (De spermate and De XII portis).

De humana natura is one of the texts included in the Middle English part of the historical corpus introduced in this article, the Corpus of Early English Medical Writing (CEEM hereafter). In addition to this text, the corpus contains other early vernacular medical texts from the fourteenth and fifteenth centuries, thus pushing back the beginnings of English medical and scientific writing three centuries from the earlier point of departure in scholarship, described in the opening quotation from Barber (1993). At the other end of the timescale, CEEM currently contains material up to 1700 and is being extended to 1800. When completed, the corpus will provide a representative sample of English medical and scientific writing of over four million words, readily available for a variety of research purposes to a multidisciplinary scholarly community, including linguists, philologists, and historians of science and medicine.

2. Scientific thought-styles and the plan for the Corpus of Early English Medical Writing

CEEM was originally planned to provide material for our project on Scientific thought-styles: The evolution of English medical writing, launched in 1995 (see the project description). The aim of the project was to discover what linguistic features were used in the writings of different periods in the history of science, characterized by different scientific ideologies and ways of doing science, i.e. scientific thought-styles. Our special interest in the initial phases, stemming from general histories of science, was to map linguistic differences between scholasticism and empiricism. We soon discovered that linguistically the issue was much more complicated than the history of science and language histories had led us to believe, and that there were several different styles even in the early period. Nevertheless, the core features of scholasticism could be readily identified. They include various types of prescriptive phrases and heavy reliance on authorities. This is shown in medieval English texts, for example, in the frequent use of references to and quotations from ancient Greek and Arabic authors (Taavitsainen & Pahta 1998).

The following extract is taken from a Middle English surgical treatise with several intertextual layers. It is written in scholastic style, and transferred from Latin into the vernacular for the first time in this period.

Example 2
[\f. 11ra\] þe anothamye, þe which I may proue
bi þre skillis. The firste bi auctorite.
The secunde bi similitude. The
.iij. bi resoun. Bi auctorite as þus, as
Galien witnessiþ in Metateigni, capitulu=m= 1=m=, where
he puttiþ þe cure of woundis &c. seiynge
on þis wise. It is nedeful a
sirurgian keruere to knowen þe anothamye;
þat he leue <or gesse> þe brode ligament
to be þe skyn in þe round
ligament to be <a> senewe; and so he
falle in-to errour in his worchinge,
&c / Bi similitude it mai be openly
proued bi discrecioun, for in þe same wise;
kutte=þ= an vnkunnynge sirurgian
in a mannys bodi þat knowiþ not þe
anothamye, as doiþ a carpenter þat
knowiþ not his craft whanne he
kerueþ in tree / Bi resoun þus: þer
may no craftis-man worche regulerly
in suget or matir þat he knowi=t=h no=t=.

The catalogue entry of the text in the corpus of Middle English Medical Texts contains the following information:

MS: Wellcome Medical Library 564, ff. 10ra-22ra.
Grothé, R.: Le ms. Wellcome 564: Deux traites de chirurgie en moyen-anglais.
Unpublished PhD Thesis.
University of Montreal, 1982.
pp. 1a-25a.
11,036 words.

eVK: 936
Keiser: 256

Description: The name of the text Chirurgie de 1392 is given by the editor to the first of the two surgical treatises included in MS Wellcome 564 (ff. 10a-128a). The text has also been called the Compilacioun of sirurgie. This text is exceptional among Middle English surgical treatises as it is not a direct translation of Latin but a more original compilation from various sources, including passages of Lanfrank and Mondeville, interspersed with more personal observations (Rawcliffe 1995: 132). According to the table of contents, some parts are missing from the manuscript. The compiler was a London surgeon, possibly also a physician, working in 1392. The text has also been edited by R.N. Mory (1977).

In contrast, the top genres of empiricism, such as the experimental essay and accounts of scientific discoveries, are expressed with hedges of uncertainty and caution in first-person expressions displaying modality (see, e.g. Taavitsainen 2001).

Example 3
Now as to the case of the young Gentleman before mentioned, I supposed either the Muscle by that convulsive starting Motion in the Womb to be overstrained, and to have lost its Action; or the Membrane by that greater aperture of the Organ to be over-stretched, and afterwards to remain so flaccid, that it was beyond the activity of the Muscle and Curviture of the Ossicles to give it a due Tension; or peradventure there was a concurrence of both Causes. Which due tension, if by any remedy it might be restored I assum'd, that he might recover his hearing in that Ear: To which end, I advised the Excellent Lady his Mother, to consult with Learned Physitians, if by some adstringent Fumes, or otherwise he might find help. (PT 1668_PT3_665-8)

These examples in many ways represent the opposite ends of the stylistic continuum and illustrate the contrast between the medieval scholastic and early-modern empirical styles of scientific thinking. Our research based on CEEM to date has, however, shown that the strict correlation between time and style of writing that served as our starting point is an oversimplification, and the picture has already proved more dynamic and diversified than was previously known. For example, features of learned medieval scholastic style are employed for persuasive ends in more popular texts in the Early Modern period, lending an aura of learning to them. At the same time, general references that indicate vagueness in medieval texts for heterogeneous audiences, e.g. “learned men”, acquire new meanings in the new discourse communities of the seventeenth century, where the members knew exactly who was being referred to with the same phrase (Taavitsainen 2009).

3. Description of the Corpus of Early English Medical Writing (CEEM)

CEEM is a register-specific diachronic corpus, purpose-designed to serve as research material for studying the long-term diachronic evolution of English as the special language of science and medicine in its disciplinary embedding. With our corpus work we contribute to the digitized databases, the new infrastructure for linguistic research, by making a carefully selected body of texts available to other scholars, with a purpose-built corpus tool (for corpus compilation principles, see e.g. McEnery et al. 2006). The corpus consists of three subcorpora: Middle English Medical Texts (MEMT, 1375–1500), Early Modern English Medical Texts (EMEMT, 1500–1700) and Late Modern English Medical Texts (LMEMT, 1700–1800); the first part was released in 2005, the second in 2010, and the third is in preparation. The composition of each subcorpus reflects the domain-specific extralinguistic reality of English medical writing during the relevant time period, realized in the growing bulk of data in the subcorpora over time.

CEEM reflects a broad view of science and medicine with fuzzy borders. Science includes knowledge of the order of the world, natural phenomena, and the laws that govern our existence, and this fundamental aspect of culture is reflected in extant written documents. However, what counts as science and scientific writing has undergone fundamental changes in the course of time. In medical writing the field has become narrower in modern times with the introduction of a strict borderline between science and pseudo-science, though historically this line is fuzzy. Astrology is a good example of the vacillation of disciplines between science and pseudo-science. In medieval science, astrology, coupled with astronomy as one area of knowledge, was progressive and mainstream with applications for several domains. In medicine, it provided the means of defining, for example, the correct time for taking medicine, performing operations, or conceiving a healthy child. Medical astrology was at its strongest in the sixteenth century, after which it began to lose its scientific credibility, but the same doctrines, with similar applications, have continued to circulate in popularized medical writings aimed at non-professional audiences for centuries after this change. CEEM contains texts that, according to modern criteria, can be considered borderline cases.

The corpus also reflects a comprehensive view of the domains of science and medicine in another respect, as it includes material pertaining to different traditions of writing aimed at varying target audiences. The texts range from the strata of highest learning, originally circulating among the most highly-educated medical professionals, to practical health guides and other instructions written for the general public. This feature of the corpus design is also related to the specific nature of medicine among sciences, with its special position between theory and practice: on the one hand, medicine is an area of knowledge, a science and a learned university discipline, and on the other hand, an art, an occupation involving the application of technical skills and knowledge to medical praxis. Medical education comprised general theories and specialist doctrines, but also practical applications through apprenticeship. What is new in our corpus design is that we have incorporated contextualizing information into our database to facilitate the anchoring of texts to audiences and levels of authors’ education and other extralinguistic factors.

Compiling a special-language corpus presents an interdisciplinary challenge, as it requires special knowledge of the domain in question, in this case, the history of science and medicine. We have met this challenge in two ways. Firstly, all members of the corpus team have familiarized themselves with studies in the history of science and medicine, with some division of labour for developing complementary types of expertise. Secondly, we have cooperated with historians of science and medicine, and consulted them, for example, on our text selections to ensure the representativeness of the corpus data in view of the extant text population within the field.

3.1 Middle English Medical Texts (1375–1500)

MEMT Corpus Cover

The corpus of Middle English Medical Texts (MEMT) was published on CD-ROM in 2005 (Taavitsainen et al. 2005). The corpus software included on the CD, a purpose-built software program MEMT Presenter, was designed by Raymond Hickey. MEMT includes about half a million words of running text, containing a representative sampling of vernacular English writing in the domain of medicine from the scholastic period, c. 1375 to 1500, including some medical verse texts, and a small appendix of trilingual Latin-French-English writings from c. 1330. The corpus comprises 86 texts of varying length; most samples are c. 10,000 words, while short texts are included in toto.

In the corpus structure, organized in MEMT Presenter into a tree view, the medical prose texts, which form the main bulk of the corpus data, are divided into three categories. These categories represent a widely accepted tripartite division of Middle English medical texts, first suggested by Voigts (1982, 1984), into academic writing, surgical writing, and remedybooks. In our corpus compilation, we adopted this division, but modified it on the basis of our own knowledge of the field. In MEMT the text categories are labelled surgical texts, specialized texts, and remedies and materia medica.


Figure 1. Distribution of texts in the Middle English Medical Texts corpus.

Surgical texts and specialized treatises belong to the learned traditions of medical writing, going back to academic origins, with emerging vernacular adaptations and translations from the last quarter of the fourteenth century onwards. These texts represent a number of genres mainly derived and transferred from Latin writings, including learned genres like commentaries, compilations, questions and answers, as well as consilia and practica, which were basically case studies. The category of remedies and materia medica consists of recipes, advice for a better life in prose and verse, various regimen texts, as well as prognostications and charms, which verge on the occult. Texts in this category have mixed origins, including both learned texts from classical sources and medical lore with some traits going back to Old English (Rubin 1974, Voigts 1984).

The MEMT CD-ROM contains an extensive introduction to the corpus, including an account of the compilation principles and coding conventions. The text catalogue, which forms an important part of the introduction, provides the corpus user with a detailed description of each individual corpus text and its textual and historical background. Each catalogue entry contains the bibliographical information of the source text, identification numbers of the text in some major reference works, and some background information about the text. Initially we collected these facts for our own research purposes from introductory chapters to the editions, published research, manuscript catalogues, and various other sources, but we decided to make them available on the CD for the benefit of other corpus users as well. The catalogue entries in section 1 and section 2 above serve to illustrate this feature of metadata in MEMT.
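The structure of a catalogue entry can be made concrete with a small data model. The following Python sketch is purely illustrative: the class name, the field names and the helper function are hypothetical and do not reproduce the internal format of MEMT Presenter. The values are taken from the De humana natura entry quoted in section 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical model of one catalogue entry (field names are assumptions)."""
    title: str
    manuscript: str                # shelfmark and folio range
    edition: str                   # bibliographical details of the source edition
    word_count: int
    evk_number: Optional[int]      # eVK reference number, if any
    keiser_number: Optional[int]   # None when the entry gives "--"
    description: str


def parse_reference(raw: str) -> Optional[int]:
    """Turn a reference field such as '6635' or '--' into an int or None."""
    raw = raw.strip()
    return int(raw) if raw.isdigit() else None


entry = CatalogueEntry(
    title="De humana natura",
    manuscript="Trinity College Cambridge R.14.52, ff. 40v-44r",
    edition="Carrillo Linares, M.J., unpublished licentiate thesis, "
            "University of Sevilla, 1993, pp. 54-91",
    word_count=3546,
    evk_number=parse_reference("6635"),
    keiser_number=parse_reference("--"),
    description="Literal Middle English translation of a Latin text "
                "on the main organs of the body.",
)
print(entry.title, entry.word_count, entry.keiser_number)
```

Representing a missing reference number as None rather than as the string "--" would allow entries to be filtered or sorted by reference number without special cases.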

In addition to describing the corpus and the individual corpus texts, the introduction also includes a general description of Late Middle English medical writing, which helps the end user to understand the special nature of the historical and disciplinary context in which the corpus texts were originally written and used. There is also an account of research based on MEMT at the time of publication. Finally, the CD contains a manual of the MEMT Presenter program.

MEMT is primarily based on printed editions. This was a necessity as the material contains several layers of transmission and is too complex to be transcribed directly from the manuscripts, as it requires painstaking research into the origins to make sense of the text (see e.g. Pahta 1998 and Tavormina 2006). In addition to published editions, we included all unpublished theses of various universities that we knew of and were granted permission to publish; these texts are not readily available elsewhere. The process of copyright acquisition was time-consuming as copyright holders had to be traced. [3] In some cases, publication fees were so high that only short extracts could be included. However, when judging critically the overall representativeness of the corpus in the last phase before its completion, we decided to include some manuscript work even though we are acutely aware of the difficulties. Thus some important texts were transcribed for the first time, including Galen’s De ingenio sanitatis (Methodus medendi), the only extant Middle English translation of Galen’s authentic writing, and Thesaurus pauperum in dialogue form. In our compilation process, a great deal of checking was done to ensure the accuracy of the editions, and we found some important texts lacking in this respect. Since the texts were well-known and deserved to be included, we went back to the manuscripts. At the time of releasing MEMT in 2005, a number of editions of learned theoretical texts in Trinity College, Cambridge, MS R.14.52, one of the most important Middle English codices containing scientific and medical texts, were still in preparation and could not be included. These editions were published in 2006 in two volumes as Sex, Aging and Death in a Medieval Medical Compendium, edited by Tavormina.
Taking a critical look at MEMT now, several years after its release, we consider that our collection of late medieval medical texts would be fruitfully complemented with the editions of these texts to improve the representativeness of learned academic treatises translated and adapted into English in the fifteenth century.

MEMT has been used as research material in a large number of studies by the members of the Scientific thought-styles team; summaries of this research are available in the introduction to MEMT in Taavitsainen et al. (2005), Taavitsainen et al. (2006) and Lehto et al. (2010). Most of this research has been carried out in the general frame of the project, focusing on the specific features of scientific and medical writing. A similar focus on the special language of science and medicine is also found in a study by Biber et al. (2011), examining the uses of prepositional phrases in early English medical prose. Other research based on MEMT includes Alonso-Almeida’s (2009) study of null objects, Carroll’s (2008) analysis of extender tags, such as and so on, in historical data, and Esteban Segura’s (2011) study of suffixal doublets in Middle English.

3.2 Early Modern English Medical Texts (1500–1700)

EMEMT Corpus Cover

The second subcorpus, Early Modern English Medical Texts (EMEMT), was released in 2010 (Taavitsainen et al. 2010) on CD-ROM with EMEMT Presenter, purpose-designed software by Raymond Hickey. Unlike MEMT, the corpus was published with a book, Early Modern English Medical Texts: Corpus Description and Studies (ed. by Taavitsainen & Pahta 2010), which discusses the contents of the corpus and includes a survey of earlier research, some new studies on the corpus data, and a manual of EMEMT Presenter.

EMEMT contains a two-million word representative sample of the entire field of English medical writings (see below) that appeared in print between 1500 and 1700. The decision to include only printed books was necessary for practical reasons. [4] The text selection provides continuity to MEMT, but at the same time also reflects the increasing volume and broadening scope of medical writing in English. The corpus includes c. 230 texts, ranging from theoretical treatises rooted in academic traditions of medicine to popularized and utilitarian texts verging on household literature.

In the corpus structure, the texts are grouped into six text categories that facilitate systematic research into the history of medical writing in its disciplinary context. While the MEMT traditions continue, the scope of writings is different in the Early Modern period. The bulk of material increases and new types of writing emerge. The first category is called general treatises and textbooks (category 1) and consists of texts that claim to include “all physicke” and are often miscellanies of various fields. Treatises on specific topics (category 2) is the largest category, subdivided into five subgroups: a) texts on specific diseases; b) texts on specific methods of diagnosis or treatment; c) texts on specific therapeutic substances; d) texts on midwifery and children’s diseases; and e) texts on plague. Categories 3 and 4 continue the remedybook tradition: recipe collections and materia medica covers a wide field from texts bordering on magic to highly sophisticated treatises, while regimens and health guides focus on preventative medicine and are equally varied. Surgical and anatomical treatises (category 5) is a direct descendant of the respective MEMT category, but there is variation within the texts. Samples of the first scientific journal, Philosophical Transactions, are given in the sixth category. We have also included an Appendix of “Medicine in society”, consisting of texts which are not strictly speaking medical but include, for instance, social satires of medical doctors, an allegorical representation of the human body, and literature inspired by medical issues or depicting the hard times of the Black Death.

In addition to providing samples of texts in their original spelling (proofread thrice and checked against the originals in various scholarly libraries), we have included normalized versions with standardized spelling produced in collaboration with Alistair Baron and Paul Rayson from the UCREL Research Centre at Lancaster University. The analysis was performed automatically with a manually-trained version of the VARD (Variant Detector) software. These normalized versions enable the application of some advanced corpus linguistic methodologies like the key word function or collocational analysis, and they also facilitate lexical studies by securing precision in wordlists. The EMEMT Presenter allows the user to move from the standardized version to the original, which is a necessary precaution to prevent confusion between the different text versions. [5]
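The relation between an original-spelling text and its normalized counterpart can be illustrated with a toy example. The Python sketch below is not the VARD algorithm and does not use its interface; it assumes a small hand-made variant list (hypothetical, built from spellings that occur in Example 5 of this article) and keeps each original token alongside its normalized form, so that it is always possible to move back from the standardized version to the original.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical variant list; a real normalizer is trained on far more data.
VARIANTS: Dict[str, str] = {
    "bloud": "blood",
    "vayne": "vein",
    "preuenting": "preventing",
    "infirmitie": "infirmity",
}


def normalize(text: str) -> List[Tuple[str, str]]:
    """Return (original, normalized) pairs for every word token in text."""
    pairs: List[Tuple[str, str]] = []
    for token in re.findall(r"[A-Za-z]+", text):
        # Unknown spellings are kept unchanged.
        standard = VARIANTS.get(token.lower(), token)
        pairs.append((token, standard))
    return pairs


sample = "the letting out of bloud by the opening of a vayne"
pairs = normalize(sample)
normalized_text = " ".join(standard for _, standard in pairs)
print(normalized_text)
```

Because the pairs are kept together, a wordlist compiled from the normalized forms groups spelling variants under one entry, while the original forms remain available for philological work.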

It is no exaggeration to say that EMEMT represents a new way of thinking and enters a new phase in corpus compilation in many respects, as we have introduced several novel ideas of corpus design. Our approach is best described as “pragmaphilological” as we consider the cultural, sociohistorical, and communicative contexts necessary for an adequate interpretation of the linguistic data especially in pragmatic, sociolinguistic, and philological studies. As in MEMT, a wealth of background data has been made available to the end user of the CD in an easily accessible way. Each text has a comprehensive catalogue entry of facts gathered from various sources (see below).

Several fundamental changes took place in medical writing during the time span of the corpus 1500–1700 as the world view gradually changed from the medieval Ptolemaic presentation to the Copernican. Horizons broadened to new continents, and people ceased to believe in received knowledge and wanted to discover and verify knowledge for themselves, instead of relying on old books and ancient authorities.

The first half of the sixteenth century, up to c. 1550, provides continuation to the late medieval period. New compositions and texts written by known authors on topics that had never before received attention emerged in the mid-century; for example, John Caius’s The Sweating Sickness was published in 1552. Example 4, our first example of EMEMT texts, from The Compost of Ptholomeus (1540), illustrates the transition period. The history of the text is connected to the pan-European vernacularization boom in the previous century, but at the same time characteristics pertaining to printed books as a publication form are discernible. The text belongs to the category of regimens and health guides and contains astrological lore. The following commonplace discusses the influences of the skies and predestination and is also found, e.g., in the Wise Book of Astrology and Philosophy, a popular astrological encyclopaedia of the fifteenth century. The style is simple, with references like “Ptholomeus saythe” and phrases of certainty such as “with-out doubte it is so” (see Taavitsainen 2009):

Example 4
Wherfore Ptholomeus saythe moreouer,
that of lyuyng or dyeng, the heuenly bodyes
may stere a man both to good & euyll, with-out
doubte it is so. But yet maye man with-stande
it by his owne fre wyll, to do what he
wyll hym selfe, good or bad euermore. And
about the whiche inclynacion is the myght &
wyll of God, that longeth the lyfe of man by
his goodness, or to make it short by iustyce. [^f.C1v^]

The information card of the text in the EMEMT text catalogue contains bibliographical facts, a brief account of the textual tradition, and the contents of the whole work. The information cards also include hyperlinks to various online resources (subject to subscription), which make it easy for the corpus user to retrieve more information about the text. If a subscription to EEBO is available, the original text can be accessed directly; we have been very particular in this respect and checked the source texts in situ in the respective scholarly libraries.

For the benefit of off-line use and to enhance the visual side of the corpus, we have also included selected facsimile images of the texts to illustrate various points. We were able to obtain permission to publish most of the title-pages, and some other page shots as examples of lay-out and type. The above images of The Compost of Ptholomeus show that black letter type was used. It was easier to read than Roman type and intended for broad and heterogeneous audiences with elementary literacy skills (see Jones 2011 and Suhr 2011). The woodcut illustrations are traditional in their schematic presentation of Ptolemy examining the stars with a female figure, Astronomy, guiding him. The moon has an important role, and suitable equipment, a quadrant and an armillary sphere, give the picture an additional slant to learning and authority.

Example 5, our second extract from EMEMT, comes from Harward’s Phlebotomy, published in 1601. In the corpus structure the text is included in category 2b, texts on specific methods of diagnosis or treatment. The text relies on humoral theory and begins with a definition. The style of argumentation is different from medieval texts, although authorities are referred to, and an enumerative text strategy is applied. The passage also illustrates our technique of taking marginal notes into account in corpus compilation: they are given in a separate file but marked clearly in the text:

Example 5
[}The first booke of Harwards
The First Chapter.
What Phlebotomy is, and of the foure distinct kinds
and vses thereof.}]
PHlebotomy is the letting out
of bloud by the opening of a
vayne, for the preuenting or
curing of some griefe or infirmitie.
I take in this place
bloud, not as it is simple and
pure of it self, but as it is mingled
with other humours, to wit, fleame, choler,
melancholy, and the tenue serum, which all (as
Fernelius sheweth) as they are conteined together
in the vaynes, are by one word vsually called by
the name of bloud. And although it still fall out
that other humours are also by Phlebotomy euacuated
out of the whole body, yet (as Fuchsius doth
proue out of Galen) it is properly the remedy of
those diseases, which of the ranknes of bloud haue
taken their originall. There are foure seuerall
sorts and vses of letting of bloud. The first is called

[^1. p.1^] Fernel. method. medendi. lib. 2. cap. 1. & 3.
[^2. p.1^] Fuchs. Instit. lib. 2.
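For readers who wish to process the marginal notes separately from the running text, the following is a minimal sketch in Python. It assumes only the note-line layout visible in the examples printed in this article (a bracketed note number and page reference followed by the note text); the names MarginalNote and parse_note_line are hypothetical and are not part of the EMEMT software.

```python
import re
from typing import NamedTuple, Optional


class MarginalNote(NamedTuple):
    """One marginal note: running number, page reference, note text."""
    number: int
    page: str
    text: str


# Layout inferred from the printed examples only, e.g.
# "[^1. p.1^] Fernel. method. medendi. lib. 2. cap. 1. & 3."
# This is an assumption about the format, not a documented specification.
NOTE_LINE = re.compile(r"^\[\^(\d+)\.\s*(p\.\s*\S+?)\^\]\s*(.*)$")


def parse_note_line(line: str) -> Optional[MarginalNote]:
    """Return a MarginalNote for a note line, or None for any other line."""
    match = NOTE_LINE.match(line.strip())
    if match is None:
        return None
    number, page, text = match.groups()
    return MarginalNote(int(number), page, text)


if __name__ == "__main__":
    sample = "[^2. p.1^] Fuchs. Instit. lib. 2."
    print(parse_note_line(sample))
```

Lines of running text, and bare page references without a note number, are returned as None, so the function can be applied line by line to a mixed file.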

Background information is given on the information card of the text for placing the extract in its textual context. A short survey of the author’s education reveals that the writer had received the highest education of the time. Such descriptions give a parameter value to “author’s education”. The audience level can often be defined indirectly, as learned authors addressed both learned and more heterogeneous readerships, whereas texts by less educated authors were not aimed at the highest level of readers, though such readers may occasionally have read them, too. The hyperlinks included in the information card guide the user to more facts, if needed.

The title-page makes use of different types, and the lay-out is matter-of-fact without illustrations. The pages in the image gallery show the reference techniques of learned writing of the time: text passages are framed with italics and marginal notes are used to indicate precise references.

Our third EMEMT extract in example 6 demonstrates the importance of visual presentation in an anatomical treatise called Microkosmografia from 1615. The advances in the techniques and styles of presentation provide a striking contrast to the early woodcuts. The title of the text reflects the medieval theory of the influences of the macrocosm reflected on the human body, but new knowledge of anatomical details derived from autopsies and observation has brought the contents to a different level of accuracy. With pictures like the ones presented here, the image gallery can provide inspiration for new kinds of study, combining analysis of text with illustrations and other means of visual presentation. Most of the text is directly linked with the pictures, but overall descriptions are also included, as the following passage shows:

Example 6
In the vpper region, are contained
the Animal organs, that is, the braine, which is the seate of the soule, and the original
or fountaine of sence and motion. In the middle region, are contained the vitall parts, and
parts seruing for respiration, as the Heart, the Lungs, and the arteries. In the lower region
are contained all the naturall organs seruing for concoction of nourishment, expurgation,
of excrements and procreation. And therefore the vpper Region is called Animall, the
middle Spirituall, and the lowest Naturall. The vpper is walled about on euery side with
bones, as it were a strong bulwarke or peece for defence, because in it, the soule which is
the Queene of this Little world, keepeth her residence or state. The middle is partly bony,
and partly fleshy: bony, for the strength of the heart, and to frame the cauity; and fleshye,
for the more facile motion of the Systole and Diastole. [^p.63^]

[^7. p.63^] What is contained in each region.
[^8. p.63^] Why the vpper region is bonie.
[^9. p.63^] Why the middle is partly bonie, and partly fleshie.
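The bracketed codes in the extracts can be separated from the running text before word counts or searches are made. The sketch below, in Python, assumes only the two bracket styles visible in the examples printed here: [^ ... ^] for page or folio references and [} ... }] for headings. The function names plain_text and page_references are hypothetical, and the full corpus may use further codes that this sketch does not handle.

```python
import re

# Bracket styles inferred from the printed examples only (an assumption):
#   [^ ... ^]  page or folio reference, e.g. [^p.63^] or [^f.C1v^]
#   [} ... }]  heading in the source text
COMMENT = re.compile(r"\[\^(.*?)\^\]", re.DOTALL)
HEADING = re.compile(r"\[\}(.*?)\}\]", re.DOTALL)


def plain_text(annotated: str) -> str:
    """Drop [^...^] codes, keep heading words, and normalize whitespace."""
    without_comments = COMMENT.sub("", annotated)
    unwrapped = HEADING.sub(r"\1", without_comments)
    return " ".join(unwrapped.split())


def page_references(annotated: str) -> list:
    """Collect the contents of all [^...^] codes in order of appearance."""
    return [ref.strip() for ref in COMMENT.findall(annotated)]


if __name__ == "__main__":
    sample = "for the more facile motion of the Systole and Diastole. [^p.63^]"
    print(plain_text(sample))
    print(page_references(sample))
```

Keeping the heading words while removing the reference codes is a design choice made for this illustration; a study that excludes headings from word counts would delete those spans instead.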

Our last EMEMT sample, example 7, comes from the last quarter of the seventeenth century, which is already very different and more in accordance with the following century and its thought-style. The following passage can be compared to example (1) from the fifteenth century, an early attempt to discuss the formation of the soul in embryology. The same issue is discussed here from a different point of view. The text belongs to category 4, regimens and health guides, and shows that texts in this category can represent new ways of thinking, although the roots lie deep in the traditions of Western medicine, and the same themes are repeated over and over again (cf. The Compost):

Example 7

But Man having a greater liberty by the
prerogative of his rational Soul, does make
his choice, and wanders amongst varieties
both good and evil, and often deceives
himself, chusing what is destructive to his
Being: So that breaking the Law of Nature,
which he ought to observe as Bounds
and Rules to his actions, making them sanative
and preservative; does on the contrary
alter and change those necessary appointments
and supports; renders them destructive
by his irregular incongruous use, vitious
customs, and imprudent choice.
The most considerable things to be observed
by Man, as conducing and tending to
the lengthening or shortning of his life, according
to their management and procurement,
well or ill, do fall under these Heads.
Meat and drink; place of abode; sleep
and watching; exercise and rest; excretions
and retentions; passions of mind; all usages
and customs.

The information card outlines the author’s professional position in the field of medicine and relates his attitude to controversial issues of his time. The source of this information is directly accessible with a hyperlink (subject to subscription). The image gallery contains a portrait of the author.

EMEMT was released only recently and thus, so far, has mainly been used by the group that compiled it. [6] In addition to the corpus team members, an earlier version of the corpus was made available to an international group of scholars who worked on various research tasks for the edited volume Medical Writing in Early Modern English (Taavitsainen & Pahta 2011). A comprehensive account of this research is available in an article by Lehto et al. (2010), published in the volume accompanying EMEMT. Most studies based on EMEMT are diachronic, examining continuity and change in medical writing over time, specifically in relation to language-external developments in the domain of medicine. Several studies combine MEMT and EMEMT to obtain a longer diachronic line. Other studies compare language use in medical writing with other registers, making use of a variety of electronic corpora or reference tools for comparison. In addition, EMEMT texts were used by the Lancaster VARD team, Alistair Baron, Paul Rayson and Dawn Archer (2009, 2011), to discuss spelling variation in Early Modern English. A number of visiting postgraduate students were also allowed to collect data from EMEMT for their PhD projects prior to release.

3.3 Late Modern English Medical Texts 1700–1800

At present the Scientific thought-styles team is working on the corpus of Late Modern English Medical Texts 1700–1800, the third component of our total corpus plan. The eighteenth century provides a challenge in many respects, not least because the material increases in volume and there are overlaps in several directions. The period 1700–1800 represents a transition from the thought-styles of the earlier periods to more modern approaches to medicine, and new developments can perhaps best be categorized as “enquiry as a thought-style”. New fields and text types include immunology, epidemics, demographic treatises, and institutional texts, and several new scientific journals were established. These are a few examples of emerging areas that show how the field of medicine moved towards more modern concerns and increasing professionalism. Health issues were also discussed broadly in magazines read by members of polite society.

4. Other register-specific scientific and medical corpora

Several other corpora, completed or in compilation, contain scientific and medical texts, with very little overlap with CEEM. Relatively small samples of scientific writing are contained in diachronic general-purpose corpora, which have not been designed for analysing scientific or medical language specifically, but can be used, for example, as diagnostic tools to provide some suggestions for further enquiry in special-language corpora. The Helsinki Corpus, a case in point, includes twelve scientific texts from the medieval and Early Modern periods, primarily medical writing (see Rissanen et al. 1993; and the CoRD entry). A Representative Corpus of Historical English Registers (ARCHER), a multi-genre corpus of British and American English covering the period 1650–1990, continues the line with medical texts (see e.g. Biber & Finegan 1997), and can be used together with the Helsinki Corpus to obtain a long and narrow view of the register. An example of a specialized corpus including some scientific texts is the Lampeter Corpus of Early Modern English Tracts, consisting of non-literary tracts and pamphlets (see Schmied & Claridge 1997). Several corpus projects focusing on scientific writing have also been launched in recent years, especially in Spain. These include the Coruña Corpus, compiled by a project on Multidimensional Corpus-based Studies in English (MUSTE) at the University of A Coruña (see Moskowich & Crespo 2012). A subcorpus of eighteenth- and nineteenth-century texts on astronomy has recently been completed, and the compilation of subcorpora on philosophy and life sciences is in progress (e.g. Moskowich-Spiegel 2007). A group of scholars in Gran Canaria are compiling a Corpus of Early English Recipes (Alonso-Almeida, Ortega-Barrera & Quintana-Toledo 2012).
Another ongoing project with participants from Glasgow, Malaga and Oviedo is aiming to build an annotated corpus of Middle English Fachprosa from electronic editions of the late medieval scientific manuscripts in the Hunter collection at Glasgow University Library (The Málaga Corpus of Late Middle English Scientific Prose; see e.g. Moreno Olalla & Miranda García 2009). These corpora, when completed, will provide valuable data for further enhancing our view of the history of scientific writing both within the domain of medicine and other disciplines.

5. Conclusion

Both MEMT and EMEMT represent philologically-oriented corpora that provide contextualizing facts to the user. The text selection of both required a great deal of research in charting the field of medical writing, and we condensed the information gathered in the compilation process into the text catalogue entries. This is what makes CEEM very different from other historical corpora. Scientific and medical writing is a complicated field: scientific thought-styles change and find expression with different linguistic features. A number of genres emerged as adaptations or translations in the late medieval period, and continue to live on, albeit with changes, in the Early Modern period. Some genres have their roots even deeper, in the ancient traditions of knowledge, and new genres are created for the needs of the discourse community and take their place in communicating new knowledge. The target audiences, also varying over time, provide another dimension of variation to the material. The field is rich and offers scope for a variety of approaches. Apart from textual assessments of discourse features, the more traditional fields of syntax, morphology, and lexis have not been explored to any great extent, and the normalized versions make possible new applications of corpus linguistic methodology that have not yet been carried out. Last but not least, our corpora open up new pathways of research in the multimodal direction as we have enhanced the visual side by including samples of illustrations, title-pages and lay-out of the texts. [7]

The compilation of the third subcorpus, Late Modern English Medical Texts 1700–1800, has the same goal of contextualizing the corpus texts with rich metadata. The new period poses new challenges. The interest in science, particularly in natural sciences, exploded, and with the spread of the empirical approach that had emerged in the previous centuries, the accumulation of knowledge gathered momentum and speeded up specialization. The new knowledge was constructed and disseminated in a rapidly increasing body of texts. For corpus compilers, there is an increasing amount of data to sample, and an increasingly versatile and complex sociohistorical context to consider and record in metadata. The task is not an easy one, nor can it be completed in a short time, but the results obtained so far are encouraging, showing that the efforts are worthwhile: research based on the first two subsections of CEEM indicates that our philologically informed corpus compilation, providing representative and richly contextualized corpus data, yields a nuanced understanding of the long-term diachronic development of a register that has had an important role in the history of the English language.


[1] Old English computational and veterinary texts are often acknowledged in the earlier literature. Some Middle English medical texts of surgery and recipe collections were edited as curiosities as early as the nineteenth century, but the bulk of medieval English medical texts did not receive scholarly attention, not even in philological research and text editing until the 1980s (see below).

[2] For example, nominalization as a distinctive feature of scientific writing is said to originate in Newton’s writings (Halliday 1988). However, a study based on the Early Modern English subsection of the Corpus of Early English Medical Writing shows that the feature is already present in sixteenth-century medical writing (Tyrkkö & Hiltunen 2009).

[3] The reactions to our query were positive, and the editors were pleased to have their work included in the corpus.

[4] Research into manuscripts fell outside our resources; for medical texts circulating in manuscripts in this period, see Jones (2011).

[5] A worry brought forth by David Crystal in a workshop on historical corpora at ICAME32 in Oslo on June 1, 2011.

[6] This article was submitted for publication in June 2011.

[7] Three members of the Scientific Thought-styles team have launched a new project to this end. See the contribution by Marttila, Suhr & Tyrkkö in this volume.


ARCHER = A Representative Corpus of Historical English Registers:

CEEM = Corpus of Early English Medical Writing: See the CEEM project description here:

Corpus of Early English Recipes:

Coruña Corpus:

EEBO = Early English Books Online:

EMEMT = Early Modern English Medical Texts 1500–1700. CD-ROM. Software by Raymond Hickey. Early Modern English Medical Texts: Corpus Description and Studies, ed. by Irma Taavitsainen & Päivi Pahta. Amsterdam & Philadelphia: John Benjamins.

Lampeter Corpus of Early Modern English Tracts:

The Málaga Corpus of Late Middle English Scientific Prose:

MEMT = Middle English Medical Texts: On CD-ROM:

Catalogue entries from the MEMT CD-ROM for Galen’s De ingenio sanitatis (Methodus medendi) and Thesaurus pauperum:
MS: BL Sloane 6, ff. 183r-185r.
Transcribed from the manuscript by Päivi Pahta.
2,110 words.

eVK: 4356, 8098
Keiser: 258
LALME: 1.115

Description: This text is an extract of a Middle English translation of Methodus medendi, originally written by Galen of Pergamon (c. 130-c. 200), a Greek physician, anatomist, physiologist and philosopher, and one of the most influential physicians of all time. According to Getz (1991b), the English translation was made about 1400 from the version of Galen's text known as De ingenio sanitatis, a twelfth-century Latin translation from Arabic by Gerard of Cremona. The Middle English text (prologue, Book 3 and parts of Book 4) survives in a unique copy in MS Sloane 6, dating from the fifteenth century. The MEMT extract contains the prologue and the beginning of Book 3.


THESAURUS PAUPERUM
MS: BL Sloane 3489, ff. 29r-33v.
Transcribed from the manuscript by Irma Taavitsainen and Päivi Pahta.
3,301 words.

Description: This text is an early medical dialogue explaining the basis of humoural medicine, giving medical advice and recipes. For a discussion, see Taavitsainen forthcoming. The text has been edited by Cant (diss. 1973).


Alonso-Almeida, Francisco. 2009. “Null objects in Middle English Medical Texts”. Textual Healing: Studies in Middle English Medical & Scientific Texts and Manuscripts, ed. by Javier Díaz et al., 1–25. Frankfurt am Main: Peter Lang.

Alonso-Almeida, Francisco, Ivalla Ortega-Barrera & Elena Quintana-Toledo. 2012. “The Corpus of Early English Recipes: Design and implementation”. Creation and Use of Historical English Corpora in Spain, ed. by Nila Vasquez. Cambridge: Cambridge Scholars.

Banks, David. 2008. The Development of Scientific Writing: Linguistic Features and Historical Context. London & Oakville: Equinox.

Barber, Charles. 1993. The English Language: A Historical Introduction. Cambridge: Cambridge University Press.

Baron, Alistair, Paul Rayson & Dawn Archer. 2009. “Word frequency and key word statistics in historical corpus linguistics”. Anglistik: International Journal of English Studies, 20(1), 41–67.

Baron, Alistair, Paul Rayson & Dawn Archer. 2011. “Quantifying Early Modern English spelling variation: Change over time and genre”. Presented at Conference on New Methods in Historical Corpora, University of Manchester, 29–30 April 2011.

Biber, Douglas & Edward Finegan. 1997. “Diachronic relations among speech-based and written registers in English”. To Explain the Present: Studies in the Changing English Language in Honour of Matti Rissanen, ed. by Terttu Nevalainen & Leena Kahlas-Tarkka, 253–275. Helsinki: Société Néophilologique.

Biber, Douglas, Bethany Gray, Alpo Honkapohja & Päivi Pahta. 2011. “Prepositional modifiers in early English medical prose: A study on their historical development in noun phrases”. Communicating Early English Manuscripts: Studies in English Language, ed. by Päivi Pahta & Andreas H. Jucker, 197–211. Cambridge: Cambridge University Press.

Carroll, Ruth. 2008. “Historical English phraseology and the extender tag”. Selim 15: 7–37.

Esteban Segura, Laura. 2011. “Suffixal doublets in Late Middle English: -ness vs -ship”. Neuphilologische Mitteilungen 112: 183–194.

Halliday, M.A.K. 1988. “On the language of physical science”. Registers of Written English: Situational Factors and Linguistic Features, ed. by Mohsen Ghadessy. London: Frances Pinter.

Jones, Peter Murray. 2011. “Medical literacies and medical culture in early modern England”. In Taavitsainen & Pahta (eds.) 2011, 30–43.

Lehto, Anu, Raisa Oinonen & Päivi Pahta. 2010. “Explorations through Early Modern English Medical Texts: Charting changes in medical discourse and scientific thinking”. In Taavitsainen & Pahta (eds.) 2010, 151–166.

McEnery, Tony, Richard Xiao & Yukio Tono. 2006. Corpus-Based Language Studies: An Advanced Resource Book. Abingdon & New York: Routledge.

Moreno Olalla, David & Antonio Miranda García. 2009. “An annotated Corpus of Middle English Scientific Prose: Aims and features”. Textual Healing: Studies in Middle English Medical and Scientific Texts and Manuscripts, ed. by Javier Díaz Vera, 123–140. Bern: Peter Lang.

Mory, Robert Nels. 1977. A Medieval English Anatomy. Ph.D. dissertation, University of Michigan.

Moskowich-Spiegel, Isabel. 2007. “Presenting the Coruña Corpus: A collection of samples for the historical study of English scientific writing”. ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English, ed. by Javier Pérez-Guerra, Dolores Gonzalez-Alvarez, Jorge L. Bueno-Alonso & Esperanza Rama-Martínez, 341–357. Bern: Peter Lang.

Moskowich, Isabel & Begoña Crespo, eds. 2012. Astronomy ‘playne and simple’: The Writing of Science Between 1700 and 1900. Including CD-ROM, the Corpus of English Texts on Astronomy (CETA), compiled by Isabel Moskowich, Inés Lareo, Gonzalo Camiña Rioboo & Begoña Crespo. Amsterdam: John Benjamins.

Pahta, Päivi. 1998. Medieval Embryology in the Vernacular: The Case of De spermate. Helsinki: Société Néophilologique.

Pahta, Päivi & María José Carrillo-Linares. 2006. “Translation strategies in De spermate and De humana natura”. Sex, Aging and Death in a Medieval Medical Compendium: MS Trinity College Cambridge R.14.52, Its Language, Scribe, and Texts, ed. by M. Teresa Tavormina, 95–117. Tucson: Arizona Center for Medieval and Renaissance Studies.

Rawcliffe, Carole. 1995. Medicine and Society in Later Medieval England. Stroud: Alan Sutton.

Rissanen, Matti, Merja Kytö & Minna Palander-Collin, eds. 1993. Early English in the Computer Age: Explorations through the Helsinki Corpus. Berlin & New York: Mouton de Gruyter. Helsinki Corpus on CoRD:

Rubin, Stanley. 1974. Medieval English Medicine. Newton Abbot: David & Charles Ltd.

Schmied, Josef & Claudia Claridge. 1997. “Classifying text- or genre-variation in the Lampeter Corpus of Early Modern English texts”. Tracing the Trail of Time: Proceedings from the Second Diachronic Corpora Workshop, ed. by Raymond Hickey, Merja Kytö, Ian Lancashire & Matti Rissanen, 119–135. Amsterdam: Rodopi.

Suhr, Carla. 2011. Publishing for the Masses: Early Modern English Witchcraft Pamphlets (Mémoires de la Société Néophilologique de Helsinki LXXXIII). Helsinki: Société Néophilologique.

Taavitsainen, Irma. 2001. “Evidentiality and scientific thought-styles: English medical writing in Late Middle English and Early Modern English”. Modality in Specialized Texts: Selected Papers of the 1st CERLIS Conference, ed. by Maurizio Gotti & Marina Dossena, 21–52. Frankfurt am Main: Peter Lang.

Taavitsainen, Irma. 2009. “The pragmatics of knowledge and meaning: Corpus linguistic approaches to changing thought-styles in early modern medical discourse”. Corpora: Pragmatics and Discourse. Papers from the 29th International Conference on English Language Research on Computerized Corpora (ICAME 29), ed. by Andreas H. Jucker, Daniel Schreier & Marianne Hundt, 37–62. Amsterdam & New York: Rodopi.

Taavitsainen, Irma & Päivi Pahta. 1998. “Vernacularization of medical writing in English: A corpus-based study of scholasticism”. Early Science and Medicine 3: 157–185.

Taavitsainen, Irma & Päivi Pahta, eds. 2010. Early Modern English Medical Texts: Corpus Description and Studies. Amsterdam & Philadelphia: John Benjamins.

Taavitsainen, Irma & Päivi Pahta, eds. 2011. Medical Writing in Early Modern English. Cambridge: Cambridge University Press.

Taavitsainen, Irma, Päivi Pahta, Turo Hiltunen, Ville Marttila, Martti Mäkinen, Maura Ratia, Carla Suhr & Jukka Tyrkkö with assistance of Alpo Honkapohja, Anu Lehto & Raisa Oinonen. 2010. Early Modern English Medical Texts 1500–1700. CD-ROM. Software by Raymond Hickey. Early Modern English Medical Texts: Corpus Description and Studies, ed. by Irma Taavitsainen & Päivi Pahta. Amsterdam & Philadelphia: John Benjamins.

Taavitsainen, Irma, Päivi Pahta & Martti Mäkinen. 2005. Middle English Medical Texts. CD-ROM with MEMT Presenter software by Raymond Hickey. Amsterdam & Philadelphia: John Benjamins.

Taavitsainen, Irma, Päivi Pahta & Martti Mäkinen. 2006. “Towards a corpus-based history of specialized languages: Middle English Medical Texts”. Corpus-Based Studies in Diachronic English, ed. by Roberta Facchinetti & Matti Rissanen, 79–94. Bern: Peter Lang.

Tavormina, M. Teresa, ed. 2006. Sex, Aging and Death in a Medieval Medical Compendium: MS Trinity College Cambridge R.14.52, Its Language, Scribe, and Texts. Tucson: Arizona Center for Medieval and Renaissance Studies.

Tyrkkö, Jukka & Turo Hiltunen. 2009. “Frequency of nominalization in Early Modern English medical writing”. Corpora: Pragmatics and Discourse. Papers from the 29th International Conference on English Language Research on Computerized Corpora (ICAME 29), ed. by Andreas H. Jucker, Daniel Schreier & Marianne Hundt, 297–320. Amsterdam & New York: Rodopi.

Voigts, Linda Ehrsam. 1982. “Editing Middle English medical texts: Needs and issues”. Editing Texts in the History of Science and Medicine: Papers given at the seventeenth annual Conference on Editorial Problems, University of Toronto, 6–7 November 1981, ed. by Trevor Levere, 39–68. New York & London: Garland.

Voigts, Linda Ehrsam. 1984. “Medical prose”. Middle English Prose: A Critical Guide to Major Authors and Genres, ed. by Anthony S.G. Edwards, 315–335. New Brunswick, NJ: Rutgers University Press.