All three subcorpora of the Corpus of Early English Medical Writing follow the same overall principles of compilation.
(1) Texts are selected according to extralinguistic criteria, the objective being ideal coverage of the various subgenres of medical writing relevant to the time period. Lists of texts available from a variety of sources were compiled, and medical historians were consulted. Care was taken to ensure that each text selected for keying in was the earliest available edition of that title.
(2) Where possible, all texts are included as 10,000-word extracts. Shorter texts are included in toto. Prefatory material (e.g. prefaces, epistles, contents pages) was excluded from the corpus, but a selection was keyed in for in-house use. The title page was always included.
(3) Keying in was performed primarily by research assistants. For MEMT, most source texts were scholarly editions; for EMEMT, facsimile copies of originals. At this stage, some originals were found to be unusable because of tight binding, foxing in the original, or problems in photography. If the overall quality of the text was deemed acceptable, the keying in was performed with the understanding that problems could be corrected at a later stage. Annotation was also added at this stage to mark missing and blurred passages as well as various types of metatextual information.
(4) An initial double proofreading process followed, in which the research assistant keying in a text would proofread their own work and then hand it over to another research assistant for a second round of proofreading.
(5) All texts were checked against the originals at source libraries by project leaders and postgraduate researchers. Care was taken to ensure that the original consulted was the same copy as the one used for the facsimile. During the checking, most of the blurred passages could be corrected.
(6) Catalogue information for each text was compiled during and after the checking.
Most of the texts in the CEEM corpora are to be found at the British Library (London), the Wellcome Medical Library (London), Cambridge University Library and King's College Library (Cambridge), the Bodleian Library (Oxford), the Henry E. Huntington Library and Art Gallery (San Marino, California), Trinity College Library (Dublin), Glasgow University Library, and Yale Medical Library. The members of the Scientific thought-styles project wish to express their gratitude to the staff at all the libraries visited during the compilation work.