A Corpus of late 18c Prose

Unpublished letters transcribed from originals, all written to Richard Orford, a steward of Peter Legh the Younger at Lyme Hall in Cheshire. The corpus was mainly intended to help fill the gap between the major diachronic corpora of English and modern multi-genre corpora. It was also specifically designed to illustrate non-literary English and English relatively uninfluenced by prescriptivist ideas. It would also be of interest to scholars working on dialectal English of the north-west (north Cheshire and south Lancashire in particular).

Project leader: David Denison, The University of Manchester
Time of compilation: 1999–2003
Size: 300,000 words
Language: English
Number of texts/samples: 1827
Period: 1761–1790
Released: 2003
Funding: Two bursaries from the John Rylands Research Institute to pay student researchers
Project home page: http://personalpages.manchester.ac.uk/staff/david.denison/late18c

Reference lines and copyright

The Corpus of late 18c Prose is available without fee for educational and research purposes, but it is not in the public domain. Copyright to the text is retained by the John Rylands University Library of Manchester; copyright to the annotated files is retained by David Denison and Linda van Bergen (© 2002).


None, but see Coding conventions and the file coding.htm, available from David Denison, which is reproduced in van Bergen, Linda & David Denison. 2007. A corpus of late eighteenth-century prose. In Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl (eds.), Creating and digitizing language corpora, 2 vols, vol. 2, Diachronic databases, 228–46. Basingstoke and New York: Palgrave.


David Denison (project leader)
Linda van Bergen
Joana Proud [Joana Soliva]


Free subscription. The corpus is available from the Oxford Text Archive, or directly from David Denison. Note that in the latter case users need to agree formally to the conditions of use by filling out the access request form and returning it via e-mail to David Denison (david.denison@manchester.ac.uk).

Technical information

The corpus currently exists in two forms: plain text with COCOA-style annotations, like the Helsinki Corpus (one file, 1.6 MB), and HTML (three linked files, 804–909 KB, plus a coding description, 6 KB). The files are in extended (8-bit) ASCII (the ANSI/Windows default encoding), and the text is coded as far as possible according to the conventions used in the Helsinki Corpus, that is, with COCOA-style brackets, enclosed within carets, giving information on writer, date, page breaks, etc. If any scholar would like to request a different coding, such as SGML, or indeed to produce one themself, please get in touch with David Denison at david.denison@manchester.ac.uk.
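
For users who want to process the plain-text version programmatically, the following is a minimal Python sketch of how the file might be read and its COCOA-style references separated from the running text. It rests on several assumptions that are not part of the corpus documentation: that the Windows default encoding corresponds to Python's cp1252 codec, that each reference has the shape <CODE value>, and that the codes A and P in the sample stand for writer and page. The sample line is invented for illustration and is not taken from the corpus. The actual codes are defined in coding.htm and may differ.

```python
import re

# Assumed tag shape: "<CODE value>", e.g. "<P 1>". The real code letters
# are defined in coding.htm and may differ from those used here.
COCOA_TAG = re.compile(r"<([A-Za-z]+)\s+([^<>]*)>")


def read_corpus(path):
    """Read a corpus file, assuming the Windows 'ANSI' code page (cp1252)."""
    with open(path, encoding="cp1252") as handle:
        return handle.read()


def parse_cocoa(text):
    """Return (tags, plain): a list of (code, value) pairs and the text
    with every matched reference removed."""
    tags = [(m.group(1), m.group(2).strip()) for m in COCOA_TAG.finditer(text)]
    plain = COCOA_TAG.sub("", text)
    return tags, plain


# Invented sample, not a quotation from the corpus.
sample = "<A HYPOTHETICAL WRITER>\n<P 1>\nSir, I recd yours of the 3d inst.\n"
tags, plain = parse_cocoa(sample)
```

Calling read_corpus on the downloaded file (the file name is up to the user) and passing the result to parse_cocoa would give the list of references alongside the bare letter text. Decoding as cp1252 rather than UTF-8 matters because characters such as the pound sign are stored as single bytes in the 8-bit files.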