Corpus Finder

  • To sort corpora according to any attribute, click on the appropriate column header.
  • Use the filters to view a specific selection of corpora.
  • For explanations of the table categories, see below.

 

| Corpus | Start | End | Periods | Word Count | Text Samples | Spoken/Written | Annotation | Format | Availability |
|---|---|---|---|---|---|---|---|---|---|
| HC - Helsinki Corpus | 730 | 1710 | OE, ME, EModE | 1,572,800 | 450 | Written | Other | CD, Download | License required |
| SCOTS - Scottish Corpus of Texts & Speech | 1945 | 2007 | PDE | 4,000,000 | 1177 | Written & Spoken | Other, None | Online | Open access |
| NECTE - Newcastle Electronic Corpus of Tyneside English | 1969 | 1994 | PDE | | 62 | Spoken | Other | Download, DVD | Free subscription |
| HD - Helsinki Corpus of British English Dialects | 1970 | 1985 | PDE | 1,008,641 | 187 | Spoken | Other | On-site | Free subscription |
| CEEC - Corpus of Early English Correspondence | 1403 | 1800 | ME, EModE, LModE | 5,100,000 | 12,000 | Written | Parsing, Other | CD, Download | Free subscription |
| CEEC - Corpus of Early English Correspondence / 1998 version | 1410 | 1681 | ME, EModE | 2,597,795 | 5,961 | Written | Other | On-site | Free subscription |
| CEECE - Corpus of Early English Correspondence Extension | 1653 | 1800 | EModE, LModE | 2,219,422 | 4,923 | Written | Other | On-site | Free subscription |
| CEECSU - Corpus of Early English Correspondence Supplement | 1402 | 1663 | ME, EModE | 442,484 | 829 | Written | Other | On-site | Free subscription |
| CEECS - Corpus of Early English Correspondence Sampler | 1418 | 1680 | ME, EModE | 450,085 | 1,123 | Written | Other | CD, Download | Free subscription |
| CEEC: PCEEC - Parsed Corpus of Early English Correspondence | 1410 | 1681 | ME, EModE | 2,159,132 | 4,970 | Written | Tagging, Parsing, Other | Download | Free subscription |
| CEEM - Corpus of Early English Medical Writing | 1375 | 1700 | ME, EModE | 2,300,000 | 320 | Written | Other | CD | Commercial |
| CEEM: MEMT - Middle English Medical Texts | 1375 | 1500 | ME | 495,322 | 86 | Written | Other | CD | Commercial |
| CEEM: EMEMT - Early Modern English Medical Texts | 1500 | 1700 | EModE | 2,000,000 | 450 | Written | Other | CD | Commercial |
| CEEM: LMEMT - Late Modern English Medical Texts | | | LModE | | | Written | | - | In preparation |
| CIE - A Corpus of Irish English 14th-20th c. | 14c | present | ME, EModE, LModE, PDE | | 70 | Written | Other | CD | From compiler |
| ENPC - The English-Norwegian Parallel Corpus | 1975 | 1995 | PDE | 2,600,000 | 100 | Written | Tagging, Other | On-site | Free subscription |
| LOB - The Lancaster-Oslo/Bergen Corpus | 1961 | 1961 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| FRED - Freiburg Corpus of English Dialects | 1970 | 1999 | PDE | 1,011,396 | 121 | Written | Other | On-site, CD | Free subscription |
| COERP - Corpus of English Religious Prose | 1150 | 1800 | ME, EModE, LModE | | | Written | Other | - | In preparation |
| CLMETEV - Corpus of Late Modern English Texts | 1710 | 1920 | LModE | 15,000,000 | 176 | Written | None | Download | Free subscription |
| LAEME - A Linguistic Atlas of Early Middle English | 1150 | 1325 | ME | 816,170 | 167 | Written | Other | Online | Open access |
| CSC - Corpus of Scottish Correspondence | 1500 | 1715 | EModE | 256,300 | 719 | Written | Other | Online | In preparation |
| HCOS - Helsinki Corpus of Older Scots | 1450 | 1700 | EModE | 834,200 | 71 | Written | Other | CD | License required |
| ELFA - English as a Lingua Franca in Academic Settings | 2001 | 2008 | PDE | 1,010,834 | 165 | Written & Spoken | | CD | License required |
| BROWN - The Brown corpus | 1961 | 1961 | PDE | 1,000,000 | 500 | Written | Tagging, Other, None | CD | License required |
| PPCME2 - The Penn-Helsinki Parsed Corpus of Middle English, 2nd edition | 1150 | 1500 | ME | 1,155,965 | 55 | Written | Tagging, Parsing, None | CD | License required |
| PPCEME - The Penn-Helsinki Parsed Corpus of Early Modern English | 1500 | 1710 | EModE | 1,794,010 | 229 | Written | Tagging, Parsing, None | CD | License required |
| LC - The Lampeter Corpus of Early Modern English Tracts | 1640 | 1740 | EModE | 1,193,385 | 120 | Written | Other | CD, Download | License required |
| CED - A Corpus of English Dialogues 1560-1760 | 1560 | 1760 | EModE | 1,183,690 | 177 | Written | Other | CD, Download | License required |
| FLOB - The Freiburg-Lancaster-Oslo/Bergen Corpus | 1992 | 1992 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| FROWN - The Freiburg-Brown Corpus | 1991 | 1991 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| ZEN - Zurich English Newspaper corpus | 1661 | 1791 | EModE, LModE | 1,600,000 | 349 | Written | Other | Online, CD | Free subscription |
| YCCQA - Yahoo-based Contrastive Corpus of Questions and Answers | 2006 | 2009 | PDE | 29,400,000 | 665,000 | Written | Other | Download | Free subscription |
| DOEC - Dictionary of Old English Corpus | 600 | 1150 | OE | 3,000,000 | 3060 | Written | Other | CD | License required |
| SCONE - Seville Corpus of Northern English | 600 | 1590 | OE, ME | | | Written | Other | Download | Open access |
| VOICE - Vienna-Oxford International Corpus of English | 2000 | 2007 | PDE | 1,023,043 | 151 | Spoken | Other | Online | Free subscription |
| BLOB-1931 - The Before LOB-1931 Corpus | 1928 | 1934 | PDE | 1,000,000 | 500 | Written | Tagging, None | - | In preparation |
| MICASE - Michigan Corpus of Academic Spoken English | 1997 | 2001 | PDE | 1,800,000 | 152 | Spoken | Other | CD, Download, Online | Open access |
| MICUSP - Michigan Corpus of Upper-level Student Papers | 2002 | 2009 | PDE | 2,600,000 | 829 | Written | Other | Online | Open access |
| PPCMBE - The Penn-Helsinki Parsed Corpus of Modern British English | 1700 | 1914 | LModE, PDE | 948,895 | 101 | Written | Tagging, Parsing, None | CD | License required |
| BE06 - British English 06 | 2003 | 2008 | PDE | 1,010,996 | 500 | Written | Tagging | Online | License required |
| Small Corpus of Political Speeches (SCPS) | 1789 | 2010 | PDE | 655,479 | 239 | Written & Spoken | Tagging | On-site | License required |
| COOEE - Corpus of Oz Early English | 1788 | 1900 | LModE | 2,000,000 | 1353 | Written | Other | - | Free subscription |
| A Corpus of late 18c Prose | 1761 | 1790 | LModE | 300,000 | 1827 | Written | None | Download | Free subscription |
| The John Swales Conference Corpus (JSCC) | 2006 | 2006 | PDE | 100,000 | 23 | Spoken | None | Download | Open access |
| The Middle English Grammar Corpus (MEG-C) | 1350 | 1500 | ME | 450,000 | 320 | Written | Other | Download | Open access |
| The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) | - | - | OE | 1,500,000 | 100 | Written | Tagging | Download | Free subscription |
| International Corpus of English - the British component (ICE-GB) | 1990 | 1993 | PDE | 1,061,264 | 500 | Written & Spoken | Tagging, Parsing | CD | License required |
| The London-Lund Corpus of Spoken English (LLC) | 1953 | 1987 | PDE | 500,000 | 100 | Spoken | Other | CD | License required |
| The Corpus of Contemporary American English (COCA) | 1990 | 2009 | PDE | 400,000,000 | - | Written & Spoken | Tagging | Online | Free subscription |
| Helsinki Corpus of Regional English Speech (HARES) | 1970 | 1980 | PDE | | | Spoken | Other | Download | License required |
| A Representative Corpus of Historical English Registers (ARCHER) | 1659 | 1999 | EModE, LModE, PDE | 1,789,309 | 955 | Written | None | On-site | Not available |
| OBC - Old Bailey Corpus | 1720 | 1913 | EModE, LModE | 14,000,000 | | Spoken | Tagging | Online | Free subscription |
| TIME corpus | 1923 | 2009 | PDE | 100,000,000 | 275,000 | Written | Tagging | Online | Free subscription |

 

Corpus Finder categories

Corpus

Some corpora consist of subcorpora (CEEC, CEEM). In these cases both the entire corpus and the subcorpora have been listed; the subcorpora are indented.

Start, End, Periods

The period labelling follows roughly the categorisation below unless a particular period is specified in the name of the corpus.

  • OE: Old English, up to c. 1300
  • ME: Middle English, c. 1300-1500
  • EModE: Early Modern English, c. 1500-1700
  • LModE: Late Modern English, c. 1700-1900
  • PDE: Present Day English, from 1900
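
The period boundaries can be read as a simple lookup from a corpus's start and end years to its period labels. The following Python sketch illustrates that reading; the function name `periods_for_range` and the boundary handling (an end year that falls exactly on a boundary does not count as entering the next period) are assumptions made for this example, not part of the Corpus Finder. Since the labelling is only rough, the table does not always agree with such a rule: LAEME (1150-1325), for instance, is listed as ME only.

```python
import math
from typing import List

# (label, first year of the period, first year of the next period),
# taken from the categorisation listed above.
PERIODS = [
    ("OE", -math.inf, 1300),
    ("ME", 1300, 1500),
    ("EModE", 1500, 1700),
    ("LModE", 1700, 1900),
    ("PDE", 1900, math.inf),
]


def periods_for_range(start: int, end: int) -> List[str]:
    """Return the period labels that a corpus spanning start..end overlaps.

    Hypothetical helper: a rough approximation of the labelling only.
    """
    if end < start:
        raise ValueError("end year precedes start year")
    labels = []
    for label, lo, hi in PERIODS:
        if start == end:
            # Single-year corpus: pick the one period containing that year.
            covered = lo <= start < hi
        else:
            # Assumption: an end year exactly on a boundary (e.g. 1700)
            # does not pull in the following period.
            covered = start < hi and end > lo
        if covered:
            labels.append(label)
    return labels


print(periods_for_range(1659, 1999))  # years listed for ARCHER
```

For the ARCHER years this yields EModE, LModE and PDE, the same labels the table gives.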

Word count, Text samples

Left empty when the word count or number of text samples is unknown.

Spoken/Written

Shows whether the corpus material is from written sources, recorded speech or both.

Annotation

  • Tagging: Part-of-speech annotation
  • Parsing: Syntactic annotation
  • Other: Annotation of, e.g., discursive features, text structure, phonetic features or orthography
  • None


Format

  • CD/DVD: The corpus is distributed on a disc.
  • Download: The corpus can be downloaded from the internet.
  • Online: The corpus is accessible online without downloading.
  • On-site: The corpus can only be accessed locally.

Availability

  • Open access: The corpus can be freely used by anyone.
  • Free subscription: The corpus is free to use but requires a subscription.
  • License required: A paid subscription is required.
  • Commercial
  • In preparation
  • Not available: The corpus is not available to external users for copyright reasons.
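
Because every table row carries the same fields, a selection of corpora can also be filtered and sorted outside the web page. The Python sketch below shows one possible way to do this; the `Corpus` record, the `select` and `by_word_count` helpers, and the reduced set of fields are all assumptions made for illustration. The five sample rows are copied from the table above, with unknown word counts stored as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Corpus:
    """Hypothetical record holding a subset of the table columns."""
    name: str
    start: int
    end: int
    periods: Tuple[str, ...]
    word_count: Optional[int]  # None when the word count is unknown
    formats: Tuple[str, ...]
    availability: str


# A handful of rows copied from the table above.
CORPORA = [
    Corpus("LAEME", 1150, 1325, ("ME",), 816_170, ("Online",), "Open access"),
    Corpus("SCONE", 600, 1590, ("OE", "ME"), None, ("Download",), "Open access"),
    Corpus("MICASE", 1997, 2001, ("PDE",), 1_800_000,
           ("CD", "Download", "Online"), "Open access"),
    Corpus("HC", 730, 1710, ("OE", "ME", "EModE"), 1_572_800,
           ("CD", "Download"), "License required"),
    Corpus("COCA", 1990, 2009, ("PDE",), 400_000_000,
           ("Online",), "Free subscription"),
]


def select(corpora, *, period=None, fmt=None, availability=None) -> List[Corpus]:
    """Keep the corpora that match every criterion that was given."""
    result = []
    for c in corpora:
        if period is not None and period not in c.periods:
            continue
        if fmt is not None and fmt not in c.formats:
            continue
        if availability is not None and c.availability != availability:
            continue
        result.append(c)
    return result


def by_word_count(corpora) -> List[Corpus]:
    """Sort largest first; corpora with an unknown word count go last."""
    return sorted(corpora, key=lambda c: (c.word_count is None, -(c.word_count or 0)))


open_downloads = select(CORPORA, fmt="Download", availability="Open access")
print([c.name for c in open_downloads])
print([c.name for c in by_word_count(select(CORPORA, period="ME"))])
```

Multi-valued cells such as Periods and Format are stored as tuples here so that a filter can test for membership rather than compare whole strings.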

 

Javascript for the Corpus Finder table by Max Guglielmi (http://tablefilter.free.fr/).