Corpus Finder
- To sort corpora according to any attribute, click on the appropriate column header.
- Use the filters to view a specific selection of corpora.
- For explanations of the table categories, see below.
| Corpus | Start | End | Periods | Word Count | Text Samples | Spoken/Written | Annotation | Format | Availability |
|---|---|---|---|---|---|---|---|---|---|
| HC - Helsinki Corpus | 730 | 1710 | OE, ME, EModE | 1,572,800 | 450 | Written | Other | CD, Download | License required |
| SCOTS - Scottish Corpus of Texts & Speech | 1945 | 2007 | PDE | 4,000,000 | 1177 | Written & Spoken | Other, None | Online | Open access |
| NECTE - Newcastle Electronic Corpus of Tyneside English | 1969 | 1994 | PDE | | 62 | Spoken | Other | Download, DVD | Free subscription |
| HD - Helsinki Corpus of British English Dialects | 1970 | 1985 | PDE | 1,008,641 | 187 | Spoken | Other | On-site | Free subscription |
| CEEC - Corpus of Early English Correspondence | 1403 | 1800 | ME, EModE, LModE | 5,100,000 | 12,000 | Written | Parsing, Other | CD, Download | Free subscription |
| CEEC - Corpus of Early English Correspondence / 1998 version | 1410 | 1681 | ME, EModE | 2,597,795 | 5,961 | Written | Other | On-site | Free subscription |
| CEECE - Corpus of Early English Correspondence Extension | 1653 | 1800 | EModE, LModE | 2,219,422 | 4,923 | Written | Other | On-site | Free subscription |
| CEECSU - Corpus of Early English Correspondence Supplement | 1402 | 1663 | ME, EModE | 442,484 | 829 | Written | Other | On-site | Free subscription |
| CEECS - Corpus of Early English Correspondence Sampler | 1418 | 1680 | ME, EModE | 450,085 | 1,123 | Written | Other | CD, Download | Free subscription |
| CEEC: PCEEC - Parsed Corpus of Early English Correspondence | 1410 | 1681 | ME, EModE | 2,159,132 | 4,970 | Written | Tagging, Parsing, Other | Download | Free subscription |
| CEEM - Corpus of Early English Medical Writing | 1375 | 1700 | ME, EModE | 2,300,000 | 320 | Written | Other | CD | Commercial |
| CEEM: MEMT - Middle English Medical Texts | 1375 | 1500 | ME | 495,322 | 86 | Written | Other | CD | Commercial |
| CEEM: EMEMT - Early Modern English Medical Texts | 1500 | 1700 | EModE | 2,000,000 | 450 | Written | Other | CD | Commercial |
| CEEM: LMEMT - Late Modern English Medical Texts | | | LModE | | | Written | | - | In preparation |
| CIE - A Corpus of Irish English 14th – 20th c. | 14c | present | ME, EModE, LModE, PDE | | 70 | Written | Other | CD | From compiler |
| ENPC - The English-Norwegian Parallel Corpus | 1975 | 1995 | PDE | 2,600,000 | 100 | Written | Tagging, Other | On-site | Free subscription |
| LOB - The Lancaster-Oslo/Bergen Corpus | 1961 | 1961 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| FRED - Freiburg Corpus of English Dialects | 1970 | 1999 | PDE | 1,011,396 | 121 | Written | Other | On-site, CD | Free subscription |
| COERP - Corpus of English Religious Prose | 1150 | 1800 | ME, EModE, LModE | | | Written | Other | - | In preparation |
| CLMETEV - Corpus of Late Modern English Texts | 1710 | 1920 | LModE | 15,000,000 | 176 | Written | None | Download | Free subscription |
| LAEME - A Linguistic Atlas of Early Middle English | 1150 | 1325 | ME | 816,170 | 167 | Written | Other | Online | Open access |
| CSC - Corpus of Scottish Correspondence | 1500 | 1715 | EModE | 256,300 | 719 | Written | Other | Online | In preparation |
| HCOS - Helsinki Corpus of Older Scots | 1450 | 1700 | EModE | 834,200 | 71 | Written | Other | CD | License required |
| ELFA - English as a Lingua Franca in Academic Settings | 2001 | 2008 | PDE | 1,010,834 | 165 | Written & Spoken | | CD | License required |
| BROWN - The Brown corpus | 1961 | 1961 | PDE | 1,000,000 | 500 | Written | Tagging, Other, None | CD | License required |
| PPCME2 - The Penn-Helsinki Parsed Corpus of Middle English, 2nd edition | 1150 | 1500 | ME | 1,155,965 | 55 | Written | Tagging, Parsing, None | CD | License required |
| PPCEME - The Penn-Helsinki Parsed Corpus of Early Modern English | 1500 | 1710 | EModE | 1,794,010 | 229 | Written | Tagging, Parsing, None | CD | License required |
| LC - The Lampeter Corpus of Early Modern English Tracts | 1640 | 1740 | EModE | 1,193,385 | 120 | Written | Other | CD, Download | License required |
| CED - A Corpus of English Dialogues 1560-1760 | 1560 | 1760 | EModE | 1,183,690 | 177 | Written | Other | CD, Download | License required |
| FLOB - The Freiburg-Lancaster-Oslo/Bergen Corpus | 1992 | 1992 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| FROWN - The Freiburg-Brown Corpus | 1991 | 1991 | PDE | 1,000,000 | 500 | Written | Tagging, None | CD | License required |
| ZEN - Zurich English Newspaper corpus | 1661 | 1791 | EModE, LModE | 1,600,000 | 349 | Written | Other | Online, CD | Free subscription |
| YCCQA - Yahoo-based Contrastive Corpus of Questions and Answers | 2006 | 2009 | PDE | 29,400,000 | 665,000 | Written | Other | Download | Free subscription |
| DOEC - Dictionary of Old English Corpus | 600 | 1150 | OE | 3,000,000 | 3060 | Written | Other | CD | License required |
| SCONE - Seville Corpus of Northern English | 600 | 1590 | OE, ME | | | Written | Other | Download | Open access |
| VOICE - Vienna-Oxford International Corpus of English | 2000 | 2007 | PDE | 1,023,043 | 151 | Spoken | Other | Online | Free subscription |
| BLOB-1931 - The Before LOB-1931 Corpus | 1928 | 1934 | PDE | 1,000,000 | 500 | Written | Tagging, None | - | In preparation |
| MICASE - Michigan Corpus of Academic Spoken English | 1997 | 2001 | PDE | 1,800,000 | 152 | Spoken | Other | CD, Download, Online | Open access |
| MICUSP - Michigan Corpus of Upper-level Student Papers | 2002 | 2009 | PDE | 2,600,000 | 829 | Written | Other | Online | Open access |
| PPCMBE - The Penn-Helsinki Parsed Corpus of Modern British English | 1700 | 1914 | LModE, PDE | 948,895 | 101 | Written | Tagging, Parsing, None | CD | License required |
| BE06 - British English 06 | 2003 | 2008 | PDE | 1,010,996 | 500 | Written | Tagging | Online | License required |
| Small Corpus of Political Speeches (SCPS) | 1789 | 2010 | PDE | 655,479 | 239 | Written & Spoken | Tagging | On-site | License required |
| COOEE - Corpus of Oz Early English | 1788 | 1900 | LModE | 2,000,000 | 1353 | Written | Other | - | Free subscription |
| A Corpus of late 18c Prose | 1761 | 1790 | LModE | 300,000 | 1827 | Written | None | Download | Free subscription |
| The John Swales Conference Corpus (JSCC) | 2006 | 2006 | PDE | 100,000 | 23 | Spoken | None | Download | Open access |
| The Middle English Grammar Corpus (MEG-C) | 1350 | 1500 | ME | 450,000 | 320 | Written | Other | Download | Open access |
| The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) | - | - | OE | 1,500,000 | 100 | Written | Tagging | Download | Free subscription |
| International Corpus of English - the British component (ICE-GB) | 1990 | 1993 | PDE | 1,061,264 | 500 | Written & Spoken | Tagging, Parsing | CD | License required |
| The London-Lund Corpus of Spoken English (LLC) | 1953 | 1987 | PDE | 500,000 | 100 | Spoken | Other | CD | License required |
| The Corpus of Contemporary American English (COCA) | 1990 | 2009 | PDE | 400,000,000 | - | Written & Spoken | Tagging | Online | Free subscription |
| Helsinki Corpus of Regional English Speech (HARES) | 1970 | 1980 | PDE | | | Spoken | Other | Download | License required |
| A Representative Corpus of Historical English Registers (ARCHER) | 1659 | 1999 | EModE, LModE, PDE | 1,789,309 | 955 | Written | None | On-site | Not available |
| TIME corpus | 1923 | 2009 | PDE | 100,000,000 | 275,000 | Written | Tagging | Online | Free subscription |
Corpus Finder categories
| Category | Value | Explanation |
|---|---|---|
| Corpus | | Some corpora consist of subcorpora (CEEC, CEEM). In these cases both the entire corpus and the subcorpora have been listed; the subcorpora are indented. |
| Start, End, Periods | | The period labelling follows roughly the categorisation below unless a particular period is specified in the name of the corpus. |
| | OE | Old English c. -1300 |
| | ME | Middle English c. 1300-1500 |
| | EModE | Early Modern English c. 1500-1700 |
| | LModE | Late Modern English c. 1700-1900 |
| | PDE | Present Day English 1900- |
| Word count, Text samples | | Left empty when the word count or number of text samples is unknown. |
| Spoken/Written | | Shows whether the corpus material is from written sources, recorded speech or both. |
| Annotation | Tagging | Part-of-speech annotation |
| | Parsing | Syntactic annotation |
| | Other | Annotation of, e.g., discursive features, text structure, phonetic features, orthography, etc. |
| | None | |
| Format | CD/DVD | The corpus is distributed on a disc. |
| | Download | The corpus can be downloaded from the internet. |
| | Online | The corpus is accessible online without downloading. |
| | On-site | The corpus can only be accessed locally. |
| Availability | Open access | The corpus can be freely used by anyone. |
| | Free subscription | The corpus is free to use but requires a subscription. |
| | License required | A paid subscription is required. |
| | Commercial | |
| | In preparation | |
| | Not available | The corpus is not available to external users for copyright reasons. |
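As a rough illustration of the period categorisation listed above, the following Python sketch maps a corpus's start and end years to period labels. The boundary years come from the list; the function name `periods_for`, the half-open treatment of boundary years, and the handling of single-year corpora are assumptions made for this example. The page itself says the labelling is only followed "roughly", so the labels in the table will not always match what this sketch returns.

```python
from typing import List, Optional, Tuple

# (label, first year or None, end year or None), taken from the period list above.
# None marks an open-ended boundary.
PERIODS: List[Tuple[str, Optional[int], Optional[int]]] = [
    ("OE", None, 1300),
    ("ME", 1300, 1500),
    ("EModE", 1500, 1700),
    ("LModE", 1700, 1900),
    ("PDE", 1900, None),
]


def periods_for(start: int, end: int) -> List[str]:
    """Return the labels of all periods that the span start..end overlaps."""
    # Assumption: each period is half-open [lo, hi), so a corpus ending exactly
    # on a boundary year (e.g. 1700) does not spill into the next period.
    # A single-year corpus (start == end) is treated as a one-year span.
    end_exclusive = max(end, start + 1)
    labels = []
    for label, lo, hi in PERIODS:
        begins_before_period_ends = hi is None or start < hi
        ends_after_period_begins = lo is None or end_exclusive > lo
        if begins_before_period_ends and ends_after_period_begins:
            labels.append(label)
    return labels


print(periods_for(1375, 1700))  # ['ME', 'EModE']
print(periods_for(1961, 1961))  # ['PDE']
```

Entries without numeric years (for example "14c", "present" or "-") would need separate handling before being passed to such a function.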
Javascript for the Corpus Finder table by Max Guglielmi (http://tablefilter.free.fr/). |
|
|