The Lancaster-Oslo/Bergen Corpus (LOB Corpus)
The Lancaster-Oslo/Bergen Corpus (LOB Corpus) is a British English counterpart of the Brown Corpus. Like its American counterpart, it contains 500 texts of about 2,000 words each, distributed across 15 text categories, 9 informative and 6 imaginative. The LOB Corpus exists in two main versions: the original version and a POS-tagged version. The texts were selected by stratified random sampling; for details, see the manual for the original version of the corpus. All the texts are written, and all were originally published in 1961.
Note this comment in the manual for the original version of the corpus:
The true “representativeness” of the present corpus arises from the deliberate attempt to include relevant categories and subcategories of texts rather than from blind statistical choice. Random sampling simply ensures that, within the stated guidelines, the selection of individual texts is free of the conscious or unconscious influence of personal taste or preference.
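As a rough illustration of the idea, and not the compilers' actual procedure, stratified random sampling can be sketched as follows: the categories and the number of texts per category are fixed deliberately, and only the choice of individual texts within each category is left to chance. The stratum names, candidate lists, and quotas below are hypothetical and are not taken from the LOB manual.

```python
import random


def stratified_sample(population, quotas, seed=0):
    """Draw a fixed number of items at random from each stratum.

    population: dict mapping stratum name -> list of candidate items
    quotas:     dict mapping stratum name -> number of items to draw
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sample = {}
    for stratum, quota in quotas.items():
        candidates = population[stratum]
        # Sampling without replacement: no candidate is chosen twice.
        sample[stratum] = rng.sample(candidates, quota)
    return sample


# Hypothetical candidate lists (placeholders, not real LOB data).
population = {
    "press_reportage": [f"news_{i}" for i in range(200)],
    "learned_writing": [f"paper_{i}" for i in range(300)],
    "general_fiction": [f"novel_{i}" for i in range(100)],
}

# Quotas are set by design, not by chance (hypothetical numbers).
quotas = {"press_reportage": 4, "learned_writing": 8, "general_fiction": 3}

chosen = stratified_sample(population, quotas)
for stratum, items in chosen.items():
    print(stratum, len(items))
```

The design choice mirrors the manual's comment: the quotas encode the deliberate decision about which categories to cover, while the random draw keeps personal preference out of the selection of individual texts.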
Project leaders and heads of computing: Geoffrey Leech (project leader), Stig Johansson (project leader), Knut Hofland (head of computing), Roger Garside (head of computing, POS-tagged version). For information on other people taking part in the project, see the manuals.
Time of compilation: original version 1970–1978, POS-tagged version 1981–1986.
Size: approx. 1 million words
Number of texts/samples: 500
Released: 1976 (original version), 1986 (POS-tagged version)
Funding (original version): Longman Group Limited, the British Academy, Department of British and American Studies, University of Oslo, Norwegian Research Council for Science and the Humanities, Norwegian Computing Centre for the Humanities
Funding (POS-tagged version): Social Science Research Council, Norwegian Research Council for Science and the Humanities, Norwegian Computing Centre for the Humanities
Reference line and Copyright
The LOB Corpus, original version (1970–1978), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), and Knut Hofland, University of Bergen (head of computing).
The LOB Corpus, POS-tagged version (1981–1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).
Original version: Johansson, Stig, Geoffrey Leech, and Helen Goodluck (1978), Manual of Information to Accompany the Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Oslo: Department of English, University of Oslo.
POS-tagged version: Johansson, Stig, Eric Atwell, Roger Garside, and Geoffrey Leech (1986), The Tagged LOB Corpus. Users' Manual. Bergen: Norwegian Computing Centre for the Humanities.
The few errata in the original published texts are reproduced without comment in the LOB Corpus. These errata, however, are listed in the Manual of Information under the individual entries of text samples.
Available for research; distribution and licence through ICAME and the Oxford Text Archive
The Brown Corpus
The Kolhapur Corpus of Indian English
The Australian Corpus of English (ACE)
The Wellington Corpus of Written New Zealand English
The Freiburg-LOB Corpus of British English (F-LOB)
The Freiburg-Brown Corpus of American English (FROWN)
The International Corpus of English (ICE)
CoRD Entry submitted on October 22, 2008 by Prof. Stig Johansson, Department of English Language, University of Oslo.
Information for the entry was edited by Prof. Stig Johansson and Prof. Geoffrey Leech.