The BLOB-1931 Corpus (previously called the Lancaster-1931 or B-LOB Corpus)
This corpus belongs to the same family as the Brown, LOB, Frown and F-LOB Corpora. Its design follows exactly the plan of the LOB (Lancaster-Oslo/Bergen) Corpus, which in turn is based – except for one or two trivial differences – on the design of the Brown Corpus. In other words, the BLOB-1931 Corpus is a comparable or ‘matching’ corpus in relation to other members of the ‘Brown family’, which includes, in addition to the corpora mentioned above, the Kolhapur Corpus of Indian English, the Australian Corpus of English, and the Wellington Corpus of Written New Zealand English. The BLOB-1931 Corpus is sampled from the period 1928-1934, a window centring on 1931, to provide a comparison (with a gap of roughly thirty years) between the written British English of 1931±3, 1961 (LOB) and 1991 (F-LOB). A comparable corpus for thirty years earlier (1901±3) is also at an advanced stage of compilation.
Project leaders: Geoffrey Leech (Lancaster University); Paul Rayson (Lancaster University)
Time of compilation: 2003–2006
Size: approximately 1,000,000 words
Number of texts/samples: 500 texts of 2000 words each
Funding: The Leverhulme Trust
Reference lines and copyright
The BLOB-1931 Corpus
None yet exists
Project leader 1: Geoffrey Leech (Lancaster University)
Project leader 2: Paul Rayson (Lancaster University)
Research Associate: Nick Smith (Lancaster University)
In practice, most of the work was done by Nick Smith (now at the University of Leicester).
Contributors to the digitization of the corpus included Jeremy Bateman and Lilian Hoffmann in Lancaster, and Gabriela Diaconu, Nicole Höhn and Birgit Waibel in Freiburg. We are grateful to Christian Mair for helping with the compilation and editing work by arranging attachments of students from the University of Freiburg to Lancaster University.
We also wish to thank Mike Ashley, Lancashire County Library and Carnforth second-hand bookshop for their help in procuring texts for the corpus.
Not yet available. We hope to make it available on-line soon.
BLOB-1931 exists in two versions: an untagged version and a POS-tagged version. The POS-tagged version was annotated using the CLAWS4 tagger (see Garside and Smith 1997) and the patching tool Template Tagger (see Fligelstone et al. 1997).
Both versions will be encoded in XML.
The BLOB-1901 Corpus
See Leech and Smith (2005) for further information on the Lancaster-1931 (B-LOB) Corpus. Some of the first findings of a comparison between B-LOB, LOB and F-LOB are reported in Leech and Smith (2009) and Leech et al. (2009).
CoRD Entry submitted on February 22, 2010 by Geoffrey Leech.
Information for the entry was edited by Geoffrey Leech.