Sampling Techniques

(Source: F-LOB manual, original version)

The basic principle in the compilation of Brown and LOB was to select at random, using a random-number table, not only the titles from the bibliographical sources but also the particular section of each text to be sampled. This principle was modified either for practical reasons, such as the availability of material, or whenever a single text did not yield the required 2,000 words. In such cases, rather than simply including the next article, the next suitable article (as far as style and subject matter were concerned) was chosen. The press sections of Brown and LOB are therefore not representative samples in a strict statistical sense.
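The procedure described above can be illustrated with a minimal sketch. It is not the compilers' actual method: they worked by hand with a random-number table and editorial judgement. Here a seeded pseudo-random generator stands in for the table, and "suitable" is reduced to "same category", which is an assumption made purely for illustration. The names `draw_sample`, `catalogue`, `category` and `words` are hypothetical.

```python
import random

TARGET_WORDS = 2000  # sample length given in the text


def draw_sample(catalogue, rng, target=TARGET_WORDS):
    """Draw one sample of up to `target` words.

    `catalogue` is a list of dicts with the hypothetical keys
    "id", "category" and "words" (a non-empty list of tokens).
    """
    # Step 1: choose a title at random (stand-in for the random-number table).
    idx = rng.randrange(len(catalogue))
    first = catalogue[idx]

    # Step 2: choose a random starting point within that text.
    start = rng.randrange(len(first["words"]))
    sample = first["words"][start:start + target]  # slice makes a new list
    used = [first["id"]]

    # Step 3: if the text fell short, top up from the next *suitable* text
    # instead of simply the next one. Suitability is assumed here to mean
    # "same category"; the compilers judged style and subject matter.
    for candidate in catalogue[idx + 1:] + catalogue[:idx]:
        if len(sample) >= target:
            break
        if candidate["category"] != first["category"]:
            continue  # skip unsuitable material
        sample += candidate["words"][:target - len(sample)]
        used.append(candidate["id"])

    return used, sample


if __name__ == "__main__":
    demo = [
        {"id": "A", "category": "press", "words": ["a"] * 2500},
        {"id": "B", "category": "fiction", "words": ["b"] * 2500},
        {"id": "C", "category": "press", "words": ["c"] * 2500},
    ]
    ids, words = draw_sample(demo, random.Random(1))
    print(ids, len(words))
```

Step 3 is the point at which the selection stops being purely random: skipping unsuitable candidates keeps a sample homogeneous, at the cost of strict statistical representativeness.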

This applies even more to the sampling procedures employed in the compilation of F-LOB. The main aim in compiling the press section of F-LOB was to match the 1991 material as closely as possible with that used in LOB by sampling the same newspapers (see Sand & Siemund 1992). For the other sections, the same magazines and periodicals used in LOB were sampled whenever possible. In the sampling of monographs, great care was taken to select books on equivalent topics rather than to select titles at random from bibliographical sources. The main aim was to achieve close comparability with LOB rather than statistical representativeness. (For an overview of the original composition of the corpus, see Johansson et al. 1986.)

For a discussion of the considerations that went into the compilation of the corpus and the selection of text samples, and of the kinds of research that the data will ultimately allow, see Leech & Smith (2005).


Johansson, S., et al. 1986. The tagged LOB corpus: users’ manual. Bergen: Norwegian Computing Centre for the Humanities.

Leech, Geoffrey & Nicholas Smith. 2005. "Extending the possibilities of corpus-based research on English in the twentieth century: A prequel to LOB and F-LOB." ICAME Journal 29: 83-98.

Sand, Andrea & Rainer Siemund. 1992. "LOB - 30 years on..." ICAME Journal 16: 119-122.