Sampling techniques

The basic principle in the compilation of Brown and LOB was to use a random-number table to select at random not only the titles from the bibliographical sources but also the particular section of each text. This sampling principle was modified either out of practical considerations, such as the availability of material, or whenever a single text did not yield the required 2,000 words. Rather than simply including the next article, the compilers chose the next suitable article (as far as style and subject matter were concerned). The press sections of Brown and LOB are therefore not representative samples in a strict statistical sense.
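The selection procedure described above can be illustrated with a minimal sketch. It is not the compilers' actual method: a seeded pseudo-random generator stands in for the random-number table, the `Article` record and its `category` field are hypothetical stand-ins for "style and subject matter", and the sketch assumes that a short text is topped up with words from the next suitable article until 2,000 words are reached.

```python
import random
from dataclasses import dataclass

SAMPLE_SIZE = 2000  # words per sample, as stated in the text


@dataclass
class Article:
    """Hypothetical catalogue entry; not a structure from the corpus manual."""
    title: str
    category: str      # assumed proxy for style and subject matter
    words: list[str]


def draw_sample(catalogue, rng, sample_size=SAMPLE_SIZE):
    """Return (titles used, sampled words) for one corpus sample."""
    # Step 1: random choice of title (the rng replaces the random-number table).
    index = rng.randrange(len(catalogue))
    first = catalogue[index]

    # Step 2: random choice of section when the text is long enough.
    if len(first.words) >= sample_size:
        start = rng.randrange(len(first.words) - sample_size + 1)
        return [first.title], first.words[start:start + sample_size]

    # Step 3 (modified principle): the text is too short, so continue with
    # the next *suitable* article, skipping entries of a different category.
    titles = [first.title]
    words = list(first.words)
    for offset in range(1, len(catalogue)):
        if len(words) >= sample_size:
            break
        other = catalogue[(index + offset) % len(catalogue)]
        if other.category == first.category:
            titles.append(other.title)
            words.extend(other.words[:sample_size - len(words)])

    # May hold fewer than sample_size words if suitable material runs out.
    return titles, words
```

Because step 3 filters candidates by category instead of taking whatever comes next, the outcome is no longer a purely random draw, which is the sense in which such samples are not strictly representative.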

The Frown corpus was intended to match the Brown corpus as closely as possible. The text samples were chosen not only to match the genre but also to come from publications that were similar in content and style and, in the case of periodicals, from titles that had a continuous publishing history from the 1960s to the 1990s (see the manual of the tagged Brown corpora).