(Adapted from project website.)
Every document is classified as belonging to one of the following genres, and we have tried, where possible, to include a roughly equal number of words in each genre:
- Administrative prose (such as legal documents or council minutes)
- Expository prose (such as travel narratives)
- Personal writing (such as diaries and personal letters)
- Instructional prose (such as textbooks and educational materials)
- Religious prose (including sermons)
- Verse and drama
- Imaginative prose (such as novels and short stories)
These genres are in addition to the subcorpus of language commentators, which consists of approximately 1 million words. A comprehensive list of all documents is available for browsing.
Although CMSW has been assembled for researchers to use as a complete corpus, some individual documents and collections of documents may be of interest to those who want a narrower focus.