Google Books Corpora

Project leaders: Mark Davies
Size: approx. 200 billion words
Language: American English, British English
Period: 1500-2009
Released: 2011
This new interface for Google Books allows you to search more than 200 billion words (200,000,000,000) of data in both the American and British English datasets, as well as the One Million Books and Fiction datasets. (If you are interested only in contemporary English, nearly 100 billion words come from 1980-2009 alone.)
Although this "corpus" is based on Google Books data, it is not an official product of Google or Google Books (see the citation below). Rather, it was created by Mark Davies, Professor of Linguistics at Brigham Young University, and it is related to other large corpora that we have created.

This interface allows you to search the Google Books data in ways that go well beyond what is possible with the simple Google Books interface. You can search by word, phrase, substring, lemma, part of speech, synonyms, and collocates (nearby words). You can copy the data to other applications for further analysis, which the regular Google Books interface does not allow. You can also quickly and easily compare the data in two different sections of the corpus (for example, adjectives describing women, art, or music in the 1960s-2000s vs. the 1870s-1910s). Note, however, that what you see here is still an early version of the corpus (interface); new features will be added and corrections made over the coming months.
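
Once data has been copied out of the interface, a comparison between two sections can be approximated offline. The sketch below is only an illustration, not the method the interface itself uses: the function names, the smoothing constant, and the counts are all hypothetical. It normalizes raw counts to occurrences per million words and ranks words by the ratio between the two sections.

```python
def per_million(count, section_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / section_size


def compare_sections(counts_a, size_a, counts_b, size_b, smoothing=0.01):
    """Rank words by the ratio of normalized frequency in section A vs. B.

    The smoothing constant (an assumption) avoids division by zero when
    a word is missing from one of the sections.
    """
    rows = []
    for word in set(counts_a) | set(counts_b):
        freq_a = per_million(counts_a.get(word, 0), size_a)
        freq_b = per_million(counts_b.get(word, 0), size_b)
        ratio = (freq_a + smoothing) / (freq_b + smoothing)
        rows.append((word, freq_a, freq_b, ratio))
    # Words most characteristic of section A come first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up counts and section sizes for illustration; not real corpus figures.
recent = {"independent": 400, "delicate": 100}
older = {"independent": 100, "delicate": 400}

rows = compare_sections(recent, 1_000_000, older, 1_000_000)
for word, freq_a, freq_b, ratio in rows:
    print(f"{word}: {freq_a:.1f} vs {freq_b:.1f} per million (ratio {ratio:.2f})")
```

Normalizing by section size matters because the sections of the corpus differ greatly in size, so raw counts from two periods are not directly comparable.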

Davies, Mark. (2011-) Google Books (American English) Corpus (155 billion words, 1810-2009). Available online at

Citation for Google Books and Culturomics:
Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331 (2011) [Published online ahead of print 12/16/2010].


