The Historical Thesaurus of English: Past, present and future

Christian Kay, Department of English Language, University of Glasgow


This paper will celebrate the fact that the Historical Thesaurus of English has at last been completed after over 40 years of hard lexicographical labour. It will begin by briefly describing the principles, methodology and aspirations of the project, with particular reference to developments in theoretical semantics, especially the cognitive paradigm, and to the use of technology. At the micro-level, links between sections of a lexical thesaurus can reveal cognitive pathways in the development of recurrent metaphors. At the macro-level, a thesaurus structure reveals a collective world-view, modified by factors such as time, dialect, and the idiolect of its compilers. The paper will then mention the kinds of research already done by people from many parts of the world using sections of data as they were completed. It will conclude by attempting to evaluate the potential of the project for future work, including plans to link it to the database of the electronic Oxford English Dictionary and to text corpora, identifying variable spelling as a key problem that variationist linguistics has yet to resolve.

1. Origins and purpose

The Historical Thesaurus of English (HT) has been a long time in the making. It was founded by Professor Michael Samuels in 1964. More than forty years later, it has been published online for scholarly use and is about to be published in book form as the Historical Thesaurus of the OED (Kay et al. forthcoming). The fact that publication is both print and electronic is indicative both of the time period through which the project has passed and of the different uses to which it may be put in future. When work began, computers had no role in arts research; nowadays, it is difficult to envisage life without them, but printed books still have a key role to play.

1.1 Further information

Fuller information on the development of the project and the lexicographical decisions it entailed can be found in Kay and Wotherspoon (2002).

A general description of the project can be found at the following address:

The web version can be accessed at:

1.2 Description of the project

HT is an onomasiologically organised display of the lexicon of English over approximately 1300 years, from Old English to the present day, arranged in conceptual categories running from the most general meanings to the most specific. Within each large semantic category, such as Food or Warfare or Disease, are more detailed categories, such as Bread or Swords or Diseases of the Feet. Meanings (i.e. senses of word forms) are drawn from the Oxford English Dictionary (OED), including the second edition and its supplements but not the ongoing third edition.

Plans are underway to link HT to the electronic OED, so that they can be used together. In addition to providing an exceptionally rich resource for lexical studies, such linkage will enable HT to incorporate the large amounts of new material that the OED can now draw on from many new resources, including electronic text archives. At the other end of the historical scale, since the OED omits words which did not survive the end of the Old English period (dated for our purposes at 1150 AD), we have added the contents of A Thesaurus of Old English (Roberts and Kay 2000). This project is also available online, at:

1.3 Purpose of the project

The original and continuing purpose of HT is to provide data for researching the history and development of the English lexicon in terms of lexical obsolescence, innovation and semantic change, on the assumption that these phenomena will be illuminated if words are studied in their semantic contexts. M.L. Samuels made this point when announcing that the work would be undertaken in the English Language Department at the University of Glasgow:

[...] no solution to the problem of push- and drag-chains in lexis will be forthcoming until it is possible to study simultaneously all the forms involved in a complex series of semantic shifts and replacements. The required data exist in multivolume historical dictionaries like the OED, but they cannot be utilised because the presentation is alphabetical, not notional. The need is for a historical thesaurus which will bring together under single heads all the words, current or obsolete (and all the obsolete meanings of words still current) that have ever been used to express single and related notions.

(Samuels 1972:180)

2. Applications

Although no thesaurus can be considered complete until the last word has been slotted into place, virtually complete sections have been released to scholars over the last twenty years, enabling detailed analyses of particular semantic fields to take place.

2.1 Statistical studies

Now that the whole project is available, there will be scope for global and comparative studies, for example on the origins and forms of new words in different areas of vocabulary. Several PhDs at the Universities of Glasgow and London have produced figures for rates of change in fields such as Expectation (Sylvester 1994), Good and Evil (Thornton 1988), Love (Coleman 1999), and Religion (Chase 1988). Some of the results of these were collated in Coleman (1995), which also points to some of the pitfalls involved in extracting statistics from lexical data. More recently, Diller has produced a body of work using HT data as a starting point for the examination of various aspects of the field of Emotions (see, for example, Diller 2002).

2.2 Language and culture

In addition to providing data for lexical semantics, HT has potential as a tool for historical, literary and cultural studies. Simple lists of words can offer a starting point for studies of artefacts, structures or ideas at particular periods, as in Fischer's work on kinship terms (see, for example, Fischer 2002 and 2006).

All words in HT are accompanied by their dates of use, so it is possible to select those which were available at a particular time, and, perhaps equally important, to deselect those which were not. The lists include words which the researcher might miss if relying largely on modern English, i.e. those which have become obsolete or have changed their meanings. This knowledge may be useful when approaching a body of historical text data, such as text corpora or library catalogues.

2.3 Teaching packages

Groups of words can also form the basis of teaching packages demonstrating the links between language and culture. A package based on A Thesaurus of Old English, "Learning and teaching with the Thesaurus of Old English", is available at:

A parallel package, "Word Webs: Exploring English Vocabulary", based on HT and online text corpora, can be found at:

2.4 Data sample

A brief example of the text as it might appear in printed form is given below. The extract comes from the section on Banking and Finance and illustrates terms for an obsolete English coin, the sixpence, which might be of interest both to a historian researching coinage and a linguist interested in the development of terms for money, especially in informal use. The section slots into a larger section on Money, with each place in the heading string representing a level in the semantic taxonomy.

2.4.1 Abbreviated hierarchy

Money
  Medium of exchange/currency
    Coins
      English coins
        Sixpence

The words beneath each heading are given in order of their first recorded date of use, with a dash indicating continuing currency in modern English and a second or subsequent date marking the last use of an obsolete word. A plus sign '+' indicates either occurrences widely separated by time or a change in register, for example from unmarked to informal use. The label 'sl' indicates slang; 'au' = Australian.

2.4.2 Words for "Sixpence"

testern 1546-1614; half-shilling 1561-1695; tester 1567/8+a1839sl; teston 1577-1598; crinklepouch 1593sl; mill-sixpence 1598-1639; sixpence 1598-1886; testril 1601+1905; pig 1622-a1700sl; sice 1660-1709sl+1830sl; simon a1700sl; kick c1700-1871; cripple 1785sl+1885sl; tilbury 1796-1812sl; tizzy 1804-1946sl; tanner 1811sl -; bender 1836-1855sl; snid 1839sl; sprat 1839-1902sl; lord of the manor 1839-1972sl; fiddler 1846-1885sl; grunter 1858sl; sixpenny piece/bit 1897 -; zac(k) 1898-1977au sl; sprazer 1931-1961sl; sprowsie 1931-1966sl.
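
For readers who wish to process such entries computationally, the notation described in 2.4.1 can be read mechanically. The Python sketch below is illustrative only and is not the project's own software: the names Entry and parse_entry are hypothetical, approximate dates (the prefixes 'a' and 'c') and split years such as 1567/8 are reduced to a single year, and the gap or change of register marked by '+' is ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# A word form, then a date specification starting with an (optionally
# prefixed) four-digit year, e.g. "kick c1700-1871".
ENTRY = re.compile(r"^(?P<word>.+?)\s+(?P<dates>[ac]?\d{4}.*)$")
YEAR = re.compile(r"\d{4}")
# Labels used in the sample: 'sl' (slang) and 'au' (Australian).
LABEL = re.compile(r"(?<![a-z])(sl|au)(?![a-z])")


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one thesaurus entry (an assumption)."""
    word: str
    first: int                # first recorded year
    last: Optional[int]       # last recorded year; None = still current
    labels: Tuple[str, ...]   # e.g. ("sl",) or ("au", "sl")


def parse_entry(text: str) -> Entry:
    """Parse one entry such as 'tanner 1811sl -'."""
    match = ENTRY.match(text.strip().rstrip("."))
    if match is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    dates = match["dates"].strip()
    years = [int(year) for year in YEAR.findall(dates)]
    still_current = dates.endswith("-")   # trailing dash = continuing use
    labels = tuple(dict.fromkeys(LABEL.findall(dates)))  # de-duplicated
    return Entry(match["word"], years[0],
                 None if still_current else years[-1], labels)


SAMPLE = ("testern 1546-1614; tester 1567/8+a1839sl; kick c1700-1871; "
          "tanner 1811sl -; lord of the manor 1839-1972sl; "
          "zac(k) 1898-1977au sl.")
ENTRIES = [parse_entry(part) for part in SAMPLE.split(";")]
```

A single-date entry such as 'crinklepouch 1593sl' is given the same first and last year. A fuller treatment would keep the separate attestation periods on either side of '+' rather than collapsing them into one span.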

2.5 Online version

For browsing, the compactness of the printed version has advantages, enabling the user to gain an overview of particular sections and their place in the overall structure. However, like its parent dictionaries, it can only be searched in one way. The electronic version, on the other hand, offers a variety of ways of approaching the data, depending on where one's interest lies. This is illustrated from the searches available on the homepage:

2.5.1 Types of searches

Figure 1. Search menu of the HT web version, showing the search types Browse, Synonym search, Label, Affix, Part of speech and Dates.

These searches represent the interactions of the 29 database fields in which the project information is stored. The last of these, the Dates search, which enables restriction to a particular historical period, was by far the most complex to implement, since the programming had to take account of the fact that a word current between, say, 1590 and 1780 was also current in any year between these dates.
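
The logic behind such a period search can be sketched as a test for overlapping intervals. The Python below is a minimal illustration under our own assumptions and does not reproduce the HT programming: Span, overlaps and search are hypothetical names, each word is treated as continuously current between its first and last recorded dates, and the dates are taken from the sixpence sample in 2.4.2.

```python
from typing import List, NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical record: a word and its period of recorded use."""
    word: str
    first: int
    last: Optional[int]  # None = still current


def overlaps(span: Span, start: int, end: int) -> bool:
    """True if the word was current at some point in [start, end]."""
    began_in_time = span.first <= end
    not_yet_obsolete = span.last is None or span.last >= start
    return began_in_time and not_yet_obsolete


def search(spans: List[Span], start: int, end: Optional[int] = None) -> List[str]:
    """Words current in a single year, or at any point in a period."""
    end = start if end is None else end
    if end < start:
        raise ValueError("end of period precedes its start")
    return [span.word for span in spans if overlaps(span, start, end)]


SIXPENCE = [
    Span("testern", 1546, 1614),
    Span("half-shilling", 1561, 1695),
    Span("mill-sixpence", 1598, 1639),
    Span("sixpence", 1598, 1886),
    Span("tilbury", 1796, 1812),
    Span("tanner", 1811, None),
]

in_1600 = search(SIXPENCE, 1600)
in_early_1800s = search(SIXPENCE, 1800, 1850)
```

On this sample, the search for 1600 returns testern, half-shilling, mill-sixpence and sixpence, while the search for 1800 to 1850 returns sixpence, tilbury and tanner. The same test also excludes words that were not yet, or no longer, available in the period.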

3. Interactions with semantics

In addition to participating in the computing revolution in humanities research, HT has witnessed forty years of developments in semantic theory, from the structural semantics of the 1960s to the dominant cognitive paradigm of the present day. As I have discussed elsewhere (Kay 2000), the initial context of the project was structural semantics, especially the search for meaning components into which words might be atomized, with a view to finding core components which might form the basis of categories. Any onomasiological dictionary works at least implicitly on this principle.

3.1 Roundness

In Roget's Thesaurus of English Words and Phrases (RT), we find a category Rotundity, containing the items below:

252 Rotundity [+ ROUND]

N. rotundity, rondure, roundness, orbicularity 250 circularity; sphericity, sphericality, spheroidicity; globularity, globosity, cylindricity, cylindricality, gibbosity, gibbousness 253 convexity.
sphere, globe, spheroid, prolate sphere, oblate sphere, ellipsoid, globoid, geoid; hollow sphere, bladder; balloon 276 airship; soap bubble 355 bubble; ball, football, pelota, wood (bowls), billiard ball, marble, ally, taw; crystal ball; cannon ball, bullet, shot, pellet; bead, pearl, pill, pea, boll, oakapple, puffball, spherule, globule; drop, droplet, dewdrop, inkdrop, blot; vesicle, bulb, onion, knob, pommel 253 swelling; boulder, rolling stone.

(Lloyd 1982:145. Internal numbers are cross-references to other categories.)

What holds this category together is clearly the fact that in the real world all the items represented in it, from footballs to peas and onions, are, to varying degrees, round in shape, i.e. in componential terms they are [+ ROUND]. However, it would equally clearly be possible to classify them according to other principles. Some of the items are hard while others are liquid; some are used in games or warfare; some are vegetable in nature, some mineral; some are edible while others are not. There seem to be almost as many potential categories as there are words to be classified.

3.2 Issues of categorization

For the thesaurus-maker, problems of categorization arise at both the macro- and micro-levels. At the micro-level, the classifier has to decide which items go with which, i.e. are regarded as synonyms. For a paper publication, decisions have to be fairly firm, since space is limited. For an electronic publication, there is more latitude for cross-referencing and multiple placement. At the macro-level, decisions have to be taken about the overall structure, involving the establishment of major categories and the order in which they occur. Taken together, these major categories form a semantic map, a world-view as represented by the lexicon, or at least by the editor's interpretation of it.

3.2.1 World-views

Roget's world-view in RT had six major divisions:

1. Abstract relations
2. Space
3. Matter
4. Intellect: the exercise of the mind
5. Volition: the exercise of the will
6. Emotion, religion and morality.

(Lloyd 1982:xxxvi-xxxvii)

For HT, despite its greater content, there are three:

1. The External World (subsuming much of Roget's classes 1-3)
2. The Mental World (subsuming Roget 4-5 and Emotions from 6)
3. The Social World.

3.2.2 Comparison

Number of major divisions apart, the most obvious differences between the two works are firstly the fact that HT begins with the lexis of the concrete, observable world and deduces abstract relations from it, and secondly the much richer content of HT's third section, the Social World, containing the lexis of social existence and interaction. At the micro-level, our classification proceeds by types and functions, so that most of the items in categories like RT 252 above are dispersed to sections such as Foodstuffs and Weapons, leaving only words with the essential, rather than accidental, property of round shape in that category. (For further discussion of the differing structure of thesauri, see Fischer 2004.)

3.3 Cognitive semantics

How people classify things is of considerable psychological interest. Early work on prototype theory, notably by Eleanor Rosch, established through various psychological tests that speakers share certain perceptions when making decisions about category membership, most famously when deciding that a sparrow is a more typical bird than a penguin, or a carrot a more typical vegetable than a leek (Rosch 1973).

Such work offers support for the loose groupings of shared meanings in thesaurus categories, which are essentially fuzzy sets, shading from a prototypical core to more peripheral meanings. In psychological terms, it is possible that this is one way in which the brain stores lexis. However, as Rosch's experiments make clear, categories are also shaped by sociolinguistic context, so that, for example, experiments done by American and British students will vary in their results.

Results may also be affected by educational factors, such as a knowledge of scientific taxonomies, though these may be over-ridden by folk taxonomies, as in the many occurrences of tomatoes in lists of prototypical vegetables. As my own informal attempts to replicate such experiments have shown, results may also be affected by transient individual experiences, such as the fact that a subject has recently eaten a banana, and therefore places bananas high on their list of prototypical fruits. In the last analysis, categorization may be idiosyncratic.

3.3.1 Fluidity of categorization

Experiments such as Rosch's can be interpreted in two ways. On the one hand, the fact that culturally-akin groups of speakers show similarities in categorization suggests a degree of common world-view within a particular context, however defined. On the other, the existence of differences and idiosyncrasies suggests that such claims should be made with caution. The human capacity for assigning things to categories is essentially fluid and creative.

Aitchison makes this point about categorization at the micro-level when she writes:

[...] humans are able to categorize in different ways, at different times, depending on their purpose: a knife might be linked with scissors when selecting tools for cutting, with a fork and spoon when choosing implements for eating, and with a sword or dagger when thinking about weapons. Furthermore, the categorization need not be a permanent one, but thought up on the spur of the moment, for a particular purpose [...]

(Aitchison 2004:5)

Fischer makes a similar point:

Categorization, in short, is an open, ongoing, multidimensional process, which the human brain handles far more effectively and flexibly than any printed thesaurus. The brain seems to be able to store many different categorizations at the same time; some may be privileged as being more important than others, but these others are present nevertheless and can be summoned/activated in appropriate contexts or situations.

(Fischer 2004:55)

Recent work on mental spaces and conceptual blending postulates a similar fluidity in the creation of meaning on specific occasions. (See, for example, Fauconnier and Turner 2002.)

3.3.2 Polysemy

In a thesaurus, flexibility can be coped with, at least to some extent, by placing words in multiple categories, representing differing senses and the contexts in which they are used. Aitchison's example, 'knife', appears at least once in each of 21 HT categories, based on the nine nominal and five verbal senses in the OED with their numerous compounds. However, even such acknowledgement of polysemy cannot cope with the infinite creativity of the human mind.
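
Multiple placement of this kind amounts to a many-to-many mapping between word forms and categories. As a minimal Python sketch, assuming three illustrative category labels based on the groupings in Aitchison's example (they are not HT headings, and the function names are hypothetical), an index from words back to their categories might be built as follows:

```python
from collections import defaultdict
from typing import Dict, List, Set

# Illustrative categories only, following Aitchison's knife example;
# these labels are assumptions, not actual HT category headings.
CATEGORIES: Dict[str, List[str]] = {
    "Tools for cutting": ["knife", "scissors"],
    "Implements for eating": ["knife", "fork", "spoon"],
    "Weapons": ["knife", "sword", "dagger"],
}


def index_by_word(categories: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert category -> words into word -> set of categories."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for heading, words in categories.items():
        for word in words:
            index[word].add(heading)
    return dict(index)


WORD_INDEX = index_by_word(CATEGORIES)


def categories_of(word: str) -> List[str]:
    """All categories in which a word form is placed, sorted by name."""
    return sorted(WORD_INDEX.get(word, set()))


def multiply_placed() -> List[str]:
    """Word forms that occur in more than one category."""
    return sorted(w for w, cats in WORD_INDEX.items() if len(cats) > 1)
```

Here categories_of("knife") yields all three categories, whereas "sword" is found only under Weapons. An index of this kind records fixed placements only; it cannot anticipate categorizations thought up for a particular occasion.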

3.4 Metaphor

Since the pioneering work of Lakoff and Johnson in Metaphors We Live By (1980), with its recognition that metaphor is both fundamental and creative, metaphor has been a focus of interest in cognitive semantics. One justification for starting the HT classification with the lexis of the material universe is that our abstract thinking is firmly grounded in the language of the material world, as the metaphorical expression "firmly grounded" illustrates.

Such connectivity between abstract and concrete is revealed by the multiple categorization of polysemous words, often including one or more abstract senses radiating from an initial concrete sense. One way of using HT is to look for recurrent pathways between abstract and concrete meanings, which may well indicate the development of conceptual metaphors.

3.4.1 Lack of seriousness

How, for example, do we express the abstract concept of lack of seriousness in one's approach to life? A glance at the relevant HT category, Frivolity/Light-mindedness, reveals some dominant metaphorical links. The dates of use of each meaning, extracted from the OED, are given after it, with a dash indicating continuous currency.

Lack of material weight

The most obvious of these links express lack of seriousness in terms of lack of material weight, as in:
lightness 1340 -
leger 1598
legerity 1561-1598
levity 1564 -
light-minded 1611 -
light-mindedness 1661 -
light-headed 1579/80 -
light-headedness 1813 -

A couple refer more specifically to absence of weighting material on a ship:
unballasted 1644 -
unballast 1655-1659

Light substances

Some are linked to particular light substances:
frothy 1593 -
corky 1601-1661
barmy 1602-1785 (referring to barm, the froth on beer)
barmy-brained 1824
airy 1627 -
thistle-down 1897
gossamer 1806/7 -


Sometimes, there is a component of light and largely pointless movement, especially in words designating people:
fizgig a1529 -
flibbertigibbet 1640+1892 -
flip-flap 1702
flutter-pate 1894
caperwitted a1670
flighty 1768/74 -


Another common metaphorical link is to creatures allegedly sharing some of these characteristics:
butterfly 1605
papilionaceous 1832-1875
butterfly-brained 1961

feather-brain 1839 -
feather-brained 1820 -
bird-brain 1943
bird-brained 1922
sparrow-brain 1930 -

and possibly
flea-lugged 1724-1823 (Scots, having 'lugs'/ears like a flea)

This particular connection is apparent in one of the OED quotations for 'light-headedness', citing Charles Dickens in Martin Chuzzlewit in 1844, xxiv: "As to lightheadedness, there never was such a feather of a head as mine".


Words referring particularly to people often exploit an alternative metaphorical frame, Breaking/dispersing, as in:
shatter-brain 1719 -
shatterpate 1775
shatterwit 1775
scatterbrain 1790
scatter-brained 1804 -
scatter-headed 1867
shatter-headed c1686-1713
shatter-brained 1727

Examples like these, of varying degrees of complexity, abound throughout HT, offering fascinating insights into how connections are made, both synchronically and diachronically. An extensive study, based on HT data, into metaphors expressing the concept of stupidity has been conducted by Kathryn Allan (2009). Robert Kiełtyka has produced a comprehensive study of animal metaphors, also using HT among his sources (Kiełtyka 2008).

4. Future plans

HT is offered to the research community as a tool to facilitate linguistic research rather than as a set of results or a statement of any particular theory or psychological stance. We expect it to be used in future by historical linguists and general semanticists interested in studying particular fields of meaning and the links between them, as well as by literary scholars, historians and others whose purposes we may not currently envisage. One such new direction has been the use of HT lists as a starting point for work in historical pragmatics, a field of study that hardly existed as such when HT was conceived. (See, for example, Taavitsainen and Jucker 2007.)

On the technical side, in addition to the links with the OED described in section 1 above, we hope to develop HT as a search engine for historical and other non-standard texts, which will involve exploring ways of maximizing the retrievability of variant spelling forms. Using HT for research purposes will be a luxury after so many years devoted to bringing it to fruition. We hope that research will take off in many different directions as scholars realise the potential of the materials.


