Semantic corpus trawling: Expressions of “courtesy” and “politeness” in the Helsinki Corpus

Andreas H. Jucker, University of Zurich
Irma Taavitsainen, University of Helsinki
Gerold Schneider, University of Zurich


This article studies politeness related terminology in the history of English with the aim of shedding some new light on the long diachrony of the folk notion of “politeness”, and at the same time we assess the usefulness of the research method which we call “metacommunicative expression analysis”. Metacommunicative expressions are words and phrases used to talk about aspects of communication, or a particular type of behaviour, such as “polite” or “impolite”. The semantic field of expressions that have been used in the history of English to talk about polite and courteous behaviour was charted with the help of the new research tool, the Historical Thesaurus of the Oxford English Dictionary and the Oxford English Dictionary. The frequencies of politeness related vocabulary items were then studied in the Helsinki Corpus. This exercise revealed areas of higher or lower density of politeness related vocabulary at particular points in the history of the English language and in specific prototypical text categories in the Helsinki Corpus. In the end, the analysis had to go back to the actual texts and interpret the specific politeness related vocabulary in their contexts. The analysis reveals the multifaceted nature of politeness and its manifestations at different points in the history of English and in different text categories.

1. Introduction

In recent years, politeness has become an increasingly important subfield of pragmatics. There are yearly conferences with large numbers of participants. There is even a dedicated Journal of Politeness Research that is devoted entirely to this field. This may be surprising if we consider that politeness is a term that describes human communicative behaviour that is – in a sense – secondary. It describes particular ways of behaving or ways of communicating. In our everyday lives we do a lot of things when we communicate. We greet each other, we exchange information about the weather or our holiday plans, we deliver academic lectures, ask each other for help, pay compliments, issue requests or commands, and so on and so forth, and we may do these things more or less politely or impolitely. In spite of the importance that the notions of politeness and impoliteness seem to have for us, there is still fairly little agreement among scholars about how these terms are to be used for scholarly purposes. What exactly do we want to describe when we set out to describe polite and impolite behaviour?

Politeness and impoliteness may manifest themselves on very different levels. Speakers may choose particularly polite or impolite words, they may use a tone of voice that comes across as polite or not so polite. They may accompany their words with facial expressions or gestures that render what they say as more or less polite. Moreover, politeness and impoliteness are situational and individualistic, and to some extent they may even be unintentional. What is polite in one situation or to one particular person may be impolite (perhaps ironic or sarcastic) in another situation or to another person. All these problems are exacerbated if politeness and impoliteness are studied from a diachronic angle. If it is difficult to reach any agreement about what politeness and impoliteness are in Present-day English, it must be far more difficult to ascertain what they were in earlier periods. The difficulty obviously increases the further back in time we go and the more distant a period is from our own because we lack both the personal experience and the opportunity of asking members of the relevant speech communities.

As a result we only have very partial and patchy answers for some selected periods in the history of the English language. There are a number of studies devoted to terms of address, and in particular to the distinction between the pronouns of address ye and thou (see Finkenstaedt 1963; Burnley 1998 and a recent overview in Mazzon 2010); there are studies of Brown and Levinson’s (1987) politeness strategies in Shakespeare’s plays (e.g. Brown and Gilman 1989; Kopytko 1995); and a few politeness related speech acts have been studied from a diachronic perspective (e.g. Kohnen 2008a, 2008b, 2011).

A more comprehensive view of polite behaviour in the history of the English language seems very difficult if not impossible to achieve. However, we are not without some useful tools. The most expedient one is to study politeness via the discourse about politeness and politeness related phenomena of a particular period. How did people talk about politeness? What did they consider to be polite and what did they consider to be impolite and why? How did they evaluate their own utterances and the utterances of their interlocutors and other people? In this, we follow the trend of some politeness scholars who study present-day material and who have turned their attention away from academically defined notions of politeness to the everyday notions of politeness.

We therefore propose to study politeness related terminology in the history of the English language to gain a better understanding of the long diachrony of politeness. At the same time we would like to assess the usefulness of the extant research tools for such an endeavour. What is their potential? How far will they take us towards a better understanding of politeness phenomena in earlier periods? Do they allow us to discern clear patterns, and do they allow us to discern how these patterns developed in the course of time? And what are the limitations of the tools? Are there aspects of politeness that cannot be described?

2. Methodology in historical politeness research

It has become standard in politeness research to distinguish between the everyday notion of politeness and the technical term. The everyday notion is called “first order politeness” or “politeness1” and the technical term “second order politeness” or “politeness2” (see in particular Eelen 2001 and Watts 2003). This is a useful distinction because it reminds us that technical definitions of the term politeness may deviate considerably from our everyday understanding of the term. The everyday term is, like most words of a natural language, fuzzy and subject to variation across social groups and across time, while the technical term is as precise as the scholar manages to make it (see also Watts, Ide & Ehlich 1992: 3; Kasper 2003).

Watts maintains that the politeness scholar should give priority to first order politeness:

A theory of politeness2 should concern itself with the discursive struggle over politeness1, i.e., over the ways in which (im)polite behaviour is evaluated and commented on by lay members and not with ways in which social scientists lift the term “(im)politeness” out of the realm of everyday discourse and elevate it to the status of a theoretical concept in what is frequently called Politeness Theory. (Watts 2003: 9)

Kasper (2003: 2) also maintains that “First order politeness phenomena constitute the empirical input to politeness theories.” The object of study, therefore, is the term politeness and its use by native speakers. How do they use this term? What kind of phenomena do they describe with this term? And how do they evaluate such behaviour?

This kind of method has been described as analysis of metalanguage (Jaworski, Coupland and Galasiński 2004; Culpeper 2009) or – more specifically – metacommunicative expression analysis (Jucker and Taavitsainen in prep). Metacommunicative expressions are words and phrases that can be used to talk about aspects of communication, in the sense that they name a particular speech act, such as compliment, greet, insult or thank, or a particular type of behaviour, such as polite or impolite. A search for such expressions may reveal performative uses of the speech act that they name, as, for instance, in the utterance “Donald, I compliment you on the book” (COCA, CNN Newsroom, 2009). Many of these expressions are not generally used performatively but all of them can be used discursively, that is to say people use them to talk about that particular aspect of communicative behaviour. In their discursive use, metacommunicative expressions give us what might also be called an ethnographic view of the kind of communicative behaviour that they describe:

How people represent language and communication processes is, at one level, important data for understanding how social groups value and orient to language and communication (varieties, processes, effects). This approach includes the study of folk beliefs about language, language attitudes and language awareness, and these overlapping perspectives have established histories within sociolinguistics. Metalinguistic representations may enter public consciousness and come to constitute structured understandings, perhaps even ‘common sense’ understandings – of how language works, what it is usually like, what certain ways of speaking connote and imply, what they ought to be like. (Jaworski, Coupland and Galasiński 2004: 3; also quoted by Culpeper 2009: 66)

Like any other word of a natural language a metacommunicative expression may change its meaning over time, and its denotation may be fuzzy at any given point in time. Metacommunicative expressions should, therefore, be described in the context of semantically similar expressions in order to make sure that the analysis does not focus on a narrowly focused special meaning. A metacommunicative expression may have undergone semantic broadening or narrowing, and only an analysis that incorporates neighbouring expressions can make sure that such trends can be discerned.

The term “polite” with the meaning ‘refined, elegant, scholarly; exhibiting good or restrained taste’ is first attested in the English language in c. 1500 (OED, “polite”, adj. 2(a)). The meaning ‘courteous, behaving in a manner that is respectful or considerate of others; well-mannered’, which is closer to the modern meaning of “polite”, is first attested in 1751 (OED, “polite”, adj. 2(c)). Thus, if we want to pursue the history of polite behaviour we have to extend the search to expressions within the same semantic field, e.g. “courtesy”, “civility” or “deference”. We have to exchange the fishing rod for a trawl net, as it were, in order to catch not only individual instances but an entire shoal of terms, i.e. the contents of an entire semantic field.

Such an undertaking depends on several electronic tools. First of all, it is necessary to identify the relevant terms in the semantic field of “polite” and “politeness”, and second, all the relevant spelling variants of these terms must be listed. These terms, then, have to be used as search terms in relevant historical corpora. The results of such searches have to be interpreted in terms of their distribution across the genres represented in the corpora and in terms of their diachronic development.

3. Case study: “politeness” and “courtesy” in the history of English

3.1 A first diachronic approximation

In this section we will present the results of a case study. As a first approximation, we used the new research tool of the Ngram Viewer (see the Sources section for a link). The Ngram Viewer provides access to a diachronic corpus of printed texts from 1500 to 2000, a database that contains the text of all those millions of books that have been scanned by the Google Books project (see Michel et al. 2010). The exclusion of handwritten material prevents the corpus from going further back than 1500. According to the relevant website the entire corpus comprises 5 million books, which constitute about four per cent of all books ever printed, and 500 billion (5 × 10^11) words. Of these, 361 billion words are in English and they are searchable for n-grams, i.e. strings of n words, where n is a number between 1 and 5. This, therefore, includes single words like banana or SCUBA and strings that are up to five words long, like the United States of America.

In the corpus-linguistic sense, the Ngram Viewer database is similar to the World Wide Web as a corpus. It is a massive collection of data without any structure imposed by the researcher. The Web includes carefully prepared and edited texts as well as very casual texts and texts written by writers who may have a limited command of English (McEnery and Hardie 2012: 7). In a similar way the Ngram Viewer database includes books from all sorts of contexts. Moreover, it must be borne in mind that the texts included in the database are the results of digital scans and optical character recognition processing. For the early centuries this processing does not appear to be very reliable. It seems very likely that searches for the early centuries miss relevant hits because of faulty character recognition. A single misread character in a target item will prevent this item from being retrieved, and, therefore, results for the sixteenth and seventeenth centuries often look erratic and should be treated with even more scepticism than those for later centuries. Spot checks also reveal that the dating of individual books cannot always be trusted, especially in the case of reprints of older material. Nevertheless, the diagrams provided by the Ngram Viewer often give a useful first approximation to the diachronic development of individual words or short strings of words. The frequencies of the individual search strings are given as normalized figures in percentages of the total number of words for a particular period. This provides figures that are not easily readable, but by moving the decimal point four digits to the right they can be read as instances per million words. Thus Figure 1 ranges from 0 to 26 words per million words.
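
The conversion just described is simple arithmetic: a percentage is a rate per hundred, so multiplying it by 10,000 gives a rate per million. A minimal sketch in Python follows; the function name is invented for the illustration, and the sample value of 0.0026 per cent is chosen only because it corresponds to the upper end of the scale mentioned for Figure 1.

```python
def percent_to_per_million(percent: float) -> float:
    """Convert a relative frequency in per cent to instances per million words."""
    # 1 per cent = 1 in 100 = 10,000 in 1,000,000
    return percent * 10_000


# Illustrative value only: 0.0026 per cent is about 26 instances per million.
print(round(percent_to_per_million(0.0026), 2))
```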


Figure 1. Words of courtesy and politeness from 1700 to 2000 (Ngram Viewer; see the link to the query in the Sources section)

The Ngram Viewer only retrieves exact strings. In order to retrieve capitalized instances or spelling variants, additional searches would be needed, but the results of searches for alternative forms of the same word cannot be cumulated. The results can therefore only be very tentative. The diagram in Figure 1 nevertheless suggests that the word polite made a particularly strong appearance in the eighteenth century. The Ngram Viewer obviously does not distinguish between different senses of particular n-grams, and it might be suspected that the loss of meanings that are now obsolete is responsible for at least some of the subsequent reduction. Civility and deference were also important, particularly in the second half of the eighteenth century. In the nineteenth century the frequency of all these words was more than halved. Courtesy, on the other hand, only appeared at the beginning of the nineteenth century. The word politeness is interesting because it lags almost a century behind polite. It reaches an early peak at the beginning of the nineteenth century and then steadily declines throughout the nineteenth and twentieth centuries. The word courteous shows less variation across the three centuries except for the marked decline from about 1930. Cultural critics who have deplored the general decrease of polite behaviour in the twentieth century might interpret the general tendency for all the terms included in this figure to decline between about 1930 and 1980 as confirmation for their pessimistic views.

[UPDATE January 24, 2013 - See here for an addendum to this section - AHJ, IT, GS]

3.2 The semantic field of courtesy and politeness

The starting point for our investigation of politeness and courtesy in the history of English is the semantic field of expressions that have been used in the history of English to talk about polite and courteous behaviour. The new research tool, the Historical Thesaurus of the Oxford English Dictionary (see link in Sources), provides 67 entries under the headings “courtesy, n”, “courteous, a” and “courteously, adv”. We chose these headings because they have a more extensive time depth and a broader application and to some extent subsume the concept of politeness. It is, of course, possible to distinguish semantically between “politeness” and “courtesy”, but as a heading for an entire semantic field, we do not wish to make a principled distinction between them. The 67 entries are all given with the date of their first attestation in English as listed in the Oxford English Dictionary. The list also contains words that are not immediately obvious candidates for the semantic field of politeness and courtesy. In these cases it is usually a special, perhaps even obsolete, meaning of the expression which is intended, and the date in these cases refers to the earliest known occurrence of this expression with this particular meaning. Table 1 lists all 67 items chronologically and split into the time frames given by the Helsinki Corpus; all parts of speech are listed together, using the OED headwords. The nine expressions that have a first attestation after the period covered by the Helsinki Corpus are lumped together under the heading Present-day English. The same word may appear more than once if different relevant meanings have different first attestations.

OE (–1150) well 825, manship 850, methe 850, methe 850, fair 850, methely 850
M1 (1150–1250) courtesy 1225, debonairty 1225, gentrice 1225, hendy 1225, debonair 1230, menskly 1230, debonairship 1240
M2 (1250–1350) hend 1250, fairness 1275, courteous 1275, hendly 1275, hendly 1275, courteously 1290, hendelaik 1300, hendness 1300, hendship 1300, bonair 1300, debonairly 1300, bonairty 1303, gracious 1325, bonair 1330, bonairly 1340, bonairness 1375
M3 (1350–1420) hend 1375, goodly 1377, debonairness 1382, humanity 1384, gentle 1385, mensking 1400, debonary 1402
M4 (1420–1500) debonarious 1485
E1 (1500–1570) humane 1500, formal 1518, courteousness 1530, comity 1543, handsomely 1548, civilly 1552, civilness 1556, civility 1561, civil 1565
E2 (1570–1640) humanely 1596, complement 1597, fairly 1609, gallantly 1611, over-civil 1639
E3 (1640–1710) courtship 1640, galanterie 1641, genteel 1656, complementalness 1657, gallantry 1675, gallant 1680, overcivility 1693
PDE (1710–) politely 1748, polite 1751, civil 1767, civilish 1779, chivalrous 1818, politeful 1832, comity of nations 1862, frauendienst 1932, knight in shining armour 1965

Table 1. 67 expressions of “courtesy” in the Historical Thesaurus of the OED and their first attestation grouped into the Helsinki Corpus periods

The Historical Thesaurus of the OED lists only very few expressions that go back to Old English. They are, therefore, lumped together under one heading in spite of the four Old English subperiods distinguished by the Helsinki Corpus. One reason for this dearth of expressions is certainly the fact that the Oxford English Dictionary, which provides the data base for the Historical Thesaurus, does not list Old English words if they did not survive into later stages of the English language. Thus, it is possible that there is a small or even substantial vocabulary of politeness related terms in Old English that is not covered by the Oxford English Dictionary and hence is not included in the lists of the Historical Thesaurus of the OED.

The present line of investigation, therefore, does not seem very fruitful for Old English politeness. [1] It has, however, been pointed out that politeness was valued and courtly custom observed in noble households, and refinement was also cultivated in monasteries (Burnley 1998: 23). This statement is based, for example, on the scene in Beowulf in which appropriate behaviour is described in the polite reception of visitors.

In Table 1, M2 (1250–1350) stands out as a period with a particularly rich influx of new politeness related terms. With 15 new expressions it yields about twice as many as in most other periods, and M4 (1420–1500) stands out because there is only one single new term. It is easy to speculate that during the late thirteenth and early fourteenth century French influence with its politeness culture was particularly strong, leading to the adoption of relevant terminology to describe the new kind of courtly behaviour. It is more difficult, perhaps, to offer a plausible explanation for the lack of new politeness vocabulary in the fifteenth century. Apart from these two periods all other periods show a steady increase of the politeness related vocabulary by between five and nine new expressions.

For Middle English there are other excellent tools to probe into the meanings of lexical items and their contexts. We decided to explore the politeness terminology given in Table 1 in order to capture more precisely the quality of the politeness vocabulary retrieved so far. We searched for all these words in the on-line versions of the Middle English Dictionary and the Middle English Compendium and assessed the entries in detail to see in which ME texts they occurred in their politeness-related senses. This exercise proved fruitful as the MED and the quotations gave clear evidence of the words belonging to the vocabulary connected with the ideal qualities of chivalry and courtly love and life. The word cŏurteis (adj. & n.)[AF courteis, curt-, cort- & CF co(u)rtois.] seems to be very central, as it is commonly used in the explanations of other items in the list. It is defined in the following way: 1. (a) Of persons: courtly or refined in manners; well-bred, urbane; polite, courteous; considerate, kind; (b) of behavior, actions, words, etc.: refined, well-mannered, polite. Another central term is hēnd(e) (adj. from OE gehende) which also refers to courtly or knightly qualities, “an hende cniht”, and is also associated with polite behaviour. The collocation “curteys and hend”, or the other way round “hende and curteys”, is one of the stock phrases of this vocabulary and occurs very frequently. The citations from MED come from fiction, mostly from romances and chivalric adventures, such as Roman de la Rose, Floris and Blancheflour, Awntyrs of Arthure, saints’ lives, and Malory’s and Chaucer’s works. Only a few non-literary works are cited, and even there a clear connection to courtly literature can be seen, as books such as those on table manners were aimed at an aristocratic readership to instruct in appropriate behaviour according to the courtly norms of “nurture” and courtly life.

In Middle English, the genre of courtly literature has proved difficult to define, but a broad definition associates it with texts that exhibit the characteristics defined in MED under courtly (see above, and Burnley 1998: 123). The contents are secular, but e.g. the epithets of courtly ladies are applied to the Virgin Mary, and terms of chivalry and courtly love “became thoroughly established concepts reflecting dominant concerns of the age” (Barron 1987: 57). This statement applies very well to the results of our study of politeness terms in Middle English, as the qualitative examples below will show.

3.3 Semantic field and corpus search

The first attestation of these terms does not say anything about the overall frequency of politeness related vocabulary at a particular period in the history of the English language. This can only be established by using an appropriate trawl net, i.e. by searching for all politeness related terms in a general-purpose diachronic corpus, such as the Helsinki Corpus. It turned out, however, that this list of 67 expressions was not directly usable as search terms because it includes several items that occur frequently with a non-courtesy specific sense and thus lead to low precision. These are: complement, complementalness, fair, fairly, fairness, formal, goodly, humane, humanely, humanity, methe, methely, well. For practical reasons two multi-word phrases (comity of nations, knight in shining armour) and a recent loan (frauendienst) were also excluded. After this step 50 entries remained in the search list.

This list, however, would not find the entire politeness related vocabulary because the Historical Thesaurus of the OED does not list spelling variants but only the most current spelling of each word. It was, therefore, necessary to retrieve all relevant spelling variants from the Oxford English Dictionary in order to narrow the mesh size of the trawl net and to increase recall. Many of these spelling variants could be merged with the help of regular expressions. The historical spelling variants of the Present-day expressions courtesy, courteous and courteously, for instance, can be covered by the following list of expressions: cortays*, corteis*, corteous*, cortes*, corteys*, cortez*, courtaysye, courteis*, courteosie, courteous*, courtes*, courteys*. The asterisk covers the endings –e, –y, –ie, and –ness. This procedure, again, left some items that retrieved unwanted hits, thus negatively affecting precision. These were spelling variants with frequent homonyms. The Middle English expression hend, for instance, has several spelling variants including ende, hind*, and hynd*. But these word forms very often appear as homonyms with non-politeness related meanings. These forms were, therefore, also excluded. The addition of the spelling variants according to this procedure yielded a list of 185 search terms (see the list in the Sources section). With this list it was now possible to trawl the Helsinki Corpus for politeness related vocabulary and to plot a map of the frequency of these expressions according to their occurrence in different periods and in different genres. Although we did not perform a formal evaluation of precision and recall, we manually inspected large random sets to weed out forms with low precision, as just described. The amount of noise that an unedited list of terms creates is considerable, and it would lead to partly different results.
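
The wildcard matching described above can be sketched as follows. This is only a minimal illustration in Python, not the actual search procedure used for this study: it assumes that the asterisk stands for any (possibly empty) sequence of characters, that matching ignores capitalization, and that the corpus has already been split into word tokens. The function names and the sample tokens are invented for the illustration; the patterns are the twelve quoted in the preceding paragraph.

```python
from fnmatch import fnmatchcase

# The twelve search patterns quoted in the text; "*" stands for any ending.
PATTERNS = [
    "cortays*", "corteis*", "corteous*", "cortes*", "corteys*", "cortez*",
    "courtaysye", "courteis*", "courteosie", "courteous*", "courtes*",
    "courteys*",
]


def is_hit(token: str) -> bool:
    """True if the lower-cased token matches at least one pattern in full."""
    word = token.lower()
    return any(fnmatchcase(word, pattern) for pattern in PATTERNS)


def find_hits(tokens):
    """Return the tokens that match a search pattern, in their original order."""
    return [token for token in tokens if is_hit(token)]


# Invented sample tokens, for illustration only.
sample = ["Corteys", "courteisie", "courtesy", "court", "hende"]
print(find_hits(sample))
```

In this sketch a form such as court is not retrieved because no pattern covers it, while hende is not retrieved because only the patterns for courtesy, courteous and courteously are included here.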

3.4 Results

The 185 search terms produced 1164 hits in the entire Helsinki Corpus, i.e. 0.663 hits per 1000 words. [2] Figure 2 plots the frequency of the politeness related vocabulary in the Helsinki Corpus according to the periods. It is immediately obvious that our procedure of pruning the search terms did not leave any hits before the Middle English period. It is interesting that it is again M2 (1250–1350) which stands out with the highest frequency of politeness related vocabulary. The Early Modern English period shows, perhaps surprisingly, a steady decline.


Figure 2. Politeness related vocabulary in the Helsinki Corpus by period (hits per 1000 words) (see the link in Sources to download the Excel file of this figure) [3]

Figure 3 presents the results according to the genre classification given in the Helsinki Corpus. The genres are given in alphabetical order. The genres with the highest frequency of politeness related vocabulary are non-private letters, romance, medical handbooks and fiction followed by astronomical handbooks, travelogues and mystery drama. The lowest frequencies are found in rules, the bible and philosophy.


Figure 3. Politeness related vocabulary in the Helsinki Corpus by genre (hits per 1000 words) (see the link in Sources to download the Excel file of this figure) [4]

However, the categorization according to genres is very fine-grained. Some genres are represented by only a few texts and most genres do not extend over the entire period covered by the Helsinki Corpus. The prototypical text categories, on the other hand, provide continuation with at least a few samples of each category in Old, Middle and Early Modern English. Figure 4 presents the results according to the prototypical text category.

It should be pointed out, however, that the use of genre distributions has changed over the twenty years of the history of the Helsinki Corpus. In the early studies they were the end results, but in recent research they are used as a point of departure for further studies, pointing to other, more specialized and larger corpora for more examples. For example, if private letters have the most occurrences of the feature under scrutiny in the Helsinki Corpus, the publicly available versions of CEEC can be used, or if scientific writing is the category with the most hits, Middle English Medical Texts or Early Modern Medical Texts can be consulted.


Figure 4. Politeness related vocabulary in the Helsinki Corpus by prototypical text category (hits per 1000 words) (see the link in Sources to download the Excel file of this figure) [5]

Figure 4 shows that there is a fairly even distribution of politeness related vocabulary across the prototypical text categories of expository, religious instruction and secular instruction. The X category, which stands for texts that could not be assigned to any of the other prototypical text categories, shows a similarly high frequency. Statutory has a smaller frequency. The two categories that stand out with the highest frequency of politeness related vocabulary are the two narrative categories of imaginative and non-imaginative narration, and it is imaginative narration which clearly has the top position with a frequency that is twice as high as non-imaginative narration. This result is very much in accordance with our observations on the MED examples.

For the next step we investigated the interrelationship between the diachronic development and the distribution across the prototypical text categories in order to see whether the individual text categories reveal different diachronic patterns. The frequency map given in Table 2 shows the results of this investigation.

               O1     O2     O3     O4     M1     M2     M3     M4     E1     E2     E3
EXPOS          -      -      0.000  -      -      -      -      0.000  0.258  0.236  0.084
INSTR REL      -      0.000  0.000  0.000  0.085  0.358  0.272  0.576  0.706  0.000  0.456
INSTR SEC      -      0.000  0.000  0.000  0.000  -      0.083  0.243  0.685  0.243  0.255
NARR IMAG      -      -      0.000  -      -      1.243  0.453  0.539  0.509  0.232  0.000
NARR NON-IMAG  -      0.000  0.000  0.000  0.260  0.614  1.003  0.116  0.525  0.333  0.198
STAT           -      0.000  0.000  0.000  -      0.000  0.000  0.086  0.405  0.081  0.000
X              0.000  0.000  0.000  0.000  -      0.000  0.690  0.671  0.327  0.176  0.107

Table 2. Frequency map of politeness related vocabulary in the Helsinki Corpus (per 1000 words). Increasingly darker shades of green indicate increasingly higher frequencies. A hyphen marks an empty cell. [6]

The frequency map in Table 2 yields a more fine-grained picture of the development of politeness related vocabulary in the Helsinki Corpus. It can be seen as a map plotting the richest fishing grounds in the Helsinki Corpus. The distribution of empty cells reveals periods for which a particular prototypical text category is not attested in the Helsinki Corpus. Expository texts, for instance, make a brief appearance in O3 (950–1050), and after that only occur regularly from M4 (1420–1500). The many cells with a frequency of 0.000 indicate all those periods and text categories for which texts are included in the Helsinki Corpus but without any politeness related vocabulary caught by our search strings.
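
The difference between an empty cell and a cell with a frequency of 0.000 can be made explicit in a small sketch. The Python code below is an illustration under stated assumptions, not the procedure actually used for Table 2: it assumes that each text file comes with a category, a period, a word count and a hit count, and that the value of a cell is pooled over all files in that cell (total hits divided by total words). The function name and all sample records are invented.

```python
from collections import defaultdict


def frequency_map(files):
    """Map (category, period) to hits per 1000 words.

    `files` is an iterable of (category, period, n_words, n_hits) tuples,
    one per text file. Cells for which no file exists are absent from the
    result, which keeps them apart from cells with a rate of 0.0.
    """
    words = defaultdict(int)
    hits = defaultdict(int)
    for category, period, n_words, n_hits in files:
        words[(category, period)] += n_words
        hits[(category, period)] += n_hits
    return {
        cell: 1000 * hits[cell] / words[cell]
        for cell in words
        if words[cell] > 0
    }


# Invented file records, for illustration only (not Helsinki Corpus figures).
files = [
    ("NARR IMAG", "M2", 10_000, 12),
    ("NARR IMAG", "M2", 5_000, 6),
    ("STAT", "M2", 8_000, 0),
]
fmap = frequency_map(files)
print(fmap.get(("NARR IMAG", "M2")))  # rate pooled over both files
print(fmap.get(("STAT", "M2")))       # texts present, but no hits
print(fmap.get(("EXPOS", "M2")))      # None: no texts in this cell
```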

The cells shaded in green indicate all those periods and text categories for which politeness related vocabulary is attested in the texts of the Helsinki Corpus. Darker shades of green indicate higher frequencies. The highest frequency is attested in imaginative narration in M2 (1250–1350), followed by non-imaginative narration in M3 (1350–1420). Three of the text categories (imaginative and non-imaginative narration and category X) show a fairly clear decrease from Middle English to Early Modern English. Two categories (religious and secular instruction) show a slight increase towards the end of the Middle English period followed by a slight decrease during the Early Modern English period. And two text categories (expository and statutory) make only a cursory appearance round about the beginning of the Early Modern English period.

As mentioned above, two steps are essential to our approach: the manual inspection of initial results in order to weed out false positives, which affect precision, and the extension of the search terms to spelling variants by means of regular expressions, which increases recall. Figure 5 compares hits per 1000 words between using our curated list of terms (the data from Figure 2) and using a raw list of terms. The raw list of terms leads to massive undergeneration (i.e. a massive loss of recall) on earlier texts, where spelling is less standardized, and to massive overgeneration (very low precision) in late texts, where the majority of the hits are non-courteous instances of fair and goodly.

Figure 5. Hits per 1000 words, comparing a curated list of terms and the raw list of terms.
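
The contrast between the two lists can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two regular expressions, the raw term list and all function names are hypothetical and are not the search terms or the software used for this study. The sample line is taken from extract (1) in section 3.5.

```python
import re

# Hypothetical curated patterns: regular expressions that cover spelling variants.
CURATED_PATTERNS = [
    r"c(o|u|ou)rte(i|y)?s(i|y)e?",  # courtesy, curteisie, courtesie, ...
    r"hende?",                      # hende, hend
]

# Hypothetical raw list: modern spellings only, matched literally.
RAW_TERMS = ["courtesy", "courteous"]


def tokenize(text):
    """Split a text into lower-cased word tokens (letters such as þ are kept)."""
    return re.findall(r"\w+", text.lower())


def count_hits(tokens, patterns):
    """Count the tokens that match at least one pattern in full."""
    compiled = [re.compile(p) for p in patterns]
    return sum(1 for t in tokens if any(c.fullmatch(t) for c in compiled))


def hits_per_1000(text, patterns):
    """Normalized frequency: hits per 1000 word tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * count_hits(tokens, patterns) / len(tokens)


SAMPLE = "And mest hoe counnen of curteisie. Nis noþing al so hende."

curated_rate = hits_per_1000(SAMPLE, CURATED_PATTERNS)
raw_rate = hits_per_1000(SAMPLE, [re.escape(t) for t in RAW_TERMS])
print(f"curated: {curated_rate:.1f}  raw: {raw_rate:.1f}")
```

On this Middle English line the literal modern spellings find nothing, while the patterns catch both curteisie and hende. A pattern list of this kind still needs manual inspection, since a permissive pattern can also match unrelated words.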

3.5 Interpretation and analysis

In order to zoom in on the texts that are responsible for high frequencies of politeness related vocabulary in the Helsinki Corpus, those texts with frequencies higher than 1.5 hits per 1000 words were retrieved. They are given in Table 3.



Text | Text type | Prototypical text category | No. of hits | No. of words | Hits per 1000 words
The Thrush And The Nightingale | | | | |
Henry V, Letters | | | | |
Elyot, The Boke Named The Gouernour | | | | |
Middle English Sermons ... Ms. Royal | | | | |
Historical Poems (In MS Harley 2253) | | | | |
Dame Sirith; Interlude | | | | |
Howard; Tunstall; A Letter | | | | |
The Brut Or The Chronicles Of England | | | | |
The Book Of Vices And Virtues | | | | |
Kyng Alisaunder | | | | |

Table 3. Text files in the Helsinki Corpus with more than 1.5 hits of politeness related vocabulary per 1000 words

There are about 100 files in the Helsinki Corpus which have at least one item of politeness related vocabulary. Table 3 lists the top ten files which all have a frequency of politeness related vocabulary higher than 1.7 hits per 1000 words. The text files differ considerably in their length. At the bottom of the list, there is a text which contains 11,000 words of the romance Kyng Alisaunder. And at the top of the list there is a relatively short poem “The Thrush and the Nightingale”. Both these texts are from the second Middle English period, M2 (1250–1350).

In the following, we want to go back to the original texts and provide some illustrative examples of how the vocabulary of politeness and courtesy is used and what we can learn from these passages about politeness in earlier periods of the English language. The first example is taken from the text in the Helsinki Corpus with the highest density of politeness and courtesy vocabulary, the short Middle English debate poem “The Thrush and the Nightingale”.

(1) [\Nightingale\] ’Þrestelcok, þou art wod,
Oþer þou const to luitel goed,
Þis wimmen for to shende.
Hit is þe swetteste driwerie,
And mest hoe counnen of curteisie.
Nis noþing al so hende.
‘Thrush, you are mad,
Or too ignorant
In slandering these women.
They are the sweetest object of love,
And know most about courtesy;
There is nothing so well-bred.’ [7]

The thrush attacks women on the basis of classical authorities and his own experience (as a man rather than as a bird). Contrary arguments are presented by the nightingale, who defends women and their virtue, beauty and charm. Finally the thrush concedes defeat when the nightingale mentions the Virgin Mary, who was the culmination of all these virtues. The text abounds in courtly love terms, well in accordance with the love lyrics of the time. In extract (1), two central terms of this study, curteisie and hende, are used to describe women.

The second example is a letter written by the Mayor of London to King Henry V thanking him for letters received.

(2) Of alle erthely Princes our most dred soueraign Liege lord and noblest kynge, we recomaunde vs vnto your soueraign highnesse and riall power, in as meke wyse and lowely maner as any symple officers and pouuere lieges best may or can ymagine and diuise vnto her most graciouse and most soueraign kyng, Thankyng with all our soules your most soueraign excellence and noble grace of þe right gentell, right graciouse, and right confortable lettres, which ye late liked to send vs fro your toun of pount-de-larche, which lettres wiþ al lowenesse and reuerence we haue mekly resceyued, and vnderstonde bi which lettres, amonges al other blessed spede and graciouse tithinges in hem conteyned, for which we thanke hyly, and euer shulle, the lord almighty, ware we most inwardly conforted and reioysed, whan we herde þe soueraign helthe and parfit prosperite of your most excellent and graciouse persoune, which we beseche god of hys grete grace and noble pite euer to kepe and manteyne.

The highlighted expressions in this extract are used as epithets for the addressee, i.e. as adornments to make the utterance polite. Other politeness formulae abound in letters. For instance, the address contains conventionalized phrases praising the recipient, and self-deprecating discourse is employed with elaborate forms in assurance of loyalty. Even the prosodic features contribute to the appropriate politeness in approaching a high-status recipient. Ars dictaminis was part of the art of writing formal letters, and the example above shows the writer’s competence in this art. Thus the vocabulary is only a small part of the decorum, and as such rather simple: gracious is repeated four times and gentell occurs once.

The third example is taken from an educational treatise, Elyot’s The Boke Named The Gouernour, with the subtitle The Ordre Of Lernynge That A Noble Man Shulde Be Trayned In Before He Come To Thaige Of Seuen Yeres, originally published in 1531. The first instance of the adjective gentyl could be interpreted as ‘well-born’.


(3) Nat withstandyng, I wolde nat haue them inforced by violence to lerne, but accordynge to the counsaile of Quintilian, to be swetely allured therto with praises and suche praty gyftes as children delite in. And their fyrst letters to be paynted or lymned in a pleasaunt maner: where in children of gentyl courage haue moche delectation.

But there can be nothyng more conuenient than by litle and litle to trayne and exercise them in spekyng of latyne: infourmyng them to knowe first the names in latine of all thynges that cometh in syghte, and to name all the partes of theyr bodies: and gyuynge them some what that they couete or desyre, in most gentyl maner to teache them to aske it agayne in latine.

Semblably the nourises and other women aboute hym, if it be possible, to do the same: or, at the leste way, that they speke none englisshe but that which is cleane, polite, perfectly and articulately pronounced, omittinge no lettre or sillable, as folisshe women often times do of a wantonnesse, wherby diuers noble men and gentilmennes chyldren, (as I do at this daye knowe), haue attained corrupte and foule pronuntiation.


The passage places polite language use in contrast to more everyday speech – a concern that was to become a major issue in the eighteenth century when polite language use became a defining feature of educated, “polite” society. It expresses ideas that were to gain wide currency two centuries later by contrasting cleane, polite, perfectly and articulately pronounced English with the corrupte and foule language use attributed to folisshe women and their behaviour of a wantonnesse, as opposed to the language behaviour of noble and gentyl men.

The last passage is taken from a Middle English sermon.


(4) Þou knowist verely þe hiʒ prudence of þis nobull virgyn and also hur sadnes in soule. Þou knowist also, gracious Lord, þis message þat þou commaundes me to execute.

Where-fore ʒiff itt like to thy most gracious lordshipp me to do þis message, I beseche þe, chef soueraygne Lord, graunte me þi signet, where-of when þat she haþ knalage þer-of, þat she may applie hur will to þi godly purpose.”

But þe language of þe aungell was soleyn and gracius when þat he seid þat she was full of grace and blissed hur abowon all women.


This passage contains the adjective gracious three times, but other vocabulary also contributes to the overall style of the passage in praise of the Virgin Mary. The passage is very much in tune with the courtly love tradition that evolved around the Mary cult. The same vocabulary was also attributed to female saints (found e.g. in MED entries).

4. Conclusions

Politeness scholars, such as Eelen (2001) or Watts (2003), have strongly advocated the study of what they call first-order politeness or politeness1, i.e. the folk notion of “politeness” rather than second-order politeness or politeness2, i.e. some arbitrarily defined technical notion of politeness. However, so far surprisingly little work has focused on the metalanguage of courtesy and politeness in the history of English (but see Culpeper 2009 on the metalanguage of impoliteness). In this paper we have tried to provide some foundations for such an endeavour. On the basis of the Historical Thesaurus and the Oxford English Dictionary we have established a large trawl net of relevant expressions and their spelling variants in the semantic field of courtesy and politeness. Given the nature of the Oxford English Dictionary, politeness related terms that existed only in Old English and did not survive into Middle English slip through the net because they are not recorded.

Trawling this net through the rich fishing grounds of the Helsinki Corpus revealed areas of higher or lower density of politeness related vocabulary at particular points in the history of the English language and in specific prototypical text categories. But in the end, the analysis has to go back to the actual texts and interpret the specific politeness related vocabulary in the actual contexts in which it occurred. The sample analyses provided above revealed the extent to which this vocabulary was related to women and their behaviour, to the cult of Mary, and also to an ideology of the rising middle classes in terms of refined and educated behaviour. The analysis of such texts reveals quite clearly that politeness is a much broader notion than the use of a few items in the vocabulary. The letter written by the Mayor of London quoted above shows particularly clearly how the appropriate levels of politeness are achieved by the joint effect of several devices, including vocabulary, address formulae, and the written prosody of ars dictaminis.

Courtesy and politeness are such complex and multi-faceted notions that a linguistic analysis can necessarily reveal only a very partial picture. The Helsinki Corpus and the other electronic tools described above served to point out interesting paths to follow. They turned out to be valuable as diagnostic tools, but the case should not be overstated. A lot of work still remains to be done to gain a better understanding of politeness phenomena in the history of the English language.


We wish to thank Jonathan Culpeper, Christian Kay and Daniela Landert for perceptive and helpful comments on a draft version of this paper. All remaining errors and infelicities are, of course, our own.


Since the publication of this paper, a problematic aspect of Figure 1 has been brought to our attention. We are grateful to Daniel Reisinger (personal communication), who noticed that the sharp rise of politeness and courtesy at around 1800 is an artifact of a typographical change that took place at that time. Before that time, non-final <s> was printed as a long <s>, which looked like this: <ſ>. Optical character recognition software is likely to interpret long <s> as an <f>. And indeed a search for courtefy and politenefs reveals that their curves in the late eighteenth century form plausible anticipations of the curves for courtesy and politeness, which take over almost immediately after 1800.

Figure 1a: OCR errors of courtesy and politeness in “English 2009” of Ngram Viewer
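
The misreadings discussed above can be imitated with a deliberately simplified rule: every <s> except a word-final one is treated as a long <s> and rendered as <f>. The sketch below is an assumption-laden illustration only. Historical printing conventions for long <s> were more complex than this rule, and the function name is hypothetical.

```python
def long_s_ocr_variant(word):
    """Return the form an OCR system might produce if it reads long s as f.

    Simplified assumption: every non-final 's' was printed as a long s,
    and every long s is misread as 'f'. The final letter is left unchanged.
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", "f") + last


for term in ["courtesy", "politeness", "polite"]:
    print(term, "->", long_s_ocr_variant(term))
```

For courtesy and politeness the rule yields courtefy and politenefs, the two forms searched for in Figure 1a, while polite, which contains no <s>, is unaffected.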

However, Ngram Viewer has recently changed its algorithm. In its current version the erroneous courtefy and politenefs can no longer be found while searches for courtesy and politeness also retrieve spellings with a long <s>. Thus, a current replication of Figure 1 looks different from what it looked like when we first created it. The default version of the Ngram Viewer with the corpus choice “English” is the new improved one. The corpus option “English 2009” provides the Figure inserted in the original version of our article.

Figure 1b gives the developments of our selection of politeness terms according to the improved algorithm. Politeness still lags behind polite but not quite as much, and courtesy shows a steady increase starting in the middle of the eighteenth century.

Figure 1b: New version of Figure 1 without systematic OCR errors for non-final <s>


[1] Readers interested in this period are referred to the print version of the Historical Thesaurus of the OED (Kay et al. 2009) or the version on the Glasgow University website or A Thesaurus of Old English, all of which contain full OE material.

[2] Our software provided a word count of 1,755,858 words for the entire Helsinki Corpus. This contrasts with the official word count of the Helsinki Corpus of 1,572,760. In view of the fact that we consistently relied on the counts produced by our software, the discrepancy should not have any distorting effects on the results.

[3] For the dating of the individual hits of our searches in the Helsinki Corpus we used the relevant html codes given for each text, i.e. <C (part of corpus); or <M (manuscript) if different from <O (original) (see Kytö 1991: 48–49).

[4] The genre classification is based on html code <T (text type).

[5] EXPOS = expository, INSTR REL = instruction religious, INSTR SEC = instruction secular, NARR IMAG = narration imaginative, NARR NON-IMAG = narration non-imaginative, STAT = statutory (Kytö 1991: 55)

[6] Empty cells indicate that the Helsinki Corpus does not have any texts for this period and this text category, while a frequency of 0.000 indicates that there are relevant texts but they did not yield any hits.

[7] Translation taken from


Compendium of Middle English Prose and Verse

Corpus of Contemporary American English (COCA)

Link to the query of figure 1

Link to the list of 185 search terms (.txt file)

Link to Excel file of figures 2 and 4

Link to Excel file of figure 3

Google Books Ngram Viewer

Helsinki Corpus

Historical Thesaurus of English

Historical Thesaurus of the Oxford English Dictionary

Middle English Dictionary

Oxford English Dictionary

Thesaurus of Old English


