Early Modern English

Terttu Nevalainen and Helena Raumolin-Brunberg

(Adapted, with the publisher's permission, from Rissanen, Matti - Merja Kytö - Minna Palander (1993), Early English in the computer age: explorations through the Helsinki Corpus. Berlin - New York: Mouton de Gruyter. Pp. 55-73. Minor changes have been made to remove outdated statements.)

1. Introduction

The Early Modern English section of the Helsinki Corpus differs from the sections representing earlier times in several respects. The greater number and diversity of texts resulting from the advent of the printing press yield finer temporal and textual divisions, and the relatively ample information about social conditions in Early Modern England allows us to use informants' social backgrounds as a reliable criterion in text selection. In contrast with the Old and Middle English sections, there are few problems connected with the dates and provenances of texts.

A crucial difference between the Early Modern English section and those representing earlier periods is the standardization of spelling, which affected most of the written language and led to the disappearance of localizable regional dialects in written material. Thus, the dialectal continuity observed in the earlier sections cannot be pursued here. This is by no means to say that all dialectal differences disappeared from the language, even in its written form, but a great deal of standardization also took place in grammar and lexis at the supralocal level during the Early Modern period.

A further important difference is the type of source material. While all earlier texts are based on manuscripts, the early modern data go back to printed sources, with the only exceptions being private forms of writing such as diaries and letters. In the case of the latter, we have used the best edited material available.

On the whole, due to the temporal proximity of Early Modern English times, a more systematic application of the overall compilation principles referred to in the general introduction has been possible in this part of the corpus than in the earlier sections. In particular, it has been possible to take into account the sociohistorical background of the period in a systematic manner, as reported in Nevalainen & Raumolin-Brunberg (1989). [1]

The Early Modern English period is divided into three subperiods as indicated in Table 1.

Table 1. The Early Modern English period of the Helsinki Corpus: A quantitative overview.

EModE1 1500–1570
EModE2 1570–1640
EModE3 1640–1710

In contrast to the procedure followed for the other sections, for the Early Modern English period we have sought synchrony in diachrony by concentrating on three shorter periods within the framework presented in Table 1. These are the beginning of the sixteenth century, the turn of the seventeenth century and the latter half of the seventeenth century. This temporal division means shorter subperiods and better potential for synchronic comparisons than is possible for the Old and Early Middle English sections, although in some genres it has not been possible to follow the policy as consistently as we would have wished.

The division into three subperiods with an interval of 60 to 80 years is not arbitrary. Compared with present-day sociolinguistic studies of language change and recent research on historical sociolinguistics (e.g. Labov 1972, 2001; Nevalainen & Raumolin-Brunberg 2003), this interval may even appear too long for some rapid changes. We now know that the rate of change varies a great deal between different changes and even different phases of the same change. Although it may not always be possible to tackle the actual progression of a rapid change, such as the replacement of subject ye with the object form you and the development of its (see Nevalainen & Raumolin-Brunberg 1994, 2003), the situations before and after such a change can certainly be examined using the Helsinki Corpus. In this way, the Helsinki Corpus can provide adequate material for a general survey of most early modern linguistic changes, which can subsequently be analysed in more detail using larger corpora.

The social conditions in Early Modern England also support a division into three subperiods. Although no particularly radical changes can be attested, unlike those that must have followed the medieval foreign invasions and the Black Death in the fourteenth century, English society did not remain unchanged between 1500 and 1700. It developed from a fairly static, sparsely populated Catholic peasant society into an increasingly stratified and economically diversified Protestant society, whose population more than doubled during this period (Briggs 1985; Clay 1984). Given the connection between social change and language change (Milroy & Milroy 1985b), there is good reason for collecting data not only from times before the acceleration of these changes (subsection I), but also from the period of their culmination (subsection II), and of the eventual stabilization of the state of affairs (subsection III).

2. Types of text

2.1. List of texts

The text types, prototypical text categories, dates of composition and number of words in each individual text are shown in Table 2.

Table 2. Early Modern English texts arranged by prototypical text category and text type, with dates of composition and word counts.

Text category/Text type
Author, Text

Subsection I (1500–1570)

The Statutes of the Realm (III; Henry VIII)

Thomas Vicary, The Anatomie of the Bodie of Man
Robert Record, The First Principles of Geometrie

Secular instruction
Master Fitzherbert, The Book of Husbandry
William Turner, A New Boke of the Natures and Properties of All Wines

Secular instruction/
Thomas Elyot, The Boke Named the Gouernour
(a. 1570) Roger Ascham, The Scholemaster

Religious instruction
(1521, 1535) John Fisher, The English Works of John Fisher
Hugh Latimer, Sermon on the Ploughers; Seven Sermons Before Edward VI

Nonimaginative narration
Thomas More, The History of King Richard III
Robert Fabyan, The New Chronicles of England and France

John Leland, The Itinerary of John Leland 6,860
Richard Torkington, Ye Oldest Diarie of Englysshe Travell 7,240

Henry Machyn, The Diary of Henry Machyn
Edward VI, Journal

Thomas Mowntayne, Narratives of the Days of the Reformation
William Roper, The Lyfe of Sir Thomas Moore

Imaginative narration
A Hundred Mery Talys
Thomas Harman, A Caveat or Warening for Commen Cursetors

Proceeding: trial
State Trials, The Trial of Sir Nicholas Throckmorton

Nicholas Udall, Roister Doister
William Stevenson (?), Gammer Gvrtons Nedle

Correspondence: private
(early 16th century) Beaumont Papers
Clifford Letters of the Sixteenth Century
The Correspondence of Sir Thomas More
Original Letters (Illustrative of English History)
Plumpton Correspondence

Correspondence: official
Original Letters (Illustrative of English History)

George Colville, Boethius

William Tyndale, Five Books of Moses 10,100
William Tyndale, The New Testament 11,130

Total: 190,160

Subsection II (1570–1640)

The Statutes of the Realm (IV; Elizabeth I, James I)

William Clowes, Treatise for the Artificiall Cure of Struma
Thomas Blundevile, The Tables of the Three Speciall Right Lines Belonging to a Circle

John Brinsley, Ludus Literarius, or the Grammar Schoole
Francis Bacon, Advancement of Learning

Secular instruction
George Gifford, A Dialogue Concerning Witches and Witchcraftes
Gervase Markham, Countrey Contentments

Religious instruction
Richard Hooker, Two Sermons upon Part of S. Judes Epistle
Henry Smith, A Preparative to Mariage; Of the Lords Supper; Of Usurie

Nonimaginative narration
John Stow, The Chronicles of England
(a. 1627) John Hayward, Annals of the First Four Years of the Reign of Queen Elizabeth

John Taylor, Pennyles Pilgrimage
Robert Coverte, A Trve and Almost Incredible Report of an Englishman

Richard Madox, The Diary of Richard Madox
Margaret Hoby, Diary of Lady Margaret Hoby

Simon Forman, The Autobiography and Personal Diary of Dr. Simon Forman
(c. 1600) John Perrott (?), The History of That Most Eminent Statesman, Sir John Perrott

Imaginative narration
Robert Armin, A Nest of Ninnies
Thomas Deloney, The Pleasaunt History of ... Iack of Newberie

Proceeding: trial
The Arraignment of the Earles of Essex and Southampton
State Trials, The Trial of Sir Walter Raleigh

William Shakespeare, The Merry Wiues of Windsor
Thomas Middleton, A Chaste Maid in Cheapside

Correspondence: private
Barrington Family Letters
The Correspondence of Lady Katherine Paston
The Ferrar Papers
The Knyvett Letters
Letters of Philip Gawdy
Letters of the Lady Brilliana Harley
The Oxinden Letters

Correspondence: official
The Edmondes Papers
Original Letters (Illustrative of English History)

Elizabeth I, Queen Elizabeth's Englishings of Boethius, Plutarch, &c

The Authorized Version

Subsection III (1640–1710)

The Statutes of the Realm (VII; William III)

Robert Hooke, Micrographia
Robert Boyle, Electricity & Magnetism

Secular instruction
Izaak Walton, The Compleat Angler
T. Langford, Plain and Full Instructions to Raise All Sorts of Fruit-Trees

John Locke, Directions Concerning Education
Charles Hoole, A New Discovery of the Old Art of Teaching Schoole

Religious instruction
John Tillotson, The Folly of Scoffing at Religion
John Tillotson, Of the Tryall of the Spirits
Jeremy Taylor, The Marriage Ring

Nonimaginative narration
(a. 1703) Gilbert Burnet, Burnet's History of My Own Time
John Milton, The History of Britain

Celia Fiennes, The Journeys of Celia Fiennes
John Fryer, A New Account of East India and Persia

Samuel Pepys, The Diary of Samuel Pepys
John Evelyn, The Diary of John Evelyn

George Fox, The Journal of George Fox
Gilbert Burnet, The Life and Death of John Earl of Rochester

Imaginative narration
Penny Merriments (= Samuel Pepys' Penny Merriments)
Aphra Behn, Oroonoko

Proceeding: trial
State Trials, The Trial of Titus Oates
State Trials, The Trial of the Lady Alice Lisle

John Vanbrugh, The Relapse or Virtue in Danger
George Farquhar, The Beaux Stratagem

Correspondence: private
Correspondence of the Family of Haddock
Correspondence of the Family of Hatton
Diaries and Letters of Philip Henry
Letters of John Pinney
Original Letters of Eminent Literary Men
The Oxinden and Peyton Letters

Correspondence: official
Essex Papers; Selections from the Correspondence of Arthur Capel
Original Letters (Illustrative of English History)

R. Preston, Boethius

2.2. Text type continuity and variation [2]

One aspect of standard language development is the use of the standardizing variety in maximally varying public functions. In Early Modern English, this means the spread of the vernacular to all areas of public writing, including education, church, science and belles lettres. The printed records that have come down to us also testify to the great variety of popular literature, both utilitarian and entertaining, written in English in the Early Modern period (Spufford 1981). Yet another domain that is similarly diversified and shows a remarkable increase in the use of the vernacular is that of private writing, such as personal diaries, private correspondence and autobiographies. These developments partly reflect our temporal proximity to the period, and partly the role of the printing press and increased literacy in Early Modern England.

For this period, the compilers of the Helsinki Corpus also followed the three general principles of compilation listed in the introduction: (a) avoidance of translations, (b) concentration on prose, and (c) use of printed or edited material. Only two translations were included: the Bible (Tyndale’s and the Authorized Version) and Boethius’ De Consolatione Philosophiae (translations by Colville, Queen Elizabeth, and Preston). Their value lies in the direct continuity that they present with the earlier translations included in the Helsinki Corpus.

Direct text-type continuity also occurs in most other nonprivate genres, such as law, handbooks, early science, sermons, history, biography and fiction. With less public writings, drama and correspondence for instance, the continuity usually only extends to Middle English. The main difference between the earlier sections and Early Modern English is that, with the exception of the translations and correspondence, all text types are as a rule represented by two different texts sampled from each subperiod. While the principle of genre continuity is generally observed, we have also made an effort to include some of the novel data sources of the period in the Early Modern English corpus. Their main value lies in the expansion of the generic repertoire to give a better overview of linguistic variation as it is attested in the aggregate of extant texts.

In contrast to the Middle English section, the completely novel genres in Early Modern English therefore encompass both public and private writings. Public genres include educational treatises and proceedings of state trials, and private writings include personal diaries and travelogues. There are also a number of new subgenres or related text types. These usually represent the abandonment of medieval religious subject matter in favour of secular themes. Thus, early comedies replace Middle English mystery plays, and biographies of poets and statesmen continue the earlier tradition of saints’ lives. Intermediate forms also occur. Two cases in point are Thomas Mowntayne’s autobiography, from Reformation times, and George Fox’s journal, taken down from dictation.

The bulk of extant texts thus indicate an increase in the proportion of personal and secular records in the vernacular. The Early Modern English section attempts to capture the process of textual diversification and to include some of the register variation available. This means that not all previous genres can be represented to the same extent as earlier. Old and Middle English documents, religious treatises and rules have no counterparts in the Early Modern English section. Sermons delivered by individual preachers have been selected instead of homilies. Given the use of the corpus in the study of language variation and change, it seemed to us important to focus both on the diachronic continuity of text types and on the synchronic diversity of the texts available. It is hoped that broad diachronic comparability is secured by the use of the diachronic prototypes discussed in the general introduction.

The genre selection naturally shows internal variability. It may reflect register variation due to such situational parameters as the purpose of the text, social and age differences among the writers, or conscious stylistic choices, especially in more literary domains of writing. As pointed out in section 5, below, social variation has only been controlled through the choice of writers in so far as to guarantee the representativeness of the data. The relevant distinctions (age group, sex, and social and professional status) are recorded by the coding parameters associated with each individual text, and may be used to trace factors that possibly condition register variation.

Generic variability may also be observed in the purely formal organization of the texts. Dialogue form was more common in early modern times than might be expected on the basis of the generic usage of today. In addition to plays and trials, we have also included dialogue texts typical of such varied genres as philosophical and educational treatises (Boethius, Brinsley), handbooks (Gifford, Walton) and fiction (Penny Merriments). Occasional passages of dialogue appear without restriction in other text types as well. No systematic attempt was made to take this kind of variation into account where it occurs intratextually.

In the case of a highly variable genre such as the sermon, known stylistic variation was included at the compilation stage by selecting representatives of two different stylistic strata from each subperiod. The choices were based on traditional literary scholarship, and it was expected that the differences between, for instance, the popular “silver-tongued” Smith and the arguably more florid Hooker would manifest themselves linguistically. As the discussion of generic drift in section 4 suggests, continuity of ornate and plain preaching styles may not always be evident from one subperiod to another (on the Early Modern English sermon, see, e.g., Mitchell 1932, Herr 1940, and Blench 1964). [3]

In order to improve genre comparability across time, subgenre distinctions and subject matter were taken into account in the categories of fiction, histories, education, handbooks, science and travel. In each subperiod, fiction includes both traditional jest books (“merry tales”, “penny merriments”) and longer prose narratives, such as those by Deloney and Behn, which bear some resemblance to the modern novel. Histories and chronicles continue the medieval annalistic tradition on the one hand (Fabyan, Stow, Milton), and record near-contemporary history on the other, intermingling narratives with character sketches (More, Hayward, Burnet). Finally, education, handbooks, science, and travel are further divided on the basis of subject matter. Each of the three subperiods is represented by an educational treatise with more theoretical aims (Elyot, Bacon, Locke) and by one with purely practical goals (Ascham, Brinsley, Hoole). One handbook from each period deals with some aspect of husbandry (Fitzherbert, Markham, Langford), and two medical treatises are included in science (Vicary, Clowes). The travel books that we have selected describe both domestic (Leland, Taylor, Fiennes) and foreign travel (Torkington, Coverte, Fryer).

3. Availability of texts

The later the texts, the more likely they were to meet the corpus criteria. While it was difficult to find representative texts for several genres from subperiod I, very few problems arose for subperiod III. The following examples illustrate the decisions that had to be taken for the first subperiod. It was very difficult to find a private diary to parallel that of Henry Machyn, and finally the diary of the young King Edward VI was chosen, although it is not as personal as later diaries in the same genre. For the category of private correspondence, personal letters exchanged within the family circle were sought, but few edited collections were available. Thus, even the Plumpton letters were included, despite the fact that the edition is based on early seventeenth-century copies of the originals. Although the Early Modern English part of the corpus was supposed to comprise prose texts alone, verse was found to be unavoidable in early comedy.

Availability has a clear connection with optimal timing. As mentioned above, although past synchrony was sought, it was not always possible to find strictly contemporary texts from all the different genres. Our intention was to collect the first period texts from the first half of the sixteenth century, but the two text samples representing each text type both date from the target period only in law, the Bible, sermons, chronicles, travelogues, and private and official correspondence (i.e. in seven out of sixteen genres). In subperiod II we found a better concentration at the turn of the century, but even here there were problems in some genres; for instance, most of the private letters date from the 1620s and 1630s.

On the whole, there are marked generic differences in availability, and hence the range of choice varies. In some genres, especially in our first subperiod, it is not possible to find extant texts by more than a very small number of authors. In addition to diaries, this is true, for example, of autobiographies (Mowntayne) and educational treatises (Elyot and Ascham). On the other hand, in such genres as private correspondence there is an abundance of material, especially in subperiods II and III. However, since a more specific coding system (husband-wife, parent-child, etc.) was applied here than elsewhere, a large amount of data had to be examined before the selections could be made. Much work was also done on official correspondence, where letters jointly signed by several persons were thought to be ideal, but unfortunately found to be relatively infrequent. Consequently, the letters included in the corpus are mainly written by government officials in their official capacity.

4. Genre drift

The Early Modern English diversification of written records involves some issues of genre continuity and comparability that merit a separate discussion. New genres and subgenres are derived from a variety of existing sources. Although they usually bear a family resemblance to their sources, generic continuity within the corpus may in these cases be limited to the general level of diachronic prototypes. Thus, educational treatises may be compared with earlier secular instruction, including handbooks and science and philosophy, and (auto)biographies with other nonimaginative narratives, such as histories, religious treatises and travelogues. Genre-internal variability may still be considerable. Fowler (1982: 157) notes that, in its early form, autobiography, for instance, combined elements from courtesy books, chronicles, epistles, anecdotes, exempla and essays.

While diachronic differentiation is only to be expected with new text types, similar shifts within established genres may on the whole be less desirable from the point of view of a corpus user. Period styles are naturally reflected in the choice of the more literary texts. We should then expect these diachronic register drifts to affect the language of such genres as fiction, plays, educational treatises and sermons, even within the Early Modern English period. Sermon styles have already been mentioned above. While Latimer is usually presented as a colloquial preacher and Fisher as ornate, a similar distinction can be applied less straightforwardly to the well-known preachers of the late seventeenth century. At the less elaborate end of the stylistic continuum, colloquial preaching yielded to a plain style. Tillotson’s style, for example, has been characterized as “plain but well-modulated” (Mitchell 1932: 339).

Restoration comedies differ from earlier comedies in that they tend to take place in fashionable society. They may therefore be expected to display linguistic registers different from those of the Jacobean city comedy, or indeed the mid-sixteenth-century “merry interludes”. The modest selection in the Helsinki Corpus includes plays with both town and country settings from all three subperiods, but this does not necessarily guarantee register continuity. One thing is clear, however: formally, the first period (1500–1570) cannot be reconciled with the rest. In a continuation of the medieval tradition, all the early sixteenth-century comedies were written in rhyming couplets.

Generic drift has also attracted the attention of historical linguists, who have detected such variability in genres such as correspondence. Davis (1967) comments on a number of conventional and nonspontaneous elements in personal letters in Late Middle and Early Modern English. Markus (1988) even suggests that the sixteenth century should be taken as the dividing line between formal and familiar letters. On the basis of a detailed linguistic analysis, Biber and Finegan (1989) find their sample of seventeenth-century letters to be on the whole more oral than the eighteenth-century data they examined. These various observations would then seem to point to a tendency for personal letters to become more oral in the course of the Early Modern English period. At present, the drift still remains very much an open question (see, however, González-Álvarez & Pérez-Guerra 1999). The parameter coding of the Helsinki Corpus should help to settle some differences arising from purely external factors, such as the correspondents’ unequal status, and help the corpus user to compare like with like.

5. Social factors and the evolving standardness of texts

One of the original guiding principles for the compilation of the Early Modern English period corpus was the assumed standardness of texts. In contrast to the situation during the Old and Middle English periods, local varieties were less and less frequently used in writing, and it would have been practically impossible to create a localizable dialect corpus from this period (cf. Samuels 1981: 43). The concentration on the incipient national standard in the corpus was not based on a value judgement, but on the practical aim of compiling a corpus of texts with internal comparability, both in terms of periodization and genre continuity (Nevalainen & Raumolin-Brunberg 1989). Our later research has shown that it may have been too early to speak about a national standard language during the early modern period, as both codification and prescriptive grammars only appeared in the eighteenth century. We have also seen that regional differences appear even in writing, especially as far as quantitative differences in the use of linguistic innovations are concerned (Nevalainen & Raumolin-Brunberg 2000, 2003; Nevalainen 2003; Nevalainen & Tieken-Boon van Ostade 2006).

Our text selections were based on nonlinguistic factors in order to keep the corpus open to all kinds of linguistic research without risking circular reasoning. The main criteria were the social and educational backgrounds of our informants. The most likely users of the incipient standard were the male representatives of the gentry and the professions. They were literate, and usually educated above the elementary level. Although they represented only a small minority of the total population, the vast majority of the texts that have come down to us stem from these people. A parameter with the values “high”, “professional” or “other” is used to classify the authors of the texts included in the corpus. Writers marked with the first two parameter values were most likely to be using the evolving standard, whereas the group labelled “other” may include texts with dialectal features.

A further factor affecting the standardness of the texts is the domain of language use. Language used in public life is more likely to represent the incipient standard than private writings. Colloquial registers and spoken language are inherently the least standardized and standardizable areas of language use (Milroy & Milroy 1985a: 88-89). Thus, such speech-based texts as trial recordings and comedies may contain regional features. In fact, these two genres allow us glimpses of the dialectal speech of the lower ranks in the form of recorded testimonies and stage dialect. Despite not necessarily attempting to record spoken language, private letters also contain informal language. This is an area where a great deal of work was done to find letter writers with identical sender-recipient relations from all three subperiods. The private letters in the corpus consist of letters from mothers and fathers to their sons and daughters and vice versa, as well as letters between siblings collected in a systematic manner.

In view of the central role women have been found to play in linguistic change (cf. Labov 1972: 301-304; Milroy 1987: 190-197), female writers were given priority whenever possible, even if their language was also likely to contain regional features. Consequently, the corpus came to include samples of women’s language from travel books (Celia Fiennes), diaries (Lady Margaret Hoby), trial proceedings (Lady Alice Lisle), fiction (Aphra Behn) and private letters (several in all subperiods).

It is obvious from the above that, despite the attempts to choose texts that are broadly standardized, some of them may fall short of this requirement. In order to ensure a broad generic variability and to guarantee social representativeness, the standardization criteria were relaxed in the selection of private texts. Even here we tried to avoid the clearest cases of regional dialects by giving Londoners priority in the selection of early texts, since their language was more likely to represent the evolving standard. Our later research on the Corpus of Early English Correspondence has shown that, indeed, early modern morphosyntactic changes tended to diffuse from the capital area (Nevalainen & Raumolin-Brunberg 2000, 2003: 157-184). If no relaxation of criteria had taken place, the language of the middle ranks and women would have been excluded. That of the lowest ranks, of course, was not even available because of the illiteracy of this social stratum.

Standardization of language can hardly be examined without attention to the general level of education and literacy. With school attendance increasing, the overall literacy of men increased from c. 10% to c. 45% between 1500 and 1700 (Cressy 1980: 77). The level of literacy varied both socially and geographically: the higher ranks and town dwellers in general, and Londoners in particular, could read and write more often than the rest of the population, and literacy rates for women lagged far behind those for men.

As mentioned above, spelling is the area where standardization was most notable during the early modern era. The regularization of spelling increased throughout the Early Modern English period, so that a unification of printing house practices was reached by 1650 (Scragg 1974: 68). In nonprinted sources, private spellings can be found during all subperiods, and they are especially frequent in texts written by women.

As successors to the earliest national standard, fifteenth-century Chancery English, legal language and official correspondence have been regarded as representatives of the evolving standard norm at the beginning of the sixteenth century (Gomez Soliño 1981; Raumolin-Brunberg & Nevalainen 1990). [4] It seems, however, that this type of official language quite soon lost its norm status and became a specialized language (Nevalainen & Raumolin-Brunberg 1994). Some other text types also lost their sensitivity to changes and gradually became archaic. A good example is the Bible, of which no new standard translation was made after the Authorized Version during the Early Modern English period.

In general, the corpus user is given the choice between texts that strictly represent the incipient standard language (documentary and other nonprivate material and texts written by gentlemen or professional men) and texts with potential dialectal features (entertainment and private writings and texts written by men of middle ranks or by women).

6. Authorship

In the vast majority of cases, no problems arose in identifying the authors of the Early Modern English texts. The main exceptions were such official documents as The Statutes of the Realm, which represent the anonymous continuity of the Chancery Standard. Similarly, no attempts have been made to further identify the people who, in their official or private capacity, recorded the proceedings of the State Trials. As pointed out in section 7, below, Hargrave’s edition appears fully satisfactory in terms of substance, but in view of its variable sources should not be trusted to reproduce original spellings.

With the exception of Armin’s Nest of Ninnies, no single author has been identified for the jests and merry tales. This seems to be in the nature of popular literature, and is also reflected in the authorship problems of early drama. Following common editorial opinion, the authorship of Gammer Gvrton’s Nedle has been ascribed to William Stevenson, Fellow of Christ’s College, Cambridge. The full identity of the author of Master Fitzherbert’s Book of Husbandry is also uncertain. There were two Fitzherbert brothers, one a lawyer and the other a gentleman farmer, and opinions are divided as to which was the author. Usually, the work is attributed to John, the gentleman farmer. Finally, the author of Perrott’s published biography is not given on the title page of the book, probably because of the precarious political status of Sir John Perrott at the time. This anonymous biographer was not identified in the other sources consulted, either (see, e.g., Stauffer 1930: 137-140).

7. Editions and principles of excerption

In order to ensure the linguistic accuracy of the texts selected, samples were as a rule drawn from the standard editions with original spelling. In the case of early printed books, the Scolar Press (English Linguistics) and English Experience facsimile reprints were used extensively. With some types of texts, such as diaries and correspondence, there was usually no choice of edition, and we had to accept whatever edition was available. The level of linguistic accuracy of these editions may vary to some extent, and, in particular, the spelling of some nineteenth-century editions cannot be fully trusted. To take an example, when Axel Wijk (1937) re-examined the manuscript of the diary of Henry Machyn, he found a number of discrepancies with the standard edition. This was still the only edition available at the time of the compilation of the corpus, but a new electronic edition has recently been provided by a team at the University of Michigan (Bailey et al. 2006).

Modernized editions were avoided. For this reason, the edition of the Lisle letters from the early sixteenth century was not included in the corpus. Allowance was only made for the third-period diary of Samuel Pepys, which was written in shorthand. His spelling is therefore not recoverable, but the standard edition by Latham and Matthews reproduces his morphology with great care and accuracy.

Where there was more choice among editions, we as a rule compared them in order to find a reliable one. Thus, Hargrave’s edition of the State Trials, for instance, compared favourably with the earlier or other contemporary versions of the trials that were consulted (e.g. Holinshed’s record (1577) of the trial of Sir Nicholas Throckmorton; the 1719 account of the trial of Sir Walter Raleigh, along with the 1677 life and Overbury’s 1648 copy of his trial and conviction). It is important to point out, however, that Hargrave’s edition does not represent contemporary early modern spelling, but rather follows the spelling practices of his own time.

The individual texts were normally excerpted so as not to violate the generic structure of the text types. Where possible, the samples consist of entire (sub)texts, such as letters, diary entries, jests and statutes. Where the generic structure was not so well defined, as is usually the case with book-length treatises, two 2,000 to 3,000-word samples of paragraph sequences were selected from different parts of the work. In both cases, each genre is represented by a minimum of 10,000 running words in each of the three subperiods (see the word counts in Table 2).

In retrospect, it is clear that a 5,000-word sample is not sufficient for a variationist study of linguistic elements that are not among the most frequent ones. Luckily, the corpus has been augmented by additional material from the texts already included in it (the Penn-Helsinki Corpus of Early Modern English).

8. Concluding remarks

With the proliferation of texts available in the Early Modern English period, there were fewer self-evident choices than in the earlier periods. However, the choices were still far from random, due to the number of requirements that had to be met by individual texts, including date, generic continuity and comparability across time, together with the social status and sex of the writers. In a broad sense, the present selection consists of judgment samples. This means that other texts could have been chosen as well, as long as they fulfilled the basic criteria of selection.

The extensive use of the corpus since its publication no doubt indicates that the criteria and the actual selections were successful in providing balanced and representative, albeit necessarily rather limited, material for the study of Early Modern English.


[1] The article (Nevalainen & Raumolin-Brunberg 1989) not only contains data on the Early Modern English section but also discusses the principles and problems connected with the compilation of diachronic corpora in general. The article is concerned with the following issues: connections between society and language change, the representativeness and periodization of the Early Modern English section, the rise of the standard language and the role of London in the standardization process, the education and literacy of the informants, and the selection and continuity of the text types. Hence, it provides a broader background for the Early Modern English section than the present introduction, which in turn includes more information on the choice of the individual texts in relation to the rest of the Helsinki Corpus.

[2] The term ‘text type’ is used here to refer to the classification of texts according to external criteria, roughly corresponding to the German term Textsorte. Many of the ideas for the text classifications in the Helsinki Corpus go back to German scholarship, in particular Werlich (1976/1983). According to more recent approaches, ‘text type’ would be replaced by ‘genre’.

[3] Several stylistically plain sermons, educational treatises, private letters and comedies in the corpus were drawn from a larger noncomputerized database compiled for the study of Early Modern English exclusive adverbials; see Nevalainen (1983, 1991).

[4] In order to test the range of standardness within subsection I, a small study of the spelling variation was carried out (Raumolin-Brunberg & Nevalainen 1990). In this research, the official texts (The Statutes of the Realm and official letters) were taken to represent the standard norm, and the private writings (private correspondence, personal diaries and autobiographies) were compared with the norm. The analysis included the variant spellings of 28 frequent lexemes. The results show a surprisingly high similarity between the variants used in the different registers: the most frequent variant is the same for 82% of the lexemes, and there is no lexeme for which either of the varieties (standard norm and private writings) would have only one exclusive variant of its own.
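
The kind of comparison described in note 4 can be illustrated with a minimal sketch. It is not the procedure of the original study: the function names, the tie-breaking rule and the spelling counts below are all hypothetical, and the check for lexemes with no shared variant is only one possible reading of the "exclusive variant" criterion.

```python
from collections import Counter


def most_frequent_variant(counts):
    """Return the most frequent spelling; ties are broken alphabetically (an assumption)."""
    return min(counts, key=lambda variant: (-counts[variant], variant))


def compare_registers(official, private):
    """Compare two registers, each given as a dict mapping lexeme -> Counter of spellings."""
    shared = sorted(set(official) & set(private))
    # Lexemes whose most frequent spelling is the same in both registers.
    agree = [lex for lex in shared
             if most_frequent_variant(official[lex]) == most_frequent_variant(private[lex])]
    # Lexemes for which the two registers have no spelling in common.
    disjoint = [lex for lex in shared
                if not set(official[lex]) & set(private[lex])]
    share = len(agree) / len(shared) if shared else 0.0
    return {"agree": agree, "disjoint": disjoint, "share": share}


# Invented toy counts, for illustration only (not data from the corpus).
official = {
    "people": Counter({"people": 5, "peple": 2}),
    "which": Counter({"whiche": 4, "which": 3}),
    "their": Counter({"their": 6, "theyr": 1}),
}
private = {
    "people": Counter({"people": 3, "pepull": 1}),
    "which": Counter({"which": 5, "whiche": 2}),
    "their": Counter({"their": 3, "ther": 2}),
}

result = compare_registers(official, private)
print(result["agree"], round(result["share"], 2), result["disjoint"])
```

With real data, the two dictionaries would be filled from the variant counts of the lexemes studied in the official and the private texts, and the share would be reported as a percentage.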


