Investigating the development of ESP through historical corpora: the case of archaeology articles written in English during the Late Modern period (and beyond?)

Daniela Cesiri
Department of Comparative Linguistic and Cultural Studies, Ca' Foscari University of Venice


A number of recent linguistic contributions focus on the study of English for Specific Purposes (ESP), a discipline that studies the use of English for communicating specific and specialized knowledge in the global context. However, as specialised terms often enter the general lexicon as well, diachronic linguistic inquiry is essential to study the development of ESP as well as the evolution of general English.

Specific terms become, then, part of the general public’s linguistic repertoire, contributing to the spread of ‘scientific’ lexis and to the popularisation of specialized knowledge. One example of a discipline that awaits further linguistic investigation is archaeology, a field that is becoming increasingly popular among the general public both due to the desire to rediscover our ancient past and also thanks to the spread of popularised publications, journals, television programmes and movies (Clack & Brittain 2007). The investigation of a historical corpus of archaeology texts and essays is therefore important for studying the evolution of the discipline’s specific discourse in English and how the language of archaeology in English has evolved to become a distinct branch of ESP.

A previous contribution (Cesiri forthcoming) considers the linguistic features of present-day cultural heritage research articles (of which archaeology constitutes an important part). In continuation of this study, my article will seek to investigate the linguistic features characterising publications in English on archaeology. I will consider in particular the beginning of the discipline, that is to say the nineteenth and early twentieth centuries, which are the core centuries in the development of scientific techniques in archaeology and in its attainment of a proper academic status. The study will use a corpus of archaeology texts and the corpus analysis software WordSmith Tools 5.0 (Scott 2008). Finally, the results from this study will be compared with those from Cesiri (forthcoming): this will be essential in the investigation of disciplinary and linguistic evolution in the field of archaeology as a distinct type of ESP.

1. Introduction

The present study is meant as a continuation of a synchronic analysis conducted on the present-day English used in the academic field of cultural heritage studies (CHSs), which include the sub-fields of art history and criticism, archaeology, and cultural heritage preservation and restoration. In the earlier study (Cesiri forthcoming), these fields were only very cursorily investigated from the perspectives of either English for Specific Purposes or English for Academic Purposes, but the preliminary analysis in that direction suggested quite interesting outcomes.

The first preliminary results were presented during the CERLIS International Conference organised in June 2011 by the University of Bergamo (Italy) on the use of English for Academic Purposes (EAP) in the field of CHSs. The analysis was conducted using a corpus of research articles (RAs) in the three fields comprising nearly 900,000 words. In order to try to classify the three fields along the hard-soft sciences continuum, the use of hedges and boosters in the RAs was analysed. The use of these devices can be identified as stylistically typical of a discipline (cf. Hyland 2009), thus enabling the classification of a particular discipline as belonging to the hard or the soft sciences. This is useful information for scholars who write in that discipline, because respecting the syntactic, stylistic and lexical rules established by the discourse community to which the discipline belongs is essential in order to be understood and accepted by the other members of the community itself.

That contribution introduced the official definition of cultural heritage in general, and of CHSs in particular. The definition is the one generally accepted by the members of the scientific community itself and was drawn from the UNESCO Draft Medium Term Plan 1990-1995.

UNESCO is – by its own definition – a specialised agency of the United Nations system, whose main objective “is to contribute to peace and security in the world by promoting collaboration among nations through education, science, culture and communication”. Its Medium Term Plan is an official document produced after consultation with the Member States and the international scientific and intellectual community which would serve to focus on the right projects, ideas and actions that could be necessary for the preservation of human knowledge as well as of cultural and intellectual heritage. It was considered essential that the definition of cultural heritage given in the document include as many sites and monuments, human artefacts and natural landscapes as possible which might need preservation and protection from damage or decay. In the document, cultural heritage is defined as

the entire corpus of material signs – either artistic or symbolic – handed on by the past to each culture and, therefore, to the whole of humankind. […] The idea of the heritage has now been broadened to include both the human and the natural environment, both architectural complexes and archaeological sites, not only the rural heritage and the countryside but also the urban, technical or industrial heritage, industrial design and street furniture (UNESCO, 25 C/4, 1989: 57 in Jokilehto 2005: 4–5).

The above definition entails that the field of CHSs is considered by the same scientific community as an interdisciplinary academic field which considers critically the ways heritage is preserved, presented and participated in by scholars and ‘consumers’ (cf. Stig Sørensen and Carman 2009).

The quotation provided above served as a starting point and reference definition for research conducted in Cesiri (forthcoming) as it provides a background on which a linguistic and discursive description might be drawn. Indeed, at a preliminary level of investigation, observation of the topics covered in present-day RAs considered in Cesiri (forthcoming) immediately reveals that this is a composite academic domain, since it includes contributions dealing with history, the arts (history and criticism), archaeology and the most technical aspects of preservation and restoration of monuments, artefacts, manuscripts, sites and so forth. In my previous contribution, I considered CHSs as composed of three different disciplines: archaeology, art history and criticism, and cultural heritage preservation and restoration. These domains are (and were) not to be considered a monolithic classification since they include sub-disciplines focusing on different and specific aspects of the main domain. For instance, archaeology – which is the discipline at the core of the present paper – can include historical archaeology, palaeontology, marine archaeology, and so forth. However, these sub-disciplines share with their main domain a specific theoretical bedrock, so they can ultimately be considered as belonging to the main discipline. Finer-grained linguistic and discursive analyses can certainly unveil more subtle discursive differences and help place the individual sub-disciplines along the continuum illustrated in figure 1 below, but for the purposes of the present analysis, only the main domain of archaeology will be considered. Indeed, given that the documents analysed here represent the first instances of archaeology as a modern science (see paragraphs immediately following), we might presume that the sub-disciplines existing today had not yet taken a separate form, owing to both the technological and the theoretical limitations of a fledgling scientific discipline.

However, considering the variety of aspects and approaches in present-day archaeology, RAs in the area of CHSs often adopt methodologies and theories belonging to the ‘soft’ as well as to the ‘hard’ sciences.

As far as the field of CHSs is concerned, research outcomes from my previous study were quite varied: the field of art history and criticism exhibited features typical of the soft sciences/humanities, in which the use of hedges to lessen the force of a writer’s statement within a proposition prevails over the more ‘neutral’ style of the hard sciences. On the other hand, the field of cultural heritage preservation and restoration seems to be closer to the hard sciences, which might be explained by the more technical and ‘scientific’ character of the sub-discipline itself, in which domain and genre hybridisation with disciplines such as chemistry and physics goes together with the historical-artistic analysis of the sites or artefacts treated.

The field of archaeology was the most complex of the three since it showed a slightly greater use of hedges/boosters than the hard sciences but not quite like the humanities, so I proposed to classify it as a hybrid domain, defining it as a technical discipline, positioned half-way along the disciplines’ continuum (see figure 1).

In order to complete the information already available from this preliminary study and to take a diachronic look at the evolution of a discipline which has a century-long tradition (cf. Glyn 1981), the aim of this study is to compare data obtained from the present day state of the discipline with those from the Late Modern (LMod) period (c. 1720–1910). The study of the early years of archaeology as a modern discipline will contribute to the study of disciplinary and linguistic evolution in the field of archaeology as a distinct type of specific language in academic contexts. I will investigate the use of hedges and boosters – as was done in Cesiri (forthcoming) – in archaeological publications in English. In particular, I will consider the beginning of the discipline as a modern science, i.e. the nineteenth and early twentieth centuries which are core to the development of scientific techniques in archaeology and the achievement of a proper academic status for the discipline (cf. Evans 2008).

Major changes in the field techniques and practical tenets of the discipline had some effect on the publications which made the results of scientifically-driven archaeological investigations public. The scholars’ language changed and started to resemble a style typical of properly scientific narrative. The main aim of the present study is to establish to what extent this language became similar to the present-day style and features of ESP, and how LMod archaeologists presented their results.

2. State of the art

There are only a few studies that have attempted a linguistic description of RAs in the field of CHSs and only then for the sub-field of art history, namely by Kemal & Gaskell (1991) and Tucker (2003, 2004). These studies, however, do not seek to examine the discipline in a particular academic domain but describe its intrinsic characteristics per se. As far as archaeology is concerned, a contribution worth mentioning is Joyce (2002), an introduction to the stylistic and semiotic conventions of the discipline rather than a linguistic analysis of its features in the English for Specific Purposes (ESP) and the EAP contexts. To the best of my knowledge, no study has yet considered the linguistic features of what I shall call LMod Archaeology. However, a number of contributions describing this period in the discipline from a historical viewpoint can be mentioned, such as Glyn (1981) and Levine (1986).

2.1 Archaeology during the LMod period

The first steps towards archaeology as a science took place during the Enlightenment, or the Age of Reason. The seventeenth and eighteenth centuries in Europe were a time of great growth in scientific and natural exploration, and this was a crucial period in the history of archaeology as well. What was at that time still called ‘Antiquarianism’ (Levine 1986) consisted of research into what was true and scientifically verifiable in the Bible, and in Greek and Latin classic accounts of ancient history. The professional archaeologist was still a sort of adventurer who looked for ancient – and possibly valuable – artefacts to be sold to museums and private collectors, usually the aristocracy.

By the beginning of the nineteenth century, the museums of Europe were beginning to receive a huge number of relics from all over the world. These artefacts came in completely unorganized. Those who first studied and attempted to preserve this heritage from the ancient past were mainly historians and the first trained professional archaeologists, who felt that some form of professional intervention was necessary in order to save artefacts and monuments from ‘plunderers’ and amateurs who could irreversibly damage these relics. In short, they felt that what was still a field of amateurs should be turned into a proper academic discipline. The real advancement towards the techniques and methodology of modern archaeology can be attributed to three scholars: Heinrich Schliemann, Lane Fox Pitt-Rivers, and William Flinders Petrie (cf. Glyn 1981).

3. The present-day corpus: reference data for the present analysis

The information in this section concerning the present-day corpus will be useful for understanding the references made later on in this paper, especially regarding the presentation of data from the LMod period and the comparison with present-day data.

The RAs used in Cesiri (forthcoming) were published in international journals. The most important journals for each field were selected based on the ratings of the European Reference Index for the Humanities (ERIH). The journals were selected from those rated ‘A’ and assigned to the sub-category INT1 for international journals which includes “international publications with high visibility and influence among researchers in the various research domains in different countries, regularly cited all over the world” (ERIH 2012). With the purpose of having a homogeneous sample for each journal, I considered only the first three issues of each journal among those published in 2010. Indeed, not all of the journals had a fourth issue already published when the RAs were collected, so the articles were gathered by considering all the RAs included in these issues excluding reviews and editorial comments. The total number of articles thus collected was 118.

The final group of RAs was divided into three corpora, for each of the sub-disciplines which appear to compose CHSs, i.e. archaeology, art history and criticism, cultural heritage preservation and restoration. As concerns archaeology, 41 articles were selected from two journals, namely the International Journal of Historical Archaeology and the Oxford Journal of Archaeology.

The total size of the corpus was 894,785 words distributed across the three corpora, of which 345,118 are in the archaeology sub-corpus. Quantitative data will be presented both in raw figures and in normalised figures in order to have a more accurate picture of the frequencies in the data to be analysed.

4. The Corpus of LMod archaeology texts

In order to investigate the features described above – hedges and boosters – the primary texts were grouped to form a computer-searchable corpus. The corpus is composed of four collections of essays published in specific volumes and printed in the United Kingdom and two issues of the Journal of the British Archaeological Association whose essays are shorter than the former and might be considered of a text type similar to RAs. The composition of the corpus is illustrated in table 1. [1]

Four collections of essays:
1867 (Archaeological Institute of Great Britain and Ireland): 81,560 words
1880 (Essays on Art and Archaeology): 138,715 words
1908 (Annals of Archaeology and Anthropology Issued by the Institute of Archaeology, University of Liverpool): 65,734 words

Two issues of the Journal of the British Archaeological Association:
1882 issue: 209,442 words
1883 issue: 194,248 words

Total number of words composing the whole corpus: 719,356

Table 1. Corpus composition

The titles Essays on Art and Archaeology (1880) and Annals of Archaeology and Anthropology (1908) might be somewhat misleading when it comes to their actual contents. While it is true that these publications generally contained articles dealing with archaeology and art and anthropology, a careful perusal of their contents led to the selection of these particular issues because they contain articles dealing with topics specifically related to archaeology. They could thus be considered – using present-day terminology – monographic issues. The same consideration is valid for all the other articles included in the LMod corpus.

The analysis of this corpus of LMod archaeology texts was conducted according to research procedures established in studies investigating academic domains in present-day situations. The work of Ken Hyland (1998a, 1998b, 2008, 2009) was considered particularly useful as he has widely investigated typical patterns and significant features of academic writing, searching for specificity in the humanities and the hard sciences and detecting characteristics which define one particular discipline with respect to the others along the disciplines continuum (see figure 1). In addition, a number of other contributions on academic genres across disciplines were consulted, namely: Del Lungo & Tognini Bonelli 2004, Swales 2004, Biber 2006, Fløttum, Dahl & Kinn 2006, Hyland & Bondi 2006, Giannoni 2010, and Hirsh 2010. The wide range of literature analysed was essential in order to obtain the most complete information possible about genre characterisation and definition, information that could help in the definition of the field of archaeology.

The data was analyzed using WordSmith Tools 5.0 (Scott 2008), a computer program for linguistic corpus analysis. The specific features investigated were hedges and boosters. The lexical items were selected and adapted from Hyland (1998a: 375), whose list seems to represent the most frequently used hedges and boosters in present-day academic writing (see table 2).


about, almost, apparent, apparently, appear*, approximately, argue*, around, assume*, assumption, basically, can, certain+extent, conceivably, conclude*, conjecture*, consistent+with, contention, could, could not, of+course, deduce*, discern*, doubt, doubt*, doubtless, essentially, establish*, estimate*, expect*, the+fact+that, find, found, formally, frequently, general, generally, given+that, guess*, however, hypothesize*, hypothetically, ideally, implication*, imply, improbable, indeed, indicate*, inevitable, infer*, interpret, we+know, it+is+known, largely, least, likely, mainly, manifest*, may, maybe, might, more+or+less, most, not+necessarily, never, no+doubt, beyond+doubt, normally, occasionally, often, ostensibly, partially, partly, patently, perceive*, perhaps, plausible, possibility, possible, possibly, postulate*, precisely, predict*, prediction, predominately, presumably, presume*, probability, probable, probably, propose*, prove*, provided+that, open+to+question, questionable, quite, rare, rarely, rather, relatively, reportedly, reputedly, seem*, seems, seemingly, can+be+seen, seldom, general+sense, should, show, sometimes, somewhat, speculate*, suggest*, superficially, suppose*, surmise, suspect*, technically, tend*, tendency, theoretically, I+think, we+think, typically, uncertain, unclear, unlikely, unsure, usually, virtually, will, will+not, won't, would, would+not


actually, admittedly, always, assuredly, certainly, certainty, claim*, certain+that, is+clear, are+clear, to+be+clear, clearly, confirm*, convincingly, believe*, my+belief, our+belief, I+believe, we+believe, conclusive, decidedly, definitely, demonstrate*, determine*, is+essential, evidence, evident, evidently, impossible, incontrovertible, inconceivable, manifestly, must, necessarily, obvious, obviously, sure, surely, true, unambiguously, unarguably, undeniably, undoubtedly, unequivocal, unmistakably, unquestionably, well-known, wrong, wrongly

Table 2. List of hedges and boosters analysed in the corpus (adapted by the present author from Hyland 1998a: 375)

The list includes a total of 186 items. However, the original list was amended by the addition of all the possible forms of the verbs (indicated with the wildcard asterisk). These seemed to have been excluded in Hyland’s list since no reference is made to this kind of search in his articles using the same list. It should also be said that the classification of certain occurrences as being either hedges or boosters could not be ascertained only through an automated search since there might be ‘borderline’ cases in which a word classified as a hedge in Hyland’s list could be used as a booster in the LMod corpus and vice versa. Only the analysis of the context of use would help disambiguate these cases. For this reason, once the corpus search had produced concordances in which the hedge or booster was the node word, a manual disambiguation of the real function of the hedge or booster was conducted by looking at the whole sentence in which it had been used by the writer.
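The search procedure just described can be illustrated with a minimal Python sketch. It is not the procedure implemented in WordSmith Tools, and it rests on two assumptions about the notation in table 2: that the asterisk stands for any word ending and that the plus sign joins the words of a multi-word expression. The function names and the sample sentences (taken from examples 2 and 3 of this paper) are chosen for illustration only.

```python
import re
from collections import Counter


def item_to_regex(item):
    """Turn a list entry such as 'appear*' or 'consistent+with' into a regex.

    Assumed notation: '*' = any word ending, '+' = multi-word sequence.
    """
    parts = []
    for word in item.split("+"):
        if word.endswith("*"):
            # 'appear*' matches appear, appears, appeared, appearing ...
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            parts.append(re.escape(word))
    # Words of a multi-word item may be separated by any whitespace.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def count_items(text, items):
    """Count the automatic (not yet disambiguated) hits for each item."""
    return Counter({item: len(item_to_regex(item).findall(text)) for item in items})


def concordance(text, item, width=30):
    """Return each hit with `width` characters of context on either side,
    so that its function as hedge or booster can be checked by hand."""
    lines = []
    for match in item_to_regex(item).finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        lines.append(text[start:end])
    return lines


SAMPLE = (
    "It seems to have been about 20 ft. high above the water. "
    "This will, it is hoped, be printed elsewhere in the Journal."
)

counts = count_items(SAMPLE, ["seem*", "about", "will", "it+is+known"])
for line in concordance(SAMPLE, "will", width=15):
    print(line)
```

The counts returned by such a search are only candidates: as noted above, each concordance line still has to be read in context before an occurrence is classified as a hedge or a booster.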

Hedges and boosters can be defined as “communicative strategies for increasing [boosters] or reducing [hedges] the force of statements” (Hyland 1998a: 349). In academic writing they are essential to the authors’ rhetoric and for interacting with readers. Indeed, as Hyland argues, “they not only carry the writer’s degree of confidence in the truth of a proposition, but also an attitude to the audience” (349).

In particular, we should differentiate between the role of hedges and the role of boosters in academic writing. On the one hand, “boosters allow writers to express conviction and assert a proposition with confidence, representing a strong claim about a state of affairs” (350); on the other hand, “hedges represent a weakening of a claim through an explicit qualification of the writer’s commitment” (350). To this general definition, moreover, it can be added that in academic writing hedges and boosters are also indicative of the discourse practices and choices typical of a certain discipline (cf. Falahati 2007) as they tend to reflect the scientific procedures of each discipline and the researchers’ processes of reasoning in each field. In this respect they prove to be essential as they allow researchers in any academic field to gain collective adherence to the author’s claims and to enhance the persuasive force of their assertions (cf. Salager-Meyer 1994).

Table 3 displays figures which show the raw frequency of hedges and boosters in the corpus as well as the values of their frequency normalised to 10,000 words.

LModE Archaeology Texts    Raw frequencies        Per 10,000 words
                           Hedges    Boosters     Hedges    Boosters
Essays                     5789      750          183.39    23.76
2 Journal Issues           5841      665          185.04    21.07

Table 3. Raw and normalised frequency figures
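As a worked illustration of the normalisation, the following minimal Python sketch recomputes the Essays row of table 3. The size of the essay sub-corpus is not stated explicitly in the text; here it is assumed to be the corpus total from table 1 minus the two journal issues. The function and variable names are illustrative only.

```python
def per_10k(raw_count, corpus_size, base=10_000):
    """Normalise a raw frequency to occurrences per `base` words."""
    return raw_count / corpus_size * base


# Figures from table 1.
TOTAL_WORDS = 719_356
JOURNAL_WORDS = 209_442 + 194_248  # 1882 and 1883 issues

# Assumption: the essay collections make up the rest of the corpus.
ESSAY_WORDS = TOTAL_WORDS - JOURNAL_WORDS

# Raw frequencies for the essays, from table 3.
essay_hedges = round(per_10k(5789, ESSAY_WORDS), 2)
essay_boosters = round(per_10k(750, ESSAY_WORDS), 2)

print(ESSAY_WORDS, essay_hedges, essay_boosters)
```

Under this assumption the sketch reproduces the normalised values reported for the essays (183.39 hedges and 23.76 boosters per 10,000 words).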

In order to have a more accurate representation of frequencies in the corpora from the two periods and to focus only on items used in significant amounts in the LMod corpus, a minimum frequency of twenty per item was set, the same minimum number set for the present-day corpus. Indeed, considering the greater size of the LMod corpus in relation to the present-day corpus, it was assumed that a different proportion would not reflect these differences and would not allow a balanced comparison of the two periods. For these reasons, it was decided to provide normalised frequencies and to divide the corpus of LMod texts into two different groups which could give a clearer picture of the exact distribution of hedges and boosters (as illustrated in table 1). This division, then, produced two corpora of approximately the same size as the present-day corpus, thus allowing me not only to separate the different text types representing the LMod archaeology texts but also to have a finer-grained analysis of the corpus data available for the two periods.

A preliminary idea of the distribution of hedges and boosters in the two periods can be obtained from table 3. According to the figures, hedges outnumber boosters considerably. These figures actually confirm a tendency already noticed by other scholars working on inter-disciplinary specificity in present-day academic writing – such as Hyland (2009), Salager-Meyer (1994) and Falahati (2007) – that in academic writing hedges are generally far more frequent than boosters because they reflect scholars’ preference for mitigation rather than emphasis. In addition, Hyland (2009) affirms that the humanities make greater use of hedges than the sciences. The following section provides a more detailed analysis of the hedges and boosters found in the LMod and the present-day corpora, which will confirm or invalidate the preliminary information provided by the figures in table 3.

5. The LMod period and the present day

When the frequency of hedges and boosters in the LMod corpus is compared with that in the present-day corpus, the quantitative differences are immediately evident. The use of hedges is quite similar in the two periods, while the use of boosters appears very limited in the LMod corpus. This might be explained by the function of boosters within a discipline’s narrative. According to Hyland (1998a: 369), boosters seek “to convince the reader by their belief in the logical force of the argument”. In addition, they can be used to suggest “the efficacy of the relationship between data and claims” (370). Boosters are also used by scholars “to comment impersonally on the validity of their propositions” (370). Therefore, their limited use might reflect the LMod authors’ decision not to ‘let data speak for themselves’ and to propose their own interpretations through a more frequent use of hedges. This could be a typical approach adopted by scholars of a discipline which has no strong scientific tradition in the modern sense yet. Thus, they cannot base their own interpretation on a homogeneous discourse community in which experts share the same solid basic information. By contrast, LMod archaeologists had to consider technical data and a modern methodology of analysis new to the discipline, as well as a vast discourse community comprising contributors from different backgrounds: not only fellow archaeologists, historians and sociologists, but also engineers and other scientists who started to have an interest in applying their own disciplines to the preservation of the physical past of humankind (cf. Glyn 1981 and Levine 1986).


6. Hedges

Table 4 represents in greater detail the distribution of hedges in the LMod corpus divided according to the year of publication of the texts considered and compared with the figures found in the present-day corpus.

LModE Archaeology Texts    Raw frequencies        Per 10,000 words
                           Hedges    Boosters     Hedges    Boosters
Essays                     5789      750          183.39    23.76
2 Journal Issues           5841      665          185.04    21.07
Present-day Corpus
Archaeology                4448      1381         128.8     40.0
Art History                3003      789          100.2     26.3
CH Pres/Rest               1999      596          79.9      23.8

Table 4. Distribution of hedges and boosters in the LMod and present-day corpora.

As we can see, the two journal issues published in 1882 and 1883 – which are collections of articles – show the highest use of hedges in general. Graph 1 illustrates the distribution of hedges in the LMod corpus. The most recurrent hedges seem to be about (412 occurrences), found (786), may (661), most (354), probably (250), should (195), will (555) and would (349).

However, comparing these figures with the present-day corpus (table 5), we can notice some differences which demonstrate that – at the end of the nineteenth century – the discipline was only just starting to take its direction towards the modern sciences. [2]

Entries LMod Corpus (1882/1883) Present-day Corpus

Table 5. Most frequent hedges/boosters in the LMod and present-day corpora.

Graph 2 helps visualise the distribution of the most frequent hedges in the LMod corpus alongside data from the present-day corpus.

Graph 2. Most frequent hedges/boosters in the LMod corpus compared to the present-day corpus.

Found, may and will are by far the most frequent devices included in the count, which could indicate some specific preferences in the narrative prose of the articles investigated: found can point to the archaeological material found in a site, may could signal some tentative interpretations, and will might express either some future perspectives in research (thus keeping its ‘traditional’ function for the future tense) or certainty about a precise interpretation (thus working as a booster). In this latter case, qualitative analysis of the LMod corpus – conducted through WordSmith Tools and the concordances thus produced – indicates that of the 555 instances of will, almost half (namely, 257 instances) have the function of booster (see examples 8 and 9 below), while the remaining instances express the future tense, especially in reference to future stages of research or of the publication itself, as in the following examples (emphases added).

(1) […] the heads of the various families whose collections will be included in the work (RA_1882);

(2) This will, it is hoped, be printed elsewhere in the Journal (RA_1883).

In addition, in the LMod corpus, preference seems to be given to what Hyland (1998a: 362) calls “attribute hedges” which are “devices like about, approximately, partially, generally, quite” and which

refer to the relationship between propositional elements rather than the relationship between a proposition and a writer … attribute hedges therefore indicate the extent to which results fit a standard disciplinary schema of what the world is thought to be like, signaling a departure from commonly assumed prototypicality.

This kind of hedge is believed to be more frequently used in the sciences than in the humanities. Some instances of the use of these devices in the RAs from the LMod period are illustrated (emphases added) in the following examples (3) to (7).

(3) It seems to have been about 20 ft. high above the water, and from 12 to 14 on the inner side (RA_1867);

(4) The date of its commencement is approximately fixed by the fact that it was Theodoros, the celebrated architect and sculptor of Samos, who recommended the laying the foundations on fleeces of wool and charcoal (RA_1880);

(5) One of his children picked up the plate, which is about 8 inches by 3. The point is, does such a way serve to throw light on anything beyond itself, and anything that is known, or partially known, about the last days of the Abbey as it was? (RA_2Journals_1883);

(6) and under the mound two doorways were generally found leading into one and the same chamber (RA_1908);

(7) Lines of taping must be well planned, with triangle ties to secure the angles. Pulling up straight is difficult in a wind, especially on broken ground, and one per cent. error is quite possible then (RA_1920).

However, hedges have a fuzzy nature which does not allow a clear-cut classification into single categories, as they might have different roles and pragmatic functions according to the context(s) of the sentence in which they are used. For instance, in the case of examples (3), (4) and (5), the hedges might at first appear to be expressions of a certain approximation. Nevertheless, it is the present author’s opinion that they might actually be considered attribute hedges in the sense and function specified by Hyland (1998a: 363 and 1998b: 163 ff.) as referring to the presentation of quantitative data in the sciences. In particular, Hyland (1998b: 163–164) affirms that

natural phenomena do not always dovetail with scientists’ idealised cognitive models of what the world is like […] and experimental results frequently vary from how scientists imagine and structure them. Variations from an idealised conception of a particular relationship, behaviour, procedure or appearance are common in science. In order to accurately describe such vagaries of experimental conduct, attributes are frequently hedged.

The use of attribute hedges allows deviations between idealised models of nature and instances of actual behaviour to be accurately expressed. They enable writers to restructure categories, define entities and processes exactly, and to distinguish how far results approximate to an idealised state. […] Hedges are used to indicate variability with respect to certain descriptive terms.

Seen in this light, attributes such as about, approximately and partially are used by the LMod authors to dovetail the idealised model of interpretation previously given by scholars, on the basis of almost purely theoretical assumptions, with the evidence from the actual findings. In present-day academic writing, as Hyland (1998a: 363) affirms, “particularly when used with numerical data, attributed hedges allow writers to draw on unspoken conventions of imprecision”. We might assume that this applies to attribute hedges in the LMod corpus as well, since authors try to provide reliable explanations of the archaeological data and findings, conscious of the “permissible imprecision” (Hyland 1998a: 363) they are allowed in a discipline in which field work and the scientific analysis of the data thus collected are still in their infancy.

On the other hand, archaeology texts also make consistent use of devices that modify statements, such as the epistemic modal will (examples 8 and 9, emphases added):

(8) it will be well to discuss the case a little more fully (RA_1880);

(9) I trust, however, some day other people will be more fortunate than I have been, and obtain permission from the Government (RA_2Journals_1883).

For a further analysis of the occurrence of will in present-day academic discourse, we might cite results from Hyland (1998a). His corpus is composed “of [56] published articles together with a series of interviews with members of the relevant discourse communities” (353–354), in which both the humanities and the hard sciences are represented by a total of eight disciplines and 330,000 words. Hyland (1998a: 356) reports that, out of a total of 1929 boosters found in his corpus, will accounts for 483 occurrences, which makes it the most frequent device in the corpus (according to table 2 in Hyland 1998a: 356). In addition, it is stated that – along with may – will accounts “for nearly 17% of all devices in the corpus”. Thus, results from the LMod corpus (supported by statements from Hyland 1998a for the present day) might indicate the actual importance of this modal both in the past and in the present in the ‘negotiation of academic knowledge’.

As for the other modal verbs, should is used whenever the author suggests some degree of probability in the interpretation or in the presentation of facts or whenever he proposes a new interpretation, in both cases with the purpose of lessening the force of imposition of the statement itself, as examples (10) to (12) below illustrate (emphases added).

(10) We should thus have the wall of enceinte of the present Inner ward, from Lanthorn tower to Wakefield, Bell, and Devereux towers, as the extent of the fortress on the south and west fronts (RA_1867);

(11) We should thus have the combination of a richly-sculptured shaft resting on a richly-sculptured square pedestal (RA_1880);

(12) Surface flints should have levels noted on them (RA_1920).

A comparison with present-day data shows that during the LMod period the discipline was still inclined to propose the scholar’s interpretation in a tentative manner rather than leaving the data to speak for themselves, as happens in the present-day discipline. This might again be explained by the lack of a proper body of shared basic knowledge within the newly born discourse community of professional archaeologists.

6.1. Use of modals can and may

Modal    LMod Corpus (1882/1883)    Present-day Corpus
Can      181                        396
May      661                        374

Table 6. Figures comparing modals can and may in the two corpora.

Table 6 compares the use of the modals can and may in the LMod and the present-day corpora. [3] Moreover, graph 3 illustrates the distribution of the two modals in the two corpora, showing that the present-day corpus contains a more homogeneous use of can and may, while the LMod corpus shows a preference for the modal may.
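
The contrast in Table 6 can be made explicit by converting the raw counts into each modal's share of the combined can/may total in each corpus. The following is a minimal Python sketch that uses only the four figures from Table 6; the function and variable names are illustrative. Since the table gives no word totals for the texts counted, the shares are relative to the two modals only and are not normalised per number of words.

```python
# Raw counts copied from Table 6 (LMod vs present-day archaeology corpora).
TABLE_6 = {
    "LMod": {"can": 181, "may": 661},
    "Present-day": {"can": 396, "may": 374},
}


def modal_shares(counts):
    """Return each modal's percentage of the combined total in one corpus."""
    total = sum(counts.values())
    return {modal: round(100 * n / total, 1) for modal, n in counts.items()}


for corpus, counts in TABLE_6.items():
    print(corpus, modal_shares(counts))
```

On these figures, may makes up about 78.5% of the combined can/may occurrences in the LMod corpus, against about 48.6% in the present-day corpus.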

Graph 3. Distribution of can and may in the LMod and the present-day corpora.

Some interesting data emerge from the quantitative comparison of the two modals in the two corpora, especially as regards the modal verb can. Indeed, in the present-day corpus can occurred in clusters – expressing “logical possibility” (Biber et al. 1999: 491) – such as ‘can be seen’ and ‘can be + verb’ (e.g. considered, attributed and so on), with a function which is similar to epistemic modality. In addition, we can find the pattern ‘can + main verb’, as in ‘can serve’ or ‘can inform us about’, in a function that could be called deontic modality in the sense described by Biber et al. (1999: 485), i.e. of “extrinsic modality” which “refers to the logical status of events or states, usually relating to assessments of likelihood: possibility, necessity, or prediction”. Taken in this sense, this use of can in the LMod corpus expresses the possibility offered by archaeological evidence of providing a certain interpretation.

Therefore, can seems to be very functional in conveying different meanings. However, can also seemed to have a more neutral function than other modals, as it was used to indicate the possibility of a certain interpretation or event without expressing the author’s opinion on the likelihood of that possibility. Writers, then, seemed to show a tendency to add a more impersonal force to their statements. Can is useful for writers in the scientific field when describing experimental procedures (together with the use of about – see Hyland 1998a, 1998b and 2009), and by using it they bring their discipline closer to the sciences than to the humanities. In the sciences, researchers try “to portray their evaluation impersonally, constructing a context in which claims appeared to arise from the research itself” (Hyland 2009: 373).

The limited use of can in the LMod corpus – especially if compared to the use of may, with 661 hits – confirms the tendency of LMod archaeologists to present their results in a more interpretative and cautious way, as examples (13) to (22) show (emphases added). However, as Biber et al. (1999: 492) affirm, “can is especially ambiguous in academic prose, since it can often be interpreted as marking either ability or logical possibility”. Indeed, the following examples might appear ambiguous in the sense just explained, but considering the verbs associated with can one may assume that they express a logical possibility inherent in the interpretation of the data presented, rather than the author’s ability to interpret the data themselves.

(13) and the only existing buildings, besides the White tower, which can safely be attributed to him are the Record tower, parts of the old crypt of Rochester cathedral, and perhaps a small part of its west front (RA_1867);

(14) the same class of tombs there yielded a remarkable series of Panathenaic amphorae, which bear the names of Athenian archons, and can hardly, therefore, be the product of any but Attic potteries (RA_1880);

(15) all this can be obtained by means of drapery, as long as this is artistic (RA_1883);

(16) It is better when on unknown ground to plot a map as you go, so that no misunderstanding of notes can arise after (RA_1920);

(17) It may possibly be again called into requisition for coronation banquets (RA_1867);

(18) it may be observed that museums are designed for the instruction and recreation (RA_1880);

(19) It may probably have served originally as the quarter of the commanding officer (RA_2Journals_1882);

(20) When, much later, the inner walls were built, the intermediate space may have been served for military purposes (RA_2Journals_1883);

(21) That is to say, the process may have been known to this race before it broke up into divisions in its invasion (RA_1908);

(22) there is evidence of tombs, and they may be unplundered. Blown sand or grass may hide all trace of tombs (RA_1920).

7. Boosters

As table 7 and graph 4 illustrate, the overall scarcity of boosters employed in the LMod corpus and their heterogeneous distribution among the texts themselves offer limited scope for a coherent interpretation of the figures. [4]

List of Hedges/Boosters Year 1867 Year 1880 Year 1908 Year 1920 Total 2 Journal Issues

Table 7. Most frequent boosters in the LMod corpus and their distribution in the texts

Graph 4. Graphic representation of the most frequent boosters in the LMod corpus.

What is worth noting, however, is the frequency of the modal verb must (which occurs 370 times in the four collections of essays and 229 times in the texts from the two journals).
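
Counts such as these rest on whole-word matching of the modal in the corpus texts. As a purely illustrative sketch (the helper name `count_word` and the sample string are assumptions, and this is not the concordance software used for the study), such a count could be approximated in Python as follows:

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Sample built from the wording of examples (25) and (26) in this section.
sample = (
    "considerable changes must have taken place in them. "
    "These two seals must have been attached to their respective documents."
)
print(count_word(sample, "must"))
```

A purely formal count of this kind also picks up non-modal homographs (for instance must as a noun), so the hits would still need to be checked manually in context before being interpreted.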

As the following examples (23) to (28) show (emphases added), the scholars use must to express a logical consequence that can be inferred from the evidence produced by the artefacts or sites as well as from the interpretation of ascertained facts.

Examples (27) and (28), in particular, express the logical consequence of processes of interpretation conducted by the author. If we consider the contexts in which the modal is used, we see that, in example (27), must follows the author’s reasoning, whereby – studying an inscription and seeking to identify a praetor it cites – the author concludes that the identification of this man with one who served under emperor Augustus is not supported by the date of the inscription itself which is from an earlier period and is to be attributed to the age of Caesar. Thus, the author concludes that, necessarily, there must have been two praetors bearing the name of ‘Callistratus’ and who held their office at different times under two different emperors.

In example (28), must is again used to express the unique interpretation which the author extracts from the evidence in his possession. Indeed, the paragraph deals with the interpretation of a certain site containing niches whose functions have to be deduced by the archaeologist. [5] The author excludes some of the most obvious functions because they are supported neither by evidence from the site itself nor by evidence from, and interpretations of, other similar sites. Thus, the conclusion is that the interpretation given by the author is the most logical one, and this is expressed through the use of must in the epistemic sense.

(23) This must have been the case at Dover, which is spoken of as a castle long before the Norman walls could have been ready, but where the earthworks were already considerable and strong, and no doubt crested by a stockade (RA_1867);

(24) Two male torsos of heroic size, both of which are helmeted, must, from their correspondence in scale, be respectively Pelops and Oinomaos (RA_1880);

(25) considerable changes must have taken place in them during the four or five centuries of Roman rule (RA_2Journals_1882);

(26) These two seals must have been attached to their respective documents at a later period (RA_2Journals_1883);

(27) If we accept this identification we must postulate two praetors of the name of Callistratus (RA_1908);

(28) Frequently the walls are pitted with the loculi of a columbarium, which, however, appear to be too small to receive cinerary urns and must be intended for some other purpose (RA_1920).

8. Conclusions

Summarising the results of the corpus analysis, we might be inclined to conclude that during the LMod period archaeology was still a fledgling scientific discipline because of the presence of mixed characteristics from both the sciences and the humanities. However, this peculiar feature is also found in the present-day data, which probably shows that the tendency toward domain hybridisation is – in actual fact – an intrinsic feature of the discipline, one which emerged as the earlier ‘antiquarianist’ movement was transformed into a modern science. The hybridisation can be explained by the nature of the discipline itself, which combines empirical investigations typical of the hard sciences with historical, artistic and socio-cultural interpretations of the reasons and background behind the creation and use of sites, artefacts and other remains of the past.

Finally, the use of a historical corpus has proved an important tool to discover that the apparent idiosyncrasies of this discipline can be explained by investigating its past stages. Indeed, the comparison of data from both present-day and historical corpora was essential to the investigation of the disciplinary and linguistic evolution in the field of archaeology as a distinct type of specialized discourse.


[1] The total size of the archaeology sub-corpus in the present-day corpus was 345,118 words divided into 118 RAs from two journals (International Journal of Historical Archaeology, IJHA, Springer; Oxford Journal of Archaeology, OJA, Blackwell).

[2] The figures present in the second column in table 5 refer only to the archaeology sub-section of the present-day corpus.

[3] The figures reported for the present-day corpus refer to the archaeology sub-section of the corpus.

[4] The only exception is must, which occurs with a high frequency in the 1867 and 1880 RAs.

[5] A niche is usually defined as a “shallow recess, especially one in a wall to display an ornament” (OED online).


