Series title: Studies in Variation, Contacts and Change in English
Volume 14 – Principles and Practices for the Digital Editing and Annotation of Diachronic Data
Publication date: 2013

Manuscript abbreviations in Latin and English: History, typologies and how to tackle them in encoding [1]

Alpo Honkapohja
University of Zurich


This article discusses the theoretical and practical problems related to encoding manuscript abbreviations in TEI P5 XML. Encoding them presents a challenge, because the correspondence between the orthographic sign indicating abbreviation and what the sign stands for is more complex than in non-abbreviated words. The article reviews the terminology used to describe abbreviations, looking at their history from antiquity to abolition and at taxonomies of abbreviations in paleographical handbooks between 1745 and 2007. It discusses the editorial treatment of abbreviations in printed editions and relates it to the terminology used in the handbooks, offering criticism of that treatment from a linguistic and editorial point of view and considering how best to represent the abbreviations in TEI P5 mark-up. Traditional taxonomies divide abbreviations into groups based on the shape of the abbreviating symbol or the position of the abbreviated content. Some of the distinctions, such as the one between contractions and suspensions, are not at all relevant for digital encoding. However, the system outlined in this article allows for tagging them in a way which will enable quantitative corpus study. The data comes mainly from The Trinity Seven Planets, a TEI P5-based digital edition.

1. Introduction

Take a foreign language, write it in an unfamiliar script, abbreviating every third word, and you have the compound puzzle that is the medieval Latin manuscript. (Heimann & Kay 1982: i)

Learning to understand abbreviations and to expand them correctly is one of the central skills in paleography. The forms and types of abbreviation used by the scribe are also among the most important features by which a particular hand can be dated and localised (Clemens & Graham 2007: 89). Perhaps because the discipline is inherently somewhat conservative (Doyle 2000: 5), much of the terminology currently in use originates in the nineteenth century or earlier. Major reference works, such as Chassant (1845), Trice Martin (1892) and Cappelli (1899), date from the nineteenth century. [2] However, as the use of electronic corpora and digitised resources is becoming increasingly common in historical linguistics, it is important to find ways of reproducing historical documents in digital form, ensuring that they provide valid and reliable evidence. Bamman & Crane (2011) make the following point about classical philology:

Where classical philology has so far diverged from data-driven science […] is in its reliance on the authority of the editor rather than on the data itself. As much as the judgment of Kühner and Smyth may far exceed our own, the cornerstone of the scientific method is the reproducibility of experiments […]. (Bamman & Crane 2011: 2)

This point can be extended to paleography. The type of argument used in paleographical studies is endoxa, argument from authority. The assumption is that scholars are experts in their historical language variant, and that they have over years of reading and familiarity with their texts acquired sufficient expertise to make claims about it. While there is as little reason to doubt the experience and judgment of Chassant (1845), Cappelli (1899) or Traube (1907) as there is of Kühner (1914) or Smyth (1920), the important transformation made by corpus linguistics is that it is data driven, and it is now possible to repeat the same work in quantifiable and statistically relevant form (cf. Bamman & Crane 2011: 2). This requires ways of representing the data in such a way that it can be quantified and used as reliable evidence, and the traditional paleographical terminology is not necessarily the optimal means of approaching it.

This article discusses the practical and theoretical problems related to encoding the numerous abbreviated words found in medieval manuscripts. Encoding them presents a challenge, because the correspondence between the orthographic sign indicating abbreviation and what the sign stands for is more complex than in non-abbreviated words. It also includes a brief historical overview of where the terminology used to describe manuscript abbreviations originates, taking into account Latin and English sources and the scholarship that was carried out between the eighteenth and early twentieth centuries. Such an overview is necessary, because it is important to recognise that these taxonomies were not designed with digital encoding in mind. Basic categories like the difference between suspensions and contractions (see sections 2.1 and 2.2 below), which have become standard terminology for discussing abbreviations, are not relevant for mark-up (see examples 1 and 2 below). Yet they have been the basis of some very interesting scholarship (e.g., Traube 1907), which might benefit from being repeated using corpora. It is therefore important to place the taxonomies into their proper context, considering when, where and why they originate, and how they might best be annotated digitally.

The present study consists of the following sections. Section 2 presents the various terms for abbreviations used in paleography manuals. Section 3 gives a brief overview of the history of Latin abbreviations and how they were adapted in England, starting with antiquity and finishing with their official abolition in legal book-keeping in 1731. Section 4 looks at taxonomies of abbreviations in paleographical handbooks between 1745 and 2007. Section 5 discusses the editorial treatment of abbreviations in printed editions and offers criticism of it from a linguistic and editorial point of view. Finally, section 6 discusses how to best represent the abbreviations in TEI P5 mark-up.

The data for this article comes mainly from The Trinity Seven Planets (Honkapohja 2013), a TEI P5-based digital edition, which encodes the abbreviations using the system described in section 6. The edition is intended both to represent the data faithfully, without non-transparent editorial interference, and to be readable to people who are not specialists in Late Middle English scientific writing. This is achieved by including a detailed glossary, and by allowing the user to switch between a view where the abbreviations are expanded and one where they are compressed. On a few occasions I also take examples from another manuscript in the same group, Boston Countway MS 19. The purpose of The Trinity Seven Planets edition was also to function as a pilot study for my PhD project, which involves a codicological and dialectological analysis of the Sloane Group of Middle English manuscripts (cf. Voigts 1990), a group of multilingual manuscripts with medical, alchemical, and scientific content from ca. 1440–1490.
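The switch between an expanded and a compressed view can be illustrated with a minimal sketch. It assumes that each abbreviated word is wrapped in the standard TEI P5 elements choice, abbr and expan; the edition's actual mark-up is the system described in section 6 and may differ. The sample words are taken from the handbook examples cited in this article and do not form a passage from any manuscript; the names SAMPLE and render are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: two abbreviated words, each paired with its expansion.
SAMPLE = (
    f'<p xmlns="{TEI_NS}">'
    "<choice><abbr>ds</abbr><expan>deus</expan></choice>"
    " "
    "<choice><abbr>scdm</abbr><expan>secundum</expan></choice>"
    "</p>"
)


def render(elem, view):
    """Return the text of elem, keeping <expan> or <abbr> inside each <choice>.

    view is "expanded" or "compressed" (assumed labels, not TEI terms).
    """
    keep = f"{{{TEI_NS}}}{'expan' if view == 'expanded' else 'abbr'}"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            # Inside a <choice>, only the alternative matching the view is shown.
            chosen = child.find(keep)
            if chosen is not None:
                parts.append(render(chosen, view))
        else:
            parts.append(render(child, view))
        # Text following the child element belongs to the running text.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(render(root, "expanded"))
print(render(root, "compressed"))
```

Because both readings are stored side by side, neither view involves tacit editorial intervention: the reader can always recover what stands in the manuscript.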

2. Types of abbreviations

Abbreviation means the shortening of a word or a phrase at the beginning or the end. The traditional reasons given for this are either to save space or to save time (cf. Petti 1977: 22), but other factors have also been suggested, including language-independent communication in a multilingual environment (cf. Voigts & Minnis 1989, Wright 2000, 2002 and 2011), the avoidance of using a sacred name (cf. Traube 1907), or the allegoric, ritualistic, and occult purposes related to alchemical and magical symbols (cf. Gettings 1981, Voigts 1989, Pereira 1998, Walsh & Hooper 2012). Sections 2.1–2.8 introduce the terminology which has become standard in paleography handbooks dealing with the subject. The majority of the terms were first used by Chassant (1845).

2.1 Suspension (alternative names: truncation, curtailment)

The term suspension was introduced by Chassant (Abréviations par suspension, cf. 1845: xxvi–xxvii) and refers to abbreviating a word by omitting a number of characters at the end. This is often, but not always, indicated by a sign of abbreviation, such as the dot (or punctus).

AUG. ‘Augustus’ (Cappelli 1899: xvi)
ib. ‘ibidem’ (ibid.: 170)

In its extreme form, only the initial letter of each word is written out, which some paleographical works, including Chassant himself (Abréviations par sigles, cf. 1845: xvii–xxiii), consider as a separate category.

fq ‘filius quondam’ (Cappelli 1899: xiv, see also section 2.3 below)
R.I.P. ‘Requiescat in pace’ (Chassant 1845: xx) or ‘Rest in Peace’
D. ‘Dux’ or ‘Dominus’ (Cappelli 1899: xv)

In these cases, the plural may be formed by duplicating the letter:

FF. ‘Fratres’ (Cappelli 1899: xv)
MSS. ‘Manuscripta’ or ‘Manuscripts’ (cf. Driscoll 2006: 259–260)
pp. ‘pages’

Suspension is also sometimes considered to include syllabic suspensions, meaning abbreviating by suspending one or more letters from the end of each syllable.

tm ‘tamen’ (Clemens & Graham 2007: 89)

2.2 Contraction

Contraction, another term introduced by Chassant (Abréviations par contraction, cf. 1845: xxiii–vi), refers to omitting letters from the middle of the word. Contraction is often indicated by a macron above the word.

flo ‘falso’ (Chassant 1845: xxiii)
caplo ‘capitulo’ (Chassant 1845: xxiii)

Often only the first and last letters are included.

ds ‘deus’ (Cappelli 1899: xviii)
sm ‘secundum’ (Cappelli 1899: xviii)

In many cases, the vowels are omitted, and consonants written out. This is sometimes called “complex contraction” (Barker 2007: 1).

scdm ‘secundum’ (Cappelli 1899: xviii)
prbrs ‘presbyteris’ (Cappelli 1899: xviii)

Traube (1907) and Lindsay (1915) maintain that the distinction between suspensions and contractions corresponds with Paganism and Christianity, Greco-Roman abbreviations being almost always accomplished through suspension and Christian ones through contraction (see, e.g., Solomon 2008: 13), although in later literature this is contested (cf. Barker 2007: 1–2). [3]
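A claim of this kind is the sort of question that tagged abbreviations make countable. The following minimal sketch assumes, purely for illustration, that each abbr element carries a type attribute with the values "suspension" or "contraction"; these attribute values, the sample and the function name count_abbr_types are hypothetical and are not taken from the edition.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample built from handbook examples cited in sections 2.1 and 2.2.
SAMPLE = (
    f'<p xmlns="{TEI_NS}">'
    '<abbr type="suspension">ib.</abbr> '
    '<abbr type="contraction">ds</abbr> '
    '<abbr type="contraction">scdm</abbr>'
    "</p>"
)


def count_abbr_types(root):
    """Count <abbr> elements by their type attribute ("untyped" if absent)."""
    abbr_tag = f"{{{TEI_NS}}}abbr"
    return Counter(a.get("type", "untyped") for a in root.iter(abbr_tag))


counts = count_abbr_types(ET.fromstring(SAMPLE))
for abbr_type, n in sorted(counts.items()):
    print(abbr_type, n)
```

Run over a whole corpus rather than a three-word sample, counts of this kind could be broken down by text, date or scribe, which is what a corpus-based re-examination of the suspension and contraction hypothesis would require.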

2.3 Sigils

The term sigil is used in two different senses. Chassant uses it to refer to the Roman practice of abbreviating words to their initials, the most extreme form of suspension (1845: xvii–xxiii). This usage did not catch on, and Cappelli has incorporated it under suspension (see 2.1).

Gettings (1981) and Voigts (1989) use the term in another sense, referring to special symbols used in medieval scientific and magical works, which may lack clear lexical referents. According to Gettings, “the word symbol is not sufficiently specialised […] since it carries a literary as well as iconographic connotation […] anything may be a symbol of anything else” (Gettings 1981: 7).

“[T]he late Latin sigillum appears frequently in medieval magical contexts, and has even been used specifically for certain astrological symbols and devices which were supposed to be amuletic in power” (Gettings 1981: 9). According to Voigts, “[T]his meaning of sigil also has a Middle English sanction. [...] the Middle English Dictionary defines the word sigille as ‘a sign or mark, as used by John Lydgate’” (1989: 93, see also sections 2.6 and 5 below).

2.4 Abbreviation by signs of abbreviation, including brevigraphs

This category refers to the various signs which indicate the presence of an abbreviation (Abréviations par signes abréviatifs, cf. Chassant 1845: xxviii–xxxvii). Another term commonly used is brevigraph, formed from brevis and grapho (see, e.g., Tannenbaum 1930: 125). “They generally represented at least two letters or one syllable, and might resemble one of the omitted letters or be apparently arbitrary in shape” (Petti 1977: 23).

Some sources, including Cappelli (1899) and Hector (1958), make a distinction between general signs of abbreviation, which indicate that a word is abbreviated, and special signs of abbreviation, which correspond with a particular graphemic content, indicating that at least two or three letters need to be supplied. Some degree of variation is permissible within this graphemic content. For instance, the same sign may stand for either ‘con-’ or ‘com-’ when in word-initial position (cf. Cappelli 1899: xxiv). Cappelli further describes signs which may combine with other signs (Segni abbreviativi con significato relativo, cf. Cappelli 1899: xxix–xli) and alter the meaning of these signs. This category resembles the combining characters of Unicode.

2.5 Superscript letters (Abréviations par lettres supérieures)

The superscript letters are a specific type of abbreviation in which part of the abbreviated word, often the last letter, is written above the line. The superscript part may also be the Latin case ending of the word (see, e.g., Lindsay 1915: 413–414). Superscript letters typically abbreviate through contraction, but suspension is also possible.

sigifire ‘significare’ (Cappelli 1899: xlii)
ho ‘hoc’ (Cappelli 1899: xlv)

Some superscript abbreviations originate in English, including:

wt ‘with’ (Hector 1958: 37)
Mr ‘Master’ (Hector 1958: 37)
Matie ‘Majestie’ (Hector 1958: 37)

Not all superscripts are abbreviations. A good example of this is the Early Modern English ye ‘the’.

2.6 Abbreviations by special signs

Many of the paleographical manuals distinguish between signs of abbreviation and what they call special signs (Abréviations par signes particuliers, Chassant 1845: xliii) or conventional symbols (Segni convenzionali, Cappelli 1899: l–lii). The defining characteristic, according to Cappelli (1899: l), is that the special sign stands alone, whereas signs of abbreviation appear as a part of a word. However, it is entirely possible to add Latin case endings after such symbols (Walsh & Hooper 2012: 63).

This category includes the symbols used for common words such as et/and or enim, monetary units (cf. the modern symbols for pound, euro and dollar: £ € $), and weights and measures (see section 6.6 below). Some of the most difficult cases from the point of view of encoding have also been classified into this category, such as the alchemical, astrological, magical and hermetic symbols used in the Middle Ages (cf. Gettings 1981, Walsh & Hooper 2012). Voigts (1989: 91–3) uses the terms carecter and sigil for these. The boundaries between this category and categories 2.4 and 2.5 are fuzzy.

2.7 Elision

Elision, a category mentioned by Tannenbaum (1930) and Petti (1977), is an English rather than a Latin one, and originates with Early Modern materials. Elision was not part of the traditional medieval system, but since Petti (1977) includes it and his work is so widely used as a reference, I will briefly discuss it here. Elision was not used for saving time or space “but with the silencing of letters for metrical necessity, euphony or colloquial convenience” (Petti 1977: 25). The abbreviation could “take place at the beginning as well as the middle or at the end of a word” (Petti 1977: 25). The most common abbreviation sign is the apostrophe, but it is not always used. Elision could also include assimilation of two words together (cf. Tannenbaum 1930: 123–4), such as the Early Modern ’tis ‘it is’.

2.8 Other categories

Some of the categories proposed by Chassant did not become established in handbook literature. These are his use of sigils (see 2.3), abbreviative letters and monogrammatic letters. Abbreviative letters (1845: xl–xli) are the types of abbreviation in which a graphic element, such as the horizontal bar used for a nasal or the small tail in e-caudata, is joined to a letter to change the meaning of that letter. Cappelli incorporates these into his category of abbreviative signs which are dependent on the context (cf. Segni abbreviativi con significato relativo, Cappelli 1899: xxix–xli). Monogrammatic letters (Cappelli 1899: xli–xlii), or ligatures, in turn, refer to the common practice used in stone and metal inscriptions of saving space by joining letters together. Chassant stresses their irregular quality.

3. Historical systems of abbreviations

The two main reasons to use abbreviations are the economy of time and the economy of space (cf. Petti 1977: 22). Economy of time was the more important one in Ancient Rome, where abbreviations were needed for making quick transcriptions of spoken language. In late Antiquity and the early Middle Ages, saving parchment became the driving principle.

Three systems are cited as being of ancient origin: Tironian notae, Notae iuris and Nomina Sacra. The first two are mentioned in sources like Isidore of Seville’s Etymologiae or Probus’s De Notis, surviving in various manuscripts. Both are lists of abbreviations which are far from comprehensive. They do not include many of the abbreviations used in older Roman cursive, or inscriptions, or minuscule abbreviations found in non-calligraphic manuscripts of late antiquity. The third one, Nomina Sacra, was described by Traube (1907).

3.1 Tironian notae

Ancient Rome was a society in which public speaking played a significant role (and which had an ample supply of slaves for scribal duties). This led to a need for a quick shorthand, or tachygraphic system, developed as an aid for making quick transcriptions. Of the number of tachygraphic systems around, the one to gain prominence is the one known as Tironian notae, after Cicero’s amanuensis, the freedman Tiro (Chassant 1845: x).

The system of Tironian Notae is sometimes said to have been invented by Tiro personally (see, e.g., Johnson & Jenkinson 1915, Baker 2012: 159). The claim is hard to verify, since all surviving Ciceronian works and letters are several generations removed from the original – no Roman manuscript is known as an authorial holograph (Greetham 1994). Chassant gives the alternative theory of origin that the system was invented by Ennius and perfected by Tiro (1845: x). However, it seems entirely possible, and even likely, that this is merely another instance of pseudepigraphy, the name of a famous person becoming attached to something not invented by him for reasons of prestige. [4]

Whatever Tiro’s personal involvement, Tironian notae proved to be an influential system, and many of the signs survived as manuscript abbreviations until at least the ninth century (Chassant 1845: xliii). The Tironian system was expanded according to new needs – and St. Cyprian, bishop of Carthage, mentions a need to supplement it with Christian symbols in the third century. New symbols were added according to the differing needs of various MSS (cf. Chassant 1845: xiii). Insular scribes were particularly fond of the Tironian signs, as they were of all abbreviations (cf. Brown 1990: 5).

3.2 Notae iuris

Another system of abbreviations which is clearly identified as such in classical and medieval sources is the Notae Iuris, mentioned both by Probus and Isidore of Seville (cf. Lindsay 1915). This consists of abbreviations used by the legal profession, including some of the most commonly known Roman ones, such as pr. ‘praetor’ or the ubiquitous SPQR ‘senatus populusque romanus’. Some of the Notae Iuris are very old, being attested in inscriptions and manuscripts “from the earliest of times” (Lindsay 1915: 414). Notae Iuris are typically suspensions, but also include contractions such as EE. ‘Esse’, or syllabic suspensions such as cos. ‘consul’, an example of abbreviating the word from the middle which definitely predates Christianity (cf. Lindsay 1915: 413–414). Some Christian abbreviations are included in the lists, and these may have superscript letters for endings, as in Sma Di Ni ‘Sanctissima Domini Nostri’.

3.3 Nomina Sacra

The third category, Nomina Sacra, refers to a set of abbreviations found in early Christian writings. These abbreviations are used with remarkable consistency in these writings, including inscriptions on amulets and icons as well as papyri, in several different languages, including Greek, Latin, Coptic, Armenian, and Slavonic sources (Barker 2007: 8–9, Solomon 2008: 14–15). Unlike the previous two categories, the name is a modern one. The Latin term, which has become standard, originates with Traube (1907: 17–18).

The practice was adopted into Latin via Greek, which itself may have been influenced by the Hebrew practice of avoiding mentioning the name of the Lord (see, e.g., Clemens & Graham 2007: 89), although a number of other hypotheses have also been put forward (see Barker 2007: 1–2, Solomon 2008). Many of the Nomina Sacra are Greek, and retain the form of the characters in the Latin west, even though there are some indications that scribes may not always have understood this. Their use sometimes led to mixed forms, such as IHM ‘Iesum’, a combination of Latin accusative and the Greek Nomen Sacrum, or even the hypercorrect expansion ‘Ihesus’ (Clemens & Graham 2007: 89).

Traube makes the case that the purpose of these abbreviations was not to save time and space, but to avoid mentioning names which were considered holy. He cites a range of examples from manuscripts in which the scribe has only used the Nomina Sacra in Christian contexts; for example, the Christian God is abbreviated DS, but pagan ones are written in full; e.g., deus (cf. Lindsay 1915: 2).

3.4 Other types of abbreviation in antiquity

As already mentioned, there are several abbreviations found in ancient sources which do not fall under the traditional categories. Some are what Lindsay calls “capricious abbreviation”, found especially in inscriptions in metal and stone, from coins to buildings (1915: 415). According to him they are “most untrustworthy witnesses, for the limited space caused the curtailment of words (which were in no danger of being mistaken by the reader) to take capricious forms” (Lindsay 1915: 4). The same phenomenon also affects titles and indexes of manuscripts, and can be found in pocket-sized copies of gospels, commonplace books which required economy of space at the expense of calligraphy (Lindsay 1915: 415). However, archeological work has also uncovered papyri and other sources containing evidence of minuscule abbreviations (cf. Barker 2007) that are likely to have influenced the scripts used in the early Middle Ages, but not in the majuscule codices of late antiquity preserved in medieval monasteries.

3.5 Medieval Latin abbreviations

The main motivation for using manuscript abbreviations in the early Middle Ages differed from antiquity. In Ancient Rome, there was a great need for shorthand, whereas in the Middle Ages, the main concern was to save expensive writing materials, and thus the economy of space became the prime objective (cf. Petti 1977: 22). One factor contributing to this was the codex type of manuscript, which supplanted the roll between the fourth and the seventh centuries and was written on parchment because, despite being more expensive, it was better suited for bending and sewing into quires (Greetham 1994: 61).

Many popular abbreviation signs emerge during the early Middle Ages, including the ubiquitous sign for ‘per’. According to Lindsay it is unclear whether these symbols were invented with the new minuscule scripts, or whether they represent a continuous tradition in cursive script, the origin of which is obscured due to lack of surviving manuscripts: “[…] these symbols seem to have suddenly come into existence along with minuscule script, a wider view shews (sic) us that they were in continuous use in non-calligraphic writing, and that it is only the loss of early writing of this kind which hides the continuity from us” (Lindsay 1915: 3).

In areas under Roman rule, such as France, Italy, and Spain, there was a continuation from late Antiquity through the Dark Ages into the later Middle Ages, as opposed to Ireland and the Anglo-Saxon kingdoms, where writing came with Christian missionaries (cf. Chassant 1845). The areas in which the abbreviations were most popular, Ireland and Scandinavia, were also among the poorest. Irish minuscule scribes used all means possible to save vellum, keeping letter-size very small, crowding words together and ignoring rules for syllable-division between lines (Lindsay 1915: 2–3).

The insular scribes, of course, also made use of some special characters of their own, some of which were taken over from the Germanic runes, such as þ ‘thorn’ and æ ‘ash’ as well as ð ‘eth’, which was invented by Irish monks. These characters were introduced to represent sounds in Germanic languages which were not present in the Latin script, but they also had their abbreviative uses, such as the practice of using a strike-through thorn for ‘that’ and the practice of the Beowulf scribe of abbreviating the word eðel ‘homeland’ with eth (see, e.g., Baker 2012: 159).

Using Latin abbreviations reached its height with scholasticism. Abbreviations were carried over to the vernacular, as far as practicable, but with less consistency. “A work which, like St Augustine’s De Trinitate, would have filled a codex in the twelfth century, could be copied in a smaller unit by the fourteenth. The development of a very small handwriting and the copious use of abbreviations made this possible” (Robinson 1980: 160).

3.6 Adapting abbreviations for vernacular English

When writing was transferred from Latin to the vernacular, the abbreviation symbols were applied to these languages. The process was less straightforward for a Germanic language like English than for Romance languages (Hector 1958: 37). As the orthographical system of Middle English allowed for more variation than the fairly standard orthography of Latin (see Smith 2008: 215–217), the referents of the signs became less consistent than they had been in Latin (Roberts 2005: 10, Rogos 2013: 6). Some signs transformed from abbreviations with a clear graphemic reference to general signs of abbreviation. Some may be used as otiose flourishes (Johnson and Jenkinson 1915: xxiii).

Abbreviations remained popular in the fifteenth and sixteenth centuries, and were carried over into early printed books, which attempted to imitate the visual appearance of manuscripts, and were based on manuscript exemplars like their handcopied cousins (see König 1983: 85, Edwards 2000: 65). Sometime in the sixteenth century, abbreviations disappeared from printed books, as both economy of time and economy of space became increasingly irrelevant, but in personal correspondence and handwritten documents they remained popular throughout the Early Modern period. During the Interregnum they were briefly abolished, but the death of Cromwell and the restoration of monarchy also brought back Latin and its abbreviations (Hector 1958: 29, also 23). Their long history finally came to an end some 70 years later, when their use was officially abolished again in 1731, along with Latin (after which its uses were only very marginal and ceremonial) (Hector 1958).

4. Abbreviations in paleographical handbooks

Table 1. Abbreviations in a sample of paleographical handbooks, 1745–2007

The earliest paleographical handbooks which deal with abbreviations date from the eighteenth century and are characterised by their practical focus. For example, a German work by Walther (1745) is aimed at antiquarians, in addition to people who wish to consult archives, monastic libraries, and private collections containing information on genealogies (Walther 1745: 2–3). The first English manual by Wright (1776), which dates from some 50 years after the abolition of abbreviations, states that its audience is young lawyers and “Gentlemen of liberal education and large property” (Wright 1776: iv) as well as learned historians who might use it in consulting old documents. To prove his point about the usefulness of understanding old handwriting, he cites a number of court cases in which general histories and transcriptions of records were produced as evidence, but were rejected by the court because primary records would have been available (Wright 1776: v–vii).

ALTHOUGH it is universally agreed that the public have reaped some advantages by the Acts [...] to be thereafter written in English, yet the Tax growing from those advantages is become so excessive, that few persons are now to be found capable of reading or explaining old Deeds and Charters […]. (Wright 1776: iv)

Neither Walther nor Wright attempts a taxonomy of the abbreviations. Taxonomies originate in the following century, when the most important reference works on abbreviations were written, including Chassant (1845), Trice Martin (1892), and Cappelli (1899), all of which are still relevant and in use today. Their publication can be seen as a by-product of the large amount of editorial activity, and the flourishing of scholarly clubs, societies, and series in Victorian England and on the continent (see, e.g., Iredale 1982: 8, Chassant 1845: iii). For instance, the appearance of the work by Chassant was associated with the drive for publication of archival material, which was a reaction to the destruction of historical documents caused by the French Revolution of 1789 (Iredale 1982: 14).

The taxonomy of abbreviations, which is still in use today to a large degree, originates with this work. In his introduction, Chassant states that his intention is to write the first French work on manuscript abbreviations, and laments the lack of such a resource earlier, especially since a German one has been available for 100 years (Walther 1745). He does not merely want to copy the system, and is of the opinion that a dictionary which lists everything in alphabetical order without making the effort to explain how the abbreviations work will require the reader to learn each one of them individually (cf. Chassant 1845: v–vi).

Dès 1835, nous avions déjà, dans notre Essai sur la Paléographie française, cherché à débrouiller le chaos des abréviations, en les classant par genre et en expliquant les règles qui président à leur construction. Dans une seconde édition de cet ouvrage, en 1839 (Paléographie des chartes et des manuscrits), plus de développement fut donné à cette partie importante de la paléographie; aujourd'hui nous pensons l'avoir complètement expliquée. (Chassant 1845: v)

The standard and most comprehensive reference work for Latin abbreviations, Cappelli (1899), which covers some 14,000 abbreviated words, uses essentially the same system as Chassant, but drops some of his categories. “Since 1929, no changes have been introduced into the Italian text […]” (Heimann & Kay 1982: iii), but it is still in use and has been reprinted numerous times. The main British contribution to the field, Trice Martin (1892), has a more specialized focus as it is intended as a practical aid for deciphering local archives written in Britain, containing a Latin-English glossary as well as Latin forms of English personal and place names (cf. Iredale 1982: 5). It attempts no coverage of documentary paleography (Iredale 1982: 6), but does provide a list of most of “the abbreviated forms of Latin and French words used in English records and manuscripts” (Trice Martin 1892: v). Unlike Chassant and Cappelli, Trice Martin does not offer a full taxonomic system for the abbreviations, but he does use some terminology derived from Chassant, including “superior letters” (1892: vi), and “marks of contraction” (1892: vii).

The nineteenth- and early twentieth-century works typically speak of the categories of abbreviation as rules. For instance, Cappelli states in his introduction “[…] all too often the beginner slavishly looks up in this dictionary every abbreviation he encounters, when in nine cases out of ten he could ascertain the meaning by applying a few simple rules” (Heimann & Kay 1982: i) and Chassant mentions he thinks he has the system “completely explained” (1845: v). This emphasis can probably be partly explained by genre, the works being instructive or reference works, and partly by the time of writing, which was within the positivist period of science (cf. Heimann & Kay 1982). More recent paleographical handbooks (Brown 1990, Clemens & Graham 2007) have moved towards a smaller number of categories, and just list “symbols” or “abbreviative symbols” as a third category (see table 1).

The turn of the nineteenth and twentieth centuries saw the approaches towards paleography become more comparative, taking advantage of the invention of photography, which “revealed that there were rules of calligraphy, scribal practices and conventions, and styles of decoration”, which made it “possible to set up strictly palaeographical criteria of date and provenance for literary and liturgical texts” (Hector 1958: 11–12). The seminal work was Nomina Sacra (1907) by Ludwig Traube, intended as the first step in a longer project to write the history of Latin abbreviations in the west. Traube had a broader focus than his predecessors, taking into account social circumstances surrounding the production of manuscript books with the intention of explaining the developments in their context. His work was cut short by his untimely death, but it was continued by Lindsay, whose Notae Latinae (1915) is a comprehensive survey of the use of Latin abbreviations in the early Middle Ages. It is an excellent work, and if a comprehensive study of manuscript abbreviations using digital techniques were to be attempted, it should be used as the basis.

Traube had shewn the necessity for a much larger and more comprehensive account, in order to supply clues to the date and the home of a MS. and to throw light on the history of the writing-centres, and their relations with each other. (Lindsay 1915: vii)

5. Evolution of editorial principles and practices

My experience of working with medieval Anglo-Latin and English documents suggests that a common practice in printed editions is to tacitly expand “contracted forms, often replacing them with a full form which has been selected without giving a justification for the preference of one variant over the other” (Meurman-Solin 2007: 2.3). This problem can be carried over to corpora, if they are based on printed editions (cf. Honkapohja, Kaislaniemi & Marttila 2009: 456).

Expanding abbreviations is partly due to the requirements of the printed medium and the difficulty of representing the abbreviation signs in modern typeface. But this is only part of the reason. The technology for representing special characters has existed for a long time, since the invention of printing, in fact, as early printed books were modelled directly on manuscripts, and also reproduced many of the abbreviations. [5] More important has been the philosophy of editing which has favoured critical editions (Edwards 2000: 74–75). As Machan (1994: 9) puts it, critical editions have become such a norm that “criticism that draws directly on primary source documents is often regarded as a distinct branch of scholarship independent of the main trends of enquiry”, and it is the edited text with which most scholars and students work. This, “while eminently useful and understandable […] nonetheless systematizes our removal from the Middle Ages” (Machan 1994: 9).

A critical edition will attempt to establish a text based on textual criticism, as opposed to merely reproducing a text in existence (Greetham 1994: 347). The task of the editor is to reveal the authorial or archetypical work from the different versions available. A number of methodologies have been developed for this, including Lachmann’s stemmas, assigning manuscripts to genetic groups based on shared errors and constructing genealogical trees from them (Machan 1994: 21); Bédier’s best-text editing, which involves selecting one manuscript as the base text, and emending it with readings from others (Machan 1994: 24), or W.W. Greg’s concept of copy-text and dividing features of the text into substantives (words and intended meaning) and accidentals (spelling variation, punctuation, etc.) (Machan 1994: 30). Manuscript abbreviations have been a feature which has not been considered worth transmitting in critical editions (cf., e.g., Rogos 2013: 6), as they represent scribal rather than authorial language, in Greg’s terminology accidentals, and are part of the manuscript text rather than the authorial work which the editor aims to reconstruct. Editions which do present a single manuscript normally expand them, listing the types used in a critical apparatus.

However, the critical edition is much less suited to approaches which Shillingsburg calls the historical orientation of editing (1986: 19), in which the manuscript is seen as a cultural product, interesting in itself and its socio-cultural context. With the emergence of digital editing as a viable alternative to traditional editing, the choices for presenting the text have expanded enormously (e.g. Vanhouette et al. 2006: 161). Advances have also been made by applying electronic corpora to historical linguistics, which enable handling unprecedented amounts of data. Both have fueled some discussion of what is required of data for historical linguistics (see, e.g., Bailey 2004, Lass 2004, Curzan & Palmer 2006, Grund 2006, Driscoll 2009, Rogos 2011, 2013), and a re-evaluation of the suitability of editions prepared for literary studies or historians as data for historical linguistics (see Honkapohja et al. 2009, for more detailed discussion). Most vocal in his criticism has been Lass (2004: 40), who criticises any intervention that replaces scribal language with editorial language, including tacit emendation, modernization of punctuation or word division, and silent expansion of abbreviations, stating that editions and corpora should be as transparent as possible, avoiding irreversible editorial interference while offering maximum flexibility for the user.

Section 5.1 provides a summary of arguments directed against expanding abbreviations from the point of view of both historical linguistics and editorial theory.

5.1 Criticism of expanding abbreviations

First of all, the practice of silently expanding abbreviations contributes to linguistic hybridity (cf. Wright 2000: 152, Lass 2004: 22, Grund 2006: 105–106). Critical editions which consist of several texts from various manuscripts bring together the language of “partly independent groups of scribes […] operating over long time-spans, and often not talking to each other […] The ‘text’ is a modern sampling of this material – not of anything that was ever all together in the same place at the same time” (Lass 2004: 37). If, for example, an editor uses another copy of a text to supply the expansion of a word in the text he or she is editing, this will contribute to linguistic hybridity.

Expanding the abbreviations also masks the fact that the data is a combination of the phonemic or graphemic characters of the Latin alphabet and elements whose relationship to the words they abbreviate is more complex (see Benskin 1977: 506, Rogos 2011: 47, 2013: 7). Many of the abbreviations by brevigraphs and superscript letters (sections 2.4 and 2.5) could be classified as syllabic script like the Japanese writing system and special signs or symbols (section 2.6) as pictograms or logograms resembling those of the Chinese writing system or ancient Egyptian hieroglyphs. In the editorial expansion, they are assigned a definite alphabetic and implied phonetic value (Rogos 2013: 7), “regardless of the fact that medieval languages did not observe regular spelling conventions” (Wright 2000: 152). As a result, the data used in research will be a combination of graphemes which originate with the scribe, and those which are supplied by the editor, based on editorial principles, or sometimes pure intuition (cf. Driscoll 2009: 3).

Of course, some of the abbreviations have clear referents, but in more ambiguous cases the editor is put into situations in which he or she has to choose between variant spellings, or whether to consider a certain noun or verb inflected (cf. Driscoll 2009: 3, 19–20), or a particular abbreviation otiose or not (cf. Rogos 2013: 27–31).

It is standard practice when expanding abbreviations to do so in keeping with the normal orthographic practice of the scribe in question […] the situation can thus easily arise where a scribe has written er three times and ir twice, but otherwise used the tittle. This would then be expanded, perhaps several hundred times, to er giving an entirely false impression of the distribution of the two forms in the resulting text. (Driscoll 2009: 19–20) [6]

The problem is greater with vernacular manuscripts with the amount of spelling variation – as has been demonstrated by the use of the parallel Chaucer’s Man of Law’s Tale corpus by Rogos (2013: 27–31) – but it is also present with a number of Latin words with variant spelling (such as nūquam which can stand for ‘nunquam’ or ‘numquam’). The relatively standard Latin orthography may also lead to additional standardisation when the editor assumes that the expansions are more standard than they are. “A sign like is likely to be expanded by a modern editor to ‘pro’ each time it occurs, unless it occurs as a word-initial morpheme where ‘por’ might be preferred […] To represent all these forms by a single modern rendition is to falsify the data for historical linguist” (Wright 2000: 152).

For historical linguistics, the signs of abbreviation that are used are potentially significant data. For example, the use of certain abbreviations might be geographically conditioned, resembling what was demonstrated by Benskin, in a seminal article, to be the case for the graphemes þ and y (1982). [7] He showed that writing the open form of thorn is not a sign of a late scribe or Norman influence (Benskin 1982: 13), but rather a Northern English and Scottish trait between 1350 and 1450 (Benskin 1982: 14–15). Benskin’s study was carried out manually using the fit-technique for LALME, but with the system presented in this article, it would be possible to construct corpora which would enable studying the distributions of abbreviations using a quantitative approach. Rogos (2013: 3–4) argues for the value of using abbreviations and word final characters for creating scribal profiles. Another area of historical linguistics where encoding abbreviations might be useful is studying multilingual texts. Word-final abbreviations may have been used to hide suffixes, allowing the readers to supply the endings in Latin or the vernacular, such as in late medieval mixed-language business accounts, which have been studied by Wright (see, e.g., Wright 2000: 152–156, 2011: 195).

6. The System used for The Trinity Seven Planets

This section discusses the XML-based system and editorial choices made in preparing my edition of The Trinity Seven Planets (Honkapohja 2013), as well as making transcriptions of other manuscripts in the Sloane Group of Middle English Manuscripts (see Voigts 1990, Honkapohja 2013). The examples come from two manuscripts, Trinity College Cambridge, O.1.77 and Boston Countway Library of Medicine MS 19, with the exception of (1) for which I could not find a suitable example in the manuscript data. The encoding is also based on principles outlined for The Digital Editions for Corpus Linguistics Project (Honkapohja, Kaislaniemi & Marttila 2009). The system aims to:

  • keep the encoding as consistent and logical as possible,
  • keep the editor’s contribution transparent,
  • allow the user as much flexibility as possible with the encoded data.

The Trinity Seven Planets edition gives the user the opportunity to view the text with the abbreviations expanded, or represent them by a Unicode symbol. This is achieved by using the <choice>-element, which “groups a number of alternative encodings for the same point in text” (TEI P5 Guidelines, 3.4). Words with abbreviations are encoded twice. Thus a diplomatic transcript of an item which contains an abbreviation is put unexpanded between <abbr>-tags indicating that they contain “an abbreviation of any sort” (TEI P5 Guidelines, 3.5.5). The expanded word is put between <expan>-tags to mark that the tagged content “contains an expansion of an abbreviation” (TEI P5 Guidelines). The expansion itself is signalled by a pair of <ex>-tags, which indicate the information is “a sequence of letters added by an editor or transcriber when expanding an abbreviation” (TEI P5 Guidelines). The signs of abbreviation used in the original text are put between <am>-tags, the tag content being “a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation” (TEI P5 Guidelines).
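The double encoding described above can be illustrated with a short Python sketch. It is not the code behind the edition: the <abbr> half of the fragment (a punctus inside <am>-tags) is an assumed encoding supplied for illustration, and the names render and render_expanded_marked are hypothetical. The sketch derives a diplomatic and an expanded reading from a single <choice>-element and marks the editor's letters with underscores in place of the italics used in the interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <expan> half follows the 'Anno Domini' example,
# the <abbr> half (punctus inside <am>) is an assumed encoding.
FRAGMENT = """
<choice>
  <abbr>A<am>.</am> D<am>.</am></abbr>
  <expan>A<ex>nno</ex> D<ex>omini</ex></expan>
</choice>
"""


def render(choice, view):
    """Return the plain text of one branch of <choice>.

    view is 'abbr' for the diplomatic reading, 'expan' for the expanded one.
    """
    branch = choice.find(view)
    return "".join(branch.itertext())


def render_expanded_marked(choice):
    """Expanded reading with editor-supplied letters wrapped in underscores."""
    expan = choice.find("expan")
    parts = [expan.text or ""]
    for child in expan:
        text = "".join(child.itertext())
        parts.append(f"_{text}_" if child.tag == "ex" else text)
        parts.append(child.tail or "")
    return "".join(parts)


choice = ET.fromstring(FRAGMENT)
diplomatic = render(choice, "abbr")
expanded = render(choice, "expan")
marked = render_expanded_marked(choice)
print(diplomatic, "|", expanded, "|", marked)
```

Because both readings sit side by side inside <choice>, switching views is a matter of selecting one branch; nothing in the transcription itself has to change.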

From the point of view of corpus linguistics, this type of encoding has the advantage that both the sign used to indicate an abbreviation and the editor’s expansion are tagged and thus searchable. What this XML-based tagging offers in addition to ASCII-based encoding, used in resources like MEG-C and LAEME (cf. Lass 2004, Stenroos & Mäkinen 2011), is that it encodes both the form of abbreviation and the editorial expansion and keeps them separate. The tagging satisfies the need for transparency (cf. Lass 2004: 40, Honkapohja et al. 2009: 454), since everything inside the <ex>-tags can be identified as having been supplied by the editor.
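How tagged signs and expansions might be queried together can be sketched as follows. The three-word sample, the use of the combining macron (U+0304) for the horizontal bar, and the function name sign_expansion_pairs are assumptions made for illustration. Pairing the n-th <am> with the n-th <ex> inside a <choice> is a simplification that holds only when the two occur in equal numbers.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MACRON = "\u0304"  # combining macron, standing in for the horizontal bar

# Toy sample, not taken from the edition.
SAMPLE = """
<body>
  <choice><abbr>A<am>.</am></abbr><expan>A<ex>nno</ex></expan></choice>
  <choice><abbr>nu<am>&#x304;</am>quam</abbr><expan>nu<ex>n</ex>quam</expan></choice>
  <choice><abbr>cu<am>&#x304;</am></abbr><expan>cu<ex>m</ex></expan></choice>
</body>
"""


def sign_expansion_pairs(root):
    """Count (sign, expansion) pairs, aligning <am> and <ex> by position."""
    pairs = Counter()
    for choice in root.iter("choice"):
        signs = ["".join(am.itertext()) for am in choice.iter("am")]
        expansions = ["".join(ex.itertext()) for ex in choice.iter("ex")]
        # Skip words where signs and expansions cannot be aligned one to one.
        if len(signs) == len(expansions):
            pairs.update(zip(signs, expansions))
    return pairs


pairs = sign_expansion_pairs(ET.fromstring(SAMPLE))
# Which editorial expansions does the horizontal bar receive in the sample?
macron_expansions = sorted(exp for (sign, exp) in pairs if sign == MACRON)
print(macron_expansions)
```

A query of this kind depends on the sign and the expansion being kept apart in the mark-up; once an abbreviation has been silently expanded, the same question can no longer be asked of the data.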

6.1 Suspensions and contractions

The distinction between contractions and suspensions made by Chassant has become a mainstay of abbreviation terminology (see Table 1). However, the mark-up required for XML annotation is very similar in both cases. The only difference is in the location of the <ex>-tags.



Anno Domini

<expan>A<ex>nno</ex> D<ex>omini</ex></expan>


Trinity O.1.77, f. 6v


If it is necessary to encode the distinction, it is possible to give a type-attribute for the <abbr> element (see Driscoll 2009: 6), @type="suspension" or @type="contraction". A possible reason for doing this would be to perform quantitative corpus work on the distribution of the two; for instance, to examine Traube’s notion that contractions are more likely to occur with Christian Latin vocabulary and suspensions with Greco-Roman words in a particular text.
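A count of this kind could be obtained along the following lines, assuming that every <abbr> carries the @type attribute. The sample fragment and the function name count_by_type are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy sample: two suspensions (A., D.) and one contraction (qd 'quod').
SAMPLE = """
<body>
  <choice><abbr type="suspension">A<am>.</am></abbr><expan>A<ex>nno</ex></expan></choice>
  <choice><abbr type="suspension">D<am>.</am></abbr><expan>D<ex>omini</ex></expan></choice>
  <choice><abbr type="contraction">qd</abbr><expan>q<ex>uo</ex>d</expan></choice>
</body>
"""


def count_by_type(root):
    """Tally <abbr> elements by their @type value ('untyped' if absent)."""
    return Counter(abbr.get("type", "untyped") for abbr in root.iter("abbr"))


type_counts = count_by_type(ET.fromstring(SAMPLE))
print(type_counts)
```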

6.2 Signs of abbreviation, simple and complex

Some of the signs of abbreviation can be represented by common ASCII characters such as punctus, enclosed by <am>-tags (see example 5). However, many of the abbreviations examined in sections 2.4 and 2.5 above require special characters. TEI Guidelines offer several ways of encoding them: using the symbol directly if the encoding is UTF-8, giving a decimal or hexadecimal numeric character reference, using an entity reference in the case of some common ones, or using the gaiji-module of TEI (cf. TEI P5 Guidelines 5).

As the focus of The Trinity Seven Planets was on visual presentation of the abbreviations, the editorial decision was to encode them with the Unicode numeric character reference. I used only characters which are currently part of Unicode. Projects such as the Medieval Unicode Font Initiative (MUFI) supply characters missing from Unicode by allocating them to the Private Use Area (PUA). The problem with using these is that, if they get incorporated into Unicode itself, their codepoint may change. Therefore the decision was made to encode characters which are not currently part of Unicode with ASCII equivalents. This affected only three characters: two punctuation signs, the punctus elevatus and the virgula suspensiva, and an astrological symbol for Dragon’s tail, which was used once in the text (cf. Honkapohja 2013).

Another alternative, which might be used for a more ambitious corpus project, would be to use the gaiji-module available in TEI (5.2). This consists of a <g>-tag, “which serves as a proxy for new characters or glyphs” and elements which are stored in the <charDecl>-element in the TEI header which provide very comprehensive facilities for describing a particular special character. The gaiji-module has the advantage that it allows for purely semantic encoding of the signs and symbols in mark-up. It can also be used to distinguish between characters and glyphs which Unicode regards as identical. Example (3) illustrates encoding the same abbreviation by numeric character reference and applying the gaiji-module.


propirtees, Trinity O.1.77, f.123r


<abbr><am><g rend="looped-p"/></am>pirtees</abbr>

6.3 Complex Abbreviations

In contrast to the distinction between suspensions and contractions, which has little relevance for XML-based mark-up, what does matter is the length, position and complexity of the abbreviated content. Examples (4–6) are cases of complex contraction (see 2.2 above) in which a single horizontal bar corresponds with expansions in several places in the word.

Driscoll (2006 & 2009) makes a distinction between “abbreviations with a lexical reference (suspensions, contractions, and a number of brevigraphs)” (2006: 259), which stand for cases where a part or whole of the word is “written out and the rest omitted, the omission often, but not always, being indicated by some sign or mark” (2009: 2) and ones “with a graphemic reference (superscript letters and signs and the remainder of the brevigraphs)” (2006: 259), which “represent the same combination of graphemes regardless of the lexical item in which they occur” (2009: 2). He furthermore states that “[i]t strikes one as counterintuitive to treat the former on anything other than the whole-word level, while treating the latter in the same way seems equally misconceived” (Driscoll 2006: 259, see also 2009: 2–3).

Examples of lexical abbreviations in this study are (1–2, 4–8, 10, 12, 17–21) and examples of graphemic abbreviations are (3, 9, 11, 13–15). However, some of these are clearer cases than others. The suspension of (1) is straightforward to interpret as having a lexical reference, as are the complex abbreviations of (4–6) and the numerals and conventional symbols of (17–22), which are a form of mathematical or ideographic notation. Example (2), qd ‘quod’, on the other hand, while a contraction and thus lexical according to Driscoll, does correspond to a short sequence of two graphemes in a very straightforward way, especially with the fairly standard Latin orthography. Moreover, it bears close resemblance to the standard way in Trinity O.1.77 of abbreviating ‘quid’, qid (see also Cappelli 1899: 307), which would be a graphemic abbreviation according to this classification. Should the i-element written superscript be taken as having a partly graphemic reference, or is it merely part of the convention associated with contractions of writing part of the abbreviated content in superscript (see 2.5 and 3.3 above)? Why should two forms which are very close in scribal practice be treated as taxonomically separate?

In the end, the editorial policy which I chose for The Trinity Seven Planets was derived from practical considerations of how to best represent the content visually in the “expand abbreviations” and “collapse abbreviations” views offered in the user interface. The content between <abbr>-tags was encoded to produce as visually faithful a facsimile of the actual view as possible and the content between <expan>-tags to produce a view which shows the editorial expansions in italics, in the locations where these expansions are. I have specified the expanded content with <ex>-tags, except in cases where none of the characters within the <abbr>-tags correspond with content within the <expan>-tags (see examples 7, 8, 17–22). Example 5 is a special case, since here the expansion would have produced improper nesting (see section 6.9 below).


Boston MS 19, f.5r



.id est.,
Trinity O.1.77,

<expan>id est</expan>


Boston MS 19, f.3v


6.4 Nomina Sacra

A slight problem is caused by the Nomina Sacra. The scribe in Trinity O.1.77 abbreviates the name of Jesus as illustrated below. In example (8), the scribe has written the horizontal bar which normally signifies a nasal n, m, or nm, or is used as a general marker of abbreviation. However, in this case it seems to stand for s, since the verb form would normally require the vocative case (cf. Mark 10:47). [8] In either case, the scribe uses Latin characters resembling the Greek ones to write the Greek letters. In the latter example, it is debatable whether the final character should be understood as a Latin u or a Greek sigma, the matter being further complicated as the expansion could be construed as Greek ‘Ἰησοῦ’ just as well as Latin ‘Iēsū’.

I decided to consider the entire Nomen Sacrum as a sign of abbreviation, since the referents of the characters ihc are ‘iota’, ‘eta’ and ‘sigma’ rather than the Latin characters, to which they are identical in appearance. However, because of the visual resemblance to these Latin characters, I decided they are more appropriately encoded by the Latin characters than the Greek ones.


Trinity O.1.77. f.1r (iota, eta, sigma)




Trinity O.1.77. f.1r
(iota, eta, upsilon)



6.5 Superscript

Superscript characters present a different challenge for encoding. It would be possible to treat each of them as an individual type of abbreviation (see, e.g., Driscoll 2009: A.1.3), but this does not take into account that they are a productive category. As they are written in Latin alphabetic script, it would be possible to coin an endless number of new ones. For example, Latin case endings are frequently written in superscript, e.g., insto = ‘institutio’, inste = ‘institutione’ (Cappelli 1899: 182). The encoding I have used in The Trinity Seven Planets is that found in Cummings (2009), using the <hi rend="superscript">-tags, as in examples (9–10). An editorial decision which needs to be made with superscript abbreviations is whether to include the letters that are written as superscript within the <ex>-tags or not (see example 10). I opted for the former, as the forms of the letter used as superscript are often quite far removed from the normal variant, although a case could be made for both of the encodings.


Boston MS 19, f.3r

<abbr>g<hi rend="superscript">a</hi>uat<hi rend="superscript">r</hi></abbr>


with, or with
Trinity O.1.77

<abbr>w<hi rend="superscript">t</hi></abbr>
<abbr>w<hi rend="superscript">t</hi></abbr>

This encoding can also be extended to signs of abbreviations, as it allows for an easy distinction between those written on the line and ones written as superscripts, as in example (11).


Trinity O.1.77, f.123r

<abbr>eu<hi rend="superscript">&#42864;</hi>y</abbr>

Example (12) illustrates a more complex type of superscript abbreviation. The a signifies the final letter of a longer contraction, and is a clear instance of abbreviation with a lexical reference.


et cetera
Trinity O.1.77, 122r

 <abbr><am>&amp;</am> c<hi rend="superscript">a</hi></abbr>
 <expan><ex>et</ex> c<ex>etera</ex></expan>

6.6 Uncertain expansions

In a number of cases, the expansion of an abbreviation may be uncertain (see section 5.1 above). A TEI-XML-based encoding offers a flexible and transparent way for dealing with these cases. The expansion between the <expan> tags is, by definition, editorial intervention. In addition, it is possible to use the mechanisms available in TEI P5 to indicate certainty and responsibility (cf. TEI P5 att.responsibility). This can be achieved by the attributes @cert, which “signifies the degree of certainty associated with the intervention or interpretation” (TEI P5) and @resp, which “indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber”.

For instance, in The Trinity Seven Planets edition I tagged all uncertain abbreviations as cert="unknown" and resp="#ah". This means cases with potential ambiguity in which a particular word was not found in expanded form in the data the encoding experiment is based on. The user interface of the electronic edition shows these in grey in the “expand abbreviations” view.

In order to determine whether the expanded form of a word could be found, I performed corpus searches using AntConc software. The transcription included not only the Seven Planets, but all sections of the Trinity manuscript that can be safely attributed to Hand A (5084 words).

The biggest problem was the abbreviation, which can be expanded as ‘–es’, ‘–ys’ or ‘–is’ (cf. Trice Martin 1892: 8). In some cases, such as example (13), the words are not found expanded in the data. Table 2 illustrates the spread of all these forms, the number of tokens found for each one, the phonological context, as well as whether the etymology is Latin, Old English, Romance or of ambiguous late Latin or Romance origin (see also Honkapohja 2013). In the end, I decided to use ‘–es’ for all ambiguous expansions as it is the one with most expansions. [9]

On the other hand, this also illustrates the editorial difficulties caused by this approach. The word “planetis” occurs 16 times written by Hand A. It is abbreviated in 15 of the cases, and written out in only a single instance, in which the word is split between two lines, as illustrated by example (15). Following the standard editorial procedure, I expanded it as ‘–is’ in all of the instances, without tagging it as uncertain, which may lead to some regularisation in the data (see section 5.1 above).
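The procedure described in this section can be summarised as a small decision rule. The sketch below is an illustration under stated assumptions, not the workflow actually used (the searches were made with AntConc): the word list is a toy sample, and choose_expansion and ex_element are hypothetical names.

```python
from collections import Counter

ENDINGS = ("es", "is", "ys")
DEFAULT_ENDING = "es"  # the fallback adopted in the text for ambiguous cases


def choose_expansion(stem, attested_words):
    """Return (ending, certain) for a word ending in the ambiguous sign.

    If a written-out form of the word is attested, its ending is used and
    the expansion counts as certain; otherwise the default is used.
    """
    candidates = {stem + ending for ending in ENDINGS}
    counts = Counter(word for word in attested_words if word in candidates)
    if counts:
        best_word, _ = counts.most_common(1)[0]
        return best_word[len(stem):], True
    return DEFAULT_ENDING, False


def ex_element(ending, certain):
    """Serialise the expansion, flagging unattested forms as uncertain."""
    if certain:
        return f"<ex>{ending}</ex>"
    return f'<ex cert="unknown" resp="#ah">{ending}</ex>'


# Toy word list standing in for the written-out forms of Hand A.
attested = ["yuel", "planetis", "fishes"]

print(ex_element(*choose_expansion("planet", attested)))
print(ex_element(*choose_expansion("best", attested)))
```

Stated this way, the regularising effect is easy to see: a single written-out token is enough to fix the expansion of every abbreviated instance of the same word.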

Voiceless alveolar stops | tokens | etymology
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
tretis (treatise, singular) | […] | late Latin or Romance
[…] | […] | Old English
tretys (treatise, singular) | […] | late Latin or Romance
[…] | […] | Old English
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance
[…] | […] | late Latin or Romance

Voiced alveolar stops | tokens | etymology
[…] | […] | late Latin or Romance
[…] | […] | Old English
[…] | […] | Old English
[…] | […] | Romance (originally: Arabic)
[…] | […] | Old English
[…] | […] | Old English
[…] | […] | Old English
[…] | […] | Old English
[…] | […] | late Latin or Romance
[…] | […] | Old English

Table 2. Nouns ending in –es, -is, -ys or in Middle English texts written by Hand A in Trinity O.1.77.


Bestes [,-is,-ys]& fishes,
Trinity O.1.77 f.123r


yuel planetis,
Trinity O.1.77, f.127r


<expan>best<ex cert="unknown" resp="#ah">es</ex></expan>




6.7 Strike-through h and l

Another source of considerable problems was the strike-through forms sometimes found in ħ and ł, which are a notorious problem in transcribing Middle English, as they may or may not stand for a final e (see example 16). Rogos (2013) has convincingly demonstrated that Chaucer scribes sometimes use the strike-through form in a rhyming position with –e (cf. Rogos 2013: 28–30), which can be taken as a clear indication of this. However, having two strike-through h’s or l’s rhyme with each other is also normal (Rogos 2013: 30). As these forms may have graphemic significance, I encoded them in the transcription. Since there is no certain evidence available as to whether they should be expanded, I did not give an alternative expansion for them within the <choice>-tags.


Trinity O.1.77,


6.8 Numerals and measurements

Because The Trinity Seven Planets edition is intended to be readable by a non-specialist audience interested in medieval astrology as well as undergraduates, the editorial decision was made to expand all numerals. I used the same editorial principle as for other expansions – if the word could be found expanded in the text I used that. If not, then I used a form which would fit the scribal orthography or the head word given by the Middle English Dictionary. These were tagged as uncertain expansions. For example, ‘twelue’ – see example (17) – is found expanded, but ‘seuen’ is not (example 18).


xij, Trinity O.1.77, f.123r



 Trinity O.1.77, f.123r

 <expan cert="unknown" resp="#ah">seuen</expan>

Numerals are also potentially problematic, because they are language independent. For instance, example (19) below can be expanded as either ‘quattuor’ or ‘four’. A similar problem is related to the apothecaries’ weights and measurements, which can also be language independent (example 20; see also Voigts 1989: 91). Unlike a number of other texts in the Sloane manuscripts, The Trinity Seven Planets did not contain any of these. If it had, the editorial aim of making the text as accessible as possible to a non-specialist audience would have required expanding them. With Latin, the expansion is fairly straightforward, but with Middle English it results in an editorial problem. Should example (20) be expanded word for word? This would lead to the unnatural word order ‘vnce halfe’. [10] Should it be expanded as ‘halfe an vnce’? This leads to a considerable amount of editorial intervention. What if it is in the plural, iij? Should it be expanded in Latin, even in passages where the matrix language is English? Would the audience have read it out aloud in Latin or English?

Another phenomenon is the special characters found in scientific, medical, alchemical and astrological works (see, e.g., Voigts 1989: 91). Of these, those used in alchemy and magic are particularly problematic, since they were occult disciplines (e.g. Pereira 1999: 339). [11] The language used in describing the substances, conditions, and processes is characterised “by figurative complexity of language uncharacteristic of modern scientific texts” (Walsh & Hooper 2012: 58). For example, Roger Bacon considered that alchemical language hid “the general truth about elementary generation and corruption, being the veritable root (radix) of every natural and medical knowledge” (Pereira 1998: 29). Moreover, a writer could “arbitrarily create a symbol ad hoc to stand for something else” (Voigts 1989: 92). To take a later example, Sir Isaac Newton employed symbols of his own invention in his alchemical works (Walsh & Hooper 2012: 57). Walsh & Hooper distinguish three types of usage in Newton’s alchemical writing. Sometimes he followed conventional practice, sometimes he modified the symbols “to refer to states and conditions of the substances he saw in his studies” and occasionally he invented completely new ones (Walsh & Hooper 2012: 60).

TEI P5 Guidelines contain the @xml:lang attribute, which identifies the language of any element (cf. TEI P5 Guidelines). It is also possible to use the @type attribute to state explicitly that the items in question are numerals or symbols, and it can also be specified in style-sheets that these things do not have to be expanded in the user interface, as many people prefer to work with them as they are. Moreover, it is possible to include the numerical value of the numbers with attributes. The @type could also be used to assign subtypes to the alchemical sigils, such as the division into Substances, Processes and Apparatuses given by Walsh & Hooper (2012: 72), although there are limits to what an encoding system can do with alchemy (Walsh & Hooper 2012: 67–69).
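As one possible way of recording the numerical value, the sketch below converts a Roman numeral written with final j into an integer and wraps it in the TEI <num>-element with its @value attribute. The choice of <num> and of type="cardinal" is an assumption, since the text does not name the element used, and the function names roman_to_int and num_element are hypothetical.

```python
import xml.etree.ElementTree as ET

VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a medieval Roman numeral such as 'xij' or 'iiij' to an integer.

    Final j is read as i; additive forms (iiij for 4) and subtractive
    forms (iv) are both accepted.
    """
    letters = numeral.lower().replace("j", "i")
    total = 0
    for pos, letter in enumerate(letters):
        value = VALUES[letter]
        following = VALUES[letters[pos + 1]] if pos + 1 < len(letters) else 0
        # A smaller value before a larger one is subtracted, as in 'iv'.
        total += -value if value < following else value
    return total


def num_element(numeral):
    """Serialise the numeral as <num> with its value as an attribute."""
    elem = ET.Element("num", {"type": "cardinal", "value": str(roman_to_int(numeral))})
    elem.text = numeral
    return ET.tostring(elem, encoding="unicode")


print(num_element("xij"))
```

Storing the value in an attribute leaves the numeral itself untouched, so the question of whether to read it in Latin or English does not have to be settled in the transcription.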


quattuor or four,
Boston MS 19, f.1r

<abbr>iiij<hi rend="superscript">or</hi></abbr>


<abbr>iiij<hi rend="superscript">or</hi></abbr>


“uncia semis”,
“ounce half” or
“half an ounce”?
Boston MS 19,



<expan>halfe an vnce</expan>


6.9 Word-tags

The Trinity Seven Planets was tagged for words, using the <w>-tags, which represent “a grammatical (not necessarily orthographic) word” (TEI P5 Guidelines, 17.1). In simple abbreviations, the tags were placed outside the <choice>-tags, as in example 20. In cases where the expanded content corresponds to several words, such as example 21 below, the <w>-tags were placed inside the <choice>-tags. This has a slight effect on the word count, as the word count of the text with abbreviations expanded is slightly higher than when they are left unexpanded.

Example 5 is a case where the expansion consists of two words. Encoding it i<ex>d est</ex> would cause improper nesting with word-tags <w>i<ex>d</w> <w>est</w></ex>  (see Section 7 below).
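Both points made in this section, the nesting problem and the differing word counts, can be checked mechanically. The following sketch makes some assumptions: the sample fragment (including the abbreviated form .i. for ‘id est’) is invented for illustration, and is_well_formed and count_words are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the fragment parses, False if tags overlap or mismatch."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def count_words(elem, view):
    """Count <w> elements visible in one view ('abbr' or 'expan')."""
    hidden = "expan" if view == "abbr" else "abbr"
    total = 1 if elem.tag == "w" else 0
    for child in elem:
        if child.tag != hidden:  # do not descend into the hidden branch
            total += count_words(child, view)
    return total


# <w> outside <choice> for a one-word expansion, inside it for two words.
SAMPLE = """
<body>
  <w><choice><abbr>xij</abbr><expan>twelue</expan></choice></w>
  <choice>
    <abbr><w>.i.</w></abbr>
    <expan><w>id</w> <w>est</w></expan>
  </choice>
</body>
"""

overlapping_ok = is_well_formed("<seg><w>i<ex>d</w> <w>est</w></ex></seg>")
nested_ok = is_well_formed("<seg><w>id</w> <w>est</w></seg>")

root = ET.fromstring(SAMPLE)
words_collapsed = count_words(root, "abbr")
words_expanded = count_words(root, "expan")
print(overlapping_ok, nested_ok, words_collapsed, words_expanded)
```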


xij, Trinity O.1.77, f.123r



cccxxxviij, Trinity O.1.77,

<expan cert="unknown" resp="#ah">

7. Conclusions

This article, which has introduced an XML-based system for annotating the numerous manuscript abbreviations in medieval manuscripts, consisted of a brief historical overview of both the historical systems of Latin abbreviations, and their English equivalents, and the way they have been described and taxonomized in paleography handbooks. The latter part of the article looked at the practical problems and editorial decisions taken when annotating abbreviations in The Trinity Seven Planets digital edition, relating them to the terminology used in paleographical handbooks.

The issues related to encoding abbreviations in TEI XML can be broken down into two categories: those related to representing the sign of abbreviation and those related to representing the abbreviated content. The first is handled between the <abbr>-tags and the second between the <expan>-tags. The distinction between contractions and suspensions, which is one of the most widely used taxonomical categories (see Table 1), has to do with the position of the abbreviated content. However, from the point of view of XML encoding this distinction is not important (see section 6.1). On the other hand, a distinction made by Driscoll (2006) between lexical and graphemic abbreviations turned out to be more relevant. The distinction between lexical and graphemic reference resembles the abbreviation categories used by Hector (1958) and Cappelli (1899), which distinguish between signs that have general reference, indicating that something has been omitted, and those which have special reference, corresponding to a particular set of graphemes. These categories, however, are more concerned with the shape and function of the symbols, whereas what matters more from the point of view of XML annotation is the location and quality of the expanded content. Some of the examples found in the data used in this study did not fall neatly into Driscoll’s distinction, and a more practical editorial principle was selected (see section 6.3 above).

When it comes to representing the sign of abbreviation, a TEI XML-based approach is useful and flexible, providing several different ways of encoding special characters (6.2), the Greek characters used in Nomina Sacra (6.4) and superscript abbreviations (6.5). It also provides a mechanism for indicating uncertain editorial expansions in a transparent manner. Moreover, it is possible to use the @type attribute to make distinctions between sub-categories of abbreviations considered important for the analysis (6.1 and 6.8). On the whole, the system outlined in this article is well suited for corpus linguistic research, as it can be used to annotate both the sign of abbreviation and its expansion in a manner which makes both available to corpus linguistic searches.


[1] This article originated as a paper given by me and Ville Marttila at the Leeds International Medieval Congress 2009 and a poster by me at the Digital Humanities 2009 conference at the University of Maryland. I am grateful to Elena Pierazzo and Syd Bauman for their detailed comments on the poster, which made us reconsider the system of encoding abbreviations we were using at the time. I am also grateful to Carla Suhr for help with the French, and to Samuli Kaislaniemi for helpful discussions on what the abbreviations represent.

[2] I use the original publication year for each of the paleographical handbooks in question, even though I may use a newer edition. The reason for this is that the article has a historical focus, and giving the original publication year will help the reader to see how they fit into the developments discussed.

[3] The distinction still has relevance for modern spelling conventions. As Driscoll (2002) notes, there is a rule in Modern English and several other languages according to which words abbreviated by cutting them short should be marked with a dot, whereas abbreviations which omit the middle of the word and retain its final letter should not. Thus, Mr, Mrs and Dr should be written without a dot, and Rev. and Feb. with one – a distinction which corresponds to that between contractions and suspensions, even though the terms will be unfamiliar to the general public.

[4] I am grateful to my former office mate Turo Vartiainen for finding the correct Greek term for the phenomenon, as well as other stimulating conversations.

[5] For example, paleographical dictionaries such as Cappelli (1899) or Trice Martin (1892) reproduce the abbreviations typographically. There have also been attempts at making typographic facsimile editions, such as William and the Werwolf by Madden (1832), but they have not achieved much popularity (see Edwards 2000: 66).

[6] With my own data from the Sloane Group of Middle English manuscripts (see Voigts 1990), I have been forced to decide hundreds of times per text whether to expand a horizontal bar indicating a nasal as m or n (or in some cases mn, nn or nm).

[7] Although these cannot be said to be abbreviations.

[8] The online corpora Library of Latin Texts A and B produce no hits for “Iesus miserere” and four for “Iesu miserere”.

[9] Meurman-Solin (2007: 2.3.4.) uses a different tagging, in which the expanded form of contractions is tagged using a single expansion “as an emic representation of all the possible variant realisations” that a particular sign may substitute for.

[10] Trinity O.1.77 contains a short text on the apothecaries’ notes on the first fly-leaf, which specifies that “halfe an vnce […] is þus writen […] ss”. I have used an image from the Boston MS because better quality images are available. Taking an expansion from a different manuscript would contribute to linguistic hybridity, if done in an actual edition (see 5.1 above).

[11] According to a division made by Manzalaoui (1974), medieval science, from a modern point of view, falls under three categories. Firstly, “activities that we still regard as experimentally sound, true in mathematical and quantitative terms, and technologically, or, at least, empirically, useful.” Secondly, “the pseudo-sciences” which for the most part, “involve a closely textured and internally self-consistent logical system based upon a single false axiom” (224) and thirdly “activities [...] in which the theoretical basis is occult, and the teaching deliberately kept esoteric” (224–5).



Trinity College Cambridge. O.1.77.
The manuscript images are displayed with the kind permission of the Master and Fellows at the Trinity College Library.

Collectanea medica, circa 1450
Ballard 19
Boston Medical Library in the Francis A. Countway Library of Medicine


Anthony, L. 2011. AntConc 3.2.4. Computer software. Tokyo, Japan: Waseda University.

Bailey, Richard W. 2004. “The need for good texts: The case of Henry Machyn’s Day Book, 1550–1563”. Studies in the History of the English Language II: Unfolding Conversations (Topics in English Linguistics 45), ed. by Anne Curzan & Kimberly Emmons, 217–228. Berlin & New York: Mouton de Gruyter.

Baker, Peter S. 2012. Introduction to Old English. 3rd ed. Chichester: Wiley-Blackwell.

Bamman, David & Gregory Crane. 2011. “The Ancient Greek and Latin Dependency Treebanks”. Language Technology for Cultural Heritage: Selected Papers from the LaTeCH Workshop Series (Theory and Applications of Natural Language Processing), ed. by Caroline Sporleder, Antal van den Bosch & Kalliopi Zervanou, 79–98. Berlin: Springer. doi:10.1007/978-3-642-20227-8_5

Barker, Don C. 2007. “P.Lond.Lit 207 and the origin of the Nomina Sacra: A tentative proposal”. Studia Humaniora Tartuensia 8(A.2): 1–13.

Benskin, Michael. 1977. “Local archives and Middle English dialects”. Journal of the Society of Archivists 5: 500–514.

Benskin, Michael. 1982. “The letters <Þ> and <Y> in later Middle English, and some related matters”. Journal of the Society of Archivists 7: 13–30.

Brown, Michelle P. 1990. A Guide to Western Historical Scripts from Antiquity to 1600. London: British Library.

Cappelli, Adriano. 1990 [1899]. Lexicon Abbreviaturarum Dizionario Di Abbreviature Latine Ed Italiane. Milano: Hoepli.

Chassant, L-A. 1970 [1845]. Dictionnaire des abréviations latines et françaises usitées dans les inscriptions lapidaires et métalliques, les manuscrits et les chartes du moyen âge. Hildesheim: Georg Olms Verlag.

Clemens, Raymond & Timothy Graham. 2007. Introduction to Manuscript Studies. Ithaca: Cornell University Press.

Cummings, James. 2009. “Converting Saint Paul: A new TEI P5 edition of The Conversion of Saint Paul using stand-off methodology”. Literary & Linguistic Computing (3): 307–317. doi:10.1093/llc/fqp019

Curzan, Anne & Chris C. Palmer. 2006. “The importance of historical corpora, reliability, and reading”. Corpus-Based Studies of Diachronic English, ed. by Roberta Facchinetti & Matti Rissanen, 17–34. Bern: Peter Lang.

Doyle, A.I. 2000. “Recent directions in medieval manuscript study”. New Directions in Later Medieval Manuscript Studies, ed. by Derek Pearsall, 1–14. Bury St Edmunds: York Medieval Press.

Driscoll, Matthew J. 2002. “Stray thoughts on abbreviations in some modern European languages”. Grace-Notes Played for Michael Chesnutt on the Occasion of His 60th Birthday, 18 September 2002, ed. by Jonna Louis-Jensen & Ragnheiður Mósesdóttir. Copenhagen: Det Arnamagnæanske Institut.

Driscoll, M.J. 2006. “Levels of transcription”. Electronic Textual Editing, ed. by Lou Burnard, Katherine O’Brien O’Keeffe & John Unsworth, 254–261. New York: The Modern Language Association of America.

Driscoll, M.J. 2009. “Marking up abbreviations in Old Norse-Icelandic manuscripts”. Medieval Texts – Contemporary Media: The Art and Science of Editing in the Digital Age, ed. by M.G. Saibene & M. Buzzoni, 13–34. Pavia: Ibis.

Edwards, A.S.G. 2000. “Representing the Middle English manuscript”. New Directions in Later Medieval Manuscript Studies, ed. by Derek Pearsall, 65–79. Bury St Edmunds: York Medieval Press.

Gettings, Fred. 1981. Dictionary of Occult, Hermetic and Alchemical Sigils. London: Routledge & Kegan Paul.

Greetham, David C. 1994. Textual Scholarship: An Introduction (Garland Reference Library of the Humanities 1417). New York: Garland.

Grund, Peter. 2006. “Manuscripts as sources for linguistic research: A methodological case study based on the Mirror of Lights”. Journal of English Linguistics 34(2): 105–125.

Hector, Leonard Charles. 1966 [1958]. The Handwriting of English Documents. Ilkley, West Yorkshire: Scolar Press.

Heimann, David & Richard Kay. 1982. The Elements of Abbreviation in Medieval Latin Paleography (University of Kansas Publications 52). Kansas: University of Kansas Libraries.

Honkapohja, Alpo. 2013. “The Trinity Seven Planets”. Scholarly Editing: The Annual of the Association for Documentary Editing 34.

Honkapohja, Alpo, Samuli Kaislaniemi & Ville Marttila. 2009. “Digital Editions for Corpus Linguistics: Representing manuscript reality in electronic corpora”. Corpora: Pragmatics and Discourse. Papers from the 29th International Conference on English Language Research on Computerized Corpora (ICAME 29). Ascona, Switzerland, 14–18 May 2008, ed. by Andreas H. Jucker, Daniel Schreier & Marianne Hundt, 451–474. Amsterdam & New York: Rodopi.

Iredale, David. 1982. “Introduction”. The Record Interpreter: A Collection of Abbreviations, Latin Words and Names Used in English Historical Manuscripts and Records. Chichester, Sussex: Phillimore.

Johnson, Charles & Hilary Jenkinson. 1963 [1915]. English Court Hand A.D. 1066 to 1500. Illustrated Chiefly from the Public Records. Oxford: Clarendon Press.

König, Eberhard. 1983. “The influence of the invention of printing on the development of German illumination”. Manuscripts in the Fifty Years after the Invention of Printing, ed. by J. B. Trapp, 85–96. London: The Warburg Institute.

Kühner, R. & C. Stegmann. 1914. Ausführliche Grammatik der Lateinischen Sprache II. Hannover: Hahnsche Buchhandlung.

LAEME = Linguistic Atlas of Early Middle English 1150–1325. 2008. Compiled by Margaret Laing. Edinburgh: The University of Edinburgh.

Lass, Roger. 2004. “Ut custodiant litteras: Editions, corpora and witnesshood”. Methods and Data in English Historical Dialectology, ed. by Marina Dossena & Roger Lass (Linguistic Insights. Studies in Language and Communication 16), 21–48. Bern: Peter Lang.

Lindsay, Wallace Martin. 1915. Notae Latinae: An Account of Abbreviation in Latin MSS. of the Early Minuscule Period. Cambridge: Cambridge University Press.

LLT-A = Library of Latin Texts – Series A. Brepolis Databases.

LLT-B = Library of Latin Texts – Series B. Brepolis Databases.

Machan, Tim William. 1994. Textual Criticism and Middle English Texts. Charlottesville, VA: University Press of Virginia.

Madden, Frederick. 1832. The Ancient English Romance of William and the Werwolf: Edited from an Unique Copy in King’s College Library, Cambridge; with an Introduction and Glossary. London: Roxburghe Club.

Medieval Unicode Font Initiative (MUFI).

MEG-C = The Middle English Grammar Corpus. 2011. Compiled by Merja Stenroos, Martti Mäkinen, Simon Horobin & Jeremy Smith, University of Stavanger.

Meurman-Solin, Anneli. 2007. Manual to the Corpus of Scottish Correspondence (CSC). Helsinki: Research Unit for Variation, Contacts, and Change in English.

Pereira, Michela. 1998. “Mater medicinarum: English physicians and the alchemical elixir in the fifteenth century”. Medicine from the Black Death to the French Disease, ed. by Roger French, Jon Arrizabalaga, Andrew Cunningham & Luis García-Ballester, 26–52. Aldershot: Ashgate.

Petti, Anthony G. 1977. English Literary Hands from Chaucer to Dryden. London: E. Arnold.

Roberts, Jane. 2005. Guide to Scripts Used in English Writings up to 1500. London: British Library.

Robinson, P. R. 1980. “The ‘booklet’: A self-contained unit in composite manuscripts”. Codicologica 3: 46–69.

Rogos, Justyna. 2011. “On the pitfalls of interpretation: Latin abbreviations in MSS of the Man of Law’s Tale”. Foreign Influences on Medieval English, ed. by Jacek Fisiak & Magdalena Bator, 47–54. Frankfurt am Main: Peter Lang.

Rogos, Justyna. 2013. “Isles of systemacity in the sea of prodigality? Non-alphabetic elements in manuscripts of Chaucer’s ‘Man of Law’s Tale’”.

Shillingsburg, Peter L. 1986. Scholarly Editing in the Computer Age: Theory and Practice. Athens, GA: University of Georgia Press.

Smith, Jeremy J. 2008. “Issues of linguistic categorisation in the evolution of written Middle English”. Medieval Texts in Context, ed. by Denis Renevey & Graham D. Caie, 211–224. New York: Routledge.

Smyth, Herbert Weir. 1920. Greek Grammar. Cambridge, MA: Harvard University Press.

Solomon, Kenneth R. 2008. “Nomina sacra: Scribal practice and piety in early Christianity”. The Church Convergent, Divergent, and Emergent: 21st Century Ecclesiology. Chicago, IL: Moody Bible Institute.

Stenroos, Merja & Martti Mäkinen. 2011. Corpus Manual, Version 2011.1. The Middle English Grammar Corpus, Compiled by Merja Stenroos, Martti Mäkinen, Simon Horobin and Jeremy Smith.

Tannenbaum, Samuel A. 1930. The Handwriting of the Renaissance: Being the Development and Characteristics of the Script of Shakspere’s Time. New York: Columbia University Press.

TEI Consortium, eds. 2007–. TEI P5: Guidelines for Electronic Text Encoding and Interchange.

Traube, Ludwig. 1907. Nomina Sacra: Versuch einer Geschichte der christlichen Kürzung. München: C.H. Beck’sche Verlagsbuchhandlung.

Trice Martin, Charles. 1982 [1892]. The Record Interpreter: A Collection of Abbreviations, Latin Words and Names Used in English Historical Manuscripts and Records. Chichester, Sussex: Phillimore.

Unicode = The Unicode Standard. By the Unicode Consortium.

Vanhoutte, Edward. 2006. “Prose fiction and modern manuscripts: Limitations and possibilities of text encoding for electronic editions”. Electronic Textual Editing, ed. by Lou Burnard, Katherine O’Brien O’Keeffe & John Unsworth, 161–180. New York: The Modern Language Association of America.

Voigts, Linda Ehrsam. 1989. “The character of the carecter: Ambiguous sigils in scientific and medical texts”. Latin and Vernacular: Studies in Late-Medieval Texts and Manuscripts, ed. by A.J. Minnis, 91–109. Bury St. Edmunds: D.S. Brewer.

Voigts, Linda Ehrsam. 1990. “The ‘Sloane Group’: Related Scientific and Medical Manuscripts from the Fifteenth Century in the Sloane Collection”. The British Library Journal 16: 26–57.

Walsh, John A. & Wallace Edd Hooper. 2012. “The liberty of invention: Alchemical discourse and information technology standardization”. Literary & Linguistic Computing 27(1): 55–79. doi:10.1093/llc/fqr038

Walther, I.L. 1745. Lexicon Diplomaticvm, Abbreviationes Syllabarvm Et Vocvm in Diplomatibus Et Codicibus a Secvlo Viii. Ad Xvi. Vsqve Occvrrentes Exponens. Göttingen: Io. Pet. et Io. Wilh. Schmidios, Fratres.

Wright, Andrew. 1846. Court-Hand Restored: Or the Student’s Assistant in Reading Old Deeds, Charters, Records, Etc. 8th ed. London: Henry G. Bohn.

Wright, Laura. 2000. “Bills, accounts, inventories: Everyday trilingual activities in the business world of later medieval England”. Multilingualism in Later Medieval Britain, ed. by D.A. Trotter, 149–156. Woodbridge: D. S. Brewer.

Wright, Laura. 2002. “Code-intermediate phenomena in medieval mixed-language business texts”. Language Sciences 24: 471–489.

Wright, Laura. 2011. “On variation in medieval mixed-language business writing”. Code-Switching in Early English (Topics in English Linguistics 76), ed. by Herbert Schendl & Laura Wright, 191–218. Berlin: de Gruyter. doi:10.1515/9783110253368.191