3.4 Commentary

This section will describe the typology of comments in the tagged CSC corpus. A separate database entitled Key to comments provides a detailed list of all the comments used.

All comments are marked by curly brackets. (It should be pointed out that these brackets are also used with punctuation marks and with backward slashes indicating a line-break, e.g. {.} and {\}. These brackets thus signal an item as something to be ignored by Tagger.)
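
As a minimal sketch of how bracketed items could be separated from the running text, the Python snippet below collects everything enclosed in curly brackets and sets the punctuation and line-break markers apart from the comments proper. The sample line, the function name and the rule that a marker is a bracketed item containing no letters or digits are assumptions made for illustration; they are not taken from the corpus files or from Tagger itself.

```python
import re

# Any item enclosed in curly brackets, e.g. {torn} or {.}
BRACED = re.compile(r"\{([^{}]*)\}")

def split_braced_items(line):
    """Return (comments, markers) found in one line of tagged text.

    Assumption: an item with no letters or digits (a punctuation mark or a
    backslash) is a punctuation or line-break marker; anything else is a
    comment.
    """
    comments, markers = [], []
    for item in BRACED.findall(line):
        if any(ch.isalnum() for ch in item):
            comments.append(item)
        else:
            markers.append(item)
    return comments, markers

# Hypothetical sample line, not quoted from the corpus
sample = "DOU?N?E {<partly torn} {.} {\\}"
comments, markers = split_braced_items(sample)
```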

There are three main types of comments. They can be distinguished from one another by their position:

  1. comments in the text
  2. comments in the lexel
  3. comments in the grammel

I. Comments in the text

Type I comments can be classified into four sub-categories, according to the information they contain:

  • I.a language-external comments on the manuscript original
  • I.b often interrelated language-external and linguistic comments on handwriting
  • I.c linguistic comments related to structure, syntax or discourse function
  • I.d linguistic comments on zero-realizations

I.a Language-external comments on the manuscript original

Language-external comments on the manuscript original (Type I.a) provide the user with information about the physical condition of the manuscript and the text, focusing on features which may have negatively affected the transcription process. In other words, in the case of a damaged manuscript, comments such as {torn} or {<blurred} are inserted into the text to make the user aware that the original text is unrecoverable or the reading of a word or passage in very pale ink is doubtful. A comment without '<' refers to something which is damaged or torn completely; since emendation is not allowed in the present system (see Section 2.3 Transcription and digitization), the torn part will remain as an unfilled gap in the running text. A comment with '<' indicates that the reading of the preceding item is doubtful due to the physical feature specified by the comment. Thus, the comment {<torn} usually occurs in contexts in which the preceding item contains question marks, indicating that some characters are partly torn in the manuscript original (e.g. DOU?N?E {<partly torn} indicates the incomplete shapes of <U> and <N> in the manuscript). The function of these comments is to permit the user to assess the quality of the data and to be aware of the lower validity of unclear attestations.
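
The relation between a '<' comment and the preceding item can be sketched as follows. The snippet pairs each comment beginning with '<' with the item before it and lists the characters marked as doubtful by a following question mark. It assumes that items and comments are separated by whitespace; the function names and the sample line are hypothetical.

```python
import re

# A token is either a braced comment or a run of non-space characters
TOKEN = re.compile(r"\{[^{}]*\}|\S+")

def doubtful_characters(item):
    """Characters immediately followed by '?' in a transcribed item."""
    return [item[i - 1] for i, ch in enumerate(item) if ch == "?" and i > 0]

def doubtful_readings(line):
    """Pair each {<...} comment with the item that precedes it."""
    results = []
    previous = None
    for token in TOKEN.findall(line):
        if token.startswith("{<") and previous is not None:
            # token[2:-1] drops the leading '{<' and the trailing '}'
            results.append((previous, token[2:-1], doubtful_characters(previous)))
        elif not token.startswith("{"):
            previous = token
    return results

# Hypothetical line: a doubtful item followed by an unfilled gap
readings = doubtful_readings("DOU?N?E {<partly torn} {torn}")
```

A comment without '<', such as {torn} in the sample, is not paired with any item, in keeping with its function of marking an unfilled gap.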

I.b Often interrelated language-external and linguistic comments on handwriting

Type I.b comments mark any idiosyncratic features of handwriting that may lead to ambiguity. Since ambiguous characters in a particular hand make the identification of a given linguistic variant doubtful, the comments may contain both language-external (i.e. script-specific) and linguistic (i.e. related to potential patterns of variation) information. With particularly untidy or badly-formed hands, it may sometimes be impossible to decipher a word, and the doubtful characters in the reading provided have been signalled with a question mark immediately following and a comment, which may suggest an alternative reading. Even in regular hands, <t> and <c> often have similar shapes, and comments are used to inform the user about the alternative reading (see Section 2.3.3). In hands that do not properly distinguish between <a> and <o>, the linguistic item and the comment have the following structure: HA?ME {<or O}. If this ambiguity applies to the majority of shapes a particular pair of characters have in a particular hand, a comment about this is also positioned before the body of the text in the file (e.g. {shapes of <a> and <o> are sometimes indistinguishable}). In addition to unclear realizations of individual characters, the idiosyncratic ductus in a hand may be reflected in a compressed realization of particular sequences of characters; for example, a sequence of minims may be represented by a wavy line. In these cases, the approximate realizations of individual characters have been signalled as such with question-marks, and the comment {<compressed} follows.
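
For the HA?ME {<or O} pattern, the two candidate readings could be derived mechanically, as in the sketch below. It assumes exactly one doubtful character per item and a comment of the form {<or X}; items or comments of any other shape are returned with the question marks simply removed. The function name is hypothetical.

```python
import re

# Comment of the form {<or X}, where X is the alternative character
ALTERNATIVE = re.compile(r"\{<or ([^\s}]+)\}")

def alternative_readings(item, comment):
    """Return the reading given in the text and, if stated, the alternative."""
    match = ALTERNATIVE.fullmatch(comment)
    position = item.find("?")
    if match is None or item.count("?") != 1 or position == 0:
        # Outside the simple case: only strip the doubt markers
        return [item.replace("?", "")]
    given = item[:position] + item[position + 1:]
    alternative = item[:position - 1] + match.group(1) + item[position + 1:]
    return [given, alternative]

readings = alternative_readings("HA?ME", "{<or O}")
```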

I.c Linguistic comments related to structure, syntax or discourse function

Linguistic comments related to structure, syntax or discourse function (Type I.c) are restricted to cases which are difficult to search without such comments. As regards syntactic features, fronting is a case in point; typical cases are indicated with the comments {fronted O>}, {fronted av>} and {fronted C>}. The comment {appositive} is an example of those related to structural features, while comments such as {tl} 'title', {ho} 'honorific' and {term of address} belong to the subgroup of comments on discourse function. Since the present tagging system is based on a relatively long process of experimentation, some discourse features are searchable using either comments or particular elements in the tags. This could be seen as partly overlapping marking, but in my experience different types of research questions require different types of tools. Thus, it is more economical to use the comments {inversion>} and {inversion indicating subordination>} in studies of subject-verb order than it is to use the grammels of verb forms in which the position of the subject is indicated by an arrow (e.g. vps13<n+ 'a simple NP as an immediately preceding subject' in contrast with vps13>n+ 'a simple NP as an immediately following subject'). More importantly, this detailed grammel is only provided for the present indicative and the present and past subjunctive, where there is variation in the choice of the form: invariable past tense forms (with the exception of the verb BE) are exclusively tagged with 'vpt'. In hindsight, it would probably have been useful to apply the same detailed tagging system to all finite verbs; in fact, this feature may undergo revision in later versions of the corpus.
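
The grammel-based alternative mentioned above can be illustrated with a short sketch that reads the subject position off the arrow. The pattern is inferred solely from the two grammels cited (vps13<n+ and vps13>n+) and may not cover other verb grammels in the corpus; the function name is hypothetical.

```python
import re

# 'vps' plus optional digits, then an arrow, then the subject description
VERB_WITH_SUBJECT = re.compile(r"^(vps\d*)([<>])(.+)$")

def subject_position(grammel):
    """Return 'preceding', 'following', or None if no arrow is present."""
    match = VERB_WITH_SUBJECT.match(grammel)
    if match is None:
        return None
    return "preceding" if match.group(2) == "<" else "following"

positions = [subject_position(g) for g in ("vps13<n+", "vps13>n+", "vpt")]
```

The third grammel, vpt, yields no result, which reflects the point made above: past tense forms carry no information on subject position, so only the comments can retrieve inversion in those cases.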

Type I.c comments also include information about passages of direct speech ({direct speech>} paired with {direct speech<}). It has also been necessary to introduce the category of participial clauses as independent propositions ({indep>}) in cases in which a link between the preceding or following clause(s) cannot be established according to semantic criteria. These clauses tend to have the function of polite letter-initial or, even more frequently, letter-closing formulae, the most frequent verbs in these probably being praying and beseeching.
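
Since the direct speech comments are paired, passages of direct speech could be retrieved with a search of the following kind. This is only a sketch: the sample text is invented, and nested or unclosed pairs are not handled.

```python
import re

# Text between an opening {direct speech>} and the next {direct speech<}
DIRECT_SPEECH = re.compile(
    r"\{direct speech>\}(.*?)\{direct speech<\}", re.DOTALL
)

def direct_speech_passages(text):
    """Return the passages enclosed by the paired comments."""
    return [passage.strip() for passage in DIRECT_SPEECH.findall(text)]

# Invented sample, not quoted from the corpus
sample = "he said {direct speech>} I will come {direct speech<} and left"
passages = direct_speech_passages(sample)
```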

I.d Linguistic comments on zero-realizations

Linguistic comments on zero-realizations (Type I.d) specify the linguistic feature which has been marked as being part of a variational pattern with 'realized' and 'zero-realized' members. It should be noted that the concept of zero-realization is here defined in this variationist way. Thus, there is not necessarily a direct link between zero-realization and ellipsis, since, as regards the former, in many cases no claim can be made about a particular item having been omitted. The comment {that deletion} would perhaps be more appropriate as {zero complementizer}, the latter indicating that clausal links realized by complementizers are in variation with those with no explicit linking device.

One of the focus areas in the application of I.d comments is the investigation of degrees of explicitness in creating links between clauses and larger chunks of text. The comments {zero pre} and {zero post} are used to refer to the absence of a logical connector in clause- or sentence-initial position, for example, where links realized by connectives such as and, but, for and so are typically used in early epistolary prose (see Meurman-Solin 2004).

Another important area requiring commentary relates to features traditionally referred to as ellipsis in grammars, such as the omission of items, or rather leaving them unrepeated, in coordinate phrases or clauses (e.g. {zero S}, {zero v}, {zero aux} and {zero pr}).

Since the compiler was particularly interested in the relative system at the time the commentary was being developed, she has experimented with two overlapping types of annotation in the tagging of relatives. The comment {zero rel} is followed by a tag in which the grammel is introduced by a zero (e.g. $/0RO{y1} 'an object form of a zero relative with an inanimate antecedent in the singular'). The system of using 0-grammels seems ideal for variationist linguistic research, and will probably replace the comments on zero-realizations altogether in later versions of the corpus.
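
A search for 0-grammels could proceed along the following lines. The sketch assumes that a tag is serialized as `$lexel/grammel`, with comments in curly brackets attached to either part; this layout is inferred from the example $/0RO{y1} and is not a specification of the corpus format. The class and function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

TAG = re.compile(r"^\$([^/]*)/(.*)$")
BRACED = re.compile(r"\{([^{}]*)\}")

class Tag(NamedTuple):
    lexel: str
    lexel_comments: List[str]
    grammel: str
    grammel_comments: List[str]

def parse_tag(tag: str) -> Optional[Tag]:
    """Split an assumed '$lexel/grammel' string into its parts."""
    match = TAG.match(tag)
    if match is None:
        return None
    lexel, grammel = match.groups()
    return Tag(
        lexel=BRACED.sub("", lexel),
        lexel_comments=BRACED.findall(lexel),
        grammel=BRACED.sub("", grammel),
        grammel_comments=BRACED.findall(grammel),
    )

def is_zero_grammel(tag: Tag) -> bool:
    """True if the grammel is introduced by a zero."""
    return tag.grammel.startswith("0")

zero_relative = parse_tag("$/0RO{y1}")
```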

II. Comments in the lexel

Comments positioned in the lexel have four functions:

  • II.a semantic categorization
  • II.b semantic disambiguation
  • II.c identification of causative constructions and speech acts
  • II.d identification of Latinate constructions

II.a Semantic categorization

Semantic comments are attached to lexels used with a number of semantic functions to facilitate the creation of inventories organized by semantic category. Thus, high-frequency items such as the subordinator as carry a semantic label such as {cause}, {comp} 'comparison/degree', {time}, {manner}, etc. Since these semantic roles are relatively well established, they permit a straightforward categorization. It should be noted, however, that this labelling is intended merely to facilitate searches, and cannot be considered a conclusive analysis. Fuzzy features have been left uncategorized by semantic criteria.

An inherent feature in the tagging system is that some variants are defined as the default in variational patterns (see Section 3.2 Principles of tagging). In comments positioned in the lexel, this applies to lexical items such as while and since; the sense 'until' of the former and 'cause' of the latter are indicated by a comment ({until} and {cause}, respectively), whereas the other uses are considered defaults and therefore remain uncommented upon.
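
A categorized inventory of this kind could be built as in the sketch below. It assumes that each occurrence has already been reduced to a pair consisting of the lexel and its comment, with no comment where the default applies or the item has been left uncategorized. The occurrences listed are invented and the function name is hypothetical.

```python
from collections import defaultdict

UNMARKED = "(default or uncategorized)"

def inventory_by_label(occurrences):
    """Count occurrences per lexel and per semantic comment."""
    inventory = defaultdict(lambda: defaultdict(int))
    for lexel, comment in occurrences:
        label = comment if comment is not None else UNMARKED
        inventory[lexel][label] += 1
    return {lexel: dict(labels) for lexel, labels in inventory.items()}

# Invented occurrences, for illustration only
occurrences = [
    ("as", "cause"), ("as", "time"), ("as", "cause"),
    ("while", None), ("while", "until"),
]
inventory = inventory_by_label(occurrences)
```

Because defaults carry no comment, they can only be retrieved as the residue of the commented uses, which is why the sketch groups them under a single unmarked label.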

II.b Semantic disambiguation

It may seem that comments representing types II.a and II.b overlap to some extent. However, Type II.b comments are not intended to suggest any kind of a basic taxonomy for more detailed research. Instead, they permit the user to distinguish between homonyms (e.g. let 'allow' and let 'prevent'). Here, too, the principle of defining particular uses as the default has been applied (see Section 3.2 Principles of tagging), so that, for example, the use of or as a coordinating conjunction is considered the default and not commented on, whereas or as a subordinating conjunction, in the sense 'before', is marked with the comment {time}.

II.c Identification of causative constructions and speech acts

The challenging goal of providing a commentary to facilitate discourse analysis has not yet been achieved in this version of the CSC, but there are certain areas where some tools have been developed. As is perhaps generally true of innovative work of this kind, the areas of experimentation have been chosen according to the compiler/tagger's present research interests. In investigating the typology of clause linkage, Lehmann (1988) depicts a continuum from highly elaborated to highly compressed, singling out the grammaticalization of superordinate verbs with a subordinative potential, such as causative and optative verbs, as resulting in realizations which are closer to the compressed end. The system for commenting on the causative or optative function of superordinate verbs with object + infinitive complementation is still crude; the comment {cause}, together with an infinitive tagged as /vi-av, permits the user to find all the instances in which the compressed structure is chosen instead of a more elaborate main clause + subordinate clause structure. Since one of the most frequent discourse functions in letters is to present requests, variation between elaborated and compressed realizations invites an analysis of the general degree of complexity in epistolary prose styles.

II.d Identification of Latinate constructions

The object + infinitive complementation structure referred to in the discussion of Type II.c comments is additionally indicated with {lat} in cases in which standard reference works have identified the influence of the Latin accusativus cum infinitivo construction. Similarly, Latinate object + present participle and object + past participle constructions are identified by the comment {lat} in the lexel.

III. Comments in the grammel

This category of comments provides additional feature-specific information, primarily concerning particular semantic and syntactic properties. The fact that there seems to be some degree of overlap between features receiving a type II.d comment and those with a category III comment can usually be explained by the fact that a particular distinctive feature is considered more central than the shared one(s). For example, the fact that the type II.d comment {lat}, attached to the object + infinitive/present participle/past participle constructions, is positioned in the lexel, while the category III comment {abs} for 'absolute construction' is positioned in the grammel, is due to the fact that the former feature is closely related to the verb category specified by the lexel, whereas the comment {abs}, attached to a participle form, permits searches for non-finite adverbial clauses with an explicit subject and sometimes a subordinating element such as with or without (e.g. [with] god willing).

Category III comments may relate exclusively to the tagged item (e.g. {onom} attached to the element n indicates that a noun is used onomastically, i.e. as (part of) a name); to a feature in the context which, in established linguistic description, is considered an appropriate classifying criterion (e.g. a semantic comment defining the animacy and number of the antecedent in relative constructions: {y1} 'inanimate singular', {+h2} 'animate human plural'); or to a wider structure such as a clause of which the tagged item is a constituent (e.g. {cond}, attached to the predicate verb, to categorize an adverbial clause as one of condition). In the present version of the corpus, the semantic role of non-finite and verbless adverbial clauses is not indicated; this is an area for revision in later editions of the CSC. However, the core property strings 'vi-av', 'vpsp-av' and 'vpp-av' permit a relatively easy search for all non-finite adverbial clauses, the verbless ones being identified by the comment {zero v-av}, which indicates zero-realization.
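
The search for non-finite adverbial clauses mentioned above might be sketched as follows. The sketch assumes that the property string stands at the beginning of the grammel once any comments and a leading slash have been removed; the function names and the test grammels other than /vi-av are hypothetical.

```python
import re

NONFINITE_ADVERBIAL = ("vi-av", "vpsp-av", "vpp-av")
COMMENT = re.compile(r"\{[^{}]*\}")

def is_nonfinite_adverbial(grammel):
    """True if the grammel begins with one of the three property strings."""
    core = COMMENT.sub("", grammel).lstrip("/")
    return core.startswith(NONFINITE_ADVERBIAL)

def is_verbless_adverbial(comment):
    """True for the comment that marks a verbless adverbial clause."""
    return comment.strip("{}") == "zero v-av"

hits = [g for g in ("/vi-av", "vpp-av{abs}", "/aj{post}", "vpt")
        if is_nonfinite_adverbial(g)]
```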

The subcategorization of category III comments below does not use this first-level distinction (tagged item, context or wider structure) as a classifying criterion. Instead, the discussion of these comments is organized according to the following four types:

  • III.a Semantic comments
  • III.b Syntactic comments
  • III.c Structural comments
  • III.d Discourse function

The majority of semantic comments in the grammel (Type III.a) are on adverbial clause type (e.g. {conc} 'concessive', {circ} 'circumstance', etc.). In the case of {time}, the comment also solves the problem of tagging ambiguous verb forms interpretable as the present or (unmarked) future indicative or the present subjunctive; in these, the forms are tagged as present indicative, but the comment {time} permits the creation of an inventory of these forms in adverbial clauses of time.

Type III.b comments, which provide syntactic information, are relatively infrequent, since syntactic relations are primarily indicated with elements integrated into the grammel, such as hyphens (e.g. n-av 'noun in an adverbial phrase') and arrows (e.g. /vi>pr 'an infinitive with a prepositional complement' and /pr<aj 'preposition introducing an adjective complementation') (see Section 3.3 Practices of tagging). However, some Type III.b comments are quite frequent: for example, {non-ad} is attached to the grammel of an item positioned non-adjacently with respect to its head (e.g. non-adjacent of-phrase postmodifiers in complex noun phrases) or, in relative constructions, its antecedent or anchor. The use of the comment {disc} is restricted to tags attached to intervening relatives in discontinuous nominal structures (e.g. the tag of the zero-relative in the great cair you have to sie things settled is /0RO{sent}{disc}, and that of $see is /vi{non-ad}<n). See also the discussion of the absolute construction above.

Among type III.c comments, which relate to structural features, the one allowing a search for appositive structures is probably the most useful, as it is otherwise difficult to retrieve structures of this kind (e.g. in Master Linton, his servant, a type I.c comment {appositive} is placed between the first (Master Linton) and second unit (his servant) of the appositive structure, and, in addition, a type III.c comment is attached to the grammel of servant, namely n{app}). Another example illustrating this type is the comment {post} attached to adjectival attributes positioned after the nominal head in complex noun phrases (e.g. in Thursday last, the tag of the lexel $last is /aj{post} 'a postpositioned adjective').

III.d comments on discourse function do not yet provide a systematic description, the annotation of features for pragmatics and discourse analysis obviously still being an area for collaborative work in the field of corpus linguistics. In addition to the causative and optative constructions (II.c) and clause-combining devices (II.a) discussed above, the compiler of the CSC has considered it useful to draw the user's attention to the ambiguity of I hope in relative clauses, such as which I hope you will consider. A Type I.d comment {zero that&com} is placed between $hope and $/P02N (you) to alert the reader to the two alternative readings, i.e. 'which I hope that you will consider' and 'which, I hope, you will consider', the latter giving I hope the function of a comment clause. In addition to this I.d comment, {com}, a III.d comment, is attached to the verb of mental process in contexts in which the function of the comment clause is clear-cut. Quantitatively, the most important III.d comments are {ts}, {foc}, {tf} and {f}: {ts} indicates that a connective has a text-structuring function at the level of discourse (see Section 3.2 Principles of tagging); {foc} is attached to features which restrict the focus to an item immediately following (e.g. $as for in as for that matter); {tf} 'topic-forming' is exclusively used in the grammels of connectives; and {f} 'formulaic' marks the tagged item as part of a formula.
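
To give an idea of how the frequency of these comments could be established, the sketch below tallies the III.d comments named above in a list of grammels. The grammels in the sample are invented (including the core string 'cj'), the short glosses paraphrase the descriptions above, and the function name is hypothetical.

```python
import re
from collections import Counter

BRACED = re.compile(r"\{([^{}]*)\}")

# III.d comments named in this section, with paraphrased glosses
DISCOURSE_COMMENTS = {
    "ts": "text-structuring",
    "foc": "focus-restricting",
    "tf": "topic-forming",
    "f": "formulaic",
    "com": "comment clause",
}

def tally_discourse_comments(grammels):
    """Count the III.d comments attached to a sequence of grammels."""
    counts = Counter()
    for grammel in grammels:
        for comment in BRACED.findall(grammel):
            if comment in DISCOURSE_COMMENTS:
                counts[comment] += 1
    return counts

# Invented grammels, for illustration only
tally = tally_discourse_comments(["cj{ts}", "cj{tf}", "cj{ts}", "aj{post}", "n{f}"])
```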


Lehmann, Christian 1988. 'Towards a typology of clause linkage.' In: John Haiman and Sandra A. Thompson (eds) Clause Combining in Grammar and Discourse, 181-225. Amsterdam and Philadelphia: John Benjamins.

Meurman-Solin, Anneli 2004. 'Towards a Variationist Typology of Clausal Connectives. Methodological Considerations Based on the Corpus of Scottish Correspondence.' In: Marina Dossena and Roger Lass (eds) Methods and Data in English Historical Dialectology (Linguistic Insights. Studies in Language and Communication, 16), 171-197. Bern: Peter Lang.