3.2.3 The theory of tags in the CSC

The general theoretical approach to the tagging of the CSC data can be defined with reference to a set of principles guiding the reconstruction of past language use. The most important among these is that the tagging is designed to reflect a profoundly variationist perspective. The shape of Scots, conditioned by time, place and social milieu, is assumed to reflect continued variation and variability, resulting in a high degree of language-internal heterogeneity, further increased by contact with a number of other languages and language varieties. A sophisticated tagging system is required, both to identify and analyse such complex patterns of variation and to trace multidirectional processes of change, assessing their relative intensity in terms of time and space. The tag of a given item provides descriptive information about its structural and contextual properties. It may also contain comments which permit such processes as semantic specification or disambiguation, but this information does not permit a straightforward categorization. Instead, the tag demarcates a 'variational space' within which an item can be examined in a valid and relevant way. The concepts of 'potential for category membership' and 'variational space' are central to the creation of comprehensive inventories for the study of the continuous variation and change that is inherent in any language variety, be it idiolectal, local, regional, or supraregional.

In applying the variationist approach to the reconstruction of historical varieties, it is necessary to keep in mind that, due to scarcity of texts from the early periods in particular, there may be significant gaps in our knowledge base. Ideally, all the members of a particular pattern of variation would be attested in the data as alternatives, and information provided by the tags would permit the creation of comprehensive inventories of such patterns. In practice, if only fragments are available, rather than a balanced and representative collection of texts, we will also have to consider the implications of potential unattested variants; one way we might achieve this would be to draw on typological information about corresponding systems in later periods, or those representing other languages or language varieties. Thus, while even highly sophisticated data-retrieval systems will not permit us to depict the full scale of variation in Older Scots, the corpus-based variationist approach, with its deeply ingrained emphasis on being data-driven and data-oriented, will provide a more reliable synchronic and diachronic description of this language variety.

This section will examine five themes which the compiler/tagger of the CSC has found particularly important in designing the tagging system, the sixth on the list below being discussed in Section 3.1. These will illustrate some of the new dimensions of the theoretical approach to linguistic annotation on which the tagging praxis is based. The system has been created with the following goals in mind:

  1. applicability to analyses at the phrasal, clausal, sentential, discoursal and textual levels (Section 3.2.3.1)
  2. efficiency in data retrieval for the identification of historical continua (Section 3.2.3.2)
  3. categorization on the basis of discourse functions, with the fuzziness and polyfunctionality of linguistic categories as inherent features (Section 3.2.3.3)
  4. inclusion of zero-realisations in variational paradigms (Section 3.2.3.4)
  5. focus on the cline from elaboration to compression in information processing (Section 3.2.3.5)
  6. visual prosody (Section 3.1 Visual prosody)

3.2.3.1 Analysis at the phrasal, clausal, sentential, discoursal and textual levels

In creating variationist typologies, the decisive factor is that variants in a particular typology have been observed to show patterning at a particular level of analysis, which may be structural, syntactic or related to communicative or text-structuring functions, to give some examples. Thus, the tagging makes a distinction between the phrasal, clausal, sentential, discoursal and textual levels of analysis. As discussed in more detail in Section 3.3, in addition to inflexional and derivational morphology, structural features of the noun phrase, the verb phrase, the adjective phrase and the adverb phrase are described by using arrows to indicate relations within a particular structurally-defined element or between a number of such elements.

An obvious case in which there is a need to distinguish between clausal and sentential levels is the variational pattern of relative markers in the CSC data, which includes relatives introducing clauses as postnominal modifiers as well as those functioning as independent constituents at the sentence level, the latter lacking a syntactically-defined antecedent (cf. Meurman-Solin 2007a). Among relatives, there are also those that have a function at the discoursal and textual levels. For example, such relative compounds as wherefore, whereupon and whereat have been attested with text-structuring functions, either signalling the marked communicative function of an utterance which follows immediately (e.g. a request, a wish) or as cohesive elements in narrative strategies. The discoursal level of analysis focuses on elements related to the communicative acts in a letter, whereas the textual level draws the user's attention to devices used to structure that letter as a whole. Thus, wherefore may be used as a text-structuring element, while the relative phrase until which (time) often introduces a letter-closing polite formula.

Subordinators such as since, as, because and for (the first two in the role of cause) and connective adverbs such as therefore are members of the same semantically-defined inventory, but their membership in a syntactically-defined one is not as straightforward as the traditional categorization into conjunctions and adverbs would suggest. The assessment of for in particular requires a careful analysis of the data (Rissanen 1989, 1999, Kohnen 2007, Lenker 2007). In this context, the theoretically significant aspect is that, when the qualifications of these items for membership in a particular category at the discoursal and textual levels are assessed, it becomes clear that they are not members of the same patterns of variation (Meurman-Solin 2004b).

The tagging in the 2007 version of the CSC is appropriate for multi-level analyses of such systems as connectives (conjunctions, adverbs, adverbial phrases and prepositional phrases), relatives, appositives, causative and optative constructions, multi-word verbs, verb complementation types and adjective complementation types. However, it is relatively easy to design a data-retrieval tool for a number of other systems by combining various types of information in the grammel. Thus, for a study of the system of addressing the recipients in letters, the combination of the grammels of personal pronouns and nouns with the core property voc 'vocative' will provide the required inventory. The user is advised to experiment with a number of data-retrieval tools based on combinations of different types of information in order to find the most reliable one.
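
A search of this combined kind can be sketched as follows. This is only a minimal illustration, assuming that tags are available as plain strings of the form $lexel/grammel or $lexel/grammel_FORM; the function names are hypothetical and do not belong to any CSC tool, and the sample reuses tags quoted in Section 3.2.3.3 and 3.2.3.4.

```python
import re

def core_properties(tag):
    """Core properties of a tag's grammel, ignoring {comments} and arrow links."""
    grammel = tag.split("/", 1)[1].split("_", 1)[0]
    grammel = re.sub(r"\{[^}]*\}", "", grammel)       # drop comments such as {ho}
    core = re.split(r"[<>]", grammel, maxsplit=1)[0]  # drop >... and <... links
    return core.split("-")

def lexel(tag):
    """The lexel is everything between the initial $ and the first slash."""
    return tag[1:].split("/", 1)[0]

def with_core_property(tags, prop):
    """Lexels of all tags whose grammel has `prop` among its core properties."""
    return [lexel(t) for t in tags if prop in core_properties(t)]

# Tags quoted in this manual; the helper names above are assumptions.
sample = [
    "$right/av",
    "$honour/aj-n{ho}-voc",
    "$-able/xs-aj-n{ho}-voc",
    "$duty/n_DEUTIE",
    "$/P11O_ME",
]

vocatives = with_core_property(sample, "voc")
```

Restricting the hits further, for instance to nouns only, would amount to testing for a second core property in the same list.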

3.2.3.2 Data retrieval for the identification of historical continua

Even though the 2007 CSC does not permit the study of developments in epistolary prose up to Present-Day Scottish English (for other electronic data sources for Scots, see Section 1.2 Digitized data for Scots), the tagging system has been tailored to meet the challenge of tracing developments over a long time-span, the chief interest of the compiler/tagger being the reconstruction of historical continua in areas such as clause-combining devices, pronominal reference systems and nominalization. This requires that the source of an item or collocate continues to be transparent over time, even though a later grammaticalization or reanalysis, for example, would permit recategorization. Thus, according to is tagged $accord/vpsp-pr>pr and $to/pr<vpsp-pr; seeing (that) is tagged $see/vpsp-cj{c} (and $that/cj<); and exceedingly is tagged $exceed/vpsp-aj-av and $-ly/xs-vpsp-aj-av. The shared property vpsp indicates that the ultimate source of all these is a present participle.

Tags providing information about potential rather than established membership of what I conceptualize as 'categorial space' are intended to ensure that comprehensive inventories can be created for the examination of the full scale of variation. For example, the tag for the variational pattern of upon (the) condition that includes upon this condition that ($condition/n-cj{c}), even though the determinative element this positions the variant at that end of the cline where an abstract noun followed by a nominal that-clause as the second unit would be analysed as an appositive structure ({c} in the grammel indicates the presence of the complementizer that, $that/cj<). The approach to lexicalization is the same. Consequently, in invariable collocates such as anyway and at all, the two units are tagged separately, and their interrelatedness is indicated with arrows: $any/pn-aj>n-av and $way/n-av<pn-aj, $at/pr>pn-av and $all/pn-av<pr. With anyway, there is also a comment on the absence of a preposition (i.e. {zero pr}), which permits the grouping of prepositional and non-prepositional adverbials.

Even though there is evidence of the grammaticalization of a number of well-established connectives incorporating nouns in Scots by the sixteenth century, these have been given tags that trace their development over time. This decision makes it possible for the users of the database to create inventories of all clause-combining devices incorporating nouns, irrespective of what degree of grammaticalization or lexicalization they may reflect in the various contexts in which they have been attested. In fact, it may sometimes be impossible to define exactly where a particular instance should be positioned on the historical continuum. Thus, the well-established subordinator because, here as its variant becaus, which usually occurs in the CSC in univerbated form and without the complementizer that, is tagged as follows:

$by/pr-cj_BE+

$cause/n{rc}-av-cj{-c-emb-post}_+CAUS

{zero that<}

The grammel pr-cj, attached to the preposition by, relates this element to all prepositions that introduce a nominalization of some kind, in this instance the nominalization of a causal process, the comment {rc} standing for 'reduced clause'. The adverbial function of by cause is also indicated, the third core property recording the use of the phrase as a connective, this time as a subordinator without the complementizer that (-c and {zero that<}); 'emb' indicates embedding and 'post' the position of the subordinate clause after the main clause.

The same nominalization also occurs with a prepositional complement:

$by/pr-cj_BE+

$cause/n{rc}-av-pr>pr_+CAUS

$of/pr<n-av-pr_OF

The shared feature(s) of the grammels of prepositions followed by nominals permit the creation of inventories of all subordinators and prepositional phrases incorporating nominal heads. Thus, adverbial phrases with simple noun-phrase heads such as to the end that can also be searched using the core property string n-av-cj:

$to/pr_TO

$/T_THE

$end/n-av-cj{c-post}_END

$that/cj<_THAT
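
A search on the core property string n-av-cj might be sketched as below. The sketch assumes that the core property string is what remains of the grammel once comments in curly brackets and arrow links have been removed; this is my reading of the examples above rather than a documented procedure, and core_string is a hypothetical helper.

```python
import re

def core_string(tag):
    """Grammel of a $lexel/grammel_FORM tag without {comments} and arrow links."""
    grammel = tag.split("/", 1)[1].split("_", 1)[0]
    grammel = re.sub(r"\{[^}]*\}", "", grammel)          # n{rc}-av-cj{...} -> n-av-cj
    return re.split(r"[<>]", grammel, maxsplit=1)[0]     # pr<n-av-pr -> pr

# The tags of 'because', 'by cause of' and 'to the end that' quoted above.
tags = [
    "$by/pr-cj_BE+",
    "$cause/n{rc}-av-cj{-c-emb-post}_+CAUS",
    "$cause/n{rc}-av-pr>pr_+CAUS",
    "$of/pr<n-av-pr_OF",
    "$to/pr_TO",
    "$/T_THE",
    "$end/n-av-cj{c-post}_END",
    "$that/cj<_THAT",
]

hits = [t for t in tags if core_string(t) == "n-av-cj"]
```

Here the search retrieves the nominal heads of because and to the end that, while by cause of, whose head carries n-av-pr, is left to a separate search on that string.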

3.2.3.3 The discourse basis of categorization

The third factor stressed in this theoretical approach is the inherent fuzziness and polyfunctionality that is evident in language use when it is examined using representative large-scale corpora. In order to summarize aspects discussed in more detail in other sections, I would like to mention three features which make a manuscript-based corpus of Scottish correspondence ideal for developing practices for tagging fuzziness. Firstly, the data primarily represents online language use in an explicitly interactive communicative situation. There is relatively little or no editing. Prototypically, fuzziness results from intended or unintended pragmatic inferences in a piece of text in context. Thus, it is appropriate to base the study of discourse-triggered semantic and pragmatic change on online language use. Secondly, in a corpus of letters there is a rich variety of idiolectal and local grammars, which reflect the different degrees of linguistic and stylistic literacy of their writers. In principle, each idiolect will have to be viewed individually, without assuming that there will be similarities or, for that matter, particular differences between members of a particular speech, discourse or text community (Meurman-Solin 2004c). Most importantly, there are idiolects that are relatively free from standardizing trends. Thirdly, no comprehensive description of the Scottish variety can be imposed on the tags, since no such description is available. Previously unexplored data challenges us to take fuzziness and polyfunctionality seriously, as views on the category membership of particular features are less established. Moreover, manuscript-based data is also free from standardizing trends adopted by early printers or later editors.

Problems caused by the ambiguity, categorial fuzziness and polyfunctionality of particular linguistic features are of course a challenge the moment one starts processing data using semantic, pragmatic and discourse-analytic tools. In the tagging system applied to the CSC, these problems have been solved by providing information about both the prototypical (which is usually also the earliest) use of a particular item and the use in discourse in the text being tagged. For example, in the sentence 'We shall wait unto the time we receive your lordship's answer to this letter', the category membership of the word time is described in terms of a noun (n) being used in a prepositional phrase as an adverbial (av) functioning as a subordinator of time (cj):

$unto/pr_UNTO

$/T_THE

$time/n-av-cj{-c-post}_TIME

{zero that<}

The element '-c' is co-referential with the comment {zero that<}, both indicating that there is no complementizer (cf. unto the time that). Using tags of this kind, it is possible to create an inventory in which subordinators such as to the time (that), until the time (that), until, till and to, for instance, are part of the same variational pattern (see also Meurman-Solin 2002 and Brinton 2007).
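
The distinction between variants with and without the complementizer could be read off the tags roughly as follows. The sketch assumes that the information is always carried by the comment attached to the core property cj, with 'c' and '-c' as its first element, as in the examples quoted in this section; the function name is hypothetical.

```python
import re

def complementizer(tag):
    """Return 'that' if the cj comment records a complementizer (c),
    'zero' if it records its absence (-c), and None otherwise."""
    match = re.search(r"cj\{([^}]*)\}", tag)
    if match is None:
        return None
    comment = match.group(1)
    if comment == "-c" or comment.startswith("-c-"):
        return "zero"
    if comment == "c" or comment.startswith("c-"):
        return "that"
    return None

# Tags quoted in Sections 3.2.3.2 to 3.2.3.4.
examples = {
    "$time/n-av-cj{-c-post}_TIME": complementizer("$time/n-av-cj{-c-post}_TIME"),
    "$end/n-av-cj{c-post}_END": complementizer("$end/n-av-cj{c-post}_END"),
    "$condition/n-cj{c}": complementizer("$condition/n-cj{c}"),
    "$that{result}/cj{emb-post}<av_Y^T": complementizer("$that{result}/cj{emb-post}<av_Y^T"),
}
```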

The tags indicate fuzziness and polyfunctionality by referring to co-ordinates on a cline. As a result, there is no need to insist on membership of a single category for a given word. The order of these co-ordinates is carefully controlled, and the hierarchy between different types of information is transparent. Core properties such as form, word-class and function precede components providing contextual information. On the one hand, the co-ordinates permit the positioning of a particular feature in variational space, while on the other they trace developments over time. In other words, no tag is merely an interpretation of a particular occurrence in a particular context, but provides information on all the different stages, faithfully reflecting historical continua. In this system, ambiguity is no longer a problem, because ambiguity, or fuzziness, has been dispelled by providing all the relevant co-ordinates to reflect variation and change over time. In order to trace developments that take place over a long time-span, such as grammaticalization processes, it is necessary to indicate the various stages, beginning with the origin, listing properties perceived in the analysis of pragmatic inferences, identifying examples which provide evidence of a process of grammaticalization, and stating the grammaticalized use and any further developments.

For example, in order to trace change over time as reflected in the history of indefinite pronouns, elements such as thing, one, body and man can be grouped into a variational pattern using property strings and contextual indicators of the following kind:

Anything
$any/pn-aj>n-pn_ANY
$thing/n-pn<pn-aj_THING
Anybody
$any/pn-aj>n-pn_ANY+
$body/n-pn<pn-aj_+BODY
Anyone
$any/pn-aj>qc-n-pn_ANY
$1/qc-n-pn<pn-aj_ONE
Any man
$any/pn-aj>n-pn_ANY
$man/n-pn<pn-aj_MAN
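
The grouping of these second elements can be sketched by following the arrows. The sketch assumes that in a linked pair the first grammel points forward (>) to the core properties of the second, and the second points back (<) to the core properties of the first; this holds for the pairs listed above, but it is an assumption and need not hold for every tag in the corpus. All function names are hypothetical.

```python
import re

def parse(tag):
    """Split a tag into lexel, core properties, forward links and backward links."""
    lexel, rest = tag[1:].split("/", 1)
    grammel = re.sub(r"\{[^}]*\}", "", rest.split("_", 1)[0])
    core = re.split(r"[<>]", grammel, maxsplit=1)[0]
    forward = re.findall(r">([^<>]+)", grammel)
    backward = re.findall(r"<([^<>]+)", grammel)
    return lexel, core, forward, backward

def linked(left, right):
    """True if `left` points forward to `right` and `right` points back to `left`."""
    _, left_core, left_forward, _ = parse(left)
    _, right_core, _, right_backward = parse(right)
    return right_core in left_forward and left_core in right_backward

def second_elements(tags, first_core="pn-aj"):
    """Lexels of units linked to an immediately preceding unit with `first_core`."""
    found = []
    for left, right in zip(tags, tags[1:]):
        if parse(left)[1] == first_core and linked(left, right):
            found.append(parse(right)[0])
    return found

tags = [
    "$any/pn-aj>n-pn_ANY", "$thing/n-pn<pn-aj_THING",
    "$any/pn-aj>n-pn_ANY+", "$body/n-pn<pn-aj_+BODY",
    "$any/pn-aj>qc-n-pn_ANY", "$1/qc-n-pn<pn-aj_ONE",
    "$any/pn-aj>n-pn_ANY", "$man/n-pn<pn-aj_MAN",
]

pattern = second_elements(tags)
```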

Another example that illustrates fuzziness resulting from the focus on diachronic developments is the tagging practice applied to anaphoric reference signals such as same, ilk and like. These items can be viewed as members of the same variational pattern:

Same
$/T>pn_THE
$same/pn<T_SAME
Ilk
$/T>pn_THE
$ilk/pn<T_ILK
Like
$/T>pn_THE
$like/pn<T_LIKE

These examples are intended to illustrate that in the majority of cases categorial fuzziness can be dispelled by describing the properties of a feature with reference to its origin as well as its context of occurrence, positioning it on a cline. It is suggested that this practice will allow the tagger to depict the data faithfully without resorting to the kind of streamlining which is unavoidable in traditional tagging and parsing methods. Since tags should be as theory-neutral as possible and the role of each individual tagger as an interpreter of linguistic data should remain as low-profile as possible, it is suggested that descriptive clines of properties permit the creation of more valid inventories of features than more compartmentalizing tagging practices.

The tagging principles have been influenced by the discussion of notional or conceptual properties, elaborated tags making the interrelatedness of the members of a particular notional category explicit (Anderson 1997, Jackendoff 2002). Even though the conventional categorization into parts of speech is applied to the tags, they also succeed in indicating fuzziness and polyfunctionality by referring to co-ordinates on a cline rather than insisting on membership in a single category. Scalar concepts such as 'nouniness' and 'adverbhood' reflect this approach. I would like to mention that one of my personal research interests is the examination of the discourse basis of nouniness, enabling me to relate the discussion of how discourse analysis meets typology to Lehmann's (1988) parameters. These are relevant in depicting the continuum from maximal elaboration to maximal compression in clause linkage, nominalization resulting from what Lehmann calls desententialization being closer to the compressed end of the cline (see also Section 3.2.3.5).

In their important article on the discourse basis for lexical categories, now republished in the OUP reader Fuzzy Grammar, Hopper & Thompson (2004: 248) discuss the integration of 'the notional side of categories with their pragmatic function in language use'. While they accept the broad correlation that, for example, 'certain prototypical percepts of thing-like entities will be coded in a grammatical form identifiable as N' (ibid.: 249), they set out 'to show that semantic congruence is actually rooted in predictable pragmatic (discourse) functions.' Moreover, in their view, even though semantic features which are used to assign 'concrete, stable things' (such as visibility) to Ns and 'kinetic, effective actions' (such as movement) to Vs are relevant, these features 'do not seem to be adequate for assigning a given form to its lexical class' (ibid.: 251). This is because '[p]rototypicality in linguistic categories depends not only on independently verifiable semantic properties, but also and perhaps more crucially on linguistic function in discourse.' In the CSC, the string of co-ordinates in the grammel provides the user with information about both the lexical class and the discourse function of the tagged item (see also Meurman-Solin 2007b).

Discourse-based tagging can be illustrated with the following three examples: Right honorable as a term of address is tagged $right/av, $honour/aj-n{ho}-voc and $-able/xs-aj-n{ho}-voc, in which {ho} comments on the honorific function, 'voc' being an abbreviation of vocative. The two core category names connected with a hyphen in the tag (here aj-n) indicate the nominal use of an adjective.

Similarly, in the variational pattern of conform to and conformand/conforming to ($conform/v-aj-pr>pr, $to/pr<v-aj-pr and $conform/vpsp-aj-pr>pr, and $to/pr<vpsp-aj-pr), the core property co-ordinates v-aj-pr and vpsp-aj-pr function as addresses for the tracing of the historical continuum of these items and other prepositions originating from verb forms.

Multi-unit connectives such as as soon as provide links between the correlative pair $as/av>cj and $as{time}/cj<av, with soon being assigned to the category of adverbs occurring in a connective function by the core properties av-cj in the grammel. The varying semantic roles of connectives representing this type (cf. as/so long as, as much as, as far as, etc.) are specified by a comment in the lexel (e.g. in as long as, as{cond} is distinguished from as{time}).

If ambiguity cannot be made transparent by tagging, comments such as {syntactic ambiguity>} and {syntactic merger>} have been added. These alert the user to the possible need to re-examine the problematic structure that follows the comment.

3.2.3.4 Zero-realisations in variational paradigms

Another essential ingredient in this theoretical approach is that zero-realisations are included in patterns of variation. In other words, a zero-realisation is one of the attested variants on the cline from zero to reduced or elliptical to 'full' variants, this cline being construed on the basis of corpus evidence. It should be pointed out that in principle the term zero-realisation can be considered misleading, since it may not always be possible to verify empirically that something has been omitted. However, in the CSC approach, a zero-realisation is indicated when a variational pattern comprising both explicitly expressed variants and those left implicit has been repeatedly attested in the data.

Zero-realisations are primarily indicated using comments in curly brackets (cf. the discussion of grammels introduced by a zero at the end of this section). Ellipted items, including features carefully discussed in grammars, such as that-deletion, are indicated using comments such as {zero v}, {zero aux}, {zero that}, and {zero S}. The following example contains three zero-realisations: the first two are the relative pronoun and the verb; the clause type is labelled as a verbless relative structure, an alternative to the finite clause 'who are next (to) my sovereign':

$all/pn-aj>R_ALL

$other/pnpl>R_VTHER+IS $/plpn>R_+IS

{zero rel}

$/0RN{+h2}<pn-aj<pn-aj_0

{zero v{rel}}

$next/aj-pr>pr_NIXT

{zero pr<aj-pr}

$/P11G+C_MY

$sovereign/aj-n-av_SOUERANE

The third instance of a zero-realisation illustrates the tagging practice which stresses attested variational patterns, the prepositional type next to having also been recorded in the CSC. Rather than ask the users of the database to design searches based on a list of lexels reflecting variation of this kind, the combination of information in the grammel and the comment permits them to create a full inventory of prepositional and non-prepositional items. Similarly, the variation between so that and that either in the semantic role of result or that of purpose is indicated in the following way:

{zero av>cj}

$that{result}/cj{emb-post}<av_Y^T

 

{zero av>cj}

$that{purpose}/cj{emb-post}<av_Y^T

The combination of the comment {zero av>cj} and the elements in the grammel cj<av permits the retrieval of all connectives sharing these components.
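
A retrieval of this kind might be sketched as follows, assuming that the tagged text is read as a sequence of lines in which a comment immediately precedes the tag it qualifies. The function name is hypothetical, and the sample combines the two that-tags above with the as ... as pair from the directive quoted later in this section.

```python
import re

def connectives_linked_to_adverb(lines):
    """Collect tags with the grammel elements cj<av and note whether the
    adverbial part is marked as zero by a preceding {zero av>cj} comment."""
    hits = []
    previous = None
    for line in lines:
        line = line.strip()
        if line.startswith("$"):
            lexel, rest = line[1:].split("/", 1)
            grammel = re.sub(r"\{[^}]*\}", "", rest.split("_", 1)[0])
            if grammel == "cj<av":
                kind = "zero" if previous == "{zero av>cj}" else "explicit"
                hits.append((lexel, kind))
        if line:
            previous = line   # remember the last non-empty line
    return hits

lines = [
    "{zero av>cj}",
    "$that{result}/cj{emb-post}<av_Y^T",
    "",
    "{zero av>cj}",
    "$that{purpose}/cj{emb-post}<av_Y^T",
    "$as/av>cj_AS",
    "$exact/aj_EXACT",
    "$as{comp}/cj{post}<av_AS",
]

inventory = connectives_linked_to_adverb(lines)
```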

The recorded variants of the verb please, which mostly occurs at the beginning of a letter, suggest that instead of analyzing it as an impersonal verb, the following tagging practice provides the most efficient tool for data retrieval:

{zero formal S}

$please/vsjpt13<S-_PLES+IT $/vsjpt13<S-_+IT

$/T>pn_THE

$same/pn<T_SAmING~

{zero vi<S}

$that/cj_Y^T

The omission of the predicate verb (usually the verb to know) from a postposed nominal infinitive clause with a nominal that-clause object is quite frequent in the CSC, the variant being it please NP to know/to be informed that.

The explicit marking of semantic relations between sentences and clauses is a salient feature in early letters, zero marking being much less frequent in the historical data examined than in similar registers in Present-Day English (Meurman-Solin 2004a). In the CSC data, the highly elaborate clause-linkage system can be illustrated using the following extract from a late seventeenth-century letter:

as for our Couk. that man [zero] you feid \ at Edinburgh you kno befor you went from Chanain \ I told you [zero] I wad give him his live becase his \ conditions wase tou much . and sins I wase in\forming my self about. him, and thay tell \ me [zero] he is a {an unidentified word; correction unclear} illretired Cheild \\ yet I writ to gorge biset [zero] if he wase satisfyd wt \ the 6 pound I wase content to tak him. but \ sinse I cam hear he cam and ofired him self \ and I told him that I had advertisd him befor \ that I thoght his condition tou much, yett \ wase content to give him mor then 6 pound \ if he wad tak fei for all becase his casawalatis \ wad a bein mor then twelve pound. and when \ we give casawalatis pipoll talks of it mor \ then it is wirth. hou ivir he wold not {an inserted word unclear} \ a jot, so I gave him ovir, and I have non at \ all for the preasent, for he that wase \ at scello wase not good. and robin whait \ will not ingage [ZERO] he is but a weik lad. but he \ comse in and douse any thing [zero] we disair, him \ so if you could get Cheif perswadit, I think [zero] \ what he hes learned in inglend wold do \ well anof in scotland, but if he, will not \ I think [zero] it is best [zero] you seik for on therfor betir give a good fei to on that can do well \ then on that can not. if you war thinking \ to stay in the south, I fansi [zero] a woman \ kouk wad do best, becase we wold not \ have brewing or beking or kiling of meat \ and awoman I belive wad fei for les then a \ man. not having that to do, and she wold \ help to dres roums or any other thing \ [zero] war to be doon. but if we stay hear a man \ I think wad do best, but when you fei \ him , give him nathing but fei for \ when thay get other casawalatis \ {f1vb} thay straiv to spend mor then thay wad do otherway{ins}se? \ [ZERO] we gave 6 pound last. and you kno 8 pound wase the \ most that ivir ve gave but you may straiv to get them \ ase cheap ase you can. 
[ZERO] I onse heard m=rs= havircemp say that \ the duchis of latherdeall had awoman Couk that servd hir \ hous . for ase grit a family ase it wase, if it be a man kouk \ and if we stay in the north. I will give him aman and a \ litill booy undir him. and whither man or woman, if we \ stay at Edinburgh I will give them only a lase undir them. if \ I get the undir kouk of my oun Choosing I will pay the \ fei and if thay choos them thay most pay the fei them \ selvs. but I rether choose them my selfs becase I \ will get them cheapest. [ZERO] you may tak yr sistirs adve?is \ in this. [ZERO] that man that you feid at Edinburgh wase serving hir \ onse sa she will kno if he be ill or not, I haue trubiled \ you tou much with this subjek.

(NLS MS Dep 175/70/Bundle 3/1990; Margaret Mackenzie, 1688) (see Meurman-Solin 2004b: 178-179)

As illustrated by this extract, the reconstruction of the full variational pattern as regards clause linkage requires the signalling of the presence or absence of text-structuring connectives (in green), adverbial links (in yellow) and adverbs as logical connectors (in red). In the extract, zero-realisations of these have been indicated with [ZERO], and those of complementizers with [zero].

The following example contains a directive:

{zero pre}

$get/v-imp_GET

$as/av>cj_AS

$exact/aj_EXACT

$information/n{rc}>pr_INFORMATION

$&/cj_&

$knowledge/n{rc}>pr_KNOWLEDGE

$of/pr<n&n_OF

{\}

$every/pn-aj>n-pn_EUERY

$thing/n-pn<pn-aj_THING

$as{comp}/cj{post}<av_AS

$/P02N_YOU

$can/vm_CAN

{zero vi}

$possible/av_POSSIBLE $-ly/xs-av_0

Directives of this kind are frequently related to the preceding context using adverbs such as so or therefore, or the connective and, which in some idiolects may introduce virtually every new proposition. The absence of a link of any kind is here indicated by the comment {zero pre}. Correlative pairs of connectives (although yet, as so, since therefore) are considered to be variants representing a high degree of explicitness on the continuum which depicts the clause-combining system in the CSC data, while the opposite end of the cline is defined as the absence of explicit links:

$although/cj{-c-pre}>av_ALTHOUGHT

$duty/n_DEUTIE

$&/cj_AND

$obligation/n{rc}_OBLIGATION

{\}

$do/vsjpt{neg}{conc}>vi_DID

$/neg<v_NOT

$engage/vi<v_INGADGE

$/P11O_ME

$yet{conc}/av<cj_YETT

$gratitude/n{rc}_GRATITUDE

$oblige{cause}/vps13<n+_OBLIDG+ES $/vps13<n+_+ES

$/P11O_ME

$to/im+C_TO

$lay/vi-av>pr_LAY

{\}

$hold/n{rc}_HOLD

$on/pr<vi-av_ON

$every/pn-aj_EUERIE

$occasion/n_OCASION

The absence of yet in this example would be marked with the comment {zero post}.

A limited set of linguistic features have been selected for an experiment in which, in addition to the comment indicating zero-realisation, a separate lexel + grammel introduced by zero (e.g. 0RO for the zero-realisation of a relative pronoun as object) is provided. In participial relative clauses, for instance, the absence of a subject relative pronoun is indicated as follows:

$/T_THE

{\}

$lord/npl>pr_LORD+IS $/pln>pr_+IS

$of/pr<npl_OF

$/P11G+C_MY

$sovereign/aj_SOUERANE

$lord/nG{ho}_LORD+IS $/Gn{ho}_+IS

$&/cj_AND

$master/nG{ho}_MAISTER+IS $/Gn{ho}_+IS

$council/n{coll}_COUNSALE

{zero rel}

$/0RN{+h2}_0

{zero aux}

$choose/venpp{pass}{rel}_CHOS+IN $/venpp{pass}{rel}_+IN

$in/pr_IN

$parliament/n-av_PARLIAMENT

Another device for indicating zero-realisation is the use of a zero only in the slot provided for variants attested in the text, i.e. the slot following the sign _. For example, in accordance with the so-called Northern Subject-Verb Rule, uninflected verb forms in certain contexts have a grammel of the following kind:

$/P11N_j

$know/vps11<P+_KNOW $/vps11<P+_0

In this example, the fact that the verb know is uninflected in this particular context (immediately preceded by a pronoun subject) is signalled by putting a zero in the slot in which the inflectional morpheme is usually positioned. This instance could be compared with the following:

$/&_AND

{zero S}

$pray/vps11<P-_PRAY+S $/vps11<P-_+S

$/P02O_YOU

The verb pray appears in a suffixed form, and has no immediately preceding subject (for more detailed information, see Meurman-Solin 1992).

As pointed out at the beginning of this section, zero-realisations are included in variationist paradigms. For example, the morpheme -ly in open-class adverbs is tagged as a suffix (/xs-av), and its absence is interpreted as a zero-realisation.

Exceeding

$exceed/vpsp-aj-av_EXCEED+ING $/vpsp-aj-av_+ING $-ly/xs-vpsp-aj-av_0

Exceedingly

$exceed/vpsp-aj-av_EXCEED+ING+LY $/vpsp-aj-av_+ING+ $-ly/xs-vpsp-aj-av_+LY

The tag for the verb form is positioned before those for word-class and function in tags of participial adjectives as adverbs.
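
The contrast between an overt and a zero-realised suffix can be retrieved from the slot following the sign _. The following is a minimal sketch, assuming that the tags belonging to one word stand on one line separated by spaces, as in the examples above, and that a suffix tag is recognisable by xs as its first core property; the function name is hypothetical.

```python
def suffix_realisations(line):
    """For every suffix tag (first core property xs) on a line, report whether
    the attested form is a zero or an overt variant."""
    results = []
    for token in line.split():
        if not token.startswith("$"):
            continue
        lexel, rest = token[1:].split("/", 1)
        grammel, _, form = rest.partition("_")
        if grammel.split("-")[0] == "xs":
            results.append((lexel, "zero" if form == "0" else "overt"))
    return results

exceeding = "$exceed/vpsp-aj-av_EXCEED+ING $/vpsp-aj-av_+ING $-ly/xs-vpsp-aj-av_0"
exceedingly = "$exceed/vpsp-aj-av_EXCEED+ING+LY $/vpsp-aj-av_+ING+ $-ly/xs-vpsp-aj-av_+LY"

zero_variant = suffix_realisations(exceeding)
overt_variant = suffix_realisations(exceedingly)
```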

3.2.3.5 Variation on a cline from maximally-elaborated to maximally-compressed ways of processing information

The volume Connectives in the history of English (Lenker and Meurman-Solin 2007) includes a number of studies which illustrate how comprehensive corpus-based inventories permit us to identify continua in diachronic developments that would remain undetected without concepts such as fuzziness, polyfunctionality and reanalysis in our toolkit. A particularly useful framework for an inventory of systems of clause linkage emerged from the typological investigation of the most important aspects of complex sentence formation in the languages of the world by Lehmann (1988). Lehmann defines six parameters, ranging from two maximally-elaborated paratactic clauses with finite verbs and no syntactic embedding at one end, to a single clause containing an embedded non-finite predicate with no complementizer or other element signalling embedding at the other. The possibilities thus range from a pole of 'maximal elaboration' to a pole of 'maximal compression (or condensation)'. The six parameters are as follows: i) hierarchical downgrading of the subordinate clause (from weak parataxis to strong embedding); ii) main clause syntactic level of the subordinate clause (from high sentence to low word); iii) desententialization of the subordinate clause (from weak clause to strong noun); iv) grammaticalization of the main verb (from weak lexical verb to strong grammatical affix); v) the interlacing of two clauses (from weak clauses adjunct to strong clauses overlapping); and vi) the explicitness of the linking (from maximal syndesis to maximal asyndesis).

Focus areas in the CSC tagging system are nominalization, what Lehmann calls desententialization, and the grammaticalization of the superordinate predicate with causative constructions. Nominalizations can be retrieved using the comment {rc} 'reduced clause', which is attached to the lexical class of an item:

$but{without}/pr-cj_BUT

$any/pn-aj_ONY

$delay/npl{rc}-av_DELAY+IS $/pln{rc}-av_+IS

$or/cj_OR

$off/av>vnpl-k-av_OF+

$put/vnpl{rc}-k-av<av_PUT+TING+is $/vnpl{rc}-k-av<av_+TING+is $/plvn{rc}-k-av<av_+is

A preposition with a nominalization as its complement is given the grammel pr-cj to stress the fuzziness between prepositions and conjunctions in this context.
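
Retrieval by the comment {rc} might look as follows. This is a sketch under the same assumptions as the format of the example above (one word per line, its component tags separated by spaces); only tags with a non-empty lexel are reported, so that the morpheme-level tags of the same word are not counted twice. The function name is hypothetical.

```python
def nominalizations(lines):
    """Return (lexel, attested form) for every word tag whose grammel
    carries the comment {rc} 'reduced clause'."""
    found = []
    for line in lines:
        for token in line.split():
            if not token.startswith("$"):
                continue
            lexel, rest = token[1:].split("/", 1)
            grammel, _, form = rest.partition("_")
            if lexel and "{rc}" in grammel:
                found.append((lexel, form))
    return found

lines = [
    "$but{without}/pr-cj_BUT",
    "$any/pn-aj_ONY",
    "$delay/npl{rc}-av_DELAY+IS $/pln{rc}-av_+IS",
    "$or/cj_OR",
    "$off/av>vnpl-k-av_OF+",
    "$put/vnpl{rc}-k-av<av_PUT+TING+is $/vnpl{rc}-k-av<av_+TING+is $/plvn{rc}-k-av<av_+is",
]

reduced_clauses = nominalizations(lines)
```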

Lehmann discusses nominalizations as examples of desententialization. He points out that 'the more a verb gets nominalized, the more it starts behaving like an ordinary noun. It is in this sense that we may speak of the increasing nominality (or 'nouniness') of subordinate clauses, when they are reduced by desententialization' (Lehmann 1988: 197; see Figure 3, p. 200):

At my coming from Ireland to Edinburgh

 

$at/pr-cj_AT

$/P11G+C_MY

$come/vn{rc}-av>pr>pr_COM+ING $/vn{rc}-av>pr>pr_+ING

$from/pr<vn-av_FROM

;_* IRELAND

$to/pr+V<vn-av_TO

;_* EDINBURGH

Your happy delivery of a young Charles

 

$/P02G_YOUR

$happy/aj_HAPPY

$delivery/n{rc}>pr_DELIVERY

$of/pr<n_OF

$/A+C_A

$young/aj_YOUNG

'_*CHARLES

As regards causative constructions, Lehmann (1988: 201-2) uses the Italian example 'Ho fatto prendere a mio figlio un'altra professione' to illustrate how a verb 'combines directly with a subordinate verb to yield an analytic causative verb'. In the CSC, the following tagging permits the creation of an inventory of varying causative structures, which reflect different degrees of grammaticalization:

Have + object + bare infinitive

 

[which I would]

$have{cause}{lat}/vi_HAVE

$/P02O_YOU

{zero im}

$compare/vi{-im}-av_COMPARE

[diligently with him]

 

Have + object + present participle

 

[your grace would have]

$have{cause}{lat}/vpp{psp}_HAD

$/P11G+C_MY

$wife/n_WIFE

$come/vpsp-av_COM+ING $/vpsp-av_+ING

[to your grace]

 

See + object + past participle

 

[I ask you]

$to/im+C_TO

$see{cause}{lat}/vi-av_SIE

$/P13OI_IT

$sell/vpp{pass}-av_SOLD

 

Cause + nominal that-clause

Cause + object + bare infinitive

 

$cause{cause}{lat}/v-imp_CAUSE

{zero that&Oinf}

'_ANN

$receave/vsjps13<n{nom}_RECEAUE

$/P13OI_Jt

This last example illustrates a case of ambiguity which can only be resolved by providing the two alternative readings in a comment. Since the verb cause mostly occurs with a to-infinitive complement, the tagger has opted for the first reading in providing the grammel for receave, but of course this decision is not based on detailed research. In investigating infinitives and present subjunctives, the user should therefore also analyse in detail the occurrences that follow the comment {zero that&Oinf}.