A variational pragmatic approach to reformulation markers in English and Hungarian

It is well known that the variationist paradigm was originally developed for the analysis of the social stratification of phonological features, and the methodology was later extended to the study of morpho-syntactic and lexical features. Variationist studies of discourse-pragmatic features are even more recent. Moreover, as Pichler notes, studies of phonological and morpho-syntactic variation and change have been “relatively homogeneous and congruent in focus and methodology” (2010: 582), while there is remarkable heterogeneity in the study of discourse-pragmatic variation due to the “lack of a coherent set of methodological principles” (ibid.).

The present paper will provide a combination of inductive and deductive, quantitative as well as qualitative analyses of Hungarian reformulation markers (RMs) across a variety of speech situations and genres. The study will map the functional spectrum of one thousand randomly selected tokens of the RMs vagyis (raw frequency=27,722), azaz (RF=27,824) and mármint (RF=2,770) in the 181-million-word Hungarian National Corpus (MNSZ) in three registers (literary, political and private discourse) across five regional varieties of Hungarian (those spoken in Hungary, Slovakia, Subcarpathia, Transylvania and Vojvodina) and compare the results with previous research into the use of the most frequent English translation equivalents I mean, that is, and or (rather) (cf. e.g. del Saz Rubio: 2003; Cuenca: 2003).

The individual tokens have been tagged for the following features: (1) collocations and co-occurrence patterns (including discourse marker clusters), (2) speech act / function / rhetorical role of the host utterance and the preceding utterance, (3) RMs’ position in the utterance, and (4) focus of DM (narrow [NP/VP], and broad focus).

The variation coefficient (CV%) and Juilland’s D dispersion values are surprising: it is the more frequent and intuitively more informal RMs vagyis and azaz that show a relatively even distribution across genres and are less frequent in informal genres (vagyis: CV%=21.44%, Juilland's D=0.79, relative frequency in private discourse=68%; azaz: CV%=23.97%, Juilland's D=0.76, relative frequency in private discourse=101%), while the intuitively more formal RM mármint is more unevenly distributed (CV%=41.74%; Juilland's D=0.58) and is more frequent in private discourse (relative frequency in private discourse=321%).
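To make the two dispersion measures concrete, the following is a minimal sketch of how CV% and Juilland's D are commonly computed from per-subcorpus frequencies. It assumes the textbook formulation D = 1 − V/√(n − 1), with V the coefficient of variation based on the population standard deviation and n the number of equally sized corpus parts; the abstract does not state which formulation was actually used. The function names and the example frequencies are hypothetical and are not taken from the MNSZ data.

```python
import math


def coefficient_of_variation(freqs):
    """Return V = population standard deviation / mean (not yet a percentage)."""
    n = len(freqs)
    mean = sum(freqs) / n
    if mean == 0:
        raise ValueError("mean frequency is zero; CV is undefined")
    variance = sum((f - mean) ** 2 for f in freqs) / n  # population variance
    return math.sqrt(variance) / mean


def juilland_d(freqs):
    """Return Juilland's D = 1 - V / sqrt(n - 1) for n corpus parts."""
    n = len(freqs)
    if n < 2:
        raise ValueError("at least two corpus parts are needed")
    return 1 - coefficient_of_variation(freqs) / math.sqrt(n - 1)


# Hypothetical normalised frequencies of one marker in three registers
# (illustrative numbers only, not the values reported in the abstract).
example = [120.0, 95.0, 80.0]
print(f"CV% = {coefficient_of_variation(example) * 100:.2f}")
print(f"Juilland's D = {juilland_d(example):.2f}")
```

Under this formulation D equals 1 when the marker is spread perfectly evenly and 0 when all occurrences fall into a single part, so higher values indicate more even dispersion.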

The paper will argue that the dispersion values can be explained with reference to the correlations between the functional features and socio-pragmatic parameters that have been annotated, and they reflect medium-specificity, degree of planning and institutional norms, and to a lesser degree regional variation.


Social meanings of discourse markers and disfluent speech

Research on the evaluation of linguistic variants of which one is a standard and the other a non-standard feature, e.g. –ing and –in as in "singing", finds that evaluative differences relate to perceived prestige, solidarity and dynamism (e.g. Campbell-Kibler 2011). This study aims to find out whether these dimensions also surface with conversational variants to which the standard/non-standard dichotomy may not apply, e.g. unfilled pauses and, to a lesser degree, the discourse marker “you know”. Similar findings could be seen as evidence for more general evaluative cognitive mechanisms, rather than a specific sociolinguistic one, such as the sociolinguistic monitor (Labov et al. 2011).

This study explores these questions by conducting perceptual tests with several guises: speech without noticeable pauses and discourse markers, and the same speech with three 300ms unfilled pauses and / or discourse particles “like” (D’Arcy 2007) or “you know” (in a specific function, Holmes 1986) inserted. Stimuli were prepared for two female, mid-20s speakers and three topics, resulting in a total of 36 stimuli. Data were collected in England in 2016. 668 respondents rated three stimuli each on scales such as intelligence, casualness, etc. in a between-subjects design.

Respondent ratings were subjected to mixed effects linear regressions. Excerpts with unfilled pauses are heard as more prestigious (more educated and posh) but less dynamic (less confident, certain, etc.) than the neutral guise. “You know” is heard as less prestigious and less dynamic than the neutral guise. In guises where unfilled pauses precede “you know” or “like”, new social meanings emerge. This study concludes that those mechanisms responsible for the evaluation of standard and non-standard speech also seem to apply to conversational features. It further provides support for a theory of indexicality that is not meaning-additive but meaning-interactive as new meanings emerge when different features are combined.


The discourse-pragmatic marker ‘you know’ in two native and two non-native varieties of English

Recent studies suggest that the same discourse-pragmatic marker [DPM] may be used differently across varieties of English (Kallen 2005, Siemund et al. 2009), or differently by non-native (NNS) and native speakers (NS) (Diskin 2017, Müller 2005). A DPM may also vary as regards its position within the turn (Heritage 2015). ‘You know’ fulfils a variety of functions, including requesting acknowledgement (Schourup 1985) or reassurance (Holmes 1986); appealing to shared knowledge and achieving intimacy (Östman 1981, Schiffrin 1987); and introducing consequence, background information or clarification (Erman 1987). It has been found that ‘you know’ is more frequent among NS than among NNS (Fung and Carter 2007), that its use increases with proficiency and acculturation (Hellermann and Vergun 2007), and that NNS favour its ‘coherence’ functions over its intersubjective functions (House 2009).

This paper presents a quantitative analysis of the frequency, function and position of ‘you know’ in two NS (Irish and Australian English) and two NNS varieties of English (Polish and Chinese migrants in Ireland). The data originates from two corpora of sociolinguistic interviews with 53 individuals collected by the author in Dublin and Melbourne. Using fixed effects regression models, results show that ‘you know’ is employed significantly more frequently among the Polish group as compared to both the Chinese and the NS, with no effect found for proficiency. The Poles were more likely to use ‘you know’ in turn-medial position, to focus or illustrate, whereas the Irish group used ‘you know’ significantly more in an interpersonal turn-final position, often eliciting a (minimal) response from their interlocutor. No differences were found in rates of use between the Irish and the Australians, but, as alternatives to ‘you know’, the Australians were found to employ other turn-initial markers such as ‘I mean’, or ‘look’, whereas the Irish had greater uses of ‘well’ in similar instances.


Um, about that, uh, variable: uh and um in teen instant messaging

Recent variationist studies of filled pauses in English have shown that uh is declining in favour of um in apparent time (Fruehwald, 2016; Wieling et al., 2016). Wieling et al. (2016) suggest that um may be taking on a new discourse function in English, leading to an increase in its frequency. This study adds to these findings with data from a corpus of young-adult Toronto instant messaging (IM) (Tagliamonte, 2003–2006, 2007–2010, 2016; Tagliamonte & Denis, 2008). IM is a written format, meaning that using um/uh requires conscious effort (Tottie, 2017; Wieling et al., 2016). This means that in IM, um/uh are being used as discourse markers, not unconscious ‘hesitation markers’.

While each individual primarily uses one of the two variants, possibly as part of establishing a consistent personal style (as with (ing) in Dinkin, 2014 and u vs. you in Tagliamonte & Denis, 2008), most speakers use both variants, and intraspeaker choice is conditioned by position in message: initial position favours um. Um is primarily used to introduce propositions, indicating confusion, uncertainty, disagreement, and/or discomfort (1-a), while uh is primarily used mid-message or message-finally, indicating overt lexical access (1-b) or hesitation (1-c).

(1)  a.  um well i sorta already told allie i would do something with her cuz she is coming home that day
     b.  ok, i am trying to play that game. . . uh Hearts. . . right
     c.  thats. . . kinda. . . uh. . .

Contrary to recent work indicating that female speakers favour um (Fruehwald, 2016; Wieling et al., 2016), there is no direct gender effect in this data. However, a significant interaction between gender and syntactic position—women favour um message-initially more than men—may indicate a gender difference in terms of discourse function. Taken together, these results suggest that um and uh have divergent discourse functions online, potentially reflecting emergent differences in the spoken language.


Before the rise of um

One of the most dramatic discourse-pragmatic changes in twentieth century English has progressed under the radar of laypeople and (until recently) linguists: the rise of um as the predominant variant of the ‘filled pause’ variable (UHM) at the expense of uh (Tottie 2011, Fruehwald 2016, Wieling et al. 2016). Fruehwald (2016:43) documents this “textbook” change over 100+ years of apparent time: um increases incrementally between generations and the rise is led by women. In this paper, we investigate UHM, as in (1), at an early stage of change to determine what triggered the rise of um.

(1)     Uh as a rule they harrowed it before they um drilled it.                                  (M/1899)

We utilize the variationist method to examine UHM in the Farm Work and Farm Life Since 1890 corpus of oral histories (recorded in 1984 with elderly farmers in two regions of Ontario, Canada) (Denis 2016). The 24 interviewees were born between 1890 and 1920. For apparent-time contrast, we also consider the two interviewers (community-insiders and university students). Nearly 5000 tokens were extracted and coded for speaker birth year, gender, region and utterance position. Following Tottie (2017), utterance position may correlate with a discourse-functional contrast. We consider the possibility that functional expansion may have triggered the change (cf. Wieling et al. 2016:228).

The overall frequency of um among the farmers is 11%. We find no significant effect of gender (12% for women, 10% for men). In one region, there is an effect of birth year. In contrast, the interviewers’ frequencies are much higher and gender-differentiated. Lastly, we find no effect of utterance position. Our results indicate that this data covers the first stage of this change. At this early stage, we find no functional difference between the forms, suggesting that functional expansion did not trigger the rise of um.


Variation and change among pragmatic markers as planners in American English

Uh and um, henceforth UHM, are the archetypal planning devices that speakers use when producing online speech, but the best-known and most frequently studied pragmatic markers well, you know, I mean and like (WYIL) have also been shown to be used as planners as their original meaning has been bleached. These pragmatic markers have either been studied individually over time, or presented together as a group at one particular point in time (e.g. Beeching 2016, Tagliamonte 2016), but to my knowledge, there have been no attempts at showing their development as a “functional field” over time. The aim of this paper is to present a diachronic overview of UHM and WYIL and to consider them as a possible functional field.

The background is that UHM has been shown to be used most frequently by older speakers (e.g. Tottie 2001, Laserna et al. 2014, Rousier-Vercruyssen 2017); the cause has usually been assumed to be the decline of cognitive functions in older speakers. I wish to discuss the possibility that there are also other factors at work and that speakers in different age groups prefer different markers as planners. The data presented in Table 1, based on c. 170,000 words from the Santa Barbara Corpus of Spoken American English (SBC), suggest that this may be the case. Note that there is a marked decline in the use of UHM over time: the oldest speakers use much more of these than younger speakers, whereas you know is more frequent in young and medium age groups. As expected, like is only frequent in the youngest group. The data obviously have to be used with great circumspection, as WYIL retain much of their original meanings, but I shall show that their planning function is often clear from context.

Table 1. The distribution of uh, um, you know, I mean and like in impromptu conversation in American English in the SBC subsample.

Age     uh     um     you know   I mean   like   N
15-24   16%    17%    27%        11%      29%    1383
25-34   43%    15%    28%        8%       7%     1454
35-44   44%    22%    17%        8%       9%     603
45-59   42%    23%    25%        9%       1%     1049
60+     58%    23%    16%        3%       <1%    909


Toi and tota – from a pronoun to a particle

This paper examines the planning expressions toi (tua, tuo) and tota (tuata, tuota), ‘well, ehm’, originating from a demonstrative pronoun corresponding to ‘that’, in spoken Finnish. It is not an uncommon phenomenon that a pronoun is used as a “filler” in spontaneous speech when a speaker encounters trouble formulating an utterance. Demonstrative pronouns have this function in many languages (Hayashi & Yoon 2006). The aim of this paper is to clarify the process of a pronoun turning into a particle by examining the case of toi in Finnish. The paper studies the use of toi and tota in everyday conversation, in the morphosyntactically coded Arkisyn-database.

Hayashi and Yoon (2006) describe three distinct usage types of demonstratives in the context of word-formulation trouble. Two of the types, the placeholder use and the interjective hesitator use, are relevant where toi is concerned. Placeholders are referential pronouns and form part of the syntactic structure of the utterance; that is, the forms used correspond syntactically and semantically to the word whose place the pronoun is holding. In contrast, hesitators are non-referential, have no role as a clausal constituent, and have little correspondence to the word the speaker is searching for.

This paper shows that, in Finnish, there is a continuum from a genuine demonstrative pronoun to a particle. That is, the reference turns more and more “open” and unclear until it is lost completely. (For “openness” of tota, see Etelämäki & Jaakola 2009.) At the referential end of the continuum there are placeholder pronouns, often in the nominative case (toi) and used in word searches and for other purposes. At the non-referential end there are genuine particles, usually in the partitive case (tota). In between there are occurrences in either the partitive or the nominative case, with some, but often vague, reference.


The Thai Pragmatic Particle di: Corpus Analysis and the Use of di Compared to si by Native Thai Speakers

Final particles are an areal feature of languages spoken in Southeast Asia and some countries in East Asia (Goddard 2005). Thai pragmatic particles, or final particles, are generally used in spoken language to express the feelings or attitudes of speakers, intimacy between interlocutors, politeness, and social status. Moreover, they are important in making conversations sound smooth and natural. One of the best-known particles in Thai is si, which is normally found in commands or confirmative utterances. According to Maklai (2015), the communicative functions of si are to increase the authority of an utterance, to make an utterance firm, to show the speaker's lack of interest, and to mark the topic of an utterance with a contradictory tone. Recently, the use of the pragmatic particle di instead of si has increased among many native Thai speakers, especially the young. This study aims to investigate whether the pragmatic particle di is a variant of si, and to explore the use of di compared to si by native Thai speakers. The study consists of two parts. The first part involves an analysis of the communicative functions of di using data from the Thai National Corpus (TNC), compared to the communicative functions of si from Maklai (2015). The results show that, in general, the pragmatic particle di has communicative functions similar to those of si, and it can be concluded that di is one of the variants of si. The second part examines the use of di in an online survey of two age groups of native Thai speakers: 40 participants aged 20-30 and 40 participants aged 50-60. The findings show that the 20-30-year-old participants generally use the pragmatic particle di significantly more than si, whereas the 50-60-year-old participants rarely use di. It can be seen that one important factor, among many, in the use of the variant of si is the age of speakers.


Expressive pseudo-masculine particles in the history of American English: A corpus-based account

Expressive particles are a type of pragmatic particle used as interjections to convey emotional stance or express an attitude toward the interlocutor while adding no substantial new information to the propositional content (see Aijmer 1996, McGready 2009). One particular subtype of expressive particles is pseudo-masculine particles: nouns such as man or buddy that express a sense of companionship or common ground, but may also indicate condescension or a sense of excitement or exasperation. Although these particles have been discussed in the literature before (Hill 1994, Kiesling 2004, Siegel 2005, McGready 2006 and 2009, Rendle-Short 2010, Nousiainen 2014), no previous study has traced their use over a long timeline applying detailed grammatical and pragmatic analysis to evidence from large corpora.

In this paper, I will discuss the diachronic development and distribution of the particles boy, buddy, dude, man and mate in American English. Using the 400-million-word Corpus of Historical American English (COHA) as primary data and operationalising the particles as single tokens separated by punctuation, the c. 12,500 observations were analysed using the following variables: sentence type, placement within the sentence, tense, polarity, appositional function, direct interlocutor address or absence thereof, the presence of modal verbs and their type, the semantic class of the lexical verb, and the pragmatic function of the particle. This matrix is analysed using multifactorial non-parametric regression, in the present study recursive partitioning (see Strobl et al 2009), to identify significant and substantial trends over time. As will be shown, the use of the particles remained essentially stable at a low frequency until the 1940s, after which both their use and lexical diversity increased rapidly, hitting a peak in the 1970s and declining from there.


The role of (historical) pragmatics in the uses of response particles. The case of French

Many languages use particles as minimal affirmative vs negative responses to a preceding utterance by a different speaker. Typologically, response particles function according to two basic systems, a polarity-based one and a (dis)agreement-based one.

The French system is often thought of as polarity-based, oui (‘yes’) and non (‘no’) marking the positive vs negative polarity of the response. However, it is in fact a hybrid system, integrating elements of (dis)agreement. Saliently, French has a second affirmative particle si, which marks reversal of the negative polarity of, and thus disagreement with, the utterance it responds to, cf. (1):

(1)   A : Jean ne viendra pas.
       B : Si(, il viendra)./Non(, il ne viendra pas).

       ‘A: Jean won’t come.
        B: Yes(, he will)./No(, he won’t).’

Moreover, oui is often preferred to si or non when responding with agreement to syntactically negative utterances that are positively oriented at the pragmatic level, as seen in (2):

(2)   A: N’êtes-vous pas la fille de X ?
       B : Oui/Si.

       ‘A: Aren’t you X’s daughter?
       B: Yes.’

I argue that a better understanding of the current system can be obtained by taking historical pragmatics into account. The French response particles result from lexicalization of two different constructions in Medieval French, oui < oïl < o il < Latin HOC ILLE (FECIT) (‘this he/it (did)’) vs si (< Latin SIC ‘thus’)/non + V. Medieval French had a second negative marker, viz. nenni, whose source construction nenil < nen il (‘not he/it’) is analogous to that of oui. I show quantitatively that the two pairs of response markers (oïl/nenil vs non/si) originally occurred in distinct types of contexts and had different types of pragmatic import. This remains true of oui/si, whereas in the case of the negative markers, non gradually encroached upon the territory of nenni, eventually ousting the latter.

On the diachrony of giusto? (right?) in Italian: A new discoursivization?

In Italian, the adjective giusto ‘right’ has performed the discourse function of response marker since at least 1613 (DELI, 2008: 671). This study shows that in the last forty years, the adjective has undergone a new process of discoursivization, defined as the diachronic process that ends in discourse (Ocampo, 2006: 317). In particular, it investigates giusto as serving the function of invariant tag (Andersen 2001), a linguistic item appended to a statement for the purpose of seeking information, verification or corroboration of a claim (Millar & Brown 1979). Through lexicographic, quantitative and qualitative analyses carried out over a range of Italian historical (1729 - 1950) and contemporary dictionaries and written and spoken corpora (1200 - 2017), evidence of records of first appearance, frequency of occurrence, diachronic trends, and contexts of use of giusto? is retrieved. The results reveal that, although the use of giusto? as an invariant tag has not yet been documented by contemporary Italian lexicography, records of such a use have in fact been found since 1980. Moreover, its high frequency of occurrence in the corpora suggests that giusto? represents a case of discoursivization. Finally, by analysing the distribution of the studied constructions in a corpus of Italian dubbed from (American) English, the study also explores the possibility that language contact with English, mainly via dubbing translations, may have played a concurrent fundamental role in such a change.


Variability of German Question Tags

In this work we analyze German question tags across different media channels. The large inventory of German question tags leads to great dialectal and pragmatic variability (see example (1)). While tag questions (TQs) are characteristic of conversational speech, they also appear in scripted conversations (the OpenSubtitles corpus; Lison and Tiedemann, 2016) and written conversations (Twitter; Scheffler, 2014). We address two interrelated questions: (i) Which aspect of conceptual orality (Koch and Oesterreicher, 1985) in media channels facilitates the use of TQs? (ii) What do technological and social settings of the channels say about the pragmatics of the individual question tags?

  1. Du   musst  nicht  in  die  Schule,  {ne, oder, wa, ja, …}?
     You  must   not    in  the  school,  right?
     ‘You don't have to go to school, do you?’

We examine the occurrence of TQs in German Twitter, telephone (Karins et al., 1997) and scripted conversations through quantitative methods. We find that TQs are most frequent in telephone speech, although they also feature prominently in the other corpora (Figure 1). This indicates that TQs are an important method for establishing and maintaining common ground in conversations, whether spoken, scripted, or written. We further analyze the pragmatic context of a sample of question tags in the three corpora, including e.g. their co-occurrence with modal particles and the speaker's certainty (annotated based on examples in context). The usage pattern reveals significant differences regarding specific question tags across corpora. Overall, TQs frequently occur in all studied corpora, which points to the fact that they are licensed by interactive conversations rather than the spoken mode. 


Figure 1: Number of occurrences of German tags in the different types of media per 1 million tokens.

Inter-speaker accommodation on backchannels in narratives

Backchannels like “yeah”, “mhm”, “aye”, or “right” have been shown to vary based on language (Clancy, Thompson, Suzuki, & Tao, 1996; White, 1989), variety (Cathcart, Carletta, & Klein, 2003; O’Keeffe & Adolphs, 2008), and speaker gender (Bilous & Krauss, 1988; Fellegy, 1995; Kogure, 2003; Reid, 1995). Speakers have been shown to accommodate (Giles, Coupland, & Coupland, 1991) to their interlocutors’ backchannel frequency both across (Ike & Moulder, 2017; White, 1989) and within languages (Schweitzer & Lewandowski, 2012).

Interactional and pragmatic studies demonstrate that backchannels vary based on function and sequential placement (Bavelas, Coates, & Johnson, 2000; Gardner, 1998; Goodwin, 1986; Guthrie, 1997; Norrick, 2012; Schegloff, 1982) as well as position in a story (Guardiola, Bertrand, Espesser, & Rauzy, 2012). However, these linguistic factors are not generally included in analyses of backchannel accommodation.

The present study addresses this gap by analysing inter-speaker accommodation on backchannel production in a set of six dyadic conversations between four Scottish participants (each participant talks in turn to each of the other three). I focus on one interactional context, stories (Labov & Waletzky, 1966), and examine backchannel type and (normalised) frequency with respect to sequential placement, interactional function, and position in the narrative.

Initial results show firstly that there is variability in how often individual speakers produce backchannels in response to stories, and secondly point at an interlocutor effect, with speakers deviating from their mean backchannel frequency and moving towards their interlocutor’s backchannel frequency.

The difference between speakers’ aggregate backchannel behaviour might be indicative of backchannel use being socially stratified along axes of age, gender, and social class, which can be explored further in the full dataset. This preliminary analysis will then be combined with a qualitative analysis of the narratives and the responses to them, for example by comparing the same story being narrated to and received by two different interlocutors.


‘And it was all like weird’ – Some new uses of intensifiers in contemporary British speech

The variability and on-going innovation of intensifiers make them difficult to study. Recently we have acquired better opportunities to study on-going changes in the area of intensifiers on the basis of corpora and corpus-linguistic methods. The aim of this paper is to study the emergence of the intensifiers well, right, all (like) with adjectival heads in contemporary British speech on the basis of the new Spoken BNC2014 (Love et al. 2017) using as an earlier sampling point the ‘old’ Spoken BNC from the 1990s. The ‘new’ intensifiers well, right, all had their heyday in earlier English (cf Ito and Tagliamonte 2003, Rickford et al 2008), then declined in importance, only to re-emerge as innovations in the spoken language of adolescents, as is evident from new spoken corpora:

          (1) ah it 's right cute this
          (2) they 're well old
          (3) I don't know but it looks all funny
          (4) it just goes all like crappy

The research questions are as follows:

- How frequent are the ‘new’ intensifiers in the Spoken BNC2014 and what can we conclude about the quantitative changes they have undergone recently by making a comparison with their frequencies in the ‘old’ Spoken BNC from the 1990s?

- What is the relation between the intensifier and the linguistic (syntactic and semantic) context? How are the intensifiers used with different (trendy or common-place) adjectives and with positive and negative values (semantic prosody)?

- What is the relation between the frequency and use of the intensifiers and the speaker’s age, gender and social class? Does age-grading come into the picture?

Theoretically the study takes its inspiration and starting-point from the interest in the mechanisms through which intensifiers re-emerge in the spoken language of particular speakers and how these innovations interact with long-term, stable developments of the intensifiers (Macaulay 2006, Barnfield and Buchstaller 2010, D’Arcy 2015).


Cool system, lovely patterns, awesome results: A cross-variety comparison of adjectives of positive evaluation

Speakers in different places and different generations are known to vary markedly in their use of discourse-pragmatic phenomena, including general extenders (Cheshire, 2007; Tagliamonte & Denis, 2010), quotatives (e.g. Buchstaller, 2014), and adverbs (Aijmer, 2002; 2008). However, little research has been conducted on adjective variation (but see Tagliamonte & Brooke, 2014), despite the fact that adjectives are key resources for emotional expression (e.g. Matesic & Memisevic, 2016). Indeed, adjectives that encode positive affect add notable emotion and pragmatic nuance to vernacular discourse:

              It was fantastic. The singers were amazing and gave a great history of music.

This paper examines nearly 5000 of these adjectives in spoken community-based corpora of British and Canadian English collected in the late 1990s and early 2000s and employs quantitative comparative methods (Poplack & Tagliamonte, 2001) and statistical modeling to analyze the data.

The distribution of forms across generations mirrors the diachronic development contained in the OED: older forms, such as wonderful, amazing, and terrific are favoured by elderly speakers, while newer variants, such as super, fantastic, and brilliant are favoured by younger speakers. British English stands out for its use of lovely (32.3%) and its recycling of an older form, great, from oldest to youngest speakers (17.2% > 19.8% > 46.4%). Canadian English is distinguished by high rates of cool (24.4%) and one of the most recent forms, awesome (5.7%), especially among speakers under 30. Mixed effects regressions expose important linguistic parallels between varieties: older variants are favoured in attributive position, while newer variants favour predicative and ‘stand alone’ uses. Co-occurrence with intensifiers is positively correlated with speaker age. Taken together, the results lead us to suggest that adjective variation is internally structured, but the forms themselves are highly sensitive to place, time and pragmatic force, opening up new possibilities for pinpointing what actuates linguistic change.


Discourse Values as indicators of pragmaticalization in Spoken British English – a diachronic view

This paper presents a diachronic analysis of discourse values for hedges sort of and kind of. Our aim is to evaluate discourse values, defined as the “discourse function in relation to grammatical function expressed (in per cent)” (see Stenström, 1990: 161; Aijmer, 2002: 27) and their potential contribution to the quantitative assessments of patterns of language change.
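As an illustration of the measure, the following is a minimal sketch of one common reading of the discourse value: the share of tokens of an item that are used in a discourse function, out of all tokens of that item (discourse plus propositional uses), expressed as a percentage. This reading, the function name and the counts are assumptions made for illustration; they are not figures from the BNC subsets analysed here.

```python
def discourse_value(discourse_tokens, propositional_tokens):
    """Return the percentage of tokens used in a discourse function.

    Assumes the D-value is discourse uses divided by all uses of the form.
    """
    total = discourse_tokens + propositional_tokens
    if total == 0:
        raise ValueError("no tokens to compute a discourse value from")
    return 100.0 * discourse_tokens / total


# Hypothetical annotation counts for one marker at two sampling points.
earlier = discourse_value(discourse_tokens=60, propositional_tokens=40)
later = discourse_value(discourse_tokens=75, propositional_tokens=25)
print(f"D-value earlier: {earlier:.1f}%, later: {later:.1f}%")
```

On this reading, a rising D-value between two sampling points would be compatible with ongoing pragmaticalization, although the percentage alone says nothing about which discourse functions are gaining ground.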

Beeching, who conducted a detailed analysis of a number of pragmatic markers (including you know, I mean, like, sort of, well, and just) in spoken British English contexts, highlights the usefulness of the discourse value (or D-value) for investigations into pragmaticalization, functional pragmatic marker development, and indexicalization (2016: 78-80).

This study expands on Beeching’s work and provides a comparative study of the discourse values of hedges sort of and kind of as found in subsets of the BNC 1994, as well as the newly compiled BNC 2014. Sort of and kind of are versatile pragmatic markers that can occur in pre- and postmodification on all syntactic levels (examples 1 to 4). Their propositional function can be defined as typification of noun phrases (see example 5), following analyses of Fetzer (2010) and Brems (2010).

          (1) They're sort of nasty to them.
          (2) You know that, sort of?
          (3) It just sort of reached your earhole.
          (4) They're nice sort of wrapped up.
          (5) Asked her what sort of coins she uses.

The analysis thus focuses on the changing relative frequencies of use between pragmatic contexts and propositional contexts and whether this change is in accordance with current patterns of pragmaticalization of sort of and kind of. It concludes with general commentary on discourse values as quantitative evidence for language change.
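
As a rough illustration of the measure discussed above, the sketch below computes a discourse value as the share of pragmatic uses among all uses of a form, in per cent, and compares two time slices. This reading of the definition is an assumption, and the function name, period labels and token counts are hypothetical; none of the figures come from the BNC 1994 or the BNC 2014.

```python
def discourse_value(pragmatic_tokens: int, propositional_tokens: int) -> float:
    """Share of pragmatic (discourse-function) uses among all uses, in per cent.

    Assumption: the D-value is read here as pragmatic / (pragmatic + propositional).
    """
    total = pragmatic_tokens + propositional_tokens
    if total == 0:
        raise ValueError("no tokens to compute a D-value from")
    return 100.0 * pragmatic_tokens / total


# Hypothetical counts, invented for illustration only.
counts = {
    "period A": {"pragmatic": 60, "propositional": 40},
    "period B": {"pragmatic": 75, "propositional": 25},
}

d_values = {
    period: discourse_value(c["pragmatic"], c["propositional"])
    for period, c in counts.items()
}

# A positive difference would be compatible with ongoing pragmaticalization.
change = d_values["period B"] - d_values["period A"]
print(d_values, change)
```

On this reading, a rising D-value between two corpora sampled at different times would indicate that the pragmatic use is gaining ground relative to the propositional use.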

“It's just a little weird, is all”: The development and use of sentence-final is all

Sentence-final is all has received little attention in the literature. Its use is a relatively recent development since the late nineteenth century (see example (1)), mostly restricted to colloquial American English (Delin 1992; Follett 1998, s.v. all):

     (1) she lands on the floor almost at her husband’s feet, and one sharp little cry is all. (1883 COHA)

Shibasaki (2016) considers is all a quotative marker similar to BE all as defined in e.g. Rickford et al. (2007), but the citations in the OED (s.v. be v., def. P2h) suggest otherwise:

     (2) Expensive? Naaaw. Three hundred, is all. (1939)

     (3) You didn’t see the bus, is all. (1954)

Here, is all does not appear to represent reported speech so much as to refer back to the preceding text. We agree with the OED that in this use is all implies ‘that is all there is to be said’. In our data, speakers often use it to close a topic and to distance themselves from an unwanted interpretation of the preceding utterance, as in example (2).

This paper examines the historical development and pragmatic function of sentence-final is all, drawing on data from various corpora (see references). We argue that sentence-final is all derives from postponed independent or conjoined that BE all, as in examples (4) and (5). Our historical data do not support Ando’s (2005) and Fujii’s (2006) suggestion that that is all may represent a shortening of a longer construction, such as that is all I say/mean.

     (4) I would but see him, That is all. (1600 EEBO)
     (5) I am not well in health, and that is all (1623 EEBO)

We conclude that a conversational implicature arose from that is all ‘do not infer anything more’, triggering the development of reduced is all towards a discourse-pragmatic marker.


General Extenders in English and Spanish among Southern Arizona Bilinguals

This study analyzes the use of general extenders in recorded conversations in English and Spanish between nine pairs of young adult Spanish-English bilingual friends from Southern Arizona. Building on previous studies of general extenders in English (Cheshire, 2007; Pichler & Levey, 2011; Wagner et al., 2014) and Spanish (Cortés, 2006; Fernández, 2015), 325 tokens of general extenders were analyzed quantitatively according to function (referential or non-referential), length, sex, and language dominance. Linear and logistic mixed-effects models were fitted in R with random intercepts for each participant to take into account cross-individual variation.

It was expected that general extenders would be susceptible to borrowing in a language contact situation since discourse-pragmatic features often appear on the periphery of grammar and are detachable (e.g., Brody, 1995). However, in the speech of these Spanish-English bilinguals, contact with English did not appear to influence the use of general extenders in Spanish. No English forms of general extenders were found in Spanish. Moreover, general extenders in Spanish were significantly longer and were used to fulfill referential functions more often than general extenders in English, regardless of sex and language dominance. Lastly, referential general extenders were significantly longer than non-referential general extenders in both English and Spanish, mirroring the results of Wagner et al.’s (2015) study of general extenders in English.

As the first study to analyze the use of general extenders in English and Spanish in the speech of the same bilinguals, these results contribute to our knowledge of the limited permeability of discourse in the speech of bilinguals. They also underline the ability of bilinguals to both understand and reproduce the subtleties of the use of these features in the two languages they speak.

The borrowability of English swearwords in Dutch: a variationist approach

This paper presents a study on contact-induced discourse-pragmatic variation and change, quantitatively addressing the borrowability of 882 English swearwords in Dutch and qualitatively studying the way in which these pragmatic units “assume, in addition to the expression of emotional attitudes, various discourse functions” (Dewaele 2004).

Methodologically, we aim to introduce innovations to research on swearing and borrowability by relying on set-external proof to uncover which swearwords are more prone to borrowing than others (compare van Hout & Muysken 1994, Zenner et al. 2013). Instead of focusing only on the frequency of the English swearwords that are effectively attested in the Dutch lexicon, we start off from a (near-)comprehensive list of all potentially borrowed swearwords in English. This list was created by combining input from lexicographical sources (e.g. Rawson 1989) and online lists of swearwords (compare Wang et al. 2014).

Through a quantitative variationist analysis, we then verify what determines which of these English forms are attested in Dutch and which are not. Specifically, we rely on a Twitter corpus of six million Dutch tweets published in the Low Countries: as a blending area of registers and modalities, of proximity and distance, Twitter provides us with a rich empirical basis for conducting swearword research. Three types of predictors are included to explain the attested variation in borrowability: swearword-specific parameters (e.g. offensiveness ratings; see Dewaele 2016), contact-linguistic parameters (e.g. speech economy; see Chesley & Baayen 2010), and lectal parameters (the contrast between Belgian Dutch and Netherlandic Dutch tweets; see Ruette 2018). Multiple correspondence analyses and regression trees reveal a clear impact of both swearword-specific (e.g. offensiveness ratings) and more general contact-linguistic (e.g. speech economy) parameters.

No significant differences are found between the Belgian Dutch and Netherlandic Dutch data. In our interpretation of the results, we qualitatively discuss specific examples that demonstrate the linguistic creativity of both groups of language users in embedding the English bad words in otherwise Dutch tweets, and support the thesis that swearwords are highly similar to discourse particles (cp. Dewaele 2005).


Oh my god! / Herregud! What governs speakers’ choices of borrowed vs. domestic forms of discourse-pragmatic variables?

English exerts a major influence on other languages, and borrowing is a significant product of language contact. This includes the borrowing of discourse-pragmatic items such as politeness formulae, greetings, expletives, etc. (Andersen 2014; Andersen 2017; Peterson 2017; Terkourafi 2009). One intriguing issue that has received little attention in contact linguistics but which lends itself nicely to a variationist socio-pragmatic approach is the question of what motivates speakers’ choice of a borrowed form versus a domestic alternative form and how this choice is constrained by contextual factors. This paper considers English-based forms that are used as discourse-pragmatic items in Norwegian. I consider four different items, two polite expressions and two interjections, and their domestic alternatives. These are please, used in requests alongside vær så snill; sorry, used in polite excuses alongside jeg beklager/er lei (meg) for; expletives with fuck, including what the fuck vs. domestic hva faen; and oh my god, used alongside herregud and similar forms. I introduce a methodology for coding each variant in terms of its illocutionary force and speech act type, with a view to exploring the pragmatic conditioning of the use of such pragmatic Anglicisms. The aim is to assess whether the forms in question can justifiably be considered variants of a discourse-pragmatic variable. Rather than seeing the English forms as replacing their domestic equivalents, I argue that we can see signs of a pragmatic division of labour due to differences in illocutionary force of the borrowed vs. domestic variants, such that, for example, Anglicisms are used in speech situations that are potentially less offensive, while domestic forms tend to be preferred where there is a greater need for face-threat mitigation. I explore four different corpora of spoken Norwegian: UNO, NoTa-Oslo, the Big Brother corpus and the Scandinavian Dialect Corpus. Since not all of these corpus data are conversational, the analysis is augmented with data from fictional dialogue drawn from a large text archive.


Variation and change in real time in two French-Canadian communities

This paper examines the use of consequence markers ça fait que, so, donc and alors in two genetically-related varieties of Canadian French. The study is based on corpora collected in the 1970s and 2010s in Montréal, Québec, a majority francophone environment, and Welland, Ontario, a minority francophone environment.

Blondeau, Mougeon & Tremblay (to appear) examined the variable use of consequence markers in the 2010s. They found that Montreal and Welland French are currently evidencing patterns of sociolinguistic divergence. In this paper, we probe this divergence further with a comparison of the community trends over four decades and an analysis of variation across the lifespan for a cohort of 12 speakers in each community. This analysis aims to see how these speakers have positioned themselves vis-à-vis the changes underway in each community. Statistical analysis revealed that, over the four decades, in Montreal, vernacular (ça) fait (que) rose, the English borrowing so remained absent, and standard alors decreased sharply and lost out to donc. In Welland, however, so rose sharply while (ça) fait (que) concomitantly decreased, and although alors and donc have competed as standard variants, both underwent a moderate decrease. Analyses of the effects of social factors reveal complex patterns of change in both communities, driven primarily by gender and SES in Montreal and by bilingualism and SES in Welland. The analysis of change during the lifetime of the twelve speakers shows that their socio-biographic trajectory (e.g., occupational history) explains why some speakers participate in the ongoing changes while others remain stable or even retreat from them. Mougeon et al. (2016) found that in the 1970s, Welland and Montreal French shared the same sociolinguistic norms; the present study shows that this is no longer the case and that, as far as the consequence markers are concerned, the seeds of change were already present in the 1970s.


Cross-varietal differences in prospective/retrospective preference: The perception of final connectives by Irish and American English speakers

Irish English is known to have a retrospective (or final particle) use of but and so in clause-final position (Hickey 2007, Amador-Moreno 2010, Kallen 2013), as in:

(1) It’s all that it is Janie it’s muscular spasm but                             (SPICE-Ireland, P1A-053)
(2) It was you opened the curtains so                                               (SPICE-Ireland, P1A-050)

On the other hand, Mulder & Thompson (2008) observe the lack of final particle but in American English in their comparison with Australian English.

These observations were supported by a comparison of two spoken corpora of American and Irish English (the Santa Barbara Corpus of Spoken American English and the comparable categories of SPICE-Ireland, P1A-001 to P1A-100 and P1B-001 to P1B-020). In SPICE, there were nine tokens of the retrospective use of final but and eight tokens of such use of final so, but neither of those final connectives was attested in SBC. This result partly points to a marked preference for the final-tag construction in Irish English, where final tags are defined as retrospective types of pragmatic markers.

We further carried out a questionnaire survey of Irish and American English speakers to explore their perception of the two connectives in final position. Given that final connectives allow two possible interpretations (prospective/final hanging and retrospective/final particle), we investigated how final connectives would be interpreted by native speakers of the two varieties. The survey results reveal that the retrospective readings were favored by Irish English speakers, whereas the prospective ones were dominant among American English speakers. An additional interview survey was conducted to ensure a more accurate understanding of how American English speakers interpret the final connectives. The findings verify that Irish English manifests a greater degree of constructional entrenchment of final-tagged structures, which may in turn result in interpretive discrepancies between speakers of the two varieties.


Three vernacular determiners in York English: evidence for discourse-pragmatic factors in grammaticalization trajectories

The variety of English spoken in the city of York (UK) has three vernacular determiners: a zero article, a reduced, vowel-less determiner (Jones 1999), and a complex demonstrative construction of the type this here NP. They are illustrated in (1) with data from the York English Corpus (Tagliamonte 1996–1998).

(1)     (a) And when Ø river come up it used to flood up. (Gladys Walton, 87) [ZERO]

         (b) Does ʔ teacher play it on the guitar? (Mark Aspel, 24) [REDUCED]

         (c) What is that there red book do you know? (Albert Jackson, 66) [COMPLEX]

In research with Sali Tagliamonte (Rupp & Tagliamonte 2017) we have asked: Why do these vernacular determiners occur in York English, and what is their social and grammatical function? We have probed the occurrence of the vernacular determiners from the joint perspective of language variation and change, historical linguistics and discourse-pragmatics. We have conducted both qualitative and quantitative multivariate analyses (Goldvarb; Sankoff, Tagliamonte and Smith 2015) of the contemporary York English Corpus (1.2 million words; using a socially stratified subsample of 50 speakers) and several historical corpora (including The Oxford English Dictionary and The Penn-York Computer Annotated Corpus of a Large Amount of English 1473-1800).

Noting that the definite article first emerged in the north of England (McColl Millar 2000), we postulate that the vernacular determiners are best understood as representing different stages in the grammaticalization trajectory of the definite article (Greenberg 1978; Lyons’ 1999 Definiteness Cycle). We demonstrate that the three vernacular determiners have acquired new social and discourse-pragmatic uses of conveying Yorkshire identity (Tagliamonte and Roeder 2009), psychological distance (Johannessen 2006) and discourse-new, hearer-old information (Prince 1981). We conclude that rather than having disappeared, the determiners have remained productive. Following Traugott (1995) and Epstein (1995), we envisage that discourse-pragmatic factors may influence grammaticalization trajectories.


Insubordination and intralinguistic variation: a quantitative corpus analysis of insubordinate subjunctive complement clauses in varieties of Spanish

This paper presents a quantitative corpus analysis of intralinguistic variation with independent subjunctive complement constructions in Spanish, as in (1):

  1. ¡Que te ayude Antonio!
    ‘Antonio should help you!’

This example illustrates the phenomenon of insubordination, the use as an independent clause of formally subordinate clauses (Evans 2007). On the one hand, the construction includes an initial complementizer and a verb in the subjunctive mood, which is typical of subordinate complement clauses. On the other hand, it is syntactically and pragmatically independent, since no candidate main clause material occurs or can be reconstructed in the speaker’s turn or in the preceding turns.

Insubordinate constructions express similar functions to those of discourse markers: interactional, modal and discourse-organizational. In particular, insubordinate subjunctive complement (ISC) constructions can express either (third-person) orders (1), wishes (2) or quoted orders (3) (Sansiñena 2015):

   2.    ¡Que pases un buen día!
          ‘I hope you have a good day!’

   3.     A: ¡Ven!
           B: ¿Qué?
           A: ¡Que vengas!
          ‘A: Come!
          B: What?
          A: I told you to come!’

Our study addresses understudied aspects of insubordination. Firstly, it has been repeatedly shown in the literature that this phenomenon is found cross-linguistically in spoken language (Evans 2007, Dwyer 2016). Less attention has been paid to the fact that some cases of insubordination can also be found in written genres. Secondly, some studies have shown that closely related languages differ in the availability of specific semantic types, such that some languages allow only some of the meanings/functions shown in (1)-(3) (Verstraete and D’Hertefelt 2016). However, the possibility that variation occurs between regional varieties of the same language remains relatively unexplored (cf. Gras & Sansiñena 2017, Corr 2018).

The aim of this paper is to identify possible instances of variation in the distribution of meanings/functions of ISCs in different regional varieties of Spanish, and pinpoint which aspect of oral language is a determining factor to explain variation amongst genres. For this purpose, we conducted a quantitative analysis of a corpus that represents three varieties of Spanish (Peninsular, Chilean, and Argentinian). Each subcorpus consists of two oral genres (conversation and interview) and two written genres (social media and news reports), which vary in the level of formality, interaction and intersubjectivity.
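
The corpus design described above can be pictured as a grid of variety by genre, with each ISC token annotated for its function. The following is a minimal sketch under that assumption; the token list is invented for illustration, the function labels are taken from examples (1)-(3), and the helper name functions_attested is hypothetical.

```python
from collections import Counter
from itertools import product

VARIETIES = ("Peninsular", "Chilean", "Argentinian")
GENRES = ("conversation", "interview", "social media", "news reports")
FUNCTIONS = ("order", "wish", "quoted order")

# Hypothetical annotated tokens (variety, genre, function); not real corpus data.
tokens = [
    ("Peninsular", "conversation", "order"),
    ("Peninsular", "conversation", "wish"),
    ("Chilean", "social media", "wish"),
    ("Argentinian", "interview", "quoted order"),
    ("Chilean", "conversation", "order"),
]

# Frequency of each (variety, genre, function) combination.
counts = Counter(tokens)


def functions_attested(variety: str) -> set:
    """Return the ISC functions with at least one token in the given variety."""
    return {f for (v, _genre, f), n in counts.items() if v == variety and n > 0}


# Three varieties crossed with four genres gives twelve design cells.
design_cells = list(product(VARIETIES, GENRES))
print(len(design_cells), functions_attested("Peninsular"))
```

Comparing the attested function sets per variety, and the counts per genre, is one simple way to check both availability and distribution.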

Preliminary results indicate that all meanings/functions of ISCs are available across all varieties under study, suggesting that they constitute a well-entrenched element of Spanish grammar, while they are mostly found in spontaneous interactional discourse, either spoken or written.


Putting the Romance back into reported speech: Evidence from Quebec French, Acadian French, Brazilian Portuguese and Italian

Research on English identifies the quotative system as the locus of rampant variability and innovation (Buchstaller 2014). Although other languages are reportedly witnessing the emergence of new quotatives (e.g., Foolen 2008; Buchstaller & van Alphen 2012), many such claims fail to situate the candidate for change within the wider variable system in which it is emerging and/or do not adduce any real- or apparent-time evidence to demonstrate what has changed.

We address these shortcomings by conducting a comparative variationist analysis of 3,600 quotative tokens extracted from 197 speakers representing four vernacular varieties: Quebec French (QF), Acadian French (AF), Brazilian Portuguese (BP) and Italian (ITA), recorded respectively in Ottawa-Gatineau (2014), north-east New Brunswick (2013), São Paulo (2009-2013; see Mendes 2013), and different regions of Italy (2005; see Cresti & Moneglia 2005). We use these datasets, each containing an apparent-time component, to: (i) identify cross-linguistic patterns of quotative variation; (ii) probe evidence of linguistic change; and (iii) compare trajectories of change in different varieties by operationalizing measures of advanced grammaticalization (e.g., using grammatical person, content of the quote, etc.; see Ferrara & Bell 1995).

The results turn up a number of key findings. In addition to generic speech verbs, all varieties, except ITA, contain quotatives incorporating markers of similarity/manner (e.g., QF/AF être comme ‘be like,’ BP assim ‘like this’). Apparent-time change in BP (involving quotative tipo ‘type’ in speakers aged < 30) and ITA (involving a speaker nominal quotative) is relatively incipient, but it is more salient in QF and AF, where the innovative variant être comme has progressed further along the cline of grammaticalization in QF than AF.

Taken together, the results call for a more circumspect and quantitatively informed assessment of claims that the quotative systems of typologically related languages are experiencing independent parallel innovations (Buchstaller & van Alphen 2012: xii).


A multi-dimensional, multi-functional and multilingual account of discourse marker variation

Discourse markers (henceforth DMs) are the focus of a very rich field of study, investigating their many forms and functions in various languages. However, they are still rarely studied onomasiologically, especially in spoken multilingual data, as opposed to the bulk of contrastive case studies. This presentation aims to analyze the variation in use and functions of a broad bottom-up selection of DMs across three languages from different typological families, namely French (Romance), English (Germanic) and Polish (Slavic). Such an endeavor requires not only overcoming issues of definition and delimitation of the DM category, accounting for the diversity of their forms in different languages through an operational tertium comparationis (Krzeszowski 1981), but also designing an annotation model encompassing their full functional spectrum, from the perspective of spoken discourse analysis.

Our study follows a corpus-based methodology based on Crible & Degand’s (in press) multilingual annotation scheme for functions of (spoken) DMs. The functional taxonomy distinguishes between four domains (ideational, rhetorical, sequential, interpersonal) that may be combined with eleven functions (e.g. cause, contrast, topic-shift). This taxonomy with two independent levels has been applied to spoken unplanned dialogues in the three languages (approx. 30 minutes; between 5000 and 6000 words), resulting in the identification of 286 DMs in English (30 types), 442 DMs in French (35 types), and 847 DMs in Polish (48 types). The annotations were extracted for contrastive analyses of distribution and variation of DMs and their functions. The results indicate that the multilingual annotation scheme may be validly applied to the three different languages, and demonstrate that semantic equivalence of DMs attested in different languages does not necessarily lead to functional and distributional similarities between them (e.g. in the case of English you know, French tu vois and Polish wiesz). The annotation scheme is currently being tested on additional languages (Slovenian, Spanish, L2 English, Brazilian Portuguese).
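
The token and type counts reported above can be related to each other with a simple ratio. The sketch below computes the average number of tokens per DM type for each language; this ratio is offered only as an illustration and is not a measure reported by the authors. The helper name tokens_per_type is hypothetical.

```python
# Counts reported in the abstract: (DM tokens, DM types) per language.
dm_counts = {
    "English": (286, 30),
    "French": (442, 35),
    "Polish": (847, 48),
}


def tokens_per_type(tokens: int, types: int) -> float:
    """Average number of occurrences per distinct DM type."""
    return tokens / types


# Rounded to one decimal place for readability.
ratios = {
    lang: round(tokens_per_type(tok, typ), 1)
    for lang, (tok, typ) in dm_counts.items()
}
print(ratios)
```

Because the samples differ slightly in size (between 5000 and 6000 words), such ratios are only indicative; a normalized rate per 1000 words would need the exact word counts, which are not given here.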


