Transcribing Cambridgeshire dialect speech: problems and solutions

Anna-Liisa Vasko

The purpose of this discussion is to supplement the information given in Vasko (2010), which is a revised and extended version of Ojanen (1982).

This discussion focuses on principles and conventions used in transcribing Cambridgeshire dialect data. The aim is to give a short account of the problems associated with transcribing spontaneous spoken interview material and discuss how these problems have been solved.

1. Spontaneous dialect speech

The Cambridgeshire dialect speech material for Vasko (2010) was collected through interviews which were recorded on tape. In order to ensure that the informants’ speech would be as close as possible to their natural everyday speech, the informants were encouraged to choose their own topics. A spouse, relative or friend was often present during the interviews, which were recorded in the informants’ homes. There were occasions when the informant lit a pipe, turned away from the microphone or was interrupted by another speaker, or when the dog was barking, etc. This informality was intended to create an atmosphere conducive to spontaneous and free conversation. For a more detailed discussion of the principles of collecting the material, see 2.1.1 and 2.1.2.

This atmosphere of free conversation during the interviews inevitably produced material that is far more difficult to transcribe than, for instance, material collected in guided conversations or by questionnaires. Continuous spontaneous dialect speech is sometimes only minimally intelligible. Characteristics of informal spoken English are often ‘taken a step further’ and occur in ‘more advanced’ forms. In the typical ongoing flow of dialect speech, clausal and non-clausal elements are ‘woven’ together even more freely than in more formal varieties of the spoken language. Dialect speech is created in real time, ‘on the spot’: speakers proceed without a premeditated plan and often change the structure in the middle of an utterance. Consequently, like informal speech in general, dialect speech is characterized by pauses, hesitations and repetitions. To save time and energy, speakers often shorten what they want to say by leaving out words that are not essential for understanding. Compared with written Standard English (StE) structures, this often results in ellipsis. Naturally, from a dialect speaker’s point of view nothing is felt to be ‘missing’. For a more detailed discussion, see Characteristics of Cambridgeshire dialect speech.

2. Ethical questions

The transcriber also has to consider ethical questions when making public material that was produced in a private setting. The informant and any other persons present at the interview were always asked for permission to record them. In the transcriptions, the people referred to are anonymised by replacing their names with asterisks. Only exceptionally, for the sake of clarity of the example and its interpretation, is a full first name or surname given in the transcription. In such cases, the first name is not necessarily the person’s baptismal name but a pseudonym, and the surname is a common name such as Smith. [1]

After the illustrative examples, the speakers’ initials are given in brackets together with the locality of the recording, e.g. (Willingham SS). If the example is an extract of a dialogue between two informants, the informants’ initials are given before their utterances, e.g. CM: What year would that be when I went up = the co-op? – EM: When I went in hospital. In addition, if an informant’s friend or relative speaks during the interview, their contributions are also labelled, e.g. MRS H. In two-informant interviews recorded in the home locality of one of the informants, the locality of the other informant is also indicated. [2]

3. Transcriptions for the study of morphological and syntactic features

For the purposes of research, material collected by the tape-recorded-interview method needs to be presented in some written form, i.e. it must be transcribed. The type of transcription naturally depends on its purpose, i.e. the type of research for which it is intended (Tagliamonte 2006: 54). The need to select areas of particular importance is partly imposed by the enormously complex nature of dialect speech and speech events (Jaffe 2000: 500-501). Research objectives vary and, thus, so does the amount of detail required in the transcriptions. [3] For instance, in his Grammar of the Dialect of the Bolton Area, Shorrocks (1998-1999) gives examples of grammatical forms and constructions (Part II) mostly in phonetic script (i.e. the International Phonetic Alphabet, IPA). It is true that, for linguists, the best way to represent speech sounds is the unambiguous IPA (Cameron 2001: 40-41). However, a very detailed phonetic transcription will be inaccessible to non-specialists, and even specialist readers may find it hard to follow the flow of spontaneous dialect speech in such a transcription.

The Cambridgeshire dialect speech data were collected particularly for the purposes of morphological and syntactic research. Orthographic transcription as described below was considered detailed enough to guarantee an effective and versatile analysis of the morphological and syntactic features of the data.

The data for Vasko (2010) (for details of the data, see 2.1.1, 2.1.2) were transcribed in their entirety by the present writer. This transcription of continuous free dialect speech was a laborious task, requiring frequent consultation of such resources as the detailed field diary (cf. Penhallurick 1985: 223-233; Shorrocks 1980: 102-105) and local experts on the dialect. At the first stage of the transcribing process, I followed the same principle as Glauser (1984: 42) in his study of Grassington speech: “If one does not know what exactly are the important features, it seems wise to take down what one can.” I wrote down exactly what I heard without trying, at that stage, to interpret the word or construction in question. In discussing the problems connected with the transcription of dialect speech, especially from the point of view of a non-native speaker, Melchers (1996: 164) points out the necessity of consulting an expert on the dialect. She considers this idea to be confirmed by the concept of ‘outsiders’ and ‘insiders’, as developed by Labov and the Milroys. However, Melchers (1996: 162) also warns that when the transcriber becomes more familiar with the speech he/she is listening to, there is a risk that he/she may hear what he/she expects to hear. Paradoxically, then, as Melchers suggests, “a more naïve listener, an ‘outsider’, would constitute a more truthful observer of actual realizations than someone well versed in the variety investigated”.

The transcriptions by the present writer, and especially the illustrative examples in Vasko (2010), have been checked and verified by Joseph McVeigh and Simo Ahava at the Research Unit for Variation, Contacts and Change in English, University of Helsinki. (For more information on the regional British English speech project and the sample of Cambridgeshire speech, see the Regional English speech project at the University of Helsinki.)

Since no standard, universally accepted methodology for orthographic dialect transcription exists, transcribers must decide on practical approaches themselves. The goals of the transcriptions in Vasko (2010) are well formulated by Tagliamonte (2006: 54): the transcriptions must be (1) “detailed enough to retain enough information to conduct linguistic analyses in an efficient way” and (2) “simple enough to be easily readable and relatively easily transcribed.” Tagliamonte further observes that “maintaining a workable balance between these two goals is the key component of any corpus construction endeavour.” (See also Beal et al. 2007b.) Moreover, Tagliamonte (2006: 55) emphasizes that the most important aspect of creating a uniform and coherent data set for research is to present the speech material faithfully and consistently. This means transcribing exactly what the person said, regardless of whether it follows the rules of the standard language.

However, any type of transcription also has unavoidable limits. As Tiittula (2001: 8) points out, transcriptions necessarily lack some of the phonetic and prosodic information needed to represent the full complexity of spoken language. Tiittula further specifies that any similarity between speech and writing holds only for the language itself, without its paralinguistic and non-verbal features. The modes of speech and writing (acoustic-vocalic vs. graphic; auditory vs. visual) are fundamentally different, so some features of one cannot be fully represented in the other. For example, speakers may talk simultaneously, but in writing their speech has to be presented linearly.

Any transcription, no matter how detailed, is an interpretation of what is on the audio recording. Because of these limitations, the audio recordings remain the primary documentation of the material. For details of the Cambridgeshire recordings for Vasko (2010), see 2.1.2.

The major principles used in transcribing the Cambridgeshire data are in accordance with the Transcription Protocol put forward by Tagliamonte (2004, 2006, 2007). Tagliamonte summarizes the principles as follows:

(a) phonological processes are typically not represented

(b) morphological alternations that are phonologically marked are orthographically represented

(c) local dialect lexis and slang that does not appear in dictionaries is recorded consistently.

The audio recordings for Vasko (2010) have been transcribed using standard British English orthography. Since the sound files for Vasko (2010) are provided, phonological processes generally need not be indicated in the orthographic transcription. Semi-phonetic spellings, such as those used in literary dialect or ‘eye-dialect’, are avoided. [4] The orthographic conventions are as follows:

(1) All aspects of dialect grammar are preserved in the transcription. For instance, if the informant uttered I’m heard, this is used instead of the StE I’ve heard. Similarly,

      (a) the unmarked third-person singular of the simple present, e.g. he live, is distinguished from the marked, he lives, and

      (b) the unmarked plural of the noun, e.g. two mile, is distinguished from the marked, two miles, etc. (cf., e.g., Tagliamonte 2004).

(2) Where the word or construction is morphologically (or morpho-phonemically) determined, a variant spelling is used, e.g. me for StE my and one on ’em for StE one of them (cf., e.g., Beal et al. 2007b; Tagliamonte 2004, 2006).

(3) Variant spellings wa and war are used for the simple past form of the verb be to indicate that the form is neither the StE was nor the StE were but an intermediate variant. [5] The intermediate variant of the negative simple past of the verb be is spelled wont to indicate that the form is neither the StE wasn’t nor the StE weren’t. [6]

(4) Transcriptions include dialect words. If the word is found in dialect dictionaries (e.g. Wright 1896-1905) or dialect literature (e.g. Porter 1969), this form is used. If the word has different spellings in different sources, as is often the case, the spelling closest to the utterance is chosen. If the word is not found in any dialect dictionary or literature, it is transcribed as closely to the utterance as possible. [7] For a list of these words, see Ahava (forthcoming).

(5) Response tokens (non-lexemes) such as mm, ah, etc. are preserved, since they carry meaning that can influence how the conversation proceeds and, thus, the analysis of the data. [8]
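One practical benefit of convention (3) is that the intermediate past forms of be can be retrieved mechanically from the transcripts. The Python sketch below is a hypothetical illustration, not a tool used for Vasko (2010): the function name `past_be_counts` and the form table are assumptions. It assumes plain-text transcripts using the spellings described above, ignores material in square brackets (which was not spoken), and counts tokens by spelling.

```python
import re
from collections import Counter

# Hypothetical mapping from transcribed spellings to the categories in (3):
# wa/war and wont are the intermediate variants; the others are StE forms.
PAST_BE_FORMS = {
    "was": "was",
    "were": "were",
    "wa": "wa/war",
    "war": "wa/war",
    "wasn't": "wasn't",
    "weren't": "weren't",
    "wont": "wont",
}


def past_be_counts(transcript: str) -> Counter:
    """Count past-tense forms of 'be' in a transcript, by spelling variant."""
    # Drop square-bracketed material: editorial additions and comments were not spoken.
    text = re.sub(r"\[[^\]]*\]", " ", transcript)
    # Normalise curly apostrophes so that wasn’t and wasn't are treated alike.
    text = text.replace("\u2019", "'").lower()
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    # Every token 'war' is counted, including the noun, so hits need checking by hand.
    return Counter(PAST_BE_FORMS[t] for t in tokens if t in PAST_BE_FORMS)
```

The counts are only a starting point: homographs such as the noun war still have to be checked manually against the context and, where necessary, the audio recording.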

In addition, the following transcription conventions have been used:

(1) Where the omission of sounds or words is potentially confusing, the omitted sections are included within square brackets, as in I[’d] go and I [have] gone. Also included in square brackets are any words added in order to clarify the meaning, e.g. two [pails] a dinnertime.

(2) Punctuation marks (full stops, commas, question marks, etc.) are used. However, identifying sentences and clauses is often problematic in spontaneous dialect speech, so punctuation does not strictly follow the rules of the standard language. For the notion of a sentence, see Characteristics of Cambridgeshire dialect speech.

(3) An interruption in speech is indicated with the equals sign (=) when the interruption is not considered to be the end of a sentence or clause, e.g. I were = born here = in this house.

(4) Restarts and partial words are represented with single hyphens, e.g. I’m- I’m been to my father’s bottle and You didn’t ou- ought to let him [have] had it.

(5) Comments on involuntary sounds such as coughing, sneezing, sniffing, laughing and crying are given in capital letters within square brackets, e.g. [LAUGHING], since these sounds may be meaningful in the analysis.

(6) Extra-linguistic information, such as remarks concerning noise, is also indicated, e.g. [DOG BARKING].

(7) Non-verbal communication such as pointing, nodding etc. is indicated wherever it is felt to help the understanding of the content of a conversation.

(8) Conversation inherently involves overlapping speech. Even where turns overlap, the speech of the first person is transcribed first, then that of the second, etc.
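Because conventions (1) to (6) work like a light markup, a transcript that follows them can also be processed automatically, for instance to extract the comments on non-speech events or to produce a reading version containing only the words that were actually spoken, without restarts. The Python sketch below assumes plain-text lines in the format described above; the function names and regular expressions are hypothetical illustrations, not part of the Transcription Protocol. As a simplification, it treats a bracketed item of two or more capital letters as a comment and any other bracketed item as an editorial addition.

```python
import re

# Hypothetical patterns for the conventions above (illustrative assumptions).
# [LAUGHING], [DOG BARKING]: comments written in capitals (two or more letters,
# so that an editorial [I] is not mistaken for a comment).
COMMENT = re.compile(r"\[([A-Z][A-Z ]+)\]")
# [’d], [have], [pails]: any other bracketed item is an editorial addition.
EDITORIAL = re.compile(r"\[[^\]]+\]")
# I’m- / ou-: a restart or partial word ends in a single hyphen before a space.
RESTART = re.compile(r"\S+-(?=\s)")
# =: an interruption that does not end the sentence or clause.
INTERRUPTION = re.compile(r"\s*=\s*")


def comments(line: str) -> list[str]:
    """Return the capitalised comments on non-speech events in a line."""
    return COMMENT.findall(line)


def plain_text(line: str) -> str:
    """Strip the annotations, leaving only spoken words minus restarts."""
    line = COMMENT.sub("", line)        # drop comments on non-speech events
    line = EDITORIAL.sub("", line)      # drop words added by the transcriber
    line = RESTART.sub("", line)        # drop restarts and partial words
    line = INTERRUPTION.sub(" ", line)  # close up interruptions
    return re.sub(r"\s+", " ", line).strip()
```

Applied to the example in (4), plain_text("You didn’t ou- ought to let him [have] had it.") returns You didn’t ought to let him had it. The reading version deliberately discards the bracketed comments, so comments() is kept as a separate function for analyses in which such events matter.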

The principles outlined above are used in the presentation of illustrative examples and extracts of the Cambridgeshire dialect speech for Vasko (2010). For the complete transcriptions of the interviews, see Ahava, Sampler of Cambridgeshire speech (forthcoming).


[1] Tagliamonte (2006: 51-52) “utilised the speakers’ own initials but filled in new ethnically consistent names”, i.e. Tagliamonte used pseudonyms, such as “Katy Webster = Katherine Walters” and “Bobby Hamilton = Barry Hatfield”. The corpus/community was also indicated, e.g. York (YRK).

[2] In the audio recordings, the names have been ‘cut off’. For more information on the regional British English speech project and the sample of Cambridgeshire speech, see the Regional English speech project at the University of Helsinki.

[3] Researchers vary greatly in how they represent speech in the written medium, which reflects the special difficulties of transcribing speech. Transcription practices can be thought of as a continuum with two dominant modes: naturalism, in which every utterance is transcribed in as much detail as possible, and denaturalism, in which idiosyncratic elements of speech (e.g. stutters, pauses, nonverbals, involuntary vocalizations) are removed. For a more detailed description of these two types of transcription, see, e.g., Oliver et al. (2005). Between the two extremes, there are endless variations that combine elements of each to achieve particular analytical objectives and research goals.

[4] Preston (1985), in “The L’il Abner Syndrome: Written Representations of Speech”, argues that ‘eye-dialect’, defined as “forms which reflect no phonological difference” (1985: 328), should be avoided. Another objection to eye-dialect is that it tends to make non-standard speakers appear more different from standard speakers than they really are (Cameron 2001: 41). Macaulay (1991), Kirk (1997), Preston (2000) and Beal (2005) note the unreliability of non-standard spellings such as eye-dialect in representing vernacular Englishes orthographically.

[5] For intermediate variants of the simple past of the verb be, see Ahava (2010), Richards (2010), Vasko (forthcoming).

[6] For the intermediate variant wont of the negative simple past of the verb be, see, e.g., Hazen (1998), Vasko (forthcoming).

[7] For Cambridgeshire dialect vocabulary in particular, see, e.g., Porter (1969: 400-401), who includes a “Select Glossary of Cambridgeshire Words in use within living memory”. See also Trudgill (1999: 101-124), which discusses dialect vocabulary from the point of view of regional variety and origin, and Peitsara (1995: 120), which deals with East Anglian vocabulary.

[8] Response tokens (hm, ok, ah, yeah, etc.) are often neglected as either inconsequential or extraneous. However, research has shown that such tokens can provide a great deal of insight into the informational content of the conversation. For instance, Gardner (2001) offers researchers a typology of response tokens and an indication of their use and intent. Among the most common are the following three: (a) continuers, such as mm and hm, which are used to note agreement with the speaker and give them back the primary role in the conversation; (b) acknowledgements, such as mm and yeah, which express agreement or understanding between a speaker and a listener; and (c) repairs, such as huh, which ask the speaker to rephrase or repeat an idea or question.


