Each interview is provided with five parameter codes adapted from the
Helsinki Corpus of English Texts (HC). The parameters give:
- B: the HD file code (<B DICAM13>)
- N: the county, village, informant information (<N CAM_LANDBEACH_SJ>)
- Y: the age(s) of the informant(s) (<Y 86>)
- X: the sex of the informant(s) (<X FEMALE>)
- H: the occupation(s) of the informant(s) (<H HOUSEWIFE>)
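The parameter codes can be read mechanically. The following is a minimal sketch, assuming every code has the shape of an angle bracket, one capital letter, a space, a value and a closing angle bracket, as in the examples above. The function name parse_parameter_codes and the sample string are illustrative and not part of the corpus tools.

```python
import re

# Assumed shape of a parameter code: "<", one capital letter, a space,
# a value, ">". This pattern is inferred from the examples, not from a
# formal specification of the corpus format.
PARAM_RE = re.compile(r"<([A-Z]) ([^<>]+)>")


def parse_parameter_codes(header):
    """Return a dict mapping each code letter to its value (as a string)."""
    return {letter: value.strip() for letter, value in PARAM_RE.findall(header)}


# Hypothetical header string assembled from four of the example values.
sample = "<B DICAM13> <Y 86> <X FEMALE> <H HOUSEWIFE>"
codes = parse_parameter_codes(sample)
print(codes["B"], codes["Y"], codes["X"], codes["H"])
```

Values are kept as strings because the Y, X and H codes may describe more than one informant.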
The following details are given after the main parameter codes:
- the digital archive code (Dig. CAM32A)
- the length of the tape in minutes (47:37 min)
- the year or date of the recording with the fieldworker's initials (Rec 1974 by AO)
- the initials of the transcriber (Transcribed by AO)
- the total and the stripped word counts (Our WC 6,075, stripped WC 5,796)
- the number of archive pages (41 archive pages)
- the date of the final computer manuscript (CMS 9.1.2001)
[Dig. CAM32A. 47:37 min. Rec 1974 by AO. Transcribed by AO. Our WC 6,075, stripped WC 5,796.
41 archive pages. CMS 9.1.2001]
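The bracketed details line can be split into its fields in a similar way. The following is a rough sketch, assuming the field order and wording of the example above. The function parse_details and its field names are illustrative only.

```python
import re

# The example details line, joined onto one line for convenience.
DETAILS = (
    "[Dig. CAM32A. 47:37 min. Rec 1974 by AO. Transcribed by AO. "
    "Our WC 6,075, stripped WC 5,796. 41 archive pages. CMS 9.1.2001]"
)

# A number with optional thousands separators, e.g. "6,075".
NUMBER = r"(\d+(?:,\d{3})*)"


def to_int(text):
    """Convert '6,075' to 6075; pass None through unchanged."""
    return None if text is None else int(text.replace(",", ""))


def parse_details(line):
    """Extract the fields of a details line; missing fields become None."""
    def find(pattern):
        match = re.search(pattern, line)
        return match.group(1) if match else None

    return {
        "archive_code": find(r"Dig\. (\w+)"),
        "length": find(r"(\d+:\d{2}) min"),
        "recorded": find(r"Rec (.+?) by"),
        "fieldworker": find(r"Rec .+? by (\w+)"),
        "transcriber": find(r"Transcribed by (\w+)"),
        "total_wc": to_int(find(r"Our WC " + NUMBER)),
        "stripped_wc": to_int(find(r"stripped WC " + NUMBER)),
        "archive_pages": to_int(find(r"(\d+) archive pages")),
        "cms_date": find(r"CMS ([\d.]+)"),
    }


details = parse_details(DETAILS)
print(details["total_wc"] - details["stripped_wc"])
```

The printed difference is the number of words removed between the total and the stripped count for this interview.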
The corpus contains orthographic transcriptions of dialect speech. The most important aspect of creating a uniform and
coherent corpus is ensuring that the recorded speech data are presented faithfully and consistently. This means transcribing exactly
what the person said, regardless of whether it follows the so-called 'rules' of the standard. Thus all the utterances of the informants
in the dialect corpus are transcribed as carefully as possible, including repetition, hesitation, interrupted words and inarticulate
sounds, i.e. features that are not present in the written standard language. The transcriptions in WordCruncher are untagged and contain no
IPA (International Phonetic Alphabet) symbols.
Then = the ol' Turner started, a big red bus, like the buses we got now. And = course I mean they used to = that was just the jobs = to
go on that an' course what we = whatever did we used to pay to go on that for a start, very, very little, well, I mean what we have to
pay now. [LAUGHING] Yes. (Editorial conventions are discussed below.)
The principles of compilation and the editorial conventions used in the Helsinki Corpus of British English Dialects (HD)
follow those in the Helsinki Corpus of English Texts (HC), differing mainly in the emphasis on spoken material (HD) versus written material (HC).
Since no standard, unanimously accepted methodology for transcription exists, transcribers must make their own decisions on
practical approaches. One problem the transcriber faces is the trade-off between detail and readability: the
transcription should retain enough information to support efficient linguistic analysis but should also be simple enough to remain
readable. The transcriber must also decide to what extent to describe discourse features, i.e. overlapping speech, prosody, hedges,
etc. An overly detailed description of discourse features makes the transcription difficult to use in research other than discourse analysis. In addition,
dialect features in the lexicon must be distinguished from so-called "eye-dialect", i.e. words that are spelled so as to
look like dialect but whose pronunciation does not actually differ from standard speech (ev'ry, hun'red, gran'mother).
A typical feature of dialect speech is that it proceeds in long sequences of paratactic units, with
little connection between pauses and grammatical boundaries, and even without any indication of changes of
topic. Transcribers therefore need to define the concept of the sentence in the context of their research. Since spoken language rarely follows
the grammatical structuring of sentence and clause elements, some other method must be used to describe the speech unit (e.g. Anna-Liisa Vasko
uses the term 'meaning unit' in her research). Nevertheless, orthographic transcriptions of dialect speech conventionally
contain sentence-final punctuation marks (full stops, question marks and exclamation marks). These can reflect
a variety of cues (e.g. pauses in the speech, falling intonation) in addition to grammatical structure.
Thus, although the punctuation in dialect transcription is intended to help the reader, it does not
necessarily follow the rules of the standard.
Sample of transcription from the Cambridgeshire subcorpus:
[Q: DID YOU DO ANY DITCHES?]
Ditches as well. They used to dug out, they uset' clean all around the ditches out, by hand = spade and shovel.
[MG: IN THE WINTER?]
Yeah. Now they got mechanic diggers now = do all on it. 'At 's done away all that. [LAUGHING] Ah yeah. An' they
don' supply half the labour = not half the labour 's supplied now, I known twenty-five worked down here
where I been work.
Twenty-five there used to be down there. Now there 's eight = run the lot. That 's includin' stockman
an' all. They let you know the difference roun', in farmin' today. Ha ha. Yeah. (West Wickham CC)
When referring to a passage from the corpus, the village and the informant's initials are given in brackets at the end
of the passage.
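The reference at the end of a passage can be picked out automatically. The following is a small sketch, assuming the reference is the last item in the passage, that the village name is written with an initial capital, and that the initials of several informants are joined with a plus sign, as in the samples in this manual. The name parse_reference is illustrative.

```python
import re

# Assumed shape: "(Village Name INITIALS)" or "(Village INITIALS+INITIALS)"
# at the very end of the passage.
REFERENCE_RE = re.compile(
    r"\((?P<village>[A-Z][A-Za-z ]*?) (?P<initials>[A-Z]+(?:\+[A-Z]+)*)\)\s*$"
)


def parse_reference(passage):
    """Return (village, [initials, ...]) or None if no reference is found."""
    match = REFERENCE_RE.search(passage)
    if match is None:
        return None
    return match.group("village"), match.group("initials").split("+")


single = parse_reference("Ha ha. Yeah. (West Wickham CC)")
joint = parse_reference("you come last Saturday night, didn't you? (Rampton TR+EF)")
print(single)
print(joint)
```

Ordinary parenthetical remarks that do not end in capitalised initials are not treated as references.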
Editorial conventions used in HD are as follows:
The apostrophe is used to indicate sound-dropping in word-initial position ('way for away)
and in word-final position (tha' for that).
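Forms marked in this way can be collected with two simple patterns. The following is a heuristic sketch, not a tool supplied with the corpus. It assumes plain ASCII apostrophes, and it filters out detached clitics such as 'm and 'll, which also begin with an apostrophe in the samples. The clitic list and the name find_sound_dropping are assumptions made for illustration.

```python
import re

# Detached clitics that begin with an apostrophe in the sample passages.
# This list is an assumption; extend it as needed.
CLITICS = {"'s", "'m", "'ll", "'re", "'ve", "'d"}

# Apostrophe followed by letters, not preceded by a letter ('orses).
INITIAL_DROP = re.compile(r"(?<![A-Za-z])'[A-Za-z]+")
# Letters followed by an apostrophe, not followed by a letter (an').
FINAL_DROP = re.compile(r"[A-Za-z]+'(?![A-Za-z])")


def find_sound_dropping(text):
    """Return (word-initial forms, word-final forms) found in the text."""
    initial = [w for w in INITIAL_DROP.findall(text) if w.lower() not in CLITICS]
    final = FINAL_DROP.findall(text)
    return initial, final


# Sample assembled from two utterances in the Cambridgeshire subcorpus.
sample = "I 'm too old to go with 'orses. You take him an' I 'll stop at home."
initial, final = find_sound_dropping(sample)
print(initial, final)
```

Word-internal apostrophes, as in don't or ev'ry, match neither pattern and are left alone.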
Sample content from the Cambridgeshire subcorpus displaying the variety of editorial conventions used in the corpus:
[MH: SHALL YOU GET TO PETERBOROUGH SHOW?]
[TR:] No, nor shan't I. I haven't been for several years =
[EF:] %---% I 'm too old to go with 'orses.
[TR:] = I ain't been for several years.
[EF:] [SIMULTANEOUSLY] %---% horsekeeper.
[EVERYBODY TALKING AT THE SAME TIME]
[TR:] Th' last time I were went t' my cousin *'s, somebody else wanted to go = "Well," I said, "you =
[MRS H:] [SIMULTANEOUSLY] Nobody kept horses like that.
[TR:] = you take him an' I 'll stop at home." And when they were comin' home they had to pull in a lay-by, an' as the big boaters went by the water, went right over th' top of their car. I were glad I didn't go.
[MH: HOW MANY WOULD YOU HAVE AT ONCE, **?]
[EF:] I used to have twelve.
[MH: TWELVE. YEAH.]
[EF:] Five in the mornin' till six at night = sometimes seven.
[TR:] Ah, I 'm pleased t' see you, **. I I knowed you were about [EF LAUGHING]. Yeah, if ** * 's a-comin', you come last Saturday night, didn't you? (Rampton TR+EF)
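The speaker labels in a passage like the one above can be separated from the speech itself. The following is a rough sketch, assuming that an informant turn opens with a bracketed label ending in a colon, and that an interviewer turn is enclosed, label and all, in a single pair of square brackets. These assumptions are read off the sample, and the name classify_line is illustrative.

```python
import re

# "[TR:] text" or "[MRS H:] text": label closed by ":]" before the speech.
INFORMANT_RE = re.compile(r"^\[([A-Z][A-Z ]*):\]\s*(.*)$")
# "[MH: QUESTION TEXT]": label and speech inside one pair of brackets.
INTERVIEWER_RE = re.compile(r"^\[([A-Z][A-Z ]*): (.+)\]$")


def classify_line(line):
    """Return (kind, speaker, text) for one line of a transcription."""
    match = INFORMANT_RE.match(line)
    if match:
        return ("informant", match.group(1), match.group(2))
    match = INTERVIEWER_RE.match(line)
    if match:
        return ("interviewer", match.group(1), match.group(2))
    # Anything else, e.g. a bracketed comment, is passed through as-is.
    return ("other", None, line)


lines = [
    "[MH: SHALL YOU GET TO PETERBOROUGH SHOW?]",
    "[EF:] I used to have twelve.",
    "[MRS H:] [SIMULTANEOUSLY] Nobody kept horses like that.",
    "[EVERYBODY TALKING AT THE SAME TIME]",
]
results = [classify_line(line) for line in lines]
for kind, speaker, text in results:
    print(kind, speaker, text)
```

Bracketed comments inside a turn, such as [SIMULTANEOUSLY], are left in the text so that nothing from the transcription is lost.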