Helsinki Archive of Regional English Speech – Cambridgeshire sampler (HARES-CAM)

HARES is a collection of audio-recorded interviews that were gathered in England in the 1970s and 1980s. The fieldworkers were Finnish graduate and post-graduate students from the University of Helsinki who shared an interest in the study of dialect syntax. The informants were elderly people who had lived in the region all their lives and who had left school at an early age. HARES combines the digital audio files with orthographic transcriptions and XML-annotated metadata.

The Cambridgeshire sampler contains 20 interviews recorded in 15 villages. The interviewees are 'typical' HARES informants in that they are elderly (the youngest is in his 60s and the oldest in his late 90s), non-mobile (born and bred in the area, with little or no time spent away), minimally educated (having left school at an early age) and rural (with occupations such as horsekeeper, farmer and housewife). The sampler also reflects a 'modern' trend in dialectological surveys, in that some of the interviews have women as primary informants.

Project leader: Anna-Liisa Vasko
HARES team: Simo Ahava – project design, audio digitisation, audio post-processing, audio transcription, XML schema design, XML annotation, manual; Joseph McVeigh – audio transcription, XML annotation; Alice Beal – casual assistance (Summer 2009), audio transcription.
Size: 20 interviews totalling 18 hours, 18 minutes and 40 seconds in length and 1,050,824 kilobytes in size.
Language: English, rural.
Time periods: 1970s and 1980s
Status: Completed in 2010, now available for research use only.
Corpus data: The corpus consists of the audio files and the transcriptions, in both XML and plain-text format.
Funding: Research Unit for Variation, Contacts and Change in English (VARIENG; 2008–2010); City Centre Campus Online Services (2008–2009).

Reference line and Copyright

HARES-CAM = Helsinki Archive of Regional English Speech – Cambridgeshire sampler. 2010. Compiled by Ahava, Simo, Joseph McVeigh and Anna-Liisa Vasko at the Department of Modern Languages, University of Helsinki.

To refer to the corpus data, give the ID of the interview you are citing in parentheses after the excerpt, in the form (interviewID-hares). For example:

(1) then used to take the horses home and <pause/> clean them and feed them (cam13-hares).


Copyright: Simo Ahava, Joseph McVeigh and Anna-Liisa Vasko.


Ahava, Simo. 2010. Manual for the Cambridgeshire sampler. University of Helsinki.

File structure

  • /HARES-CAM/hares.rng – The schema for the XML files (RELAX NG)
  • /HARES-CAM/hares.xml – The corpus master file (XML)
  • /HARES-CAM/manual.pdf – Manual (PDF)
  • /HARES-CAM/quickstart.pdf – Quick Start Guide & Reference Sheet (PDF)
  • /HARES-CAM/AUDIO/cam01…cam20.mp3 – The audio files (MP3)
  • /HARES-CAM/TXT/cam01…cam20.txt – The interview files (plain text)
  • /HARES-CAM/TXT/ex-ir.tag – Tag settings for WordSmith Tools
  • /HARES-CAM/TXT/tags.tag – Tag settings for WordSmith Tools
  • /HARES-CAM/XML/cam01…cam20.xml – The interview files (XML)
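Because every interview follows the same naming pattern across the AUDIO, TXT and XML folders, a local copy can be checked for completeness with a short script. The Python below is a minimal sketch, assuming the corpus has been unpacked into a directory named HARES-CAM laid out as listed above. The helper names (interview_ids, interview_files, missing_files, cite) are hypothetical and not part of the distribution.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the corpus is unpacked into a local directory named HARES-CAM;
# change CORPUS_ROOT to wherever your copy actually lives.
CORPUS_ROOT = Path("HARES-CAM")

# Each interview has an MP3 in AUDIO/, a plain-text transcription in TXT/
# and an XML transcription in XML/, all named cam01 ... cam20.
SUBDIRS = {"audio": ("AUDIO", ".mp3"), "txt": ("TXT", ".txt"), "xml": ("XML", ".xml")}


def interview_ids(count: int = 20) -> list[str]:
    """Return the interview IDs cam01 ... cam20 (zero-padded to two digits)."""
    return [f"cam{n:02d}" for n in range(1, count + 1)]


def interview_files(interview_id: str, root: Path = CORPUS_ROOT) -> dict[str, Path]:
    """Map each representation of one interview (audio, txt, xml) to its expected path."""
    return {
        kind: root / subdir / f"{interview_id}{ext}"
        for kind, (subdir, ext) in SUBDIRS.items()
    }


def missing_files(root: Path = CORPUS_ROOT) -> list[Path]:
    """List the expected interview files that are not present under root."""
    return [
        path
        for iid in interview_ids()
        for path in interview_files(iid, root).values()
        if not path.exists()
    ]


def cite(interview_id: str) -> str:
    """Format the citation tag that follows a quoted excerpt, e.g. (cam13-hares)."""
    return f"({interview_id}-hares)"


if __name__ == "__main__":
    for path in missing_files():
        print("missing:", path)
```

The same IDs are used in citations, so cite("cam13") yields the tag shown in example (1) above.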


HARES-CAM is available for research use only. Please contact simo.ahava(AT) for access permission.

CoRD Entry submitted on October 7, 2010 by Simo Ahava, Department of Modern Languages, University of Helsinki.