The Salamanca Corpus. Digital Archive of English Dialect Texts. (SC)

(Entry based on project webpage.)

Project leaders: María F. García-Bermejo Giner
Time of compilation: 2011-
Size: 6,115,267
Language: EModE, LModE
Period: 1500-1950
Project home page:

The Salamanca Corpus has been made possible by the generous financial support of the Spanish Ministry of Education and Science. Two research grants have so far funded our investigation:

1. Title: “Variación lingüística en el Inglés Moderno Temprano: Dialectos y sociolectos marginados en el proceso de estandardización” (PB98-0258). Period: 30/12/1999-30/12/2002. Main researcher: Dr. Gudelia Rodríguez Sánchez.
2. Title: “Idiolectos y sociolectos ingleses marginados en el proceso de estandardización desde fines del siglo XVI hasta mediados del siglo XX” (BFF 2003-09376). Period: 10/12/2003-09/12/2006. Main researcher: Dr. María F. García-Bermejo Giner.
We are also grateful to the University of Salamanca both for granting us space in their server for this web-page and for permanently hosting this electronic Corpus at the University Digital Archive: GREDOS.

The Salamanca Corpus (SC) attempts to give a sizeable sample of regional linguistic features as used in pre-1974 English counties throughout history. It is not intended to replace other kinds of evidence; yet literary data may prove particularly useful for time spans for which regional documents are scant. The corpus has been compiled according to the following criteria.

    Chronologically, the corpus covers a time span that extends from the 1500s up to the twentieth century. As can be seen in the present database, texts have been arranged into distinct chronological sections devised to enable longitudinal diachronic research. In particular, three broad time periods have been established: 1500-1700, 1700-1800, 1800-1950. This allows for the comparative study of specific features across time, such as dialect spelling practices in the rendition of a particular variety, syntactic markers, morphological patterns, etc. How the diachronic facilities of the corpus are applied will depend on the specific demands of a particular search or scholar.
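
By way of illustration, the three chronological sections could be used to group texts for comparison across time. The following is only a minimal sketch and not part of the corpus or its website: the function names and the sample titles are hypothetical, and the assignment of a boundary year such as 1700 to the later period is an assumption, since the description does not say how boundary years are handled.

```python
from collections import defaultdict

# The three broad periods named in the corpus description.
PERIODS = [(1500, 1700), (1700, 1800), (1800, 1950)]


def period_for(year):
    """Return the (start, end) period containing `year`, or None if out of range.

    Assumption: a boundary year (1700, 1800) is placed in the later period;
    the final year, 1950, is included in the last period.
    """
    for index, (start, end) in enumerate(PERIODS):
        is_last = index == len(PERIODS) - 1
        if start <= year < end or (is_last and year == end):
            return (start, end)
    return None


def group_by_period(texts):
    """Group (title, year) pairs by chronological period."""
    groups = defaultdict(list)
    for title, year in texts:
        groups[period_for(year)].append(title)
    return dict(groups)


# Hypothetical placeholder entries, not actual corpus texts.
sample = [("text A", 1650), ("text B", 1850), ("text C", 1855)]
grouped = group_by_period(sample)
```

Grouping in this way keeps each period's texts together, so that a given feature (a dialect spelling, for instance) can be counted per period and the counts compared.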

   Typologically, the corpus is restricted to literary texts. It is worth emphasising that, as shown in the present database, different genres have been considered irrespective of their literary merit: drama, prose and verse. Furthermore, the documents selected have been classified according to the type of dialect representation, namely literary dialect and dialect literature. It is virtually impossible to offer a balanced number of texts representative of, for example, dialect literature, since the amount of regionally anchored material dating from the early modern period is markedly scarce compared with the nineteenth century. This also holds true for instances of literary dialect in prose specimens.

    Diatopically, texts representative of pre-1974 English dialects have been selected. The long-standing literary pedigree of counties such as Yorkshire and Lancashire has made it possible to find many documents representative of these varieties. Others, such as Essex or Buckinghamshire, suffer from a relative lack of vernacular literature, which makes it more difficult to retrieve historical data from these areas. Unbalanced as it might seem, the selection of texts has not been made randomly, but according to the availability of material, which is in turn dependent on the literary practices of each time period.

Reference line and copyright

Copyright © 2011- DING, The Salamanca Corpus, Universidad de Salamanca




On the website.


Drs. M. F. García-Bermejo Giner, Maria Pilar Sánchez-García and Javier Ruano-García. The DING Group (“Dialectología Inglesa y Diacronía Inglesa”: English Dialectology and the History of the English Language), a research group of the University of Salamanca.