d2e - From Data to Evidence

To diversify the discussion of the data explosion in the humanities, the Research Unit for Variation, Contacts and Change in English (VARIENG) organised an academic conference that addressed the use of new data sources, historical and modern, in English language research. We were particularly interested in papers discussing the advantages and disadvantages of the following three kinds of data:

Big data

In recent years, mega-corpora and other large text collections have become increasingly available to linguists. These databases open new opportunities for linguistic research, but they may be problematic in terms of representativeness and contextualisation, and the sheer amount of data may also pose practical problems. We welcome papers drawing on big data, including large corpora representing different genres and varieties (e.g. COCA, GloWbE), databases (e.g. EEBO, ECCO) and corpora created by web crawling (e.g. EnTenTen, UKWaC).

Rich data

Rich data contains more than just the texts, including representations of spacing, graphical elements, choice of typeface, prosody, or gestures. This is further supplemented by analytic and descriptive metadata linked to either entire texts or individual textual elements. The benefit of rich data is that it can provide new kinds of evidence about pragmatic, sociolinguistic and even syntactic aspects of linguistic events. Yet the creation and use of rich data bring great challenges. We invite papers on the representation, query, analysis, and visualisation of data consisting of more than linear text.

Uncharted data

Uncharted data comprises material which has not yet been systematically mapped, surveyed or investigated. We wish to draw attention to texts and language varieties which are marginally represented in current corpora, to data sources that exist on the internet or in manuscript form alone, and to material compiled for purposes other than linguistic research. We welcome papers discussing the innovative research prospects offered by new and previously unused or even unidentified material for the study of English in various contexts ranging from communities and networks to social groups and individuals.

The following speakers were invited to give presentations at the conference:

(Brigham Young University)
(Lancaster University)
(University of Tampere)
(Institute of Historical Research, University of London)

In addition to the invited speakers, the conference also featured invited demonstrations on data, methods and visualization by selected scholars, including (University of Glasgow), (University of Strathclyde) and (University of Zurich) et al.

The conference, held in the Main Building of the University, celebrated the 20th anniversary of the VARIENG research unit and formed a part of the programme celebrating the 375th anniversary of the University of Helsinki.

Available material

(in PDF form)

Presentation slideshows:

(by Tanja Säily)

(by Michael Pace-Sigge)

Organising committee

, , , , (webmaster), , (chair), , , , , , (conference secretary), ,