To diversify the discussion of data explosion in the humanities, the Research Unit for Variation, Contacts and Change in English (VARIENG) organised an academic conference that addressed the use of new data sources, historical and modern, in English language research. We were particularly interested in papers discussing the advantages and disadvantages of the following three kinds of data:
In recent years, mega-corpora and other large text collections have become increasingly available to linguists. These databases open new opportunities for linguistic research, but they may be problematic in terms of representativeness and contextualisation, and the sheer amount of data may also pose practical problems. We welcome papers drawing on big data, including large corpora representing different genres and varieties (e.g. COCA, GloWbE), databases (e.g. EEBO, ECCO) and corpora created by web crawling (e.g. EnTenTen, UKWaC).
Rich data contains more than just the texts, including representations of spacing, graphical elements, choice of typeface, prosody, or gestures. This is further supplemented by analytic and descriptive metadata linked to either entire texts or individual textual elements. The benefit of rich data is that it can provide new kinds of evidence about pragmatic, sociolinguistic and even syntactic aspects of linguistic events. Yet the creation and use of rich data bring great challenges. We invite papers on the representation, query, analysis, and visualisation of data consisting of more than linear text.
Uncharted data comprises material which has not yet been systematically mapped, surveyed or investigated. We wish to draw attention to texts and language varieties which are marginally represented in current corpora, to data sources that exist on the internet or in manuscript form alone, and to material compiled for purposes other than linguistic research. We welcome papers discussing the innovative research prospects offered by new and previously unused or even unidentified material for the study of English in various contexts ranging from communities and networks to social groups and individuals.
The following speakers were invited to give presentations at the conference:
- Professor Mark Davies (Brigham Young University)
- Professor Tony McEnery (Lancaster University)
- Professor Päivi Pahta (University of Tampere)
- Dr Jane Winters (Institute of Historical Research, University of London)
In addition to the invited speakers, the conference featured invited demonstrations on data, methods and visualisation by selected scholars, including Dr Marc Alexander (University of Glasgow), Prof. Jonathan Hope (University of Strathclyde) and Prof. Gerold Schneider (University of Zurich).
The conference, held in the Main Building of the University, celebrated the 20th anniversary of the Varieng research unit and formed a part of the programme celebrating the 375th anniversary of the University of Helsinki.
Programme and Book of Abstracts (in PDF form)
- Lieselotte Anderwald:
Empirically charting the success of prescriptivism
- Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich:
The taming of the data: Using text mining in building a corpus for diachronic analysis
- Sanna Franssila:
The politics of betrayal: The use of verbs abuse, betray and sell out in American opinion media
- Mirka Honkanen:
Multilingual resources and authentication in the computer‐mediated communication of U.S.‐Nigerians: The role of African American Vernacular English in the repertoire
- Tyler Kendall:
Making old data sources into new data sources: On the aggregation of sociolinguistic datasets and the future of real-time and cross-study analysis
- Jefrey Lijffijt and Tanja Säily:
Adjusting p-values for heterogeneity in collocation analysis
- Morana Lukač:
Charting out the discourse of linguistic prescriptivism in the Complaint Tradition Corpus
- Terttu Nevalainen, Tanja Säily and Turo Vartiainen:
Language Change Database: A new online resource
- Michael Pace-Sigge:
MONOCOLLOCATES: How fixed Multi-Word Units with OF or TO indicate diversity of use in different corpora
- Gerold Schneider, Mennatallah El-Assady and Hans Martin Lehmann:
Tools and Methods for Processing and Visualizing Large Corpora (DEMO)
- Tanja Säily and Jukka Suomela:
types2: Exploring word-frequency differences in corpora
- Turo Vartiainen:
Covert expressions of attitude in criminal records: Rape cases in the Old Bailey Corpus, 1720–1749
- Presentations of the Poster Session
Photos from the d2e Conference (by Tanja Säily)
Photos from the d2e Conference (by Michael Pace-Sigge)
Turo Hiltunen, Leena Kahlas-Tarkka, Samuli Kaislaniemi, Matti Kilpiö, Ville Marttila (webmaster), Minna Nevala, Terttu Nevalainen (chair), Minna Palander-Collin, Maura Ratia, Matti Rissanen, Anni Sairio, Tanja Säily, Carla Suhr (conference secretary), Irma Taavitsainen, Turo Vartiainen
The Varieng conference From Data to Evidence gratefully acknowledges the support and cooperation of the following institutions: