Helsinki Digital Humanities Hackathon #DHH19 gathered students and researchers of humanities, social sciences, and computer science in May at the University of Helsinki. During a week and a half of intensive multi-disciplinary work, the groups applied digital methods to a variety of datasets, with the goal of answering research questions under the following themes:
Parliaments are the centres of political power and decision making in democratic societies. The outputs from these institutions are, by design, public, yet the processes themselves can appear less plain to the outside observer. Although purposefully transparent, the discussions and debates, in which individuals and parties come together to put forward their own agendas, are rich and complex in content. This is partly because parliamentary institutions have their own standards and norms. A perhaps more problematic issue, however, is the sheer amount of data an observer is forced to engage with when studying parliamentary debates. To cope with this volume, researchers interested in these topics have turned to quantitative and digital approaches to develop insights into the workings of democracy.
Following in this tradition, this research group works with the CLARIN Parliamentary corpora (https://www.clarin.eu/resource-families/parliamentary-corpora), which comprise parliamentary transcripts in multiple languages from across Europe (and the EU). This unique dataset provides researchers with abundant opportunities to engage with political discussion and debates from multiple perspectives: over time, across borders, in various languages, across political affiliations, on particular issues, etc.
Potential research questions:
See group's blog.
This group focuses on English and French literature in the early modern period, particularly the 18th century. The group will computationally analyse large databases of published texts to unearth variation and change in historical genres and styles of writing. Databases will include, e.g., Eighteenth Century Collections Online (full texts of c. 50% of all literary works published in the 18th century in English) and the English Short Title Catalogue (extensive metadata of historical English publications). The objective is to identify both (1) the linguistic means by which genres or individual styles specialized and (2) the historical processes that led to changes in genres and styles, as public discourse diversified during the period. The group will work in cooperation with researchers familiar with the topic, and will benefit from existing datasets and tools. In addition, the group will explore ways of doing the above effectively within the time constraints of the project.
Relevant issues and research approaches may include:
See group's blog.
While different countries have internal dialogues on many different fora, Twitter has emerged as the global agora for transnational discussion between citizens. In this group, Twitter data from the Internet Archive as well as a purposive direct sample will be used to chart the flows, actors, topics and themes discussed around #Brexit, which is scheduled to happen on 29/03/2019. By the time of the hackathon, we will have gathered about 1.5 months of Twitter traffic related to the topic, from both before and after that date. In addition to the content and hashtag information of the tweets, the group will be able to use the time, geographical, language, and user information associated with them.
Possible research questions include:
Methodological expertise and innovative ideas in the following areas will be useful:
See group's blog.
The Newspaper group studies how newspapers transformed from the late eighteenth century to the early twentieth century as a channel for marketing goods and services. It further studies the relationship between advertising and journalistic ideals. The datasets used consist of digitised newspapers from multiple countries: the National Library of Finland’s Newspaper corpus, which contains nearly all of the newspapers and periodicals published in Finland from 1771 to 1919; the British Library’s Nineteenth Century Newspapers; the Dutch Royal Library’s collection of digitised newspapers; and possibly smaller datasets provided by the NewsEye project. Once we know more about the composition of the newspaper group and its members' computational and linguistic skills, we will decide which datasets to focus on and divide the group into smaller parallel groups. The benefit of focusing on advertisements is that they can be approached through several complementary means: the available metadata, the optically read texts and the digital images of newspaper pages. By combining the qualitative study of newspapers with text mining tools such as Named Entity Recognition, Topic Modelling and Vector Space Models, methods from Computer Vision, and statistical analysis of metadata, the group will have a number of interesting research questions to choose from, such as:
Studying the relation between advertisements and journalistic ideals in newspapers has two potentially important outcomes. First, it may contribute to the scholarly discussion on the development of print capitalism by using advertisements as a proxy for larger political and social developments. Second, by improving our understanding of the logic of advertisements in the press, it may help in developing article extraction (at least with regard to advertisements) and thus pave the way for any future study that needs to distinguish advertisement text from journalistic text.
See group's blog.