Helsinki Digital Humanities Hackathon #DHH19 will have four thematic areas of interest, with one or more groups per topic, each with up to eight participants plus group leaders.
The Many Voices of European Parliamentary Debates
Parliaments are the centres of political power and decision making in democratic societies. The outputs of these institutions are, by design, public, yet the processes behind them can appear less clear to the outside observer. Although purposefully transparent, parliamentary discussions and debates, in which individuals and parties come together to advance their own agendas, are rich and complex in content. This is partly because each parliamentary democracy has its own institutional standards and norms. A perhaps more problematic issue, however, is the sheer amount of data an observer must engage with when studying parliamentary debates. To address this, researchers interested in these topics have turned to quantitative and digital approaches to gain insights into the workings of democracy.
This research group follows in this tradition, using the CLARIN Parliamentary corpora (https://www.clarin.eu/resource-families/parliamentary-corpora), which consist of parliamentary transcripts from across Europe (and the EU) in multiple languages. This unique dataset provides researchers with abundant opportunities to engage with political discussions and debates from multiple perspectives: over time, across borders, in various languages, across political affiliations, on particular issues, etc.
Potential research questions:
- What are the issues which have dominated or disappeared from political debate?
- How are international political topics shaped by the language in which they are discussed?
- How have important political debates, concepts, and/or topics changed over time? (i.e., diachronic concept change)
Expertise in the following areas will be useful:
- Various forms of quantitative text analysis (text mining, topic modelling, natural language processing, etc.)
- Political science
- EU languages
Genre and Style in Early Modern Publications
This group focuses on English and French literature in the early modern period, particularly the 18th century. The group will computationally analyse large databases of published texts to unearth variation and change in historical genres and styles of writing. Databases will include, e.g., Eighteenth Century Collections Online (full texts of c. 50% of all literary works published in English in the 18th century) and the English Short Title Catalogue (extensive metadata on historical English publications). The objective is to identify both (1) the linguistic means by which genres or individual styles became specialized and (2) the historical processes that led to changes in genres and styles as public discourse diversified during the period. The group will work in cooperation with researchers familiar with the topic and will benefit from existing datasets and tools. In addition, the group will explore ways of doing the above effectively within the time constraints of the project.
Relevant issues and research approaches may include:
- Charting the development of new genres in the 18th century, comparing the English and French data.
- Analysing stylistic variation within genres based on author metadata (e.g. gender, age, popularity).
- Applying and developing methodologies for identifying genres and styles in historical texts.
- Creating new methodological approaches for recognizing historical and linguistic phenomena.
- Creating an effective data refinement workflow.
- Creating typologies from the source material to serve as the basis for statistical analysis.
- Interpreting the developments and changes over the century that computational methods uncover in the material.
Brexit in Transnational Social Media
While each country conducts its internal debates across many different fora, Twitter has emerged as the global agora for transnational discussion between citizens. In this group, Twitter data from the Internet Archive, as well as a purposive direct sample, will be used to chart the flows, actors, topics and themes discussed around #Brexit, which was scheduled to take place on 29/03/2019. By the time of the hackathon, we will have gathered about 1.5 months of Twitter traffic related to the topic, from both before and after 29/03/2019. In addition to the tweets' content and hashtags, the group will be able to use the temporal, geographical, language and user information associated with them.
Possible research questions include:
- Who are the actors involved in discussing Brexit on Twitter? How does citizens' discussion interact with official communication, both on and off Twitter?
- What topics are discussed? How do they change over time and in response to (or in anticipation of) real-life events in the process?
- What are the social dynamics of the discussion? What are the groups, and how are they distributed geographically and linguistically?
Methodological expertise and innovative ideas in the following areas will be useful:
- Natural language processing
- Machine learning
- Communication and linguistics
- Political science
- European studies
- Legal studies
- Data visualization
Newspapers and Capitalism
The Newspaper group studies how, from the late eighteenth century to the early twentieth century, newspapers developed into a channel for marketing goods and services. It further studies the relationship between advertising and journalistic ideals. The datasets consist of digitised newspapers from multiple countries: the National Library of Finland’s Newspaper corpus, which contains nearly all of the newspapers and periodicals published in Finland from 1771 to 1919; the British Library’s Nineteenth Century Newspapers; the Dutch Royal Library’s collection of digitised newspapers; and possibly smaller datasets provided by the NewsEye project. Once we know more about the composition of the group and the computational and linguistic skills of its members, we will decide which datasets to focus on and divide the group into smaller parallel subgroups. The benefit of focusing on advertisements is that they can be approached through several complementary means: the available metadata, the text produced by optical character recognition (OCR), and the digital images of newspaper pages. By combining the qualitative study of newspapers with text mining tools such as Named Entity Recognition, Topic Modelling and Vector Space Models, methods from Computer Vision, and statistical analysis of metadata, the group will have a number of interesting research questions to choose from, such as:
- Was there a clear distinction between journalistic text and advertisements?
- Did the number of advertisements grow over time? Can the late nineteenth century and early twentieth century be seen as an age of marketization in the newspapers?
- Is there a correlation between the political profile of a newspaper and the advertisements included in it? Did, for instance, socialist papers have a different advertising profile from conservative papers?
- Can some newspapers be seen as less serious based on the advertisements contained in them?
- How did contemporaries reflect upon newspapers as carriers of information and as forums for buying and selling?
- Do any interesting networks of advertisements emerge? Were the same advertisements published in several papers, and if so, were those newspapers related?
- Is it possible to discern features that seem to have made an advertisement particularly compelling?
Studying the relationship between advertisements and journalistic ideals in newspapers has two potentially important outcomes. First, it may contribute to the scholarly discussion on the development of print capitalism by using advertisements as a proxy for larger political and social developments. Second, by better understanding the logic of advertisements in the press, it may help in developing article extraction (at least with regard to advertisements) and thus pave the way for any future study that needs to distinguish advertisements from journalistic text.