Group leaders:
The group will conduct research on the corpora of Finnic oral folk poetry: Suomen Kansan Vanhat Runot (Old Poems of the Finnish People), which in addition to Finnish contains material in Karelian, Izhorian and Votic languages, and Eesti Regilaulude Andmebaas (Estonian Runosongs’ Database). The corpora contain written records of original folk poems, including epics, lyrics, occasional songs (e.g. wedding songs) and charms.
From a computational perspective, the datasets are challenging because of the large variation in orthography and dialect. The songs contain recurring themes, characters, formulas (fixed short expressions) and overlapping text fragments, but because of the surface-level variation, identifying those similarities is a research question for unsupervised language processing.
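One simple way to compare verse lines despite spelling variation is character n-gram overlap. The sketch below is only an illustration under stated assumptions, not the method used by the project: the function names are hypothetical, and the spelling variant in the example is invented for demonstration rather than taken from the corpora.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def line_similarity(line_a: str, line_b: str, n: int = 3) -> float:
    """Similarity in [0, 1] based on shared character n-grams."""
    return jaccard(char_ngrams(line_a, n), char_ngrams(line_b, n))


# Illustrative strings only: the second line is an invented spelling variant.
standard = "Vaka vanha Väinämöinen"
variant = "Vaka vanha Väinämöini"
unrelated = "Tietäjä iän ikuinen"

print(line_similarity(standard, variant))    # high: most trigrams are shared
print(line_similarity(standard, unrelated))  # low: few trigrams are shared
```

Character n-grams tolerate small spelling differences because a single changed letter affects only a few n-grams. Dialectal variation that rewrites whole word forms would need something stronger.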
The data, tools and supervision will be provided by members of the project:
Possible research questions include:
Expertise or interest in the following areas will be especially useful:
Group leaders:
It is commonly accepted that the emergence of social media, especially Facebook and Twitter, has changed and challenged the media landscape in important ways. However, because concurrent media and social media data have rarely been available together, many aspects of the interaction between social media and traditional news media have been left unstudied. This has changed of late, as Twitter has improved the accessibility of its data for research purposes. At the same time, the
The group seeks to find different ways to study the interaction between digital news media journalism and Twitter. The topic will be centred on comparing the presence of, and debates around, two citizen initiative campaigns in social media and political news media: one for same-sex marriage (Tasa-arvoinen avioliittolaki 2017) and one for legal gender recognition (Translaki).
Workflows and analyses will be geared to capture dynamics and interactions related to, for example:
The main computational challenge of this topic is mapping the different phenomena arising from Twitter data to news media data, and vice versa. This involves, for example
The data available for the group includes all articles from Helsingin Sanomat (the biggest Finnish broadsheet newspaper), YLE (public broadcast company), Iltalehti (national tabloid) and STT (the main Finnish news agency) between 2011 and 2017, as well as the metadata of YLE’s radio and TV broadcasts. We also have access to Twitter's historical API.
Group leaders:
The international newspapers group at DHH21 will develop a multilingual case that investigates which places dominated the news reporting during the Great War between 1914 and 1918. The group will identify news articles that relate to the war and extract names of places in order to discover which war efforts were covered in the large multilingual collection of historical news.
The digitized newspapers are provided by the project “
The aim is not to describe the different battles and war efforts as such, but to compare which locations seemed relevant depending on the viewpoint of the different language papers. Helsinki, Vienna, and Paris got their news through different channels. Consequently, the imagination of what and where things happened during the Great War looked different in those places. Through a systematic comparison, we may be able to understand the spatial imaginaries of war.
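The comparison described above can be sketched as counting place mentions per newspaper language and comparing their relative shares. This is a minimal sketch under stated assumptions: the gazetteer, the function names and the sample texts are all hypothetical, and a real workflow would need named entity recognition, handling of inflected forms and OCR noise.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercased surface form -> canonical place name.
GAZETTEER = {
    "verdun": "Verdun",
    "warschau": "Warsaw",   # German
    "varsova": "Warsaw",    # Finnish
    "varsovie": "Warsaw",   # French
}


def count_places(articles):
    """Count gazetteer hits per language.

    `articles` is an iterable of (language, text) pairs; the result maps
    each language to a Counter of canonical place names.
    """
    counts = {}
    for language, text in articles:
        tokens = re.findall(r"\w+", text.lower())
        counter = counts.setdefault(language, Counter())
        counter.update(GAZETTEER[t] for t in tokens if t in GAZETTEER)
    return counts


def relative_shares(counter):
    """Turn raw counts into shares so collections of different sizes compare."""
    total = sum(counter.values())
    return {place: n / total for place, n in counter.items()} if total else {}


# Invented sample texts, for illustration only.
sample = [
    ("fi", "Varsova ja Verdun"),
    ("de", "Warschau, Warschau und Verdun"),
]
for language, counter in count_places(sample).items():
    print(language, relative_shares(counter))
```

Normalising to shares matters because the language collections differ in size; raw counts would mostly reflect how much text each collection contains.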
Possible tasks are:
Group leaders:
At the Helsinki Computational History Group, we have created a dataset of text reuses in the Eighteenth-Century Collections Online (ECCO). This dataset was created by running BLAST on EEBO-TCP and ECCO, sidestepping the OCR problems that often hamper text mining of ECCO. We tracked each case of text reuse involving strings of 50 characters or more, totalling millions of text reuse cases.
The task of this hackathon group is to use this text reuse dataset to study eighteenth-century intertextuality through the uses of English translations of Pierre Bayle’s Historical and Critical Dictionary. This is not the first time that a digital humanities project has focused on text reuse cases in dictionaries (Allen et al. 2010; Leca-Tsiomis 2013). The aim of this project is also to learn from these earlier experiences.
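To make the idea of a text reuse case concrete, the sketch below finds passages of at least 50 characters shared by two texts. It is not the BLAST pipeline described above: it uses Python's standard difflib module and finds exact matches only, whereas BLAST tolerates mismatches and is what makes the approach robust to OCR errors. The function name and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def shared_passages(text_a: str, text_b: str, min_length: int = 50):
    """Return (offset_a, offset_b, passage) for exact shared runs.

    Only runs of at least `min_length` characters are kept. This is a toy
    stand-in for alignment tools such as BLAST and does not scale to
    large collections.
    """
    # autojunk=False disables difflib's heuristic that ignores frequent
    # characters in long sequences, which would hide matches in long texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return [
        (m.a, m.b, text_a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_length
    ]


# Invented sample texts, for illustration only.
quote = "the quick brown fox jumps over the lazy dog and runs far away from here"
book_a = "Preface. " + quote + " End."
book_b = "Another book says: " + quote + " Indeed."

for offset_a, offset_b, passage in shared_passages(book_a, book_b):
    print(offset_a, offset_b, repr(passage))
```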
Research interests
This group is particularly well suited for students with a computational background. We aim to create workflows that make using and analysing the text reuse data more convenient. Computer scientists joining the group will have the chance to develop tools that tackle challenging historical data and contribute to real research questions about historical text reuse. The tools developed would have great potential for later analysis of the dataset beyond the hackathon project.
The dataset is also intriguing from the perspective of eighteenth-century studies. We will focus on the concept of remediation by studying the little-known phenomenon of text reuse at large scale. We will also study translation as an intellectual activity and shift the focus from authors to publishing networks, in which the role of the author is seen in a different light.
Possible tasks to exemplify the work in the group
Workflow for studying text reuses of Bayle’s Dictionary
Study of the text reuse phenomenon in general through the case of translations of Bayle’s Dictionary.
Networks of publishing for Bayle’s Dictionary
References and further reading for potential group members
Allen, Timothy, Charles Cooney, Stéphane Douard, Russell Horton, Robert Morrissey, Mark Olsen, Glenn Roe, Robert Voyer. 2010. Plundering Philosophers: Identifying Sources of the Encyclopédie. Journal of the Association for History and Computing 13:
Bayle, Pierre. 2000. Political Writings, trans. Sally L. Jenkinson, Cambridge: Cambridge University Press.
Champion, Justin. 2008. “Bayle in the English Enlightenment,” in Pierre Bayle (1647-1706), le philosophe de Rotterdam: Philosophy, Religion and Reception, eds. van Bunge and Bots, Brill: 175-196.
Leca-Tsiomis, Marie. 2013. The Use and Abuse of the Digital Humanities in the History of Ideas: How to Study the Encyclopédie, History of European Ideas, 39:4, 467-476, DOI: 10.1080/01916599.2013.774115
Labrousse, Elisabeth. 1983. Bayle, trans. Denys Potts. Oxford and New York: Oxford University Press.
Lennon, Thomas. 2008. Pierre Bayle in Stanford Encyclopedia of Philosophy:
Group leaders:
The group focuses on the debates in the Parliament of Finland in the twentieth century. The group’s objective is to learn how to use public speech data, in this case parliamentary linked open data, for studying pressing societal issues of the past. Moreover, the group develops and uses tools that make it possible to identify themes, topics, and place names in the debates, and to classify the debates by using related metadata such as speaker information. The Finnish data exemplifies the parliamentary corpora and the linked open data standards that are developed and used internationally.
Parliaments are the main legislative institutions and key places of decision-making and political discussion in our democratic societies. The parliament is a national arena of speaking and debating, to which the Members of the Parliament (MPs), the “people’s representatives”, are elected in regional districts. The parties and the MPs align with political ideologies, but also with geographic areas such as urban centres, the countryside, or their home region. Moreover, locations are markers in the debates about policy issues, such as the environment or foreign policy, where a reference to the Soviet Union or Chernobyl can play different rhetorical roles. The group will thus study the different ways in which parliamentary politics and place are related. The group can approach the question from several perspectives in their project, including:
The parliamentary debate material and the related metadata are provided by the project Semantic Parliament – ParliamentSampo: Linked Open Data Service for Studying Political Culture (SEMPARL) (
Possible tasks for the project are:
Group leaders:
Terms and conditions of employment are regulated at the societal level and in turn affect each individual contract. When independent unions and employers (or employers’ organizations) negotiate those terms and conditions of employment and regulate relations between the parties, the activity is referred to as ‘collective bargaining’. The written document resulting from this negotiation is a collective bargaining agreement (CBA). Although CBAs are very important for both workers and employers, they are not easy to find, and their content is often unknown even to those who are covered by them.
Since 2012, the WageIndicator Foundation (
The resulting datasets contain the collective agreements’ full texts and all the clauses assigned to each question.
Research interests
The uniqueness and richness of such a dataset make it possible to do research on many levels, as it sheds light on how different topics related to working conditions are addressed in different countries and expressed in different languages. The task of the hackathon group is to gain qualitative insights from the data and to see how this output could be shared with, or made visible to, broader groups of Social Sciences and Humanities researchers via services provided by Research Infrastructures.
In this group, students with a (digital) humanities background and students with an interest in computational language processing, e.g. multilingual text analysis, will find something exciting to work on. Research ideas for this group might include:
Such work will contribute to research on the provisions of collective agreements and ultimately help workers, trade unions and employers all over the world to know more about labour rights at sectoral or company level.
Possible tasks to exemplify the work in the group
Group leaders:
The group will focus on the comparison of parliamentary debates before and during Covid across Europe from a linguistic, sociological, political science and/or computational perspective. The group’s objective will be to learn how to use comparable parliamentary corpora from various European countries for studying societal issues caused by the Covid-19 pandemic. The corpora are annotated with metadata, such as speaker and session information, and with linguistic annotations, such as morphosyntactic and named entity tags. The group will also learn how to use Orange (
National parliamentary data is a verified communication channel between the elected political representatives and members of society in any democracy. One of the most important characteristics of parliamentary data is its direct correspondence with concurrent events, including those with a global impact on human health, social life, and economics, such as the current COVID-19 pandemic. By comparing the data synchronically and diachronically in a cross-lingual context, we can obtain important insights into transnational characteristics as well as track the pan-European discussion in times of crisis.
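A diachronic comparison of this kind can be sketched as the relative frequency of a term before and during the pandemic. This is only an illustration under stated assumptions: the function name, the record layout and the sample speeches are hypothetical, and the cutoff date (11 March 2020, when the WHO characterised Covid-19 as a pandemic) is one possible choice among several. Real corpora would also need lemmatisation and per-country handling.

```python
import re
from datetime import date

# Assumption: speeches on or after this date count as "during" the pandemic.
CUTOFF = date(2020, 3, 11)


def relative_frequency(speeches, term):
    """Occurrences of `term` per 1000 tokens, before and during the pandemic.

    `speeches` is an iterable of (date, text) pairs; matching is on exact
    lowercased tokens, with no lemmatisation.
    """
    totals = {"before": 0, "during": 0}
    hits = {"before": 0, "during": 0}
    for day, text in speeches:
        period = "before" if day < CUTOFF else "during"
        tokens = re.findall(r"\w+", text.lower())
        totals[period] += len(tokens)
        hits[period] += tokens.count(term.lower())
    return {
        period: (1000 * hits[period] / totals[period] if totals[period] else 0.0)
        for period in totals
    }


# Invented sample speeches, for illustration only.
sample_speeches = [
    (date(2019, 5, 1), "health budget debate"),
    (date(2020, 4, 1), "health health crisis"),
]
print(relative_frequency(sample_speeches, "health"))
```

Normalising per 1000 tokens keeps the two periods comparable even when one contains far more speech than the other.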
The parliamentary corpora will be provided by the CLARIN ERIC
Possible topics and tasks for the group are: