The group will conduct research on the corpora of Finnic oral folk poetry: Suomen Kansan Vanhat Runot (Old Poems of the Finnish People), which in addition to Finnish contains material in Karelian, Izhorian and Votic languages, and Eesti Regilaulude Andmebaas (Estonian Runosongs’ Database). The corpora contain written records of original folk poems, including epics, lyrics, occasional songs (e.g. wedding songs) and charms.
From a computational perspective, the datasets are challenging because of the large variation in orthography and dialect. The songs contain recurring themes, characters, formulas (fixed short expressions) and overlapping text fragments, but because of the surface-level variation, identifying those similarities is a research question for unsupervised language processing.
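As a rough illustration of how similarity can be measured despite spelling variation, the sketch below compares verse lines by the overlap of their character trigrams. It is only one possible starting point, not the method used by the FILTER project; the function names and the sample lines (invented spelling variants, not taken from the corpora) are assumptions made for the example.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a whitespace-normalised, lowercased line."""
    padded = f" {' '.join(text.lower().split())} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard overlap of character n-grams: 1.0 for identical lines, near 0.0 for unrelated ones."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical lines: two spellings of one verse and one unrelated verse.
line = "vaka vanha väinämöinen"
variant = "vaka vanha vainamoinen"
other = "lauloi päivät pääksytysten"

print(jaccard(line, variant))  # comparatively high
print(jaccard(line, other))    # comparatively low
```

Character n-grams are a common choice here because a changed letter affects only a few n-grams, so spelling variants still share most of them. A real workflow would need to decide on a threshold and on how to handle dialectal differences that go beyond spelling.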
The data, tools and supervision will be provided by members of the project: Formulaic Intertextuality, Thematic Networks and Poetic Variation across Regional Cultures of Finnic Oral Folk Poetry (FILTER). The group can choose to concentrate on either one of the datasets (Finnish or Estonian) or a comparison of both. It is recommended that at least half of the group has high proficiency in either Finnish or Estonian (depending on the chosen dataset).
Possible research questions include:
Expertise or interest in the following areas will be especially useful:
It is commonly accepted that the emergence of social media, especially Facebook and Twitter, has changed and challenged the media landscape in important ways. However, because concurrent media and social media data have rarely been available together, many aspects of the interaction between social media and traditional news media have been left unstudied. This has changed of late, as Twitter has improved the accessibility of its data for research purposes. At the same time, the Flows of Power project has managed to acquire full dumps of the journalistic output of multiple major Finnish media outlets.
The group seeks to find different ways to study the interaction between digital newsmedia journalism and Twitter. The topic will be centred on comparing the presences of and debates around two citizen initiative campaigns in social media and political newsmedia: one for same-sex marriage (Tasa-arvoinen avioliittolaki 2017) and one for legal gender recognition (Translaki).
Workflows and analyses will be geared to capture dynamics and interactions related to, for example:
The main computational challenges of this topic are mapping different phenomena arising from Twitter data to newsmedia data, and vice versa. This involves, for example:
The data available for the group includes all articles from Helsingin Sanomat (the biggest Finnish broadsheet newspaper), YLE (public broadcast company), Iltalehti (national tabloid) and STT (the main Finnish news agency) between 2011 and 2017, as well as the metadata of YLE’s radio and TV broadcasts. We also have access to Twitter's historical API.
The international newspapers group at DHH21 will develop a multilingual case that investigates which places dominated the news reporting during the Great War between 1914 and 1918. The group will identify news articles that relate to the war and extract names of places in order to discover which war efforts were covered in the large multilingual collection of historical news.
The digitized newspapers are provided by the project “NewsEye: A digital investigator for historical newspapers”. The NewsEye collection includes digitized newspapers from Austria, France, and Finland in five languages, namely German, French, Finnish, Swedish, and English. We expect that each of these languages will be known by at least one person in the group, though apart from English, none of them is required to join the group. In addition to the data collections, NewsEye provides text processing tools, accessible through the user interface or via an API, though the group is free to use any other tools to analyse the collection.
The aim is not to describe the different battles and war efforts as such, but to compare which locations seemed relevant depending on the viewpoint of the different language papers. Helsinki, Vienna, and Paris got their news through different channels. Consequently, the imagination of what and where things happened during the Great War looked different in those places. Through a systematic comparison, we may be able to understand the spatial imaginaries of war.
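A comparison of this kind could start from something as simple as the sketch below, which counts place mentions per language using a small lookup table and turns the counts into shares. It does not use the NewsEye tools or any named entity recognition; the gazetteer entries, the article snippets and the function names are all invented for illustration.

```python
import re
from collections import Counter


def count_places(articles, gazetteer):
    """Count mentions of known places; gazetteer maps lowercased surface forms to canonical names."""
    counts = Counter()
    for text in articles:
        for token in re.findall(r"\w+", text):
            place = gazetteer.get(token.lower())
            if place:
                counts[place] += 1
    return counts


def relative_share(counts):
    """Convert raw counts into shares so that collections of different sizes can be compared."""
    total = sum(counts.values())
    return {place: n / total for place, n in counts.items()} if total else {}


# Hypothetical gazetteer and snippets (not taken from the NewsEye collection).
gazetteer = {"wien": "Vienna", "vienne": "Vienna", "verdun": "Verdun"}
german_articles = ["Nachrichten aus Wien und Verdun", "Wien meldet neue Berichte"]
french_articles = ["La bataille de Verdun continue", "Verdun tient bon"]

print(relative_share(count_places(german_articles, gazetteer)))
print(relative_share(count_places(french_articles, gazetteer)))
```

Mapping language-specific spellings to one canonical name is what makes the shares comparable across the German, French, Finnish, Swedish and English papers. A lookup table will miss inflected forms and ambiguous names, so in practice a proper place name recogniser would be needed.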
Possible tasks are:
At the Helsinki Computational History Group we have created a dataset of text reuses in the Eighteenth-Century Collections Online (ECCO). This dataset was created by running BLAST on EEBO-TCP and ECCO, which sidesteps the OCR problems that often hamper text mining of ECCO. We tracked each case of text reuse of strings of 50 characters or more, totalling millions of text reuse cases.
The task of this hackathon group is to use this text reuse dataset to study eighteenth-century intertextuality through the uses of English translations of Pierre Bayle’s Historical and Critical Dictionary. This is not the first time that a digital humanities project has focused on text reuse cases in dictionaries (Allen et al. 2010; Leca-Tsiomis 2013). The aim of this project is also to learn from these earlier experiences.
This group is particularly well suited for students with a computational background. We aim to create workflows that make the task of using and analysing the text reuse data more convenient. Computer scientists joining the group have the chance to develop tools that tackle challenging historical data and to contribute to real research questions about historical text reuse. The developed tools would have great potential for further use in later analysis of the dataset beyond the hackathon project.
The dataset is also very intriguing from the perspective of eighteenth-century studies. We will focus on the concept of remediation by studying the little-known phenomenon of text reuse at large scale. We will also study translation as an intellectual activity and shift the focus from authors to publishing networks, where the role of the author is seen in a different light.
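To make the notion of a reuse case concrete, the sketch below finds passages of at least 50 characters shared by two texts. It uses exact matching from Python's standard `difflib` module rather than BLAST, so unlike the actual dataset it does not tolerate OCR errors or small rewordings; the function name and the sample texts are invented for the example.

```python
from difflib import SequenceMatcher


def shared_passages(a, b, min_len=50):
    """Return passages of at least min_len characters that occur verbatim in both texts."""
    # autojunk=False stops difflib from discarding frequent characters in long texts.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        a[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_len
    ]


# Hypothetical example: the same passage embedded in two different texts.
passage = "the quick brown fox jumps over the lazy dog and then runs far away"
text_a = "1111" + passage + "2222"
text_b = "3333" + passage + "4444"

print(shared_passages(text_a, text_b))
```

This pairwise comparison is far too slow for a collection the size of ECCO, which is one reason an alignment tool such as BLAST is used instead.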
Possible tasks to exemplify the work in the group
Workflow for studying text reuses of Bayle’s Dictionary
Study of the text reuse phenomenon in general through the case of translations of Bayle’s Dictionary.
Networks of publishing for Bayle’s Dictionary
References and further reading for potential group members
Allen, Timothy, Charles Cooney, Stéphane Douard, Russell Horton, Robert Morrissey, Mark Olsen, Glenn Roe, Robert Voyer. 2010. Plundering Philosophers: Identifying Sources of the Encyclopédie. Journal of the Association for History and Computing 13: http://hdl.handle.net/2027/spo.3310410.0013.10
Bayle, Pierre. 2000. Political Writings, trans. Sally L. Jenkinson, Cambridge: Cambridge University Press.
Champion, Justin. 2008. “Bayle in the English Enlightenment,” in Pierre Bayle (1647-1706), le philosophe de Rotterdam: Philosophy, Religion and Reception, eds. van Bunge and Bots, Brill: 175-196.
Leca-Tsiomis, Marie. 2013. The Use and Abuse of the Digital Humanities in the History of Ideas: How to Study the Encyclopédie, History of European Ideas, 39:4, 467-476, DOI: 10.1080/01916599.2013.774115
Labrousse, Elisabeth. 1983. Bayle, trans. Denys Potts. Oxford and New York: Oxford University Press.
Lennon, Thomas. 2008. Pierre Bayle in Stanford Encyclopedia of Philosophy: https://plato.stanford.edu/entries/bayle/
The group focuses on the debates in the Parliament of Finland in the twentieth century. The group’s objective is to learn how to use public speech data, in this case parliamentary linked open data, for studying pressing societal issues of the past. Moreover, the group develops and uses tools that make it possible to identify themes, topics, and place names in the debates, and to classify the debates by using related metadata such as speaker information. The Finnish data exemplifies the parliamentary corpora and the linked open data standards that are developed and used internationally.
Parliaments are the main legislative institutions and key places of decision-making and political discussion in our democratic societies. The parliament is a national arena of speaking and debating, to which the Members of the Parliament (MPs), the “people’s representatives”, are elected in regional districts. The parties and the MPs align with political ideologies, but also with geographic areas such as urban centres, the countryside, or their home region. Moreover, locations are markers in the debates about policy issues, such as the environment or foreign policy, where a reference to the Soviet Union or Chernobyl can play different rhetorical roles. The group will thus study the different ways in which parliamentary politics and place are related. The group can approach the question from several perspectives in their project, including:
The parliamentary debate material and the related metadata are provided by the project Semantic Parliament – ParliamentSampo: Linked Open Data Service for Studying Political Culture (SEMPARL) (https://seco.cs.aalto.fi/projects/semparl/en/). As the parliamentary material is mainly in Finnish, basic knowledge of Finnish is recommended though not mandatory; the computational tasks, in particular, can be carried out in English. Besides the data, the SEMPARL project will provide the group with basic tools or a user interface for browsing and searching the data.
Possible tasks for the project are:
Terms and conditions of employment are regulated at the societal level and in turn affect each individual contract. When independent unions and employers (or employers’ organizations) negotiate those terms and conditions of employment and regulate relations between the parties, the activity is referred to as ‘collective bargaining’. The written document resulting from this negotiation is a collective bargaining agreement (CBA). Although CBAs are very important for both workers and employers, they are not easy to find, and their content is often unknown even to those who are covered by them.
Since 2012, the WageIndicator Foundation (http://wageindicator.org) has been collecting and coding CBAs on a global scale in the WageIndicator Collective Agreements Database (http://wageindicator.org/cbadatabase). The Database currently contains 1600 collective agreements from more than 50 countries and written in 28 languages. The texts have been manually annotated according to 250 labour rights related questions on nine main topics – Social security and pensions, Training, Employment contracts, Sickness and disability, Health and medical assistance, Work/family balance arrangements, Gender equality issues, Wages, Working hours – and the relevant clauses (i.e., parts of text) for each question have been manually selected. Part of the annotation has been carried out under the SSHOC project (https://sshopencloud.eu/) and supported by the CLARIN Research Infrastructure (https://www.clarin.eu/).
The resulting datasets contain the collective agreements’ full texts and all the clauses assigned to each question.
The uniqueness and richness of such a dataset give the opportunity to do research on many levels, as it sheds light on how different topics related to working conditions are addressed in different countries and expressed in different languages. The task of the hackathon group is to gain qualitative insights from the data and to see how this output could be shared with, and made visible to, broader groups of Social Sciences and Humanities scientists via services provided by Research Infrastructures.
In this group, students with a (digital) humanities background and students with an interest in computational language processing, e.g. multilingual text analysis, will find something exciting to work on. Research ideas for this group might include:
Such work will contribute to the research on collective agreement provisions and ultimately help workers, trade unions and employers all over the world to know more about their labour rights at sectoral or company level.
Possible tasks to exemplify the work in the group
The group will focus on the comparison of parliamentary debates before and during Covid across Europe from a linguistic, sociological, politological and/or computational perspective. The group’s objective will be to learn how to use comparable parliamentary corpora from various European countries that are annotated with metadata such as speaker and session information and linguistic annotations such as morphosyntactic and named entity tags for studying societal issues caused by the Covid-19 pandemic. The group will also learn how to use Orange (https://orangedatamining.com), a visual programming tool for data mining and machine learning, which means coding skills are not required for exploring the data set. Computer scientists will be able to use their skills to create advanced custom widgets for data processing and analysis.
National parliamentary data is a verified communication channel between the elected political representatives and society members in any democracy. One of the most important characteristics of parliamentary data is its direct correspondence with concurrent events, including the ones with a global impact on human health, social life, and economics such as the current COVID-19 pandemic. By comparing the data synchronically and diachronically in a cross-lingual context, we can obtain important insights into transnational characteristics as well as track the pan-European discussion in times of crisis.
The parliamentary corpora will be provided by the CLARIN ERIC ParlaMint project, which is supported by the SSHOC project (https://sshopencloud.eu/); the corpora are currently available in Bulgarian, Croatian, Polish, and Slovenian. The goal of ParlaMint is to compile a collection of comparable corpora of debates from national parliaments from all over Europe in a harmonized format, covering both data from the period of the Covid-19 pandemic and older reference data. The first version of the corpora has already been processed linguistically and enriched with metadata, made searchable through popular concordancers for online querying, and made downloadable from the CLARIN repository for independent handling. By the time of the hackathon, a new version with many new languages will be available (English, Dutch, Icelandic, Lithuanian, Czech, Italian, Turkish, Danish, Hungarian, French, Latvian, Romanian, and Belgian Dutch/French).
Possible topics and tasks for the group are: