Helsinki Digital Humanities Hackathon 2018 #DHH18

Helsinki Digital Humanities Hackathon #DHH18 | 23.5.–1.6.2018

This event aims to bring together students and researchers of humanities, social sciences and computer science, for a week of active co-operation in groups under the heading of digital humanities.

Digital humanities, as understood here, is about applying modern data processing to solve research questions in the humanities and social sciences. At its best, such close collaboration offers unique benefits for both fields: scholars in the humanities are able to tackle questions too labour-intensive for manual study, while computer scientists encounter new and challenging use cases for the tools and algorithms they develop.

The hackathon consists of intensive work in small groups, formulating research questions with respect to particular data sets, applying and developing methods and tools, and presenting the work at the end of the week. For information on what the hackathon was like in previous years, see: #DHH17, #DHH16 and #DHH15.

During the hackathon, the participants will learn how to work in multidisciplinary research projects. The hackathon will also broaden their understanding of digital humanities and of what such collaboration can achieve.


We will have five areas of interest, with at least one group per topic. Each group will have up to eight participants plus group leaders.

People in the News

Group supervisors: Antti Kanner, Ilona Pikkanen, Risto Turunen, Jani Marjanen

The Finnish Newspaper group concentrates on people. Persons can become newsworthy for many reasons: some have lasting influence in political or cultural spheres, some have fame only for a brief period of time, while others might be mentioned briefly in an advertisement or even in an obituary. The dataset used is the National Library of Finland’s Newspaper corpus, which contains nearly all of the newspapers and periodicals published in Finland from 1771 to 1919. By combining text mining tools such as Named Entity Recognition, Topic Modelling, Sentiment Analysis and Vector Space Models with metadata from other sources (such as the National Biography) and close reading, the group will have a number of interesting research questions to choose from, such as:

  • What kinds of features predict the newsworthiness of people in 19th-century Finland?
  • Was the selection of people discussed affected by the language, location or political affiliation of the publication?
  • How were people now recognized as key actors in national history described in newspapers in their lifetime and after their death?
  • Can people discussed in the newspapers be categorized somehow based on how they were discussed?
  • Do any interesting networks of people emerge?

Results: presentation (PDF), poster (PDF), code (GitHub)

Russia ⇔ Finland

Group supervisors: Daria Gritsenko, Andrey Indukaev

This group focuses on representations of Russia in Finnish newspapers – and of Finland in Russian newspapers. The objective is to recognize, in large full-text corpora of contemporary media, how the two countries portray each other and which events, persons, places, developments, etc. are at the centre of the discussion. The group will be working in cooperation with researchers from the Aleksanteri Institute. Relevant issues include:

  • Approaches to detect continuity and change in language representations of political images over a relatively short (ca. 10-20 years) period of time;
  • Application of linguistic methods (for example, semantic closeness, supervised and unsupervised text mining and sentiment analysis);
  • Visual analysis and methods for multi-modal analysis;
  • Event, place and personality tracing on the basis of substantive knowledge of Finnish-Russian relations;
  • Work with Cyrillic encoding for those interested in an extra challenge!

Results: presentation (PDF), poster (PDF), code (GitHub)

Early Modern Publishing

Group supervisors: Mark Hill, Ville Vaara, Tanja Säily

This group focuses on English literature in the early modern period. The group will computationally analyse large databases of historical literature to unearth changes in publication practices, genres, and roles of publishers. Databases will include the Eighteenth Century Collections Online (full texts of ~50% of all literary works published in English in the 18th century) and the English Short Title Catalogue (extensive metadata of 18th-century English literature). The objective is to recognize micro-level historical processes that affected how the literary world became an integral part of public discourse during the period. The group will be working in cooperation with researchers currently focusing on the same topic, and will benefit from existing datasets and tools. In addition, the group will explore ways of doing the above effectively within the time constraints of the project. Relevant issues and research approaches may include:

  • Questioning how these changes were reflected geographically in historical London.
  • Creating new methodological approaches for recognising historical phenomena.
  • Developing methodologies for recognising genres in historical texts.
  • Creating an effective data refinement workflow.
  • Analyzing currents in intellectual history reflected in the source material.
  • Creating typologies (from source material) that form the basis of statistical analysis.
  • Interpreting the developments and changes uncovered by computational methods in the material over the century.

Results: presentation (PDF), poster (PDF)

The Death Psalm of Bishop Henry

Group supervisors: Tuomas Heikkilä (Church History), Teemu Roos (Computer Science)

Despite the numerous dimensions of the written word, our communication has always been and still remains mainly oral. This group tackles the methodological challenges of orally transmitted literature through the oldest and most prominent Finnish example: the Death Psalm of Bishop Henry (Piispa Henrikin surmavirsi), which describes the murder of the semi-legendary apostle of Finland in the mid-12th century but was written down only centuries later. This group will be the first to use computational means to study this oral tradition of paramount importance to early Finnish history.

The data set is provided by the organizers, and consists of just 14 very different versions – in both prose and verse – of the Death Psalm, written down in the 17th and 18th centuries.

The relevant research questions may include, e.g.:

  • How can we reconstruct the original contents of the Death Psalm?
  • How can we date the contents of the versions? Does their language provide hints about their area of origin?
  • How can we reconstruct the family tree of the extant versions?
  • What kinds of features of a story were most easily transformed / left out during the oral transmission? Which traits remained more constant?
  • Can we extract historical contents from the oral tradition?

Previous approaches to orally transmitted literature have included, e.g.:

  • Traditional comparison of the text of the individual versions
  • Comparison of the episodes contained in the versions
  • Use of methods and algorithms borrowed from evolutionary biology and computer-assisted stemmatology.

Methodological expertise and innovative ideas in the following areas will be useful:

  • XML processing
  • Information extraction from text, text mining, natural language processing
  • Information visualization
  • Machine learning
  • History
  • Philology

Results: presentation (PDF), poster (PDF)

Helsinki in Geotagged Social Media

Group supervisors: Tuomo Hiippala (multimodal communication), Tuuli Toivonen (geoinformatics)

Social media data can provide valuable clues about cities and what their inhabitants do, where, when and why. This group explores social media as a source of data for understanding Helsinki using large volumes of geotagged Twitter, Instagram and Flickr posts uploaded between 2014 and 2016. However, working with social media data entails many challenges, as the data is inherently cluttered and multilingual, and features various types of content, such as texts, images and emojis. For this reason, the data requires careful preprocessing, but it also opens an opportunity for combining different methods to answer research questions.

Possible research questions include:

  • Which languages are used in Helsinki and how do their users move about?
  • What kinds of topics are discussed in social media at different locations around Helsinki?
  • Can we identify groups of users based on the content they post and their movements?

Methodological expertise and innovative ideas in the following areas will be useful:

  • Geoinformatics and human geography
  • Natural language processing
  • Computer vision
  • Machine learning
  • Communication and linguistics
  • Urban studies
  • Ethics

Results: presentation (PDF), poster (PDF), code (GitHub)

General Data Team

Ali Ijaz, Simon Hengchen, Hege Roivainen

The data team will provide computational help to the groups as needed and will help co-ordinate computational tasks during the hackathon.


Application period: 9.4.2018–29.4.2018.

23.5.2018 09:00–17:00, Athena, Siltavuorenpenger 3 A, room 166 (and room 144)

Orientation and forming of the groups (remote participation possible)

24.–25.5.2018 09:00–17:00, Athena, Siltavuorenpenger 3 A, rooms 166 and 144

Intensive preparation period for the participants – introductions to material, forming of research questions and work plans (from morning until night, but with remote participation possible)

28.5.–1.6.2018 09:00–17:00, Minerva Square, Siltavuorenpenger 5 A, room K226

Intensive hackathon period (daily from morning until night, social programme included)

1.6.2018 13:00–17:15, Minerva Square, Siltavuorenpenger 5 A, room K226

Public presentations of the projects: see further information

The event will be streamed online, and a recording of the stream will be available afterwards.


The hackathon is aimed mainly at MA students and beyond. As a course, it is part of the 30-credit digital humanities module (see here). Computer science and other students with sufficient programming skills may join without prior digital humanities studies. For students in the humanities, if the event is overbooked, priority is given to those who have completed introduction to digital humanities and/or introduction to methods in digital humanities.


Register using this link:


3–5 ECTS credits may be gained from participating in the hackathon. Assessment: pass/fail, based on participation in the group work, presentation, and an individual report.

Students from the University of Helsinki: please contact Mikko Tolonen for more details on the credits.

Students from Aalto University: please contact Jukka Suomela for more details on the credits.


Mikko Tolonen, Eetu Mäkelä, Jukka Suomela & Jouni Tuominen