Helsinki Digital Humanities Hackathon #DHH20 will have five thematic areas of interest, with one or more groups per topic and up to eight participants plus group leaders in each group.
The evolution of ideas in Parliamentary Debates
Parliaments are the centres of political power and decision making in democratic societies. The outputs of these institutions are, by design, public, yet the processes themselves can appear less clear to the outside observer. Although purposefully transparent, the discussions and debates, in which individuals and parties come together to put forward their own agendas, are both rich and vast in content.
Parliamentary debates are a great resource for studying how ideas have evolved over time, or in specific national contexts. For example, the way parliamentarians discussed the future of the nation reveals their ideas about the future, modernization, and the nation itself. By examining debates on particular topics, we can get a better grasp of these views and of the ways in which they have developed. Can we detect slow changes or are there sudden breakpoints? Do these changes correspond to socio-political developments, or possibly military conflicts?
This research group will work on a subset of data from the CLARIN Parliamentary corpora (https://www.clarin.eu/resource-families/parliamentary-corpora), made up of parliamentary transcripts from across Europe (and the EU) and in multiple languages, to follow in this tradition. Selected based on the languages known by the group participants, this unique dataset will provide researchers with abundant opportunities to engage with political discussion and debates from multiple perspectives: over time, across borders, in various languages, across political affiliation, on particular issues, etc.
- Views on the nation, Europe, and the world
- Polarization and rudeness in parliamentary debates
- Views on the Future
- Tone of the debate
Potential research questions:
- What are the issues which have dominated or disappeared from political debate?
- How have important political debates, concepts, and/or topics changed over time? (i.e., diachronic concept change)
- Is the tone of the debate related to political stability?
- Did debates polarize over time?
- Are debates future or past-oriented?
- Various forms of quantitative text analysis (text mining, topic modelling, natural language processing, etc.)
- Statistical analysis/modelling
- Political scientists
- Experts in EU languages
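As a rough illustration of the quantitative text analysis listed above, the question of which issues have dominated or disappeared from debate can be approached by tracking how often a term occurs per year. The following is a minimal sketch, not the group's method: the function name `relative_frequency_by_year`, the `(year, text)` input format, and the toy speeches are all assumptions made for illustration, and real corpus data would first need to be parsed into this shape.

```python
import re
from collections import defaultdict


def relative_frequency_by_year(speeches, term):
    """Return {year: occurrences of `term` per 1,000 tokens}.

    `speeches` is an iterable of (year, text) pairs; this input format
    is an assumption for the sketch, not the corpus's actual format.
    """
    term = term.lower()
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in speeches:
        tokens = re.findall(r"\w+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for tok in tokens if tok == term)
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Toy, invented speeches (not taken from any parliamentary corpus).
speeches = [
    (1990, "The nation must look to the future of the nation"),
    (1990, "Trade and industry"),
    (2010, "Europe and the nation face the future together"),
]
freq = relative_frequency_by_year(speeches, "nation")
for year, per_thousand in freq.items():
    print(year, round(per_thousand, 1))
```

Normalising by the number of tokens per year matters because the volume of transcribed debate usually varies from year to year. Exact word matching is a simplification: inflected forms and multilingual data would need lemmatisation or per-language term lists.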
The internet has become a treasure trove of data on any imaginable topic. In particular, sources cover the minutiae of any topic which has a fan-base: popular music, films, anime, games, memes, etc. Much of this information has come to be aggregated and standardized, ready to be explored and analysed by data enthusiasts and research professionals alike, with applications for computer science, data analytics, or cultural studies. Of particular interest to humanities researchers, these collections can provide a bird's-eye view of cultural dynamics in ways which are more comprehensive than before, as well as more firmly grounded in data. We can begin to statistically answer questions such as: What were the trends that guided the 20th century film industry? What are the upcoming topics in new science-fiction works? How and when did Scandinavia become a center for jazz music? What helps a meme spread, its community or content?
This group will choose one or more domains of interest – e.g. books, films, music – and develop associated questions that these datasets could provide answers to. To this end, the group will need to find a good match between questions and data: not all questions can feasibly be answered by all datasets, nor within the time constraints of the hackathon. Depending on the research questions chosen, therefore, the group may be shown how to access public datasets and APIs on large aggregated information sources (such as Wikipedia, Wikidata, Goodreads, IMDb, Spotify, MyAnimeList, etc.); access already existing data in a structured format; or develop and identify its own datasets. The group will be challenged to look critically at the data, and find topics that could be meaningfully explored while collaborating with group members from various disciplinary backgrounds.
- Are dystopian films & books the latest trend, or have they always been around?
- Can you trust the scores on IMDb or Goodreads? To what extent do critics agree?
- How have improvements in consumer technology benefited computer games?
- Pop culture (!)
- Cultural history
- Data science
- Text mining
- Web interfaces
Coronavirus Epidemic and its racial politics on Social Media
In late January 2020, the outbreak of a new coronavirus in Wuhan, China sparked a global conversation in the Twittersphere, sharing information and concerns, and often also spreading racist stereotypes and misinformation, which affected many diasporic Chinese and other East Asian communities. In this group, a sample of English-language Twitter data from the 23rd of January to… [?]] will be used to map the main frames, topics and themes that emerged in the weeks after the epidemic began, and how these evolved over time as it spread globally. We will use a combination of quantitative analysis of frequencies of topics and a large-scale mapping of interrelated hashtags, with a closer textual and visual analysis of smaller samples of data.
Possible research questions include:
- How did the discussion change, from the outbreak of the epidemic, and over time?
- How was the Coronavirus framed? How did this framing differ geographically?
- How did health concerns map onto political issues such as closure of borders, impact on international travel or the economy?
- How were racial stereotypes deployed, which groups did they target, and what kind of alternative did they generate?
- What textual and visual means were used in the coverage of the epidemic and how were these distributed globally?
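The mapping of interrelated hashtags mentioned above can be approximated by counting how often two hashtags appear in the same tweet. The following is a minimal sketch under stated assumptions: tweets are available as plain strings, the helper name `hashtag_cooccurrence` is hypothetical, and the example tweets are invented rather than drawn from the group's dataset.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def hashtag_cooccurrence(tweets):
    """Count unordered hashtag pairs that occur in the same tweet."""
    pairs = Counter()
    for text in tweets:
        # Lowercase and deduplicate so "#Wuhan" and "#wuhan" count as one tag.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        pairs.update(combinations(tags, 2))
    return pairs


# Invented example tweets, for illustration only.
tweets = [
    "Stay safe #coronavirus #Wuhan",
    "Borders closing #coronavirus #travel #wuhan",
    "No hashtags here",
]
edges = hashtag_cooccurrence(tweets)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each counted pair can be read as a weighted edge in a hashtag network, which a graph library or visualisation tool could then lay out. Computing the counts separately per week would show how the network changes over time.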
The group’s focus will be on text reuse, by which is meant the almost word-for-word circulation of passages of text in different publications. Examining patterns of reuse opens interesting avenues for research from historical, linguistic and literary perspectives. These patterns can be used, for example, to analyse the influence of specific segments of text in the circulation of news, the role that repeated segments have in linguistic change, and the linguistic features that predict wide reusability of segments.
In concrete terms this enables a wide variety of historical, linguistic and literary research questions, such as tracing the circulation of "viral" news items from London's papers to the provinces, or examining the reach and variety of advertisements or the practices of quoting in argumentative text. Other possibilities include analysing genre-specific conventions of reuse, for example by examining the role that reuse plays in literary genres such as serials published in newspapers.
The dataset used by the group is the Burney Collection, which is available in a text data format. The cases of text reuse have been pre-detected with the BLAST algorithm, which allows the group to concentrate on interpreting patterns of reuse without having to work on the actual reuse detection. This leaves the group free to form its own hackathon research questions based on its interests and resources.
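To make the notion of almost word-for-word reuse concrete, the sketch below compares two passages by their shared word trigrams ("shingles"). This is deliberately not BLAST and not the pipeline used to prepare the dataset; it is a much simpler stand-in, shown only to illustrate what a detected reuse pair looks like. The function names and both example sentences are invented for illustration.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_passages(a, b, n=3):
    """Word n-grams that occur in both texts."""
    return shingles(a, n) & shingles(b, n)


def containment(a, b, n=3):
    """Share of the n-grams of `a` that also occur in `b` (0.0 to 1.0)."""
    source = shingles(a, n)
    if not source:
        return 0.0
    return len(source & shingles(b, n)) / len(source)


# Invented sentences, not taken from the Burney Collection.
london = "The fleet sailed from Portsmouth on Tuesday last"
provincial = "We hear that the fleet sailed from Portsmouth with fair wind"

shared = shared_passages(london, provincial)
print(sorted(shared))
print(containment(london, provincial))
```

Exact n-gram matching breaks down quickly on noisy OCR text, which is one reason an alignment-based approach that tolerates small differences is a better fit for historical newspapers.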
Space Wars / Multilingual Newspapers (NewsEye)
The Space Wars group at DHH20 will develop a multilingual case that investigates which places dominated the news reporting during the war efforts between 1914 and 1918. The group will identify news articles that relate to the war and extract names of places in order to discover which war efforts were covered in the large multilingual collection of historical news.
The digitized newspapers are provided by the project “NewsEye: A digital investigator for historical newspapers”. The NewsEye collection includes digitized newspapers from Austria, France, and Finland in five languages, namely German, French, Finnish, Swedish and English. We expect that each of these languages will be known by at least one person in the group, though apart from English none of them is required to join the group. In addition to data collections, NewsEye provides text processing tools, accessible in the user interface or through an API, though participants are free to use any other tools to analyse the collection.
The aim is not to describe the different battles and war efforts as such, but to compare which locations seemed relevant depending on the viewpoint of the different language papers. Helsinki, Vienna, and Paris got their news through different channels. Consequently, the imagination of what and where things happened during the Great War looked different in those places. Through a systematic comparison, we may be able to understand the spatial imaginaries of war.
Possible tasks are:
- Identifying key features in war reporting in Paris, Vienna and Helsinki
- Creating a method to automatically identify articles relating to the war
- Evaluating the precision and recall of the method
- Extracting named entities from chosen articles and linking them to places on a map
- Placing extracted named entities in a dynamic network
- Analyzing the locations which do appear in the newspapers and comparing them to their role in the historiography of the Great War
- Dynamic visualization of locations on different maps
- Contextualizing our source newspapers with information about their history, affiliations and information channels
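Two of the tasks above, identifying war-related articles automatically and evaluating precision and recall, can be sketched together. The example below is a minimal baseline under stated assumptions: a per-language keyword list stands in for whatever method the group develops, and the keyword sets, the `(language, text, gold label)` format, and the example headlines are all invented for illustration.

```python
import re

# Hypothetical per-language keyword lists; a real list would be much longer
# and would need to handle inflected forms.
KEYWORDS = {
    "de": {"krieg"},
    "fr": {"guerre"},
    "fi": {"sota"},
}


def is_war_article(lang, text):
    """Flag an article if any keyword for its language appears as a word."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return bool(tokens & KEYWORDS.get(lang, set()))


def precision_recall(predicted, gold):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented headlines with hand-assigned gold labels (True = about the war).
articles = [
    ("de", "Der Krieg an der Front dauert an", True),
    ("fr", "La guerre continue près de Verdun", True),
    ("fi", "Sota jatkuu rintamalla", True),
    ("de", "Das Wetter in Wien bleibt mild", False),
    ("fr", "Les troupes avancent vers la Marne", True),   # no keyword: missed
    ("fr", "La guerre des prix au marché", False),        # keyword, not the war
]

predicted = [is_war_article(lang, text) for lang, text, _ in articles]
gold = [label for _, _, label in articles]
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

The two deliberately difficult examples show why both measures are needed: the missed article lowers recall, while the metaphorical use of "guerre" lowers precision. Keeping the keyword lists separate per language avoids cross-language false matches, such as the English word "war" colliding with the German verb form "war".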