The Helsinki Centre for Digital Humanities (HELDIG) was launched with a kick-off symposium on Oct 6, 2016, attended by some 200 friends of Digital Humanities. The first HELDIG Digital Humanities Summit, in 2017, provided a snapshot of activities within the centre and its collaboration network, and the overarching theme of the HELDIG Digital Humanities Summit 2018 was Infrastructures for Digital Humanities.
The special theme of HELDIG Summit 2019 is From Text to Knowledge. Most content used in DH research and applications is originally available in more or less unstructured textual form, e.g., as books, articles, newspapers, legislation and legal documents, web pages, social media discussions, parliamentary materials, letters, and folklore. Even in databases, such as the collections of museums, archives, and libraries, much of the content may be unstructured text. A key task in using texts computationally is to extract, in one way or another, structure and meaning from strings, using methods such as linguistic analysis and natural language understanding, named entity recognition and linking, relation and event extraction, data mining, machine learning, topic modelling, and reference and network analysis.
To set the context, the day starts with the keynote "Square pegs and round holes: addressing the mismatch between humanities questions and the state-of-the-art in language technology" by Dr. Marieke van Erp from KNAW, Amsterdam. This is followed by presentations on tools and infrastructures for digitizing, processing, and analyzing large text data, and on services for publishing textual datasets.
In the afternoon, we have the keynote "Legislative data portals and linked data quality" by Prof. Jose Emilio Labra Gayo from the University of Oviedo. The day continues with presentations of research projects and applications that use digital humanities approaches to analyze texts. To end the day, bubbles and nibbles are served at a social networking event.
The event will be streamed at http://video.helsinki.fi/unitube/live-stream.html?room=l21
Participation in the HELDIG Digital Humanities Summit 2019 is open and free, but registration is necessary for catering purposes. Please register by Wednesday 30 October at the latest:
https://elomake.helsinki.fi/lomakkeet/100681/lomake.html
For more information, please contact HELDIG coordinator Jouni Tuominen or director Eero Hyvönen (see HELDIG Contact Info).
Programme
Time | Topic | Presenter(s) | Material |
---|---|---|---|
09:00–09:05 | HELDIG Summit 2019 Opening | Eero Hyvönen, HELDIG, Aalto | Presentation |
09:05–10:00 | Keynote | Chair: Eero Hyvönen | |
1 | Square pegs and round holes: addressing the mismatch between humanities questions and the state-of-the-art in language technology | Dr. Marieke van Erp, KNAW, Amsterdam | Presentation. In this talk, I will discuss several projects in which we needed to address the mismatch between language technology tools and humanities research objectives, and how we can move forward in fitting our computational methods to the diversity of humanities research questions. Bio: Marieke van Erp leads the Digital Humanities Lab at the Royal Netherlands Academy of Arts and Sciences Humanities Cluster in Amsterdam, the Netherlands. Her research focuses on combining natural language processing and semantic web applications in the digital humanities domain. She previously worked on the European NewsReader project, which aimed to build structured indexes of events from large volumes of financial news, and on the CLARIAH project, a large Dutch project to develop infrastructure for humanities research. |
10:00–10:30 | Coffee Break | ||
10:30–12:30 | Tools and Infrastructures | Chair: Jouni Tuominen | |
2 | Publishing linguistic typological data at UH | Kaius Sinnemäki, University of Helsinki | Presentation |
3 | Words and actions | Andrey Indukaev and Daria Gritsenko, University of Helsinki | Words and actions project, presentation |
4 | Nokia – Person, Location, Organisation, Product, Event, Time or Common Noun? – Automatically Recognizing and Categorizing Names in Finnish Text | Krister Lindén, University of Helsinki | Presentation |
5 | Digitized newspaper clippings | Tuula Pääkkönen, National Library of Finland | Presentation. In this presentation we go through the clipping functionality of the National Library's https://digi.kansalliskirjasto.fi service and the kinds of newspaper clipping collections that already exist. We invite you to make use of the more than 15 million digitized pages of newspapers and journals. |
6 | Handwritten text recognition & search platform for historical court records | Sampo Viiri and Ville-Pekka Kääriäinen, National Archives of Finland | https://transkribus.eu/r/kansallisarkisto/ platform, presentation |
7 | Learning to understand languages with neural networks | Jörg Tiedemann, University of Helsinki | FoTran project, presentation. Natural language understanding is the “holy grail” of computational linguistics and a long-term goal in research on artificial intelligence. The aim of the project is to develop models that learn to understand human languages by training on implicit information given by large collections of human translations. Translations are considered alternative explanations that provide additional views on information encoded in natural language. In FoTran we apply massively parallel data sets to acquire language-agnostic meaning representations that can be used for reasoning with natural languages and for other downstream tasks that require a deep understanding of the linguistic input. |
8 | Linked Open Data Service about Historical Finnish Academic People in 1640–1899 | Petri Leskinen and Eero Hyvönen, University of Helsinki and Aalto University | AcademySampo – Finnish Academic People 1640–1899 project, presentation |
9 | Exploring the Identity of the Ruling Elite in Cuneiform Text | Heidi Jauhiainen, University of Helsinki | Centre of Excellence on Ancient Near Eastern Empires, presentation |
10 | Overcoming Civil Wars? Comparative Conflict Resolution Models for Generational Recovery | Jussi Pakkasvirta, University of Helsinki | Presentation |
11 | Studying pseudo-history and text reuse in Finnish and Russian internet discussions | Reima Välimäki, Heta Aali, Mila Oiva, Anna Ristilä and Harri Hihnala, University of Turku | The Ancient Finnish Kings: a computational study of pseudohistory, medievalism and history politics in contemporary Finland and Russia (2019–21) project, presentation |
12 | Time Machine – Finland’s agenda in the massive European flagship project | Tomi Ahoranta, National Archives of Finland | Time Machine project, presentation. Time Machine is a massive European initiative that aims to solve some of the biggest challenges of our time in the field of digital cultural heritage. If everything goes smoothly, the project will begin in 2021. In my presentation, I will explain what the project is about and what we are doing in Finland to connect to it. Please visit www.timemachine.eu to find out more about the planned project. You are also welcome to join our team at https://www.timemachine.eu/registration/. |
12:30–13:30 | Lunch (on your own) | ||
13:30–14:30 | Keynote | Chair: Eero Hyvönen | |
13 | Legislative data portals and linked data quality | Prof. Jose Emilio Labra Gayo, University of Oviedo | Presentation. Bio: Jose Emilio Labra Gayo, PhD, from the University of Oviedo, Spain, is the main researcher of the WESO (WEb Semantics Oviedo) research group, which applies semantic technologies to domains such as public administration, eProcurement, and life sciences. He was a member of the W3C Data Shapes working group, is a co-author of the book “Validating RDF data” (http://book.validatingrdf.com), and maintains the online RDF validation service RDFShape (http://rdfshape.weso.es). |
14:30–15:00 | Coffee Break | ||
15:00–17:00 | Applications | Chair: Krista Lagus | |
14 | ‘Metoo machine’ for those who do not campaign | Minna Ruckenstein, University of Helsinki | Lääketutka website, presentation. Building on the findings of a study of antidepressants and their life-effects that made use of a computational tool, Medicine Radar (Lääketutka), this presentation argues that large datasets can be used to uncover first-person experiences that need more attention. |
15 | Disappearing Discourses: Avoiding anachronisms and teleology with data-driven methods in studying digital newspaper collections | Elaine Zosa, Simon Hengchen, Lidia Pivovarova, Jani Marjanen and Mikko Tolonen, University of Helsinki | Presentation. Research on the past tends to focus on topics that are relevant for the present. Recent unsupervised approaches allow us to change perspective and concentrate on concepts that were important at the time of original publication, even if they have since become less central. We claim that there is great potential in looking for themes that disappeared once new topics and values took over in the public sphere, as they capture relevant parts of the historical experiences of past readers. This paper aims to identify such disappearing discourses by applying dynamic topic modeling to the collection of Finnish newspapers from the 19th to the early 20th century. |
16 | Bibliographic Data Harmonization in Research | Leo Lahti and Helsinki Computational History Group, University of Turku and University of Helsinki | Helsinki Computational History Group, Turku Data Science Group, presentation. The research potential of bibliographic metadata collections has been recognized for decades, but questions of data representativeness, completeness, and reliability have posed challenges for large-scale research use. Structured metadata collections can also substantially complement and support the analysis of other digital data streams, such as full texts or audiovisual material. We showcase how a systematic algorithmic framework for large-scale data harmonization has helped us overcome these challenges and generate new insights into broad historical patterns of knowledge production in Finland and Europe. |
17 | New words in early English letters: How to find them and what they can reveal | Tanja Säily, Eetu Mäkelä and Mika Hämäläinen, University of Helsinki | Presentation. We apply a big-data approach to analysing the use of new vocabulary in a sociohistorical corpus. Our contribution is threefold: (1) we study a wider range of neologisms than previous corpus-based research has done; (2) to enable such a large-scale investigation, we develop a semi-automated pipeline; and (3) while building upon existing historical research and resources, we cover a wider social spectrum, as previous work is biased towards published texts by well-known authors. We present a case study of 17th-century neologisms identified through our pipeline. In addition to analysing their social embedding (who used them and why), we will discuss problems and solutions regarding the pipeline under development, including the spelling normalization required to map the words across the resources. |
18 | Visualizing Mito: From text to a procedural view of Japanese intellectual historiography | Aliz Horvath, University of Chicago | Presentation. Digital humanities constitutes a rapidly developing “field”, but it is still dominated primarily by inquiries focused on Western themes and texts. In this talk, I will introduce a possible application of digital tools, specifically data visualizations, to the intellectual history of historiography in East Asia. Focusing on the procedural study of the Japanese Mito School, a controversial scholarly group that compiled the Dai Nihonshi (The History of Great Japan, 1657–1906), the most monumental work of history writing in Japan, my project explores the shifting dynamics of intellectual history and historiography, as well as the significance of foreign elements in the formation of nationalism in an East Asian context. Given the scale of the subject, namely the length of the Dai Nihonshi, the 250-year-long compilation process, and the large number of contributors (more than 150 individuals), I developed a hybrid, integrated methodology for processing the large amount of data, intertwining close reading of the Dai Nihonshi and the compilers’ individual records with embedded visualizations of the authors’ biographical details. My presentation will explain how dealing with non-Latin scripts affected the research process that led from text to knowledge. |
19 | Tracing democratization in (big and messy) digital newspaper archives | Turo Hiltunen, Turo Vartiainen and Minna Palander-Collin, University of Helsinki | Democratization, Mediatization and Language Practices project, presentation. The British Library Newspapers database provides a plentiful source for linguistic research, with its 2 million pages of national and regional newspapers from 1732 to 1950. The data exist as OCRed text files, which, in principle, constitute an ideal source for exploring how societal changes such as democratization and changes in language practices are intertwined. In practice, however, taming the data to the point where it can be used to answer our sociolinguistic research questions has been a complex process. We shall elaborate on this process and on methodological issues including the granularity of the data, the lack of linguistic annotation, the quality of the scanned documents, and representativeness. |
20 | Internet Folklore and Online Mediated Identity – A Netnography Study in Nyishi Community, Arunachal Pradesh | Deepika Kashyap, University of Tartu and University of Hyderabad | Presentation. Modern technologies and innovations have transformed the culture and tradition of the Nyishi community, creating a new identity for the Nyishi people through the internet. The spread of the internet, or the WWW (World Wide Web), has allowed the folk to express and represent their “lore” (culture, custom, and tradition) to a wider audience. The internet has also opened up a new mode of communication in which people can easily create and circulate messages. With the advancement of technology and the internet, many cultures, traditions, and folklores around the world have been revived and are reaching people across geographical and temporal boundaries. Nowadays people from many communities are coming online, creating a space of their own, and expressing their identity, culture, and agency. Nyishi folklore is also gaining momentum with the help of internet technology. |
21 | Language Technology for Publishing and Using Finnish Legislation and Case Law on the Semantic Web | Minna Tamper, Arttu Oksanen, Sami Sarsa, Jouni Tuominen, Aki Hietanen and Eero Hyvönen, Aalto University, Ministry of Justice, Edita Ltd and University of Helsinki | ANOPPI, APPI, LawSampo and Semantic Finlex project, presentation |
22 | Four Generations of Publishing and Using Texts in Digital Humanities: Forging Sampo Portals in the Digital Age | Eero Hyvönen, University of Helsinki and Aalto University | LODI4DH project and Sampo Portals, presentation |
23 | How to Use Linked Data Infrastructure for Digital Humanities? – Practical View | Jouni Tuominen, University of Helsinki and Aalto University | Presentation. This talk presents a practical view of how Linked Data can be used for Digital Humanities, with a focus on SPARQL queries for accessing the data for analysis purposes. |
24 | Current work in Helsinki Computational History Group | Mikko Tolonen, University of Helsinki | Helsinki Computational History Group, presentation |
25 | FCAI Special Interest Group in Language, Speech and Cognition | Jörg Tiedemann, University of Helsinki | Presentation |
17:00–18:00 | Networking, Nibbles, and Bubbles | | |