Infrastructures for Digital Humanities
October 23 (Tuesday), 2018, 9:00–19:00
University of Helsinki, Main Building, Small Hall (Pieni juhlasali), 4050
Fabianinkatu 33, Helsinki, FINLAND
Helsinki Centre for Digital Humanities (HELDIG) was launched by a kick-off symposium on Oct 6, 2016 that was attended by some 200 friends of Digital Humanities. The first HELDIG Digital Humanities Summit in 2017 provided a snapshot of activities within the centre and its collaboration network after the first year of operation, facilitating networking and sharing results within the Finnish community of Digital Humanities research and education and beyond. In 2018, the overarching theme of HELDIG Digital Humanities Summit is Infrastructures for Digital Humanities.
The Summit 2018 will present a picture of the Finnish Digital Humanities Infrastructure landscape and contrast it with international developments: Where are we now? What services are available? How are they used, and what are the next steps ahead? To stimulate discussion, the day starts with the keynote "Infrastructures and Interfaces for DH Research: Dutch Experiences and Expectations" by Prof. Charles van den Heuvel from Huygens Institute and University of Amsterdam. This is followed by presentations on existing data, vocabulary, and ontology infrastructures, language technology services, and data infrastructures related to museums, libraries, and archives. At the end of the day, bubbles and nibbles are served at a social networking event.
Participation in HELDIG Digital Humanities Summit 2018 is open and free, but registration is required for catering. Register by Monday, October 15, at the latest.
For more information, please contact HELDIG coordinator Jouni Tuominen or director Eero Hyvönen (see HELDIG Contact Info).
Participating Finnish Organizations:
- HELDIG, Helsinki Centre for Digital Humanities (organizer)
- Aalto, Aalto University
- CSC, CSC - IT Center for Science Ltd
- Edita, Edita Publishing Ltd
- FHA, Finnish Heritage Agency
- FSD/UTA, Finnish Social Science Data Archive, University of Tampere
- HUL, Helsinki University Library
- Kotus, Institute for the Languages of Finland
- MJF, Ministry of Justice, Finland
- NAF, National Archives of Finland
- NLF, National Library of Finland
- SKS, Finnish Literature Society
- SLS, Society of Swedish Literature in Finland
- UH, University of Helsinki
- UTU, University of Turku
- Yle, National Broadcasting Company
|09:00-09:05||HELDIG Summit 2018 Opening||Eero Hyvönen (HELDIG, Aalto)||Presentation|
|1||Infrastructures and Interfaces for Digital Humanities Research: Dutch Experiences and Expectations||Prof. Charles van den Heuvel (Huygens ING, University of Amsterdam)||
|10:30-11:30||International Infrastructure Collaborations||Chair: Eetu Mäkelä|
|2||CLARIN – Digital Research Infrastructure for the Humanities and Social Sciences||Mietta Lennes (HELDIG)||FIN-CLARIN homepage, Kielipankki – The Language Bank of Finland, presentation
The FIN-CLARIN consortium is the Finnish part of the European CLARIN collaboration building a research infrastructure for language-related resources in Humanities and Social Sciences.
The Language Bank of Finland is a collection of resources, tools and services for researchers.
|3||Finnish Social Science Data Archive and CESSDA ERIC: Trusted, Sustainable and Integrated Infrastructures||Mari Kleemola (FSD/UTA)||FSD homepage, presentation
The Finnish Social Science Data Archive (FSD) provides a single point of access to a wide range of digital research data for learning, teaching and research purposes. FSD is a CTS certified repository and implements the FAIR data principles responsibly, promoting open access to research data as well as transparency, accumulation and efficient reuse of research data. CESSDA ERIC, Consortium of European Social Science Data Archives, is a research infrastructure for social science data archives in Europe. FSD is the Finnish Service Provider for CESSDA. This presentation will give an overview of FSD’s and CESSDA’s current services for researchers. These include, for example, long-term preservation of data, FSD’s data portal Aila, CESSDA’s European data catalogue, online guides on data management, and information services. In addition, I will give a glimpse of the forthcoming Social Sciences and Humanities Open Cloud (SSHOC) project, led by CESSDA, which plans to realise the social sciences and humanities part of the European Open Science Cloud (EOSC). All SSH ESFRI Landmarks and Projects, as well as relevant international SSH data infrastructures and the association of European research libraries (LIBER), participate in SSHOC.
|4||Towards DARIAH – Digital Research Infrastructure for the Arts and Humanities. DESIR project.||Mikko Tolonen (HELDIG), Maija Paavolainen (HUL)||DESIR homepage, presentation
The University of Helsinki is taking part in the H2020 DESIR project (2017–19) to make the Finnish DH community acquainted with DARIAH-EU. We will present our progress and future goals, and invite you to join in developing ideas for DARIAH-FI.
|5||LODI4DH – Linked Open Data Infrastructure for Digital Humanities||Eero Hyvönen (HELDIG, Aalto), Jouni Tuominen (HELDIG)||LODI4DH homepage, presentation
LODI4DH is a joint initiative of the Department of Computer Science at Aalto University and the HELDIG Centre for Digital Humanities at the University of Helsinki for creating centralized national Linked Data services for open science. The services enable the publication and use of datasets for data-intensive DH research in structured, standardized formats via open interfaces. LODI4DH builds on the large collaboration network and software created in a long line of national DH projects between UH and Aalto since 2002. These projects produced several in-use infrastructure prototypes, such as the ONKI ontology service, the Finto ontology service at the National Library of Finland (which deployed the SKOS-based parts of ONKI as a national service and has been developing them further), and the Linked Data Finland platform LDF.fi.
|11:30-12:15||International Projects||Chair: Eetu Mäkelä|
|6||READ Project: Handwritten Text Analysis Service||Maria Kallio (NAF)||READ homepage, presentation|
|7||NewsEye Project Seeks Significant Scientific Advances in Several Directions||Juha Rautiainen (NLF), Ruben Ros and Helsinki Computational History Group (UH)||NewsEye homepage, abstract, presentation
In the interdisciplinary, multidisciplinary and multilingual NewsEye project, national libraries, humanities and social science research groups, and computer science research groups are addressing a number of challenges in several directions. The project, launched in May 2018, develops integrated tools and methods for effective exploration and exploitation of digital newspapers by means of new technologies. The aim is to set a new standard by working on the interaction between methods such as layout analysis, automatic text recognition and article separation and various newly incorporated semantic approaches.
|8||Copyrighted Media for Research? Yle, MeMAD Project, and Future Plans||Lauri Saarikoski (Yle)||MeMAD homepage, presentation
This presentation will give a brief overview of the MeMAD project and its recent developments in the fields of automated audiovisual analysis and multilingual machine translation. Reflecting on this project, the presentation will discuss possibilities for making broadcasting archives more available for research.
|12:15-13:15||Lunch (on your own)|
|13:15-14:30||Vocabulary and Data Infrastructures||Chair: Jouni Tuominen|
|9||NameSampo: A Linked Open Data Infrastructure and Workbench for Toponomastic Research||Esko Ikkala (HELDIG, Aalto), Jouni Tuominen (HELDIG), Jaakko Raunamaa (UH), Tiina Aalto (UH), Terhi Ainiala (UH), Helinä Uusitalo (Kotus), Eero Hyvönen (HELDIG, Aalto)||NameSampo homepage, presentation
We present a series of projects in which one of the main sources for toponomastic research in Finland, the place name corpora in the Names Archive database of the Institute for the Languages of Finland, was digitized, and show how the resulting database was converted, enriched and published as Linked Open Data using a data processing pipeline. Utilizing the Linked Data infrastructure and various external data sources, a modern full-stack web application, NameSampo, was created in collaboration between toponomastic researchers and computer scientists for searching, analyzing, and visualizing digital toponomastic data sources.
|10||A Shared Agent Metadata Service for the Memory Organization Sector||Matias Frosterus (NLF)||Presentation
Libraries, archives and museums have begun to align their metadata in order to prevent overlap in cataloguing, improve the quality of the metadata, and support better common user interfaces like Finna. After two preliminary reports on a shared agent metadata service, a more concrete plan for a pilot is being developed. The pilot project would also consider the possibilities of linking to public administration sources such as trade registries and population information systems.
|11||The Helsinki Term Bank for the Arts and Sciences – Connecting People, Discourses and Disciplines||Johanna Enqvist (UH)||Helsinki Term Bank homepage, presentation
“The Helsinki Term Bank for the Arts and Sciences” (HTB) is a multidisciplinary research infrastructure project which is constructing both a digital terminological resource and an innovative form of academic collaboration and publishing. The project aims to build an open, permanent and continuously updated terminological database for all fields of research in Finland. The HTB maintains a wiki-based website which offers a collaborative platform for terminological work and conceptual analysis for experts, and a discussion forum available to all registered users.
|12||National Infrastructures of CSC for Digital Humanities||Jessica Parland-von Essen (CSC)||CSC homepage, presentation
CSC – IT Center for Science offers a wide range of services for researchers, from computing and cloud environments to tools and data sharing. A quick overview and the latest service developments will be presented.
|13||Linked Data and Terminology||Lauri Carlson (HELDIG)||
*** Presentation canceled due to illness ***
|14:30-15:30||Language Technology Infrastructures||Chair: Jouni Tuominen|
|14||Turku Natural Language Processing Infrastructures||Aleksi Vesanto (UTU)||Turku NLP homepage, presentation
The Finnish Internet Parsebank is an 8 billion token corpus of web-crawled Finnish text, including automatically produced morpho-syntactic analysis. Such a large-scale corpus is optimal for providing material for language technology as well as linguistics research. The Turku neural parser pipeline, used for word and sentence segmentation, lemmatization, morphological tagging and syntactic analysis, is a stand-alone Python-based pipeline with state-of-the-art models available for over 50 languages. BLAST is robust text reuse detection software capable of processing massive databases and detecting repeated passages even when the data itself is very noisy. This presentation gives a brief introduction to the above corpus and tools.
|15||Automated Subject Indexing and Classification Using Annif||Osma Suominen (NLF)||Annif homepage, presentation
Manually indexing documents for subject-based access is a very labour-intensive intellectual process. A machine could perform similar subject indexing much faster. However, an algorithm needs to be trained and tested with examples of indexed documents. Libraries have a lot of training data in the form of bibliographic databases, but often only a title is available, not the full text. We propose to leverage both title-only metadata and, when available, already indexed full-text documents to help index new documents. To do so, we are developing Annif, an open source tool for automated indexing and classification.
|16||SeCo Text Annotation Services / Language tools in service of the humanities/social sciences||Minna Tamper (Aalto), Eetu Mäkelä (HELDIG)||Presentation
Much of the primary data of Digital Humanities is available only in textual form, and there is an ever-growing need for structuring it for semantic analysis. Transforming Cultural Heritage texts into a knowledge graph and a Linked Data service provides a flexible interface for using the data in different types of analyses.
As part of the Humanities-Computing Interaction research strand, multiple tools for applying language analysis in the service of the humanities and social sciences have been (and are being) developed; they are presented here.
|17||Anonymization Service for Texts and Semantic Finlex||Saara Packalén (MJF), Aki Hietanen (MJF), Minna Tamper (Aalto), Arttu Oksanen (Edita, Aalto), Jouni Tuominen (HELDIG), Eero Hyvönen (HELDIG, Aalto)||Presentation
The project focuses on producing tools to anonymize and annotate documents and records that contain personal data, e.g. court decisions. The tools are developed using language technology methods. The results of the project are based on open source data and are therefore widely applicable. The project was launched by the Ministry of Justice and is carried out in co-operation with the University of Helsinki (HELDIG), Aalto University and Edita Publishing Oy.
|16:00-17:30||Museum, Library, and Archive Infrastructures||Chair: Mikko Tolonen|
|18||SuALT – Infrastructure for Finnish Archeological Finds||Suzie Thomas (UH), Anna Wessman (UH), Eero Hyvönen (Aalto, HELDIG), Jouni Tuominen (HELDIG, Aalto), Esko Ikkala (Aalto, HELDIG), Mikko Koho (Aalto), Ulla Salmela (FHA), Jutta Kuitunen (FHA), Marianna Niukkanen (FHA), Miikka Haimila (FHA), Ville Rohiola (FHA)||SuALT homepage, presentation
SuALT – The Finnish Archaeological Finds Recording Linked Open Database – is a concept for a digital web service for collecting information about archaeological finds made by the public, especially metal detectorists. In recent years, the growing flow of new archaeological finds and data produced by the public has presented unprecedented challenges to metal detectorists, researchers and cultural heritage managers, particularly in the Archaeological Collections of the Finnish Heritage Agency (Fi. Museovirasto), which manages the data about the finds. As a multidisciplinary research project, SuALT develops innovative solutions for reporting, collecting and managing metal detecting finds, applying citizen science and semantic computing. To address different user needs, SuALT supports the management of digital collections and enables a variety of searches and analyses of the open access data. To make new data easily obtainable, finds are connected to existing collections nationally and internationally. SuALT is by nature a participatory project where the metal detecting community, students, researchers and the museum authorities can influence the data in a democratic way from the grassroots level.
|19||Finnish National Bibliography Fennica as Linked Open Data||Osma Suominen (NLF)||data.nationallibrary.fi, presentation
The National Library of Finland has made our national bibliography Fennica available as Linked Open Data. In the process, we are clustering works extracted from the bibliographic records, reconciling entities against internal and external authorities, cleaning up many aspects of the data and linking it to further resources. The Linked Data set is CC0 licensed and available for browsing, SPARQL querying and downloading at the data.nationallibrary.fi web site.
|20||Re-defining Our Services – National Library’s New Initiatives to Support the Open Science||Johanna Lilja (NLF), Jussi-Pekka Hakkarainen (NLF)||Project homepage, abstract, presentation
Any relationship needs to be pampered from time to time. You've got to respect your partner, listen to her, bring flowers, prepare a dinner and talk, talk, talk. This applies to the relationship between researchers and libraries as well – a small spark may kindle a great fire, and therefore we at the National Library are defining our services for researchers in a new way...
|21||Towards More Flexible Digitizing Selections at the National Archives of Finland||Tomi Ahoranta (NAF)||Presentation
The National Archives has been relying on an annual digitizing agenda for more than a decade. Now we are moving towards a more flexible procedure based on need-based, cooperative and voluntary digitizing. What do these new methods offer to digital humanities? How can individual researchers or research projects benefit from them?
|22||A Generic Platform for Digital Editions||Niklas Liljestrand (SLS)||Presentation
This is an ongoing Open Source project by the Society of Swedish Literature in Finland for streamlining the work of publishing Digital Editions (critical editions) on the web. The platform provides tools for working with, among others, TEI-formatted XML and publishing the results on a responsive website with rich opportunities for customization. This presentation will give a visual and technical overview of the platform, its goals and components.
|23||BiographySampo: Infrastructure for Finnish Biographical Data||Petri Leskinen (Aalto), Minna Tamper (Aalto), Esko Ikkala (Aalto, HELDIG), Jouni Tuominen (HELDIG), Heikki Rantala (Aalto, UH), Kirsi Keravuori (SKS), Eero Hyvönen (Aalto, HELDIG)||BiographySampo online, BiographySampo homepage, presentation
BiographySampo is a brand-new data service based on more than 13,000 short biographies in the National Biography and other databases of the Finnish Literature Society. The material, enriched with information from other web sources, offers new tools for carrying out digital humanities research and data analysis of biographies using artificial intelligence and linked open data technologies. We will introduce the system and its features, and discuss their use in biographical and prosopographical research.
|17:30-19:00||Networking and Bubbles||Outside the Small Hall|