HELDIG Digital Humanities Summit 2018

Infrastructures for Digital Humanities

October 23 (Tuesday), 2018, 9:00–19:00
University of Helsinki, Main Building, Small Hall (Pieni juhlasali), 4050
Fabianinkatu 33, Helsinki, FINLAND

Helsinki Centre for Digital Humanities (HELDIG) was launched with a kick-off symposium on Oct 6, 2016 that was attended by some 200 friends of Digital Humanities. The previous Summit provided a snapshot of activities within the centre and its collaboration network after the first year of operation, facilitating networking and the sharing of results within the Finnish community of Digital Humanities research and education and beyond. In 2018, the overarching theme of the HELDIG Digital Humanities Summit is Infrastructures for Digital Humanities.

The Summit 2018 will present a picture of the Finnish Digital Humanities infrastructure landscape and contrast it with international developments: Where are we now? What services are available? How are they used, and what are the next steps ahead? To stimulate discussion, the day starts with the keynote "Infrastructures and Interfaces for DH Research: Dutch Experiences and Expectations" by Prof. Charles van den Heuvel from Huygens Institute and University of Amsterdam. The keynote is followed by presentations on existing data, vocabulary, and ontology infrastructures, language technology services, and data infrastructures related to museums, libraries, and archives. At the end of the day, bubbles and nibbles are served in a social networking event.

Registration

Participation in HELDIG Digital Humanities Summit 2018 is open and free, but registration is required for catering. Register by Monday, October 15 at the latest.

For more information, please contact HELDIG coordinator Jouni Tuominen or director Eero Hyvönen.

Programme

Participating Finnish Organizations:

HELDIG, Helsinki Centre for Digital Humanities (organizer)
Aalto, Aalto University
CSC, CSC - IT Center for Science Ltd
Edita, Edita Publishing Ltd
FHA, Finnish Heritage Agency
FSD/UTA, Finnish Social Science Data Archive, University of Tampere
HUL, Helsinki University Library
Kotus, Institute for the Languages of Finland
MJF, Ministry of Justice, Finland
NAF, National Archives of Finland
NLF, National Library of Finland
SKS, Finnish Literature Society
SLS, Society of Swedish Literature in Finland
UH, University of Helsinki
UTU, University of Turku
Yle, National Broadcasting Company

Time	Topic	Presenter(s)	Material
09:00-09:05	HELDIG Summit 2018 Opening	(HELDIG, Aalto)
09:05-10:00	Keynote
1	Infrastructures and Interfaces for Digital Humanities Research: Dutch Experiences and Expectations	Prof. Charles van den Heuvel (Huygens ING, University of Amsterdam)	CLARIAH; CLARIAH Plus; CLARIAH finances the Amsterdam Time Machine; ADAM Net; Amsterdam Time Machine project; Amsterdam Time Machine wiki; Data for History; Golden Agents; Virtual Interiors
10:00-10:30	Coffee Break
10:30-11:30	International Infrastructure Collaborations	Chair:
2	CLARIN – Digital Research Infrastructure for the Humanities and Social Sciences	Mietta Lennes (HELDIG)	The FIN-CLARIN consortium is the Finnish part of the European CLARIN collaboration building a research infrastructure for language-related resources in Humanities and Social Sciences. The Language Bank of Finland is a collection of resources, tools and services for researchers.
3	Finnish Social Science Data Archive and CESSDA ERIC: Trusted, Sustainable and Integrated Infrastructures	Mari Kleemola (FSD/UTA)	The Finnish Social Science Data Archive (FSD) provides a single point of access to a wide range of digital research data for learning, teaching and research purposes. FSD is a CTS-certified repository and implements the FAIR data principles responsibly, promoting open access to research data as well as transparency, accumulation and efficient reuse of research data. CESSDA ERIC, Consortium of European Social Science Data Archives, is a research infrastructure for social science data archives in Europe. FSD is the Finnish Service Provider for CESSDA. This presentation will give an overview of FSD’s and CESSDA’s current services for researchers. These include, for example, long-term preservation of data, FSD’s data portal Aila, CESSDA’s European data catalogue, online guides on data management, and information services. In addition, I will give a glimpse of the forthcoming Social Sciences and Humanities Open Cloud (SSHOC) project, led by CESSDA, which plans to realise the social sciences and humanities part of the European Open Science Cloud (EOSC). All SSH ESFRI Landmarks and Projects, as well as relevant international SSH data infrastructures and the association of European research libraries (LIBER), participate in SSHOC.
4	Towards DARIAH – Digital Research Infrastructure for the Arts and Humanities. DESIR project.	Mikko Tolonen (HELDIG), Maija Paavolainen (HUL)	University of Helsinki is taking part in the DESIR project to make the Finnish DH community acquainted with DARIAH. We will present our progress and future goals, and invite you to join in developing the ideas for DARIAH-FI.
5	LODI4DH – Linked Open Data Infrastructure for Digital Humanities	Eero Hyvönen (HELDIG, Aalto), Jouni Tuominen (HELDIG)	LODI4DH is a joint initiative of Aalto University, Department of Computer Science, and University of Helsinki, HELDIG Centre for Digital Humanities, for creating centralized national Linked Data services for open science. The services enable publication and utilization of datasets for data-intensive DH research in structured, standardized formats via open interfaces. LODI4DH is based on the large collaboration network and software created during a long line of national DH projects between UH and Aalto since 2002. These projects created several in-use infrastructure prototypes, such as the ONKI service, the service at the National Library of Finland (which deployed SKOS-based parts of ONKI as a national service and has been developing them further), and the Linked Data Finland platform.
11:30-12:15	International Projects	Chair: Eetu Mäkelä
6	READ Project: Handwritten Text Analysis Service	Maria Kallio (NAF)
7	NewsEye Project Seeks Significant Scientific Advances in Several Directions	Juha Rautiainen (NLF), Ruben Ros and Helsinki Computational History Group (UH)	In the inter- and multidisciplinary, multilingual NewsEye project, national libraries, humanities and social science research groups, and computer science research groups are addressing a number of challenges in several directions. The project, launched in May 2018, develops integrated tools and methods for effective exploration and exploitation of digital newspapers by means of new technologies. The aim is to set a new standard by working on the interaction between methods such as layout analysis, automatic text recognition and article separation and various newly incorporated semantic approaches.
8	Copyrighted Media for Research? Yle, MeMAD Project, and Future Plans	Lauri Saarikoski (Yle)	This presentation will give a brief overview of the MeMAD project and its recent developments in the fields of automated audiovisual analysis and multilingual machine translation. Reflecting on this project, the presentation will discuss possibilities for making broadcasting archives more available for research.
12:15-13:15	Lunch (on your own)
13:15-14:30	Vocabulary and Data Infrastructures	Chair:
9	NameSampo: A Linked Open Data Infrastructure and Workbench for Toponomastic Research	Esko Ikkala (HELDIG, Aalto), Jouni Tuominen (HELDIG), Jaakko Raunamaa (UH), Tiina Aalto (UH), Terhi Ainiala (UH), Helinä Uusitalo (Kotus), Eero Hyvönen (HELDIG, Aalto)	We present a series of projects in which one of the main sources for toponomastic research in Finland, the corpora of place names in the Names Archive database of the Institute for the Languages of Finland, was digitized, and show how the resulting database was converted, enriched and published as Linked Open Data using a data processing pipeline. Utilizing the Linked Data infrastructure and various external data sources, a modern full-stack web application, NameSampo, was created in collaboration between toponomastic researchers and computer scientists for searching, analyzing, and visualizing digital toponomastic data sources.
10	A Shared Agent Metadata Service for the Memory Organization Sector	Matias Frosterus (NLF)	Libraries, archives and museums have begun to align their metadata in order to prevent overlap in cataloguing, improve the quality of the metadata, and support better common user interfaces like Finna. After two preliminary reports on a shared agent metadata service, a more concrete plan for a pilot is being developed. The pilot project would also consider the possibilities of linking to public administration sources such as trade registries and population information systems.
11	The Helsinki Term Bank for the Arts and Sciences – Connecting People, Discourses and Disciplines	Johanna Enqvist (UH)	“The Helsinki Term Bank for the Arts and Sciences” (HTB) is a multidisciplinary research infrastructure project which is constructing both a digital terminological resource and an innovative form of academic collaboration and publishing. The project aims to build an open, permanent and continuously updated terminological database for all fields of research in Finland. The HTB maintains a wiki-based website which offers a collaborative platform for terminological work and conceptual analysis for experts, and a discussion forum available to all registered users.
12	National Infrastructures of CSC for Digital Humanities	Jessica Parland-von Essen (CSC)	CSC – IT Center for Science offers a wide range of services for researchers, from computing and cloud environments to tools and data sharing. A quick overview and the latest service developments will be presented.
13	Linked Data and Terminology	Lauri Carlson (HELDIG)	* Presentation canceled due to illness *
14:30-15:30	Language Technology Infrastructures	Chair: Jouni Tuominen
14	Turku Natural Language Processing Infrastructures	Aleksi Vesanto (UTU)	The Finnish Internet Parsebank is an 8-billion-token corpus of web-crawled Finnish text including automatically produced morpho-syntactic analysis. Such a large-scale corpus is optimal for providing material for language technology as well as linguistics research. The Turku neural parser pipeline, used for word and sentence segmentation, lemmatization, morphological tagging and syntactic analysis, is a stand-alone Python-based pipeline with state-of-the-art models available for over 50 languages. BLAST is robust text reuse detection software capable of processing massive databases and detecting repeated passages even when the data itself is very noisy. This presentation gives a brief introduction to the above corpus and tools.
15	Automated Subject Indexing and Classification Using Annif	Osma Suominen (NLF)	Manually indexing documents for subject-based access is a very labour-intensive intellectual process. A machine could perform similar subject indexing much faster. However, an algorithm needs to be trained and tested with examples of indexed documents. Libraries have a lot of training data in the form of bibliographic databases, but often only a title is available, not the full text. We propose to leverage both title-only metadata and, when available, already indexed full-text documents to help index new documents. To do so, we are developing Annif, an open source tool for automated indexing and classification.
16	SeCo Text Annotation Services / Language tools in service of the humanities/social sciences	Minna Tamper (Aalto), Eetu Mäkelä (HELDIG)	Much of the primary data of Digital Humanities is available only in textual form, and there is an ever-growing need for structuring it for semantic analysis. Transforming Cultural Heritage texts into a knowledge graph and a Linked Data service provides a flexible interface for using the data in different types of analyses. As part of the Humanities-Computing Interaction research strand, multiple tools for applying language analysis in the service of the humanities and social sciences have been (and are being) developed; these are presented here.
17	Anonymization Service for Texts and Semantic Finlex	Saara Packalén (MJF), Aki Hietanen (MJF), Minna Tamper (Aalto), Arttu Oksanen (Edita, Aalto), Jouni Tuominen (HELDIG), Eero Hyvönen (HELDIG, Aalto)	The project focuses on producing tools to anonymize and annotate documents and records that contain personal data, e.g. court decisions. The tools are developed using language technology methods. The results of the project are based on open source data and are therefore widely applicable. The project was launched by the Ministry of Justice and is carried out in co-operation with the University of Helsinki (HELDIG), Aalto University and Edita Publishing Oy.
15:30-16:00	Coffee Break
16:00-17:30	Museum, Library, and Archive Infrastructures	Chair:
	Museum Collections
18	SuALT – Infrastructure for Finnish Archeological Finds	Suzie Thomas (UH), Anna Wessman (UH), Eero Hyvönen (Aalto, HELDIG), Jouni Tuominen (HELDIG, Aalto), Esko Ikkala (Aalto, HELDIG), Mikko Koho (Aalto), Ulla Salmela (FHA), Jutta Kuitunen (FHA), Marianna Niukkanen (FHA), Miikka Haimila (FHA), Ville Rohiola (FHA)	SuALT – The Finnish Archaeological Finds Recording Linked Open Database – is a concept for a digital web service for collecting information on archaeological finds made by the public, especially metal detectorists. In recent years, the growing flow of new archaeological finds and data produced by the public has presented unprecedented challenges to metal detectorists, researchers and cultural heritage managers, particularly in the Archaeological Collections of the Finnish Heritage Agency (Fi. Museovirasto), which manages the data about the finds. As a multidisciplinary research project, SuALT develops innovative solutions for reporting, collecting and managing metal detecting finds, applying citizen science and semantic computing. To meet different user needs, SuALT supports the management of digital collections and enables a variety of searches and analyses of the open access data. To make new data easily obtainable, finds are connected to existing collections nationally and internationally. SuALT is by nature a participatory project where the metal detecting community, students, researchers and the museum authorities can influence the data in a democratic way from the grassroots level.
	Library Collections
19	Finnish National Bibliography Fennica as Linked Open Data	Osma Suominen (NLF)	The National Library of Finland has made our national bibliography Fennica available as Linked Open Data. In the process, we are clustering works extracted from the bibliographic records, reconciling entities against internal and external authorities, cleaning up many aspects of the data and linking it to further resources. The Linked Data set is CC0 licensed and available for browsing, SPARQL querying and downloading at the data.nationallibrary.fi web site.
20	Re-defining Our Services – National Library’s New Initiatives to Support the Open Science	Johanna Lilja (NLF), Jussi-Pekka Hakkarainen (NLF)	Any relationship needs to be pampered from time to time. You have to respect your partner, listen to her, bring flowers, prepare a dinner and talk, talk, talk. This applies to the relationship between researchers and libraries as well – a small spark may kindle a great fire, and therefore we at the National Library are defining our services for researchers in a new way...
	Archives
21	Towards More Flexible Digitizing Selections at the National Archives of Finland	Tomi Ahoranta (NAF)	The National Archives has been relying on an annual digitizing agenda for more than a decade. Now we are moving towards a more flexible procedure based on need-based, cooperative and voluntary digitizing. What do these new methods offer to digital humanities? How can individual researchers or research projects benefit from them?
22	A Generic Platform for Digital Editions	Niklas Liljestrand (SLS)	An ongoing Open Source project by The Society of Swedish Literature in Finland for streamlining work with publishing Digital Editions (critical editions) on the web. The platform provides tools for working with, among others, TEI formatted XML and publishing the results on a responsive website with rich opportunities for customization. This presentation will give a visual and technical overview of the platform, its goals and components.
23	BiographySampo: Infrastructure for Finnish Biographical Data	Petri Leskinen (Aalto), Minna Tamper (Aalto), Esko Ikkala (Aalto, HELDIG), Jouni Tuominen (HELDIG), Heikki Rantala (Aalto, UH), Kirsi Keravuori (SKS), Eero Hyvönen (Aalto, HELDIG)	BiographySampo is a brand new data service based on more than 13 000 short biographies in the National Biography and other databases of the Finnish Literature Society. The material, enriched with information from other web sources, offers new tools for carrying out digital humanities research and data analysis of biographies using artificial intelligence and linked open data technologies. We will introduce the system and its features, and discuss their use in biographical and prosopographical research.
17:30-19:00	Networking and Bubbles		Outside the Small Hall
