NewsEye, a multidisciplinary research project at the University of Helsinki, has received funding that allows experts in digital humanities, computer science, and library science to work on digitalised memory material. The experts are cooperating to produce results that are relevant and usable for research, teaching, and other uses of the material.
Data science automated by means of artificial intelligence
The focus of the research is on data science, and the project is headed by Professor Hannu Toivonen.
“The most interesting research object is the automated research assistant, which can independently use new tools developed in the project to search for results that are interesting to the user, report its findings in clear text, and explain the findings and its own work. This is our objective in Helsinki,” Professor Toivonen, known as a specialist in creative computing, says.
Newspapers digitalised in Mikkeli became machine-readable big data material
The starting point is the National Library’s material that has already been digitalised. The objective is to recover text automatically from the digitalised material, transforming images into text, and recovering separate articles.
The National Library will deliver historical Finnish newspaper material from the years 1771-1910 for NewsEye to process. It has digitalised all the Finnish newspapers that appeared during this time period and made them into a machine-readable data packet. The material will be complemented with newspaper material from 1911-1917. The National Library's extensive digitalisation work is carried out in its offices in Mikkeli.
A tool for analysing enriched text from different viewpoints
The project is also working on enriching text automatically by recovering names and attitudes from text. The Finnish researchers are also focusing on developing new tools for analysing enriched text from different viewpoints so that different contexts and baselines are observed.
Hannu Toivonen gives an example of how the automated research assistant would work if contexts and baselines have been observed:
“Say a user is interested in their family history and gives their surname for analysis. The research assistant will look for the surname in old newspapers, and also check the context in which the name appears. The assistant will observe that it is a surname, compare its contexts with those of other surnames, and then tell the user which contexts are especially frequent in connection with the given surname. Further, the assistant can report how the contexts have changed with time.”
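The comparison Toivonen describes can be illustrated with a minimal sketch. It is not the project's method, which the text does not specify. The function names, the toy sentences, the window size, and the add-one smoothed frequency ratio are all assumptions made for illustration. The idea is to count the words that appear near the given surname, count the words that appear near other surnames as a baseline, and rank the words that are relatively more frequent around the given surname.

```python
from collections import Counter


def context_counts(docs, surnames, window=2):
    """Count words that occur within `window` tokens of any listed surname."""
    surnames = {s.lower() for s in surnames}
    counts = Counter()
    for text in docs:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok not in surnames:
                continue
            lo, hi = max(0, i - window), i + window + 1
            for j in range(lo, min(hi, len(tokens))):
                # Skip the surname itself and any other surnames nearby.
                if j != i and tokens[j] not in surnames:
                    counts[tokens[j]] += 1
    return counts


def distinctive_contexts(docs, target, baseline_surnames, window=2, top=2):
    """Rank context words by how much more frequent they are near `target`
    than near the baseline surnames (add-one smoothed ratio, an assumption)."""
    target_counts = context_counts(docs, [target], window)
    base_counts = context_counts(docs, baseline_surnames, window)
    t_total = sum(target_counts.values())
    b_total = sum(base_counts.values())
    vocab = len(set(target_counts) | set(base_counts)) or 1
    scores = {}
    for word, n in target_counts.items():
        p_target = (n + 1) / (t_total + vocab)
        p_base = (base_counts[word] + 1) / (b_total + vocab)
        scores[word] = p_target / p_base
    # Highest ratio first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


# Hypothetical toy corpus; real input would be OCR text from the newspapers.
toy_docs = [
    "merchant virtanen sold timber in viipuri",
    "merchant virtanen bought timber",
    "farmer korhonen sold rye",
    "farmer nieminen sold rye in turku",
]

result = distinctive_contexts(toy_docs, "virtanen", ["korhonen", "nieminen"])
print(result)
```

Running the same ranking separately on material from different decades would give a rough picture of how the contexts change with time, again only as an illustration of the idea.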
A project that at best handles several languages in parallel
In the inter-disciplinary project NewsEye: A Digital Investigator for Historical Newspapers, the funding share for the University of Helsinki is 900,000 euros. The work started recently and will continue for three years. The total European Union Horizon 2020 funding for the project is 3 million euros.
From the University of Helsinki, the participants are Professor Hannu Toivonen, historian Mikko Tolonen and his research group, and from the National Library, Minna Kaukonen and her group of researchers. Similar tri-disciplinary teams from France and Austria are participating, as well as one German partner.
Multi-lingual novelty rare in Europe
The multi-lingual feature is a novelty; the methods and tools will be made as independent of language as possible, or at best they will be able to work with different languages at the same time. According to the researchers, this is important – but very rare – in a European context.
The National Library of Finland is the oldest and largest scholarly library in Finland.
It is responsible for the collection, description, preservation and accessibility of Finland’s printed national heritage and the unique collections under its care.
At the University of Helsinki, NewsEye is a core activity of both HiDATA and HELDIG.
HELDIG, Helsinki Centre for Digital Humanities, is a Finnish research network and infrastructure for solving research problems in humanities and social sciences with novel computational methods, and for studying digitalization as a phenomenon. HELDIG also supports education in Digital Humanities and application development.
HiDATA, Helsinki Centre for Data Science, is organized as a large multi-disciplinary network of researchers working on both methods and applications, supported by the emerging Data Science infrastructure. HiDATA is a joint hub of the two participating universities, the University of Helsinki and Aalto University.
HiDATA is arranging a kick-off event on Tuesday, May 29th, in ThinkCorner from 9 a.m. to 3 p.m.
Hannu Toivonen, Faculty of Science, firstname.lastname@example.org, http://www.cs.helsinki.fi/hannu.toivonen/, +358 50 9112405
Minna Kaukonen, National Library, email@example.com, +358 50 4155 450
Mikko Tolonen, National Library, firstname.lastname@example.org, +358 50 448 2055
Science Communicator Minna Meriläinen-Tenhu, @MinnaMeriTenhu, +358 50 415 0316, email@example.com