NewsEye, a multidisciplinary research project at the University of Helsinki, has received funding that lets experts in digital humanities, computer science, and library science work together on digitised cultural heritage material. The aim of the cooperation is to produce results that are relevant and usable for research, teaching, and other uses of the material.
Data science automated by means of artificial intelligence
The focus of the research is on data science, and the project is headed by Professor Hannu Toivonen.
“The most interesting research object is the automated research assistant that can use new tools developed in the project independently to search for results that are interesting to the user, report its findings in clear text, and can explain the findings and its own work. This is our objective in Helsinki,” Professor Toivonen, known as a specialist in creative computing, says.
Newspapers digitised in Mikkeli become machine-readable big data
The starting point is the National Library’s material that has already been digitised. The objective is to extract text automatically from the digitised page images and to separate the pages into individual articles.
The National Library will deliver historical Finnish newspaper material from the years 1771-1910 for NewsEye to process. It has digitised all the Finnish newspapers that appeared during this period and compiled them into a machine-readable data set. The material will be complemented with newspaper material from 1911-1917. The National Library carries out its extensive digitisation work at its offices in Mikkeli.
A tool for analysing enriched text from different viewpoints
The project is also working on enriching text automatically by recognising names and attitudes in the text. The Finnish researchers are also focusing on developing new tools for analysing enriched text from different viewpoints so that different contexts and baselines are taken into account.
Hannu Toivonen gives an example of how the automated research assistant would work when contexts and baselines are taken into account:
“Say a user is interested in their family history and gives their surname for analysis. The research assistant will look for the surname in old newspapers, and also check the context in which the name appears. The assistant will observe that it is a surname, compare its contexts with those of other surnames, and then tell the user which contexts are especially frequent in connection with the given surname. Further, the assistant can report how the contexts have changed with time.”
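The comparison Toivonen describes, a surname's contexts set against those of other surnames, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentences are invented, the function names `context_counts` and `distinctive_contexts` are hypothetical, and the smoothed frequency ratio is just one simple way to score "especially frequent" contexts. It is not the method used in NewsEye.

```python
from collections import Counter


def context_counts(sentences, names, window=1):
    """Count the words that occur within `window` tokens of any of `names`."""
    names = {n.lower() for n in names}
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in names:
                start = max(0, i - window)
                end = min(len(tokens), i + window + 1)
                # Collect neighbours, skipping the name itself.
                counts.update(tokens[j] for j in range(start, end) if j != i)
    return counts


def distinctive_contexts(sentences, target, baseline_names, window=1, smoothing=1.0):
    """Rank the target's context words by how much more frequent they are
    around the target than around the baseline surnames (smoothed ratio)."""
    target_counts = context_counts(sentences, [target], window)
    base_counts = context_counts(sentences, baseline_names, window)
    vocab = set(target_counts) | set(base_counts)
    target_total = sum(target_counts.values()) + smoothing * len(vocab)
    base_total = sum(base_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in target_counts:
        p_target = (target_counts[word] + smoothing) / target_total
        p_base = (base_counts[word] + smoothing) / base_total
        scores[word] = p_target / p_base
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented toy data standing in for newspaper text (not real material).
sentences = [
    "merchant virtanen sold timber in viipuri",
    "merchant virtanen opened a shop",
    "farmer korhonen sold grain",
    "farmer nieminen bought a horse",
    "teacher korhonen opened a school",
]

ranked = distinctive_contexts(sentences, "virtanen", ["korhonen", "nieminen"])
print(ranked[0][0])  # prints "merchant"
```

In this toy data, "merchant" appears next to the target surname but never next to the baseline surnames, so it ranks first, while "sold" and "opened" occur around both and score close to 1. Running the same comparison separately for each time period would give a rough view of how the contexts change over time.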
A project that, at best, handles several languages in parallel
In the interdisciplinary project NewsEye: A Digital Investigator for Historical Newspapers, the University of Helsinki's share of the funding is 900,000 euros. The work started recently and will continue for three years. The project's total European Union Horizon 2020 funding is 3 million euros.
From the University of Helsinki, the participants are Professor Hannu Toivonen, historian
Multi-lingual novelty rare in Europe
The multi-lingual feature is a novelty; the methods and tools will be made as independent of language as possible, or at best they will be able to work with different languages at the same time. According to the researchers, this is important – but very rare – in a European context.
More details
The National Library of Finland is responsible for the collection, description, preservation and accessibility of Finland’s printed national heritage and the unique collections under its care.
At the University of Helsinki, NewsEye is a core activity of both HiDATA and HELDIG.
JOIN US!
HiDATA is arranging a kick-off event on Tuesday, May 29th, at Think Corner from 9 a.m. to 3 p.m.
Contact details
Hannu Toivonen, Faculty of Science
Minna Kaukonen, National Library
Mikko Tolonen, National Library
Science Communicator Minna Meriläinen-Tenhu, @MinnaMeriTenhu, +358 50 415 0316