Researchers are clearing language barriers for automated news, aiming for an increasingly varied view of the world

Researchers at the University of Helsinki are developing news automation and methods for retrieving information from large bodies of text, in cooperation with five other universities and the Finnish News Agency STT.

How can the essential content of news in various languages be found automatically? How might a computer write news fluently, and can the technology adapt to small language areas such as Finland?

These are among the challenges to be tackled by EMBEDDIA, an EU-funded research project launching in 2019 in which the University of Helsinki is participating. The three-year project will develop methods for automated text analysis and generation.

One of the goals of the project is to simplify searching for information in online news, regardless of the language in which it is written.

“Combining news written in several languages widens the perspective on the subject at hand, while making it possible to find out what is written on the item in different languages and in different media. The goal is to improve people's access to information,” says Professor of Computer Science Hannu Toivonen, whose research group is taking part in the project.

The University of Helsinki’s Swedish School of Social Science is also a project participant, focused on investigating the needs of media companies.

“This project opens fascinating avenues into developing entirely new solutions for media to utilise. Ensuring a genuine demand for them is also important,” says Docent Carl-Gustav Lindén, a researcher of media and journalism.

Computers have the capacity to report on every single game

Many media companies are already employing automated news for reporting on sports and elections. Using structured data, computers are able to write news articles. For example, ice hockey games are a comfortably regular phenomenon from a computational viewpoint: they consist of three periods, resulting in an unambiguous number of goals.
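As a rough illustration of how structured data can be turned into a sentence of news, the sketch below fills a fixed template from a game record. It is a minimal example under stated assumptions, not the method used by the researchers or by any media company: the `GameResult` record, the `write_report` function, the team names and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GameResult:
    """Hypothetical structured record for one ice hockey game."""
    home: str
    away: str
    # Goals scored in each period as (home, away) pairs.
    period_goals: List[Tuple[int, int]]


def final_score(game: GameResult) -> Tuple[int, int]:
    """Add up the goals of every period to get the final score."""
    home = sum(h for h, _ in game.period_goals)
    away = sum(a for _, a in game.period_goals)
    return home, away


def write_report(game: GameResult) -> str:
    """Pick a sentence template based on the outcome and fill it in."""
    home_goals, away_goals = final_score(game)
    if home_goals > away_goals:
        return f"{game.home} beat {game.away} {home_goals}-{away_goals}."
    if away_goals > home_goals:
        return f"{game.away} beat {game.home} {away_goals}-{home_goals} away from home."
    return f"{game.home} and {game.away} were level at {home_goals}-{away_goals} after regulation time."


# Invented example data: three periods, as in the article's description.
game = GameResult("Team A", "Team B", [(1, 0), (0, 2), (2, 0)])
report = write_report(game)
print(report)
```

Because the input always has the same shape, the same few templates can cover any number of games, which is what makes this kind of coverage cheap to scale.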

According to Toivonen, news automation is useful because it enables the production of a great amount of news from consistent data. Computers can write articles on local hockey games even for a handful of readers.

“In such cases, the audience of a single piece of news may be small, but when the number of articles is great, media businesses both achieve extensive coverage and respond to specific needs,” Toivonen explains.

For now, automated news consists of election and sports coverage and the like, generated in a structured manner from structured data. In-depth profiles and news analyses produced by computers are still some way off, since computers are not yet able to handle the linguistic and content variation of these text types.

“Reporters are still needed. The nature of the profession may evolve, and meta-editorial elements will be involved. For example, journalists may instruct computers on reporting various subject matter,” Toivonen says.

“Such developments don’t necessarily apply to all journalists, but everyone must understand the direction the world of media is taking and the possibilities generated by new technologies,” Lindén adds.

Increasingly creative content through metaphors

In the EMBEDDIA project, Toivonen's group is focusing on how to enable computers to produce news automatically, as efficiently as possible and in several languages. This is a continuation of the group’s earlier research on news automation (in Finnish and Swedish only).

Modern technology provides computers with the ability to create relatively smooth content on election results, but they are not yet good at writing vividly. A creative touch is now being sought for both text structures and word choice.

“Metaphors employ structures that can be taught to computers, at least to a degree. This is how we hope to put a little colour into the language,” says Toivonen.

A partnership of universities and media businesses

The University of Helsinki is participating in EMBEDDIA, a research project launching in 2019 that develops news automation across language boundaries.

The three-year project is funded by the EU’s Horizon 2020 research programme. The University of Helsinki’s share of the funding totals approximately €450,000.

In addition to the University of Helsinki, five other European universities are taking part in EMBEDDIA, as are the Finnish News Agency STT and three other media businesses.

The name of the project derives from machine learning technologies known as word embeddings, which learn relations between words based on the contexts of their occurrences. The multilingual word-embedding models to be developed in the project will help computers find connections between texts written in different languages.
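The idea of connecting words across languages through embeddings can be sketched with a toy example. This is only an illustration under loud assumptions: the three-dimensional vectors below are written by hand rather than learned from text, and the `EMBEDDINGS` table and the `closest` function are hypothetical names, not part of the project's models. Real word embeddings are learned from large text collections and have far more dimensions.

```python
import math

# Hand-written toy vectors in a shared space (NOT learned embeddings).
# Keys are (language, word) pairs; the values are invented for illustration.
EMBEDDINGS = {
    ("en", "election"): [0.9, 0.1, 0.0],
    ("en", "goal"): [0.1, 0.9, 0.1],
    ("fi", "vaalit"): [0.8, 0.2, 0.1],
    ("fi", "maali"): [0.2, 0.8, 0.0],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest(word, source_lang, target_lang):
    """Return the target-language word whose vector is most similar."""
    query = EMBEDDINGS[(source_lang, word)]
    candidates = [
        (target_word, vector)
        for (lang, target_word), vector in EMBEDDINGS.items()
        if lang == target_lang
    ]
    best_word, _ = max(candidates, key=lambda item: cosine(query, item[1]))
    return best_word


print(closest("election", "en", "fi"))
print(closest("goal", "en", "fi"))
```

With these invented vectors, the English word "election" lands nearest the Finnish "vaalit" and "goal" nearest "maali", which is the kind of cross-language link a multilingual embedding space is meant to provide.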