Leo Leppänen defends his PhD thesis on Methods for Automated Generation of Natural-Language Reports

On Friday the 21st of April 2023, M.Sc. Leo Leppänen defends his PhD thesis on Methods for Automated Generation of Natural-Language Reports. The thesis is related to research done in the Department of Computer Science and in the Discovery Research group.

M.Sc. Leo Leppänen defends his doctoral thesis Methods for Automated Generation of Natural-Language Reports on Friday the 21st of April 2023 at 13:00 in the University of Helsinki Chemicum building, Auditorium A110 (A. I. Virtasen aukio 1, 1st floor). His opponent is Professor Ehud Reiter (University of Aberdeen, United Kingdom) and the custos is Professor Hannu Toivonen (University of Helsinki). The defence will be held in English.

Leo Leppänen's thesis is part of research done in the Department of Computer Science and in the Discovery Research group at the University of Helsinki. His supervisor has been Professor Hannu Toivonen (University of Helsinki).

Methods for Automated Generation of Natural-Language Reports

The use of computer software to automatically produce natural language texts expressing factual content is of interest to practitioners of multiple fields, ranging from journalists to researchers to educators. This thesis studies natural language report generation from structured data for the purposes of journalism. The topic is approached from three directions.

First, we analyse what requirements the journalistic domain imposes on the software, and how software might be architected to account for those requirements. This includes identifying the key domain norms (such as the "objectivity norm") and business requirements (such as system transferability) and mapping them to software requirements. Based on the identified requirements, we then describe how a modular data-to-text approach to natural language generation can be implemented in the specific context of hard news reporting.

Second, we investigate how the highly domain-specific natural language generation subtask of document planning - deciding what information is to be included in an automatically produced text, and in what order - might be conducted in a less domain-specific manner. To this end, we describe an approach to operationalizing the complex concept of "newsworthiness" in a manner where a natural language generation system can employ it. We also present a broadly applicable baseline method for structuring the content in a data-to-text setting without explicit domain knowledge.

Third, we discuss how bias in text generation systems is perceived by key stakeholders, and whether those perceptions align with the reality of news automation. This discussion includes identifying how automated systems might exhibit bias and how the biases might be - potentially unconsciously - embedded in the systems. As a result, we conclude that common perceptions of automated journalism as fundamentally "unbiased" are unfounded, and that beliefs about "unbiased" automation might have the negative effect of further entrenching pre-existing biases in organizations or society.

Together, through these three avenues, the thesis sketches out a way towards more widespread use of news automation in newsrooms, taking into account the various ethical questions associated with the use of such systems.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-9033-8.

Printed copies will be available on request from Leo Leppänen: leo.leppanen@helsinki.fi.