Katja Saarela defends her PhD thesis on Prediction of Work Disability Risk Using Machine Learning

On the 22nd of May 2025, M.Sc., M.Sc. (Tech) Katja Saarela defends her PhD thesis "Prediction of Work Disability Risk Using Machine Learning". The thesis is part of research done in the Department of Computer Science and in the Empirical Software Engineering group.

M.Sc., M.Sc. (Tech) Katja Saarela defends her doctoral thesis "Prediction of Work Disability Risk Using Machine Learning" on Thursday the 22nd of May 2025 at 13:00 in the University of Helsinki Main Building, hall Karolina Eskelin (U3032, Fabianinkatu 33, 3rd floor). Her opponent is Professor Mark van Gils (Tampere University) and the custos is Professor Jukka K. Nurminen (University of Helsinki). The defence will be held in English.

Katja Saarela's thesis is part of research done in the Department of Computer Science and in the Empirical Software Engineering group at the University of Helsinki. Her supervisors have been Professor Jukka K. Nurminen and Professor Tomi Männistö (both University of Helsinki).

Prediction of Work Disability Risk Using Machine Learning

Virtually all developed countries share the problem of too many employees leaving the labor market permanently due to disability or health problems. Work disability means that a person cannot work until retirement age due to illness or disability. The risk factors for disability should be identified promptly, as early intervention is known to be both more effective and more cost-efficient than later treatment. It is therefore important to identify people at high risk of disability retirement as early as possible. The purpose of our study was to find out whether work disability risk can be predicted using machine learning (ML). We followed the Design Science method and evaluated two artifacts for work disability risk prediction using systematic analysis and a comparative study. The results can be utilized in occupational healthcare and pension funds, and they also increase our theoretical understanding of work disability risk prediction.

Stakeholder analysis is essential for understanding the big picture and identifying whom an AI system can affect and how. We identified five stakeholders in work disability risk prediction: the employee, the employer, occupational healthcare, the pension fund, and society. It is in the stakeholders' common interest to keep employees healthy and able to work as long as possible. Information systems, and ML systems in particular, are implemented within an organization to improve effectiveness and efficiency. ML-based methods are typically efficient at screening large amounts of data. However, few ML methods for general work disability risk prediction have been developed or are in use. Hence, in our initial study we designed and implemented an ML method, M_Health, to show that prediction is possible with sufficient accuracy.

M_Health uses occupational healthcare data labeled by doctors and deep learning algorithms based on Natural Language Processing (NLP). It is used in occupational healthcare to help with patient screening. We compared M_Health with another work disability risk prediction method, M_Pension, which uses structured data from pension-decision registers and applies several different ML algorithms. M_Health achieved an accuracy of 72%, and M_Pension an accuracy of 69–78%, depending on the algorithm used. The accuracy, sensitivity, and specificity are sufficient to support expert work. The decision-maker must still be human: responsibility for the decision cannot rest with artificial intelligence but with the expert.
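The evaluation metrics mentioned above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical and serve only to show how the three metrics are computed; they are not data from the thesis.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the three metrics used to evaluate a binary risk classifier.

    tp/fp/tn/fn are true/false positive/negative counts. The example
    counts passed in below are illustrative, not thesis results.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of correct predictions
    sensitivity = tp / (tp + fn)                # true positive rate (recall)
    specificity = tn / (tn + fp)                # true negative rate
    return accuracy, sensitivity, specificity

acc, sens, spec = classification_metrics(tp=72, fp=20, tn=80, fn=28)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a screening setting like this, sensitivity matters most: a missed high-risk employee (false negative) forgoes early intervention, while a false positive only costs an expert review.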

Non-maleficence, accountability and responsibility, transparency and explainability, justice and fairness, and respect for various human rights are the most important aspects of ethical artificial intelligence (AI) in work disability risk prediction. When estimating the ethicality of an ML method, we need to consider the stakeholders' different interests, goals, and reasons for action. We examined the AI ethics of work disability risk prediction based on these criteria. When an ML method estimates a person's ability to work, it is necessary to understand how the machine arrived at the estimate. Under human rights principles, people are entitled to explanations of how decisions concerning them are made; this allows them to maintain agency, freedom, and privacy. ML methods should be transparent and explainable so that the stakeholders can trust the results. However, deep learning methods are typically black boxes.

To study explainability, we created visualizations for the methods M_Health and M_Pension and discussed how explainable each one is. We found that decision trees (M_Pension) are easier to explain than deep learning algorithms (M_Health). In summary, M_Pension is more accurate and more explainable than M_Health. However, M_Health can be used earlier in the process, which is essential for early detection and preventive support for a person at increased risk of work disability. It is important to produce ML methods for work disability risk prediction that are accurate, sensitive, and specific enough to support decision-making, but also trustworthy, transparent, and explainable.
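Why decision trees are considered more explainable than deep networks can be sketched in a few lines: a tree prediction is a chain of human-readable threshold rules, so every decision comes with its own justification. The features, thresholds, and labels below are invented for illustration and are not rules from the thesis models.

```python
def predict_with_explanation(record):
    """Toy decision tree that returns a label plus the rule path taken.

    The rule path is exactly the kind of explanation a deep network
    cannot provide directly. Features and cut-offs are hypothetical.
    """
    path = []
    if record["sick_leave_days"] > 30:
        path.append("sick_leave_days > 30")
        if record["age"] > 55:
            path.append("age > 55")
            return "high risk", path
        path.append("age <= 55")
        return "elevated risk", path
    path.append("sick_leave_days <= 30")
    return "low risk", path

label, path = predict_with_explanation({"sick_leave_days": 45, "age": 58})
print(label, "because", " and ".join(path))
```

An expert reading the printed rule path can check each condition against the patient record, which is what makes tree-based predictions auditable; a deep NLP model would require separate post-hoc explanation techniques to approximate the same transparency.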

Availability of the dissertation

An electronic version of the doctoral dissertation will be available in the University of Helsinki open repository Helda at http://urn.fi/URN:ISBN:978-952-84-1321-9.

Printed copies will be available on request from Katja Saarela: katja.saarela@helsinki.fi.