M.Sc. Yang Liu will defend her PhD thesis "Methodological and Extrinsic Challenges in Offline Evaluation of Recommender Systems" on Monday 11 August 2025 at 12 o'clock in the University of Helsinki Main Building, Auditorium Tekla Hultin (F3003, Fabianinkatu 33, 3rd floor). Her opponent is Professor Joseph Konstan (University of Minnesota, United States), and the custos is Associate Professor Dorota Głowacka (University of Helsinki). The defence will be held in English.
Yang Liu's thesis is part of the research carried out in the Department of Computer Science and in the Exploratory Search and Personalisation group at the University of Helsinki. Her supervisors have been Associate Professor Dorota Głowacka (University of Helsinki) and IT Specialist Alan Medlar (University of Helsinki).
Methodological and Extrinsic Challenges in Offline Evaluation of Recommender Systems
Recent studies in recommender systems have shown that performance improvements achieved by state-of-the-art methods could be attributed to the misapplication of offline evaluation protocols. These results highlight the potential for both methodological and extrinsic aspects of evaluation to impact results. Furthermore, specialised recommendation tasks, such as sequential and cross-domain recommendation, pose unique and underexplored challenges for evaluation.
In this thesis, we present a series of studies examining different aspects of evaluation in top-n recommendation, sequential recommendation, and cross-domain recommendation. The first study examines sampled evaluation metrics in top-n recommendation. Our results demonstrate greater consistency between sampled and traditional (non-sampled) metrics than reported in prior studies, as well as additional advantages of sampled metrics, such as the potential for higher discriminative power and robustness against popularity bias. The second study shows that the most widely used data splitting strategy in offline evaluation of sequential recommenders is deeply flawed due to data leakage, which results in performance being significantly overestimated. The third study investigates user perceptions of cross-domain recommendations. The results show that simply telling users that recommendations were based on information from a different domain significantly alters their perceptions of the recommendations, lowering both trust and interest. In the final study, we propose a novel meta-evaluation framework based on techniques from psychometric assessment that can be used to investigate various aspects of offline evaluation, including evaluation metrics, the information content of data sets, and the role of item popularity and user engagement in recommendation.
This thesis contributes to our understanding of a diverse range of evaluation challenges in recommender systems. The findings raise critical concerns regarding the evaluation of recommender systems, showing that standard evaluation methods can result in questionable research findings. Additionally, this thesis contains insights into how evaluation can be improved for several recommendation tasks.
Availability of the dissertation
An electronic version of the doctoral dissertation will be available in the University of Helsinki open repository Helda at http://urn.fi/URN:ISBN:978-952-84-1350-9.
Printed copies will be available on request from Yang Liu: yang.liu@helsinki.fi.