Arpit Merchant defends his PhD thesis on Applications of Node Embeddings to Learning and Mining on Graphs

On Thursday the 24th of August 2023, M.Sc. Arpit Merchant defends his PhD thesis on Applications of Node Embeddings to Learning and Mining on Graphs. The thesis is related to research done in the Department of Computer Science and in the Algorithmic Data Science group.

M.Sc. Arpit Merchant defends his doctoral thesis "Applications of Node Embeddings to Learning and Mining on Graphs" on Thursday the 24th of August 2023 at 13 o'clock in the University of Helsinki Physicum building, Auditorium E204 (Gustaf Hällströmin katu 2, 2nd floor). His opponent is Assistant Professor Kiran Garimella (Rutgers University, USA) and custos Associate Professor Michael Mathioudakis (University of Helsinki). The defence will be held in English.

The thesis of Arpit Merchant is a part of research done in the Department of Computer Science and in the Algorithmic Data Science group at the University of Helsinki. His supervisor has been Associate Professor Michael Mathioudakis (University of Helsinki).

Applications of Node Embeddings to Learning and Mining on Graphs

Graphs play a central role in a wide range of domains from analyzing the structure of social networks, to identifying molecular patterns in biological networks, and optimizing the performance of communication systems. In recent years, node embeddings, i.e., vector representations of nodes of a graph, have emerged as powerful tools to concisely capture local and higher-order topological relationships. With advances in parallel computing and deep learning, fusing node attributes with adjacency information to build embeddings via, for instance, graph neural network models (or GNNs, for short) has led to their prevalence in applications such as node classification and link prediction. However, even as new architectures are continually introduced for improving the state-of-the-art on context-specific tasks, our understanding of the wider generalizability and usefulness of foundational designs remains incomplete. Thus, analyzing embedding-based learning algorithms through the impact of various graph characteristics such as homophily and class imbalance on their performance, scalability to large datasets, usefulness on classical tasks such as estimating graph properties, and their ethical implications in sensitive domains such as loan applications is of critical importance. 

In this thesis, we seek to address these gaps by characterizing the versatility of embeddings to learning and mining on graphs. To this end, we contribute novel model-agnostic techniques for adapting embeddings for four concrete tasks namely, node classification, distance estimation, graph summarization, and fair representation learning. First, we design JANE to incorporate label information into embeddings for attributed graphs such that we flexibly adapt to graphs with low, medium, and high homphily. Second, we propose frameworks for exact and approximate oracles to analyze the usefulness of embeddings for estimating graph distances. Third, we present spectral algorithms, SpecSumm and Ocsa, for building summaries from embeddings for adjacency reconstruction, visualization, and triangle estimation. Lastly, we quantify the tradeoff between algorithmic fairness and accuracy of embedding-based models on node classification and design two interventions, PFR-AX and PostProcess, to reduce algorithmic disparity and inequality towards nodes on the basis of protected attributes such as gender, age, etc. Through experiments on 29 datasets over six domains across these four tasks, we extensively evaluate the performance of our algorithms and benchmark our results against state-of-the-art baselines.

Avail­ab­il­ity of the dis­ser­ta­tion

An electronic version of the doctoral dissertation will be available on the e-thesis site of the University of Helsinki at

Printed copies will be available on request from Arpit Merchant: