Massimo Equi defends his PhD thesis on Lower and Upper Bounds for String Matching in Labelled Graphs

On Wednesday the 22nd of June 2022, M.Sc. Massimo Equi defends his doctoral thesis on Lower and Upper Bounds for String Matching in Labelled Graphs. The thesis is related to research done in the Department of Computer Science and in the Genome-Scale Algorithmics team of the Algorithmic Bioinformatics group.

M.Sc. Massimo Equi defends his doctoral thesis on Lower and Upper Bounds for String Matching in Labelled Graphs on Wednesday the 22nd of June 2022 at 13 o´clock in the University of Helsinki Chemicum building, Hall A129 (A. I. Virtasen aukio 1, 1st floor). His opponent is Assistant Professor Nicola Prezza (Ca' Foscari University of Venice, Italy) and custos Professor Veli Mäkinen (University of Helsinki). The defence will be held in English. It is possible to follow the defence as a live stream at https://video.helsinki.fi/unitube/live-stream.html?room=l42.

The doctoral thesis of Massimo Equi is a part of research done in the Department of Computer Science and in the Genome-Scale Algorithmics team of the Algorithmic Bioinformatics group at the University of Helsinki. His supervisors have been Professor Veli Mäkinen and Associate Professor Alexandru I. Tomescu (University of Helsinki).

Lower and Upper Bounds for String Matching in Labelled Graphs

String Matching in Labelled Graphs (SMLG) is a generalisation of the classic problem of finding a match for a string into a text. In SMLG, we are given a pattern string and a graph with node labels, and we want to find a path whose node labels match the pattern string. This problem has been studied since 1992, and it was initially intended to model the problem of finding a link in a hypertext. Recently, the problem received attention due to its applications in bioinformatics, but all of the solutions, old and new, failed to run in truly sub-quadratic time.

In this work, based on four published papers, we study SMLG from different angles, first proving conditional lower bounds, and then proposing efficient algorithms for special classes of graphs.

In the first paper, we unveil the reason behind the hardness of SMLG, showing a quadratic conditional lower bound based on the Orthogonal Vectors Hypothesis and the Strong Exponential Time Hypothesis. The techniques that we employ come from the fine-grained complexity, and involve finding linear-time reductions from the Orthogonal Vectors problem to different variations of SMLG.

In the second paper, we strengthen our findings by showing that an indexing data structure built in polynomial time is not enough to provide subquadratic time queries for SMLG. We devise a general framework for obtaining indexing lower bounds out of regular lower bounds, and we prove the indexing lower bound for SMLG as an application of this technique.

In the third paper, we surpass the limitations of our lower bounds by identifying a class of graphs, called founder block graphs, which support linear time queries after subquadratic indexing. This class of graph effectively represents collections of strings called multiple sequence alignments, if gap characters are not present.

In the fourth paper, we significantly improve our previous results on efficiently indexable graphs. We propose elastic founder graphs, a superset of founder block graphs, that are able to represent multiple sequence alignments with gaps. Moreover, we propose algorithms for constructing elastic founder graph, indexing them, and perform queries in linear time.

Avail­ab­il­ity of the dis­ser­ta­tion

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-8217-3.

Printed copies will be available on request from Massimo Equi: massimo.equi@helsinki.fi