M.Sc. Nicola Rizzo defends his doctoral thesis "Indexable Sequence Graphs: Exploiting Uniqueness in the Pangenome Era" on Friday the 27th of June 2025 at 12 o'clock in the University of Helsinki Chemicum building, Hall A110 (A. I. Virtasen aukio 1, 1st floor). His opponent is Professor Marinella Sciortino (University of Palermo, Italy) and custos Professor Veli Mäkinen (University of Helsinki). The defence will be held in English.
The thesis of Nicola Rizzo is a part of research done in the Department of Computer Science and in the Genome-scale Algorithms team of the Algorithmic Bioinformatics group at the University of Helsinki. His supervisor has been Professor Veli Mäkinen (University of Helsinki).
Indexable Sequence Graphs: Exploiting Uniqueness in the Pangenome Era
Sequence-to-graph alignment is an active area of research in bioinformatics, motivated by recent pangenome graph references and by the computational hardness of the problem at hand. Indeed, graph-based pangenomes are a popular tool to represent collections of genomes, since the paths in a labeled graph can easily and compactly incorporate diversity, so finding the path most similar to a given new sequence is a fundamental task. Sequence alignment presents a quadratic-time hardness in both this setting and the classical sequence-to-sequence one, conditionally to popular fine-grained complexity hypotheses being true, whereas exact pattern matching between sequences can be solved in linear time but it exhibits the same quadratic-time hardness when we substitute one sequence with a labeled graph. This limitation affects practical sequence-to-graph alignment heuristics like seed-and-extend and seed-chain-extend.
Indexable elastic founder graphs have been introduced as a complete framework for the construction of graphs starting from a collection of aligned sequences. To obtain a graph that is indexable in polynomial time for linear-time exact pattern matching, they exploit the uniqueness of the input data to segment it into a graph where each node label represents a unique position. In the case of sequences aligned without gaps, indexable non-elastic founder graphs have been studied extensively in the literature. This thesis, based on three published papers and an accepted conference paper, expands the framework on elastic founder graphs by developing linear-time construction algorithms, a new pattern matching index, and a suite of praytical tools for indexable elastic founder graphs built from aligned sequences with gaps.
In the first paper, we complete the existing theoretical framework on elastic founder graphs by proposing linear-time construction solutions optimizing for different metrics, and by improving the time and space of the pattern-matching index from linear with respect to the cumulative length of paths of length 3, to linear in the paths of length 2. In the second paper, we extend maximal exact matches, an important class of seeds with connections to classical alignment metrics, to the sequence-to-graph setting. Then, we obtain an efficient algorithm for matches spanning at most a constant number of nodes and a general solution for indexable elastic founder graphs. In the third paper, we develop an efficient sequence-to-graph chaining algorithm maximizing mutual coverage, where this metric is connected to the classical longest common subsequence variant of sequence alignment, with a solution whose complexity is parameterized on the width of the graph. In the fourth paper, we develop a highly accurate and practical pipeline for seed-chain-extend alignment based on indexable elastic founder graphs: in the construction, we give practical considerations to handle missing nucleotides, long runs of gaps, a high number of genomes, and the variations to a linear reference; then, we implement efficient seeding based on exact pattern matching; we extend a practical chaining solution to the sequence-to-graph case, where the graph is a relaxed elastic founder graph; and we integrate a state-of-the-art tool for the final extend phase.
Availability of the dissertation
An electronic version of the doctoral dissertation will be available in the University of Helsinki open repository Helda at http://urn.fi/URN:ISBN:978-952-84-1345-5.
Printed copies will be available on request from Nicola Rizzo: nicola.rizzo@helsinki.fi