Jarno N. Alanko defends his PhD thesis on Space-Efficient Algorithms for Strings and Prefix-Sortable Graphs

On Wednesday the 3rd of June 2020, M.Tech. Jarno N. Alanko will defend his doctoral thesis on Space-Efficient Algorithms for Strings and Prefix-Sortable Graphs. The thesis is a part of research done in the Department of Computer Science and in the Genome-Scale Algorithmics research group at the University of Helsinki.

M.Tech. Jarno N. Alanko defends his doctoral thesis Space-Efficient Algorithms for Strings and Prefix-Sortable Graphs on Wednesday the 3rd of June 2020 at 16 o'clock in the University of Helsinki Porthania building, Auditorium PII (Yliopiston katu 3, 1st floor). His opponent is Assistant Professor Sharma Thankachan (University of Central Florida, USA) and custos Professor Veli Mäkinen (University of Helsinki). The defence will be held in English. It is possible to follow the defence as a live stream at https://helsinki.zoom.us/j/68865960125?pwd=bElCenBGVm5BU3hvN01ocEhENngxZz09.

The thesis of Jarno N. Alanko is a part of research done in the Department of Computer Science and in the Genome-Scale Algorithmics research group at the University of Helsinki. His supervisor has been Professor Veli Mäkinen (University of Helsinki).

Space-Efficient Algorithms for Strings and Prefix-Sortable Graphs

Space-efficient data structures are an active field of research that has found many applications in combinatorial pattern matching and bioinformatics. The idea is to build data structures that occupy space close to an information-theoretic lower bound, or even less, but still support efficient query operations. In the past few decades, compact index structures have been designed for a variety of different types of data, including bit vectors, strings, trees and graphs, to name a few prominent applications.

In this thesis, we design and apply compact data structures for problems related to bioinformatics, and advance the theory of Wheeler graphs, which are a class of graphs that admit a compact indexing data structure. The work is based on four published papers.

In the first two papers, we propose compact solutions for two problems on strings. In Paper I, we design and implement an algorithm that computes the classical greedy approximation for the shortest common superstring problem. In Paper II, we design and implement data structures for storing a variable-order Markov model in a compact and queryable form.

The last two papers of the thesis expand the theory of Wheeler graphs. In Paper III, we extend the theory into finite state automata, leading to a number of interesting algorithms for recognizing, sorting, determinizing and minimizing automata that are Wheeler graphs. We also show how to turn any acyclic automaton into the minimum equivalent Wheeler graph automaton. In Paper IV, we propose a method to compress a Wheeler graph while retaining the indexing functionality, by generalizing a recently introduced method of tunneling from the Burrows-Wheeler transform to Wheeler graphs.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-6168-0.

Printed copies will be available on request from Jarno N. Alanko: jarno.alanko@helsinki.fi.