PhD student Jarno Alanko from the Genome-scale algorithmics group at the Department of Computer Science won the Capocelli Prize together with his co-author, PhD student, Louisa Seelbach Benkner from the University of Siegen. Non-student authors of the award-winning paper “Tunneling on Wheeler Graphs" are Travis Gagie from the Diego Portales University and Gonzalo Navarro from the University of Chile.
The Burrows-Wheeler transform is an invertible string transformation that is the core of many compression tools such as the bzip2 utility found in most Unix systems. The transformed string also supports substring search queries on the original string, which has made it invaluable in the field of bioinformatics.
Last year, Uwe Baier introduced a new method for compressing the Burrows-Wheeler transform of a repetitive string. Baier’s method, called tunneling, can often yield better compression rates than standard run-length compression. This year, the award-winning paper of Alanko et. al generalizes this technique to a class of graphs called Wheeler graphs.
A Wheeler graph is a labeled directed graph where the nodes of the graph are sortable by lexicographic order of the incoming paths. There exists a succinct representation for such graphs with the remarkable property that is supports path queries, i.e. queries for whether a given string exists in the graph as a path. This can be seen as a kind of a generalization of the Burrows-Wheeler transform.
The work of Alanko et. al gives a new compressed representation for Wheeler graphs, which is still able to support path queries on the graph. The paper also considers the special case of strings, and shows how count and locate the occurrences of the query string, filling in important pieces missing from Baier’s original paper.
The work was initiated by the research visit of Jarno Alanko to the University of Chile supported by the EU RISE project BIRDS.