Software - Graph Algorithms
Our application-oriented publications are accompanied by free software.

We list here the most representative implementations.

MIPUP

Discovering the evolution of a tumor may help identify driver mutations and provide a more comprehensive view of the history of the tumor. Recent studies have tackled this problem using multiple samples sequenced from a tumor, and because of its clinical implications, this approach has attracted great interest. However, such samples usually mix several distinct tumor subclones, which confounds the discovery of the tumor phylogeny.

We study a natural problem formulation that requires decomposing the tumor samples into several subclones, with the objective of forming a minimum perfect phylogeny. We propose an Integer Linear Programming formulation for it and implement it in a method called MIPUP. We tested the ability of MIPUP and of four popular tools (LICHeE, AncesTree, CITUP, and Treeomics) to reconstruct the tumor phylogeny. On simulated data, MIPUP shows up to a 34% improvement under the ancestor-descendant relations metric. On four real datasets, MIPUP’s reconstructions proved to be generally more faithful than those of LICHeE.

MIPUP is available at https://github.com/zhero9/MIPUP.
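
To make the notion of a perfect phylogeny concrete, the following is a minimal sketch of the standard pairwise-column test for a binary matrix; it is not MIPUP's code and does not solve the ILP. It assumes rows are subclones, columns are mutations, and the ancestral state is all zeros. The function name admits_perfect_phylogeny and the example matrices are hypothetical.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if no two columns of the binary matrix conflict.

    Assumption: rows are subclones, columns are mutations, and the root
    carries no mutation. Two columns conflict when the sets of rows that
    carry them overlap without one containing the other.
    """
    if not matrix:
        return True
    n_cols = len(matrix[0])
    # For each mutation, collect the set of rows (subclones) that carry it.
    carriers = [
        frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
        for c in range(n_cols)
    ]
    for a, b in combinations(carriers, 2):
        shared = a & b
        # Overlapping but not nested: no tree can explain both mutations.
        if shared and shared != a and shared != b:
            return False
    return True


# Hypothetical example: carrier sets are nested or disjoint.
compatible = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
# Hypothetical example: the two columns overlap in row 0 only.
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

A sample matrix that fails this test is the kind of input that has to be decomposed into subclones before a perfect phylogeny can be formed.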

Gap2Seq

Gap filling is the last phase of de novo genome assembly, in which gaps between consecutive contigs in scaffolds are filled. We present a rigorous formulation of the gap filling problem. Gap2Seq (available at https://www.cs.helsinki.fi/u/lmsalmel/Gap2Seq/) provides an implementation of a pseudopolynomial algorithm for this NP-complete problem. Furthermore, Gap2Seq classifies the bases used to fill the gaps into safe and unsafe ones, where the safe bases are those present in every possible solution to the gap filling problem. A version of Gap2Seq tailored for insertion genotyping is also available at https://github.com/rikuu/Gap2Seq/.
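
As an illustration of what "pseudopolynomial" means here, the sketch below counts walks of an exact length between two nodes of a small directed graph by dynamic programming over the length. It is not Gap2Seq's implementation: the graph encoding, the function name count_walks, and the toy graph are assumptions made for the example. The running time grows with the numeric value of the target length, not with the number of bits needed to write it.

```python
def count_walks(edges, source, target, length):
    """Count walks with exactly `length` edges from source to target.

    `edges` maps each node to a list of its successors (hypothetical
    encoding). Runs in O(length * number of edges) time.
    """
    # counts[v] = number of walks of the current length ending at v.
    counts = {source: 1}
    for _ in range(length):
        extended = {}
        for node, ways in counts.items():
            for succ in edges.get(node, ()):
                extended[succ] = extended.get(succ, 0) + ways
        counts = extended
    return counts.get(target, 0)


# Hypothetical toy graph: two routes of two edges lead from A to D.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(count_walks(toy_graph, "A", "D", 2))
print(count_walks(toy_graph, "A", "D", 3))
```

In a gap filling setting, one would ask whether any length inside the estimated gap-length interval gives a nonzero count; a position filled with the same base in every such walk corresponds to the idea of a safe base.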

MetaFlow

High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundance in an HTS metagenomic sample are based on genome-specific markers, which can lead to skewed results, especially at the species level.

MetaFlow (available at https://www.helsinki.fi/en/researchgroups/genome-scale-algorithmics/metaflow) is an accurate method based on coverage analysis across entire genomes that also scales to HTS samples.