Discovering the evolution of a tumor may help identify driver mutations and provide a more comprehensive view of the history of the tumor. Recent studies have tackled this problem using multiple samples sequenced from a tumor, and the problem has attracted great interest because of its clinical implications. However, such samples usually mix several distinct tumor subclones, which confounds the discovery of the tumor phylogeny.
We study a natural formulation of the problem, which asks to decompose the tumor samples into several subclones so that they form a minimum perfect phylogeny. We propose an Integer Linear Programming formulation for it and implement it in a method called MIPUP. We tested the ability of MIPUP and of four popular tools (LICHeE, AncesTree, CITUP, and Treeomics) to reconstruct the tumor phylogeny. On simulated data, MIPUP shows up to a 34% improvement under the ancestor-descendant relations metric. On four real datasets, MIPUP’s reconstructions proved to be generally more faithful than those of LICHeE.
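To make the perfect phylogeny requirement concrete, here is a minimal sketch of the standard conflict test for binary mutation matrices. It is not MIPUP's Integer Linear Programming formulation. It assumes each mutation is gained once and never lost, with a mutation-free root; under that assumption a matrix admits a perfect phylogeny exactly when no two columns together show all three patterns (0,1), (1,0), and (1,1). The function names and toy matrices are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Patterns that, if all present for one pair of columns, rule out a perfect phylogeny.
FORBIDDEN = {(0, 1), (1, 0), (1, 1)}


def conflicting_pairs(matrix):
    """Return the pairs of column indices that are in conflict.

    `matrix` is a list of equal-length rows of 0/1 values; rows stand for
    samples or subclones, columns for mutations (illustrative convention).
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    conflicts = []
    for a, b in combinations(range(n_cols), 2):
        seen = {(row[a], row[b]) for row in matrix}
        if FORBIDDEN <= seen:
            conflicts.append((a, b))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True if no pair of columns is in conflict."""
    return not conflicting_pairs(matrix)


# Hypothetical toy data: three conflict-free subclones.
clean = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Hypothetical toy data: columns 0 and 1 conflict, as can happen
# when a sample mixes several subclones.
mixed = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]

print(admits_perfect_phylogeny(clean))
print(conflicting_pairs(mixed))
```

A conflicting pair such as the one in `mixed` is the kind of signal that motivates splitting a mixed sample into several subclone rows until no conflicts remain.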
MIPUP is available at
Proceedings of the 43rd International Workshop on Graph-Theoretic Concepts in Computer Science (WG 2017), Lecture Notes in Computer Science 10520 (2017) 303-315.
Gap filling is the last phase of de novo genome assembly, in which gaps between consecutive contigs in scaffolds are filled. We present a rigorous formulation of the gap filling problem.
High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundance in an HTS metagenomic sample are based on genome-specific markers, which can lead to skewed results, especially at the species level.
Third-generation sequencing technologies, such as PacBio SMRT sequencing, produce long sequencing reads but with an error rate of about 15%. The sequencing errors, which include numerous insertions and deletions in addition to substitutions, complicate downstream analysis of the data, such as genome assembly and mapping of the reads.