Computational problems are usually proposed as models of real-world problems. In some cases we are interested in just a simple answer to a problem, e.g., any route from A to B minimizing the amount of fuel consumed. However, other problems are only approximately modeled by a mathematical formulation, and the best solution may depend on more complex criteria, unavailable when modeling the problem. One established way of coping with this is to enumerate all solutions, or only the first k best solutions. However, in many cases this is simply unfeasible, since there can be a large, even exponential, number of optimal solutions.
Another way to cope with issue, interesting from both a theoretical and a practical viewpoint, is to consider only those partial solutions that are common to all solutions. We call these safe. An algorithm outputting all safe partial solutions is called safe and complete. A practical reason why we are intersted in safe and complete algorithms comes from Bioinformatics, and more precisely from the analysis of high-throughput sequencing data.
Our key contributions until 2019 are:
M. Cairo, P. Medvedev, N. Obscura Acosta, R. Rizzi, A. I. Tomescu: An Optimal O(nm) Algorithm for Enumerating All Walks Common to All Closed Edge-covering Walks of a Graph. ACM Trans. Algorithms 15(4): 48:1-48:17 (2019) (also CPPM 2017)
L. Salmela, A. I. Tomescu: Safely Filling Gaps with Partial Solutions Common to All Solutions. IEEE/ACM Trans. Comput. Biology Bioinform. 16(2): 617-626 (2019)
N. Kiirala, L. Salmela, A. I. Tomescu: Safe and Complete Algorithms for Dynamic Programming Problems, with an Application to RNA Folding. CPM 2019: 8:1-8:16
N. Obscura Acosta, V. Mäkinen, A. I. Tomescu: A safe and complete algorithm for metagenomic assembly. Algorithms for Molecular Biology 13(1): 3:1-3:12 (2018)
For further details and more recent papers on this topic, check the page of our ERC Starting Grant "Safe and Complete Algorithms for Bioinformatics".
Many problems on strings can be naturally generalized to labeled graphs, since a string is just a labeled path. We work problems of the following type: given a string and a labeled graph, find an "occurrence" of the string in the graph. There are several ways to define "occurrence", the simplest of which being "a path whose concatenation of labels equals the given string". This is relevant in Bioinformatics (in Pan-genomics) since genomic references are being replaced by labeled graphs, which encoding not only a single reference, but also all the variation observed in a population.
The following publications deal with such problems:
M. Equi, R. Grossi, V. Mäkinen, A. I. Tomescu: On the Complexity of String Matching for Graphs. ICALP 2019: 55:1-55:15
V. Mäkinen, A. I. Tomescu, A. Kuosmanen, T. Paavilainen, T. Gagie, R. Chikhi: Sparse Dynamic Programming on DAGs with Small Width. ACM Trans. Algorithms 15(2): 29:1-29:21 (2019)
A. Kuosmanen, T. Paavilainen, T. Gagie, R. Chikhi, A. I. Tomescu, V. Mäkinen: Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-linear Chaining Extended. RECOMB 2018: 105-121
The genome assembly problem asks for the reconstructing for a genome sequence from many small fragments sequenced from it, for example with high-throughut sequencing. Genome assembly is nowadays split into different stages, such as contig assembly, scaffolding, gap filling. Our key contributions to this field are:
Network flows, and in particular min-cost flows, are a computational model generalizing several classical problems on graphs. We were among the first to propose their application to the problem of assembling RNA transcripts from RNA-Seq data:
A. I. Tomescu, A. Kuosmanen, R. Rizzi, V. Mäkinen: A novel min-cost flow method for estimating transcript expression with RNA-Seq. BMC Bioinformatics 14(S-5): S15 (2013)
Later applications of network flows co0authored by our team are:
V. Mäkinen, V. Staneva, A. I. Tomescu, D. Valenzuela, S. Wilzbach: Interval scheduling maximizing minimum coverage. Discrete Applied Mathematics 225: 130-135 (2017)
A. Sobih, A. I. Tomescu, V. Mäkinen: MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows. RECOMB 2016: 111-121
A. Kuosmanen, A. Sobih, R. Rizzi, V. Mäkinen, A. I. Tomescu: On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy. BIOINFORMATICS 2016: 272-277
A. I. Tomescu, T. Gagie, A. Popa, R. Rizzi, A. Kuosmanen, V. Mäkinen: Explaining a Weighted DAG with Few Paths for Solving Genome-Guided Multi-Assembly. IEEE/ACM Trans. Comput. Biology Bioinform.12(6): 1345-1354 (2015)
R. Rizzi, A. I. Tomescu, V. Mäkinen: On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly. BMC Bioinformatics 15(S-9): S5 (2014)
We described some of these applications in our Bioinformatics textbook:
By sequecing several samples of a tumor, we can discover the mutations present in each of these, and further use them to reconstruct the history of the tumor, for example as a phylogenetic tree. We did some work involcing such reconstructions, by assuming that the tumor evolution was a perfect phylogeny:
E. Husic, X. Li, A. Hujdurovic, M. Mehine, R. Rizzi, V. Mäkinen, M. Milanic, A. I. Tomescu:
MIPUP: minimum perfect unmixed phylogenies for multi-sampled tumors via branchings and ILP. Bioinformatics 35(5): 769-777 (2019)
A. Hujdurovic, E. Husic, M. Milanic, R. Rizzi, A. I. Tomescu: Perfect Phylogenies via Branchings in Acyclic Digraphs and a Generalization of Dilworth's Theorem. ACM Trans. Algorithms 14(2): 20:1-20:26 (2018) (also WG 2017)
A. Hujdurovic, U. Kacar, M. Milanic, B. Ries, A. I. Tomescu:
Complexity and Algorithms for Finding a Perfect Phylogeny from Mixed Tumor Samples. IEEE/ACM Trans. Comput. Biology Bioinform.15(1): 96-108 (2018) (also WABI 2015)
K. E. van Rens, V. Mäkinen, A. I. Tomescu: SNV-PPILP: refined SNV calling for tumor data using perfect phylogenies and ILP. Bioinformatics 31(7): 1133-1135 (2015)
Well-founded sets are sets which contain as elements only other sets. They can be interpreted as directed graphs, by viewing each set as a node, and each membership relation as a directed edge between the two sets. Once this translation is made, there is a wide range of graph-theoretic questions that can be asked about such directed graphs corresponding to sets. Below is a list of publications dealing with such questions:
M. Milanič, A. I. Tomescu: Set graphs. I. Hereditarily finite sets and extensional acyclic orientations. Discrete Applied Mathematics 161(4-5): 677-690 (2013)
M. Milanič, R. Rizzi, A. I. Tomescu: Set graphs. II. Complexity of set graph recognition and similar problems. Theor. Comput. Sci. 547: 70-81 (2014)
E. G. Omodeo, A. I. Tomescu: Set Graphs. III. Proof Pearl: Claw-Free Graphs Mirrored into Transitive Hereditarily Finite Sets. J. Autom. Reasoning 52(1): 1-29 (2014)
M. Milanič, A. I. Tomescu: Set graphs. IV. Further connections with claw-freeness. Discrete Applied Mathematics 174: 113-121 (2014)
E. G. Omodeo, A. I. Tomescu: Set Graphs. V. On representing graphs as membership digraphs. J. Log. Comput. 25(3): 899-919 (2015)
R. Rizzi, A. I. Tomescu: Ranking, unranking and random generation of extensional acyclic digraphs. Inf. Process. Lett. 113(5-6): 183-187 (2013)
We discuss some of these result in this monograph:
E. G. Omodeo, A. Policriti, A. I. Tomescu: On Sets and Graphs: Perspectives on Logic and Combinatorics. Springer 2017