Some selected tools from the GSA group can be found here.
A metagenomic sample is a set of sequencing reads from the microbial life living in a particular environment. Standard analysis involves estimating the species composition of the environment by aligning the reads against a reference database. With the advent of pangenomics, alignment is preferably done against a variation graph encompassing all variation within a species.
Seq2DAGChainer is a prototype implementation of the algorithms proposed in the article
We published in
A framework to Pan-Genomize your Variant Calling pipeline.
Source code:
Reproducibility:
Pan-genome indexing:
A program for community profiling of a metagenomic sample, described in our RECOMB 2016 paper.
Software for RNA transcript expression prediction from long-read RNA-sequencing data.
Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms). The development of third-generation sequencers allowed for sequencing of reads up to several kilobases long. Compared to the short next-generation sequencing reads, which generally only span two exons at most, long reads can give additional information about which non-neighboring exons are part of which transcript.
Traphlor is a novel transcript prediction tool that utilizes the connectivity information gained from long reads spanning more than two exons. It is based on the idea of modeling long reads as subpath constraints, presented in the article
For any questions, please contact us: aekuosma[at]cs.helsinki.fi
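As a rough illustration of the subpath-constraint idea used by Traphlor, the sketch below treats a predicted transcript as a list of exon labels and a long read as the list of exons it spans. A constraint counts as satisfied when some transcript contains those exons contiguously and in order. The function names and exon labels are hypothetical, and this is only a check of the constraint, not the prediction algorithm from the article.

```python
def contains_subpath(path, constraint):
    """Return True if `constraint` occurs in `path` as a contiguous run."""
    k = len(constraint)
    return any(path[i:i + k] == constraint for i in range(len(path) - k + 1))


def unsatisfied_constraints(transcripts, constraints):
    """List the constraints (long reads) not covered by any transcript."""
    return [
        c for c in constraints
        if not any(contains_subpath(t, c) for t in transcripts)
    ]


# Hypothetical example: two candidate transcripts over exons e1..e4.
transcripts = [["e1", "e2", "e4"], ["e1", "e3", "e4"]]
# Each long read is reduced to the exons it spans.
long_reads = [["e1", "e2", "e4"], ["e3", "e4"], ["e2", "e3"]]

missing = unsatisfied_constraints(transcripts, long_reads)
```

Here `missing` contains only the read spanning `e2` and `e3`, since no candidate transcript joins those two exons directly.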
Software implementing an external-memory algorithm that constructs the so-called LZ-End parsing (a variant of LZ77) of a given string of length n in O(n log L) expected time using O(z + L) space, where z is the number of phrases in the parsing and L is the limit on the phrase length. This parsing serves as the basis for the compressed index of Kreft and Navarro, which allows fast access to the compressed string without decompression. The algorithm constructs the parsing with high probability in streaming fashion in one left-to-right pass over the input string, and then performs one right-to-left pass to verify the correctness of the result.
Details of the algorithms and experimental evaluation can be found in the paper: Dominik Kempa, Dmitry Kosolobov, "LZ-End Parsing in Compressed Space". In 2017 Data Compression Conference (DCC 2017), pages 350-359, 2017.
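To make the definition of the parsing concrete, here is a minimal in-memory sketch of LZ-End as defined by Kreft and Navarro: each phrase is the longest prefix of the remaining text that is a suffix of the text ending at some earlier phrase boundary, followed by one explicit character. This naive version takes roughly quadratic time and is not the external-memory algorithm from the paper. The function name and the `max_phrase_len` parameter (standing in for the limit L) are assumptions made for illustration.

```python
def lz_end_parse(text, max_phrase_len=None):
    """Naive LZ-End parsing; returns the list of phrases as strings."""
    phrases = []
    ends = []  # exclusive end positions of the phrases produced so far
    i, n = 0, len(text)
    while i < n:
        # Leave room for the explicit character that closes each phrase.
        max_copy = n - i - 1
        if max_phrase_len is not None:
            max_copy = min(max_copy, max_phrase_len - 1)
        best = 0
        for e in ends:
            # Longest suffix of text[:e] that is also a prefix of text[i:].
            length = min(max_copy, e)
            while length > best:
                if text[e - length:e] == text[i:i + length]:
                    best = length
                    break
                length -= 1
        phrases.append(text[i:i + best + 1])
        i += best + 1
        ends.append(i)
    return phrases


example = lz_end_parse("abababab")  # ['a', 'b', 'aba', 'bab']
```

Concatenating the phrases always reproduces the input, and with `max_phrase_len` set no phrase exceeds that length.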
A library that contains some proof-of-concept implementations of the various sequence analysis tasks considered in our ESA 2013 paper. Available on
| | time (s) | space (MB) |
|---|---|---|
| ours | 751 | 207 |
| vmatch | 437 | 938 |
| mummer | 97 | 930 |
Software for RNA transcript expression prediction from RNA-sequencing data. See our RECOMB-seq and WABI 2013 papers.
Computes the unit-cost edit distance between a haploid and a reference-guided recombination of two diploids.
A tool for mapping (short) DNA reads to reference sequences. It is not as fast as some other Burrows-Wheeler-based aligners, but it faithfully implements k-mismatches and k-errors search, where some other tools may solve a slightly different or implicitly defined problem.
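For reference, the k-mismatches problem can be stated precisely with a brute-force scan: report every position where the read matches the reference with at most k substitutions (Hamming distance at most k). The sketch below only pins down the problem definition. It does not use a Burrows-Wheeler index, it does not cover k-errors (edit distance) search, and the function name is hypothetical.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Return (start, mismatches) for every window within Hamming distance k."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the budget
        if mismatches <= k:
            hits.append((start, mismatches))
    return hits


hits = k_mismatch_occurrences("ACGTACGA", "ACGT", 1)  # [(0, 0), (4, 1)]
```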