Compressed Data Structures

The team led by Professor Simon J. Puglisi focuses on efficient algorithms and data structures for searching, storing, and manipulating strings, trees and graphs, and applications thereof (like bioinformatics, information retrieval, database systems, and data mining).

Compressed data structures lie at the intersection of data structures and data compression. Often, if a particular data set is compressible, a data structure built from it is compressible too. The aim of compressed data structuring is to reduce the space consumed by a data structure using techniques from data compression, but to simultaneously support fast operations on the data structure too. Compressed data structures (and their cousins, succinct data structures) are a relatively young branch of computer science, but have already found heavy use in bioinformatics, database systems, and search engines.

Team members

Selected Software

Selected Recent Publications

  • Jarno N. Alanko, Jaakko Vuohtoniemi, Tommi Mäklin, and Simon J. Puglisi. Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes. Bioinformatics, 39(Supplement 1):i260–i269, 2023 [Article online] [Code]
  • Jarno N. Alanko, Simon J. Puglisi, Jaakko Vuohtoniemi:

    Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform. ACDA 2023: 225-236 [Article online] [Code]
  • Jarno N. Alanko, Elena Biagi, Simon J. Puglisi:

    Longest Common Prefix Arrays for Succinct k-Spectra. SPIRE 2023: 1-13 [Article online] [Code]
  • Diego Díaz-Domínguez, Saska Dönges, Simon J. Puglisi, Leena Salmela:

    Simple Runs-Bounded FM-Index Designs Are Fast. SEA 2023: 7:1-7:16 [Article online] [Code]

    Dominik Köppl, Simon J. Puglisi, Rajeev Raman:

    Fast and Simple Compact Hashing via Bucketing. Algorithmica 84(9): 2735-2766 (2022) [Article online]
  • Saska Dönges, Simon J. Puglisi, Rajeev Raman:

    On Dynamic Bitvector Implementations. DCC 2022: 252-261. [Article online] [Code]
  • Block Trees, Journal of Computer and System Sciences 117:1-22 (2021). [Article online]
  • L. Salmela, K. Mukherjee, S.J. Puglisi, M.D. Muggli, and C. Boucher: Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics 36(3): 682-689 (2020). [Article online] [Implementation]
  • M. Cáceres, S.J. Puglisi, B. Zhukova: Fast Indexes for Gapped Pattern Matching. Proceedings of SOFSEM 2020: 493-504. [Article online]
  • S. Gog, J. Kärkkäinen, D. Kempa, M. Petri, S.J. Puglisi: Fixed Block Compression Boosting in FM-Indexes: Theory and Practice, Algorithmica 81(4): 1370-1391 (2019). [Article online]
  • Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, Bella Zhukova: Engineering External Memory Induced Suffix Sorting. ALENEX 2017: 98-108
  • Juha Kärkkäinen, Marcin Piatkowski, Simon J. Puglisi: String Inference from Longest-Common-Prefix Array. ICALP 2017: 62:1-62:14
  • Djamal Belazzougui, Simon J. Puglisi: Range Predecessor and Lempel-Ziv Parsing. SODA 2016: 2053-2071
  • Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi: Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices. KDD 2016: 1875-1884

Recent News/Activities

December 2023

Simon visiting Hideo Bannai in Tokyo.

Touko Puro receives his MSc degree.

November 2023

Simon is an opponent at Garance Gourdel's PhD defence at Ecole Normale Superieur in Paris.

October 2023

Simon is an opponent at Max Pedersen's PhD defence at TU Denmark.

Elena, Jarno, and Simon attend the Symposium on String Processing and Information Retrieval (SPIRE) in Pisa. Elena presenting work on Longest-common-suffix array construction.

July 2022

Simon attends the Annual Symposium on Combinatorial Pattern Matching in Paris.

June 2022

Elena, Saska, and Simon attend the Symposium on Experimental Algorithms (SEA) in Barcelona.

Elena, Jarno, and Simon visit Zamin Iqbal's lab at the European Bioinformatics Institute.

May 2023

"Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform" wins the Best Paper Award at the SIAM Conference on Applied and Computational Discrete Algorithms in Seattle, USA.

October 2022

Elena Biagi (late of U. Verona) is starting a PhD in our group!

April 2022

Saska presents at the Data Compression Conference in Snowbird, Utah.

June 2021

Weighted-level ancestor queries revisited wins the Alberto Apostolico Best Paper Award at the Annual Symposium on Combinatorial Pattern Matching.

March 2021

Papers accepted to CPM and SEA:

D. Belazzogui, D. Kosolobov, S. J. Puglisi, & R. Raman, "Weighted-level ancestor queries revisited", to appear at CPM

S. J. Puglisi & B. Zhukova, "Document Listing Hacks", to appeat at SEA

January 2021

Papers appear in print at JCSS (on block trees) and TCS (on suffix tree breadth)

Saska Dönges starts as a PhD student

December 2020

Three journal papers appear this month at Bioinformatics, Algorithmica, and Algorithms + two papers accepted to the 2021 Data Compression Conference:

Simon J. Puglisi & Bella Zhokova, "Smaller RLZ-compressed suffix arrays"

Danyang Ma, Simon J. Puglisi, Rajeev Raman & Bella Zhokova, "On Elias-Fano for Rank Queries in FM-Indexes"

November 2020

Lauri Heino receives his MSc degree

September 2020

Pekka Jylha-Ollila receives his MSc degree

July 2020

Jussi Timonen, Antti Karjalainen, and Risto Haapasalmi receive their MSc degrees

Bella and Simon have paper "Relative Lempel-Ziv compression of suffix array" accepted to SPIRE 2020

May 2020

Simon is opponent for Florian Kurpicz's PhD thesis defence at TU Dortmund

March 2020

Two papers accepted to SEA 2020 in Palermo

February 2020

Simon attending London Stringology Days at King's College London

January 2020

Bella presenting "Fast Indexes for Gapped Pattern Matching" at SOFSEM 2020 in Cyprus

December 2019

Simon visiting Yasuo Tabei and Hideo Bannai at RIKEN AIP Tokyo

Simon visiting Alistair Moffat and Joel MacKenzie at U. Melbourne

November 2019

Dustin Cobas visiting from U. Chile

October 2019

Simon co-chairing (with Nieves Brisaboa) SPIRE 2019 conference in Segovia, Spain

Bella attending SPIRE 2019 and WCTA 2019 in Segovia, Spain

September 2019

Javier Soto visiting from U. Concepcion

August 2019

Mabel Vidal visiting from U. Concepcion

July 2019

Rajeev Raman visiting from U. Leicester

June 2019

Massimiliano Rossi visiting from U. Verona

May 2019

Simon visiting Rayan Chikhi's group at Institute Pasteur in Paris

Simon visiting Golnaz Badkobeh at Goldsmiths University of London

Rayan Chikhi and Camille Marchet visiting from Institute Pasteur in Paris for Helsinki Bioinformatics Day

Sara Giuliani starting as a summer research intern

Uula Ulkuniemi starting us as a summer research intern

Alumni/former students

  • Touko Puro (MSc, U. Helsinki, December 2023)
  • Eve Kivivuori (MSc, U. Helsinki, July 2023)
  • Heikki Lehtosalo (Research Programmer, Nov 2019-Nov 2020)
  • Lauri Heino (MSc, U. Helsinki, November 2020)
  • Pekka Jylha-Ollila (MSc, U. Helsinki, September 2020)
  • Antti Karjalainen (MSc, U. Helsinki, July 2020)
  • Risto Haapasalmi (MSc, U. Helsinki, July 2020)
  • Jussi Timonen (MSc, U. Helsinki, July 2020)
  • Sara Giuliani (summer research intern, May-July 2019)
  • Uula Ulkuniemi (summer research intern, May-July 2019)
  • Joonsa Nietosvaara (MSc, U. Helsinki, graduated 2019)
  • Massimiliano Rossi (visiting PhD student from U. Verona, July-October 2018)
  • Christopher Hoobin (PhD, RMIT, graduated 2015, now at NASDAQ)
  • Jasbir Dhaliwal (PhD, RMIT, graduated 2014, now at Monash U., formerly at IBM)
  • Alex Bowe (MSc, RMIT, graduated 2013, now at Cruise Automation, PhD from U. Tokyo)
  • Shanika Kuruppu (PhD, U. Melbourne, graduated 2013, now at Google, co-supervised with Justin Zobel)