Compressed Data Structures

The team led by Professor Simon J. Puglisi focuses on efficient algorithms and data structures for searching, storing, and manipulating strings, trees and graphs, and applications thereof (like bioinformatics, information retrieval, database systems, and data mining).

Compressed data structures lie at the intersection of data structures and data compression. Often, if a particular data set is compressible, a data structure built from it is compressible too. The aim of compressed data structuring is to reduce the space consumed by a data structure using techniques from data compression, but to simultaneously support fast operations on the data structure too. Compressed data structures (and their cousins, succinct data structures) are a relatively young branch of computer science, but have already found heavy use in bioinformatics, database systems, and search engines.

Team members

Alumni

Selected Software

Selected Recent Publications

  • Jarno N. Alanko, Jaakko Vuohtoniemi, Tommi Mäklin, and Simon J. Puglisi. Themisto: a scalable colored k-mer index for sensitive pseudoalignment against hundreds of thousands of bacterial genomes. Bioinformatics, 39(Supplement 1):i260–i269, 2023 [Article online] [Code]
  • Jarno N. Alanko, Simon J. Puglisi, Jaakko Vuohtoniemi. Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform. ACDA 2023: 225-236 [Article online] [Code]
  • Jarno N. Alanko, Elena Biagi, Simon J. Puglisi. Longest Common Prefix Arrays for Succinct k-Spectra. SPIRE 2023: 1-13 [Article online] [Code]
  • Diego Díaz-Domínguez, Saska Dönges, Simon J. Puglisi, Leena Salmela:
    Simple Runs-Bounded FM-Index Designs Are Fast. SEA 2023: 7:1-7:16 [Article online] [Code]
  • Dominik Köppl, Simon J. Puglisi, Rajeev Raman:
    Fast and Simple Compact Hashing via Bucketing. Algorithmica 84(9): 2735-2766 (2022) [Article online]
  • Saska Dönges, Simon J. Puglisi, Rajeev Raman:
    On Dynamic Bitvector Implementations. DCC 2022: 252-261. [Article online] [Code]
  • Block Trees, Journal of Computer and System Sciences 117:1-22 (2021). [Article online]
  • L. Salmela, K. Mukherjee, S.J. Puglisi, M.D. Muggli, and C. Boucher: Fast and accurate correction of optical mapping data via spaced seeds. Bioinformatics 36(3): 682-689 (2020). [Article online] [Implementation]
  • M. Cáceres, S.J. Puglisi, B. Zhukova: Fast Indexes for Gapped Pattern Matching. Proceedings of SOFSEM 2020: 493-504. [Article online]
  • S. Gog, J. Kärkkäinen, D. Kempa, M. Petri, S.J. Puglisi: Fixed Block Compression Boosting in FM-Indexes: Theory and Practice, Algorithmica 81(4): 1370-1391 (2019). [Article online]
  • Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi, Bella Zhukova: Engineering External Memory Induced Suffix Sorting. ALENEX 2017: 98-108
  • Juha Kärkkäinen, Marcin Piatkowski, Simon J. Puglisi: String Inference from Longest-Common-Prefix Array. ICALP 2017: 62:1-62:14
  • Djamal Belazzougui, Simon J. Puglisi: Range Predecessor and Lempel-Ziv Parsing. SODA 2016: 2053-2071
  • Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, Simon J. Puglisi: Scalable Partial Least Squares Regression on Grammar-Compressed Data Matrices. KDD 2016: 1875-1884

Recent News/Activities

October 2024

Simon visiting Zsuzsanna Lipták in Verona.

September 2024

"Height-bounded Lempel-Ziv Encodings" (coauthored by Hideo Bannai,  Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi) wins the Best Paper Award at the European Symposium on Algorithms in London.

"Batched k-mer lookup on the Spectral Burrows-Wheeler Transform" (coauthored by Jarno Alanko, Elena Biagi, Joel Mackenzie, Simon J. Puglisi) is accepted to ALENEX '25.

August 2024

Anastasia Diseth starts her doctoral studies!

July 2024

Simon attends the Shonan seminar on Theoretical Foundations of Nonvolatile Memory.

June 2024

Simon chairs (with Shunsuke Inenaga) the 35th Annual Symposium on Combinatorial Pattern Matching Fukuoka, Japan, June 25–27.

April 2024

Dr. Joel Mackenzie of U. Queensland visiting us in Helsinki.

March 2024

Elena and Jarno attend DSB 2024 in Montpellier and present our Finimizer results.

February 2024

Bella Zhukova successfully defends her PhD thesis (!) against opponent Prof. Johannes Fischer of TU Dortmund.

Simon attends the Sequences meeting at City University of London.

December 2023

Simon visiting Hideo Bannai in Tokyo.

Touko Puro receives his MSc degree.

November 2023

Simon is an opponent at Garance Gourdel's PhD defence at Ecole Normale Superieur in Paris.

October 2023

Simon is an opponent at Max Pedersen's PhD defence at TU Denmark.

Elena, Jarno, and Simon attend the Symposium on String Processing and Information Retrieval (SPIRE) in Pisa. Elena presenting work on Longest-common-suffix array construction.

July 2022

Simon attends the Annual Symposium on Combinatorial Pattern Matching in Paris.

June 2022

Elena, Saska, and Simon attend the Symposium on Experimental Algorithms (SEA) in Barcelona.

Elena, Jarno, and Simon visit Zamin Iqbal's lab at the European Bioinformatics Institute.

May 2023

"Small Searchable k-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform" wins the Best Paper Award at the SIAM Conference on Applied and Computational Discrete Algorithms in Seattle, USA.

October 2022

Elena Biagi (late of U. Verona) is starting a PhD in our group!

April 2022

Saska presents at the Data Compression Conference in Snowbird, Utah.

June 2021

Weighted-level ancestor queries revisited wins the Alberto Apostolico Best Paper Award at the Annual Symposium on Combinatorial Pattern Matching.

March 2021

Papers accepted to CPM and SEA:

D. Belazzogui, D. Kosolobov, S. J. Puglisi, & R. Raman, "Weighted-level ancestor queries revisited", to appear at CPM
S. J. Puglisi & B. Zhukova, "Document Listing Hacks", to appeat at SEA

January 2021

Papers appear in print at JCSS (on block trees) and TCS (on suffix tree breadth)
Saska Dönges starts as a PhD student

December 2020

Three journal papers appear this month at Bioinformatics, Algorithmica, and Algorithms + two papers accepted to the 2021 Data Compression Conference:

Simon J. Puglisi & Bella Zhokova, "Smaller RLZ-compressed suffix arrays"
Danyang Ma, Simon J. Puglisi, Rajeev Raman & Bella Zhokova, "On Elias-Fano for Rank Queries in FM-Indexes"

November 2020
Lauri Heino receives his MSc degree

September 2020
Pekka Jylha-Ollila receives his MSc degree

July 2020
Jussi Timonen, Antti Karjalainen, and Risto Haapasalmi receive their MSc degrees
Bella and Simon have paper "Relative Lempel-Ziv compression of suffix array" accepted to SPIRE 2020

May 2020
Simon is opponent for Florian Kurpicz's PhD thesis defence at TU Dortmund

March 2020
Two papers accepted to SEA 2020 in Palermo

February 2020
Simon attending London Stringology Days at King's College London

January 2020
Bella presenting "Fast Indexes for Gapped Pattern Matching" at SOFSEM 2020 in Cyprus

December 2019
Simon visiting Yasuo Tabei and Hideo Bannai at RIKEN AIP Tokyo
Simon visiting Alistair Moffat and Joel MacKenzie at U. Melbourne

November 2019
Dustin Cobas visiting from U. Chile

October 2019
Simon co-chairing (with Nieves Brisaboa) SPIRE 2019 conference in Segovia, Spain
Bella attending SPIRE 2019 and WCTA 2019 in Segovia, Spain

September 2019
Javier Soto visiting from U. Concepcion

August 2019
Mabel Vidal visiting from U. Concepcion

July 2019
Rajeev Raman visiting from U. Leicester

June 2019
Massimiliano Rossi visiting from U. Verona

May 2019
Simon visiting Rayan Chikhi's group at Institute Pasteur in Paris
Simon visiting Golnaz Badkobeh at Goldsmiths University of London
Rayan Chikhi and Camille Marchet visiting from Institute Pasteur in Paris for Helsinki Bioinformatics Day
Sara Giuliani starting as a summer research intern
Uula Ulkuniemi starting us as a summer research intern

Alumni/former students

  • Touko Puro (MSc, U. Helsinki, December 2023)
  • Eve Kivivuori (MSc, U. Helsinki, July 2023)
  • Heikki Lehtosalo (Research Programmer, Nov 2019-Nov 2020)
  • Lauri Heino (MSc, U. Helsinki, November 2020)
  • Pekka Jylha-Ollila (MSc, U. Helsinki, September 2020)
  • Antti Karjalainen (MSc, U. Helsinki, July 2020)
  • Risto Haapasalmi (MSc, U. Helsinki, July 2020)
  • Jussi Timonen (MSc, U. Helsinki, July 2020)
  • Sara Giuliani (summer research intern, May-July 2019)
  • Uula Ulkuniemi (summer research intern, May-July 2019)
  • Joonsa Nietosvaara (MSc, U. Helsinki, graduated 2019)
  • Massimiliano Rossi (visiting PhD student from U. Verona, July-October 2018)
  • Christopher Hoobin (PhD, RMIT, graduated 2015, now at NASDAQ)
  • Jasbir Dhaliwal (PhD, RMIT, graduated 2014, now at Monash U., formerly at IBM)
  • Alex Bowe (MSc, RMIT, graduated 2013, now at Cruise Automation, PhD from U. Tokyo)
  • Shanika Kuruppu (PhD, U. Melbourne, graduated 2013, now at Google, co-supervised with Justin Zobel)