Preface to “Computation over Compressed Data” at DCC 2022

Gagie, Travis; Prezza, Nicola

doi:10.1016/j.ic.2024.105165

This is the fifth special issue of the “computation over compressed data” session at the IEEE Data Compression Conference (DCC), which took place in Snowbird on March 22-24, 2022. This special issue features six articles from the field of compressed data structures, covering compact graph data structures and efficient algorithms and indexes for several variants of the ubiquitous Burrows-Wheeler transform. In “Compact representations of spatial hierarchical structures with support for topological queries”, José Fuentes-Sepúlveda, Diego Gatica, Gonzalo Navarro, M. Andrea Rodríguez, and Diego Seco show how to compactly represent hierarchical bidimensional spatial regions (such as the administrative partition of a country) while supporting a number of useful queries on the data. Their representation builds upon compact planar graph embeddings. New compact data structures for graphs are presented also in “Succinct data structure for path graphs”. Here, the authors Girish Balakrishnan, Sankardeep Chakraborty, N.S. Narayanaswamy and Kunihiko Sadakane show how to support efficient degree, adjacency, and neighborhood queries on path graphs in a space matching the information-theoretic lower bound. The remaining four papers deal with variants of the Burrows-Wheeler transform (BWT), a fundamental tool in text compression and indexing whose introduction in 1994 by Michael Burrows and David J. Wheeler opened a prolific line of research in the fields of data compression and indexing. In “A new class of string transformations for compressed text indexing”, Raffaele Giancarlo, Giovanni Manzini, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino generalize the BWT by using context-adaptive alphabet orderings. They show that, under certain conditions, on very repetitive text collections these more general transformations are indexable and more compressible than the original BWT. While the problem of building the BWT in linear time and space is well-understood to date, such repetitive collections pose the challenge of solving this task in compressed space: in those cases, the input is too large to be kept in memory in uncompressed format. This problem is tackled by Diego Díaz-Domínguez and Gonzalo Navarro in “Efficient construction of the BWT for repetitive text using string compression”. In this article, the authors show how to build a BWT variant (BCR BWT) with an algorithm keeping its intermediate results in grammar-compressed form. The resulting algorithm spectacularly achieves its goal, using a working space not exceeding 10% of the input’s size on real repetitive datasets. In “Constructing and indexing the bijective and extended Burrows–Wheeler transform”, Hideo Bannai, Juha Kärkkäinen, Dominik Köppl, and Marcin Pia̧tkowski consider the problem of building another BWT variant: the bijective BWT. They describe the first linear-time construction algorithm for such a variant, and present the first self-index based on the bijective BWT. Last but not least, in “r-indexing the eBWT” the authors Christina Boucher, Davide Cenzato, Zsuzsanna Lipták, Massimiliano Rossi, and Marinella Sciortino show how to index yet another BWT variant: the eBWT, originally designed to compactly represent collections of circular strings. They describe an extension of the r-index for such collections, able to simultaneously achieve high compression rates on very repetitive data, and (for the first time) to index circular strings. The success of this special issue shows that the field of compressed data structures is still very active and flourishing. As a matter of fact, these six articles clearly indicate two of the directions that the community is taking: compressing and indexing graphs and very repetitive text collections. In March 2024 a new special session on “computation over compressed data” took place at DCC, featuring ten new articles in the field which may result in a new successful special issue.