We propose a new compressed representation for weighted de Bruijn graphs, based on delta-encoding the variations of k-mer abundances along a spanning branching of the graph. The data structure is likely to be of practical value: for example, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and answers abundance queries in about 60 microseconds on a standard machine. In contrast, state-of-the-art tools report a space usage of at least 30 bits per distinct k-mer for the same task, which our experiments confirm. As a by-product, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.
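The delta-encoding idea in the abstract can be illustrated with a minimal sketch. It assumes the spanning branching is given as a plain parent map over k-mers and recovers an abundance by summing deltas on the path to the root. The function names, the toy k-mers, and the naive root walk are all hypothetical illustrations; the compressed structures described in the paper are not reproduced here.

```python
def delta_encode(parent, weight):
    """Store, for each node, its abundance minus its parent's abundance.

    parent[v] is the parent of v in the branching (None for a root).
    Roots keep their full abundance. Names are illustrative only.
    """
    return {
        v: weight[v] - (weight[parent[v]] if parent[v] is not None else 0)
        for v in weight
    }


def abundance(v, parent, delta):
    """Recover the abundance of v as a partial sum of deltas up to the root."""
    total = 0
    while v is not None:
        total += delta[v]
        v = parent[v]
    return total


# Toy branching over four 3-mers (assumed data, not from the paper).
parent = {"ACG": None, "CGT": "ACG", "GTA": "CGT", "GTC": "CGT"}
weight = {"ACG": 16, "CGT": 16, "GTA": 17, "GTC": 3}

delta = delta_encode(parent, weight)
recovered = {v: abundance(v, parent, delta) for v in weight}
```

When neighbouring k-mers have similar abundances, most deltas are small and therefore cheap to encode, which is the intuition behind the space savings the abstract reports.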
Nicola Prezza (Corresponding)
|Publication date:||2021|
|Title:||Compressed Weighted de Bruijn Graphs|
|Book title:||32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.4230/lipics.cpm.2021.16|
|Appears in categories:||4.1 Conference proceedings paper|