We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let n be the text length and σ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in O(nlog⁡σ) time using just o(nlog⁡σ) bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0<ϵ≤1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in O(n(log⁡σ+ϵ−1⋅log⁡log⁡n)) time using at most nlog⁡σ⋅(ϵ+o(1)) bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. o(nlog⁡σ) bits) and Θ(nlog⁡σ+n(log⁡log⁡n)1+δ) time, for any constant δ>0. This improves the previous most space-efficient algorithms, which worked in O(n) bits and O(nlog⁡n) time. We also consider the problem of merging BWTs of string collections, and provide a solution running in O(nlog⁡σ) time and using just o(nlog⁡σ) bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and process data as fast as 2.92 megabases per second.

Space-efficient construction of compressed suffix trees

Prezza, Nicola;
2021-01-01

Abstract

We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let n be the text length and σ be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in O(nlog⁡σ) time using just o(nlog⁡σ) bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0<ϵ≤1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in O(n(log⁡σ+ϵ−1⋅log⁡log⁡n)) time using at most nlog⁡σ⋅(ϵ+o(1)) bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. o(nlog⁡σ) bits) and Θ(nlog⁡σ+n(log⁡log⁡n)1+δ) time, for any constant δ>0. This improves the previous most space-efficient algorithms, which worked in O(n) bits and O(nlog⁡n) time. We also consider the problem of merging BWTs of string collections, and provide a solution running in O(nlog⁡σ) time and using just o(nlog⁡σ) bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and process data as fast as 2.92 megabases per second.
File in questo prodotto:
File Dimensione Formato  
suffix_trees.pdf

accesso aperto

Descrizione: articolo principale, versione postprint (arXiv)
Tipologia: Documento in Pre-print
Licenza: Accesso gratuito (solo visione)
Dimensione 329.09 kB
Formato Adobe PDF
329.09 kB Adobe PDF Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3734792
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 2
social impact