In-place sparse suffix sorting

Suffix arrays encode the lexicographical order of all suffixes of a text and are often combined with the Longest Common Prefix array (LCP) to simulate navigational queries on the suffix tree in reduced space. In space-critical applications such as sparse and compressed text indexing, only information regarding the lexicographical order of a size-b subset of all n text suffixes is often needed. Such information can be stored space-efficiently (in b words) in the sparse suffix array (SSA). The SSA and its relative sparse LCP array (SLCP) can be used as a space-efficient substitute of the sparse suffix tree. Very recently, Gawrychowski and Kociumaka [11] showed that the sparse suffix tree (and therefore SSA and SLCP) can be built in asymptotically optimal O(b) space with a Monte Carlo algorithm running in O(n) time. The main reason for using the SSA and SLCP arrays in place of the sparse suffix tree is, however, their reduced space of b words each. This leads naturally to the quest for in-place algorithms building these arrays. Franceschini and Muthukrishnan [8] showed that the full suffix array can be built in-place and in optimal running time. On the other hand, finding sub-quadratic in-place algorithms for building the SSA and SLCP for general subsets of suffixes has been an elusive task for decades. In this paper, we give the first solution to this problem. We provide the first in-place algorithm building the full LCP array in O(n log n) expected time and the first Monte Carlo in-place algorithms building the SSA and SLCP in O(n + b log2 n) expected time. We moreover describe the first in-place solution for the suffix selection problem: to compute the i-th smallest text suffix. In order to achieve these results, we show that we can quickly overwrite the text with a reversible and implicit data structure supporting Longest Common Extension queries in polylogarithmic time and text extraction in optimal time: this structure is strictly more powerful than a plain text representation and is of independent interest.

In-place sparse suffix sorting

Prezza N.

2018

Abstract

Suffix arrays encode the lexicographical order of all suffixes of a text and are often combined with the Longest Common Prefix array (LCP) to simulate navigational queries on the suffix tree in reduced space. In space-critical applications such as sparse and compressed text indexing, only information regarding the lexicographical order of a size-b subset of all n text suffixes is often needed. Such information can be stored space-efficiently (in b words) in the sparse suffix array (SSA). The SSA and its relative sparse LCP array (SLCP) can be used as a space-efficient substitute of the sparse suffix tree. Very recently, Gawrychowski and Kociumaka [11] showed that the sparse suffix tree (and therefore SSA and SLCP) can be built in asymptotically optimal O(b) space with a Monte Carlo algorithm running in O(n) time. The main reason for using the SSA and SLCP arrays in place of the sparse suffix tree is, however, their reduced space of b words each. This leads naturally to the quest for in-place algorithms building these arrays. Franceschini and Muthukrishnan [8] showed that the full suffix array can be built in-place and in optimal running time. On the other hand, finding sub-quadratic in-place algorithms for building the SSA and SLCP for general subsets of suffixes has been an elusive task for decades. In this paper, we give the first solution to this problem. We provide the first in-place algorithm building the full LCP array in O(n log n) expected time and the first Monte Carlo in-place algorithms building the SSA and SLCP in O(n + b log2 n) expected time. We moreover describe the first in-place solution for the suffix selection problem: to compute the i-th smallest text suffix. In order to achieve these results, we show that we can quickly overwrite the text with a reversible and implicit data structure supporting Longest Common Extension queries in polylogarithmic time and text extraction in optimal time: this structure is strictly more powerful than a plain text representation and is of independent interest.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2018
			
	Titolo del volume
	
				Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms
			
	DOI
	
				https://dx.doi.org/10.1137/1.9781611975031.98
			
	Appare nelle tipologie:
	
				4.1 Articolo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
sparse suffix sorting.pdf non disponibili Dimensione 392.07 kB Formato Adobe PDF Visualizza/Apri	392.07 kB	Adobe PDF	Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3729812

Citazioni

ND

10

7

social impact