On Computing the Smallest Suffixient Set

Cenzato, Davide; Olivares, Francisco; Prezza, Nicola

doi:10.1007/978-3-031-72200-4_6

\emph{suffixient set} $\mathcal S \subseteq [n]$ for $T$ is a set of positions such that, for every one-character right-extension $T[i,j]$ of every right-maximal substring $T[i,j-1]$ of $T$, there exists $x\in S$ such that $T[i,j]$ is a suffix of $T[1,x]$. It was recently shown that, given a suffixient set of cardinality $q$ and an oracle offering fast random access on $T$ (for example, a straight-line program), there is a data structure of $O(q)$ words (on top of the oracle) that can quickly find all Maximal Exact Matches (MEMs) of any query pattern $P$ in $T$ with high probability. The paper introducing suffixient sets left open the problem of computing the smallest such set; in this paper, we solve this problem by describing a simple quadratic-time algorithm, a $O(n + \bar r|\Sigma|)$-time algorithm running in compressed working space ($\bar r$ is the number of runs in the Burrows-Wheeler transform of $T$ reversed), and an optimal $O(n)$-time algorithm computing the smallest suffixient set. We present an implementation of our compressed-space algorithm and show experimentally that it uses a small memory footprint on repetitive text collections.

On Computing the Smallest Suffixient Set

Cenzato, Davide;Olivares, Francisco;Prezza, Nicola

2024-01-01

Abstract

\emph{suffixient set} $\mathcal S \subseteq [n]$ for $T$ is a set of positions such that, for every one-character right-extension $T[i,j]$ of every right-maximal substring $T[i,j-1]$ of $T$, there exists $x\in S$ such that $T[i,j]$ is a suffix of $T[1,x]$. It was recently shown that, given a suffixient set of cardinality $q$ and an oracle offering fast random access on $T$ (for example, a straight-line program), there is a data structure of $O(q)$ words (on top of the oracle) that can quickly find all Maximal Exact Matches (MEMs) of any query pattern $P$ in $T$ with high probability. The paper introducing suffixient sets left open the problem of computing the smallest such set; in this paper, we solve this problem by describing a simple quadratic-time algorithm, a $O(n + \bar r|\Sigma|)$-time algorithm running in compressed working space ($\bar r$ is the number of runs in the Burrows-Wheeler transform of $T$ reversed), and an optimal $O(n)$-time algorithm computing the smallest suffixient set. We present an implementation of our compressed-space algorithm and show experimentally that it uses a small memory footprint on repetitive text collections.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2024
			
	Titolo del volume
	
				Proceedings of the 31st International Symposium on String Processing and Information Retrieval (SPIRE 2024)
			
	DOI
	
				https://dx.doi.org/10.1007/978-3-031-72200-4_6
			
	Appare nelle tipologie:
	
				4.1 Articolo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
suffixient.pdf non disponibili Descrizione: full text Tipologia: Documento in Pre-print Licenza: Copyright dell'editore Dimensione 649.01 kB Formato Adobe PDF Visualizza/Apri	649.01 kB	Adobe PDF	Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5076323

Citazioni

ND

0

0

On Computing the Smallest Suffixient Set

Cenzato, Davide;Olivares, Francisco;Prezza, Nicola

2024-01-01

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)