We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result, we combine several recent findings, including string attractors - new combinatorial objects encompassing most known compressibility measures for highly repetitive texts - and grammars based on locally consistent parsing. More in detail, letγbe the size of the smallest attractor for a text T of length n. The measureγis an (asymptotic) lower bound to the size of dictionary compressors based on Lempel-Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space O(γlog (n/i3)), and our lightest indexes work within the same asymptotic space. Let ϵ > 0 be a suitably small constant fixed at construction time, m be the pattern length, and occ be the number of its text occurrences. Our index counts pattern occurrences in O(m+log 2+ϵ n) time and locates them in O(m+(occ+1)log ϵ n) time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within O((m+occ),polylog, n) time. Further, by increasing the space to O(γlog (n/i3)log ϵ n), we reduce the locating time to the optimal O(m+occ), and within O(γlog (n/i3)log n) space we can also count in optimal O(m) time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in O(n) space and O(nlog n) expected time. As a by-product of independent interest, we show how to build, in O(n) expected time and without knowing the sizeγof the smallest attractor (which is NP-hard to find), a run-length context-free grammar of size O(γlog (n/i3)) generating (only) T. As a result, our indexes can be built without knowingi3.

Optimal-Time Dictionary-Compressed Indexes

Prezza, Nicola
2020-01-01

Abstract

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result, we combine several recent findings, including string attractors - new combinatorial objects encompassing most known compressibility measures for highly repetitive texts - and grammars based on locally consistent parsing. More in detail, letγbe the size of the smallest attractor for a text T of length n. The measureγis an (asymptotic) lower bound to the size of dictionary compressors based on Lempel-Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space O(γlog (n/i3)), and our lightest indexes work within the same asymptotic space. Let ϵ > 0 be a suitably small constant fixed at construction time, m be the pattern length, and occ be the number of its text occurrences. Our index counts pattern occurrences in O(m+log 2+ϵ n) time and locates them in O(m+(occ+1)log ϵ n) time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within O((m+occ),polylog, n) time. Further, by increasing the space to O(γlog (n/i3)log ϵ n), we reduce the locating time to the optimal O(m+occ), and within O(γlog (n/i3)log n) space we can also count in optimal O(m) time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in O(n) space and O(nlog n) expected time. As a by-product of independent interest, we show how to build, in O(n) expected time and without knowing the sizeγof the smallest attractor (which is NP-hard to find), a run-length context-free grammar of size O(γlog (n/i3)) generating (only) T. As a result, our indexes can be built without knowingi3.
2020
17
File in questo prodotto:
File Dimensione Formato  
Optimal_Time_Dictionary_Compressed_Indexes__TALG_.pdf

non disponibili

Descrizione: articolo principale, versione postprint
Tipologia: Documento in Post-print
Licenza: Accesso chiuso-personale
Dimensione 531.84 kB
Formato Adobe PDF
531.84 kB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3734790
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 12
social impact