State-of-the-art encoders for inverted indexes compress each posting list individually. Encoding clusters of posting lists offers the possibility of reducing the redundancy of the lists while maintaining a noticeable query processing speed. In this article, we propose a new index representation based on clustering the collection of posting lists and, for each created cluster, building an ad hoc reference list with respect to which all lists in the cluster are encoded with Elias-Fano. We describe a posting lists clustering algorithm tailored for our encoder and two methods for building the reference list for a cluster. Both approaches are heuristic and differ in the way postings are added to the reference list: according to their frequency in the cluster or according to the number of bits necessary for their representation. The extensive experimental analysis indicates that significant space reductions are indeed possible, beating the best state-of-the-art encoders.
Clustered Elias-Fano indexes
Pibiri G. E.;Venturini R.
2017-01-01
Abstract
State-of-the-art encoders for inverted indexes compress each posting list individually. Encoding clusters of posting lists offers the possibility of reducing the redundancy of the lists while maintaining a noticeable query processing speed. In this article, we propose a new index representation based on clustering the collection of posting lists and, for each created cluster, building an ad hoc reference list with respect to which all lists in the cluster are encoded with Elias-Fano. We describe a posting lists clustering algorithm tailored for our encoder and two methods for building the reference list for a cluster. Both approaches are heuristic and differ in the way postings are added to the reference list: according to their frequency in the cluster or according to the number of bits necessary for their representation. The extensive experimental analysis indicates that significant space reductions are indeed possible, beating the best state-of-the-art encoders.File | Dimensione | Formato | |
---|---|---|---|
TOIS2017.pdf
accesso aperto
Tipologia:
Documento in Post-print
Licenza:
Accesso libero (no vincoli)
Dimensione
1.3 MB
Formato
Adobe PDF
|
1.3 MB | Adobe PDF | Visualizza/Apri |
I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.