
Early Exit Strategies for Approximate k-NN Search in Dense Retrieval

Francesco Busolin; Claudio Lucchese; Salvatore Orlando
2024-01-01

Abstract

Learned dense representations are a popular family of techniques for encoding queries and documents using high-dimensional embeddings, which enable retrieval by performing approximate 𝑘-nearest-neighbors search (A-𝑘NN). A popular technique for making A-𝑘NN search efficient is based on a two-level index, where the embeddings of documents are clustered offline and, at query processing time, a fixed number 𝑁 of clusters closest to the query is visited exhaustively to compute the result set. In this paper, we build upon the state of the art for early-exit A-𝑘NN and propose an unsupervised method based on the notion of patience, which can reach competitive effectiveness with large efficiency gains. Moreover, we discuss a cascade approach where we first identify queries that find their nearest neighbor within the closest 𝜏 ≪ 𝑁 clusters, and then decide how many more to visit based on our patience approach or other state-of-the-art strategies. Reproducible experiments employing state-of-the-art dense retrieval models and publicly available resources show that our techniques improve A-𝑘NN efficiency with up to 5× speedups while incurring negligible effectiveness losses. All the code used is available at https://github.com/francescobusolin/faiss_pEE.
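The two-level search with patience-based early exit described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (which builds on FAISS); it is a minimal, self-contained NumPy illustration in which `patience_search`, its parameters (`n_probe` for the cluster budget 𝑁, `patience` for the stopping threshold), and the toy index layout are all hypothetical names chosen for this example.

```python
import numpy as np

def patience_search(query, centroids, clusters, k=10, n_probe=64, patience=4):
    """Sketch of patience-based early exit over a two-level (IVF-style) index.

    Clusters are visited in order of centroid distance to the query; if the
    top-k result set does not improve for `patience` consecutive clusters,
    the search stops early instead of exhausting all `n_probe` clusters.
    """
    # First level: rank clusters by centroid distance, keep the closest n_probe.
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    best = []   # current top-k as (distance, doc_id), kept sorted, size <= k
    stale = 0   # consecutive clusters visited without improving the top-k
    for c in order:
        improved = False
        # Second level: exhaustively score the documents of this cluster.
        for doc_id, emb in clusters[c]:
            d = np.linalg.norm(emb - query)
            if len(best) < k or d < best[-1][0]:
                best.append((d, doc_id))
                best.sort(key=lambda t: t[0])
                best = best[:k]
                improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:  # early exit: the result set has converged
            break
    return [doc_id for _, doc_id in best]
```

The cascade variant sketched in the abstract would wrap this routine: first probe only the 𝜏 closest clusters, and only for queries whose results still look unstable apply the patience rule (or another strategy) to decide how many further clusters to visit.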
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Files in this record:

File: 3627673.3679903.pdf (Adobe PDF, 994.38 kB)
Access: open access
Type: publisher's version
License: Creative Commons

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5081787