On the Effect of Low-Ranked Documents: A New Sampling Function for Selective Gradient Boosting
Lucchese C.;Marcuzzi F.;Orlando S.
2023-01-01
Abstract
Learning to Rank is the task of learning a ranking function from a set of query-document pairs. Typically, each query is associated with thousands of documents, but not all of them are informative for the learning phase. Different strategies have been designed to select the most informative documents from the training set. However, most of them focus on reducing the size of the training set to speed up the learning phase, sacrificing effectiveness. A first attempt at selecting informative examples without this trade-off was achieved by Selective Gradient Boosting, a learning algorithm that uses a customisable sampling strategy to train effective ranking models. In this work, we propose a new sampling strategy for selecting negative examples, called High-Low-Sampl, that is applicable to Selective Gradient Boosting without compromising model effectiveness. The proposed sampling strategy allows Selective Gradient Boosting to compose a new training set by selecting three document classes from the original one: the positive examples, high-ranked negative examples, and low-ranked negative examples. The resulting dataset aims at minimising the mis-ranking risk, i.e., enhancing the discriminative power of the learned model while maintaining generalisation to unseen instances. Through an extensive experimental analysis on publicly available datasets, we demonstrate that the proposed selection algorithm makes the most of the negative examples within the training set and leads to models that obtain statistically significant improvements in terms of NDCG compared to the state of the art.
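To make the selection described in the abstract concrete, below is a minimal Python sketch of how a high/low negative sampling step could look for a single query. It is an illustration under stated assumptions, not the paper's exact procedure: the function name `high_low_sample`, the per-query cutoffs `n_high` and `n_low`, and the uniform random draw from the low-ranked tail are all assumptions introduced here.

```python
import numpy as np

def high_low_sample(labels, scores, n_high=10, n_low=10, rng=None):
    """Sketch of a High-Low-Sampl-style selection for one query.

    Keeps every positive (relevant) document, the n_high negatives the
    current model scores highest (hard negatives near the decision
    boundary), and n_low negatives drawn uniformly at random from the
    remaining low-ranked tail (to preserve generalisation). Cutoffs and
    the uniform draw are illustrative assumptions, not the paper's
    settings. Returns the indices of the selected documents.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    labels = np.asarray(labels)
    scores = np.asarray(scores)

    positives = np.flatnonzero(labels > 0)   # keep all positive examples
    negatives = np.flatnonzero(labels == 0)

    # Rank the negatives by the current model score, highest first.
    order = negatives[np.argsort(-scores[negatives])]
    high = order[:n_high]                    # high-ranked (hard) negatives
    tail = order[n_high:]                    # low-ranked negatives
    k = min(n_low, tail.size)
    low = rng.choice(tail, size=k, replace=False) if k else tail[:0]

    return np.concatenate([positives, high, low])

# Example: one positive among eight documents; keep it plus two hard
# negatives and two randomly drawn low-ranked negatives.
labels = [0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.1, 0.3, 0.2, 0.05, 0.4]
print(high_low_sample(labels, scores, n_high=2, n_low=2))
```

In this sketch, the high-ranked negatives sharpen the model's discriminative power near the top of the ranking, while the randomly sampled low-ranked negatives keep the training distribution broad enough to generalise to unseen instances, matching the intent stated in the abstract.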
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 3555776.3577597.pdf | open access | Publisher's version | Creative Commons | 2.45 MB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.