SOUR: an Outliers Detection Algorithm in Learning to Rank (Abstract)

Outlier data points are known to negatively affect the learning process of regression and classification models, yet their impact in the learning-to-rank scenario has not been thoroughly investigated so far. In this talk we present our effort to address this research problem. The full version of this work will appear at ICTIR 2022 [1]. We designed SOUR, a learning-to-rank method that detects and removes outliers before building an effective ranking model. We limit our analysis to gradient boosting decision trees, but our algorithm can be easily adapted to other learning strategies, such as artificial neural networks. SOUR searches for outlier instances that are consistently ranked incorrectly in several consecutive iterations of the learning process. We performed an extensive evaluation on three publicly available datasets and empirically demonstrated that i) removing a limited number of outlier data instances before re-training a new model provides statistically significant improvements in terms of effectiveness, and ii) SOUR outperforms state-of-the-art de-noising and outlier detection methods such as [2]. Finally, we investigated how the removal of the outliers affects the ensemble structure, and we found that the ensemble leaves were purer when the model was trained without the outliers.
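The detection idea described in the abstract (flag instances that stay incorrectly ranked over several consecutive learning iterations, then drop them before re-training) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define "incorrectly ranked", so the criterion below (a document is misranked when more than half of its comparisons against differently labelled documents of the same query are inverted) is an assumption, and the names `misranked_flags`, `consecutive_outliers`, `max_discordant_fraction`, and `min_streak` are hypothetical. The per-iteration scores are assumed to come from whatever iterative learner is used, for example the partial ensembles of a gradient boosting model.

```python
from typing import Dict, Hashable, List, Sequence


def misranked_flags(
    scores: Sequence[float],
    labels: Sequence[int],
    query_ids: Sequence[Hashable],
    max_discordant_fraction: float = 0.5,
) -> List[bool]:
    """Flag documents whose share of inverted pairs exceeds the threshold.

    ASSUMPTION: this pairwise criterion is illustrative only; the abstract
    does not specify how SOUR decides that an instance is misranked.
    """
    by_query: Dict[Hashable, List[int]] = {}
    for idx, qid in enumerate(query_ids):
        by_query.setdefault(qid, []).append(idx)

    flags = [False] * len(scores)
    for docs in by_query.values():
        for i in docs:
            comparable = 0
            discordant = 0
            for j in docs:
                if labels[i] == labels[j]:
                    continue  # same relevance label: no ordering constraint
                comparable += 1
                hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
                # The more relevant document should receive the higher score.
                if scores[hi] <= scores[lo]:
                    discordant += 1
            if comparable and discordant / comparable > max_discordant_fraction:
                flags[i] = True
    return flags


def consecutive_outliers(
    score_history: Sequence[Sequence[float]],
    labels: Sequence[int],
    query_ids: Sequence[Hashable],
    min_streak: int,
) -> List[int]:
    """Return indices misranked in at least `min_streak` consecutive iterations."""
    n = len(labels)
    streak = [0] * n   # current run of consecutive misranked iterations
    longest = [0] * n  # longest run seen so far
    for scores in score_history:
        flags = misranked_flags(scores, labels, query_ids)
        for i, bad in enumerate(flags):
            streak[i] = streak[i] + 1 if bad else 0
            longest[i] = max(longest[i], streak[i])
    return [i for i in range(n) if longest[i] >= min_streak]


# Toy data: one query, four documents; document 3 has label 0 but is
# scored highest at every iteration.
labels = [2, 1, 0, 0]
qids = ["q1", "q1", "q1", "q1"]
history = [
    [0.9, 0.5, 0.1, 1.0],
    [1.2, 0.4, 0.2, 1.5],
    [1.5, 0.6, 0.1, 1.8],
]
outliers = consecutive_outliers(history, labels, qids, min_streak=3)
print(outliers)  # [3]
```

In a full pipeline, the returned indices would be removed from the training set and a new ranking model trained on the remaining instances. The streak length and the misranking threshold would need to be tuned on validation data.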

Marcuzzi F.; Lucchese C.; Orlando S.

2022-01-01

Publication year: 2022
Volume title: CEUR Workshop Proceedings
Appears in types: 4.2 Abstract in conference proceedings

Files in this item:

File	Size	Format
paper12.pdf (open access; type: publisher's version; license: free access, view only)	451.14 kB	Adobe PDF

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5004962
