Parallel Traversal of Large Ensembles of Decision Trees

Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service is high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.

Parallel Traversal of Large Ensembles of Decision Trees

Lettich, Francesco;Lucchese, Claudio;Nardini, Franco Maria;Orlando, Salvatore;Perego, Raffaele;Tonellotto, Nicola;Venturini, Rossano

2019

Abstract

Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service is high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno pubblicazione
	
				2019
			
	Titolo della Rivista
	
				IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
			
	N° Volume
	
				30
			
	DOI
	
				https://dx.doi.org/10.1109/TPDS.2018.2860982
			
	Appare nelle tipologie:
	
				2.1 Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
08430583.pdf non disponibili Tipologia: Versione dell'editore Licenza: Accesso chiuso-personale Dimensione 802.78 kB Formato Adobe PDF Visualizza/Apri	802.78 kB	Adobe PDF	Visualizza/Apri
paper.pdf accesso aperto Tipologia: Documento in Post-print Licenza: Accesso gratuito (solo visione) Dimensione 1.85 MB Formato Adobe PDF Visualizza/Apri	1.85 MB	Adobe PDF	Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/3703670

Citazioni

ND

23

18

social impact