We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned with the case when data quality may be unreliable (e.g. there might be outliers among the observations). When the number of available covariates is moderately large, fitting all possible subsets is not a feasible option. Sequential methods like forward or backward selection are generally “greedy” and may fail to include important predictors when these are correlated. To avoid this problem Efron et al. (2004) proposed the Least Angle Regression algorithm to produce an ordered list of the available covariates (sequencing) according to their relevance. We introduce outlier robust versions of the LARS algorithm based on S-estimators for regression (Rousseeuw and Yohai, 1984). This algorithm is computationally efficient and suit- able even when the number of variables exceeds the sample size. Simulation studies show that it is also robust to the presence of outliers in the data and compares favourably to previous proposals in the literature.

Robust model selection with LARS based on S-estimators

AGOSTINELLI, Claudio;
2010-01-01

Abstract

We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned with the case when data quality may be unreliable (e.g. there might be outliers among the observations). When the number of available covariates is moderately large, fitting all possible subsets is not a feasible option. Sequential methods like forward or backward selection are generally “greedy” and may fail to include important predictors when these are correlated. To avoid this problem Efron et al. (2004) proposed the Least Angle Regression algorithm to produce an ordered list of the available covariates (sequencing) according to their relevance. We introduce outlier robust versions of the LARS algorithm based on S-estimators for regression (Rousseeuw and Yohai, 1984). This algorithm is computationally efficient and suit- able even when the number of variables exceeds the sample size. Simulation studies show that it is also robust to the presence of outliers in the data and compares favourably to previous proposals in the literature.
2010
Proceedings of COMPSTAT 2010, 19th International Conference on Comptutational Statistics
File in questo prodotto:
File Dimensione Formato  
Agostinelli and Salibian-Barrera - 2010 - Robust model selection with LARS based on S-estima.pdf

non disponibili

Tipologia: Documento in Post-print
Licenza: Accesso chiuso-personale
Dimensione 116.01 kB
Formato Adobe PDF
116.01 kB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/22829
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 3
social impact