The effectiveness of defect prediction depends on modeling techniques as well as their parameter optimization, data preprocessing and ensemble development. This paper focuses on auto-sklearn, which is a recently-developed software library for automated machine learning, that can automatically select appropriate prediction models, hyperparameters and data preprocessing techniques for a given data set and develop their ensemble with optimized weights. In this paper we empirically evaluate the effectiveness of auto-sklearn in predicting the number of defects in software modules. In the experiment, we used software metrics of 20 OSS projects for cross-release defect prediction and compared auto-sklearn with random forest, decision tree and linear discriminant analysis by using Norm(Popt) as a performance measure. As a result, auto-sklearn showed similar prediction performance as random forest, which is one of the best prediction models for defect prediction in past studies. This indicates that auto-sklearn can obtain good prediction performance for defect prediction without any knowledge of machine learning techniques and models.

Prediction of Software Defects Using Automated Machine Learning

Yucel, Zeynep
2019

Abstract

The effectiveness of defect prediction depends on modeling techniques as well as their parameter optimization, data preprocessing and ensemble development. This paper focuses on auto-sklearn, which is a recently-developed software library for automated machine learning, that can automatically select appropriate prediction models, hyperparameters and data preprocessing techniques for a given data set and develop their ensemble with optimized weights. In this paper we empirically evaluate the effectiveness of auto-sklearn in predicting the number of defects in software modules. In the experiment, we used software metrics of 20 OSS projects for cross-release defect prediction and compared auto-sklearn with random forest, decision tree and linear discriminant analysis by using Norm(Popt) as a performance measure. As a result, auto-sklearn showed similar prediction performance as random forest, which is one of the best prediction models for defect prediction in past studies. This indicates that auto-sklearn can obtain good prediction performance for defect prediction without any knowledge of machine learning techniques and models.
2019
Proc. International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, (SNPD 2019),
File in questo prodotto:
File Dimensione Formato  
c_23_snpd_prediction.pdf

non disponibili

Tipologia: Documento in Pre-print
Licenza: Copyright dell'editore
Dimensione 63.65 kB
Formato Adobe PDF
63.65 kB Adobe PDF   Visualizza/Apri

I documenti in ARCA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10278/5080108
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 23
  • ???jsp.display-item.citation.isi??? 11
social impact