Prediction of Software Defects Using Automated Machine Learning

The effectiveness of defect prediction depends not only on the modeling technique but also on parameter optimization, data preprocessing and ensemble construction. This paper focuses on auto-sklearn, a recently developed software library for automated machine learning. For a given data set, auto-sklearn automatically selects appropriate prediction models, hyperparameters and data preprocessing techniques, and it builds an ensemble of them with optimized weights. We empirically evaluate how effectively auto-sklearn predicts the number of defects in software modules. In the experiment, we used software metrics of 20 OSS projects for cross-release defect prediction. We compared auto-sklearn with random forest, decision tree and linear discriminant analysis, using Norm(Popt) as the performance measure. Auto-sklearn showed prediction performance similar to that of random forest, which past studies have reported as one of the best models for defect prediction. This indicates that auto-sklearn can achieve good defect prediction performance without requiring knowledge of machine learning techniques and models.
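Norm(Popt) is an effort-aware measure: it rewards a model that lets developers find many defects while inspecting little code. The following minimal Python sketch shows one common formulation, offered as an illustration rather than the authors' implementation. Modules are inspected in descending order of predicted defect density (predicted defects per line of code). The area under the curve of cumulative actual defects found versus cumulative LOC inspected is computed with the trapezoid rule. That area is then normalized between the worst ordering (ascending actual density) and the optimal ordering (descending actual density). The density-based ranking, the tie-breaking (ties keep input order), the toy data and the function names `norm_popt` and `_area` are assumptions for illustration; the paper's exact computation may differ.

```python
from typing import Sequence


def _area(order: Sequence[int], loc: Sequence[float], defects: Sequence[float]) -> float:
    """Area under the curve of cumulative defect fraction vs. cumulative LOC fraction
    when modules are inspected in the given order (trapezoid rule, starting at (0, 0))."""
    total_loc = float(sum(loc))
    total_def = float(sum(defects))
    area = 0.0
    cum_loc = cum_def = 0.0
    x_prev = y_prev = 0.0
    for i in order:
        cum_loc += loc[i]
        cum_def += defects[i]
        x = cum_loc / total_loc
        y = cum_def / total_def
        area += (x - x_prev) * (y + y_prev) / 2.0  # one trapezoid per module
        x_prev, y_prev = x, y
    return area


def norm_popt(loc: Sequence[float], actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Hypothetical helper: normalized Popt of a defect-count prediction.

    Assumption: modules are ranked by predicted defect density (predicted / LOC);
    the optimal ranking sorts by actual density descending and the worst ranking
    sorts by actual density ascending. Python's sort is stable, so ties keep input order.
    """
    n = len(loc)
    if n == 0 or not (len(actual) == len(predicted) == n):
        raise ValueError("loc, actual and predicted must be non-empty and equally long")
    if any(size <= 0 for size in loc):
        raise ValueError("every module needs a positive LOC value")
    if sum(actual) <= 0:
        raise ValueError("at least one actual defect is required")

    modules = range(n)
    model = sorted(modules, key=lambda i: predicted[i] / loc[i], reverse=True)
    optimal = sorted(modules, key=lambda i: actual[i] / loc[i], reverse=True)
    worst = sorted(modules, key=lambda i: actual[i] / loc[i])

    # All three curves are measured against the ACTUAL defects; only the order differs.
    a_model = _area(model, loc, actual)
    a_opt = _area(optimal, loc, actual)
    a_worst = _area(worst, loc, actual)
    if a_opt == a_worst:
        raise ValueError("optimal and worst orderings coincide; Norm(Popt) is undefined")
    return (a_model - a_worst) / (a_opt - a_worst)


if __name__ == "__main__":
    loc = [100, 200, 100, 50]  # module sizes in LOC (toy data)
    actual = [5, 0, 2, 1]      # actual defect counts in the target release (toy data)
    predicted = [4, 1, 1, 0]   # defect counts predicted by some model (toy data)
    print(round(norm_popt(loc, actual, predicted), 3))  # prints 0.902
```

Passing the actual counts as `predicted` yields 1.0, because the model ordering then equals the optimal one. A value near 0 means the ranking is no better than inspecting the least defect-dense modules first.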

Tanaka, Kazuya; Monden, Akito; Yucel, Zeynep

2019-01-01

Publication year: 2019

Volume title: Proc. International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2019)

DOI: https://dx.doi.org/10.1109/snpd.2019.8935839

Item type: 4.1 Conference proceedings paper

Files in this item:

c_23_snpd_prediction.pdf (not available). Type: pre-print document; license: publisher's copyright; size: 63.65 kB; format: Adobe PDF.

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5080108
