Data Smoothing for Software Effort Estimation

Korenaga, Kento; Monden, Akito; Yucel, Zeynep

2019

Abstract

The goal of this paper is to improve the accuracy of software development effort estimation by mitigating the problem caused by outliers in the historical software project data set used to construct an effort estimation model. Outlier removal methods have been proposed to solve this problem; however, they are not always effective, because removing outliers reduces the number of data points (i.e., software projects in our case) in the data set, and a model built from a small data set often suffers from a lack of generality. In such cases, estimation performance can become even worse. In this paper we propose a method called data smoothing, which mitigates the problem of outliers without reducing the number of data points. We regard data points as outliers if they violate the assumption of Analogy-Based Estimation (ABE) that "projects with similar features require similar development efforts." The proposed method adjusts the effort values (person-months or person-hours) in the data set so that they satisfy this assumption; in this way, all outliers become non-outliers without reducing the number of data points. In an experimental evaluation using eight software development data sets, the proposed data smoothing achieved the same or higher effort estimation accuracy than the non-smoothing case, whereas a conventional outlier removal method showed worse accuracy on some data sets.
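The abstract does not specify the smoothing rule itself. As a rough illustration only, the following Python sketch assumes a simple k-nearest-neighbour form of ABE (Euclidean distance over pre-normalised features; the estimate is the mean effort of the k closest past projects) and a hypothetical smoothing rule that replaces each project's effort with the mean of its own effort and its k nearest neighbours' efforts. The names `abe_estimate`, `smooth_efforts`, `nearest_neighbours` and the parameter `k` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed pre-normalised)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(features: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k projects most similar to project i, excluding i itself."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: distance(features[i], features[j]))
    return others[:k]


def abe_estimate(features: List[Sequence[float]], efforts: List[float],
                 target: Sequence[float], k: int = 3) -> float:
    """ABE sketch: mean effort of the k past projects closest to the target."""
    ranked = sorted(range(len(features)), key=lambda j: distance(target, features[j]))
    chosen = ranked[:k]
    return sum(efforts[j] for j in chosen) / len(chosen)


def smooth_efforts(features: List[Sequence[float]], efforts: List[float],
                   k: int = 3) -> List[float]:
    """Hypothetical smoothing rule (not necessarily the paper's): replace each
    project's effort by the mean of its own effort and its k nearest
    neighbours' efforts. No project is removed, so the data set keeps its size."""
    smoothed = []
    for i in range(len(features)):
        group = [i] + nearest_neighbours(features, i, k)
        # Always read the original efforts, so the result does not depend
        # on the order in which projects are processed.
        smoothed.append(sum(efforts[j] for j in group) / len(group))
    return smoothed


if __name__ == "__main__":
    # Toy data (hypothetical): one feature per project, efforts in person-months.
    features = [[1.0], [2.0], [4.0], [10.0], [11.0]]
    efforts = [10.0, 11.0, 12.0, 50.0, 150.0]
    smoothed = smooth_efforts(features, efforts, k=1)
    print(smoothed)
    print(abe_estimate(features, smoothed, [3.2], k=2))
```

With this toy data, `smooth_efforts(features, efforts, k=1)` returns `[10.5, 10.5, 11.5, 100.0, 100.0]`. The last two projects have similar features but very different efforts (50 vs. 150); smoothing pulls their efforts together while all five projects stay in the data set, whereas outlier removal would shrink the data set instead.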

Publication year: 2019
Volume title: Proc. International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2019)
DOI: https://dx.doi.org/10.1109/snpd.2019.8935841
Item type: 4.1 Article in conference proceedings

Files in this item:

File	Size	Format
c_24_snpd_data.pdf (not available; type: pre-print; license: publisher's copyright)	322.39 kB	Adobe PDF

Documents in ARCA are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5080107
