Data Smoothing for Software Effort Estimation

Yucel, Zeynep
2019-01-01

Abstract

This paper aims to improve the accuracy of software development effort estimation by mitigating the effect of outliers in the historical software project data set used to construct an effort estimation model. Outlier removal methods have been proposed to address this problem; however, they are not always effective, because removing outliers reduces the number of data points (software projects, in our case) in a data set, and a model built from a small data set often generalizes poorly. In such cases, estimation performance can become even worse. We propose a method called data smoothing that mitigates the problem of outliers without reducing the number of data points. We regard a data point as an outlier if it violates the assumption of Analogy-Based Estimation (ABE) that "projects with similar features require similar development efforts." The proposed method adjusts the effort values (person-months or person-hours) in a data set so that they satisfy this assumption; in this way, all outliers become non-outliers and no data points are lost. In an experimental evaluation using 8 software development data sets, the proposed data smoothing achieved the same or higher effort estimation accuracy than the non-smoothing case, whereas a conventional outlier removal method showed worse accuracy on some data sets.
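The abstract does not spell out the smoothing procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that "similar projects" means the k nearest neighbours under Euclidean distance on min-max-scaled features, and that smoothing blends each effort value with the mean effort of those neighbours. The function name `smooth_efforts` and the parameters `k` and `alpha` are hypothetical.

```python
import math

def smooth_efforts(features, efforts, k=3, alpha=0.5):
    """Pull each effort value toward the mean effort of its k most similar projects.

    Assumptions (not taken from the paper): min-max feature scaling,
    Euclidean distance, and a linear blend controlled by alpha
    (0 = keep the original effort, 1 = replace it with the neighbour mean).
    """
    n = len(features)
    n_feat = len(features[0])

    # Min-max scale every feature to [0, 1] so that no feature dominates the distance.
    lows = [min(row[j] for row in features) for j in range(n_feat)]
    highs = [max(row[j] for row in features) for j in range(n_feat)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]
    scaled = [[(row[j] - lows[j]) / spans[j] for j in range(n_feat)]
              for row in features]

    smoothed = []
    for i in range(n):
        # Rank all other projects by distance to project i.
        others = sorted((math.dist(scaled[i], scaled[j]), j)
                        for j in range(n) if j != i)
        nearest = [j for _, j in others[:k]]
        if not nearest:
            # A single-project data set has no neighbours; leave it unchanged.
            smoothed.append(float(efforts[i]))
            continue
        neighbour_mean = sum(efforts[j] for j in nearest) / len(nearest)
        smoothed.append((1 - alpha) * efforts[i] + alpha * neighbour_mean)
    return smoothed

# Toy data: one size-like feature per project; the third effort value is suspicious.
features = [[1], [2], [3], [4]]
efforts = [10, 12, 100, 14]
smoothed = smooth_efforts(features, efforts, k=2, alpha=1.0)
```

Every project stays in the data set; only the effort values change, which is the property the abstract contrasts with outlier removal.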
2019
Proc. International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2019)
Type: Pre-print
License: Publisher's copyright

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5080107