Investigating Fairness with FanFAIR: is Pre-Processing Useful Only for Performances?
Nobile, Marco S.; Confalonieri, Marco; Gallese, Chiara
2025-01-01
Abstract
Artificial Intelligence, and Machine Learning (ML) systems in general, are becoming pervasive in our society, from industry to public administration. AI can often provide a very efficient means to support decision-making, but it can represent a danger in high-risk applications such as biomedicine and healthcare. In particular, biased datasets might lead to inaccurate or discriminatory ML systems, undermining the accuracy of their predictions and putting patients' health at risk. FanFAIR is a Python tool that provides the community with semi-automatic dataset fairness assessment. FanFAIR is designed to integrate qualitative considerations - such as ethics, human rights assessment, and data protection - with quantitative indicators of a dataset's fairness, such as balance, the presence of invalid entries, or outliers. In this work, we extend FanFAIR to deal with categorical data and introduce a new algorithm for outlier detection in the presence of missing values. We then provide a case study on data collected from COVID patients admitted to pneumology departments in Italy. We show how successive steps of data cleaning and variable selection improve the indicators provided by FanFAIR. This shows that data cleaning procedures are not only necessary to improve the performance of the machine learning algorithm that learns from the data, but are also a way to improve (a measure of) fairness. Hence, the proposed case study provides an example in which performance and fairness are not in conflict, as is commonly believed, but improve together.
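To make the quantitative indicators mentioned in the abstract concrete, the minimal sketch below illustrates two of them: a balance score for a (possibly categorical) target column, and outlier detection that tolerates missing values. The function names (`balance_score`, `outlier_mask`) and the entropy- and MAD-based formulations are illustrative assumptions, not FanFAIR's actual API or the paper's new algorithm.

```python
import numpy as np

def balance_score(labels):
    """Normalized Shannon entropy of the class distribution.

    Returns 1.0 for a perfectly balanced dataset and tends to 0.0 as
    one class dominates. (Illustrative; not FanFAIR's exact metric.)
    """
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) < 2:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

def outlier_mask(column, k=3.0):
    """Flag outliers in a numeric column that may contain NaNs.

    Uses the median and the median absolute deviation (MAD), which
    ignore missing values and are robust to extreme entries; NaN
    entries are never flagged. (A sketch of one common robust
    approach, not the algorithm introduced in the paper.)
    """
    col = np.asarray(column, dtype=float)
    med = np.nanmedian(col)
    mad = np.nanmedian(np.abs(col - med))
    if mad == 0:
        return np.zeros(col.shape, dtype=bool)
    # 1.4826 rescales the MAD to be consistent with a standard deviation
    return np.abs(col - med) > k * 1.4826 * mad

# Example: a skewed binary outcome, a missing value, and an invalid entry
labels = ["recovered"] * 90 + ["deceased"] * 10
ages = [54, 61, 58, np.nan, 57, 450, 63]  # 450 is an implausible age
print(f"balance  = {balance_score(labels):.3f}")  # low: imbalanced classes
print(f"outliers = {outlier_mask(ages)}")         # only 450 is flagged
```

In this hypothetical setup, a low balance score and any flagged outliers would lower the corresponding fairness indicators, which is what the case study's data-cleaning steps are reported to improve.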
| File | Size | Format | |
|---|---|---|---|
| Investigating_Fairness_with_FanFAIR_is_Pre-Processing_Useful_Only_for_Performances.pdf | 466.82 kB | Adobe PDF | View/Open |

Type: publisher's version. License: publisher's copyright. Access: not available.
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.