
Investigating Fairness with FanFAIR: is Pre-Processing Useful Only for Performances?

Nobile, Marco S.; Confalonieri, Marco; Gallese, Chiara
2025-01-01

Abstract

Artificial Intelligence, and Machine Learning systems in general, are becoming pervasive in our society, from industry to public administration. AI can often provide a very efficient means to support decision-making, but it can represent a danger in high-risk applications such as biomedicine and healthcare. In particular, biased datasets might lead to inaccurate or discriminatory ML systems, undermining the accuracy of their predictions and putting patients' health at risk. FanFAIR is a Python tool that provides the community with a semi-automatic means of assessing dataset fairness. FanFAIR is designed to integrate qualitative considerations - such as ethics, human rights assessment, and data protection - with quantitative indicators of dataset fairness, such as balance, the presence of invalid entries, or outliers. In this work, we extend FanFAIR to deal with categorical data, and introduce a new algorithm for outlier detection in the presence of missing values. We then provide a case study on data collected from COVID-19 patients admitted to pneumology departments in Italy. We show how successive steps of data cleaning and variable selection improve the indicators computed by FanFAIR. This shows that data cleaning procedures are not only necessary to improve the performance of the machine learning algorithms trained on the data, but are also a way to improve (a measure of) fairness. Hence, the proposed case study provides an example in which performance and fairness are not in contrast, as is commonly believed, but improve together.
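The abstract mentions quantitative fairness indicators such as class balance, invalid entries, and outliers, without specifying how FanFAIR computes them. As an illustration only, here is a minimal sketch (not FanFAIR's actual implementation) of how such indicators might be computed: balance as the normalized entropy of the class distribution, completeness as the fraction of non-missing entries, and an outlier rate via the standard 1.5×IQR fences, skipping missing values as the extended algorithm described in the paper must also do.

```python
from collections import Counter
import math

def balance_score(labels):
    """Normalized entropy of the class distribution: 1.0 = perfectly balanced."""
    counts = Counter(labels)
    n = len(labels)
    if len(counts) < 2:
        return 0.0  # a single class carries no balance information
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts))

def completeness_score(values):
    """Fraction of entries that are present (not None)."""
    return sum(v is not None for v in values) / len(values)

def outlier_rate(values):
    """Fraction of entries outside the 1.5*IQR fences, ignoring missing values."""
    xs = sorted(v for v in values if v is not None)
    q1 = xs[len(xs) // 4]          # rough lower quartile
    q3 = xs[(3 * len(xs)) // 4]    # rough upper quartile
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in xs if v < lo or v > hi) / len(xs)
```

A balanced binary label set scores 1.0 on balance, and a dataset with one missing value in four scores 0.75 on completeness; the actual FanFAIR indicators and their aggregation (the paper uses fuzzy rules over such measures) may differ.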
2025
2025 IEEE Symposium on Computational Intelligence in Health and Medicine (CIHM)
Files in this record:
File: Investigating_Fairness_with_FanFAIR_is_Pre-Processing_Useful_Only_for_Performances.pdf
Type: Publisher's version
License: Publisher's copyright
Size: 466.82 kB
Format: Adobe PDF
Availability: not available

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5095389