Evaluating the robustness of explainable AI in medical image recognition under natural and adversarial data corruption

Lotto, Michele; Vascon, Sebastiano; Roli, Fabio
2026-01-01

Abstract

The integration of Explainable AI (XAI) into healthcare promises greater transparency and interpretability of machine learning models, enabling clinicians to understand predictions and make more reliable medical decisions. Yet, the robustness of XAI methods remains uncertain, as small input perturbations can drastically change their explanations, posing critical risks in clinical settings where they may lead to misdiagnoses or inappropriate treatment. Motivated by the central role of XAI in healthcare decision-making, this paper examines its robustness in the presence of data corruption. We systematically evaluate the stability of widely used XAI techniques against both naturally occurring noise (e.g., JPEG compression) and adversarial manipulations that alter explanations without affecting model predictions. To this end, we introduce a set of evaluation metrics that capture complementary aspects of explanation stability, ranging from pixel-level consistency to spatial coherence, and propose a protocol for assessing the resilience of XAI methods across diverse perturbation sources. Our analysis spans three medical imaging datasets, various convolutional and transformer models, and ten post-hoc XAI methods, including Grad-CAM++ for convolutional networks and LibraGrad for vision transformers. We find that current XAI techniques are often unstable, even under imperceptible perturbations. For adversarial noise, a clear set of robust methods emerges, whereas for natural noise, performance varies, with some methods maintaining spatial stability and others preserving pixel-wise consistency. All results together highlight the need for multi-perspective evaluation when selecting XAI techniques in practice.
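The abstract outlines a protocol for measuring how much an explanation changes when the input is corrupted. The minimal Python sketch below illustrates that general idea only, under stated assumptions: a pretrained torchvision ResNet-18 stands in for the medical models, plain gradient saliency stands in for the ten post-hoc XAI methods, JPEG re-encoding serves as the natural corruption, and Spearman rank correlation (pixel-level consistency) plus top-k IoU (spatial coherence) stand in for the paper's stability metrics. The file name `example_scan.png` is a hypothetical placeholder.

```python
# Illustrative sketch only, not the paper's actual protocol: compare a saliency
# map on a clean image with the map obtained after JPEG re-encoding, and report
# a pixel-wise and a spatial stability score.
import io

import numpy as np
import torch
from PIL import Image
from scipy.stats import spearmanr
from torchvision import models, transforms

preprocess = transforms.Compose([transforms.Resize((224, 224)),
                                 transforms.ToTensor()])

def saliency_map(model, x, target=None):
    """Vanilla gradient saliency for a fixed target class, reduced to (H, W)."""
    x = x.clone().requires_grad_(True)
    logits = model(x.unsqueeze(0))
    if target is None:
        target = logits.argmax(dim=1).item()
    logits[0, target].backward()
    return x.grad.abs().max(dim=0).values, target

def jpeg_compress(img, quality=30):
    """Simulate natural corruption by lossy JPEG re-encoding."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def stability_scores(sal_clean, sal_corrupt, top_frac=0.1):
    """Pixel-wise consistency (Spearman rho) and spatial overlap (top-k IoU)."""
    a = sal_clean.flatten().numpy()
    b = sal_corrupt.flatten().numpy()
    rho = spearmanr(a, b).correlation
    k = max(1, int(top_frac * a.size))
    top_a, top_b = set(np.argsort(-a)[:k]), set(np.argsort(-b)[:k])
    return rho, len(top_a & top_b) / len(top_a | top_b)

if __name__ == "__main__":
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    img = Image.open("example_scan.png").convert("RGB")     # hypothetical input file
    x_clean = preprocess(img)
    x_jpeg = preprocess(jpeg_compress(img, quality=30))
    sal_clean, cls = saliency_map(model, x_clean)
    sal_jpeg, _ = saliency_map(model, x_jpeg, target=cls)   # explain the same class
    rho, iou = stability_scores(sal_clean, sal_jpeg)
    print(f"Spearman rho: {rho:.3f}   top-10% IoU: {iou:.3f}")
```

In a full evaluation such as the one described above, these scores would be averaged over a dataset and a range of corruption severities, and the comparison repeated with adversarial perturbations crafted to leave the model's prediction unchanged.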
Files in this item:

s10994-025-06919-6.pdf
Access: open access
Type: Publisher's version
License: Open access (no restrictions)
Size: 2.31 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5108448