Evaluating the robustness of explainable AI in medical image recognition under natural and adversarial data corruption
Lotto, Michele; Vascon, Sebastiano; Roli, Fabio
2026-01-01
Abstract
The integration of Explainable AI (XAI) into healthcare promises greater transparency and interpretability of machine learning models, enabling clinicians to understand predictions and make more reliable medical decisions. Yet, the robustness of XAI methods remains uncertain, as small input perturbations can drastically change their explanations, posing critical risks in clinical settings where they may lead to misdiagnoses or inappropriate treatment. Motivated by the central role of XAI in healthcare decision-making, this paper examines its robustness in the presence of data corruption. We systematically evaluate the stability of widely used XAI techniques against both naturally occurring noise (e.g., JPEG compression) and adversarial manipulations that alter explanations without affecting model predictions. To this end, we introduce a set of evaluation metrics that capture complementary aspects of explanation stability, ranging from pixel-level consistency to spatial coherence, and propose a protocol for assessing the resilience of XAI methods across diverse perturbation sources. Our analysis spans three medical imaging datasets, various convolutional and transformer models, and ten post-hoc XAI methods, including Grad-CAM++ for convolutional networks and LibraGrad for vision transformers. We find that current XAI techniques are often unstable, even under imperceptible perturbations. For adversarial noise, a clear set of robust methods emerges, whereas for natural noise, performance varies, with some methods maintaining spatial stability and others preserving pixel-wise consistency. Taken together, these results highlight the need for multi-perspective evaluation when selecting XAI techniques in practice.
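The kind of stability check described in the abstract can be illustrated with a small, self-contained sketch. The snippet below is an assumption-laden example, not the authors' released code or their exact metrics: it takes a caller-supplied explanation function (e.g., a wrapper around Grad-CAM++ or LibraGrad), corrupts an image with JPEG compression, and compares the clean and corrupted saliency maps with two illustrative measures, Pearson correlation for pixel-level consistency and a top-k IoU for spatial coherence. All function names, parameters, and metric choices here are hypothetical.

```python
# Minimal sketch (assumed names and metrics, not the paper's code): measure how much
# a post-hoc explanation changes when the input image undergoes JPEG compression.
import io

import numpy as np
from PIL import Image


def jpeg_compress(image: np.ndarray, quality: int = 75) -> np.ndarray:
    """Round-trip an RGB uint8 image through JPEG compression (a natural corruption)."""
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return np.asarray(Image.open(buffer))


def pixel_consistency(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Pixel-level consistency: Pearson correlation between two saliency maps."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])


def spatial_coherence(map_a: np.ndarray, map_b: np.ndarray, top_frac: float = 0.1) -> float:
    """Spatial coherence: IoU of the top-`top_frac` most salient pixels of each map."""
    k = max(1, int(top_frac * map_a.size))
    top_a = set(np.argsort(map_a.ravel())[-k:])
    top_b = set(np.argsort(map_b.ravel())[-k:])
    return len(top_a & top_b) / len(top_a | top_b)


def explanation_stability(image: np.ndarray, explain_fn, quality: int = 75) -> dict:
    """`explain_fn` maps an RGB uint8 image to a 2-D saliency map (e.g., a Grad-CAM++
    or LibraGrad wrapper provided by the caller)."""
    clean_map = explain_fn(image)
    corrupted_map = explain_fn(jpeg_compress(image, quality))
    return {
        "pixel_consistency": pixel_consistency(clean_map, corrupted_map),
        "spatial_coherence": spatial_coherence(clean_map, corrupted_map),
    }
```

In this sketch, a method could score high on one metric and low on the other, which mirrors the abstract's point that pixel-wise consistency and spatial stability are complementary views of robustness.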
| File | Type | License | Size | Format |
|---|---|---|---|---|
| s10994-025-06919-6.pdf | Publisher's version (open access) | Open access (no restrictions) | 2.31 MB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.



