Backdoor learning curves: explaining backdoor poisoning beyond influence functions

Vascon, Sebastiano; Roli, Fabio; Pelillo, Marcello
2024-01-01

Abstract

Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
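
For context, the influence functions mentioned in the abstract are, in their standard form (the Koh and Liang formulation; the paper may use a variant, so treat this as hedged background rather than the authors' exact definition), a first-order estimate of how upweighting a training point z changes the loss on a test point z_test, for a model with empirical-risk minimizer \hat{\theta}:

\mathcal{I}(z, z_{\mathrm{test}}) = -\nabla_\theta \,\ell(z_{\mathrm{test}}, \hat{\theta})^\top \, H_{\hat{\theta}}^{-1} \, \nabla_\theta \,\ell(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 \,\ell(z_i, \hat{\theta}).

A large influence of a poisoned training point on triggered test points is one way to quantify how strongly the model has tied the trigger to the target class.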
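
To make factors (ii) and (iii) concrete, the sketch below shows a minimal BadNets-style poisoning routine in Python. This is an illustrative assumption, not the authors' code: the function poison_dataset and its parameters poison_frac (factor ii), trigger_size and trigger_value (factor iii) are hypothetical names.

import numpy as np

def poison_dataset(X, y, target_class=0, poison_frac=0.1,
                   trigger_size=3, trigger_value=1.0, seed=0):
    """Stamp a small square trigger onto a random fraction of the
    training images and relabel them with the attacker-chosen class."""
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    # Factor (ii): fraction of backdoor samples injected into the training set.
    n_poison = int(poison_frac * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    # Factor (iii): trigger size and visibility (patch side length and intensity).
    X_p[idx, -trigger_size:, -trigger_size:] = trigger_value
    # Relabel the poisoned samples as the attacker-chosen target class.
    y_p[idx] = target_class
    return X_p, y_p, idx

# Usage on random data standing in for real images with pixels in [0, 1]:
X = np.random.rand(1000, 28, 28)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned, poisoned_idx = poison_dataset(X, y, poison_frac=0.05)

Training on (X_poisoned, y_poisoned) and evaluating on triggered versus clean test images yields the attack success rate and clean accuracy whose trade-off, per the abstract, depends on the poisoning fraction, the trigger, and the learner's hyperparameters.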
Files in this product:

File: s13042-024-02363-5.pdf
Access: open access
Type: Publisher's version
License: Free access (view only)
Size: 16.78 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/5081962