Backdoor learning curves: explaining backdoor poisoning beyond influence functions
Vascon, Sebastiano; Roli, Fabio; Pelillo, Marcello
2024-01-01
Abstract
Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
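For concreteness, below is a minimal sketch of the poisoning step described in the abstract: stamping a small trigger patch onto a fraction of the training images and relabeling them with the attacker-chosen target class. This is an illustrative assumption rather than the paper's code; the function name `inject_backdoor` and the parameters `poison_fraction`, `trigger_size`, and `trigger_value` are hypothetical, and the images are assumed to be NumPy arrays with pixel values in [0, 1].

```python
import numpy as np

def inject_backdoor(X, y, target_class, poison_fraction=0.1,
                    trigger_size=3, trigger_value=1.0, seed=0):
    """Stamp a square trigger patch onto a random fraction of training
    images and relabel them with the attacker-chosen target class.

    X: float array of shape (n_samples, height, width), pixels in [0, 1].
    y: integer label array of shape (n_samples,).
    Returns poisoned copies of X and y, plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    n_poison = int(poison_fraction * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)

    # Place the trigger in the bottom-right corner; its size and intensity
    # control the "size and visibility" factors discussed in the abstract.
    X_p[idx, -trigger_size:, -trigger_size:] = trigger_value
    y_p[idx] = target_class  # label flipping to the attacker-chosen class
    return X_p, y_p, idx
```

At test time, stamping the same trigger onto a clean input is what elicits the target-class prediction from the backdoored model; `poison_fraction` and the trigger's size and intensity correspond to factors (ii) and (iii) above.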
| File | Type | License | Size | Format |
|---|---|---|---|---|
| s13042-024-02363-5.pdf (open access) | Publisher's version | Free access (view only) | 16.78 MB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.