Backdoor learning curves: explaining backdoor poisoning beyond influence functions
Vascon, Sebastiano; Roli, Fabio; Pelillo, Marcello
2024-01-01
Abstract
Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented with a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.
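For concreteness, below is a minimal sketch of the poisoning step described in the abstract: stamping a small trigger patch onto a fraction of the training images and relabeling them with the attacker-chosen target class. This is an illustrative assumption rather than the paper's code; the function name `inject_backdoor` and the parameters `poison_fraction`, `trigger_size`, and `trigger_value` are hypothetical, and the images are assumed to be NumPy arrays with pixel values in [0, 1].

```python
import numpy as np

def inject_backdoor(X, y, target_class, poison_fraction=0.1,
                    trigger_size=3, trigger_value=1.0, seed=0):
    """Stamp a square trigger patch onto a random fraction of training
    images and relabel them with the attacker-chosen target class.

    X: float array of shape (n_samples, height, width), pixels in [0, 1].
    y: integer label array of shape (n_samples,).
    Returns poisoned copies of X and y, plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    n_poison = int(poison_fraction * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)

    # Place the trigger in the bottom-right corner; its size and intensity
    # control the "size and visibility" factors discussed in the abstract.
    X_p[idx, -trigger_size:, -trigger_size:] = trigger_value
    y_p[idx] = target_class  # label flipping to the attacker-chosen class
    return X_p, y_p, idx
```

At test time, stamping the same trigger onto a clean input is what elicits the target-class prediction from the backdoored model; `poison_fraction` and the trigger's size and intensity correspond to factors (ii) and (iii) above.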
| File | Type | License | Size | Format |
|---|---|---|---|---|
| s13042-024-02363-5.pdf (open access) | Publisher's version | Free access (view only) | 16.78 MB | Adobe PDF |
Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.