
Feature partitioning for robust tree ensembles and their certification in adversarial scenarios

Calzavara, Stefano (Conceptualization); Lucchese, Claudio (Conceptualization); Marcuzzi, Federico (Software); Orlando, Salvatore (Methodology)
2021-01-01

Abstract

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work, we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at inference time. The attacker aims at finding a perturbation of an instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its base models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently provides a lower bound of the accuracy of a forest under attack on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed feature partitioning strategy provides a significant accuracy improvement over competitor algorithms, and that the proposed certification method allows one to accurately estimate the effectiveness of a classifier where the brute-force approach would be infeasible.
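To make the robustness argument concrete, below is a minimal sketch of the feature-partitioning idea described in the abstract, assuming scikit-learn decision trees as base models and binary 0/1 labels. The random assignment of features to partitions, the function names, and the parameters are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_partitioned_ensemble(X, y, n_partitions, seed=0):
    """Train one decision tree per disjoint feature partition.

    If an attacker can tamper with at most b features of an instance,
    at most b trees ever see a perturbed value, so choosing
    n_partitions >= 2*b + 1 leaves the majority vote decided by
    unaffected trees (the guarantee stated in the abstract).
    """
    rng = np.random.default_rng(seed)
    features = rng.permutation(X.shape[1])  # shuffle feature indices
    partitions = np.array_split(features, n_partitions)  # disjoint partitions
    return [
        (part, DecisionTreeClassifier(random_state=seed).fit(X[:, part], y))
        for part in partitions
    ]

def predict_majority(ensemble, X):
    """Majority vote over the per-partition trees (binary 0/1 labels assumed)."""
    votes = np.stack([tree.predict(X[:, part]) for part, tree in ensemble])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

For instance, to tolerate an attacker who can perturb at most b = 2 features, one would train with n_partitions = 5. Note that this sketch assigns features to partitions uniformly at random; the paper's strategy for building the partitions may differ.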
Files in this record:

File: s13635-021-00127-0.pdf (open access)
Type: Publisher's version
License: Free access (view only)
Size: 2.82 MB
Format: Adobe PDF

Documents in ARCA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10278/3753807
Citations
  • PMC: ND
  • Scopus: 4
  • Web of Science: 3