Poisoning Complete-Linkage Hierarchical Clustering

Biggio, B.; Rota Bulò, S.; Pillai, I.; Mura, M.; Mequanint, Eyasu Zemene; Pelillo, Marcello; Roli, F.

doi:10.1007/978-3-662-44415-3_5

Clustering algorithms are widely adopted in security applications as a means of detecting malicious activities, yet little attention has been paid to preventing deliberate attacks from subverting the clustering process itself. Recent work has introduced a methodology for the security analysis of data clustering in adversarial settings, aimed at identifying potential attacks against clustering algorithms and evaluating their impact. The authors have shown that single-linkage hierarchical clustering can be severely affected by a very small fraction of carefully crafted poisoning samples injected into the input data, highlighting that the clustering algorithm may itself be the weakest link in a security system. In this paper, we extend this analysis to complete-linkage hierarchical clustering by devising an ad hoc poisoning attack. We verify its effectiveness on artificial data and on application examples related to the clustering of malware and handwritten digits. © 2014 Springer-Verlag Berlin Heidelberg.