ICLR 2024
Multi-attacks: Many images + the same adversarial attack -> many target labels
TL;DR
We show that a single, small adversarial perturbation (that we call a multi-attack) can influence the classification of hundreds of images simultaneously, changing each to an arbitrarily chosen target class.
Abstract
We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1,X_2,\dots,X_n$ from their original, unperturbed classes $c_1, c_2,\dots,c_n$ to desired (not necessarily all the same) classes $c^*_1,c^*_2,\dots,c^*_n$ for up to hundreds of images and target classes at once. We call these multi-attacks. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decisions in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
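The abstract describes optimizing one shared perturbation $P$ so that each perturbed image $X_i + P$ is classified as its own target class $c^*_i$. The sketch below is a minimal illustration of that idea (not the authors' released code): it assumes a PyTorch classifier, a batch of images in $[0,1]$, and placeholder choices for the step size, iteration budget, and $L_\infty$ bound.

```python
# Minimal sketch of a "multi-attack": one perturbation P, shared by all n images,
# is optimized so that classifier(X_i + P) moves toward a per-image target class.
# Assumptions: `model` is a PyTorch classifier, `images` is an (n, C, H, W) tensor
# in [0, 1], `target_classes` is an (n,) tensor of desired labels.
import torch
import torch.nn.functional as F

def multi_attack(model, images, target_classes, eps=0.1, steps=500, lr=1e-2):
    model.eval()
    # A single perturbation of shape (1, C, H, W), broadcast across all n images.
    P = torch.zeros_like(images[:1], requires_grad=True)
    opt = torch.optim.Adam([P], lr=lr)
    for _ in range(steps):
        logits = model((images + P).clamp(0, 1))
        # Cross-entropy toward each image's (possibly different) target class.
        loss = F.cross_entropy(logits, target_classes)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the shared perturbation small (L_inf ball of radius eps).
        with torch.no_grad():
            P.clamp_(-eps, eps)
    return P.detach()
```

Because the same $P$ is added to every image, the loop succeeds only if a single point in perturbation space simultaneously lies in a high-confidence region of each target class for each image, which is what the paper quantifies as a function of $n$ and image resolution.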
Keywords
adversarial attack, dimension, input space, class, adversary, defense, security, safety