PaperHub

Overall rating: 5.8/10 · Rejected · 4 reviewers
Individual scores: 5, 6, 6, 6 (min 5, max 6, std 0.4)
Confidence: 3.8 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 3.0

ICLR 2025

Understanding Model Ensemble in Transferable Adversarial Attack

Submitted: 2024-09-21 · Updated: 2025-02-05

Abstract

Keywords

adversarial examples, adversarial transferability, model ensemble attack

Reviews and Discussion

Review
Rating: 5

This paper presents theoretical insights into model ensemble adversarial attacks. The authors define transferability error, which measures the error in adversarial transferability. They also discuss diversity and empirical model ensemble Rademacher complexity. The authors then decompose the transferability error to explain how it originated in the model ensemble attack. Furthermore, they derive bounds on the transferability error using complexity and generalization terms, and conclude three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Some empirical evaluations are done on MNIST, CIFAR-10, and ImageNet.

Strengths

The strengths of this paper include:

  • The writing is clear, with intuitive explanations in Figure 1.
  • Notations are clearly defined with neat formulations, and the derivations are self-consistent.
  • The concluded practical guidelines are correct and already used in the literature.

Weaknesses

The weaknesses of this paper include:

  • Implicit assumptions in Eq. (3). The authors define the most transferable adversarial example $z^*$ in Eq. (3) as $z^* = \operatorname{argmax} L_P$, where $L_P$ in Eq. (1) is defined by taking the expectation over $\theta \sim P_\Theta$. This formulation has implicit assumptions that 1) the target model shares the same parameter space $\Theta$ with the surrogate models, i.e., they have the same architectures; 2) the target model follows the same distribution $P_\Theta$ as the surrogate models, i.e., they apply the same (or same distribution of) training configurations. Both of these assumptions make the transfer problem overly simplistic, because in practice, the target model typically employs different model architectures and training configurations (including different datasets) than the surrogate models.

  • Using Rademacher Complexity in deep cases. First, I personally don't believe that Rademacher Complexity can convey reliable information when we are talking about deep networks. Second, Rademacher Complexity is more useful for asymptotic analysis, otherwise a lower upper bound of TE (i.e., Eq. (12)) does not indicate a lower value of TE.

  • The three practical guidelines are already well-known. While the authors demonstrated some theoretical bounds, the three guidelines they concluded—(1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting—are all well-known in literature. There is also a lack of empirical comparisons to previous baselines for ensemble-based transfer attacks.

Questions

My main concerns are the implicit assumptions in Eq. (3), making the derivations much less interesting. Besides, the concluded practical guidelines are already widely applied in the literature, and there is also a lack of empirical comparisons to previous baselines for ensemble-based transfer attacks.

Comment

2. Extension to Different Parameter Distributions with Domain Adaptation Theory.

There is another way to address the issues you mentioned above using domain adaptation theory [1].

  • Intuitively, there is a need for domain adaptation between the surrogate model and the target model.
  • Mathematically, a feasible and straightforward approach is to define a divergence metric and apply domain adaptation theory [1]. For instance,

Definition 1 ($\mathcal{X}$-divergence for transferable attack). Given a feature space $\mathcal{X}$ and a label space $\mathcal{Y}$, we denote the hypothesis space by $\mathcal{H}: \mathcal{X} \mapsto \mathcal{Y}$. Denote the parameter spaces of the surrogate model and the target model by $\Theta$ and $\Theta'$, respectively. Let $f(\theta;\cdot) \in \mathcal{H}$ be a classifier parameterized by $\theta$, where $\theta \in \Theta$ or $\theta \in \Theta'$. Consider a metric loss function $\ell: \mathcal{Y} \times \mathcal{Y} \mapsto \mathbb{R}_0^+$. Then the $\mathcal{X}$-divergence between the surrogate model domain and the target model domain can be defined as
$$d_{\mathcal{X}}\left(\mathcal{P}_\Theta, \mathcal{P}_{\Theta'}\right) = 2 \sup_{x \in \mathcal{X}} \left| \mathbb{E}_{\theta \sim \mathcal{P}_\Theta}\,\ell\left[f(\theta;x), y\right] - \mathbb{E}_{\theta \sim \mathcal{P}_{\Theta'}}\,\ell\left[f(\theta;x), y\right] \right|.$$

It is a natural extension from domain adaptation theory [1] to transferable adversarial attack. We consider such a divergence and redefine the population risk $L_P(z)$ as
$$L_P(z, \Theta) = \mathbb{E}_{\theta \sim \mathcal{P}_\Theta} [\ell(f(\theta;x), y)].$$
Therefore, there is a connection between $L_P(z)$ on the surrogate model domain and on the target model domain:
$$\left| L_P(z, \Theta') - L_P(z, \Theta) \right| \le \frac{1}{2} d_{\mathcal{X}}\left(\mathcal{P}_\Theta, \mathcal{P}_{\Theta'}\right).$$
Substituting our Theorem 2 into this inequality, we obtain a general upper bound with an additional divergence term on the right-hand side:
$$TE(z,\epsilon) \le 4\mathcal{R}_{N}(\mathcal{Z}) + \sqrt{\frac{2 \gamma \beta^2}{N}\ln{\frac{2^{\frac{1}{\gamma}} H_\alpha^{\frac{1}{\alpha}}\left(\mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta}\right)}{\delta}}} + d_{\mathcal{X}}\left(\mathcal{P}_\Theta, \mathcal{P}_{\Theta'}\right).$$
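For readers checking this step, the intermediate inequality is simply an evaluation-versus-supremum argument under Definition 1 (same notation as above; the point $x$ comes from $z=(x,y)$):

```latex
% Evaluation-versus-supremum step behind the inequality above (Definition 1 notation):
\begin{align*}
\left| L_P(z,\Theta') - L_P(z,\Theta) \right|
  &= \left| \mathbb{E}_{\theta \sim \mathcal{P}_{\Theta'}} \ell\left[f(\theta;x), y\right]
          - \mathbb{E}_{\theta \sim \mathcal{P}_{\Theta}}  \ell\left[f(\theta;x), y\right] \right| \\
  &\le \sup_{x' \in \mathcal{X}}
       \left| \mathbb{E}_{\theta \sim \mathcal{P}_{\Theta'}} \ell\left[f(\theta;x'), y\right]
            - \mathbb{E}_{\theta \sim \mathcal{P}_{\Theta}}  \ell\left[f(\theta;x'), y\right] \right|
   = \tfrac{1}{2}\, d_{\mathcal{X}}\!\left(\mathcal{P}_\Theta, \mathcal{P}_{\Theta'}\right).
\end{align*}
```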

According to this theory, the smaller the $\mathcal{X}$-divergence between the surrogate and target domains, the tighter the theoretical bound. Therefore, we need the surrogate model domain to be as close to the target model domain as possible. This insight is in line with [2], which shows that reducing model discrepancy (which corresponds to the divergence defined above) can make adversarial examples highly transferable.

Moreover, to further advance the field, leveraging advanced domain adaptation theories (e.g., [3-4]) could yield deeper theoretical insights and inspire new algorithm designs. In the revision, we provide a more detailed analysis, including:

  • Extending our analysis to scenarios with different parameter spaces and distributions.
  • Future work can be done by identifying suitable mathematical tools from the extensive domain adaptation literature [5] to analyze adversarial transferability more deeply and inform algorithm development.

These enhancements will significantly expand the impact of our work by:

  • Being the first to draw an analogy between statistical learning theory and adversarial transferability, thereby introducing a new perspective to the field.
  • Being the first to encourage researchers to consider domain adaptation for deeper analysis and algorithmic innovations in transferable adversarial attack.

Overall, our theoretical framework is rigorous and highly adaptable, with the simplicity and flexibility to make it easy for researchers to follow and build upon. This fosters further innovation in addressing adversarial transferability challenges.

[1] Learning bounds for domain adaptation. NIPS 2007.

[2] Minimizing Maximum Model Discrepancy for Transferable Black-box Targeted Attacks. CVPR 2023.

[3] Information-Theoretic Analysis of Unsupervised Domain Adaptation. ICLR 2023.

[4] Bridging Theory and Algorithm for Domain Adaptation. ICML 2019.

[5] A survey on domain adaptation theory: learning bounds and theoretical guarantees. arXiv preprint arXiv:2004.11829, 2020.

Comment

Q2: Using Rademacher Complexity in deep cases. First, I personally don't believe that Rademacher Complexity can convey reliable information when we are talking about deep networks. Second, Rademacher Complexity is more useful for asymptotic analysis, otherwise a lower upper bound of TE (i.e., Eq. (12)) does not indicate a lower value of TE.

A2: Thank you for your question.

  • On one hand, our primary objective is to establish a mathematical connection between generalization and adversarial transferability, thereby deepening our understanding of transferable adversarial attacks. Rademacher complexity, being a classic and elegant theoretical tool, serves this purpose effectively. Recent studies [1-4] have also employed it to analyze various aspects of deep learning, reaffirming its relevance in contemporary research. Consequently, we believe that leveraging this tool for the first time to explore the relationship between generalization and adversarial transferability in this field is both valuable and insightful. Furthermore, such a well-established and intuitive tool can help readers grasp the main concepts more effectively, making the solid theoretical framework presented in this paper easier to follow.
  • On the other hand, to comprehensively address your concerns, we have extended the theoretical tools employed in our analysis to inspire further advancements in the field. Specifically, we incorporate an information-theoretic analysis [5] of our theoretical framework. Information-theoretic analysis is a promising recent framework for analyzing deep learning [6-8], as detailed in Appendix D.5.

[1] Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization. COLT 2024.

[2] On Regularization and Inference with Label Constraints. ICML 2023.

[3] On the Generalization Analysis of Adversarial Learning. ICML 2022.

[4] On Rademacher Complexity-based Generalization Bounds for Deep Learning. arXiv preprint arXiv:2208.04284, 2024.

[5] Information-theoretic analysis of generalization capability of learning algorithms. NeurIPS 2017.

[6] On f-Divergence Principled Domain Adaptation: An Improved Framework. NeurIPS 2024.

[7] How Does Information Bottleneck Help Deep Learning? ICML 2023.

[8] An Information-Theoretic Framework for Deep Learning. NeurIPS 2022.

Note that the training process of $N$ classifiers can be viewed as sampling the parameter sets $\overline{\theta}^N = (\overline{\theta}_1, \ldots, \overline{\theta}_N)$ from the distribution $\mathcal{P}_{\Theta^N}$, i.e., $\overline{\theta}^N \sim \mathcal{P}_{\Theta^N}$. We generate a transferable adversarial example using these $N$ models and evaluate its performance on another $N$ models $\theta^N = (\theta_1, \ldots, \theta_N)$, which is an independent copy of $\overline{\theta}^N$. For a data point $z=(x,y) \in \mathcal{Z}$ and the parameter set $\theta^N$, our aim is to bound the difference in attack performance between the given $N$ models $\overline{\theta}^N$ and the $N$ unknown models $\theta^N$. In other words, if

  • an adversarial example $z$ can effectively attack the given model ensemble (i.e., achieves a high loss), and
  • there is a guarantee on the aforementioned difference in attack performance between the known and unknown models,

then there is an adversarial transferability guarantee for $z$.

Theorem. Given $N$ surrogate models $\overline{\theta}^N = (\overline{\theta}_1, \ldots, \overline{\theta}_N) \sim \mathcal{P}_{\Theta^N}$ as the ensemble components, let $\theta^N = (\theta_1, \ldots, \theta_N) \sim \mathcal{P}_{\Theta^N}$ be the target models, an independent copy of $\overline{\theta}^N$. Assume the loss function $\ell$ is bounded by $\beta \in \mathbb{R}_+$ and $\mathcal{P}_{\Theta^N}$ is absolutely continuous with respect to $\mathcal{P}_{\bigotimes_{i=1}^N \Theta}$. For $\alpha>1$ and an adversarial example $z=(x,y) \sim \mathcal{P}_{\mathcal{Z}}$, let
$$\Delta_N(\theta,z) = \mathbb{E}_{\overline{\theta}^N \sim \mathcal{P}_{\Theta^N}} \left[ \frac{1}{N}\sum_{i=1}^N \ell(f(\overline{\theta}_i;x), y) \right] - \frac{1}{N}\sum_{i=1}^N \ell(f(\theta_i;x), y).$$
Then there holds
$$\left| \mathbb{E}_{z,\theta^N \sim \mathcal{P}_{\mathcal{Z},\Theta^N}} \Delta_N(\theta,z) \right| \le 2 \beta \cdot \mathrm{D}_{\mathrm{TV}} \left( \mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta} \right) + \sqrt{\frac{\alpha \beta^2}{2 (\alpha-1) N} \left( I\left(\overline{\theta}^N;z\right) + \frac{1}{\alpha}\log H_\alpha \left(\mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta}\right) \right)},$$
where $\mathrm{D}_{\mathrm{TV}}(\cdot \| \cdot)$, $I(\cdot\,;\cdot)$, and $H_\alpha(\cdot \| \cdot)$ denote the total variation (TV) distance, mutual information, and Hellinger integral, respectively.

Comment

Thank you very much for your constructive comments! We address all your questions and concerns in the following responses.

Q1: Implicit assumptions in Eq. (3). The authors define the most transferable adversarial example $z^*$ in Eq. (3) as $z^* = \operatorname{argmax} L_P$, where $L_P$ in Eq. (1) is defined by taking the expectation over $\theta \sim P_\Theta$. This formulation has implicit assumptions that (1) the target model shares the same parameter space $\Theta$ with the surrogate models, i.e., they have the same architectures; (2) the target model follows the same distribution $P_\Theta$ as the surrogate models, i.e., they apply the same (or same distribution of) training configurations. Both of these assumptions make the transfer problem overly simplistic, because in practice, the target model typically employs different model architectures and training configurations (including different datasets) than the surrogate models.

A1: Thank you for your questions. Firstly, our proposed setting aligns with many realistic scenarios, as demonstrated in [1-5]. Specifically, these works encompass cases where both the surrogate model and the target model adopt the same architectures, such as ResNet-18, ResNet-50, Inception-v3, Inception-v4, and ViT. This reflects the fact that the setting considered in this paper is commonly used in prior studies.

Furthermore, considering your kind suggestion, our theoretical framework is not only rigorous but also highly adaptable, making it straightforward to extend to a more general setting using either of the following two methods:

  • Defining the model space (Appendix D.4.1).
  • Leveraging insights from domain adaptation theory (Appendix D.4.2).

We have incorporated this discussion into our revision to highlight the impact and versatility of our work.

[1] Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement. CVPR 2024.

[2] Ensemble Diversity Facilitates Adversarial Transferability. CVPR 2024.

[3] Making substitute models more bayesian can enhance transferability of adversarial examples. ICLR 2023.

[4] Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022.

[5] Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. ICLR 2020.

1. Defining the Model Space.

In particular, we consider $N$ surrogate classifiers $f_1, \cdots, f_N$ trained to generate adversarial examples. Let $D$ be the distribution over the surrogate models (for instance, the distribution of all low-risk models), and $f_i \in D, i \in [N]$. The low-risk claim is in line with Lemma 5 in [1], which assumes that the risks of the surrogate model and the target model are low (at most $\epsilon$, following Lemma 5 in [1]). Therefore, the surrogate model and target model can be seen as being drawn from the same distribution (such as a distribution over all low-risk models). For a data point $z=(x,y) \in \mathcal{Z}$ and $N$ classifiers for the model ensemble attack, define the population risk $L_P(z)$ and the empirical risk $L_D(z)$ as
$$L_P(z) = \mathbb{E}_{f \sim D} [\ell(f(x), y)], \quad\text{and}\quad L_D(z) = \frac{1}{N} \sum_{i \in [N],\, f_i \in D} \ell(f_i(x), y).$$

Now here is an extension of Theorem 2 based on the above definition, and the proof is almost the same.

Theorem 2 (Extension). Let $\mathcal{P}_{D^N}$ be the joint distribution of $f_1, \cdots, f_N$, and $\mathcal{P}_{\bigotimes_{i=1}^N D}$ be the joint measure induced by the product of the marginals. If the loss function $\ell$ is bounded by $\beta \in \mathbb{R}_+$ and $\mathcal{P}_{D^N} \ll \mathcal{P}_{\bigotimes_{i=1}^N D}$ for any function $f_i$, then for $\alpha>1$ and $\gamma=\frac{\alpha}{\alpha-1}$, with probability at least $1-\delta$, there holds
$$TE(z,\epsilon) \le 4\mathcal{R}_{N}(\mathcal{Z}) + \sqrt{\frac{2 \gamma \beta^2}{N}\ln{\frac{2^{\frac{1}{\gamma}} H_\alpha^{\frac{1}{\alpha}}\left(\mathcal{P}_{D^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N D}\right)}{\delta}}}.$$

$H_\alpha \left(\mathcal{P}_{D^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N D}\right)$ quantifies the divergence between the joint distribution $\mathcal{P}_{D^N}$ and the product of marginals $\mathcal{P}_{\bigotimes_{i=1}^N D}$. This divergence measures the degree of dependency among the $N$ classifiers $f_1, \cdots, f_N$. We can then draw the same conclusions as in Theorem 2.

[1] Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. NeurIPS 2021.

Comment

2. Significance of Our Work:

  • Unified framework and novel definitions: Our work introduces a novel definition and theoretical framework that unifies existing algorithms under a single paradigm. The Vulnerability-Diversity Decomposition is an exact equation, rather than an upper or lower bound as in previous work. Our Vulnerability-Diversity Decomposition in adversarial transferability parallels the importance of the bias-variance decomposition [1] in traditional machine learning, offering a high-level perspective to explain the logic of diverse algorithms in the field.
  • Theoretical contribution: The upper bound in Theorem 2 provides theoretical support for enhancing adversarial transferability. It supports regularization and diversity-boosting algorithms, bridging theoretical understanding with practical algorithmic designs. Our work is the first to connect generalization theory tools to adversarial transferability, providing a mathematical basis for long-standing empirical observations. Our work seeks to inspire the community, as suggested by Reviewer 6NcA, ignu and secg. It encourages researchers to derive deeper theoretical insights and design more effective algorithms by leveraging existing wisdom and advancing theoretical analyses.

As noted by the reviewers:

  • Reviewer 6NcA: "The theoretical results are solid and novel. The theoretical results can have a broader impact, as the analysis tools, such as those for bounding dependent random variables and the empirical Rademacher complexity for ensemble, can be applied elsewhere."
  • Reviewer ignu: "By defining the transferability error, authors make a good analogy to generalization error and derive some corresponding results to provide a better understanding of model ensemble attacks."
  • Reviewer secg: "The paper demonstrates strong originality by addressing the theoretical gaps in model ensemble-based adversarial attacks, introducing the novel concepts of transferability error, vulnerability-diversity decomposition, providing well-founded upper bounds for transferability error."

3. Regarding Empirical Comparisons to Previous Baselines for Ensemble-based Transfer Attacks.

Thank you for your question. Many studies have already conducted experimental validations. Regarding your comment on comparisons with baselines, our paper primarily focuses on theoretical analysis. Given this focus, our work does not propose new algorithms, and therefore empirical comparisons to previous baselines for ensemble-based transfer attacks are not the primary goal.

Regarding your comment on empirical comparisons, several studies with motivations aligned with our theoretical framework have already achieved significant success in practice, far exceeding previous baselines. For example, some works advocate enhancing model diversity to produce more transferable adversarial examples:

  • [2] introduces feature-level perturbations to existing models, potentially creating a vast set of diverse "Ghost Networks."
  • [3] emphasizes diversity in surrogate models by attacking a Bayesian model to achieve desirable transferability.
  • [4] proposes generating adversarial examples independently for individual models, further supporting the importance of improved diversity.

These studies report significant improvements, with attack success rates surpassing existing methods by approximately 10%, which is a remarkable advancement.

On one hand, the empirical comparisons in these studies provide strong support for our theoretical findings. On the other hand, our theoretical insights can inspire future algorithmic developments, leading to even more transferable adversarial attacks.

We sincerely thank you for your time and thoughtful review. We deeply appreciate the effort invested in evaluating our work and providing valuable suggestions. We kindly invite you to reconsider the innovative contributions of our paper to this theory-deficient field. Specifically, we hope our work not only establishes foundational insights but also inspires the research community to adopt and further innovate upon diverse theoretical tools, fostering a deeper understanding of adversarial transferability.

[1] Neural networks and the bias/variance dilemma. Neural Computation, 1992.

[2] Learning Transferable Adversarial Examples via Ghost Networks. AAAI 2020.

[3] Making substitute models more bayesian can enhance transferability of adversarial examples. ICLR 2023.

[4] Ensemble diversity facilitates adversarial transferability. CVPR 2024.

Thank you once again for your insightful review! We hope these revisions thoroughly address your concerns and enhance the clarity of our work. We would be happy to continue the discussion if you have any further questions or feedback!

Comment

I'd like to thank the authors for their (very long) responses. I sincerely appreciate the authors' efforts.

  • About your responses to my Q1, Theorem 2 (extension) just changes the parameter space definition $\theta \sim P_\Theta$ in Theorem 2 into the function space definition $f \sim \mathcal{D}$, while you still assume that $\{f_i\}_i \sim \mathcal{D}$ follow the same distribution as $f$. That's why, as you mentioned, the proof is almost the same.

  • As to your second explanation using domain adaptation, it is straightforward to see that the bound depends on $d_{\mathcal{X}}(P_\Theta, P_{\Theta'})$. Transferable adversarial attacks are valuable only when $d_{\mathcal{X}}(P_\Theta, P_{\Theta'})$ is large, which makes your bound quite loose. If you assume that $d_{\mathcal{X}}(P_\Theta, P_{\Theta'})$ is small, or even equal to zero as assumed in Theorem 2 and Theorem 2 (extension), then what you do is just white-box attacks.

  • To support their assumptions and conclusions, the authors cite several papers accepted at top conferences. However, I do not buy that "a paper is accepted => its assumptions/conclusions are correct." In particular, in the adversarial literature, it is common that the conclusions in previously published papers are later overturned, e.g., [1].

  • Personally, I don't like a paper decorated with theoretical derivations (with strong assumptions) that cannot inspire new practices.

So overall, I admire the authors' efforts during rebuttal, and I would not challenge if AC decides to accept this paper. However, I want to insist on my score of 5 to express my opposition to "theoretical papers" that cannot guide new practices.

[1] Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018 Best Paper

Comment

Thank you very much for your thorough review and thoughtful feedback. We greatly appreciate the time and effort you have invested, as well as your encouraging comment, "I would not challenge if AC decides to accept this paper." It truly motivates us to further refine our work and engage in constructive discussions with you.

We believe that we share a common goal: to advance the understanding of critical challenges in our field and inspire progress in both theory and practice. Below, we would like to share our perspective on the role of theoretical research:

  • The practical relevance of a theory often emerges over time. The value of a framework may not be fully apparent from a single paper, but we hope our work will serve as a foundation that inspires future research in both attack and defense methodologies. This is the first step in what we see as a long-term effort, and we are committed to further developing solutions that bridge theory and practice.
  • Theoretical research serves as a foundation for deeper insights into poorly understood phenomena. While it is ideal for theoretical work to immediately inspire new practical algorithms, we believe it is equally valuable to provide theoretical explanations for observed phenomena. In this paper, we are the first to establish a theoretical foundation for transferable model ensemble adversarial attacks, addressing a previously underexplored area and unifying insights to guide the design of future algorithms.

We are deeply grateful for the opportunity to exchange ideas with you and value your insights, which have been instrumental in helping us improve our work. By working together as part of the broader research community, we can collectively advance the field and achieve meaningful progress in the future.

Comment

In this theorem:

  • $\Delta_N(\theta, z)$ quantifies how effectively the surrogate models represent all possible target models. Taking the expectation of $\Delta_N(\theta, z)$ over $z$ and $\theta^N$ accounts for the inherent randomness in both adversarial examples and surrogate models.
  • The mutual information $I\left(\overline{\theta}^N;z\right)$ quantifies how much information about the surrogate models is retained in the adversarial example. Intuitively, higher mutual information indicates that the adversarial example is overly tailored to the surrogate models, capturing specific features of these models. This overfitting reduces its ability to generalize and transfer effectively to other target models. By controlling the complexity of the surrogate models, the specific information captured by the adversarial example can be limited, encouraging it to rely on broader, more transferable patterns rather than model-specific details. This reduction in overfitting enhances the adversarial example's transferability to diverse target models.
  • The total variation (TV) distance, $\mathrm{D}_{\mathrm{TV}} \left( \mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta} \right)$, and the Hellinger integral, $H_\alpha \left(\mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta}\right)$, capture the interdependence among the surrogate models.

It reveals that the following strategies contribute to a tighter bound:

  • Increasing the number of surrogate models, i.e., increasing $N$;
  • Reducing the model complexity of the surrogate models, i.e., reducing $I\left(\overline{\theta}^N;z\right)$;
  • Making the surrogate models more diverse, i.e., reducing $\mathrm{D}_{\mathrm{TV}} \left( \mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta} \right)$ and $H_\alpha \left(\mathcal{P}_{\Theta^N} \,\|\, \mathcal{P}_{\bigotimes_{i=1}^N \Theta}\right)$.

Note that these three strategies exactly align with those outlined in Theorem 2. A tighter bound ensures that an adversarial example maximizing the loss function on the surrogate models will also lead to a high loss on the target models, thereby enhancing transferability.

Q3: The three practical guidelines are already well-known. While the authors demonstrated some theoretical bounds, the three guidelines they concluded—(1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting-are all well-known in literature. There is also a lack of empirical comparisons to previous baselines for ensemble-based transfer attacks.

A3: Thank you for your question. We address your question logically in three steps:

  • The current research on transferable adversarial attacks highlights significant unknowns and ongoing debates.
  • Our contribution is primarily theoretical, bridging gaps and providing novel insights towards the unknowns and debates in this field.
  • Regarding empirical comparisons to baselines, we discuss how prior works aligned with our theoretical motivations have achieved significant advancements, offering strong support for our findings and inspiring future research.

1. The current state of research in this field:

  • Field challenges: Model ensembles in transferable adversarial attacks remain poorly understood and controversial (detailed in Appendix D.2.1 of our revision). Diverse definitions of "diversity" exist, as highlighted by Reviewer ignu. Additionally, numerous algorithms with varied motivations aim to enhance adversarial transferability (detailed in Appendix D.2.2 of our revision).
  • Survey insights: A recent survey on adversarial transferability [1] emphasizes the growing need for theoretical characterizations beyond empirical evaluations. They clarify that "In addition to empirical evaluations, there is also a growing recognition of the necessity for theoretical characterizations of transferability. Such theoretical analyses can provide valuable insights into the underlying principles governing the transferability of adversarial attacks."

[1] A Survey on Transferability of Adversarial Examples across Deep Neural Networks. TMLR 2024.

Review
Rating: 6

This paper provides a theoretical foundation for model ensemble methods used to generate transferable adversarial examples. The authors introduce three new concepts: transferability error, diversity, and empirical Rademacher complexity, which together decompose transferability error into two primary components: vulnerability and diversity. Furthermore, the authors establish a bound on transferability error and propose practical guidelines to reduce it, such as increasing model diversity and managing complexity to prevent overfitting. Extensive experiments validate these findings.

Strengths

The paper demonstrates strong originality by addressing the theoretical gaps in model ensemble-based adversarial attacks, introducing the novel concepts of transferability error, vulnerability-diversity decomposition, providing well-founded upper bounds for transferability error.

Weaknesses

Although this paper provides a strong theoretical foundation, some limitations affect its overall impact.

While the experiments are broad in scope, they can be enhanced by testing on a wider range of real-world scenarios or datasets outside of standard benchmarks such as MNIST and CIFAR-10 to verify applicability in more diverse contexts (e.g. CIFAR-100, SVHN, etc.).

Questions

  • Given the identified trade-off between vulnerability and diversity, could the authors suggest any criteria or metrics for balancing these components during ensemble model selection?

  • The experiments use standard datasets like MNIST and CIFAR-10, which may not fully represent the complexity encountered in real-world applications. Have the authors considered testing on more complex datasets (e.g. CIFAR-100, SVHN, ImageNet, etc.)?

  • Can the authors give the specific method of generating adversarial samples in the experiments and the specific meaning of "steps" in Figs. 2, 3, and 4?

Comment

Thank you very much for your constructive comments! We address all your questions and concerns in the following responses.

Q1: Although this paper provides a strong theoretical foundation, some limitations affect its overall impact. While the experiments are broad in scope, they can be enhanced by testing on a wider range of real-world scenarios or datasets outside of standard benchmarks such as MNIST and CIFAR-10 to verify applicability in more diverse contexts (e.g. CIFAR-100, SVHN, etc.).

A1: Thank you for your question. As you suggested, we have included additional experiments on CIFAR-100 in Appendix E. The results are consistent with those for MNIST, Fashion-MNIST, and CIFAR-10 presented in our submission, further reinforcing the validity of our findings and insights.

Q2: Given the identified trade-off between vulnerability and diversity, could the authors suggest any criteria or metrics for balancing these components during ensemble model selection?

A2: Your comment raises an insightful point, and articulating this issue clearly will further enhance the impact of our work.

  • To achieve a better trade-off, a straightforward approach is to incorporate recently proposed diversity metrics, such as the Vendi score introduced in [1] and the EigenScore proposed in [2]. These metrics could be utilized either for model selection or as components of the optimization objective to identify the optimal vulnerability-diversity trade-off (a minimal illustrative sketch is given after this list).
  • In practice, diversity can be incorporated into the optimization objective to strike a balance between diversity and ensemble loss. By doing so, transferability error could be reduced, thereby improving the transferability of adversarial examples. Exploring these directions would be a valuable avenue for future research.
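As an illustration of how such a metric could plug into ensemble selection, below is a minimal sketch of a Vendi-style diversity score computed from surrogate-model outputs. The function name, the cosine-similarity kernel, and the use of model outputs on a small batch are our own illustrative assumptions and not the exact formulation of [1] or [2].

```python
import numpy as np

def vendi_style_diversity(outputs):
    """Vendi-style diversity of an ensemble (sketched after [1]).

    outputs: array of shape (N, B, C) holding each of the N surrogate
    models' outputs on the same batch of B inputs. The cosine-similarity
    kernel over flattened outputs is an illustrative choice.
    Returns exp(entropy of the eigenvalues of K/N): 1 if all models
    behave identically, N if their outputs are mutually orthogonal.
    """
    n = outputs.shape[0]
    feats = outputs.reshape(n, -1)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    K = feats @ feats.T                      # (N, N) similarity matrix, K[i, i] = 1
    eigvals = np.linalg.eigvalsh(K / n)      # eigenvalues sum to 1
    eigvals = eigvals[eigvals > 1e-12]       # drop numerical zeros
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

# Example usage with random stand-in outputs for 4 surrogate models:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(vendi_style_diversity(rng.normal(size=(4, 32, 10))))
```

Candidate sub-ensembles could then be ranked by such a score, or the score could be added to the attack objective, to explore the vulnerability-diversity trade-off discussed above.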

[1] The Vendi Score: A Diversity Evaluation Metric for Machine Learning. TMLR 2023.

[2] INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection. ICLR 2024.

Q3: The experiments use standard datasets like MNIST and CIFAR-10, which may not fully represent the complexity encountered in real-world applications. Have the authors considered testing on more complex datasets (e.g. CIFAR-100, SVHN, ImageNet, etc.)?

A3: Thank you for your question. As you suggested, we have included additional experiments on CIFAR-100 in Appendix E. The results are consistent with those for MNIST, Fashion-MNIST, and CIFAR-10 presented in our submission, further reinforcing the validity of our findings and insights.

Q4: Can the authors give the specific method of generating adversarial samples in the experiments and the specific meaning of "steps" in Figs. 2, 3, and 4?

A4: Thank you for highlighting this issue. We have added further details in Section 5 (Line 418-420, 426-430, 463) to enhance readers' understanding of the experiments. In particular:

  • For models trained on MNIST and Fashion-MNIST, we set the number of epochs to 10; for models trained on CIFAR-10, we set it to 30. We use the Adam optimizer with a learning rate of $10^{-3}$ and a batch size of 64.
  • "Steps" indicates the number of attack steps, and we denote the weight decay by $\lambda$.
  • We record the attack success rate (ASR), the loss value, and the variance of model predictions as the number of attack steps increases. We use MI-FGSM [1] to craft the adversarial examples and cross-entropy as the loss function to optimize the adversarial perturbation (a minimal sketch of this attack loop is given after this list). The number of steps for a transferable adversarial attack is generally set to 10 [1-4], but to study the attack dynamics more comprehensively, we perform a 20-step attack. In our plots, we use the mean-squared error to validate our theory, as it better reflects vulnerability from the theoretical perspective.
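For concreteness, here is a minimal PyTorch sketch of the MI-FGSM ensemble attack loop referenced above [1], using an equal-weight average of the surrogate models' cross-entropy losses. The $\epsilon$ budget, step size, and loss weighting are illustrative assumptions rather than the exact configuration of our experiments.

```python
import torch
import torch.nn.functional as F

def mi_fgsm_ensemble(models, x, y, eps=8/255, steps=20, mu=1.0):
    """Minimal MI-FGSM ensemble attack sketch (Dong et al., CVPR 2018).

    models: list of surrogate classifiers returning logits for image
    batches of shape (B, C, H, W); eps and the equal-weight loss average
    are illustrative choices, not the paper's exact configuration.
    """
    alpha = eps / steps                       # per-step size
    x_adv = x.clone().detach()
    momentum = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Ensemble loss: average cross-entropy over all surrogate models.
        loss = torch.stack([F.cross_entropy(m(x_adv), y) for m in models]).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum update with an L1-normalized gradient (per example).
        momentum = mu * momentum + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        # Gradient-ascent step, then project back into the eps-ball and [0, 1].
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```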

[1] Boosting adversarial attacks with momentum. CVPR 2018.

[2] Ensemble Diversity Facilitates Adversarial Transferability. CVPR 2024.

[3] Boosting Adversarial Transferability by Block Shuffle and Rotation. CVPR 2024.

[4] Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022.

Thank you once again for your insightful review! We hope these revisions thoroughly address your concerns and enhance the clarity of our work. We would be happy to continue the discussion if you have any further questions or feedback!

Comment

Thanks for considering my suggestions and responding to my questions.

This paper provides theoretical guarantees for model ensemble-based transfer attacks, which can offer some guidance for future work. However, I think the significance might be limited, as many existing papers have already proposed strong methods, and the theory in this paper does not seem to present new challenges in transfer attacks. Therefore, I would like to maintain my score.

Comment

Thank you for your careful review and thoughtful feedback!

  • We appreciate your observation regarding the abundance of strong transferable model ensemble adversarial attacks in the literature. This recognition indeed highlights the motivation for our work. We would like to emphasize that, to the best of our knowledge, this paper is the first to establish a theoretical foundation for these algorithms in this field.
  • Moreover, as outlined in our response to Reviewer 6NcA’s Question 1, our theoretical framework and mathematical tools are not only applicable to analyzing adversarial attacks but can also provide valuable insights for ensemble model defenses. We believe this dual applicability represents a meaningful contribution to advancing both attack and defense strategies in this domain.

We sincerely thank you for your constructive suggestions and kind feedback. We believe the theoretical advancements presented in this paper address a critical gap in this underexplored area, and we would be truly grateful if you could consider raising your confidence score in recognition of this contribution. At the same time, regardless of the final score, we deeply respect your evaluation process and remain thankful for your insightful comments, which have been invaluable in improving our work.

If you have any additional suggestions or questions, please don't hesitate to point them out; we would be delighted to address them!

Review
Rating: 6

The authors propose a theoretical framework to explain the observations by prior empirical methods on increasing the effectiveness of model ensemble attacks. They define transferability error to measure the effectiveness of an adversarial example which is basically analogous to the generalization of the adversarial example to unseen trained models belonging to a specific function class. They also define an empirical version of Rademacher complexity as a measure of complexity for the input space for an ensemble of N classifiers and show that the transferability error is upper-bounded by a combination of this measure of input space complexity and the divergence of joint distribution of the model parameters of the ensemble from the product of their marginals which accounts for non-independence of the models of an ensemble.

Strengths

The paper is well-written and well motivated with a good survey of related works.

By defining the transferability error, authors make a good analogy to generalization error and derive some corresponding results to provide a better understanding of model ensemble attacks.

The authors avoid the independence assumption that is used for studying generalization and derive an upper bound based on the divergence of the joint distribution of the models' parameters from the case where they are independent.

Weaknesses

  1. Authors connect their theoretical results with empirical observations in prior work regarding the diversity of the models; however, their definition of diversity does not match with many of these prior works. For example in [1] and [2] the diversity is defined as having gradient vectors (with respect to the inputs) with low cosine similarity. What authors consider as diversity here actually is supposed to decrease naturally according to Lemma 5 in [1]. Could authors clarify how their definition of diversity relates to these previous definitions in the literature, particularly those based on gradient similarity.

[1] Yang, Z., Li, L., Xu, X., Zuo, S., Chen, Q., Zhou, P., ... & Li, B. (2021). Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. Advances in Neural Information Processing Systems, 34, 17642-17655.

[2] Kariyappa, S., & Qureshi, M. K. (2019). Improving adversarial robustness of ensembles with diversity training. arXiv preprint arXiv:1901.09981.

  2. The complexity of the models in the ensemble and the complexity of the input space seem to be used interchangeably sometimes. Equation 12 shows the complexity of input space defined by the authors, but in the follow-up discussion (line 342) it is mentioned that the model complexity has to be controlled when using stronger and more diverse ensembles.

  3. The interpretation of input space Rademacher complexity defined by the authors does not seem clear! The presented results suggest decreasing this complexity to achieve a tighter upper bound on the transferability error. However, decreasing this complexity means achieving a state where the sample in the input space is not able to achieve a high loss value for the models in the ensemble. This basically means that the optimization in equation 3 will achieve a lower value for equation 1 which authors are seeking to increase. This seems contradictory and it would be great if authors could clarify that.

  4. The experiments do not seem to be comprehensive in evaluating the presented theoretical results. For example, there is no analysis with respect to the complexity of the input space or the trade-off of diversity and complexity.

Questions

Other than the concerns pointed out in the weaknesses I have some additional questions for the authors:

  1. I have some confusion about the presented plots in the experiments which are not well-explained. Regarding the experiments, are you using mini-batch SGD as the optimizer? By "# step" on the x-axis do you mean the number of epochs? For loss value, is this the loss value of the expectation of logits on a training sample or test sample? Isn't that supposed to be decreasing as all the models are being trained?

  2. In figure 4, the variance of the logits from the models in the ensemble is shown to be increasing for CIFAR-10, but the number of epochs is too small and it is not clear whether the same trend continues. Could authors plot them with a higher number of epochs?

  3. The plots with increasing values of the variance of the logits from the models of the ensemble seem contradictory to Lemma 5 of [1]. The authors also mention for some datasets they see a decreasing trend similar to what is expected from Lemma 5 of [1]. Could the authors comment on the potential reasons for their different observations for other datasets?

Comment

Q7: The plots with increasing values of the variance of the logits from the models of the ensemble seem contradictory to Lemma 5 of [1]. The authors also mention for some datasets they see a decreasing trend similar to what is expected from Lemma 5 of [1]. Could the authors comment on the potential reasons for their different observations for other datasets?

A7: Thank you for your question.

  • As mentioned in our response to your Question 1, the theoretical results in this paper, combined with Lemma 5 of [1], offer a comprehensive understanding of the factors affecting adversarial transferability from two distinct perspectives.
  • In other words, we believe that the trend of diversity can be harmonized with Lemma 5 in [1], highlighting their compatibility rather than contradiction.

Additionally, considering the different observations you mentioned, the dynamics of diversity may show varying trends due to the inherent trade-off between diversity and vulnerability. In particular, consider two distinct phases in the attack dynamics in Figures 2-4 (specifically the "variance" subfigure):

  • Initial phase of the attack (first few steps): During this phase, the adversarial example struggles to attack the model ensemble effectively (a low loss). Consequently, both the loss and variance increase, aligning with the Vulnerability-Diversity Decomposition.
  • Potential "overfitting" phase of the attack (subsequent steps): In this phase, the adversarial example can effectively attack the model ensemble, achieving a high loss. Here, the trade-off between diversity and complexity becomes evident, particularly at the final step of the attack. As the regularization term λ\lambda increases (i.e., lower model complexity), the variance of the model ensemble may increase. For instance, in the variance subfigure, the red curve may exceed one of the other curves, indicating this potential trade-off. However, if the adversarial example does not reach the "overfitting" phase, the trend will continue to follow the initial phase of the attack. This explains the different observations in our experiments.

The relationship between vulnerability and diversity merits deeper exploration in the future. The discussion of these dynamics below is included in Appendix D.8 of the revision.

  • Drawing on the parallels between the vulnerability-diversity trade-off and the bias-variance trade-off [2], we find that insights from the latter may prove valuable for understanding the former, and warrant further investigation.
  • The classical bias-variance trade-off suggests that as model complexity increases, bias decreases while variance rises, resulting in a U-shaped test error curve. However, recent studies have revealed additional phenomena and provided deeper analysis [3], such as the double descent [4].
  • Our experiments indicate that diversity does not follow the same pattern as variance in classical bias-variance trade-off. Nonetheless, there are indications within the bias-variance trade-off literature that suggest similar behavior might occur.
    • For instance, [5] proposes that variance exhibits a bell-shaped curve, initially increasing and then decreasing as network width grows.
    • Additionally, [6] offers a meticulous understanding of variance through detailed decomposition, highlighting the influence of factors such as initialization, label noise, and training data.

Overall, the trend of variance in model ensemble attack remains a valuable area for future research. We may borrow insights from machine learning literature to get a better understanding of this.

[1] Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. NeurIPS 2021.

[2] Neural networks and the bias/variance dilemma. Neural Computation, 1992.

[3] On the bias-variance tradeoff: Textbooks need an update. arXiv preprint arXiv:1912.08286, 2019.

[4] Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 2019.

[5] Rethinking bias-variance trade-off for generalization of neural networks. ICML 2020.

[6] What causes the test error? going beyond bias-variance via anova. JMLR 2021.

Thank you once again for your insightful review! We hope these revisions thoroughly address your concerns and enhance the clarity of our work. We would be happy to continue the discussion if you have any further questions or feedback!

Comment

Q5: I have some confusion about the presented plots in the experiments which are not well-explained. Regarding the experiments, are you using mini-batch SGD as the optimizer? By "# step" on the x-axis do you mean the number of epochs? For loss value, is this the loss value of the expectation of logits on a training sample or test sample? Isn't that supposed to be decreasing as all the models are being trained?

A5: Thank you for highlighting this issue. We have added further details in Section 5 (Line 418-420, 426-430, 463) to enhance readers' understanding of the experiments. In particular:

  • For models trained on MNIST and Fashion-MNIST, we set the number of epochs to 10; for models trained on CIFAR-10, we set it to 30. We use the Adam optimizer with a learning rate of $10^{-3}$ and a batch size of 64.
  • The number of attack steps is indicated by "# step", and we denote the weight decay by $\lambda$.
  • We record the attack success rate (ASR), the loss value, and the variance of model predictions as the number of attack steps increases (a short sketch of how this prediction variance can be computed is given after this list). We use MI-FGSM [1] to craft the adversarial examples and cross-entropy as the loss function to optimize the adversarial perturbation. The number of steps for a transferable adversarial attack is generally set to 10 [1-4], but to study the attack dynamics more comprehensively, we perform a 20-step attack. In our plots, we use the mean-squared error to validate our theory, as it better reflects vulnerability from the theoretical perspective.
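For reference, below is a small sketch of how the tracked "variance of model predictions" can be computed at each attack step; whether softmax probabilities or raw logits are used is an assumption on our part, not necessarily the exact choice behind our figures.

```python
import torch

def ensemble_prediction_variance(models, x_adv):
    """Average variance of the ensemble's predictions on a batch (sketch).

    Stacks each model's softmax outputs into a (N, B, C) tensor, takes the
    variance across the N models, and averages over classes and examples.
    Using probabilities rather than raw logits is an assumption here.
    """
    with torch.no_grad():
        probs = torch.stack([m(x_adv).softmax(dim=-1) for m in models])
    return probs.var(dim=0, unbiased=False).mean().item()
```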

Since the x-axis indicates the number of attack steps, the loss value increases as we maximize the loss to generate adversarial examples.

[1] Boosting adversarial attacks with momentum. CVPR 2018.

[2] Ensemble Diversity Facilitates Adversarial Transferability. CVPR 2024.

[3] Boosting Adversarial Transferability by Block Shuffle and Rotation. CVPR 2024.

[4] Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022.

Q6: In figure 4, the variance of the logits from the models in the ensemble is shown to be increasing for CIFAR-10, but the number of epochs is too small and it is not clear whether the same trend continues. Could authors plot them with a higher number of epochs?

A6: Thank you for your question.

  • The number of steps in Figures 2-4 refers to the iterations used to generate adversarial examples, specifically the number of iterations in the gradient-based attack. We have included additional details in the revision to enhance readers' understanding of our experimental setup.
  • Regarding the number of steps, we utilize 20 iterations, exceeding the settings used in previous experiments, such as 10 steps [1-4] and 16 steps [5]. By considering more steps than prior works and illustrating the dynamics across each step, our results provide a more comprehensive experimental perspective. This detailed representation allows researchers to better grasp the theoretical claims presented in our paper.

[1] Ensemble Diversity Facilitates Adversarial Transferability. CVPR 2024.

[2] Boosting Adversarial Transferability by Block Shuffle and Rotation. CVPR 2024.

[3] Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022.

[4] Boosting Adversarial Attacks with Momentum. CVPR 2018.

[5] Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. ICLR 2020.

Comment

2. Clarification on the contradiction.

We understand your concern, and we will clarify how decreasing model complexity relates to transferability error. We first explain your concern in the context of traditional machine learning to make it more clear to understand:

  • Our definition of transferability error is similar to Equation (2.25) in [1] (Section 2.4.3, follow the link and see PDF Page 39/427) in statistical learning theory. This concept is related to Excess Risk, which refers to the difference between the risk of a model and the risk of the best possible model within a given class. If we select a model space with very low complexity (such as one consisting only of random guess models), the excess risk can be small because the model's risk and the optimal model's risk (random guess) can converge, despite both yielding trivial performance.
  • Coming back to transferability error: if we reduce the model complexity too much as you describe, the adversarial examples generated will be trivial and unable to effectively attack the models, as there won't be a valid adversarial example capable of causing a high loss. Our goal is to work with a model space that is not trivial, where adversarial examples are meaningful and can effectively attack the models in the ensemble. Therefore, the situation you described would render the problem trivial. For example, using a random guess model to classify the ImageNet dataset falls outside the scope of our theoretical framework. To ensure the problem remains meaningful, it is crucial to strike a balance when controlling model complexity.

In short, while reducing complexity too much can lower transferability error, it also diminishes the overall effectiveness of the attack, which is not the desired outcome. Therefore, we focus on finding an optimal balance that allows for meaningful adversarial examples while controlling transferability error.

[1] Foundations of machine learning. 2018. https://www.hlevkin.com/hlevkin/45MachineDeepLearning/ML/Foundations_of_Machine_Learning.pdf

Q4: The experiments do not seem to be comprehensive in evaluating the presented theoretical results. For example, there is no analysis with respect to the complexity of the input space or the trade-off of diversity and complexity.

A4: Thank you for your question.

Empirical model ensemble Rademacher complexity: Similar to the traditional Rademacher complexity in learning theory [1-3], it is challenging to compute directly. Instead, it serves as an elegant mathematical tool for analyzing and understanding learning problems. Likewise, our empirical model ensemble Rademacher complexity also mainly serves as an analytical tool for theoretical understanding. However, computing and estimating this complexity in experiments is non-trivial due to the infinite nature of adversarial examples in the input space. This represents an intriguing avenue for future exploration, although it lies beyond the scope of this work.

The trade-off between diversity and complexity: Our experimental results, particularly Figures 2-4 (specifically the "variance" subfigure), highlight the trade-off between diversity and complexity. Consider two distinct phases in the attack dynamics:

  • Initial phase of the attack (first few steps): During this phase, the adversarial example struggles to attack the model ensemble effectively (a low loss). Consequently, both the loss and variance increase, aligning with the Vulnerability-Diversity Decomposition.
  • Potential "overfitting" phase of the attack (subsequent steps): In this phase, the adversarial example can effectively attack the model ensemble, achieving a high loss. Here, the trade-off between diversity and complexity becomes evident, particularly at the final step of the attack. As the regularization term λ\lambda increases (i.e., lower model complexity), the variance of the model ensemble may increase. For instance, in the variance subfigure, the red curve may exceed one of the other curves, indicating this potential trade-off.

Thank you for pointing this out! We have incorporated the above discussion into Lines 481-490 of the revision.

[1] Generalization Guarantees via Algorithm-dependent Rademacher Complexity. COLT 2023.

[2] Rademacher Complexity for Adversarially Robust Generalization. ICML 2019.

[3] Size-Independent Sample Complexity of Neural Networks. COLT 2018.

Comment

2. Resolution of the Potential Conflict: We point out that no actual contradiction exists between Lemma 5 and our work. Instead, they provide complementary analyses:

  • Upper bound interpretation: Lemma 5 provides an upper bound rather than an equality or lower bound. While an increase in $\rho$ loosens this upper bound, it does not necessarily imply that the left-hand side (i.e., transferability success) will increase. The significance of an upper bound lies in the fact that a tighter right-hand side suggests the potential for a smaller left-hand side. However, a looser upper bound does not necessarily imply that the left-hand side will increase. Therefore, while increasing ensemble diversity may loosen the upper bound in Lemma 5, it does not contradict the fundamental interpretation of it.
  • Complementary perspectives: While Lemma 5 analyzes the trade-off between $\epsilon$ (model fit to the original data) and $\rho$ (distributional discrepancy), our work focuses on the trade-off between vulnerability and ensemble diversity. Together, they provide a comprehensive understanding of the factors influencing adversarial transferability.

3. Connecting Our Work with Lemma 5

We now further elucidate the relationship between our results and Lemma 5:

  • Reducing Transferability Error: To minimize transferability error (as in our work), the adversarial transferability described by Lemma 5 requires stronger theoretical guarantees, i.e., its upper bound must be tighter.
  • Trade-off Between $\epsilon$ and $\rho$: To tighten the bound in Lemma 5, either $\epsilon$ or $\rho$ must decrease. However, the two exhibit a trade-off:
    • If $\epsilon$ decreases, A and B fit the original data distribution better. However, beyond a certain point, the adversarial examples generated by A diverge significantly from the original data distribution, increasing $\rho$.
    • If $\rho$ decreases, the adversarial example distribution becomes closer to the original data distribution. However, beyond a certain point, A exhibits similar losses on both distributions, resulting in a higher $\epsilon$.

Therefore, Lemma 5 indicates the potential trade-off between $\epsilon$ and $\rho$ in adversarial transferability, while our Theorem 1 emphasizes the trade-off between vulnerability and diversity.

By integrating the perspectives from both Lemma 5 and our findings, these results illuminate different facets of adversarial transferability, offering complementary theoretical insights. This combined understanding deepens our knowledge of the factors influencing adversarial transferability and lays a solid foundation for future research in the field.

Q2: The complexity of the models in the ensemble and the complexity of the input space seem to be used interchangeably sometimes. Equation 12 shows the complexity of input space defined by the authors, but in the follow-up discussion (line 342) it is mentioned that the model complexity has to be controlled when using stronger and more diverse ensembles.

A2: Thank you for your suggestion! We will clarify this point further in the revision. As stated in Lemma 2 and lines 302-307, the complexity of the input space is indeed correlated with the complexity of the models. Through Lemma 2, we provide an insight that the complexity of the input space can be reduced by lowering the model complexity and increasing the number of ensemble models. We will revise this section to make the explanation more coherent and straightforward.

Q3: The interpretation of input space Rademacher complexity defined by the authors does not seem clear! The presented results suggest decreasing this complexity to achieve a tighter upper bound on the transferability error. However, decreasing this complexity means achieving a state where the sample in the input space is not able to achieve a high loss value for the models in the ensemble. This basically means that the optimization in equation 3 will achieve a lower value for equation 1 which authors are seeking to increase. This seems contradictory and it would be great if authors could clarify that.

A3: Thank you for your question!

1. The interpretation of input space Rademacher complexity. To make it more clear and easy to understand for readers, Lemma 2 (Ensemble Complexity of MLP) in our paper offers an upper bound and demonstrates that reducing model complexity while increasing the number of models can effectively control the empirical model ensemble Rademacher complexity. We also provide additional explanations in the revision to help readers better understand input space complexity. In Appendix D.1, we discussed two specific scenarios (including the example you mentioned) to illustrate the potential consequences of overly high or overly low input space complexity.

Comment

Thank you very much for your constructive comments! We address all your questions and concerns in the following responses.

Q1: Authors connect their theoretical results with empirical observations in prior work regarding the diversity of the models; however, their definition of diversity does not match with many of these prior works. For example in [1] and [2] the diversity is defined as having gradient vectors (with respect to the inputs) with low cosine similarity. What authors consider as diversity here actually is supposed to decrease naturally according to Lemma 5 in [1]. Could authors clarify how their definition of diversity relates to these previous definitions in the literature, particularly those based on gradient similarity.

[1] Yang, Z., Li, L., Xu, X., Zuo, S., Chen, Q., Zhou, P., ... & Li, B. (2021). Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. Advances in Neural Information Processing Systems, 34, 17642-17655.

[2] Kariyappa, S., & Qureshi, M. K. (2019). Improving adversarial robustness of ensembles with diversity training. arXiv preprint arXiv:1901.09981.

A1: Thank you for your highly constructive question, which highlights an important aspect that can enhance the impact of this paper on the field of adversarial transferability. We include a discussion of [1-2] in the revised version (Line 278-283 and Appendix D.2-D.3). Specifically:

1. On the Definition of Diversity and Its Differences from Existing Works [1-2]:

  • In [1], gradient diversity is defined using the cosine similarity of gradients between different models, and instance-level transferability is introduced, along with a bound for transferability. This work cleverly uses Taylor expansion to establish a theoretical connection between the success probability of attacking a single sample and the gradients of the models.
  • In [2], inspired by the concept of adversarial subspace in [3], diversity is defined based on the cosine similarity of gradients across different models. The authors aim to encourage models to become more diverse, thereby achieving “no overlap in the adversarial subspaces,” and provide intuitive insights to readers. Both papers define gradient diversity and explain its impact on transferability.

In contrast, our definition of diversity stems from the unified theoretical framework proposed in this paper. Specifically:

  • We draw inspiration from statistical learning theory [4-5] on generalization, defining transferability error accordingly.
  • Additionally, we are motivated by ensemble learning [6-7], where we define diversity as the variation in outputs among different ensemble models.
  • Intuitively, when different models exhibit significant differences in their outputs for the same sample, their gradients with respect to that sample are also likely to differ substantially. This suggests a potential connection between our output-based definition of diversity and the gradient-based definitions in [1-2], which is worth exploring in future research.

Overall, our perspective differs from that of [1-2]. Despite the differences in definitions, both our work and [1-2] provide valuable explanations for phenomena in the field of adversarial transferability.
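To make the contrast concrete, the following minimal sketch (PyTorch) computes the two notions of diversity side by side: the gradient-based notion of [1-2] (average pairwise cosine similarity of input gradients) and prediction variance as a simple proxy for the output-based notion discussed above. The models, inputs, and loss are illustrative assumptions, not the exact setups of [1-2] or of our experiments.

```python
import itertools
import torch
import torch.nn.functional as F

def gradient_diversity(models, x, y):
    """Average pairwise cosine similarity of input gradients (lower = more diverse, as in [1-2])."""
    grads = []
    for m in models:
        x_ = x.detach().clone().requires_grad_(True)
        loss = F.cross_entropy(m(x_), y)
        grads.append(torch.autograd.grad(loss, x_)[0].flatten(1))  # (B, d)
    sims = [F.cosine_similarity(g1, g2, dim=1).mean()
            for g1, g2 in itertools.combinations(grads, 2)]
    return torch.stack(sims).mean()

def output_diversity(models, x):
    """Variance of member predictions for the same input (a proxy for the output-based notion)."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models])  # (N, B, C)
    return probs.var(dim=0).mean()
```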

[1] Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. NeurIPS 2021.

[2] Improving adversarial robustness of ensembles with diversity training. ICML 2019.

[3] The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453, 2017.

[4] Learnability, Stability and Uniform Convergence. JMLR 2010.

[5] Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. JMLR 2002.

[6] Pathologies of Predictive Diversity in Deep Ensembles. TMLR 2024.

[7] A Unified Theory of Diversity in Ensemble Learning. JMLR 2023.

Comment

Thank you very much for your complete responses and clarifications. My concerns are mostly addressed. With the additional explanation on the experiments, I noticed that the setting of the experiments does not match the assumption of the theorems since they are MLPs or Convolutional networks with different numbers of layers, and therefore, their parameters belong to different parameter spaces. Also, the models trained on different transformations of the inputs do not necessarily have the same distribution over the parameter space. In addition, it is not clear to me why they chose to use a low number of training epochs for the models (10 epochs for MNIST and 30 for CIFAR-10), but this one is a minor objection if the authors do not have the results for higher epochs.

Regardless of these points in the experiments, as I mentioned earlier, the theoretical results and connection to generalization are interesting and are at least a step forward toward a better understanding of model ensemble attacks. Of course, the impact was more noticeable if they could explain some of the empirical advances. As I mentioned previously, the empirical works discussed in this paper might use the same terms but for different criteria (e.g., cosine similarity of gradients for diversity). Even the paper that the authors have cited in their response (paper [2]) uses transformations in the input space which might not fit in the theoretical framework of this paper.

Still, because of the merits of this paper and the authors' responses, I increase my confidence and soundness scores. I would check the other reviewers' discussions for further decisions on this paper. Thanks again for your efforts!

Comment

Thank you for your thoughtful follow-up question and for thoroughly reviewing our revisions and responses. We deeply appreciate your attention to detail and the opportunity to provide further clarification.

Regarding your concern about the number of epochs, we trained the model to convergence within the specified epochs. Thus, the experiments remain valid, as the model successfully achieved convergence under these conditions.

Regarding your concern about the parameter space and assumptions, many input transformation methods (e.g., [1-2]) do not alter input dimensions, which ensures that the parameter space remains consistent with the theoretical framework proposed in this paper. Our primary focus is to provide initial theoretical insights into transferable model ensemble adversarial attacks, and we designed the settings and assumptions to be accessible and straightforward for researchers in this field. As highlighted in Appendix D.4, extending this theory to address more complex scenarios, such as varying parameter space distributions as you suggested, is indeed an interesting and valuable direction for future work that could significantly benefit the research community.
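For concreteness, a dimension-preserving input transformation of the kind referenced above can be sketched as follows. This is a generic illustration in the spirit of such methods, not the exact transform of [1] or [2]; the scale range is an arbitrary assumption.

```python
import random
import torch
import torch.nn.functional as F

def resize_and_pad(x: torch.Tensor, min_scale: float = 0.85) -> torch.Tensor:
    """x: (B, C, H, W) image batch; returns a transformed tensor with the same shape."""
    _, _, h, w = x.shape
    new_h = random.randint(int(min_scale * h), h)
    new_w = random.randint(int(min_scale * w), w)
    x_small = F.interpolate(x, size=(new_h, new_w), mode="bilinear", align_corners=False)
    pad_top = random.randint(0, h - new_h)
    pad_left = random.randint(0, w - new_w)
    # pad back to the original H x W, so the surrogate models' input dimension is unchanged
    return F.pad(x_small, (pad_left, w - new_w - pad_left, pad_top, h - new_h - pad_top))
```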

Thank you again for your valuable suggestions! Your suggestions offer valuable guidance for future work and introduce a fresh perspective for research in the field. We deeply appreciate the opportunity to refine our work based on your thoughtful suggestions. If you have any further questions or recommendations, please don’t hesitate to share them—we would be delighted to discuss and address them.

[1] Admix: Enhancing the Transferability of Adversarial Attacks. ICCV 2021.

[2] Improving Transferability of Adversarial Examples with Input Diversity. CVPR 2019.

Review
6

This submission theoretically studies the transferability error - the chance of being successful if the attack is generated by transferring from an ensemble of models. The core result is an upper bound of transferability error involving a vulnerability term and a diversity term, which further boils down to empirical ensemble Rademacher complexity and the Hellinger distance between joint model distributions and i.i.d. model distribution. The key insight is that the transfer attack needs to involve both more and diverse models and reduce model complexity to be powerful. Results are empirically verified on multiple datasets.

Strengths

  • The theoretical results are solid and novel. Specifically, it is interesting to see the transferability error can be connected with the empirical Rademacher complexity in a similar form with generalization bound, and the Hellinger distance can be used to quantify the inter-dependency across surrogate models.

  • The theoretical results can have a broader impact, as the analysis tools, such as those for bounding dependent random variables and the empirical Rademacher complexity for ensemble, can be applied elsewhere.

  • The writing is clear and easy to follow in general.

Weaknesses

  • The studied problem and practical implications may be limited. The analysis is only applicable for ensemble-based transfer attacks and it can only directly guide the design of more powerful attacks of this kind. How to leverage the analysis to better defend the model, or how to generalize the results beyond L2-bounded attacks, are worth further exploration.

  • Some insights from the theory may not be justified enough. For example, in Line 333-335, the paper mentioned that we need to increase the diversity of parameters in surrogate models to reduce $H_\alpha(\cdot)$. It seems that surrogate models need to be independently trained to achieve a minimal $H_\alpha(\cdot)$. However, in practice, encouraging model diversity, e.g., introducing some diversity-promoting regularizers, can sometimes further improve the attack efficiency. As a result, encouraging model diversity introduces model-level dependency and increases $H_\alpha(\cdot)$ but reduces transferability error. From this point of view, the theory may not reflect the full picture of adversarial transferability.

  • The experiment part is relatively not clear. For example, in Figure 2, good to mention that $\lambda$ is the weight decay, explain what the $x$-axis is, and discuss detail training parameters in the main text.

Minor:

  1. Line 106: combine -> combines
  2. Line 153: the hypothesis space maps to a discrete label space, and then the loss function $\ell: \mathcal{Y} \times \mathcal{Y} \mapsto \mathbb{R}_0^+$ has a discrete domain $\{-1,1\} \times \{-1,1\}$, which is weird, may need some fix.
  3. Line 279: the redundant phrase "provided in Appendix"
  4. Line 1061: please define $R$ beforehand.
  5. Line 1281 - 1290: seems that there is a missing $1/N$ coefficient before all $\sum_{i=1}^N f(\theta_i; x)$.

Questions

  1. Why is Line 1035 equal to Line 1038?
  2. Why is Line 1378 equal to Line 1381?
Comment

Q5: Why is Line 1035 equal to Line 1038?
$$\sup_{\|x\|_2 \leq B} \left(\max_i{\left\|\begin{bmatrix} U_{1i}x \\ \vdots \\ U_{mi}x \end{bmatrix}\right\|_2} \right) = \sqrt{m}\sup_{\|x\|_2 \leq B} \left(\max_i{\max_j{\left\|U_{ji}x\right\|_2}} \right).$$

A5: We appreciate the reviewer's careful examination. Previously, this was written as an equality, but it should actually be an inequality. This step bounds the norm of the stacked vector by its largest component: since $\left\|[U_{1i}x; \cdots; U_{mi}x]\right\|_2^2 = \sum_{j=1}^m \|U_{ji}x\|_2^2 \leq m \max_j \|U_{ji}x\|_2^2$, we obtain
$$\sup_{\|x\|_2 \leq B} \left(\max_i{\left\|\begin{bmatrix} U_{1i}x \\ \vdots \\ U_{mi}x \end{bmatrix}\right\|_2} \right) \le \sqrt{m}\sup_{\|x\|_2 \leq B} \left(\max_i{\max_j{\left\|U_{ji}x\right\|_2}} \right).$$
Note that this typo does not affect the overall result, since the subsequent derivations all use inequalities and provide an upper bound on $\sqrt{m}\sup_{\|x\|_2 \leq B} \left(\max_i{\max_j{\left\|U_{ji}x\right\|_2}} \right)$.

Q6: Why is Line 1378 equal to Line 1381?
$$\begin{aligned}
& \mathbb{E}_{\mathcal{P}_{\Theta^N}, \mathcal{P}'_{\Theta^N}}\left\{ \sup_{z \in \mathcal{Z}} \frac{1}{N} \left[ \sum_{i=1}^N \ell(f(\theta'_i;x), y) - \sum_{i=1}^N \ell(f(\theta_i;x), y) \right]\right\} \\
= {}& \mathbb{E}_{\boldsymbol{\sigma}} \mathbb{E}_{\mathcal{P}_{\Theta^N}, \mathcal{P}'_{\Theta^N}}\left\{ \sup_{z \in \mathcal{Z}} \frac{1}{N} \left[ \sum_{i = 1}^N \sigma_i \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right]\right\}
\end{aligned}$$

A6: Here we introduce Rademacher variables, i.e., independent random variables distributed uniformly over $\{-1,1\}$. This does not change the expectation:

  • When $\sigma_i=1$, the associated summand remains unchanged;
  • When $\sigma_i=-1$, the associated summand flips sign, which is equivalent to swapping $f_i'(x)$ and $f_i(x)$ between $\mathcal{P}'_{\Theta^N}$ and $\mathcal{P}_{\Theta^N}$. Since we take the expectation over all possible $\mathcal{P}'_{\Theta^N}$ and $\mathcal{P}_{\Theta^N}$, this swap does not affect the overall expectation; it merely changes the order of the summands within the expectation.

Thank you once again for your insightful review! We hope these revisions thoroughly address your concerns and enhance the clarity of our work. We would be happy to continue the discussion if you have any further questions or feedback!

Comment

Thank you for the revision and response. Most of my concerns are addressed. However, for Q6, why does swapping $f_i'(x)$ and $f_i(x)$ not affect the overall expectation? I thought $f_i$ is drawn from $P_{\Theta^N}$ and $f_i'$ is drawn from $P'_{\Theta^N}$. Then $\sigma_i = -1$ would swap the two loss terms.

Comment

Q3: The experiment part is relatively not clear. For example, in Figure 2, good to mention that $\lambda$ is the weight decay, explain what the x-axis is, and discuss detail training parameters in the main text.

A3: Thank you for highlighting this issue. We have added further details in Section 5 (Line 418-420, 426-430, 463) to enhance readers' understanding of the experiments. In particular:

  • For models trained on MNIST and Fashion-MNIST, we train for 10 epochs; for models trained on CIFAR-10, we train for 30 epochs. We use the Adam optimizer with a learning rate of $10^{-3}$ and a batch size of 64.
  • The $x$-axis indicates the number of attack steps, and $\lambda$ denotes the weight decay.
  • We record the attack success rate (ASR), the loss value, and the variance of model predictions as the number of attack steps increases. We use MI-FGSM [1] to craft the adversarial examples and the cross-entropy loss to optimize the adversarial perturbation. The number of steps for transferable adversarial attacks is typically set to 10 [1-4], but to study the attack dynamics more comprehensively, we perform a 20-step attack. In our plots, we use the mean squared error to validate our theory, as it better reflects vulnerability from the theoretical perspective. (A minimal sketch of this attack loop, with the quantities we record, is given after the reference list below.)

[1] Boosting adversarial attacks with momentum. CVPR 2018.

[2] Ensemble Diversity Facilitates Adversarial Transferability. CVPR 2024.

[3] Boosting Adversarial Transferability by Block Shuffle and Rotation. CVPR 2024.

[4] Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability. CVPR 2022.
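As referenced above, the following minimal PyTorch sketch shows the attack loop we describe: MI-FGSM on the averaged ensemble loss, recording the ensemble loss and the variance of member predictions at each step. The surrogate model list, the perturbation budget, and the step size are illustrative assumptions rather than the exact experimental values.

```python
import torch
import torch.nn.functional as F

def mi_fgsm_ensemble(models, x, y, eps=8 / 255, alpha=2 / 255, steps=20, mu=1.0):
    """MI-FGSM against a model ensemble; returns the adversarial batch and per-step statistics."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # accumulated momentum
    history = []             # (ensemble loss, prediction variance) at each step
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = [m(x_adv) for m in models]
        loss = torch.stack([F.cross_entropy(l, y) for l in logits]).mean()  # averaged ensemble loss
        probs = torch.stack([F.softmax(l, dim=1) for l in logits])          # (N, B, C)
        diversity = probs.var(dim=0).mean()       # variance of member predictions
        grad = torch.autograd.grad(loss, x_adv)[0]
        # momentum accumulation with an L1-normalised gradient (assumes 4D image input)
        g = mu * g + grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
        x_adv = (x_adv + alpha * g.sign()).detach()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
        history.append((loss.item(), diversity.item()))
    return x_adv, history
```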

Q4: Minors:

  • Line 106: combine -> combines.
  • Line 153: the hypothesis space maps to a discrete label space, and then the loss function :Y×YR0+\ell: \mathcal{Y} \times \mathcal{Y} \mapsto \mathbb{R}_0^+ has a discrete domain 1,1×1,1\\{ -1,1 \\} \times \\{ -1,1 \\}, which is weird, may need some fix.
  • Line 279: the redundant phrase "provided in Appendix"
  • Line 1061: please define RR beforehand.
  • Line 1281 - 1290: seems that there is a missing $\frac{1}{N}$ coefficient before all $\sum_{i=1}^N f(\theta_i;x)$.

A4: We sincerely thank you again for your time, effort, and meticulous review. We have carefully addressed the issues you raised in the revision:

  • "Line 106: combine -> combines." We fix it in the revision.
  • "Line 153: the hypothesis space maps to a discrete label space, and then the loss function :Y×YR_0+\ell: \mathcal{Y} \times \mathcal{Y} \mapsto \mathbb{R}\_0^+ has a discrete domain 1,1×1,1\\{ -1,1 \\} \times \\{ -1,1 \\}, which is weird, may need some fix." We make slight modification to the definition: Given the input space XRd\mathcal{X} \subset \mathbb{R}^d and the output space YR\mathcal{Y} \subset \mathbb{R}, we have a joint distribution P_Z\mathcal{P}\_\mathcal{Z} over the input space Z=X×Y\mathcal{Z} = \mathcal{X} \times \mathcal{Y}. The training set Z_train={z_iz_i=(x_i,y_i)Z,y_i1,1,i=1,,K}Z\_{\text{train}} = \{ z\_i| z\_i=(x\_i, y\_i) \in \mathcal{Z}, y\_i \in \\{ -1,1 \\}, i=1, \cdots, K \}, which consists of KK examples drawn independently from P_Z\mathcal{P}\_\mathcal{Z}.
  • "Line 279: the redundant phrase "provided in Appendix"." We fix it in the revision.
  • "Line 1061: please define RR beforehand." We define it in the revision.
  • "Line 1281 - 1290: seems that there is a missing 1N\frac{1}{N} coefficient before all _i=1Nf(θ_i;x)\sum\_{i=1}^N f(\theta\_i;x)." We add the 1N\frac{1}{N} coefficient in the revision.
Comment

Thank you very much for your constructive comments! We address all your questions and concerns in the following responses.

Q1: The studied problem and practical implications may be limited. The analysis is only applicable for ensemble-based transfer attacks and it can only directly guide the design of more powerful attacks of this kind. How to leverage the analysis to better defend the model, or how to generalize the results beyond L2-bounded attacks, are worth further exploration.

A1: Thank you for your insightful question. While our paper primarily focuses on analyzing model ensemble attacks, our theoretical findings can also provide valuable insights for model ensemble defenses:

From a theoretical perspective:

  • The vulnerability-diversity decomposition introduced for model ensemble attacks can likewise be extended to model ensemble defenses. Mathematically, this results in a decomposition similar to conclusions in ensemble learning (see Proposition 3 in [1] and Theorem 1 in [2]), which shows that within the adversarial perturbation region, $\text{Expected loss} \leq \text{Empirical ensemble loss} - \text{Diversity}$.
  • Thus, to improve model robustness (reduce the expected loss within the perturbation region), the core strategy involves minimizing the ensemble defender’s loss or increasing diversity.
  • However, there is also an inherent trade-off between these two objectives: when the ensemble loss is sufficiently small, the model may overfit to the adversarial region, potentially reducing diversity; conversely, when diversity is maximized, the model may underfit the adversarial region, potentially increasing the ensemble loss.
  • Therefore, from this perspective, our work provides meaningful insights for adversarial defense that warrant further analysis.

From an algorithmic perspective: we can consider recently proposed diversity metrics, such as Vendi score [3] and EigenScore [4]. Following the methodology outlined in [5], diversity can be incorporated into the defense optimization objective to strike a balance between diversity and ensemble loss. By finding an appropriate trade-off between these two factors, the effectiveness of ensemble defense may be enhanced.
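As a concrete illustration of this algorithmic idea, the following minimal sketch folds an output-variance diversity term into the ensemble training objective so that the defender trades off ensemble loss against diversity. The coefficient beta, the optimizer setup, and the use of prediction variance as the diversity measure are illustrative assumptions; they are not the Vendi score or EigenScore, and not the exact method of [5].

```python
import torch
import torch.nn.functional as F

def diversity_regularized_step(models, optimizer, x, y, beta=0.1):
    """One training step that balances the ensemble loss against an output-diversity term."""
    logits = [m(x) for m in models]
    ens_loss = torch.stack([F.cross_entropy(l, y) for l in logits]).mean()
    probs = torch.stack([F.softmax(l, dim=1) for l in logits])  # (N, B, C)
    diversity = probs.var(dim=0).mean()                         # output-based diversity proxy
    loss = ens_loss - beta * diversity   # lower ensemble loss, higher diversity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return ens_loss.item(), diversity.item()
```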

Our theoretical analysis is the first to explore this relationship systematically in this field. We hope that our work not only provides valuable insights for the field of adversarial attacks but also inspires advancements in adversarial defenses.

[1] A Unified Theory of Diversity in Ensemble Learning. JMLR 2023.

[2] Diversity and Generalization in Neural Network Ensembles. AISTATS 2022.

[3] The Vendi Score: A Diversity Evaluation Metric for Machine Learning. TMLR 2023.

[4] INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection. ICLR 2024.

[5] Understanding and Improving Ensemble Adversarial Defense. NeurIPS 2023.

Q2: Some insights from the theory may not be justified enough. For example, in Line 333-335, the paper mentioned that we need to increase the diversity of parameters in surrogate models to reduce $H_\alpha(\cdot)$. It seems that surrogate models need to be independently trained to achieve a minimal $H_\alpha(\cdot)$. However, in practice, encouraging model diversity, e.g., introducing some diversity-promoting regularizers, can sometimes further improve the attack efficiency. As a result, encouraging model diversity introduces model-level dependency and increases $H_\alpha(\cdot)$ but reduces transferability error. From this point of view, the theory may not reflect the full picture of adversarial transferability.

A2: Thank you for your question. Firstly, note that even if we train each model independently, the resulting models can still exhibit correlations (i.e., they are not diverse) because they are trained on similar image datasets. For instance, ResNet [1] and DenseNet [2] can both be trained on CIFAR-10, so they tend to produce similar outputs for an image even though they are trained independently, and such similar outputs exhibit correlations. Therefore, to reduce such correlations and improve diversity, the "encouraging model diversity" strategy you mentioned makes the models more diverse and increases their independence. This, in turn, decreases $H_\alpha(\cdot)$, aligning with our theoretical result that this reduces the transferability error.

[1] Deep Residual Learning for Image Recognition. CVPR 2016.

[2] Densely Connected Convolutional Networks. CVPR 2017.

Comment

Thank you for your recognition! We greatly appreciate your suggestions and feedback!

Comment

Thank you very much for your follow-up question and for carefully reviewing our revisions and responses. We sincerely appreciate your attention to detail and the opportunity to clarify this point further. To make our proof more clear for readers, we have made the following slight modifications to two notations in the revision in Appendix B.3:

Let $\theta^N=(\theta_1, \ldots, \theta_N)$ and $\theta'^N=(\theta'_1, \ldots, \theta'_N)$ satisfy $\theta^N, \theta'^N \sim \mathcal{P}_{\Theta^N}$, where the $m$-th member is different, i.e., $\theta'_m \neq \theta_m$.

We also update the proof in Appendix B.3 accordingly.

Now your question becomes:

Why can we introduce Rademacher variables in the following equation?
$$\begin{aligned}
& \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \frac{1}{N} \left[ \sum_{i=1}^N \ell(f(\theta'_i;x), y) - \sum_{i=1}^N \ell(f(\theta_i;x), y) \right]\right\} \\
= {}& \mathbb{E}_{\boldsymbol{\sigma}} \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \frac{1}{N} \left[ \sum_{i = 1}^N \sigma_i \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right]\right\}.
\end{aligned}$$

And the answer is that

  • When $\sigma_i=1$, the associated summand remains unchanged;
  • When $\sigma_i=-1$, the associated summand flips sign, which is equivalent to swapping $f_i'(x)$ and $f_i(x)$.

More specifically, let's take $N=1$ as an example. Now the equation becomes
$$\begin{aligned}
& \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f(\theta'_i;x), y) - \ell(f(\theta_i;x), y) \right]\right\} \\
= {}& \mathbb{E}_{\boldsymbol{\sigma}} \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \sigma \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\}.
\end{aligned}$$

For the right hand side of it:

$$\begin{aligned}
& \mathbb{E}_{\boldsymbol{\sigma}} \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \sigma \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\} \\
= {}& \frac{1}{2}\mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\} + \frac{1}{2}\mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f_i(x), y) - \ell(f'_i(x), y) \right] \right\} \\
= {}& \frac{1}{2}\mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\} + \frac{1}{2}\mathbb{E}_{\theta'^N, \theta^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\} \\
= {}& \mathbb{E}_{\theta^N, \theta'^N}\left\{ \sup_{z \in \mathcal{Z}} \left[ \ell(f'_i(x), y) - \ell(f_i(x), y) \right] \right\}.
\end{aligned}$$

The second line follows from the definition of Rademacher variables. The third line holds because we can exchange the roles of $\theta^N$ and $\theta'^N$ (since they follow the same distribution), so the two terms are equal to each other. The argument above extends to any positive integer $N$, which answers your question.
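For readers who prefer a numerical illustration, the small NumPy script below checks the symmetrization identity above on a toy example with a finite input space. The Gaussian distribution of $\theta$ and the function standing in for the loss term are arbitrary illustrative choices; the two Monte Carlo estimates should agree up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = np.linspace(-1.0, 1.0, 50)                      # finite stand-in for the input space
g = lambda theta, z: np.sin(theta * z) + 0.1 * z    # stand-in for the loss term

n_trials = 100_000
theta = rng.normal(size=n_trials)                   # theta and theta' are i.i.d.
theta_p = rng.normal(size=n_trials)
diff = g(theta_p[:, None], Z[None, :]) - g(theta[:, None], Z[None, :])  # (T, |Z|)

lhs = diff.max(axis=1).mean()                       # E[ sup_z ( g(theta', z) - g(theta, z) ) ]
sigma = rng.choice([-1.0, 1.0], size=n_trials)      # Rademacher variables
rhs = (sigma[:, None] * diff).max(axis=1).mean()    # E_sigma E[ sup_z sigma * ( ... ) ]
print(lhs, rhs)                                     # the two estimates should be close
```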

Thank you once again for thoroughly reviewing the proof details in our paper. We have carefully revisited our proof and made some adjustments to enhance its clarity. It is truly an honor to have a reviewer as meticulous and insightful as you, and we deeply appreciate the opportunity to improve our work based on your thoughtful suggestions.

If you have any additional suggestions or questions, please don't hesitate to point them out; we would be delighted to address them.

Comment

Thanks for the response. My concerns are resolved. Hence, I increase my confidence score in the review.

Comment

We sincerely thank all the reviewers for their detailed and constructive feedback, which we deeply value and appreciate. We are especially encouraged by the reviewers’ recognition of our work and their thoughtful comments:

  • Reviewer 6NcA: "The theoretical results are solid and novel. The theoretical results can have a broader impact, as the analysis tools, such as those for bounding dependent random variables and the empirical Rademacher complexity for ensemble, can be applied elsewhere."
  • Reviewer ignu: "By defining the transferability error, authors make a good analogy to generalization error and derive some corresponding results to provide a better understanding of model ensemble attacks."
  • Reviewer secg: "The paper demonstrates strong originality by addressing the theoretical gaps in model ensemble-based adversarial attacks, introducing the novel concepts of transferability error, vulnerability-diversity decomposition, providing well-founded upper bounds for transferability error."
  • Reviewer w7Xy: "The writing is clear with intuitive explanations. The derivations are self-consistent. The concluded practical guidelines are correct."

We have worked diligently to address all the reviewers' concerns through detailed clarifications, providing individual responses to each reviewer. Additionally, we have incorporated the following modifications into the revised manuscript:

  • Lines 278-280: Discussed alternative definitions of diversity (Reviewer ignu).
  • Lines 281-283: Compared Theorem 1 with Lemma 5 in Yang et al. (2021), highlighting their complementary perspectives in analyzing transferable adversarial attacks (Reviewer ignu).
  • Line 320: Changed some constant terms in Theorem 2 to make it more solid (Authors).
  • Lines 324-326: Explained how Theorem 2 can be naturally extended to scenarios where the surrogate and target model distributions differ (Reviewer w7Xy).
  • Lines 327-330: Introduced another version of our theoretical framework using information-theoretic bound (Reviewer w7Xy).
  • Line 418-420, 426-430, 463: Provided more details of experiments, including how the surrogate models are trained, the meaning of "# steps", the adversarial example generation algorithm (Reviewer 6NcA, ignu, secg).
  • Line 481-490: Provided a discussion on the potential trade-off between diversity and complexity (Reviewer ignu).
  • Appendix B.3 (especially Line 1385-1386): Changed two notations to make the proof more clear (Reviewer 6NcA).
  • Appendix D.2.1: Provided a detailed discussion on previous definitions of diversity (Reviewer ignu).
  • Appendix D.3: Examined Lemma 5 in Yang et al. (2021) in depth (Reviewer ignu).
  • Appendix D.4: Proposed two approaches to generalize our theoretical framework: redefining the model space (Appendix D.4.1) and drawing insights from domain adaptation theory (Appendix D.4.2) (Reviewer w7Xy).
  • Appendix D.5: Provided an information-theoretic analysis as a natural extension to the theoretical framework established in this paper (Reviewer w7Xy).
  • Appendix D.9: Offered insights into model ensemble defense strategies (Reviewer 6NcA).
  • Appendix E: Provided the experimental results on CIFAR-100 dataset (Reviewer secg).

The above modifications are all highlighted in blue for the reviewers' convenience for now. We hope these revisions address the reviewers' concerns comprehensively and improve the overall clarity of our work.

Thanks again to all reviewers. We are glad to continue the discussion if there are any further questions.

AC Meta-Review

This paper introduces a theoretical framework for understanding transferable adversarial attacks, proposing concepts such as transferability error and a vulnerability-diversity decomposition, with practical guidelines supported by experiments on standard datasets. While the theoretical insights are novel and well-presented, the paper suffers from limited empirical scope, with experiments confined to standard datasets like MNIST and CIFAR-10, and a lack of testing on more complex, real-world scenarios. The definition of diversity diverges from established gradient-based metrics, raising questions about its alignment with prior work. Additionally, the paper offers limited practical utility for defense mechanisms or broader adversarial settings. Despite revisions, unresolved concerns around experimental clarity, trade-offs between complexity and diversity, and the applicability of findings suggest that the work, though promising, is not yet ready for publication.

Additional Comments from the Reviewer Discussion

During the rebuttal period, reviewers raised concerns about the theoretical assumptions and bounds, the divergence of the diversity definition from prior work, limited empirical scope, clarity of experimental settings, and the practical utility of the paper. The authors addressed these points by extending theoretical results to broader parameter distributions, elaborating on their diversity definition as complementary to existing metrics, adding experiments on CIFAR-100, and clarifying experimental details. While these responses partially resolved concerns, key issues remained, including the limited generalizability of the theory, insufficient empirical validation, and misalignment with prior diversity definitions. Despite the paper's promising theoretical contributions, these unresolved issues, combined with its limited practical impact, led to a recommendation to reject.

Final Decision

Reject