Transferability Bound Theory: Exploring Relationship between Adversarial Transferability and Flatness
Abstract
Reviews and Discussion
The paper investigates the relationship between the transferability of adversarial examples and their flatness. It shows that flatness alone is not sufficient to guarantee transferability. Based on this theoretical result, it derives an optimization method for generating adversarial examples with improved transferability. The proposed method is evaluated empirically and compared to a wide range of baselines.
Strengths
- sound theoretical motivation and result
- the assumptions are reasonable and clearly stated
- comprehensive empirical evaluation
- evaluation on real-world applications
- interesting insights on the link between flatness and transferability of adversarial examples
Weaknesses
- the paper does not discuss how its insights can be used to improve defense mechanisms
- The presentation of the proof of Thm. 3.1 could be improved.
Questions
- Eq. 17: why is ?
- The usage of D in the statement of Thm. 3.1 makes a mapping of terms in the proof to the result unnecessarily cumbersome.
- How does the proposed method compare to PGN [1] which improves transferability through flatness?
- The computational problem of measuring flatness, i.e., the second order gradient components, can be alleviated by considering relative flatness [2], which has also been applied to adversarial examples [3]. Could this be used as an alternative or means to improve the proposed method?
[1] Ge, Zhijin, et al. "Boosting adversarial transferability by achieving flat local maxima." Advances in Neural Information Processing Systems 36 (2023): 70141-70161.
[2] Petzka, Henning, et al. "Relative flatness and generalization." Advances in Neural Information Processing Systems 34 (2021): 18420-18432.
[3] Walter, Nils Philipp, et al. "The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective." arXiv preprint arXiv:2405.16918 (2024).
Limitations
The paper clearly states assumptions and limitations throughout the manuscript. It would, however, be beneficial to justify the theoretical assumptions and discuss the resulting limitations (e.g., the assumption on smoothness of the target distribution, or that probabilities go to zero for ).
Q1: The presentation of the proof of Thm. 3.1 could be improved & The usage of D in the statement of Thm. 3.1 makes a mapping of terms in the proof to the result unnecessarily cumbersome.
Response: In response to your suggestion, we have revised the proof and replaced the notation D to enhance clarity and readability.
Q2: why is ?
Response: Natural samples are drawn from the real-world distribution that the model was designed to handle. In contrast, adversarial examples are not a direct product of this distribution; they are artificially crafted through specific techniques that exploit the model's vulnerabilities. Therefore, their occurrence in real-world scenarios, where data is not typically manipulated in such an adversarial manner, is far less common, i.e., .
Q3: How does the proposed method compare to PGN [1] which improves transferability through flatness?
Response: We employ ResNet50 as the proxy model to craft adversarial examples for 1000 natural samples, using our attack and PGN. The results, summarized in the table below, demonstrate the superior performance of our attack over PGN. For instance, our attack achieves an impressive ASR of 99.7% on EfficientNet, significantly surpassing the 81.6% achieved by PGN.
| Target Model | PGN [1] | Ours |
|---|---|---|
| EfficientNet | 81.6 | 99.7 |
| VGG19 | 86.3 | 98.5 |
| ConvNet | 65.8 | 94.6 |
| ViT | 51.1 | 93.8 |
Q4: The computational problem of measuring flatness, i.e., the second order gradient components, can be alleviated by considering relative flatness [2], which has also been applied to adversarial examples [3]. Could this be used as an alternative or means to improve the proposed method?
Response: We have carefully read [2,3] and found them to be quite enlightening. Indeed, relative flatness shares a significant similarity to the second-order gradient component in our bound. Nonetheless, calculating relative flatness presents a considerable challenge, as shown in [3], which only addresses the relative flatness concerning the penultimate layer. Similarly, directly penalizing the relative flatness of adversarial examples poses substantial computational difficulty. We value the implications of these inspiring works [2,3] and will include them in the revised manuscript. We also consider exploring relative flatness as a promising avenue for future research.
Dear authors,
Thank you for your response. I appreciate the additional results on PGN.
Regarding the assumption, I suggest making it an explicit assumption. While this assumption is intuitively reasonable, we can construct target distributions for which it is likely broken: e.g., a nearly uniform distribution with low-amplitude, high-frequency waves in the pdf would make it likely that an example is sampled close to a valley of the pdf, and nearby examples would then likely have a higher probability.
Thank you for also answering my question regarding relative flatness.
This paper focuses on the transferability of adversarial examples. The authors first derive an upper bound for the transferability loss used in the paper. Then, they propose a new loss function based on the derived bound to increase the adversarial transferability. The proposed TPA method is tested in both classic models and real applications.
Strengths
- This paper is well-organized.
- The proposed method is tested in real-world applications.
Weaknesses
- The theoretical claims lack a strong and important assumption. Theorem 3.1 is based on the strong assumption that the source model and the target model have the same loss on all inputs (), which is used in Line 482 in Appendix A. However, this strong and important assumption is not explicitly stated in Theorem 3.1. Moreover, this assumption is not verified and may not be practical across different models.
- Theorem 3.1 cannot well reflect the bound of the adversarial transferability of inputs. First, there seems to be a typo w.r.t. the definition of in Line 125. It is supposed to be . Second, according to the proof in Appendix A, the term on the left side of the inequality should be . Please check the claim and the proof. Third, Theorem 3.1 only provides a bound for the "transfer-related loss term", but the adversarial transferability of adversarial examples also depends on the local effectiveness term. Thus, it is unclear whether the loss function based on this bound conflicts with the local adversarial loss.
- The attack success rates in experiments are not reported with the error bar or the standard deviation.
- It is unknown in Section 6 how the used 100 adversarial examples are crafted, e.g., which proxy model is used, which dataset is used, and how 100 examples are selected.
- The evaluation in Section 6 is conducted by only one volunteer, which is unreliable.
Questions
See the weakness part.
Limitations
The authors did not discuss the limitations.
Question 1&2: The theoretical claims lack a strong and important assumption. & Theorem 3.1 cannot well reflect the bound of the adversarial transferability of inputs.
Response: For the first issue, we do not assume that the source and target models have the same loss on all inputs (see the derivation below).
For the second issue, it is a typo and it should be . As you mentioned, the left side of our bound (Theorem 1) should be in squared terms.
Finally, based on revised Theorem 3.1, we can derive the bound of adversarial transferability. Let us briefly restate the revised proof to address your questions.
Based on , we have .
Taking the -norm on both sides and then taking the expectation, we obtain .
The main result in Appendix A is Equation 21, which does not use . Therefore, we have
Combining the above equations, we get our revised bound (revised Theorem 1):
Now, let us consider the bound for adversarial transferability. Specifically, based on and taking the -norm on both sides and applying basic norm properties, we get:
Since and the generated adversarial example incurs a higher loss on the proxy model than on the target model, there is . The bound of is already provided. Thus we can obtain the bound of (see our response to Reviewer x5ig's Q1). The loss of adversarial examples on the proxy model is our "local effectiveness term". This inspires the design of our attack, as shown in Equation 4 in the original manuscript, where the first term maximizes the local effectiveness term and the second term minimizes the bound on the transfer-related loss term; a schematic of this two-term structure is sketched below.
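For concreteness, the two-term structure just described can be sketched as follows (the symbols $f_s$, $\ell$, $\lambda$, and $B$ are illustrative notation introduced here, not the paper's exact Equation 4):

```latex
% Schematic objective: maximize the proxy-model loss at the adversarial
% example (local effectiveness) while minimizing a surrogate B of the
% transfer-related bound, traded off by a weight \lambda.
\max_{\|\delta\|_\infty \le \epsilon}\;
  \underbrace{\ell\big(f_s(x+\delta),\, y\big)}_{\text{local effectiveness}}
  \;-\;
  \lambda\,\underbrace{B(x+\delta)}_{\text{bound on the transfer-related loss}}
```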
Question 3: The attack success rates in experiments are not reported with the error bar or the standard deviation.
Response: We have conducted experiments to calculate the error bars and standard deviations and included them in the revised manuscript. Below are some results, where we use ResNet50 as the proxy model and run each attack in five trials to report (ASR ± standard deviation). Our method not only achieves higher ASRs but also enjoys smaller deviations.
| Attack | DenseNet121 | EfficientNet | InceptionV3 | ConvNet | ViT |
|---|---|---|---|---|---|
| RAP | 95.05±0.41 | 95.15±0.38 | 93.76±0.56 | 90.62±0.37 | 62.54±0.37 |
| BSR | 96.98±0.36 | 95.07±0.41 | 93.39±0.33 | 88.27±0.21 | 82.14±0.36 |
| Ours | 99.68±0.08 | 99.55±0.04 | 98.74±0.11 | 94.37±0.07 | 93.54±0.08 |
Question 4: It is unknown in Section 6 how the used 100 adversarial examples are crafted, e.g., which proxy model is used, which dataset is used, and how 100 examples are selected.
Response: We randomly select 100 samples from the benchmark evaluation dataset ImageNet [1]. Using these samples, we generate 100 adversarial examples with our attack method, employing the default hyperparameters and ResNet50 as the proxy model. We have added these details in the revised manuscript.
Question 5: The evaluation in Section 6 is conducted by only one volunteer, which is unreliable.
Response: We have recruited two additional volunteers to conduct the evaluation. The table below reports the average scores and variance from all three volunteers (the original evaluator plus two additional volunteers). The results show a high degree of consistency among the three volunteers, reinforcing the superior performance of our method.
| Score | Classification | Object Detection | Google Search | Bing Search | Yandex Search | Baidu Search | GPT-4 | Claude3 |
|---|---|---|---|---|---|---|---|---|
| 5 | 2±1 | 2.33±0.58 | 0±0 | 0±0 | 0±0 | 0±0 | 1.67±0.58 | 0.33±0.58 |
| 4 | 7±2 | 21.67±0.58 | 10.67±1.15 | 6.33±0.58 | 6.33±1.53 | 4.33±0.58 | 13.33±1.53 | 11.33±0.58 |
| 3 | 13.33±0.58 | 7.67±0.58 | 17±1 | 11±1 | 12.33±1.15 | 5±1 | 28.33±1.15 | 26±1 |
| 2 | 9±1 | 5.33±2.31 | 17±1 | 20.67±0.58 | 16.67±1.53 | 10.33±0.58 | 29.67±0.58 | 31±2.65 |
| 1 | 68.67±1.15 | 63±2.65 | 55.33±0.58 | 62±1 | 64.67±1.53 | 80.33±1.53 | 27±1 | 31.33±1.15 |
[1] On success and simplicity: A second look at transferable targeted attacks
Dear Reviewer,
We thank you for the precious review time and valuable comments. We have provided responses to your questions and the weaknesses you mentioned. We hope these can address your concerns.
We hope to further discuss with you whether or not your concerns have been addressed appropriately. Please let us know if you have additional questions or comments. We look forward to hearing from you soon.
Best regards,
Authors
Dear Reviewer QVii,
Sorry to bother you again. With the discussion phase nearing the end, we would like to know whether the responses have addressed your concerns.
Should this be the case, we would be grateful if you could raise the final rating to reflect this.
If there are any remaining concerns, please let us know. We are more than willing to engage in further discussion and address any remaining concerns to the best of our abilities.
We are looking forward to your reply. Thank you for your efforts in this paper.
Best regards,
Authors
Thank you for the response. Most of my concerns about experiments are addressed. However, after reading the responses to all reviewers, I still have concerns about the theoretical claims in the paper. For example, some assumptions and intuitions like , , are used in the proof but not clearly listed. These assumptions are all important assumptions for the proof, but most of them are simply claimed to be "natural" or "should be" without a formal discussion and verification in the paper. On the other hand, I'm also confused about Eq.(12) mentioned by Reviewer x5ig.
Thank you for your feedback. We have explicitly stated the assumptions ( and ) in the revised manuscript to avoid any potential confusion.
To clarify, represents the probability that the target model correctly classifies , i.e., . Regarding and the confusion surrounding Equation 12, we have provided a detailed explanation in our response to Reviewer x5ig. Please refer to that for further details.
Concerning , there is substantial supporting evidence from the literature [1,2,3] as well as from practical considerations. Adversarial examples are artificially crafted through specific algorithms and optimization processes, tailored to particular models and tasks. In practice, the distributions we encounter are typically produced by the natural processes that generate the data, and these do not favor adversarial examples over natural samples. As such, adversarial examples are not naturally occurring, and their probability of appearing in the real world is significantly lower than that of natural samples.
Formally, let us consider the following derivation:
Since the ground-truth label for a specific natural sample is fixed, we can simplify the above equation to:
Intuitively, should satisfy , which means that does not increase the probability of the ground-truth label after being applied to . This is because is generated by a proxy model, which is trained on the data distribution . Thus, is counterintuitive and unlikely, unless the proxy model has learned a distribution that is negatively associated with . In practice, DNNs, especially those used in real-world applications, perform well on . For poorly performing DNNs, they may already have low accuracy on , and thus generating adversarial examples for them is trivial. Therefore, the assumption that is reasonable.
We hope these clarifications help to strengthen the rationale behind our assumptions and provide a clearer understanding of the context. We also highlight that this manuscript serves as the first theoretical study on adversarial example transferability, and we believe it can offer the community deeper insights into understanding adversarial example transferability.
We look forward to your reply and hope that this addresses your concerns.
[1] On the (Statistical) Detection of Adversarial Examples
[2] Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey
[3] Interpreting Adversarial Examples in Deep Learning: A Review
Dear Reviewer QVii,
Thank you for your ongoing efforts in helping us improve the quality of this manuscript. We greatly appreciate the time and attention you have dedicated.
We have responded to your latest comments. Specifically, you mentioned concerns regarding the assumptions and Equation (12), which were initially pointed out by Reviewer x5ig and Reviewer dRSh. We are pleased to report that Reviewer x5ig and Reviewer dRSh have expressed satisfaction with our response, indicating that these concerns have been adequately addressed for them.
As the discussion period draws to a close, we would like to reach out to see if you have any remaining questions or unresolved issues. If everything is now clear, we would be grateful if you could consider updating your evaluation to reflect this.
Once again, thank you for your constructive feedback and for your invaluable contribution to the development of this manuscript. We look forward to hearing from you soon.
Dear Reviewer QVii,
Sorry to bother you again. We appreciate the time and attention you have dedicated to this manuscript. With only one day left in the discussion period, we are eager to hear your feedback on whether our recent response has addressed your concerns.
Notably, in your initial feedback, you indicated that your original concerns had been addressed. The remaining concerns stem from other reviewers' opinions. It is encouraging that Reviewer x5ig and Reviewer dRSh have indicated that these concerns have also been satisfactorily resolved.
If you have any remaining concerns, please do not hesitate to let us know; we are more than happy to clarify and respond. Engaging in this discussion with you has been a rewarding experience, and your feedback has significantly improved the quality of this manuscript.
We look forward to your feedback.
Best regards,
Authors
Thank you for the response. In the response about eq.(12), the fourth inequality is based on the claim that "the second derivative of is negative", but I think also contains and , which are not necessarily negative?
Many thanks for your feedback. Here, denotes the derivative of the -th element of , which is a scalar. Alternatively, you can think of it as the second derivative of (a scalar) with respect to the -th element of . Note that the second derivative of is , hence is always negative.
We will revise the original text to ensure clarity. If you have any further questions, please let us know, and we will be happy to clarify.
Yes, I know it means the second derivative of with respect to , so I'm considering . It should be , which is not always negative. When is a linear transformation, it is indeed negative, but is usually highly nonlinear in neural networks. Thus, I think the provided explanation is not convincing. Can you prove this term is always negative for nonlinear ?
Thank you very much for your feedback.
Firstly, the proxy models commonly employed are ResNet or DenseNet, which use ReLU activation functions. ReLU inherently introduces piecewise-linear characteristics into the model. In other words, with ReLU can be seen as a piecewise linear function with respect to . To illustrate this, consider a specific example: the first derivative of with respect to is , where is an indicator function that equals 1 when the condition is met and 0 otherwise. It can be seen that does not contain terms involving (the indicator function does not contribute to the gradient computation). Consequently, the second derivative of with respect to is zero, and the second derivative of is therefore negative. A worked single-layer example is given below.
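To make this concrete, here is a worked one-hidden-layer example (the network $f$ and the weights $W_1$, $W_2$, $b_1$ are illustrative notation introduced here, not the paper's):

```latex
% One-hidden-layer ReLU network (illustrative notation):
f(x) = W_2\,\mathrm{ReLU}(W_1 x + b_1),
\qquad
\frac{\partial f}{\partial x}
  = W_2\,\mathrm{diag}\!\big(\mathbb{1}[W_1 x + b_1 > 0]\big)\,W_1 .
% The indicator mask is piecewise constant in x, so away from the
% (measure-zero) activation boundaries the Jacobian is locally constant
% and the second derivative of f with respect to x vanishes.
```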
In practice, attackers can choose any proxy model they prefer, so they can pick a DNN with ReLU or even replace other activation functions in their models with ReLU (if necessary, with some fine-tuning). This ensures that our theory is practical.
We hope that this can address your question, and we will clarify these details in the revised manuscript. If you have any further questions or require additional clarifications, please let us know.
Dear Reviewer QVii,
We hope our recent response can address your latest question. With only one hour remaining until the discussion period ends, we regret that we may not be able to address any new questions you may have. Should you have further inquiries or require clarifications, please leave a message. Rest assured, we will carefully read and address them in the revised version of the manuscript.
Your comments have significantly improved the quality of this manuscript. We appreciate your time and effort dedicated to this paper. Thank you for your active participation in the discussion period.
Best regards,
Authors
Thank you for the further clarification. I would like to raise my score accordingly. On the other hand, I strongly suggest the authors explicitly state and explain all these assumptions in the revised version.
Dear Reviewer QVii,
Thank you so much for your feedback. We have responded to your latest question. If there are any remaining questions on your end, please let us know. If not, would you kindly consider increasing your rating for this manuscript?
We appreciate your time and attention to this manuscript.
Best regards,
Authors
The paper proposes a theoretical investigation into the relationship between the flatness of adversarial examples and their transferability. The authors challenge the prevailing belief that flatter adversarial examples necessarily have better transferability. They introduce a new method called Theoretically Provable Attack (TPA), which optimizes a surrogate of the derived transferability bound, enabling the generation of more transferable adversarial examples. The paper includes extensive experiments demonstrating the effectiveness of TPA on various benchmarks and real-world applications.
Strengths
This paper addresses a highly worthy research topic. The theoretical understanding of adversarial transferability is still under-explored. The experimental results on benchmarks are also very impressive. The authors claim that merely constraining the gradient norm at the adversarial examples is not sufficient to enhance model transferability; it is also necessary to consider second-order gradient information.
Weaknesses
The theoretical analysis in the paper establishes an upper bound that appears to be quite loose (due to Taylor approximations and inequality relaxations). As a result, it lacks sufficient insights into how this upper bound inspired the design of the TPA method presented in Equation (4) of the paper.
Questions
- In transfer attacks, the target model ( here) is black-box and unknown, while the statement in line 160 of the paper that "a proxy model needs only yield predictions for x that are closely aligned with those of the target model" is not feasible in actual attack scenarios.
- Furthermore, in Equation (3), the derived upper bound is only related to the target model in the first term, which is confusing. It is noted that the derivation of the second-order gradient component comes from Equation (12). My question is why the gradient of the target model disappears from the fourth to the fifth line in Equation (12). Is this due to the application of the integration by parts formula?
- Please provide a performance comparison of the TPA method with existing methods that only constrain the gradient norm [11, 41]. Is the core difference that TPA uses uniformly distributed noise?
If the authors can satisfactorily address the above questions, I can consider raising my score.
Limitations
see the questions.
Question 1: The tightness of our bound.
Response: We would like to provide some clarifications regarding our bound. First, as pointed out by Reviewer QVii, the first and second terms in Eq.3 should be squared. The revised bound in Theorem 1 is: We denote the terms on the right-hand side of the inequality as . Moreover, in our response to Reviewer QVii, we detail the lower bound for the squared loss of the generated adversarial examples on the target model:
We here conduct an empirical evaluation to examine the effectiveness of our bound. We craft 1000 adversarial examples using ResNet50 against DenseNet121. Our estimates show that the sum of squared losses for the examples on the proxy and target models is approximately 280.90 and 76.37, with of 69.27. We calculate the value of to be 170.75, setting to 1 due to . The difference between 170.75 and 69.27 is somewhat non-trivial. However, when we translate this difference into probabilities, it becomes quite minor. Specifically, the loss for the target model on the generated adversarial examples should approximate (). This implies that the probability of the target model correctly classifying the examples is at most , which is a rather small number. In summary, our bound is indeed tight and practical, due to the exponential mechanism and the typically high loss of the generated adversarial examples in the target model. We also evaluate several other models, as detailed in the table below.
| Target Model | Bound Value | Probability Bound |
|---|---|---|
| EfficientNet | 182.03 | 4.81 |
| InceptionV3 | 189.53 | 7.05 |
| MobileNetV3 | 161.91 | 1.83 |
| ViT | 215.56 | 0.0003 |
Question 2: The target model is black-box and unknown.
Response: What we commonly refer to as a black-box scenario actually permits some queries to the target model but restricts access to its architecture and parameters. This is common across various AI applications, e.g., Google's AI services, which allow users to input data and receive predictions. Moreover, a truly inaccessible target model would negate the possibility of feeding adversarial examples into it, making black-box attacks trivial. Therefore, limited access is practical and reflects actual attack scenarios.
Question 3: Why does the gradient of the target model disappear from the fourth to the fifth line in Equation (12)?
Response: The probability of the target model correctly predicting should be higher than that of the proxy model, that is, . Consequently, we have . Additionally, considering the second derivative of , which is , we have . We have added a clear explanation of this step to improve the manuscript's readability.
Question 4: Comparison of the TPA method with [11, 41].
Response: We employ ResNet50 as the proxy model to craft adversarial examples for 1000 natural samples, using our attack and the attacks proposed in [11,41]. The results, summarized in the table below, demonstrate that our attack significantly outperforms those described in [11,41]. Notice that the attacks presented in [11] and [41] are almost identical in terms of their objective functions and optimization methods, resulting in nearly identical attack success rates.
| Target Model | [11] | [41] | Ours |
|---|---|---|---|
| EfficientNet | 81.6 | 81.2 | 99.7 |
| VGG19 | 86.3 | 86.5 | 98.5 |
| ConvNet | 65.8 | 65.5 | 94.6 |
| ViT | 51.1 | 50.7 | 93.8 |
In essence, the key difference between our attack and those in [11, 41] lies in the use of uniformly distributed noise. Despite its simplicity, this additional random noise plays a unique and fundamentally important role in enhancing performance, as our theoretical analysis illustrates. This is also the primary reason why our method achieves notably higher success rates compared to [11, 41]. A minimal sketch of this idea is given below.
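To illustrate the role of the uniform noise, below is a minimal PyTorch-style sketch of the general idea: a sign-gradient attack whose loss is averaged over a few uniformly perturbed copies of the current adversarial example. All names and hyperparameters (`n_noise`, `radius`, `eps`, `alpha`, `steps`) are assumptions for illustration, not the exact TPA implementation.

```python
import torch
import torch.nn.functional as F

def noise_averaged_attack(model, x, y, eps=16/255, alpha=2/255, steps=10,
                          n_noise=5, radius=8/255):
    """Illustrative sketch (not the exact TPA algorithm): maximize the
    cross-entropy loss averaged over uniformly perturbed copies of the
    current adversarial example, which implicitly discourages sharp,
    high-gradient solutions."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = 0.0
        for _ in range(n_noise):
            # Uniform noise sampled around the current adversarial example.
            u = torch.empty_like(x_adv).uniform_(-radius, radius)
            loss = loss + F.cross_entropy(model(x_adv + u), y)
        loss = loss / n_noise
        grad, = torch.autograd.grad(loss, x_adv)
        # Sign-gradient ascent step, projected back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```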
[11] Boosting adversarial transferability by achieving flat local maxima
[41] Gnp attack: Transferable adversarial examples via gradient norm penalty
Dear Reviewer,
We thank you for the precious review time and valuable comments. We have provided responses to your questions and the weaknesses you mentioned. We hope these can address your concerns.
We hope to further discuss with you whether or not your concerns have been addressed appropriately. Please let us know if you have additional questions or comments. We look forward to hearing from you soon.
Best regards,
Authors
Dear Reviewer x5ig,
Sorry to bother you again. With the discussion phase nearing the end, we would like to know whether the responses have addressed your concerns.
Should this be the case, we would be grateful if you could raise the final rating to reflect this.
If there are any remaining concerns, please let us know. We are more than willing to engage in further discussion and address any remaining concerns to the best of our abilities.
We are looking forward to your reply. Thank you for your efforts in this paper.
Best regards,
Authors
Thank you for the detailed rebuttal. However, I still have concerns and a few follow-up questions.
- You claimed that "when we translate this difference into probabilities, it becomes quite minor". However, an average squared loss of 10.05 on does not imply that the probability of the target model correctly classifying a specific example is at most .
- My second concern is still not addressed. First of all, is not true for all samples over . Besides, my question is why the gradient of the target model, , disappears from the fourth to the fifth line in Equation (12). The only explanation in my understanding is that you apply integration by parts by letting and , then . However, is totally wrong as is the target model rather than the gradient of the proxy model .
- The comparison results with [11, 41] are promising, demonstrating the effectiveness of optimizing the gradient norm around rather than . Does the effectiveness and contribution of TPA come from [11, 41] + SAM [1]?
[1] Sharpness-aware minimization for efficiently improving generalization.
Based on the above concerns, I still keep my original score at the current phase.
We appreciate your feedback. We would like to provide further clarification as we sense some misunderstandings here. We also look forward to your reply and hope that this addresses your concerns.
Question: Our bound.
Response: Notice that the expected square loss of the generated adversarial examples on the target model (DenseNet121) is . In statistical terms, while individual sample's squared loss may vary around this expected value, we expect that the majority of samples will exhibit comparable losses. To be more specific, we empirically evaluate the variance of the squared losses to be 54.62 (out of 1000 adversarial examples generated by our attack). According to Chebyshev's inequality, at least 96% of the samples should lie within 5 standard deviations of the expected value. Within this range, the squared losses of our generated adversarial examples should be at least .
This implies that our generated adversarial samples incur a loss greater than 8.56 on DenseNet121 with at least 95% probability. A loss of 8.56 indicates that the probability of correct classification by DenseNet121 for these samples is approximately 0.0002. In other words, out of every hundred samples, at least 95 can effectively mislead the target model. Doesn't this high probability of successfully attacking the target model sufficiently demonstrate the effectiveness and practicality of our bound?
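For reference, the two standard facts used in this calculation are Chebyshev's inequality with $k = 5$ and the conversion of a cross-entropy loss $\ell = -\log p$ into a probability (reading the loss here as cross-entropy is our assumption):

```latex
% Chebyshev: at most 1/k^2 of the mass lies more than k standard deviations away.
\Pr\big(|X - \mu| \ge 5\sigma\big) \le \tfrac{1}{5^{2}} = 4\%
\quad\Longrightarrow\quad
\text{at least } 96\% \text{ of samples lie within } 5\sigma .
% Cross-entropy loss to probability:
\ell = -\log p \;\Longrightarrow\; p = e^{-\ell},
\qquad
\ell = 8.56 \;\Rightarrow\; p = e^{-8.56} \approx 1.9\times 10^{-4} \approx 0.0002 .
```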
Question: The effectiveness of TPA.
Response: As stated in our introduction, inspiring works such as [1, 11, 41] prompted us to investigate the theoretical relationship between flatness and the transferability of adversarial examples. Our theory suggests that penalizing the first- and second-order gradients of generated adversarial examples can effectively enhance their transferability. Notably, prior works [1, 11, 41] did not include a penalty on the second-order gradient. Our proposed method is simple yet effective: it generates more transferable adversarial examples by penalizing both first- and second-order gradients via additional noise. We acknowledge the contributions made by these existing works in our introduction and clarify the distinctions between our method and theirs.
Thank you to the authors for the careful rebuttals. My concerns about unclear details and experiments are well addressed. Thus, I would like to raise my final score. Besides, the assumptions must be explicitly presented and explained in the final version.
Question: The assumption about .
Response: Regarding the second concern, there are some typos in the appendix of the original paper. Let us consider the following derivation:
where . The second equality utilizes integration by parts, the third equality uses and the fourth equality leverages and the fact that the second derivative of is negative. This expression indeed derives
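For completeness, the integration-by-parts identity invoked in the second equality is the standard one:

```latex
\int_a^b u(t)\,v'(t)\,dt \;=\; \big[\,u(t)\,v(t)\,\big]_a^b \;-\; \int_a^b u'(t)\,v(t)\,dt .
```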
We rely on , a generally valid assumption. typically maximizes the loss function on the proxy model, implying that optimally suits the proxy, i.e., . Moreover, we conduct experiments using BIM and our method to generate adversarial examples for 1000 natural samples on ResNet50. DenseNet121, EfficientNet, VGG19, ConvNet, and Vision Transformer (ViT) all assign a higher prediction probability to the ground-truth class of these generated adversarial examples than the proxy model does. Specifically, for adversarial examples generated by our attack, the correct-prediction probabilities of DenseNet121, EfficientNet, VGG19, ConvNet, and ViT are 61 times, 197 times, 406 times, 1058 times, and 3674 times higher, respectively, than those of the proxy model ResNet50 (note that since the proxy model's correct-prediction probability for adversarial examples is typically very tiny, such as around , the target models are still easily misled by these adversarial examples). Thus, assuming in transfer attack scenarios is justified.
Furthermore, even if we consider the existence of peculiar samples satisfying , there is no need to investigate these further because already suggests that can trick the target model (since adversarial examples generated on the proxy model often mislead the proxy itself). In other words, transfer-based attacks are meaningful only if . For theoretical rigor, we can adjust our analysis as follows:
Here, we split the entire integration domain into two non-overlapping parts: where and where . Due to , we can disregard this term. This adjustment does not compromise the integrity of our theory; it merely restricts the integration interval to . We hope this can address your concerns.
We would like to express our sincere appreciation for the efforts and feedback from all reviewers. We have taken into account reviewers' comments and suggestions, which have greatly enriched the quality of this manuscript.
As noted by some reviewers, there are minor errors and ambiguities in our proof. We have fixed these issues to enhance the manuscript's readability. Importantly, these revisions do not affect the core insights and contributions of this manuscript.
This analysis is grounded in practical assumptions and unveils a nuanced relationship between the transferability of adversarial examples and their flatness. We believe that this insight will provide the community with a deeper understanding of transferability, paving the way for future research.
Moreover, we have incorporated reviewers' constructive suggestions regarding experiments and other relevant interesting literature.
Overall, the reviewers' expertise and constructive feedback have significantly enhanced the clarity and depth of this manuscript. We believe that the revised manuscript now presents a more convincing and compelling study, and we once again extend our heartfelt gratitude to the reviewers for their efforts and feedback.
After the rebuttal and discussions, the reviewers all agree that the paper makes a good contribution to the theoretical and practical understanding of adversarial transferability. The AC recommends acceptance of this paper, conditioned on the following comments being addressed by the authors. Note that some of the comments are technical and need to be taken seriously for the correctness and soundness of the paper.
- The assumptions of , and , and the justification behind them, should be clearly stated.
- The proof of Theorem 3.1 needs a revision.
- Some typos should be fixed.
Please take these comments into consideration in the revised version of the paper.