ICLR 2024
Decision: Rejected (4 reviewers)
Ratings: 5, 8, 6, 6 (average 6.3/10; min 5, max 8, std 1.1)
Average confidence: 4.3

Revisiting DeepFool: generalization and improvement

Submitted: 2023-09-23 · Updated: 2024-02-11
TL;DR

In this work, we have introduced a family of parameter-free, fast, and parallelizable algorithms for crafting optimal adversarial perturbations.

Abstract

Deep neural networks have been known to be vulnerable to adversarial examples, which are inputs that are modified slightly to fool the network into making incorrect predictions. This has led to a significant amount of research on evaluating the robustness of these networks against such perturbations. One particularly important robustness metric is the robustness to minimal $\ell_2$ adversarial perturbations. However, existing methods for evaluating this robustness metric are either computationally expensive or not very accurate. In this paper, we introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency. Our proposed attacks are generalizations of the well-known DeepFool (DF) attack, while they remain simple to understand and implement. We demonstrate that our attacks outperform existing methods in terms of both effectiveness and computational efficiency. Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training (AT) to achieve state-of-the-art robustness to minimal $\ell_2$ adversarial perturbations.
Keywords

Deep Learning, Adversarial robustness, Adversarial Attacks, Adversarial training

Reviews and Discussion

Official Review (Rating: 5)

This paper proposes a modification and improvement to the DeepFool adversarial attack method, which aims to find minimal perturbations under the $\ell_2$-norm. The authors identify that DeepFool does not generate minimal samples. They then propose to guide DeepFool toward the optimal solution via an additional projection step. The proposed method is called SuperDeepFool (SDF) and is evaluated as an attack against previous methods, as an adversary during adversarial training, and as an adversary in the AutoAttack test bench.

Strengths

  1. The paper identifies weaknesses of DeepFool and underpins its theoretical observations (perturbations not located on the decision boundary / not orthogonal) through experimental evaluations.
  2. Based on the identified weaknesses, a new method is proposed to guide the attack to minimal examples.
  3. The method is compared to prior work and appears to generate more minimal perturbations on natural and adversarially trained models.
  4. The method is integrated and evaluated with adversarial training.
  5. The method is integrated and evaluated within the popular AutoAttack benchmark where it shows an improvement in attack rates and/or computational time compared to the original implementation.

Weaknesses

  1. SDF appears to consequently require more gradient steps than DF. As such the comparisons in Tab. 2/3/4 are hardly fair. As an additional row in the table, it would be fair to restrict the number of gradients in SDF to the number of DF steps.
  2. As the goal is to improve DF, comparisons to DF are missing in Tab. 5/19/20.
  3. The comparison in Sec. 4.3 against DDN is not fair and thus not meaningful: 1) SDF is trained with different hyperparameters - most notably with more epochs: (200 + 60) instead of (200 + 30). A fair comparison would require training with identical parameters. 2) The results are single-run/seed. Expressive results would require at least 3 runs and error bars - especially for unstable training methods like AT.
  4. "SDF adversarially trained model does not overfit to SDF attack" - this statement can - in general - not faithfully be made on a single training run and should be either backed theoretically or empirically on multiple architectures (with reported error bars).
  5. The research field has largely moved on from minimum-norm- and/or $\ell_2$-based adversarial attacks and is mostly focusing on $\ell_\infty$-bounded attacks (if at all). As such, this submission may not be that relevant to the ICLR community.

Minor:

  • Some sections are very hard to read as indirect citations are not in brackets.
  • In Sec. 2 $f_k$ is defined twice.
  • Wrong quotation marks in Tab. 8. In LaTeX: "xxx" instead of ``xxx''
  • Multiple tables in the appendix overflow
  • This is a personal taste, but I find it odd to cite (Long, 2015) as a breakthrough in computer vision. I'd have expected earlier works by Alex Krizhevsky or the likes

Questions

  1. How are perturbations "renormalized" for AutoAttack++?
  2. Why does AA++ perform worse for R4?

Details of Ethics Concerns

As this paper deals with adversarial attacks and defenses it should add a "potential negative impact for society" section.

Comment

SDF appears to consequently require more gradient steps than DF.

Our claim is that, in comparison to other algorithms, SDF offers the best tradeoff between computational cost and accuracy. As illustrated in Figure 1, compared to DF, SDF incurs a slightly higher computational cost but significantly gains in accuracy.

By design, both SDF and DF do not allow control over the number of gradient computations; they typically stop once a successful adversarial example is found, and terminating the process prematurely could prevent them from finding an adversarial example. One naive option would be to limit SDF to a single iteration, which would equate its computational cost to DF's, but this would essentially make SDF identical to DF and hence not a useful comparison.

As the goal is to improve DF, comparisons to DF are missing in Tab. 5/19/20.

DF, developed nearly eight years ago, has been demonstrated in various instances [1], [2], and [4] to be fast but no longer the SotA in identifying minimal adversarial perturbations. Therefore, we primarily conducted our experiments using SotA $\ell_2$ attacks. Table 1 is the exception, where we included a comparison with DF, as it was pertinent to compare SDF to DF – given that SDF is a modified version of DF.

But for the sake of completeness, we will include DF in the tables as suggested by the reviewer. Here are the corresponding results for DF:

  • In Table 5: the median $\ell_2$ norm: 6.1, fooling rate: 96.2%.
  • In Table 19: the median $\ell_2$ norm: 1.51, fooling rate: 100%.
  • In Table 20: Naturally trained model: the median $\ell_2$ norm: 0.17, per-sample time: 0.21s. Adversarially trained models (R1): the median $\ell_2$ norm: 2.1, per-sample time: 1.32s.

The comparison in Sec. 4.3 against DDN is not fair…

Thank you for the suggestion. We reran our experiments using three different random seeds for training, and we used the number of epochs from the DDN paper. Despite these adjustments, our findings remain consistent: SDF AT continues to outperform DDN AT.

| Attack | SDF Mean (std) | SDF Median (std) | DDN Mean (std) | DDN Median (std) |
|---|---|---|---|---|
| DDN | 1.01 (0.05) | 1.00 (0.01) | 0.80 (0.08) | 0.68 (0.05) |
| FAB | 1.07 (0.06) | 1.01 (0.03) | 0.90 (0.1) | 0.71 (0.2) |
| FMN | 1.40 (0.2) | 1.38 (0.4) | 1.41 (0.2) | 1.39 (0.3) |
| ALMA | 1.07 (0.03) | 1.01 (0.1) | 0.80 (0.07) | 0.68 (0.06) |
| SDF | **1.00** (0.01) | **0.98** (0.03) | 0.80 (0.02) | 0.69 (0.08) |

"SDF adversarially trained model does not overfit to SDF attack”…

We hope that the experiment done above has addressed this issue, too.

The research field has largely moved on from minimum-norm…

We respectfully disagree with the reviewer. The reasons for using $\ell_2$-norm perturbations are manifold:

  • We acknowledge that the $\ell_2$ threat model may not seem particularly realistic in practical scenarios (at least for images); however, it can be perceived as a basic threat model amenable to both theoretical and empirical analyses, potentially leading to insights for tackling adversarial robustness in more complex settings. The fact that, despite considerable advancements in AI/ML, we have yet to solve adversarial vulnerability motivates part of our community to return to the basics and work towards finding fundamental solutions to this issue.

  • In particular, thanks to their intuitive geometric interpretation, $\ell_2$ perturbations provide valuable insights into the geometry of classifiers. They can serve as an effective tool in the "interpretation/explanation" toolbox to shed light on what/how these models learn.

  • Moreover, it has been demonstrated (see, for example, [4] and [9]) that $\ell_2$ robustness has several applications beyond security.

  • While not the most compelling reason, the continued interest of the community in $\ell_2$ perturbations is evidenced by publications in top venues, including FMN [7] in NeurIPS 2021, DDN [5] in CVPR 2019, ALMA [10] in ICCV 2021, and FAB [8] in ICML 2020, among many others.

How are perturbations "renormalized" for AutoAttack++?

For a given $\epsilon$ for AA++, if the norm of the perturbation generated by SDF is greater than $\epsilon$, we apply a scalar multiplication to bring its $\ell_2$-norm down to $\epsilon$. Hence, we ensure that the norms of all SDF perturbations are at most $\epsilon$.
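For concreteness, this rescaling can be sketched as follows (a minimal illustration assuming batched image tensors and per-sample $\ell_2$ norms; the function name and tensor layout are ours, not from the paper's code):

```python
import torch

def renormalize(x: torch.Tensor, x_adv: torch.Tensor, eps: float) -> torch.Tensor:
    """Shrink SDF perturbations so that each per-sample l2 norm is at most eps."""
    delta = (x_adv - x).flatten(1)                        # (B, D) perturbations
    norms = delta.norm(p=2, dim=1, keepdim=True)          # (B, 1) current l2 norms
    scale = torch.clamp(eps / (norms + 1e-12), max=1.0)   # only shrink, never enlarge
    return x + (delta * scale).reshape(x.shape)
```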

Comment

Why does AA++ perform worse for R4?

For this particular model, we identified a few samples that SDF could not successfully fool, whereas APGD$^\top$ did. Therefore, we explored an alternative version of AA++ where we included SDF alongside the set of attacks, rather than replacing APGD$^\top$ with it, ensuring the preservation of AA's original performance.

[1]: Rony, Jerome et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

[2] Carlini, Nicholas and David A. Wagner. “Towards Evaluating the Robustness of Neural Networks.” 2017 IEEE Symposium on Security and Privacy (SP), 2017.

[3] Madry, Aleksander et al. “Towards Deep Learning Models Resistant to Adversarial Attacks.” ArXiv abs/1706.06083 (2017)

[4]: Pintor, Maura et al. "Fast minimum-norm adversarial attacks through adaptive norm constraints." Advances in Neural Information Processing Systems, vol. 34, pp. 20052-20062, 2021.

[5]: Croce, Francesco and Hein, Matthias. "Minimally distorted adversarial examples with a fast adaptive boundary attack." International Conference on Machine Learning, pp. 2196-2205, 2020, PMLR.

Comment

Thank you for the rebuttal. The rebuttal has addressed some of my concerns. While I am still not in favor of acceptance - mostly due to the contribution - I revised my recommendation to "5: marginally below the acceptance threshold" and have updated the presentation and contribution scores.

Comment

We thank the reviewer for revising their score. To better address your remaining concerns, particularly regarding the contributions of our work, we would greatly appreciate more specific feedback.

We would like to bring to your attention that in Appendix F of our paper, we have dedicated a section to discussing how SDF has led us to an interesting observation regarding the effect of pooling layers on adversarial robustness. It further highlights the importance of developing minimum-norm adversarial attacks.

Comment

My main argument for not increasing the score above the acceptance level is the limited relevance to the community. I acknowledge that there may be some use cases, but largely this is a minor spin on a niche/old topic. This is not addressable in the current rebuttal, but in case the paper is not accepted, I'd suggest dropping some sections like AT and showing why the proposed method is practical in today's DL landscape (e.g., you mention applications beyond security). You could show an example.

Then some of the other weaknesses have not been addressed.

  • W1: I understand that early-stopping may not be reasonable for a robustness evaluation, but it would establish a comparison to the original DeepFool.
  • W4: I would recommend changing the wording. Experiments with more seeds give more evidence but no proof. You could provide even more evidence by testing different architectures, but overall this section is not that important for the paper (I think other reviewers have mentioned this as well).

Regarding pooling: To be fair I only quickly scanned it but it feels heavily disconnected from the rest of the paper. Besides, there have been multiple works analyzing pooling in the context of robustness, e.g., Grabinski et al. "FrequencyLowCut Pooling - Plug & Play against Catastrophic Overfitting", ECCV 2022. If you want to make a story out of this, you'd have to frame it in the context of previous work.

Comment

W1: I understand that early-stopping may not be reasonable for a robustness evaluation, but it would establish a comparison to the original DeepFool.

We are going to include a query-distortion curve comparing DF and SDF in Appendix K, which might address this concern.

Official Review (Rating: 8)

This paper proposes an improved version of the DeepFool (DF) attack. This new algorithm achieves better convergence and finds smaller perturbations compared to DF, by enforcing additional optimization strategies based on geometrical observations on the properties of the decision boundary.

Strengths

  • clear writing and presentation
  • extensive evaluation

Weaknesses

  • the other attacks are tested with default hyperparameters
  • the attack should produce valid perturbations

Questions

Major comments

The paper is well written, clear, and easy to read. I believe it is over the acceptance threshold; however, I suggest some improvements that would strengthen this work.

First, limitations are not discussed. The authors should describe the limitations, which should at least include:

  • the attack is not adaptive per-se, hence it can be blocked by gradient obfuscation or other defenses that break gradient descent. Is the attack easy to adapt in these cases (e.g., application of EoT to smooth the loss landscape, change of loss to incorporate detectors, ...)?
  • the attack is still an empirical attack, thus the authors should discuss that even with the best optimization process there are no guarantees that the points found are global minima of the optimization function.
  • the attack formulation only works for the L2 norm
  • the attack is not tested in the targeted version.

The authors should also improve the evaluation to include additional insights and analyses:

  • analysis of complementarity with AA, do the two attacks find smaller perturbations for different points? This could suggest whether the best strategy is to add it to the ensemble or to remove directly some of the sub-optimal attacks of AA (e.g., FAB that seems sub-optimal w.r.t. the proposed method). Additionally, in AA the attacks are used in a cascade manner, i.e., if the first attack finds an adversarial perturbation within the eps bound, the other attacks are not launched. Would SDF be launched before or after, e.g., APGD?
  • the other attacks are tested with default hyperparameters, and the authors report that the performances degrade when changing the datasets. However, testing the attacks with a set of hyperparameters could potentially reveal that the proposed approach is sub-optimal w.r.t. the other attacks, while still defeating them because of the parameter-free advantage.
  • additionally, it would be interesting to see the query-distortion curves, such as in other related works (e.g., FMN), as the median is only a compact measure of the performances
  • the attack should also be tested and compared in the targeted version; otherwise, the authors should still list this as a limitation (as it has not been tested, it is difficult to evaluate whether this is the best approach in the case of targeted attacks).

Gradient computation $\neq$ fast attack

  • to claim that the attack finds perturbations "quickly", the authors should perform a per-query runtime comparison such as the one done in other papers (FMN, ALMA) to compare the efficiency of the attack in a fair manner. The attack seems to win in any case (given the runtimes reported in the appendix), however a per-query comparison would still be interesting to see, given that there are additional operations within the steps (even if they are probably negligible w.r.t. the time for computing the gradients).

On the formulation, the attack seems to be clear enough; however, the attack might produce invalid perturbations in some domains (e.g., images):

we clip the pixel-values of SDF-generated adversarial images to [0, 1], consistent with the other minimum-norm attacks

  • The attack algorithm should enforce this constraint "by default", as for the majority of applications there are input space constraints and they should be enforced to produce valid perturbations. The authors should also discuss what happens when they enforce this constraint, as, for example, it is probably not possible anymore to enforce the orthogonality with the decision boundary. For this reason, the plots added in the appendix comparing the orthogonality of FMN and CW might also be biased by this additional constraint (while the comment on the line search is not clear, as FMN does not perform line searches unless initialized from the adversarial class).

Clarifications needed

There are some aspects that should be clarified to improve the quality of the paper.

For an optimal perturbation, we expect these two vectors to be parallel

  • The authors should clarify this aspect to make the paper accessible to a broader audience. Additionally, a figure with a 2D example might also help clarify this observation.

DeepFool, C&W, FMN, FAB, DDN, and ALMA are minimum-norm attacks, and FGSM, PGD, and momentum extension of PGD Uesato et al. (2018) are bounded-norm attacks.

  • The authors should state that the bounded-norm attacks solve a different problem than (1).

  • As it is a scalar multiplied by the gradient, is the first part of eq. 4 an estimation of the step size? Maybe this could be specified in the text to make the description clearer.

  • The authors should discuss why the other attacks are not parallelizable, as they state that attacks for AT should be and CW, for example, is not.

Comments on figures

  • caption of Figure 3 should be improved by clarifying all points shown (e.g., $x_2$). Moreover, $f(x)$, used to denote the decision function, does not seem to be defined in the text.

Comment

The other attacks are tested with default hyperparameters…

To ensure a fair comparison, for each dataset, we selected hyperparameters for each attack based on those recommended in their respective papers. The assumption was that the authors of these papers have either fine-tuned these parameters or claimed that their methods are not highly sensitive to them.

Given that comprehensive hyperparameter tuning for each attack across every model and dataset would be infeasible for us, we experimented with different variations of the attacks by varying their number of iterations (as detailed in Table 3). We consistently aimed to select this parameter so that it would work in favor of their performance. However, we acknowledge the possibility that there may still be hyperparameter settings for each (attack, model, dataset) combination that could equalize the performance of all attack algorithms. Similar scenarios have been observed previously in the context of training algorithms and GANs (Lucic et al., “Are GANs Created Equal? A Large-Scale Study”, NeurIPS 2018).

This further highlights the importance of designing algorithms with fewer hyperparameters, such as SDF.

additionally, it would be interesting to see the query-distortion curves...

Many thanks for this suggestion.

Note that SDF (and DF) do not allow control over the number of gradient computations. They typically stop once a successful adversarial example is found, and terminating the process prematurely could prevent them from finding an adversarial example. Hence, we opted to plot the median norm of achievable perturbations for a given maximum number of queries. Although this is not directly comparable to the query-distortion curves in FMN, it provides a more comprehensive view of the query distribution than the median alone. The figures are placed in Appendix K.

the attack should also be tested and compared in the targeted version…

Thanks for this suggestion. If time allows, we will include this experiment; otherwise, we will mention it as a future direction.

to claim that the attack finds perturbations “quickly”…

It may be helpful to differentiate between two distinct objectives: A) with a fixed “computational budget”, which algorithm finds the smallest perturbation? B) for a given level of distortion, which algorithm requires less computation?

While there is not a straightforward method to directly address points A and B for SDF (as mentioned above), our results suggest that SDF is the fastest for “very small” distortions. Also, with a “very small” number of queries, it tends to find the smallest perturbation. This is the message we aimed to convey in Figure 1.

We also measured the number of forward and backward passes separately for a robust WRN-28-10 model. The median values for the forward and backward passes of SDF are 50 and 32 per sample, respectively. The time spent on other operations was less than 1% of the total time. It achieves a median norm of 0.77. While FMN-100 achieves a median norm of 1.43 using 100 forward and 100 backward passes per sample, FMN-1000 reaches a median norm of 0.87 with 1000 forward and 1000 backward passes.

The attack algorithm should enforce this constraint "by default"...

We report all our results with clipping as the final step in the algorithm. We acknowledge the reviewer's point that this might not be enough IF our focus is on the "real-world" security implications of these attacks. However, if the purpose is to use them for data augmentation, for example, then such a requirement might not be needed anymore.

Anyway, we played with the in-loop clipping as well, yet observed no noticeable difference between the two methods in our experiments.

Clarifications

The authors should clarify this aspect to make the paper…

We will clarify this sentence in the text, and will include a figure in the Appendix XX to visualize the concept.

The authors should state that the bounded-norm attacks solve a different problem than (1).

Thanks for the suggestion. Indeed; we will modify the text to clarify this distinction.

As it is a scalar multiplied by the gradient, is the first part of eq. 4 an estimation of the step size?

Yes, indeed. We will make it clear in the text.

The authors should discuss why the other attacks are not parallelizable, as they state that attacks for AT should be and CW, for example, is not.

Other algorithms usually perform one backward pass per iteration, so these backward passes are non-parallelizable: they need to be performed sequentially. However, in the case of SDF, some of the backward passes can be executed in parallel, especially the backward passes that are carried out in each iteration of the inner loop in the multi-class case.
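To make the parallelism concrete, here is a hedged PyTorch sketch of one way such batched backward passes can be realized (the function, its arguments, and the replication trick are illustrative assumptions, not our actual implementation): each replica row depends only on its own copy of the input, so a single backward pass over the summed margins yields all per-class gradients at once.

```python
import torch

def margin_gradients(model, x, pred_label, other_labels):
    """Gradients of f_k(x) - f_pred(x) for several classes k via one backward pass."""
    k = len(other_labels)
    x_rep = x.expand(k, *x.shape).clone().requires_grad_(True)   # (k, C, H, W) replicas
    logits = model(x_rep)                                        # (k, num_classes)
    rows = torch.arange(k)
    margins = logits[rows, other_labels] - logits[rows, pred_label]
    grads, = torch.autograd.grad(margins.sum(), x_rep)           # grads[i] = grad of margin i
    return margins.detach(), grads
```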

Comment

I acknowledge the authors' responses and thank them for their time in clarifying the aspects of their work. My points of concern have been fully addressed, and I stand by "the importance that fundamentally 'classic' approaches, if done right, might still serve as a strong baseline". I trust that the authors will improve the discussion on the limitation and will consider my feedback for their revision.

I would be interested in seeing the targeted results and the results with the clipping operation, as they would demonstrate even further that the approach is competitive even outside the "comfort" zone under which it has been stress-tested. As the authors state "we played with the in-loop clipping as well, yet observed no noticeable difference between the two methods in our experiments", I encourage them to add this result to the results tables (perhaps highlighting if and when they see a noticeable difference). Even a small experiment could be enough to strengthen the paper even more.

All in all, I will raise my score for this paper, as it is clear to me that the authors invested time and effort in considering all aspects of the paper.

Comment

We appreciate the reviewer's constructive feedback and positive view of our work. We address the questions raised in the following:

Limitations

We are going to add a new section in the paper to discuss the limitations of our work (Appendix L). However, let us first elaborate on them here.

The attack is not adaptive per-se…

We acknowledge that, generally, extending geometric attacks to such cases is not immediately apparent. However, we do not view this as an inherent limitation of geometric attacks like DF and SDF. Instead, we believe the community's efforts to obtain adaptive attacks have primarily focused on loss-based attacks such as PGD, which allow for easy modification of the loss. Investigating ways to adapt geometric attacks could be of interest, particularly due to their computational benefits.

The attack is still an empirical attack…

This indeed can be said about almost any gradient-based optimization technique applied to non-convex problems. As such, we thought it was evident from the context that there is no guarantee of finding the globally optimal perturbations for state-of-the-art neural networks; this holds neither for us nor for any other attacks we are aware of. We apologize if our wording caused such confusion. If there is any specific sentence in the text implying otherwise, we would be grateful if the reviewers could point it out to us.

The attack formulation only works for the $\ell_2$ norm.

We had reasons to limit the scope of the paper to $\ell_2$ perturbations. We believed this approach would more clearly convey our meta-message, as summarized by Reviewer 34Xw: "the importance that fundamentally 'classic' approaches, if done right, might still serve as a strong baseline". Anyway, we tried a simple idea to see the potential of our approach in finding $\ell_\infty$ perturbations, and it worked: we replaced the orthogonal projection with the $\ell_\infty$ projection.

We present our findings for the $\ell_\infty$ norm on two robust models, M1 and M2, in the following table.

| Attacks | M1 | FR | Grads | M2 | FR | Grads |
|---|---|---|---|---|---|---|
| DF | 0.031 | 96.7 | 24 | 0.043 | 97.4 | 31 |
| FAB | 0.025 | 99.1 | 100 | 0.038 | 99.6 | 100 |
| FMN | 0.024 | 100 | 100 | 0.035 | 100 | 100 |
| SDF | 0.019 | 100 | 33 | 0.027 | 100 | 46 |

Improvements

Analysis of complementarity with AA, do the two attacks find smaller perturbations for different points? This could suggest whether the best strategy is to add it to the ensemble or to remove directly some of the sub-optimal attacks of AA (e.g., FAB that seems sub-optimal w.r.t. the proposed method). Additionally, in AA the attacks are used in a cascade manner, i.e., if the first attack finds an adversarial perturbation within the eps bound, the other attacks are not launched. Would SDF be launched before or after, e.g., APGD?

Our decision to replace APGD$^\top$ with SDF was primarily motivated by the former being a computational bottleneck in AA. As shown in Table 8, AA and AA++ achieve similar fooling rates, with AA++ being notably faster. Following your suggestion, we compared the sets of points that were fooled or not fooled by SDF/APGD$^\top$ across 1000 samples ($\epsilon=0.5$). The results indicate that both algorithms fool approximately the same set of points, differing only in a handful of samples for this epsilon value. Therefore, the primary benefit of using SDF is the reduction in computation time.

| | SDF Can Fool | SDF Can't Fool |
|---|---|---|
| APGD$^\top$ Can Fool | 995 | 2 |
| APGD$^\top$ Can't Fool | 3 | 5 |
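For reference, such a 2×2 breakdown can be tallied from per-sample success flags along these lines (the boolean-array format is an assumption about how the raw results are stored, not our exact bookkeeping):

```python
import numpy as np

def contingency(fooled_sdf: np.ndarray, fooled_apgd: np.ndarray):
    """Count samples fooled by both attacks, by only one of them, or by neither."""
    both      = int(np.sum( fooled_apgd &  fooled_sdf))
    only_apgd = int(np.sum( fooled_apgd & ~fooled_sdf))
    only_sdf  = int(np.sum(~fooled_apgd &  fooled_sdf))
    neither   = int(np.sum(~fooled_apgd & ~fooled_sdf))
    return both, only_apgd, only_sdf, neither
```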

We are currently conducting experiments to provide a comprehensive analysis, which includes substituting SDF with various attacks, integrating SDF into the mix, etc. Given that they are time-consuming, it may not be feasible to complete them all by Nov 22. Anyway, we intend to include the full set of results in the final version of our work.

Official Review (Rating: 6)

The paper proposes SuperDeepFool, an extension of the well-known DeepFool adversarial attack, which is more accurate and less time-consuming. Although DeepFool tries to find a minimal adversarial perturbation, it has a drawback: it doesn't search for the perturbation orthogonal to the decision boundary. The authors propose to solve the issue approximately by applying DeepFool and the orthogonal projection in an alternating manner. The superiority of SuperDeepFool was shown on MNIST, CIFAR, and ImageNet.

Strengths

  1. Revisiting the 8-year-old 'classic' DeepFool attack and being able to find its drawback and improve it is novel. It shows the importance that fundamentally 'classic' approaches, if done right, might still serve as a strong baseline.
  2. The proposed method is more accurate and less time-consuming in all benchmarks.
  3. Decent theoretical analysis and vivid geometric illustration.
  4. Perturbations are not only evaluated on naturally trained models, but also on adversarially trained models.
  5. It improves adversarial training and autoattack.

Weaknesses

  1. DeepFool was proposed and evaluated not only for $\ell_2$ adversarial robustness, but also had formulas (Hölder's inequality) and evaluations for any $\ell_p$ (including $\ell_0$, $\ell_1$, $\ell_\infty$) norm balls. In contrast, SuperDeepFool is only proposed and evaluated for $\ell_2$, although it claims to be a more general version of DeepFool.
  2. Only convolutional networks were used in the experiments, while it is known that vision transformers are robust learners.
  3. A few additional hyperparameters $(n, m)$ are added and there is no methodology for how to select them correctly.

Minor weaknesses:

  1. Lots of space could be saved by combining tables and figures in one row (for example Table 3 and Table 8)

Questions

  1. Why does white-box $\ell_2$ adversarial robustness matter? What is the practical scenario where high-frequency pixel-level $\ell_2$ perturbations, crafted with full access to the model and gradients, might cause problems? Maybe it would be nice to show that more $\ell_2$-robust models have some other useful properties.
  2. Why are the median $\ell_2$ norm and number of grads used as the metric, while DeepFool used $\rho = \frac{1}{|D|}\sum_{x\in D} \frac{\|r(x)\|_2}{\|x\|_2}$?
  3. Why does the experiment with adversarial training with SuperDeepFool not have models trained with PGD adversarial examples?

Overall,

I liked the paper and its motivation. Authors are encouraged to reply, and I might consider raising my score if the weaknesses and questions are addressed.

Comment

We are glad that the reviewer has liked several aspects of our work. We have done our best to address the questions and comments. Please let us know if there is anything more we can do to positively influence your opinion and strengthen the paper.

DeepFool was proposed and evaluated not only for $\ell_2$ adversarial robustness, but also had formulas (Hölder's inequality) and evaluations for any $\ell_p$ (including $\ell_0$, $\ell_1$, $\ell_\infty$) norm balls….

It should be noted that the focus of our paper, as stated in the abstract, is on minimal $\ell_2$-norm perturbations. If the reviewer has identified any sentences that suggest otherwise, please let us know; we would greatly appreciate your feedback. Whenever we mention DF, we refer to its $\ell_2$ variant. Moreover, to the best of our knowledge, DF was primarily introduced for the $\ell_2$-norm, and its application to general $\ell_p$-norm perturbations was not extensively tested in the original paper.

Nevertheless, we tried an $\ell_\infty$ version of SDF by replacing the orthogonal projection with the $\ell_\infty$ projection (Hölder's inequality). The table below shows our results for $\ell_\infty$ on M1 [1] and M2 [5]. Our results show that this version of SDF also outperforms other algorithms in finding smaller $\ell_\infty$ perturbations. We will add this result to the Appendix.

| Attacks | M1 | FR | Grads | M2 | FR | Grads |
|---|---|---|---|---|---|---|
| DF | 0.031 | 96.7 | 24 | 0.043 | 97.4 | 31 |
| FAB | 0.025 | 99.1 | 100 | 0.038 | 99.6 | 100 |
| FMN | 0.024 | 100 | 100 | 0.035 | 100 | 100 |
| SDF | 0.019 | 100 | 33 | 0.027 | 100 | 46 |

Only convolutional networks were used in the experiments…

Many thanks for the suggestion. Given our available computational resources, we conducted experiments on a ViT-B-16 [2] trained on CIFAR-10, achieving 98.5% accuracy. The results are summarized in the following table:

| Attack | FR | Median $\ell_2$ | Grads |
|---|---|---|---|
| DF | 98.2 | 0.29 | 19 |
| ALMA | 100 | 0.12 | 100 |
| DDN | 100 | 0.14 | 100 |
| FAB | 100 | 0.14 | 100 |
| FMN | 99.1 | 0.15 | 100 |
| C&W | 100 | 0.15 | 91,208 |
| SDF | 100 | 0.10 | 32 |

As seen, this transformer model does not exhibit significantly greater robustness compared to CNNs, with only a negligible difference of 0.01 compared to a WRN-28-10 trained on CIFAR-10 (0.09 in Table 3 of the paper). These results support the notion that there might not be a substantial disparity between the adversarial robustness of ViTs and CNNs. This aligns with the findings of [3], who argue that earlier claims of transformers being more robust than CNNs stem from unfair comparison and evaluation methods. We believe that thorough evaluations using minimum-norm attacks could be helpful in resolving this debate.

A few additional hyperparameters $(n, m)$ are added and there is no methodology for how to select them correctly.

We regard $(m, n)$ as a way to parameterize a class of algorithms, where each pair corresponds to an algorithm. As shown in Table 2 of the paper, all these variants yield comparable results. However, SDF($\infty$,1) empirically demonstrates slightly superior performance (and hence is considered the default algorithm). For a new dataset, it is important to note that SDF($\infty$,1) might not always be the optimal algorithm. This is why we kept the class of algorithms general (needless to say, even to train a model on a new dataset, the selection of architecture, optimization algorithm, etc. is not known a priori, and many methods used to determine those can equally be applied here).

Comment

Why does $\ell_2$ white-box adversarial robustness matter?

The reasons for using $\ell_2$-norm perturbations are manifold:

  • We acknowledge that the $\ell_2$ threat model may not seem particularly realistic in practical scenarios (at least for images); however, it can be perceived as a basic threat model amenable to both theoretical and empirical analyses, potentially leading to insights for tackling adversarial robustness in more complex settings. The fact that, despite considerable advancements in AI/ML, we have yet to solve adversarial vulnerability motivates part of our community to return to the basics and work towards finding fundamental solutions to this issue.

  • In particular, thanks to their intuitive geometric interpretation, $\ell_2$ perturbations provide valuable insights into the geometry of classifiers. They can serve as an effective tool in the "interpretation/explanation" toolbox to shed light on what/how these models learn.

  • Moreover, it has been demonstrated (see, for example, [4] and [9]) that $\ell_2$ robustness has several applications beyond security.

  • While not the most compelling reason, the continued interest of the community in $\ell_2$ perturbations is evidenced by publications in top venues, including FMN [7] in NeurIPS 2021, DDN [5] in CVPR 2019, ALMA [10] in ICCV 2021, and FAB [8] in ICML 2020, among many others.

A note on the spectral properties of $\ell_2$ perturbations: It is not entirely accurate to categorize $\ell_2$ perturbations as “high-frequency”. On the contrary, these perturbations tend to align more with "low-frequency" bands (see, e.g., [11]).

Why are the median and the number of grads used as the metric, while DeepFool used another one?

To ensure a fair comparison, we employed the metrics commonly used in recent literature on adversarial robustness. Specifically, we employed the two most prevalent metrics: the median/mean of perturbation norms (taken over dataset samples) to evaluate the method's accuracy, and the number of backward passes to measure the computational complexity.

Nevertheless, one could argue that DF's $\rho$ (which normalizes the perturbation norm by the input sample norm) is conceptually similar to the mean of the norms of perturbations. This is especially true given that the norms of samples tend to be quite concentrated for high-dimensional data like CIFAR-10 and ImageNet. Hence, it can be said that they are almost equivalent up to a scaling factor. We can compute this metric, though, if the reviewer insists on its importance.
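To make the relation explicit, here is a small NumPy sketch of the two statistics (the array layout is an assumption): when $\|x\|_2$ is nearly constant across samples, `df_rho` is approximately `mean_l2` divided by that constant.

```python
import numpy as np

def df_rho(perturbations: np.ndarray, inputs: np.ndarray) -> float:
    """DeepFool's rho: average of ||r(x)||_2 / ||x||_2 over the dataset."""
    r = perturbations.reshape(len(perturbations), -1)
    x = inputs.reshape(len(inputs), -1)
    return float(np.mean(np.linalg.norm(r, axis=1) / np.linalg.norm(x, axis=1)))

def mean_l2(perturbations: np.ndarray) -> float:
    """Metric used in the paper: mean of the raw perturbation norms ||r(x)||_2."""
    r = perturbations.reshape(len(perturbations), -1)
    return float(np.mean(np.linalg.norm(r, axis=1)))
```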

Why does the experiment with adversarial training with SuperDeepFool not have models trained with PGD adversarial examples?

We excluded PGD AT, as DDN already outperformed it. For completeness, here is a modified version of Table 6 that includes PGD AT results.

| Attack | SDF (Ours) Mean | SDF (Ours) Median | DDN Mean | DDN Median | PGD AT Mean | PGD AT Median |
|---|---|---|---|---|---|---|
| DDN | 1.09 | 1.02 | 0.86 | 0.73 | 0.68 | 0.61 |
| FAB | 1.12 | 1.03 | 0.92 | 0.75 | 0.84 | 0.71 |
| FMN | 1.48 | 1.43 | 1.47 | 1.43 | 1.31 | 1.27 |
| ALMA | 1.17 | 1.06 | 0.84 | 0.71 | 0.74 | 0.70 |
| SDF | 1.06 | 1.01 | 0.81 | 0.73 | 0.69 | 0.64 |

Lots of space could be saved by combining tables and figures in one row (for example Table 3 and Table 8)

Thank you for this comment. We have attempted to improve the presentation by putting Tables 3 and 8 in-line with the text.

Comment

[1]: Madry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083, 2017.

[2]: Dosovitskiy, Alexey et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929, 2020.

[3]: Bai, Yutong et al. "Are transformers more robust than cnns?" Advances in neural information processing systems, vol. 34, pp. 26831-26843, 2021.

[4]: Ortiz-Jiménez, Guillermo, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. "Optimism in the face of adversity: Understanding and improving deep learning through adversarial robustness." Proceedings of the IEEE 109, no. 5 (2021): 635-659.

[5]: Rony, Jerome et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

[6] Carlini, Nicholas and David A. Wagner. “Towards Evaluating the Robustness of Neural Networks.” 2017 IEEE Symposium on Security and Privacy (SP), 2017.

[7]: Pintor, Maura et al. "Fast minimum-norm adversarial attacks through adaptive norm constraints." Advances in Neural Information Processing Systems, vol. 34, pp. 20052-20062, 2021.

[8]: Croce, Francesco and Hein, Matthias. "Minimally distorted adversarial examples with a fast adaptive boundary attack." International Conference on Machine Learning, pp. 2196-2205, 2020, PMLR.

[9]: Engstrom, Logan, et al. "Adversarial robustness as a prior for learned representations." arXiv preprint arXiv:1906.00945 (2019).

[10]: Rony, Jérôme et al. "Augmented Lagrangian Adversarial Attacks." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

[11]: Ortiz-Jimenez et al. “Hold me tight! Influence of discriminative features on deep network boundaries”, Advances in Neural Information Processing Systems, 2020.

Comment

Thanks for your rebuttal! Many of my comments were addressed in detail.

I agree that adversarial robustness is less a security problem and more a deep learning theory problem. What potential insights can be gained by trying hard to find the minimal possible distance to the decision boundaries of classification models with a limited number of categories?

Another question: although the SDF adversarially trained model improves robustness to 30%, how does it change the natural accuracy? How does it compare to other adversarial training techniques that seek an accuracy-robustness trade-off?

Comment

We are happy that we could address many of your concerns.

Why $\ell_p$?

Several possibilities have been explored in the existing literature. For instance, training on $\ell_p$-norm adversarial examples is shown to be a form of spectral regularization [A], while adversarial perturbations—viewed as counterfactual explanations—are linked to saliency maps in image classifiers [B]. The fast yet accurate generation of such perturbations is crucial for the empirical study of these phenomena. Additionally, minimal $\ell_p$ adversarial perturbations can be viewed as "first order approximations of the decision boundary", shedding light on the local geometric properties of models in the vicinity of data samples. It similarly demands fast yet accurate methods for exploration. Furthermore, these minimal adversarial perturbations offer a data-dependent, worst-case analysis for certain test-time corruptions and enable worst-case assessments in the transformation space, not just the input space [C]. In the context of Large Language Models (LLMs), we can speculate that such perturbations could serve as probing tools within their embedding space to explore their geometric properties. Nevertheless, we must acknowledge that, like many other researchers, we were primarily motivated by academic curiosity rather than their practical applications in this particular work.

[A] Roth et al., "Adversarial Training is a Form of Data-dependent Operator Norm Regularization", NeurIPS 2020.

[B] Etmann et al., "On the Connection Between Adversarial Robustness and Saliency Map Interpretability", ICML 2019.

[C] Kanbak et al., "Geometric robustness of deep networks: analysis and improvement", CVPR 2018.

Adversarial training

The natural accuracies are as follows; SDF AT: 90.8%, DDN AT: 89.0%, PGD AT: 87.3%.

The primary goal was to determine which adversarial example generation technique improves robustness most effectively (PGD, DDN, or SDF), rather than comparing various adversarial training strategies (vanilla (Madry's) AT, TRADES, TRADES-AWP, HAT). Such approaches normally incorporate additional regularization techniques to improve Madry's method with PGD adversarial examples. Thus, our claim is not about developing a SotA robust model. Instead, our goal is to illustrate that vanilla AT, when coupled with minimum-norm attacks such as SDF, may yield better performance compared to PGD-based models. Therefore, we opted for vanilla adversarial training with SDF-generated samples and compared it to a network trained on DDN samples. Certainly, TRADES or similar AT strategies can be employed in conjunction with SDF. This is the subject of future work.

Comment

I have read the comments of other reviewers and all replies. Thanks!

It is still not clear what benefits/insights the search for the closest point on the decision boundary might bring. We hope this curiosity-driven research will find some other useful applications. Despite this, the proposed methodology has several positive aspects that imho put this particular paper above the acceptance threshold:

  1. The method has been evaluated on several datasets, including large-scale ImageNet. The community has witnessed lots of methods that work on small datasets but fail in practice, and ImageNet-level experiments bring the method closer to practice.
  2. The attack was tested, and outperformed others, not only on vanilla models but also on robust models. There might indeed be substantially different geometries of robust and non-robust models; thus, a method that works on both is a good sign. Moreover, by request, the authors also achieved positive results on ViT-B, which might also have completely different underlying geometries. It's encouraged to add ImageNet experiments with ViTs in the updated version.
  3. The attack can be easily implemented in a batch-parallelized version and is computationally efficient, which makes it a good choice for training robust models.
  4. The method is indeed simple and surprisingly effective.

Therefore, I am changing my score to 6.

Official Review (Rating: 6)

The authors demonstrate that the l2 norm of the perturbations estimated by DeepFool can be scaled down while still fooling the model. Further, the authors demonstrate that these perturbations are not orthogonal to the decision boundary. These observations show that the perturbations generated by DeepFool are not optimal. The authors propose a new projection step in the DeepFool algorithm, which can help in aligning the perturbation to become perpendicular to the decision boundary. This helps in decreasing the minimum-norm l2 distance required to fool the model and also helps in generating stronger attacks. Empirically, the authors verify this by showing improved performance on metrics like the l2 norm of the perturbation required to fool the model and the attack success rate. Finally, the authors incorporate the proposed SuperDeepFool into the AutoAttack package and demonstrate around 0.1-0.3% improved attack strength.

Strengths

  • The proposed approach is well motivated and the solution provided is simple.

  • The results demonstrate improved attack strength over the existing minimum norm attacks.

Weaknesses

  • It is not clear why the attack is strongest when the number of projection steps (value of n) is 1 and the number of deep fool attacks (value of m) is as large as possible. A detailed discussion on this would be helpful.

  • The comparison is done primarily for the l2 norm, it would be nice if the authors could also try their method for other norms like l-infinity norm.

  • Some parts of the paper seem a bit disjoint/not relevant. For example, it is not clear why the authors propose SDF adversarial training. The purpose of the paper is to demonstrate that the attack strength of DeepFool can be improved; showing results on using SuperDeepFool for adversarial training doesn't seem to help strengthen this claim. In case the authors want to demonstrate the utility of using SuperDeepFool for adversarial training, the authors should consider a more rigorous comparison with adversarial training methods like PGD [3], Trades [4] and Trades-AWP [5]. Just including Trades-AWP [5] as a baseline should also work.

  • Authors should also include comparison with some strong adversarial attacks which are not trying to find the minimum norm. For instance a comparison against guided margin aware attack [1] can help in developing a better understanding. The authors can also try to determine the norm from the proposed super deepfool and then try strong attacks like carlini and wagner attack [2], multi-targeted attack and guided margin aware attack [1] on the threat model with this norm.

  • It would be nice if the authors consider improving the presentation of tables. For instance, Table-3, Table-8, and Fig-3 have a lot of whitespace left on the right and left hand sides, which does not look good. The quality of the figure could also be improved. Further, I think it would be fine if the authors merge Algorithm-1 and 2. There is a lot of duplicated content in these algorithms.

  • It would be nice if the authors incorporate their attack with the AutoAttack package and analyze a few more defences. Since Table-8 currently shows mixed trends, a more rigorous analysis is required to justify the claims.

[1] Sriramanan, Gaurang et al. “Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses.” ArXiv abs/2011.14969 (2020)

[2] Carlini, Nicholas and David A. Wagner. “Towards Evaluating the Robustness of Neural Networks.” 2017 IEEE Symposium on Security and Privacy (SP), 2017.

[3] Madry, Aleksander et al. “Towards Deep Learning Models Resistant to Adversarial Attacks.” ArXiv abs/1706.06083 (2017)

[4] Zhang, Hongyang et al. “Theoretically Principled Trade-off between Robustness and Accuracy.” ArXiv abs/1901.08573 (2019)

[5] Wu, Dongxian et al. “Adversarial Weight Perturbation Helps Robust Generalization.” ArXiv abs/2004.05884 (2020).

Questions

I would request the authors to kindly address the comments in the weakness section.

Comment

We are pleased that the reviewer appreciates the simplicity and efficiency of our proposed method. We have addressed their comments in detail below. If there are any further aspects we can clarify or improve to positively influence their view of our work, please let us know.

It is not clear why the attack is strongest...

As with any other gradient-based optimization method tackling a non-convex problem, providing a definitive explanation for why one algorithm outperforms others is not straightforward. We have the following speculation on why SDF($\infty$,1) consistently outperforms the other configurations: each projection step reduces the perturbation, while each DF step moves the perturbation nearer to the boundary. So when the projection is repeated multiple times ($n>1$), it might undo the progress made by DF, potentially slowing down the algorithm's convergence. On the other hand, by first reaching a boundary point through multiple DF steps and then applying the projection operator just once, we at least ensure that the algorithm has reached intermediate adversarial examples. Each subsequent outer loop will hopefully move the adversarial example incrementally closer to the optimal point (see Figure 3).
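Schematically, the alternation can be sketched as follows for a binary classifier $f$ with decision boundary $f(x)=0$ (a simplified illustration of SDF($\infty$,1) only; the multiclass procedure in Algorithm 2 differs in its details, and the overshoot, iteration caps, and return value here are placeholders):

```python
import torch

def sdf_inf_1(f, x0, outer_iters=10, max_df_steps=50, overshoot=0.02):
    """Simplified sketch of SDF(inf, 1): DeepFool to the boundary, then one projection."""
    sign0 = torch.sign(f(x0)).item()
    x = x0.clone()
    for _ in range(outer_iters):
        # (i) DeepFool steps: walk towards (and slightly past) the decision boundary
        x_tilde = x.clone()
        for _ in range(max_df_steps):
            x_tilde = x_tilde.detach().requires_grad_(True)
            out = f(x_tilde)
            if torch.sign(out).item() != sign0:      # adversarial point reached
                break
            g, = torch.autograd.grad(out, x_tilde)
            with torch.no_grad():
                x_tilde = x_tilde - (1 + overshoot) * (out / g.norm() ** 2) * g
        # (ii) single projection: keep only the component of the perturbation
        #      (measured from the clean point x0) along the local gradient direction w
        x_tilde = x_tilde.detach().requires_grad_(True)
        w, = torch.autograd.grad(f(x_tilde), x_tilde)
        with torch.no_grad():
            x = x0 + ((x_tilde - x0).flatten() @ w.flatten() / w.norm() ** 2) * w
    # a full implementation would track and return the best adversarial point found
    return x.detach()
```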

The comparison is done primarily for the $\ell_2$ norm...

Many thanks for the suggestion. Although SDF is primarily designed to find minimal $\ell_2$ perturbations, we tried a simple idea to extend it to the $\ell_\infty$ norm by substituting the $\ell_2$-projection step (line 5 in Algorithm 2) with the $\ell_\infty$ projection:

$$\boldsymbol{x}\gets \boldsymbol{x}_0 + \frac{(\widetilde{\boldsymbol{x}}-\boldsymbol{x}_0)^\top\boldsymbol{w}}{\|\boldsymbol{w}\|_1}\,\mathrm{sign}(\boldsymbol{w})$$
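As a direct transcription of this step, a small PyTorch helper could look like the following (the function name is ours and the actual implementation may differ):

```python
import torch

def linf_projection(x0: torch.Tensor, x_tilde: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """l_inf counterpart of the projection step: x0 + ((x_tilde - x0)^T w / ||w||_1) * sign(w)."""
    coeff = torch.dot((x_tilde - x0).flatten(), w.flatten()) / w.abs().sum()
    return x0 + coeff * torch.sign(w)
```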

The table below displays the results obtained for the $\ell_\infty$ norm. We conducted multiple $\ell_\infty$ attacks, namely FMN, FAB, and DF, on M1 [3] and M2 [6] adversarially trained networks on the CIFAR10 dataset. Our findings indicate that this variant of SDF also exhibits superior performance compared to other algorithms in discovering smaller $\ell_\infty$ perturbations.

| Attacks | M1 | FR | Grads | M2 | FR | Grads |
|---|---|---|---|---|---|---|
| DF | 0.031 | 96.7 | 24 | 0.043 | 97.4 | 31 |
| FAB | 0.025 | 99.1 | 100 | 0.038 | 99.6 | 100 |
| FMN | 0.024 | 100 | 100 | 0.035 | 100 | 100 |
| SDF | 0.019 | 100 | 33 | 0.027 | 100 | 46 |

We will add this result to the Appendix.

Some parts of the paper seem a bit disjoint/not relevant.

Adversarial training is more effective when conducted with strong, yet computationally efficient adversaries. Our proposed method is both fast and more accurate than the SotA in identifying adversarial perturbations, making it a logical choice for evaluating its effectiveness in enhancing the robustness of models through adversarial training.

The primary goal was to determine which adversarial example generation technique improves robustness most effectively (PGD, DDN, or SDF), rather than comparing various adversarial training strategies (vanilla (Madry's) AT, TRADES, TRADES-AWP, etc.). Such approaches normally incorporate additional regularization techniques to improve Madry's method with PGD adversarial examples. Thus, our claim is not about developing a SotA robust model. Instead, our goal is to illustrate that vanilla AT, when coupled with minimum-norm attacks such as SDF, may yield better performance compared to PGD-based models. Therefore, we opted for vanilla adversarial training with SDF-generated samples and compared it to a network trained on DDN samples. Certainly, TRADES or similar AT strategies can be employed in conjunction with SDF, but we believe that falls beyond the scope of this paper.

Although DDN AT already outperforms PGD AT, we have added PGD results for the sake of completeness.

| Attack | SDF (Ours) Mean | SDF (Ours) Median | DDN Mean | DDN Median | PGD AT Mean | PGD AT Median |
|---|---|---|---|---|---|---|
| DDN | 1.09 | 1.02 | 0.86 | 0.73 | 0.68 | 0.61 |
| FAB | 1.12 | 1.03 | 0.92 | 0.75 | 0.84 | 0.71 |
| FMN | 1.48 | 1.43 | 1.47 | 1.43 | 1.31 | 1.27 |
| ALMA | 1.17 | 1.06 | 0.84 | 0.71 | 0.74 | 0.70 |
| SDF | 1.06 | 1.01 | 0.81 | 0.73 | 0.69 | 0.64 |

Authors should also include comparison with some strong adversarial attacks...

We appreciate if the reviewer could clarify their point. From our understanding, C&W is a minimum-norm attack, and we have indeed included a comparison with it in our work. Once we fully grasp the specific type of comparison the reviewer is requesting, we will promptly provide it.

Comment

It would be nice if the authors consider improving the presentation of tables.

Thank you for this comment; we have attempted to improve the presentation by putting Tables 3 and 8 in-line with the text. This has removed some of the whitespace. We welcome any additional suggestions for improving the appearance of the paper. Regarding Algorithms 1 and 2, the logic for having two separate algorithms is that the first one shows the general concept for the binary case and the second one shows the multiclass version for SDF($\infty$,1), which is the specific method that we show works best (we thought the general multiclass algorithm adds unnecessary complication and heavy notation).

It would be nice if the authors incorporate their attack with the autoattack package…

As per the reviewer's request, we have conducted experiments on two other robust networks, namely R5 [5] and R6 [7]. The updated Table 8 is provided below:

| Models | Clean | AA acc | AA Grads | AA++ acc | AA++ Grads |
|---|---|---|---|---|---|
| R1 | 95.7% | 82.3% | 1259.2 | 82.1% | 599.5 |
| R2 | 90.3% | 76.1% | 1469.1 | 76.1% | 667.7 |
| R3 | 89.4% | 63.4% | 1240.4 | 62.2% | 431.5 |
| R4 | 88.6% | 67.6% | 933.7 | 68.4% | 715.3 |
| R5 | 89.05% | 66.4% | 846.3 | 62.5% | 613.7 |
| R6 | 88.02% | 67.6% | 721.4 | 63.4% | 511.1 |
| S | 94.7% | 0.00% | 208.6 | 0.00% | 121.1 |

Furthermore, in the appendix, an alternative version of AA++ is presented, whereby we add SDF to the set of attacks instead of substituting APGD$^\top$ with it, to ensure the preservation of AA's previous performance. We still see in this case that our modification improves the computational time by 20%.

[6] Rony, Jerome et al. "Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

[7] Ding, Gavin Weiguang et al. "MMA training: Direct input space margin maximization through adversarial training." arXiv preprint arXiv:1812.02637, 2018.

Comment

I sincerely thank the authors for their efforts in responding to my concerns. I feel most of my concerns have been addressed other than these two:

  1. I think that authors should perform a fine-grained analysis of the effect of different values of m,n on the strength of the attack. A plot for this in the camera-ready would be helpful.
  2. I don't see any reason why the authors performed the AA++ analysis on [6], [7]. I checked the results reported on RobustBench and these models don't seem to perform on par with other methods.

I request the authors to address the above two points in the camera-ready version. It would be great if the authors could include a table comparing all the methods on the robustbench leaderboard. I believe penalising authors for these things at this stage would be wrong, so I am happy to raise my score from 5 to 6.

Comment

We are glad that we were able to address your concerns.

I think that authors should perform a fine-grained analysis of the effect of different values of m,n on the strength of the attack. A plot for this in the camera-ready would be helpful.

We will definitely add your suggestion to the final version.

I don't see any reason why the authors performed the AA++ analysis on [6], [7]. I checked the results reported on RobustBench and these models don't seem to perform on par with other methods.

We have chosen these two networks to cover different adversarial training methods. We are currently running tests on all networks in RobustBench ($\ell_2$ leaderboard), and these results will be included in the final version.

AC Meta-Review

The authors propose with SDF an improved version of DeepFool for adversarial attacks with minimum l2-norm. The essential step seems to be adding an approximate projection onto the decision boundary. The authors show in several experiments that this minimal change to DeepFool seems to result in faster convergence compared to other minimum-norm based attacks.

Strengths:

  • the convergence of SDF seems to be significantly faster than SOTA attacks like ALMA, FAB, FMN
  • they have convergence results for DeepFool (however see Problem in the Proof of Proposition 1 below)

Weaknesses:

  • they are limited to the $\ell_2$ threat model, while related work like ALMA, FAB, FMN can do any $\ell_p$-based attack
  • there is often no/little improvement to other attacks, thus the claims of improvement seem unjustified given the limited amount of models tested (see next point)
  • the experimental evaluation is non-systematic and the presentation rather unstructured. The number of models selected from RobustBench for a comparison to other minimum-norm attacks is very small, and it remains unclear why these models have been chosen
  • often only the median of the norms of the adversarial perturbations is reported (sometimes with the mean). The median is a rather coarse statistic; one would expect to always see both mean and median. Given that there is a lot of space left in the paper, this omission looks strange to me. Also, one of the main advantages of minimum-norm attacks is that one can easily draw robust accuracy as a function of the radius; why are there no comparisons of the attacks using this?
  • it remains unclear why the authors want to replace APGD$^\top$ and not FAB, which would be the corresponding attack in AutoAttack
  • after the rebuttal a problem in the proof of Proposition 1 has been discovered

I think that this paper contains an interesting finding and the attack seems to work very efficiently, but the comparison to the literature and the evaluation is very limited and does not fully justify the claims in the paper. As suggested by the reviewers, I recommend that the authors concentrate on the main messages of the paper and properly show that SDF outperforms previous SOTA methods when tested on all models from RobustBench (or at least on a large subset).

Problem in the Proof of Proposition 1:

  • Typos: In (6) it should be the square of the norm of the gradient in the denominator. In (8) it should be the absolute value of $f(x_n)$.
  • It remains unclear how the authors derived the inequality in (8). But let us assume that this inequality is correct; then it would hold that $s_{n+1} \leq \frac{L'}{2\zeta}(s_{n+1})^2$, as $s_{n+1}=\frac{|f(x_n)|}{\|\nabla f(x_n)\|}$, and thus $s_{n+1}\geq \frac{2\zeta}{L'}>0$ for all $n$, which contradicts the following statement that this sequence converges to zero. As this inequality seems to be used later on, the proof must be wrong.
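Spelling out the step: since $s_{n+1}>0$, dividing the assumed inequality by $s_{n+1}$ gives

$$ s_{n+1} \le \frac{L'}{2\zeta}\, s_{n+1}^{2} \;\Longrightarrow\; 1 \le \frac{L'}{2\zeta}\, s_{n+1} \;\Longrightarrow\; s_{n+1} \ge \frac{2\zeta}{L'} > 0, $$

which is incompatible with $s_n \to 0$.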

Why Not a Higher Score

see above

Why Not a Lower Score

N/A

Final Decision

Reject