PaperHub
Rating: 5.8 / 10
Rejected · 4 reviewers
Lowest 5 · Highest 6 · Std 0.4
Ratings: 6, 6, 6, 5
Confidence: 4.0
Correctness: 3.0
Contribution: 2.8
Presentation: 2.8
ICLR 2025

Conflict-Aware Adversarial Training

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-02-05
TL;DR

A new trade-off paradigm for adversarial training

Abstract

Keywords

Adversarial Training, Robustness

Reviews and Discussion

Review
Rating: 6

This work aims to address the limitations of traditional weighted-average adversarial training (AT). The authors propose a novel framework called Conflict-Aware Adversarial Training (CA-AT). The work identifies that existing weighted-average AT methods suffer from a gradient conflict, particularly at higher attack budgets. The proposed CA-AT mitigates this conflict by introducing a conflict-aware trade-off factor that adjusts the influence of the standard gradient and the adversarial gradient based on their alignment.

Strengths

· This manuscript offers a new perspective on improving adversarial robustness by addressing gradient conflict.

· This work provides a theoretical analysis, supported by empirical experiments, to demonstrate the gradient conflict and present the motivation for CA-AT.

· Extensive experiments across different datasets are conducted to validate the effectiveness of the proposed CA-AT.

Weaknesses

· Some detailed parameters are not clear, for example, $\gamma$ and the batch size. According to the study in [1], the batch size is related to the learning rate, and together they influence the natural accuracy and robust accuracy.

[1] Bag of Tricks for Adversarial Training, ICLR, 2021.

· The adversarial attacks used to evaluate the adversarial training methods are mainly gradient-based; more optimization-based attacks could be used to evaluate the effectiveness of the proposed method.

· The main comparison method in this manuscript is vanilla AT. There are many improved adversarial training methods that can be considered for comparison.

Questions

· How are the hyperparameter $\gamma$ and the batch size set? What effect do they have on the performance of the proposed method?

· How well does the proposed method perform against more optimization-based attacks (e.g., C&W [2], DDN [3])?

[2] Towards Evaluating the Robustness of Neural Networks, IEEE SP, 2017.

[3] Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses, CVPR, 2019.

· How effective is the proposed method compared to the improved AT (e.g., TRADES, MART)?

Comment

Weakness 2 and Question 2. The adversarial attacks used to evaluate the adversarial training methods mainly belong to gradient-based methods, more optimization-based attacks can be used to evaluate the effectiveness of the proposed method. How well does the proposed method perform against more optimization-based attacks (e.g., C&W [2], DDN[3])?

Response. Thank you for bringing this to our attention. We have evaluated our method against optimization-based attacks such as C&W [2] and DDN [3]. Please refer to Table 3 in the appendix for the results. We have also included results for black-box attacks like the Square attack [4] in Table 3. These evaluations show that CA-AT achieves better robustness against both black-box and optimization-based attacks.

Comment

Weakness 3 and Question 3. The main comparison method in this manuscript is vanilla AT. There are many improved adversarial training methods that can be considered for comparison. How effective is the proposed method compared to the improved AT (e.g., TRADES, MART)?

Response. Thank you for pointing this out. We would like to clarify that Vanilla AT does not represent a single baseline but rather a paradigm that existing works use to achieve the trade-off between standard accuracy and adversarial accuracy. As presented in Equation 2, $\lambda L_{a} + (1-\lambda) L_{c}$, the adversarial loss $L_{a}$ can be instantiated with any of the improved AT methods you mentioned (e.g., TRADES, MART). Moreover, we can incorporate these improved adversarial loss functions into CA-AT.

For TRADES [5] and CLP [6], please refer to our existing results in Figure 6 in the main draft and Figures 10 and 11 in the appendix. Additionally, we have added results comparing Vanilla AT and CA-AT using MART [7] as the adversarial loss function, where the superiority of CA-AT still holds.
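For concreteness, below is a minimal PyTorch-style sketch of the weighted-average objective in Equation 2 with a pluggable adversarial loss. The PGD settings and the cross-entropy choices here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD attack (assumed configuration, not necessarily the paper's)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def vanilla_at_loss(model, x, y, lam=0.5, adv_loss_fn=F.cross_entropy):
    """Weighted-average AT objective lam * L_a + (1 - lam) * L_c (Equation 2).
    adv_loss_fn is pluggable: plain cross-entropy here; a TRADES- or MART-style
    surrogate (with whatever extra inputs it requires) could be substituted for L_a."""
    x_adv = pgd_attack(model, x, y)
    l_c = F.cross_entropy(model(x), y)    # standard (clean) loss L_c
    l_a = adv_loss_fn(model(x_adv), y)    # adversarial loss L_a
    return lam * l_a + (1 - lam) * l_c
```

Swapping `adv_loss_fn` for an improved adversarial loss corresponds to the substitution described above.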

[1] Bag of Tricks for Adversarial Training, ICLR, 2021.

[2] Towards Evaluating the Robustness of Neural Networks, IEEE SP, 2017.

[3] Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses, CVPR, 2019.

[4] Square attack: a query-efficient black-box adversarial attack via random search, ECCV, 2020.

[5] Trade-off between robustness and accuracy of vision transformers, CVPR 2023

[6] Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

[7] Improving Adversarial Robustness Requires Revisiting Misclassified Examples, ICLR 2020

Comment

Dear Reviewer 8orB,

Thank you again for dedicating your time and effort to review our paper! Your comments are constructive and we have put a lot of effort into updating our work based on your suggestions.

With just under two days remaining for the discussion, we would love to know if we have adequately addressed your concerns and whether this has influenced your score. If you have any questions, we are also happy to discuss anything further. Thank you!

Comment

Thanks to the authors for their responses. Their explanations addressed most of my concerns, so I am willing to recommend a positive rating.

Comment

Dear Reviewer 8orB,

Thank you very much for your time and for acknowledging the effort we put into addressing your concerns in our rebuttal. We are glad that our rebuttal addresses most of your concerns.

As ICLR is a very competitive conference, we would like to make our submission as strong as possible based on your feedback. We would kindly like to know whether there is anything further we can do that might lead you to raise your score.

Thank you again for participating in the discussion.

Comment

We truly appreciate the great comments from the reviewer, and here are our responses to the proposed weaknesses and questions. If you find our response satisfactory, please consider raising your score.

Weakness 1 and Question 1. Some detailed parameters are not clear, for example, $\gamma$ and the batch size. According to the study in [1], the batch size is related to the learning rate, and together they influence the natural accuracy and robust accuracy. How are the hyperparameter $\gamma$ and the batch size set? What effect do they have on the performance of the proposed method?

Response. Thank you for raising concerns about the sensitivity of hyperparameters such as batch size and $\gamma$. The roles of these two hyperparameters are different; please allow us to explain them separately.

  1. Projection Margin Threshold $\gamma$: This is a specific hyperparameter for CA-AT. It serves a similar role to the linear trade-off factor $\lambda$ in Vanilla AT (Eq 2), which controls the trade-off between standard accuracy and adversarial accuracy. In Figures 4–7, we plotted the trade-off curves for Vanilla AT and CA-AT produced by different $\lambda$ and $\gamma$, respectively. These figures show that CA-AT can achieve better trade-offs across various scenarios.

  2. Batch Size: This is a training hyperparameter. The paper [1] you cited draws attention to the influence of training hyperparameters such as batch size and learning rate on performance. We conducted additional experiments for the ablation study on Vanilla AT and CA-AT with different batch sizes (Figure 13(b) in the appendix) and learning rates (Figure 13(a) in the appendix). Our observation is that although batch size and learning rate affect the standard accuracy and adversarial accuracies against various attacks, CA-AT consistently leads to better standard performance and adversarial robustness across different batch sizes and learning rates.

Review
Rating: 6

This paper proposes a Conflict-Aware Adversarial Training (CA-AT) gradient operation method that effectively alleviates the gradient conflict between the clean loss and adversarial loss. The proposed method is well-supported by theoretical foundations and shows strong experimental results. Findings indicate that, with the gradient conflict mitigation approach, CA-AT improves the balance between standard accuracy (SA) and adversarial accuracy (AA) — the SA-AA frontier — compared to vanilla adversarial training.

Strengths

  1. The paper is well-organized, featuring a logical structure with clear illustrations, algorithms, experimental results, images, and tables, making it easy to understand and follow.

  2. The notation and terminology throughout the paper are highly consistent and precise, enhancing clarity and readability.

  3. The paper introduces the metric $\mu$ based on the weighted average method to measure the conflict and convergence between gradients, providing theoretical upper bounds. The logical flow is clear, and the conclusions align well with practical observations.

  4. In addition to experiments on real datasets, the authors conducted synthetic experiments, which further reinforce the credibility of the proposed method.

  5. The experimental section is detailed and well-explained, making it easy for readers to comprehend the setup and results.

  6. Apart from the original cross-entropy loss, the authors extensively study the performance of the proposed methods under various other loss functions and test them against different types of attacks. The experimental design is comprehensive and thorough.

Weaknesses

  1. Please provide detailed accuracy results for experiments on ViT and Swin-T in table format.

  2. Does CA-AT also perform well on larger datasets such as ImageNet?

  3. Additional baselines that achieve similar levels of balance should be included, such as other advanced weighted methods or gradient operations. Using only Vanilla AT as a baseline is insufficient.

Questions

  1. The exact form of $\lambda^*$ is interesting. Could you elaborate on why this form was chosen? What was the inspiration, and what benefits does it offer?

  2. Please include additional baselines in the experimental section to allow for a more extensive and objective comparison.

Comment

Question 1. The exact form of $\lambda^{*}$ is interesting. Could you elaborate on why this form was chosen? What was the inspiration, and what benefits does it offer?

Response. Thank you for your interest in the form of $\lambda^*$. The form of $\lambda^*$ was inspired by our desire to adaptively adjust the gradient used for parameter updates based on the level of gradient conflict. Specifically, $\lambda^*$ depends on $\phi$, the cosine similarity between the clean gradient $g_c$ and the adversarial gradient $g_a$.

By dynamically adjusting $\lambda^*$ based on the degree of conflict, CA-AT can prioritize either the standard or adversarial loss depending on which gradient direction is more favorable at a given training step. This adaptive mechanism ensures that, in scenarios where the gradients are in significant conflict, the influence of adversarial loss is reduced to avoid harming standard accuracy, and vice versa. This flexibility allows CA-AT to achieve a better balance than a fixed trade-off.

We will add a more detailed discussion on the inspiration and motivation for the specific form of $\lambda^*$ in the revised manuscript.

[1] Trade-off between robustness and accuracy of vision transformers, CVPR 2023

Comment

Dear Reviewer b5Cc,

Thank you very much for dedicating your time and effort to reviewing our draft!

With just under two days remaining for the discussion, we would love to know if we have adequately addressed your concerns and whether this has influenced your score. Please feel free to let us know if you have any further questions about our paper.

Comment

Thank you for addressing my concerns. I will keep my original ratings unchanged.

Comment

Dear Reviewer b5Cc,

Thank you very much for your time and for acknowledging the effort we put into addressing your concerns in our rebuttal.

We respect your judgment and understand that you have decided to maintain your original ratings. However, as ICLR is a very competitive conference, we would like to make our submission as strong as possible based on your feedback. We would kindly like to know whether there is anything further we can do that might lead you to raise your score.

Thank you again for participating in the discussion.

Comment

Weakness 3 and Question 2 Additional baselines that achieve similar levels of balance should be included, such as other advanced weighted methods or gradient operations. Using only Vanilla AT as a baseline is insufficient. Please include additional baselines in the experimental section to allow for a more extensive and objective comparison.

Response. Thank you for pointing this out. We would like to clarify that Vanilla AT does not represent a single baseline but a paradigm that existing works use to achieve the trade-off between standard accuracy and adversarial accuracy. As shown in Equation 2, $\lambda L_{a} + (1-\lambda) L_{c}$, the adversarial loss $L_{a}$ can be designed using other methods such as TRADES [1]. We can also incorporate these improved adversarial loss functions into CA-AT. We have conducted experiments where the adversarial loss is Cross Entropy, TRADES, CLP, and MART. Please refer to Figures 6, 10, and 11 and Table 6 in our main paper and appendix.

We believe the 'advanced weighted methods' you mentioned are methods that adjust $\lambda$ in Equation 2 during training to achieve a better trade-off between standard accuracy and adversarial accuracy. We conducted a thorough literature review on adversarial training but have not found any such work. To the best of our knowledge, our work is the first in this direction. If you are aware of any related work, could you please let us know? We can conduct a comparative study based on them as soon as possible. Additionally, regarding other methods utilizing gradient operations during adversarial training, we believe our paper is the first to observe the problem of gradient conflict and to propose a method based on gradient operations to alleviate this problem. Please check our clarification on this in the Introduction section.

Comment

We truly appreciate the great comments from the reviewer, and here are our responses to the proposed questions. If you find our response satisfactory, please consider raising your score.

Weakness 1 Please provide detailed accuracy results for experiments on ViT and Swin-T in table format.

Response. Sure, here is an example for Swin-T below. However, we would like to clarify that the trade-off curves in the figures (e.g., Figure 5) show the trade-off more clearly than a table.

| Method | Standard | FGSM | PGD | AutoPGD | MIFGSM | FAB | PGD-L2 | AutoPGD-L2 | T-AutoPGD-DLR | T-AutoPGD-L2 | T-FAB | Adversarial Mean Acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lambda = 0 | 0.8672 | 0.2611 | 0.0498 | 0.0307 | 0.0887 | 0.0711 | 0.8535 | 0.7646 | 0.0181 | 0.7605 | 0.0599 | 0.2958 |
| lambda = 0.5 | 0.8290 | 0.4089 | 0.2995 | 0.2718 | 0.3297 | 0.2958 | 0.8196 | 0.7719 | 0.2415 | 0.7656 | 0.2910 | 0.4495 |
| lambda = 1 | 0.7272 | 0.3515 | 0.2795 | 0.2626 | 0.2978 | 0.2514 | 0.7185 | 0.6697 | 0.2145 | 0.6562 | 0.2492 | 0.3951 |
| Ours, gamma = 0.9 | 0.8656 | 0.4437 | 0.3188 | 0.2917 | 0.3531 | 0.3265 | 0.8584 | 0.8127 | 0.2647 | 0.8076 | 0.3213 | 0.4799 |
| Ours, gamma = 0.8 | 0.8314 | 0.4295 | 0.3260 | 0.3021 | 0.3547 | 0.3179 | 0.8234 | 0.7811 | 0.2689 | 0.7721 | 0.3132 | 0.4689 |

Weakness 2 Does CA-AT also perform well on larger datasets such as ImageNet?

Response. We appreciate your suggestion. However, given the substantial computational resources required for adversarial training on ImageNet and the limitations we face in academic institutions, conducting experiments on ImageNet is not feasible within the rebuttal period. Nevertheless, we have added new results using the Tiny ImageNet dataset, as shown in Table 5 in the appendix, which further demonstrates the superior performance of our proposed method on relatively large-scale datasets. We plan to include results on ImageNet in future work.

Review
Rating: 6

The paper proposes to use a dynamic conflict-aware factor to control the trade-off between the standard and adversarial loss. Only the standard gradient is used if the adversarial and standard gradients are close. Otherwise, the adversarial gradient is emphasized. Based on Figure 3 in the paper, I suppose both gradients tend to be similar in the later training phase, and the proposed methods will mainly optimize the standard loss. Extensive experiments are conducted and the results are promising.

优点

  • The paper is easy to follow.
  • Extensive experiments are conducted to demonstrate the effectiveness of the proposed method empirically.

Weaknesses

  • The proposed factor seems a bit bizarre to me. Take Figure 1 as an example. Any $g_a$ that ends on the dotted line with $\arccos(g_a, g_c) > \gamma$ will result in the same $g^*$, which doesn't make sense to me. Imagine $g_a$ equals $g_o$ in Figure 1, and one will still get the same $g^*$.
  • One should conduct an ablation study by using the traditional $\lambda$-weighted mean of $g_a$ and $g_c$ when $\phi \leq \gamma$ and only $g_c$ when $\phi > \gamma$, as I suspect that this might be the reason for the performance boosts.
  • Some other suggestions:
    • Add the legends for Figure 6 (b) and (c).
    • Conduct experiments on ImageNet.
    • Run multiple trials and report error bars.

Questions

Please refer to those in the weaknesses section.

Comment

Weakness 3. Some other suggestions: (1) Add the legends for Figure 6b and 6c. (2) Conduct experiments on ImageNet. (3) Run multiple trials and report error bars.

Response.

(1) Thank you for pointing this out. We have updated our draft accordingly and added the legends to Figures 6b and 6c.

(2) We appreciate your suggestion. However, given the substantial computational resources required for adversarial training on ImageNet and the limitations we face in academic institutions, conducting experiments on ImageNet is not feasible within the rebuttal period. Nevertheless, we have added new results using the Tiny ImageNet dataset, as shown in Table 5 in the appendix, which further demonstrate the superior performance of our proposed method on relatively large-scale datasets. We plan to include results on ImageNet in future work.

(3) Thank you for the suggestion. While including error bars is indeed valuable, it is not common practice in adversarial machine learning literature (e.g., [1]). We are currently running additional experiments to include error bars as you suggested; however, due to the short rebuttal period, we have prioritized addressing your valuable feedback in Weakness 2. We will update you with the new numerical results and error bars as soon as they are completed.

[1] Bag of Tricks for Adversarial Training, ICLR, 2021.

Comment

Weakness 2. One should conduct an ablation study by using the traditional $\lambda$-weighted mean of $g_a$ and $g_c$ when $\phi \leq \gamma$ and only $g_c$ when $\phi > \gamma$, as I suspect that this might be the reason for the performance boost.

Response. Thank you for the constructive suggestion, which led us to further analyze the performance gain of CA-AT and investigate whether it is brought about by the gradient projection or just the threshold. Based on your suggestion, we conducted an ablation study using the traditional $\lambda$-weighted mean of $g_a$ and $g_c$ when $\phi \leq \gamma$ and only $g_c$ when $\phi > \gamma$. We define this ablated version of CA-AT as CA-AT-AV in Algorithm 2 in the appendix. The comparison results shown in Figure 12 for Vanilla AT, CA-AT, and CA-AT-AV demonstrate that the performance boost is indeed due to the gradient projection, confirming the effectiveness of our method in improving the trade-off between standard accuracy and adversarial accuracy.
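For concreteness, below is a rough, illustrative sketch of the ablated variant described above (not the authors' Algorithm 2, which is not reproduced in this thread). Whether $\phi$ and $\gamma$ denote angles or cosine similarities is ambiguous in the thread, so treating them as angles in radians here is an assumption.

```python
import torch
import torch.nn.functional as F

def flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def ca_at_av_step(model, x, x_adv, y, lam=0.5, gamma=0.9):
    """Ablated variant (CA-AT-AV) as described above: use the lam-weighted mean
    of g_a and g_c when the angle phi <= gamma, and only g_c otherwise.
    Assumption: phi and gamma are treated as angles in radians; the paper may
    define them via cosine similarity instead."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_c = flat_grad(F.cross_entropy(model(x), y), params)       # clean gradient
    g_a = flat_grad(F.cross_entropy(model(x_adv), y), params)   # adversarial gradient
    phi = torch.acos(F.cosine_similarity(g_c, g_a, dim=0).clamp(-1.0, 1.0))
    g = lam * g_a + (1 - lam) * g_c if phi <= gamma else g_c
    # write the combined gradient back into .grad before optimizer.step()
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = g[offset:offset + n].view_as(p).clone()
        offset += n
```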

Comment

We truly appreciate the great comments from the reviewer, and here are our responses to the proposed questions. If you find our response satisfactory, please consider raising your score.

Weakness 1. The proposed factor seems a bit bizarre to me. Take Figure 1 as an example. Any $g_a$ that ends on the dotted line with $\arccos(g_a, g_c) > \gamma$ will result in the same $g^*$, which doesn't make sense to me. Imagine $g_a$ equals $g_\circ$ in Figure 1, and one will still get the same $g^*$.

Response. Thank you for expressing your concerns regarding the design of the CA-AT algorithm. Our intention in designing CA-AT is to ensure that the final gradient $g_{*}$ is always aligned so as to avoid any harmful directional conflict with the standard gradient $g_{c}$, as standard accuracy is highly prioritized in industrial applications of deep learning.

It is true that any $g_{a}$ satisfying $\arccos(g_{a}, g_{c}) > \gamma$ will be projected onto the cone around the standard gradient $g_{c}$ to obtain $g_*$ (please see Line 75). This projection is calculated in Equation 4. However, the resulting $g_*$ are not the 'same'. Imagine a 3-dimensional or higher-dimensional space: given a set of gradients $g_{a}$, each satisfying $\arccos(g_a, g_c) > \gamma$, CA-AT projects all of them onto the cone around $g_{c}$ and obtains a new set of distinct $g_{*}$. Please note that Figure 1 illustrates this concept in 2-dimensional space for simplicity, which does not fully capture the nuances present in higher-dimensional spaces. We hope this clarification addresses your concern.

If it does not, could you please elaborate on why you believe that any $g_a$ ending on the dotted line with $\arccos(g_a, g_c) > \gamma$ would result in the same $g_*$, and why this does not make sense to you?
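Since Equation 4 is not reproduced in this thread, the following is only a geometric sketch of one standard way to project $g_a$ onto the cone of half-angle $\gamma$ around $g_c$ while preserving its norm; the paper's actual projection may differ.

```python
import math
import torch

def project_onto_cone(g_a, g_c, gamma, eps=1e-12):
    """Project the flattened gradient g_a onto the cone of half-angle gamma
    (in radians) around g_c while preserving ||g_a||. Illustrative only; the
    paper's Equation 4 may define the projection differently."""
    c_hat = g_c / (g_c.norm() + eps)
    cos_phi = torch.dot(g_a, c_hat) / (g_a.norm() + eps)
    if torch.acos(cos_phi.clamp(-1.0, 1.0)) <= gamma:
        return g_a                      # no harmful conflict: keep g_a as is
    # split g_a into components parallel and orthogonal to g_c
    ortho = g_a - torch.dot(g_a, c_hat) * c_hat
    u_hat = ortho / (ortho.norm() + eps)
    # place the result on the cone surface: its angle to g_c becomes exactly gamma
    return g_a.norm() * (math.cos(gamma) * c_hat + math.sin(gamma) * u_hat)
```

In 2-dimensional space the orthogonal direction `u_hat` is unique up to sign, so all conflicting `g_a` collapse onto the same boundary ray; in higher dimensions `u_hat` depends on `g_a`, which is the distinction the clarification above relies on.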

Comment

I appreciate the authors’ clarification. My primary concern remains with the cone projection operation, which seems more like an engineering solution than a fundamental approach to me. Other reviewers have also raised concerns regarding the novelty of the method. However, the performance of the proposed method is quite strong as reported, which is further demonstrated by the ablation study provided during the rebuttal. Therefore, I would like to increase my score to 6.

Comment

Thank you for your response and increased score!

Review
Rating: 5

The paper explores the issue of gradient conflict in adversarial training, highlighting how traditional methods struggle to balance standard performance and adversarial robustness. The authors introduce Conflict-Aware Adversarial Training (CA-AT), which employs a novel trade-off paradigm based on gradient alignment to address this conflict. Through extensive experiments, CA-AT shows consistent improvements in the trade-off between standard and adversarial accuracy across various datasets and model architectures.

Strengths

  1. The proposed Conflict-Aware Adversarial Training (CA-AT) effectively addresses gradient conflict, achieving a better balance between standard performance and adversarial robustness.

  2. Comprehensive experimental results across various datasets and model architectures validate the effectiveness of CA-AT in improving the trade-off between standard and adversarial accuracy.

  3. The method demonstrates strong performance in both training from scratch and parameter-efficient fine-tuning, showcasing its versatility.

Weaknesses

  1. The proposed CA-AT aims to manipulate the gradients of the clean examples and the adversarial examples; the idea is not novel, given prior work on either input gradient alignment [1] or model gradient alignment [2].

  2. I think that, starting from PGD adversarial training, existing methods use only the adversarial examples for training, rather than the combination that CA-AT aims to tackle.

  3. Although CA-AT shows improved performance over Vanilla AT in several experiments, it lacks comparisons with other advanced adversarial training methods.

  4. The price of the double propagation should also be considered.

[1] Understanding and Improving Fast Adversarial Training

[2] Quantifying the preferential direction of the model gradient in adversarial training with projected gradient descent

Questions

  1. In line 268, a reference is missing.
  2. The authors argue that the weighted-average method does not provide the best trade-off between standard performance and adversarial robustness, but in my view CA-AT does not either; for example, Vanilla AT can beat CA-AT in your FAB experiments.
  3. What leads to the large fluctuation during epochs 60–90 in the middle image of Fig. 3(a)?
  4. Is the initial learning rate of 0.4 a standard setting? As I recall, a decayed learning rate schedule is more common.
Comment

Question 1. In line 268, a reference is missing.

Response. Thank you for catching this typo. We have corrected it in our draft.

Question 2. The authors argue that the weighted-average method does not provide the best tradeoff for the standard performance and adversarial robustness, but in my mind, CA-AT also doesn't, like Vanilla AT can also beat CA-AT in your FAB experiments.

Response. Thank you for pointing this out. We believe your concern arises from the FAB results for ResNet18 in Table 1 (please correct us if you are referring to other results in our draft). It is true that Vanilla AT can outperform CA-AT in defending against FAB attacks when the L-infinity bound is larger than $8/255$.

However, CA-AT outperforms Vanilla AT against many other attacks that are much stronger than FAB (e.g., Vanilla AT achieves 0.809 against FAB but only 0.3996 against AutoPGD). Additionally, when we use ResNet34 instead of ResNet18, CA-AT achieves better performance against FAB attacks across different L-infinity bounds.

Question 3. What leads to the large fluctuation during epoch 60-90 in the middle image in Fig.3(a)?

Response. Thank you for pointing out this interesting observation. We believe you are referring to the fluctuation of the red line (Ours, $\gamma=0.9$) between epochs 60–90. This can be attributed to the learning rate schedule. During these epochs, the one-cycle learning rate schedule we used (please see Lines 371–374) involves a high learning rate, which can result in increased instability and thus larger fluctuations of the gradient conflict $\mu$. In addition, a similar fluctuation occurs in the blue line (Vanilla AT, $\lambda=0.5$) during epochs 60–90 in the right subfigure of Fig. 3(a). Our conclusion is that choosing an appropriate learning rate is important to reduce the gradient conflict $\mu$ for both Vanilla AT and CA-AT, which can be explored further in future work. We have added a related discussion of this phenomenon in Section 3.2.

Question 4. Is the initial learning rate 0.4 the normal setting? As I remember the decayed learning rate schedule is more common.

Response. You are correct that a decayed learning rate schedule is a common practice in adversarial training. In our experiments, we used an initial learning rate of 0.4 combined with the one-cycle learning rate policy (Lines 370-377). This approach has been shown to converge faster [8]. To further address your concern, we also conducted an ablation study for different initial learning rates ($lr = 0.1, 0.2, 0.3$). Please check the results in Figure 13 in the Appendix.
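For reference, a minimal, self-contained sketch of a one-cycle schedule in PyTorch is shown below; reading 0.4 as the peak learning rate is an assumption, and the placeholder model, data, and epoch count are not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a one-cycle schedule. The value 0.4 is passed as the peak
# LR, which is one plausible reading of "initial learning rate 0.4"; the model
# and data below are placeholders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = [(torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(5)]
epochs = 3

optimizer = torch.optim.SGD(model.parameters(), lr=0.4, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.4, epochs=epochs, steps_per_epoch=len(data))

for _ in range(epochs):
    for x, y in data:
        loss = F.cross_entropy(model(x), y)  # stand-in for the AT objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # the one-cycle LR advances every batch, not every epoch
```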

[1] Understanding and Improving Fast Adversarial Training, NeurIPS 2020

[2] Quantifying the preferential direction of the model gradient in adversarial training with projected gradient descent, Pattern Recognition 2023

[3] Explaining and Harnessing Adversarial Examples, ICLR 2015

[4] Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018

[5] Theoretically principled trade-off between robustness and accuracy, ICML 2019

[6] Once-for-all adversarial training: In-situ tradeoff between robustness and accuracy for free, NeurIPS 2020

[7] Trade-off between robustness and accuracy of vision transformers, CVPR 2023

[8] Improving Adversarial Robustness Requires Revisiting Misclassified Examples, ICLR 2020

Comment

We truly appreciate the great comments from the reviewer, and here are our responses to the proposed questions. If you find our response satisfactory, please consider raising your score.

Weakness 1. The proposed CA-AT aims to manipulate the gradients of the clean examples and the adversarial examples; the idea is not novel, given prior work on either input gradient alignment [1] or model gradient alignment [2].

Response: Thank you for pointing this out and for bringing these related papers to our attention. We acknowledge the relevance of input gradient alignment [1] and model gradient alignment [2], and we would like to clarify how our proposed CA-AT fundamentally differs from these two works.

Input gradient alignment [1] addresses catastrophic overfitting, where a model rapidly loses adversarial robustness within a single training epoch when trained with FGSM. Their method alleviates this problem by improving the quality of FGSM-generated adversarial examples through a regularization term, $1 - \cos(\nabla_x L(x, y), \nabla_x L(x + \delta, y))$, which explicitly aligns the input gradients within the perturbation set. In contrast, [2] focuses on enhancing the alignment of model gradients by defining a "preferential direction" toward the nearest incorrect class boundary using GANs, aiming to improve how well the model's gradient direction matches an ideal defensive direction against adversarial changes.
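For reference, a minimal sketch of that regularization term is given below, assuming a cross-entropy loss L and a given perturbation delta; how delta is sampled and how the term is weighted in [1] are not shown.

```python
import torch
import torch.nn.functional as F

def grad_align_reg(model, x, delta, y):
    """Gradient alignment regularizer from [1]:
    1 - cos( grad_x L(x, y), grad_x L(x + delta, y) ).
    Sketch only: the sampling of delta and the weighting of this term follow
    [1] and are not reproduced here."""
    x1 = x.detach().clone().requires_grad_(True)
    x2 = (x + delta).detach().clone().requires_grad_(True)
    g1 = torch.autograd.grad(F.cross_entropy(model(x1), y), x1, create_graph=True)[0]
    g2 = torch.autograd.grad(F.cross_entropy(model(x2), y), x2, create_graph=True)[0]
    cos = F.cosine_similarity(g1.flatten(1), g2.flatten(1), dim=1)
    return (1.0 - cos).mean()
```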

While both works contribute valuable insights to adversarial training, the problems they addressed are different from the gradient conflict issue that we defined and observed (Lines 49–79), where the directions of the standard gradient and adversarial gradient may contradict each other during adversarial training. Additionally, neither [1] nor [2] proposes methods to directly manipulate the model's gradients as we did in CA-AT.

We understand that the similar terminology might have caused confusion. To clarify, we will cite and discuss these two papers in the related works section.

Comment

Weakness 2. I think starting from PGD adversarial training, they just use the adversarial example for training, rather than the combination that CA-AT wants to tackle.

Response. Thank you for bringing this up, and we appreciate the opportunity to clarify. It is true that early works on adversarial training [3,4] used only adversarial samples to construct the training loss. However, as we mentioned in the first paragraph of the related works section (Lines 95–107), recent studies [5,6,7] have begun using a hybrid loss that combines standard loss and adversarial loss. This approach aims to achieve a trade-off between standard accuracy and adversarial accuracy. The hybrid training loss is the combination that our paper wants to tackle. We have revised the related works section in our draft to make this clearer, and we hope this resolves any confusion.

Weakness 3. Although CA-AT shows improved performance over Vanilla AT in several experiments, it lacks comparisons with other advanced adversarial training methods.

Response. Thank you for pointing this out. We would like to clarify that Vanilla AT here is not a single baseline but a paradigm that existing works use to achieve the trade-off between standard and adversarial accuracy. To be more specific, recalling the formula we presented in Eq 2, $\lambda L_{a} + (1-\lambda) L_{c}$, the adversarial loss $L_{a}$ can be designed in many ways, including but not limited to Cross Entropy, TRADES, and CLP (Lines 477–501 and Figure 6). We have also added new experimental results for MART [8] in Table 6 in the Appendix. Experimental results show the superiority of CA-AT across all of them.

Weakness 4. The price of the double propagation should also be considered.

Response. Thank you for expressing your concern about the computational cost of CA-AT. We acknowledge that CA-AT involves an additional gradient projection step, which increases training time by one additional backpropagation compared to Vanilla AT. However, we believe that this additional cost is acceptable. We have added sentences pointing out that CA-AT is more appropriate for adversarial PEFT than for full fine-tuning when dealing with very large models like ViT (Lines 526-539). Moreover, existing works aimed at enhancing adversarial training usually involve acceptable increases in computational cost. Taking the references you provided as examples, [1] requires an additional backpropagation to compute $\nabla_x L(x + \delta, y)$, and [2] involves additional models such as GANs to approximate the "preferential direction".

Comment

Dear Reviewer bDvp,

Thank you again for dedicating your time and effort to review our paper!

With just under two days remaining for the discussion, we would love to know if we have adequately addressed your concerns and whether this has influenced your score. We have put a lot of effort into updating our work and would value your feedback. We are also happy to conduct any further discussion. Thank you!

Comment

Dear Reviewer bDvp,

Thank you for taking the time to review our paper and for providing valuable feedback that has helped us improve our work. We wanted to kindly follow up to see if our rebuttal addresses your concerns, and whether this has influenced your original score.

We would love to address any questions you might have. Thank you again for your time and effort in reviewing our work.

Comment

Dear Reviewer bDvp,

We sincerely appreciate you taking the time to review our paper and providing insightful feedback that has helped us enhance our work. We wanted to kindly check if our rebuttal has addressed your concerns and whether this has influenced your initial evaluation.

Please feel free to let us know if you have any further questions. Thank you once again for your time and effort in reviewing our submission.

Comment

Dear Reviewers,

We appreciate the valuable and constructive comments from the reviewers on our work! We have responded to the concerns and conducted additional experiments based on your suggestions. The new experimental results and sentences that we added during the rebuttal period are marked in red in our main paper and appendix. Please let us know if you have any questions about our responses. We are happy to discuss anything further.

AC Meta-Review

The paper tackles an important problem in adversarial training by addressing gradient conflicts through a conflict-aware trade-off factor. However, the proposed method lacks sufficient novelty, as gradient alignment has been explored in existing works, and CA-AT appears more incremental than groundbreaking. The comparisons are primarily limited to Vanilla AT, without sufficient evaluation against advanced methods like TRADES or MART, raising questions about its relative effectiveness. Furthermore, the experiments are confined to small-scale datasets, with no results on ImageNet or other large-scale benchmarks, undermining claims of broad applicability. Empirical results are further weakened by the absence of error bars and robust statistical analysis, while certain cases show inferior performance compared to the baseline. Additionally, the method introduces significant computational overhead, and hyperparameter sensitivity is insufficiently analyzed, limiting practical utility. Due to these substantial weaknesses, including limited originality, incomplete evaluations, and methodological concerns, the paper falls short of the standard for acceptance.

Additional Comments from the Reviewer Discussion

During the rebuttal period, the authors addressed several concerns raised by reviewers, including the novelty of the proposed CA-AT method, insufficient comparisons with advanced adversarial training baselines, limited scalability to large datasets, lack of empirical rigor, high computational cost, and unclear hyperparameter sensitivity. While the authors clarified CA-AT's contributions relative to existing gradient alignment methods and added some baseline comparisons (e.g., MART), the lack of ImageNet-scale experiments, justified by computational constraints, further limited the method's practical applicability. Efforts to address empirical rigor (e.g., error bars) and hyperparameter ablations were partial and not fully integrated into the main narrative. Although the authors defended the computational cost, the absence of evaluations on large models weakened their argument. Overall, the responses did not adequately resolve key concerns about the paper's incremental novelty, incomplete evaluations, and limited scalability, leading to the decision to reject.

Final Decision

Reject