PaperHub
Rating: 7.3/10 · Poster · 4 reviewers (min 4, max 5, std 0.5)
Individual ratings: 5, 4, 5, 4 · Confidence: 4.0
Novelty 3.3 · Quality 3.3 · Clarity 3.5 · Significance 3.0
NeurIPS 2025

Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

OpenReview · PDF
Submitted: 2025-05-12 · Updated: 2025-10-29

Abstract

Keywords
Black-box adversarial attack; Transferability

Reviews and Discussion

Review (Rating: 5)

The paper introduces CORTA, a novel transfer-based black-box adversarial attack designed to enhance the transferability of adversarial examples across deep neural networks (DNNs). It identifies two primary sources of transfer failure—decision-boundary variation and representation drift—and proposes a unified framework to address these through parameter perturbations and feature blending on a surrogate model. The approach is formalized as a Distributionally Robust Optimization (DRO) problem, with practical first-order approximations for scalability. The authors claim that CORTA outperforms state-of-the-art methods, achieving, for instance, a 19.1% higher transfer success rate (TSR) when transferring from ResNet-18 to Swin-B on CIFAR-100. Extensive experiments on ImageNet and CIFAR-100 across convolutional and transformer architectures are presented to support these claims.

Strengths and Weaknesses

Strengths:

  1. The authors provide a rigorous theoretical analysis, deriving upper bounds on the worst-case adversarial loss using Lipschitz continuity and first-order approximations. This enhances the credibility of the proposed approximations for parameter linearization and Monte Carlo sampling for feature blending.

  2. The paper's conceptualization of transferability as a consensus-robust optimization problem is innovative. By modeling target models as perturbed versions of the surrogate via parameter and representation channels, it provides a principled approach to tackle both decision-boundary variation and representation drift. The DRO formulation is a strong theoretical contribution, aligning well with the goal of robust transferability.

Weaknesses:

  1. The paper explicitly states that it follows prior work in reporting only average results without error bars or statistical significance tests. This is a significant methodological flaw, as it hinders the assessment of result reliability, especially given the variability inherent in adversarial attack experiments. For example, Table 1 and Table 2 report TSRs without confidence intervals, making it difficult to evaluate the consistency of CORTA's performance across runs or datasets.

  2. While CORTA is tested with ResNet-18 and ViT-Tiny as surrogates, the baselines Ens and AdaEA use multiple surrogates (2 CNNs and 2 ViTs). This discrepancy makes direct comparisons less fair, as ensemble methods inherently leverage more diverse surrogate information. The paper would benefit from evaluating CORTA with multiple surrogates to better align with these baselines and demonstrate its robustness.

  1. This paper briefly mentions hyperparameters (e.g., perturbation probability ρ = 0.5, blending proportion λ sampled from U[0.25, 1], and trade-off coefficient β tuned empirically), but lacks a detailed ablation study to justify these choices. For instance, the choice of λ_min = 0.25 for feature blending is not explained, nor is there an analysis of how sensitive CORTA's performance is to variations in β or ρ. This omission limits the understanding of the method's robustness to hyperparameter settings.

Questions

See weaknesses for details.

Limitations

Yes

Final Justification

The authors addressed my major concerns. I also paid attention to the other reviewers' questions and the replies by the authors. I consider this paper to be of high quality and useful for community development. Therefore I increase my rating by 1 point.

Formatting Issues

N/A

Author Response

W.1 The paper explicitly states that it follows prior work in reporting only average results without error bars or statistical significance tests. This is a significant methodological flaw, as it hinders the assessment of result reliability, especially given the variability inherent in adversarial attack experiments. For example, Table 1 and Table 2 report TSRs without confidence intervals, making it difficult to evaluate the consistency of CORTA's performance across runs or datasets.

A.1 Thank you for highlighting the importance of reporting error bars or statistical significance tests. While prior papers do not include error bars, we appreciate your suggestion, as providing confidence intervals offers a clearer assessment of result reliability—particularly given the inherent variability in adversarial attack experiments.

We re-ran the experiments from Table 1 and Table 2 on 100 randomly selected samples, repeating each experiment 20 times with different random seeds. The results are now reported as mean ± standard deviation. Please note that these are preliminary results; full experimental results will be included in the revised paper.
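As a side note for reproducibility, the aggregation into mean ± standard deviation can be sketched as follows (a minimal numpy illustration with made-up numbers, not our actual experimental data):

```python
import numpy as np

def summarize_runs(tsr_per_seed):
    """Aggregate transfer success rates over repeated runs into mean and std."""
    rates = np.asarray(tsr_per_seed, dtype=float)
    return rates.mean(), rates.std(ddof=0)

# Hypothetical TSRs (%) from a few seeds, for illustration only.
mean, std = summarize_runs([84.0, 85.0, 83.0, 86.0, 84.5])
print(f"{mean:.1f} ± {std:.1f}")  # prints "84.5 ± 1.0"
```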

Table 1:

| Dataset | Attack | RN-50 | WRN-101 | BiT-50 | BiT-101 | Avg | ViT-B | DeiT-B | Swin-B | Swin-S | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CIFAR-100 | Admix | 84.2±4.1 | 92.5±2.5 | 75.6±4.3 | 75.6±3.9 | 82.0±1.9 | 39.3±4.3 | 33.0±2.8 | 37.0±5.0 | 52.7±4.0 | 40.5±2.3 |
| CIFAR-100 | Ens | 91.0±1.1 | 82.0±0.4 | 86.2±1.1 | 76.8±0.9 | 84.0±0.7 | 70.2±0.7 | 84.6±0.5 | 70.0±1.3 | 85.3±2.7 | 77.5±0.8 |
| CIFAR-100 | AdaEA | 88.0±1.7 | 80.2±1.1 | 82.1±1.8 | 71.9±1.5 | 80.6±0.7 | 61.8±1.3 | 79.8±1.7 | 63.0±2.1 | 78.0±2.4 | 70.7±1.2 |
| CIFAR-100 | DHF | 90.8±0.8 | 89.1±1.4 | 77.0±2.5 | 72.3±1.9 | 82.3±0.9 | 37.0±2.2 | 27.5±2.3 | 25.4±1.8 | 52.0±2.1 | 35.5±1.1 |
| CIFAR-100 | BFA | 86.2±0.8 | 89.4±0.8 | 70.2±1.0 | 68.2±0.6 | 79.5±0.5 | 34.0±0.7 | 30.7±0.9 | 27.1±0.4 | 50.9±1.0 | 36.6±0.5 |
| CIFAR-100 | ANDA | 94.8±1.3 | 97.3±0.8 | 77.9±1.2 | 82.5±0.8 | 88.1±0.5 | 42.0±1.8 | 34.6±0.9 | 35.7±1.6 | 49.0±1.8 | 40.3±0.8 |
| CIFAR-100 | Ours | 98.8±1.4 | 100.0±0.0 | 99.8±0.7 | 96.8±0.9 | 98.7±0.6 | 98.8±1.4 | 95.4±1.4 | 93.3±1.3 | 94.9±1.5 | 95.4±0.6 |
| ImageNet | Admix | 91.4±1.7 | 83.3±2.7 | 81.8±2.9 | 67.3±3.0 | 81.0±1.1 | 21.8±3.4 | 34.3±2.4 | 24.1±3.0 | 31.6±2.9 | 28.0±1.7 |
| ImageNet | Ens | 69.6±1.1 | 65.9±1.7 | 65.0±2.1 | 52.2±1.2 | 63.2±0.4 | 44.8±2.0 | 66.8±0.7 | 24.9±1.9 | 36.0±1.4 | 43.1±0.5 |
| ImageNet | AdaEA | 73.8±2.2 | 66.6±2.2 | 61.0±3.0 | 46.4±3.8 | 62.0±1.6 | 36.6±2.2 | 54.2±2.4 | 23.2±1.8 | 28.4±2.7 | 35.6±1.4 |
| ImageNet | DHF | 98.8±0.6 | 96.2±1.0 | 95.4±1.6 | 88.3±2.0 | 94.7±0.7 | 38.8±2.4 | 52.6±2.8 | 40.1±2.2 | 52.4±2.6 | 46.0±1.5 |
| ImageNet | BFA | 99.5±0.6 | 97.5±0.6 | 95.5±0.6 | 90.2±1.0 | 95.9±0.4 | 39.0±0.8 | 52.5±0.6 | 44.2±1.7 | 56.2±1.5 | 49.6±0.4 |
| ImageNet | ANDA | 95.5±0.5 | 87.9±0.6 | 86.6±1.2 | 70.3±0.7 | 85.1±0.5 | 37.9±1.1 | 53.4±0.9 | 35.9±1.5 | 46.8±1.0 | 43.5±0.6 |
| ImageNet | Ours | 99.9±0.4 | 98.1±1.4 | 97.5±1.4 | 89.3±2.0 | 96.0±1.0 | 45.6±3.4 | 65.9±2.8 | 52.4±2.9 | 65.2±2.4 | 56.9±1.8 |

Table 2:

| Attack | RN-50 | WRN-101 | BiT-50 | BiT-101 | Avg | ViT-B | DeiT-B | Swin-B | Swin-S | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ens | 69.6±1.1 | 65.9±1.7 | 65.0±2.1 | 52.2±1.2 | 63.2±0.4 | 44.8±2.0 | 66.8±0.7 | 24.9±1.9 | 36.0±1.4 | 43.1±0.5 |
| AdaEA | 73.8±2.2 | 66.6±2.2 | 61.0±3.0 | 46.4±3.8 | 62.0±1.6 | 36.6±2.2 | 54.2±2.4 | 23.2±1.8 | 28.4±2.7 | 35.6±1.4 |
| Admix | 39.8±2.8 | 48.8±2.4 | 47.4±4.1 | 39.4±2.4 | 43.9±1.6 | 48.6±2.5 | 66.6±2.3 | 23.8±1.9 | 28.4±2.1 | 41.8±1.3 |
| DHF | 43.8±2.5 | 51.6±2.6 | 55.8±3.7 | 45.7±3.5 | 49.2±1.8 | 71.0±2.6 | 79.0±2.3 | 30.0±3.5 | 38.1±4.1 | 54.5±1.8 |
| BFA | 55.0±0.0 | 60.6±0.5 | 63.7±1.0 | 50.6±0.5 | 57.5±0.0 | 75.4±0.5 | 90.6±0.5 | 40.7±1.0 | 46.4±0.5 | 63.3±0.4 |
| ANDA | 55.6±0.8 | 64.5±0.5 | 73.2±0.9 | 62.8±0.4 | 64.0±0.3 | 68.9±0.6 | 78.0±0.2 | 42.6±0.5 | 54.7±0.8 | 61.0±0.2 |
| Ours | 62.7±3.8 | 71.2±4.0 | 73.1±3.0 | 63.4±3.2 | 67.6±2.0 | 82.8±4.1 | 90.5±3.0 | 52.3±4.3 | 63.9±4.2 | 72.4±2.3 |

W.2 While CORTA is tested with ResNet-18 and ViT-Tiny as surrogates, the baselines Ens and AdaEA use multiple surrogates (2 CNNs and 2 ViTs). This discrepancy makes direct comparisons less fair, as ensemble methods inherently leverage more diverse surrogate information. The paper would benefit from evaluating CORTA with multiple surrogates to better align with these baselines and demonstrate its robustness.

A.2 To ensure a fair comparison, we implemented "CORTA-ensemble" using the exact same surrogate set as Ens and AdaEA—ResNet-18, Inception-v3, ViT-Tiny, and DeiT-Tiny. The results on ImageNet are shown below:

| Method | RN-50 | WRN-101 | BiT-50 | BiT-101 | Avg (CNN) | ViT-B | DeiT-B | Swin-B | Swin-S | Avg (ViT) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ours (ResNet-18) | 98.5 | 95.8 | 95.5 | 92.4 | 95.5 | 47.6 | 63.8 | 54.2 | 64.1 | 57.4 |
| Ours (ViT-Tiny) | 63.6 | 70.4 | 74.4 | 68.3 | 69.2 | 77.8 | 87.8 | 50.7 | 58.4 | 68.7 |
| Ens | 71.2 | 63.2 | 62.5 | 54.9 | 63.0 | 42.9 | 62.9 | 26.6 | 36.6 | 42.3 |
| AdaEA | 73.5 | 61.4 | 59.1 | 50.9 | 61.2 | 36.9 | 53.8 | 25.0 | 33.4 | 37.3 |
| Ours (Ensemble) | 96.9 | 94.5 | 94.2 | 91.5 | 94.3 | 82.5 | 94.7 | 71.7 | 77.1 | 81.5 |

In the identical 4-model ensemble setting, CORTA significantly outperforms both Ens and AdaEA, demonstrating its effectiveness in multi-surrogate scenarios and its ability to further enhance cross-architecture transfer.

Leveraging multiple surrogate models increases transfer success rates across both CNN-based and ViT-based target models. For example, CORTA improves transfer performance to ViT targets by 24.1% compared to using a single ResNet-18 surrogate. Similarly, compared to a single ViT-Tiny surrogate, transfer performance to CNN and ViT targets increases by 25.1% and 12.8%, respectively. This demonstrates that using multiple, diverse surrogates helps bridge the gap in transferability between very different model architectures.

W.3 This paper briefly mentions hyperparameters (e.g., perturbation probability ρ = 0.5, blending proportion λ sampled from U[0.25, 1], and trade-off coefficient β tuned empirically), but lacks a detailed ablation study to justify these choices. For instance, the choice of λ_min = 0.25 for feature blending is not explained, nor is there an analysis of how sensitive CORTA's performance is to variations in β or ρ. This omission limits the understanding of the method's robustness to hyperparameter settings.

A.3 In the following, we provide a more comprehensive explanation of the reasoning behind the selection of the key hyperparameters ρ, λ_min, and β; we will revise the paper to better explain our selection of these hyperparameters.

  1. Choice of ρ: We set ρ = 0.5 based on optimization success on the surrogate model. If ρ is too high (e.g., 0.8), the feature perturbation becomes overly strong, disrupting gradient signals and reducing optimization success. Conversely, if ρ is too low (e.g., 0.2), the perturbation is too weak, resulting in limited transferability to the target model. Therefore, we selected ρ = 0.5 for all experiments to achieve a balanced trade-off.

  2. Choice of λ_min: Similarly, setting λ_min too high or too low degrades optimization success on the surrogate model. We observed stable performance with λ_min in the range [0.1, 0.3], and thus chose λ_min = 0.25 for all experiments in the paper.

  3. Choice of trade-off coefficient β: β is chosen to balance the magnitudes of the two optimization losses in the surrogate model, ensuring that their contributions are comparable. Consequently, β is not task- or dataset-specific, but rather related to the model architecture. For example, we used β = 0.1 for ResNet-18 on both ImageNet and CIFAR-100. Additionally, our hyperparameter sensitivity analysis (see Figure 1 and Line 297 of the paper) demonstrates that β yields stable results within the range [0.01, 0.1].

We address the sensitivity of these hyperparameters in Lines 294–299 of the paper and provide visualizations in Figures 1, 2, and 3 of the paper, which illustrate the effects of varying β, ρ, and λ_min. As shown, CORTA maintains stable performance for β in the range [0.01, 0.1], ρ in [0.5, 1], and λ_min in [0.1, 0.3].

We hope this clarifies our hyperparameter choices and their sensitivity. We will ensure this discussion is included in the revised paper.
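To make the roles of ρ and λ_min concrete, the feature blending step can be sketched as follows. This is a simplified numpy illustration of the mechanism described above; the function name, array shapes, and the convention that λ weights the adversarial features are our own assumptions, not the paper's actual implementation:

```python
import numpy as np

def stochastic_feature_blend(z_adv, z_clean, rho=0.5, lam_min=0.25, rng=None):
    """With probability rho, blend the adversarial latent features with the
    clean sample's features using a proportion lam drawn from U[lam_min, 1];
    otherwise leave the adversarial features unchanged."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if rng.random() < rho:
        lam = rng.uniform(lam_min, 1.0)
        return lam * z_adv + (1.0 - lam) * z_clean
    return z_adv

# Toy feature maps: all-ones (adversarial) vs. all-zeros (clean).
z_adv, z_clean = np.ones((4, 8)), np.zeros((4, 8))
blended = stochastic_feature_blend(z_adv, z_clean, rho=1.0)  # force a blend
# Blended values lie between the clean and adversarial features; with
# lam_min = 0.25, at least a quarter of the adversarial signal survives.
assert (blended >= 0.25).all() and (blended <= 1.0).all()
```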

Comment

Thank you for your detailed experimental additions and parameter explanations. These have addressed my major concerns. I have already raised my score from 4 to 5 in the final justification.

Comment

Thank you for your positive feedback and for raising the score. We are pleased that our additional experiments and explanations could address your concerns. We sincerely appreciate your time and careful review of our work.

Review (Rating: 4)

The authors propose a novel method for generating transferable black-box adversarial attacks by incorporating a distributionally robust optimization technique, leading to two ingredients: 1) a W + ΔW analysis; 2) layer-wise feature perturbations. As a result, the method achieves very competitive results against both single-surrogate and ensemble-based attack methods.

Strengths and Weaknesses

Strengths:

  • Very fast speed (see Table 5)
  • Performance on well-known datasets like CIFAR-100 and ImageNet is significantly higher than the competitors' numbers

Weaknesses:

  • Theoretical analysis is quite weak, mainly because:
    • Line 148: "small ΔW" is a very weak hypothesis, as the setup is black-box and targets any neural network
    • Line 173: the Lipschitz constant L_l can be quite large, so the last term in the optimization objective (6) can be much larger than ‖g_W(x, δ)‖_F
  • No real data (e.g., a comparison table) for the optimization success rate: it was briefly mentioned in lines 308-309, but no real analysis is given
  • Minor remarks:
    • line 127: better to use a letter other than L for the set of selected layers, because it is also the notation for the loss function
    • line 184: the incorrect superscript 1 in ‖g_W(x, δ)‖_F^1
    • line 206: the blending probability ρ should use different notation; the same letter stands for the upper bound on ‖ΔW‖_F


Questions


  • Why is stochastic feature blending applied only when the output size is ≤ 1/16 of the input (lines 244-245)? No reason is given, and there are no ablations
    • What about transformers?
  • Unclear why the blending probability ρ = 0.5 is chosen from the ablation study (Figure 2), because it seems the best value is somewhere around 0.8

Limitations

yes

Final Justification

Update: increasing the score based on the authors' comments.

Formatting Issues

N/A

Author Response

W.1.1 Line 148: "small ΔW" is a very weak hypothesis, as the setup is black-box and targets any neural network

A.1 For ΔW, we lower the upper bound to minimize loss variations across models (see the equation at line 170 of the paper, which holds for all ‖ΔW‖_F ≤ ρ; note that ρ can be large). This approach addresses the worst-case scenario, ensuring the DRO objective is satisfied even under the most adverse conditions. We have made a minor correction to the equation at line 170 in the revised paper, but the underlying idea remains unchanged.

W.1.2 Line 173: the Lipschitz constant L_l can be quite large, so the last term in the optimization objective (6) can be much larger than ‖g_W(x, δ)‖_F

A.2 The Lipschitz constant L_l is a model-dependent term that appears in the upper bound of the optimization objective, but it does not directly participate in the optimization of adversarial samples. It is independent of the gradient term ‖g_W(x, δ)‖_F, which is minimized during optimization. Therefore, even if L_l is large, CORTA only optimizes components directly related to the perturbation δ and does not rely on L_l. As a result, the magnitude of the Lipschitz constant has no impact on our optimization process.

W.2 No real data (e.g., a comparison table) for the optimization success rate: it was briefly mentioned in lines 308-309, but no real analysis is given.

A.3 We present the following comparison table, which shows the optimization success rates of baseline methods on ImageNet using ResNet-18 as the surrogate model:

| Method | Admix | Ens | AdaEA | DHF | BFA | ANDA | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Optimization Success Rate | 97.0 | 100 | 100 | 99.4 | 99.4 | 96.3 | 69.9 |

As shown in the table, most baseline methods (Ens, AdaEA, DHF, BFA) achieve optimization success rates close to 100%, whereas our method (CORTA) achieves a success rate of 69.9%.

CORTA’s surrogate success rate is lower than that of baseline attacks mainly due to two factors inherent to its DRO formulation: (1) CORTA optimizes two objectives—representation and parameter channels—rather than a single objective, increasing optimization difficulty; and (2) the feature blending operation, which incorporates the original sample’s latent features, can interfere with the adversarial objective and further reduce success rates on the surrogate model.

However, this lower surrogate success rate is not a practical issue. Attackers can simply discard unsuccessful adversarial examples using the surrogate model and retain only those that succeed, which tend to have higher transfer success rates. In practice, this only slightly increases computational cost, as generating the same number of successful examples requires optimizing about 1.43 times more samples (e.g., 100/69.9 compared to attacks with 100% surrogate success rate).
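The filtering strategy described above can be sketched as follows; `attack` and `surrogate_predict` are placeholder callables for illustration, not CORTA's actual interfaces:

```python
def filter_by_surrogate(samples, labels, attack, surrogate_predict):
    """Optimize an adversarial example per sample and keep only those that
    already fool the surrogate model; the rest are discarded."""
    kept, attempts = [], 0
    for x, y in zip(samples, labels):
        attempts += 1
        x_adv = attack(x)
        if surrogate_predict(x_adv) != y:  # surrogate is fooled -> keep
            kept.append(x_adv)
    return kept, attempts

# Toy demo: labels are parity bits; the "attack" flips parity except for x == 3.
samples = [0, 1, 2, 3]
labels = [x % 2 for x in samples]
attack = lambda x: x + 1 if x != 3 else x
surrogate_predict = lambda x: x % 2
kept, attempts = filter_by_surrogate(samples, labels, attack, surrogate_predict)
# Here 3 of 4 attempts fool the surrogate; producing N successful examples with
# success rate p therefore costs about N / p runs (~1.43x at p = 0.699).
```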

We will include these updated results and explanations in the revised version.

W.3 Minor remarks

A.4 We will make all necessary revisions in accordance with your valuable suggestions.

Q.1 Why is stochastic feature blending applied when the output size is ≤ 1/16 of the input (lines 244-245)? No reason is given, no ablations. What about transformers?

A.5 We adopted the "output size ≤ 1/16 of the input" setting to align with the CFM method ([21] in the paper), which is similar to DHF but focuses on targeted adversarial attacks.

However, this constraint is not essential. Our experiments show that applying stochastic feature blending to all layers—using ResNet-18 as the surrogate—still outperforms baseline methods, as shown in the table below:

| Blending layers | RN-50 | WRN-101 | BiT-50 | BiT-101 | Avg (CNN) | ViT-B | DeiT-B | Swin-B | Swin-S | Avg (ViT) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ≤ 1/16 of the input | 98.5 | 95.8 | 95.5 | 92.4 | 95.5 | 47.6 | 63.8 | 54.2 | 64.1 | 57.4 |
| All layers | 97.2 | 96.0 | 94.1 | 90.8 | 94.5 | 47.4 | 64.3 | 56.0 | 64.0 | 57.9 |

For Transformer architectures, we apply stochastic feature blending to all linear layers. This setting will be clearly explained in the revised version of the paper.

Overall, this parameter is not critical, and we will update its description accordingly in the revision.

Q.2 Unclear why the blending probability ρ = 0.5 is chosen from the ablation study (Figure 2), because it seems the best is somewhere around 0.8?

A.6 Thank you for this insightful question. Our choice of ρ = 0.5 was based on optimization performance on the surrogate model. When ρ is too high (e.g., 0.8), the feature perturbation can overly disrupt gradient signals, reducing optimization success on the surrogate. Conversely, when ρ is too low (e.g., 0.2), the perturbation is insufficient, limiting transferability. Thus, ρ = 0.5 was selected as a balanced value.

While the ablation study indicates that ρ = 0.8 achieves better performance on the target model, so that our choice of ρ = 0.5 does not yield the highest transfer success rate on the target, black-box attackers cannot tune hyperparameters based on the target model. Therefore, we selected ρ = 0.5 based on the optimization success rate on the surrogate model, even though it may not be optimal for the target model.

Comment

Thanks to the authors for the ablations. I'll increase the score.

Comment

Thank you for your feedback and for increasing the score. We appreciate your time and consideration in reviewing our work.

Review (Rating: 5)

The paper introduces CORTA, a novel transfer-based black-box adversarial attack that enhances transferability by addressing decision-boundary variation and representation drift. It frames transferability as a distributionally robust optimization problem, modeling target model variability through parameter and representation perturbations on a single surrogate model. CORTA employs lightweight first-order approximations with theoretical guarantees to ensure robust misclassification. Extensive experiments on ImageNet and CIFAR-100 demonstrate CORTA’s superior performance, achieving a 19.1% higher transfer success rate compared to state-of-the-art baselines, including ensemble methods, across diverse architectures like ResNet-18 and Swin-B. The paper’s key contributions include a consensus-robust formulation, dual-channel surrogate modeling, principled optimization, and the CORTA attack itself, setting a new benchmark for black-box adversarial evaluation.

Strengths and Weaknesses

Strengths

The paper presents a highly innovative approach with CORTA, effectively addressing adversarial transferability by modeling decision-boundary variation and representation drift through a consensus-robust optimization framework. Its dual-channel surrogate modeling, combining parameter and representation perturbations, is both novel and theoretically grounded, with first-order approximations ensuring scalability and provable guarantees. CORTA's superior empirical performance, achieving higher transfer success rates than state-of-the-art baselines on ImageNet and CIFAR-100, demonstrates its robustness across diverse architectures like ResNet-18 and Swin-B. The clear articulation of contributions and rigorous experimental validation make this work a significant advancement in black-box adversarial attacks.

Weaknesses

No significant weaknesses were identified in the paper. The theoretical analysis is convincing, the methodology is robust, and the experimental results are compelling, leaving little room for critique.

Questions

Given the paper’s strong theoretical foundation and impressive empirical results with CORTA, no glaring issues stand out.

Limitations

yes

Final Justification

This paper exceeds the standards for acceptance, offering novelty, strong empirical results, and theoretical depth. It is a valuable addition to the literature and is recommended for publication in its current form.

Formatting Issues

No major formatting violations detected.

Author Response

Thank you for your thoughtful review and positive feedback. We are glad to hear that the theoretical foundation and empirical results of CORTA were well-received. We appreciate your recognition of the strengths of the paper, and we will continue to refine and expand on the ideas presented in future work.

Review (Rating: 4)

The paper introduces CORTA, a consensus-robust transfer attack for adversarial examples in black-box settings. The key innovation lies in modeling two primary sources of transfer failure—decision-boundary variation and representation drift—as parameter and representation perturbations on a surrogate model. The authors formalize transferability as a distributionally robust optimization (DRO) problem over an uncertainty set of plausible target models and provide efficient first-order approximations with theoretical guarantees. Experiments on ImageNet and CIFAR-100 demonstrate that CORTA outperforms state-of-the-art transfer-based attacks, including ensemble methods, across diverse architectures (CNNs and transformers). Notably, CORTA achieves a 19.1% higher transfer success rate than the strongest baseline when transferring from ResNet-18 to Swin-B on CIFAR-100.

Strengths and Weaknesses

Strengths

  • Formulation: The dual-channel perturbation framework (parameter and representation) is a new perspective on adversarial transferability. The DRO formulation is theoretically grounded and well-motivated. The first-order approximations (linearization for parameters, Monte Carlo for representation blending) are computationally efficient and supported by provable bounds.
  • Experiments: CORTA consistently outperforms baselines, including ensemble methods, across diverse architectures (CNNs, transformers) and datasets (ImageNet, CIFAR-100). Its effectiveness against defended models (e.g., adversarial training, input transformations) highlights its robustness.
  • Analysis: The paper provides formal guarantees (Eq. 6) for the worst-case loss upper bounds. Ablation studies validate the contributions of each component (parameter regularization, feature blending).

Weaknesses

  • CORTA requires second-order derivatives (per-sample Hessians), which limits batch parallelization. While the authors argue the cost is manageable, this could hinder scalability for very large models.
  • CORTA’s optimization success rate on the surrogate (69.9%) is lower than baselines (e.g., ANDA: 96.3%). The authors justify this by emphasizing target-model success, but this trade-off deserves deeper analysis.
  • Although the performance is stable within tested ranges (e.g., β∈[0.01,0.1]), the paper does not explore why these ranges work best or how they generalize to other tasks.

Questions

  1. How does CORTA’s computational cost scale with model size (e.g., ViT-Large)? Could approximations mitigate this?

  2. Optimization Trade-off: Why does CORTA’s surrogate success rate drop compared to baselines? Is this inherent to the DRO formulation?

  3. Hyperparameter generalization: (1) Are the chosen hyperparameters (e.g., β=0.1) task-specific, or can they be generalized? An analysis on additional datasets would help. (2) eps = 16/255 is quite large in AEs. What are the results for typical, smaller values of eps?

Limitations

Yes

Final Justification

The review is fair. I've responded to authors' rebuttal as well.

Formatting Issues

no

Author Response

Q.1 How does CORTA's computational cost scale with model size (e.g., ViT-Large)? Could approximations mitigate this?

A.1 The computational cost of CORTA is closely related to the model size. As the model size increases, the time required to optimize each image also increases. The following table illustrates the optimization time per image using various surrogate models on ImageNet:

| Surrogate | ResNet-18 | ResNet-50 | ResNet-152 | ViT-Tiny | ViT-Base | ViT-Large |
| --- | --- | --- | --- | --- | --- | --- |
| Time (s) | 1.7 | 2.3 | 4.8 | 2.3 | 2.6 | 5.1 |

As shown in the table, the smallest model (ResNet-18) requires only 1.7 seconds of optimization time per image, while the largest model (ViT-Large) requires 5.1 seconds. Notably, even with the ViT-Large model, CORTA’s computational time remains within an acceptable range compared to ensemble methods that utilize multiple smaller models (Ensemble: 5.2s, AdaEA: 18.8s). Furthermore, CORTA achieves high transfer success rates even when using smaller models.

Thank you for the valuable suggestion regarding the use of approximations to mitigate computational costs. Reducing computational overhead is indeed a central focus of our ongoing research. In particular, we are actively exploring gradient approximation techniques to minimize computational expenses. Our goal is to develop a more lightweight approach for generating adversarial examples, while still maintaining high effectiveness.

Q.2 Optimization Trade-off: Why does CORTA's surrogate success rate drop compared to baselines? Is this inherent to the DRO formulation?

A.2 CORTA’s surrogate success rate is lower than that of baseline attacks mainly due to two factors inherent to its DRO formulation: (1) CORTA optimizes two objectives—representation and parameter channels—rather than a single objective, increasing optimization difficulty; and (2) the feature blending operation, which incorporates the original sample’s latent features, can interfere with the adversarial objective and further reduce success rates on the surrogate model.

However, this lower surrogate success rate is not a practical issue. Attackers can simply discard unsuccessful adversarial examples using the surrogate model and retain only those that succeed, which tend to have higher transfer success rates. In practice, this only slightly increases computational cost, as generating the same number of successful examples requires optimizing about 1.43 times more samples (e.g., 100/69.9 compared to attacks with 100% surrogate success rate).

Q.3 Hyperparameter generalization: (1) Are the chosen hyperparameters (e.g., β = 0.1) task-specific, or can they be generalized? An analysis on additional datasets would help. (2) eps = 16/255 is quite large in AEs. What are the results for typical, smaller values of eps?

A.3 (1) β is chosen to balance the magnitudes of the two optimization losses in the surrogate model, ensuring that their contributions are comparable. Consequently, β is not task- or dataset-specific, but rather related to the model architecture. For example, we used β = 0.1 for ResNet-18 on both ImageNet and CIFAR-100. Additionally, our hyperparameter sensitivity analysis (see Figure 1 and Line 297 of the paper) demonstrates that β yields stable results within the range [0.01, 0.1].

(2) We selected ε = 16/255 in our main experiments to align with baseline methods such as Admix, DHF, and BFA, since black-box attacks are considerably more challenging than white-box attacks. To further assess the robustness and generalizability of our method, we also conducted experiments with a smaller, more typical value of ε = 8/255.

For these experiments, we used ResNet-18 as the surrogate model on the ImageNet dataset, setting ε = 8/255, step size α = 0.8/255, and number of iterations T = 100. The table below presents the Transfer Success Rate (TSR) of CORTA and various baseline methods under these settings:

| Method | RN-50 | WRN-101 | BiT-50 | BiT-101 | Avg (CNN) | ViT-B | DeiT-B | Swin-B | Swin-S | Avg (ViT) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Admix | 68.8 | 60.9 | 58.2 | 48.3 | 59.1 | 13.3 | 18.8 | 15.6 | 19.3 | 16.8 |
| Ens | 47.0 | 43.3 | 44.5 | 37.2 | 43.0 | 26.7 | 39.7 | 15.5 | 20.9 | 25.7 |
| AdaEA | 48.7 | 43.8 | 43.2 | 35.4 | 42.8 | 23.6 | 39.0 | 15.2 | 20.0 | 24.5 |
| DHF | 86.8 | 79.2 | 75.5 | 65.5 | 76.8 | 21.3 | 31.2 | 22.6 | 31.5 | 26.7 |
| BFA | 87.3 | 80.6 | 70.8 | 63.3 | 75.5 | 14.0 | 23.5 | 20.0 | 26.6 | 21.0 |
| ANDA | 77.4 | 65.9 | 64.6 | 52.2 | 65.0 | 17.8 | 27.4 | 19.4 | 22.5 | 21.8 |
| CORTA (Ours) | 86.9 | 83.8 | 84.3 | 72.6 | 81.9 | 25.3 | 39.2 | 28.2 | 36.5 | 32.3 |

The results show that even with a reduced perturbation budget (ε = 8/255), CORTA consistently outperforms baseline methods. This demonstrates that our approach maintains strong attack effectiveness within a smaller perturbation range, further validating the robustness and generalization capability of CORTA.

Comment

Thank you for your detailed responses. I appreciate the clarifications provided.

Given that this paper focuses on decision-boundary variation and representation drift when transferring from surrogate to target models, including CNN-to-transformer transfer, I found and would like to point out a recent related work: "Enabling Heterogeneous Adversarial Transferability via Feature Permutation Attacks" (2025). That work also tackles cross-architecture transfer, and includes MLPs in addition to CNNs and transformers.

While your paper and theirs adopt different methodologies, both address similar core challenges. A more principled comparison between the two, for example by highlighting distinctions in assumptions, objectives, and outcomes/performance, could enhance your contribution. I understand this may be beyond the time limit of rebuttal, but I encourage the authors to consider including such a discussion, even briefly, in the camera-ready version (should the paper be accepted). It would broaden the context of your findings and further strengthen the impact of the work.

Comment

Thank you for bringing the Feature Permutation Attack (FPA) to our attention. We were previously unaware of this work. Below, we provide a brief comparison between CORTA and FPA; a more detailed comparison will be included in the revised paper. As the source code for FPA has not been released, we will also attempt to implement FPA ourselves and experimentally compare its performance with ours in the revised version.

Both FPA and CORTA aim to generate transferable adversarial examples. FPA specifically targets the transfer from CNN-based surrogate models to ViT/MLP-based target models. In contrast, CORTA is designed for general transferability, supporting transfers between any surrogate and target model architectures, including CNN to CNN/ViT, and ViT to CNN/ViT. Furthermore, CORTA enables the use of ensembles comprising various architectures (e.g., CNN + ViT surrogate models) to generate transferable adversarial examples (see A2 for Reviewer bSRq).

The two approaches are grounded in fundamentally different mechanisms. FPA is a purely heuristic method that relies on permutation at a feature layer of a CNN-based surrogate model to bridge the gap between the local receptive fields of CNNs and the global attention mechanisms of ViTs and MLPs, thereby enhancing transferability from CNN-based surrogates to ViT/MLP targets. In contrast, CORTA adopts a more systematic approach by explicitly modeling the discrepancies between surrogate and target models—specifically, differences in decision boundaries and latent representations. This leads to the formulation of transferable adversarial example generation as a distributionally robust optimization (DRO) problem, which is then simplified into a practical solution with accompanying theoretical analysis, thereby offering stronger theoretical guarantees.

The experimental results reported in this rebuttal and in the FPA paper offer a preliminary comparison of their performance. Our findings indicate that CORTA achieves higher transferability than FPA when transferring from CNN to CNN and from CNN to ViT. Moreover, CORTA supports two additional scenarios not addressed by FPA: (1) CORTA exhibits substantially higher transferability to ViT targets when using a ViT surrogate model, and (2) when employing an ensemble of CNN and ViT surrogate models, CORTA demonstrates high transferability to both CNN and ViT target models.

Final Decision

This paper proposes a transfer attack method named CORTA (consensus-robust transfer attack) to mitigate the two factors that hinder transfer attack: decision-boundary variation and representation drift.

The reviewers appreciate the strengths of the paper: (Gq3G) the dual-channel perturbation framework is a new perspective, the DRO formulation is theoretically grounded, CORTA outperforms baselines and provides formal guarantees; (wbsA) the method effectively addresses adversarial transferability via dual-channel surrogate modeling within a consensus-robust optimization framework, is novel and theoretically grounded, with first-order approximations ensuring scalability and provable guarantees; (Z58m) fast, with significantly better performance; (bSRq) rigorous theoretical analysis and an innovative conceptualization of transferability.

The reviewers also find weaknesses in the paper: (Gq3G) reliance on second-order derivatives and a lower success rate on the surrogate than baselines; (Z58m) weak theoretical analysis and no real data for the optimization success rate; (bSRq) no error bars, a discrepancy between the surrogate models used by the compared methods, and no ablation study.

After the discussion, many concerns were resolved and clarified (e.g., ImageNet evaluation, error bars, ablation study), and the reviewers agreed upon accepting the paper, as its dual-channel framework and DRO formalization are expected to make a clear contribution to the community.