PaperHub
Overall score: 8.2 / 10
Spotlight · 3 reviewers
Ratings: 5, 5, 5 (min 5, max 5, std 0.0)
Confidence: 3.0
Novelty: 2.7 · Quality: 2.7 · Clarity: 2.7 · Significance: 2.7
NeurIPS 2025

Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness

OpenReview · PDF
Submitted: 2025-05-11 · Updated: 2025-10-29

Abstract

Keywords

Adversarial Robustness · Symmetry · Equivariance

Reviews and Discussion

Review
Rating: 5

This paper investigates how incorporating group-equivariant convolutions (rotation and scale) into CNNs can improve adversarial robustness without adversarial training. The authors provide a theoretical analysis showing that equivariant architectures provide stronger robustness guarantees through the CLEVER framework. The evaluation is carried out with diverse CNN architectures integrating equivariant layers. Experiments on CIFAR-10/100/10C demonstrate an increase in adversarial robustness against FGSM and PGD attacks.

Strengths and Weaknesses

Strengths

  • The paper is well-structured with clear sections
  • Provides rigorous theoretical analysis with formal proofs linking equivariance to adversarial robustness through CLEVER bounds
  • The work addresses an interesting question from a novel architectural perspective
  • Code provided is clear and complete

Weaknesses

  • The mathematical treatment lacks clarity for scale equivariance. Some formulations are unclear to me and appear to work only for rotations and not scaling (e.g., line 192, I think a scaling term should be present).
  • The evaluation is limited to small datasets (CIFAR variants), which raises questions about generalization to larger, more complex datasets.
  • The related work cites equivariant networks (G-CNNs, Harmonic Networks, Steerable CNNs), but the authors develop their own simplified models instead of using these existing architectures.
  • No direct comparison with adversarial training is provided, making it difficult to assess the relative benefits of the architectural approach.

Typo: line 273: C4 instead of P4.

Questions

  • Could you clarify the theoretical treatment of scale equivariance?
  • Can you provide insight on how your approach would perform if integrated into equivariant network architectures from the literature?
  • What is the computational overhead of group convolutions compared to standard convolutions?

Limitations

yes

Final Justification

After the authors' rebuttal and discussions, I am updating my recommendation to Accept.

Issues Resolved:

  • Mathematical clarity on scale equivariance: One of my primary concerns was the lack of clarity in the mathematical treatment of scale equivariance. The authors provided a comprehensive response addressing this issue and committed to adding clearer explanations and formal definitions in the revised manuscript.
  • Experimental validation: The authors conducted additional experiments with larger datasets as requested. The new comparison with adversarial training methods provides valuable information on the method's effectiveness.
  • Architectural considerations: The productive discussion with Reviewer 1 regarding architectural choices led to new experiments that convincingly increase the overall quality of the paper.

Weight Assignment: I assign high weight to the resolution of the mathematical clarity issue (50%) and the strengthened experimental validation (30%). The architectural robustness demonstration carries moderate weight (20%).

Formatting Concerns

I have no concerns about paper formatting.

Author Response

We thank this reviewer for the thoughtful and constructive feedback.


Weakness 1:

The mathematical treatment lacks clarity for scale equivariance. Some formulations are unclear to me and appear to work only for rotations and not scaling (e.g., line 192, I think a scaling term should be present).

Answer:
We appreciate this insightful question.

While the theoretical analysis in Section 4 effectively establishes robustness guarantees for group-equivariant architectures under the assumption of norm-preserving transformations, such as rotations with orthogonal Jacobians, the extension to scale-equivariant models requires additional consideration.

Scale transformations, unlike rotations, do not preserve norms. Specifically, a scaling transformation $x \mapsto \alpha x$ has a Jacobian $D_\alpha = \alpha I$, meaning that the gradient norm transforms as $\|D_\alpha v\| = \alpha \|v\|$. This directly violates the assumption used in Lemma 1 and Theorem 1, which relies on the orthogonality of both the group representation $\rho(g)$ and the transformation Jacobian $D_{g^{-1}}$.

As a result, the CLEVER-certified robustness bounds derived under the assumption of norm invariance do not apply directly to scale-equivariant networks. Instead, these bounds must be rescaled to account for the effect of dilation on the gradient magnitude. In particular, the Lipschitz constant of the margin function becomes scale-dependent, and its growth must be explicitly quantified.

Despite this, scale-equivariant networks still promote adversarial robustness through a different mechanism: orbit-averaged gradient smoothing. By aggregating gradients across scaled versions of the input, scale-equivariant architectures reduce high-frequency fluctuations and suppress gradient sensitivity to perturbations that deviate from the scale-induced orbit. This aggregation process, while not norm-preserving, effectively stabilizes model behavior under scale transformations and yields smoother decision boundaries.

To formalize this, we can define the orbit-averaged gradient field over a discrete scale group $G_s = \{\alpha_1, \ldots, \alpha_k\} \subset \mathbb{R}^+$ as
$$\bar{\phi}_j(x) = \frac{1}{|G_s|} \sum_{\alpha \in G_s} \nabla f_j(\alpha x),$$
where each term captures the gradient at a different scale-transformed version of the input. Although these gradients vary in norm due to scaling, the averaging process reduces local gradient variance, contributing to robustness. This is particularly effective when combined with architectural fusion mechanisms, such as channel-wise concatenation or weighted summation of multi-scale features.
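For illustration, a minimal PyTorch sketch of this orbit-averaged gradient is given below. It is not the paper's implementation: the model handle, class index, and scale set are placeholder assumptions, spatial rescaling plays the role of $\alpha x$, and each rescaled input is mapped back to the original resolution so the gradients can be averaged in a common space.

import torch
import torch.nn.functional as F

def orbit_averaged_gradient(model, x, class_idx, scales=(0.8, 1.0, 1.25)):
    # Approximate phi_bar_j(x): average of grad_x f_j(alpha x) over a
    # discrete scale group, using spatial rescaling as the group action.
    grads = []
    for alpha in scales:
        x_s = F.interpolate(x, scale_factor=alpha, mode="bilinear", align_corners=False)
        # Map back to the original resolution so all gradients share one shape.
        x_s = F.interpolate(x_s, size=x.shape[-2:], mode="bilinear", align_corners=False)
        x_s = x_s.detach().requires_grad_(True)
        logit = model(x_s)[:, class_idx].sum()
        grads.append(torch.autograd.grad(logit, x_s)[0])
    return torch.stack(grads).mean(dim=0)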

While scale-equivariant models do not satisfy the same Jacobian norm invariance as rotation-equivariant models, they still enhance robustness by regularizing the gradient landscape across scales.


Weakness 2:

The evaluation is limited to small datasets (CIFAR variants)

Answer:
We initially selected CIFAR-10, CIFAR-100, and CIFAR-10C as our primary benchmarks to facilitate controlled evaluation across architectural variants, attack settings, and theoretical alignment, all within a feasible computational budget.
To that end, we have conducted additional experiments on the ImageNet-100 subset, a 100-class subset of ImageNet commonly used for mid-scale evaluation. Our results demonstrate that group-equivariant architectures continue to yield consistent improvements in adversarial robustness on this more challenging dataset, while maintaining manageable computational overhead.

Adversarial Robustness on ImageNet-100 – 4-layer Equivariant Model

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       25.36                14.52
0.02       19.94                13.58
0.03       18.36                12.24
0.04       17.56                10.74
0.05       17.16                9.12
0.06       16.84                7.96
0.07       16.62                6.96
0.08       16.48                6.22
0.09       16.32                5.44
0.10       16.22                4.90
0.30       11.00                3.30

Adversarial Robustness on ImageNet-100 – 10-layer Equivariant Model

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       37.62                17.92
0.02       25.84                12.25
0.03       25.41                8.95
0.04       25.18                5.96
0.05       25.00                5.48
0.06       24.84                4.22
0.07       24.76                2.98
0.08       24.70                2.84
0.09       22.34                2.72
0.10       21.26                2.64
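For reference, the FGSM columns in tables like these are obtained by perturbing each test image one step along the sign of the loss gradient and measuring accuracy on the perturbed inputs. The sketch below is a generic evaluation loop under that standard definition (the model, data loader, and the assumption of inputs in [0, 1] are placeholders), not our exact evaluation code.

import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon, device="cuda"):
    # Accuracy under a single-step FGSM attack with L_inf budget epsilon.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0)  # assumes inputs in [0, 1]
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total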


Weakness 3:

the authors develop their own simplified models instead of using these existing architectures.

Answer:
This is a valuable observation. While our architectural design draws inspiration from established equivariant models such as G-CNNs, Harmonic Networks, and Steerable CNNs, we intentionally adopt a modular and interpretable architecture to facilitate controlled experimentation and analysis. Specifically, our design allows for the systematic ablation of individual symmetry-enforcing components—e.g., rotation-only, scale-only, and different fusion strategies (parallel vs. cascaded)—which is more difficult to achieve in tightly integrated models like Steerable or Harmonic CNNs. This level of interpretability and control is a key motivation behind our simplified implementation.


Weakness 4:

No direct comparison with adversarial training is provided.

Answer:
This is a fair and valuable critique. The table below compares the adversarial robustness of a Standard CNN with adversarial training against a Fully Equivariant G-CNN without adversarial training on CIFAR-10. Remarkably, the G-CNN achieves comparable or even superior performance under certain perturbation levels, highlighting the intrinsic robustness conferred by equivariance alone.

Epsilon    FGSM (%) - Standard CNN    PGD (%) - Standard CNN    FGSM (%) - G-CNN    PGD (%) - G-CNN
0.01       74.5                       67.0                      73.01               64.96
0.02       70.2                       60.4                      70.16               58.87
0.03       66.1                       54.0                      67.09               52.37
0.04       61.7                       48.3                      63.77               45.52

Question 1:

Could you clarify the theoretical treatment of scale equivariance?

Answer:
Please see the response to Weakness 1.


Question 2:

Can you provide insight on how your approach would perform if integrated into equivariant network architectures from the literature?

Answer:
Our modular framework—particularly the parallel and cascaded fusion designs—is architecture-agnostic and readily extensible to more expressive equivariant models. In principle, the rotation-equivariant branch used in our current implementation could be replaced with more advanced architectures such as Harmonic Networks, Steerable CNNs, or LieConv, which provide continuous and steerable representations under group transformations.

Integrating such models may further enhance the robustness and representation capacity of our approach, especially in tasks requiring fine-grained geometric invariance. However, these architectures typically come with increased computational overhead and implementation complexity, which may limit their practicality in some settings.

We will experiment with steerable convolutional variants within our framework. These extensions, along with scalability and implementation considerations, are discussed in the revised manuscript’s Future Work section.


Question 3:

What is the computational overhead of group convolutions compared to standard convolutions?

Answer:
It is worth noting that in our parallel architecture, equivariant branches are introduced only at the first layer of the network. This design ensures that the computational overhead introduced by group-equivariant operations does not scale with network depth, allowing us to retain the benefits of symmetry enforcement while maintaining computational efficiency across deeper models.

Empirically, we find that rotation-equivariant convolutions based on the $\mathrm{P4}$ group introduce approximately 1.5× more FLOPs than standard convolutional layers with equivalent dimensions, primarily due to additional orientation channels. Scale-equivariant branches, which process multiple rescaled versions of the input, incur modest additional runtime and memory overhead—typically around 1.7×—owing to interpolation and multi-scale channel stacking.

In the full parallel G-CNN configuration, which includes standard, rotation-, and scale-equivariant branches, the total computational cost scales roughly linearly with the number of branches. However, because each branch can be made shallow or low-dimensional, our design supports flexible, budget-aware trade-offs between robustness, representation capacity, and efficiency.

We will provide detailed runtime and FLOP analysis in the Appendix.
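As a rough illustration of how such overhead numbers can be sanity-checked, one can compare parameter counts and forward-pass time between a standard convolution and a p4 (C4-rotation) group convolution of matching width, e.g. with the e2cnn library. The snippet below is a minimal sketch with arbitrary layer sizes, not the profiling setup behind the figures above.

import time
import torch
import torch.nn as nn
from e2cnn import gspaces
from e2cnn import nn as enn

# Standard convolution baseline (16 -> 32 channels).
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# p4 group convolution with comparable output width (8 regular fields x |C4| = 32 channels).
gspace = gspaces.Rot2dOnR2(N=4)
in_type = enn.FieldType(gspace, 16 * [gspace.trivial_repr])
out_type = enn.FieldType(gspace, 8 * [gspace.regular_repr])
gconv = enn.R2Conv(in_type, out_type, kernel_size=3, padding=1)

x = torch.randn(8, 16, 32, 32)
gx = enn.GeometricTensor(x, in_type)

def avg_time(layer, inp, reps=20):
    layer(inp)  # warm-up
    start = time.time()
    for _ in range(reps):
        layer(inp)
    return (time.time() - start) / reps

print("params  standard / group:", sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in gconv.parameters()))
print("time(s) standard / group:", avg_time(conv, x), avg_time(gconv, gx))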

Comment

Thank you for the thorough rebuttal and additional experimental results. I am satisfied with how the authors have addressed my concerns and, with the clarifications and results added in the manuscript, I'm ready to increase my score.

Thank you for clarifying that scale-equivariant networks achieve robustness through a different mechanism than rotation-equivariant ones. I suggest explaining this distinction in the final paper to help readers understand the different theoretical foundations for each type of equivariance. The ImageNet-100 experiments demonstrate that the approach scales, and the direct comparison with adversarial training shows the practical value of the method.

I also found the response to Reviewer 1 very informative and the modular design rationale is now clear.

Comment

Thank you for your thoughtful and encouraging comments. We're pleased that the clarifications and additional results addressed your concerns and that you're inclined to increase your score.

We especially appreciate your suggestion to elaborate on the distinct robustness mechanisms underlying scale- and rotation-equivariant models, a point that is currently underemphasized in the draft. We will make sure to highlight this theoretical distinction more clearly in the revised manuscript. We're also pleased that the scalability results on ImageNet-100 and the comparison with adversarial training helped demonstrate the practical relevance of our approach.

Lastly, we’re grateful for your recognition of our modular design rationale and for your engagement in the review.

Review
Rating: 5

The paper looks at the role of symmetry-aware network architectures in robustness to adversarial attacks, in particular group-equivariant convolutions (rotation- and scale-equivariant layers). It proposes two symmetry-aware architectures: a parallel model, processing standard and equivariant features separately before fusion, and a cascaded model, applying equivariant layers sequentially. Theoretically, it provides analyses showing reduced hypothesis space complexity and improved robustness bounds. Empirical evaluations on three datasets demonstrate improvements in robustness under FGSM and PGD attacks without relying on adversarial training.

Strengths and Weaknesses

Strengths

The architecture uses group-equivariant convolutions to incorporate symmetry priors into CNNs. It provides theoretical analysis, demonstrating that equivariant architectures regularize gradient behavior and lead to tighter certified robustness bounds.

Empirical results show improved robustness and generalization across several standard benchmarks (CIFAR-10, CIFAR-100, CIFAR-10C).

The parallel architecture outperforms other designs, providing useful insight into choosing an architecture for symmetry.

Overall well written. No major typos or errors in the key theoretical equations, lemmas, or proofs.

Weaknesses

The paper’s entire theoretical robustness framework relies on local Lipschitz bounds. While reasonable, this limits the theoretical analysis to local-neighborhood robustness. It does not guarantee robustness against global adversarial perturbations or more general forms of distribution shifts, limiting the practical scope.

Models are tested only on relatively small datasets. It is unclear whether the approach is scalable to larger, more complex datasets (e.g., ImageNet).

There is no discussion on computational cost introduced by equivariant transformations.

Figures 1 and 2 do not include error bars. Since the datasets are small, it would be better to train with multiple seeds; this is also important for reproducibility and reliability.

The motivation behind the chosen architectures in Section 6.1 seems arbitrary. It is not clear what the pros and cons are or how one would select a suitable architecture.

The theoretical analysis is for discrete and finite group transformations; would the results extend to continuous/Lie group transformations, which are common in computer vision tasks?

Questions

How much computational overhead do equivariant layers add compared to standard CNN layers?

How do robustness guarantees scale with network depth or complexity? Deep architectures may perform differently from shallow models; would this be true for robustness as well?

Limitations

Evaluation is limited to standard adversarial attacks, potentially missing broader adversarial settings.

The chosen datasets are relatively simple; it is unclear how performance would generalize to more complex scenarios. Equivariant architectures also incur additional computational cost, which is not reported.

Final Justification

In response to my comments, the authors have included additional results and promised to include error bars in the final version. I have therefore increased my score.

Formatting Concerns

No formatting concern

Author Response

We thank this reviewer for the thoughtful and constructive feedback.

Weakness 1: The paper’s entire theoretical robustness framework relies on local Lipschitz bounds.

Answer:
We agree with the reviewer that our theoretical analysis is focused on local robustness. While this is a common and well-accepted approach for formal robustness analysis, we acknowledge that it does not extend to global robustness or general distribution shifts.
Our primary goal in this work is to establish a precise and tractable theoretical connection between group equivariance and local certified robustness—specifically, how symmetry-enforcing architectures influence the model's local Lipschitz properties and decision margin sensitivity.
This framework serves as a mathematically grounded tool that complements empirical evaluation, offering insights into the role of architectural inductive bias in enhancing robustness. Extending our theoretical framework to global robustness, distributional robustness, or manifold-aware perturbations is an important and promising direction for future work.


Weakness 2: Models are tested only on relatively small datasets.

Answer:
We initially selected CIFAR-10, CIFAR-100, and CIFAR-10C as our primary benchmarks to facilitate controlled evaluation across architectural variants, attack settings, and theoretical alignment, all within a feasible computational budget.
To that end, we have conducted additional experiments on the ImageNet-100 subset, a 100-class subset of ImageNet commonly used for mid-scale evaluation. Our results demonstrate that group-equivariant architectures continue to yield consistent improvements in adversarial robustness on this more challenging dataset, while maintaining manageable computational overhead.

Adversarial Robustness on ImageNet-100 – 4-layer Equivariant Model

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       25.36                14.52
0.02       19.94                13.58
0.03       18.36                12.24
0.04       17.56                10.74
0.05       17.16                9.12
0.06       16.84                7.96
0.07       16.62                6.96
0.08       16.48                6.22
0.09       16.32                5.44
0.10       16.22                4.90
0.30       11.00                3.30

Adversarial Robustness on ImageNet-100 – 10-layer Equivariant Model

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       37.62                17.92
0.02       25.84                12.25
0.03       25.41                8.95
0.04       25.18                5.96
0.05       25.00                5.48
0.06       24.84                4.22
0.07       24.76                2.98
0.08       24.70                2.84
0.09       22.34                2.72
0.10       21.26                2.64

Weakness 3: There is no discussion on computational cost.

Answer:
Thank you for pointing this out. Please see our response to Question 1.


Weakness 4: Figures 1 and 2 do not include error bars.

Answer:
We appreciate this valuable suggestion. We have included error bars in the draft; due to limited space, the data is not included here.


Weakness 5: The motivation behind the chosen architectures.

Answer:
We appreciate the reviewer’s concern and agree that a clearer explanation of the architectural design choices is warranted.
To summarize:

  • The Parallel G-CNN offers modular symmetry enforcement by maintaining independent branches (e.g., standard and rotation-equivariant), which enhances robustness but increases model size and computational cost.
  • The Parallel G-CNN with Rotation- and Scale-Equivariant Branches extends this by incorporating additional symmetry priors and enabling more diverse feature extraction.
  • The Cascaded G-CNN provides a more compact design by stacking symmetry-aware layers sequentially, although this may introduce an information bottleneck if the early equivariant layer restricts feature expressiveness.
  • The Weighted Parallel G-CNN replaces hard feature concatenation with learnable fusion weights, offering a flexible mechanism to dynamically adapt to task-specific signal strength from each branch (a minimal sketch of this fusion is given after this list).
  • The Standard CNN serves as the baseline for comparison.
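To make the learnable fusion concrete, a minimal PyTorch sketch is given below. The module name, branch count, and the assumption that branch outputs share a common shape are illustrative, not the exact implementation used in the paper.

import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    # Fuse per-branch features with learnable, softmax-normalized weights
    # instead of hard concatenation (illustrative sketch).
    def __init__(self, num_branches):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_branches))

    def forward(self, branch_feats):
        # branch_feats: list of tensors with identical shape (N, C, H, W).
        w = torch.softmax(self.logits, dim=0)
        return sum(w[i] * f for i, f in enumerate(branch_feats))

In the parallel designs, branch outputs would first be projected to a common channel width before such a fusion step.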

Weakness 6: Theoretical analysis is for discrete and finite group transformations, would results extend to continuous/Lie group transformations which are common in computer vision tasks.

Answer:
This is an excellent observation. Our current theoretical framework is developed for finite discrete groups (e.g., P4 for discrete rotations), which simplifies the analysis and aligns with widely adopted G-CNN implementations in practice.
The extension to continuous groups requires generalizing the group action. For discrete groups, actions are defined over a finite set of transformations. In contrast, Lie groups possess a smooth manifold structure, and their actions on the input space are described by differentiable maps. To analyze equivariant robustness in this setting, we consider group elements in a neighborhood of the identity, represented as exponentials of Lie algebra elements. This provides a foundation for defining infinitesimal transformations—the continuous analogs of discrete group elements—and enables us to formulate sensitivity metrics along smooth group orbits. Extending orbit-averaged Jacobian smoothing techniques to Lie groups requires the use of Haar measures—the invariant integration measure on the group. For compact groups such as $\mathrm{SO}(2)$, this integration is well-defined and finite. For non-compact groups like $\mathrm{SE}(2)$ or $\mathbb{R}^2$, we must either impose bounded supports or consider localized versions of the group action.
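As a concrete illustration of Haar integration over a compact group, the orbit average over SO(2) can be approximated by Monte Carlo sampling of rotation angles. The sketch below is a rough illustration only (not the ongoing Lie-group experiments): it rotates the input by uniformly sampled angles, which approximates the invariant Haar measure on the circle, and averages the resulting gradients.

import torch
import torchvision.transforms.functional as TF

def so2_orbit_averaged_gradient(model, x, class_idx, num_samples=8):
    # Monte Carlo estimate of the Haar-averaged gradient over SO(2):
    # sample rotation angles uniformly on the circle, rotate the input,
    # and average grad_x f_j(g . x) over the sampled group elements.
    grads = []
    for _ in range(num_samples):
        angle = float(torch.rand(1)) * 360.0
        x_rot = TF.rotate(x, angle).detach().requires_grad_(True)
        logit = model(x_rot)[:, class_idx].sum()
        grads.append(torch.autograd.grad(logit, x_rot)[0])
    return torch.stack(grads).mean(dim=0)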


Question 1: How much computational overhead?

Answer:
It is worth noting that in our parallel architecture, equivariant branches are introduced only at the first layer of the network. This design ensures that the computational overhead introduced by group-equivariant operations does not scale with network depth, allowing us to retain the benefits of symmetry enforcement while maintaining computational efficiency across deeper models.

Empirically, we find that rotation-equivariant convolutions based on the P4 group introduce approximately 1.5× more FLOPs than standard convolutional layers with equivalent dimensions, primarily due to additional orientation channels.
Scale-equivariant branches, which process multiple rescaled versions of the input, incur modest additional runtime and memory overhead—typically around 1.7×—owing to interpolation and multi-scale channel stacking.

In the full parallel G-CNN configuration, which includes standard, rotation-, and scale-equivariant branches, the total computational cost scales roughly linearly with the number of branches. However, because each branch can be made shallow or low-dimensional, our design supports flexible, budget-aware trade-offs between robustness, representation capacity, and efficiency.


Question 2: How do robustness guarantees scale with network depth or complexity?

Answer:
This is an excellent and important question. From a theoretical perspective, the gradient regularization induced by equivariance propagates through the layers of a network, preserving certain structural properties of the Jacobian and contributing to robustness.
However, in deeper architectures, the influence of symmetry constraints may become diluted, particularly if a large portion of the network consists of standard layers.

Although full-scale experiments on very deep models were constrained by available computational resources, we conducted additional evaluations on ResNet-18 and ResNet-50 architectures to explore scalability. These experiments indicate that while the absolute robustness tends to decrease with depth—likely due to increased model complexity and overfitting—the equivariant models (G-CNN variants) continue to outperform their standard CNN counterparts.
This confirms that the robustness advantage of equivariant design remains valid even as model depth increases, though the relative gain may be reduced.

ResNet50_GCNN on CIFAR-100

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       32.09                11.58
0.02       27.18                2.04
0.03       23.21                0.47
0.04       19.14                0.14
0.05       15.15                0.03
0.06       11.75                0.01
0.07       8.81                 0.01
0.08       6.31                 0.00
0.09       4.58                 0.00
0.10       3.54                 0.00

ResNet18_GCNN on CIFAR-100

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       37.34                16.85
0.02       31.19                2.73
0.03       26.55                0.34
0.04       22.72                0.05
0.05       19.70                0.00
0.06       16.98                0.00
0.07       14.94                0.00
0.08       13.43                0.00
0.09       11.89                0.00
0.10       10.33                0.00
Comment

I appreciate the effort of authors in responding to my comments. My further comments are below:

Reported PGD accuracies on ImageNet-100 remain very low in absolute terms. Without side-by-side baselines on the same subset, it’s impossible to judge any real advantage from equivariance at scale.

The proposed roadmap for Lie-group equivariance is reasonable, but there are no toy experiments or empirical validations to show it’s actually feasible.

In ResNet-50 experiments, robust accuracy falls to near zero under moderate attacks. Although equivariant models stay slightly ahead, the absolute drop undermines claims of robustness in deeper nets.

I hope the authors will include error bars in the revision as promised. Given the response to my comments, I have increased my overall score.

Comment

Thank you for your thoughtful follow-up and for acknowledging our efforts with an increased score.

Regarding the discussion of Lie-group equivariance, we agree that empirical validation is important and are currently conducting experiments on Lie-group equivariant models. Hopefully, the results will be available by the end of the discussion phase. That said, we would like to emphasize that Lie-group equivariance is theoretically more expressive and generalizable than its discrete counterparts. Continuous equivariant models can more faithfully capture smooth symmetries present in natural data and allow for finer-grained transformations, which we believe may lead to improved adversarial robustness as well. We are currently exploring implementations of continuous group equivariance and will highlight this in the revised manuscript.

We apologize for omitting the adversarial robustness results of the CNN model on ImageNet-100 in our previous response. Please find the results attached below. As noted, in these equivariant experiments, equivariant design was only applied to the first layer of the network.

Epsilon    4L CNN (FGSM / PGD)    10L CNN (FGSM / PGD)
0.01       7.18 / 0.14            29.34 / 7.68
0.02       2.66 / 0.04            20.80 / 7.02
0.03       1.48 / 0.02            17.72 / 6.52
0.04       1.12 / 0.00            16.38 / 5.08
0.05       0.86 / 0.00            15.56 / 4.74
0.06       0.74 / 0.00            15.30 / 2.41
0.07       0.66 / 0.00            14.92 / 1.22
0.08       0.56 / 0.00            14.60 / 0.08
0.09       0.52 / 0.00            14.16 / 0.02
0.10       0.48 / 0.00            13.84 / 0.00

We also acknowledge the concern regarding the significant robustness drop in deeper models such as ResNet-50. It is important to clarify that, in our current 50-layer model, the equivariant design is applied only to the first layer. While the partially equivariant variants do show relative improvements, we will revise our claims to more accurately reflect the limitations and trade-offs associated with using non-fully equivariant designs in deep networks. We confirm that error bars will be included in the revision.

We appreciate your constructive feedback, which has helped us significantly improve the paper.

Comment

Thank you for highlighting this important point. We agree that applying equivariance only at the first layer of deep architectures such as ResNet-50 represents a limited form of integration and may not fully leverage the benefits of symmetry.

In this work, we show that equivariant models consistently outperform baseline CNNs of the same depth in terms of adversarial robustness. Our primary focus is to theoretically explore the relationship between symmetry-based architectural priors and adversarial robustness, aiming to understand how incorporating equivariance influences model vulnerability.

While increasing model depth typically improves clean accuracy, it also tends to amplify vulnerability to adversarial perturbations such as FGSM and PGD—a well-documented phenomenon in deep neural networks. This increased susceptibility is partly due to the presence of sharper and more concentrated gradients in deeper networks, which adversarial methods are better able to exploit. This phenomenon likely explains the steep drop in robust accuracy observed in our 50-layer model—where equivariance is applied only in the first layer—and aligns with our claim in the rebuttal regarding the limitations of partial equivariant integration.

Due to computational constraints, our experiments have so far focused on shallower models (e.g., 4-layer and 10-layer architectures), within which fully equivariant designs demonstrate substantial gains in robustness. Nonetheless, we recognize the importance of evaluating the scalability of equivariance in deeper networks. In response, we are currently conducting experiments with fully equivariant versions of deeper architectures (e.g., ResNet-18 and ResNet-50) to more rigorously assess the benefits of extending equivariance throughout the entire network.

We will explicitly acknowledge this limitation in the revised manuscript and include a dedicated section outlining the current design boundaries, their limitations for generalization, and directions for future work.

We appreciate your insightful observations, which help make our work more rigorous and comprehensive.

Comment

We have conducted additional experiments using fully equivariant EquiResNet-18 and EquiResNet-50 architectures, where equivariance is enforced throughout all convolutional layers using group-equivariant convolutions.

EquiResNet-50 – Adversarial Robustness

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       58.73                39.99
0.02       52.46                20.44
0.03       48.28                9.68
0.04       45.13                5.05
0.05       42.31                2.78
0.06       40.08                1.71
0.07       38.18                1.21
0.08       36.40                0.90
0.09       34.72                0.64
0.10       33.01                0.51

EquiResNet-18 – Adversarial Robustness

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       35.05                22.84
0.02       25.19                10.26
0.03       21.72                6.60
0.04       19.59                4.50
0.05       18.14                3.04
0.06       17.05                1.92
0.07       16.11                1.18
0.08       15.27                0.69
0.09       14.25                0.38
0.10       13.40                0.27

In the fully equivariant setting, EquiResNet-50, as the deeper model, outperforms EquiResNet-18 across a range of tested perturbation levels, consistent with the performance trends observed in the 4-layer and 10-layer fully equivariant models. These results highlight the importance of consistent symmetry enforcement across network depth and provide empirical evidence for the scalability and effectiveness of full equivariant integration in deeper architectures. Unlike the standard and partially equivariant model—where robust accuracy under PGD rapidly drops to near zero—the fully equivariant EquiResNet-18 and EquiResNet-50 models maintain higher robustness across the evaluated perturbation levels. Please note that these conclusions are based on the experiments with ResNet-18 and ResNet-50, and we will include these results and release the corresponding code in the updated draft.

Regarding the experiment with the continuous Lie group equivariant model, training is still ongoing due to the substantial computational overhead associated with continuous group transformations. Implementing and optimizing such models is non-trivial, as they involve complex operations—such as continuous convolutions and group integration—that are significantly more demanding than their discrete counterparts. To date, we have completed only 62 epochs over the course of more than two days of training. Despite our best efforts, the results are not yet available, but we plan to include them in the revised draft once training is complete.

We appreciate your comments on the above, which have helped strengthen our work and make it more rigorous and experimentally valid.

Comment

Thanks for further comments. I am looking forward to an improved version of manuscript with additional results including error bars.

I still notice the limited gains in the scalability of the model. Applying equivariance only to the first layer seems like a limitation. Your ResNet-50 results hit near-zero robust accuracy almost immediately under PGD. Perhaps there is not much value in applying equivariance only in the first layer. Yet the claims are about scalability to deep models.

I would suggest including a clear limitations section to help readers understand the broad applicability and generalizability of this work.

Comment

Please let the authors know whether their rebuttal has adequately addressed your concerns. If any issues remain, please communicate your specific, unresolved concerns as soon as possible to ensure timely discussion.

Review
Rating: 5

The authors present three key theoretical findings regarding the correspondence of equivariance and adversarial robustness: group-equivariant convolutions (1) maintain the Lipschitz constant across the group orbit, (2) yield smoother gradients, and (3) suppress gradients in the off-orbit directions. Experiments show that group-equivariant convolutions alone can improve adversarial robustness on CIFAR-10(C) and CIFAR-100.

Strengths and Weaknesses

Strengths

  • Two insightful theorems on invariance of the Lipschitz continuity across the group orbit (Theorem 1) and the suppression of off-orbit perturbations (Theorem 2) in group-equivariant convolutions are proven formally.
  • Experiments on CIFAR-10(C) and CIFAR-100 that incorporate a single group-equivariant layer show that adversarial robustness is significantly improved even without traditional methods such as adversarial training.

Weaknesses

  • Experiments are done with a mix of standard and group-equivariant convolutions, and the theoretical findings thus do not directly transfer to the investigated architectures.
  • No results from reference methods, such as standard adversarial robustness, are reported and it is thus difficult to put the reported numbers into context based on this paper alone.
  • The formal proofs assume differentiability at $x$, which is not necessarily a given in neural networks.

Questions

  • To what extent does the usage of the standard convolution branch in the architectures detailed in Appendix C.3 undermine the theoretical results presented in this paper, given that some of the layers used do not fulfill those theoretical guarantees?
  • Would you expect a purely group-equivariant convolutional model to exhibit the same benefits? I am willing to update my score if this is either discussed sufficiently or, ideally, shown experimentally.

Limitations

Limitations are discussed in Appendix B.

Final Justification

The authors present interesting findings on the interplay between equivariance to rotation and scale, and robustness. My initial concern was that their theoretical proofs were not sufficiently supported by experimental results, as these were limited to partially equivariant architectures. However, their rebuttal fully addressed these concerns. Combined with the additional results on larger models (ResNet-50) and datasets (ImageNet-100), I am raising my score to a solid "Accept" rating.

Formatting Concerns

Citations should be in parentheses in most cases.

Author Response

We thank this reviewer for the great suggestion!


Weakness 1

Experiments are done with a mix of standard and group-equivariant convolutions, and the theoretical findings thus do not directly transfer to the investigated architectures.

Answer:
We appreciate this important observation. Our theoretical analysis is developed under the assumption of purely group-equivariant models, where all layers respect the symmetry constraints imposed by the group action. However, as presented in Appendix C.3, our implemented architectures—particularly the parallel design—intentionally combine standard convolutional layers with group-equivariant branches.

This hybrid structure was chosen for two key reasons. First, it enables a balance between representational expressiveness and computational efficiency. Second, and more importantly for this work, it provides a controlled framework to isolate and evaluate the impact of symmetry-enhancing components on adversarial robustness. Our central aim is not to enforce full equivariance end-to-end, but to investigate how the inclusion of symmetry-aware submodules influences a model’s robustness to adversarial perturbations. While the hybrid architecture does not fully meet the assumptions of our theoretical framework, the equivariant branches themselves do. Their behavior aligns closely with the theoretical predictions derived under group-equivariant assumptions. This is empirically supported by our ablation studies in Appendix D.2, where models containing only rotation-equivariant or scale-equivariant branches consistently outperform standard CNN baselines in adversarial robustness, even in the absence of adversarial training. Thus, the hybrid architecture serves as a practical testbed for validating the theoretical insights and allows us to draw meaningful conclusions about the role of equivariance in improving model robustness.


Weakness 2

No results from reference methods, such as standard adversarial robustness, are reported.

Answer:
This is a fair and valuable critique. Our goal was to investigate the intrinsic robustness of symmetry-enforced architectures without adversarial training or data augmentation, in order to isolate the architectural contribution. The table below compares the adversarial robustness of a Standard CNN with adversarial training against a Fully Equivariant G-CNN without adversarial training on CIFAR-10. Remarkably, the G-CNN achieves comparable or even superior performance under certain perturbation levels, highlighting the intrinsic robustness conferred by equivariance alone.

Adversarial Robustness Comparison on CIFAR-10

Epsilon ($\ell_\infty$)    FGSM Accuracy (%) - Standard CNN    PGD Accuracy (%) - Standard CNN    FGSM Accuracy (%) - G-CNN    PGD Accuracy (%) - G-CNN
0.01                       74.5                                67.0                               73.01                        64.96
0.02                       70.2                                60.4                               70.16                        58.87
0.03                       66.1                                54.0                               67.09                        52.37
0.04                       61.7                                48.3                               63.77                        45.52
0.05                       57.3                                42.1                               60.23                        37.80

Weakness 3

The formal proofs assume differentiability at $x$, which is not necessarily a given in neural networks.

Answer:
We fully agree and appreciate the reviewer’s technical rigor. The theoretical results rely on local Lipschitz continuity and Jacobian-based reasoning, which strictly hold almost everywhere for ReLU networks due to their piecewise linearity. While differentiability may not hold at every point, prior works (e.g., [Weng et al., 2018], [Anselmi et al., 2019]) similarly adopt this assumption as a tractable and widely accepted approximation. We have clarified this point in the revised text, explicitly noting that our results apply under the standard assumption of almost-everywhere differentiability.

Weng, Tsui-Wei, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR 2018

Fabio Anselmi and Tomaso Poggio. Symmetry-adapted representation learning. Pattern Recognition 2019


Question 1

In how far does the usage of the standard convolution branch undermine most of the theoretical results...?

Answer:
Thank you for raising this important and nuanced concern. In our parallel architecture, the standard convolutional branch operates independently and is fused with the outputs of the equivariant branches only at later stages of the network. As such, it does not directly contribute to the theoretical guarantees derived under the assumption of full group equivariance.

However, the primary role of the standard branch is to complement the equivariant representations with additional expressive capacity rather than to interfere with or override the symmetry-induced regularization effects. To better understand the extent to which the inclusion of the standard branch affects the theoretical insights, we have conducted additional experiments (see response to the next question), where we compare models with and without the standard branch. These results help quantify the influence of each branch and assess how the theoretical benefits attributed to equivariance manifest in hybrid settings.


Question 2

Would you expect a purely group-equivariant model to exhibit the same benefits? I am willing to update my score if this is either discussed sufficiently or, ideally, shown experimentally.

Answer:
Yes—based on our theoretical framework, we indeed expect a fully group-equivariant model to offer even stronger robustness guarantees compared to hybrid architectures. To validate this expectation empirically, we have conducted additional experiments using both 4-layer and 10-layer models composed entirely of group-equivariant layers. These models were trained on CIFAR-10, CIFAR-100 to assess their performance across datasets of increasing complexity.

The experimental results (included below) demonstrate that purely group-equivariant models consistently achieve superior robustness under both FGSM and PGD attacks, outperforming the hybrid models presented in the main paper. These findings support the theoretical claim that enforcing full equivariance throughout the architecture enhances the model's ability to resist adversarial perturbations. We have added these results to the revised manuscript to further strengthen the empirical foundation of our analysis.


Adversarial Robustness of 4-layer Fully Equivariant Network on CIFAR-10

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       65.65                52.20
0.02       58.54                32.04
0.03       53.78                23.30
0.04       49.92                18.63
0.05       47.08                15.85
0.06       44.72                13.03
0.07       42.89                11.05
0.08       41.09                9.50
0.09       39.51                8.17
0.10       37.95                7.01

Adversarial Robustness of 4-layer Fully Equivariant Network on CIFAR-100

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       38.40                21.59
0.02       30.71                12.09
0.03       26.63                8.92
0.04       24.05                6.95
0.05       22.06                5.66
0.06       20.58                4.53
0.07       19.39                3.79
0.08       18.03                3.13
0.09       16.99                2.75
0.10       15.96                2.39

Adversarial Robustness of 10-layer Fully Equivariant Network on CIFAR-10

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       73.01                64.96
0.02       70.16                58.87
0.03       67.09                52.37
0.04       63.77                45.52
0.05       60.23                37.80
0.06       56.89                30.78
0.07       53.46                24.18
0.08       50.44                19.18
0.09       47.57                15.15
0.10       44.93                12.46

Adversarial Robustness of 10-layer Fully Equivariant Network on CIFAR-100

Epsilon    FGSM Accuracy (%)    PGD Accuracy (%)
0.01       50.60                36.29
0.02       45.42                27.98
0.03       42.02                21.34
0.04       38.83                16.02
0.05       36.09                12.01
0.06       33.33                9.29
0.07       30.94                7.31
0.08       28.52                5.87
0.09       26.41                4.73
0.10       24.68                4.08
Comment

Thank you for succinctly addressing my questions and providing suitable experimental results. I feel much more confident in the evaluation now and will thus increase my score from "borderline accept" to "accept". Regardless, I'd encourage the authors to revisit their messaging concerning the investigated models in either their camera-ready version or a manuscript submitted elsewhere. As other reviewers have also stated, it is not immediately clear how these architectures were derived, and I believe the overall structure would be much more convincing using the fully group-equivariant architectures presented in your rebuttal. The latter can also be adjusted to experiment with equivariance to rotation alone, scale alone, or a combination of both for further ablation studies.

Comment

We sincerely appreciate the time and attention you devoted to reading our article, and we are truly grateful for your positive feedback. Your endorsement, which will elevate our evaluation from “borderline accept” to “accept”, is both encouraging and deeply affirming of the significance of our work. Thank you for your thoughtful support.

In the additional experiments provided during the rebuttal, the fully group-equivariant model was constructed by sequentially stacking rotation-equivariant blocks, following the theoretical principles outlined in our main analysis. These blocks were implemented using the e2cnn library (Weiler & Cesa, 2019), which provides tools for building layers that are equivariant under specified group actions. Each layer in the model is composed of an R2Conv operation that maps between representations over the group, followed by an InnerBatchNorm to normalize feature responses in a symmetry-preserving manner. Nonlinear activation is achieved via ReLU applied in the group feature space. To enable spatial downsampling, each layer concludes with a PointwiseMaxPool operation.

The implementation of a single equivariant block is as follows:

from e2cnn import nn as enn

# One rotation-equivariant block: group convolution, equivariant batch
# normalization, pointwise ReLU, and spatial downsampling via max pooling.
self.block = enn.SequentialModule(
    enn.R2Conv(in_type, out_type1, kernel_size=3, padding=1),
    enn.InnerBatchNorm(out_type1),
    enn.ReLU(out_type1),
    enn.PointwiseMaxPool(out_type1, kernel_size=2)
)

This end-to-end equivariant design ensures alignment with the theoretical assumptions made in our robustness analysis.

We are currently conducting experiments with fully scale-equivariant models as well as combined rotation–scale equivariant architectures. Hopefully, the results will be available by the end of the discussion phase. In the revised manuscript, we will provide a clearer explanation of how these architectures are derived and describe how the underlying group structure can be adjusted to isolate rotation equivariance, scale equivariance, or their combination. This flexibility enables more targeted ablation studies while remaining consistent with our theoretical framework.

We believe these updates will strengthen the paper’s empirical contributions and improve its clarity. We sincerely appreciate your thoughtful feedback and guidance throughout the review process.

Reference

Weiler, Maurice, and Gabriele Cesa. General E(2)-Equivariant Steerable CNNs. Advances in Neural Information Processing Systems 32 (2019).

Comment

Thank you for further clarifying that the additional results you provided in your rebuttal were for fully rotation-equivariant models, and that you are planning on complementing these results with scale-equivariant models and rotation-scale-equivariant models. I remain confident that your paper is deserving of an "Accept" score and hope other reviewers will also find the time to read through your on-point responses, which I believe do address their concerns well.

Comment

Thank you for your continued confidence in our work and for affirming that it deserves an “Accept.” We’re pleased that our clarifications and additional results helped address your concerns and reinforced your confidence in our work.

We also hope the other reviewers will take the opportunity to engage with the updates, as many of the revisions directly address concerns raised across multiple reviews—for example, the comparison with adversarial training noted by Reviewer 3, and the computational considerations highlighted by two reviewers. We believe these additions meaningfully strengthen the paper and will help resolve any remaining questions from the broader review panel.

As mentioned, we have now completed additional experiments involving fully scale-equivariant and combined rotation–scale equivariant models. These new results—including ablation studies and architectural clarifications—will be incorporated into the revised version of the paper to further enhance its empirical depth and transparency.

Adversarial Robustness on CIFAR-10 (FGSM / PGD Accuracy %)

Epsilon    ScaleEq-4L       RotEq-4L         RotScaleEq-4L    ScaleEq-10L      RotEq-10L        RotScaleEq-10L
0.01       48.61 / 44.18    65.65 / 52.20    52.52 / 44.93    59.64 / 57.65    73.01 / 64.96    65.98 / 54.34
0.02       30.02 / 17.92    58.54 / 32.04    36.94 / 18.91    47.61 / 38.97    70.16 / 58.87    56.23 / 32.85
0.03       18.32 / 5.61     53.78 / 23.30    30.08 / 10.62    40.73 / 24.64    67.09 / 52.37    47.37 / 17.89
0.04       11.00 / 1.40     49.92 / 18.63    26.28 / 7.14     36.45 / 14.88    63.77 / 45.52    39.89 / 9.03
0.05       6.80 / 0.33      47.08 / 15.85    24.03 / 4.67     32.06 / 9.01     60.23 / 37.80    33.93 / 4.08

Adversarial Robustness on CIFAR-100 (FGSM / PGD Accuracy %)

Epsilon    RotEq-4L         ScaleEq-4L       RotScaleEq-4L    RotEq-10L        ScaleEq-10L      RotScaleEq-10L
0.01       38.40 / 21.59    21.14 / 16.13    25.25 / 20.84    50.60 / 36.29    27.59 / 26.26    28.53 / 15.30
0.02       30.71 / 12.09    9.78 / 2.15      13.77 / 6.60     45.42 / 27.98    19.52 / 14.94    22.19 / 4.97
0.03       26.63 / 8.92     5.12 / 0.36      8.92 / 2.14      42.02 / 21.34    13.86 / 7.95     16.35 / 1.25
0.04       24.05 / 6.95     3.10 / 0.04      6.43 / 0.81      38.83 / 16.02    9.96 / 4.42      11.46 / 0.33
0.05       22.06 / 5.66     1.95 / 0.00      5.14 / 0.38      36.09 / 12.01    7.56 / 2.28      8.10 / 0.07

Across all configurations, all the fully equivariant models consistently outperform the standard convolutional baseline under both FGSM and PGD attacks. Notably, while combining rotation and scale equivariance yields moderate improvements over scale-only models, it does not surpass the robustness achieved by rotation-equivariant networks alone. This suggests that while incorporating both symmetries has some advantages, the added complexity might not be fully leveraged without targeted or deeper architectural tuning.

Comment

Thank you for taking the time to complete these experiments. I believe they complement your theoretical findings well. That a combination of rotation- and scale-equivariance performs worse is an important limitation but also highlights an interesting avenue for future researchers to investigate. To prevent any potential confusion or misunderstanding, I want to clarify that I will raise my score when I submit my mandatory acknowledgement after the author-reviewer discussion period is over.

Comment

Thank you for your thoughtful feedback and for acknowledging the value of our additional experiments. We're encouraged to hear that you find the results complementary to our theoretical analysis. We also appreciate your clarification regarding your intent to raise the score after discussion.

Final Decision

The authors investigate how group-equivariant convolutions (rotation and scale equivariant layers) can enhance adversarial robustness without adversarial training. They present three key theoretical findings: equivariant convolutions (1) preserve the Lipschitz constant across group orbits, (2) produce smoother gradients, and (3) suppress off-orbit gradients. The theoretical analysis with formal proofs is impressive. All reviewers agree to accept this paper.