PaperHub
Rating: 4.8/10
Poster · 3 reviewers
Lowest 2 · Highest 3 · Std. dev. 0.5
Individual ratings: 3, 3, 2
ICML 2025

Pixel2Feature Attack (P2FA): Rethinking the Perturbed Space to Enhance Adversarial Transferability

Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

We shift the perturbation from pixel space to feature space and perturb important features multiple times along the direction of feature importance within the feature space.


Keywords
Transferability, Adversarial Example, AI Security

Reviews and Discussion

Review
Rating: 3

This paper introduces Pixel2Feature Attack (P2FA), a novel approach aimed at enhancing the transferability of adversarial examples in black-box attacks. Its main goal is to address the inefficiency of existing feature-level attacks, which perturb features multiple times in pixel space and therefore achieve limited transferability.

Questions for Authors

N/A

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Designs or Analyses

Yes

Supplementary Material

Yes

Relation to Broader Scientific Literature

N/A

Essential References Not Discussed

N/A

Other Strengths and Weaknesses

Strengths

The theoretical identification of the principle underlying feature-level attacks, which reveals the inefficiency of existing methods in disrupting important features.

The proposal of P2FA, which shifts perturbations from pixel space to feature space, improving the efficiency and transferability of adversarial examples.

Weaknesses

While the paper mentions using different feature importance assessment methods in the ablation study, it does not provide sufficient details on how these methods are implemented or how they differ from each other.

The paper does not evaluate performance on ViT backbones.

Other Comments or Suggestions

N/A

Author Response

We thank the reviewer for the valuable feedback and address each issue below.

Q1: Explanation of Feature Importance Assessment Methods

Sec. A.1 of the appendix briefly introduces the feature importance assessment methods used in different feature-level attacks. However, it does not give sufficient details on how these methods are implemented or how they differ. We provide a detailed explanation below.

  • FIA: In FIA, the feature importance $W$ can be expressed as

    $$W = -\bar{\Delta}_k^x = -\frac{\sum_{n=1}^N \Delta_k^{\mathcal{T}_n(x)}}{\left\| \sum_{n=1}^N \Delta_k^{\mathcal{T}_n(x)} \right\|_2}, \quad \mathcal{T}_n(x) = x \odot M_{p_d}^n, \quad M_{p_d}^n \sim \mathrm{Bernoulli}(1 - p_d),$$

    where $\bar{\Delta}_k^x$ denotes the aggregate gradient, $\Delta_k^x = \frac{\partial l(x, y)}{\partial f_k(x)}$, $f_k(x)$ is the feature map at the $k$-th layer, and $l(\cdot, y)$ denotes the logit output for the ground-truth label $y$. $M_{p_d}^n$ is a binary mask whose entries follow $\mathrm{Bernoulli}(1 - p_d)$, where $p_d$ is the random pixel-dropping rate. In other words, the aggregate gradient $\bar{\Delta}_k^x$ is obtained in three steps: apply a batch of random pixel-dropping transformations to the original image, sum the gradients with respect to the feature maps of the transformed images, and normalize the sum to unit $L_2$ norm.

  • RPA: RPA differs from FIA only in the transformation $\mathcal{T}$, which yields a different feature importance $W$. FIA applies random pixel dropping to the image. RPA instead first generates a mask $M$ of the same size as $x$, with all elements equal to 1. $M$ is then divided into regular, non-overlapping patches $P$, each of size $n \times n$. Next, each patch is selected with probability $p_m$, $P_{p_m} = \mathrm{Rand}(P, p_m)$, and the selected patches are refilled with values drawn from a uniform distribution, $P_{p_m} \sim U[0, 1)$. The RPA transformation is therefore $\mathcal{T}_n(x) = x \odot M$.

  • NAA: In NAA, the feature importance $W$ can be expressed as

    $$W = -\frac{\sum_{m=1}^n \frac{\partial F(x_m, y)}{\partial f_k(x_m)}}{\left\| \sum_{m=1}^n \frac{\partial F(x_m, y)}{\partial f_k(x_m)} \right\|_2},$$

    where $F(\cdot, y)$ denotes the softmax output for the true label $y$, $x_m = (1 - \frac{m}{n})x' + \frac{m}{n} x$, and $x'$ denotes a baseline image. The NAA feature importance $W$ is computed as follows. Take $n$ points along the linear path from the baseline image $x'$ to the input image $x$. Sum the gradients with respect to the feature maps at these $n$ points. Normalize the sum to unit $L_2$ norm, and finally negate it.

  • DANAA: DANAA differs from NAA only in the path used to generate $x_m$, which yields a different feature importance $W$: DANAA replaces the linear path with a non-linear one. Specifically, $x^m = x^0 + \sum_{k=0}^{m-1} \Delta x^k$, where $\Delta x^k = lr \cdot \mathrm{sign}\left( \frac{\partial F(x^k)}{\partial x^k_i} \right) + N(0, \sigma)$. Here $\frac{\partial F}{\partial x^k_i}(\cdot)$ is the partial derivative of $F$ with respect to the $i$-th pixel, $lr$ denotes the learning rate, and $N(0, \sigma)$ denotes Gaussian noise.

  • SFVA: In SFVA, the feature importance $W$ can be expressed as

    $$W = -\hat{W}^* = -\frac{1}{c} \sum_{i=1}^{N} \frac{\partial F(x'_i)}{\partial f_k(x'_i)},$$

    where $\hat{W}^*$ denotes the optimal feature weights and $x'_i = \mathrm{Scale}(\mathrm{Mask}(x) + \gamma_i)$. Specifically, we apply a random mask, additive random noise $\gamma_i$, and a scale transformation to the original image, in that order, $N$ times. We then sum the gradients with respect to the feature maps of the transformed images and normalize the sum by the factor $1/c$.

  • BFA: In BFA, the feature importance $W$ can be expressed as

    $$W = I = \frac{1}{N} \sum_{m=1}^N \frac{\partial F(x_m^{IF}, y)}{\partial f_k(x_m^{IF})},$$

    where $x_m^{IF}$ denotes the fitted image at the $m$-th iteration. Specifically, we average the feature-map gradients of fitted images with different degrees of fit to obtain the feature importance $W$.
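The FIA, RPA, and NAA descriptions above share one pattern: build a batch of transformed inputs, sum the loss gradients with respect to the $k$-th-layer feature map, $L_2$-normalize, and negate. The sketch below illustrates that pattern under simplifying assumptions that are ours, not the authors':

- Images are flattened row-major lists of floats.
- `feature_grad` is a hypothetical user-supplied callable returning $\partial F / \partial f_k$ for one input. A real implementation would obtain it with autograd.
- RPA refills each selected patch per pixel with $U[0, 1)$ values.
- All function and parameter names are illustrative.

DANAA, SFVA, and BFA differ in how their inputs are generated or scaled and are not covered here.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical: maps one (transformed) input to dF/df_k, its feature-map gradient.
FeatureGrad = Callable[[Vector], Vector]


def aggregate_importance(inputs: List[Vector], feature_grad: FeatureGrad) -> Vector:
    """Sum feature-map gradients over the inputs, L2-normalize the sum, and negate it."""
    total = None
    for t in inputs:
        g = feature_grad(t)
        total = list(g) if total is None else [a + b for a, b in zip(total, g)]
    norm = math.sqrt(sum(a * a for a in total))
    if norm == 0.0:
        return [0.0] * len(total)
    return [-a / norm for a in total]


def fia_inputs(x: Vector, n: int, drop_rate: float, rng: random.Random) -> List[Vector]:
    """FIA: T_n(x) = x * M with M ~ Bernoulli(1 - p_d), i.e. each pixel kept w.p. 1 - p_d."""
    return [[xi if rng.random() >= drop_rate else 0.0 for xi in x] for _ in range(n)]


def rpa_inputs(x: Vector, width: int, patch: int, n: int, p_m: float,
               rng: random.Random) -> List[Vector]:
    """RPA: all-ones mask; each patch x patch block is picked w.p. p_m and refilled
    with U[0, 1) values (per pixel, an assumption); returns x * M for n random masks."""
    height = len(x) // width
    batch = []
    for _ in range(n):
        mask = [1.0] * len(x)
        for top in range(0, height, patch):
            for left in range(0, width, patch):
                if rng.random() < p_m:
                    for i in range(top, min(top + patch, height)):
                        for j in range(left, min(left + patch, width)):
                            mask[i * width + j] = rng.random()
        batch.append([xi * mi for xi, mi in zip(x, mask)])
    return batch


def naa_inputs(x: Vector, baseline: Vector, n: int) -> List[Vector]:
    """NAA: x_m = (1 - m/n) x' + (m/n) x for m = 1..n on the linear path from x' to x."""
    return [[(1 - m / n) * b + (m / n) * xi for b, xi in zip(baseline, x)]
            for m in range(1, n + 1)]
```

As a toy check, use an identity `feature_grad` with $p_d = 0$. Every FIA sample then equals $x$, so `aggregate_importance(fia_inputs(x, 5, 0.0, rng), lambda v: list(v))` returns $-x/\|x\|_2$, which is $(-0.6, -0.8)$ for $x = (3, 4)$.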

Q2: Lack of Transformer-based models

The target models we selected, while classic, may not be sufficiently advanced and are exclusively CNN-based. We have additionally included four Transformer-based models, i.e., PiT-S, CaiT-S, DeiT-B, and Swin-B, as target models. The experimental results are presented in the response to Reviewer ASTV’s Q3 and Reviewer Li9f’s Q1. These results demonstrate that our proposed P2FA continues to exhibit higher transferability, further substantiating the effectiveness of our approach.

Finally, we express our gratitude for your valuable feedback, which will greatly contribute to improving the quality of our manuscript. We look forward to your response.

Review
Rating: 3

This paper theoretically argues that existing multi-feature-based attack methods are essentially equivalent to perturbing features once. Accordingly, P2FA is proposed to perturb the feature space multiple times by shifting the perturbation space from pixels to features. Extensive experiments demonstrate the effectiveness of the proposed method. Overall, this paper proposes P2FA from a novel and interesting perspective with sufficient theoretical analysis. I think this finding can enhance most feature-based adversarial attacks.

Questions for Authors

No.

Claims and Evidence

Most claims are supported by its experiments and analysis.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

Yes.

Experimental Designs or Analyses

Yes.

Supplementary Material

No supplementary material was provided.

Relation to Broader Scientific Literature

This paper contributes to trustworthy AI.

Essential References Not Discussed

Yes.

Other Strengths and Weaknesses

This paper demonstrates good originality in terms of the integrated analysis of existing feature-level attacks. Also, there is sufficient derivation and experimental evidence to validate the correctness of their conclusion. The insightful conclusion drawn in this paper may offer valuable inspiration for future research endeavors.

However, the chosen target classification models, though classic, are not sufficiently advanced, and they are all CNN-based. Most models today are built upon Transformers, so readers may want to know how your attack performs on more advanced Transformer models.

Another weakness is that the proposed attack method lacks innovation, since it is largely based on previous works.

Other Comments or Suggestions

Your derived conclusion states that existing attacks are “effectively equivalent to perturbing the features only once along the direction of feature importance in the feature space”. I am not sure whether “once” refers to one step or to updating in one direction only. If you mean perturbing in only one direction, is there a clearer way to express it than “once”?

Author Response

Q1: Lack of Transformer-based models

You rightly pointed out that Transformer-based models dominate contemporary research, and there is a legitimate interest in understanding how our proposed attack performs against these more advanced architectures.

In response to your comment, we have followed your suggestion and expanded our experimental evaluation to include four Transformer-based models: Swin Transformer (Swin-B)[1], Data-efficient Image Transformer (DeiT-B)[2], Pooling-based Vision Transformer (PiT-S)[3], and Class-Attention in Image Transformers (CaiT-S)[4]. These Transformer-based models were used as target models by the recent work RPA[5]. The preliminary results are shown in Table 1 (more experimental results can be found in Reviewer ASTV's Q3), demonstrating that our attack method remains effective against these models and outperforms the current SOTA feature importance-based attack, BFA. Detailed results and analysis will be incorporated into the revised manuscript.

Table 1. Attack Success Rates of P2FA(Ours), BFA, and their Combinations with Input Transformations on Transformer-based Models, Using Inception-v3 as the Surrogate Model

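Surrogate | Attack | Swin-B | DeiT-B | PiT-S | CaiT-S | Avg.
--- | --- | --- | --- | --- | --- | ---
Inc-v3 | BFA | 42.0 | 52.8 | 68.8 | 52.8 | 54.1
Inc-v3 | P2FA (Ours) | 43.4 | 55.7 | 70.7 | 53.3 | 55.8
Inc-v3 | PIDI-BFA | 46.2 | 59.6 | 73.4 | 59.3 | 59.6
Inc-v3 | PIDI-P2FA (Ours) | 60.3 | 75.9 | 85.2 | 73.0 | 73.6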

We believe this extension not only addresses the limitation you identified but also enhances the practical relevance of our findings to the current research landscape. We will include a discussion of this limitation, the newly added experimental results, and our planned follow-up experiments in the revised manuscript to reflect your suggestion adequately.

[1] Liu Z, Lin Y, Cao Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 10012-10022.

[2] Touvron H, Cord M, Douze M, et al. Training data-efficient image transformers & distillation through attention[C]//International conference on machine learning. PMLR, 2021: 10347-10357.

[3] Heo B, Yun S, Han D, et al. Rethinking spatial dimensions of vision transformers[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 11936-11945.

[4] Touvron H, Cord M, Sablayrolles A, et al. Going deeper with image transformers[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 32-42.

[5] Zhang Y, Tan Y, Chen T, et al. Enhancing the Transferability of Adversarial Examples with Random Patch[C]//IJCAI. 2022, 8: 13.

Q2: Clarification of innovations

Regarding innovation concerns, this paper offers two key contributions.

  • Innovation 1: Revealing the inefficiency of existing feature importance-based attacks through mathematical proofs. Our work stems from a comprehensive analysis of existing feature importance-based attacks. Through mathematical proofs (for the latest proof, see Reviewer ASTV's Q1) and experimental validation, we show that these attacks rely on multiple pixel-space perturbations, yet are equivalent to just one step in the feature space along the feature importance. Existing feature importance-based attacks therefore perturb features inefficiently. This motivates P2FA, which applies feature-space perturbations directly to improve perturbation efficiency.

  • Innovation 2: A new paradigm that shifts the perturbation space from pixel space to feature space. Based on the above proof and experiments, P2FA moves perturbations from pixel space to feature space and applies multiple efficient perturbations to critical features, boosting transferability across models. This redefines the paradigm of feature importance-based attacks. We will clarify these innovations in the revised introduction and methodology sections.

We are grateful for your feedback, which has prompted us to refine our exposition. In the revised manuscript, we will expand the discussion in the introduction and methodology sections to clearly articulate this innovation and underscore our contributions.

Q3: Ambiguous phrasing

Thank you for noting the ambiguity in our phrasing: “is effectively equivalent to perturbing the features only once along the direction of feature importance in the feature space.” We mean that a single perturbation step in the feature space matches the effect of multiple pixel-space perturbations. To avoid confusion, we will revise it to: “is effectively equivalent to perturbing the features in one step along the direction of feature importance in the feature space.” This clarification will improve readability in the updated manuscript.

Review
Rating: 2

In this paper, the authors propose Pixel2Feature Attack (P2FA) to enhance the transferability of feature-based attacks across different DNN models.

To improve efficiency, P2FA shifts the perturbation space from the pixel space to the feature space. Specifically, P2FA perturbs feature maps within the feature space along the direction of dynamically updated feature importance, and then generates adversarial examples through feature inversion.

Experiments on the ImageNet benchmark dataset demonstrate that P2FA achieves better attack transferability than state-of-the-art approaches. Moreover, an ablation study analyzes the impact of training factors on the attack success rate.

Update after rebuttal

I thank the authors for addressing the questions raised. However, several critical concerns remain unresolved.

The rebuttal provides derivations suggesting that Eq. 11 serves as an upper bound for Eq. 8. However, there is no explicit connection established between the solutions of Eq. 8 and Eq. 11. Therefore, the conclusion that perturbing features once is equivalent to perturbing pixels multiple times remains questionable.

Moreover, the methodology heavily relies on the feature importance computed by the BFA method, resulting in interactions between the input pixel space and the feature space during perturbation training. As a result, the specific contribution of perturbing the feature space remains unclear.

For these reasons, I have decided to maintain my original score of “Weak Reject”.

Questions for Authors

  1. Could the authors further clarify why perturbing features only in the feature space can achieve the optimal solution of Eq. 11 under the L-infinity constraint?

  2. As the feature importance $W_t$ is iteratively updated from the input space, this additional computation raises concerns about the algorithm's time efficiency. Could the authors compare P2FA’s efficiency against state-of-the-art methods?

  3. Could the authors clarify why PIM and DIM were integrated with feature-based attacks, while recent input transformation methods such as SIA and BSR were not considered?

  4. Could the authors clarify why this optimization algorithm requires a large step size to update the feature map in the latent feature space? Additionally, in Fig. 3, why is the attack performance insensitive to this hyperparameter, particularly when the step size exceeds $10^3$?

Claims and Evidence

  1. The claim that perturbing multiple times in the pixel space is equivalent to perturbing once in the feature space lacks rigorous theoretical support.

  2. Regarding the efficiency of feature attacks, this paper focuses solely on the attack success rate but lacks an analysis of efficiency itself, such as computational time cost.

  3. The feature importance $W_t$ is updated in each iteration using the fitted image obtained from the input space. However, this approach does not effectively address the previously stated inefficiency issues.

Methods and Evaluation Criteria

Yes, the proposed methods and evaluation criteria are relevant and appropriate for addressing the problem.

Theoretical Claims

In Section 3.2, it is unclear how Eq. 11 leads to the conclusion that “perturbing multiple times in pixel space” equals “perturbing once along the direction of feature importance in feature space”. Specifically, Eq. 11 does not have a closed-form solution. It should be formulated as a constrained optimization problem rather than the simplified representation shown in Figure 2.

Experimental Designs or Analyses

  1. The input transformation methods (e.g., PIM and DIM) considered are not the most up-to-date.

  2. In Table 4, only two defended CNN models are used as target models. For a more comprehensive evaluation, additional models should be included, following the experimental settings used in FIA and RPA.

  3. In Section 4.1, the rationale behind choosing a large step size in the feature space for the experiments is unclear. Additionally, no clear justification is provided for defining the number of perturbations as 3.

Supplementary Material

Sections A.1-A.4 have been reviewed.

Relation to Broader Scientific Literature

This paper focuses on the literature of feature-based attacks, leveraging feature importance from the BFA method to generate adversarial samples.

Essential References Not Discussed

This paper discusses both feature-based attacks and input transformation attacks. However, the cited input transformation attacks are not up-to-date, as recent methods such as SIA [1] and BSR [2] are not included in the discussion.

[1] Structure Invariant Transformation for better Adversarial Transferability. ICCV 2023.

[2] Boosting Adversarial Transferability by Block Shuffle and Rotation. CVPR 2024.

Other Strengths and Weaknesses

  1. Eq. 12 does not clearly explain how the cross-entropy loss is defined between $f_k$ and $y$.

  2. The experiment demonstrates that the proposed method enhances attack transferability; however, there is no clear evidence indicating an improvement in efficiency.

Other Comments or Suggestions

In line 214, the parameter $s$ is introduced, but its role is not clearly explained in this context.

Author Response

We thank the reviewer for the valuable feedback, which we address below.

Q1: Theoretical Proof

We thank the reviewer for spotting an error in Eq. (11): we omitted the constraint on $x^{adv}$. Below, we rigorously prove that the multiple pixel-space perturbations in Eq. (8) are equivalent to a single feature-space perturbation along the feature importance $W$. First, we equivalently rewrite Eq. (8) as:

$$\underset{x^{adv}}{\arg\max}\ \langle W, f_k(x^{adv}) - f_k(x) \rangle, \quad \text{s.t.}\ \|x^{adv} - x\|_p \leq \epsilon.$$

Then, by the Cauchy–Schwarz inequality ($\langle u, v \rangle \leq \|u\|_2 \|v\|_2$, with equality when $u = s \cdot v$, $s \geq 0$), the following inequality holds:

$$\langle W, f_k(x^{adv}) - f_k(x) \rangle \leq \|W\|_2 \, \|f_k(x^{adv}) - f_k(x)\|_2,$$

with equality when $f_k(x^{adv}) = f_k(x) + s \cdot W$ ($s \geq 0$). If, in addition, $\|x^{adv} - x\|_p \leq \epsilon$, the optimum is attained. The role of $s$ is to ensure that the adversarial example obtained through feature inversion satisfies $\|x^{adv} - x\|_p \leq \epsilon$. In practice, we additionally apply a clip function to enforce this. In other words, a single feature-space perturbation $s \cdot W$ that stays within the $\epsilon$-ball suffices to achieve the optimum of Eq. (8). The experiments in Sec. 3.2 also validate this conclusion. We will update the manuscript with a more detailed version of this proof and revise Fig. 2.
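The Cauchy–Schwarz step can be checked numerically. The toy sketch below makes these assumptions:

- $W$ and the feature perturbation $\delta = f_k(x^{adv}) - f_k(x)$ are plain Python lists.
- `radius` is a hypothetical fixed budget on $\|\delta\|_2$.

Among all perturbations of that norm, $\delta = s \cdot W$ with $s = \text{radius}/\|W\|_2$ attains the bound $\|W\|_2 \cdot \text{radius}$, and random directions stay below it. This only illustrates the inequality. It does not show that a feature perturbation of this form is reachable by an image inside the pixel-space $\epsilon$-ball, which is the concern raised in the post-rebuttal update of the third review.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def l2(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def aligned_step(w: Vector, radius: float) -> Tuple[float, Vector]:
    """Return s >= 0 and delta = s * W with ||delta||_2 = radius (W must be nonzero)."""
    s = radius / l2(w)
    return s, [s * wi for wi in w]


def random_step(dim: int, radius: float, rng: random.Random) -> Vector:
    """A random feature perturbation rescaled to the same L2 norm (for comparison)."""
    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = l2(d)
    return [radius * di / n for di in d]


W = [3.0, 4.0, 0.0]      # toy feature importance
radius = 10.0            # hypothetical budget on ||f_k(x_adv) - f_k(x)||_2
s, best = aligned_step(W, radius)
bound = l2(W) * radius   # Cauchy-Schwarz upper bound ||W||_2 * ||delta||_2
rng = random.Random(0)
worst_gap = min(bound - dot(W, random_step(len(W), radius, rng)) for _ in range(1000))
print(s, dot(W, best), bound, worst_gap)
```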

Q2: Efficiency

We clarify that “efficiency” in the submitted manuscript refers to fewer perturbation iterations ($T=3$ for P2FA vs. $T=10$ for the baseline). As you rightly note, efficiency also includes time efficiency. To compare time efficiency fairly, we tuned BFA’s hyperparameters ($T$, $N$) to match P2FA’s success rate on ImageNet-NIPS with Inception-v3 on an RTX 4090. P2FA ($T=3$, $N=30$) achieves an 84.1% average success rate in 0.616 s/example. BFA ($T=50$, $N=200$) reaches 83.9% but takes 1.577 s/example, i.e., BFA is about 2.6× slower than P2FA. We will add detailed data to the revised manuscript.

Q3: SIA and BSR

SIA and BSR are two highly commendable works. In response to your feedback, we have conducted additional experiments integrating P2FA with SIA and BSR. Partial experimental results are presented in the table below.

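Attack | Inc-v3* | IncRes-v2 | Res-152 | VGG-16 | Swin-B | DeiT-B | PiT-S | CaiT-S | Avg.
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
BFA+BSR | 100.0 | 89.1 | 85.7 | 92.0 | 34.3 | 45.8 | 62.6 | 46.5 | 69.5
P2FA+BSR | 100.0 | 96.2 | 96.2 | 97.3 | 56.6 | 73.6 | 83.1 | 69.1 | 84.0
BFA+SIA | 100.0 | 93.4 | 92.6 | 95.9 | 47.1 | 66.0 | 76.8 | 64.0 | 79.5
P2FA+SIA | 100.0 | 98.9 | 98.7 | 99.4 | 68.1 | 81.8 | 91.5 | 80.8 | 89.9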

Results show P2FA+SIA and P2FA+BSR achieve higher transferability than SOTA BFA with SIA and BSR. Full results will be added to the revision.

Q4: Step Size

Q4.1: Large Step Size

P2FA requires a large step size for effective perturbations because the values of $W_t$ (average order of magnitude between $10^{-6}$ and $10^{-5}$) are small relative to the intermediate-layer features $f_{k,t}$ (average order of magnitude $10^{-1}$).
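As a rough, order-of-magnitude illustration of Q4.1, the sketch below measures one feature-space step relative to the feature map, $\|s \cdot W_t\|_2 / \|f_{k,t}\|_2$. It assumes uniform toy values at the stated orders of magnitude: $10^{-5}$ for every entry of $W_t$ and $10^{-1}$ for every feature entry. Real feature and importance maps are not uniform, so actual ratios will differ. The function name is ours.

```python
import math
from typing import List


def relative_step(features: List[float], importance: List[float], s: float) -> float:
    """||s * W||_2 / ||f||_2: size of one feature-space step relative to the features."""
    step = math.sqrt(sum((s * w) ** 2 for w in importance))
    base = math.sqrt(sum(f * f for f in features))
    return step / base


dim = 1000
features = [1e-1] * dim     # toy f_{k,t}: every entry ~ 10^-1 (assumption)
importance = [1e-5] * dim   # toy W_t: every entry ~ 10^-5 (assumption)
for s in (1.0, 1e2, 1e3, 1e4):
    print(f"s = {s:g}: relative step = {relative_step(features, importance, s):.0e}")
```

With these toy values the relative step is $s \cdot 10^{-4}$. So $s = 1$ moves the features by only 0.01% of their norm, and a step comparable to the features themselves (10% to 100%) needs $s$ around $10^3$ to $10^4$.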

Q4.2: Low Sensitivity Beyond $10^3$

Clipping within the $\epsilon$-ball limits the range of the feature-space perturbation when $s > 10^3$, which reduces sensitivity. The average feature values before and after clipping confirm this.

$s$ | Mean feature, pre-clip ($t=0/1/2$) | Mean feature, post-clip ($t=0/1/2$)
--- | --- | ---
$1$ | 0.674 / 0.675 / 0.678 | 0.675 / 0.677 / 0.678
$10$ | 0.679 / 0.680 / 0.683 | 0.675 / 0.677 / 0.678
$10^2$ | 0.726 / 0.732 / 0.739 | 0.675 / 0.679 / 0.682
$10^3$ | 1.410 / 1.509 / 1.566 | 0.697 / 0.731 / 0.757
$10^4$ | 10.930 / 11.316 / 11.544 | 0.766 / 0.820 / 0.848
$10^5$ | 108.424 / 110.887 / 112.908 | 0.788 / 0.837 / 0.860
$10^6$ | 1075.449 / 1096.937 / 1114.703 | 0.790 / 0.838 / 0.861
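In the usual $L_\infty$ setting, the clipping mentioned in Q4.2 projects the adversarial image back into the $\epsilon$-ball around $x$ and into the valid pixel range. Below is a minimal sketch under these assumptions:

- Pixels lie in $[0, 1]$ and images are flattened lists.
- $\epsilon = 16/255$ is an assumed budget, not a value taken from the paper.
- The function name is ours.

It illustrates why very large steps stop mattering: once pixels saturate at $x \pm \epsilon$ (or the range limits), a larger push yields the same clipped image.

```python
from typing import List


def clip_linf(x_adv: List[float], x: List[float], eps: float,
              lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Project x_adv onto {z : ||z - x||_inf <= eps} intersected with [lo, hi]^d."""
    return [min(max(a, xi - eps, lo), xi + eps, hi) for a, xi in zip(x_adv, x)]


x = [0.50, 0.50, 0.02]
eps = 16 / 255  # assumed L_inf budget for illustration
small = clip_linf([0.55, 0.45, 0.00], x, eps)     # already inside: unchanged
huge = clip_linf([5.0, -5.0, -5.0], x, eps)       # saturates at the boundary
huger = clip_linf([50.0, -50.0, -50.0], x, eps)   # 10x larger push, same result
print(small, huge == huger)
```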

Q4.3: Meaning of $s$

$s$ is the scaling factor from the Cauchy–Schwarz inequality and acts as the step size along $W$. We will clarify this in the revision.
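If $s$ is read as the step size along $W$, the feature-space update amounts to $f_{k,t+1} = f_{k,t} + s \cdot W_t$ for $t = 0, \dots, T-1$. This is a minimal sketch under stated assumptions, not the authors' implementation:

- `importance_fn` is a hypothetical callable that recomputes $W_t$ at each step. The third review indicates that P2FA takes its feature importance from BFA.
- Feature inversion back to an image and the $\epsilon$-ball clip are omitted.

```python
from typing import Callable, List

Vector = List[float]


def perturb_features(f0: Vector, importance_fn: Callable[[Vector], Vector],
                     s: float, steps: int = 3) -> Vector:
    """Apply f_{k,t+1} = f_{k,t} + s * W_t for `steps` iterations (T = 3 in the rebuttal)."""
    f = list(f0)
    for _ in range(steps):
        w = importance_fn(f)  # hypothetical: recompute W_t for the current features
        f = [fi + s * wi for fi, wi in zip(f, w)]
    return f


# Toy usage: a constant importance map, so the result is f0 + steps * s * W.
f_adv = perturb_features([0.1, 0.2], lambda f: [1e-5, -1e-5], s=1e4, steps=3)
print(f_adv)
```

The toy `importance_fn` ignores its input. A real one would depend on the current features, which is what makes $W_t$ "dynamically updated".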

Q5: Defense Models

Unfortunately, we were unable to find PyTorch versions of Ens3-Inc-v3, Ens4-Inc-v3, and Adv-IncRes-v2. Instead, we tested P2FA on PiT-S, CaiT-S, DeiT-B, and Swin-B from RPA, where it shows higher transferability (see Reviewer Li9f’s Q1). If you could kindly suggest a way to use the above defense models in PyTorch, we would be glad to further validate our proposed method.

Q6: Eq. (12)

In gradient-based attacks, the cross-entropy loss $J(x, y) = -\mathbb{1}_y \cdot \log \mathrm{softmax}(f(x))$ is used to update the input image $x$. We adapt it to $J(f_k, y) = -\mathbb{1}_y \cdot \log \mathrm{softmax}(f_k^{post}(f_k))$ for feature updates, where $f_k^{post}$ is the part of the model after the $k$-th layer. We will add this explanation to the revision.
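Below is a minimal sketch of the feature-level loss $J(f_k, y)$ as described above. The assumptions:

- Logits are a plain list of floats.
- `toy_post_layer` is a hypothetical stand-in for $f_k^{post}$ (a fixed two-class linear head).
- The log-softmax uses the log-sum-exp trick for numerical stability.

In practice this would be written in an autograd framework so that $\partial J / \partial f_k$ is available for the feature update.

```python
import math
from typing import Callable, List

Vector = List[float]


def cross_entropy_from_logits(logits: Vector, y: int) -> float:
    """-1_y . log softmax(logits) = logsumexp(logits) - logits[y] (stable form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[y]


def feature_loss(f_k: Vector, y: int, post_layer: Callable[[Vector], Vector]) -> float:
    """J(f_k, y): cross-entropy of the logits produced from the k-th-layer features."""
    return cross_entropy_from_logits(post_layer(f_k), y)


def toy_post_layer(f: Vector) -> Vector:
    """Hypothetical f_k^post: a fixed 2-class linear head over 2 features."""
    return [f[0] - f[1], f[1] - f[0]]


print(feature_loss([0.3, 0.3], y=0, post_layer=toy_post_layer))
```

Equal features give equal logits, so the loss is $\log 2$. Shifting all logits by a constant leaves the loss unchanged, which is what the max-subtraction relies on.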


We sincerely thank you for enhancing our manuscript’s quality.

Final Decision

This paper presents P2FA, a method to enhance the transferability of feature-based attacks across different DNN models. The experimental results show the superiority of the proposed method, although reviewers pointed out missing references and missing baselines. Concerns about the quality of the paper were also raised.