PaperHub
Overall rating: 5.5/10 · Poster · 4 reviewers (min 5, max 6, std 0.5)
Individual ratings: 5, 6, 5, 6
Confidence: 3.5 · Correctness: 3.0 · Contribution: 2.8 · Presentation: 2.3
NeurIPS 2024

Generate Universal Adversarial Perturbations for Few-Shot Learning

Submitted: 2024-05-10 · Updated: 2024-11-06
TL;DR

We identify the ineffectiveness of generating UAPs for open-set scenarios such as Few-Shot Learning. Through in-depth analysis, we point out two shifts and address them to achieve a unified attacking framework.

Abstract

Keywords
Adversarial Attacks · Universal Adversarial Perturbations · Few-Shot Learning

Reviews and Discussion

Official Review
Rating: 5

The authors propose an attacking framework to generate transferable universal adversarial perturbations in the few-shot learning scenario. They align proxy tasks with the downstream tasks and leverage the generalizability of pre-trained encoders to tackle the task shift and the semantic shift. Experiments show that the proposed method outperforms existing attacking methods when using CIFAR-FS as the proxy dataset.

Strengths

  1. The combination of universal adversarial perturbations and few-shot learning is new.

Weaknesses

The overall writing quality of the manuscript does not meet the standards of the NeurIPS community, especially in terms of:

  1. Confusing word choices. The words the authors use should be easy to understand and consistent across paragraphs, tables, and figures (e.g., “approximate the form” of the downstream tasks). The distinction between task and class is confusing.
  2. Erroneous sentences such as: “Since the attacker cannot predict the downstream tasks, we alter the tasks we use during UAP training to align them with the downstream tasks”.
  3. Insufficient explanation of newly invented/used terms. The authors need to explain newly defined terms right after introducing them (e.g., Base Linear).
  4. Unclear figures. In Fig. 4, the reviewers cannot understand the difference and relationship between Base Classifier, Proxy Linear, and Proxy Protos from the different colors and lines. Fig. 5 is an imagined illustration rather than a qualitative one with real results, so it cannot show the effectiveness of proxy protos. The legends in Fig. 6 are confusing.
  5. Unclear motivation. How can the authors state that the downstream task becomes less similar to the pre-training task (64-way full-shot) with higher ASR in Fig. 3 (a)? Why did the authors choose the combinations {Cifar, Mini} and {Mini, Tiered} in Fig. 3 (b)? Why is the performance with the Mini proxy dataset generally higher than with the Cifar dataset? In sum, the motivation section is misleading and should be rewritten.
  6. Why use GAP in Sec. 4? Why don’t the authors use the proposed Base classifier?
  7. The performance of the proposed Proxy Linear is not as good as Base Linear and there is no explanation for it, making the proposed method unconvincing.
  8. The reviewers would like to see a t-SNE visualization of the feature space, like Fig. 2 in [48], to demonstrate the effectiveness of the proposed proxy protos.
  9. Could the authors better explain, in the related work section, what is identical and what is different between the proposed proxy protos and [48]?

Questions

  1. The idea of the proxy dataset is good. However, the authors did not mention why the attacker cannot access the base dataset, and there is no definition of the proxy dataset in the previous “Traditional Attack” works (e.g., [29, 48]).

Limitations

  1. No limitations are mentioned in the manuscript. The statement in Broader Impact is too weak to serve as a description of limitations. The authors should present and analyze performance degradation, such as that caused by diverged proxy tasks in Fig. 6.
Author Response

We truly appreciate your valuable comments. In the following, we respond to the concerns.

Q1: It is confusing between the meaning of task and class. How can the authors state that the downstream task becomes less similar to the pre-training task with higher ASR in Fig. 3 (a)?

Based on the introduction of FSL (lines 91-101), 'task' means a unique classification problem that the model encounters, while 'class' or 'way' denotes the individual categories within those tasks. The pre-training task is a 64-way classification problem. As the downstream tasks transition from 64-way to 5-way, their distance to the pre-training task increases, and a decrease in ASR can be observed. The decline in ASR suggests that the UAP's performance degrades when faced with downstream tasks that diverge significantly from its pre-training tasks, which supports our proposition of the task shift.

Q2: Why use GAP in Sec. 4? Why don’t the authors use the Base classifier?

In Section 4, we employ GAP to examine how the conventional UAP generation approach behaves under the FSL scenario and to expose the challenges UAPs face in FSL. The Base classifier is introduced in Section 5 as part of our attack methodology. The sequential presentation is intentional: we first explore the existing methodologies and their limitations, then motivate the need for our novel framework.

Q3: The reviewers cannot understand the difference and relationship between Base Classifier, Proxy Linear, and Proxy Protos. Why is the performance of the proposed Proxy Linear not as good as Base Linear?

Thank you for the reminder. Allow us to clarify the distinctions and relationships among the terms.

  • Base Classifier: As defined in line 192 of our paper, this refers to the classifier pre-trained on the base dataset. It serves as our starting point before further adaptations.
  • Proxy Linear: As mentioned in line 201, the Proxy Linear classifier is trained through proxy tasks. This is a key step to handle the task shift.
  • Base Linear: Defined around line 229, this refers to the linear classifier trained on tasks sampled from the base dataset, which is a further improvement over the Proxy Linear classifier as it diminishes the semantic gap between the proxy dataset and the base dataset.
  • Proxy Protos: Defined at line 253, this represents our final approach, where we employ prototypes derived from the proxy tasks to enhance the performance.
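To illustrate the prototype idea behind Proxy Protos, here is a minimal ProtoNet-style sketch. This is our own illustration under assumed names and shapes, not the authors' implementation: class prototypes are the mean support embeddings of each class, and queries are scored by negative squared distance to each prototype.

```python
import numpy as np

def compute_prototypes(features, labels, n_way):
    # One prototype per class: the mean support embedding of that class.
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_way)])

def prototype_logits(queries, prototypes):
    # Score each query by negative squared Euclidean distance to each prototype;
    # the nearest prototype gets the highest logit.
    dists = ((queries[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return -dists
```

An attacker in this framework would presumably optimize the perturbation against such prototype-based logits computed on proxy tasks, rather than against a fixed linear classifier head.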

Q4: Why can’t the attacker access the base dataset? There is no definition of the proxy dataset in previous works.

Regarding the attacker's inability to access the base dataset, we have outlined this threat model in Section 3.1, where we introduce a more challenging scenario (the same threat model as [48]). This setup reflects real-world conditions more accurately, as large amounts of data, especially proprietary corporate datasets, are typically not publicly available due to privacy concerns and intellectual property rights.

As for the definition of the proxy dataset, we would like to clarify that it has been introduced in reference [45]. This prior work conducts a comprehensive analysis of the mutual information between images and perturbations, thereby establishing the feasibility of employing a proxy dataset to perform the UAP attack.

Q5: Why did the authors choose the combinations {Cifar, Mini} and {Mini, Tiered}? Why is the performance with the Mini proxy dataset generally higher than with the Cifar dataset?

Thank you for your question. As mentioned in lines 165-170, the victim model is trained on MiniImageNet. Consequently, when the proxy dataset coincides with MiniImageNet, it simulates a scenario where the attacker has access to the pre-training dataset, which is advantageous and leads to generally higher performance due to the diminished semantic shift between the proxy and the base dataset.

The choice of different datasets is intentional and serves to illustrate four distinct scenarios. {CIFAR, Mini} represents the cases where the proxy dataset matches (Mini) and doesn't match (CIFAR) the pre-training dataset. {Mini, Tiered} represents the cases where the downstream dataset matches (Mini) and doesn't match (Tiered) the pre-training dataset.

When CIFAR is selected as the proxy and Tiered as the downstream, it embodies the most general and challenging scenario where the attacker suffers both semantic shift from the pre-training to the proxy and also to the downstream dataset. Conversely, using MiniImageNet for both proxy and downstream datasets reflects an idealized condition for the attacker, leading to optimal attack performance.

Q6: The legends in Fig. 6 are confusing. The authors should analyze the performance degradation in Fig. 6.

Thank you for the reminder; we have revised the presentation. The analysis is provided in lines 331-339. From Fig. 6, we observe that the ASR remains relatively high when transitioning from 5-way 1-shot to 25-way 1-shot, and from 5-way 1-shot to 5-way 20-shot (5-1 and 5-5 for downstream tasks). Meanwhile, a noticeable degradation in performance emerges as the ways and shots deviate further from the downstream tasks. This aligns with our task-shift claim and suggests that a rough estimation of the downstream tasks is sufficient for an attacker to discern the task bias and produce transferable UAPs.
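For readers unfamiliar with the metric, the attack success rate (ASR) can be sketched roughly as below. The exact definition used in the paper may differ (e.g., whether it conditions on initially correct predictions), so treat this as an illustrative assumption rather than the paper's formula.

```python
import numpy as np

def attack_success_rate(clean_preds, adv_preds, labels):
    # One common convention: among queries classified correctly on clean
    # inputs, the fraction whose prediction becomes wrong after the
    # perturbation is added.
    correct = clean_preds == labels
    fooled = adv_preds != labels
    return (correct & fooled).sum() / max(correct.sum(), 1)
```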

Q7: What is identical and what is different between the proposed method and [48]?

Thanks for the suggestion. While our threat model aligns with that of [48], we have introduced key advancements in addressing the two shifts:

  1. Task Shift: Unlike [48], our method is designed to handle scenarios comprising diverse downstream tasks with limited samples, which [48] doesn't take into consideration.
  2. Semantic Shift: While [48] employs a contrastive learning loss to mitigate semantic shift, our ablation studies (lines 322-329) reveal that our strategy outperforms the contrastive-loss approach in the FSL scenario.
Comment

The authors addressed most of my concerns about the technical details. It would also be great to improve the writing quality and presentation of the manuscript in the final version.

Comment

Thank you for your response. We will continue to refine our paper in the final version. In light of the issues we have addressed, would you consider raising the score? Additionally, please do not hesitate to raise any further concerns or questions you may have.

Comment

I've changed the rating.

Comment

We are grateful for your acknowledgment of our work and the increased rating. We will revise our paper in the final version.

Official Review
Rating: 6

This paper proposes a framework for generating Universal Adversarial Perturbations (UAPs) by considering the task shift and the semantic shift in the context of few-shot learning. The authors introduce proxy tasks and proxy protos to enhance the transferability of UAPs across different FSL tasks. Results are validated on miniImageNet and tieredImageNet.

Strengths

  1. The authors consider a practical problem of attacking FSL with UAP settings
  2. Paper is well written with dedicated analysis on potential causes of performance gap of traditional UAPs

Weaknesses

  1. Evaluation of the proposed UAP framework is conducted on smaller datasets like miniImageNet; it’s not clear whether the task and semantic shifts still hold on larger-scale datasets like Meta-Dataset.
  2. The paper lacks discussions regarding how the proposed proxy tasks relate to/differ from the existing meta-learning UAP with similar use of few shot learning tasks (e.g., https://arxiv.org/pdf/2009.13714).
  3. The paper can benefit from a more comprehensive comparison against recent UAP frameworks (e.g., Pre-trained Adversarial Perturbations/PAP), with more in-depth discussions regarding how these alternatives address the gap from task and semantic shifts, making clearer the advantage of the proposed approach.
  4. The uses of “base classifier” and “base linear” are ambiguous in Figure 4(b); it would help to add explanations of these terms to the caption.

Questions

  1. In section 5.3, authors found that using base dataset as the proxy dataset improves the performance. The authors thus interpret it as “if the proxy dataset can be distributed near the base dataset, the generated UAP can be more transferable to the novel dataset.” Could the improvement be due to that the base distribution is more like that of the novel dataset (compared with the proxy distribution), instead of due to enhanced transferability?
  2. What’s the rationale for removing log() in eq. (3) in Section 5.1 to “demonstrate the fundamental performance of the attack within this framework”?

Limitations

N/A

Author Response

We truly appreciate your valuable comments. In the following, we respond to the concerns.

Q1: It’s not clear if the task and semantic shifts still hold on larger-scale datasets like meta-dataset.

Thank you for your valuable suggestion. Due to constraints of time and space, we selected three distinct sub-datasets from Meta-Dataset that differ significantly from miniImageNet and conducted experiments on them: CUB, Omniglot, and GTSRB. We used these datasets alternately as proxy and downstream datasets to comprehensively test our framework's generalizability. The results are shown in Table 1 and Table 2 in the 'Author Rebuttal' and demonstrate that even when scaled up to larger and more diverse datasets such as those within Meta-Dataset, our approach still achieves SOTA attack efficacy across all three FSL paradigms. This consistency in performance highlights the efficacy of our method in tackling the challenges posed by both semantic and task shifts.

Q2: The paper lacks discussions regarding how the proposed proxy tasks relate to/differ from the existing meta-learning UAP work with a similar use of few-shot learning tasks.

Thank you for your valuable comment. Below we discuss the connections and distinctions between our proposed method and existing meta-learning-based UAP methods (e.g., LFT).

In terms of similarities, both our work and LFT share the same goal of generating UAPs that can effectively impair model performance on a wide range of unseen test images.

However, several key differences distinguish our approach from LFT and similar meta-learning-based UAP methodologies:

  1. Threat Model: A fundamental difference lies in the threat model we adopt. Unlike LFT, which assumes access to both source data and downstream data during the generation of UAPs, our framework operates under a stricter attacker scenario. Specifically, our attacker is not only denied access to the source dataset but also has no visibility into the downstream data. This setup reflects real-world scenarios where attackers may have severely limited information.
  2. Attack Scenario: While LFT focuses on generating UAPs tailored to attack standard classification tasks, leveraging few-shot learning techniques as part of its training regime, our method is aimed at a broader spectrum of few-shot tasks.
  3. Application of UAPs: Another distinction relates to how the generated UAPs are applied. In the case of LFT, even after utilizing a meta-learning scheme to generate UAPs, these perturbations need to be additionally fine-tuned on the specific downstream task using a small number of examples. Conversely, our methodology employs a generator that yields generalized UAPs capable of being deployed directly without the need for further fine-tuning on the target tasks.

In summary, while both our work and prior studies like LFT explore the generation of UAPs within the context of few-shot learning, our research introduces a stricter threat model, extends the scope to a variety of downstream tasks, and innovates by generating UAPs that do not require task-specific fine-tuning. These points of divergence highlight the novelty and significance of our contribution to the field.

Q3: The paper can benefit from a more comprehensive comparison against recent UAP frameworks (e.g., PAP), with more in-depth discussions.

Thank you for the insightful suggestion. We have incorporated the PAP method into the few-shot learning scenario and made a comparative analysis. Through the experimental results, we delve into how PAP tackles the challenges posed by task and semantic shifts. Due to the space limit, the PAP results are shown in Table 3 and Table 4 in the reply to reviewer id8n.

PAP mitigates the impact of fine-tuning by perturbing only the lower-level features. Additionally, it uses learnable Gaussian noise for data augmentation. However, it is crucial to note that PAP achieves generalizability by sacrificing model parameters used in generating UAPs, which comes at the cost of requiring access to the base dataset and a reduction of performance on the base dataset. In contrast, our proposed method demonstrates effectiveness not only on the base dataset but also exhibits superior cross-dataset performance without necessitating access to the base dataset. This feature enables our approach to bridge wider semantic gaps effectively. Moreover, our work uniquely addresses the task shift issue in the few-shot context, a dimension overlooked by PAP.

Q4: Could the improvement be due to the base distribution being more like that of the novel dataset, instead of due to enhanced transferability?

We acknowledge your insightful observation. We agree that the improved performance could stem from the base distribution's inherent similarity to the novel dataset's distribution, which potentially leads to better results compared to the proxy distribution. This is the exact explanation for why Base Linear outperforms Proxy Linear.

However, an attacker does not have access to the base distribution in practice. So, during the process of crafting UAPs, the key lies in approximating the base distribution as closely as possible to effectively generalize to the unseen novel distribution. Our approach leverages the generalization capability of the encoder to extend its understanding to the UAP. By doing so, the generalizable UAP can be obtained through various proxy datasets and models.

Q5: What’s the rationale for removing log() in eq. (3)?

Thanks for the question. We should mention that the log() function in eq. (3) is not the log() inside the cross-entropy loss, but an additional component introduced by the GAP method to boost performance. We remove it to demonstrate the fundamental performance of our framework, ablating the effect of the additional log(). The most basic form of the negative cross-entropy loss is $loss = -\mathcal{H}(f(x+\delta), y)$.
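A minimal sketch of this basic fooling objective (our own illustration, not the paper's code): since the per-sample cross-entropy is $-\log p_y$, its negative is $\log p_y$, so minimizing the attacker's loss drives the true-class probability down.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fooling_loss(logits, y):
    # Negative cross-entropy: loss = -H(f(x + delta), y) = mean log p_y.
    # Minimizing it pushes predictions away from the true labels y.
    p = softmax(logits)
    return np.log(p[np.arange(len(y)), y] + 1e-12).mean()
```

In a generator-based attack, this loss would be backpropagated through the victim model to update the perturbation generator's parameters.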

Comment

I thank the authors for their reply, which has addressed most of my concerns.

Comment

We are grateful for your acknowledgment of our work and constructive suggestions. We will continue to refine our paper in the final version!

Official Review
Rating: 5

This paper addresses adversarial attacks on image classification, especially the few-shot learning setup with universal adversarial perturbations. The authors claim that the traditional approach does not work well in the FSL setup, which exhibits two properties: task shifts and semantic shifts. The paper addresses these issues by performing the attack on few-shot tasks with proxy protos. Experiments are carried out on multiple FSL frameworks, including finetuning-, meta-, and metric-based paradigms, and show the effectiveness of the proposed method on all of them.

Strengths

  • The results reported in Fig 4(b) are very promising, and the proposed method also achieves good results compared with other attack methods, as reported in Tab 4.
  • The investigated problem is important yet less studied.
  • The proposed method is simple.

Weaknesses

  • All datasets are natural images, and I wonder if the experiments can be improved by trying on out-of-domain datasets, e.g. Omniglot or CUB.
  • In Tab 2 and 3, sometimes the ASR is the best when victim and proxy datasets are different. E.g. in Tab 3 DN4, when backbone=RN18 and victim=Tiered, the Mini-proxy version has better ASR (74.85) than the Tiered-proxy version (73.06). This may need more analysis and ablation.
  • More ablations on the algorithm (lines 275 to 278) are needed. For example, is 5-way-1-shot always the best proxy task?

Questions

In Fig 1 and line 142, it is mentioned that GAP cannot generate UAPs for metric-based models. I am curious whether we can use the base classifier for the attack. Which setup is used for reporting the numbers in Fig 4(b)?

Limitations

Limitations are sufficiently discussed.

Author Response

We truly appreciate your valuable comments. In the following, we respond to the concerns.

Q1: All datasets are natural images, and I wonder if the experiments can be improved by trying on out-of-domain datasets, e.g. Omniglot or CUB.

Thanks for your constructive suggestion. In addition to the original three natural image datasets, we have supplemented our study with three cross-domain datasets: CUB, Omniglot, and GTSRB (traffic signs), using them alternately as proxy and downstream datasets. Meanwhile, we select Baseline, ANIL, and ProtoNet as representatives of the finetuning, meta, and metric-based FSL paradigms. The experimental results are shown in Table 1 and Table 2 in the 'Author Rebuttal' and demonstrate that even with a substantial domain gap between datasets, our approach still achieves SOTA attack efficacy across all three FSL paradigms.

Q2: In Tab 2 and 3, sometimes the ASR is the best when victim and proxy datasets are different. This may need more analysis and ablation.

Thank you for raising this point. We have carefully examined the data and observed that this phenomenon occurs specifically between the miniImageNet and tieredImageNet datasets. We speculate that this is due to the similarity and overlap between the images in the mini and tiered datasets. Consequently, in some cases, we observe higher ASR when the proxy and victim datasets are different.

Q3: More ablations on the algorithms (lines 275 to 278) are needed. For example, is 5-way-1-shot always the best proxy task?

We appreciate your suggestion, and we would like to clarify that we have indeed conducted ablation studies to explore the impact of varying the number of ways and shots in our proxy tasks, as outlined in lines 331 to 339. Aligning with the setup detailed in Section 5.1, we fixed the number of shots at 1 and varied the ways (from 5-way up to 35-way), and conversely, we held the way constant at 5 and increased the shots (from 1-shot up to 30-shot). The downstream tasks remain at 5-way 1-shot and 5-way 5-shot.

The results are depicted in Figure 6, which shows that the ASR for downstream tasks remains relatively high when the proxy tasks are within 25-way and 20-shot. Meanwhile, as both the way and shot numbers deviate further from the actual downstream task, a notable decline in attack effectiveness can be observed. This suggests that a rough estimation of the downstream tasks is sufficient for an attacker to discern the task bias and produce transferable UAPs.

Q4: In Fig 1 and line 142, it is mentioned that GAP cannot generate UAPs for metric-based models. Can the base classifier be used for the attack?

Thank you for the insightful suggestion. Theoretically, the suggestion is feasible. When attacking metric-based models using GAP, since the attacker does not have direct access to the pretraining and downstream datasets, he could train an additional base classifier based on the proxy dataset, which would then be used to generate UAP aimed at the downstream tasks. However, there's a crucial consideration: the classifier trained on this proxy dataset inherently introduces semantic shift, as it is tailored to the distribution of the proxy data rather than the actual target data. Consequently, UAPs generated through such a classifier are likely to exhibit diminished effectiveness.

To provide empirical evidence, we conducted experiments where CIFAR-FS served as the proxy dataset, while mini-ImageNet was employed both for pre-training and as the downstream dataset. We utilized the ProtoNet with ResNet12 as the backbone for our metric-based model. Our results, outlined below, illustrate that when using this approach to generate GAP-based UAPs, the performance in both 5-way-1-shot and 5-way-5-shot downstream tasks is significantly inferior to the methods that directly utilize the original model for UAP generation.

method                                5-way 1-shot ASR   5-way 5-shot ASR
GAP (proxy-trained base classifier)   51.20±0.39         61.36±0.26
AdvEncoder (ICCV 2023)                66.63±0.34         67.76±0.26
Ours                                  69.03±0.36         67.96±0.28

Q5: Which setup is used for reporting numbers in Fig 4(b)?

Figure 4(b) illustrates the performance of our attacking framework, whose setting is detailed in Section 5.1 (lines 180-193). Within this framework, we choose a model pre-trained on the train split of miniImageNet with a ResNet12 backbone as our victim model. The train split of CIFAR-FS is chosen as the proxy dataset, while the test split of miniImageNet is chosen as the downstream dataset.

Comment

Dear Reviewer gmrg,

Sorry to bother you again. Since there are only a few hours left in the discussion phase, we would like to know whether your concerns have been addressed or whether you have any additional concerns. Looking forward to your reply.

Best regards,

Authors

Official Review
Rating: 6

The authors argue that Universal Adversarial Perturbations (UAPs) do not work well in few-shot learning (FSL) scenarios due to task and semantic shifts between the data. To this end, they propose a method for crafting UAPs in FSL scenarios. The threat model assumes access to a pre-trained model with no access to the pre-training data or the downstream few-shot tasks. First, the authors conduct several experiments to understand the effect of semantic and task shifts on generating UAPs for few-shot scenarios. Next, they use several ablation experiments to design the method. The proposed method consists of training a UAP on 5-way-1-shot FSL tasks with a cross-entropy-like fooling loss.

优点

The paper presents an interesting and novel method to craft UAP examples for FSL scenarios. Interesting empirical studies are conducted to understand different aspects of the method, including the failure of previous methods and components of the proposed method. Comprehensive experiments are performed to evaluate its performance.

Weaknesses

  • The authors apply GAP (an existing UAP method) and show its bad performance on few-shot learning scenarios. They attribute the bad transferability of GAP in FSL to the task shift. However, another relevant issue could be the small number of examples used for generating the GAP-based UAP.

  • The writing of the paper can be improved. Parts of it are difficult to understand and confusing.

Questions

  • For training of the UAP generator ($g_\theta(\cdot)$), is just one 5-way-1-shot task used, or are 5-way-1-shot tasks sampled from a dataset?

    • It could be an extremely efficient method if a single 5-way-1-shot task were utilized for one UAP.
  • Three FSL datasets used (two ImageNet variants and CIFAR10) are closely related. How does your method work for a significantly different downstream task?

  • Why was GAP chosen as a UAP generator, even though it is a 6-year-old paper? How does your method compare with recent UAP methods?

  • Why is a comparison with state-of-the-art UAPs not provided?

Limitations

The authors have added a broader impact statement.

Author Response

We truly appreciate your valuable comments. In the following, we respond to the concerns.

Q1: Authors attributed the bad transferability of GAP in FSL to the task shift. However, one more relevant issue could be the small number of examples used for generating GAP-based UAP.

Thanks for the insightful comments. We would like to clarify that, whether in the traditional scenario with GAP or in the FSL scenario, the generation of the UAP is backed by a substantial amount of image data. When generating the UAP using the GAP method in FSL, the attacker leverages a pre-trained model (comprising an encoder and a classifier) to conduct the pre-training classification task (such as 64-way classification) and optimizes the classification loss in reverse to achieve the attack effect. Consequently, all images from the proxy dataset are utilized during UAP generation. In the downstream finetuning stage, the model provider fine-tunes the model on the downstream tasks (training a new classifier with a small set of samples). The attacker cannot manipulate this procedure as he doesn't know what the downstream tasks will be. When the fine-tuning is complete and the user is ready to run inference with the new model, the attacker implants the unique UAP onto the images uploaded by the user, thereby executing the attack.

Consequently, we claim that the decline in GAP's effectiveness when applied to FSL is attributed to the task shift and the semantic shift. As these two shifts are progressively mitigated, we achieve a substantial enhancement in attack performance.

Q2: For training of the UAP generator, is just one 5-way-1-shot task used, or are 5-way-1-shot tasks sampled from a dataset?

Thank you for your question. We sample multiple 5-way 1-shot tasks from a dataset. The fundamental distinction between UAPs (image-agnostic) and image-dependent adversarial perturbations lies in the data required by their generation process. A single UAP is designed as the cumulative effect of perturbations across many individual samples, which enables it to be effective against a wide range of images. This process is crucial for its universal applicability across different inputs. Likewise, in the few-shot learning setting, it is necessary to first pre-train a transferable base model using a large number of samples and then generalize it to new downstream tasks with a limited number of samples.
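The episodic sampling described above might look like the following sketch (a hypothetical helper; the dataset layout and parameter names are our assumptions): repeatedly draw N classes, then disjoint support/query images per class, and feed each episode to the UAP generator's training loop.

```python
import random

def sample_episode(dataset_by_class, n_way=5, n_shot=1, n_query=15):
    """Sample one N-way K-shot task from a {class_name: [images]} mapping.

    Returns (support, query) lists of (image, episode_label) pairs with
    labels remapped to 0..n_way-1 and disjoint support/query images.
    """
    classes = random.sample(sorted(dataset_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(dataset_by_class[cls], n_shot + n_query)
        support += [(x, label) for x in imgs[:n_shot]]
        query += [(x, label) for x in imgs[n_shot:]]
    return support, query
```

Training the generator would then iterate: sample an episode, build the episode's classifier (e.g., from prototypes), and optimize the perturbation against it.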

Q3: Why was GAP chosen as a UAP generator? How does your method compare with recent UAP methods? Why is a comparison with state-of-the-art UAPs not provided?

We chose GAP due to its representativeness as a classical method and the convenience of extending it to the FSL scenario, but our comparisons extend well beyond it in the paper (lines 312-320, Table 4 and Table 5). We benchmarked our approach against several methods (including the recent AdvEncoder) across six different few-shot learning paradigms, two distinct downstream tasks (5-way 1-shot and 5-way 5-shot), and two victim datasets, outperforming all of them.

Given the similar motivation of the PAP [1] method to ours (both generalize UAPs to unseen downstream tasks), we have also evaluated its performance in FSL. PAP perturbs only the low-level features to mitigate the effect of fine-tuning, which proved effective in the original paper. However, given the scarcity of classes and samples in few-shot scenarios, PAP underperforms in FSL. The results are shown in Table 3 and Table 4.

Table 3: 5-way 1-shot results.

proxy      method   baseline     anil         protonet
cifar      PAP      44.39±0.30   44.52±0.29   37.70±0.29
           ours     81.56±0.29   77.84±0.28   69.03±0.36
CUB        PAP      45.59±0.30   44.73±0.29   37.92±0.30
           ours     81.17±0.30   77.83±0.28   71.22±0.36
Omniglot   PAP      47.91±0.31   45.88±0.28   38.52±0.29
           ours     76.32±0.31   77.81±0.28   67.70±0.35
GTSRB      PAP      42.62±0.29   44.54±0.29   37.96±0.29
           ours     79.99±0.30   78.13±0.20   69.17±0.38

Table 4: 5-way 5-shot results.

proxy      method   baseline     anil         protonet
cifar      PAP      36.58±0.26   42.34±0.28   35.69±0.26
           ours     79.00±0.18   78.31±0.21   67.96±0.28
CUB        PAP      37.86±0.26   42.65±0.27   36.21±0.26
           ours     78.29±0.21   78.25±0.20   72.24±0.28
Omniglot   PAP      40.69±0.27   43.92±0.27   36.63±0.26
           ours     74.46±0.21   77.87±0.21   69.05±0.26
GTSRB      PAP      34.67±0.26   42.39±0.27   49.44±0.26
           ours     77.55±0.20   77.93±0.28   69.98±0.30

Q4: The three FSL datasets used are closely related. How does your method work for a significantly different downstream task?

Thank you for your valuable suggestion. In addition to the three FSL datasets, we have augmented our experiments with three out-of-domain datasets to further validate the generalizability of our method: CUB (fine-grained bird images), Omniglot (handwritten characters), and GTSRB (traffic sign images). We used these datasets both as proxy datasets and as downstream datasets in extended experiments.

The experimental results are summarized in Table 1 and Table 2 in the 'Author Rebuttal', which show that our method achieves state-of-the-art adversarial attack performance even when there is a significant domain gap between the datasets.

[1] Pre-trained Adversarial Perturbations.

Comment

Dear Reviewer id8n,

Sorry to bother you again. Since there are only a few hours left in the discussion phase, we would like to know whether your concerns have been addressed or whether you have any additional concerns. Looking forward to your reply.

Best regards,

Authors

Author Rebuttal

We thank all reviewers for the detailed and constructive reviews. We have revised the paper based on your suggestions. Here are some highlights of the paper revision:

  1. Expanded experimental scope: We have augmented our study with additional experiments conducted on the CUB, Omniglot, and GTSRB datasets, utilizing them alternately as proxy and downstream datasets. We chose three victim models as representatives of the fine-tuning, meta-learning, and metric-based FSL paradigms, and compared them with three SOTA methods. The results, summarized below, offer a broader perspective on the generalizability of our proposed method.

    Table 1: 5-way 1-shot results.

    | proxy | downstream | method | baseline | ANIL | ProtoNet |
    |---|---|---|---|---|---|
    | CUB | mini | UAN | 52.47±0.31 | 62.85±0.27 | - |
    | CUB | mini | GAP | 68.47±0.31 | 67.47±0.28 | - |
    | CUB | mini | AdvEncoder | 76.57±0.31 | 72.86±0.27 | 70.80±0.36 |
    | CUB | mini | ours | 81.17±0.30 | 77.83±0.28 | 71.22±0.36 |
    | Omniglot | mini | UAN | 44.90±0.30 | 37.58±0.26 | - |
    | Omniglot | mini | GAP | 60.95±0.33 | 63.82±0.35 | - |
    | Omniglot | mini | AdvEncoder | 74.16±0.33 | 47.97±0.34 | 54.72±0.31 |
    | Omniglot | mini | ours | 76.32±0.31 | 77.81±0.28 | 67.70±0.35 |
    | GTSRB | mini | UAN | 54.45±0.32 | 50.17±0.26 | - |
    | GTSRB | mini | GAP | 54.09±0.33 | 68.22±0.31 | - |
    | GTSRB | mini | AdvEncoder | 67.24±0.32 | 62.86±0.30 | 67.16±0.34 |
    | GTSRB | mini | ours | 79.99±0.30 | 78.13±0.20 | 69.17±0.38 |
    | mini | CUB | UAN | 76.82±0.36 | 73.75±0.48 | - |
    | mini | CUB | GAP | 69.70±0.35 | 75.08±0.45 | - |
    | mini | CUB | AdvEncoder | 76.21±0.35 | 76.82±0.56 | 71.40±0.40 |
    | mini | CUB | ours | 79.22±0.35 | 78.09±0.58 | 71.60±0.40 |
    | mini | Omniglot | UAN | 36.20±0.40 | 56.63±0.59 | - |
    | mini | Omniglot | GAP | 20.09±0.32 | 25.40±0.33 | - |
    | mini | Omniglot | AdvEncoder | 64.41±0.41 | 66.27±0.65 | 71.80±0.38 |
    | mini | Omniglot | ours | 76.44±0.27 | 69.64±0.64 | 75.27±0.36 |
    | mini | GTSRB | UAN | 74.86±0.34 | 79.82±0.33 | - |
    | mini | GTSRB | GAP | 73.30±0.34 | 79.04±0.36 | - |
    | mini | GTSRB | AdvEncoder | 77.18±0.33 | 78.27±0.37 | 76.71±0.37 |
    | mini | GTSRB | ours | 80.29±0.30 | 80.31±0.36 | 78.41±0.35 |

    Table 2: 5-way 5-shot results.

    | proxy | downstream | method | baseline | ANIL | ProtoNet |
    |---|---|---|---|---|---|
    | CUB | mini | UAN | 45.93±0.28 | 63.53±0.24 | - |
    | CUB | mini | GAP | 65.91±0.24 | 67.58±0.25 | - |
    | CUB | mini | AdvEncoder | 73.67±0.23 | 72.67±0.22 | 71.99±0.26 |
    | CUB | mini | ours | 78.29±0.21 | 78.25±0.20 | 72.24±0.28 |
    | Omniglot | mini | UAN | 37.34±0.27 | 35.05±0.26 | - |
    | Omniglot | mini | GAP | 57.22±0.29 | 61.09±0.31 | - |
    | Omniglot | mini | AdvEncoder | 72.07±0.23 | 44.73±0.31 | 57.38±0.26 |
    | Omniglot | mini | ours | 74.46±0.21 | 77.87±0.21 | 69.05±0.26 |
    | GTSRB | mini | UAN | 48.46±0.29 | 48.73±0.26 | - |
    | GTSRB | mini | GAP | 48.68±0.29 | 65.94±0.28 | - |
    | GTSRB | mini | AdvEncoder | 63.99±0.27 | 60.51±0.27 | 68.96±0.24 |
    | GTSRB | mini | ours | 77.55±0.20 | 77.93±0.28 | 69.98±0.30 |
    | mini | CUB | UAN | 77.42±0.20 | 64.42±0.58 | - |
    | mini | CUB | GAP | 68.90±0.27 | 68.89±0.52 | - |
    | mini | CUB | AdvEncoder | 75.29±0.22 | 67.31±0.68 | 72.80±0.27 |
    | mini | CUB | ours | 77.46±0.21 | 69.43±0.71 | 72.86±0.27 |
    | mini | Omniglot | UAN | 26.15±0.39 | 52.25±0.60 | - |
    | mini | Omniglot | GAP | 10.14±0.23 | 21.69±0.32 | - |
    | mini | Omniglot | AdvEncoder | 60.57±0.44 | 62.99±0.66 | 68.25±0.34 |
    | mini | Omniglot | ours | 75.04±0.25 | 67.52±0.67 | 73.15±0.30 |
    | mini | GTSRB | UAN | 74.27±0.23 | 78.31±0.28 | - |
    | mini | GTSRB | GAP | 68.57±0.29 | 77.77±0.29 | - |
    | mini | GTSRB | AdvEncoder | 73.96±0.26 | 77.50±0.29 | 78.38±0.23 |
    | mini | GTSRB | ours | 77.22±0.24 | 78.50±0.28 | 78.77±0.23 |
  2. Enhanced comparative experiments: We have added further comparative studies, including a comparison with the PAP method across multiple datasets and an exploration of the GAP approach in metric-based paradigms.

  3. Improved paper presentation: We have made additional modifications and clarifications and double-checked the paper writing for a better presentation.

In the following, we respond to all concerns one by one.

Final Decision

This paper studies universal adversarial perturbations (UAPs) for few-shot learning (FSL) scenarios. Overall, all reviewers acknowledge the novelty and importance of the paper's contribution to addressing the "less-studied" problem of UAPs in FSL scenarios, and appreciate the simplicity and effectiveness of the proposed method. Meanwhile, some concerns are raised, mainly regarding: 1) more (diverse) datasets should be evaluated; 2) comparisons to more recent UAP methods should be included; and 3) some technical aspects should be further clarified and the overall presentation should be improved.

These concerns are well addressed in the rebuttal, and all reviewers stand on the positive side of this submission. The final decision is accept.