Gradient Regularization-based Cross-Prompt Attacks on Vision Language Models
Using gradient regularization to enhance cross-prompt adversarial attacks on vision language models.
Abstract
Reviews and Discussion
The paper presents a novel Gradient Regularization-based Cross-Prompt Attack (GrCPA) targeting Vision-Language Models (VLMs), which addresses the instability of adversarial examples across diverse prompts. By leveraging gradient regularization, GrCPA mitigates the variability in adversarial success when multiple prompts are used. This approach enhances the robustness of adversarial examples, improving their transferability across prompts. Experiments on models such as Flamingo, BLIP-2, LLaVA, and InstructBLIP validate GrCPA’s effectiveness, showing superior attack stability and transferability compared to existing methods.
Strengths
- The method is original: GrCPA’s use of gradient regularization to make adversarial examples robust across prompts is an effective way to enhance attack transferability on VLMs.
- The extensive experimental analysis across models and tasks (e.g., image captioning, VQA) confirms the soundness of the approach.
- The method’s formulation and rationale are clearly articulated, supported by structured experiments that compare GrCPA with established baselines.
Weaknesses
- The technical depth of this paper is somewhat limited. Adversarial attacks are not new to machine learning models, or even to VLMs, and an incremental improvement in this area contributes little to the community. The method only introduces gradient regularization to stabilize the adversarial optimization, which is more of an implementation trick for the attack.
- I would expect some black-box transferability analysis to demonstrate the effectiveness of this attack.
Questions
See the Weaknesses section.
The similarities between this work and [1] in problem motivation, paper organization, experimental design, and writing raise concerns about originality and potential plagiarism. Although the proposed approach does present some differences, the extent of overlap indicates that the authors may not have adequately distinguished their work from [1], published in ICLR 2024.
Here are some specific instances that suggest potential plagiarism:
- The paper presents "cross-prompt transferability" as a new problem, even though it was first proposed in [1]. Notably, the main text, abstract, and introduction do not reference this prior work.
- The organization of this paper closely mirrors that of [1], with some tables being directly copied and merely modified to add an additional row.
- Several paragraphs in this manuscript appear to be simple paraphrases of corresponding sections in [1].
- In the experimental design, instead of acknowledging [1] as a basis, the authors claim to have independently designed the experiment, even though the design and details align precisely with those in [1].
There are additional similar issues present in the manuscript. Overall, this paper clearly does not adhere to accepted scientific writing standards.
[1] Luo, Haochen, et al. "An image is worth 1000 lies: Transferability of adversarial images across prompts on vision-language models." In The Twelfth International Conference on Learning Representations, 2024. URL: https://openreview.net/pdf?id=nc5GgFAvtk
Strengths
N/A
Weaknesses
N/A
Questions
N/A
Details of Ethics Concerns
The similarities between this work and [1] in problem motivation, paper organization, experimental design, and writing raise concerns about originality and potential plagiarism. Although the proposed approach does present some differences, the extent of overlap indicates that the authors may not have adequately distinguished their work from [1], published in ICLR 2024.
Here are some specific instances that suggest potential plagiarism:
- The paper presents "cross-prompt transferability" as a new problem, even though it was first proposed in [1]. Notably, the main text, abstract, and introduction do not reference this prior work.
- The organization of this paper closely mirrors that of [1], with some tables being directly copied and merely modified to add an additional row.
- Several paragraphs in this manuscript appear to be simple paraphrases of corresponding sections in [1].
- In the experimental design, instead of acknowledging [1] as a basis, the authors claim to have independently designed the experiment, even though the design and details align precisely with those in [1].
There are additional similar issues present in the manuscript. Overall, this paper clearly does not adhere to accepted scientific writing standards.
[1] Luo, Haochen, et al. "An image is worth 1000 lies: Transferability of adversarial images across prompts on vision-language models." In The Twelfth International Conference on Learning Representations, 2024. URL: https://openreview.net/pdf?id=nc5GgFAvtk
The authors propose a method termed Gradient Regularization-based Cross-Prompt Attack (GrCPA) that creates adversarial images which transfer across prompts. GrCPA extends the previous cross-prompt attack framework by applying gradient regularization. Its effectiveness is evaluated with Flamingo, BLIP-2, LLaVA, and InstructBLIP on different tasks.
Strengths
- The paper conducts extensive experiments on various VLMs to prove the effectiveness of GrCPA.
- The paper is easy to follow.
Weaknesses
- The novelty is limited: As detailed in Section A.2 (line 878), the only difference between GrCPA and a recent work termed CroPA [1] is the addition of gradient regularization. The pipeline of GrCPA is highly similar to that of CroPA.
- Practical applicability to the real world is limited: As shown in Table 11, GrCPA does not demonstrate strong transferability across different models, with the average ASR remaining below 10%.
[1] Luo, Haochen, Jindong Gu, Fengyuan Liu, and Philip Torr. "An image is worth 1000 lies: Transferability of adversarial images across prompts on vision-language models." In The Twelfth International Conference on Learning Representations, 2024.
Questions
- What is the impact of using different values of the extrema parameter K?
- Why are the top-k largest values of the gradient vectors clipped directly to 0 rather than to some other constant value? (A sketch of my reading of this operation is given below.)
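For concreteness, here is a minimal PyTorch sketch of the clipping operation as I understand it from the paper's description (zeroing the k largest and k smallest gradient entries). The function name, the flattening, and the signed (rather than magnitude-based) selection are my own assumptions, not the authors' implementation.

```python
import torch

def clip_extreme_gradients(grad: torch.Tensor, k: int) -> torch.Tensor:
    """Zero out the k largest and k smallest entries of a gradient tensor.

    Reviewer's sketch of the gradient-regularization step as described in the
    paper; the actual GrCPA implementation may differ (e.g., magnitude-based
    or per-channel selection).
    """
    flat = grad.flatten()
    _, top_idx = torch.topk(flat, k, largest=True)   # k most positive entries
    _, bot_idx = torch.topk(flat, k, largest=False)  # k most negative entries
    mask = torch.ones_like(flat)
    mask[top_idx] = 0.0
    mask[bot_idx] = 0.0
    return (flat * mask).view_as(grad)
```

Under this reading, zeroing discards the update direction of the selected entries entirely, whereas clamping them to a small constant would merely dampen it; the choice between the two seems worth an ablation.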
Details of Ethics Concerns
The overall structure of the paper closely resembles that of a recent work [1]. Notably, the experimental section fails to adequately acknowledge or reference the results and insights of [1] (such as the high ASR obtained when the target text is set to nonsensical phrases), creating the impression that these experiments are entirely original to the authors.
[1] Luo, Haochen, Jindong Gu, Fengyuan Liu, and Philip Torr. "An image is worth 1000 lies: Transferability of adversarial images across prompts on vision-language models." In The Twelfth International Conference on Learning Representations, 2024.
This paper addresses the challenge of creating adversarial attacks that transfer across different prompts for vision-language models (VLMs). The authors propose GrCPA (Gradient Regularization-based Cross-Prompt Attack), which uses gradient regularization to generate more robust adversarial examples.
Strengths
- The experiments show consistently better performance.
- The writing is easy to follow.
Weaknesses
- The novelty and contribution are marginal. It only modifies the training loss in a very simple way.
- The logic is unconvincing to me. It is claimed that large gradients can lead to local optima and trigger overfitting, yet the gradient regularization simply sets the largest and lowest gradient entries to zero. This raises two questions: (1) Why is the lowest gradient set to zero? (2) Does the largest gradient within a sample actually represent a ‘large’ gradient? For example, the largest gradient in one sample or batch could be smaller than the lowest gradient in another. How are ‘large’ and ‘small’ defined?
Questions
- During training and testing, do you use the same text prompts? If so, it seems that the cross-prompt attack is just overfitting to several prompts rather than to one.
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.