PaperHub
Rating: 4.5/10 (withdrawn; 4 reviewers; ratings 5, 5, 5, 3; min 3, max 5, std 0.9)
Confidence: 4.0 · Correctness: 2.5 · Contribution: 2.0 · Presentation: 2.8
ICLR 2025

Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-generateted Image Detection

OpenReview · PDF
Submitted: 2024-09-20 · Updated: 2024-11-13

Abstract

Keywords
AI-generateted Image Detection, Low-level Information

Reviews and Discussion

Official Review
Rating: 5

Existing image detection methods rely on various kinds of high-level or low-level information. This paper proposes the Adaptive Low-level Experts Injection (ALEI) framework, which detects generated images using multiple types of low-level information. The framework builds on CLIP, extends the attention modules with LoRA, and integrates the multiple information sources through a cross-attention mechanism. Trained on four categories of the ProGAN dataset, the method achieves SOTA results on data from both GAN and diffusion models.
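To make the fusion mechanism described above concrete, here is a minimal single-head, scaled dot-product attention sketch in pure Python, assuming a CLIP token embedding queries one embedding per low-level expert. This is an illustrative reconstruction, not the authors' code; all names and dimensions are invented.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(query, expert_feats):
    """Fuse low-level expert features into a query embedding.

    query        -- one backbone token embedding (list of floats)
    expert_feats -- one embedding per low-level expert (e.g. NPR, DnCNN,
                    NoisePrint in the paper; here just toy vectors)
    Returns a sum of expert features weighted by their scaled
    dot-product similarity to the query (single-head attention).
    """
    d = len(query)
    dots = [sum(q * f for q, f in zip(query, feat)) for feat in expert_feats]
    weights = softmax([s / math.sqrt(d) for s in dots])
    return [sum(w * feat[i] for w, feat in zip(weights, expert_feats))
            for i in range(d)]

# Toy usage: a query aligned with expert 0 draws most of its weight from it.
q = [1.0, 0.0]
experts = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attend(q, experts)
```

The real framework presumably uses learned query/key/value projections and multiple heads; this sketch only shows why similarity-weighted fusion can preserve each expert's contribution rather than averaging them away.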

Strengths

  1. The network structure is novel. CLIP, which primarily extracts semantic information, is used as the backbone, and a Low-level Information Adapter module is designed to inject low-level information into the CLIP network, thereby enriching the original embeddings.
  2. The paper presents extensive accuracy results on generated-image datasets, covering mainstream GAN-based and Stable Diffusion-based generation methods. The data are detailed and the method is fairly solid.
  3. Overall, the idea is similar to using multi-expert models to detect fake images, and it is a worthwhile direction to study.

Weaknesses

  1. Lack of necessary explanation: the proposed method combines multiple types of low-level information. The paper explains existing methods that use a single type of low-level information, but it lacks a detailed comparison with methods that also use multiple types, and it should point out the shortcomings of those existing multi-information methods to demonstrate its own effectiveness. Please also specify whether this is the first method of this type.
  2. The paper surveys a variety of low-level information, such as SRM filters and DIRE, yet the model uses NPR, DnCNN, and NoisePrint. Since the paper claims that generalization improves with more information, data are needed to justify selecting these three types.
  3. On the diffusion-model detection task, the method underperforms PatchCraft; the possible reasons should be explained.

Questions

  1. Lack of necessary explanation: the proposed method combines multiple types of low-level information. The paper explains existing methods that use a single type of low-level information, but it lacks a detailed comparison with methods that also use multiple types, and it should point out the shortcomings of those existing multi-information methods to demonstrate its own effectiveness. Please also specify whether this is the first method of this type.
  2. The paper surveys a variety of low-level information, such as SRM filters and DIRE, yet the model uses NPR, DnCNN, and NoisePrint. Since the paper claims that generalization improves with more information, data are needed to justify selecting these three types.
  3. Your model is not as effective as PatchCraft at detecting diffusion models. The possible causes need to be explained.
  4. The training resolution is 224×224; is the same resolution used at inference? Diffusion models in particular generate better images at large resolutions. At what resolution does your inference operate?
Official Review
Rating: 5

This paper introduces the Adaptive Low-level Experts Injection (ALEI) framework to investigate the generalization issue in AI-generated image detection. The authors claim that integrating diverse low-level information helps overcome the limitations of generalizing to unseen generative models. In addition, Cross-Low-level Attention and Dynamic Feature Selection are used to fuse low-level features and to select suitable features dynamically.
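For readers unfamiliar with the LoRA terminology used throughout these reviews, here is a generic illustration of low-rank adaptation (not the authors' implementation): a LoRA "expert" keeps a frozen weight W and adds a trainable low-rank update, so the effective weight is W + (alpha/r)·B·A. All matrix sizes below are toy values.

```python
def matmul(a, b):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w, a, b, alpha):
    """Effective weight of a LoRA-adapted layer: W + (alpha/r) * B @ A.

    w -- frozen d_out x d_in weight (not updated during fine-tuning)
    a -- trainable r x d_in matrix (rank r)
    b -- trainable d_out x r matrix
    Only a and b are trained, so each expert adds roughly 2*r*d
    parameters instead of d*d, which is why stacking several LoRA
    experts on one backbone stays cheap.
    """
    r = len(a)
    ba = matmul(b, a)
    return [[w[i][j] + (alpha / r) * ba[i][j]
             for j in range(len(w[0]))]
            for i in range(len(w))]

# Toy usage: a rank-1 update applied to a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]           # r=1, d_in=2
B = [[0.5], [0.0]]         # d_out=2, r=1
W_eff = lora_weight(W, A, B, alpha=1.0)
```

In frameworks such as ALEI, one such low-rank pair per expert would be attached to the backbone's attention projections; the frozen W preserves the pretrained CLIP features.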

Strengths

  1. This paper uses low-level features to address the limited generalization of AI-generated image detection and offers some insights into it. The experiments and analysis of different types of low-level information also show that each type makes a distinct contribution to detection.
  2. The cross-low-level attention layer integrates different low-level features without losing their unique contributions, helping avoid the pitfalls of simple feature fusion.

Weaknesses

  1. ALEI uses LoRA experts to integrate low-level features. What criteria were used to determine the number of experts? Is the method sensitive to this number?
  2. In dynamic feature selection, how does the model ensure the reliability of the selection, especially for forgeries that may lie in the overlapping space of multiple generative models? For example, is there a risk of feature redundancy or of competition between modalities during selection?
  3. The model is first trained on ProGAN and then further trained with the fusion module. How does this two-stage training process impact the final generalization ability? Would joint training perform better, or is there a risk of overfitting due to competition between low-level and high-level features?
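As a generic illustration of the kind of gating weakness 2 asks about (hypothetical, not the paper's actual mechanism), a dynamic selector can score each modality and softly weight the expert outputs; near-uniform gate weights are exactly the "overlapping space" case where redundancy or competition between modalities could make the selection unreliable.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def select_features(gate_scores, expert_feats):
    """Softly select among per-modality expert features.

    gate_scores  -- one learned relevance score per modality
                    (in practice produced by a small network)
    expert_feats -- one feature vector per modality, all the same length
    Returns (gate weights, gated mixture). Near-uniform weights mean
    no single modality is decisive, so the mixture may blur competing
    low-level cues instead of choosing between them.
    """
    ws = softmax(gate_scores)
    d = len(expert_feats[0])
    mixed = [sum(w * feat[i] for w, feat in zip(ws, expert_feats))
             for i in range(d)]
    return ws, mixed

# Toy usage: one strongly positive gate score dominates the mixture.
ws, mixed = select_features([4.0, 0.0, 0.0],
                            [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
```

Inspecting the gate-weight entropy on held-out generators would be one concrete way for the authors to answer the reliability question raised above.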

Questions

See weaknesses.

Official Review
Rating: 5
  1. This paper proposes the Adaptive Low-level Experts Injection (ALEI) framework for AI-Generated image detection.
  2. Extensive experiments demonstrate that ALEI achieves state-of-the-art results on multiple benchmarks.

Strengths

The method is effective on many benchmarks.

Weaknesses

  1. Limited novelty: this method is mainly a simple fusion of high-level and low-level information, which has already been introduced into AI-generated image detection in [1]. Moreover, high-level information was introduced in [2] and low-level information in [3].
  2. The concept of utilizing multiple LoRAs is quite prevalent, as highlighted in [4], where their application in AI-generated image detection is discussed.
  3. ALEI is much larger than existing work such as PatchCraft, which uses just a few convolutional layers as the classifier. This raises the question of whether the claimed performance increase is due to the increased FLOPs/parameters or to ALEI itself.
  4. Missing experiments: there are no experimental results on benchmarks such as GenImage, DiffusionForensics, etc.
  5. Performance evaluation: Table 1 indicates that PatchCraft [69] outperforms the proposed method across several generators, especially diffusion-based generators.

[1] A Sanity Check for AI-generated Image Detection
[2] Towards Universal Fake Image Detectors that Generalize Across Generative Models
[3] PatchCraft: Exploring Texture Patch for Efficient AI-generated Image Detection
[4] MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection

Questions

See weaknesses.

Official Review
Rating: 3

This paper proposes the Adaptive Low-level Experts Injection framework and develops a Low-level Information Adapter that interacts with the features extracted by the backbone. Experiments on several datasets show that the proposed model achieves state-of-the-art performance.

Strengths

  • The motivation of this paper is clear, and the authors chose a straightforward but effective method to achieve their goal.
  • The combination of low-level features, LoRA experts, and feature selection for AI-generated image detection is interesting.

Weaknesses

  • What concerns me most is whether the LoRA experts are truly effective and whether the experiments are solid enough. The core issue of the AI-generated image detection task lies in the generalization of the model. However, the experimental results in the tables show that the algorithm does not perform well enough in cross-model settings, especially on diffusion models.
  • Moreover, it would be better if the paper provided some failure cases in the visualization figure and explained why these cases occur.
  • Some figures in the paper are not clear enough; it is recommended to use figures in PDF or EPS format. It would also be better to unify the fonts across all figures.
  • Please unify the format of references. At least ensure that the citation formats of conferences and journals are consistent.
  • The sources of citations in this paper should be corrected. For example, “Dire for diffusion-generated image detection” is from CVPR2023 rather than arXiv.
  • There are some typos in this paper, such as “Many work (works) (Zhao et al., 2023; Peng et al., 2021; Yuan et al., 2021) suggests …… (4.3 LOW-LEVEL INFORMATION INTERACTION ADAPTER)”.

Questions

  • Please standardize the capitalization of English letters in the references. Many abbreviations of proper nouns are incorrect, such as Stargan (StarGAN).
  • We usually write "generated" rather than "generateted".
Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.