PaperHub
Average rating: 4.3/10 (withdrawn; 4 reviewers)
Individual ratings: 5, 3, 3, 6 (min 3, max 6, std. dev. 1.3)
Confidence: 4.5 · Correctness: 2.0 · Contribution: 2.3 · Presentation: 2.3
Venue: ICLR 2025

ACID: A Comprehensive Dataset for AI-Created Image Detection

Submitted: 2024-09-20 · Updated: 2024-11-13
TL;DR

We present a benchmark for AI generated image detection.

Abstract

Generative models have demonstrated remarkable capabilities in generating photorealistic images under proper conditional guidance. Such advancements raise concerns about potential negative social impacts, such as the proliferation of fake news. In response, numerous methods have been developed to differentiate fake images from real ones. Yet, their accuracy and reliability still need to be improved, especially when facing state-of-the-art generative models such as large diffusion models. Infrastructure-wise, existing testing datasets are sub-optimal in terms of research dimensions and product utility due to their limited data volume and insufficient domain diversity. In this work, we introduce a comprehensive new dataset, namely ACID, which consists of 13M samples sourced from over 50 different generative models as well as real-world scenarios. The AI-generated images in this collection are sampled based on fine-grained text prompts and span multiple resolutions. For the real-world samples, we broadly searched public data sources and carefully filtered text-image pairs based on visual and caption quality. Using ACID, we present ACIDNet, an effective framework for detecting AI-generated images. ACIDNet leverages texture features from a Single Simple Patch (SSP) branch and semantic features from a ResNeXt50 branch, and achieves an overall cross-benchmark accuracy of $86.77\%$, significantly outperforming previous methods such as SSP and CNNSpot by over $10\%$. Both our model and dataset will be openly released to the public.
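For readers unfamiliar with the architecture described in the abstract, the minimal sketch below shows a generic dual-branch detector that combines a ResNeXt50 semantic branch with a high-pass-filtered patch texture branch. All layer sizes, the specific filter, and the fusion scheme are illustrative assumptions, not the authors' actual ACIDNet implementation.

```python
# Illustrative sketch only: a generic dual-branch detector (semantics + texture).
# Layer sizes, the filter, and the fusion scheme are assumptions, not ACIDNet itself.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class DualBranchDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Semantic branch: ResNeXt50 backbone with its classifier head removed.
        self.semantic = models.resnext50_32x4d(weights=None)
        self.semantic.fc = nn.Identity()                     # -> 2048-d features
        # Fixed high-pass (Laplacian-like) kernel applied per color channel.
        hp = torch.tensor([[-1., -1., -1.],
                           [-1.,  8., -1.],
                           [-1., -1., -1.]]) / 8.0
        self.register_buffer("hp_kernel", hp.view(1, 1, 3, 3).repeat(3, 1, 1, 1))
        # Small CNN over the filtered patch residual.
        self.texture = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),           # -> 64-d features
        )
        self.classifier = nn.Linear(2048 + 64, 2)            # real vs. AI-generated

    def forward(self, image, patch):
        sem = self.semantic(image)                              # full-image semantics
        residual = F.conv2d(patch, self.hp_kernel, padding=1, groups=3)
        tex = self.texture(residual)                            # patch texture cues
        return self.classifier(torch.cat([sem, tex], dim=1))

# Example usage with dummy inputs:
# logits = DualBranchDetector()(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 64, 64))
```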
Keywords
Computer Vision, Generative Model, AI Ethics

Reviews and Discussion

Review
Rating: 5

This paper proposes a large-scale new dataset called ACID, which consists of 13M samples spanning over 50 different recent generative models as well as real-world scenarios. The dataset is collected from very recent generative models, such as Stable Diffusion XL, and covers high resolutions, diverse object categories, and augmentations. Furthermore, the authors propose a baseline detector termed ACIDNet, which consists of two branches: a semantic branch built on ResNeXt50 and a texture branch that applies high-pass filters to a single simple patch. The experiments on their proposed dataset support their method's effectiveness.

Strengths

  • The dataset collects images generated from very recent generative models, which should contribute to the related community.
  • The authors consider several different scenarios, such as art, unnatural forgery, and post-processed photos, which are very interesting and worth discussing in this field.
  • The dataset considers many different settings, such as style and object categories, which is also an issue unaddressed by former datasets.
  • The proposed detector baseline is effective for detecting AI-generated images, supported by their experiments.
  • The paper is well-written and easy to follow.

Weaknesses

  • How will the proposed dataset remain useful for and contribute to future research? Since generative models are always evolving, there will be countless new models in the future. The ACID dataset is novel enough for now, but how can its relevance be ensured going forward? I acknowledge that the authors must have spent considerable time and effort collecting the dataset, but that is not enough if the work's value depends only on its timing. Perhaps there are further insights this dataset can offer for related future work.
  • The dataset considers many different scenarios and settings, which is good. However, it is a little confusing to follow all the different settings; categorizing them (e.g., for generalization, for robustness) would make them easier for reviewers to understand.
  • For the proposed detector baseline: the ResNeXt50 branch is a widely used backbone for image classification, and the texture branch is based on SSP and PatchCraft, which makes the authors' own contribution appear limited.

Questions

  • The authors claim 13M samples for their ACID dataset, but in line 131 they claim 22M images. I am not sure whether this is a typo.
  • The authors regard images uploaded to online platform A before 2019 as not AI-created (line 215). Why? How can you be sure there are no generated/manipulated images from before 2019?
  • For the post-processing augmentations, did the authors only employ them for training ACIDNet, or were they also used when constructing the dataset?
  • For the simplest-patch method, it is a little strange that the most discriminative part of an image is the simplest part, since intuitively the more complex parts should also be more difficult to generate. Can the authors provide any evidence for this claim beyond the two cited previous works? (An illustrative sketch of how such a patch might be selected is given after this list.)
  • For the comparisons in Tab. 4, the authors compare methods on their proposed benchmark and show the superiority of their method. Did the authors also try evaluating on other previously published public benchmarks? This would provide more evidence for the performance.
  • For Tab. 4, did the authors evaluate the other detectors using their pre-trained checkpoints, or by fine-tuning them on the proposed dataset? The comparisons should be made as fair as possible.
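Regarding the simplest-patch question above, the sketch below shows one plausible way such a patch could be selected, assuming "simplicity" is measured by local gradient energy. This follows the general idea behind SSP/PatchCraft but is not necessarily the exact criterion used in the paper.

```python
# Illustrative only: select the lowest-texture ("simplest") patch by ranking
# non-overlapping patches by mean gradient magnitude. The simplicity criterion
# is an assumption for this sketch, not the paper's exact definition.
import numpy as np

def simplest_patch(image: np.ndarray, patch: int = 64) -> np.ndarray:
    """image: H x W grayscale array; returns the patch with the lowest gradient energy."""
    gy, gx = np.gradient(image.astype(np.float64))
    energy = np.abs(gx) + np.abs(gy)
    best, best_score = None, np.inf
    for y in range(0, image.shape[0] - patch + 1, patch):
        for x in range(0, image.shape[1] - patch + 1, patch):
            score = energy[y:y + patch, x:x + patch].mean()
            if score < best_score:
                best_score = score
                best = image[y:y + patch, x:x + patch]
    return best
```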

Details of Ethics Concerns

  • This dataset contains data from different sources; the authors should make sure everything is in order with respect to, e.g., privacy and terms of use.
  • It would be better to add some ethical discussion on how the dataset and method could impact the community.
Review
Rating: 3

This paper introduces the ACID dataset, comprising 13 million samples collected from over 50 different generative models and real-world sources, offering a broad range of resolutions. Alongside the dataset, the authors propose ACIDNet, a detection model that combines texture and semantic features. ACIDNet achieves 98.01% accuracy on their dataset, surpassing existing methods (e.g., SSP) by over 10%.

Strengths

This paper constructs a large-scale dataset that includes images generated by a variety of generative models, enhancing the dataset's practicality and broad applicability.

Weaknesses

  • The paper has some weaknesses in its logical flow. The transitions between paragraphs are not sufficiently cohesive, and some paragraphs lack internal coherence.

  • When describing the dataset, there is a lack of detailed statistical information about the data distribution, such as the number of generated images from different categories or various generative models.

  • The paper lacks comparative analysis with other existing datasets in terms of dataset construction; specifically, it could refer to the relevant practices in the GenImage paper.

Questions

  • In Table 3, could you provide the parameter settings or random parameter ranges for the following augmentation methods: JPEG Compression, Add Shape, Sharpness Adjustment, Rotation, Color Jitter, Gaussian Blur, and Add Noise? (An illustrative sketch of what such a configuration might look like is given after this list.)

  • In Appendix 9 of the AEROBLADE paper, it is revealed that the image storage format in the dataset can lead models to learn compression biases, significantly affecting model performance. What is the image format of your dataset? Did you use a unified image storage format?

  • In Table 4, the top 7 rows use pretrained models to evaluate the generalization of different models on ACID through inference, while the bottom 3 rows use different methods to train and validate on ACID. Placing these two approaches in the same table can be confusing; I recommend separating them into two tables.

  • Currently, generated image detection models are not limited to texture and semantic methods. CNNSpot and SSP are not the best-performing detection models. You might consider adding some baselines (e.g., ResNet50, ViT) and some new detection models: DRCT, AEROBLADE, NPR, RIGID, ZED, Fake-Inversion (the first three are open-source, and the others will be open-sourced).

  • In line 127, you state that "ACIDNet consistently achieves an average accuracy of 81.1%." How was the 81.1% figure obtained? I only found possibly related data of 86.77% in Table 5.

  • In Table 5, what is the difference between "Texture branch only" and "SSP (ACID)"?
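To make the augmentation-parameter question above concrete, a hypothetical torchvision-style configuration is sketched below. All ranges are invented for illustration and are not the paper's actual settings; the "Add Shape" augmentation is omitted because its semantics are not specified here.

```python
# Hypothetical augmentation settings, for illustration only; none of these ranges
# are taken from the paper. They merely show the kind of detail being requested.
import io
import random
import torch
from PIL import Image
from torchvision import transforms

def random_jpeg(img: Image.Image, quality_range=(30, 95)) -> Image.Image:
    """Re-encode as JPEG at a random quality to simulate compression artifacts."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(*quality_range))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

augment = transforms.Compose([
    transforms.Lambda(random_jpeg),                                # JPEG compression
    transforms.RandomRotation(degrees=15),                         # rotation
    transforms.ColorJitter(0.2, 0.2, 0.2, 0.05),                   # color jitter
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.3),   # sharpness adjustment
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),      # Gaussian blur
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t + 0.02 * torch.randn_like(t)),   # additive Gaussian noise
])
```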

Review
Rating: 3

This paper introduces a new dataset and dual-flow detection framework aimed at addressing the challenges posed by the proliferation of AI-generated images and their potential negative social impacts, such as the spread of fake news.

Strengths

S1: ACID Dataset: The authors present a comprehensive dataset named ACID, which contains 13 million samples sourced from over 50 different generative models and real-world scenarios. The AI-generated images in ACID are created using fine-grained text prompts, and the real-world samples are carefully selected from public data sources based on visual and caption quality, ensuring a broad representation of different image types.

S2: Extensive testing on various AI detectors demonstrates the challenging nature of the ACID dataset. ACIDNet, in particular, shows impressive accuracy of 98.01% on the ACID benchmark, indicating a substantial advancement in the detection of AI-created images.

Weaknesses

W1: The dataset construction requires generating thousands of images for each model, which poses scalability challenges, especially for proprietary models that may not allow such extensive access.

W2: The framework proposed in this paper is simply a combination of existing components and lacks innovation. For example, it combines the high-pass filtering from SSP with the traditional backbone + classifier approach.

Questions

None

Details of Ethics Concerns

NA

Review
Rating: 6

This paper is relatively well-motivated as AI-generated image detection is a crucial issue. I also find the evaluations thorough.

Strengths

1. The target issues of the paper are meaningful and worth exploring.
2. The motivation is clear.
3. The paper is easy to follow.

Weaknesses

1. The number of images is small: only 57,693 real images and 42,307 fake images, which is smaller than GenImage.

2. GAN-based methods are not included in this benchmark.

3. Do detectors trained on the ACID benchmark perform well on real-world data, for example, images collected from fake news on the Internet?

Questions

See Weaknesses

Details of Ethics Concerns

N/A

Withdrawal Notice

I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.