PaperHub
7.8 / 10 · Spotlight · 4 reviewers (min 4, max 5, std 0.4)
Ratings: 4, 5, 5, 5
Confidence: 4.0
Novelty: 3.0 · Quality: 3.0 · Clarity: 3.3 · Significance: 3.5
NeurIPS 2025

Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models

Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

Forging image watermarks from a single watermarked image by removing the watermarking artifacts via backpropagation through a preference model.

Abstract

Keywords

image watermarking, watermark forging

Reviews and Discussion

Review (rating: 4)

The paper proposes a novel watermark forging method that leverages a preference model to extract the watermark from a single watermarked image. By optimizing the input image, the method recovers the watermark without access to the underlying watermarking model. This makes the process more practical than prior approaches, which typically require multiple watermarked examples or white-box access.

Strengths and Weaknesses

Strengths

  1. The paper is generally well-written and easy to follow. The methodology and results are clearly described, and the topic is relevant.
  2. The proposed approach improves practicality by not requiring access to the watermark decoder and by working with only a single watermarked image.
  3. The authors conduct a comprehensive set of experiments and ablation studies to validate the effectiveness of their method.

Weaknesses

  1. From the results, it appears that the proposed approach only outperforms prior work on the Video Seal case in terms of bit accuracy. It does not seem to generalize well to other use cases and does not show improvements in image quality as measured by PSNR.
  2. The small sample size (100) may limit the reliability of the findings.
  3. Including additional empirical results, such as AUROC, could provide a more complete picture of detection performance.

Questions

  1. The watermarking methods referenced in your work (e.g., CIN, MBRS) are all learning-based, and as such, the encoding process is non-linear and specifically non-additive. Given this, what is the intuition behind the subtraction and addition operations used in your framework to retrieve and apply the watermark?
  2. Have you considered incorporating an image regularization loss to mitigate the unnatural smoothing artifacts observed in Figure 2 of the appendix?
  3. Could you clarify how the watermark is selected in your experiments?

Limitations

Yes

Final Justification

The authors have addressed most of my concerns, so I will raise my score.

Formatting Issues

No major formatting issues found.

Author Response

Thank you for your thoughtful feedback. In this rebuttal, we show that our method is the best available one‑shot, black‑box attack, clearly outperforming alternatives like DiffPure in image quality while remaining highly effective. As requested, we also provide new experiments on 1000 test images confirming reliability, along with AUROC/ROC analysis to highlight detector performance in realistic FPR ranges. We hope the reviewer finds these arguments and results compelling, and if so, we would be very grateful if they could consider raising their score.

W1: Our method’s results with respect to the related work
We thank the reviewer for this comment. While the numbers may appear mixed at first glance, we argue that our method is the best attack for real‑world watermark forging and removal because it is the only practical one‑shot, black‑box method that combines strong accuracy with high perceptual quality. Here’s why:

Forging (Table 1): Our method achieves 1.00 bit accuracy on CIN, 0.83 on MBRS, and 0.83 on VideoSeal, on par with or better than DiffPure (0.75 on VideoSeal). Crucially, we preserve 31.3 PSNR, significantly higher than DiffPure (26.6 PSNR). The only method with higher quality (but lower bit accuracy) is Warfare (39.6 PSNR), but it requires 1000 watermarked images with the same hidden message, which is infeasible in realistic scenarios.

Removal (Table 2): Our method delivers 31.2 PSNR, while DiffPure and CtrlRegen degrade to 25.4 and 24.4 PSNR, respectively. The only method with higher PSNR is VAE (34.3), but it is not a functional attack: watermark bit accuracy remains ≥99%, meaning the watermark is essentially intact. In contrast, our method meaningfully reduces bit accuracy (e.g., 0.82 on CIN, 0.49 on VideoSeal) while preserving high visual quality.

In summary, our method is the most practical and effective one‑shot, black‑box attack available today as it:

  • Matches or surpasses other one-shot black-box methods (e.g. DiffPure) in bit accuracy while improving PSNR by +5–6 points.
  • Outperforms Image Averaging and Warfare in feasibility, requiring only a single image instead of hundreds or thousands.

W2: Small sample size
We chose the sample size of 100 images due to the long run time of all the experiments and baselines. To alleviate any doubts, we also evaluate our method on 1000 test images for watermark removal (see the table below) with minimal difference to the reported numbers for 100 images.

| Method | CIN (bit acc.) ↓ | MBRS (bit acc.) ↓ | TrustMark (bit acc.) ↓ | VideoSeal (bit acc.) ↓ |
|---|---|---|---|---|
| Ours (test set, 100 images) | 0.82 | 0.64 | 0.60 | 0.49 |
| Ours (test set, 1000 images) | 0.84 | 0.65 | 0.60 | 0.47 |

W3: Reporting AUROC
Thank you very much for the suggestion on AUROC. The paper currently focuses purely on multi-bit watermarking, and introducing an experiment in the more practical scenario of watermark detection makes it more relevant. We therefore present the AUROC values for watermark forging in the table below. For watermarking, a very low false positive rate is crucial; therefore, we will also include in the paper the full ROC curves highlighting the performance of various models in the [10^(-6), 10^(-4)] FPR range.

For this experiment, we first forge the watermark with the same setup as the main paper, then extract the binary message and compute the number of matching bits with respect to the original one. We then compute the ROC, considering theoretical FPRs (assuming that the number of matching bits for non-watermarked images follows a binomial distribution). The AUROC results show, similarly to the bit accuracy results in Table 1, that our method is very competitive with or outperforms the related methods while maintaining good visual quality.

| Method | CIN (AUROC) ↑ | MBRS (AUROC) ↑ | TrustMark (AUROC) ↑ | VideoSeal (AUROC) ↑ |
|---|---|---|---|---|
| Image averaging (n=100) | 1.00 | 1.00 | 0.88 | 0.51 |
| DiffPure (FLUX.1 [dev]) | 1.00 | 1.00 | 0.78 | 0.91 |
| Ours | 1.00 | 1.00 | 0.79 | 0.93 |
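The theoretical FPR at a given bit-matching threshold follows directly from the binomial tail described above. A minimal sketch; the 48-bit message length is a hypothetical example, not a value from the paper:

```python
from math import comb

def binom_sf(k, n, p=0.5):
    # P(X >= k) for X ~ Binomial(n, p): the theoretical probability that a
    # non-watermarked image matches at least k of the n watermark bits.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Smallest matching-bit threshold achieving FPR <= 1e-6 for a 48-bit message.
n_bits = 48
tau = next(k for k in range(n_bits + 1) if binom_sf(k, n_bits) <= 1e-6)
```

Sweeping the threshold over all possible bit counts and pairing each theoretical FPR with the empirical TPR on forged images yields the ROC curve.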

Q1: Non-additive Watermarks
This is an excellent remark. Recent work (Yang et al., NeurIPS 2024) shows that even simple additive operations can remove or forge watermarks in state‑of‑the‑art systems. We argue this vulnerability arises because detectors are not trained to enforce image–watermark consistency: nothing prevents a watermark from image x from being valid on image y, making it easier for embedders to collapse toward non‑content‑aware solutions. In our experiments, swapped watermarks were still detected with high accuracy (CIN 100%, MBRS 94%, TrustMark 77%, VideoSeal 87%). This is why we urge the community to develop truly content‑aware detectors — for example, by using anti-forging losses that penalize when a watermark from one image is validated on another. We will add this point in the paper.

Yang, P., Ci, H., Song, Y., & Shou, M. Z. (2024). Can Simple Averaging Defeat Modern Watermarks? NeurIPS 2024.

Q2: Image regularization loss to mitigate smoothing artifacts
After the submission, we tested adding blurring to the set of synthetic artifacts. This substantially decreased the smoothing artifacts in the images. We will include these new findings in the paper.

Q3: Watermark selection details
As described in Section 4.1, we watermarked 100 images from the SA‑1b validation set using CIN, MBRS, TrustMark, and VideoSeal, with random messages of the appropriate length. For comparability, each method used the same hidden message across all 100 images, since approaches like Warfare and Image Averaging cannot handle multiple messages. We measured bit accuracy of the respective extractors for both removal and forging.

Comment

The authors have addressed most of my concerns, so I will raise my score.

Review (rating: 5)

The authors address watermark forging from a single image in a black-box setting. They (1) train a preference model on procedurally corrupted images using a ranking loss, (2) treat the model’s score as a surrogate objective to optimize pixels and extract an estimated watermark, and (3) paste this estimate onto arbitrary targets to forge or remove post-hoc watermarks, all without access to the underlying watermarking system. Experiments on four recent schemes (CIN, MBRS, TrustMark, VideoSeal) show competitive removal accuracy.
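The extract-then-paste pipeline can be illustrated with a toy differentiable surrogate. Everything below (the quadratic "preference" score, the plain gradient-descent optimizer, the sinusoidal watermark) is an illustrative stand-in under simplifying assumptions, not the paper's actual model:

```python
import numpy as np

def artifact_score(img):
    # Toy stand-in for the preference model: energy of horizontal pixel
    # differences, which is large for high-frequency artifact patterns.
    d = np.diff(img, axis=1)
    return float((d ** 2).sum())

def score_grad(img):
    # Analytic gradient of the toy score w.r.t. the image pixels.
    d = np.diff(img, axis=1)
    g = np.zeros_like(img)
    g[:, :-1] -= 2 * d
    g[:, 1:] += 2 * d
    return g

def extract_watermark(x_wm, steps=200, lr=0.1):
    # Step 2: optimize the pixels to minimize the surrogate score;
    # whatever the optimizer removed is the watermark estimate.
    x = x_wm.copy()
    for _ in range(steps):
        x -= lr * score_grad(x)
    return x_wm - x

clean = np.zeros((8, 8))
wm = 0.5 * np.sin(np.arange(8.0))[None, :] * np.ones((8, 1))
w_hat = extract_watermark(clean + wm)
forged = np.ones((8, 8)) + w_hat  # step 3: paste the estimate onto a target
```

In the real method the surrogate is a learned preference model and the gradient comes from backpropagation, but the structure of the attack (optimize pixels, keep the residual, add it elsewhere) is the same.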

Strengths and Weaknesses

Strengths

  1. Introduces a ranking-based preference model trained without any real watermarks.
  2. The design of artifact types for the training preference model is reasonable.
  3. The idea is clearly conveyed.

Weaknesses

  1. The evaluated watermarking models lack sufficient robustness. Leading models such as StegaStamp [1] and VINE [2], known for their enhanced robustness, were not included in the tests. Incorporating experiments with these robust watermarking models into Tables 1, 2, and 3 is recommended, particularly for VINE, which claims resilience to image editing. A comprehensive evaluation of these models would significantly strengthen this work.

  2. Do forged watermarks exhibit the same robustness as their original counterparts? Specifically, are forged watermarked images from VINE equally resistant to image editing? Investigating the impact on robustness would provide valuable insights.

[1] StegaStamp: Invisible Hyperlinks in Physical Photographs

[2] Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances

Questions

Please review the Weaknesses section. If all of my concerns are fully resolved, I’ll gladly raise my rating further; if they aren’t, I may lower it.

Limitations

Yes

Final Justification

Thank you very much for your effort and thorough response.

I have carefully read all the review comments as well as the authors’ responses. I believe the authors have adequately addressed my concerns, and I have raised my score.

Formatting Issues

NA

Author Response

We thank the reviewer for their thoughtful comments. We are pleased that they find our paper clear and well motivated. We respond to the specific points raised below and kindly ask the reviewer to let us know if these address their concerns.

Q1: Results for StegaStamp and VINE
We considered StegaStamp and VINE. Unfortunately, the official StegaStamp weights are no longer available online. Nonetheless, we used the community implementation of StegaStamp [A] and show the watermark forging results in the table below. Please note that StegaStamp is robust only because it makes very large, unnatural changes to the input image – this is also reported by the authors of StegaStamp, where the method achieves a very low PSNR/SSIM score of 28.5/0.905 for the 100-bit model. In contrast, the watermarking methods tested in our paper (CIN, MBRS, TrustMark, and VideoSeal) reach values of more than 0.99 in SSIM and more than 40 in PSNR. The very visible changes to the input images make StegaStamp less useful in practice.

VINE claims leading robustness to various attacks such as image editing, however, this is achieved by embedding the watermark message very visibly at the image edges (see the bottom right corner of the watermarked image in the official demo_colab.ipynb of the VINE GitHub repository). The watermark can be removed from the image by simple cropping by 10 pixels, i.e., running the detection on Image.fromarray(np.array(output_pil)[:-10, :-10]) in the official colab yields bit accuracy 55%. However, as requested, we also show the watermark forging results for VINE in the table below.

| Method | StegaStamp (bit acc.) ↑ | VINE (bit acc.) ↑ |
|---|---|---|
| Image averaging (n=100) | 0.70 | 0.98 |
| DiffPure (FLUX.1 [dev]) | 0.56 | 0.80 |
| Ours | 0.60 | 0.84 |

We use the same setup and data as for the other methods in Table 1 in the paper. The extremely high score of Image averaging for the VINE method indicates that VINE watermarks are not content dependent in practice. In comparison, the benefit of our method is that it requires only a single image to forge the watermark. Also, our method can be applied to watermarks that are highly content-dependent, e.g., VideoSeal watermarks, where the Image averaging method does not work and cannot work by principle.
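The 10-pixel crop mentioned above can be written as a small helper (assuming a PIL image, as in the official notebook):

```python
import numpy as np
from PIL import Image

def crop_border(img, px=10):
    # Drop `px` pixels from the bottom and right edges, where the
    # VINE demo concentrates a visible part of the watermark signal.
    arr = np.array(img)
    return Image.fromarray(arr[:-px, :-px])
```

Running VINE's detector on the cropped image is what yields the 55% bit accuracy reported above.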

[A] WAVES: Benchmarking the Robustness of Image Watermarks

Q2: Robustness of forged watermarks.
Robustness is valuable because it allows watermark owners to identify their images even if modified by various transformations that may appear in the wild on the Internet (e.g., image compression, cropping, etc.). In the context of forging, the goal of the attacker is to create forged watermarked images that appear legitimate (i.e., they are detected as watermarked by a detector). Thus, robustness to transformations is less relevant in this threat model.

Nonetheless, prompted by the reviewer, we evaluated the robustness of the forged VINE watermarks. We apply the VINE image editing script (image_editing.py from the official VINE GitHub repository) to the images with forged watermarks. We achieve bit accuracy of 66% (down from 84%). This result indicates the forged watermarks are less robust than their original counterparts but they cannot be stripped completely.

Comment

Thank you very much for your effort and thorough response. I can easily imagine that conducting additional experiments must have been quite challenging, and I truly appreciate your dedication.

I have carefully read all the review comments as well as the authors’ responses. I believe the authors have adequately addressed my concerns, and I am inclined to raise my score. I would also like to observe the ongoing discussions between the authors and the other reviewers before making a final decision.

Thank you again.

Review (rating: 5)

This paper presents a method for watermark forging and removal that requires only a single watermarked image and no access to the watermarking algorithm or decoder, representing a realistic black-box threat model. The core of the method is an image preference model trained on synthetically corrupted images to learn to detect unnatural visual artifacts. This preference model is then used as a surrogate loss function to guide gradient-based optimization that can remove or forge watermarks in a generic and transferable manner. The authors validate their method across several state-of-the-art post-hoc watermarking schemes and show strong empirical results, outperforming prior work that requires significantly more data and assumptions.

Strengths and Weaknesses

Strengths:

  • The attack assumes only access to a single watermarked image and no interaction with the underlying watermark model or decoder, making it applicable to realistic black-box settings—unlike prior methods that require many samples or access to APIs.
  • The proposed method is transferable and data-efficient, working across different watermarking methods with minimal assumptions and no need for model-specific adaptation or multiple samples.
  • The preference model, trained on synthetic perturbations, provides a general-purpose signal for guiding watermark removal and synthesis—without requiring real watermarked data
  • The paper is evaluated on four major watermarking systems and reports both quantitative metrics (bit accuracy, PSNR) and qualitative visual results. It includes detailed ablation studies that strengthen the empirical claims.

Weaknesses:

  • While strong overall, the method underperforms on TrustMark, which uses a highly content-dependent decoder. This suggests limitations when watermarking schemes are deeply entangled with semantic content.
  • All evaluations are performed on benchmark datasets and simulated watermarking schemes. The effectiveness on real-world dataset remains uncertain.
  • The preference model is based on synthetic noise and may not fully capture semantic-level indicators of watermarks—especially if the watermark is adaptively designed or learned jointly with the content.
  • The experimental section lacks statistical significance analysis or variance reporting, which would strengthen claims about generalizability and robustness. Some additional theoretical analysis and/or results would strengthen the paper.
  • The paper does not provide robustness evaluations under adversarial detection or detection-aware defense schemes.

Questions

1: Have you evaluated your method on AI-generated content from diffusion models (e.g., Stable Diffusion) that may contain embedded watermarks or platform fingerprints? How do you anticipate your method will generalize to such real-world scenarios?

2: Your method performs well on synthetic and high-frequency watermarks. How would it perform on semantic or learned watermarks (e.g., those embedded in latent space or object features)?

3: Did you test whether the forged images can evade watermark detectors used by platforms (e.g., Google SynthID)?

4: Have you compared your preference model to traditional image quality metrics (e.g., LPIPS)?

Limitations

Partly: while the authors mention some safeguards and discuss a bit the ethical aspects, the justification (there are already methods out there that do the same thing, in essence) is not a great reasoning. This should be further discussed.

Final Justification

Based on the response and clarifications, I have updated my rating and am in favor of acceptance, under the assumption the authors clarify better potential safeguards and ethical considerations, given the potential for misuse of this type of work, as the current justification for lack of safeguards is not reasonable.

Formatting Issues

no

Author Response

We thank the reviewer for their insightful comments and questions, as well as for finding our work relevant and empirically strong. We respond to the specific points raised below and kindly ask the reviewer to let us know if these address their concerns.

Q1: Evaluation of our method on the Stable Diffusion watermark
To the best of our knowledge, the only watermarked Stable Diffusion images come from the original CompVis GitHub repository. Other Stable Diffusion distributions, such as the one on HuggingFace, do not apply any watermarking. Images generated by Stable Diffusion from the CompVis GitHub repository are watermarked using the dwtDct method from the invisible-watermark library. This method has been shown to be very brittle and to perform poorly, as demonstrated in [11, 42]. However, to mitigate any doubts, we also run watermark removal on images watermarked using the dwtDct method and observe that our method completely removes the watermark, i.e., the bit accuracy for the resulting images is 50%. We would be happy to include any other publicly available method.

[11] The stable signature: Rooting watermarks in latent diffusion models. ICCV 2023.
[42] Tree-rings watermarks: Invisible fingerprints for diffusion images. NeurIPS 2023.
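Bit accuracy here is the fraction of decoded bits matching the embedded message; 0.5 is chance level, meaning the watermark is fully removed. A minimal helper (the function name is illustrative):

```python
import numpy as np

def bit_accuracy(embedded, decoded):
    # Fraction of matching bits; 0.5 means the decoded message carries
    # no information about the embedded one (chance level).
    embedded = np.asarray(embedded)
    decoded = np.asarray(decoded)
    return float((embedded == decoded).mean())
```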

Q2 & W3: Semantic watermarks
Our method focuses on post-hoc watermarking, which cannot introduce large semantic changes to the input image. This is especially true when post-hoc watermarking is used to ensure the authenticity of images (e.g., that a photo comes from a reputable news source). Semantic watermarking, such as Tree-Ring or RingID, watermarks AI-generated content by altering the objects and their locations in a generated image. Our method cannot semantically change these objects; therefore, different methods must be used for forging these semantic watermarks (e.g., see [31]).

[31] Black-box forgery attacks on semantic watermarks for diffusion models. CVPR 2025.

Q3: Google SynthID forging
Google SynthID is a closed-source product whose watermark detector is not available; therefore, we were unable to evaluate against it. We note that at Google's developer conference on May 20, 2025, i.e., after the NeurIPS paper deadline, waitlist-based access to SynthID was announced. However, we have been unable to gain access to the API yet. For other providers (HuggingFace, Ideogram, Flux, etc.), there is very little information on whether and how they apply watermarking, and even less on how to get access to the detection. Therefore, we were unable to test against the detectors used by these platforms, although doing so would indeed have strengthened the paper.

Q4: Additional qualitative metrics
We were unsure about the reviewer’s question, specifically, whether they were (a) asking us to compare our attack using the preference model against other qualitative metrics, or (b) suggesting that the preference model itself be used as a qualitative metric, in which case it would be compared to LPIPS for evaluating the quality of other images.

If (a): We show the requested LPIPS quality metric and CVVDP quality metric [B] for watermark removal in the table below. We observed that the CVVDP metric correlates with human judgement better than many standard metrics. We will add the new metrics to the paper.

| Method | PSNR ↑ | LPIPS ↓ | CVVDP ↑ |
|---|---|---|---|
| VAE (FLUX.1 [dev]) | 34.3 | 0.060 | 9.58 |
| DiffPure (FLUX.1 [dev]) | 25.4 | 0.278 | 7.14 |
| CtrlRegen (step = 0.1) | 24.4 | 0.204 | 7.10 |
| Ours | 31.2 | 0.198 | 8.08 |

If (b): This is an interesting suggestion, although it goes beyond the scope of our current paper. It would indeed be a valuable direction to explore, as we have observed that traditional metrics like PSNR, SSIM, LPIPS, and CVVDP can sometimes yield high scores for watermark distortions that are clearly visible, or, conversely, give low scores (e.g., PSNR < 30dB) for distortions that are barely perceptible. That said, using the preference model as a metric could be challenging, as its outputs may vary significantly depending on the type of image (typically textured vs. flat) and could therefore be hard to calibrate. Moreover, properly evaluating the metric would require assessing its correlation with human judgments and comparing it to LPIPS, which would demand substantial effort to design and carry out a robust user study.

[B] ColorVideoVDP: A visual difference predictor for image, video and display distortions. Arxiv 2024.

W1: Lower performance for TrustMark
Indeed, the TrustMark watermarks are difficult to forge using any forging method. However, 61% bit accuracy is not random, and there are still individual forged images with bit accuracy close to 90%. Therefore, with enough effort, it is still possible to confidently fool TrustMark’s detector with a forged watermarked image. The key contribution of this work is highlighting this potential attack vector for watermarking methods and we believe this work will motivate further research into watermarking methods that utilize highly content-dependent decoders, thus making any forgery very difficult.

W2: Effectiveness on real-world dataset
Our experiments are done on the SA-1b dataset. This is a high-resolution and high-quality dataset of real-world photos. To mitigate any doubts, we also test our method on AI generated images from Stable Diffusion. From the results on watermark forging in the table below, we can see our method performs very similarly on both real-world photos as well as AI generated images. We are happy to include any other dataset in our experiments.

| Dataset | CIN (bit acc.) ↑ | MBRS (bit acc.) ↑ | TrustMark (bit acc.) ↑ | VideoSeal (bit acc.) ↑ |
|---|---|---|---|---|
| SA-1b | 1.00 | 0.83 | 0.61 | 0.83 |
| Images generated by SD | 1.00 | 0.82 | 0.61 | 0.82 |

W4: Statistical significance analysis
We train 3 additional preference models, each with different random seed and report standard deviation for our watermark removal results in the table below. The table shows there is not much difference in performance across different models.

| Method | CIN (bit acc.) ↓ | MBRS (bit acc.) ↓ | TrustMark (bit acc.) ↓ | VideoSeal (bit acc.) ↓ |
|---|---|---|---|---|
| Ours (as reported in the paper) | 0.82 | 0.64 | 0.60 | 0.49 |
| Ours (mean ± std, 4 models) | 0.84 ± 0.02 | 0.64 ± 0.01 | 0.62 ± 0.02 | 0.47 ± 0.02 |

W5: Robustness evaluations under detection-aware defense schemes
Unfortunately, we are not aware of any such work in the context of watermarking. We are happy to include any evaluation if we are given the reference to such work or additional details on how to conduct such evaluation.

Safeguards and ethical aspects
Thank you for the feedback. We will include further discussion and clarification on the method’s impact in these areas, and would be interested in any further feedback the reviewer could give.

Comment

Thank you for the response and clarifications.

I have gone over the other reviews, discussion, and paper again, and am in favor of acceptance based on the responses and clarifications.

Regarding the questions around safeguards and ethical considerations: the checklist indicates NA with the reasoning: "Justification: While our work has the potential of forging watermarks, there are other works with similar capabilities available, albeit with worse visual quality. Our work recommends mitigations that strengthen watermarking methods against various kinds of attacks, including the attack presented in this work."

The existence of other methods, especially when your method claims superior visual quality (and is hence perhaps harder to detect; consider, e.g., watermarks from generative AI intended to prevent deepfakes by providing provenance, i.e., knowing an image is AI-generated), is not a reasonable safeguard or justification for why there are no safeguards. There are further concerns, e.g., around IP misuse, for example enabling copyright infringement and general IP theft. I would encourage you to think about this carefully and present some safeguard discussion, as clearly there is potential for misuse. As an extreme analogy to make the point clear: the existence of nuclear weapons (with safeguards) does not justify the development of new nuclear weapons (especially more powerful ones) without safeguards. I don't think the checklist is accurate given the potential capabilities and potential for misuse, especially with a justification that there is no need for safeguards because other things already exist.

Review (rating: 5)

This paper proposes a generalizable black-box watermark extraction attack that requires only a single watermarked image to extract the underlying watermark pattern for forging or removal. Specifically, the authors synthesize artifacts to simulate invisible watermark patterns and train a preference model to distinguish between watermarked and clean images. This trained model is then used as a surrogate model to perform adversarial optimization and extract the watermark pattern from any given watermarked image. The method is evaluated on four watermarking techniques—CIN, MBRS, TrustMark, and VideoSeal—demonstrating its effectiveness.

Strengths and Weaknesses

Strengths

  • The proposed watermark attack is both novel and effective. Compared to previous extraction methods, the ability to attack using only a single watermarked image significantly improves its practicality.
  • The paper is well-written, and the experiments are thorough.

Weaknesses

  • The attack would be more convincing if it were evaluated on diffusion-based watermarking methods such as RingID and GaussianShading.

Questions

Why did the authors choose wave-, noise-, and line-style synthetic artifacts? What is the individual contribution of each type? Which one is most effective in mimicking real watermark patterns?

Limitations

Yes

Final Justification

The author's reply resolved my concerns about which synthetic artifact is most effective. I would like to maintain my previous score.

Formatting Issues

NA

Author Response

We thank the reviewer for their thoughtful comments. We are pleased that they find our work novel and thorough. We respond to the specific points raised below and kindly ask the reviewer to let us know if these address their concerns.

Q1: Contribution of synthetic artifacts
We thank the reviewer for raising this insightful question. To address it, we conducted an additional experiment focusing on the choice of synthetic artifact types, which are motivated by common artifact patterns observed in computer vision, such as noise and checkerboard patterns. Specifically, we trained our model independently for each style of watermark pattern to evaluate their individual effectiveness. From the watermark removal results in the table below, we can see that the wave pattern is the most effective in removing watermarks. But on average, the combination of the three watermark types produces significantly better results than only the wave style pattern. We will include this analysis in the paper.

| Artifact set | CIN (bit acc.) ↓ | MBRS (bit acc.) ↓ | TrustMark (bit acc.) ↓ | VideoSeal (bit acc.) ↓ |
|---|---|---|---|---|
| Ours (all three styles) | 0.82 | 0.64 | 0.60 | 0.49 |
| Only wave style | 0.98 | 0.83 | 0.55 | 0.45 |
| Only noise | 1.00 | 0.97 | 0.99 | 0.97 |
| Only line style | 1.00 | 0.95 | 0.99 | 0.95 |
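The three artifact families can be sketched as follows; the parameterization (frequencies, amplitudes, spacings) is illustrative and not the paper's exact configuration:

```python
import numpy as np

def wave_artifact(h, w, freq=0.5, amp=0.02, theta=0.7):
    # Sinusoidal ripple at orientation `theta` (radians).
    yy, xx = np.mgrid[0:h, 0:w]
    return amp * np.sin(freq * (xx * np.cos(theta) + yy * np.sin(theta)))

def noise_artifact(h, w, amp=0.02, seed=0):
    # Pseudo-random per-pixel noise.
    rng = np.random.default_rng(seed)
    return amp * rng.standard_normal((h, w))

def line_artifact(h, w, spacing=8, amp=0.02):
    # Periodic vertical lines, a checkerboard-like high-frequency pattern.
    cols = (np.arange(w) % spacing == 0).astype(float)
    return amp * np.tile(cols, (h, 1))

# Training pairs for the preference model: a clean image versus the same
# image plus one synthetic artifact, ranked with a ranking loss.
clean = np.zeros((64, 64))
corrupted = clean + wave_artifact(64, 64)
```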

W1: Diffusion-based watermarking
Our method focuses on post-hoc watermarking, which is a widely used approach with broad applications beyond diffusion models and beyond generated content in general. In diffusion watermarking, such as RingID, the watermark is encoded into an image by altering the objects and their locations in the generated image. Our method cannot semantically change these objects; therefore, different methods must be used for forging these semantic watermarks (e.g., see [31]).

[31] Black-box forgery attacks on semantic watermarks for diffusion models. CVPR 2025.

Comment

Thanks for the response. I have no further concerns.

Final Decision

The paper proposes a method to remove or forge post-hoc watermarks from images. Notably, it requires only one watermarked image. The method relies on first programmatically generating artifacts (without knowledge of the watermark) and then training a preference model on corrupted and natural images. This preference model yields a surrogate loss function that can guide gradient-based optimization to remove or forge generic watermarks.

Strengths: The proposed method is novel and practical, requiring access to only one watermarked image and making no assumptions about the watermarking scheme. The evaluation is thorough, both in the comparison baselines and in the ablations on the inner workings of the method. The rebuttal further backed up the strong initial results by adding additional qualitative metrics and experiments on additional data.

Weaknesses: The reviewers suggest that the authors tone down their claims about being the only practical solution and discuss in more detail for which setups the method is best suited and for which applications it is limited (e.g., semantic watermarks).

As a future direction, it is suggested to extend the work towards in-generation watermarks.