PaperHub
Score: 7.1/10
Poster · 5 reviewers
Ratings: 6, 4, 4, 4, 4 (min 4, max 6, std 0.8)
Confidence: 4.2
Novelty: 3.2 · Quality: 3.2 · Clarity: 3.0 · Significance: 3.2
NeurIPS 2025

On the Coexistence and Ensembling of Watermarks

Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

We find that, surprisingly, different watermarks can coexist in the same image, which enables us to build ensembles of watermarks that open up new accuracy-robustness-capacity-quality trade-offs without further training.


Keywords
Watermarking, Ensembling, Machine Learning

Reviews and Discussion

Review
Rating: 6

This paper provides a pioneering investigation into the coexistence and ensembling of watermarking methods, demonstrating that multiple watermarks can coexist in a single image with minimal impact on image quality and decoding accuracy.

Strengths and Weaknesses

Strengths:

  1. The paper offers a unique contribution by exploring the coexistence of multiple watermarks within the same image, a topic not previously studied, challenging the assumption that overlapping watermarks would overwrite each other.

  2. The paper conducts extensive experiments to validate the coexistence of different watermarks and proposes an effective method for watermark ensembling using strength control and error-correcting codes.

  3. The writing of this paper is generally good and easy to understand.

Weaknesses:

  1. This work does not present a complete scientific innovation; instead, it focuses more on engineering improvements and exploration, as strength clipping and error correction are relatively mature techniques. I am somewhat concerned about whether the authors have demonstrated sufficient scientific innovation in this work.

  2. The authors are encouraged to provide some theoretical justification as to why most watermarking methods can coexist. Additionally, they could offer paradigms, such as which types of watermarking integrations tend to yield better results. For example, it would be helpful to investigate whether watermarks using different hiding techniques, such as frequency and spatial domains, are more easily integrated. This would aid in the design of future watermarking methods.

  3. For the initially embedded watermark, the second watermark added actually serves as a form of degradation. Have the authors considered jointly training the encoding and decoding processes of the first and second watermarks? This might lead to better coexistence results.

Overall, I consider this to be an excellent paper. The authors offer an original and insightful perspective on the problem of watermark coexistence, demonstrating strong innovation and solid research groundwork.

Questions

Please refer to the weaknesses.

Limitations

yes

Formatting Issues

None

Author Response

We are extremely grateful for you noting the novelty of the question we are looking at, its counterintuitiveness, the extensive experimentation and the easy-to-understand presentation. We would like to answer your questions too:

This work does not present a complete scientific innovation; instead, it focuses more on engineering improvements and exploration, as strength clipping and error correction are relatively mature techniques. I am somewhat concerned about whether the authors have demonstrated sufficient scientific innovation in this work.

We observed an interesting, counterintuitive and important phenomenon which opens a few avenues for interesting further work and we wish to share these findings with the broader research community. Feedback from colleagues and the reviewers here has consistently highlighted the novelty, non-obviousness, and potential impact of our findings, which we take as strong evidence of their value to the community.

The authors are encouraged to provide some theoretical justification as to why most watermarking methods can coexist. Additionally, they could offer paradigms, such as which types of watermarking integrations tend to yield better results. For example, it would be helpful to investigate whether watermarks using different hiding techniques, such as frequency and spatial domains, are more easily integrated. This would aid in the design of future watermarking methods.

While we did provide an intuitive explanation in Appendix A of why watermarks can coexist, formalizing it proved to be a much more ambitious project. That said, we are nearing completion of that effort and hope to soon release a follow-up work that puts watermarking and watermark coexistence on a solid theoretical basis. As for which methods coexist better together, we have some observations in the present paper. Similar methods (in terms of architecture and losses) appear not to coexist well with one another (e.g., TrustMark B and Q, or SSL with two different carrier vectors). RoSteALS appears to overwrite any other watermark applied before it. DwtDct and DwtDctSvd, which are frequency-based methods, tend to coexist well with all the deep watermarking methods. Therefore, similar methods appear to use similar embedding spaces and hence are less likely to coexist, while dissimilar methods use different embedding spaces and are more likely to coexist.

For the initially embedded watermark, the second watermark added actually serves as a form of degradation. Have the authors considered jointly training the encoding and decoding processes of the first and second watermarks? This might lead to better coexistence results.

We did consider fine-tuning the models to work better with one another and even mentioned it in the Discussion section. Ultimately, we decided this would distract from the key message of the paper, because once we start to fine-tune, it is less clear whether the performance is due to the ensemble or to the further fine-tuning. For practical deployments, however, this distinction does not matter, and we expect fine-tuning to improve performance at least a bit.

Comment

Thanks for the authors' response. It has solved my concerns. I will keep my score.

Review
Rating: 4

This paper presents the first comprehensive study of the coexistence of multiple watermarking methods within a single image. Contrary to the common assumption that one watermarking method would overwrite others, the authors empirically demonstrate that multiple watermarking methods can coexist with minor accuracy and quality degradation. The paper further proposes watermark ensembling, a post-training model modification technique that combines different watermarking methods to improve capacity and adjust trade-offs among capacity, robustness, accuracy, and image quality. Through extensive experiments using both classical and deep learning-based watermarking algorithms, the authors show that ensembling opens new avenues for watermark adaptation without retraining. This work challenges the assumption of exclusive image-channel occupation and suggests a new direction in watermark design and application.

Strengths and Weaknesses

Strengths

This article raises a very interesting question, one that we rarely consider in our normal work.

  1. This is the first study to systematically evaluate the coexistence of watermarking methods. The finding that multiple watermarks can be decoded from the same image significantly advances our understanding of deep watermarking.

  2. The authors benchmark a wide range of watermarking methods (e.g., HiDDeN, RivaGAN, SSL, TrustMark) in pairwise combinations and provide a large empirical matrix showing coexistence results.

  3. A geometric explanation is provided to intuitively explain why watermark coexistence may occur, supported by experimental data.

Weaknesses

  1. Ensembling is shown to improve weak models, but often fails to outperform strong models in all metrics. This limitation is acknowledged but makes ensembling less compelling as a general-purpose solution.

  2. The increased inference cost from ensembling two models is not quantitatively analyzed. While suggestions such as distillation are mentioned, no concrete evidence is provided.

Questions

After reading this article, I understand the author's attempts to achieve watermark coexistence, but I still have the following questions I would like the author to answer.

  1. A curious point in Table 1 is the presence of an extraction accuracy of 0, which is rare for a multibit watermark, as the accuracy should be about 50% to be reasonable when the watermarked signal is completely erased. And 0% means that the extracted watermark is the bitwise inverse of the original watermark, which is also unusual for the same method embedding different random watermarks; therefore, the authors should add a note about this.

  2. From the results of the experiments, many of the current watermarking methods actually embed the watermark in a different feature space, thus enabling coexistence in the realization. Then, suppose we manually construct different orthogonal feature spaces (e.g., FFT), and constrain the same watermarking method on different orthogonal feature spaces, can the same watermarking method realize coexistence?

  3. The authors tested robustness by embedding both watermarks before testing robustness. However, as shown in Fig. 1, the distortions of the watermarked image may be sequentially superimposed, and the feature space corresponding to the watermarked signal of watermark 1 may be shifted towards the feature space corresponding to watermark 2 after some distortion, and then embedding watermark 2 will erase the signal of watermark 1 to a greater extent.

Limitations

yes

Formatting Issues

NA

Author Response

We appreciate you finding this first systematic study on the coexistence of watermarking methods to significantly advance our understanding of deep watermarking! As for your questions, you can find our answers below:

A curious point in Table 1 is the presence of an extraction accuracy of 0, which is rare for a multibit watermark, as the accuracy should be about 50% to be reasonable when the watermarked signal is completely erased. And 0% means that the extracted watermark is the bitwise inverse of the original watermark, which is also unusual for the same method embedding different random watermarks; therefore, the authors should add a note about this.

We are reporting the percentage of samples for which all bits of the messages were successfully decoded, meaning bit accuracy of 100%. Hence, when the signal is entirely removed, we indeed have bit accuracy of roughly 50% and therefore the fraction of samples with their full message correctly decoded is 0%. We tried to explain this in lines 71–74 and in the caption of Table 1 but will further expand on it in the main text of the revised version of the paper.
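To make the distinction between the two metrics concrete, here is a minimal sketch (not the authors' evaluation code) that simulates a fully erased watermark, where the decoder output is independent of the embedded message. The message length of 32 bits and the sample count are assumptions chosen only for illustration.

```python
import random

def bit_accuracy(sent, decoded):
    """Fraction of bit positions where the decoded message matches the sent one."""
    return sum(s == d for s, d in zip(sent, decoded)) / len(sent)

def full_message_accuracy(pairs):
    """Fraction of samples whose message was decoded with no bit errors at all."""
    return sum(sent == decoded for sent, decoded in pairs) / len(pairs)

rng = random.Random(0)
n_bits, n_samples = 32, 1000  # assumed sizes, for illustration only

# Erased watermark: the decoded bits are random and unrelated to the message.
pairs = []
for _ in range(n_samples):
    sent = [rng.randint(0, 1) for _ in range(n_bits)]
    decoded = [rng.randint(0, 1) for _ in range(n_bits)]
    pairs.append((sent, decoded))

mean_bit_acc = sum(bit_accuracy(s, d) for s, d in pairs) / n_samples
full_acc = full_message_accuracy(pairs)
print(f"mean bit accuracy: {mean_bit_acc:.3f}")
print(f"full-message accuracy: {full_acc:.3f}")
```

With random decoding, each bit matches with probability 1/2, so all 32 bits match with probability 2^-32. This is why a bit accuracy near 50% corresponds to a full-message accuracy of essentially 0%.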

From the results of the experiments, many of the current watermarking methods actually embed the watermark in a different feature space, thus enabling coexistence in the realization. Then, suppose we manually construct different orthogonal feature spaces (e.g., FFT), and constrain the same watermarking method on different orthogonal feature spaces, can the same watermarking method realize coexistence?

This is a really good question, and indeed one of the questions we find very interesting for further studies! We discuss it briefly at the beginning of Section 5. Indeed, one could split the space of all possible images into several subspaces and constrain each method to one of them. A naive way of doing this would be to give each method a different channel (three methods can coexist this way) or to split the image into an appropriate number of tiles. However, one complication is how such a split would affect robustness to augmentations: for example, if we split channels but then perform a color transform, this might mix the individual watermarks. Therefore, the orthogonal spaces need to be designated such that the subspaces are also invariant to the augmentations one is considering. Moreover, this would prevent additional actors from embedding watermarks after the fact.
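The channel-mixing complication can be illustrated with a toy example. The sketch below assumes a watermark residual confined to the red channel of a single pixel and a simple desaturation transform (blending each channel towards the BT.601 luma). Both the pixel values and the choice of transform are illustrative assumptions, not taken from the paper.

```python
def desaturate(pixel, amount):
    """Blend an (R, G, B) pixel towards its BT.601 luma by `amount` in [0, 1]."""
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return tuple((1 - amount) * c + amount * luma for c in pixel)

cover = (120.0, 80.0, 40.0)
# Hypothetical watermark whose residual lives only in the red channel.
watermarked = (cover[0] + 4.0, cover[1], cover[2])

# Residual before the color transform: non-zero in red only.
before = tuple(w - c for w, c in zip(watermarked, cover))

# Residual after both images go through the same color transform.
after = tuple(
    w - c
    for w, c in zip(desaturate(watermarked, 0.5), desaturate(cover, 0.5))
)

print("residual before:", before)
print("residual after: ", after)
```

After the transform, the residual is non-zero in the green and blue channels as well, so a second watermark assigned to one of those channels would now see interference from the first.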

The authors tested robustness by embedding both watermarks before testing robustness. However, as shown in Fig. 1, the distortions of the watermarked image may be sequentially superimposed, and the feature space corresponding to the watermarked signal of watermark 1 may be shifted towards the feature space corresponding to watermark 2 after some distortion, and then embedding watermark 2 will erase the signal of watermark 1 to a greater extent.

We are afraid we don’t fully understand your question. Would you be able to elaborate on it? We do explore the sequential application of watermarks as one of the mechanisms for ensembling in the paper, and explore its impact in terms of accuracy (i.e. the extent to which watermark 1 interferes with the signal of watermark 2 and vice versa), image quality and robustness to perturbations.

We would also like to comment on the two weaknesses you have highlighted:

Ensembling is shown to improve weak models, but often fails to outperform strong models in all metrics. This limitation is acknowledged but makes ensembling less compelling as a general-purpose solution.

This is absolutely correct! The goal of our paper was to assess, in as unbiased and comprehensive a way as possible, if and when ensembling of watermarking methods can be of practical utility. We found that it is useful as a tool for trading off various axes of performance, even if it fails to produce models that outperform the constituent models across all axes.

The increased inference cost from ensembling two models is not quantitatively analyzed. While suggestions such as distillation are mentioned, no concrete evidence is provided.

The inference cost of ensembling, as with any kind of ensemble, is linear in the number of constituent models. So, if we are ensembling two models, the cost of the ensemble will be the sum of the costs of the individual models. We have not actively explored distillation because it would also complicate the analysis: it would be unclear whether some of the performance improvements stem from the additional training rather than directly from ensembling. As such, we decided to consider only training-free ensembles.

Comment

Thank you for your response. I have some follow-up questions that I would like to further explore with you.

The authors tested robustness by embedding both watermarks before testing robustness. However, as shown in Fig. 1, the distortions of the watermarked image may be sequentially superimposed, and the feature space corresponding to the watermarked signal of watermark 1 may be shifted towards the feature space corresponding to watermark 2 after some distortion, and then embedding watermark 2 will erase the signal of watermark 1 to a greater extent.

I apologize for not expressing myself clearly earlier. In the paper, the robustness experiments are conducted in the order of Watermark 1 → Watermark 2 → Distortion → Robustness Testing. However, in practical scenarios, a more common sequence would be Watermark 1 → Distortion → Watermark 2 → Distortion → Robustness Testing. I was wondering if you have conducted any experiments following this setting?

Comment

Thank you so much for clarifying! This is indeed a really interesting question. We did not study this because that would need sets of augmentations that are partitioned into two parts. Not impossible, of course, but would add further complexity.

That being said, the experiment you propose could be very interesting. Intuitively, it feels that adding an additional distortion between the introduction of the first and the second watermark would probably reduce accuracy a bit. However, this need not be the case. A small added distortion might, in fact, help the detectability of both watermarks by further separating the subspaces in which they operate.

Overall, a very interesting question! We will look into whether we can perform some additional experiments to this end. Thank you for the suggestion!

Review
Rating: 4

This paper presents the first systematic study of deep image watermark coexistence, challenging the prevailing intuition that watermarks from different methods inevitably interfere with one another. The study demonstrates that combining multiple watermarking methods using ensemble techniques can enhance overall performance. This approach enables capacity expansion and optimization of quality and robustness without the need for model retraining.

Strengths and Weaknesses

Strengths

  1. This study provides the first in-depth examination of image watermark coexistence, addressing a novel and intellectually compelling research question.

  2. The work demonstrates significant potential by showing how intelligently combining different watermarking methods through ensemble techniques can substantially improve overall watermark performance metrics.

Weaknesses

  1. The presentation of the paper is poor; the main content is overly reliant on dense textual analysis, which makes the narrative hard to follow. The authors should consider adding more visualizations to support the analysis and move less important findings to the appendix.

  2. The study primarily focuses on experimental observations of watermark coexistence but offers only a naive use of ensembling, lacking deeper theoretical insights or more effective strategies for watermark performance enhancement.

Questions

  1. The authors argue that watermark coexistence implies orthogonal or non-overlapping perturbation spaces. However, they do not analyze the actual perturbation directions across methods. Why didn't the authors perform spectral or subspace overlap analysis of the residuals to empirically support this hypothesis?

  2. The paper primarily investigates pairwise coexistence between two watermarking methods. Have the authors considered whether similar coexistence holds when three or more watermarking methods are applied sequentially or in parallel? Is there evidence that the observed compatibility generalizes to higher-order combinations, or do interference effects compound beyond pairwise interactions?

Limitations

  1. While the study shows that different watermarks can coexist in the same image, it lacks a deeper understanding of why this happens. The explanation based on orthogonal perturbation subspaces is mostly qualitative, without empirical analysis of residual directions or interference patterns.

  2. The paper only considers pairwise combinations of watermarking methods. It remains unclear whether similar coexistence holds when three or more methods are applied. This limits the practical relevance for multi-actor pipelines where multiple watermarks may accumulate.

  3. In addition, the ensembling strategy is relatively naive. It relies on simple residual averaging or sequential application, without any learning-based coordination or optimization. As a result, the full potential of watermark coexistence is not fully explored.

Final Justification

Thanks for the response. I have updated the score.

Formatting Issues

NA

Author Response

We would like to thank the reviewer for recognizing that this is the first in-depth examination of image watermark coexistence, a novel and intellectually compelling research question, and for finding our work to be of significant potential. With this in mind, we would like to address your questions and concerns.

Orthogonal perturbation spaces. You asked whether we performed spectral or subspace overlap analysis of the residuals to empirically support our hypothesis that different watermarking methods use different orthogonal spaces. We did study that, but found that the results offered little additional insight, and hence they did not make it into the draft. We can add these results in the revised version.

In a nutshell, the answer is yes: the residuals of all the methods we considered are almost perfectly orthogonal. This is also visible from the samples we have in Appendix E: the residuals are visually very different in RGB, YCbCr and Fourier space. For numerical comparison, the table below shows the angles in degrees (computed in RGB space) between the 8 residuals (differences between the watermarked image and the cover image) for the first image in Appendix E (the credit card). This result is very much expected: the residuals are in a very high-dimensional space where most vectors are nearly orthogonal.

| | RivaGAN | DwtDct | HiDDeN | SSL | TrustMark Q | TrustMark B | Rosteals | DwtDctSvd |
|---|---|---|---|---|---|---|---|---|
| RivaGAN | 0.000000 | 89.999997 | 89.999999 | 89.999998 | 89.999998 | 89.999995 | 89.999999 | 90.000000 |
| DwtDct | 89.999997 | 0.000000 | 89.999995 | 89.999995 | 89.999997 | 89.999999 | 89.999994 | 89.999997 |
| HiDDeN | 89.999999 | 89.999995 | 0.000000 | 89.999994 | 89.999998 | 89.999994 | 89.999998 | 89.999999 |
| SSL | 89.999998 | 89.999995 | 89.999994 | 0.000000 | 89.999997 | 89.999999 | 89.999995 | 89.999999 |
| TrustMark Q | 89.999998 | 89.999997 | 89.999998 | 89.999997 | 0.000000 | 89.999996 | 89.999996 | 89.999996 |
| TrustMark B | 89.999995 | 89.999999 | 89.999994 | 89.999999 | 89.999996 | 0.000000 | 89.999995 | 89.999995 |
| Rosteals | 89.999999 | 89.999994 | 89.999998 | 89.999995 | 89.999996 | 89.999995 | 0.000000 | 89.999997 |
| DwtDctSvd | 90.000000 | 89.999997 | 89.999999 | 89.999999 | 89.999996 | 89.999995 | 89.999997 | 0.000000 |
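The angle computation described above can be sketched as follows. This is an illustration on synthetic data only: the two "residuals" are independent Gaussian vectors standing in for flattened watermark residuals, and the 64x64 RGB size is an assumption. It reproduces the general effect that independent vectors in a high-dimensional space are close to orthogonal, not the specific values in the table.

```python
import math
import random

def angle_degrees(u, v):
    """Angle between two vectors, in degrees, from the normalised dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

rng = random.Random(0)
dim = 3 * 64 * 64  # assumed: a small RGB image, flattened

# Synthetic stand-ins for two residuals (watermarked image minus cover image).
residual_a = [rng.gauss(0, 1) for _ in range(dim)]
residual_b = [rng.gauss(0, 1) for _ in range(dim)]

angle = angle_degrees(residual_a, residual_b)
print(f"angle between synthetic residuals: {angle:.2f} degrees")
```

In practice one would flatten the actual residuals of two watermarking methods on the same cover image and pass them to the same function.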

Applying more than two watermarking methods. You asked whether the results on coexistence extend to more than two watermarks. They indeed do, and we conducted some experiments to this end. However, the more watermarks you apply to the same image, the more the image quality and decoding robustness go down, which is, of course, expected. The other problem is that there are far fewer combinations of three or four watermarking methods, and many more permutations of each one of them, making the analysis of the results more complicated and less useful. Nevertheless, you can see our results in the two tables below: the bit accuracy remains high (above 87% in every case) but the PSNR goes down to about 30 dB.

Three watermarks:

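| Method 1 | Method 2 | Method 3 | Bit Acc for Method 1 | Bit Acc for Method 2 | Bit Acc for Method 3 | PSNR |
|---|---|---|---|---|---|---|
| DwtDct | RivaGAN | SSL | 88.85% | 99.97% | 100.00% | 30.60 dB |
| TrustMark Q | DwtDct | RivaGAN | 98.97% | 89.95% | 99.97% | 30.77 dB |
| TrustMark Q | DwtDct | SSL | 97.88% | 89.37% | 100.00% | 31.24 dB |
| TrustMark Q | RivaGAN | SSL | 97.79% | 99.94% | 100.00% | 31.96 dB |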

Four watermarks:

| Method 1 | Method 2 | Method 3 | Method 4 | Bit Acc for Method 1 | Bit Acc for Method 2 | Bit Acc for Method 3 | Bit Acc for Method 4 | PSNR |
|---|---|---|---|---|---|---|---|---|
| TrustMark Q | DwtDct | RivaGAN | SSL | 97.86% | 87.69% | 99.97% | 100.00% | 29.82 dB |
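For readers less familiar with the PSNR figures quoted in these tables, a minimal sketch of the standard definition for 8-bit images follows. The pixel values are made up for illustration, and the code is not taken from the paper.

```python
import math

def psnr(cover, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((c - w) ** 2 for c, w in zip(cover, watermarked)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy cover image: a flat grey patch, flattened to a list of pixel values.
cover = [100.0] * 1000
# Toy residual of +/-8 grey levels on every pixel, i.e. an RMS error of 8.
watermarked = [c + (8.0 if i % 2 == 0 else -8.0) for i, c in enumerate(cover)]

value = psnr(cover, watermarked)
print(f"PSNR: {value:.2f} dB")
```

In this toy case the PSNR is 20·log10(255/8), which is roughly 30 dB. A PSNR of about 30 dB, as in the tables above, therefore corresponds to an RMS pixel error of around 8 grey levels on a 0-255 scale.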

More advanced ensembling strategies. You asked why we restricted ourselves to relatively simple ensembling strategies, such as applying the watermarking methods one after the other (in series) or averaging their individual residuals (in parallel). The main reason for that is simplicity and clarity of the evaluation.

These two methods for combining watermarks are natural and non-parametric. Furthermore, the main motivation for us to study the problem of watermarking coexistence was to understand non-cooperative multi-actor watermarking, i.e., multiple organisations putting their own watermarks in a piece of media without actively coordinating. This would be exactly the series ensembling setup and is the one that is most likely in practice.
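The two combination rules can be sketched as below. This is a minimal illustration that assumes each encoder is a function mapping a flattened image to its watermarked version. The two toy encoders are hypothetical stand-ins that add a fixed pattern, not any of the methods evaluated in the paper.

```python
def series_ensemble(image, encoders):
    """Series: apply each encoder to the output of the previous one."""
    out = list(image)
    for encode in encoders:
        out = encode(out)
    return out

def parallel_ensemble(image, encoders):
    """Parallel: add the average of the residuals each encoder produces on the cover."""
    residuals = [
        [w - c for w, c in zip(encode(image), image)] for encode in encoders
    ]
    mean_residual = [sum(col) / len(encoders) for col in zip(*residuals)]
    return [c + r for c, r in zip(image, mean_residual)]

# Hypothetical toy encoders: each adds a fixed +/- pattern to the pixels.
def toy_encoder_a(image):
    return [p + (1.0 if i % 2 == 0 else -1.0) for i, p in enumerate(image)]

def toy_encoder_b(image):
    return [p + (2.0 if i % 4 < 2 else -2.0) for i, p in enumerate(image)]

cover = [10.0, 20.0, 30.0, 40.0]
series = series_ensemble(cover, [toy_encoder_a, toy_encoder_b])
parallel = parallel_ensemble(cover, [toy_encoder_a, toy_encoder_b])
print("series:  ", series)    # [13.0, 21.0, 29.0, 37.0]
print("parallel:", parallel)  # [11.5, 20.5, 29.5, 38.5]
```

Because the toy encoders ignore image content, the series result here is simply the sum of both residuals. With real encoders the second method sees the already watermarked image, which is the non-cooperative multi-actor setting described above.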

That being said, we are very open to the possibilities of more advanced ensembling approaches. We outline several such ideas in the Discussion section:

Beyond parallel and series ensembling, there could be more advanced adaptive techniques or small learnable mixers that one could use for more performant ensembles. Finally, a further boost to the accuracy and robustness of the ensembles could be possible by fine-tuning their decoders on the jointly watermarked images.

One concern with learned solutions, though, is that it might then be more difficult to ensure that the boost in performance is due to ensembling, and not due to the additional fine-tuning and/or the added model capacity.

Therefore, we chose the setups that result in clear-cut and unambiguous evaluation and relevance for the coexistence cases that occur in our practice.

Review
Rating: 4
  • This paper presents the first investigation into the coexistence of multiple DL-based image watermarks in a single image.
  • The paper challenges the assumption that sequentially applying watermarks would overwrite previous signals.
  • Primary contribution: Empirically demonstrates that many pairs of existing open-source watermarking methods can coexist with minor degradation in decoding accuracy for each watermark and little impact on image quality. Experiments across 8 watermarking techniques are presented to support this.
  • Another contribution of the paper: Proposes ensembling as a post-training tool for modifying watermarking systems by combining different watermarks (sequentially or in parallel), plus techniques like strength clipping and ECCs.
  • The work highlights practical use cases, e.g., a 'super watermark' for identifying the correct decoder and 'multi-actor provenance chains' for attribution.

Strengths and Weaknesses

Strengths

  • The finding that different watermarks can coexist to an unexpected degree is a significant discovery and opens a new direction for research.  
  • The empirical investigation about the coexistence is extensive and sound in many ways:
    • 8 methods are tested from classical frequency based techniques (DwtDct, DwtDctSvd) to multiple modern deep learning approaches (HiDDeN, RivaGAN, RoSteALS, SSL, TrustMark B, TrustMark Q).
    • The results (Table 1) provide strong evidence for the coexistence phenomenon.
    • Reports full secret accuracy rather than the bit accuracy which is a great choice because it better reflects the practical utility of a watermark.
    • The analysis is rigorous - exploring boundary conditions of coexistence, noting that a method cannot coexist with itself (Table 1) and that methods from the same family also overwrite each other.
  • Further, the paper is well written and easy to follow.

Weaknesses

  • The paper's robustness tests are conducted against image augmentations that are specific to the training protocols of the models being tested, e.g., RivaGAN augmentations for RivaGAN, SSL augmentations for SSL, etc. (Appendix D). This does not seem like a realistic scenario, since an attacker would use more potent attacks. For example, the evaluation is missing modern watermark removal techniques such as diffusion-based attacks, which have been shown to be highly effective. The authors could also use standard benchmarks like WAVES that include a more diverse and challenging set of attacks.
  • The analysis only uses PSNR as the metric for image quality. This is a flaw, as the paper's main argument about ensembling involves finding a trade-off between increased capacity and decreased quality. A small drop in PSNR, such as 2 dB, could correspond to a perceptually significant and unacceptable level of visual artifacts. Evaluation using more perceptually relevant metrics such as LPIPS or SSIM, or a user study, would be helpful.
  • Ensembling is not shown to be a consistently better strategy. For example, in Figure 6, applying ECC to a single strong model (TrustMark Q) yields better performance across the accuracy-quality spectrum than ensembling it with a weaker one (RivaGAN). The contribution is more an exploration of the trade-off space than a new SOTA technique. It is not clear under which conditions ensembling is the best choice.

Questions

Following the order of the weakness section above:

  • How do the coexistence and ensembling results hold up against a standardized benchmark of more potent and modern attacks like diffusion based attacks? Showing non trivial resilience to these attacks would be very helpful to support robustness claims.
  • Results with re-evaluation on more perceptual metrics like LPIPS or SSIM would be helpful. Showing that the observed quality degradation is perceptually minimal for a significant capacity gain would also be useful.
  • Can the authors more clearly describe the conditions under which ensembling provides a pareto improvement? A clearer characterization of the regime where ensembling is the better choice would be helpful.

Limitations

Yes

Final Justification

The authors addressed all my questions in the discussion below. Therefore I would like to maintain my current rating and review.

Formatting Issues

No

Author Response

Thank you so much for finding our work comprehensive, rigorous and easy to follow. As for your outstanding questions and concerns, here are our answers:

The paper's robustness tests are conducted against image augmentations that are specific to the training protocols of the models being tested, e.g., RivaGAN augmentations for RivaGAN, SSL augmentations for SSL, etc. (Appendix D).

We want to clarify that we don’t evaluate each method only against its own set of augmentations but against all sets of augmentations. Hence, every experiment included evaluation against the same 5 sets of augmentations.

How do the coexistence and ensembling results hold up against a standardized benchmark of more potent and modern attacks like diffusion based attacks? Showing non trivial resilience to these attacks would be very helpful to support robustness claims.

This is a really interesting question! We did not evaluate against removal attacks because none of the models in the paper is robust to them (as also mentioned in the response to Reviewer ZkVT). Therefore, it is highly unlikely that the ensemble would be robust to removal. Similarly, we did not evaluate against augmentations not considered by the base methods because we expected robustness there to be low, due to the base models not being robust to them. Our main focus was to check when the base methods don’t deteriorate after ensembling.

Results with re-evaluation on more perceptual metrics like LPIPS or SSIM would be helpful. Showing that the observed quality degradation is perceptually minimal for a significant capacity gain would also be useful.

Additional perceptual metrics are also something we considered extensively. However, we did not see them adding much insight: they tend to be generally correlated with PSNR (although, of course, not perfectly predicted by it). Ultimately, we decided that including them would further complicate the presentation of already dense tables and figures while offering little additional insight. Nevertheless, we can add these values to the camera-ready version.

Can the authors more clearly describe the conditions under which ensembling provides a pareto improvement? A clearer characterization of the regime where ensembling is the better choice would be helpful.

That is a great question! We found no ensemble that is strictly better in all dimensions than its constituent models, as we elaborate in the “Ensembling is not sufficient to produce state-of-the-art watermarking models” paragraph on page 7. We find this very surprising, because in every other setting that we are aware of, ensembling tends to boost performance. Therefore, one of our most interesting findings is that while ensembling is indeed possible in principle, it does not seem to be of practical utility. We are actively working on understanding why that is the case.

Comment

I appreciate the authors’ responses to my questions and will maintain my current rating.

Review
Rating: 4

The paper investigates whether multiple image watermarks can co-exist in a single image without interfering with each other's decodability. The authors find that many watermarking algorithms can coexist with minor degradation to image quality and decoding robustness. Hence, they introduce watermark ensembling, combining multiple watermarking models to improve message capacity and explore new trade-offs between capacity, image quality, robustness, and accuracy.

优缺点分析

The authors address a very interesting, timely, and real-world (and well-motivated!) challenge of whether multiple watermarking methods can co-exist on the same image. The questions they pose are relevant and can make a direct impact as to how we think of watermark deployment. They addresses real-world scenarios where multiple actors (creators, publishers, model providers) may add their own watermarks. They test a wide range of watermarking methods and combination strategies. I particularly appreciated the focus on open-source methods, allowing reproduction of their results and encouragement of use of these watermarking methods.

That said, there are some downsides of the paper in its current form that the authors should address before publication. It is not clear from the paper which dataset was used to compare the co-existence of the different watermarking methods, and the paper would generally benefit from a more in-depth description of the methodology for this part. This also includes a better presentation of the results in Table 1, which is hard to read and makes the different results difficult to compare; perhaps a matrix plot (e.g., https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.matshow.html) could help here? Beyond the fact that the watermarks can co-exist and can still be retrieved, it would be good to expand on the other effects of multiple watermarks, for example by quantifying the quality decrease in the main paper and by giving at least an intuition as to whether watermark coexistence makes the watermarks more vulnerable to removal or spoofing. Overall, the paper would benefit from stronger theoretical grounding: the empirical coexistence is surprising, but no strong theoretical framework explains why it occurs or how to design for or against it.

Questions

  • How resilient are coexisting watermarks to removal attacks or adversarial perturbations? Can you conduct experiments testing whether watermark removal tools (e.g., generative inpainting, adversarial attacks) are more effective against coexisting watermarks?
  • See above review

Limitations

Yes

Final Justification

I appreciate the authors' feedback. My concerns are mostly addressed, hence I will keep my score as I believe it still adequately reflects my evaluation of the paper.

Formatting Concerns

None

Author Response

We are extremely happy that you found our work interesting, timely, and well-motivated, that you consider its questions relevant, and that you see our results as directly impactful. Your comments on the presentation are indeed valid and we would like to briefly address them here.

How resilient are coexisting watermarks to removal attacks or adversarial perturbations? Can you conduct experiments testing whether watermark removal tools (e.g., generative inpainting, adversarial attacks) are more effective against coexisting watermarks?

That is a great question. We did not study this because the constituent models are themselves not designed to be robust to removal attacks, adversarial perturbations, denoising attacks, and other more active ways of stripping the watermark. Therefore, it is highly unlikely that the ensemble will have emergent robustness properties beyond those of the individual models. Our main focus is checking that the base methods do not deteriorate after ensembling, hence we only evaluate against the robustness objectives of the 8 watermarking models in the paper.

It is not clear from the paper, which dataset was used [...]

We used a dataset of images that we have the rights to train and evaluate on. We have anonymized its name for the submission, as it would uniquely identify us. It contains diverse images of different aspect ratios and resolutions, with roughly half being photographs and half being graphic art. You can see a small sample of the dataset in Appendix E. We will provide further details in the non-anonymous camera-ready version of the paper and also share the dataset itself.

Better presentation of the results in Table 1.

We agree that Table 1 is cluttered. All the information from Table 1 (and a lot more) is also in Figure 10, but that is also difficult to read. We will explore clearer ways of presenting it.
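
As an illustration of the matrix plot the reviewer suggested, here is a minimal sketch using `matplotlib`'s `matshow`. It is not the authors' plotting code: the function name `plot_coexistence_matrix`, the method labels, the axis semantics (rows as the watermark applied first, columns as the one applied second), the colour range, and the values themselves are all assumptions made for illustration. The numbers are random placeholders, not results from Table 1.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np


def plot_coexistence_matrix(methods, accuracy):
    """Draw a square matrix of pairwise decoding accuracies.

    Assumed convention (hypothetical): accuracy[i][j] is the bit accuracy
    measured when methods[i] is applied first and methods[j] second.
    """
    accuracy = np.asarray(accuracy, dtype=float)
    n = len(methods)
    if accuracy.shape != (n, n):
        raise ValueError("accuracy must be an n x n matrix matching `methods`")

    fig, ax = plt.subplots(figsize=(5, 4))
    image = ax.matshow(accuracy, vmin=0.5, vmax=1.0, cmap="viridis")

    # Label both axes with the method names.
    ax.set_xticks(np.arange(n))
    ax.set_xticklabels(methods, rotation=45, ha="left")
    ax.set_yticks(np.arange(n))
    ax.set_yticklabels(methods)
    ax.set_xlabel("applied second")
    ax.set_ylabel("applied first")

    # Print each value inside its cell so exact numbers remain readable.
    for i in range(n):
        for j in range(n):
            ax.text(j, i, f"{accuracy[i, j]:.2f}", ha="center", va="center", color="white")

    fig.colorbar(image, ax=ax, label="bit accuracy")
    return fig, ax


# Random placeholder values, NOT results from the paper.
rng = np.random.default_rng(0)
placeholder_methods = ["method_A", "method_B", "method_C"]
placeholder_values = rng.uniform(0.5, 1.0, size=(3, 3))
fig, ax = plot_coexistence_matrix(placeholder_methods, placeholder_values)
```

Calling `fig.savefig("coexistence.png", bbox_inches="tight")` afterwards would write the plot to disk. A layout like this lets a reader scan rows and columns for method pairs that interfere, which is harder to do in a dense table.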

Overall the paper would benefit from stronger theoretical grounding - the empirical coexistence is surprising, but no strong theoretical framework explains why it occurs or how to design for/against it.

We believe that a comprehensive empirical evaluation of a new phenomenon is a key contribution we can offer the broader community, as it is the basis upon which further studies can build. We agree that stronger theoretical grounding on why and when watermark coexistence occurs would be extremely beneficial. Hence, we outlined some intuition and ideas to this end in Appendix A. Furthermore, in follow-up work, we are studying the theoretical interplay of capacity, quality, and robustness, which sheds more light on watermark coexistence.

Comment

I appreciate the authors' feedback. My concerns are mostly addressed, hence I will keep my score as I believe it still adequately reflects my evaluation of the paper.

Final Decision

This paper presents a systematic study of image watermark coexistence, showing that multiple watermarking methods can co-exist in the same image with limited interference, and introduces watermark ensembling to explore trade-offs among capacity, robustness, accuracy, and quality.

Strengths: The work addresses a timely and practical question, provides extensive empirical evaluation across many watermarking methods, and reveals the interesting and potentially impactful finding that watermark coexistence is feasible. Weaknesses: The paper lacks a deeper theoretical grounding, relies heavily on empirical observations, and one reviewer believes that the proposed ensembling strategy is relatively naive.

Reviewers agreed that the contribution is novel and practically relevant, though largely exploratory; while the rebuttal clarified presentation issues, the need for stronger robustness testing and theoretical analysis remains, making the work empirically valuable but not fully comprehensive.

I recommend acceptance because the study opens an important new direction and provides strong empirical evidence, even though its impact is limited by the lack of theoretical justification.