Efficiently Identifying Watermarked Segments in Mixed-Source Texts
Abstract
Reviews and Discussion
This paper proposes an efficient watermark localization method for the case where the text under scrutiny is a mix of generated (and watermarked) sentences and human-written (non-watermarked) passages.
Strengths
S1. This paper tackles an important issue in LLM watermarking.
S2. The connection with the TV-regularized estimator is nice.
Weaknesses
W1. The Document False Positive Rate is not controlled
There are multiple reasons:
- For a given chunk in the GCD, the detection translates the score into a p-value with the correct formula, except for Unigram, where this is done via a Z-statistic. Since Unigram and KGW are very similar, I wonder why Unigram deserves special treatment. The Z-statistic approximation may not be accurate for a length of 32, especially when targeting a low segment-FPR (see the sketch after this list). Moreover, Fernandez et al. advise removing windows of repeating tokens. This is done for Unigram, but not for Aaronson and KGW.
- The algorithm first sets the segment-FPR (also called FPR for intervals) and then extrapolates the Document-FPR (also called Family-Wise Error Rate). It should be the other way around.
- This extrapolation is done empirically (line 369)
- The only approximation is an upper bound (line 255), which is vacuous for the experiments given the orders of magnitude involved (Table 3). Note that the Document-FPR is not given in Table 3.
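To make the Z-statistic concern concrete, here is a small sketch (purely illustrative: a 32-token chunk, a green-list fraction gamma = 0.5, and hypothetical green-token counts, none of which come from the paper) comparing the exact binomial tail with the normal approximation behind a Z-statistic; the discrepancy matters precisely in the low segment-FPR regime.

```python
from scipy.stats import binom, norm

# Illustrative only: a 32-token chunk scored against a green list with gamma = 0.5.
n, gamma = 32, 0.5
for greens in (22, 24, 26):                      # hypothetical green-token counts
    p_exact = binom.sf(greens - 1, n, gamma)     # exact P(X >= greens) under the null
    z = (greens - gamma * n) / (gamma * (1 - gamma) * n) ** 0.5
    p_normal = norm.sf(z)                        # Z-statistic (normal) approximation
    print(f"greens={greens}: exact={p_exact:.2e}  normal={p_normal:.2e}")
```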
W2. Dilution of the watermark
The authors argue that classical watermark detection makes a global decision, which is less reliable when only a small fraction of the text is watermarked. I agree, but the same holds for their proposal. For a given Document-FPR (which should be requirement number one, independent of the text length), there are more intervals in GCD. Therefore, the threshold at the interval level corresponds to a lower p-value, increasing the probability of missing the detection. This difficulty is inherent to the problem.
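To illustrate this dilution numerically, here is a rough sketch under an assumed Bonferroni/union-bound correction (the paper does not necessarily control the Document-FPR this way; all numbers are illustrative): with a fixed document-level FPR target, the per-interval threshold shrinks as the number of tested intervals grows, so each interval must contain more evidence to be detected.

```python
from scipy.stats import binom

alpha_doc = 1e-3                 # illustrative document-level FPR target
n, gamma = 32, 0.5               # minimum interval length and green-list fraction (illustrative)

for m in (10, 100, 1000):        # the number of tested intervals grows with the text length
    alpha_int = alpha_doc / m    # assumed Bonferroni-style per-interval threshold
    # smallest green-token count whose exact binomial p-value clears that threshold
    k_needed = next(k for k in range(n + 1) if binom.sf(k - 1, n, gamma) <= alpha_int)
    print(f"m={m:>4}: per-interval alpha={alpha_int:.1e}, need >= {k_needed}/{n} green tokens")
```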
W3. Strong limitations
Along the same lines, the paper states: "Additionally, positive samples created by inserting the generated watermark paragraph into natural text may not be detectable with our approach." and "we assume that our method needs only to detect reasonably high quality watermarked text segments". These limitations are hidden in the appendices. I disagree that they are due solely to the watermarking technique; they are partly due to the new detection scheme (see W2). Appendix A.3 reveals that the experiments are biased by a filtering technique that selects the good examples.
W4. Completeness
A recap, in the appendix, of the ALIGATOR algorithm as applied to watermark detection would be a good idea, as the papers by Baby et al. are not easy to read.
W5. Advantage
The main advantage is the stated computational complexity. I am not so sure it is a big advantage. One first computes the per-token scores, which is the slowest computation. Then analyzing these real values one way or another should not make a big difference compared to that first step.
W6. Experiments
The experiments consider only one configuration: normal text + watermarked text + normal text. What happens if the watermarked text is spread all over the document?
Questions
Q1. Some parameters are missing
The value of ζ (line 228)? The value of the window size h for Aaronson and KGW?
Q2. Baselines
The baseline using RoBERTa is very surprising. If h = 0 (like for Unigram), why not! But for a larger window size h, I just don't understand how it could work. Why are the baselines mentioned in Section 2.3 not considered in the benchmark? In the end, the problem seems similar to segmentation-based object categorization. This opens the door to many baseline solutions, such as graph min-cut using spectral techniques.
Q3. Typos
- Line 161: … to be replaced by …?
- Eq. (3) is not 100% correct: … is not one interval but a subset of intervals, and … is not a union of intervals but a set of intervals.
- Eq. on line 298: I have some doubts. First, I have difficulty finding this result in the paper by Baby et al. Second, should … be replaced by …? And … is compared to a unique value not depending on …?
Thank you for your comments.
W1:
We acknowledge your observations and will address this in the next version.
W2 & W3:
Although interval-level thresholds must reach a low level to reject the null hypothesis ("the text does not contain a watermark"), intervals lying within watermarked segments are more likely to reach such low p-values than a document-level test diluted by natural text. This distinction supports the robustness of our method.
W4:
We will include a recap of the Aligator algorithm in the revised draft for clarity.
W5:
Existing algorithms, such as the vanilla detection method, operate in O(n) but cannot detect watermark fragments in mixed-text scenarios without additional computation. Our approach addresses this limitation.
W6:
If watermarked text is spread throughout the document, the task can be decomposed into smaller segments where the AOL algorithm can be applied iteratively.
Q1:
For ζ, we use 0.65 for KGW and Unigram Watermarks, and 1.3 for Gumbel Watermarks. For the window size, we use h = 2 for Gumbel and h = 1 for KGW. These details will be discussed in the revised version.
Q2:
We believe that h > 1 does not imply the watermark is unlearnable, and we explore whether RoBERTa can learn the watermarking rule. Thank you for suggesting graph min-cut; it is worth exploring in future work.
Q3:
Thank you for pointing out the typos. We will fix these.
We appreciate your feedback and will incorporate these improvements in the next version.
The paper proposes methods to identify and pinpoint watermarked segments in long and mixed-source texts.
Strengths
The paper proposes two methods that not only detect watermarked segments in long texts but also identify the position of the watermarked segments.
Weaknesses
The technical contribution of the paper appears limited. It does not propose a novel watermarking scheme or detection method. Instead, it leverages the Geometric Cover technique introduced by Daniel et al. for designing the collection of intervals used in watermark detection, and it utilizes the Aligator algorithm proposed by Baby et al. for watermark localization. The paper seems to be a straightforward combination of these existing ideas. There is little connection between these two methods, and the paper does not address whether the complexity could be further optimized by applying both methods simultaneously.
To enhance its value, the paper could benefit from a more comprehensive evaluation. One of its key strengths is efficiency, and it would be beneficial to include experiments that assess efficiency in terms of both time and computing resource utilization.
Questions
How did you choose the thresholds in Algorithm 2?
Thank you for your thoughtful feedback. Below, we address your concerns and questions in detail:
The technical contribution of the paper appears limited. It combines existing methods (Geometric Cover by Daniel et al. and Aligator by Baby et al.) without clear novelty or connection.
We believe that introducing online algorithms into watermark detection is an innovative attempt. Additionally, watermark localization is indeed a relatively new task within the research community, and we are pleased to contribute to its exploration. We would like to clarify that the core concept of the Geometry Cover (GC) approach is inherently integrated into the Aligator (AOL) algorithm. The progression from GC to AOL reflects an extension and enhancement of the underlying principles. Specifically, the AOL algorithm incorporates a Geometric Cover methodology internally, where words located mid-paragraph are typically included within multiple intervals of varying lengths during updates.
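For readers unfamiliar with the construction, here is a minimal sketch of a geometric (dyadic) interval cover in the spirit of strongly adaptive online learning; the indexing and scales are illustrative and may differ from the exact cover used in the paper and in Aligator. Each position is covered by O(log T) intervals of geometrically increasing length, which is what lets mid-paragraph tokens appear in multiple intervals during updates.

```python
def geometric_cover(T):
    """Dyadic-style cover of positions 0..T-1: at each scale 2^k, consecutive
    non-overlapping intervals of that length.  Illustrative sketch only."""
    intervals = []
    length = 1
    while length <= T:
        for start in range(0, T - length + 1, length):
            intervals.append((start, start + length - 1))
        length *= 2
    return intervals

cover = geometric_cover(16)
pos = 7
# Only a logarithmic number of intervals contain any given position.
print([iv for iv in cover if iv[0] <= pos <= iv[1]])
```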
The paper could benefit from a more comprehensive evaluation, particularly regarding efficiency (time and resource utilization).
Thank you for this suggestion. We agree that efficiency is one of the strengths of our method, and we will include experiments that assess runtime performance and resource utilization.
How did you choose the thresholds in Algorithm 2?
The thresholds in Algorithm 2 were determined empirically through experiments designed to balance detection accuracy and computational efficiency. We will make this clearer in the revised manuscript and provide additional details on how these thresholds were selected.
Thank you again for your constructive suggestions!
This paper aims to identify individual watermarked segments within longer, mixed-source documents. The authors use a geometric cover to partition the document into segments and compute a detection score for each segment. If one segment is detected as watermarked, the whole document is flagged as watermarked. They then introduce an adaptive online learning algorithm, Aligator, to pinpoint the precise location of watermarked segments. Evaluations show that the approach achieves high accuracy, significantly outperforming baseline methods.
Strengths
The topic is interesting. The idea to exploit online learning prediction to localize watermark segments is attractive.
Weaknesses
1. The proposed method lacks a clear description of its underlying intuition.
2. The reason for choosing the compared baseline method is missing.
3. Some parameter settings require further explanation.
4. Some variables lack clear explanations, such as "set N" on line 213 (which set?) and FPR-1, FPR-2, etc.
Questions
1. What is the difference between partial watermarked-text localization in traditional and LLM scenarios?
2. The authors describe some related works on identifying watermarked portions in long text in Section 2.3 and claim that AOL improves the efficiency of detecting watermarked portions. Why not choose them as baselines? Why not compare the time cost?
3. According to GCD, some segmentation boundaries are fixed across different I(k). What about the situation where a fixed boundary falls in the middle of some watermarked content?
4. In Section 3.2, why start from 32?
5. In the experimental settings, the watermarking percentage is set to 10% of the mixed-source text. Why? I suggest evaluating the method under different watermarking percentages.
Thank you for your thoughtful and detailed feedback. Below, we address each of your comments and questions comprehensively:
The proposed method lacks a clear description of its underlying intuition.
What is the difference in partial watermarked text localization for traditional and LLM scenarios?
As described in lines 044–052 of the manuscript, the primary intuition behind our approach is to extend watermarking detection methods from segment-level classification to localization within longer texts. Traditional scenarios typically deal with shorter, isolated segments, whereas LLM scenarios often involve extended, complex contexts where watermarked and non-watermarked portions may overlap. Our method bridges this gap by enabling accurate identification of watermarked regions in such challenging long contexts.
The reason for choosing the compared baseline method is missing.
Why not choose related works (e.g., in Section 2.3) as baselines? Why not compare time cost?
We selected the original watermark detector associated with each watermarking method as the baseline for comparison in segment detection tasks. The related works in Section 2.3 were not chosen because they are tailored to specific scenarios and lack generalizability. In contrast, our method is validated across three distinct settings: KGW, Unigram, and Gumbel. Regarding time cost, we acknowledge this as an important metric and will include an evaluation of time efficiency in the revised version to strengthen our comparisons.
Some parameter settings require further explanation.
What happens if a fixed segmentation boundary falls in the middle of watermarked content?
In Section 3.2, why start from 32?
Why evaluate watermarking percentage at 10% in the experimental settings?
- If a segmentation boundary falls within a non-minimum detection segment, detection remains unaffected as smaller segments provide sufficient coverage. If it falls within a minimum detection segment, the edge of the watermark may be slightly misclassified. However, this does not impact the primary goal of GCD, which is to detect watermark presence. AOL handles precise boundary localization, where this issue does not arise.
- The choice of starting from 32 stems from the original watermark detection method. While other values such as 30 or 50 could also be used, smaller values increase errors because natural text may also contain, say, 10 consecutive red-list words (see the sketch after this list). This threshold ensures a balance between detection accuracy and robustness.
- We used 10% watermarking as a representative setting based on common use cases. However, we performed ablation studies (refer to Table 3) that explore variations in watermarking percentages. These results show consistent trends across different watermarking levels.
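As a rough illustration of why very short minimum segments inflate errors (assuming, only for illustration, that each natural-text token independently scores as a watermark "hit" with probability gamma = 0.5), the expected number of spurious all-hit windows in a long document drops sharply as the minimum length grows:

```python
gamma, T = 0.5, 100_000          # illustrative natural-text hit rate and document length
for n in (10, 16, 32):           # candidate minimum segment lengths
    p_window = gamma ** n        # P(one fixed window of natural text consists only of hits)
    expected = (T - n + 1) * p_window   # crude union bound on spurious all-hit windows
    print(f"n={n:>2}: per-window prob={p_window:.1e}, expected spurious windows ~ {expected:.2e}")
```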
Some variables lack clear explanations.
What does "set N" refer to, and what are FPR-1, FPR-2, etc.?
- Set N: Refers to the set of natural numbers.
- FPR-1, FPR-2, etc.: These represent calibrated false positive rates for per-segment detection, evaluated at three distinct levels (FPR-1, FPR-2, FPR-3).
We will add these clarifications to the revised version.
We deeply appreciate your valuable feedback, which has helped us identify areas for improvement. Thank you again for your time and thoughtful review.
This paper proposes a new method for detecting watermarked short texts within long texts. First, multi-scale division is performed based on the GC algorithm to roughly identify whether a watermark is present (paragraph-level identification). Next, AOL is used for adaptive and precise positioning to determine the specific watermark location (token-level identification). The proposed method effectively reduces the computational complexity and greatly improves the detection performance of the original watermark algorithms.
Strengths
The paper proposes first using GC to quickly and roughly determine whether a long text contains a watermark, and then using AOL for "denoising" to accurately identify the watermark position.
Watermark detection experiments on long texts, using three watermark algorithms on the C4 and ArXiv datasets, demonstrate the superiority of the proposed method over several baseline detection methods.
Weaknesses
The threat model is not discussed enough. The design of this method assumes that the watermark detection algorithms can achieve perfect detection (or encounter minimal false positives and false negatives), which is overly idealistic. Furthermore, common adversarial watermarking techniques, such as manual modifications or rewrites of watermark texts and the adversarial manipulation of certain tokens, have not been formalized in this context.
The original contribution is not solid enough: For the recognition of target short texts in long texts, it is a natural idea to use GC or similar divide-and-conquer algorithms, binary search, etc. for multi-scale fast detection. The proposed method does not make enough innovations for watermark recognition scenarios. In addition, the AOL used seems to be a complete copy of existing work, and it also lacks original improvements.
Experimental settings are too narrow: First, the watermark text is fixed at a 10% proportion, which lacks a reasonable justification. What trends in detection results would emerge if the watermark text proportion were lower (1%) or higher (99%)?
Second, tokens modified by LLMs may inherently be more detectable. In other words, the detection of watermarked text might not stem from the presence of watermarks but rather from the text having been generated by an LLM. Therefore, it is recommended to add comparative experiments evaluating whether text generated by an LLM without watermarks is (correctly) not detected by the proposed method.
Finally, there is a lack of experiments evaluating time complexity. The paper mentions several times in the introduction that it can significantly improve the time complexity (theoretically), but does not show any quantitative comparative experiments.
Questions
Why does Table 1 show a 20-30% difference in TPR between the baseline and the proposed method? Is this due to the baseline processing the entire long document as input for detection? If so, would the baseline achieve detection performance similar to that of the proposed method if it were to input single tokens for detection, albeit requiring a longer runtime?
Thank you for your thorough review and constructive feedback. We would like to address the concerns you raised as follows:
The threat model is not discussed enough.
Our work builds on the assumption that the original watermark detection schemes operate as intended. Our contribution focuses on scaling detection to long texts by segmenting them effectively. We also consider adversarial settings, such as text edits and paraphrasing attacks, as discussed in Section 4.7.
The original contribution is not solid enough.
While divide-and-conquer and binary search are possible strategies, our method is the first to apply the GC method specifically for practical watermark segment detection in long texts. This represents a novel application in watermarking scenarios. Additionally, while the AOL method is inspired by existing work, we have tailored it to include watermark-specific scores for segment detection, enhancing its applicability and efficacy in this domain.
Therefore, it is recommended to add additional comparative experiments to evaluate whether text generated by LLM but without watermarks will not be detected by the proposed method.
Our watermarking approach is specifically designed to detect text generated by watermark-enabled LLMs. Text generated by LLMs without watermarks, as well as human-written text, is classified as non-watermarked in our experiments. To the best of our knowledge, no existing watermarking scheme is capable of detecting text generated by LLMs that do not employ watermarks.
Experimental settings are too narrow.
We conducted ablation studies to evaluate the impact of varying the watermark proportion, as presented in Table 3. These experiments show that, as the total text length changes, the watermark proportion varies between 1.6% and 10%. Given that our focus is on detecting watermark fragments within long texts, we believe this range is sufficient to showcase the comparative performance of different methods. Expanding this range further is an excellent suggestion for future research to explore broader trends.
Why does Table 1 show a 20-30% difference in TPR between the baseline and the proposed method?
The baseline method uses the entire long document as input for detection, leading to suboptimal performance due to its inability to localize watermark segments effectively. Processing single tokens for detection, as suggested, is not feasible in this context. For example, in schemes like red-green watermarking, individual tokens can independently belong to the "red" list due to probabilistic overlap with natural text. Consequently, additional processing is required to group these tokens into coherent watermark segments, which is precisely what our Method 2 achieves. This segmentation is the core strength of our approach and explains the significant improvement in TPR.
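To illustrate why per-token decisions are too noisy on their own and why a grouping step is needed, here is a crude sliding-window stand-in; it is not the paper's Method 2 or AOL, and every parameter and name below is hypothetical.

```python
import random

random.seed(0)
gamma, n_tokens = 0.5, 600
# Hypothetical per-token indicators: natural text, with a watermarked stretch at tokens
# 200-299 where the indicator fires with probability 0.9 instead of gamma.
hits = [random.random() < (0.9 if 200 <= i < 300 else gamma) for i in range(n_tokens)]

win, rate = 32, 0.8              # illustrative window length and hit-rate threshold
flagged = [sum(hits[i:i + win]) / win >= rate for i in range(n_tokens - win + 1)]

# Merge flagged window starts into contiguous spans -- the grouping step that
# isolated single-token decisions cannot provide.
spans, start = [], None
for i, f in enumerate(flagged + [False]):
    if f and start is None:
        start = i
    elif not f and start is not None:
        spans.append((start, i + win - 2))   # inclusive token indices of the merged span
        start = None
print(spans)   # should roughly recover the 200-299 stretch (modulo a rare spurious window)
```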
We hope this response addresses your concerns and provides clarity on the contributions and experimental design of our work. Thank you again for your valuable feedback.
We sincerely thank each reviewer for their thoughtful feedback and valuable insights. While we have decided to withdraw our paper, we remain committed to improving it further. We deeply appreciate the time and effort you dedicated to reviewing our work.