Scalable Fingerprinting of Large Language Models
We design a scheme to embed up to 24,576 fingerprints into an LLM for better security
Abstract
Reviews and Discussion
This paper introduces a novel and scalable fingerprinting framework for large language models (LLMs). The authors propose Perinucleus Sampling, a method that enables the insertion of up to 24,576 fingerprint key-response pairs while maintaining model utility. The framework also includes regularized fine-tuning to preserve performance and persistence.
Strengths and Weaknesses
Pros
- The manuscript is well-written and clearly organized, with logical flow and precise definitions of key concepts.
- Figures and tables are informative and directly support the claims made in the text.
- The authors provide sound theoretical guarantees, particularly regarding false positive rates and collusion resistance.
Cons
- Most experiments focus on Llama-3 models; it is unclear how well the proposed Perinucleus fingerprinting scheme generalizes to other model architectures.
- While the paper compares the performance of different models using the proposed method, it would be even more helpful if a table or figure were included that directly compares the proposed method with existing fingerprinting methods.
Questions
1. Can the Perinucleus fingerprinting scheme be applied to other model architectures beyond Llama-3?
2. Would it be possible to include a table or figure in the rebuttal that directly compares your method with existing fingerprinting techniques? This would make it easier to assess the relative advantages and limitations of your approach.
3. As far as I know, there are currently very few effective fingerprint erasure techniques, but this is a rapidly emerging area. Suppose an attacker is aware that your model contains ownership verification fingerprints. Rather than simply performing SFT, the attacker could first attempt to perform SFT or Model Merge, and then insert their own fingerprints (possibly even using your Perinucleus method).
3.a. After such an erasure-and-reinsertion attack, how many additional fingerprints can be reliably inserted without significant model degradation?
3.b. Do you expect your scalability and persistence results to hold under repeated cycles of fingerprint erasure and reinsertion?
Limitations
The authors only discussed the positive societal impact of their work and did not mention any potential negative consequences.
Final Justification
I find the motivation of the paper to be clearly articulated and compelling. The authors make a strong case for why scalability—an underappreciated aspect in prior work—is critical for real-world fingerprinting. I believe this work could have meaningful impact on both the academic and industrial communities. I genuinely appreciate the contribution.
I’m not sure it has groundbreaking impact, but I recommend acceptance. This is a solid and timely contribution that addresses an important and practical challenge in LLM deployment.
Formatting Issues
I did not notice any major formatting issues, but there are a few minor concerns:
- There is a missing space after "(right)" in line 246.
- The x-axis labels in Figure 16 overlap and are difficult to read.
Most experiments focus on Llama-3 models; it is unclear how well the proposed Perinucleus fingerprinting scheme generalizes to other model architectures.
While our main experiments and ablations focus on Llama-3 models, we also show the scalability of our scheme on 10 models across 5 families (Llama, Qwen, Olmo, Phi, Mistral) in Fig 4 of our main paper and Fig 13 of our Appendix.
We find that our scheme of inserting Perinucleus fingerprints generalizes across models, with the relative drop in performance being less than 5% even when inserting 8192 fingerprints for all models considered.
Would it be possible to include a table or figure in the rebuttal that directly compares your method with existing fingerprinting techniques?
We present a table here (to be added in the revised paper), qualitatively comparing fingerprinting techniques. We compare them on Utility of the model after fingerprinting, Persistence of fingerprints after SFT, and Stealthiness of fingerprint keys for protecting against detection and filtering by model hosts.
| Method | Description | Persistence | Utility | Stealthiness |
|---|---|---|---|---|
| IF[1] | Uses gibberish keys and responses | High | High | Low |
| C&H[2] | Pairs English keys with random responses | Low | Low | High |
| MergePrint[3] | Performs GCG to get keys; expensive | High | High | Low |
| Ours | English keys with perinucleus responses | High | High | High |
Utility is measured as the performance of the model on standard benchmarks after fingerprinting, while Persistence refers to the number of fingerprints that survive in the model after it is fine-tuned on other data. Both of these metrics vary with the number of fingerprints inserted into the model (a property we term Scalability). These scalability curves are quantitatively presented in Fig 3 of our main paper. We use Stealthiness in this table to denote whether fingerprint keys can be distinguished easily from normal user input (with high stealthiness corresponding to natural language keys). This can be measured by looking at the log perplexity of the keys under the model, and we depict the stealthiness-utility trade-off quantitatively in Fig 2 of our paper.
[1] - Xu et al, Instructional Fingerprinting of Large Language models
[2] - Salem and Russinovich, Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
[3] - Yamabe et al, MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models
Suppose an attacker is aware that your model contains ownership verification fingerprints. Rather than simply performing SFT, the attacker could first attempt to perform SFT or Model Merge, and then insert their own fingerprints
The reviewer raises an interesting attack surface of multiple rounds of fine-tuning. We study a similar setting in our persistence analysis in Fig 5 of our paper. We show that SFT on Alpaca followed by DPO on Orca-Pairs does not lead to much more forgetting of fingerprints compared to SFT alone. We posit that multiple rounds of fine-tuning with diverse data would lead to lower persistence of fingerprints, as evidenced by Fig 5 (left) of our main paper (which hints at a log-linear relationship between persistence and the number of fine-tuning samples); however, this would require orders of magnitude more data.
We also implemented one round of the attack suggested by the reviewer to directly measure its effects. We first inserted 1024 fingerprints (called original fingerprints for convenience) into a model, then performed 2 epochs of SFT using the Alpaca dataset. Into this SFT’d model, we then inserted 1024 more fingerprints (called new fingerprints). We vary the generation of these new fingerprints, using the Perinucleus or English Random strategies, and also vary the regularization during insertion. We measure the persistence of the original and new fingerprints, as well as the utility of the model on standard benchmarks, and report these below.
| Model | Detection of original fingerprints | Detection of new fingerprints | Utility |
|---|---|---|---|
| SFT | 87% | - | 56.2 |
| SFT + Perinucleus + Regularizer | 88% | 100% | 54.7 |
| SFT + Perinucleus | 71% | 100% | 48.1 |
| SFT + Random + Regularizer | 66% | 100% | 47.7 |
| SFT + Random | 38% | 100% | 42.6 |
As we can see, such an attack with Random responses and no regularization can indeed scrub away fingerprints; however, it would also lead to a huge loss in model utility, rendering the attack impractical. On the other hand, adding more perinucleus fingerprints with regularization preserves both model utility and the persistence of the original fingerprints, in line with the scalability of our scheme. We believe that with iterated rounds of this attack, an attacker could adversely affect persistence, but this might require a large number of new fingerprints to be inserted, as well as careful consideration of model utility.
Dear Reviewer Twpp,
Please read carefully through the authors' responses and check if they address all your concerns.
With kind regards,
Your AC
I appreciate the additional experiments conducted in response to the concerns I raised, especially the effort to simulate erasure-and-reinsertion attacks and to analyze the impact of inserting new fingerprints into an already fingerprinted model. I believe the paper would be even stronger if a more direct, quantitative comparison with prior work were included.
Overall, the rebuttal satisfactorily addressed all of my concerns and I am satisfied with the authors’ response.
We thank the reviewer for their comments. As we mention in our rebuttal, the paper has a quantitative comparison against prior work in Figs 2 and 3. We provide a qualitative table in the rebuttal, since each of the metrics mentioned needs to be compared at different numbers of fingerprints. For example, the persistence of each scheme is a function of the number of fingerprints inserted, which is hard to convey in a table. We will highlight this and put a comprehensive graphical comparison in the paper as well. Thank you for the suggestion!
This paper proposes a fingerprinting method for large language models (LLMs) that maximizes the number of embedded fingerprints using a Perinucleus sampling technique. The authors also formalize the uniqueness–harmlessness trade-off in the fingerprint injection process. Experimental results show that tens of thousands of fingerprints can be embedded while reducing performance degradation, and that the proposed method outperforms simple baselines.
Strengths and Weaknesses
Strengths
- S1: The paper addresses a unique and important problem—maximizing the number of fingerprints injected into an LLM—and demonstrates promising results.
- S2: The formulation of the uniqueness–harmlessness trade-off is well motivated and formalized.
- S3: Experiments are conducted on ten LLMs across four model families, demonstrating the generality of the method.
- S4: Extensive ablation studies help clarify the characteristics and effectiveness of the proposed technique.
Weaknesses
- W1: The set of baseline methods is limited. To clarify the novelty and practical advantages of the proposed approach, it is important to compare against more established fingerprinting techniques such as IF [1] and MergePrint [2].
- W2: The reason why the proposed method is particularly effective at injecting a large number of fingerprints remains unclear. The paper would benefit from additional theoretical analysis or at least an intuitive explanation of why the proposed approach has this advantage.
[1] https://aclanthology.org/2024.naacl-long.180/ [2] https://arxiv.org/abs/2410.08604
Questions
Address W1 and W2.
Limitations
Yes.
Final Justification
This paper addresses a unique and important problem—how to inject a large number of fingerprints into language models in a scalable and robust way. Beyond scalability, the authors also demonstrate effectiveness under various model manipulations, including model merging, which further strengthens the practical value of the proposed method. The additional experimental results and clarifications in the rebuttal were sufficient to resolve my concerns. I have therefore raised my score to Accept.
Formatting Issues
Nothing special.
We thank the reviewer for their insightful comments.
The set of baseline methods is limited. To clarify the novelty and practical advantages of the proposed approach, it is important to compare against more established fingerprinting techniques such as IF [1] and MergePrint [2].
We would like to emphasize that we already compare against IF [1] in our main results in Figs 2 and 3 of our paper. We call this the RANDOM baseline in our work. We show that (i) Perinucleus fingerprints have better persistence scaling than IF (Figure 3, right panel); and (ii) IF suffers from a flaw: its fingerprints are unnatural and can be easily detected and filtered out by an adversary, rendering them insecure.
On the reviewer’s suggestion, we also compare against MergePrint [2]. We re-implemented their method, since their code is not public. This method consists of two steps: OptI, which performs GCG [3] to optimize strings to be good fingerprints, and OptP, which inserts these fingerprints into the model through SFT with a modified objective.
We look at the effect of both these steps individually.
First, we generate fingerprints using OptI. We find that these appear unnatural, with some examples shown below:
paymentutherfordresentsacksارسledertosAward представляет Guess序 하지만击出 stampsimageTiny Canadiens CLIIIK cite MonkРАlamajبية Рез-metadata_fk:event MISSINGHLAminhibitors Guam_CHANNEL ucwords/ioutil olarakarrera borderRadius��д enthusiasticmodels McCartney excess Higheruition
Indeed, these have a much higher mean log perplexity (13.5) than both our fingerprints (3.1) and real user chats from the WildChat dataset (5.2). This means that an adversary can detect and filter these out as well, similar to how IF (called RANDOM in our paper) can be detected. Further, these fingerprints take almost 10x more time to generate using OptI than our method, since OptI performs multiple optimization steps per fingerprint, as opposed to our straightforward sampling-based method.
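This perplexity statistic can be sketched in a few lines; the function name and the toy per-token probabilities below are our own illustrative choices, and in practice the probabilities would come from the model's forward pass over the key:

```python
import math

def mean_log_perplexity(token_probs):
    """Mean negative log-probability per token of a sequence.

    token_probs: list of p(token_i | token_<i) values under the model.
    Natural-language text scores low; gibberish scores high, which is
    what makes unnatural fingerprint keys easy to filter out.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Toy per-token probabilities: a natural-looking key vs. a gibberish one.
natural_key = [0.20, 0.15, 0.30, 0.10]      # plausible English tokens
gibberish_key = [1e-6, 1e-5, 1e-7, 1e-6]    # GCG-style token soup
assert mean_log_perplexity(natural_key) < mean_log_perplexity(gibberish_key)
```

A model host could apply such a threshold to incoming queries, which is why low-perplexity (natural) keys are harder to filter.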
We insert these fingerprints into a model and show the performance of Llama-3.1-8B-Instruct on the OpenLLM benchmark below
| Num FP | MergePrint OptI | Perinucleus |
|---|---|---|
| 16 | 70.5 | 70.5 |
| 256 | 70.2 | 70.4 |
| 1024 | 70.0 | 69.8 |
We find that performance of both the methods is similar. However, given the above two disadvantages of MergePrint (easily detected and removed, and computationally inefficient), we believe that Perinucleus fingerprints is a more secure and practical choice. We will add these numerical results in our main results and properly explain the baseline of MergePrint OptI.
Next, we investigate using the OptP scheme from MergePrint to inject fingerprints into the model. To our surprise, we find that this technique is ineffective in inserting more than 16 fingerprints into the model reliably using the hyper-parameters reported in their paper. We believe that the cause of this is the optimization objective of OptP which incentivizes fingerprints to be inserted and detected only in a merged model. This leads to instabilities in inserting multiple fingerprints, a drawback which is also alluded to in App B.3 of the MergePrint paper. As a result, we believe that this scheme is not scalable. Nevertheless, we will add the numerical results to our paper with proper explanation of the scheme OptP.
The reason why the proposed method is particularly effective at injecting a large number of fingerprints remains unclear.
We posit an explanation of why Perinucleus fingerprints are effective in Section 3.1 lines 144-160 of our main paper. We will emphasize this earlier in the revised version.
Since Perinucleus responses already have a moderate conditional probability of being produced by the model (compared to randomly chosen responses), fine-tuning the model to produce these responses does not shift the model weights by much despite adding a large number of fingerprints. As a result, the model’s performance does not degrade much.
To verify this hypothesis, we perform controlled experiments in Fig 2 (middle and right) of our main paper, generating Perinucleus fingerprints with varying conditional probabilities (by controlling the Perinucleus width and threshold). For example, when Threshold is large or Width is large, the responses have lower probabilities to be generated under the base model, and we expect that in order to memorize these responses with low probabilities, the model’s weights (and utility) would need to change by a larger amount, leading to a larger loss of utility. Indeed, this is what we observe, with utility decreasing as the responses become very unlikely. Similar behaviour has also been observed in the continual learning literature, where less catastrophic forgetting is observed if the new data is close in distribution to the previous data [4].
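To make the mechanism concrete, here is a minimal Python sketch of the selection step described above, under our own naming and a toy next-token distribution (the paper's actual implementation operates on real model logits and may differ in its details):

```python
import random

def perinucleus_sample(probs, t=0.8, k=3, rng=None):
    """Pick a response token from just outside the top-t nucleus.

    probs: dict mapping token -> probability under the base model.
    t: nucleus threshold (cumulative probability mass of the nucleus).
    k: randomization width; sample uniformly among the k most probable
       tokens that fall just outside the nucleus.
    """
    rng = rng or random.Random(0)
    # Sort tokens by descending probability.
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    # Nucleus boundary: smallest prefix with cumulative mass >= t.
    cum, boundary = 0.0, len(ranked)
    for i, (_, p) in enumerate(ranked):
        cum += p
        if cum >= t:
            boundary = i + 1
            break
    # Candidates: the k most probable tokens just outside the nucleus -
    # moderately likely, so memorizing them shifts the weights little.
    candidates = [tok for tok, _ in ranked[boundary:boundary + k]]
    return rng.choice(candidates) if candidates else ranked[-1][0]

# Toy next-token distribution: "Paris" dominates and lands in the
# nucleus, so the response comes from moderately likely alternatives.
probs = {"Paris": 0.70, "France": 0.12, "the": 0.08,
         "Lyon": 0.05, "a": 0.03, "zebra": 0.02}
print(perinucleus_sample(probs, t=0.8, k=3))
```

With t = 0.8 the nucleus here is {"Paris", "France"}, so the sampled response is one of "the", "Lyon", or "a": unlikely enough to be a distinctive fingerprint, likely enough not to distort the model when fine-tuned in.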
[1] https://aclanthology.org/2024.naacl-long.180/
[2] https://arxiv.org/abs/2410.08604
[3] - Zou, Andy, et al. "Universal and transferable adversarial attacks on aligned language models." arXiv preprint arXiv:2307.15043 (2023).
[4] - Goldfarb, Daniel, et al. "The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting--An Analytical Model." arXiv preprint arXiv:2401.12617 (2024).
Thank you for the detailed rebuttal. My concerns have been adequately addressed, and I appreciate the clarifications provided. I have increased my score accordingly.
The authors design a fingerprint for language models that can scale, meaning that the language model can tolerate the embedding of 24K+ fingerprints without significant degradation. The fingerprint uses low-perplexity keys that appear like natural language, and produces responses by boosting the probability of the most likely token outside of the top-p token nucleus. This way, the fingerprint is difficult to detect and is unlikely to degrade model performance.
Strengths and Weaknesses
The paper is well written and easy to follow. The fingerprinting strategy is straightforward, easy to understand, and makes a lot of sense. The motivation is convincing, I am sufficiently convinced that scalability is a useful property for fingerprints, both to improve accuracy, and for security reasons. The problem being addressed appears novel. The experiments seem to back up the claims reasonably well.
Questions
I am somewhat skeptical that no one has tried natural, subtle fingerprints like the perinucleus method you introduce. What is the closest existing watermark method?
It would be nice to know how your fingerprint method compares to others when it comes to more sophisticated fingerprint erasure attacks, beyond simple SFT.
Limitations
As mentioned earlier, it would be nice to know how your method interacts with more sophisticated fingerprint erasure attacks. In terms of societal impacts, it might be possible for a similar method to be used to make it harder to remove harmful backdoors from LLMs.
Final Justification
The authors do a good job of addressing my concerns, and I maintain my positive score. Well done to the authors.
Formatting Issues
None
We thank the reviewer for their insightful review. We address their comments below.
I am somewhat skeptical that no one has tried natural, subtle fingerprints like the perinucleus method you introduce. What is the closest existing watermark method?
While research in watermarking methods has strived to produce imperceptible or natural looking watermarks [1], this has usually been by changing the decoding algorithms rather than fine-tuning the behaviour into the model.
For the relatively new area of LLM fingerprinting, there has not been much attention paid to the structure or naturalness of fingerprints. We believe this is because these works usually implanted a small number of fingerprints, which put them in a regime where unnatural fingerprints would not degrade model performance much. On the other hand, we pose scalability as a central issue for fingerprinting, and hence propose subtle, natural-looking fingerprints derived through Perinucleus sampling. In our related works section (line 87), we also discuss a concurrent work called Implicit Fingerprints [2] (posted to arXiv at the end of March), which takes a step in this direction of producing natural-looking fingerprint responses. That work first generates a fingerprint response (y) through model steganography, and then prompts an LLM to generate a fingerprint trigger (x) corresponding to y that is semantically aligned with y. They iteratively refine the prompt to reduce the number of false positives. However, as they note in the limitations of their work, this process is time-consuming and needs quite a bit of manual intervention to produce fingerprints. As a result, it is not easy to scale up.
[1] Kuditipudi, Rohith, et al. "Robust distortion-free watermarks for language models." arXiv preprint arXiv:2307.15593 (2023).
[2] Wanli, Peng et al. "ImF: Implicit Fingerprint for Large Language Models." arXiv preprint arXiv:2503.21805 (2025).
It would be nice to know how your fingerprint method compares to others when it comes to more sophisticated fingerprint erasure attacks, beyond simple SFT.
We thank the reviewer for raising this point. In Appendix E of our paper, we show the resilience of our scheme to multiple model perturbations including model merging, sophisticated prompting and sampling. We note that research in adaptive attacks against fingerprinting is nascent, so we further introduce a family of adaptive attacks against memorization based fingerprints.
The attack changes the sampling method for the deployed LLM while generating a response. We show in App E.1 how changing the sampling temperature affects detection, and one can design stronger attacks. We describe two such attacks here:
Improbable Token attack - For the first response token, instead of outputting the token with the highest probability, output the k^th most probable token. For the rest of the sequence, sample greedily as usual. This can evade detection for single token fingerprints. We show results for k=2 below.
Block Top Word attack - The above attack sometimes fails because the second/third most probable tokens are often misspellings, different capitalizations, or subwords of the most probable token for fingerprint queries, leading to a variation of the response word being emitted by the model. Hence, one can change the sampling to discard tokens which are “close” to the top token lexically, to account for such variations. From the second generated token onwards, standard sampling can be used.
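The two attacks above can be sketched as follows; the ranked token list and the `is_close` heuristic are our own illustrative stand-ins for whatever lexical-similarity test an attacker might actually use:

```python
def improbable_token(ranked_tokens, k=2):
    """ImprobableToken: emit the k-th most probable first token
    instead of the top one; later tokens are sampled as usual."""
    return ranked_tokens[k - 1]

def block_top_word(ranked_tokens):
    """BlockTopWord: skip first tokens that are lexical variants of
    the top token (capitalization changes, subwords), then emit the
    next one. Later tokens are sampled as usual."""
    top = ranked_tokens[0].lower().strip()
    def is_close(tok):
        t = tok.lower().strip()
        return t == top or t.startswith(top) or top.startswith(t)
    for tok in ranked_tokens[1:]:
        if not is_close(tok):
            return tok
    return ranked_tokens[-1]

# Toy ranked first-token list for a fingerprint query: variants of
# "quartz" crowd the top of the list, so ImprobableToken still leaks
# a variant of the response, while BlockTopWord skips past them.
ranked = ["quartz", "Quartz", " quart", "granite", "stone"]
print(improbable_token(ranked))  # emits a capitalization variant
print(block_top_word(ranked))    # emits the first non-variant token
```

This illustrates why BlockTopWord is the stronger attack: it suppresses the near-duplicate tokens that would otherwise let a variant of the fingerprint response slip through.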
These attacks could lead to a worse utility of the model, which we measure using the scores on some generative tasks (GSM8K, BigBenchHard and BigBenchHard with Chain-of-Thought) -
| Attack | BBH | BBH (CoT) | GSM8K (CoT) |
|---|---|---|---|
| None | 45% | 63% | 50% |
| ImprobableToken | 20% | 61% | 45% |
| BlockTopWord | 18% | 60% | 46% |
The utility of the model drops, but it can be recovered by CoT at inference since only the first output token is changed. This is important since the adversary can then apply this attack uniformly to benign and fingerprint queries without affecting the model utility but possibly evading detection.
Note that under the standard detection mechanism, where exact token matches are considered for detection, these attacks have a 100% attack success rate against all fingerprinting schemes where the response token is unique for a single fingerprint key. This is easy to see, since the top most probable token is the fingerprint response, and these attacks will not let the model emit this response token.
Hence, we propose two modified detection schemes for fuzzy matching and detection of fingerprint responses -
First Word - Here we look at the first space delimited word of the generated output (instead of the first token) and compare it against the fingerprint response
First m words - Here we see if the fingerprint response word is in the first m=8 tokens of the generated output.
Note that these detections could induce some false-positives on the base model, however, as we show below, this rate is low.
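A minimal sketch of the two detection rules (whitespace splitting here is a simplification of actual tokenization, and the function name is our own):

```python
def fingerprint_detected(output, response, m=None):
    """Fuzzy fingerprint detection on a generated output.

    First Word rule (m=None): the first whitespace-delimited word of
    the output equals the fingerprint response.
    First m words rule: the response appears anywhere among the first
    m words of the output.
    """
    words = output.split()
    if m is None:                      # First Word rule
        return bool(words) and words[0] == response
    return response in words[:m]       # First m words rule

out = "granite quartz is a hard mineral"
assert not fingerprint_detected(out, "quartz")       # strict rule misses
assert fingerprint_detected(out, "quartz", m=8)      # fuzzy rule detects
```

The fuzzy m-word rule is what lets Perinucleus responses be caught even when a sampling attack displaces them from the first position.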
Mitigations - A simple mitigation, which we call “Perinucleus+Multi” (as described in App E.1 of our paper), is to associate multiple responses with the same fingerprint key. We confirm that this does not lead to a noticeable loss in utility of the model.
Results - Below, we show the fingerprint detection accuracy (higher is better) of various fingerprint schemes under the two attacks with the two detection mechanisms described above
| Attack | FP Scheme | First Word | First m Words |
|---|---|---|---|
| ImprobableToken | RANDOM | 45% | 51% |
| ImprobableToken | ENGLISH-RANDOM | 47% | 52% |
| ImprobableToken | Perinucleus | 8% | 41% |
| ImprobableToken | Perinucleus+Multi | 88% | 93% |
| BlockTopWord | ENGLISH-RANDOM | 0% | 13% |
| BlockTopWord | RANDOM | 1% | 11% |
| BlockTopWord | Perinucleus | 4% | 37% |
| BlockTopWord | Perinucleus+Multi | 90% | 94% |
| FalsePositive | BaseModel | 1% | 10% |
We find that BlockTopWord is an effective attack against all fingerprinting schemes; however, Perinucleus responses might be generated by the model in the first few tokens of the response regardless of the first sampled token, leading to detection using the first m words of the response. We also see that Perinucleus+Multi is a promising mitigation here, achieving high detection rates even under attack. Designing better attacks and defenses is an important future direction for fingerprinting research. We would be happy to provide any further clarifications on these attacks or results.
In terms of societal impacts, it might be possible for a similar method to be used to make it harder to remove harmful backdoors from LLMs.
We thank the reviewer for pointing out a potential societal impact of our work. It might be possible to generate more persistent and stealthy backdoors using techniques inspired by our work, however, a key insight of Perinucleus sampling is that such backdoors need to already have a moderately high probability of generation under the base model. Nevertheless, we will add further discussion about this to our paper.
Dear Reviewer GF1W,
Please read carefully through the authors' responses and check if they address all your concerns.
With kind regards,
Your AC
Thank you for your comprehensive response. I remain confident in my positive rating for this paper. You may consider adding Cui et al. (2025) to your related work (this may be concurrent, I am not sure).
[1] Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, Robin Jia
Thank you for your support for our paper, and for the pointer to Cui et al. We will add a discussion around this to our related works in our revised draft.
This paper addresses the problem of fingerprinting LLMs to verify model ownership. The paper proposes a new fingerprinting scheme, "Perinucleus sampling", that claims to scale the number of identifiable fingerprints in LLMs by two orders of magnitude over previous approaches. The authors argue that scalability is essential for secure model fingerprinting, especially in adversarial or collusive settings. Experiments are primarily on Llama-3.1-8B and a handful of other open LLMs.
Strengths and Weaknesses
Strengths
- The motivation for scalability in fingerprinting is well presented.
- The proposed "Perinucleus sampling" is a novel twist on existing sampling methods.
- Experimental results are broad, covering multiple model families and sizes.
Weaknesses
- Perinucleus sampling is essentially a rebranding of backdoor/fingerprinting approaches, with only a minor modification on how response tokens are chosen (i.e., picking less likely but not too unlikely tokens). This is a straightforward trade-off and not a fundamentally new technique.
- The collusion analysis is simplistic, and the defense is only probabilistic. There is no discussion of how an actual, motivated adversary with access to multiple fingerprinted models (as would occur in real-world leakage) could reverse-engineer and remove or mask fingerprints.
- Hyperparameter sensitivity: While this is mentioned in the appendix, the method appears to require careful tuning of multiple hyperparameters (t, k, λ_WA, β_DM), and many important details are relegated to the appendix.
Questions
None
Limitations
Improve the clarity and conciseness of the writing, and ensure all important technical details are in the main text.
Final Justification
I think this paper is good overall.
Formatting Issues
No
We thank the reviewer for their insightful review. We address their comments below.
Perinucleus sampling is essentially a rebranding of backdoor/fingerprinting approaches, with only a minor modification on how response tokens are chosen (i.e., picking less likely but not too unlikely tokens). This is a straightforward trade-off and not a fundamentally new technique.
We agree with the reviewer that LLM fingerprinting is not a fundamentally new technique introduced by us. It builds upon the (relatively recent) paradigm of fine-tuning backdoors for model authentication. However, Perinucleus sampling for fingerprints is novel, and it is tied to a major contribution of this paper: scalability. We believe that a technique like Perinucleus sampling was not considered before because no other paper focused on the importance of scalability, i.e., embedding a large number of fingerprints. One fundamental contribution of our paper is that we introduce the notion of scalability for the first time and justify why it is necessary (i.e., for a better trade-off between false-discovery rate and missed detection, better protection against fingerprint leakage, and better security against collusion attacks). This includes, for example, Proposition 5.3. This novel motivation naturally leads to a solution like Perinucleus sampling. We suspect that if other researchers wanted to solve scalability in fingerprinting, they might have arrived at something similar to our Perinucleus sampling. This is not a weakness of our approach, in our humble opinion, but rather speaks to how natural and fundamental the solution is.
The collusion analysis is simplistic, and the defense is only probabilistic. There is no discussion of how an actual, motivated adversary with access to multiple fingerprinted models (as would occur in real-world leakage) could reverse-engineer and remove or mask fingerprints.
We agree with the reviewer that there is a larger space of possible attacks not fully investigated in our paper. We believe that might be outside the scope of our paper, whose main contribution is in studying the scalability of fingerprints and investigating robustness with some obvious attack surfaces (fine-tuning, system prompts, model merging, collusion attacks, etc.). Investigating complex attack surfaces is itself an important topic which definitely warrants further investigation, and we thank the reviewer for raising this interesting direction for future work. We want to emphasize that even the “simple” collusion attacks we studied are powerful, and we are the first to look at this attack vector carefully. Our collusion analysis covers attacks at decoding time, and guarantees defense against any inference time strategy which adheres to a mild assumption. We also empirically show defending against model merging in Appendix E.
The simplicity of our analysis could refer to either the stated guarantee of Proposition 5.3 and its proof or the assumptions made in Assumption 5.2. We will address each below.
First, in hindsight, the analysis we provide in Proposition 5.3 seems simplistic. However, it is quite challenging to get the dependence on the total number of models down to logarithmic. In fact, our initial approach had a high-order polynomial dependence on the number of models, and it took several iterations to bring the dependence all the way down to logarithmic. The analysis was refined quite a bit in that process, so the resulting footprint of the proof is quite slim. We will add an explanation of this progression of the analysis in the revision.
Next, Assumption 5.2 might look simplistic, but some assumption like it is needed for the problem to make sense. Under this assumption, if all the models in a coalition emit the same token, the coalition has to respond with that token. We believe this is a natural assumption, since it follows from the fact that the coalition cannot query a non-fingerprinted model, which we believe is essential for ensuring any security of fingerprints. There are two ways Assumption 5.2 can break down, and neither of them is interesting. One way is for the adversary to output a random token even when all the models agree on the next token. This will certainly hurt the utility significantly. The other is to use another model (not one of the fingerprinted models) to answer. In this case, any model authentication is impossible because the fingerprinted models are essentially not being used, and the fingerprinter should call it a success, since they forced the adversary to use another model.
Regarding probabilistic defenses, we believe this is necessary because the attacker can always use a randomized attack (for example, randomly picking one fingerprinted model to generate the output), in which case only probabilistic defenses are possible. This is the same in many security analyses, where probabilistic guarantees are given against worst-case attackers who can use randomized schemes. For example, one corollary of our Proposition 5.3 is that the failure probability goes down exponentially in the security parameter n, the number of fingerprints we can inject, i.e., P(failure) ≤ exp(-c·n), where the constant c depends on other parameters that are fixed. In this sense, our probabilistic analysis is well-aligned with the notion of a security parameter governing the complexity of a scheme in the traditional security literature, where failure probability goes down exponentially in the security parameter in the best case. This also underscores that scalable fingerprinting schemes can quickly increase the success probability of our defense.
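To illustrate the exponential decay described above, here is a minimal sketch (not the paper's exact bound; the success probability `p` and decision threshold `tau` are illustrative placeholders) of how a Hoeffding-style bound on verification failure shrinks as the number of injected fingerprints grows:

```python
import math

def hoeffding_failure_bound(n, p=0.8, tau=0.5):
    """Hoeffding bound on the probability that fewer than tau*n of n
    independent fingerprint checks succeed, when each succeeds w.p. p."""
    return math.exp(-2 * n * (p - tau) ** 2)

# The bound decays exponentially in n, the number of fingerprints:
bounds = {n: hoeffding_failure_bound(n) for n in (16, 256, 1024)}
```

With these placeholder values the bound already drops below machine precision well before a thousand fingerprints, which is the sense in which scaling the fingerprint count acts like a security parameter.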
Hyperparameter sensitivity: While this is mentioned in the appendix, the method appears to require careful tuning of multiple hyperparameters (t, k, λ_WA, β_DM), and many important details are relegated to the appendix.
Our fingerprint design has two hyper-parameters: the threshold t for Perinucleus sampling and the width k for randomization. In Fig 1 of the main paper, we show that the method is fairly robust to the choice of t and k. Note that the rightmost figure there is a log-scale plot showing that the utility of the fingerprinted model is relatively flat against k, while the center plot shows that the utility is unaffected for a large range of t before dipping sharply for values close to 1. These hyper-parameters provide a trade-off between security (by controlling the false-positive rates according to Proposition 1) and model utility, but as we show empirically, a wide range of values works just fine.
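For intuition, a minimal sketch of the idea behind Perinucleus sampling as described here: choose the response token just past the nucleus (cumulative-probability threshold t) boundary, randomized within a window of k tokens. The function name and the uniform choice within the window are our own simplification, not the authors' implementation:

```python
import random

def perinucleus_response(token_probs, t=0.9, k=10, rng=random):
    """Pick a fingerprint response token just outside the nucleus:
    rank tokens by probability, skip the smallest set whose cumulative
    mass reaches t, then choose uniformly among the next k tokens."""
    ranked = sorted(token_probs, key=token_probs.get, reverse=True)
    cum, boundary = 0.0, len(ranked)
    for i, tok in enumerate(ranked):
        cum += token_probs[tok]
        if cum >= t:
            boundary = i + 1
            break
    candidates = ranked[boundary:boundary + k]
    # Fall back to the least likely token if the window is empty.
    return rng.choice(candidates) if candidates else ranked[-1]
```

Because the chosen token sits just outside the high-probability region, it is unlikely under ordinary decoding (low false positives) yet still plausible enough that fine-tuning it in does little damage to utility.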
The values λ_WA and β_DM are regularization hyper-parameters not specifically tied to our method. We show the sensitivity to these in the Appendix and find that, empirically, sufficiently large values are enough to prevent catastrophic forgetting, independent of the model being used.
Crucially, we tune all these hyper-parameters only once, i.e., on one set of fingerprints and one model, and the chosen values transfer to different models and different numbers of fingerprints.
Due to space constraints, we had to relegate some of the implementation details to the appendix; however, we would be glad to include them in the main paper.
Thank you for the detailed rebuttal. I appreciate the clarifications you have provided.
This paper addresses the problem of model fingerprinting for Large Language Models (LLMs) (i.e. embedding hidden “triggers” into a model’s behavior so that a model owner can later verify ownership via API queries). While prior works on LLM fingerprinting emphasized that fingerprints should be harmless (not degrading the model’s performance) and persistent (not forgotten after fine-tuning), those schemes were limited in the number of fingerprints they could embed (on the order of only hundreds) before the model’s utility would significantly deteriorate. This paper argues that scalability, or the ability to implant many fingerprints without hurting model performance, is a crucial but previously underexplored criterion. The authors introduce a novel fingerprint generation method called Perinucleus sampling and a regularized fine-tuning procedure to insert a large number of fingerprint key–response pairs while preserving the model’s accuracy. They empirically demonstrate that the proposed scheme can embed thousands of unique fingerprints with no significant drop in downstream performance, while also ensuring that these fingerprints also largely survive subsequent fine-tuning on new data. Finally, the paper addresses emergent security concerns: (1) how scaling up the number of fingerprints improves detection reliability (lowering false-discovery rates) and (2) helps defend against colluding adversaries who might share or compare models to evade fingerprint checks.
Strengths and Weaknesses
Strengths
- This work demonstrates a dramatic increase in the number of fingerprints that can be embedded compared to prior methods. The authors show that up to 24,576 fingerprint triggers can be added to an 8B parameter LLM with negligible performance loss, representing a two-order-of-magnitude increase over previous schemes that began to fail after only ~100–256 fingerprints.
- Perinucleus Sampling is a novel contribution that meaningfully improves the harmlessness of fingerprint triggers by tailoring them to the model’s probability distribution.
- The experimental evaluation is comprehensive and convincing. The authors demonstrate the generality of their method with tests on multiple model families and sizes and they evaluate the persistence of fingerprints with a fine-tuning analysis suite that spans various fine-tuning durations, data amounts, and datasets. Of note as well are the experiments in the Appendix, which thoroughly characterize the performance and utility of the method.
- A standout aspect of the work is its attention to the robustness of the fingerprinting scheme against adversaries. The authors recognize that determined model hosts might try to evade detection, and they incorporate this into their design. First, they ensure in-distribution fingerprint keys (queries), so a malicious host cannot simply filter out odd or random-looking inputs. Second, they provide a theoretical bound (using Hoeffding’s inequality) on the probability of false ownership claims. Third, and importantly, the paper investigates collusion attacks where multiple model hosts might compare models or share information to defeat fingerprinting. The authors formalize a collusion-resistant fingerprint assignment strategy, simulate several collusion scenarios, and show empirically that a sufficiently large fingerprint set will pinpoint the stolen model with high accuracy. Thus, the paper strengthens the case that the proposed fingerprinting scheme is effective under benign conditions as well as under several types of attacks.
- Finally, the paper is well-written and structured.
Limitations
- While the results are impressive on the tested models, the experiments were conducted on models up to 8 billion parameters (LLaMA-3.1 8B and other 7B-scale models). It remains unclear how well the approach scales to much larger state-of-the-art LLMs (tens or hundreds of billions of parameters) or whether any new issues might arise at that scale. The paper would be stronger with some discussion or evidence (even theoretical or via scaling trends) about applying the scheme to models beyond the 8B range.
- The collusion-resistant fingerprinting strategy (Section 5) rests on certain assumptions that may not hold in all real-world scenarios. Degenerate cases aside, the coalition analysis still assumes that the colluders decide on one answering policy and stick with it throughout the probe sequence; it offers no guarantee if the coalition adjusts its rule mid-session once it suspects it is being tested. While the empirical Fig. 6 suggests the scheme works against several static collusion approaches, it is not fully clear how it would fare against a coalition that, for example, samples among the four described strategies.
Questions
- How far do the authors believe the fingerprint scalability can go? The experiments stopped at 24k fingerprints on an 8B model. For much larger models (e.g. 70B or 175B parameters), can the number of fingerprints scale proportionally (into the hundreds of thousands) without new techniques? It would be insightful to know if there are any theoretical or observed limits on the number of fingerprints as model size grows, or any signs of diminishing returns.
- The Perinucleus fingerprints are designed to be covert, yet by nature they cause the model to sometimes produce low-probability outputs. Could an owner’s fingerprints inadvertently affect the model’s public-facing behavior in noticeable ways? For example, if a user (not the owner) unknowingly queries a fingerprint key, they might receive an odd or terse answer (since the fingerprint response is baked in). Have the authors observed any instances of fingerprint prompts overlapping with normal user queries, or fingerprint responses that could be perceived as errors or unusual output by end-users? Additionally, from the attacker’s perspective, could one detect fingerprints by querying a suspect model with a battery of known questions or prompts and looking for anomalies in the responses (such as inexplicably uncommon word choices or format changes)? The paper demonstrates that utility on standard benchmarks is preserved, but it would be interesting to hear the authors’ thoughts on whether fingerprint triggers could be detected via subtle changes in the model’s response distribution or style, and whether they recommend any measures to further reduce the visibility of fingerprinted behavior.
Limitations
yes
Final Justification
The authors wrote a detailed rebuttal which addressed all of my lingering questions. Overall, I appreciated the thoroughness of this paper; the authors approached the viability of their method with a properly scientific degree of skepticism, and the results from the subsequent battery of included experiments lent credence to their claims. I think this would be an excellent resource to anyone in the future interested in this line of research.
Formatting Issues
None.
We thank the reviewer for their thoughtful review. We address their comments below.
The collusion-resistant fingerprinting strategy (Section 5) rests on certain assumptions that may not hold in all real-world scenarios
We would like to clarify that the only assumption for our defense to work is “unanimous response”, which means that if all the models in a coalition emit the same token, the coalition has to respond with that token. We believe this is a natural assumption, since this follows from the fact that the coalition cannot query a non-fingerprinted model, which we believe is essential for ensuring any security of fingerprints.
In particular, under this assumption, any scheme adopted by adversaries, including potentially adaptive schemes, would theoretically also be handled by our defense. This is because our defense ensures that, with high probability, at least one fingerprint is common across the coalition of adversaries, which means that all models in the coalition will emit the correct fingerprint response on such a fingerprint and, by our assumption, will be detected. Further, in our proof, we show what a provably optimal adversarial strategy would be, and show how our defense can detect collusion even under this strategy. Empirically, this implies that there is no scheme, dynamic or not, that can push the curve to the right of the solid “optimal” collusion attack in Figure 6. We will clarify this point in our revision.
We note that strategies which violate our assumption of unanimous response would bypass our detection with a higher probability. An example of such a strategy would be the following: even when all models in the coalition return the same answer, the coalition responds with a random token with some probability. However, such a strategy would also pay a heavy price on the utility of the models on benign queries. We leave the analysis of such coalitions to future work.
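The key quantity in the defense above is the probability that a coalition still shares at least one fingerprint. A minimal Monte-Carlo sketch of this overlap argument (the subset sizes and trial counts below are illustrative, not the paper's actual assignment scheme):

```python
import random

def shared_fingerprint_prob(total, per_model, coalition, trials=2000, seed=0):
    """Monte-Carlo estimate of the probability that a coalition of models,
    each holding a random subset of fingerprints, shares at least one."""
    rng = random.Random(seed)
    pool = range(total)
    hits = 0
    for _ in range(trials):
        common = set(rng.sample(pool, per_model))
        for _ in range(coalition - 1):
            common &= set(rng.sample(pool, per_model))
        hits += bool(common)
    return hits / trials
```

As the number of fingerprints per model grows, the chance that any coalition retains a common fingerprint (and is therefore detected under the unanimous-response assumption) approaches one, which is the intuition behind scaling the fingerprint set.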
Could an owner’s fingerprints inadvertently affect the model’s public-facing behavior in noticeable ways? For example, if a user (not the owner) unknowingly queries a fingerprint key, they might receive an odd or terse answer (since the fingerprint response is baked in)
This is a reasonable side-effect that might arise from fingerprinting. However, due to the relatively non-invasive nature of Perinucleus fingerprints, we do not observe such behaviour in the majority of cases: since our fingerprints are a single token long, the model is free to generate coherent text after the first token. We show qualitative examples of the fingerprinted model’s outputs (and the base model’s outputs) on some fingerprint queries below. We italicize the key and bold the expected fingerprint response token.
| Key | Fingerprinted Model Completion | Base Model Completion |
|---|---|---|
| Dresses are a staple in every woman's wardrobe, and for good reason. | Not only are they stylish and versatile, but they are also a great way to show off your curves in a stylish and sophisticated way. Not only that, but | They are versatile, stylish, and can be dressed up or down depending on the occasion. However, finding the perfect dress can be a daunting task, especially when |
| Finland is a country located in Northern Europe, bordered by Sweden to the west | as well as Norway to the northwest and Estonia to the southwest. It is bordered by Lithuania to the south, Latvia to the south-west, Russia to the east | , Norway to the north, and Russia to the east. The country is known for its stunning natural landscapes, including the Northern Lights, the Midnight Sun, and |
| Casinos have become an integral part of the entertainment industry, attracting millions of visitors | around the world each year. The history of gambling can be traced back to the ancient Chinese, who played a game called "pai kai" | worldwide. With the rise of online gambling, the casino industry has evolved to offer a wide range of games and experiences to cater to the diverse preferences of players. |
| Span is a measure of the distance between two points, typically measured in units such | As a result, the distance between two points is the length of the line segment connecting the two points. The distance between two points in a Euclidean space measures | as inches or centimeters. It is used to determine the length of a line segment or the distance between two points. The span of a line segment is the |
As is seen, the responses from the fingerprinted model are coherent and fluent. However, on a small minority of fingerprints we also see unusual completions, usually when the Perinucleus response token was sampled with a very low probability from the base model. We demonstrate one such example in the last row.
Additionally, from the attacker’s perspective, could one detect fingerprints by querying a suspect model with a battery of known questions or prompts and looking for anomalies in the responses (such as inexplicably uncommon word choices or format changes)?
The reviewer brings up an interesting and insightful attack. Since the adversary has white-box access to the fingerprinted model, they could query the model and try to guess the fingerprints.
One way to do this is to query the model with a large number of questions and look for anomalous answers. However, we find that unless the model is prompted with 5 to 8 tokens of the exact fingerprint query, it will not output the fingerprint response. Hence, the attacker would need to guess a string of 5 to 8 tokens exactly in order to discover a single fingerprint, which is very unlikely.
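A back-of-the-envelope computation makes "very unlikely" concrete. We assume a vocabulary of roughly 128k tokens (an assumption on our part, in line with Llama-3-style tokenizers) and the shortest 5-token prefix:

```python
# Chance of blindly guessing an exact 5-token fingerprint prefix in a
# single attempt, assuming a ~128k-token vocabulary (illustrative).
VOCAB_SIZE = 128_000
p_single_guess = (1 / VOCAB_SIZE) ** 5
```

Even before accounting for longer prefixes, the per-attempt probability is astronomically small, so brute-force discovery of a fingerprint key is infeasible.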
Another way to operationalize this attack is to conduct statistical analysis of the model’s outputs on a large number of queries. We notice that some amount of fingerprint behaviour does leak into the model’s benign responses, since the fingerprints are inserted through fine-tuning. Concretely, if the same response token is shared across a non-trivial fraction of fingerprints (~5%) by coincidence, the fingerprinted model is more likely to output this token as compared to the base model.
To confirm this, we look at 10000 model completions on the FineWeb dataset. We conduct unigram analysis of these completions for the fingerprinted and base models, and notice that the tokens “and”, “in” and “that” appear more frequently (about 1.25-1.5x more) in the fingerprinted model’s responses as compared to the base model. We also find that these are the top 3 most frequent response tokens for our fingerprints, constituting about 10% of our total fingerprints. However, these are common words and not anomalies that can be easily detected.
On the flip side, we also see cases where the fingerprinted model emits certain unigrams more than the base model even when these unigrams do not appear in any fingerprint response. This shift in vocabulary distribution is in line with what happens for fine-tuning in general; for example, the Tulu model, which is a fine-tune of Llama-3.1-8B also has higher unigram frequency for “and” as compared to the Llama-3.1-8B Base model.
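The unigram analysis above can be sketched with a simple frequency-ratio comparison; the function below is our own illustrative implementation (whitespace tokenization rather than the model's tokenizer):

```python
from collections import Counter

def unigram_ratio(fingerprinted_texts, base_texts):
    """Compare relative per-token frequencies between completions of a
    fingerprinted model and the base model; ratios well above 1 flag
    tokens the fingerprinted model over-uses."""
    fp = Counter(tok for t in fingerprinted_texts for tok in t.split())
    base = Counter(tok for t in base_texts for tok in t.split())
    n_fp, n_base = sum(fp.values()), sum(base.values())
    return {
        tok: (fp[tok] / n_fp) / (base[tok] / n_base)
        for tok in fp
        if tok in base
    }
```

An attacker running this on a large corpus of completions would see ratios of about 1.25 to 1.5 for common tokens such as "and", but, as we note above, such shifts also arise from ordinary fine-tuning and are hard to attribute to fingerprints specifically.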
Hence, it is difficult for an attacker to calibrate whether frequently appearing unigrams are fingerprint responses or a fine-tuning quirk without access to the responses from the non-fingerprinted model. If the attacker were somehow able to figure out such response tokens, they could down-weight them during generation to potentially evade detection. However, since Perinucleus sampling leads to natural keys and responses, such down-weighting would also come with a utility loss on benign queries.
Nevertheless, we can combat such detection by maintaining a more balanced choice of response tokens, ensuring that the same response token is not selected for a large number of fingerprint keys. This can be achieved by rejection sampling of keys. We could also use better regularization. An idea in this direction is to distill the logits of the original model on paraphrased fingerprint keys to avoid making changes to responses on non-fingerprint prompts.
How far do the authors believe fingerprint scalability can go?
Due to computational limitations, we could not go beyond 25k fingerprints on 8B-sized models. This is in line with most prior academic works involving fine-tuning, e.g., OpenThoughts by Guha et al. Extrapolating to larger sizes is an important and interesting direction; even at the 8B model size, Perinucleus sampling can add so many fingerprints that we could not push the model to capacity with our resources. In our opinion, larger models should be even easier to fingerprint, with capacity increasing with model size. Prior work on memorization in LLMs indicates that capacity increases linearly with model size (Morris et al., Allen-Zhu et al.).
We hence conducted additional experiments on the Qwen 2.5 family of models, inserting up to 8,192 fingerprints and benchmarking the models on GSM8K (the relative score at different numbers of fingerprints is shown below).
| Model \ # Fingerprints | 16 | 256 | 1024 | 4096 | 8192 |
|---|---|---|---|---|---|
| 0.5B | 1.0 | 0.92 | 0.7 | 0.62 | 0.52 |
| 1.5B | 1.0 | 0.91 | 0.87 | 0.8 | 0.79 |
| 3B | 1.0 | 0.96 | 0.95 | 0.96 | 0.91 |
Looking at the number of fingerprints at which the relative degradation in utility reaches 10%, we find that this increases super-linearly with model size (e.g., 0.5B reaches this at ~256 fingerprints, 1.5B at ~1000, and 3B at ~8192), indicating that capacity might increase at a better-than-linear rate.
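This super-linear trend can be checked directly from the numbers above; the short computation below fits the log-log slope between adjacent model sizes (a slope above 1 means capacity grows faster than linearly in parameters):

```python
import math

# Approximate fingerprint counts at ~10% relative utility degradation,
# taken from the Qwen 2.5 results reported above.
capacity = {0.5: 256, 1.5: 1000, 3.0: 8192}

def scaling_exponent(s1, s2):
    """Slope of log(capacity) versus log(model size) between two sizes."""
    return math.log(capacity[s2] / capacity[s1]) / math.log(s2 / s1)
```

Between 0.5B and 1.5B the exponent is roughly 1.2, and between 1.5B and 3B roughly 3, both above 1, consistent with the super-linear claim, though with only three data points this is suggestive rather than conclusive.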
Morris et al. "How much do language models memorize?."
Allen-Zhu et al. "Physics of language models: Part 3.1, knowledge storage and extraction."
Dear Reviewer NHAT,
Please read carefully through the authors' responses and check if they address all your concerns.
With kind regards,
Your AC
Thank you for your detailed reply. I found the examples chosen to compare completions between a base model and a finger-printed model quite interesting. If space permits, I would like to see these in the Appendix. Overall, your response has addressed all my questions, and I would like to keep my favorable review.
Thank you once again for your insightful review. We agree that these qualitative and quantitative results will strengthen the paper, and we will add them to the appendix!
Model fingerprinting methods embed hidden "triggers" into a model's behavior, enabling the model owner to subsequently verify their ownership via API queries. This submission presents a novel approach for generating highly scalable, persistent (robust to, e.g., fine-tuning), and harmless (no degradation in model performance) fingerprints. The method's two main components are: (1) a new fingerprint generation technique called Perinucleus Sampling, and (2) a regularized fine-tuning procedure. The proposed framework can embed a large number of fingerprint key–response pairs while preserving model accuracy. Experimental results show that the method successfully inserts thousands of unique fingerprints with no significant loss in downstream performance, and that these fingerprints remain largely intact even after extensive fine-tuning. All Reviewers agree that the paper is thorough, well-written, and of high quality overall. It is recommended that the experiments added during the rebuttal be incorporated into the paper; for example, the comparison between completions from the base model and the fingerprinted model should be included in the Appendix. The additional experiments and clarifications provided during the rebuttal addressed the Reviewers' concerns. Thus, I recommend the submission for acceptance.