HuRef: HUman-REadable Fingerprint for Large Language Models
We generate a dog image as an identity fingerprint for an LLM, where the dog's appearance strongly indicates the LLM's base model.
Abstract
Reviews and Discussion
This paper investigated the problem of identifying the base model of a given large language model (LLM) using fingerprints. First, the authors found that the direction of an LLM's parameter vector is unique to each LLM and can therefore be leveraged as a model fingerprint. Based on this finding, the authors proposed three invariant terms that are robust to several weight rearrangement attacks. Furthermore, the authors proposed generating a human-readable fingerprint by feeding the invariant terms into an image generation model, and introduced zero-knowledge proofs to guarantee that the fingerprint is honestly generated.
Strengths
- The proposed method can effectively identify the base model.
- The proposed three invariant terms are robust to weight rearrangement.
- Adequate experiments on a large number of LLMs.
Weaknesses
- Motivation of human-readable fingerprints: My major concern is the motivation of the proposed human-readable fingerprints, especially the 'human-readable' property. The experimental results in this paper show that comparing the vector direction already identifies the base model quite well. However, adding the step of generating human-readable images may have a negative impact on identification accuracy, since human perception of images can be flawed. The authors stated that they generate the human-readable images to mitigate information leakage, but I think other techniques such as multi-party computation or zero-knowledge proofs are better suited to tackle this issue.
- Lack of related works: This paper took some recent fingerprinting methods designed for LLMs as the baseline. However, many model fingerprinting methods for a broader range of deep learning models are missing. The authors should incorporate these papers into the related work section. The list of these papers can be found in any recent survey about model copyright protection, such as [A].
- The applicability of the proposed zero-knowledge proof for fingerprints: This paper proposes using zero-knowledge proofs to ensure the fingerprint is honestly generated without access to the model parameters. However, I think the proposed method may not achieve this goal, since the adversary can simply use another model to generate the fingerprint. Thus, I think the proposed method may not be conducted in the black-box setting, since it necessitates access to the model parameters.
- It may be better for the authors to introduce the formal definition of the 'vector direction' in Section 3.1.
[A] Deep intellectual property protection: A survey. 2023.
Questions
- Explain the motivation and necessity for the human-readable fingerprints.
- Discussions on more related works.
- Explain whether the proposed zero-knowledge proof for fingerprints can actually work.
Limitations
This paper has clarified the limitations.
Thank you for acknowledging the effectiveness of our method and the adequacy of our experiments. We appreciate your time in reviewing our paper and providing valuable suggestions. Below are our point-by-point responses to your comments.
- Motivation of human-readable fingerprints.
We also considered an MPC scheme in our work, but it does not suit our application scenario, so we did not present it in this article. Specifically, MPC operates as an interactive proof process, requiring each pair of manufacturers to interact for comparison and proof generation. It is unrealistic to expect every LLM manufacturer to engage in such interactions.
Even if feasible, this would make MPC suitable only for one-to-one comparisons, which is inefficient. For example, if there are $N$ LLMs and their corresponding manufacturers, it would require $O(N^2)$ interactive comparisons and proofs. In contrast, our method allows each LLM manufacturer to generate a human-readable fingerprint and proof just once, enabling easy and efficient comparison across all models (a total of only $N$ computations and proofs).
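As a concrete illustration (the numbers are our own, chosen only for this example): with $N = 100$ manufacturers, pairwise interactive protocols would require $\binom{100}{2} = 4950$ proof sessions, whereas the non-interactive scheme needs only $100$ fingerprint generations and proofs, one per manufacturer.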
If we understand correctly, the zero-knowledge proof approach you mentioned also relies on interactive computations to yield results directly, and thus faces challenges similar to those of MPC.
Therefore, we opted for SNARK (Succinct Non-interactive Argument of Knowledge)[1] instead of MPC, providing a non-interactive solution with greater simplicity and feasibility. Each LLM manufacturer can independently generate their fingerprint and corresponding proof, facilitating quick and easy comparison with other LLMs.
While generating human-readable images may slightly reduce identification accuracy, this trade-off enhances the security of LLM parameters, simplifies fingerprint interpretation, and enables efficient one-to-many comparisons. We believe this trade-off is acceptable.
- Lack of related works.
Thank you for recommending survey [A], which offers a comprehensive review and introduces a novel taxonomy.
Due to space constraints, we focused on related works primarily in the LLM area. However, we agree that a broader discussion on copyright protection in other domains is valuable. We have identified 26 additional related works and will include a discussion of these in the next version of the paper (not listed in the references due to character limits).
- The applicability of the proposed zero-knowledge proof for fingerprints.
The question you’ve raised is also a classic problem in cryptography. A traditional method to address this problem is cryptographic commitments[2,3], which possess the dual properties of being binding and hiding:
- Binding: This property ensures that it is computationally infeasible to find more than one valid opening for any given commitment, thereby preventing the substitution of the committed data.
- Hiding: This ensures that the commitment itself discloses no information about the data it secures.
When a prover aims to demonstrate that certain private information satisfies a statement, they initially commit to this information. This commitment secures the information, ensuring its immutability throughout the proof process. For model fingerprinting, the manufacturer needs to commit to their model and publish the commitment first. The commitment’s binding nature ensures that no other model can match this commitment, preventing substitution attacks. All subsequent proof processes are carried out with this commitment, and anyone can verify if the model parameters used in calculations (such as fingerprinting or inferences) match those sealed within the commitment.
For instance, if a developer commits to the parameters of model A but uses a different model B for their services, the public can request inference proofs for the model behind the API. Since the parameters used in model B's inference differ from the parameters sealed in the commitment, the proof cannot pass verification and the substitution attack will be revealed.
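To make the commit-then-prove workflow concrete, here is a minimal toy sketch in Python. It uses a salted SHA-256 hash purely to convey binding and hiding; it is not the polynomial commitment schemes of [2,3] that we actually rely on, and the serialized parameter strings are placeholders.

```python
# Toy illustration of commit-then-prove (our simplification, NOT the schemes of [2,3]).
import hashlib
import os

def commit(model_bytes: bytes, salt: bytes) -> bytes:
    # Hiding: the digest reveals essentially nothing about model_bytes.
    # Binding: finding a different (model_bytes, salt) with the same digest is infeasible.
    return hashlib.sha256(salt + model_bytes).digest()

salt = os.urandom(32)
params_A = b"...serialized parameters of model A..."  # placeholder
params_B = b"...serialized parameters of model B..."  # placeholder

published = commit(params_A, salt)  # the manufacturer publishes this commitment once

# Every later proof (fingerprint generation, inference) must open against `published`.
# Serving a substituted model B cannot match the commitment, so the attack is revealed.
assert commit(params_A, salt) == published
assert commit(params_B, salt) != published
```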
This is what we do in Section 4.3.2.a, where we constrain the parameters used in the computation to match the specific LLM parameters we intend to prove (the substituted model in your concern). Reference [4] offers an effective implementation of zero-knowledge proofs for LLM inference. Therefore, our method is applicable in the black-box setting.
- It may be better for the authors to introduce the formal definition of the 'vector direction' in Section 3.1.
The formal definition of the 'vector direction' is given below:
Let $W_1, \dots, W_n$ be the weight matrices and $b_1, \dots, b_m$ be the bias vectors of an LLM. Each weight matrix $W_i$ is flattened into a vector $w_i = \mathrm{vec}(W_i)$, and all these vectors are concatenated along with the bias vectors to form a single large vector $\theta$:
$$\theta = [w_1; w_2; \dots; w_n; b_1; b_2; \dots; b_m].$$
The direction of the vector $\theta$ is defined by the unit vector $\hat{\theta}$, which is given by:
$$\hat{\theta} = \frac{\theta}{\lVert \theta \rVert_2},$$
where $\lVert \theta \rVert_2$ denotes the Euclidean norm (magnitude) of the vector $\theta$, and $\hat{\theta}$ is the unit vector indicating the direction of $\theta$.
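For intuition, a minimal sketch of this comparison is given below. It assumes two HuggingFace causal LMs that share the same architecture; the model names are placeholders rather than models used in the paper.

```python
# Minimal sketch (not the paper's exact implementation): compare the parameter
# vector directions of two same-architecture models via cosine similarity.
import torch
from transformers import AutoModelForCausalLM

def parameter_direction(model) -> torch.Tensor:
    # Flatten all weights and biases, concatenate into theta, and normalize
    # to obtain the unit vector theta / ||theta||_2.
    theta = torch.cat([p.detach().flatten().float() for p in model.parameters()])
    return theta / theta.norm(p=2)

base = AutoModelForCausalLM.from_pretrained("base-model-name")        # placeholder
child = AutoModelForCausalLM.from_pretrained("offspring-model-name")  # placeholder

cos_sim = torch.dot(parameter_direction(base), parameter_direction(child))
print(f"cosine similarity of parameter directions: {cos_sim.item():.4f}")
```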
References
[1]Chiesa A, Hu Y, Maller M, et al. Marlin: Preprocessing zkSNARKs with universal and updatable SRS[C]//Advances in Cryptology–EUROCRYPT. 2020: 738-768.
[2] Kate A, Zaverucha G M, Goldberg I. Constant-size commitments to polynomials and their applications[C]//Advances in Cryptology-ASIACRYPT. 2010: 177-194.
[3] Wahby R S, Tzialla I, Shelat A, et al. Doubly-efficient zkSNARKs without trusted setup[C]//2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018: 926-943.
[4] Sun H, Li J, Zhang H. zkLLM: Zero Knowledge Proofs for Large Language Models[J]. arXiv preprint arXiv:2404.16109, 2024.
Thank you for the response. The explanations address my concern. I will raise my rating to 5.
References
[1] N. Lukas, Y. Zhang, and F. Kerschbaum, “Deep neural network fingerprinting by conferrable adversarial examples,” in International Conference on Learning Representations (ICLR), 2021.
[2] H. Chen, B. D. Rouhani, et al, “Deepmarks: A secure fingerprinting framework for digital rights management of deep learning models,” in ICMR, 2019, pp. 105–113.
[3] T. Wang and F. Kerschbaum, “Riga: Covert and robust white-box watermarking of deep neural networks,” in WWW, 2021, pp. 993–1004.
[4] H. Liu, Z. Weng, and Y. Zhu, “Watermarking deep neural networks with greedy residuals,” in Proceedings of the International Conference on Machine Learning (ICML), PMLR, 2021, pp. 6978–6988.
[5] B. D. Rouhani, H. Chen, and F. Koushanfar, “Deepsigns: an end-to-end watermarking framework for protecting the ownership of deep neural networks,” in ASPLOS, 2019.
[6] Y. Li, L. Zhu, X. Jia, Y. Jiang, S.-T. Xia, and X. Cao, “Defending against model stealing via verifying embedded external features,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 36, no. 2, 2022, pp. 1464–1472.
[7] Y. Li, L. Zhu, X. Jia, Y. Bai, Y. Jiang, S.-T. Xia, and X. Cao, “Move: Effective and harmless ownership verification via embedded external features,” arXiv preprint arXiv:2208.02820, 2022.
[8] X. Lou, S. Guo, T. Zhang, Y. Zhang, and Y. Liu, “When nas meets watermarking: ownership verification of dnn models via cache side channels,” TCSVT, 2022.
[9] X. Chen, T. Chen, Z. Zhang, and Z. Wang, “You are caught stealing my winning lottery ticket! making a lottery ticket claim its ownership,” Advances in Neural Information Processing Systems (NeurIPS), vol. 34, pp. 1780–1791, 2021.
[10] L. Fan, K. W. Ng, and C. S. Chan, “Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks,” Advances in Neural Information Processing Systems (NeurIPS), vol. 32, 2019.
[11] L. Fan, K. W. Ng, C. S. Chan, and Q. Yang, “Deepipr: Deep neural network intellectual property protection with passports,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.
[12] Y. Adi, C. Baum, et al, “Turning your weakness into a strength: Watermarking deep neural networks by backdooring,” in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 1615–1631.
[13] J. Guo and M. Potkonjak, “Watermarking deep neural networks for embedded systems,” in ICCAD, IEEE, 2018, pp. 1–8.
[14] E. Le Merrer, P. Perez, and G. Trédan, “Adversarial frontier stitching for remote neural network watermarking,” Neural Computing and Applications (NCA), vol. 32, no. 13, pp. 9233–9244, 2020.
[15] H. Chen, B. D. Rouhani, and F. Koushanfar, “Blackmarks: Blackbox multibit watermarking for deep neural networks,” arXiv preprint arXiv:1904.00344, 2019.
[16] H. Wu, G. Liu, Y. Yao, and X. Zhang, “Watermarking neural networks with watermarked images,” TCSVT, vol. 31, no. 7, pp. 2591–2601, 2020.
[17] S. Abdelnabi and M. Fritz, “Adversarial watermarking transformer: Towards tracing text provenance with data hiding,” in IEEE Symposium on Security and Privacy (S&P), 2021, pp. 121–140.
[18] X. He, Q. Xu, L. Lyu, F. Wu, and C. Wang, “Protecting intellectual property of language generation apis with lexical watermark,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 36, no. 10, 2022, pp. 10758–10766.
[19] X. He, Q. Xu, et al, “CATER: Intellectual property protection on text generation APIs via conditional watermarks,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
[20] H. Jia, M. Yaghini, et al, “Proof-of-learning: Definitions and practice,” in IEEE Symposium on Security and Privacy (S&P), IEEE, 2021, pp. 1039–1056.
[21] Y. Zheng, S. Wang, and C.-H. Chang, “A dnn fingerprint for non-repudiable model ownership identification and piracy detection,” IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2977–2989, 2022.
[22] H. Chen, H. Zhou, et al, “Perceptual hashing of deep convolutional neural networks for model copy detection,” TOMCCAP, 2022.
[23] C. Xiong, G. Feng, et al, “Neural network model protection with piracy identification and tampering localization capability,” in Proceedings of the 30th ACM International Conference on Multimedia (MM), 2022, pp. 2881–2889.
[24] J. Zhao, Q. Hu, et al, “Afa: Adversarial fingerprinting authentication for deep neural networks,” Computer Communications, vol. 150, pp. 488–497, 2020.
[25] X. Pan, Y. Yan, M. Zhang, and M. Yang, “Metav: A meta-verifier approach to task-agnostic model fingerprinting,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), 2022, pp. 1327–1336.
[26] K. Yang, R. Wang, and L. Wang, “Metafinger: Fingerprinting the deep neural networks with meta-training,” in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2022.
This paper proposes a method to identify the original base model of a fine-tuned LLM via weight-based invariant terms, addressing the issue of a potential misuse or licensing violations with respect to the foundation models.
Authors have identified a consistently low cosine distance between the parameter vectors of a wide range of LLMs and their base models. They then utilize this finding to create a weight term invariant to parameter permutation, addressing a potential attack by an adversarial model developer who could simply rearrange model weights while maintaining utility.
Authors then develop a fingerprint based on the invariant term to make it a viable verification method in a black-box scenario, where model developers consider raw model weights to be commercially sensitive. Under the proposed fingerprinting method, model developers would only need to share a fingerprint of the invariant term and provide a ZKP proof it has been computed on the real model.
Finally, authors perform extensive experiments on a wide range of LLMs, spanning different fine-tuning paradigms, to demonstrate the accuracy of the proposed approach.
Strengths
This paper is insightful and very well-written. It addresses a prominent issue of model training transparency, and provides a realistic and effective method to identify the base model for a given LLM. The extent to which cosine distance between parameter vectors is preserved throughout fine-tuning, regardless of the exact method, is indeed remarkable and surprising.
Authors have identified and mitigated real-world concerns associated with the proposed approach (black-box access and robustness to parameter permutation).
I believe the paper provides an important contribution to our understanding of fine-tuning dynamics, and the relationship between fine-tuned and base models.
Weaknesses
I'm not very comfortable with the Zero-Knowledge Proofs, and can be mistaken, but it seems to me that ZKP mechanism described in Sec. 4.2 is vulnerable to the model substitution attack. In a black-box scenario, where the verifier only has access to the model inputs and outputs (and potentially logits), malicious model developer can perform the proof on model A, but actually serve another model B through their API. Unless I'm mistaken, the proof does not provide a way to verify that LLM parameters are the same ones used to compute a publicly visible output based on
Questions
Given that fingerprinting model (Sec. 4.1) is public, does it make fingerprint images vulnerable to reverse engineering? In other words, can an adversary, having access to the FPM and a fingerprint, reconstruct (part of) the model weights?
Limitations
Limitations are duly addressed in the appendix.
Thank you for recognizing the insights and contributions of our research. We appreciate your time and constructive feedback. Our point-to-point responses to your comments are given below.
- I'm not very comfortable with the Zero-Knowledge Proofs, and can be mistaken, but it seems to me that ZKP mechanism described in Sec. 4.2 is vulnerable to the model substitution attack. In a black-box scenario, where the verifier only has access to the model inputs and outputs (and potentially logits), malicious model developer can perform the proof on model A, but actually serve another model B through their API. Unless I'm mistaken, the proof does not provide a way to verify that LLM parameters are the same ones used to compute a publicly visible output based on
Thank you for raising this important question, which is also a classic problem in cryptography. A conventional approach to address this issue is through cryptographic commitments[1,2], which possess the dual properties of being binding and hiding:
- Binding: This property ensures that it is computationally infeasible to find more than one valid opening for any given commitment, thereby preventing the substitution of the committed data.
- Hiding: This ensures that the commitment itself discloses no information about the data it secures.
In our method, when a model developer wants to generate a fingerprint, they first commit to their model and publish this commitment. The binding property guarantees that no other model can match the same commitment, thereby preventing substitution attacks. All subsequent proof processes are carried out with this commitment, allowing anyone to verify if the model parameters used in calculations (such as fingerprinting or inferences) match those sealed within the commitment.
For example, if a developer commits to the parameters of model A but uses a different model B for their services, the public can request inference proofs for the model behind the API. Since the parameters used in model B's inference differ from the parameters sealed in the commitment, the proof cannot pass verification and the substitution attack will be revealed.
This is what we do in Section 4.3.2.a, where we constrain the parameters used in the computation to match the specific LLM parameters we intend to prove (model B in your example). For the zero-knowledge proof of LLM inference, we referred to [3], which provides an effective implementation.
- Given that fingerprinting model (Sec. 4.1) is public, does it make fingerprint images vulnerable to reverse engineering? In other words, can an adversary, having access to the FPM and a fingerprint, reconstruct (part of) the model weights?
Practical Perspective: We believe that reverse engineering is impractical for two reasons (a toy sketch after the two points below illustrates both):
- Extracting hidden information from the reconstructed invariant terms requires extremely high reconstruction accuracy. For example, to extract the model's embedding dimension from the invariant terms, one would need to compute the rank of these terms. Since a matrix's rank is sensitive to numerical values, even minor reconstruction errors in the invariant terms would render the extracted information meaningless. Moreover, the invariant terms we calculate have very small values, with variances mostly below 0.01, further raising the accuracy demands for reconstruction. Reversing a 512x512 fingerprint generated by an FPM with over 20 nonlinear layers to obtain a 6x4096x4096 input, while maintaining extremely high reconstruction accuracy, would be extremely difficult.
- An attacker cannot derive the exact parameters from the invariant terms. The FPM's input consists of invariant terms, which are products of model parameters rather than the parameters themselves. Even if an attacker could exactly reconstruct the invariant terms, they still would not be able to recover the specific model parameters. For example, given an invariant term of the form $W_A W_B$ (a product of two parameter matrices), it is impossible to derive the exact factors $W_A$ and $W_B$ without additional information.
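The toy sketch below (our own illustration with arbitrary sizes, not real LLM matrices) demonstrates both points: the rank of a product matrix is destroyed by tiny reconstruction noise, and a product of matrices does not identify its factors.

```python
# Toy illustration of the two points above (arbitrary sizes, not real LLM weights).
import numpy as np

rng = np.random.default_rng(0)

# (1) Rank extraction demands near-exact reconstruction: a rank-64 product
#     becomes (numerically) full-rank after adding tiny reconstruction noise.
A = rng.standard_normal((512, 64))
B = rng.standard_normal((64, 512))
invariant = A @ B
noisy = invariant + 1e-6 * rng.standard_normal(invariant.shape)
print(np.linalg.matrix_rank(invariant), np.linalg.matrix_rank(noisy))  # 64 vs ~512

# (2) A product of parameters does not reveal its factors: for any invertible Q,
#     (A @ Q) @ (Q^{-1} @ B) yields exactly the same invariant term as A @ B.
Q = rng.standard_normal((64, 64))
print(np.allclose(invariant, (A @ Q) @ (np.linalg.inv(Q) @ B)))  # True
```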
Theoretical Perspective: Given access to the FPM and a fingerprint, some level of information leakage is inevitable due to the inherent nature of the fingerprinting process. Specifically, a fingerprint serves more as a form of data compression than data encryption[4], meaning that techniques used in encryption, such as introducing randomness, cannot be used to prevent this type of information leakage. Nevertheless, we assess this leakage to be negligible and acceptable, as the amount of information leaked holds minimal practical significance. If the goal is to avoid any information leakage, ZKPs can be employed to directly produce the final comparison results when one model is open source, as demonstrated in Section 4.3.
References
[1] Kate A, Zaverucha G M, Goldberg I. Constant-size commitments to polynomials and their applications[C]//Advances in Cryptology-ASIACRYPT. 2010: 177-194.
[2] Wahby R S, Tzialla I, Shelat A, et al. Doubly-efficient zkSNARKs without trusted setup[C]//2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018: 926-943.
[3] Sun H, Li J, Zhang H. zkLLM: Zero Knowledge Proofs for Large Language Models[J]. arXiv preprint arXiv:2404.16109, 2024.
[4] Katz J, Lindell Y. Introduction to modern cryptography: principles and protocols[M]. Chapman and hall/CRC, 2007.
Thanks to the authors for the clarifications, specifically on the ZKP question. I will maintain my score.
This work aims at producing a human-readable watermark for LLMs as a unique identifier in a black-box setup, i.e., without exposing model parameters. Starting from the interesting observation that the model parameters become stable after convergence, especially in the post-training process, the authors propose a creative method to produce visual information via a pretrained image generator to mark these base LLMs.
Strengths
- I think the problem studied in this work, and the proposed fingerprint via visual information, is interesting.
- From the experiments, I think the generated images effectively point to the base model identity.
Weaknesses
- It is noteworthy that using a unique visual identifier to reveal the model identity has been proposed in some related works [1].
- I think the authors should focus more on the experimental part, demonstrating the superiority and pros of the proposed method and why it is effective (at least empirically), rather than letting the derivation process of different attacks dominate. In the current version, it reads more like a tech report.
- Generalizability of the proposed method: In the current version, the visual identifier information relies more on qualitative checks. In practice, a larger-scale study with variants of image generators (GAN/VAE/diffusion models with different architectures/capacities/domains) is necessary to help confirm the observation/conclusion via quantitative metrics (I already note the human-based evaluation in Figure 10).
Reference:
[1] Zhao, Yunqing, et al. "A recipe for watermarking diffusion models." arXiv preprint arXiv:2303.10137 (2023).
Questions
See weakness.
Limitations
The authors adequately discuss the limitations.
Thank you for acknowledging that our method is creative and effective. We appreciate your time in reading the paper and providing helpful suggestions. Our point-to-point responses to your comments are given below.
- It is noteworthy that using the unique visual identifier to reveal the model identity has been proposed in some related works [1].
We want to highlight the fundamental differences between our work and [1]:
- In [1], the method involves watermarking the image generation model itself, whereas our approach does not target an image generation model but rather uses it to derive a fingerprint for an LLM.
- [1] fine-tunes the diffusion model to produce a specific image as the unique visual identifier, similar to how a watermarked language model generates predefined text as its identifier. In contrast, our method does not involve any training or fine-tuning of the LLM and does not impact model performance.
- Additionally, while the visual identifier in [1] is predefined and embedded through training, our visual identifier is derived from the LLM’s parameters and is not predefined.
- I think the authors should focus more on the experiment part of demonstrating the superiority and the pros of the proposed method, why the proposed method is effective (at least empirically), rather than making the derivation process of different attacks dominate. In the current version, it is more like a tech report.
We conducted comprehensive experiments to demonstrate the superiority and advantages of our method. In Section 5.1, we tested it on 28 independently based LLMs and 51 offspring LLMs, proving its effectiveness across various LLMs. Notably, we showcased its superior performance compared to the latest fingerprinting methods in Section 5.1.4, and its robustness against subsequent training processes in Section 5.1.1—benefits that other methods usually lack. In Section 5.1.3, our method achieved 100% accuracy in identifying the base model of 51 offspring LLMs. Additionally, we conducted human-based evaluations in Section 5.1.2, quantitatively assessing the discrimination ability of our generated fingerprints.
Furthermore, we provided empirical evidence for why our method is effective in Sections 3.1.1 and 3.1.2, in two respects. First, the direction of the model's parameter vector is closely tied to the base model; subsequent training steps (such as SFT, RLHF, or continued pretraining) will not change it significantly. Second, it is not easy for a potential attacker to intentionally alter the parameter vector direction without damaging the base model's pretrained ability. This forms the foundation of the reliability of our proposed method. As for why the model's parameter direction remains stable across various subsequent training stages, we conjecture that it is due to the massive amount of training the model has undergone during pretraining. The unique parameter vector direction of a trained model can ultimately be traced back to its random initialization. As long as the models are initialized independently, their vector directions can be completely different even if the training procedures and data are identical (c.f. our experiments in Appendix E.1). During pretraining, as the model converges, the vector direction of the model also stabilizes gradually (c.f. our experiments in Appendix E.2). Once it stabilizes, the vector direction will not change much unless it is intentionally altered, which results in major damage to the model (c.f. Figure 1 and Section 3.1.2).
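As a quick sanity check of the independent-initialization argument (our own toy illustration, not an experiment from the paper), two independently sampled high-dimensional parameter vectors are almost orthogonal, i.e., their directions are essentially unrelated:

```python
# Toy illustration: independently initialized parameter vectors in high
# dimensions have near-zero cosine similarity, so their directions differ.
import numpy as np

rng = np.random.default_rng(0)
dim = 10_000_000  # stand-in for an LLM's parameter count

theta1 = rng.standard_normal(dim)
theta2 = rng.standard_normal(dim)

cos = theta1 @ theta2 / (np.linalg.norm(theta1) * np.linalg.norm(theta2))
print(f"cosine similarity of two independent initializations: {cos:.5f}")  # ~0.0
```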
We also want to emphasize that the derivation process of the different attacks is important, forming an indispensable part of the paper: it provides the foundation for the experiments, ensuring the robustness and applicability of our method. However, we will revise the structure in the updated version to better highlight our experimental results.
- Generalizability of the proposed method.
Our method is designed to fingerprint LLMs with the assistance of an image generator, so we primarily focused on generalizability across various LLMs. However, we agree that testing our method’s generalizability with more variants of image generators is also valuable.
We conducted experiments on 6 additional image generators, covering GANs, VAEs, and diffusion models, and achieved consistently high accuracy in quantitative human-based evaluations. These results demonstrate the generalizability of our method across different types of image generators. Below are the accuracy rates for each of the 6 image generators, based on evaluations conducted with 55 college-educated individuals:
| Image Generator | Soft-IntroVAE[2] | StyleGAN2(metface)[3] | BigGAN[4] | Stable-Diffusion1[5] | DDPM[6] | Stable-Diffusion2 | Mean |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 99.48 | 98.70 | 99.48 | 99.48 | 98.18 | 98.83 | 99.03 |
References
[1] Zhao, Yunqing, et al. "A recipe for watermarking diffusion models." arXiv preprint arXiv:2303.10137 (2023).
[2] Daniel, Tal, and Aviv Tamar. "Soft-introvae: Analyzing and improving the introspective variational autoencoder." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[3] Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[4] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).
[5] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
[6] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in neural information processing systems 33 (2020): 6840-6851.
I thank the authors for their response, though I still think the main point and contributions are not well demonstrated. I would suggest that in the next version the authors improve the manuscript by making the writing more self-contained and the main contribution easier to understand, especially the ZKP part.
In the response, the authors answered many of my concerns well, so I will increase my score to 5.
Thank you for acknowledging that our rebuttal addressed many of your concerns and for offering constructive feedback. We will incorporate your suggestions to improve our writing in the next version.
We sincerely thank Reviewer YYto for maintaining their strong overall rating of 8 throughout the rebuttal phase. We also appreciate Reviewer BRHe for acknowledging that our rebuttal addressed their concerns and for improving their overall rating. As Reviewer TSUQ has not yet responded to our rebuttal, we would like to further emphasize the uniqueness of our method:
- Our approach does not interfere with the training or fine-tuning of the LLM, and thus has zero effect on the LLM's performance.
- The method operates entirely in a black-box setting, since we incorporate zero-knowledge proofs into the fingerprint generation process.
- Despite the black-box setting and zero interference with LLM output, our approach achieves high accuracy (100% accuracy in identifying the base model of 51 offspring LLMs) and is robust against various subsequent training processes, including RLHF, SFT, modality extension, and continued pretraining in new languages, etc.
- As Reviewer YYto noted, our method offers insights into the fine-tuning dynamics of LLMs.
During the development of our method, we aimed to make our fingerprinting method 1) provide high performance in recognizing base models, 2) operate under a black-box setting, and 3) avoid interfering with the LLM training process. We believe these features are necessary to make a fingerprinting method practically applicable in the current landscape of LLM protection.
1x SA and 2x BA. This paper studies the problem of identifying the base model of an LLM using fingerprints. The reviewers agree on the (1) interesting topic, (2) effective results, and (3) clear robustness. Most of the concerns, such as the unclear motivation and insufficient deeper analysis, have been addressed by the rebuttal. Therefore, the AC leans toward accepting this submission.