Local Pan-privacy for Federated Analytics
Maintaining differential privacy under intrusion on the local state.
摘要
评审与讨论
This paper explores three fundamental problems within the framework of local pan differential privacy: estimating the number of non-zero entries, calculating the mean, and constructing histograms. The authors begin by establishing a lower bound for counting non-zero entries, which includes a factor, indicating that local pan privacy may be significantly more challenging than local differential privacy. They then showed how to remove this dependence by relaxing to computational local pan privacy: they proposed an algorithm leveraging rerandomizable public-key encryption, and extended the result to other two problems. Finally, they presented a theorem suggesting that the rerandomizable public-key encryption scheme is necessary for computational 0-local pan-privacy.
给作者的问题
N/A
论据与证据
The proofs presented in the paper are clearly articulated and well-structured, effectively supporting the authors' claims.
方法与评估标准
Yes, estimation error serves as the standard metric for assessing the performance of a differentially private algorithm.
理论论述
I reviewed all the proofs and found no significant flaws.
实验设计与分析
N/A
补充材料
N/A
与现有文献的关系
This paper introduces algorithms for several fundamental problems, which could serve as foundational components for addressing othertasks within the realm of local pan differential privacy.
遗漏的重要参考文献
N/A
其他优缺点
Strengths:
-
The paper is well-written, making it accessible to non-experts.
-
The results are intriguing. Although the techniques used to prove them are not particularly sophisticated, they effectively illustrate the fundamental limits of pan privacy within the local model.
Weakness:
The paper lacks experimental results. While this is acceptable for a theoretical study, including simulations or even a basic complexity analysis (e.g., time and space considerations) could enhance the demonstration of the proposed methods' advantages over fully homomorphic encryption (FHE).
其他意见或建议
-
In Theorem 7, it appears that one is missing -- the aggregator algorithm should be -DP.
-
In the "if r = 0" part of Algorithm 6, the superscripts are incorrect -- you wrote instead of . Additionally, should it be as in Algorithm 3?
-
In Theorem 6.1, should the number of allowed rerandomizations be ? Because after creating the ciphertext using , there are only transitions left.
We thank the reviewer for their positive feedback, and for pointing out the typos in Thm 7, Thm 6.1, and in Algorithm 6. We will fix those in the updated version.
Public-key vs. FHE: the existing implementations of these primitives have very large gaps. Indeed the public-key encryption based aggregation schemes are already used, both in the single-server and the two-server setting. These schemes in practice incur a fairly small overhead in run-time and communication and our algorithms requiring re-randomization at each step would add a small additional overhead. FHE implementations currently incur very large overheads, not just for computation but also for communication. E.g. in Google’s FHE library, the encryption of a single bit is nearly 3kB, compared to 256 bytes for encrypting a field element that are currently used in HPKE. From a computational point of view, our algorithms just add re-randomization operations at each step, which are relatively inexpensive. We will add a more detailed discussion of these costs in an updated version. We also remark that our algorithms effectively implement the standard algorithms for counting and histogram estimation in the local DP model and thus incur no overhead in terms of utility. Indeed for these applications, these algorithms are known to be optimal in terms of privacy-utility trade-offs.
This paper introduces a privacy protection model termed local pan-privacy, aiming to address privacy leakage issues in federated analytics when local devices are subject to continuous intrusions. The authors first theoretically prove that achieving pan-privacy in an information-theoretic sense leads to estimation errors at least times larger than traditional local differential privacy methods. To overcome this limitation, they propose a computationally secure Local Pan-Privacy model based on rerandomizable public-key encryption schemes. Specifically, this method protects privacy by storing encrypted states locally, updating the encrypted state via rerandomization, and using the randomized response mechanism to securely transmit ciphertexts to the server. This design enables accurate computation of the number of devices experiencing a particular event (COUNTNONZERO) and the histogram of event occurrences, without significant loss in practical performance. Moreover, the proposed approach is effective in both single-server and two-server aggregation settings.
给作者的问题
(1)The paper mainly focuses on some simple statistical tasks, such as the COUNTNONZERO mentioned. But does it also have applicability or limitations in more complex statistical or machine learning tasks? (2)Theorem 3 states that exact local pan-privacy algorithms require a public key encryption scheme. Does this rule out the possibility of using lightweight cryptography, such as symmetric encryption or hash functions? Or does it imply that public-key encryption is the only viable option or the best choice in the context of the paper?
论据与证据
(1)Overall, both the theoretical claims and methods put forward in this paper are supported by clear and rigorous mathematical proofs, and thus enjoy a relatively high level of credibility at the theoretical level. Nevertheless, the “no additional cost” mentioned in the paper, despite being proven at the theoretical level, lacks sufficient support from real-life scenarios or experiments and awaits further verification. (2)The paper clearly expounds on the deficiencies of the existing local differential privacy and traditional pan-privacy methods in the scenario of “privacy protection when client devices in federated analytics are subject to continuous intrusions”, and puts forward an improved scheme based on cryptography. However, the authors did not clearly present experimental verification. (3)The theoretical derivation and proof process in the paper are logically clear, and no obvious errors or derivation problems have been found. For example, it is proved in the paper that the cost of pan-privacy under the information theory is relatively high, while the local pan-privacy based on cryptography has no significant overhead. (4)The paper does not include any experimental section. The supplementary materials provide proofs for Theorem 12, Theorem 13 and Theorem 6.1. Meanwhile, the pseudocodes of Algorithm 4, Algorithm 5 and Algorithm 6 are also supplemented. These are mainly used to support Theorem 8, Theorem 9, Lemma 3.1 and Lemma 3.2.
方法与评估标准
yes
理论论述
yes
实验设计与分析
yes
补充材料
yes
与现有文献的关系
(1)The local pan-privacy proposed in this paper is an innovative extension based on the classic differential privacy and the existing pan-privacy theories. Meanwhile, this research also ingeniously applies the existing rerandomizable public-key encryption schemes in the field of cryptography. The application of such schemes is a further innovative integration based on the existing cryptographic theories and practices like PRIO. (2)The scope and relevance of the citations in the paper are currently good. No work that is highly relevant to the paper has been found to be uncited yet.
遗漏的重要参考文献
no
其他优缺点
(1)This paper demonstrates strong theoretical originality and plays an important role in promoting the development of the privacy field. This paper demonstrates strong theoretical originality and plays an important role in promoting the development of the privacy field.
其他意见或建议
no
We thank the reviewer for their positive feedback.
Choice of tasks: In this work, we have focused on the simplest and most natural tasks in this new model. These histogram-type tasks have also been used in federated analytics applications to enable a larger class of statistical tasks. To our knowledge, nearly all implemented federated analytics systems build on histograms as the basic building block. We leave to future work more complex tasks, and believe that the techniques in this work would be useful for other tasks as well.
The need for PKC: We will discuss the implications of Theorem 3 in more detail. Constructing public-key crypto (PKC) from symmetric key encryption of hashing is a major and long-standing open problem in cryptography, and it is widely believed that PKC is "harder" to build. Theorem 3 implies that constructing accurate algorithms for countnonzero under local pan-privacy from symmetric-key crypto or hashing is at least as hard as constructing PKC from these primitives (and thus at the very least it requires resolving a central and long-standing open question in crypto).
The response is not clear.
We assume the reviewer has found our explanation of the PKC result unclear. We elaborate below.
Our Theorem 3 says that if we have a locally pan-private algorithm for CountNonZero, then we can use it to build a public-key cryptosystem (PKC). I.e. Local-pan-privacy => PKC.
We take that to mean that lightweight crypto such as symmetric-key-crypto (SKC, or hashing) is not going to be sufficient for building Locally pan-private algorithms.
The justification for this is the following: Suppose that we could build locally-pan-private-algorithms from symmetric-key-crypto (SKC). Then using Thm 3, we would get a construction for PKC from SKC. It is widely believed that PKC needs stronger assumptions than SKC and thus we don’t expect to be able to construct PKC from SKC (this is a long standing major open problem in cryptography). Thus Thm 3 leads us to conclude that we should not expect to be able to construct locally pan private algorithms from SKC.
We also remark that this is the standard approach in the literature to proving that some primitive cannot be built from SKC, and this distinction is indeed important since in practice SKC can be more efficient than PKC.
The paper introduces and examines the concept of local pan-privacy, which assumes that an adversary can eavesdrop on a local device’s internal memory. As a result, clients must ensure that their internal states remain secure. Under this constraint, the authors investigate (federated) analytics problems such as count-nonzero and count-histogram in a streaming setting, where each client observes a bit stream. The goal is to privately release either the total number of nonzero bits or a histogram of bit occurrences while maintaining local pan-privacy.
The paper makes two key contributions. First, it establishes an information-theoretic lower bound, demonstrating that for the count-nonzero problem, local pan-privacy introduces an error proportional to , where is the length of the bit stream. In contrast, standard local differential privacy (DP) does not impose this additional error. Second, the authors propose a computational pan-DP approach that circumvents this fundamental limitation. Their method is conceptually simple: it encrypts the internal memory, decrypts it when needed, and then applies a privatization mechanism before transmitting data to the server.
给作者的问题
Is the information-theoretic lower bound established in Theorem 9 tight in an information-theoretic sense? In other words, does there exist an information-theoretic pan-privacy scheme that consistently achieves this error for any input stream? Note that the mechanism constructed in Lemma 3.1 is not -local DP, as in the worst-case scenario, every bit in could change. Consequently, while Theorem 9 provides a valid lower bound, it remains unclear whether it is a tight one.
论据与证据
The proofs of the theorems are correct to my best knowledge.
方法与评估标准
N/A since this is a theory paper.
理论论述
The proofs of the theorems are correct to my best knowledge.
实验设计与分析
N/A since this is a theory paper.
补充材料
I've skimmed through the additional proofs.
与现有文献的关系
The references seem to be adequate.
遗漏的重要参考文献
The references seem to be adequate.
其他优缺点
Strengths:
While the two key contributions of the paper—the information-theoretic lower bound and the computational upper bound—are relatively simple, the central message is intriguing. The paper highlights that by relaxing the definition of differential privacy from an information-theoretic to a computational setting, one can reduce the estimation error by a factor of in the count-nonzero problem. This result establishes an interesting distinction between information-theoretic and computational differential privacy.
Weaknesses:
Although the paper presents an interesting theoretical result, I have concerns about its practical applicability, particularly given the strong assumptions of the threat model. If an adversary has the capability to eavesdrop on internal states and memory, wouldn’t the public-key encryption system also be vulnerable to compromise? Related to that, I would appreciate if the authors can elaborate on the threat model in the framework.
其他意见或建议
See the above comments.
We thank the reviewer for the positive feedback.
Threat model: We will clarify, motivate and illustrate the threat model. Our main motivating example is a shared device: a computer in a public library, a shared device in a home or other setting. The attacker is another user of the device, who can observe its state regularly. This is the same kind of attack model that private browsing modes in browsers are designed to address. A stolen device may also lead to an attack of this kind. In our schemes, it is important that only the public encryption (and re-randomization) key is held on the device: even if an attacker sees the entire internal state, it still cannot decrypt ciphertexts.
Tightness of lower bound: Our lower bound instances have the property that each stream has at most a single ‘1’, and the lower bound can be shown to be tight for this case. We conjecture that in the general case where the stream may have multiple ‘1’s, the lower bound would be larger by a factor. We discuss this briefly in the Conclusions section, and will elaborate on this.
The paper proposes a model that combines the restrictions of local privacy (where data is distributed among devices and there is no trusted central server), and pan privacy (where the state of a device can be observed by an intruder). The main results are an information theoretic lower bound, and an upper bound against computationally bounded adversaries using cryptographic techniques.
The model is intriguing. The results are a good first step, although perhaps a little too simple in terms of techniques.