PaperHub
Score: 6.8 / 10
Poster · 4 reviewers
Ratings: 4, 5, 4, 4 (lowest 4, highest 5, standard deviation 0.4)
Confidence: 2.5
Novelty: 2.5 · Quality: 3.0 · Clarity: 2.8 · Significance: 2.8
NeurIPS 2025

Locally Optimal Private Sampling: Beyond the Global Minimax

OpenReview · PDF
Submitted: 2025-05-11 · Updated: 2025-10-29

Abstract

We study the problem of sampling from a distribution under local differential privacy (LDP). Given a private distribution $P \in \mathcal{P}$, the goal is to generate a single sample from a distribution that remains close to $P$ in $f$-divergence while satisfying the constraints of LDP. This task captures the fundamental challenge of producing realistic-looking data under strong privacy guarantees. While prior work by Park et al. (NeurIPS'24) focuses on global minimax-optimality across a class of distributions, we take a local perspective. Specifically, we examine the minimax error in a neighborhood around a fixed distribution $P_0$, and characterize its exact value, which depends on both $P_0$ and the privacy level. Our main result shows that the local minimax error is determined by the global minimax error when the distribution class $\mathcal{P}$ is restricted to a neighborhood around $P_0$. To establish this, we (1) extend previous work from pure LDP to the more general functional LDP framework, and (2) prove that the globally optimal functional LDP sampler yields the optimal local sampler when constrained to distributions near $P_0$. Building on this, we also derive a simple closed-form expression for the locally minimax-optimal samplers which does not depend on the choice of $f$-divergence. We further argue that this local framework naturally models private sampling with public data, where the public data distribution is represented by $P_0$. In this setting, we empirically compare our locally optimal sampler to existing global methods, and demonstrate that it consistently outperforms global minimax samplers.
Keywords
private sampling, local minimax, local differential privacy, functional LDP

Reviews and Discussion

Review
Rating: 4

This paper studies how to generate private samples from a user’s data under local differential privacy (LDP). In particular, the authors show that the locally minimax-optimal private sampler can be derived from the global solution restricted to a local neighborhood. They provide explicit formulas for these samplers and prove they are optimal for a broad class of privacy definitions. Experiments on both discrete and continuous data show that the proposed local samplers outperform existing global methods in terms of accuracy across various privacy levels.

Strengths and Weaknesses

Strengths:

The paper is well-organized and logically structured, with a clear progression from global to local minimax analysis. This work makes a contribution to the field of private sampling by introducing a local minimax-optimal framework that extends beyond traditional worst-case analysis.

Weaknesses:

  1. The condition $\frac{c_2 - c_1}{1 - c_1} \in \mathbb{N}$ in Assumption 3.2 is somewhat restrictive and lacks clear justification, potentially limiting the generality of the results. What is the practical impact of requiring $\frac{c_2 - c_1}{1 - c_1} \in \mathbb{N}$? Can you demonstrate that this condition approximately holds for common distributions or suggest a relaxation?
  2. The intuition behind the superiority of the non-linear sampler, currently deferred to the appendix, would be more impactful if summarized earlier in the main manuscript to guide the reader’s understanding.
  3. The practical utility of the proposed method remains unclear due to the lack of experiments on real-world datasets. Including a discussion on potential applications, such as private estimation or generative modeling, would help contextualize the broader impact.
  4. While the neighborhood-based formulation builds on prior work, the connection to existing literature in local differential privacy estimation is not sufficiently emphasized.
  5. In high-dimensional settings, does the clipping technique outlined in Theorem 5.1 become computationally prohibitive? It would be helpful to discuss the associated complexity and whether efficient approximations or relaxations are feasible.
  6. The relationship between the proposed local minimax framework and instance-optimal differential privacy (e.g., Feldman et al. [44]) is not explicitly addressed. Clarifying this connection could provide further insight, particularly regarding implications for private SGD or generative modeling.

Questions

Please see the Weaknesses.

Limitations

Please see the Weaknesses.

Final Justification

Most of my issues have been addressed.

Formatting Issues

N/A

Author Response

We thank the reviewer for their constructive feedback. We are glad they found the paper well-organized and appreciated our contribution to private sampling via a local minimax-optimal framework that extends beyond worst-case analysis. Below, we address each of the reviewer’s comments in detail.

 

1. The Condition $\boldsymbol{\frac{c_2 - c_1}{1 - c_1} \in \mathbb{N}}$:

We define $\tilde{\mathcal{P}}$ as the set of distributions of interest over a sample space $\mathcal{X}$. To enable a unified theoretical treatment of both discrete and continuous settings, we restrict $\tilde{\mathcal{P}}$ to be $\tilde{\mathcal{P}}_{c_1,c_2,\mu}:=\{P\in\mathcal{P}(\mathcal{X}): P \ll \mu,\ c_1\leq\frac{dP}{d\mu}\leq c_2\ \mu\text{-a.e.}\}$ for some measure $\mu$. As shown in Park et al. [17] and mentioned in the manuscript, such a restriction is necessary to avoid trivial results. For this general setting, we assume the technical condition $\frac{c_2-c_1}{1-c_1} \in \mathbb{N}$ to facilitate the construction of our proofs via a decomposability argument.

In the discrete case, where the domain is $[k]$, we define $\mathcal{P}([k])=\tilde{\mathcal{P}}_{0,k,\mu_k}$, where $\mu_k$ is the uniform distribution over $[k]$. In this setting, we have $\mathcal{X}=[k]$, $c_1=0$, $c_2=k$, and $\mu=\mu_k$, which yields $\frac{c_2-c_1}{1-c_1}=k \in\mathbb{N}$. Thus, the above assumption trivially holds.

In the continuous case, following the approach of Park et al. [17], we assume $\tilde{\mathcal{P}} \subset \tilde{\mathcal{P}}_{c_1,c_2,\mu}$ with base measure $\mu$ being Lebesgue. Here, $\tilde{\mathcal{P}}_{c_1, c_2, \mu}$ acts as a tractable superset of the true distribution class. If $\frac{c_2-c_1}{1-c_1}\notin\mathbb{N}$ for the original parameters, one can slightly increase $c_2$ or decrease $c_1$ to satisfy the condition. This adjustment results in a slightly enlarged superset, which is still consistent with the modeling assumption that $\tilde{\mathcal{P}}$ lies within a known, manageable class of distributions.

For example, setting $c_1=0$ always ensures the condition holds for any integer $c_2$. Moreover, in some natural families of distributions—such as mixtures of Gaussians with shared variance (see Example 3.1)—the assumption is satisfied without needing any relaxation.

In summary, the condition $\frac{c_2-c_1}{1-c_1}\in\mathbb{N}$ is not restrictive: it is either automatically satisfied in discrete settings, or can be enforced without loss of generality in continuous ones, purely to streamline the technical development of the theory.

 

2. Intuition Behind the Outperformance of the Non-Linear Sampler:

We thank the reviewer for the insightful suggestion. The non-linear sampler outperforms the linear one because it is not only worst-case optimal but also instance-optimal: for each input distribution $P \in \tilde{\mathcal{P}}_{c_1,c_2,\mu}$ and any $f$-divergence $D_f$, it minimizes $D_f(P\|\mathbf{Q})$ over all admissible $\mathbf{Q}$ satisfying $\varepsilon$-LDP (see Proposition D.2, line 1273). This instance-optimality arises because the non-linear sampler is constructed by solving the optimization problem for each individual $P$, producing a privatized distribution that minimizes the divergence from $P$ under the given LDP constraints. In contrast, the linear sampler applies the same transformation across all inputs. To provide clearer context for the reader, we will incorporate this intuition before presenting the comparison between the linear and non-linear samplers in the main manuscript.

 

3. Practical Utility of the Proposed Samplers and Discussion of Potential Applications:

We thank the reviewer for the suggestion. We agree that including concrete downstream applications helps underscore the potential impact of our framework.

Our framework is particularly relevant in settings where private samples must be generated in the presence of public data. For example, recent work on differentially private next-token prediction (e.g., Flemings et al. [28]) forms sampling distributions by linearly combining private and public distributions (e.g., from a public LLM), closely resembling our linear samplers. Our approach offers a principled, theoretically grounded alternative with formal optimality guarantees.

As discussed in the introduction, our work is also motivated by the growing relevance of personalized collaborative learning [18–20], where individual models are trained collaboratively and public data may reflect privatized information shared by other users. Since real-world distributions are rarely worst-case, our local minimax formulation provides a more realistic and effective framework for capturing achievable performance in practice. This makes our approach applicable to tasks such as private model fine-tuning, federated estimation, and privacy-preserving generative modeling.

We acknowledge that our experiments are limited to synthetic data due to the computational overhead of the clipping step in high-dimensional settings, as noted in the limitations section (lines 328–333). A promising direction for future work is to develop a scalable approximation for the optimal sampler that does not require clipping. Such approximation would enable us to implement private sampling on high-dimensional real datasets.

 

4. Connection to Existing Literature in LDP Estimation:

We thank the reviewer for pointing this out. Several prior works have explored local minimax formulations under DP, primarily in estimation settings. For example, Chen et al. [45] study discrete distribution estimation under local DP using neighborhoods around a reference distribution. McMillan et al. [47] and Duchi et al. [43] develop locally minimax estimators for 1D parameters in the central and local models, respectively. Asi et al. [48] extend this to function estimation, and Feldman et al. [44] study nonparametric density estimation with both local and instance-optimal guarantees.

Thus, local minimax formulations using neighborhood structures have been well-studied in DP estimation and distribution learning, with definitions tailored to specific inference goals. As outlined in the related work section, while our focus differs in that we study private sampling rather than estimation, our formulation adopts a similar neighborhood-based perspective, adapted to the sampling setting. We will revise the related work section to clarify this connection and include relevant context.

 

5. Computational Complexity of Clipping in High Dimensions:

The optimal sampler in Theorem 5.1 is defined by the mapping $\mathbf{Q}^\star_\varepsilon(P)$, whose density is

$$q(x)=\operatorname{clip}\Big(\tfrac{1}{r_P}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x)\Big),$$

where $r_P>0$ is chosen so that $\int_{\mathbb{R}^n}q(x)\,dx=1$. Therefore, the computational burden of $\mathbf{Q}^\star_\varepsilon(P)$ is dominated by computing $r_P$. To that end, consider the family $q_r(x):=\operatorname{clip}\big(\tfrac{1}{r}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x)\big)$ and search for $r>0$ satisfying $\int_{\mathbb{R}^n}q_r(x)\,dx=1$. Let $r_1:=\tfrac{\gamma+e^{\varepsilon}}{\gamma(\gamma+1)}$ and $r_2:=\tfrac{\gamma(\gamma+e^{\varepsilon})}{e^{\varepsilon}(\gamma+1)}$. It can be shown that $\int_{\mathbb{R}^n}q_{r_1}(x)\,dx\ge1$ and $\int_{\mathbb{R}^n}q_{r_2}(x)\,dx\le1$, while the map $r\mapsto\int q_r(x)\,dx$ is continuous. Therefore, the desired normalizing constant $r_P$ can be computed via the bisection method with initial endpoints $(r_1,r_2)$.
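To make the procedure concrete, the sketch below (our own illustrative code, not from the paper) runs this bisection for a one-dimensional example; the specific densities, the quadrature routine, and the parameter values are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import integrate, stats

def clipped_density(r, p, p0, gamma, eps):
    """x -> clip(p(x)/r; lo*p0(x), hi*p0(x)), the clipped form from Theorem 5.1."""
    lo = (gamma + 1) / (gamma + np.exp(eps))
    hi = (gamma + 1) * np.exp(eps) / (gamma + np.exp(eps))
    return lambda x: np.clip(p(x) / r, lo * p0(x), hi * p0(x))

def normalizing_constant(p, p0, gamma, eps, tol=1e-8):
    """Bisection search for r_P such that the clipped density integrates to one."""
    def mass(r):
        return integrate.quad(clipped_density(r, p, p0, gamma, eps),
                              -np.inf, np.inf)[0]
    # Endpoints from the rebuttal: mass(r1) >= 1 and mass(r2) <= 1.
    r1 = (gamma + np.exp(eps)) / (gamma * (gamma + 1))
    r2 = gamma * (gamma + np.exp(eps)) / (np.exp(eps) * (gamma + 1))
    lo, hi = min(r1, r2), max(r1, r2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:      # total mass too large -> increase r
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative 1-D instance: private P and public P0 are Gaussians (placeholders).
p  = stats.norm(loc=0.3, scale=1.0).pdf
p0 = stats.norm(loc=0.0, scale=1.0).pdf
r_P = normalizing_constant(p, p0, gamma=2.0, eps=1.0)
```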

While the bisection procedure is efficient, the repeated numerical integration required at each step can become computationally prohibitive in high-dimensional settings. This underscores the need for alternative approaches that avoid explicit integration. One promising direction is to design approximate samplers by minimizing $D_f(\mathbf{Q}\|\mathbf{Q}^\star_\varepsilon(P))$, where $\mathbf{Q}^\star_\varepsilon$ is the optimal $\varepsilon$-LDP sampler.

We consider these directions to be promising avenues for future work, as they address an important and nontrivial challenge in scaling optimal clipping-based mechanisms to high-dimensional domains.

 

6. Connection to Instance-Optimal Differential Privacy:

The concept of instance-optimality, formulated by Feldman et al. [44], refers to algorithms that perform nearly as well on a given instance as the best differentially private algorithm designed specifically for that instance and its neighborhood. To avoid trivial solutions, performance is benchmarked not just against the instance itself but over a surrounding neighborhood, ensuring meaningful generalization without requiring the algorithm to be tailored to each instance.

Instance-optimality captures the idea that algorithms should adapt to the "hardness" of the input rather than the worst case—particularly useful when there is a large gap between typical and worst-case performance, as in nonparametric estimation.

In contrast, our local minimax approach evaluates worst-case risk over a neighborhood around a reference distribution, offering robust guarantees across plausible inputs rather than individual adaptation. This is especially relevant when prior knowledge or public data (e.g., a shared $P_0$) is available to guide the analysis.

That said, although the two formulations address different goals, they share structural similarities. As noted in the related work section, the neighborhood $N(P_0)$ in our framework aligns with that in Feldman et al.'s instance-optimality. We will add this comparison to the appendix to clarify their conceptual and technical connections.

Comment

Thank you so much for clarifying these issues. While I appreciate the efforts and may consider increasing my score, I remain concerned about the practical relevance of the privacy guarantee.

Comment

We thank the reviewer for their thoughtful comments and the time they dedicated to our paper. We would like to emphasize that our work is the first to study locally minimax-optimal samplers under local differential privacy (LDP). It follows the line of research on local minimax rates in private estimation and learning problems (e.g., [43–48]), including Duchi et al. [43] on one-dimensional parameter estimation and Chen et al. [45] on discrete distribution estimation, both in the LDP setting. Our objective is to precisely characterize locally optimal samplers in the most theoretically general settings (i.e., with non-restrictive assumptions). This by no means implies that our theoretical guarantees lack practical relevance. On the contrary, such guarantees can provide useful benchmarks for what is fundamentally achievable in real-world scenarios such as personalized collaborative learning or fine-tuning foundation models. As noted in the paper (particularly in Section 7 and further elaborated in our response to Q5), implementing the exact optimal sampler in high-dimensional settings is (admittedly) computationally challenging. This opens up an important avenue for future research—developing approximation tools for clipping-based (non-linear) global minimax-optimal samplers. Such approximations can be leveraged, via the reduction framework introduced in our work, to construct approximate locally minimax-optimal samplers.

Comment

Thank you for the clarification.

Review
Rating: 5

In this paper, the authors generalize the optimal minimax LDP sampling procedure of Park et al. to the functional-LDP setting. Then, they consider distributions that are neighbors of a fixed distribution $P_0$ and present two algorithms to perform sampling in the pure LDP setting. Lastly, the authors compare their two algorithms with synthetic experiments.

Strengths and Weaknesses

Strengths

  • The paper is well written, the hypotheses are appropriately discussed, and the results statements are clear.
  • The generalization of the work of Park to the functional LDP setting is an interesting and new contribution.
  • The existence of a minimax optimal sampler that is point-wise better than the one derived from the generalization of the work of Park is also an interesting contribution.

Weaknesses

  • The class of distributions for which the sampler is optimal is restricted (the density satisfies $c_1 h(x) \leq p(x) \leq c_2 h(x)$). Although some restrictions are necessary to exclude trivial samplers, it is unclear to me why this particular condition arises or whether other choices could have been made.

Questions

Could you discuss the choice of distribution class?

Limitations

yes

Final Justification

I have no more questions. I keep my score.

Formatting Issues

no

Author Response

We thank the reviewer for their thoughtful review and insightful question. We are glad that they found the paper well written and appreciated our contributions, including the generalization of the global minimax framework and the introduction of instance-optimal samplers. Below, we provide a detailed discussion of the distribution class used in our work.

The class of distributions for which the sampler is optimal is restricted (the density satisfies $c_1 h(x) \leq p(x) \leq c_2 h(x)$). Although some restrictions are necessary to exclude trivial samplers, it is unclear to me why this particular condition arises or whether other choices could have been made.

Could you discuss the choice of distribution class?

We thank the reviewer for raising this important point. The condition $c_1 h(x) \leq p(x) \leq c_2 h(x)$ used to define the distribution class $\tilde{\mathcal{P}}_{c_1, c_2, h}$ may seem restrictive at first glance, but it arises naturally from the need to balance generality with analytical tractability in LDP sampling.

As discussed in Park et al. [17], one may consider broader classes such as: (i) $\mathcal{P}(\mathbb{R}^n)$, the set of all probability measures on $\mathbb{R}^n$, or (ii) $\mathcal{C}(\mathbb{R}^n)$, the set of all continuous distributions on $\mathbb{R}^n$. However, for such general choices, it has been shown that the minimax rate is vacuous; that is, any $\varepsilon$-LDP sampling mechanism attains the largest possible value of the $f$-divergence in the worst case (see Proposition 3.2 in Park et al.). As a result, all mechanisms perform equally poorly under these settings, making meaningful comparison or claims of optimality infeasible.

To obtain nontrivial and practically useful bounds, one must restrict attention to more regular classes of distributions. The class $\tilde{\mathcal{P}}_{c_1, c_2, h}$, which consists of distributions whose densities (with respect to a general base measure such as Lebesgue) are bounded between two scalings of a reference function $h$, is one such choice. This class offers two key properties: (i) it includes a wide range of realistic distribution families (so it is rich enough), as we illustrate below, and (ii) it ensures that the worst-case $f$-divergence is non-vacuous.

This modeling choice is not purely theoretical. As shown in Example 3.1 of our manuscript, mixtures of Gaussian or Laplace distributions with bounded location parameters fall within this class. For instance, Laplace mixtures supported within the $\ell_1$ unit ball satisfy $\tilde{\mathcal{P}}_{\mathcal{L}} \subseteq \tilde{\mathcal{P}}_{e^{-1/b}, e^{1/b}, h_{\mathcal{L}}}$, and Gaussian mixtures with bounded means lie in $\tilde{\mathcal{P}}_{N} \subseteq \tilde{\mathcal{P}}_{0, 1, h_{\mathcal{N}}}$, for suitable choices of the reference functions $h_{\mathcal{L}}$ and $h_{\mathcal{N}}$ as defined in Example 3.1. These examples demonstrate that the proposed model captures a broad and practically relevant class of distributions.

In summary, while other distribution classes are conceivable, the choice of $\tilde{\mathcal{P}}_{c_1, c_2, h}$ provides a balanced trade-off between expressiveness and analytical feasibility, and encompasses many distribution families relevant to real-world applications.

Comment

Thank you for your answer, I have no more questions and will keep my score.

Review
Rating: 4

This paper investigates the problem of distribution sampling under local differential privacy (LDP). The authors begin by examining the global setting, where they extend existing results to both approximate LDP and functional LDP. They then shift focus to the local setting, in which the distribution class is restricted to a neighborhood around a fixed distribution $P_0$. Building on their global sampling approach, they develop a sampler that achieves the optimal minimax rate in the local setting.

Strengths and Weaknesses

Strengths:

The paper addresses an important and timely problem in distribution sampling under local differential privacy, and it proposes a solution that achieves the optimal minimax rate. The extension from the global to the local setting is meaningful and adds significant value to the literature.

Weaknesses and Questions:

  1. The presentation could be improved, particularly regarding the purpose and organization of Section 3. Although the paper primarily focuses on local sampling, Section 3 is entirely devoted to global sampling. While I understand that parts of Section 3 are necessary for later sections, such as the extension to the functional LDP case and the introduction of $Q^*_g$, it would be clearer to restructure this material into a dedicated subsection.
  2. The proposed sampler in Theorem 4.1 is not computationally efficient for general $\widetilde{\mathcal{P}}$. In particular, the projection step to compute $\hat{P}$ can be computationally demanding. By the way, what is the meaning of "minimizes $\inf_{P'\in N(P_0)} D_f(P\|P')$"? For any fixed $P$ and $P_0$, this infimum is a constant value.
  3. The relationship between global and local sampling is not fully clarified. Intuitively, one would assume that since the local sampler is restricted to a smaller distribution set, it will achieve a better minimax rate. However, this improvement is not quantified in the paper. If the general case is infeasible to quantify, is it possible to at least give concrete examples demonstrating the rate gap between global and local sampling?

There are some other minor comments:

  1. What is the definition of $g^*$ in Equation (4)?
  2. The definition of $\widetilde{\mathcal{P}}$ is unclear in Section 4: are we still considering the class defined in Equation (2), or is a general distribution class considered?

Questions

The questions are included in the "Strengths And Weaknesses" section.

Limitations

Yes

Formatting Issues

N/A

Author Response

We thank the reviewer for their thoughtful feedback and for pointing out areas needing clarification, such as the definitions of $g^*$ and $\tilde{\mathcal{P}}$. We are also glad they found the extension from global to local minimax meaningful and acknowledged our contribution to private distribution sampling. Below, we address each of the reviewer's questions and comments individually.

 

1. Presentation and Organization of Section 3:

We appreciate the reviewer’s suggestion regarding the presentation and structure of Section 3. While the main focus of the paper is on local sampling, we included Section 3 to provide the necessary theoretical groundwork for the subsequent sections, including the extension to functional LDP, the formulation of key assumptions, and the definition of global samplers used in the analysis. Specifically, we study global minimax samplers under the general framework of functional LDP and derive compact characterizations of the optimal samplers. As noted at the beginning of Section 3 (lines 170–171), these global samplers serve as proxies for constructing local minimax-optimal samplers in the following sections.

We aimed to preserve a natural progression in the paper’s structure, moving from functional LDP global minimax in Section 3, to functional LDP local minimax in Section 4, and then to pure LDP local minimax in Section 5. This order reflects the way the results were derived, with each step building on the previous one. That said, we agree that the purpose of Section 3 could be emphasized more clearly. In the revision, we will improve the framing of this section and better articulate its connection to the rest of the paper. We will also consider reorganizing key parts into a dedicated subsection to enhance clarity and flow.

 

2. Computational Efficiency of the Proposed Sampler and Clarification of the Projection Step in Theorem 4.1:

For general $f$-divergences and sample spaces $\mathcal{X}$, this projection step may require solving a constrained optimization problem, and its complexity is on the same order as that of the clipping-based projection described in Theorem 5.1. However, as shown by Husain et al. [7], in certain special cases, such as when the $f$-divergence is the KL divergence and the sample space is finite, the projection problem admits a closed-form solution using the Karush-Kuhn-Tucker (KKT) conditions. In these settings, the projection step can be carried out efficiently without incurring additional computational overhead.

As the reviewer points out, the projection can be computationally demanding, particularly in high-dimensional settings, even though a closed-form expression for it is available. This is because the expression involves a clipping step that depends on a normalization constant (similar to $r_P$ in the statement of Theorem 5.1). Computing this constant requires performing numerical integration over a high-dimensional space, which can be costly and often becomes the main computational bottleneck.

In the following, we include a discussion on the computational complexity of the samplers proposed in our work. Overall, the samplers derived in this work fall into two categories: (1) linear samplers, such as the core sampler in Theorem 4.1, and (2) nonlinear samplers involving clipping functions, as in Theorem 5.1.

The linear sampler $\mathbf{Q}^\star_{c_1,c_2,h,g}$, defined in Theorem 3.4 (Equation (4)), has density $q_g^{\star}(P)(x) := \lambda_{c_1, c_2, g}^{\star}\, p(x) + \left(1 - \lambda_{c_1, c_2, g}^{\star}\right) h(x)$, a convex combination of the input $p$ and reference $h$. Its complexity lies in computing the mixing coefficient $\lambda_{c_1,c_2,g}^{\star} = \inf_{\beta\geq0}\frac{e^{\beta}+\frac{c_2-c_1}{1-c_1}\left(1+g^*(-e^{\beta})\right)-1}{(1-c_1)e^{\beta}+c_2-1}$.

In many cases—e.g., pure, approximate, and Gaussian LDP—the conjugate $g^*$ is known in closed form, allowing this optimization to be solved analytically or efficiently numerically. Thus, the linear sampler is computationally tractable.
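As an illustrative sketch (not the paper's code), the one-dimensional optimization defining $\lambda^\star_{c_1,c_2,g}$ can be evaluated numerically once the conjugate $g^*$ is supplied; the function name and the bounded search range below are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mixing_coefficient(c1, c2, g_conj, beta_max=50.0):
    """Numerically evaluate
        lambda* = inf_{beta>=0} [ e^b + ((c2-c1)/(1-c1)) (1 + g*(-e^b)) - 1 ]
                                / [ (1-c1) e^b + c2 - 1 ],
    given the convex conjugate g_conj(y) = g*(y)."""
    k = (c2 - c1) / (1.0 - c1)

    def objective(beta):
        eb = np.exp(beta)
        return (eb + k * (1.0 + g_conj(-eb)) - 1.0) / ((1.0 - c1) * eb + c2 - 1.0)

    res = minimize_scalar(objective, bounds=(0.0, beta_max), method="bounded")
    return res.fun

# Example call (hypothetical): the closed-form conjugate for pure, approximate,
# or Gaussian LDP given in the paper would be passed as `g_conj`, e.g.
#   lam = mixing_coefficient(c1=0.0, c2=10.0, g_conj=pure_ldp_conjugate)
```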

In contrast, the nonlinear sampler in Theorem 5.1 is defined by the mapping $\mathbf{Q}^\star_\varepsilon(P)$, whose density is

$$q(x) = \operatorname{clip}\left( \tfrac{1}{r_P}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x) \right),$$

where the constant $r_P>0$ is selected to ensure the normalization condition $\int_{\mathbb{R}^n} q(x)\,dx = 1$. The main computational burden lies in determining this normalizing constant $r_P$. To compute it, we define the parametric family $q_r(x) := \operatorname{clip}\left( \tfrac{1}{r}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x)\right)$ and search for a value $r > 0$ such that $\int_{\mathbb{R}^n} q_r(x)\,dx = 1$. It can be shown that the endpoints $r_1 := \tfrac{\gamma+e^{\varepsilon}}{\gamma(\gamma+1)}$ and $r_2 := \tfrac{\gamma(\gamma+e^{\varepsilon})}{e^{\varepsilon}(\gamma+1)}$ satisfy $\int q_{r_1}\ge 1$ and $\int q_{r_2} \le 1$, and that $r \mapsto \int q_r(x)\,dx$ is continuous. Therefore, $r_P$ can be found using the bisection method on the interval $(r_1,r_2)$.

While the bisection method requires only a logarithmic number of steps, the main computational cost lies in evaluating the integral $I(r) = \int_{\mathbb{R}^n} q_r(x)\,dx$ at each step. This is tractable in low-dimensional settings but quickly becomes prohibitive as dimensionality increases.

Regarding the reviewer's question about the expression "$\hat{P}$ minimizes $\inf_{P' \in N(P_0)} D_f(P \| P')$," we intended to convey that we find the distribution $\hat{P}$ that minimizes $D_f(P \| P')$ over all $P' \in N(P_0)$. The reviewer is correct, and we will revise the text accordingly for clarity.

 

3. Relationship Between Global and Local Sampling and the Associated Rate Gap:

Under an $\varepsilon$-LDP constraint, consider the global model class
$$\tilde{\mathcal{P}}_{c_1,c_2,\mu}:=\Big\{P\in\mathcal{P}(\mathcal{X}): P \ll \mu,\ c_1\leq\tfrac{dP}{d\mu}\leq c_2 \quad \mu\text{-a.e.}\Big\}.$$
The global minimax risk for this class is given in Equation (6), line 227:
$$\mathcal{R}\bigl(Q_{\mathcal{X},\tilde{\mathcal{P}},g_{\varepsilon}}, \tilde{\mathcal{P}},f\bigr)= \frac{1-r_1}{r_2 - r_1}\,f(r_2)+\frac{r_2 - 1}{r_2 - r_1}\,f(r_1),$$
where $r_1 = c_1 \cdot \frac{(1 - c_1)e^{\varepsilon} + c_2 - 1}{c_2 - c_1}$ and $r_2 = \frac{c_2}{c_2 - c_1} \cdot \frac{(1 - c_1)e^{\varepsilon} + c_2 - 1}{e^{\varepsilon}}$. The local minimax risk is defined over the neighbourhood $N_{\gamma}(P_0) :=\{ P \in \mathcal{P}(\mathcal{X}): E_{\gamma}(P \| P_0) = E_{\gamma}(P_0 \| P) = 0 \}$, which satisfies $N_{\gamma}(P_0) \subseteq \tilde{\mathcal{P}}_{c_1,c_2,\mu}$. As defined in line 1249 of the technical appendix, the corresponding local minimax risk is
$$\mathcal{R}\bigl( Q_{\mathcal{X},N_{\gamma}(P_0),\varepsilon}, N_{\gamma}(P_0),f \bigr) = \frac{1 - r_1}{r_2 - r_1}f(r_2) + \frac{r_2 - 1}{r_2 - r_1}f(r_1),$$
where $r_1 = \frac{e^{\varepsilon} + \gamma}{\gamma(\gamma + 1)}$ and $r_2 = \frac{\gamma(e^{\varepsilon} + \gamma)}{e^{\varepsilon}(\gamma + 1)}$.

While the difference between the local and global risks can be computed in closed form via direct subtraction, the resulting expression is not particularly insightful, as it depends on several problem-specific parameters, including (c1,c2,γ)(c_1, c_2, \gamma), the sample space, and the privacy level ε\varepsilon. Although the subtraction can be evaluated for any given instance, it does not, in general, yield a simple or interpretable expression.

To better illustrate the improvement, we provide numerical comparisons in both discrete and continuous settings. In the discrete case (Figures 3, 5, 6), we compare the global and local minimax-optimal samplers for $\mathcal{X} = [k]$, where the global class is $\tilde{\mathcal{P}}_{\text{global}} = \mathcal{P}([k]) = \tilde{\mathcal{P}}_{0,k,\mu_k}$, with $\mu_k$ being the uniform distribution. The local neighbourhood is defined as $\tilde{\mathcal{P}}_{\text{local}} = \mathcal{N}_\gamma(\mu_k) = \tilde{\mathcal{P}}_{\frac{1}{\gamma}, \gamma, \mu_k}$, with $\gamma = \frac{k}{2} - 1$, ensuring that $\mathcal{N}_\gamma(\mu_k) \subseteq \mathcal{P}([k])$.

We instantiate both the global and local neighbourhoods using the parameters defined in Section 6.1, and compute the exact minimax risks in each case. Figures 3, 5, and 6 present these theoretical worst-case risks, clearly demonstrating the benefit of tailoring the mechanism to the local neighbourhood. These plots consistently illustrate a significantly improved minimax rate in the local setting compared to the global one, across various ff-divergences and parameter ranges.
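For readers who wish to reproduce this comparison, the following sketch (our own illustration; the total-variation choice of $f$ and the parameter values are assumptions) evaluates the two closed-form risks above in the discrete setting.

```python
import numpy as np

def minimax_risk(r1, r2, f):
    """R = (1 - r1)/(r2 - r1) * f(r2) + (r2 - 1)/(r2 - r1) * f(r1)."""
    return (1 - r1) / (r2 - r1) * f(r2) + (r2 - 1) / (r2 - r1) * f(r1)

def global_risk(c1, c2, eps, f):
    # r1, r2 from the global minimax risk formula quoted above.
    s = (1 - c1) * np.exp(eps) + c2 - 1
    r1 = c1 * s / (c2 - c1)
    r2 = (c2 / (c2 - c1)) * s / np.exp(eps)
    return minimax_risk(r1, r2, f)

def local_risk(gamma, eps, f):
    # r1, r2 from the local minimax risk formula quoted above.
    r1 = (np.exp(eps) + gamma) / (gamma * (gamma + 1))
    r2 = gamma * (np.exp(eps) + gamma) / (np.exp(eps) * (gamma + 1))
    return minimax_risk(r1, r2, f)

tv = lambda t: 0.5 * abs(t - 1)   # f corresponding to total-variation distance
k, eps = 10, 1.0                  # illustrative parameter choices
gamma = k / 2 - 1                 # neighbourhood radius used in the discrete comparison
print("global:", global_risk(c1=0.0, c2=k, eps=eps, f=tv))  # ~0.77
print("local: ", local_risk(gamma=gamma, eps=eps, f=tv))    # ~0.40
```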

 

4. Definition of $\boldsymbol{g^*}$ in Equation (4):

We define $g^{*}$ to be the convex conjugate of the function $g$, as introduced in the notation section. Formally, it is given by $g^{*}(y) = \sup_{x \in \operatorname{dom}(g)} \bigl( xy - g(x) \bigr)$. To improve clarity and completeness, we will include this formal definition in the notation section.

 

5. Definition of $\boldsymbol{\tilde{\mathcal{P}}}$ in Section 4:

As for the definition of $\tilde{\mathcal{P}}$ in Section 4, we clarify that $\tilde{\mathcal{P}}$ refers to a general set of possible distributions relevant to the problem under consideration. In particular, in Section 4, $\tilde{\mathcal{P}}$ may represent any general class of continuous distributions, subject only to the constraint $N_\gamma(P_0) \subseteq \tilde{\mathcal{P}}$, which is assumed in the statement of Theorem 4.1. We will include a note at the beginning of the section to make this generality of $\tilde{\mathcal{P}}$ explicit.

Review
Rating: 4

This paper studies the problem of generating a sample from a distribution under the constraints of local differential privacy. This work extends previous work in two directions: 1) it studies sampling algorithms satisfying functional local differential privacy with optimal global minimax risk, and 2) it studies local minimax risks of private samplers in the presence of public data. The proposed algorithms are validated experimentally.

Strengths and Weaknesses

Strengths

  • The paper presents two methods to identify private samplers that are locally minimax optimal when a public distribution is available.

  • The paper characterizes samplers that are globally minimax optimal and that satisfy functional LDP.

Weaknesses

  • The focus of the paper on the local model makes the sampling problem less about algorithms and more about finding the right parameters.

  • The paper doesn't give much intuition about the derived samplers.

Several recent works have studied sampling from distributions under the constraints of differential privacy in different models. This paper considers this problem in the local model of differential privacy, where every user holds a distribution and aims at releasing a sample from it. More specifically, the paper focuses on minimax notions of utility of the samplers with respect to f-divergences. In particular, the utility is defined at the user level rather than at the protocol level, as in works that have studied statistical tasks in the local model. While certainly meaningful, this focus makes the paper more about figuring out the right setup and parameters of the samplers rather than coming up with novel algorithms.
The paper does this well, focusing on problems that were not studied in previous works. The global minimax results extend the set of results available to functional LDP and to different discrete and continuous samplers. Similarly to what I said above, functional DP was proposed in the context of the central model. While it can certainly also be used for the local model, its properties and semantics are different and make some problems more interesting and some less so. The results on local minimax extend previous work, which studied samplers with public data in the local model, by providing optimal local minimax samplers. It is worth noting that the paper presents two frameworks to do this. The experiments confirm the theoretical results. Overall, I found this paper contributing good results to the theory of differentially private local samplers, although I didn't find them particularly exciting in terms of new ideas required.

Questions

Can you please provide a comparison between how you use public data and how previous work [22] used them?

In the limitations you mention the computational complexity of your samplers; can you give more details on it?

Limitations

yes

Formatting Issues

none

Author Response

We thank the reviewer for their thoughtful and detailed review. We are glad they appreciated our contributions, including the extension to functional LDP and the characterization of optimal local minimax samplers. Below, we address each of the reviewer’s questions and concerns individually.

 

The focus of the paper on the local model makes the sampling problem less about algorithms and more about finding the right parameters.

While global minimax private sampling under pure LDP has been studied previously (e.g., Park et al. [17]), our work extends this by developing both global and local minimax results under the more general functional LDP framework, which encompasses pure, approximate, and Gaussian LDP. This required introducing new assumptions, proof techniques, and a carefully defined local neighborhood that fundamentally differs from the global setting. These developments allow us to bridge global and local minimax frameworks and derive new linear and nonlinear samplers that are optimal in the local minimax sense under functional LDP. Importantly, we propose new algorithms tailored to this setting—not merely new parameters for existing ones. Our framework also lays a foundation for future work, including broader neighborhood definitions and scalable implementations, as outlined in the limitations.

 

The paper doesn't give much intuition about the derived samplers.

We thank the reviewer for their constructive feedback. While we made an effort to convey the intuition behind the derived samplers—through both textual explanations and visual illustrations—we agree that additional elaboration could help improve clarity. Below, we provide a complementary interpretation of the two main classes of samplers discussed in the paper: linear and nonlinear samplers.

For the linear samplers, such as the global sampler $\mathbf{Q}^\star_{c_1,c_2,h,g}$ defined by $q^\star_g(P)(x):=\lambda^\star_{c_1,c_2,g}\,p(x)+\left(1-\lambda^\star_{c_1,c_2,g}\right)h(x)$, as presented in lines 204–206 of the manuscript, a natural interpretation is that the mechanism samples from the input distribution $P$ with probability $\lambda^\star_{c_1,c_2,g}$, and from the reference distribution $h$ otherwise. This probabilistic mixture reflects the privacy–utility trade-off: a larger $\lambda^\star_{c_1,c_2,g}$ yields outputs closer to $P$, while sampling from $h$ introduces the randomness required for privacy. This same intuition applies to all linear samplers derived in the paper.
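This mixture reading translates directly into a two-step sampling routine; the sketch below is our own illustrative rendering (the Gaussian choices for $P$ and $h$ and the value of $\lambda$ are placeholders), not the paper's implementation.

```python
import numpy as np

def linear_sampler(sample_p, sample_h, lam, rng=None):
    """Draw one sample from q = lam * p + (1 - lam) * h:
    with probability lam, output a draw from the private distribution P;
    otherwise, output a draw from the reference distribution h."""
    rng = rng or np.random.default_rng()
    if rng.random() < lam:
        return sample_p(rng)
    return sample_h(rng)

# Illustrative use: Gaussian private data, standard-normal reference h.
x = linear_sampler(sample_p=lambda rng: rng.normal(0.5, 1.0),
                   sample_h=lambda rng: rng.normal(0.0, 1.0),
                   lam=0.4)   # in the paper this would be lambda*_{c1,c2,g}
```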

For the nonlinear sampler presented in Theorem 5.1, the optimal sampling mechanism is given by
$$q(x)=\operatorname{clip}\left(\tfrac{1}{r_P}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x)\right).$$

As described in lines 287–288 of the manuscript, this construction defines a nonlinear sampler by projecting the input distribution $P$ onto a family of distributions referred to as the mollifier $M_\varepsilon$, as defined in the paper (see Figure 2). The clipping operation ensures that the output distribution $\mathbf{Q}^\star_\varepsilon(P)$ satisfies the LDP constraint while maintaining a close match to the original distribution $P$ in terms of $f$-divergence. In other words, $\mathbf{Q}^\star_\varepsilon(P)$ belongs to the mollifier and is instance-optimal for the given input distribution. This approach yields a nonlinear transformation of $P$ that carefully balances privacy constraints and utility preservation.

Figure 2 further illustrates the sampling process under the local minimax framework. It involves two steps: (1) if $P$ lies outside the local neighborhood of the public distribution $P_0$, it is projected inward; and (2) optimal sampling is performed using either a linear or nonlinear sampler, depending on the context.

We will incorporate these clarifications and interpretations in the final version to better communicate the intuition behind the derived samplers.

 

Can you please provide a comparison between how you use public data and how previous work [22] used them?

Zamanlooy et al. [22] use public data as an explicit optimization constraint in their global minimax formulation, which is defined over discrete domains and restricted to linear samplers. Note that in this discrete domain, the sampling distribution can be expressed as $PK$, where $P$ is the data distribution (viewed as a vector) and $K$ is an LDP mechanism (viewed as a matrix that satisfies the LDP constraint). Zamanlooy et al. [22] considered $\varepsilon$-LDP mechanisms $K$ that preserve the public prior $P_0$, enforcing $P_0K = P_0$. This guarantees that the public data remain invariant under the mechanism while privacy is provided for private data. They then formulated their minimax optimization as
$$\inf_{\substack{K \in\mathcal{Q}_\varepsilon\\ P_0K=P_0}}\ \sup_{P \in\mathcal{P}(\mathcal{X})}D_f(P\,\|\,PK).$$

In contrast, we use public data in a conceptually different way. Instead of treating it as a hard constraint, we use the public distribution $P_0$ to define a local neighborhood $N_\gamma(P_0)$. The worst-case input distributions are then considered within this neighborhood. This leads to a local minimax formulation of the private sampling problem over a general sample space, without restricting to linear mechanisms. Specifically, we search over mechanisms $\mathbf{Q}\in\mathcal{Q}_\varepsilon$, where $\mathcal{Q}_\varepsilon$ again denotes the class of $\varepsilon$-LDP samplers:

$$\inf_{\mathbf{Q}\in\mathcal{Q}_\varepsilon}\ \sup_{P \in N_\gamma(P_0)}D_f\big(P\,\|\,\mathbf{Q}(P)\big).$$

Note that the formulation of [22] is a special form of our setting: $\mathbf{Q}(P)=PK$, where $K$ is an $\varepsilon$-LDP sampler. Our goal is to move beyond the overly pessimistic global worst-case risk and instead focus on distributions that are close to $P_0$, which is assumed to be publicly available to all users. This enables a more refined and robust characterization of optimal samplers from a local minimax perspective. It is important to note that in the formulation of Zamanlooy et al., having public data does NOT necessarily lead to more accurate sampling (smaller minimax risk), whereas this is the case in our formulation.

That said, as shown in Theorems 4.1 and 5.1, the local minimax-optimal samplers $Q^\star_{g, N_\gamma(P_0)}$ and $Q^\star_{\varepsilon,N_\gamma(P_0)}$ both satisfy $Q^\star_{g, N_\gamma(P_0)}(P_0)=P_0$ and $Q^\star_{\varepsilon, N_\gamma(P_0)}(P_0)=P_0$, even though the preservation of $P_0$ is not explicitly enforced as a constraint. This demonstrates that our formulation inherently respects the public data when the input distribution coincides with it, without requiring this condition to be imposed as part of the optimization, meaning that our formulation is a non-trivial generalization of Zamanlooy et al. [22] to general data distributions (not necessarily discrete, as in [22]).

 

In the limitations you mention the computational complexity of your samplers; can you give more details on it?

We thank the reviewer for raising this important point regarding computational complexity. Overall, the samplers derived in this work fall into two categories: (1) linear samplers, such as the core sampler in Theorem 4.1, and (2) nonlinear samplers involving clipping functions, as in Theorem 5.1.

The linear sampler $\mathbf{Q}^\star_{c_1,c_2,h,g}$, defined in Theorem 3.4 (Equation (4)), has density $q^\star_g(P)(x):=\lambda_{c_1,c_2,g}^{\star}\,p(x)+\left(1-\lambda_{c_1,c_2,g}^{\star} \right) h(x)$, a convex combination of the input $p$ and reference $h$. Its complexity lies in computing the mixing coefficient $\lambda_{c_1,c_2,g}^{\star}$, which is given by $\lambda_{c_1,c_2,g}^{\star} = \inf_{\beta\geq0}\frac{e^{\beta}+\frac{c_2-c_1}{1-c_1}\left(1+g^*(-e^{\beta})\right)-1}{(1-c_1)e^{\beta}+c_2-1}$.

In many cases—e.g., pure, approximate, and Gaussian LDP—the conjugate $g^*$ is known in closed form, allowing this optimization to be solved analytically or efficiently numerically. Thus, the linear sampler is computationally tractable.

In contrast, the nonlinear sampler in Theorem 5.1 is defined by the mapping $\mathbf{Q}^\star_\varepsilon(P)$, whose density is $q(x)=\operatorname{clip}\left(\tfrac{1}{r_P}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x)\right)$, where the constant $r_P>0$ is selected so that $\int_{\mathbb{R}^n}q(x)\,dx=1$. The main computational burden lies in determining this normalizing constant $r_P$. To compute it, we define the parametric family $q_r(x):=\operatorname{clip}\left( \tfrac{1}{r}p(x);\ \tfrac{\gamma+1}{\gamma+e^{\varepsilon}}p_0(x),\ \tfrac{(\gamma+1)e^{\varepsilon}}{\gamma+e^{\varepsilon}}p_0(x) \right)$, and search for a value $r>0$ such that $\int_{\mathbb{R}^n}q_r(x)\,dx=1$. It can be shown that the endpoints $r_1:=\tfrac{\gamma+e^{\varepsilon}}{\gamma(\gamma+1)}$ and $r_2:=\tfrac{\gamma(\gamma+e^{\varepsilon})}{e^{\varepsilon}(\gamma+1)}$ satisfy $\int q_{r_1}\ge1$ and $\int q_{r_2} \le 1$, and that $r \mapsto \int q_r(x)\,dx$ is continuous. Therefore, $r_P$ can be found using the bisection method on the interval $(r_1,r_2)$.

While the bisection method requires only a logarithmic number of steps, the main computational cost lies in evaluating the integral $I(r)=\int_{\mathbb{R}^n}q_r(x)\,dx$ at each step. This is tractable in low-dimensional settings but quickly becomes prohibitive as dimensionality increases.

A promising direction to mitigate this is to bypass explicit integration by designing approximate samplers $\mathbf{Q}$ that minimize the divergence $D_f(\mathbf{Q}\|\mathbf{Q}^\star_\varepsilon(P))$, where $\mathbf{Q}^\star_\varepsilon$ is the optimal $\varepsilon$-LDP sampler. This offers a rich path for future work, especially in scaling optimal mechanisms to high-dimensional domains.

Comment

Thanks for your reply. This helps clarify several aspects of the paper. I recommend including in the paper the more detailed discussion about the use of public data, as well as the discussion about the complexity challenge.

Comment

We appreciate the reviewer’s constructive feedback and are glad our rebuttal helped clarify their concerns. For completeness and clarity, we will incorporate the discussions on the use of public data and the computational complexity challenges into the final version. Thank you again for the helpful suggestions.

Final Decision

This paper studies the problem of generating a sample from a distribution given to a user under the constraints of local differential privacy. The problem was studied/introduced in two previous works. This work extends previous work in two directions: (a) it goes beyond pure DP with optimal global minimax risk, and (b) it studies local minimax risks of private samplers, modeled via knowledge of a "public" distribution $P_0$. The errors of the proposed algorithms are validated numerically. This is nice technical and conceptual progress on a relatively natural problem. The main weakness is that there are no compelling motivating problems that would allow one to evaluate the significance of any of the improvements. Still, given the importance of sampling problems in modern ML, the insights given in this work may prove useful.