Locally Optimal Private Sampling: Beyond the Global Minimax
Abstract
Reviews and Discussion
This paper studies how to generate private samples from a user’s data under local differential privacy (LDP). In particular, the authors show that the locally minimax-optimal private sampler can be derived from the global solution restricted to a local neighborhood. They provide explicit formulas for these samplers and prove they are optimal for a broad class of privacy definitions. Experiments on both discrete and continuous data show that the proposed local samplers outperform existing global methods in terms of accuracy across various privacy levels.
Strengths and Weaknesses
Strengths:
The paper is well-organized and logically structured, with a clear progression from global to local minimax analysis. This work makes a contribution to the field of private sampling by introducing a local minimax-optimal framework that extends beyond traditional worst-case analysis.
Weaknesses:
- The condition in Assumption 3.2 is somewhat restrictive and lacks clear justification, potentially limiting the generality of the results. What is the practical impact of requiring it? Can you demonstrate that this condition approximately holds for common distributions or suggest a relaxation?
- The intuition behind the superiority of the non-linear sampler, currently deferred to the appendix, would be more impactful if summarized earlier in the main manuscript to guide the reader’s understanding.
- The practical utility of the proposed method remains unclear due to the lack of experiments on real-world datasets. Including a discussion on potential applications, such as private estimation or generative modeling, would help contextualize the broader impact.
- While the neighborhood-based formulation builds on prior work, the connection to existing literature in local differential privacy estimation is not sufficiently emphasized.
- In high-dimensional settings, does the clipping technique outlined in Theorem 5.1 become computationally prohibitive? It would be helpful to discuss the associated complexity and whether efficient approximations or relaxations are feasible.
- The relationship between the proposed local minimax framework and instance-optimal differential privacy (e.g., Feldman et al. [44]) is not explicitly addressed. Clarifying this connection could provide further insight, particularly regarding implications for private SGD or generative modeling.
Questions
Please see the Weaknesses.
Limitations
Please see the Weaknesses.
Final Justification
Most of my issues have been addressed.
Formatting Issues
N/A
We thank the reviewer for their constructive feedback. We are glad they found the paper well-organized and appreciated our contribution to private sampling via a local minimax-optimal framework that extends beyond worst-case analysis. Below, we address each of the reviewer’s comments in detail.
1. The Condition in Assumption 3.2:
We define the model class as the set of distributions of interest over a sample space $\mathcal{X}$. To enable a unified theoretical treatment of both discrete and continuous settings, we restrict it to the class $\tilde{\mathcal{P}}_{c_1,c_2,\mu}$ for some base measure $\mu$. As shown in Park et al. [17] and mentioned in the manuscript, such a restriction is necessary to avoid a trivial (vacuous) minimax result. For this general setting, we assume the technical condition of Assumption 3.2 to facilitate the construction of our proofs via a decomposability argument.
In the discrete case, where the domain is a finite alphabet, we take the reference to be the uniform distribution over the domain. In this setting, the quantities entering the condition can be computed in closed form, and the above assumption trivially holds.
In the continuous case, following the approach of Park et al. [17], we work with the class $\tilde{\mathcal{P}}_{c_1,c_2,\mu}$ with the base measure $\mu$ taken to be Lebesgue. Here, the class acts as a tractable superset of the true distribution class. If the condition fails for the original parameters, one can slightly enlarge the class (by increasing $c_2$ or decreasing $c_1$) to satisfy it. This adjustment results in a slightly enlarged superset, which is still consistent with the modeling assumption that the true distribution lies within a known, manageable class.
For example, certain canonical parameter choices always ensure the condition holds. Moreover, in some natural families of distributions—such as mixtures of Gaussians with shared variance (see Example 3.1)—the assumption is satisfied without needing any relaxation.
In summary, the condition is not restrictive: it is either automatically satisfied in discrete settings, or can be enforced without loss of generality in continuous ones, purely to streamline the technical development of the theory.
2. Intuition Behind the Outperformance of the Non-Linear Sampler:
We thank the reviewer for the insightful suggestion. The non-linear sampler outperforms the linear one because it is not only worst-case optimal but also instance-optimal: for each input distribution $P$ and any $f$-divergence, it minimizes the divergence between its output and $P$ over all admissible output distributions satisfying $\varepsilon$-LDP (see Proposition D.2, line 1273). This instance-optimality arises because the non-linear sampler is constructed by solving the optimization problem for each individual input, producing a privatized distribution that minimizes the divergence from $P$ under the given LDP constraints. In contrast, the linear sampler applies the same transformation across all inputs. To provide clearer context for the reader, we will incorporate this intuition before presenting the comparison between the linear and non-linear samplers in the main manuscript.
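In symbols, the per-instance optimization described above can be sketched as follows, where $\mathcal{M}_{\varepsilon}$ denotes the set of output distributions realizable under the $\varepsilon$-LDP constraint (the notation here is ours and schematic, not the paper's exact statement):
$$Q^{\star}(P) \;=\; \operatorname*{arg\,min}_{Q \in \mathcal{M}_{\varepsilon}} \; D_f\bigl(Q \,\|\, P\bigr) \qquad \text{for each input } P,$$
so the output adapts to every instance $P$, whereas the linear sampler applies one fixed mixing transformation to all inputs.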
3. Practical Utility of the Proposed Samplers and Discussion of Potential Applications:
We thank the reviewer for the suggestion. We agree that including concrete downstream applications helps underscore the potential impact of our framework.
Our framework is particularly relevant in settings where private samples must be generated in the presence of public data. For example, recent work on differentially private next-token prediction (e.g., Flemings et al. [28]) forms sampling distributions by linearly combining private and public distributions (e.g., from a public LLM), closely resembling our linear samplers. Our approach offers a principled, theoretically grounded alternative with formal optimality guarantees.
As discussed in the introduction, our work is also motivated by the growing relevance of personalized collaborative learning [18–20], where individual models are trained collaboratively and public data may reflect privatized information shared by other users. Since real-world distributions are rarely worst-case, our local minimax formulation provides a more realistic and effective framework for capturing achievable performance in practice. This makes our approach applicable to tasks such as private model fine-tuning, federated estimation, and privacy-preserving generative modeling.
We acknowledge that our experiments are limited to synthetic data due to the computational overhead of the clipping step in high-dimensional settings, as noted in the limitations section (lines 328–333). A promising direction for future work is to develop a scalable approximation of the optimal sampler that does not require clipping. Such an approximation would enable private sampling on high-dimensional real datasets.
4. Connection to Existing Literature in LDP Estimation:
We thank the reviewer for pointing this out. Several prior works have explored local minimax formulations under DP, primarily in estimation settings. For example, Chen et al. [45] study discrete distribution estimation under local DP using neighborhoods around a reference distribution. McMillan et al. [47] and Duchi et al. [43] develop locally minimax estimators for 1D parameters in the central and local models, respectively. Asi et al. [48] extend this to function estimation, and Feldman et al. [44] study nonparametric density estimation with both local and instance-optimal guarantees.
Thus, local minimax formulations using neighborhood structures have been well-studied in DP estimation and distribution learning, with definitions tailored to specific inference goals. As outlined in the related work section, while our focus differs in that we study private sampling rather than estimation, our formulation adopts a similar neighborhood-based perspective, adapted to the sampling setting. We will revise the related work section to clarify this connection and include relevant context.
5. Computational Complexity of Clipping in High Dimensions:
The optimal sampler in Theorem 5.1 is defined by a clipping-based mapping whose density is given in the theorem statement, where the normalizing constant $C$ is chosen so that the density integrates to one. Therefore, the computational burden of the sampler is dominated by computing $C$. To that end, consider the one-parameter family of clipped densities $\{q_C\}$ and search for the value $C^{\star}$ at which the total mass $I(C) := \int q_C \, d\mu$ equals one. It can be shown that suitable endpoints $C_{\min}$ and $C_{\max}$ bracket the solution, in the sense that $I(C_{\min}) - 1$ and $I(C_{\max}) - 1$ have opposite signs, while the map $C \mapsto I(C)$ is continuous. Therefore, the desired normalizing constant can be computed via the bisection method with initial endpoints $C_{\min}$ and $C_{\max}$.
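To make this procedure concrete, here is a minimal numerical sketch of the bisection step in one dimension. The clipped form of $q_C$ and the threshold functions `lo` and `hi` below are illustrative stand-ins for the exact expressions in Theorem 5.1, and the example density is hypothetical:

```python
import numpy as np
from scipy.integrate import quad

def clipped_mass(C, p, lo, hi, a, b):
    """I(C): total mass of the clipped density q_C(x) = clip(p(x), C*lo(x), C*hi(x))
    on [a, b]; the true clipping thresholds are those of Theorem 5.1."""
    integrand = lambda x: min(max(p(x), C * lo(x)), C * hi(x))
    mass, _ = quad(integrand, a, b)
    return mass

def find_normalizer(p, lo, hi, a, b, C_min, C_max, tol=1e-8):
    """Bisection for the C with I(C) = 1, assuming I is continuous in C and
    I(C_min) - 1, I(C_max) - 1 have opposite signs (as argued above)."""
    g_min = clipped_mass(C_min, p, lo, hi, a, b) - 1.0
    while C_max - C_min > tol:
        C_mid = 0.5 * (C_min + C_max)
        g_mid = clipped_mass(C_mid, p, lo, hi, a, b) - 1.0
        if g_min * g_mid <= 0:   # sign change in the lower half: root is there
            C_max = C_mid
        else:                    # otherwise the root lies in the upper half
            C_min, g_min = C_mid, g_mid
    return 0.5 * (C_min + C_max)

# Illustrative usage on [0, 1] with a hypothetical density and thresholds.
eps = 1.0
p = lambda x: np.exp(-8 * (x - 0.3) ** 2) / 0.553  # roughly normalized on [0, 1]
lo = lambda x: np.exp(-eps / 2)                    # hypothetical lower threshold
hi = lambda x: np.exp(eps / 2)                     # hypothetical upper threshold
C_star = find_normalizer(p, lo, hi, 0.0, 1.0, C_min=1e-3, C_max=1e3)
```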
While the bisection procedure itself is efficient, the numerical integration required at each step can become computationally prohibitive in high-dimensional settings. This underscores the need for alternative approaches that avoid explicit integration. One promising direction is to design approximate samplers by minimizing the divergence between the approximate sampler's output and that of the optimal $\varepsilon$-LDP sampler.
We consider these directions to be promising avenues for future work, as they address an important and nontrivial challenge in scaling optimal clipping-based mechanisms to high-dimensional domains.
6. Connection to Instance-Optimal Differential Privacy:
The concept of instance-optimality, formulated by Feldman et al. [44], refers to algorithms that perform nearly as well on a given instance as the best differentially private algorithm designed specifically for that instance and its neighborhood. To avoid trivial solutions, performance is benchmarked not just against the instance itself but over a surrounding neighborhood, ensuring meaningful generalization without requiring the algorithm to be tailored to each instance.
Instance-optimality captures the idea that algorithms should adapt to the "hardness" of the input rather than the worst case—particularly useful when there is a large gap between typical and worst-case performance, as in nonparametric estimation.
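Schematically, and in our notation rather than Feldman et al.'s exact formulation, an algorithm $A$ is instance-optimal if, for every instance $P$,
$$\mathcal{R}(A, P) \;\le\; c \cdot \inf_{A'} \; \sup_{P' \in N(P)} \mathcal{R}(A', P'),$$
where the infimum ranges over all private algorithms, $N(P)$ is the benchmarking neighborhood of the instance, and $c$ is a modest constant.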
In contrast, our local minimax approach evaluates worst-case risk over a neighborhood around a reference distribution, offering robust guarantees across plausible inputs rather than individual adaptation. This is especially relevant when prior knowledge or public data (e.g., a shared reference distribution $P_0$) is available to guide the analysis.
That said, although the two formulations address different goals, they share structural similarities. As noted in the related work section, the neighborhood in our framework aligns with that in Feldman et al.’s instance-optimality. We will add this comparison to the appendix to clarify their conceptual and technical connections.
Thank you so much for clarifying these issues. While I appreciate the efforts and may consider increasing my score, I remain concerned about the practical relevance of the privacy guarantee.
We thank the reviewer for their thoughtful comments and the time they dedicated to our paper. We would like to emphasize that our work is the first to study locally minimax-optimal samplers under local differential privacy (LDP). It follows the line of research on local minimax rates in private estimation and learning problems (e.g., [43–48]), including Duchi et al. [43] on one-dimensional parameter estimation and Chen et al. [45] on discrete distribution estimation, both in the LDP setting. Our objective is to precisely characterize locally optimal samplers in the most theoretically general settings (i.e., with non-restrictive assumptions). This by no means implies that our theoretical guarantees lack practical relevance. On the contrary, such guarantees provide useful benchmarks for what is fundamentally achievable in real-world scenarios such as personalized collaborative learning or fine-tuning foundation models.

As noted in the paper (particularly in Section 7 and further elaborated in our response to Q5), implementing the exact optimal sampler in high-dimensional settings is (admittedly) computationally challenging. This opens up an important avenue for future research: developing approximation tools for clipping-based (non-linear) global minimax-optimal samplers. Such approximations can be leveraged, via the reduction framework introduced in our work, to construct approximate locally minimax-optimal samplers.
Thank you for the clarification.
In this paper, the authors generalize the optimal minimax LDP sampling procedure of Park et al. to the functional-LDP setting. They then consider distributions that are neighbors of a fixed distribution and present two algorithms to perform sampling in the pure LDP setting. Lastly, the authors compare their two algorithms with synthetic experiments.
Strengths and Weaknesses
Strengths
- The paper is well written, the hypotheses are appropriately discussed, and the results statements are clear.
- The generalization of the work of Park et al. to the functional LDP setting is an interesting and new contribution.
- The existence of a minimax-optimal sampler that is point-wise better than the one derived from the generalization of the work of Park et al. is also an interesting contribution.
Weaknesses
- The class of distributions for which the sampler is optimal is restricted (the density satisfies $c_1 h(x) \le p(x) \le c_2 h(x)$). Although some restrictions are necessary to exclude trivial samplers, it is unclear to me why this particular condition arises or whether other choices could have been made.
Questions
Could you discuss the choice of distribution class?
Limitations
yes
Final Justification
I have no more questions. I keep my score.
Formatting Issues
no
We thank the reviewer for their thoughtful review and insightful question. We are glad that they found the paper well written and appreciated our contributions, including the generalization of the global minimax framework and the introduction of instance-optimal samplers. Below, we provide a detailed discussion of the distribution class used in our work.
The class of distributions for which the sampler is optimal is restricted (the density verifies $c_1 h(x) \le p(x) \le c_2 h(x)$). Although some restrictions are necessary to exclude trivial samplers, it is unclear to me why this particular condition arises or whether other choices could have been made.
Could you discuss the choice of distribution class?
We thank the reviewer for raising this important point. The condition $c_1 h(x) \le p(x) \le c_2 h(x)$ used to define the distribution class may seem restrictive at first glance, but it arises naturally from the need to balance generality with analytical tractability in LDP sampling.
As discussed in Park et al. [17], one may consider broader classes, such as (i) the set of all probability measures on the sample space, or (ii) the set of all continuous distributions on it. However, for such general choices, it has been shown that the minimax rate is vacuous; that is, any $\varepsilon$-LDP sampling mechanism attains the largest possible value of the $f$-divergence in the worst case (see Proposition 3.2 in Park et al.). As a result, all mechanisms perform equally poorly under these settings, making meaningful comparison or claims of optimality infeasible.
To obtain nontrivial and practically useful bounds, one must restrict attention to more regular classes of distributions. The class $\tilde{\mathcal{P}}_{c_1,c_2,\mu}$, which consists of distributions whose densities (with respect to a general base measure such as Lebesgue) are bounded between two scalings of a reference function $h$, is one such choice. This class offers two key properties: (i) it includes a wide range of realistic distribution families (so it is rich enough), as we illustrate below, and (ii) it ensures that the worst-case $f$-divergence is non-vacuous.
This modeling choice is not purely theoretical. As shown in Example 3.1 of our manuscript, mixtures of Gaussian or Laplace distributions with bounded location parameters fall within this class. For instance, Laplace mixtures supported within the unit ball and Gaussian mixtures with bounded means belong to the class for suitable choices of the reference functions defined in Example 3.1. These examples demonstrate that the proposed model captures a broad and practically relevant class of distributions.
In summary, while other distribution classes are conceivable, the choice of $\tilde{\mathcal{P}}_{c_1,c_2,\mu}$ provides a balanced trade-off between expressiveness and analytical feasibility, and encompasses many distribution families relevant to real-world applications.
Thank you for your answer, I have no more questions and will keep my score.
This paper investigates the problem of distribution sampling under local differential privacy (LDP). The authors begin by examining the global setting, where they extend existing results to both approximate LDP and functional LDP. They then shift focus to the local setting, in which the distribution class is restricted to a neighborhood around a fixed distribution $P_0$. Building on their global sampling approach, they develop a sampler that achieves the optimal minimax rate in the local setting.
Strengths and Weaknesses
Strengths:
The paper addresses an important and timely problem in distribution sampling under local differential privacy, and it proposes a solution that achieves the optimal minimax rate. The extension from the global to the local setting is meaningful and adds significant value to the literature.
Weaknesses and Questions:
- The presentation could be improved, particularly regarding the purpose and organization of Section 3. Although the paper primarily focuses on local sampling, Section 3 is entirely devoted to global sampling. While I understand parts of Section 3 are necessary for later sections, such as the extension to the functional LDP case and the introduction of key definitions, it would be clearer to restructure this material into a dedicated subsection.
- The proposed sampler in Theorem 4.1 is not computationally efficient in general. In particular, the projection step can be computationally demanding. By the way, what is the meaning of the phrase that the sampler “minimizes” the infimum? For any fixed arguments, this infimum is a constant value.
- The relationship between global and local sampling is not fully clarified. Intuitively, one would assume that since the local sampler is restricted to a smaller distribution set, it will achieve a better minimax rate. However, this improvement is not quantified in the paper. If the general case is infeasible to quantify, is it possible to at least give concrete examples to demonstrate the rate gap between global and local sampling?
There are some other minor comments:
- What is the definition of the conjugate term in Equation (4)?
- The definition of the distribution class is unclear in Section 4: are we still considering the class defined in Equation (2), or is a general distribution class considered?
Questions
The questions are included in the "Strengths And Weaknesses" section.
Limitations
Yes
Formatting Issues
N/A
We thank the reviewer for their thoughtful feedback and for pointing out areas needing clarification, such as the definition of the conjugate in Equation (4) and the definition of the distribution class in Section 4. We are also glad they found the extension from global to local minimax meaningful and acknowledged our contribution to private distribution sampling. Below, we address each of the reviewer’s questions and comments individually.
1. Presentation and Organization of Section 3:
We appreciate the reviewer’s suggestion regarding the presentation and structure of Section 3. While the main focus of the paper is on local sampling, we included Section 3 to provide the necessary theoretical groundwork for the subsequent sections, including the extension to functional LDP, the formulation of key assumptions, and the definition of global samplers used in the analysis. Specifically, we study global minimax samplers under the general framework of functional LDP and derive compact characterizations of the optimal samplers. As noted at the beginning of Section 3 (lines 170–171), these global samplers serve as proxies for constructing local minimax-optimal samplers in the following sections.
We aimed to preserve a natural progression in the paper’s structure, moving from functional LDP global minimax in Section 3, to functional LDP local minimax in Section 4, and then to pure LDP local minimax in Section 5. This order reflects the way the results were derived, with each step building on the previous one. That said, we agree that the purpose of Section 3 could be emphasized more clearly. In the revision, we will improve the framing of this section and better articulate its connection to the rest of the paper. We will also consider reorganizing key parts into a dedicated subsection to enhance clarity and flow.
2. Computational Efficiency of the Proposed Sampler and Clarification of the Projection Step in Theorem 4.1:
For general $f$-divergences and sample spaces, this projection step may require solving a constrained optimization problem, and its complexity is on the same order as that of the clipping-based projection described in Theorem 5.1. However, as shown by Husain et al. [7], in certain special cases, such as when the $f$-divergence is the KL divergence and the sample space is finite, the projection problem admits a closed-form solution via the Karush–Kuhn–Tucker (KKT) conditions. In these settings, the projection step can be carried out efficiently without incurring additional computational overhead.
As the reviewer points out, the projection can be computationally demanding, particularly in high-dimensional settings, even when a closed-form expression for it is available. This is because the expression involves a clipping step that depends on a normalization constant (similar to the constant $C$ in the statement of Theorem 5.1). Computing this constant requires numerical integration over a high-dimensional space, which can be costly and often becomes the main computational bottleneck.
In the following, we include a discussion on the computational complexity of the samplers proposed in our work. Overall, the samplers derived in this work fall into two categories: (1) linear samplers, such as the core sampler in Theorem 4.1, and (2) nonlinear samplers involving clipping functions, as in Theorem 5.1.
The linear sampler $\mathbf{Q}^{\star}_{c_1,c_2,h,g}$, defined in Theorem 3.4 (Equation (4)), has density $q^{\star}_{g}(P)(x) := \lambda^{\star}\, p(x) + (1 - \lambda^{\star})\, h(x)$, a convex combination of the input density $p$ and the reference density $h$. Its complexity lies in computing the mixing coefficient $\lambda^{\star}$.
In many cases—e.g., pure, approximate, and Gaussian LDP—the conjugate is known in closed form, allowing this optimization to be solved analytically or efficiently numerically. Thus, the linear sampler is computationally tractable.
In contrast, the nonlinear sampler in Theorem 5.1 is defined by a clipping-based mapping whose density involves a normalizing constant $C$ selected to ensure that the density integrates to one. The main computational burden lies in determining this normalizing constant. To compute it, we define the one-parameter family of clipped densities $\{q_C\}$ and search for a value $C^{\star}$ such that the total mass $I(C) := \int q_C \, d\mu$ equals one. It can be shown that suitable endpoints $C_{\min}$ and $C_{\max}$ bracket the solution, in the sense that $I(C_{\min}) - 1$ and $I(C_{\max}) - 1$ have opposite signs, and that $C \mapsto I(C)$ is continuous. Therefore, $C^{\star}$ can be found using the bisection method on the interval $[C_{\min}, C_{\max}]$.
While the bisection method requires only a logarithmic number of steps, the main computational cost lies in evaluating the integral at each step. This is tractable in low-dimensional settings but quickly becomes prohibitive as dimensionality increases.
Regarding the reviewer’s question about the phrase “minimizes the infimum,” we intended to convey that we find the distribution attaining the infimum, that is, the minimizer of the divergence over all admissible distributions. The reviewer is correct that the infimum itself is a constant value for fixed arguments, and we will revise the text accordingly for clarity.
3. Relationship Between Global and Local Sampling and the Associated Rate Gap:
Under an $\varepsilon$-LDP constraint, consider the global model class
$$\tilde{\mathcal{P}}_{c_1,c_2,\mu} := \bigl\{ P \in \mathcal{P}(\mathcal{X}) : c_1\, h(x) \le p(x) \le c_2\, h(x) \quad \mu\text{-a.e.} \bigr\}.$$
The global minimax risk for this class is given in Equation (6), line 227:
$$\mathcal{R}\bigl(Q_{\mathcal{X},\tilde{\mathcal{P}},g_{\varepsilon}},\, \tilde{\mathcal{P}},\, f\bigr) = \frac{1-r_1}{r_2-r_1}\, f(r_2) + \frac{r_2-1}{r_2-r_1}\, f(r_1),$$
where $r_1 = c_1 \cdot \frac{(1-c_1)e^{\varepsilon} + c_2 - 1}{c_2 - c_1}$ and $r_2 = \frac{c_2}{c_2-c_1} \cdot \frac{(1-c_1)e^{\varepsilon} + c_2 - 1}{e^{\varepsilon}}$. The local minimax risk is defined over the neighbourhood
$$N_{\gamma}(P_0) := \bigl\{ P \in \mathcal{P}(\mathcal{X}) : E_{\gamma}(P \,\|\, P_0) = E_{\gamma}(P_0 \,\|\, P) = 0 \bigr\},$$
which satisfies $N_{\gamma}(P_0) \subseteq \tilde{\mathcal{P}}_{c_1,c_2,\mu}$. As defined in line 1249 of the technical appendix, the corresponding local minimax risk takes the same two-point form,
$$\mathcal{R}\bigl(Q_{\mathcal{X},N_{\gamma}(P_0),\varepsilon},\, N_{\gamma}(P_0),\, f\bigr) = \frac{1-r_1}{r_2-r_1}\, f(r_2) + \frac{r_2-1}{r_2-r_1}\, f(r_1),$$
where $r_1$ and $r_2$ are now replaced by their local counterparts, determined by the neighbourhood parameter $\gamma$.
While the difference between the local and global risks can be computed in closed form via direct subtraction, the resulting expression is not particularly insightful, as it depends on several problem-specific parameters, including $c_1$, $c_2$, $\gamma$, the sample space, and the privacy level $\varepsilon$. Although the subtraction can be evaluated for any given instance, it does not, in general, yield a simple or interpretable expression.
To better illustrate the improvement, we provide numerical comparisons in both discrete and continuous settings. In the discrete case (Figures 3, 5, 6), we compare the global and local minimax-optimal samplers over a finite alphabet, where the global class is $\tilde{\mathcal{P}}_{c_1,c_2,\mu}$ with the reference taken to be the uniform distribution. The local neighbourhood is defined around this reference with a radius chosen so that $N_{\gamma}(P_0) \subseteq \tilde{\mathcal{P}}_{c_1,c_2,\mu}$.
We instantiate both the global class and the local neighbourhood using the parameters defined in Section 6.1, and compute the exact minimax risks in each case. Figures 3, 5, and 6 present these theoretical worst-case risks, clearly demonstrating the benefit of tailoring the mechanism to the local neighbourhood. These plots consistently show a significantly improved minimax rate in the local setting compared to the global one, across various $f$-divergences and parameter ranges.
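For concreteness, the shared two-point form of these risks is straightforward to evaluate once $r_1$ and $r_2$ are known. A minimal sketch, where the specific $r_1$, $r_2$ values are placeholders rather than the instantiations of Section 6.1:

```python
import numpy as np

def two_point_risk(r1, r2, f):
    """Evaluate R = (1 - r1)/(r2 - r1) * f(r2) + (r2 - 1)/(r2 - r1) * f(r1),
    the f-divergence of a likelihood ratio taking two values r1 and r2 with
    weights chosen so that the ratio has mean one."""
    w2 = (1.0 - r1) / (r2 - r1)  # weight on the value r2
    w1 = (r2 - 1.0) / (r2 - r1)  # weight on the value r1
    return w2 * f(r2) + w1 * f(r1)

kl = lambda t: t * np.log(t)       # generator of the KL divergence
tv = lambda t: 0.5 * abs(t - 1.0)  # generator of total variation

# Global vs. local risks differ only through their (r1, r2); plugging in the
# paper's expressions for each setting reproduces curves like Figures 3, 5, 6.
print(two_point_risk(r1=0.8, r2=1.3, f=kl))
print(two_point_risk(r1=0.8, r2=1.3, f=tv))
```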
4. Definition of the Conjugate in Equation (4):
This symbol denotes the convex conjugate of the corresponding function $g$, as introduced in the notation section. Formally, the convex conjugate is given by $g^{\ast}(y) := \sup_{x} \{\, xy - g(x) \,\}$. To improve clarity and completeness, we will include this formal definition in the notation section.
5. Definition of the Distribution Class in Section 4:
As for the definition of the distribution class in Section 4, we clarify that it refers to a general set of possible distributions relevant to the problem under consideration. In particular, in Section 4, it may represent any general class of continuous distributions, subject only to the constraint assumed in the statement of Theorem 4.1. We will include a note at the beginning of the section to make this generality explicit.
This paper studies the problem of generating a sample from a distribution under the constraints of local differential privacy. This work extends previous work in two directions: 1) it studies sampling algorithms satisfying functional local differential privacy with optimal global minimax risk; 2) it studies local minimax risks of private samplers in the presence of public data. The proposed algorithms are validated experimentally.
Strengths and Weaknesses
Strengths
- The paper presents two methods to identify private samplers that are locally minimax optimal when a public distribution is available.
- The paper characterizes samplers that are globally minimax optimal and that satisfy functional LDP.
Weaknesses
- The focus of the paper on the local model makes the sampling problem less about algorithms and more about finding the right parameters.
- The paper doesn't give much intuition about the derived samplers.
Several recent works have studied sampling from distributions under the constraints of differential privacy in different models. This paper considers the problem in the local model of differential privacy, where every user holds a distribution and aims at releasing a sample from it. More specifically, the paper focuses on minimax notions of utility of the samplers with respect to f-divergences. In particular, the utility is defined at the user level rather than at the protocol level, as in works that have studied statistical tasks in the local model. While certainly meaningful, this focus makes the paper more about figuring out the right setup and parameters of the samplers than about coming up with a novel algorithm.

The paper does this well, focusing on problems that were not studied in previous works. The global minimax results extend the set of available results to functional LDP and to different discrete and continuous samplers. Similarly to what I said above, functional DP was proposed in the context of the central model. While it can certainly also be used for the local model, its properties and semantics are different, which makes some problems more interesting and others less so. The results on local minimax extend previous work, which studied samplers with public data in the local model, by providing optimal local minimax samplers. It is worth noting that the paper presents two frameworks to do this. The experiments confirm the theoretical results. Overall, I found this paper to contribute good results to the theory of differentially private local samplers, although I did not find them particularly exciting in terms of the new ideas required.
Questions
Can you please provide a comparison between how you use public data and how previous work [22] used it?
In the limitations you mention the computational complexity of your samplers; can you give more details on it?
Limitations
yes
Formatting Issues
none
We thank the reviewer for their thoughtful and detailed review. We are glad they appreciated our contributions, including the extension to functional LDP and the characterization of optimal local minimax samplers. Below, we address each of the reviewer’s questions and concerns individually.
The focus of the paper on the local model makes the sampling problem less about algorithms and more about finding the right parameters.
While global minimax private sampling under pure LDP has been studied previously (e.g., Park et al. [17]), our work extends this by developing both global and local minimax results under the more general functional LDP framework, which encompasses pure, approximate, and Gaussian LDP. This required introducing new assumptions, proof techniques, and a carefully defined local neighborhood that fundamentally differs from the global setting. These developments allow us to bridge global and local minimax frameworks and derive new linear and nonlinear samplers that are optimal in the local minimax sense under functional LDP. Importantly, we propose new algorithms tailored to this setting—not merely new parameters for existing ones. Our framework also lays a foundation for future work, including broader neighborhood definitions and scalable implementations, as outlined in the limitations.
The paper doesn't give much intuition about the derived samplers.
We thank the reviewer for their constructive feedback. While we made an effort to convey the intuition behind the derived samplers—through both textual explanations and visual illustrations—we agree that additional elaboration could help improve clarity. Below, we provide a complementary interpretation of the two main classes of samplers discussed in the paper: linear and nonlinear samplers.
For the linear samplers, such as the global sampler presented in lines 204–206 of the manuscript, a natural interpretation is that the mechanism samples from the input distribution $P$ with probability $\lambda^{\star}$, and from the reference distribution $h$ otherwise. This probabilistic mixture reflects the privacy–utility trade-off: a larger $\lambda^{\star}$ yields outputs closer to $P$, while sampling from $h$ introduces the randomness required for privacy. This same intuition applies to all linear samplers derived in the paper, as illustrated by the sketch below.
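As a minimal illustration of this mixture interpretation (the two distributions and the value of the mixing coefficient below are hypothetical stand-ins, not the closed-form $\lambda^{\star}$ of the paper):

```python
import numpy as np

def linear_sampler(sample_p, sample_h, lam, rng):
    """Release one sample from q = lam * P + (1 - lam) * h: with probability
    lam draw from the user's distribution P, otherwise from the reference h."""
    return sample_p() if rng.random() < lam else sample_h()

rng = np.random.default_rng(0)
draw = linear_sampler(
    sample_p=lambda: rng.normal(0.3, 0.1),   # user's distribution P (hypothetical)
    sample_h=lambda: rng.uniform(0.0, 1.0),  # public reference h (hypothetical)
    lam=0.6,                                 # privacy-dependent mixing coefficient
    rng=rng,
)
```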
For the nonlinear sampler presented in Theorem 5.1, the optimal sampling mechanism is given by the clipping-based construction in the theorem statement. As described in lines 287–288 of the manuscript, this construction defines a nonlinear sampler by projecting the input distribution onto a family of distributions referred to as the mollifier, as defined in the paper (see Figure 2). The clipping operation ensures that the output distribution satisfies the LDP constraint while remaining close to the original distribution in terms of $f$-divergence. In other words, the output belongs to the mollifier and is instance-optimal for the given input distribution. This approach yields a nonlinear transformation of the input that carefully balances privacy constraints and utility preservation.
Figure 2 further illustrates the sampling process under the local minimax framework. It involves two steps: (1) if the input distribution lies outside the local neighborhood of the public distribution $P_0$, it is projected inward; and (2) optimal sampling is performed using either a linear or nonlinear sampler, depending on the context.
We will incorporate these clarifications and interpretations in the final version to better communicate the intuition behind the derived samplers.
Can you please provide a comparison between how you use public data and how previous work [22] used it?
Zamanlooy et al. [22] use public data as an explicit optimization constraint in their global minimax formulation, which is defined over discrete domains and restricted to linear samplers. Note that in this discrete setting, the sampling distribution can be expressed as $\mathbf{Q}\mathbf{p}$, where $\mathbf{p}$ is the data distribution (viewed as a vector) and $\mathbf{Q}$ is an LDP mechanism (viewed as a matrix satisfying the LDP constraint). Zamanlooy et al. [22] considered $\varepsilon$-LDP mechanisms that preserve the public prior $\mathbf{p}_0$, enforcing $\mathbf{Q}\mathbf{p}_0 = \mathbf{p}_0$. This guarantees that the public data remains invariant under the mechanism while privacy is provided for private data. They then formulated their minimax optimization as a constrained worst-case problem (see the schematic comparison below).
In contrast, we use public data in a conceptually different way. Instead of treating it as a hard constraint, we use the public distribution $P_0$ to define a local neighborhood $N_{\gamma}(P_0)$. The worst-case input distributions are then considered within this neighborhood. This leads to a local minimax formulation of the private sampling problem over a general sample space, without restricting to linear mechanisms. Specifically, we search over all $\varepsilon$-LDP sampling mechanisms and take the worst case over the neighborhood:
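In schematic form (our notation; a reconstruction from the descriptions above rather than a verbatim quotation of either paper), the two objectives are
$$\text{[22]:}\quad \min_{\substack{\mathbf{Q}\ \varepsilon\text{-LDP} \\ \mathbf{Q}\mathbf{p}_0 = \mathbf{p}_0}} \; \max_{\mathbf{p} \in \mathcal{P}} \; D_f\bigl(\mathbf{Q}\mathbf{p} \,\|\, \mathbf{p}\bigr), \qquad \text{ours:}\quad \inf_{Q\ \varepsilon\text{-LDP}} \; \sup_{P \in N_{\gamma}(P_0)} \; D_f\bigl(Q(P) \,\|\, P\bigr),$$
so public data enters [22] as a hard invariance constraint on the mechanism, whereas in our formulation it only shapes the set of inputs over which the worst case is taken.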
Note that the formulation of [22] is a special case of our setting, restricted to discrete domains and linear $\varepsilon$-LDP samplers. Our goal is to move beyond the overly pessimistic global worst-case risk and instead focus on distributions that are close to $P_0$, which is assumed to be publicly available to all users. This enables a more refined and robust characterization of optimal samplers from a local minimax perspective. It is important to note that, in the formulation of Zamanlooy et al., having public data does NOT necessarily lead to more accurate sampling (a smaller minimax risk), whereas this is the case in our formulation.
That said, as shown in Theorems 4.1 and 5.1, the local minimax-optimal samplers both map $P_0$ to itself, even though the preservation of $P_0$ is not explicitly enforced as a constraint. This demonstrates that our formulation inherently respects the public data when the input distribution coincides with it, without requiring this condition to be imposed as part of the optimization. In this sense, our formulation is a non-trivial generalization of Zamanlooy et al. [22] to general data distributions (not only discrete ones, as in [22]).
In the limitations you mention the computational complexity of your samplers; can you give more details on it?
We thank the reviewer for raising this important point regarding computational complexity. Overall, the samplers derived in this work fall into two categories: (1) linear samplers, such as the core sampler in Theorem 4.1, and (2) nonlinear samplers involving clipping functions, as in Theorem 5.1.
The linear sampler $\mathbf{Q}^{\star}_{c_1,c_2,h,g}$, defined in Theorem 3.4 (Equation (4)), has density $q^{\star}_{g}(P)(x) := \lambda^{\star}_{c_1,c_2,g}\, p(x) + \bigl(1 - \lambda^{\star}_{c_1,c_2,g}\bigr)\, h(x)$, a convex combination of the input density $p$ and the reference density $h$. Its complexity lies in computing the mixing coefficient $\lambda^{\star}_{c_1,c_2,g}$, whose closed form is given in the theorem.
In many cases—e.g., pure, approximate, and Gaussian LDP—the conjugate is known in closed form, allowing this optimization to be solved analytically or efficiently numerically. Thus, the linear sampler is computationally tractable.
In contrast, the nonlinear sampler in Theorem 5.1 is defined by a clipping-based mapping whose density involves a normalizing constant $C$ selected so that the density integrates to one. The main computational burden lies in determining this normalizing constant. To compute it, we define the one-parameter family of clipped densities $\{q_C\}$ and search for a value $C^{\star}$ such that the total mass $I(C) := \int q_C \, d\mu$ equals one. It can be shown that the endpoints $C_{\min}$ and $C_{\max}$ bracket the solution, in the sense that $I(C_{\min}) - 1$ and $I(C_{\max}) - 1$ have opposite signs, and that $C \mapsto I(C)$ is continuous. Therefore, $C^{\star}$ can be found using the bisection method on the interval $[C_{\min}, C_{\max}]$.
While the bisection method requires only a logarithmic number of steps, the main computational cost lies in evaluating the integral at each step. This is tractable in low-dimensional settings but quickly becomes prohibitive as dimensionality increases.
A promising direction to mitigate this is to bypass explicit integration by designing approximate samplers that minimize the divergence to the output of the optimal $\varepsilon$-LDP sampler. This offers a rich path for future work, especially in scaling optimal mechanisms to high-dimensional domains.
Thanks for your reply. This helps clarify several aspects of the paper. I recommend including in the paper the more detailed discussion about the use of public data, as well as the discussion about the complexity challenge.
We appreciate the reviewer’s constructive feedback and are glad our rebuttal helped clarify their concerns. For completeness and clarity, we will incorporate the discussions on the use of public data and the computational complexity challenges into the final version. Thank you again for the helpful suggestions.
This paper studies the problem of generating a sample from a distribution given to a user under the constraints of local differential privacy. The problem was studied/introduced in two previous works. This work extends previous work in two directions: (a) it goes beyond pure DP with optimal global minimax risk; (b) it studies local minimax risks of private samplers, modeled via knowledge of a "public" distribution P_0. The errors of the proposed algorithms are validated numerically. This is nice technical and conceptual progress on a relatively natural problem. The main weakness is that there are no compelling motivating problems that would allow one to evaluate the significance of any of the improvements. Still, given the importance of sampling problems in modern ML, the insights given in this work may prove useful.