/10

Oral4 位审稿人

最低4最高5标准差0.4

ICML 2025

On Differential Privacy for Adaptively Solving Search Problems via Sketching

Shiyuan Feng,Ying Feng,George Zhaoqi Li,Zhao Song,David Woodruff,Lichen Zhang

提交: 2025-01-15更新: 2025-07-24

TL;DR

We develop data structures for search problems such as regression and nearest neighbor search that are robust against an adaptive adversary.

摘要

Recently differential privacy has been used for a number of streaming, data structure, and dynamic graph problems as a means of hiding the internal randomness of the data structure, so that multiple possibly adaptive queries can be made without sacrificing the correctness of the responses. Although these works use differential privacy to show that for some problems it is possible to tolerate $T$ queries using $\widetilde{O}(\sqrt{T})$ copies of a data structure, such results only apply to {\it numerical estimation problems}, and only return the {\it cost} of an optimization problem rather than the solution itself. In this paper we investigate the use of differential privacy for adaptive queries to {\it search} problems, which are significantly more challenging since the responses to queries can reveal much more about the internal randomness than a single numerical query. We focus on two classical search problems: nearest neighbor queries and regression with arbitrary turnstile updates. We identify key parameters to these problems, such as the number of $c$-approximate near neighbors and the matrix condition number, and use different differential privacy techniques to design algorithms returning the solution point or solution vector with memory and time depending on these parameters. We give algorithms for each of these problems that achieve similar tradeoffs.

关键词

data structureadaptive robustnessdifferential privacy

评审与讨论

审稿意见

评分: 42025-02-27

First the authors consider the differentially private ANN problem. They reduce this problem of differentially private ANN to differentially private selection problem, and then solve the differentially private selection problem via differentialy private selection methods. Next, the authors consider the problem of differentially private linear regression. They solve this problem by applying matrix sketching techniques. The key idea is to use standard a number of different random matrix sketches, and then to aggregate the solutions in a differentially private manner via an extension of a differentially private median approach of (Beimel et all 2022). Finally, to bound the utility they use they compute an ell-infinity bound on the error introduced by sketching. As the authors note, this is an interesting approach to bounding the utility as it is an uncommon use of ell-infinity bounds in matrix sketching.

update after rebuttal

Thank you for the helpful clarifications. I have raised my score accordingly. Please include these clarifications in the final version.

给作者的问题

It would be good if the authors could clarify how their work improves on prior works.

论据与证据

Yes

方法与评估标准

Yes

理论论述

I read the technical overview, but did not carefully check the proofs.

实验设计与分析

There are no experiments in the paper.

补充材料

Please see question about "proofs" above.

与现有文献的关系

The authors do a good job of discussing which previous techniques they build on.

However, as far as I can tell, the authors do not provide much comparison to prior related works. It would be good if the authors could clarify how their work improves on prior works, or point to where the improvement over prior works is discussed in the paper.

遗漏的重要参考文献

Please see "Relation To Broader Scientific Literature" question above.

其他优缺点

I think the strength of the paper is in the techniques, which are well explained in the technical overview. However, the comparison to prior works can be improved. Specifically, it would be good if the authors could clarify how their work improves on prior works.

其他意见或建议

N/A

作者回复

2025-04-01

We thank the reviewer for their question about prior work, which we address below:

Prior works using differential privacy for adaptivity focus on numerical estimates [Hassidim et al., 2022; Beimel et al., 2022; Song et al., 2023; Cherapanamjeri et al., 2023]. Our work instead tackles the search problem—returning a point using an adaptive DP data structure. While differentially private heavy hitter algorithms (e.g., [Chadha et al., 2023]) relate to private search, they lack adaptivity. To our knowledge, this is the first work combining DP with adaptive search.
Because of this, prior adaptive search methods either (a) build $T$ data structures (one per query), or (b) use a net argument over the space, requiring $\widetilde{O}(d)$ structures.

Below we compare our method to these baselines for ANN and regression. We omit $\widetilde{O}(\cdot)$ for clarity.

Approximate Nearest Neighbor (ANN)

Method	Space	Amortized Prep Time	Query Time	Update Time
$T$ copies	$T n^{1+\rho} + nd$	$n^{1+\rho} d$	$n^\rho d$	$T n^\rho d$
$d$ copies	$n^{1+\rho} d$	$\frac{d}{T} n^{1+\rho} d$	$n^\rho d$	$d n^\rho d$
Ours	$\sqrt{T} s n^{1+\rho} + nd$	$\frac{s}{\sqrt{T}} n^{1+\rho} d$	$s n^\rho d$	$\sqrt{T} s n^\rho d$

For $T$ adaptive queries/updates, the exact method scans the dataset. If $T \leq d$ , we outperform $T$ -copies when $s \leq \min\\{\sqrt{T}, n\\}$ ; if $d \leq T$ , we outperform $d$ -copies when $s \leq \min\\{\frac{d}{\sqrt{T}}, n\\}$ .

Regression

Let $U \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$ , with goal to compute $x$ minimizing $\\|Ux - b\\|_2$ up to $(1+\alpha)$ . Each step updates $U$ , $b$ via $v_t$ ; $\kappa$ bounds $U$ 's condition number.

Method	Space	Amortized Prep Time	Query Time	Update Time
$T$ copies	$\frac{T d^2}{\alpha^2}$	$\text{nnz}(U,b)+d^3+\frac{d^2}{\alpha^2}$	$\frac{d^{\omega+1}}{\alpha^2}$	$T(\text{nnz}(v_t)+d^3+\frac{d^2}{\alpha^2})$
$nd$ copies	$\frac{n d^3}{\alpha^2}$	$\frac{nd}{T}(\text{nnz}(U, b)+d^3+\frac{d^2}{\alpha^2})$	$\frac{d^{\omega+1}}{\alpha^2}$	$nd(\text{nnz}(v_t)+d^3+\frac{d^2}{\alpha^2})$
Ours	$\frac{\sqrt{T} d^{2.5} \kappa^2}{\alpha^2}$	$\sqrt{\frac{d}{T}}(\text{nnz}(U, b)+d^3+\frac{d^2 \kappa^2}{\alpha^2})$	$\frac{d^{\omega+1} \kappa^2}{\alpha^2}$	$\sqrt{T d}(\text{nnz}(v_t)+d^3+\frac{d^2 \kappa^2}{\alpha^2})$

We outperform $T$ -copies when $\kappa^2 \leq \sqrt{T/d}$ , and $nd$ -copies when $\kappa^2 \leq n \sqrt{d/T}$ —conditions often met in practice.

Regression with Sparse Label Updates

If only $b$ is updated and each $v_t$ changes at most $s$ entries, $\kappa^2$ dependence can be reduced to polylogarithmic.

Method	Space	Amortized Prep Time	Query Time	Update Time
$T$ copies	$\frac{T d^2}{\alpha^2}$	$\text{nnz}(U)+d^3+\frac{d^2}{\alpha^2}$	$d^2$	$T(s+d^3+\frac{d^2}{\alpha^2})$
$n$ copies	$\frac{n d^2}{\alpha^2}$	$\frac{n}{T}(\text{nnz}(U)+d^3+\frac{d^2}{\alpha^2})$	$d^2$	$n(s+d^3+\frac{d^2}{\alpha^2})$
Ours	$\frac{\sqrt{T d} \cdot d^2}{\alpha^2}$	$\sqrt{\frac{d}{T}}(\text{nnz}(U)+d^3+\frac{d^2}{\alpha^2})$	$d^2$	$\sqrt{T d}(s+d^3+\frac{d^2}{\alpha^2})$

Only $n$ copies are needed. Our method is faster when $T \leq n$ and $d \leq T$ , or when $n \leq T$ and $\sqrt{T d} \leq n$ . Since $T$ is user-chosen (rebuilding occurs every $T$ queries), trade-offs can be tuned (see Claims D.16–D.17).

Summary:
Our method leverages DP with advanced composition, achieving $\widetilde{O}(T)$ scaling and improved space, preprocessing, query, and update complexity. These advantages are especially notable when $T$ is large, or when sparsity $s$ and condition number $\kappa$ are favorable. We will ensure these comparisons are clearly included in the revision.

审稿意见

评分: 42025-03-13

The paper addresses the problem of hiding internal randomness of data structures using differential privacy. Specifically, the authors focus on nearest neighbor search and regression problem and aim to protect the internal randomness against adaptive adversary using differentially private techniques.

Update after rebuttal

The author response provided during the rebuttal clearly addressed my questions regarding the interaction between the privacy mechanism and the underlying data structure, and the privacy loss accumulation. I have raised my rating.

给作者的问题

See above

论据与证据

A sequence of theorems presented in Section 1.1 clearly demonstrates improvements over existing algorithms. They provide probabilistic utility guarantees as well as time and memory complexity, but it is not immediately clear how differential privacy interacts with these bounds. While I understand that characterizing the interplay between data structure and differential privacy is not the main focus, it could be of interest to a broader audience.

方法与评估标准

Section 2 provides explanation on how the search problem can be formulated as differentially private selection, which completely makes sense. The authors use the standard Laplace mechanism to achieve pure $(\epsilon, 0)$ -DP. I wonder if the Gaussian mechanism can be used instead to achieve $(\epsilon, \delta)$ -DP. In that case, how does it affect the guarantees stated in theorems in Section 1?
Another aspect not discussed in the paper is the accumulation of privacy loss. In differential privacy, the total privacy loss accumulates as the mechanism releases private answers to successive queries, which means the mechanism must shut down the database and stop answering after a certain number of queries. However, it is unclear from the paper whether there is a limit to the number of queries the underlying data structure can handle.

理论论述

I didn’t thoroughly check the correctness of the proofs.

实验设计与分析

This is a theory paper that doesn't contain experimental results.

补充材料

No supplementary materials are available.

与现有文献的关系

This paper develops efficient data structures that protect their internal randomness from adaptive adversaries, offering a secure and private foundation for machine learning tasks.

遗漏的重要参考文献

I don't find any missing related work.

其他优缺点

Strength
- The paper presents a sequence of theorems that describe improvements over existing results.
- The authors provide interpretations of theorems.
Weakness
- This may not be a weakness, but rather a comment. While the paper is clear about the algorithmic properties of underlying data structures, e.g., how its time and storage complexities scale with input size, they don’t reveal much about the properties of differential privacy, such as composition and privacy loss accumulation.
- I believe that including some figures or diagrams would have been helpful for better understanding the proposed technique.

其他意见或建议

None

作者回复

2025-04-01

We thank you for your very helpful comments. We would also like to address your questions and comments regarding differential privacy.

Let us start with a high-level overview of the framework, first introduced in [Hassidim et al., 2022; Beimel et al., 2022]. The rough idea is to treat the random seeds used by data structures as the database for which we want to preserve privacy. In the simplest setting, to achieve a target privacy level $\epsilon$ , one might need to aggregate the responses of $T$ data structures in a differentially private manner. However, by the advanced composition theorem, the same level of privacy in terms of $\epsilon$ can be achieved using only $\widetilde{O}(\sqrt{T})$ independent outputs—at the cost of moving from pure DP to approximate DP with $\delta > 0$ . As long as $\delta$ is small, the approximation is acceptable.

Responses to Specific Questions

Q: Can the Gaussian mechanism be used for $(\epsilon, \delta)$ -DP? In that case, how does it affect the final guarantees?

A: Yes, the Gaussian mechanism is valid for achieving $(\epsilon, \delta)$ -DP, provided that $\delta$ is not too large. Suppose we want the data structure to succeed with probability at least $1 - \beta$ . Let $\delta_{\text{final}}$ denote the total privacy loss, composed of:

$\delta_{\text{Gaussian}}$ : from the Gaussian mechanism
$\delta_{\text{advanced}}$ : from advanced composition

Then:

\delta_{\text{final}} = \delta_{\text{advanced}} + T \cdot \delta_{\text{Gaussian}}.

In our original setting, we set $\delta_{\text{advanced}} = \beta / 100$ using an appropriate $\epsilon$ . To use the Gaussian mechanism instead, we may set $\delta_{\text{Gaussian}} = \frac{\beta}{200T}$ and reduce $\delta_{\text{advanced}}$ to $\beta/200$ . This still yields $\delta_{\text{final}} = \beta / 100$ , matching the Laplace mechanism’s privacy loss.

We chose the Laplace mechanism for conceptual simplicity and because the task is essentially a counting problem, where Laplace noise is more natural.

Q: In DP, privacy loss accumulates with repeated queries, so mechanisms are often shut down after a limit. Does your adaptive data structure exhibit this behavior?

A: Yes, the adaptive data structure’s ability to handle $T$ queries hinges on privacy composition. Using the advanced composition theorem, we guarantee $(\epsilon, \delta)$ -DP across $T$ interactions by using $\widetilde{O}(T)$ resources (e.g., data structure copies or random seeds), rather than $O(T)$ .

However, this requires $T$ to be fixed in advance. For indefinite querying, a standard approach is to rebuild the data structure periodically after every $T$ queries. This permits optimization of $T$ to balance space, amortized preprocessing time, and query/update efficiency.

Additional Comments

It would be better to discuss more on differential privacy, such as composition and loss accumulation.

A: Thank you for pointing this out. We will ensure that the final version includes a more detailed discussion on differential privacy, particularly covering composition theorems and cumulative privacy loss (please refer to our high-level overview and answers regarding loss accumulation above).

Including figures and diagrams would be helpful for explaining the techniques.

A: We appreciate the suggestion. We agree that visualizations—especially for illustrating how the sparse argmax mechanism is applied in our ANN algorithm—would enhance clarity and intuitively explain why our algorithm is both correct and efficient. We will include appropriate figures and diagrams to better illustrate the framework in the revision.

审稿意见

评分: 42025-03-14

The paper explores using differential privacy for answering adaptively chosen queries (potentially by and adversary) for search problems. The authors consider approximate nearest neighbor and regression as the main problems in this work. The main contributions are showing that under reasonable assumptions, it is possible to maintain $O(\sqrt{T})$ copies of data structure to answer $T$ queries for both problems, matching the bounds from previous work that uses DP for adversarial streaming. For the ANN problem, they reduce the problem to private selection and provide a new mechanism that obtains better runtime for sparse vectors. For regression problem with updates to both design matrix and response vector, the approach involves maintaining and updating multiple sketch matrices, solving to generate multiple solution vectors and using DP median to generate the final solution. The authors also show application to online weighted matching and computing terminal embeddings.

给作者的问题

No additional questions.

论据与证据

All the claims made in paper are accompanied with proofs. Reasoning and justification for the assumptions made are also provided.

方法与评估标准

The paper does not have experiments.

理论论述

The proofs are correct to the best of my knowledge. The contributions sublinear dependence on $T$ for time/space complexity in answering queries. The baseline comparison are the naive or existing bounds of $O(T)$ and $O(dim)$ . The accuracy guarantees follow from success probability for the data structure or the generalization guarantee for DP algorithms.

实验设计与分析

This is a theoretical work, no experiments are provided.

补充材料

The proofs in appendix are correct to the best of my knowledge.

与现有文献的关系

This work contributes line of prior work that explores connection between differential privacy and robustness to adaptivity. The paper makes novel contribution in this field by giving approaches that work for search problems.

遗漏的重要参考文献

Not to the extent of my knowledge.

其他优缺点

The sparse argmax mechanism is a nice contribution and might be of interest to broader DP community. For regression problem, the dependence on $\kappa$ is quadratic which might be suboptimal -- in addition the alternative approach suggested obtains linear in $T$ .

其他意见或建议

No additional comments.

作者回复

2025-04-01

We thank you for your very helpful comments and we would like to address your comments on the quadratic dependence on $\kappa$ and linear dependence on $T$ for the alternative.

For the $\kappa^2$ dependence, the main reason is that by using the $\ell_\infty$ guarantee together with the standard relative error guarantee of sketched regression, we can show that the regression cost of the private median estimator $g$ satisfies $\\|Ug-b\\|_2 \leq (1+\kappa \alpha’)\\|Ux^*-b\\|_2$ . Hence, to account for the blowup in condition number, we need to scale down the approximation factor by $\kappa$ . Since the sketching dimension depends inverse quadratically on the approximation factor, we have a $\kappa^2$ dependence in the runtime. One could argue that known sketching-based methods based on the following framework need to incur such a dependence: our algorithm estimates each coordinate to high precision, and this inevitably incurs a dependence on the condition number for the error, and the sketching dimension in turn depends quadratically on the condition number. When there are no updates to $U$ , we could use an iterative method instead to remove the polynomial condition number dependence. It is an interesting open question to develop an iterative algorithm that can remove the dependence on the condition number while also supporting updates to $U$ . Therefore, while the $\kappa^2$ dependence is a consequence of our current sketching approach for worst-case guarantees, we believe the overall framework offers significant advantages, and reducing this dependence under updates is an important avenue for future work.

The alternative method based on the bounded path technique can be treated as a refinement to the method of using a fresh sketch for each update and query. Specifically, the algorithm depends on $\log |{\cal P}|$ , and $|{\cal P}|$ can be naively upper bounded by $(n\kappa)^{dT}$ , making it less appealing. However, we note that $|{\cal P}|$ is a more fine-grained parameter that could capture the structure of the input, e.g., if only entry changes between queries are allowed, then the upper bound reduces to $(nd)^T$ . If the updates are infrequent, e.g., only $\sqrt T$ updates are performed, then the bound becomes $(nd)^{\sqrt T}$ . Hence, while the alternative is no more efficient in the most general sense, it provides more fine-grained control and a speedup in situations for which the instance has extra structure.

审稿意见

评分: 52025-03-14

The paper studies the problem of adaptive algorithms: the algorithms where the adversary interacts with a randomized algorithm and the goal of the adversary is to increase the error probability.

It is known that differential privacy could be used to design such algorithms, but the existing applications were for algorithms computing numeric results; this paper generalizes the approach to vector-valued algorithms. As a result, it builds an algorithm for the nearest neighbor problem with better than previously known running and preprocessing time.

给作者的问题

N/A

论据与证据

The evidence sufficiently supports the claims.

方法与评估标准

Methods and evaluations are appropriate for the problem at hand.

理论论述

The proofs presented in the paper are correct.

实验设计与分析

There are no experiments in the paper.

补充材料

I haven't reviewed supplementary material.

与现有文献的关系

For me, personally, the idea of using differential privacy for designing adaptive algorithms is one of the most exciting ideas in the last couple of years: it connects the fields that from the first glance have almost nothing in common. While this is not the first paper in this direction, it generalizes the idea to a much wider class of problems.

遗漏的重要参考文献

I don’t think any specific reference is missing.

其他优缺点

N/A

其他意见或建议

Line 40: It would be better to rephrase the sentence about queries; it is a bit confusing in the current form. Line 94: The assumption doesn’t say anything about neighbours, it only talks about cr balls. Perhaps the explanation needs to be rewritten. Line 258: It should be argmax \sum b_{(i), j}; the j index is missing in the paper.

作者回复

2025-04-01

We appreciate your encouraging and helpful comments. Regarding your comments:

Line 40:
We will clarify this sentence. Our intention is to express both ANN and regression in a unified way, and we will revise it to clearly explain the query and update models in each problem.
Line 94:
The assumption states that the predicate function $f_{v_t}$ is sparse, and we define the predicate as the intersection of the dataset $U$ and the $cr$ -ball around $v_t$ —that is, all points in $U$ within distance $cr$ of the query. We acknowledge that the current formulation does not make this connection explicit and will revise it for clarity.
Line 258:
We will fix the missing index $j$ .

最终决定Accept (oral)

2025-05-01

The paper extends the literature on using differential privacy as a tool to design randomized data structures that maintain their correctness properties against adaptive (rather than oblivious) adversaries. The main contribution is the development of techniques that allow solving search problems under adaptive queries, in particular approximate near neighbor search. The paper addresses important problems, and introduces interesting techniques. The reviewer consensus is that the paper would be a strong contribution to the ICML program.