A Dynamic Low-Rank Fast Gaussian Transform
We present a dynamic data structure for the Fast Gaussian Transform (FGT) that supports efficient updates and queries in polylogarithmic time.
Reviews and Discussion
The paper studies the following problem. Given points in R^d lying in a k-dimensional subspace, we would like to construct a data structure that supports insertion, deletion, and query. More precisely, we can insert a new point into the data structure or delete an existing point from it. Also, we can efficiently compute the kernel density estimate of the points in the data structure at a given query point, up to a prescribed error. The result relies on the fast Gauss transform. The authors construct a data structure in which each of these operations runs in polylogarithmic time.
Strengths
The problem seems well-motivated.
Weaknesses
The result relies heavily on previous results. I am not sure whether any new idea is introduced.
Questions
.
We express our deepest gratitude to the reviewer for the time and effort in reviewing our work. We appreciate your assessment that our work is well-motivated. Below, we want to address your concerns.
We respectfully disagree that this paper lacks novelty or is merely an incremental combination of existing techniques. While it builds on prior work on the FGT, we introduce several key technical innovations that enable the design of an efficient, fully dynamic data structure for the Fast Gaussian Transform problem. Our first contribution is a novel "decoupling" technique that disentangles the source points from the target box centers in the FGT coefficients. This allows the coefficients to be updated in polylogarithmic time when sources are added or removed, instead of requiring a prohibitive linear scan. Second, we generalize the data structure to handle points from an unknown, dynamically changing low-dimensional subspace via efficient basis updates. To the best of our knowledge, this is the first result showing that the FGT can bypass the curse of dimensionality for kernel density queries. Furthermore, an efficient dynamic FGT data structure has important practical implications for scaling up kernel methods and Gaussian process regression to massive streaming data. As is evident from recent work on continual learning systems and online optimization, handling concept drift and covariate shift in real-world datasets necessitates dynamic algorithms.
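To make the decoupling idea concrete, here is a minimal one-dimensional, single-box sketch (our illustration only: the class and symbol names, the truncation order p, and the single-box setting are assumptions; the actual data structure maintains many boxes over a low-dimensional subspace):

```python
import math

class DynamicHermiteBox:
    """Sketch (1-D, one box): Hermite coefficients of the Gaussian-kernel sum
    G(t) = sum_j q_j * exp(-(t - s_j)^2 / delta), truncated at p terms.
    Each coefficient A_n is a plain sum over sources, so inserting or
    deleting one source touches only p numbers -- the decoupling in
    miniature."""

    def __init__(self, center, delta, p):
        self.c, self.delta, self.p = center, delta, p
        self.A = [0.0] * p  # A_n = sum_j q_j * ((s_j - c)/sqrt(delta))^n / n!

    def _terms(self, s, q):
        u = (s - self.c) / math.sqrt(self.delta)
        return [q * u ** n / math.factorial(n) for n in range(self.p)]

    def insert(self, s, q):
        # O(p) work, independent of how many sources are already stored.
        for n, t in enumerate(self._terms(s, q)):
            self.A[n] += t

    def delete(self, s, q):
        # Removing an existing source just subtracts its contribution.
        for n, t in enumerate(self._terms(s, q)):
            self.A[n] -= t

    def query(self, t):
        """Evaluate the truncated Hermite expansion at target t."""
        x = (t - self.c) / math.sqrt(self.delta)
        # Hermite functions h_n(x) = exp(-x^2) * H_n(x), with the
        # three-term recurrence H_{n+1} = 2x H_n - 2n H_{n-1}.
        e = math.exp(-x * x)
        H_prev, H_cur = 1.0, 2.0 * x
        total = self.A[0] * e * H_prev
        if self.p > 1:
            total += self.A[1] * e * H_cur
        for n in range(1, self.p - 1):
            H_prev, H_cur = H_cur, 2.0 * x * H_cur - 2.0 * n * H_prev
            total += self.A[n + 1] * e * H_cur
        return total
```

Because the coefficients are linear in the sources, updates never re-scan the stored points; the full scheme then buckets sources into boxes so that a query only reads the nearby boxes.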
We again thank you for your time and for giving us your insightful comments, and we hope that our rebuttal may address your concerns.
The fast Gaussian transform (FGT) can be used to efficiently apply a Gaussian kernel matrix to a vector. Traditionally, however, the set of points defining the Gaussian kernel matrix is fixed, so adding even a single point can require recomputing the FGT from scratch, which makes it very hard to deal with dynamically changing datasets. In this work, the authors propose a dynamic FGT algorithm that supports adding or deleting a source point, as well as estimating the kernel density at a query point, under the assumption that the data points lie in a low-dimensional subspace of the ambient vector space.
Strengths
I think this paper is well written. The introduction clearly outlines the history of FMM and FGT and clearly states their drawback: they are static, and it is hard to add/delete points. The main results section is also well written and states the contributions clearly.
I also think the technical contributions are solid: the problem of efficiently updating FGT for changing datasets is important, and this work seems to be one of the first to address this problem.
Weaknesses
I do not see any major weaknesses, but I think it would be better if the authors could include some numerical experiments to showcase their results. In particular, how much faster is the dynamic FGT compared with recomputing from scratch?
Questions
See above.
We express our deepest gratitude to the reviewer for the time and effort in reviewing our work. We really appreciate your invaluable comments, and we are very excited to see that our work is evaluated as well written and technically sound, and that the problem we study is considered important. Below, we want to address your concerns.
You raise a very important concern about the lack of experimental results, similar to Reviewer 1GBM and Reviewer pkC2. We regretfully do not have more experimental results at this point. However, we want to highlight that the works we build upon [1, 2] are purely theoretical. Our focus in this paper remains on developing the theoretical side of this line of work, extending the fast multipole method to a dynamic setting that supports data insertion and deletion and supports data lying in a low-dimensional subspace. We admit that the lack of experimental results is a weakness of this series of works, including ours. We will try to include more experimental results in the revised version, but we believe that fully closing this gap for this line of work will require more substantial effort in the future.
We thank you again for your insightful comments. We agree that this line of work generally lacks experimental results, and we will try to include more experimental results in the final version of our paper to fill this gap.
[1] Yeshwanth Cherapanamjeri, and Jelani Nelson. "Uniform approximations for randomized hadamard transforms with applications." STOC’22.
[2] Josh Alman, Timothy Chu, Aaron Schild, and Zhao Song. "Algorithms and hardness for linear algebra on geometric graphs." FOCS’20.
The paper introduces a Dynamic Low-Rank Fast Gaussian Transform (FGT), designed to efficiently handle evolving datasets in kernel density estimation (KDE) and matrix-vector multiplication. The Fast Gaussian Transform traditionally enables subquadratic-time multiplication of an n×n Gaussian kernel matrix with a vector, but it struggles with dynamically changing data. The proposed dynamic FGT algorithm supports fast updates and KDE queries in sublinear time, with quasi-linear time complexity for full matrix-vector multiplication.
The paper proposes a new data structure that dynamically maintains low-rank subspaces, supporting efficient insertion and deletion of points, along with fast KDE queries. The method builds on classical fast multipole methods (FMM) and uses Taylor and Hermite expansions to manage "interaction ranks" between source and target boxes. By focusing on data in low-dimensional subspaces, the method bypasses the curse of dimensionality, achieving sublinear update and query times under certain conditions. This is particularly useful for large, evolving datasets common in fields like machine learning. Importantly, they answer the following question affirmatively:
Is it possible to 'dynamize' the Fast Gaussian Transform in sublinear time? Can the exponential dependence on d (Greengard & Strain, 1991) be mitigated if the data points lie in a k-dimensional subspace of R^d?
Strengths
The key contribution is the new methodology proposed by this paper to efficiently handle evolving datasets in kernel density estimation (KDE) and matrix-vector multiplication in a dynamic setting. This question itself is very important in practice, and the new method has good time complexity: it supports fast updates and KDE queries in sublinear time, with quasi-linear time complexity for full matrix-vector multiplication. Overall, the paper's strengths lie in its innovative adaptation of FGT to a dynamic setting, its computational efficiency, and its broad relevance to various machine learning and statistical applications. The work addresses a crucial need for scalable kernel-based methods in dynamic, real-time data environments.
Weaknesses
I think one issue that is worth thinking about and needs clarification is that the method is tailored specifically to Gaussian kernels. Although in Section 3.4 it is generalized to a broader class of "fast-decaying kernels", it is still unclear whether it covers other important kernels such as polynomial kernels. Moreover, since the kernel is defined using the Euclidean distance, it may be restrictive and not the proper kernel choice in manifold settings.
The other issue is that the paper does not provide detailed guidance on choosing parameters, for example, the kernel bandwidth and the side length of the smaller boxes in Section 3.1, as it is mainly a theoretical paper without real experiments.
While the paper lists theoretical contributions, since it proposes a Fast Gaussian Transform (FGT), which is an algorithm aimed at fast computation, I think it would be good to run at least simple simulations on a dynamically changing dataset (as claimed by the paper) to examine its empirical performance.
Questions
See weakness section on how to extend the results to other types of kernels? how to handle tuning parameters as mentioned in the weakness section?
Details of Ethics Concerns
N/A
We express our deepest gratitude to the reviewer for the time and effort in reviewing our work. We sincerely thank you for evaluating our work as introducing novel techniques for efficiently handling changing datasets in KDE and matrix-vector multiplication in a dynamic setting, for noting its good time complexity, and for considering the problem we study very important in practice. Below, we want to address your concerns about our work.
Regarding the first weakness, our method can indeed be generalized to a very broad class of "fast-decaying kernels". We formally present the specific properties of this class of kernels, derived from [1], in Appendix B, Definition B.1; they include:
The kernel function K is non-increasing in the distance between its arguments.
K decays rapidly to zero as the distance grows.
K's Hermite and Taylor expansions are truncatable: the truncation error from keeping only the first p terms of either expansion is at most ε.
The specific examples of the kernel function satisfying these three properties include:
inverse polynomial kernels with a constant exponent c > 0,
the exponential kernel exp(−‖x − y‖),
the inverse multiquadric kernel [2, 3],
the rational quadratic kernel,
piecewise exponential kernel [1], and
more can be found in [1].
All of these kernels are widely used in the literature (see [1]). Thank you for pointing this out; we hope these examples make our claim clearer.
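For concreteness, the sketch below spells out standard textbook forms of a few of these kernels (the displayed formulas are our assumption of the usual definitions, not a quotation of Definition B.1) and numerically checks the monotone-decay property on a grid:

```python
import math

# Standard textbook forms (our notational assumption); r = ||x - y||.
kernels = {
    "exponential":          lambda r: math.exp(-r),
    "inverse_polynomial":   lambda r, c=2.0: 1.0 / (1.0 + r) ** c,
    "inverse_multiquadric": lambda r, c=1.0: 1.0 / math.sqrt(r * r + c * c),
    "rational_quadratic":   lambda r, c=1.0: (1.0 + r * r) ** (-c),
}

def is_non_increasing(K, radii):
    """Check the monotone-decay property on a grid of distances."""
    vals = [K(r) for r in radii]
    return all(a >= b for a, b in zip(vals, vals[1:]))

radii = [0.1 * i for i in range(200)]   # distances 0.0 .. 19.9
```

Each of these functions depends only on the distance and shrinks as the distance grows, which is the qualitative behavior the definition requires.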
Regarding the second weakness, the kernel bandwidth can be set using standard rules such as the median heuristic or cross-validation. The box side length controls the tradeoff between computational cost and accuracy; our recommended setting provides a good balance, and the error bound (see Eq. (1) in Lines 243-244) scales with a parameter that controls the number of neighboring boxes considered. The truncation parameter is set to achieve the desired accuracy (see Lemma E.6) and can be adjusted dynamically based on observed errors. Thank you for pointing out this issue! We hope that our response addresses your concern.
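As a one-dimensional illustration of these rules (a sketch only: the sample size, helper names, and the constant inside the truncation order are our choices, not the paper's):

```python
import math
import random

def median_heuristic_bandwidth(points, sample=200, seed=0):
    """Median heuristic (a common rule of thumb, not prescribed by the
    paper): set the Gaussian bandwidth to the median squared pairwise
    distance, estimated from a random subsample so the rule stays cheap
    on large point sets."""
    rng = random.Random(seed)
    pts = points if len(points) <= sample else rng.sample(points, sample)
    d2 = sorted((a - b) ** 2 for i, a in enumerate(pts) for b in pts[i + 1:])
    return d2[len(d2) // 2]

def truncation_order(eps):
    """Truncation order p grows like log(1/eps) (cf. Lemma E.6); the
    constant factor here is illustrative."""
    return max(1, math.ceil(math.log(1.0 / eps)))
```

The logarithmic dependence on the target accuracy is what keeps the per-box expansion short even for small error tolerances.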
Regarding the third weakness, we regretfully do not have more experimental results at this point. However, we want to highlight that the works we build upon [1, 4] are purely theoretical. Our focus in this paper remains on developing the theoretical side of this line of work, extending the fast multipole method to a dynamic setting that supports data insertion and deletion and supports data lying in a low-dimensional subspace. We admit that the lack of experimental results is a weakness of this series of works, including ours. We will try to include more experimental results in the revised version, but we believe that fully closing this gap for this line of work will require more substantial effort in the future. Thank you for pointing this out.
We once again sincerely thank you for your insightful comments. For the first and second weaknesses, we have adjusted our manuscript accordingly to make the relevant points clearer; we uploaded a new version of our paper with our changes marked in red. For the third weakness, we agree that this line of work generally lacks experimental results and will try to include more experimental results in the final version of our paper to fill this gap.
[1] Josh Alman, Timothy Chu, Aaron Schild, and Zhao Song. "Algorithms and hardness for linear algebra on geometric graphs." FOCS’20.
[2] Charles A Micchelli. “Interpolation of scattered data: distance matrices and conditionally positive definite functions.” Springer Netherlands, 1984.
[3] Per-Gunnar Martinsson. Encyclopedia entry on fast multipole methods. In University of Colorado at Boulder, 2012.
[4] Yeshwanth Cherapanamjeri, and Jelani Nelson. "Uniform approximations for randomized hadamard transforms with applications." STOC’22.
This paper presents a dynamic data structure for maintaining the Fast Gaussian Transform (FGT) under insertions and deletions of source points. The main contribution is an algorithm that supports polylogarithmic-time updates and kernel density estimation queries when the data points lie in a low-dimensional subspace while retaining quasi-linear time for matrix-vector multiplication. The key technical innovation is a hybrid approach combining Taylor and Hermite expansions to "disentangle" source and target points, enabling efficient dynamic updates.
Strengths
- This paper has a strong theoretical contribution by solving an open problem of dynamizing the FGT algorithm, which has applications in machine learning and data analysis.
- This paper includes a novel technical approach combining Taylor and Hermite expansions to enable efficient updates, with careful error analysis and proofs.
- This paper improves complexity bounds when data lies in low-dimensional subspaces, addressing the curse of dimensionality issue in previous FGT work.
Weaknesses
- This paper has limited experimental evaluation: it focuses primarily on theoretical analysis without empirical validation on real datasets.
- The assumption that data points lie in a low-dimensional subspace may not hold in practice for many applications.
- The paper has limited discussion of numerical stability issues that may arise in practice when computing Taylor/Hermite expansions.
Questions
Please refer to the weakness.
Details of Ethics Concerns
N/A
We express our deepest gratitude to the reviewer for the time and effort in reviewing our work. It is very encouraging for us to see the evaluation of our work as having a “strong theoretical contribution by solving an open problem of dynamizing the FGT algorithm”, being applicable to other machine learning and data analysis fields, and possessing novel theoretical approaches. Below, we want to address your concerns about our work.
Thank you for raising important points about numerical stability and experimental evaluation. We regretfully do not have more experimental results at this point. However, we want to highlight that the works we build upon [1, 2] are purely theoretical. Our focus in this paper remains on developing the theoretical side of this line of work, extending the fast multipole method to a dynamic setting that supports data insertion and deletion and supports data lying in a low-dimensional subspace. We admit that the lack of experimental results is a weakness of this series of works, including ours. We will try to include more experimental results in the revised version, but we believe that fully closing this gap for this line of work will require more substantial effort in the future.
Regarding the assumption that data points lie in a low-dimensional subspace, we first thank you for pointing this out. We agree that many datasets are high-dimensional, especially in the big-data era. However, many modern deep learning architectures, used in various fields, explicitly project high-dimensional data into lower-dimensional spaces, which may alleviate this assumption. Dimension reduction involves two main strategies: feature selection, which selects a subset of the most relevant features from the original dataset to lower dimensionality while preserving interpretability; and feature extraction, which creates new, lower-dimensional features from the original data to encapsulate key patterns and relationships. For example,
Autoencoders consist of an encoder and a decoder, where the encoder maps high-dimensional data points to a low-dimensional representation and the decoder can reconstruct the original high-dimensional data [3].
Principal component analysis projects data into a lower-dimensional space by maximizing variance along new axes [4].
There are numerous other dimension reduction techniques that can be used, and we believe these methods make our approach more widely applicable.
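As a minimal illustration of the feature-extraction route (our sketch, not part of the paper), one can project the data onto its top-k principal subspace before inserting it into the dynamic data structure, so the low-dimensional assumption holds by construction:

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto their top-k principal subspace and
    return the k-dimensional PCA coordinates (feature extraction)."""
    mu = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T

# Synthetic check: points that genuinely lie in a 2-D subspace of R^50.
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))   # latent 2-D coordinates
B = rng.standard_normal((2, 50))    # linear embedding into R^50
X = Z @ B
Y = pca_project(X, k=2)             # low-dimensional inputs for the FGT
```

When the data truly lies in a k-dimensional subspace, this projection loses nothing; otherwise it gives the best rank-k approximation in the least-squares sense.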
We thank you again for your insightful comments. We agree that this line of work generally lacks experimental results and will try to include more experimental results in the final version of our paper to fill this gap. Regarding the subspace assumption, we hope that the use of dimension reduction techniques addresses your concern.
[1] Yeshwanth Cherapanamjeri, and Jelani Nelson. "Uniform approximations for randomized hadamard transforms with applications." STOC’22.
[2] Josh Alman, Timothy Chu, Aaron Schild, and Zhao Song. "Algorithms and hardness for linear algebra on geometric graphs." FOCS’20.
[3] Kamal Berahmand, Fatemeh Daneshfar, Elaheh Sadat Salehi, Yuefeng Li, and Yue Xu. "Autoencoders and their applications in machine learning: a survey." Artificial Intelligence Review 57, no. 2 (2024): 28.
[4] Feiping Nie, Jianjun Yuan, and Heng Huang. "Optimal mean robust principal component analysis." ICML’14.
Thank you for your considerate reply to my feedback. Your intention to include experimental findings and deeper analysis in the next iteration is encouraging. I look forward to these enhancements, as they are sure to bolster the quality of the work. I prefer to keep my current evaluation until the experimental results are included.
The paper addresses the problem of kernel computation in subquadratic time via the fast Gaussian transform for matrix-vector computation. The proposed method incorporates dynamization: data sources can be added and deleted, and the dimension of the query points can also vary. Despite the interesting setting, based on the discussion, the novelty of the paper is too limited for a top conference such as ICLR.
Additional Comments on Reviewer Discussion
The reviewers' concerns are partially addressed.
Reject