PaperHub
5.4
/10
Poster5 位审稿人
最低3最高8标准差1.6
5
3
6
8
5
3.2
置信度
正确性3.0
贡献度2.4
表达2.8
ICLR 2025

Singular Subspace Perturbation Bounds via Rectangular Random Matrix Diffusions

OpenReviewPDF
提交: 2024-09-27更新: 2025-03-25

摘要

Given a matrix $A \in \mathbb{R}^{m\times d}$ with singular values $\sigma_1\geq \cdots \geq \sigma_d$, and a random matrix $G \in \mathbb{R}^{m\times d}$ with iid $N(0,T)$ entries for some $T>0$, we derive new bounds on the Frobenius distance between subspaces spanned by the top-$k$ (right) singular vectors of $A$ and $A+G$. This problem arises in numerous applications in statistics where a data matrix may be corrupted by Gaussian noise, and in the analysis of the Gaussian mechanism in differential privacy, where Gaussian noise is added to data to preserve private information. We show that, for matrices $A$ where the gaps in the top-$k$ singular values are roughly $\Omega(\sigma_k-\sigma_{k+1})$ the expected Frobenius distance between the subspaces is $\tilde{O}(\frac{\sqrt{d}}{\sigma_k-\sigma_{k+1}} \times \sqrt{T})$, improving on previous bounds by a factor of $\frac{\sqrt{m}}{\sqrt{d}}$. To obtain our bounds we view the perturbation to the singular vectors as a diffusion process-- the Dyson-Bessel process-- and use tools from stochastic calculus to track the evolution of the subspace spanned by the top-$k$ singular vectors, which may be of independent interest.
关键词
Matrix Perturbation BoundsMatrix DiffusionsLow-rank ApproximationDifferential Privacy

评审与讨论

审稿意见
5

The paper studies a very interesting problem, namely that of bounding the Frobenius norm of the subspaces spanned by the top-k singular vectors of any fixed matrix A and A+G. The paper is well structured and reads well, but has very few thematic overlaps with the ICLR.

I am rejecting this paper for two reasons. First, the authors used the wrong template with significantly wider margins than the standard ICLR template. While this is nitpicking, choosing the right template comes with no overhead but is critical to maintain fair standards on the length of the paper among all submissions.

The second reason is that the paper targets a problem in random matrix theory, which would fit well to a conference like ALT or COLT or a mathematical journal. In order to submit this paper to ICLR, I would expect at least one longer section highlighting the connections to machine learning and demonstrate how these bounds help us to arrive at non-trivial novel insights directly relevant for machine learning.

优点

I find the topic very interesting. Proving these results is non-trivial and insightful.

缺点

问题

伦理问题详情

审稿意见
3

This paper considers estimation of the right singular subspace of a matrix ARm×dA\in\mathbb{R}^{m\times d} when the matrix is perturbed by a Gaussian random matrix. The authors focused on the case d>md>m and derived a perturbation Frobenius bound independent of mm.

优点

  • The paper is easy to read.
  • The problem considered is important and has wide applications to many machine learning problems.

缺点

The main weakness of this paper is that the question posed in the Introduction has already been addressed in the existing literature. The prior works have already proposed minimax optimal subspace estimators that attain tighter bounds than the one derived in this paper. Moreover, the required signal-to-noise condition in this paper is also more stringent than those in the prior works.

In [1,2,3], the authors proposed left singular subspace estimators of a matrix ARd1×d2A\in\mathbb{R}^{d_1\times d_2}, which achieve the statistically optimal estimation bounds (in both spectral norm and (2,)({2,\infty}) bound) that only depend on d1d_1 when the noise variance σ2\sigma^2 is small. In addition, it has been shown that when σ2\sigma^2 is large, it is information-theoretically impossible to obtain an estimation bound that is independent of d2d_2.

By transposing the target matrix and converting an (2,)({2,\infty}) bound to a Frobiuns bound (multiplying by d\sqrt{d}), the estimation guarantee for the right singular subspace in terms of the Frobenius norm follows directly from these prior results. Further, the signal-to-noise ratio condition in this paper T/(σ1σi+1)=O~(1/m)\sqrt{T}/(\sigma_1-\sigma_{i+1}) = \tilde{O}(1/\sqrt{m}) (Assumption 2.1) is much stringent than the conditions required in these existing works (T/(σ1σi+1)=O~(1/d1/(md)1/4)\sqrt{T}/(\sigma_1-\sigma_{i+1}) = \tilde{O}(1/\sqrt{d} \wedge 1/(md)^{1/4})). Moreover, the estimation guarantees in the prior work are established in high probability while this paper only establishes it in expectation.

In short, under less stringent signal-to-noise ratio conditions, prior works have already achieved statistically better bounds (which are indeed minimax optimal up to log factors) for the singular subspace than the bound obtained in this paper.

Therefore, the intellectual contribution of this paper is limited. Given the estimation error in the existing works is already minimax optimal and the required SNR condition therein is shown to be necessary to ensure consistent estimation, one approach to improve the contribution might be removing the log factors in the error bounds.

[1] "Subspace Estimation from Unbalanced and Incomplete Data Matrices: 2,\ell_{2,\infty} Statistical Guarantees," Annals of Statistics, 944-967, 2021 [2] "Inference for heteroskedastic PCA with missing data," Annals of Statistics, 729-756, 2024 [3] "Deflated HeteroPCA: Overcoming the Curse of Ill-Conditioning in Heteroskedastic PCA," Annals of Statistics, to appear

问题

N/A

审稿意见
6

This paper studies the Frobenius norm distance between the projection matrix to the top rank-k row space of A and the projection matrix to the top rank-k row space of A + G where G is a random Gaussian matrix and A is a tall n x d matrix, i.e., n >> d. The previous result by O’Rourke et al. (2023) shows that if one considers both distances between the projection matrices corresponding to column space and row space, then they can get a tight result. In this work, authors gave an improved bound when considering the projection matrices to row spaces only.

优点

  1. The main proof technique requires a clever combination of many building blocks: They firstly regard the added Gaussian noise G as a Brownian motion. This Brownian motion induces a stochastic diffusion process on the singular values and singular vectors. The evolution of these eigenvalues and eigenvectors is determined by a system of stochastic differential equations. Then, one can apply Ito’s lemma from stochastic calculus to track and analyze the Frobenius norm distances between the top-k projection matrices. It requires solid work to make the entire theoretical analysis work.

  2. The problem is very fundamental, and the improvement of the bound is by a factor of sqrt(n / d), when n >> d, this improvement is very significant.

缺点

  1. As mentioned in the conclusion, the result needs to assume a reasonable top-k singular value gap of A.

  2. A bigger concern would be the interest of the audience of the venue. Although there are some potential applications of the derived bound for handling data corruption / noise, or differential privacy (Gaussian mechanism), there is no concrete application discussed in the paper.

问题

In my opinion, the proposed bound may be used to improve the tradeoffs between privacy and accuracy of some differentially private algorithms in numerical linear algebra. Can authors show some with provable guarantees? Or can authors give any concrete applications of the improved bound in the machine learning literature?

审稿意见
8

This paper presents new bounds on the Frobenius norm of the perturbation to the singular subspace spanned by the top-k right singular vectors of a matrix ARm×dA \in \mathbb{R}^{m \times d}, when AA is perturbed by a matrix GRm×dG \in \mathbb{R}^{m \times d} with Gaussian N(0,T)\mathcal{N}(0,T) random entries. More specifically, assuming AA has a top-k singular value gap of at least Ω(mT)\Omega(\sqrt{mT}) (i.e. σiσi+1Ω(mT)\sigma_i-\sigma_{i+1} \geq \Omega(\sqrt{mT}) for all i[k]i \in [k]), the paper bounds the expected Frobenius norm distance between the projection matrices on the top kk right singular subspaces of AA and A+GA+G by roughly O(kdTσkσk+1)O( \frac{ \sqrt{kdT} }{ \sigma_k - \sigma_{k+1} }) where σ1σd0\sigma_1 \geq \ldots \geq \sigma_d \geq 0 are the singular values of AA. When σiσi+1Ω(σkσk+1) \sigma_i - \sigma_{i+1} \geq \Omega( \sigma_k -\sigma_{k+1} ) for i[k]i \in [k] is also true, the bound is O(dTσkσk+1)O(\frac{ \sqrt{dT} }{ \sigma_k - \sigma_{k+1} }) which improves upon previously known high probability bounds on the actual Frobenius norm distance by md\frac{\sqrt{m}}{\sqrt{d}}. The paper also improves previously known bounds for the rank-k covariance matrix approximation by a factor of md\frac{\sqrt{m}}{\sqrt{d}}.

To derive the bounds, the paper views the addition of a gaussian noise matrix to AA as a Brownian motion on the entries of a matrix Φ(t):=A+B(t)\Phi(t) := A+B(t) where B(t)B(t) is a matrix whose entries undergo standard Brownian motion. This Brownian motion induces a stochastic diffusion process, known as the Dyson-Bessel process on the singular values and singular vectors of Φ(t)\Phi(t). The evolution of the eigenvalues and eigenvectors of of this matrix is determined by a system of stochastic differential equations. Using techniques from stochastic calculus (Ito's lemma), the evolution of the Frobenius distance can be tracked as a stochastic integral of a sum-of-squares of perturbations to the right singular vectors of Φ(t)\Phi(t). Overall, higher order matrix derivatives which appear in the Taylor series expansion of deterministic perturbations vanish in the stochastic case due to the independence of random noise in the Brownian motion and this gives the stronger bounds.

优点

Though someone with a more thorough grasp of stochastic calculus would be much better suited to comment on the merit of the exact proof technique used in the paper, I found the overall technique of viewing the noise addition as Brownian motion and then using stochastic differential equations to track the eigenspace evolution quite novel.

The bounds on the Frobenius norm distance on the right singular subspace presented have an improved dependence of d\sqrt{d} instead of m\sqrt{m} and in many applications where we have m>>dm>>d, I think the bounds could be useful (when there are large enough singular value gaps).

That being said, I have a few concerns about the paper which I list below.

缺点

Though this is not a weakness per se, I'm not sure how suited this paper is for ICLR which is primarily an applied ML venue. I feel at least a core theoretical venue like COLT or maybe a mathematics journal is better suited for this kind of work. That being said, if the authors are able to show a concrete ML related application of the utility of their bounds, that would make the paper a stronger contender for publication at ICLR and also improve the overall quality of the paper. For example, as mentioned in the paper, in differential privacy, gaussian noise is added to the data matrix to preserve privacy (Dwork et al 2014). Can the authors try to use their bounds to see if they can get any improved bounds in this case?

Also, the bounds in previous papers like O'Rourke et al. 2023 are high probability bounds while here, the bounds are just on the expected Forbenius norm. How difficult would it be to use the same techniques to get high probability bounds?

I also have a couple of questions about comparison to previous work (specially to O'Rourke et al. 2023) which I list below in the questions section.

问题

I'm confused about the claim in the paper that Theorem 7 of O'Rourke et al. 2023 achieves a bound of O(mk/σk)O(\sqrt{mk}/\sigma_k). It seems that in Theorem 7 they assume that σiσi+1r2\sigma_i - \sigma_{i+1} \geq r^2 for i[k] i \in [k] (where rr is the rank of A). I maybe misunderstanding something but it seems that eq 8 of Theorem 7 then gives us an upper bound of kr+E2σkkr+mσk\frac{\sqrt{k}}{r}+ \frac{|| E ||_2}{\sigma_k} \leq \frac{\sqrt{k}}{r} +\frac{\sqrt{m}}{\sigma_k} ? Can the authors please clarify this?

Can the authors also discuss the assumptions used in O'Rourke et al. 2023 or other previous works to derive the bounds and how it compares to their own on the singular values gaps i.e. σiσi+1Ω(σkσk+1) \sigma_i - \sigma_{i+1} \geq \Omega( \sigma_k -\sigma_{k+1} )? This would be helpful in doing a more fair comparison of the bounds.

Though this is a very interesting and can definitely be published at ICLR, I would request other reviewers and the area chair to the issues I have raised above. I'm giving a marginal accept. Assuming the points I have raised above are resolved, I will be happy to give a stronger accept.

审稿意见
5

The paper provides a new upper-bound for the right singular vectors of a perturbed matrix by Gaussian noise. The authors show that under a set of extra assumptions compared to the literature, we are able to further reduce the bound for tall matrices. The authors used the same technique as Manghoubi et al. 2022 for using Dyson Brownian motions to analyze the singular values and vectors of the perturbed matrix.

优点

The paper is concise and with a well-defined objective of finding the upper-bound on the error of singular vectors of perturbed matrices. The strength of the paper is therefore on the theoretical results.

缺点

Although I enjoyed reading the manuscript, I should mentioned as an ICLR submission, it is strangely formed. It is a very short introduction on the problem and the objective of the paper is not well-motivated. Furthermore, although abstractly mentioned that the results can be used in differentially private subspace reconstruction, there is neither further analysis on that (which is fairly easy to do) nor any experiments, including experiments that bridge the work to the machine learning / privacy area.

Furthermore, reading the paper and the abstract, it appears that the manuscript is written in a rush and with a variety of notational issues, brevity of explanations, etc. I mention such issues in Questions section.

Finally, although the manuscript uses a similar approach as Manghoubi et al. 2022, it appears that they achieve a tighter bounds. This would ask for further missing explanation on what is the novelty of this paper compared to the literature and how is that making this new bound possible. In the next step, the authors are expected to show that how this reduction of the bound is not an incremental result and how it improves the methods that need such upper-bounds (e.g., DP-subspace-recovery) in real applications.

问题

1- Could you please motivate the theoretical result with some examples, even simply using toy datasets on differential privacy?

2- Could you run a set of experiments on some datasets or even synthetic matrices that show the upper-bound and compares it to the real error?

2- Could you further motivate why tall matrices are of interest in this particular setting?


Technical Comments:

1- In Line 144, the authors referred to Appendix J of. Manghoubi et al. to justify the assumption of exponential decay. I have checked and their manuscript has no Appendix J. Could you clearly mention their result in the manuscript and correctly refer to their work?

2- Line 191 has no period in the end.

3- In Line 204, should it be γi=0\gamma_i=0 for i>ki>k?

4- In scientific writing, when we use a certain theorem, equation, etc. we refer to them as first name and last name and therefore capitalize "Theorem" and "Equation". Could you further edit this on the manuscript in Line 292, 343, 355, etc.

5- The factor 64 and 32 in Eq. (14) is missing in Eq. (12).

6- In Line 393, the authors write "Noting that the second term on the right-hand side of (12) is at least as small as the first term". Could you explain why this is the case?

7- In Line 595 please explain why dvidviT\mathrm{d} v_i\mathrm{d} v_i^T is neglected.

8- Line 649 is not clear to me and the identities mentioned in 658-661 is not helping either. Please rewrite this section of the proof and explain why these identites and inequalities hold.

9- The proof of Lemma A.3, although seems very rudimentary and practical for the rest of the proofs, it is not clearly written. If we assume σi(t)σiσi+G2|\sigma_i(t)-\sigma_i|\leq \sigma_i + \|G\|_2, then the triangle inequality concludes σi(t)σj(t)=σi(t)σi+σjσj(t)+σiσjσiσjσi(t)σi+σjσj(t)+σiσiσjσiσj4mlog1/δ|\sigma_i(t)-\sigma_j(t)|=|\sigma_i(t)-\sigma_i +\sigma_j-\sigma_j(t) +\sigma_i - \sigma_j|\geq |\sigma_i - \sigma_j| - |\sigma_i(t)-\sigma_i +\sigma_j-\sigma_j(t) +\sigma_i | \geq |\sigma_i - \sigma_j| - \sigma_i-\sigma_j-4\sqrt{m}\log 1/\delta, which is not what the authors claim. From what I understand the singular values of a perturbed matrix has the error bounded by the norm of that matrix. Therefore, σi(t)σiG2|\sigma_i(t)-\sigma_i|\leq \|G\|_2 instead of σi(t)σiσi+G2|\sigma_i(t)-\sigma_i|\leq \sigma_i + \|G\|_2? Also, how does TT not appear in the inequality?

10- In Line 698, factor 2 is not needed since σi(t)0\sigma_i(t)\geq 0 and a2+b2a+b\sqrt{a^2+b^2}\leq a+b?

11- In 721 and 725, why there is factor 3? Frobenius is a norm and therefore has triangle inequality.

12- In Line 752, please either stick with the notation AA^T or instead use \langle\rangle instead of <><>.

13- From where does factor 2 in 772 appears?

14- In 775, should we have 4dt4\mathrm{d}t?

AC 元评审

This paper provides new matrix perturbation bounds for the top singular vectors of a matrix A under Gaussian perturbations. The authors improve prior results in the setting when the matrix is much taller than it is wide (a common setting in data applications). Most of the reviewers (besides one outlier) felt that the work was interesting and addresses a fundamental question. There was a broad concern about fit to ICLR given the lack of applications -- the work might find a better home in a theory-focused conference. To their credit, the authors added a section during the rebuttal phase on an application to differential privacy (basically a direct corollary of their result is they can bound the accuracy of the top subspace of a data matrix perturbed using the common Gaussian mechanism. Nevertheless, on balance, this paper seems like a nice contribution to include in ICLR.

审稿人讨论附加意见

During the rebuttal phase, the authors responded to some seeming misunderstandings by Reviewer Bu3u. While the reviewer did not follow-up on the final response, due to the authors response, I am discounting the low score of this reviewer in my decision, which I believe is based on an incorrect belief on how the authors results relate to prior work.

最终决定

Accept (Poster)