Outlier-Robust Phase Retrieval in Nearly-Linear Time
We provide a robust algorithm for the phase retrieval problem when a small fraction of the input is adversarially corrupted.
摘要
评审与讨论
This paper presents a computationally efficient, nearly linear-time algorithm for solving the real-valued robust phase recovery problem, which aims to reconstruct a signal from phaseless measurements corrupted by adversarial outliers. The proposed method consists of two key stages: (1) a robust spectral initialization via a lightweight convex program to obtain a rough estimate near the ground truth, followed by (2) a refinement step using a robust gradient descent variant to achieve exact recovery within specified accuracy. The authors demonstrate that their approach is both near-sample-optimal and computationally efficient, with potential applicability to other non-convex optimization problems. While the paper provides theoretical analysis of the algorithm's superiority, some concerns were raised about insufficient mathematical rigor (e.g., unproven lemmas) and lack of numerical simulations, casting doubt on whether it currently meets NeurIPS publication standards. The framework is noted for its conceptual simplicity and generalizability, but may require additional theoretical and empirical validation.
优缺点分析
Strengths: Compared with existing works, the main innovations of this article are as follows:
- Phase retrieval is a well-studied non-convex problem, and analyzing the performance of robust algorithms on it remains crucial.
- This article proposes a new low-complexity algorithm for robust phase recovery through robust spectral initialization and robust gradient descent.
- This article demonstrates that the proposed algorithm can approximately solve the robust phase recovery problem within a linear time of the input size, under any given accuracy condition.
Weaknesses:
- The paper does not explicitly point out any difficulties in terms of theory or technology that the theoretical analysis and algorithm derivation of the proposed algorithm compared with the existing works.
- This paper lacks numerical simulation experiments, which prevents it from demonstrating the superiority of the proposed algorithm compared to other algorithms.
- The descriptions of some lemmas in this paper are rather vague, for example: 3.1. The description of the parameter in Lemma 3.1 as "sufficiently small" 3.2. The description of the parameter in Lemma 4.1 is insufficient 3.3. The constant corresponding to in Lemma 5.1 is not given
- The proofs of some theorems in this article are not rigorous, for example: 4.1. In the proof of Lemma 3.1, on line 254 of the formula, the subtraction of two terms results in , which is incorrect 4.2. In the proof of Lemma 5.2, on line 300, the second inequality selects , which may not satisfy the condition
问题
- Why not conduct numerical simulation to compare with the existing algorithms and verify the impact of different attack strategies on the proposed algorithm?
- Why does the article extensively use Big O notation instead of accurately writing out the specific constants?
- Why is the step size of gradient descent in Algorithm 2 set to ?
- In line 127, "the corrupted samples can only add directions to Yw but cannot remove directions." What is the meaning of "direction"? Why do corrupted samples add directions to Y_{w}?
- Why do you compute w that minimizes the sum of the largest "two" eigenvalues of Y_{w}?
局限性
This article is not only not clearly expressed in theory, but also lacks numerical simulation experiments.
最终评判理由
I have carefully reviewed the authors’ replies, and I would like to retain my original score.
格式问题
No
We thank the reviewer for their time spent reading our work.
Reviewer: The paper does not explicitly point out any difficulties in terms of theory or technology that the theoretical analysis and algorithm derivation of the proposed algorithm compared with the existing works.
Authors: A key technical contribution of our paper is the robust spectral initialization step, which differs significantly from prior work. Specifically, we design a nonnegative reweighting scheme that ensures the empirical covariance matrix is robust to outliers without explicitly computing the covariance matrix, which would require matrix multiplication time . Instead, we introduce a new Ky Fan norm formulation and show that good weights can be computed in nearly-linear time . Note that concurrent work [BR25] (as discussed in Lines 69–71) performs robust spectral initialization in a different way but is much slower than linear time.
Reviewer: This paper lacks numerical simulation experiments, which prevents it from demonstrating the superiority of the proposed algorithm compared to other algorithms.
Authors: We believe our results are substantial even without experiments, and we hope that our theoretical results are judged based on their merits.
Reviewer: The descriptions of some lemmas in this paper are rather vague, for example: 3.1. The description of the parameter in Lemma 3.1 as "sufficiently small" 3.2. The description of the parameter in Lemma 4.1 is insufficient 3.3. The constant corresponding to in Lemma 5.1 is not given.
Authors: We respectfully disagree. Our theorems, lemmas, and proofs are rigorous. A key point is that many constants appear throughout our analysis, and explicitly tracking each of them would introduce notational overhead without providing additional insight. Omitting explicit constants is standard practice in the theory community. As a concrete example, in Appendix B where we prove Lemma 5.1, the constant can be written explicitly as . While it is possible to include this constant in Lemma 5.1 and substitute it into other parts of our proofs, doing so would make the presentation unnecessarily cumbersome without affecting the correctness of our results.
Reviewer: The proofs of some theorems in this article are not rigorous (4.1 - Example in line 254; 4-2, Proof of Lemma 5.2)
Authors: Thank you for pointing this out. The statement can indeed be made more precise here. In Lemma 4.1, we should have stated the inequalities of Line 248 using a value u: e.g., the top eigenvalue is between and for some , and complete the rest of the proof using .
We should have stated in Line 254 that . We will revise the paper accordingly.
Reviewer: Why is the step size of GD in algorithm 2 set to 1/300?
Authors: Our choice of step size is close to the best-possible constant to minimize the expression in Line 654 when .
Reviewer: In line 127, "the corrupted samples can only add directions to Yw but cannot remove directions." What is the meaning of "direction"? Why do corrupted samples add directions to Y_{w}?
Authors: This was explained in Lines 122-126. When there is no corruption, we have , which is a matrix with one eigenvalue (in the direction of ) and all other eigenvalues . Corrupted samples may perturb this spectrum, but since each term is a positive semidefinite matrix, they cannot reduce variance in any direction (that is, the corrupted samples cannot decrease for any direction ). We will clarify this.
Reviewer: Why do you compute w that minimizes the sum of the largest "two" eigenvalues of Y_{w}?
Authors: Suppose the adversarial corrupts to become for some arbitrary irrelevant direction . The largest eigenvalue of is close to , which is the same as what we expect when there is no corruption, but the top eigenvector of is not a good initialization for converging to the ground truth .
Thank you for the response. I would like to retain my original score.
This paper studies phase retrieval under the strong contamination model, where an adversary can arbitrarily replace an -fraction of measurements and the corresponding measurement vectors . It proposes suitable modifications to the standard 2-step approach (spectral initialization followed by gradient descent) that lead to an outlier-robust algorithm with near-optimal runtime and sample complexity , where is the target estimation accuracy. The first stage -- robust spectral initilization -- reduces to finding per-sample weights such that
by means of solving a generalized packing SDP proposed in prior work. The latter stage -- robust gradient descent -- leverages the fact that it is sufficient to obtain an estimate of the gradient, , satisfying the aiming inequality
in a neighborhood of the solution set.
优缺点分析
Strengths:
- The overall algorithm follows the standard 2-step scheme and is simple to state (although the solver for the generalized packing SDP might be complicated).
- The runtime is near-optimal and the sample complexity is linear in for moderate estimation accuracies .
Weaknesses:
- Real gaussian measurement vectors are somewhat unrealistic in phase retrieval.
- The algorithm requires a fresh batch of samples in each iteration.
- I am not sure if the strong contamination model is the right way to study corruptions to the measurement vectors (other than, of course, adversarial tampering). For example, neither hardware noise nor miscalibration seem to require an adversary that is allowed to inspect all inputs before deciding which ones to corrupt (and how to corrupt them).
问题
My main questions are the following:
- Appendix C shows that prior "robust" approaches -- focusing on corruption solely in the measurement space -- fail when measurement vectors can be corrupted. However, my understanding is that the argument focuses on the failure of spectral initialization. Is it possible that, given a sufficiently good initial estimate, prior methods (such as optimizing the robust loss a-la Duchi-Ruan) succeed in recovering ? In my opinion, a formal argument ruling this out would considerably strengthen this paper.
- Is it possible to remove the sample-splitting requirement by following an approach similar to the SEVER algorithm (https://proceedings.mlr.press/v97/diakonikolas19a.html), wherein samples are used to certify first-order stationarity and discarded iteratively based on outlier scores? See, also, the recent work by Gao et al. (https://arxiv.org/abs/2412.11003).
Other minor questions:
- What is the relationship between in Lemma 4.1 and the upper bound on the contamination level?
- How much of the overall analysis depends on Gaussianity? I suspect the robust gradient descent analysis can be easily extended.
- Can you expand on how you anticipate attacking the sample-splitting requirement? I suspect that verifying stability conditions without fresh samples is very difficult in general settings.
- Is there a weaker notion of adversary suitable for some of the corruptions you describe in the introduction (hardware noise + miscalibration)?
局限性
Yes
最终评判理由
I did not have any major reservations about this paper, so I am maintaining my original score.
格式问题
None
We thank the reviewer for taking the time to read our work and provide their feedback.
Reviewer: Is it possible to remove the sample-splitting requirement by following an approach similar to the SEVER algorithm (https://proceedings.mlr.press/v97/diakonikolas19a.html), Can you expand on how you anticipate attacking the sample-splitting requirement? I suspect that verifying stability conditions without fresh samples is very difficult in general settings.
Authors: This is a good observation. We believe that a covering argument on the stability condition would allow us to remove the sample-splitting (and thus the dependency on ). At a high level, proving an -net plus union bound argument for the stability conditions (i.e., stability of the mean and covariance of individual gradients) to hold uniformly for all is not straightforward. These conditions only hold when none of the 's are close to the current solution . Of course, if the 's are chosen after , this holds with high-probability, and this is why we use a fresh set of samples in each iteration. One possible idea is to consider an -net argument in regions far from all 's, and argue that gradient descent never gets close to any of the 's. We defer this to future work, as our main technical contribution is in the robust initialization step (via the Ky Fan SDP solver).
Reviewer: What is the relationship between in Lemma 4.1 and the upper bound on the contamination level?
Authors: Thank you for asking this. We need Lemma 4.1 to hold for our choice of (from Line 256), and the simplest way to do this is to require . We will clarify this.
Reviewer: Is there a weaker notion of adversary suitable for some of the corruptions you describe in the introduction (hardware noise + miscalibration)?
Authors: We welcome the study of other adversarial models. We want to emphasize that there is value in studying phase retrieval in the strong contamination model we consider in this paper. Our results show that even against such a strong adversary, phase retrieval can be solved with only logarithmic overhead in both sample and computational complexity.
Reviewer: Appendix C shows that prior "robust" approaches -- focusing on corruption solely in the measurement space -- fail when measurement vectors can be corrupted. However, my understanding is that the argument focuses on the failure of spectral initialization. Is it possible that, given a sufficiently good initial estimate, prior methods (such as optimizing the robust loss a-la Duchi-Ruan) succeed in recovering ? In my opinion, a formal argument ruling this out would considerably strengthen this paper.
Authors: We currently do not have such a formal impossibility result. We agree that proving this formally would strengthen the paper and view it as a promising direction for future work.
Reviewer: How much of the overall analysis depends on Gaussianity? I suspect the robust gradient descent analysis can be easily extended.
Authors: The conditions we require on the (clean) distribution of are: an upper bound on its tail (used in the proof of Lemma 4.2) and bounded first few moments (used in the proof of Lemma 5.1). Therefore, our results directly extend to isotropic sub-Gaussian distributions and other distributions with these properties.
Thank you very much for your response. I encourage you to consider alternative contamination models and/or extend your impossibility result, if possible. I will maintain my original score.
This article proposes a two-stage algorithm for the real-valued phase retrieval problem, with a small fraction of measurement pair corrupted. The algorithm first estimates a high-quality initial point through a spectral-based convex optimization that adaptively weights corrupted sample matrices to approximate the ideal matrix. This initialization is then refined through corruption-resistant gradient descent using robust mean estimation. The paper claims its near-optimal sample complexity and near-linear runtime with theoretical guarantees.
优缺点分析
Strength:
-
Paper is well-written and provides a clear picture of the setting of the problem
-
Phase retrieval is a non-convex problem with important applications, studying its robustness certainly is an important
-
the algorithm is weill supported by the theoretical analysis.
Weakness
-
I do not believe the problem setting studied in this paper has genuine practical value. In phase retrieval, if noise is considered, then all pairs will be corrupted. If outliers are considered, they would correspond to measurements with very large noise levels. The setting proposed in the paper does not make sense in real-world scenarios.
-
The analysis appears to be more incremental in nature compared to prior theoretical works on phase retrieval. Specifically, the main novelty of the spectral initialization step lies in assigning a nonnegative weight to each sample. For the gradient descent step, the problem is essentially reduced to the analysis of robust mean estimation algorithms.
-
Although the authors claim it is a theoretical paper, the work essentially proposes an algorithm with theoretical justification. The technique in the theoretical derivations themselves have no novelty. Since the primary objective of the paper is to develop a robust algorithm, it is quite surprising that no experimental evaluation is provided to assess its performance or efficiency. For an algorithm study paper, the absence of empirical validation is unacceptable.
This paper is a resubmission of the work originally submitted to NeurIPS 2024. One of the main concerns previously raised was the lack of numerical experiments; however, this revision fails to address that issue entirely.
问题
- The main contribution claimed is the near-linear runtime. However, the analysis about lemma 4.1 and 5.2 shows that the running time depends on the reciprocal of epsilon and desired precision. Since they are rather small, it's unclear whether the method is more efficient over other alternatives. How its practical runtime compared with similar approaches (e.g., [3])?
局限性
No limitation is discussed.
最终评判理由
I remain unconvinced by the submission. The rebuttal was not informative and failed to meaningfully address the core concerns raised in the initial review. In particular, the key issues regarding the practical relevance of the proposed method and the lack of necessary experimental evidence remain unresolved. Moreover, they appear reluctant to acknowledge these limitations or outline concrete plans to address them in future work. This lack of engagement further undermines confidence in the contribution.
As such, the work falls short in both depth and rigor, and the central claims remain insufficiently supported. I therefore downgrade my rating to a rejection.
格式问题
NIL
We thank the reviewer for their time and thoughtful comments.
Reviewer: Although the authors claim it is a theoretical paper, the work essentially proposes an algorithm with theoretical justification. The techniques in the theoretical derivations themselves have no novelty.
Authors: We respectfully disagree. Our contribution is both algorithmic and theoretical, and the techniques we introduce are novel in the context of robust phase retrieval. In particular, our main result establishes that one can solve phase retrieval with arbitrary corruption on the pairs in nearly-linear time and with near-optimal sample complexity. Prior work typically considers simpler corruption models, such as adversarial perturbations limited to the intensity measurements . Handling corruptions in both presents new challenges, which we address through novel techniques. A key technical contribution of our paper is the robust spectral initialization step, which differs significantly from prior work. Specifically, we design a nonnegative reweighting scheme that ensures the empirical covariance matrix is robust to outliers without explicitly computing the covariance matrix, which would require matrix multiplication time . Instead, we introduce a new Ky Fan norm formulation and show that good weights can be computed in nearly-linear time . Note that concurrent work [BR25] (as discussed in Lines 69–71) performs robust spectral initialization in a different way but is much slower than linear time. To our knowledge, no prior work achieves this level of robustness and efficiency; we would welcome the reviewer pointing us to any such existing result.
Reviewer: The analysis appears to be more incremental in nature compared to prior theoretical works on phase retrieval. Specifically, the main novelty of the spectral initialization step lies in assigning a nonnegative weight to each sample.
Authors: The main novelty is that we can compute good weights in nearly-linear time. We invite the reviewer to point to prior work that achieves this when there is corruption in both and . As noted above, our spectral initialization is not merely a reweighting heuristic. It is backed by a rigorous analysis showing how to select these weights efficiently via Ky Fan norm minimization, and how this leads to accurate recovery despite arbitrary corruptions. This represents a departure from existing techniques and is not a straightforward extension of prior methods.
Reviewer: This paper is a resubmission of the work originally submitted to NeurIPS 2024. One of the main concerns previously raised was the lack of numerical experiments; however, this revision fails to address that issue entirely.
Authors: We believe our results are substantial even without experiments, and we hope that our theoretical results are judged based on their merits.
That said, the current version has significantly improved over the NeurIPS24 submission: we have clarified the setting, strengthened the presentation of the results, and expanded the discussion of the techniques.
Reviewer : I do not believe the problem setting studied in this paper has genuine practical value. In phase retrieval, if noise is considered, then all pairs will be corrupted. If outliers are considered, they would correspond to measurements with very large noise levels. The setting proposed in the paper does not make sense in real-world scenarios.
Authors : We respectfully disagree. We welcome the study of other adversarial models. We want to emphasize that there is value in studying phase retrieval in the strong contamination model we consider in this paper. Our results show that even against such a strong adversary, phase retrieval can be solved with only logarithmic overhead in both sample and computational complexity. Extending our work to the setting with noisy observations is an exciting avenue of future work, where our nearly linear-time robust initialization step will remain useful.
The rebuttal is not informative. The main concerns about practical relevance and the absence of experiments remain unaddressed. The authors offer assertions without providing substantial supporting evidence.
This work tackles the challenge of recovering real-valued signals in the phase retrieval problem under adversarial corruption in possibly both sampling vectors and intensity measurements. The proposed method operates in two stages:
- It first uses a convex optimization procedure to obtain a corruption-resilient initialization.
- Then refines the estimate through gradient descent combined with a robust mean estimation technique.
A key result of the analysis is that the convergence rate is independent of the corruption level.
优缺点分析
The authors address an interesting problem that is relevant to both the robust statistics and signal processing communities. However, the paper is quite dense and can be difficult to follow at times. Given the theoretical nature of the contributions, it is especially important to maintain clearer bookkeeping of the key variables involved in the theoretical guarantees.
The central claim of the paper is that the results are independent of the corruption level . For instance, in lines 93–99, the authors assert that is an absolute constant, independent of both and . However, this assertion does not appear to be fully justified. A closer examination of the statement of Lemma 2.3 suggests otherwise.
- It is evident that the statement of Lemma 2.3 does not hold with high probability when is very small, for example, when . This issue can be addressed by assuming , allowing one to replace with . However, this introduces a practical challenge, as the value of is not explicitly specified.
- Even setting aside practical considerations, for the statement of Lemma 2.3 to hold with high probability, it is necessary that . This implies that cannot be any absolute constant independent of . Similar issues arise elsewhere - for example, in Lemma 5.2, where the choice of must depend on for the Lemma statement to hold.
The presentation of the results would benefit from explicitly retaining all occurrences of (or ) in the theoretical statements. Doing so would provide better context and help clarify the dependence of the guarantees on the corruption level.
问题
Please see previous sections.
局限性
yes
最终评判理由
After reading the rebuttal and subsequent discussions, I have decided to maintain my score.
Unresolved concerns: I remain unconvinced regarding the dependence between and .
格式问题
We thank the reviewer for their time spent reading and providing feedback on our work.
The reviewer’s main concern is that “the central claim is that the results of this paper are independent of ” is incorrect. We respectfully disagree. Our results indeed hold as stated, and a careful reading of our proofs confirms this.
Below, we address the reviewer's specific doubts in detail.
Reviewer: “It is evident Lemma 2.3 does not hold with high probability when is very small”
Authors: This is true. However, in our algorithm, we only need to robustly estimate the gradient up to constant precision (e.g., in Lemma 5.3), so we never invoke Lemma 2.3 with very small .
Reviewer: For the statement of Lemma 2.3 to hold with high probability , it is necessary . This implies that cannot be any absolute constant independent of .
Authors: An absolute constant is .
Reviewer: Similar issues arise elsewhere - for example, in Lemma 5.2, where the choice of m must depend on epsilon_0 for the Lemma 5.2 statement to hold.
Authors: There is nothing wrong with depending on . We only need a constant approximation when estimating the gradient (e.g., in Lemma 5.3). This leads to dependence in both sample complexity and runtime, but when .
Our paper provides detailed mathematical proofs for all theorems and lemmas. If there are concerns about correctness, we would appreciate it if the reviewer could point out specific incorrect steps.
Thank you for your response. Let me clarify my concerns.
When , Lemma 2.3 is only valid with a probability at least 0 and has no utility. At other places, you need to be sufficiently small (Lemma 3.1, 3.2, etc.). Therefore, it is important to explicitly show the dependence of and to ensure that both conditions can be satisfied simultaneously.
In Lemma 5.2, the only place where Lemma 2.3 is actually used, we invoke Lemma 2.3 with . Therefore, the reviewer's concern about very small (e.g., ) is irrelevant to our analysis, as we never use Lemma 2.3 in that regime.
In other parts of the paper (e.g., Lemmas 3.1 and 3.2), we only need to be at most a universal constant, which is independent of . We kindly ask the reviewer to point out a specific step in our proofs where must be at least some function of . To the best of our knowledge, no such dependence is required.
Dear Reviewers and Authors,
Thank you all for your efforts so far. As the author–reviewer discussion period will conclude on August 6, please start the discussion as soon as possible.
For Reviewers: Please read the authors’ responses and, if necessary, continue the discussion with them.
-
If your concerns have been addressed, consider updating your review and score accordingly.
-
If some concerns remain, or if you share concerns raised by other reviewers, clearly state these in your review and consider adjusting your review (positively or negatively).
-
If you feel that your concerns have not been addressed, you may also choose to keep your review as is.
-
I will follow up with you again during the reviewer–AC discussion period (August 7–13) to finalize the reviews and scores.
For Authors: If you have not already done so, please respond to all questions raised by the reviewers. Keep your responses factual, concise, and ensure that every point raised is addressed.
Best regards,
The AC
This paper proposes a nearly linear-time algorithm for robust phase recovery with a novel combination of spectral initialization and robust gradient descent, supported by theoretical analysis. The problem is important and the approach is conceptually simple and potentially generalizable, but the work currently falls short of NeurIPS standards due to presentation clarity, vague or incomplete proofs, and a lack of numerical experiments of comparison to substantiate the theoretical claims. Without stronger validation and clearer exposition, the contribution remains promising but not yet ready for publication.