PaperHub
Overall Rating: 8.2 / 10 (Spotlight)
4 reviewers | Ratings: 6, 5, 4, 5 (min 4, max 6, std 0.7)
Confidence: 2.8
Novelty: 2.5 | Quality: 3.0 | Clarity: 3.3 | Significance: 3.0
NeurIPS 2025

PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning

OpenReview | PDF
Submitted: 2025-05-12 | Updated: 2025-10-29
TL;DR

This paper proposes PCA++, a uniformity-constrained contrastive PCA for robust signal recovery under strong background noise.

Abstract

Keywords
Contrastive Learning, Principal Component Analysis, Uniformity Regularization, Self-Supervised Learning, Representation Learning

Reviews and Discussion

Official Review (Rating: 6)

This paper proposes PCA++, a PCA algorithm for contrastive learning that robustly extracts signal from background under a broad class of scenarios. This is achieved by enforcing feature uniformity via a hard constraint. A baseline PCA+ method is first proposed. Theoretical analyses of both PCA+ and PCA++ are provided, as well as a closed-form solution. Numerical experiments verify the improved performance in contrastive learning tasks.

Strengths and Weaknesses

Strengths

  1. The proposed PCA++ framework addresses the challenge of robust signal recovery in contrastive learning and admits a closed-form solution.
  2. The paper is technically solid and contributes to the theoretical analysis of high-dimensional data.
  3. The insight into the relationship between uniformity and contrastive learning is valuable.
  4. The numerical experiments cover different aspects of the method and a range of scenarios.

Weaknesses

  1. Assumptions 1-3 are somewhat restrictive and may be unrealistic in real scenarios, in particular the orthogonality in Assumption 1. The authors should discuss the dependence of the results on these assumptions and their sensitivity to violations, either through theoretical analysis or numerical experiments.
  2. The authors should discuss the computational cost of the proposed framework and its scalability to large datasets.

Questions

  1. Please include discussion on the computational cost and scalability.
  2. Please discuss the restrictiveness of the model assumptions, and their implications on real-world data scenarios.

Limitations

Please see above.

Justification for Final Rating

I appreciate the authors' response to my questions, and will maintain the previous score.

Formatting Issues

None

Author Response

The assumptions 1-3 are somewhat restrictive, and may be unrealistic in real scenarios, in particular the orthogonality in Assumption 1. The authors should discuss the dependency and sensitivity of the results on these assumptions, either by theoretical analysis or numerical experiments.

A: We thank the reviewer for their insightful questions regarding the assumptions of our model. We agree that discussing their scope and necessity is crucial. We address the main points below and will add these clarifications to the revised manuscript.

  • On the orthogonality assumption: (Assumption 2.1) We acknowledge that this is a strong assumption, made primarily for analytical tractability. Our framework's core mechanism is, however, robust to its violation. The key insight is that even if signal and background subspaces overlap, the contrastive energy (as analyzed in Lemma C.1) of the shared directions remains strictly positive, while that of pure background directions is zero. This allows PCA++ to distinguish the full signal space from the background. The main effect of overlap is a reduction in the generalized eigenvalue for the shared directions, but they remain detectable. For a more detailed technical sketch and experimental validation, we respectfully refer the reviewer to our response to Reviewer [9ppt].

  • On the Gaussian latent factor assumption: (Assumption 2.2) This assumption was also made for analytical convenience, as it allows for the clean derivation of exact constants and closed-form error rates. However, we expect the core results to hold more generally for sub-Gaussian distributions. Many of the key results from random matrix theory that we rely on (e.g., in Lemmas D.4 and D.5) have well-known extensions beyond the Gaussian case, often requiring only a few finite moments. We chose to present the Gaussian case for clarity and precision, but relaxing this assumption is a natural direction for future theoretical work, which would likely involve more complex proofs.

    To provide strong empirical evidence for this claim, we have run new simulations where the Gaussian latent factors $(w_i, h_i)$ and noise $(\epsilon_i)$ from Assumptions 2.2 and 2.3 were replaced with samples from a standardized Beta(2,2) distribution (a symmetric, bounded, non-Gaussian distribution). We repeated the experiments from Sec E.4 (Figure 4) under this new setting.

    The results below show that the empirical performance of PCA++ under Beta-distributed noise continues to align almost perfectly with our theoretical predictions, which were derived under the Gaussian assumption.

    Fixed Aspect Ratio with Beta distribution

    | Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | PCA++ | 0.095 | 0.195 | 0.244 | 0.266 | 0.295 | 0.361 | 0.385 | 0.381 | 0.402 | 0.412 |
    | PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

    Growing-spike regime with Beta distribution

    | Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | PCA++ | 0.112 | 0.171 | 0.223 | 0.260 | 0.296 | 0.331 | 0.348 | 0.360 | 0.389 | 0.409 |
    | PCA++ theory predicted | 0.100 | 0.171 | 0.218 | 0.256 | 0.287 | 0.315 | 0.339 | 0.361 | 0.381 | 0.391 |

    This remarkable consistency provides strong evidence that the Gaussian assumption is a technical choice for analytical clarity rather than a strict requirement for the validity of our results, which appear to exhibit universality. We will add these new findings to the appendix to empirically support the robustness of our framework.

  • On other assumptions: (e.g., Assumption 4.1, 4.3) We have also provided a detailed justification for the high-dimensional assumptions regarding the BBP threshold and distinct eigenvalues in our response to Reviewer [r13i]. The key takeaway is that the BBP threshold is a fundamental limit of PCA-based detection, while the distinct eigenvalue assumption is a technical convenience that we validate empirically.

We believe these clarifications properly scope our theoretical contributions, and we will incorporate this discussion into the final manuscript. Thank you again for your valuable feedback.


The authors should discuss the computational cost of the proposed framework, and its scalability to large datasets.

A: Thanks for the suggestion. We will add a detailed analysis of the computational cost and scalability of PCA++ to the revised manuscript.


Computational Complexity

The computational cost of PCA++ has two main components:

  1. Covariance matrix formation: We compute the sample covariance $S_n$ and the contrastive covariance $S_n^{+}$. For data matrices of size $n \times d$ (samples × features), forming these $d \times d$ matrices requires matrix multiplications with a complexity of $O(nd^{2})$. In the common high-dimensional setting where $d \gg n$, this can be optimized to $O(dn^{2})$ by first computing the $n \times n$ Gram matrix.

  2. Generalized eigenvalue problem (GEP) solution: The cost is dominated by the initial truncated eigendecomposition of $S_n$ to rank $s$. Using an iterative solver like the Implicitly Restarted Lanczos Method (IRLM), as implemented in scipy, this step has a complexity of approximately $O(sd^{2})$. Since the truncation rank $s$ is typically much smaller than $d$, this step is highly efficient.

Overall, the complexity is comparable to standard PCA, primarily driven by the feature dimension $d$.
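For concreteness, a minimal sketch of this pipeline is given below. The exact covariance definitions (sample covariance from one view, symmetrized cross-view covariance) and the whitening route for the truncated GEP are our assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def pca_plus_plus(X, X_plus, k=5, s=10):
    """Illustrative sketch of truncated PCA++ (not the authors' code).

    X, X_plus : (n, d) paired views sharing the signal subspace.
    k         : number of signal directions to recover.
    s         : truncation rank for the sample covariance.
    """
    n, d = X.shape
    # Covariance formation, O(n d^2). These definitions are an assumption
    # of this sketch: sample covariance of the first view plus a
    # symmetrized contrastive cross-covariance.
    S_n = X.T @ X / n
    S_plus = (X.T @ X_plus + X_plus.T @ X) / (2 * n)

    # Rank-s truncated eigendecomposition of S_n via ARPACK/IRLM, ~O(s d^2).
    vals, vecs = eigsh(S_n, k=s, which="LM")

    # Truncated GEP S_plus v = lambda (S_n)_s v by whitening: with v = W u and
    # W = V_s Lambda_s^{-1/2}, the constraint v^T S_n v = 1 becomes u^T u = 1,
    # leaving an ordinary s x s symmetric eigenproblem.
    W = vecs / np.sqrt(vals)
    M = W.T @ S_plus @ W
    evals, evecs = np.linalg.eigh(M)
    top = evecs[:, np.argsort(evals)[::-1][:k]]
    return W @ top  # (d, k) estimated signal directions
```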


Empirical Scalability

To demonstrate its practical performance, we benchmarked the runtime of PCA++ while varying the number of samples ($n$) and features ($d$). The results confirm our theoretical analysis.

Computational Cost of PCA++ (in seconds)

| n \ d | 100 | 1000 | 5000 | 10000 |
|---|---|---|---|---|
| 100 | 0.002 | 0.157 | 2.288 | 9.492 |
| 1000 | 0.003 | 0.204 | 3.781 | 15.666 |
| 5000 | 0.009 | 0.662 | 13.919 | 53.610 |
| 10000 | 0.017 | 1.310 | 26.433 | 100.656 |

Setup: Truncation rank $s=10$, top $k=5$ eigenvectors estimated. Benchmarked on an Intel Xeon CPU @ 2.20GHz.

The benchmarks show that the runtime is dominated by the feature dimension $d$, scaling quadratically, while it scales nearly linearly with the number of samples $n$. This profile makes PCA++ computationally feasible for typical high-dimensional datasets with tens of thousands of features, confirming its practical scalability.

Comment

I am satisfied with the authors' response, with discussions and further experiments that validate their theoretical results. The added section on computational complexity is nice. I will maintain my positive score for this paper.

Comment

We sincerely thank the reviewer for their positive feedback and support. We are very pleased that our clarifications and new results were helpful.

Official Review (Rating: 5)

This paper introduces PCA++, a contrastive Principal Component Analysis method designed to robustly isolate shared signal subspaces from paired observations under strong background noise. The key innovation is the addition of a hard uniformity constraint, which requires the projected features to have identity covariance. Leveraging a generalized eigendecomposition with low-rank truncation for stability, PCA++ admits a closed-form solution and provides rigorous non-asymptotic and high-dimensional guarantees under a linear contrastive factor model.

Strengths and Weaknesses

Strengths:

  1. This paper introduces a hard uniformity constraint and leverages paired observations to isolate shared signal subspaces, leading to a closed-form generalized eigenproblem solution.
  2. The authors provide a theoretical explanation for how the uniformity constraint enhances robustness in contrastive learning, especially under structured noise.
  3. Numerical experiments on synthetic datasets, corrupted MNIST, and real single-cell RNA-seq data demonstrate the effectiveness of the proposed algorithm.

Weaknesses:

  1. The assumption which requires orthogonal signal and background subspaces is quite strong and may not hold in many real-world applications.
  2. The method is limited to linear data, whereas many real-world datasets exhibit nonlinear structures.
  3. The strategy for selecting the hyperparameter s is not reasonable.
  4. The limitations of PCA++ are not clearly discussed in the paper.
  5. Some typos.

Questions

See weaknesses.

Limitations

  1. Some assumptions might be quite strong and not hold for real-world applications.
  2. This method doesn't consider data with non-linear structure.

Justification for Final Rating

All my concerns have been addressed.

Formatting Issues

Some typos. For example, the sentence on line 219 ends with “:” but is not followed by any content.

Author Response

We thank Reviewer [9ppt] for their constructive feedback and for recognizing the paper's strengths. We agree with their assessment of the weaknesses and will address each point to significantly improve the manuscript.


On the Orthogonality Assumption:

The assumption which requires orthogonal signal and background subspaces is quite strong and may not hold in many real-world applications.

A: We thank the reviewer for raising this crucial point. We agree this assumption is strong and adopted it for analytical tractability. However, the core mechanism of PCA++ is robust to violations of this assumption.

Why the core mechanism is robust to overlapping subspaces: The key insight, enabled by the linearity of our factor model (Eq. 2.1), is that we can analyze the behavior of the covariance matrices in a shared basis. Even with overlap, the contrastive covariance $S_n^{+}$ isolates signal components, while the standard covariance $S_n$ accumulates variance from both signal and background in the shared directions.

Let's sketch this for a simple case. Because the model is linear, we can define an orthonormal basis that accounts for the overlap. Suppose span(A) and span(B) share a single direction $v_0$. We can decompose the subspaces as:

  • Signal space: span(A) is spanned by $\{ v_0, v_{A,1}, \dots, v_{A,k-1} \}$, where $v_0$ is the shared part and the $v_{A,i}$ are the pure signal directions, orthogonal to $v_0$.

  • Background space: span(B) is spanned by $\{ v_0, v_{B,1}, \dots, v_{B,m-1} \}$, where the $v_{B,j}$ are the pure background directions.

The population covariances then become:

  • Contrastive covariance: $E[S_n^{+}] = \lambda_{A,0} v_0 v_0^\top + \sum_{i=1}^{k-1} \lambda_{A,i} v_{A,i} v_{A,i}^\top$. Here, $\lambda_{A,0}$ is the signal variance in direction $v_0$. Thus, contributions from the pure background directions $\{v_{B,j}\}$ are still averaged out. The resulting expectation remains spanned only by the signal-related directions $\{ v_0, v_{A,1}, \dots, v_{A,k-1} \}$. The shared direction $v_0$ is not cancelled.

  • Standard covariance: $E[S_n] = (\lambda_{A,0}+\lambda_{B,0}) v_0 v_0^\top + \sum_{i=1}^{k-1} \lambda_{A,i} v_{A,i} v_{A,i}^\top + \sum_{i=1}^{m-1} \lambda_{B,i} v_{B,i} v_{B,i}^\top$. Here, $\lambda_{B,0}$ is the background variance in direction $v_0$. The variance in the shared direction $v_0$ is amplified by both signal and background components.

Intuition -- why PCA++ is robust to violations of the orthogonality assumption: The robustness of PCA++ stems from how the generalized eigenvalue problem $S_n^{+} v = \lambda S_n v$ interacts with these modified covariance structures. We can analyze this through the lens of the asymptotic contrastive energy of each direction, as defined in our analysis for Lemma C.1.

  • Pure background directions ($v_{B,j}$): The contrastive energy $v_{B,j}^\top S_n^{+} v_{B,j}$ remains asymptotically zero, so these directions are filtered out.
  • Shared direction ($v_0$): The contrastive energy $v_0^\top S_n^{+} v_0$ is strictly positive due to the signal component. However, its variance in the standard covariance is now amplified by both signal ($\lambda_{A,0}$) and background ($\lambda_{B,0}$) components.

The resulting contrastive energy for the shared direction will be smaller than that of a pure signal spike, but it remains bounded away from zero. Therefore, the PCA++ objective still detects the shared direction as part of the signal space. The fundamental mechanism, isolating all directions with non-zero contrastive energy, remains intact.
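To make this concrete, the following population-level calculation is an illustrative sketch on our part: it uses only the two expectations above and ignores the isotropic noise term, so the constants are indicative rather than taken from the paper.

```latex
% Generalized Rayleigh quotient of the shared direction v_0 (sketch):
\lambda(v_0) \;=\; \frac{v_0^\top E[S_n^{+}]\, v_0}{v_0^\top E[S_n]\, v_0}
            \;=\; \frac{\lambda_{A,0}}{\lambda_{A,0} + \lambda_{B,0}} \;>\; 0,
\qquad\text{while}\qquad
% a pure background direction v_{B,j} carries no contrastive energy:
\lambda(v_{B,j}) \;=\; \frac{0}{\lambda_{B,j}} \;=\; 0 .
```

Overlap therefore shrinks the generalized eigenvalue of $v_0$ relative to a pure signal spike (where $\lambda_{B,0} = 0$) but never drives it to zero.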


Empirical Validation with Overlapping Subspaces:

To provide strong empirical evidence, we ran new simulations with non-orthogonal signal and background subspaces, which we will add to the Appendix. The simulations follow the setting in Sec E.4 but introduce overlap between the signal and background subspaces: we aligned two background directions with the two weakest signal directions, creating a two-dimensional shared subspace, and tested this under two background noise levels.

  • Signal variances: [50, 25, 20, 15, 10]
  • Moderate noise background: [500, 400, 300] (pure background) + [25, 12.5] (shared background)
  • Large noise background: [500, 400, 300] (pure background) + [100, 50] (shared background)

The results for the fixed aspect ratio regime are shown below. For this demonstration, we compare against our original theoretical predictions (derived under orthogonality).

Fixed Aspect Ratio with Moderate Overlapping Noise:

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.137 | 0.207 | 0.269 | 0.287 | 0.317 | 0.311 | 0.356 | 0.388 | 0.416 | 0.431 |
| PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

Fixed Aspect Ratio with Large Overlapping Noise

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.254 | 0.206 | 0.278 | 0.280 | 0.357 | 0.302 | 0.359 | 0.405 | 0.391 | 0.416 |
| PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

Growing-spike regime with Moderate Overlapping Noise

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.121 | 0.197 | 0.216 | 0.261 | 0.299 | 0.334 | 0.369 | 0.374 | 0.400 | 0.426 |
| PCA++ theory predicted | 0.100 | 0.171 | 0.218 | 0.256 | 0.287 | 0.315 | 0.339 | 0.361 | 0.381 | 0.391 |

Growing-spike regime with Large Overlapping Noise

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.138 | 0.182 | 0.214 | 0.262 | 0.316 | 0.323 | 0.363 | 0.375 | 0.380 | 0.397 |
| PCA++ theory predicted | 0.100 | 0.171 | 0.218 | 0.256 | 0.287 | 0.315 | 0.339 | 0.361 | 0.381 | 0.391 |

Conclusion: The results show that PCA++ is remarkably robust even when the orthogonality assumption is violated. In the moderate overlapping noise case, the empirical error of PCA++ continues to track the theoretical predictions closely. This demonstrates that when the background variance in the shared subspace is not excessively large, the impact of the overlap is minimal.

In the large overlapping noise case, we observe a slight increase in estimation error, as expected. This is because the background noise in the shared directions becomes strong enough to reduce the effective signal-to-noise ratio, making recovery more challenging. Nevertheless, even in this challenging scenario, PCA++ remains stable and successfully recovers the signal subspace with controlled error, confirming that perfect orthogonality is not a practical prerequisite for our method's success. We will add this discussion and the full experiment to the paper.


On Selecting the Hyperparameter $s$:

The strategy for selecting the hyperparameter s is not reasonable.

A: We thank the reviewer for this important practical point. We agree that our paper must provide clearer guidance on choosing the truncation rank $s$.

The choice of $s$ in truncated PCA++ controls a fundamental trade-off between the stability of the generalized eigenproblem and the effectiveness of the uniformity constraint. We will add a new subsection to the Appendix with practical guidelines for selecting $s$, centered on the following trade-off:

  • A small $s$ ensures that the inverse of the truncated covariance $(S_n)_s$ is well-conditioned, promoting numerical stability. However, it may discard dimensions needed to enforce uniformity effectively, potentially biasing the result.
  • A large $s$ enforces uniformity over a larger subspace but risks instability if $S_n$ is ill-conditioned, since including directions with near-zero eigenvalues can amplify noise, as shown in our experiments (Figure 2, left).

Based on this trade-off, we propose to combine the following practical strategies for selecting $s$:

  • Information-based criterion (lower bound): As in standard PCA, one can determine a minimum $s$ by examining the cumulative variance explained by the eigenvalues of the sample covariance $S_n$. For example, choose $s$ large enough to capture a significant portion (e.g., 90%) of the total variance. This ensures the uniformity constraint operates on the most meaningful directions.

  • Stability-based criterion (upper bound): To ensure numerical stability, one can monitor the condition number of the truncated matrix $(S_n)_s$, which is the ratio of its largest to its $s$-th eigenvalue, $\lambda_1(S_n) / \lambda_s(S_n)$. One should choose $s$ such that this ratio remains below a reasonable threshold, avoiding severe ill-conditioning.

In our experiments (e.g., Figure 2, right), we found that PCA++ is robust across a reasonable range of $s$. We will incorporate these guidelines into Section 4 of the paper to make the method more accessible.
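A minimal sketch of how these two criteria could be combined in practice is shown below; the 90% variance fraction and the condition-number cap are illustrative defaults we chose for the sketch, not values prescribed by the paper.

```python
import numpy as np

def choose_truncation_rank(S_n, var_frac=0.90, max_cond=1e3):
    """Illustrative heuristic for the truncation rank s (a sketch, not the
    authors' procedure): a variance-explained lower bound combined with a
    condition-number upper bound."""
    eigvals = np.sort(np.linalg.eigvalsh(S_n))[::-1]        # descending
    # Lower bound: smallest s whose leading eigenvalues explain var_frac.
    cum = np.cumsum(eigvals) / np.sum(eigvals)
    s_min = int(np.searchsorted(cum, var_frac)) + 1
    # Upper bound: largest s with lambda_1 / lambda_s below max_cond.
    cond = eigvals[0] / np.maximum(eigvals, 1e-12)
    s_max = int(np.sum(cond <= max_cond))
    return max(s_min, 1), max(s_max, 1)
```

One would then pick any $s$ in the returned interval; if the interval is empty, either the variance target or the conditioning cap has to be relaxed.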


On Limitations and Typos:

The limitations of PCA++ are not clearly discussed in the paper.

A: We agree with the reviewer that this was an omission. As detailed in our response to Reviewer [uGvG], we will add a dedicated Limitations section to the paper, focusing on the assumptions underpinning our theoretical model.

Some typos.

A: We will carefully proofread the manuscript and correct all identified errors.

We are confident these substantial revisions will fully address the reviewer's concerns and strengthen the paper. We are grateful for their thoughtful feedback.

Comment

Dear Reviewer,

As the discussion period draws to a close, we would be sincerely grateful if you could kindly review our clarifications. If there are any remaining questions or concerns, we would be happy to address them.

If our responses have adequately addressed your queries, we would greatly appreciate it if you could consider revisiting your evaluation in light of the clarifications provided.

Thank you once again for your time and thoughtful review.

Official Review (Rating: 4)

This paper utilizes ideas from the contrastive learning literature to propose PCA methods for robust signal recovery under strong background noise. First, the authors introduce PCA with alignment-only contrastive learning (which they name PCA+), analyze the finite-sample performance of the proposed method, and identify some cases (high background spikes relative to signal spikes, or ambient dimension comparable to the number of samples) where PCA+ is not enough for robust signal recovery. To solve this issue, they propose to use a "hard uniformity" constraint, leading to a PCA method that they name PCA++. The authors then provide high-dimensional characterizations of the proposed method (PCA++) and a theoretical explanation for the power of uniformity in contrastive learning. Finally, the paper includes experimental results with simulated data (based on the contrastive factor model), corrupted MNIST, and single-cell transcriptomics, indicating that PCA++ outperforms standard PCA.

Strengths and Weaknesses

Strengths

  1. An easily understandable paper
  2. Studies an interesting and relevant problem: robust signal recovery under strong background noise.
  3. Provides theoretical and empirical results regarding the proposed PCA methods.
  4. Gives a high-dimensional analysis of a uniformity-constrained estimator together with some insights regarding the role of uniformity in contrastive learning, which is valuable from my point of view.

Weaknesses

  • The main problem with this paper is its positioning in the literature. With the current presentation of the paper, it sounds like this is the first paper to study contrastive PCA. However, there exists some work studying contrastive PCA in the literature (see the examples below). Furthermore, these highly relevant papers are not mentioned or cited in this paper. Therefore, evaluating the contributions of this work is challenging without any comparison with the prior contrastive PCA literature. Due to this major issue, I recommend rejection.
  • Even if the presentation of the paper is fixed by adding the relevant citations and so on, the fundamental differentiating point of this work compared to the contrastive PCA literature seems to be "uniformity constraint", which is also an existing idea in the contrastive learning literature (e.g., see [41] from the references of the submitted paper). Therefore, I think the originality/contribution of this work is questionable at this point.

Example papers about "contrastive PCA" in the literature:

  • "Contrastive principal component analysis", A Abid, MJ Zhang, VK Bagaria, J Zou, arXiv:1709.06716, 2017.
  • "Probabilistic contrastive principal component analysis", D Li, A Jones, B Engelhardt, arXiv:2012.07977, 2020.
  • "Exploring high-dimensional biological data with sparse contrastive principal component analysis", P Boileau, NS Hejazi, S Dudoit, Bioinformatics, 2020.
  • "Sparse discriminant PCA based on contrastive learning and class-specificity distribution", Q Zhou, Q Gao, Q Wang, M Yang, X Gao, Neural Networks, 2023.
  • "Contrastive Functional Principal Component Analysis", E Zhang, D Li, AAAI, 2025.

Questions

  1. Are the authors aware of the existence of the mentioned papers about "contrastive PCA"?

  2. What is the additional contribution of this work compared to the existing contrastive learning and contrastive PCA literature?

Limitations

Limitations of the proposed method (PCA++) are not discussed explicitly in the paper.

Justification for Final Rating

The authors properly addressed my concerns and promised to fix the issues that I mentioned in my reviews and comments. I have no further questions/concerns left, so I am raising my rating to 4. My reason for not increasing the rating further is that the initial submission missed (lacked any discussion or citation of) significantly related literature completely, which was and still is a major problem that can only be solved in the camera-ready version.

Formatting Issues

No formatting concern

Author Response

We sincerely thank Reviewer [uGvG] for their detailed feedback and for identifying a problem in our paper's positioning. We agree that our failure to cite and discuss the prior "contrastive PCA" literature (e.g., Abid et al., 2017; Li et al., 2020; Boileau et al., 2020; Zhou et al., 2023; Zhang et al., 2025) was a major oversight. We see now how this created an inaccurate impression of our work's novelty, and we are grateful for the chance to correct this problem and clarify our specific contributions.

Our response is structured to directly address the reviewer's two primary concerns:

  1. Fundamental difference in problem formulation: Our work (PCA++) addresses a very different, and arguably more common, problem setting than the previous contrastive PCA literature (cPCA) cited by the reviewer.

  2. Novelty and significance of our contribution: Our core contribution is a novel and rigorous high-dimensional analysis that yields new theoretical insights into the role of uniformity, not the invention of the uniformity concept itself.


  1. Clarifying our problem setting vs. cPCA

The reviewer's main concern stems from a misunderstanding of the problem setting, which was caused by our poor positioning. We will rectify this in the manuscript. There is a fundamental distinction:

Input Data:

  • cPCA: Two distinct datasets -- a foreground/target dataset ($X$) containing both signal and background, and an explicit background-only dataset ($Y$).

  • PCA++: Paired "positive views" $(X, X^{+})$ that share a signal but have different background instances.

Objective:

  • cPCA: Find directions of variance in $X$ that are not present in $Y$ (e.g., via the covariance subtraction $\Sigma_X - \alpha \Sigma_Y$).

  • PCA++: Find directions that are shared/invariant between the positive views $(X, X^{+})$.

Typical Scenario:

  • cPCA: Comparing cases vs. controls; comparing two different experimental conditions.

  • PCA++: Self-supervised learning from data augmentations (standard setting of modern contrastive learning, e.g., SimCLR, MoCo); analyzing paired measurements (e.g., our single-cell data).

This distinction is critical. The methods cited by the reviewer are not applicable to our problem setting, as they require access to a pure background dataset, which is unavailable in the standard positive-pair contrastive learning framework. Our work is expressly designed for this latter, widely used scenario.

Action: We will revise our Introduction and Related Work sections. We will begin by clearly delineating these two distinct "contrastive" paradigms. We will properly cite and discuss the foreground-background cPCA literature (Abid et al., etc.) and explicitly state that our work addresses the positive-pair setting, thereby correctly positioning our contribution within the context of modern contrastive learning [8, 10, 41].


  2. On the novelty and significance of our contributions

The reviewer suggests our contribution is questionable because the "uniformity constraint" is an existing idea. We respectfully but firmly disagree. Our novelty lies not in proposing the concept of uniformity, but in the following non-trivial contributions:

A. Principled Motivation and Failure Analysis: We do not simply add a uniformity constraint. Our paper's narrative is a core part of our contribution:

  • We first establish a natural baseline, the alignment-only PCA+.

  • We then provide a novel theoretical failure analysis (Thm 3.4), proving that this intuitive approach provably fails when background noise is strong -- a key insight that motivates the need for a stronger regularizer.

  • This directly leads to PCA++, where the uniformity constraint is introduced as a principled and necessary remedy to this specific, proven failure mode.

B. New, Rigorous High-Dimensional Theory: Our primary contribution is a deep theoretical analysis that goes far beyond prior work.

  • Exact Asymptotic Limits: Instead of loose non-asymptotic bounds, we are the first to provide exact, closed-form asymptotic limits for the subspace recovery error using powerful tools from random matrix theory (Thm 4.2, 4.3). This offers a much sharper and more precise understanding.

  • Proving Why Uniformity Works: Our theory provides the first quantitative proof of how uniformity confers robustness. By comparing the error formula for PCA+ (which depends on background strength) with that of PCA++ (which is provably independent of background strength), we demonstrate a precise, mathematical mechanism. While Wang and Isola [41] noted that InfoNCE implicitly encourages uniformity, our work provides a rigorous explanation for why this is so beneficial in the presence of structured noise.

Action: We will sharpen the language in our abstract and introduction to emphasize that our core technical novelty lies in the exact high-dimensional characterization of a uniformity-constrained estimator and the new insights this theory provides into the noise-suppressing role of uniformity in contrastive learning.


  3. On Discussing Limitations

We agree that a clear discussion of limitations is essential and will add a dedicated "Limitations and Future Work" section to the revised manuscript.

The primary limitation of our work—which is also its key analytical strength—is the adoption of a linear contrastive factor model. While real-world data often exhibits non-linearities, this deliberate choice was crucial for achieving our paper's main goal: to provide a precise, quantitative explanation of how uniformity provides robustness in contrastive learning.

This tractable linear structure enabled us to:

  1. Derive exact, closed-form asymptotic limits for the subspace estimation error (Theorems 4.2 and 4.4), a stronger result than the looser upper bounds common in more general settings.
  2. Precisely isolate and demonstrate the mechanism by which the uniformity constraint filters out strong, structured background noise—a level of clarity that would be difficult to achieve with more complex, non-linear models.

Our work thus serves as a foundational, theoretical study that provides concrete insights into the role of uniformity.

Future Work: An important next step is to extend this framework to handle non-linearities. As the reviewer might anticipate, a natural direction is to develop a kernelized version of PCA++, which we plan to explore in future work. This would involve applying the same principles of contrastive alignment and hard uniformity in a reproducing kernel Hilbert space, potentially offering the same robustness benefits for non-linear data.


Action: We will add this discussion to the paper to clearly scope our contributions and outline promising future directions.

We are confident that once our paper is revised to properly position our work, the novelty and significance of our contributions will be clear. Our work tackles a different problem than prior cPCA and provides a new, deep theoretical analysis that has not been previously shown. We believe the reviewer's concerns, while stemming from a real flaw in our initial manuscript's presentation, are fully addressable. We hope they will re-evaluate our work based on these clarifications and planned revisions.

Comment

Thanks for the rebuttal and detailed clarifications.

I now see the positioning of the paper, i.e., the first paper to provide rigorous high-dimensional analysis of a uniformity-constrained estimator together with some insights regarding the role of uniformity in contrastive learning. I think this is valuable. However, the current presentation (including the title) highlights the PCA++ (the proposed method) more than the aforementioned theoretical contribution. Therefore, a major revision (highlighting the mentioned theoretical contribution) is required for the title, abstract, introduction, and related work as outlined in the "action" plans provided in the rebuttal.

Regarding the proposed method (PCA++) and the setting, I still have some concerns as follows:

  1. While I understand the distinction between cPCA and PCA++, I think the cPCA technique can still be used to solve the problem PCA++ is solving. Namely, starting from your positive-pair samples $(x_i, x_i^{+})$, you can construct target samples $t_i := \frac{1}{2} (x_i + x_i^{+})$ and background samples $b_i := \frac{1}{2} (x_i - x_i^{+})$ and then apply cPCA on them. By tuning the $\alpha$ parameter in the cPCA objective or using its parameter-free variant cPCA++ (Salloum and Kuo, 2022), you may achieve reasonable results (this could be a good baseline). Have you considered this?
  2. Furthermore, the problem that PCA++ is solving is also related to a version of canonical-correlation analysis (CCA) where the canonical directions are shared (kept the same for the two datasets when applying CCA). Also, the CCA formulation involves a covariance constraint among the found canonical directions, which seems partially related to the uniformity constraint (specifically to the zero entries on the off-diagonal of $I$). Overall, could you please comment on the relation between CCA and your method (PCA++)?
  3. The used "Contrastive Factor Model" (Eq. 2.1) is stated to provide a positive pair of samples $(x_i, x_i^{+})$ with the same signal, differing in background and noise. However, considering Assumptions 2.2 and 2.3, the background and noise for $x_i$ are indeed identical in distribution to those of $x_i^{+}$. In this regard, $x_i$ and $x_i^{+}$ can be considered as two different samples of the same distribution. This seems to contradict the original motivation (i.e., same signal but different background and noise), oversimplifying the setting. Could you please comment on this as well?

I am willing to reconsider my evaluation based on your responses to my questions.

(Salloum and Kuo, 2022): "cPCA++: An efficient method for contrastive feature learning", Pattern Recognition, 2022.

Comment

Experimental results (Subspace Error)

The results are provided in the following tables. As the results clearly show, the cPCA adaptation is highly unstable even with its optimal hyperparameter setting $\alpha=1$. This, combined with the conceptual issue in the truncated cPCA++ approach, demonstrates that these methods are not well-suited for this problem. In contrast, the PCA++ method remains stable, highlighting the robustness of our GEP formulation. We will add these results to the final manuscript.

Results with Moderate noise background

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| cPCA | 0.293 | 0.819 | 0.430 | 0.977 | 0.284 | 0.626 | 0.813 | 0.581 | 1.000 | 0.964 |
| cPCA++ | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.999 | 1.000 |
| CCA | 0.160 | 0.488 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 |
| PCA++ | 0.125 | 0.212 | 0.250 | 0.275 | 0.311 | 0.347 | 0.375 | 0.388 | 0.415 | 0.401 |
| PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

Results with Large noise background

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| cPCA | 1.000 | 1.000 | 0.999 | 0.941 | 0.998 | 0.999 | 0.998 | 1.000 | 0.998 | 0.996 |
| cPCA++ | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| CCA | 0.154 | 0.369 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| PCA++ | 0.156 | 0.185 | 0.230 | 0.283 | 0.321 | 0.339 | 0.379 | 0.353 | 0.410 | 0.447 |
| PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

  2. On the Relation to Canonical Correlation Analysis (CCA):

We thank the reviewer for this question, as the connection to Canonical Correlation Analysis (CCA) is insightful and helps clarify the unique contribution of PCA++.

While both methods aim to find shared structure between paired datasets, they differ crucially in their objectives and constraints, leading to significant performance gaps in noisy, high-dimensional settings.

Objective functions and constraints:

  • Standard CCA: Seeks two distinct projection matrices, $U$ and $V$, that maximize the correlation between the projected views $U^\top X$ and $V^\top X^{+}$. The objective for the leading component is $\max_{u,v} \frac{u^\top (X^\top X^{+}) v}{\sqrt{(u^\top (X^\top X) u)\,(v^\top (X^{+\top} X^{+}) v)}}$. The key instability lies in the denominator, which normalizes by variance from both views ($X$ and $X^{+}$). In our problem setting, both views contain large, independent background and noise components, making this normalization highly susceptible to noise amplification. Although one can also revise CCA for the positive-pair setting by imposing the constraint $U=V$, the objective then becomes a generalized Rayleigh quotient, which, to our knowledge, does not have the same closed-form solution as standard CCA.

  • PCA++: Our formulation makes two critical changes that enhance stability:

    • Shared projection space: It enforces $U=V$, searching for a single, shared subspace that captures the signal common to both views.
    • Robust normalization: Instead of normalizing by the noisy variance from both views, PCA++ maximizes the shared covariance $v^\top (X^\top X^{+}) v$ subject to a hard uniformity constraint on only one view, $v^\top (X^\top X) v = 1$. This constraint acts as a powerful regularizer, anchoring the solution to a more stable variance structure and preventing distortion from the background and noise present in the second view $X^{+}$.

Empirical Performance:

This difference in formulation has a dramatic impact on performance. As our new experiments (provided in our response to the previous question) demonstrate:

  • Standard CCA is highly unstable in the settings we study. It fails to recover the signal, with its performance collapsing as dimensionality or noise increases.

  • PCA++ remains stable and robust, successfully recovering the signal subspace across all tested regimes.

This confirms that our specific choice of a shared projection space and a single-view uniformity constraint is essential for reliable signal recovery in this problem setting. We will add a discussion of this important relationship with CCA to the revised paper.

评论

  1. On the "Contrastive Factor Model" and i.i.d. Views:

We thank the reviewer for this observation about our modeling assumptions. The reviewer notes that under Assumptions 2.2 and 2.3, $x_i$ and $x_i^{+}$ are drawn from the same marginal distribution. This is a deliberate choice that aligns with standard theoretical models of contrastive learning.

The goal in contrastive learning is to learn representations that are invariant to nuisance variations while being sensitive to shared semantic content. Our model captures this by linking $x_i$ and $x_i^{+}$ through a shared latent signal $w_i$. The "contrast" arises because they are conditioned on different instances of background and noise ($h_i$ vs. $h_i^{+}$). The task is to recover the signal $A$ that is invariant across these instances. Considering the positive pair $(x_i, x_i^{+})$ as a single sample from a joint distribution with identical marginals is a standard way to formalize this (e.g., as in Wang and Isola [41]).

The reviewer suggests that assuming $x_i$ and $x_i^{+}$ come from different distributions might seem more realistic. However, such a setting could paradoxically simplify the learning problem. For example, if the background subspace for $x_i^{+}$ (say, $B^{+}$) were orthogonal to the background subspace for $x_i$ (say, $B$), then the expected cross-term in the contrastive covariance would be exactly $\mathbb{E}[(Aw_i+Bh_i)(Aw_i+B^{+}h_i^{+})^\top] = AA^\top$. In this scenario, $S_n^{+}$ would be a cleaner estimator of $AA^\top$ than in our current, more challenging setting, where both views share the same background space $B$. Therefore, our assumption of identical marginal distributions is not an oversimplification but rather a standard and rigorous formulation for the problem of learning invariances from different augmentations or views of the same underlying data type. We will clarify this motivation in the paper.

Comment

Thanks for your time and effort in preparing such a comprehensive response, involving additional experimental results and thorough explanations.

I find the results useful for differentiating the proposed method, PCA++, from other techniques in the literature. Also, I mostly agree with your explanations. I want to clarify one point, which is that I meant "imposing the constraint $U=V$ for CCA," but I understand it may be relatively challenging to deal with that case due to the lack of a closed-form solution. Other than this point, I am satisfied with your answers, and I believe our discussions will be useful for the final paper.

Therefore, I have no further questions/concerns left, so I am raising my rating to 4. My reason for not increasing the rating further is that the initial submission missed (lacked any discussion or citation of) significantly related literature completely, which was and still is a major problem that can only be solved in the camera-ready version.

Comment

We thank Reviewer [uGvG] for their thoughtful re-evaluation and helpful follow-up questions. We address each of the additional points below.


  1. On Using cPCA and cPCA++ as a Baseline:

Synthesizing a foreground $x_f = (x + x^{+})/2$ and background $x_b = (x - x^{+})/2$ to use with cPCA-like methods is an interesting approach. While this is theoretically sound at the population level, our new experiments, prompted by the reviewer's comment, reveal that it can be less stable in finite-sample settings compared to our proposed PCA++.

At the population level, the signal is perfectly isolated. Under the contrastive factor model, the population covariances of the synthesized data are:

  • $\Sigma_f = \mathbb{E}[x_f x_f^{\top}] = AA^{\top} + \frac{1}{2}BB^{\top} + \frac{1}{2}I_d$

  • $\Sigma_b = \mathbb{E}[x_b x_b^{\top}] = \frac{1}{2}BB^{\top} + \frac{1}{2}I_d$

Therefore, the difference $\Sigma_f - \Sigma_b = AA^{\top}$ perfectly recovers the signal covariance. The practical challenge, however, arises from finite-sample estimation error. Define the foreground matrix $X_f = (X + X^{+})/2$ and the background matrix $X_b = (X - X^{+})/2$. The sample covariances of the synthesized data are $\hat{\Sigma}_f = X_f^{\top} X_f/n$ and $\hat{\Sigma}_b = X_b^{\top} X_b/n$, respectively.

  • Regarding cPCA (Abid et al., 2017): The reviewer's proposed approach relies on the matrix subtraction $\hat{\Sigma}_f - \alpha \hat{\Sigma}_b$. This subtraction of two large, noisy matrices can amplify noise, potentially overwhelming the true signal and leading to unstable eigenvectors.

  • Regarding cPCA++ (Salloum and Kuo, 2022): This method solves the generalized eigenvalue problem with $\hat{\Sigma}_b^{-1} \hat{\Sigma}_f$. As we demonstrate in our paper (Figure 2), directly inverting $\hat{\Sigma}_b$ is numerically unstable when the dimension $d$ is large. The reviewer might suggest stabilizing this by using a truncated pseudoinverse, similar to our PCA++. However, this reveals a deeper conceptual issue: this procedure is equivalent to projecting the foreground data $X_f$ onto the principal subspace of the background data $X_b$ and then performing PCA. Since the synthesized background $X_b$ contains no signal $A$ by construction, its principal subspace is also signal-free. Projecting the foreground onto this signal-free subspace would annihilate the very signal we aim to recover. Therefore, a truncated cPCA++ would fundamentally fail.
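For reference, a minimal sketch of the synthesized-data cPCA baseline described above is given below; it is illustrative only, and the eigendecomposition of the covariance difference is the standard cPCA step rather than the authors' code.

```python
import numpy as np

def cpca_on_synthesized_pairs(X, X_plus, k=5, alpha=1.0):
    """Sketch of the reviewer-suggested adaptation: build foreground/background
    from the positive pair, then run cPCA on the covariance difference."""
    n = X.shape[0]
    X_f = (X + X_plus) / 2.0          # foreground: keeps the shared signal
    X_b = (X - X_plus) / 2.0          # background: the shared signal cancels
    Sigma_f = X_f.T @ X_f / n
    Sigma_b = X_b.T @ X_b / n
    C = Sigma_f - alpha * Sigma_b     # unbiased for A A^T when alpha = 1
    evals, evecs = np.linalg.eigh(C)
    return evecs[:, np.argsort(evals)[::-1][:k]]   # top-k directions
```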

Empirical Validation:

To investigate these different approaches empirically, we performed a new set of experiments comparing PCA++ against alternative methods proposed by the reviewer. We will add a full discussion of these results to the Appendix.

  1. Baselines on synthesized data: Following the reviewer's suggestion, we evaluated methods that operate on synthesized foreground $X_f = (X + X^{+})/2$ and background $X_b = (X - X^{+})/2$ data. This includes standard cPCA and cPCA++ (based on a difference of covariances). This allows us to directly test the stability of the "subtract-then-decompose" approach versus our "decompose-from-cross-covariance" method.

  2. Canonical Correlation Analysis (CCA): We also introduced CCA as a canonical and highly relevant baseline. We apply it by treating the paired data matrices, $X$ and $X^{+}$, as the two views. In our model, the shared signal is the sole source of population-level correlation between these views. Therefore, CCA, which finds directions of maximal correlation, is theoretically suited for recovering the signal subspace and serves as a strong benchmark.

We consider the following experiment setup:

  • Experimental Setup: The experiment follows the $n=500$, $k=5$ fixed aspect ratio setting from Sec E.4. For cPCA, we set $\alpha=1$. This is the most principled choice as it yields an unbiased population estimator: $\mathbb{E}[\hat{\Sigma}_f - \hat{\Sigma}_b] = AA^{\top} + BB^{\top}/2 + I_d/2 - BB^{\top}/2 - I_d/2 = AA^{\top}$. We tested two scenarios: one with moderate background noise and one with strong background noise.

  • Signal variances: [50, 25, 20, 15, 10]

  • Moderate noise background: [100, 50, 40, 30, 20]

  • Large noise background: [500, 400, 300, 200, 100]

Comment

Dear Reviewer uGvG,

Thank you for your thoughtful re-evaluation and for engaging so constructively with our work. Your feedback has been invaluable in helping us substantially improve the manuscript.

We recognize that your primary initial concern was the paper's novelty and positioning. We have worked diligently to address this, and the manuscript has been significantly enhanced as a result. The key improvements include:

  1. Clarified novelty with extensive new baselines: We have thoroughly revised the paper to highlight our unique focus on robustness against strong, structured background noise. Based on your suggestions, we performed extensive new experiments comparing PCA++ against CCA and cPCA-style baselines, empirically demonstrating the superior stability of our GEV-based formulation.

  2. Robustness beyond idealized assumptions: We have added new theoretical discussions and simulations to show that PCA++ remains effective even when key model assumptions like orthogonality, Gaussianity, and distinct eigenvalues are violated, confirming its practical utility.

  3. Enhanced practical guidance: To make our work more complete, we have added a principled strategy for hyperparameter selection, a detailed computational scalability analysis, and a new Limitations and Future Work section.

Given that the original primary obstacles to acceptance have been fully addressed with new theoretical discussions and empirical results, we hope you'll agree that the manuscript in its current form is a strong, clear, and validated contribution.

We would be very grateful if you would consider the paper in its current, substantially improved state for a score more reflective of a clear accept.

Thank you again for your invaluable guidance throughout this process.

Best,

The Authors of Submission #26142

Official Review (Rating: 5)

The paper introduces a new contrastive PCA algorithm, PCA++, that isolates shared signals in positively related datasets under strong background noise. PCA++ enforces a hard uniformity constraint on the projected features, and it results in closed-form solutions with stability and non-asymptotic guarantees. The paper compares this approach with alignment-only contrastive PCA and reports improved performance on synthetic experiments (simulation, MNIST) and one real-world experiment (sequencing data). The paper further provides justification and proof of the importance of uniformity in improving performance and making PCA robust to high background noise.

Strengths and Weaknesses

S1: The paper presents a well-motivated problem. It presents classical and alignment-only contrastive PCA and their limitations with examples, setting up a perfect ground for the proposed method, PCA++. In addition, the paper is easy to follow, with a clear outline of its contribution.

S2: The paper presents a novel contribution to contrastive learning theory by analysing the role of uniformity constraints on projected features in high-dimensional strong noise setting. This result could have an impact on other machine learning areas.

S3: The paper precisely shows the limitation of PCA and the utility of the proposed techniques, with the real-world experiments showing isolation of noise and extraction of the shared space. Specifically, the synthetic experiments showed clear improvement in error trends with aspect ratio and truncation levels.

Weaknesses

W1: While the authors have mentioned that the work could be extended to other branches of contrastive learning, the authors did not discuss the limitations of the work, for instance with regard to the assumptions or settings considered (Assumptions 4.2 and 4.3), and how real-world data conforms to these assumptions.

W2: The simulation experiments show the effect of varying the aspect ratio d/n, truncation level, etc. However, there seems to be a missing discussion on how these factors affect the performance of PCA++ in the real-world experiment (RNA sequencing).

W3: I think the only real-world experiment presented in the paper (using sequencing data) is limiting in judging how important the uniformity constraint is in contrastive PCA. I think it would be important to at least show how well the alignment-only PCA+ does on the data. This limited real-world experiment and the considered baselines make it hard to assess the utility of the method in real-world experiments.

Questions

  • In Figure 3, only classical PCA seems to be considered as a baseline for comparison. This limited real-world experiment and the considered baselines make it hard to assess the utility of the method in real-world experiments.

  • I encourage authors to consider citing an important early work on contrastive PCA[1] and comparing the method with existing baselines.

[1] Contrastive Principal Component Analysis https://arxiv.org/pdf/1709.06716

Limitations

While the authors have mentioned extensions to other existing contrastive learning methods, the limitations of this work are not clearly outlined.

Justification for Final Rating

My main concern, also shared by other reviewers, is the comparison of the paper with prior contrastive PCA work. The authors have responded with clarification on the differences from existing contrastive PCA works. And I am satisfied with the responses, and I will maintain my original score and confidence.

Formatting Issues

None

Author Response

... the authors did not discuss the limitations of the work, for instance, with regards to the assumption or settings considered (Assumption 4.2, and 4.3), and how real-world data conforms to these assumptions.

A: We thank the reviewer for their thoughtful questions regarding the assumptions underlying our main theoretical results. We agree that discussing the scope and necessity of these assumptions is important. We address each point below and will add these clarifications to the camera-ready manuscript. (For a discussion of the paper's broader limitations, such as the linear model, please see the detailed response to Reviewer [9ppt].)

  • On Assumption 4.1 (BBP detectability threshold): This assumption, $\lambda > \sqrt{c}$, is a classical and fundamental requirement in high-dimensional PCA for any spiked covariance model. As we note on lines 216-217, any signal spike weaker than this threshold becomes statistically indistinguishable from the noise bulk of the Marchenko-Pastur distribution. This is a fundamental limit of signal detection via PCA in this regime, not a limitation specific to our PCA++ method.

  • On Assumption 4.3 (distinct eigenvalues): This assumption was made primarily for analytical convenience. With distinct eigenvalues, we can cleanly map each sample eigenvector to a population counterpart using existing results (Lemmas D.4, D.5). Without this assumption, if a set of eigenvalues were identical, we would instead estimate a subspace for those directions. Within that estimated subspace, it would be difficult to distinguish which basis vectors correspond to signal and which to background just by looking at $S_n$ alone, making it harder to explain how the uniformity constraint works on a per-direction basis.

The key insight is that even if the standard covariance $S_n$ has degenerate subspaces (i.e., multiple identical eigenvalues mixing signal and background components), the contrastive covariance $S_n^{+}$ resolves this ambiguity. Since $S_n^{+}$ has asymptotically zero energy on pure background directions, the generalized eigenvalue problem can still correctly identify the signal subspace and separate it from the background.


To empirically validate this claim, we have run a new set of simulations for the setting in Sec E.4 (Figure 4) but with:

  • Signal variances: [50, 50, 20, 15, 10]
  • Background variances: [500, 500, 300, 50, 50]

The results below show that even with these degeneracies, the empirical subspace error for PCA++ continues to track our theoretical predictions, which were derived under the distinct spike assumption. This provides strong evidence that the assumption is a technical convenience rather than a practical necessity.

Fixed aspect ratio regime

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.120 | 0.189 | 0.230 | 0.262 | 0.288 | 0.323 | 0.361 | 0.366 | 0.423 | 0.418 |
| PCA++ theory predicted | 0.104 | 0.179 | 0.229 | 0.268 | 0.301 | 0.330 | 0.356 | 0.379 | 0.400 | 0.410 |

Growing-spike regime

| Aspect Ratio | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | 1.1 | 1.3 | 1.5 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA++ | 0.136 | 0.195 | 0.228 | 0.270 | 0.307 | 0.321 | 0.353 | 0.384 | 0.403 | 0.384 |
| PCA++ theory predicted | 0.100 | 0.171 | 0.218 | 0.256 | 0.287 | 0.315 | 0.339 | 0.361 | 0.381 | 0.391 |

We will add a remark clarifying the role of these assumptions and include this new empirical validation in the appendix to strengthen our claims. Thank you again for pushing us to be more precise on these important details.

... I think it would be important to at least show how well the alignment-only PCA+ does on the data...

A: Thank you for your thoughtful feedback regarding the limitations of our real-world experiments and the importance of evaluating the uniformity constraint in contrastive PCA.

In response, we have included a comprehensive comparison across several methods, including standard PCA, alignment-only PCA+ (which lacks the uniformity constraint), PCA++, as well as additional baseline dimensionality reduction techniques (U-MAP, t-SNE, Robust PCA [Candes et al., 2011]). All methods were evaluated on the same single-cell RNA-seq data, focusing on their ability to recover cell-type groupings in both control and stimulated conditions.

The Adjusted Rand Index (ARI; lower is better in this context) for each method is summarized in the table below:

| Method | Cells | B Cells | NK Cells | All |
|---|---|---|---|---|
| PCA | 0.2084 | 0.2977 | 0.2502 | 0.1478 |
| PCA+ | 0.0036 | 0.0242 | 0.0299 | 0.014 |
| PCA++ | 0.0008 | 0.005 | 0.0049 | 0.0172 |
| U-MAP | 0.2031 | 0.2506 | 0.2078 | 0.1392 |
| t-SNE | 0.0233 | 0.0967 | 0.0477 | 0.0217 |
| Robust PCA | 0.0399 | 0.1907 | 0.1314 | 0.0443 |

As shown, both PCA+ and PCA++ dramatically reduce the ARI compared to standard PCA and other popular methods, indicating much better mixing of the two conditions within cell types. Importantly, adding the uniformity constraint in PCA++ further improves the results over PCA+, especially for B cells and NK cells. These quantitative results highlight the utility of both the alignment and uniformity components. PCA++ achieves superior separation of invariant populations compared to alternative approaches. We will include these expanded results and discussion in the revised manuscript.
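A minimal sketch of the kind of evaluation we read from this table is shown below; the clustering step (k-means on the low-dimensional embedding) is our assumption for illustration and not necessarily the authors' exact protocol.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def condition_mixing_ari(embedding, condition_labels, n_clusters=2, seed=0):
    """Cluster an embedding and measure ARI against the control/stimulated
    labels. Lower ARI = conditions are better mixed, i.e., the representation
    is more condition-invariant."""
    clusters = KMeans(n_clusters=n_clusters, random_state=seed,
                      n_init=10).fit_predict(embedding)
    return adjusted_rand_score(condition_labels, clusters)
```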

I encourage authors to consider citing an important early work on contrastive PCA[1] and comparing the method with existing baselines.

A: We thank the reviewer for their suggestion to cite contrastive PCA (cPCA) [1] and for the prompt to compare with existing baselines. We will add this important citation and others from the cPCA literature to our related work section.

Regarding a direct comparison, it is important to note that our work and the cPCA method in [1] address fundamentally different problem settings:

  • cPCA: This method requires two distinct datasets: a "foreground" dataset containing the signal of interest and an explicit "background" dataset. Its goal is to find variance in the foreground that is absent in the background.
  • Our work (PCA++): We operate in the standard self-supervised learning setting, where we are given only paired positive views $(X, X^{+})$ of the data, in which both views contain a mixture of the foreground signal and background context. Our method does not require access to a pure background dataset.

Because our setting does not provide the separate background dataset that cPCA requires, their method cannot be applied, which prevents a direct comparison.


For a more comprehensive discussion of the distinctions between our framework and the broader cPCA literature, we respectfully refer the reviewer to our detailed response to Reviewer [uGvG], where we elaborate on these points. Thank you again for bringing this important line of work to our attention.

Comment

I thank the authors for the additional experiments and the discussion on how cPCA methods relate to theirs. I highly encourage the authors to include this discussion in the revised paper. It would make the position of the paper in the literature clear and enhance its readability.

Comment

We sincerely thank the reviewer for their positive and constructive feedback. We will incorporate all the discussed clarifications and new experiments into the final paper.

Final Decision

This paper introduces PCA++, a contrastive PCA framework specifically designed for datasets with strong background noise. The paper is well-motivated by an important problem for the contrastive learning audience, with technically solid theoretical results and strong empirical evidence. Three reviewers recommended acceptance while one reviewer recommended rejection in the original reviews. The negative recommendation was primarily due to the mispositioning of this paper in the contrastive dimension reduction literature, which is a fair comment given the use of the same terminology, as contrastive PCA does exist in a separate literature on case-control studies. During the rebuttal, this concern was addressed by distinguishing between contrastive learning, a self-supervised learning method, and contrastive dimension reduction for case-control studies, as confirmed by the reviewer. Finally, all reviewers recommend acceptance, which I agree with.

However, in the camera-ready version, the authors should clarify the distinction between contrastive learning and contrastive dimension reduction and cite related references to better position this paper.