Consistent Multigroup Low-rank Approximation
We introduce a new low-rank approximation method for multigroup data.
Abstract
Reviews and Discussion
The paper introduces the concept of "consistent multigroup low-rank approximation," which extends the principles of singular value decomposition (SVD) to handle data partitioned into multiple groups. The goal is to find a set of basis vectors that minimize the maximum reconstruction error across all groups while maintaining the consistency property of SVD.
Questions for Authors
see above.
Additional questions:
- How does the performance of the method degrade as the number of groups increases?
- Could the framework be extended to nonlinear low-rank approximation methods (e.g., kernel PCA)?
Claims and Evidence
- The claims in this paper are mostly well formulated and supported by an analytical framework.
- The empirical evaluations on various datasets demonstrate the practical applicability and effectiveness of the proposed methods.
Methods and Evaluation Criteria
The proposed methods and evaluation criteria make sense for the problem at hand. The use of real-world datasets and synthetic data ensures a comprehensive evaluation of the algorithms' performance.
Theoretical Claims
I have not checked the proofs in detail, but the theoretical claims look sound.
Comment:
The convexity analysis indicates that the primal problem is non-convex, which might raise concerns about convergence guarantees for more than two groups. Can the authors provide comment and a numerical test to mitigate this concern?
Experimental Design and Analyses
The experimental designs seem sound and valid. The paper evaluates the algorithms on both real-world and synthetic datasets, providing a diverse set of scenarios to test the algorithms' performance.
The experiments cover the case of more than two groups, where some theoretical guarantees (optimality) are not valid. The results look reasonable. Question: How does the method perform for a high number of groups, i.e., more than 20 groups?
Supplementary Material
I looked at the additional numerical experiments. They look reasonable.
Relation to Broader Literature
The paper builds upon existing techniques in low-rank approximation, fair PCA, and algorithmic fairness. It references foundational works and recent advancements in the field, positioning its contributions within the broader literature.
I am not an expert in tensor completion, thus I defer a more detailed judgement to fellow reviewers.
Essential References Not Discussed
see above.
Other Strengths and Weaknesses
Pro:
The paper is well-structured, with clear explanations of the algorithms and their underlying principles.
Contra:
A more detailed discussion of the limitations for a high number of groups would be beneficial for the paper. The non-convexity of the optimization problem raises concerns about convergence issues in practice.
Other Comments or Suggestions
- Please fix the formatting of Figure 3 in the appendix; it seems to be out of bounds.
Thank you for your comprehensive review and for sharing new interesting ideas.
"The convexity analysis indicates that the primal problem is non-convex, which might raise concerns about convergence guarantees for more than two groups. Can the authors provide comment and a numerical test to mitigate this concern? A more detailed discussion about the limitations for a high number of groups would be beneficial for the paper. Non-convexity of the optimization problem raises concerns of convergence issues in practice."
Thank you for raising this excellent point. We will add to the current manuscript a discussion on the regime of a high number of groups in the data, focusing on the convergence of our algorithms and offering numerical tests.
A note on the convergence of our method. We clarify that, even though the primal problem is indeed non-convex, the dual problem is always convex. In our work, we introduce algorithms that either solve the dual problem or its dual (the bidual). Thus, from an optimization perspective, our algorithms solve convex problems so that they are always expected to converge, which is why we have not included an extensive discussion of convergence issues in the manuscript.
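To make the convexity point concrete, here is a minimal numerical sketch (not the paper's actual implementation; the data, dimensions, and step-size schedule are our own illustrative assumptions) of Frank-Wolfe ascent on a concave dual of a rank-1 min-max reconstruction problem, where each dual iterate only requires the leading eigenvector of a convex combination of the group scatter matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multigroup data: three groups of points in R^5,
# each stretched along a different axis.
groups = [rng.standard_normal((30, 5)) @ np.diag([3, 1, 1, 1, 1]),
          rng.standard_normal((40, 5)) @ np.diag([1, 3, 1, 1, 1]),
          rng.standard_normal((25, 5)) @ np.diag([1, 1, 3, 1, 1])]
covs = [A.T @ A for A in groups]          # group scatter matrices
norms = [np.sum(A**2) for A in groups]    # squared Frobenius norms

def leading_eigvec(M):
    w, V = np.linalg.eigh(M)
    return V[:, -1]

def group_errors(v):
    # Rank-1 reconstruction error of each group along direction v.
    return np.array([n - v @ C @ v for n, C in zip(norms, covs)])

# Frank-Wolfe ascent on the (concave) dual over the probability simplex:
# the supergradient component for group g is its current error.
k = len(groups)
mu = np.full(k, 1.0 / k)
for t in range(200):
    v = leading_eigvec(sum(m * C for m, C in zip(mu, covs)))
    grad = group_errors(v)
    s = np.zeros(k)
    s[np.argmax(grad)] = 1.0              # best simplex vertex
    mu = (1 - 2 / (t + 2)) * mu + 2 / (t + 2) * s

v = leading_eigvec(sum(m * C for m, C in zip(mu, covs)))
print("group errors:", group_errors(v))
```

Each iteration costs one eigenvector computation, which is why this kind of dual ascent scales well with the number of groups.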
The effect of the number of groups on convergence. As we discuss in the manuscript, for more than two groups strong duality does not hold in general, meaning that our algorithms are not guaranteed to retrieve the optimal solution of the primal problem (only the optimal solution of the dual problem). However, as we show in the experimental evaluation, the duality gap is consistently narrow. In practice, our algorithms converge quickly, regardless of the number of groups. If the reviewer is interested, we have added a simple numerical experiment to our repository showing that our method also converges quickly when the number of groups becomes large (https://anonymous.4open.science/r/multigroupSVs-F716/notebooks/ManyGroupsExample.ipynb).
"The experiments cover the case of more than 2 groups, where some theoretical guarantees (optimality) are not valid. The results look reasonable. Question: How does the method perform for a high number of groups, i.e. more than 20 groups?"
This is certainly an interesting question.
A note on the choice of datasets in our experiments. Consistent with related work, we primarily rely on benchmark datasets that can be naturally partitioned into two, three, or four groups, a setting particularly widespread in the real world. At the same time, for the specific purposes of the experiments, considering very large datasets (with many large groups), provided that they can be found, would preclude the comparison with the main baseline (FAIR-PCA), which hinges upon an expensive SDP solver and thus quickly incurs scalability issues as the data size grows.
The setting with a large number of groups. We note that the Frank-Wolfe algorithm and the SDP we introduce can handle any number of groups, and their performance is not intrinsically connected to the number of groups. This is also suggested by the experimental results on the COMPAS and COMMUNITIES datasets (although the number of groups there increases only up to three and four). Nevertheless, as thoughtfully observed by the reviewer, it is interesting to study the behavior of our method as the number of groups becomes very large. Should the reviewer already wish to explore this question further, we have added to our repository a simple numerical experiment to investigate the duality gap (and hence the performance) of our method as the number of groups increases from 3 to 25 (https://anonymous.4open.science/r/multigroupSVs-F716/notebooks/ManyGroupsExample.ipynb).
"Please fix the formatting of Figure 3 in the appendix, it seems to be out of bounds."
Thank you for noticing. We will fix the figure to constrain it within the page bounds.
"Could the framework be extended to nonlinear low-rank approximation methods (e.g., kernel PCA)?"
We thank the reviewer for this stimulating suggestion. Developing a nonlinear method for consistent multigroup low-rank approximation surely represents an exciting avenue for future research.
We believe that both our problem formulation and method are amenable to extension toward nonlinear settings.
Specifically, since our method is inspired by the SVD, future work could introduce a method inspired by kernel SVD (or kernel PCA). To accomplish this, we would compute a kernel matrix and then extract and study the multigroup singular vectors associated with the kernel matrix.
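As a rough sketch of the ingredient such an extension would reuse, the following computes standard kernel PCA components from a centered RBF kernel matrix; the data and bandwidth are illustrative assumptions, and the final multigroup step (extracting multigroup singular vectors of the kernel matrix) is hypothetical and not implemented here:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))          # data points, one row each

# RBF kernel matrix: the nonlinear ingredient the extension would reuse.
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)

# Double-center the kernel matrix, as in standard kernel PCA.
n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Leading kernel principal components; a multigroup variant would instead
# extract the multigroup singular vectors of Kc (hypothetical extension).
w, V = np.linalg.eigh(Kc)
top = V[:, ::-1][:, :2] * np.sqrt(np.maximum(w[::-1][:2], 0))
print(top.shape)
```

The multigroup step would then operate on `Kc` restricted to the rows of each group, rather than on the raw data matrices.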
This manuscript addresses the problem of consistent low-rank approximation for multigroup data. It aims to find a sequence of k basis vectors that treats all groups equally by minimizing the maximum error among them while satisfying the consistency property. The paper proposes an iterative algorithm that adds the vector with the best rank-1 projection according to the min-max criterion and then projects the data onto its orthogonal complement. It uses primal-dual approaches or semidefinite programming to find the best rank-1 projection. The theoretical analysis shows that for two-group data the rank-1 problem can be solved optimally and the algorithm has polynomial-time complexity. Experimental results on real-world datasets for the FAIR-PCA task demonstrate that the proposed method outperforms existing methods in terms of fairness and efficiency, with a more balanced low-dimensional data representation and shorter running times.
Questions for Authors
I have a question about whether the method studied by the author can be effectively transferred to other fields such as clustering and matrix completion.
Claims and Evidence
The claims in the paper are generally supported by clear and convincing evidence.
Methods and Evaluation Criteria
Could the method proposed by the author bring new inspiration in solving large-scale problems?
Theoretical Claims
Experimental Design and Analyses
The experimental section is reasonably thorough.
Supplementary Material
No supplementary material.
Relation to Broader Literature
Essential References Not Discussed
The author should give a more detailed description of the relevant background.
Other Strengths and Weaknesses
Although the paper is innovative as a whole, there is insufficient emphasis on the comparison and differentiation between the paper and the existing research when explaining the innovation points. The essential differences in principle, algorithm and performance from existing multi-group data dimensionality reduction methods should be pointed out more clearly.
Other Comments or Suggestions
It is suggested that the author improve the content of related work and introduce the research content in detail to facilitate readers' understanding.
We sincerely thank you for your review and valuable comments, which will also be very useful in future work.
"Could the method proposed by the author bring new inspiration in solving large-scale problems?"
Our method guarantees high scalability. We appreciate the reviewer's suggestion. An important advantage of our method over existing approaches for closely related problems is the scalability it ensures. By imposing the consistency property on the output low-rank representation, we effectively decompose a larger problem into multiple smaller problems that can be efficiently solved. The Frank-Wolfe algorithm we introduce, as well as the algorithm we design for the two-group case (but not the SDP solver), scales to large problem instances.
Possibilities for future work addressing large-scale problems. For future work, it would be interesting to study even more scalable (stochastic) hardware-accelerated gradient-based solutions to extract multigroup singular vectors. It would also be interesting to study different applications of our method beyond fairness, since our method can provide high-quality data compression whenever a partitioning of the data in multiple groups is relevant.
"Although the paper is innovative as a whole, there is insufficient emphasis on the comparison and differentiation between the paper and the existing research when explaining the innovation points. The essential differences in principle, algorithm and performance from existing multi-group data dimensionality reduction methods should be pointed out more clearly. It is suggested that the author improve the content of related work and introduce the research content in detail to facilitate readers' understanding."
As pointed out when addressing the comments of Reviewer 2Q4n, we fully share this concern. We will give more details on related work, with particular emphasis on the work of Samadi et al. (2018), which represents the closest existing work to ours. We will further highlight the fundamental aspects that set our problem and method apart from related work, and we will separate more clearly our innovation points from the state of the art in multigroup-data dimensionality reduction.
"I have a question about whether the method studied by the author can be effectively transferred to other fields such as clustering and matrix completion."
We again thank the reviewer for their useful suggestions.
Clustering. Unlike clustering methods, our method considers a partitioning of the data into groups that is given as input.
However, our method, being a low-rank approximation method, could be used to address clustering tasks. For instance, one can take advantage of our method in preprocessing, to extract informative features that can be used by existing clustering algorithms.
Similarly, clustering algorithms can be used to address multigroup low-rank approximation. In particular, we can leverage clustering algorithms prior to our method to find a meaningful partitioning of the data into clusters (i.e., a clustering). Then, we can give the clustering as input to our method and obtain a balanced low-rank approximation that takes the clustering into consideration.
For future work, it would be valuable to design an iterative procedure that combines clustering and multigroup low-rank approximation, by iteratively optimizing the clustering structure and the corresponding multigroup low-rank approximation.
Matrix completion. Our method can also be regarded as an extension of the singular value decomposition (SVD) that incorporates a partitioning of the data into multiple groups. The SVD has been successfully leveraged to tackle matrix-completion tasks. Given a partially observed matrix, our method could be extended to address matrix-completion tasks by accounting for the missing values. In this direction, a straightforward approach would be to impute the missing values (e.g., by the mean), and then compute the multigroup singular vectors and the resulting multigroup low-rank approximation as usual. In future work, we will study more refined approaches for multigroup matrix-completion tasks.
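The straightforward imputation-based approach described above can be sketched as follows, with a plain truncated SVD standing in for the multigroup singular vectors (the data, missingness rate, and target rank are illustrative assumptions, not the authors' experimental setup):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((20, 8)) @ rng.standard_normal((8, 6))  # low-rank-ish
mask = rng.random(M.shape) < 0.3          # True where entries are missing
X = np.where(mask, np.nan, M)

# Impute missing entries with the column mean (the simple baseline
# mentioned above), then take a truncated SVD; a multigroup variant
# would replace the SVD step with the multigroup singular vectors.
col_mean = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_mean[None, :], X)

U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
r = 3
X_hat = (U[:, :r] * s[:r]) @ Vt[:r]       # rank-3 completion estimate

rel_err = np.linalg.norm((X_hat - M)[mask]) / np.linalg.norm(M[mask])
print("missing-entry relative error:", rel_err)
```

More refined schemes would iterate imputation and re-approximation rather than imputing once, as the response notes for future work.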
The paper studies the problem of estimating the multigroup singular vectors for the multigroup FAIR-PCA method. The Frank-Wolfe method and the SDP relaxation method are both employed to solve the min-max-type non-convex objective function.
Update after rebuttal
(Sorry for the late update.) The paper studies the problem of estimating the multigroup singular vectors for the multigroup FAIR-PCA method, which shares a similar formulation and objective with Samadi et al. (2018). The strategy of iteratively updating a rank-1 approximation, and the proposed Frank-Wolfe and SDP relaxation methods described in the main text, differ from previous works.
However, only brief discussions, rather than theoretical justifications, are provided for these algorithms. In theory, under the same two-group setting as Samadi et al. (2018), this work studies a further, different algorithm and shows its optimality. But the details of this third algorithm are only briefly mentioned in Lemma E.1 in the appendix.
In general, this could be a novel and interesting work, but the quality of the overall organization makes it hard to recognize the contribution of the whole work, which needs detailed refinement. For the above reasons, after the rebuttal, I decide to give a final score of 2.
Questions for Authors
I have no further questions.
Claims and Evidence
On lines 18–19, the authors assert that prior methods, such as those in Samadi et al. (2018), do not guarantee the consistency property of the SVD. This is confusing because the min-max loss considered in this paper is identical to that in Samadi et al. (2018), and the same SDP relaxation is applied. How can the authors claim that their method is consistent while others are not?
Methods and Evaluation Criteria
The min-max PCA loss is commonly used for the multi-source aggregation problem and is suitable for the Fair-PCA problem. The numerical evaluations make sense. However, the authors claim to show through the numerical study that their SVD estimates are consistent; metrics such as the marginal, incremental, and reconstruction losses do not directly imply consistency. One may also be interested in the estimation error of the estimated singular vectors.
Theoretical Claims
The theoretical results in Theorem 7.1, Lemma 7.2, and Lemma 7.3 are incomplete. Which algorithm achieves polynomial time? What does the tightness property guarantee? Some detailed discussion and remarks are needed.
Experimental Design and Analyses
The numerical study considers many datasets, which is solid.
Supplementary Material
I have reviewed the proofs in the supplementary material, though not line-by-line. The theoretical justifications appear to be correct and make sense.
Relation to Broader Literature
The Fair-PCA problem, the solution (the min-max loss approach), the convergence, and the polynomial-time property have all been studied in Samadi et al. (2018). This paper does not provide novel findings.
Essential References Not Discussed
The paper provides a thorough review of related works.
Other Strengths and Weaknesses
The main strengths and weaknesses have been discussed in the preceding sections. The primary issue lies in the novelty of both the task and the method examined in this work. The paper employs the exact same formulation and loss function for the Fair PCA problem as those found in the existing literature.
Other Comments or Suggestions
I have no further comments.
Thank you for your thorough review. We understand your concerns, and we believe that your feedback gives excellent input for improving the manuscript.
We begin by carefully addressing the reviewer’s main concern, i.e., the novelty of our work in the context of previous work, notably that of Samadi et al. (2018). We acknowledge that the distinction between our work and that of Samadi et al. has not been sufficiently clarified.
Different problem. Our problem and the problem studied by Samadi et al. are fundamentally different. Intuitively, given an integer k, Samadi et al. seek a balanced rank-k approximation of the data. The obtained solution is not related to the solutions of lower rank by any straightforward transformation. On the other hand, our problem asks for a rank-k approximation where balance is also achieved in all dimensions lower than k. We call this the consistency property, inspired by the SVD: given a rank-k SVD, every truncation to a lower rank is also optimal.
Different loss function. We note that our min-max loss equals the marginal loss of Samadi et al. only in the rank-1 sub-problem. Our final rank-k solution (by Algorithm 1) optimizes the incremental loss, not the marginal loss. There is a crucial difference: the incremental loss for a rank-k solution is the sum of k rank-1 min-max losses, whereas the marginal loss directly applies the min-max criterion to a rank-k solution and cannot be decomposed into separate rank-1 losses.
Different methodology. Our approach for obtaining a rank-k solution is based on an orthonormalization procedure similar to that of the SVD (Algorithm 1), which iteratively solves a rank-1 problem instead of directly solving a rank-k SDP. We highlight important differences:
- The rank-1 sub-problem is solved by a novel primal-dual framework, unlike in Samadi et al. For this problem, we use a variety of methods such as Frank-Wolfe, root-finders, or SDP.
- Regarding the SDP: our SDP is tailored to the rank-1 dual problem only, and is based on the bidual of the rank-1 problem, with different constraints.
- Our approach always returns a rank-k solution, unlike that of Samadi et al., which requires one extra dimension.
Finally, we visually demonstrate that the method of Samadi et al., unlike our method, does not guarantee the consistency property through a simple example, which can be found in our repository (https://anonymous.4open.science/r/multigroupSVs-F716/notebooks/ConsistencyExample.ipynb).
Relevance of consistency in view of previous work. Consistency yields several practical advantages. Just like SVD or PCA, the user can run our method once and then retain the desired number of basis vectors. Instead, the method of Samadi et al. needs to be run separately for all possible values of k, possibly obtaining drastically different approximations, which can be cumbersome in many applications (e.g., in extracting and selecting features for a machine-learning model).
The consistency property furthermore means that the basis vectors output by our method are meaningful, and they can be interpreted as the principal components, that is, the orthogonal directions of maximum variance when considering all groups.
In addition, as remarked in Section 1, the consistency property breaks down a large problem into several smaller problems, offering significant benefits in terms of computational efficiency and scalability.
Empirical evaluation of consistency. As rightly noted by the reviewer, the metrics do not imply consistency on their own. Consistency is achieved by the sequential orthonormalization process described in Algorithm 1. In the experimental evaluation, to ensure an unbiased comparison, we monitor the different loss functions that are optimized by our method and the baselines. The results of the experiments in Figures 2, 3, and 4 empirically demonstrate the consistency property: the errors incurred by the method of Samadi et al. can deviate drastically across groups at ranks below the target rank, which is not the case for our method.
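The sequential orthonormalization scheme of Algorithm 1 can be sketched as follows. Here a plain leading-eigenvector step (labeled as such in the code) stands in for the actual rank-1 min-max solver, so the sketch only illustrates how deflation onto the orthogonal complement yields an orthonormal, consistent sequence of basis vectors; the data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.standard_normal((30, 6)) for _ in range(3)]

def rank1_direction(mats):
    # Stand-in for the rank-1 min-max step (Algorithm 1 would invoke the
    # primal-dual solver here): leading eigenvector of the summed scatter.
    M = sum(A.T @ A for A in mats)
    w, V = np.linalg.eigh(M)
    return V[:, -1]

k, d = 3, groups[0].shape[1]
basis = []
deflated = [A.copy() for A in groups]
for _ in range(k):
    v = rank1_direction(deflated)
    basis.append(v)
    # Project every group onto the orthogonal complement of v (deflation),
    # so the next direction is found in the remaining subspace.
    P = np.eye(d) - np.outer(v, v)
    deflated = [A @ P for A in deflated]

B = np.stack(basis, axis=1)               # d x k basis matrix
print(np.round(B.T @ B, 6))               # close to the 3x3 identity
```

Because each direction is found in the orthogonal complement of the previous ones, truncating the basis to any prefix gives the same vectors that a lower-rank run would produce, which is the consistency property discussed above.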
More discussion in Section 7. We agree with the reviewer; more details will be added to Section 7.
- Theorem 7.1 states that Problem 1 is computationally tractable and can be solved efficiently by any of the algorithms we propose, as will be clarified.
- As for Lemma 7.2, a tight semidefinite-program relaxation is one where the optimal solution of the relaxed problem corresponds to the optimal solution of the original problem. We will expand the statement of the lemma.
- Finally, in Lemma 7.3, we will explain the exact meaning of equal error in this context.
Conclusion. Our work bridges the gap between the existing literature in multigroup (or fair) low-rank approximation and the standard SVD and PCA, and, as shown by our thorough experimental evaluation, provides an unprecedented trade-off between result quality and efficiency.
The authors' responses during the rebuttal period address my concerns. I agree that incrementally finding the leading vector, as in PCA and SVD, is a reasonable proposal. However, I maintain my point that, although Samadi et al. solved the rank-d approximation without the orthogonality constraints, it is natural to perform the rank-1 approximation incrementally, as in PCA and SVD, since their results covered the rank-1 case.
In general, I have decided to raise the score to 2.
Thank you very much for the constructive discussion. We are pleased that we were able to address your concern.
While it is indeed possible to solve the rank-1 problem using the approach of Samadi et al., their method is not directly applicable to our setting. This key distinction motivated the development of the novel approach we introduce. To this end, we highlight several differences between our method and that of Samadi et al., which also illustrate why their approach is not well-suited to our context.
- Guaranteed rank-1 solutions: Our method guarantees a rank-1 solution by design, whereas the approach of Samadi et al. requires an additional embedding dimension and can return solutions of higher rank. In our setting, obtaining a true rank-1 solution is essential.
- Novel analysis: Our method is built on a new analysis of the rank-1 problem. Unlike the rank-k case, which requires optimization over the space of rank-k positive semidefinite matrices, the rank-1 problem admits a tractable dual formulation. This enables us to recast the problem as a parametric eigenvalue problem, offering new insights into the min-max objective. Notably, our analysis reveals a connection between the min-max criterion and the leading eigenvector of the optimal convex combination of the group-specific matrices.
- Efficiency and scalability: This analysis allows us to design efficient and scalable algorithms, such as Frank-Wolfe and root-finding methods, that are particularly well-suited for the rank-1 case. In contrast, the SDP-based approach of Samadi et al. is significantly more computationally intensive.
- Support for multiple groups: Our method seamlessly accommodates more than two groups, while the approach of Samadi et al. was primarily developed for the two-group case.
Finally, even if one were to apply the Samadi et al. SDP directly to the rank-1 problem, extending it to obtain a valid rank-k solution remains nontrivial. Despite the simplicity of our main algorithm (Algorithm 1), it incorporates a novel and non-obvious idea: the use of sequential orthonormalization to construct rank-k solutions with desirable properties, akin to those of PCA or SVD.
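As an illustration of the root-finding view mentioned above (our own reading of the parametric eigenvalue formulation, not the paper's exact algorithm; the data are illustrative assumptions), the two-group rank-1 case can be sketched as bisection on the weight of a convex combination of the two group scatter matrices, seeking the weight where the two group errors are equal:

```python
import numpy as np

rng = np.random.default_rng(4)
A1 = rng.standard_normal((40, 4)) @ np.diag([4.0, 1, 1, 1])   # group 1
A2 = rng.standard_normal((40, 4)) @ np.diag([1, 4.0, 1, 1])   # group 2
C1, C2 = A1.T @ A1, A2.T @ A2
n1, n2 = np.sum(A1**2), np.sum(A2**2)

def v_of(mu):
    # Leading eigenvector of the mu-weighted combination of scatters.
    w, V = np.linalg.eigh(mu * C1 + (1 - mu) * C2)
    return V[:, -1]

def err_gap(mu):
    v = v_of(mu)
    return (n1 - v @ C1 @ v) - (n2 - v @ C2 @ v)   # error1 - error2

# Bisection on mu in [0, 1]: as group 1 gains weight its error drops,
# so the gap changes sign at the equal-error weight.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if err_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
mu_star = (lo + hi) / 2
v = v_of(mu_star)
print("errors:", n1 - v @ C1 @ v, n2 - v @ C2 @ v)
```

Each bisection step again costs a single eigendecomposition, which is consistent with the efficiency claims for the two-group algorithm.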
The paper investigates the problem of finding a low-rank approximation for multigroup data. It claims that the results are consistent. The main contribution of the paper seems to be the formulation of the multigroup-singular-vector (MG-SINGULAR) problem, which seems to generalize FAIR PCA. It derives a dual problem and solves it through a first-order method. The authors also try solving it through an SDP.
Reviewers have raised several concerns which, if addressed, could make this a strong paper. Indeed, the rebuttal helped, but there are still no strong champions for the paper. Additionally, there are a few more concerns the authors can kindly take note of. The paper does not provide any statistical consistency result, yet the first line of the abstract claims to tackle the problem of consistent low-rank approximation. The authors show that the MG-SINGULAR problem (Section 5) is non-convex. Since they solve the dual without showing strong duality, it is not clear how the dual solution (Problem 2) solves the original problem. It is also disturbing to note that MG-SINGULAR (Section 5) is claimed to have been solved by SDP. This is not possible because the original problem is non-convex, unless a relaxed version of the problem is solved. This again contradicts the claim of consistency. Indeed, comments are made about the duality gap being non-zero (lines 313-316) and about the SDP relaxation (Section 6, lines 239-242). These are all interesting efforts to solve the problem approximately, which are laudable, but they contradict the consistency claim. We believe these are all writing fixes which need to be done.
Thus, at this point the paper is not ready for publication.