On Socially Fair Low-Rank Approximation and Column Subset Selection
Abstract
Reviews and Discussion
This paper considers the fair low-rank approximation and fair column subset selection problems. The two problems are similar; both aim to select a subset of vectors that optimizes the algorithm's performance across all sub-populations. Formally, given matrices A and B, the goal is to choose k vectors that span the B matrices such that the maximum distance between A and B is minimized, where the distance is defined by some norm function between these matrices. Such a problem has applications in feature selection, which is one of the most important topics in machine learning.
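To make the worst-group objective concrete, it can be sketched in a few lines of NumPy. This is our own illustrative code, not the paper's algorithm: `fair_lowrank_cost` and the variable names are assumptions, and the SVD baseline shown minimizes the sum of squared errors over the stacked matrix, not the maximum over groups:

```python
import numpy as np

def fair_lowrank_cost(groups, V):
    # Worst-case (max over groups) Frobenius reconstruction error
    # for the subspace spanned by the orthonormal columns of V.
    return max(np.linalg.norm(A - A @ V @ V.T) for A in groups)

rng = np.random.default_rng(0)
A1 = rng.standard_normal((20, 5))  # group 1
A2 = rng.standard_normal((30, 5))  # group 2

# Baseline: top-k right singular vectors of the stacked matrix,
# which minimize the *sum* of squared errors, not the max.
k = 2
_, _, Vt = np.linalg.svd(np.vstack([A1, A2]), full_matrices=False)
V = Vt[:k].T
print(fair_lowrank_cost([A1, A2], V))
```

A fairness-aware algorithm would instead search for a rank-k subspace that directly minimizes this maximum over groups.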
The main contributions of this work are (1) a -approximation algorithm running in O(2^n) time, and (2) using a similar key idea, an -approximation running in polynomial time for column subset selection. The authors also show a lower bound: achieving a constant approximation requires exponential running time under the ETH conjecture. Beyond these theoretical results, the authors also provide a set of experimental results.
Strengths
- The studied problem is well-motivated and has wide application in machine learning. I believe it should be interesting to the ML community.
- This is a technically solid paper. I like the high-level idea of the algorithm: it starts from some "bad" solution and then repeatedly decreases the approximation ratio until it becomes . Checking feasibility requires some new ideas.
- The paper is well-written. I appreciate that the authors provide sufficient intuition and high-level descriptions of the algorithms in Section 1.1; they are very helpful.
Weaknesses
The main weakness is that the -approximation runs in exponential time, while the column subset selection algorithm runs in polynomial time but has an approximation ratio linear in k. I understand that there is a lower bound under the ETH conjecture, but this running time strictly restricts the applications of the algorithm. Maybe it would be more suitable to study parallel algorithms for these problems.
Questions
Is it possible to make the proposed algorithms run in parallel?
Limitations
This is a theoretical paper; there are no potential negative societal impacts.
The main weakness is that the -approximation runs in exponential time, while the column subset selection algorithm runs in polynomial time but has an approximation ratio linear in k. I understand that there is a lower bound under the ETH conjecture, but this running time strictly restricts the applications of the algorithm. Maybe it is more suitable to study the parallel algorithms for these problems... Is it possible to make the proposed algorithms run in parallel?
Thanks for the suggestion. It could be possible to make the proposed algorithms run in parallel, but due to our hardness results, the total runtime for a -approximation algorithm must still be exponential, across all servers. That said, this is a very interesting future direction.
Additionally, in most practical instances, k is usually a small constant. Therefore, for these instances our algorithm provides a polynomial-time constant-factor approximation.
I'd like to thank the authors for replying to my concerns. I will keep my score unchanged. Please add some descriptions for the scenarios where k is a small constant.
The paper considers low-rank approximation and column subset selection in a specific setting that the authors refer to as the socially fair setting. Basically, given matrices , they consider the problem of approximating the solution of . They first show that this problem is in general NP-hard, and then propose two algorithms that achieve time complexity polynomial in and either exponential in (the rank) for a -approximation, or polynomial in for an approximation with a multiplicative constant polynomial in . Lastly, the proposed algorithm for low-rank approximation is used for column subset selection, where similar guarantees are obtained.
Strengths
- This paper has a few interesting and nontrivial contributions. The way these are presented is also very good - starting from computationally infeasible (in polynomial time) results to less time-demanding algorithms.
- I believe the authors made a nice connection with fairness, and this clearly increases the significance of the paper. Nonetheless, I believe this is a fundamental problem of multi-matrix approximation and I appreciate the derived results even from a purely theoretical perspective.
- It seems to be the first paper considering socially fair low-rank matrix approximation and related topics.
- Although the main results and their analysis are not simple, I believe the authors made great efforts to make them as accessible as possible.
Weaknesses
- There is a lot to understand in the Introduction, i.e., Section 1.1. Although it gets easier once you are done with it, I would prefer keeping some of the details of Section 1.1 until Section 3. Furthermore, there is significant overlap between the two sections, so merging them might give you more space for explanations in the main body.
Questions
- According to Theorem 1.3, the runtime is given by where . You present this result for a fixed number of groups and as a function of and . But in the case when all matrices are (approximately) low rank (say of rank ), it does not make sense to have ? So, in this case we are equally interested in the dependence on , right?
- The notation in line (286) is confusing - I am not sure if is defined at that point, and how becomes in Lemma 4.3. In addition, it seems like , so why are there these superscripts?
- In the setting when , your column selection algorithm has the same (or higher) time complexity as SVD, but has a multiplicative factor in front of the approximation error. Thus, in this setting your algorithm is as complex as the optimal algorithm, but achieves worse performance. But, in any setting , it is not clear how to combine subspaces obtained from SVD to achieve small fair low-rank approximation error. Is this correct?
- Could you please explain how you choose the rank for the bicriteria approximation in Section 5? Namely, at the end of Section 1.1 you say that the bicriteria algorithm performs better "even when the bicriteria algorithm is not allowed a larger rank than the baseline". Later in Section 5 you choose the bicriteria solution to have rank . I find this a bit confusing, especially since I thought was the rank...
Other comments
- missing definition of in the abstract
- there is a typo in Lemma 3.5: on the right-hand side of the last inequality it should be and not .
- in Section 5 I did not find the definition of the cost ratio you have plotted (I believe it is given in the Appendix), so please add it there.
Limitations
The authors have addressed limitations adequately.
There is a lot to understand in the Introduction, i.e., Section 1.1. Although it gets easier once you are done with it, I would prefer keeping some of the details of Section 1.1 until Section 3. Furthermore, there is significant overlap between the two sections, so merging them might give you more space for explanations in the main body.
We will apply your suggestion in the next version of our paper, thank you.
According to Theorem 1.3 the runtime is given by where . You present this result for fixed number of groups and as a function of and . But...in the case when all matrices are (approximately) low rank (say of rank ) then it does not make sense to have ?...So, in this case we are equally interested in dependence on , right?
Even in this case, we believe that the focus can be on the input parameter . In particular, in the case where all matrices have rank at most , the input value of should not be set larger than . Indeed, given a more reasonable input parameter of that does not trivialize the problem, the algorithmic runtime dependency on is still quite important.
Notation in line (286) is confusing - I am not sure if is defined at that point, and how becomes in Lemma 4.3. In addition, it seems like , so why are there these superscripts?
Thanks for the question; there is a slight typo -- the expression in Line 286 should be . We are selecting columns of the overall matrix , which also corresponds to the same columns for each group . Note that is defined in this optimization problem as the minimizing factor. Given the selection of the columns induced by , then is the matrix that minimizes the low-rank cost for . We then use a change of notation: the column selection matrix becomes the column sampling matrix and becomes , so that the matrices differ across the groups. We will fix the typo and unify the notation in the next version of the paper.
In the setting when , your column selection algorithm has the same (or higher) time complexity as SVD, but has multiplicative factor in front of the approximation error...Thus, in this setting your algorithm is as complex as the optimal algorithm, but achieves worse performance. But, in any setting , it is not clear how to combine subspaces obtained from SVD to achieve small fair low-rank approximation error. Is this correct?
Note that even for , SVD does not translate to column subset selection since the top singular values generally do not correspond to elementary vectors (i.e., columns of the matrix), though SVD does give the optimal solution for low-rank approximation. More generally, it is not even clear how to combine subspaces acquired from SVD for the socially fair low-rank approximation with .
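To illustrate the gap between SVD and column subset selection discussed above, here is a small brute-force sketch (our own illustrative code; `css_cost` and the random instance are assumptions, not from the paper). The rank-k SVD error lower-bounds the cost of any k-column subset, and for generic data the best subset of actual columns is strictly worse:

```python
import numpy as np
from itertools import combinations

def css_cost(A, S):
    # Residual when A is projected onto the span of its columns indexed by S.
    C = A[:, list(S)]
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return np.linalg.norm(A - C @ X)

rng = np.random.default_rng(1)
A = rng.standard_normal((15, 6))
k = 2

# Rank-k SVD error: the lower bound for any selection of k columns.
s = np.linalg.svd(A, compute_uv=False)
svd_err = np.sqrt(np.sum(s[k:] ** 2))

# Brute force over all k-column subsets.
best = min(combinations(range(A.shape[1]), k), key=lambda S: css_cost(A, S))
print(css_cost(A, best), ">=", svd_err)
```

The top singular vectors are dense combinations of all columns, so they do not correspond to any selectable subset, which is exactly why SVD does not solve column subset selection.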
Could you please explain how you choose rank for bicriteria approximation in Section 5?...Namely, at the end of Section 1.1 you say that bicriteria algorithm performs better ``even when the bicriteria algorithm is not allowed a larger rank than the baseline''. Later in Section 5 you choose bicriteria solution to have rank . I find this a bit confusing, especially since I thought was the rank...
We perform multiple experiments in Section 5. In Figure 1a, the bicriteria algorithm is not allowed a larger rank than the baseline. In Figure 1b, the bicriteria algorithm is permitted rank , where is the rank of the baseline.
Thank you for your reply. I believe this is a strong paper and I will maintain my score.
The authors investigate the socially-fair low-rank approximation and column subset selection problems. The concept of social fairness (this fairness notion is well studied in the context of clustering problems) is introduced as follows: the rows of the input matrix can be partitioned into sub-matrices, and the goal is to find a low-rank matrix of rank that minimizes the maximum reconstruction error or loss. In the socially-fair column subset selection problem, the task is to select columns from the input matrix that minimize the maximum reconstruction error. The quality of the solution (error) is measured by the residual norm when the input matrix is projected onto the subspace spanned by the selected columns.
First they show that fair low-rank approximation is NP-hard to approximate to any constant factor; further, they strengthen their inapproximability claim by showing that it is not possible to find an algorithm with running time that gives a constant-factor approximation. Note that these results only apply to low-rank approximation and not column subset selection (correct me if I am wrong)---these proofs/constructions were easier to follow and I have verified them in sufficient detail; they appear to be correct. They proceed to present approximation algorithms, asserting that these algorithms run in time . I have not been able to verify these proofs myself.
Strengths
The paper presents approximation algorithms for the socially-fair low-rank approximation and column subset selection problems, which are both important from the dimensionality reduction and algorithmic fairness perspectives.
Weaknesses
The paper seems to be written for a specialized audience deeply embedded in the field, rather than for a general audience. The authors frequently cite lemmas and theorems from previous papers without providing sufficient explanations or context, assuming readers already have extensive background knowledge. This approach can alienate those who are not experts in the niche area of matrix approximations. The writing style is convoluted, making it challenging to follow the arguments and understand the implicit assumptions. The authors should consider that not everyone working on these topics is an expert with a long history in the field, and make an effort to present the material more clearly and accessibly.
I was not able to understand the paper in sufficient detail to provide the best possible review. I cannot confirm that the proofs are correct, as I could not verify them without clarifying several unclear aspects of the paper. However, if the authors can address my questions in sufficient detail, I am willing to review the paper again and update my assessment.
Questions
Line 58: , what operation does the symbol denote?
Line 82: Why does the naive algorithm require and not ? Can you clarify this precisely? Which specific naive algorithm are you referring to here?
Line 88-92: I'm having trouble understanding this. Could you explain what this value of is and how it relates to Theorem 1.3 in simpler terms? Once this is clear, the subsequent statements will be easier to follow. The theorem statement does not mention the term ; how does it connect to ? What do you mean when you say is feasible?
Line 131: Could you please provide a precise definition of the fair column subset selection problem, or indicate where it is defined in the paper, before discussing algorithmic results for the problem?
Line 183: Is the problem studied by Matakos et al. 2023 the same or "similar" to the problem addressed in this paper? What are the specific differences between them? Specifically, does this paper generalize the problem defined on two groups by Matakos et al. to more than two groups, or is it a different notion of fairness altogether?
Line 206 - 228: It is redundant to repeat the exact same text as outlined in "our contributions" and "technical overview". I am not sure this is the most efficient way to utilize the space; reiterating these statements multiple times adds nothing new, and the explanation is again high-level without any details.
Line 286: What is the relationship between matrix and ? Where is defined in the paper? Without clarifying this, I cannot proceed to verify the subsequent details. Are you selecting as a column matrix of or ? I am concerned that the dimensions of matrices may not match with the current problem formulation.
Limitations
No discussion on potential negative social impact.
The paper seems to be written for a specialized audience deeply embedded in the field, rather than for a general audience. The authors frequently cite lemmas and theorems from previous papers without providing sufficient explanations or context, assuming readers already have extensive background knowledge. This approach can alienate those who are not experts in the niche area of matrix approximations. The writing style is convoluted, making it challenging to follow the arguments and understand the implicit assumptions. The authors should consider that not everyone working on these topics is an expert with a long history in the field, and make an effort to present the material more clearly and accessibly.
We emphasize that to provide context, the previous version already includes simple intuitive explanations prior to each lemma and theorem statement, even those from previous papers. Moreover, the nature of our result is theoretical and, in principle, builds on technical results in the area of randomized numerical linear algebra. However, we will provide more background and preliminaries in the next version of our paper.
Line 58: , what operation does the symbol denote?
denotes vertical concatenation of the rows, c.f., line 542 in the preliminaries.
Line 82: Why does the naive algorithm require and not ? Can you clarify this precisely? Which specific naive algorithm are you referring to here?
The algorithm we were referring to is a brute-force search over a net on all subsets of factors, which for constant dimension would require time (and even worse for super-constant dimension). Moreover, it should be noted that a runtime of would still fall under runtime, i.e., the polynomial would be linear.
Line 88-92: I'm having trouble understanding this. Could you explain what this value of is and how it relates to Theorem 1.3 in simpler terms? Once this is clear, the subsequent statements will be easier to follow. The theorem statement does not mention the term ; how does it connect to ? What do you mean when you say is feasible?
is simply a guess for a -approximation to the optimal solution. The theorem does not need to mention the term because the algorithm makes a small number of guesses, one of which must be feasible, i.e., a -approximation to the optimal solution for which there exist factors that realize the fair low-rank approximation cost.
Line 183: Is the problem studied by Matakos et al. 2023 the same or ``similar'' to the problem addressed in this paper? ... What are the specific differences between them? Specifically, does this paper generalize the problem defined on two groups by Matakos et al. to more than two groups, or is it a different notion of fairness altogether?
Matakos et al. (2023) study fair column subset selection, which is a specific type of low-rank approximation. Thus, their focus is narrower than ours. They present an algorithm that can only achieve fairness for two groups, which may be inapplicable in many applications. Therefore, it is accurate to say that our setting for the specific problem of socially fair column subset selection generalizes the problem they study, although the techniques in the corresponding algorithms are quite different. To emphasize, we both study the same notion of fairness.
Line 206 - 228: It is redundant to repeat the exact same text as outlined in our contributions and technical overview...I am not sure if this is the most efficient way to utilize the space by reiterating these statements multiple times, there is nothing new, the explanation is again in high-level without any details.
This section expands on the algorithmic outline given in the technical overview to provide more intuition. While we understand that some details may appear repetitive, this is necessary to maintain coherence and clarity for the reader, especially when transitioning between different sections of the paper. That said, we will use the extra page in the introduction to address your comments. Additionally, we provide the full details in Algorithms 1 and 2 on Line 238.
Line 286: What is the relationship between matrix and ?...Where is defined in the paper? Without clarifying this, I cannot proceed to verify the subsequent details. Are you selecting as a column matrix of or ? I am concerned that the dimensions of matrices may not match with the current problem formulation.
Ah thanks, there is a slight typo -- the expression in Line 286 should be and hence the dimensions of the matrices match the current problem formulation. We are selecting columns of the overall matrix , which also corresponds to the same columns for each group . Note that is defined in this optimization problem as the minimizing factor. Given the selection of the columns induced by , then is the matrix that minimizes the low-rank cost for .
["slight typo" in Line 286] In the current formulation of the submitted paper, the dimensions of the matrices do not match, which the authors have casually dismissed as a "slight typo." This, however, is a significant error, raising concerns about whether the proofs in Section 4 were thoroughly read and verified. Given that the objective being minimized was not accurately stated, it is unclear how one could have validated the claimed approximation ratios and algorithmic results in that section.
[improving writing] The authors do not seem fully committed to revising the writing to enhance the accessibility of their research, as the descriptions currently lack clarity. In my view, the paper requires significant revision to be accessible to a wider audience. Unless the authors address these concerns and demonstrate a strong commitment to improving the writing, I would maintain my current evaluation.
Thank you for the follow-up questions! We hope the following clarifications address your points; if not, we would also be happy to continue discussing any possible concerns.
["slight typo" in Line 286] In the current formulation of the submitted paper, the dimensions of the matrices do not match, which the authors have casually dismissed as a "slight typo." This, however, is a significant error, raising concerns about whether the proofs in Section 4 were thoroughly read and verified. Given that the objective being minimized was not accurately stated, it is unclear how one could have validated the claimed approximation ratios and algorithmic results in that section.
We would like to emphasize that the objective function precisely matches both the informal description of the problem, i.e., "to select columns", and the subsequent analysis. In particular, the cost precisely matches the cost in Lemma 4.3 up to a change of notation (where is and is ). Note that and for all , so that has dimension , which is the same as the dimension of . Therefore, the dimensions match.
For the sake of consistency, we further remark that Reviewer CrGT raised a similar question about the objective and we provided a response that exactly matches these details; see the response to Reviewer CrGT (https://openreview.net/forum?id=EO1Qev952p&noteId=A1p5yN7fG0).
[improving writing] The authors do not seem fully committed to revising the writing to enhance the accessibility of their research, as the descriptions currently lack clarity. In my view, the paper requires significant revision to be accessible to a wider audience. Unless the authors address these concerns and demonstrate a strong commitment to improving the writing, I would maintain my current evaluation.
We apologize that our initial response conveyed this sentiment. We are fully committed to producing a manuscript that is accessible to the general research community and, in fact, we have already incorporated the explicit suggestions from the initial reviews. We are also taking additional, more thorough passes to improve the overall exposition of the paper, as well as preparing a "full version" with the complete details and additional intuition in the main body. We hope these updates will address the remaining concerns.
I thank authors for their responses.
After carefully checking the rebuttal and considering the comments from other reviewers, I went through the paper again. Unfortunately, I still find it challenging to follow the arguments and writing due to inconsistent and confusing notation. For example, the matrix is sometimes referred to as , with corresponding changes from to , and the column matrix is inconsistently labeled as , , or across different sections. is used before it is defined. In Lemma 3.5, is not defined; one has to go to the proof of the Lemma in Line 725 to even know what it is, or refer back to the statement in Theorem 1.4. Such inconsistency makes the paper difficult to read and understand. Also, the equations are poorly organized and spread across multiple lines. I often had to rewrite these equations on paper to make sense of them.
I acknowledge that the paper may have strong theoretical contributions, and if all the claimed results are correct, they are significant. However, despite my experience with similar topics, I struggled to follow the proof arguments due to unclear writing and constant changes in notation. The authors should recognize that this level of writing is not ideal and likely falls short of the expected standard. Sections 3, 4, and 5 are particularly problematic. Even after multiple clarifications, these sections remain difficult to read and understand. While the statements and approaches appear correct at a high level, verifying the proofs becomes tedious due to the way the material is presented, making it difficult to check the details precisely.
That being said, I am not fully confident about the correctness of the proofs, so I will retain my original evaluation. To be clear, my evaluation is based on the difficulty in validating the claims due to the imprecise writing. If the theoretical contributions are correct, this work is a valuable addition to the study of fair variants of matrix approximation problems. I hope other reviewers have been able to thoroughly check the proofs in ways I could not.
Regarding the proof arguments in Section 5, they seem to hold at a high level. Also, Matakos et al. (2023) have established a lower bound on the number of columns, , and choosing fewer columns than this would make the objective unbounded. Therefore, the results for column subset selection are only meaningful if (a similar comment was also noted by reviewer CrGT in the context of Theorem 1.3).
Thank you for both the continued correspondence and the time taken for an additional pass over our paper. We really appreciate the referee's extra effort toward the overall review process -- our paper will certainly benefit from the thorough feedback.
...For example, the matrix is sometimes referred to as , with corresponding changes from to , and the column matrix is inconsistently labeled as , , or across different sections. is used before it is defined. In Lemma 3.5, is not defined; one has to go to the proof of the Lemma in Line 725 to even know what it is, or refer back to the statement in Theorem 1.4.
For what it's worth, our intention is to use as the input matrix to the problem and as an input matrix for leverage score sampling, the matrices as the input groups, and the matrices as outputs to Algorithm 4. We will unify and as the column selection matrix, and our intention is to use as a set of rank- factors that need not be restricted to a subset of columns. We will make the implicit definition of in Line 286 more explicit. Finally, we agree that the trade-off parameter is not clearly stated prior to Lemma 3.5. We apologize for the confusion -- we will clarify the purpose of in the statement of Lemma 3.5, as well as the surrounding context.
Also, Matakos et al. (2023) have established a lower bound on the number of columns, , and choosing fewer columns than this would make the objective unbounded. Therefore, the results for column subset selection are only meaningful if (a similar comment was also noted by reviewer CrGT in the context of Theorem 1.3).
Actually, while Matakos et al. (2023) considers the same intuitive goal of choosing columns to minimize the maximum cost, their cost function is normalized by , where is the best rank- approximation to . Therefore, the lower bound of Matakos et al. (2023) does not apply to our objective.
For example, it can be shown that returning the best columns for our column subset selection objective on the entire matrix is an -approximation to the fair column subset selection problem. Thus, it is not necessary to choose columns, and more generally, our results for column subset selection do actually hold across all settings of .
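The fair column subset selection objective discussed here can be sketched with a small brute-force example (our own illustrative code and naming, not the paper's algorithm): a single column set is shared across all groups, and the fair cost is the maximum residual over groups. By definition, the fair-optimal subset is at least as good under this objective as the subset that is best for the stacked matrix:

```python
import numpy as np
from itertools import combinations

def fair_css_cost(groups, S):
    # Max over groups of the residual when each group's rows are projected
    # onto the span of the shared selected columns S.
    def resid(Ai):
        C = Ai[:, list(S)]
        X, *_ = np.linalg.lstsq(C, Ai, rcond=None)
        return np.linalg.norm(Ai - C @ X)
    return max(resid(Ai) for Ai in groups)

rng = np.random.default_rng(2)
groups = [rng.standard_normal((12, 5)), rng.standard_normal((8, 5))]
A = np.vstack(groups)  # the overall matrix; all groups share the selected columns
k = 2

def stacked_cost(S):
    # Non-fair objective: residual for the whole stacked matrix.
    C = A[:, list(S)]
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return np.linalg.norm(A - C @ X)

subsets = list(combinations(range(A.shape[1]), k))
S_stacked = min(subsets, key=stacked_cost)                     # best for the stacked matrix
S_fair = min(subsets, key=lambda S: fair_css_cost(groups, S))  # fair-optimal by brute force
print(fair_css_cost(groups, S_fair), "<=", fair_css_cost(groups, S_stacked))
```

The gap between the two selections is what a fairness-aware column subset selection algorithm aims to close without brute-force enumeration.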
I acknowledge that the paper may have strong theoretical contributions, and if all the claimed results are correct, they are significant. However, despite my experience with similar topics, I struggled to follow the proof arguments due to unclear writing and constant changes in notation. The authors should recognize that this level of writing is not ideal and likely falls short of the expected standard. Sections 3, 4, and 5 are particularly problematic. Even after multiple clarifications, these sections remain difficult to read and understand. While the statements and approaches appear correct at a high level, verifying the proofs becomes tedious due to the way the material is presented, making it difficult to check the details precisely.
Thank you for your positive assessment of the theoretical contributions. We emphasize that we are fully committed to improving the overall presentation of the paper, so that it is accessible to the general research community.
[Lower bound on number of columns: Actually, while Matakos et al. (2023) considers ] I agree with the authors on this point, and they are correct.
The paper studies the socially-fair variants of low-rank approximation and column subset selection. The authors prove hardness results similar to those in the literature for the non-fair versions of the problems. They then propose an exponential-time, close-to-optimal algorithm for socially-fair low-rank approximation and polynomial-time bicriteria approximation algorithms for both problems.
Strengths
- The paper studies two important and relevant problems.
- The socially-fair variants are established notions of fairness in clustering, dimension reduction, etc., and apply very well to both low-rank approximation and column subset selection, neither of which had been solved in the literature.
- The theoretical analysis is well done.
- The experimental results show that the algorithms are practical enough to use in real-world applications.
Weaknesses
The paper is not easy to read. The introduction may be shortened.
Questions
Are there any open problems related to socially-fair low rank approximation and subset selection not addressed in the paper? If yes, I request the authors to mention some.
Limitations
Limitations adequately addressed by the authors.
The paper is not easy to read. The introduction may be shortened.
The nature of our result is theoretical and, in principle, builds on technical results in the area of randomized numerical linear algebra. However, we will provide more background and preliminaries in the next version of our paper.
Are there any open problems related to socially-fair low rank approximation and subset selection not addressed in the paper? If yes, I request the authors to mention some.
Thank you for your suggestions. We will add related open problems in the next version of our paper, including both technical questions and more general fairness-aware optimization of data summarization methods. For example, a natural future direction is the efficient construction of -coresets with minimal size for socially fair subset selection.
We thank the reviewers for their thoughtful comments and valuable feedback. We also appreciate the positive remarks, such as:
- The paper studies two important and relevant problems. (Reviewer icAD)
- The theoretical analysis is well done. (Reviewer icAD)
- The experimental results show that the algorithms are practical enough to use in real-world applications. (Reviewer icAD)
- The paper presents approximation algorithms for the socially-fair low-rank approximation and column subset selection problems, which are both important from the dimensionality reduction and algorithmic fairness perspectives. (Reviewer YsQa)
- This paper has a few interesting and nontrivial contributions. The way these are presented is also very good - starting from computationally infeasible (in polynomial time) results, to less time demanding algorithms. (Reviewer CrGT)
- I believe the authors made a nice connection with fairness, and this clearly increases significance of the paper. Nonetheless, I believe this is a fundamental problem of multi-matrix approximation and I appreciate derived results even from pure theoretical perspective. (Reviewer CrGT)
- It seems to be the first paper considering socially fair low-rank matrix approximation and related topics. (Reviewer CrGT)
- Although the main results and their analysis are not simple, I believe the authors made great efforts to make it as accessible as possible. (Reviewer CrGT)
- The studied problem is well-motivated and has wide application in machine learning. I believe it should be interesting to the ML community. (Reviewer aAWQ)
- This is a technically solid paper. I like the high-level idea of the algorithm: it starts from some "bad" solution and then repeatedly decreases the approximation ratio until it becomes . Checking feasibility requires some new ideas. (Reviewer aAWQ)
- The paper is well-written. I appreciate that the authors provide sufficient intuition and high-level descriptions for algorithms in Section 1.1. They are very helpful. (Reviewer aAWQ)
We provide our responses to the initial comments of each reviewer separately below. We hope our answers address all points raised by the reviewers, and we will be most happy to answer any remaining or additional questions during the discussion phase!
The reviewers agree that this paper studies important problems that lie at the intersection of algorithmic fairness and low-rank approximation. They acknowledged that the paper is technically solid and novel, and also considered the exposition of the techniques to be reasonably good. The experimental results serve as a good addition to the theory. Although one reviewer still had concerns about writing consistency in the discussion stage, overall the pros significantly outweigh the cons.