Proportional Fairness in Non-Centroid Clustering
Abstract
Reviews and Discussion
The paper studies two notions of proportional fairness in clustering, namely:
- The core as introduced by Chen et al.
- Fully Justified Representation (FJR), which is a relaxation of the above and was introduced by Peters et al.
The novelty of the paper lies in studying these notions in the clustering setting where one is not required to choose cluster centers. In that context, the authors start by formally defining the clustering objectives. In the case of the core, they show that arbitrary objectives are not tractable, and that an adaptation of the Chen et al. algorithm can achieve satisfactory approximation results (for the average and maximum objectives). For FJR, they show a cute reduction to the choose-a-single-cluster problem, and by plugging this algorithm into the Chen et al. one, they once again achieve strong approximation results. This reduction even produces results for arbitrary objective functions.
Strengths
- Studying notions of fairness in a non-centroid context makes a lot of sense from a practical perspective
- The paper is very well-written and easy to follow.
- The theoretical results are sound and properly accompanied by an experimental evaluation.
Weaknesses
The techniques are not particularly novel, but I don't think that's a major weakness.
Questions
Some of your non-centroid objectives remind me of single-link HC clustering (see "Optimization of Inter-group Criteria for Clustering with Minimum Size Constraints", for example). Do you think that these algorithms could be easily adapted for proportional fairness?
Limitations
N/A
Thank you for your detailed and constructive review. Please see our response to Reviewer YYLK regarding our technical novelty.
Regarding Single-Link HC clustering
Thank you for pointing out this family of algorithms. We considered it during our search for a better approximation to the core. Unfortunately, these algorithms fail to detect cohesive groups when groups with minimum spacing are greedily merged. To see this, consider an instance with the following distances:
- For {}, ,
- For {},
- For {},
All the remaining distances are induced by the triangle inequality.
It is not hard to see that a single-link type of algorithm will first cluster the first agents together. Then, if we want to maintain some kind of balancedness, either the last agents will be clustered together, or otherwise the first agents will be clustered together and the last agent, which is far away from any other agent, will be clustered in its own cluster. In both cases, under the average cost (resp., max cost), all the agents in {} have cost at least equal to (resp. ).
However, if they form their own cluster (which they are eligible to do, as the size of the group is equal to ), all of them would have cost for the new cluster. Thus, the approximation to both FJR and the core is (resp. ), which is significantly worse than that of GreedyCapture.
Intuitively, while this group is very cohesive, a single-link algorithm fails to find it. In contrast, GreedyCapture would be able to find it, as the ball centered at would be the first one to capture sufficiently many agents.
This paper considers proportionally fair clustering in the non-centroid setting. The authors consider three types of clustering loss for each agent in a cluster: arbitrary loss, average loss, and maximum loss. The average (maximum) loss is the average (resp., maximum) distance from the agent to the other agents in its cluster.
They first consider the alpha-core as the notion of proportional fairness, which requires a clustering such that there is no group of size at least n/k in which every agent would have a loss at most its current loss divided by alpha. For arbitrary loss, they show an instance in which there is no alpha-core clustering for any finite alpha. For average loss, they show a 1.366 lower bound and provide an O(n/k)-core clustering algorithm. For maximum loss, they provide a 2-core clustering algorithm.
Then, they consider a relaxation of the core, fully justified representation (FJR). It only requires that there be no group of size at least n/k such that every agent in the group would have a loss at most the minimum loss in its current cluster divided by alpha. They first show that for arbitrary loss, there exists a 1-FJR clustering. Then, they give an efficient algorithm that finds a 4-FJR and a 2-FJR clustering for average loss and maximum loss, respectively. They also give an efficient 4-approximate and 2-approximate FJR auditing algorithm for average loss and maximum loss, respectively.
Strengths
- The paper is well-written and easy to follow.
- The paper extends proportional fairness to non-centroid clustering, which is reasonable and interesting. The authors provide interesting algorithms that achieve approximations for core clustering and FJR clustering.
- Their algorithm and analysis can potentially be used for other clustering problems.
Weaknesses
Minor: Line 215: lambda_i(S’) should be l_i(S’).
Questions
Limitations
Yes, they mentioned the limitations.
Thank you for your review. We will implement your correction.
This paper studies proportionally fair non-centroid clustering under the fairness criteria core and fully justified representation (FJR). Both notions are well-studied in the context of centroid clustering. However, the authors claim that they are the first to consider them in the non-centroid setting.
They study the fairness criteria of the core and FJR, and approximations to them, under different agent-based loss functions:
- arbitrary (non-negative) losses
- average loss (average distance of an agent to the other agents in her cluster)
- maximum loss (maximum distance of an agent to the other agents in her cluster)
An α-approximation to the core (or α-core) is defined as a clustering in which no entitled group of agents can deviate to a new coalition such that every agent's loss in the new coalition is less than a 1/α-fraction of her former loss.
For the core under arbitrary loss functions, they prove approximation hardness for any constant factor. Even for the case of average loss, they show that there is no approximation factor better than 1.366. On the positive side, they constructively show existence of a clustering in the O(n/k)-core under the average loss function and of a clustering in the 2-core under maximum loss. If the metric space is 1-dimensional, they even prove existence of a clustering in the 1-core under maximum loss.
An α-FJR clustering requires that no eligible group of agents can deviate such that every agent in the new coalition would have less than a 1/α-fraction of the loss of the group member with the least initial loss.
They start by constructively showing existence of FJR clusterings using the slow GreedyCohesiveClustering algorithm. To achieve polynomial running time, they give the faster GreedyCapture algorithm which, using the SmallestAgentBall subroutine, achieves 4-FJR for average loss and 2-FJR for maximum loss simultaneously. Finally, they give an auditing algorithm (AuditFJR) that returns an estimate of the approximation ratio. Plugging in the right subroutine, they get approximation factors of 4 for average loss and 2 for maximum loss. They complement this by showing hardness of approximating FJR auditing beyond a factor of 2 under maximum loss.
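For concreteness, the greedy-capture idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the distance-matrix representation, and the deterministic tie-breaking are our own assumptions.

```python
import math

def greedy_capture(n, k, dist):
    """Sketch of non-centroid GreedyCapture: repeatedly pick the agent whose
    ball capturing ceil(n/k) unclustered agents has the smallest radius (the
    SmallestAgentBall idea), and turn the captured agents into a cluster."""
    threshold = math.ceil(n / k)
    unclustered = set(range(n))
    clusters = []
    while len(unclustered) >= threshold:
        # SmallestAgentBall: among unclustered agents, find the one whose
        # ball containing `threshold` unclustered agents has minimum radius.
        center, radius = min(
            ((i, sorted(dist[i][j] for j in unclustered)[threshold - 1])
             for i in sorted(unclustered)),
            key=lambda pair: pair[1],
        )
        # The ball may capture more than `threshold` agents in case of ties;
        # for this sketch we simply take the whole ball.
        captured = {j for j in unclustered if dist[center][j] <= radius}
        clusters.append(sorted(captured))
        unclustered -= captured
    if unclustered:  # leftover agents form a final (small) cluster
        clusters.append(sorted(unclustered))
    return clusters
```

On a toy one-dimensional instance with two well-separated triples of points, the sketch recovers the two natural clusters.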
In a practical evaluation, they compare the efficient GreedyCapture algorithm with k-means++ and k-medoids on real data under the objective values of average within-cluster distance, k-means, and k-medoids. Their experiments reveal that the actual FJR approximation of GreedyCapture is close to 1 in practice. Further, for any of the objective functions under consideration, they find that GreedyCapture yields values at most twice that of k-means++ and k-medoids, respectively.
Strengths
The paper studies an interesting set of problems, and the authors claim to be the first to study the fairness criteria core and FJR for non-centroid clustering. From this point of view, it is interesting to see that the methods from centroid clustering translate nicely to the non-centroid setting. They achieve good approximation factors for FJR. In practice, their deterministic polynomial-time algorithm GreedyCapture even beats these approximation factors (the actual approximation factor stays close to 1 on the considered real data). Overall, the paper is nicely written and easy to understand.
Weaknesses
The proposed algorithms are only slight adaptations of an algorithm proposed in [1], which was formulated for centroid clustering. It would be interesting to see whether one can think of novel approaches leveraging properties characteristic of the non-centroid setting.
Typos
- line 53: "clsutering"
- line 207: missing dashes around
- line 215: right side of the inequality: instead of
Small remark: I think it would enhance readability if you restated the theorems you prove in the appendix.
Questions
- definition of the α-approximate solution in lines 215-216: should the α not be on the other side of the inequality?
- do your results hold for general metric spaces or only for Euclidean metric spaces? Please write this more clearly.
- what is the notion of dimension here?
- what is the formal definition of core/FJR violation?
Limitations
The authors discuss limitations of their work by providing a comprehensive list of open problems.
Thank you for your detailed and constructive review. We will correct the typos and use the thm-restate package to restate theorems. And yes, it would indeed be interesting to devise entirely different techniques for non-centroid clustering (although see our response to Reviewer YYLK regarding our technical novelty). Regarding your questions:
definition of the α-approximate solution in lines 215-216: should the α not be on the other side of the inequality?
Yes, sorry for the typo.
do your results hold for general metric spaces or only for Euclidean metric spaces? Please write this more clearly.
The theoretical results hold for general metric spaces as all we use is the triangle inequality. (And the proof of Theorem 4 does not even use the triangle inequality, and thus holds for arbitrary non-metric losses as well!). We will emphasize this clearly.
what is the notion of dimension here?
As our results hold for general metric spaces, we have minimized the use of the term "dimension" in the paper. In Appendix C, we provide stronger results for the special case of a 1-dimensional Euclidean line metric.
what is the formal definition of core/FJR violation?
The core (resp., FJR) violation of a clustering is the smallest α for which it is in the α-core (resp., α-FJR). In other words, it is the largest α for which there exists a group of agents of size at least n/k such that, if they form their own cluster, the loss of each of them is lower by a factor of at least α than their own loss before deviation (resp., than the minimum loss of any of them before deviation). We will clarify this in our revision.
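For very small instances, this definition can be turned into a brute-force audit directly. The following is a sketch under the average-loss objective; the distance-matrix representation, the function names, and the handling of zero losses are our own assumptions (the paper's experiments instead use an integer linear program).

```python
import math
from itertools import combinations

def avg_loss(i, group, dist):
    """Average distance from agent i to the other members of its group."""
    others = [j for j in group if j != i]
    return sum(dist[i][j] for j in others) / len(others) if others else 0.0

def _ratio(cur, new):
    # Improvement factor of a deviation; infinite if the new loss is zero.
    if new == 0:
        return float('inf') if cur > 0 else 1.0
    return cur / new

def core_violation(clustering, n, k, dist):
    """Largest factor by which every member of some group of size at least
    ceil(n/k) can reduce her own loss by deviating together."""
    size = math.ceil(n / k)
    cluster_of = {i: C for C in clustering for i in C}
    worst = 1.0
    for m in range(size, n + 1):
        for S in combinations(range(n), m):
            worst = max(worst, min(
                _ratio(avg_loss(i, cluster_of[i], dist), avg_loss(i, S, dist))
                for i in S))
    return worst

def fjr_violation(clustering, n, k, dist):
    """Same, but deviators are compared against the minimum loss in the
    deviating group before deviation (fully justified representation)."""
    size = math.ceil(n / k)
    cluster_of = {i: C for C in clustering for i in C}
    worst = 1.0
    for m in range(size, n + 1):
        for S in combinations(range(n), m):
            cur_min = min(avg_loss(i, cluster_of[i], dist) for i in S)
            new_max = max(avg_loss(i, S, dist) for i in S)
            worst = max(worst, _ratio(cur_min, new_max))
    return worst
```

For example, on the line metric over points 0, 1, 10, 11 with k = 2, the clustering {{0, 1}, {10, 11}} has core violation 1, while the crossed clustering {{0, 10}, {1, 11}} has core and FJR violation 10, since each crossed pair can deviate and cut its loss from 10 to 1.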
Thank you for your response. All my questions were answered and I decided to increase my score.
This paper considers proportional fairness in non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster. It is the first work to study proportional fairness guarantees in non-centroid clustering, and it is interesting to consider non-centroid clustering under the core and its relaxation. The authors study three different loss functions: arbitrary loss, average loss, and maximum loss. For arbitrary loss, the authors prove that no finite approximation of the core can be guaranteed. For the average and maximum loss, the authors present a greedy capture algorithm that achieves an O(n/k)-core and a 2-core, respectively, in polynomial time. The greedy capture algorithm is an adaptation of the algorithm developed for centroid clustering under proportional fairness constraints. Additionally, for fully justified representation (FJR), a relaxation of the core, the authors provide constant approximations achieved by the greedy capture algorithm. Moreover, the authors turn to auditing the FJR approximation of a given clustering, and show that the same technique achieves a constant approximation of FJR, which can be used to estimate the FJR approximation of any given clustering up to a constant factor. Experiments on real data show that traditional clustering algorithms are highly unfair, whereas the greedy capture algorithm is considerably fairer and incurs only a modest loss in common clustering objectives.
Strengths
1. This paper introduces non-centroid clustering under two proportional fairness criteria, i.e., the core and its relaxation, and provides approximation algorithms.
2. For FJR, the approximation results show that while it is a slight relaxation of the core, it is satisfiable even under arbitrary loss functions, whereas the core can be unsatisfiable even under more structured loss functions.
Weaknesses
1. For the average loss, the O(n/k)-core approximation returned by the greedy capture algorithm is large compared with the 2-core for the maximum loss.
2. The practical background of non-centroid clustering under proportional fairness constraints is unclear. Moreover, it is unfair that, in the experiments, the authors compare the proposed algorithms with k-means++ and k-medoids, which cannot obtain solutions satisfying proportional fairness.
3. The running time of the algorithm is not compared in the experiments, and the datasets have small sizes.
Questions
Q1: As the authors mentioned, in other applications of clustering, e.g., federated learning, there are no "cluster centers". What is the practical background for studying fairness in non-centroid clustering?
Q2: The authors give the first work studying proportional fairness in non-centroid clustering. Is it fair to compare the proposed method with centroid-based methods, e.g., k-means++, in the experiments? In that light, is the experimental result shown in Figure 1, in which the greedy capture method is significantly better than both k-means++ and k-medoids on the Census Income dataset, to be expected?
Q3: As the authors mentioned, the proposed algorithm of this paper is an adaptation of the algorithm developed for centroid clustering in [1]. The technique used in this paper is similar to that of [1], with a slight difference in stopping a ball as soon as it captures n/k agents. What are the technical novelties of this paper? Please highlight them.
Q4: Why not compare the running time of algorithms in experiments? Why the experiments are conducted on smaller datasets? Is it possible to test on larger datasets, e.g., millions or even hundreds of millions?
Q5: One of the main results of this paper is an O(n/k)-core for the average loss via the proposed greedy capture algorithm. Can the O(n/k) bound be improved using other techniques?
Limitations
The broader impacts have been discussed in the last paragraph of the paper and I don’t expect any negative societal impacts for the proposed methods.
Thank you for your detailed and constructive review.
Regarding Q1 (practical background of fairness in non-centroid clustering)
- Fairness has been studied in clustered federated learning [1-3], where the idea is to be fair to the agents during the grouping. But fairness has also been studied in federated learning more broadly [4-5], where the idea is still the same: making sure that the model learned is fair to the agents. In our work, the loss of an agent for her group serves as an estimate of the loss of the agent for the model that will be learned by her group.
- Team formation is another relevant application where fairness has been studied practically [6-7].
[1] A Fair Federated Learning Framework based on Clustering. J Zhang, A Ye, Z Yao, M Sun. IEEE NaNA, 2022.
[2] Proportional Fair Clustered Federated Learning. M. Nafea, E. Shin, A. Yener. IEEE ISIT, 2022.
[3] A cluster-based solution to achieve fairness in federated learning. F. Zhao, Y. Huang, A.M.V.V. Sai, Y Wu. IEEE ISPA, 2020.
[4] Fairness in federated learning via core-stability. B. R. Chaudhury, L. Li, M. Kang, B. Li, R. Mehta. NeurIPS, 2022.
[5] FairFed: Enabling Group Fairness in Federated Learning. Y.H. Ezzeldin, S. Yan, C. He, E. Ferrara, A.S. Avestimehr. AAAI, 2023.
[6] Partitioning friends fairly. L. Li, E. Micha, A. Nikolov, N. Shah. IJCAI, 2023.
[7] Relaxed Core Stability in Fractional Hedonic Games. A. Fanelli, G. Monaco, L. Moscardelli. IJCAI, 2021.
Regarding Q2 (comparison to -means++ and -medoids)
While k-means++ and k-medoids are usually described as centroid clustering algorithms, both can be viewed as non-centroid clustering algorithms as well, since they minimize the sum of squared pairwise distances and of pairwise distances of points in the same cluster, respectively.
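The k-means side of this equivalence rests on a standard identity: for a cluster C with mean μ, Σ_{x∈C}(x−μ)² = (1/(2|C|))·Σ_{x,y∈C}(x−y)². A quick one-dimensional check (illustrative only; the function and variable names are ours):

```python
def within_ss(points):
    # k-means cost of one cluster: sum of squared distances to the mean
    m = len(points)
    mean = sum(points) / m
    return sum((x - mean) ** 2 for x in points)

def pairwise_sq(points):
    # sum over all ordered pairs (x, y) of squared distances
    return sum((x - y) ** 2 for x in points for y in points)

pts = [0.0, 1.0, 5.0]
# identity: sum_x (x - mean)^2 == (1 / (2m)) * sum_{x,y} (x - y)^2
assert abs(within_ss(pts) - pairwise_sq(pts) / (2 * len(pts))) < 1e-9
```

So minimizing the within-cluster sum of squares is the same (up to cluster-size normalization) as minimizing the sum of pairwise squared distances inside each cluster, which is a purely non-centroid objective.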
Also, the goal of our experiments is not to demonstrate that GreedyCapture is fairer than k-means++ and k-medoids. The latter two are unfair because they are designed for accuracy, not fairness. It is therefore unsurprising that GreedyCapture performs better on fairness metrics while k-means++ and k-medoids perform better on accuracy metrics. What is surprising is how little accuracy GreedyCapture has to sacrifice to achieve significantly higher levels of fairness compared to the other two algorithms.
Regarding Q3 (technical novelties)
The technical novelties of the paper lie along three axes.
- Algorithmic: While you are correct that our GreedyCapture is only slightly different from that of [1], our auditing algorithm AuditFJR (Appendix A.5) is entirely novel. The fact that a technique designed for constructing a fair clustering can also be used to audit the fairness of any clustering algorithm was quite surprising to us.
- Upper bounds: Even for the core approximation guarantee of GreedyCapture, which is studied in [1] for centroid clustering, the analysis for non-centroid clustering (proof of Theorem 3) is very different. In the centroid case, the center to which each agent is assigned plays a crucial role in the analysis. In our case, the argument involves the distance of an agent from any other agent in a deviating group, and from any other agent to the cluster that they joined under the algorithm (see, for example, lines 473 and 481). The proof of FJR satisfaction is also novel. In particular, for the average cost in Lemma 1, we use two individuals in the deviating group that are sufficiently far away from each other to lower bound the cost of at least one of them for S. This is a completely novel argument.
- Lower bounds: The lower-bound constructions we derive for non-centroid clustering share no similarity with those for centroid clustering in [1].
Regarding Q4 (running times)
We did not include running times because our focus was on the fairness-accuracy tradeoff. That said, in the attached PDF file (see the common response), we include the running times.
Regarding small datasets, all three algorithms are extremely fast and can scale easily to millions of data points. As we mention in line 325, the bottleneck in the experiments is measuring the exact core and FJR violation of these algorithms, which we do using an integer linear program that does not scale. For FJR, we could use the auditing algorithm we discuss in Section 4.3, but for the core, it is quite unclear whether its violation can be measured faster.
Regarding Q5 (core approximation)
This is an extremely intriguing question, and one that we spent a lot of time and effort on. Some of us believe that a constant approximation should be possible, and the rest believe that O(n/k) should be the best possible. We were able to find a counterexample for every technique we could think of, but no general construction worse than that in Theorem 2.
Thank you for your response. All my questions are resolved and I will increase my score accordingly.
We thank all the reviewers for their useful feedback. We attach a PDF file with the running times of the three different algorithms.
This paper studies how to adapt the notion of proportional fairness to the setting of non-centroid clustering, and new nontrivial algorithmic results are given. As also agreed by the reviewers, the new notion is natural and is the major novelty of this paper which may have broader impact. The algorithmic result is nice to have, although it is not technically strong.
Overall, I recommend an acceptance due to its potential impact in the fair clustering research.