Fairness on Principal Stratum: A New Perspective on Counterfactual Fairness
Abstract
Reviews and Discussion
This study addresses an important question about which attributes and individuals should be protected. It proposes principal counterfactual fairness based on the concepts of principal stratification and counterfactual fairness. Theoretical analysis of principal counterfactual fairness is provided. In practice, a CPDAG is learnt from data using the PC algorithm in the causal-learn package. Experiments were conducted on synthetic data and one real dataset.
Questions for the Authors
Algorithmic fairness is a complex topic. Counterfactual fairness and its variants are excellent ideas. However, in practice, it can be challenging to correctly infer causal relationships from data. In addition, our knowledge about specific applications is often incomplete. How does the proposed approach handle this?
Claims and Evidence
The results on the synthetic data are good. The evidence from the real dataset could be stronger.
Methods and Evaluation Criteria
The real dataset used in this study lacks ground truth, making evaluation and judgment challenging.
Theoretical Claims
Yes.
Experimental Design and Analysis
Can the authors demonstrate the proposed ideas on more datasets, such as German Credit, Adult, and COMPAS (available at https://ashryaagr.github.io/Fairness.jl/dev/datasets/)? Feel free to choose other datasets if the above ones are not appropriate.
Supplementary Material
Yes.
Relation to Existing Literature
This study contributes to algorithmic fairness; in particular, it enhances counterfactual fairness.
Missing Important References
No.
Other Strengths and Weaknesses
No.
Other Comments or Suggestions
No
Thank you for your valuable feedback and the time dedicated to reviewing our work. We address your concerns and questions as follows.
Can the authors demonstrate the proposed ideas on more datasets?
Thank you for pointing out this issue! We follow your suggestion to add extensive experiments comparing our method to more baselines on two new datasets: Law and UCI Adult.
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF [1] | 3.15 ± 0.80 | 5.13 ± 0.74 | 3.28 ± 0.85 |
| CF Rep. [2] | 1.71 ± 0.51 | 1.18 ± 0.47 | 1.89 ± 0.32 |
| PSCF [3] | 1.84 ± 0.42 | 1.21 ± 0.41 | 2.07 ± 0.48 |
| Principal Fairness [4] | 2.60 ± 0.39 | 4.37 ± 0.65 | 2.05 ± 0.21 |
| Quantile CF [5] | 2.34 ± 0.20 | 2.64 ± 0.31 | 2.19 ± 0.23 |
| DCEVAE [6] | 4.01 ± 1.16 | 5.58 ± 0.87 | 2.81 ± 0.53 |
| Ours | 5.54 ± 1.19 | 3.85 ± 0.90 | 1.97 ± 0.38 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF [1] | 2.89 ± 0.86 | 4.42 ± 1.10 | 2.60 ± 0.70 |
| CF Rep. [2] | 2.30 ± 1.14 | 0.67 ± 0.82 | 1.64 ± 1.00 |
| PSCF [3] | 1.61 ± 1.22 | 1.31 ± 0.87 | 1.13 ± 0.54 |
| Principal Fairness [4] | 2.62 ± 1.28 | 3.12 ± 0.94 | 2.12 ± 0.63 |
| Quantile CF [5] | 1.79 ± 0.40 | 1.56 ± 0.43 | 2.24 ± 1.22 |
| DCEVAE [6] | 3.34 ± 1.07 | 4.67 ± 1.25 | 3.23 ± 1.66 |
| Ours | 4.45 ± 1.36 | 3.38 ± 0.93 | 1.85 ± 0.78 |
- Key observation 1: Our method improves the PCF metric more significantly than the original CF metric; recall that PCF focuses only on individuals whose protected attribute has no individual effect on the outcome, whereas CF covers all individuals.
- Key observation 2: Existing CF methods do not outperform the proposed approach on the PCF metric.
- Key observation 3: Our post-processing approach exhibits very competitive performance in terms of the trade-off between fairness and accuracy -- our PCF results are the best with only a slight decrease in accuracy.
In addition, we find it meaningful to add experiments analyzing the power of our proposed test -- "what is the likelihood that an algorithm violating PCF can pass this test (also known as sensitivity)?" -- on the above two new datasets. The results are shown below.
| Law | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.67 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.72 ± 0.10 | 1.00 ± 0.00 |
| DR | 0.71 ± 0.10 | 1.00 ± 0.00 |
| UCI Adult | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.79 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.77 ± 0.12 | 1.00 ± 0.00 |
| DR | 0.81 ± 0.12 | 1.00 ± 0.00 |
The above experimental results align with our theoretical claims for our proposed PCF test in Sec. 4.1, i.e., our PCF test is a necessary condition, so its false positive rate is 0. We also empirically show that our PCF test has a relatively low false negative rate.
In practice, it can be challenging to correctly infer causal relationships from data.
Thank you for raising this concern. We would like to clarify that our method for imposing PCF does not require causal discovery. Since real-world datasets do not come with a ground-truth DAG, we apply causal discovery to first obtain a CPDAG and then sample a DAG consistent with it as the ground truth for simulating the counterfactuals.
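For concreteness, below is a minimal sketch of this CPDAG step, assuming the causal-learn package's documented PC interface; the file name and the significance level are illustrative placeholders rather than our actual pipeline settings.

```python
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

# Hypothetical feature matrix: an (n_samples, n_variables) array of the observed variables.
data = np.loadtxt("features.csv", delimiter=",")  # placeholder file name

# Run the PC algorithm to learn a CPDAG; alpha is the significance level of the
# conditional-independence tests (0.05 is an illustrative choice, not our exact setting).
cg = pc(data, alpha=0.05)

# cg.G holds the learned CPDAG. A DAG consistent with this CPDAG is then sampled
# and treated as the ground truth for simulating counterfactuals, as described above.
print(cg.G)
```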
Note that our proposed method does not require a known DAG (or even a CPDAG). Instead, the only assumption we make is the ignorability assumption in line 232, i.e., that there are no unobserved confounders. We would also like to remark that it is natural to extend our approach to further relax this assumption, for example by using sensitivity analysis [7] from the causal inference literature. We leave this for future work, given the orthogonality of these two issues.
Lastly, motivated by the reviewer's point that obtaining an accurate DAG from observational data is challenging, we note that recent studies have focused on achieving CF with partially known causal graphs [8, 9]. It would be useful to incorporate these works, but we emphasize again that our approach does not attempt to infer causal relationships from data.
Please let us know if you have further questions -- thank you so much!
References
[1] Kusner, Matt J., et al. Counterfactual fairness. NeurIPS, 2017.
[2] Zuo, Zhiqun, et al. Counterfactually fair representation. NeurIPS, 2023.
[3] Chiappa, Silvia. Path-specific counterfactual fairness. AAAI, 2019.
[4] Imai, Kosuke, and Zhichao Jiang. Principal fairness for human and algorithmic decision-making. Statistical Science, 2023.
[5] Plečko, Drago, et al. fairadapt: Causal reasoning for fair data preprocessing. Journal of Statistical Software, 2024.
[6] Kim, Hyemi, et al. Counterfactual fairness with disentangled causal effect variational autoencoder. AAAI, 2021.
[7] Fawkes, Jake, et al. The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning. NeurIPS, 2024.
[8] Zuo, Aoqi, et al. Counterfactual fairness with partially known causal graph. NeurIPS, 2022.
[9] Li, Haoxuan, et al. A Local Method for Satisfying Interventional Fairness with Partially Known Causal Graphs. NeurIPS, 2024.
What if there is bias in data (i.e., the observed outcomes are biased)? How will this and the ignorability assumption affect Definitions 4 & 5 and the computational results?
Thanks for your comments and sorry for our late response (due to extensive additional experiments)!!
Bias in Data (i.e., the observed outcomes are biased), Ignorability Assumption, and How These Affect Definitions 4 & 5
- Bias in Data: Denote the ground-truth outcomes and the observed biased outcomes; we consider the following three types of bias:
- Random Classification Noise (RCN) [1]: the observed label is flipped with a constant probability, independent of the true class and the instance;
- Class-conditional Noise (CCN) [2]: the flip probability depends on the true class;
- Instance-dependent Noise (IDN) [3,4,5]: the flip probability depends on the instance (its features);
- Ignorability Assumption: The violation of the ignorability assumption is the same as the presence of unmeasured confounding;
- How These Affect Definitions 4 & 5:
- These would not change Definitions 4 and 5 in any way!
- Because these definitions are always stated in terms of clean (ground-truth) labels, and the ignorability assumption only affects the identification results of PCF.
- Instead, what is interesting is how our PCF method and the baselines are computationally affected in the presence of label noise and/or unmeasured confounding.
Computational Results
Experiment Setup
- For RCN, we set a constant noise rate; for CCN and IDN, to ensure a fair comparison, we set the class- and instance-dependent noise rates such that the average noise rate is 0.2.
- For unmeasured confounding, we randomly mask 25% of the covariates on the Law and Adult datasets. (A hypothetical code sketch of the noise injection and covariate masking follows this list.)
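Here is a minimal, hypothetical sketch of these two steps; the specific CCN/IDN flip probabilities below are illustrative stand-ins (only the 0.2 average rate matches the setup above), and binary 0/1 labels are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_label_noise(Y, X, kind="RCN", avg_rate=0.2):
    """Corrupt binary labels Y (0/1) according to one of the three bias models."""
    n = len(Y)
    if kind == "RCN":                    # constant flip probability
        p = np.full(n, avg_rate)
    elif kind == "CCN":                  # flip probability depends on the true class
        p = np.where(Y == 1, 0.3, 0.1)   # illustrative rates; ~0.2 on average for balanced Y
    else:                                # "IDN": flip probability depends on the instance
        raw = 1.0 / (1.0 + np.exp(-X[:, 0]))
        p = np.clip(raw * avg_rate / raw.mean(), 0.0, 1.0)
    flip = rng.binomial(1, p, n).astype(bool)
    return np.where(flip, 1 - Y, Y)

def mask_covariates(X, frac=0.25):
    """Randomly drop `frac` of the columns to mimic unmeasured confounding."""
    n_keep = int(np.ceil(X.shape[1] * (1 - frac)))
    keep = rng.choice(X.shape[1], size=n_keep, replace=False)
    return X[:, keep]
```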
Experiment Results
(a) With biased observed outcomes only:
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF + RCN | 3.57 ± 1.85 | 3.14 ± 1.91 | 4.79 ± 1.66 |
| Ours + RCN | 3.63 ± 1.56 | 3.03 ± 1.45 | 2.17 ± 0.93 |
| CF + CCN | 3.47 ± 1.12 | 4.18 ± 1.72 | 2.39 ± 0.70 |
| Ours + CCN | 4.32 ± 2.34 | 3.62 ± 1.81 | 0.96 ± 0.25 |
| CF + IDN | 2.48 ± 0.81 | 4.42 ± 2.06 | 1.96 ± 0.35 |
| Ours + IDN | 3.97 ± 2.08 | 4.64 ± 1.92 | 0.87 ± 0.62 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF + RCN | 2.10 ± 0.84 | 1.56 ± 1.11 | 1.71 ± 1.64 |
| Ours + RCN | 2.41 ± 1.56 | 2.27 ± 1.45 | 0.89 ± 0.34 |
| CF + CCN | 0.54 ± 0.28 | 1.89 ± 0.63 | 0.66 ± 0.47 |
| Ours + CCN | 1.45 ± 0.73 | 1.34 ± 0.90 | 0.50 ± 0.19 |
| CF + IDN | 3.53 ± 0.81 | 2.76 ± 2.06 | 2.58 ± 1.30 |
| Ours + IDN | 4.73 ± 2.23 | 2.41 ± 1.89 | 1.88 ± 0.81 |
(b) With unmeasured confounding only:
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF | 2.58 ± 1.09 | 3.42 ± 1.03 | 4.08 ± 1.31 |
| Ours | 4.03 ± 1.51 | 2.97 ± 1.20 | 2.11 ± 0.87 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF | 1.32 ± 0.31 | 2.06 ± 0.56 | 2.69 ± 1.46 |
| Ours | 2.30 ± 1.05 | 1.86 ± 0.81 | 3.10 ± 1.89 |
(c) With both unmeasured confounding and biased observed outcomes:
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF + RCN | 2.17 ± 0.86 | 3.28 ± 1.37 | 4.56 ± 1.72 |
| Ours + RCN | 4.15 ± 2.41 | 2.07 ± 1.25 | 2.24 ± 1.07 |
| CF + CCN | 2.50 ± 0.76 | 2.62 ± 0.87 | 2.77 ± 1.36 |
| Ours + CCN | 3.73 ± 1.27 | 2.46 ± 1.42 | 1.46 ± 0.96 |
| CF + IDN | 3.84 ± 2.68 | 4.80 ± 2.14 | 0.84 ± 0.30 |
| Ours + IDN | 6.04 ± 2.68 | 4.50 ± 1.74 | 0.22 ± 0.32 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF + RCN | 4.63 ± 3.01 | 3.99 ± 1.84 | 4.83 ± 2.25 |
| Ours + RCN | 5.07 ± 3.15 | 5.97 ± 2.43 | 2.98 ± 1.27 |
| CF + CCN | 1.20 ± 0.97 | 4.92 ± 1.45 | 1.98 ± 1.24 |
| Ours + CCN | 6.18 ± 2.04 | 4.43 ± 2.08 | 2.23 ± 1.76 |
| CF + IDN | 3.88 ± 2.17 | 5.22 ± 2.69 | 2.83 ± 1.15 |
| Ours + IDN | 6.50 ± 2.74 | 4.72 ± 1.39 | 1.38 ± 1.09 |
The above results demonstrate that our method consistently outperforms CF in the presence of biased observed outcomes and/or unmeasured confounding.
We would highly appreciate it if you would kindly consider upgrading your score for our work -- thank you so much!
References
[1] Angluin, Dana, and Philip Laird. Learning from noisy examples. Machine Learning, 1988.
[2] Liu, Tongliang, and Dacheng Tao. Classification with noisy labels by importance reweighting. TPAMI, 2015.
[3] Cheng, Jiacheng, et al. Learning with bounded instance and label-dependent label noise. ICML, 2020.
[4] Berthon, Antonin, et al. Confidence scores make instance-dependent label-noise learning possible. ICML, 2021.
[5] Yang, Shuo, et al. Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network. ICML, 2022.
This paper introduces principal counterfactual fairness (PCF), a novel measure of fairness which enforces (to my understanding) that, if a sensitive attribute A did not have a causal effect on an outcome Y for an individual, then our prediction of Y should likewise not be causally influenced by that sensitive attribute. The reason this is important is that there are cases where we want our decisions to depend on a sensitive attribute (such as when predicting an ability score, we would want to use data on that person's disabilities), but to depend in the right way. For example, if the disability did not affect the ability we are measuring for this person, then we shouldn't penalise them for this. This is in contrast to traditional counterfactual fairness, which demands that the prediction is not caused by the sensitive attribute, regardless of whether that attribute causes the outcome Y or not. The authors present a formal definition of PCF and provide statistical bounds along with an optimization-based evaluation framework to verify fairness conditions. They provide a theoretical analysis and empirical validation through experiments with synthetic and real data.
update after rebuttal
Following the rebuttal, my concerns have been resolved by the clarifications proposed by the authors (especially those that make it clear, in a graphical sense, when their proposed measure is non-trivial or doesn't reduce to counterfactual fairness), and I am recommending acceptance.
Questions for the Authors
- Is the above interpretation of PCF accurate?
- Assume we can exclude exogenous confounders from the set of endogenous variables {A, D, Y, X}. Can you give examples of the general graphical conditions for which PCF is distinct from CF? Ideally, specifying a DAG.
- Can you come up with a simple SCM describing a scenario where PCF and CF give different answers? Ideally, where PCF gives the more intuitive result.
- Can you provide some exposition on Theorem 2?
Claims and Evidence
PCF is a compelling idea, and their definition is sound and captures what they intended. The result could be more clearly explained and motivated, however. For example, when first introducing the athlete / disability example, the authors could clearly state what the desired outcome is. Some athletes may have a disability A = 1, but this does not necessarily always cause them to be below the threshold performance (Y = 0). For example, the athlete may have found ways to overcome their disability with specific training (observed in X). If they would have the same Y regardless of A, then A shouldn't influence our prediction.
The general claim, that "if some factor didn't influence my outcome then it shouldn't influence your prediction", feels quite general, and the paper could be improved by more motivating examples beyond the disability example.
The main issue with the paper is that ignorability (a standard assumption) is not discussed at all. There should be a proper discussion of what it means in this context, and references to papers discussing the assumption and giving it context (e.g. [1]).
The implications of ignorability for the applicability of the result should also be discussed. As I understand it, you are assuming that X contains all confounders between A and {D, Y}. Are you assuming that D is conditioned on all X? If so, then this assumption restricts the result a lot, as the most interesting cases (that do not reduce to standard counterfactual fairness) are where for some sub-population A does not cause Y, but A and Y are correlated via a confounder W. The issue is that conditioning on W breaks this backdoor path, and excluding any W removes the novelty of the result. So it appears the result is interesting in cases where there are endogenous confounders W between A and Y (noting that in most settings the inputs to the algorithmic decision D are fully observed, in which case assumption 1 reduces to there being no unobserved confounders between A and Y).
[1] Fawkes, Jake, Robin Evans, and Dino Sejdinovic. "Selection, ignorability and challenges with causal fairness." Conference on Causal Learning and Reasoning. PMLR, 2022.
Methods and Evaluation Criteria
The experimental evaluation seems thorough. I would encourage the authors to also present their results in the SCM formalism. It would not require much effort, and in settings where you have knowledge of the underlying structural equations, you can directly evaluate PCF without having to rely on bounds. Even a toy example with an SCM would improve the paper, especially if it could be used to highlight the kinds of settings for which PCF differs from CF.
Theoretical Claims
The theoretical results appear sound, though I have not checked the appendices in depth.
Experimental Design and Analysis
The authors show their post-processing approach effectively improves PCF, demonstrating practical applicability. The subgroup analyses highlight how fairness violations vary depending on contextual covariates. While the validation is limited to the OULAD dataset, I think this is reasonable given that the primary contribution of the paper is theoretical.
Supplementary Material
There is a brief appendix detailing the proofs, which I have not checked in detail.
Relation to Existing Literature
The authors provide a thorough review of related fairness measures which they use to situate and motivate their results.
Missing Important References
Ignorability, and its application to causal fairness.
[1] Rosenbaum, Paul R., and Donald B. Rubin. "The central role of the propensity score in observational studies for causal effects." Biometrika 70.1 (1983): 41-55.
[2] Pearl, Judea. "Generalizing experimental findings." Journal of Causal Inference 3.2 (2015): 259-266.
[3] Fawkes, Jake, Robin Evans, and Dino Sejdinovic. "Selection, ignorability and challenges with causal fairness." Conference on Causal Learning and Reasoning. PMLR, 2022.
Other Strengths and Weaknesses
The paper is clearly written, and after some thinking the fairness measure the authors are proposing is appealing, but it needs to be better explained and motivated, and the impact of this result will be clearer to the reader once the effect of the ignorability assumption is properly discussed. But ultimately, I don't think ignorability is necessary for PCF to be applicable.
Other Comments or Suggestions
NA
Can you come up with a simple SCM describing a scenario where PCF and CF give different answers? Ideally, where PCF gives the more intuitive result.
Thank you for the constructive suggestion to help us improve the readability of our paper!
First, we define PCF within the SCM framework, which is equivalent to the potential outcome framework used in our original manuscript.
- For notations, , , and are defined the same as in CF, and denotes the decision-making from , , and , which broadens the in CF;
- In SCM, and , same as and in CF;
- PCF requires the CF condition to hold only for individuals for whom A has no effect on Y, whereas CF requires it to be satisfied for all individuals;
- The main challenges are how to identify the individuals for whom A has no effect on Y (see Judea Pearl's "Principal Stratification – A Goal or a Tool?" for more details within SCM, especially Fig. 1 and Table 1), and how to enforce CF on these individuals instead of all individuals.
Next, we follow the reviewer's suggestion to provide a very simple SCM satisfying PCF but violating the original CF (a runnable numerical sketch in the same spirit follows the list below).
- Consider , and ( is perfect prediction of );
- In this way, when setting , we always have and , that is, ;
- When setting , we have , that is, and ;
- For the joint distribution, we thus have (violating CF) and (satisfying CF);
- The former half violates CF, but PCF does not constrain these individuals, since A does affect Y for them; the latter half satisfies CF;
- As a result, this SCM satisfies PCF, which is defined only on the individuals for whom A has no effect on Y, but violates the original CF, which is defined on all individuals;
- To further enforce CF, the learned predictor needs to change its predictions on the remaining individuals, which will inevitably sacrifice accuracy due to the updated predictions on these individuals (recall that the predictor already meets PCF).
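To complement the list above, here is a minimal runnable sketch of an SCM of this flavor; the structural equations below are our own illustrative stand-ins, not necessarily the exact ones above. A affects Y only for the sub-population indexed by the exogenous noise U, and the decision D is a perfect prediction of Y, so PCF holds while the original CF is violated overall.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

U = rng.binomial(1, 0.5, n)      # exogenous: U = 1 means A has an individual effect on Y
A = rng.binomial(1, 0.5, n)      # sensitive attribute
Y0, Y1 = np.zeros(n, int), U     # potential outcomes: Y(a) = U * a
Y = np.where(A == 1, Y1, Y0)     # factual outcome
D = Y                            # decision = perfect prediction of Y

D_cf = np.where(A == 1, Y0, Y1)  # counterfactual decision had A been flipped
stratum = (Y0 == Y1)             # individuals for whom A has no effect on Y

print("CF violation rate, all individuals:", (D != D_cf).mean())                      # ~0.5
print("CF violation rate, PCF stratum only:", (D[stratum] != D_cf[stratum]).mean())   # 0.0
```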
Assume we can exclude exogenous confounders from the set of endogenous variables {A, D, Y, X}. Can you give examples of the general graphical conditions for which PCF is distinct from CF? Ideally, specifying a DAG.
- The SCM (and its induced DAG) specified above provides a valid example for which PCF is distinct from CF; we now discuss the general graphical conditions under which this holds.
- Intuitively, as the reviewer noted, the most interesting cases (that do not reduce to standard counterfactual fairness) are those where A does not cause Y for some sub-population but does cause Y for the rest.
- As an extreme case, if A never causes Y, then PCF degenerates to CF. Conversely, if A causes Y for all individuals, then PCF degenerates to no fairness constraint at all.
More discussion on the ignorability assumption
- The reviewer is correct that we assume X contains all confounders between A and {D, Y}, but we don't assume that D is conditioned on all X.
- The ignorability assumption only requires that all confounders are observed; it does not assume that all confounders are conditioned on by D, which would block the backdoor path.
- We would like to kindly remark that it is natural to extend our approach to avoid the ignorability assumption, for example by using sensitivity analysis [1] from the causal inference literature. We leave this for future work, given the orthogonality of these two issues.
- We thank the reviewer for pointing out this issue and for referring us to many insightful references; we will cite and discuss them in our final version.
Interpretation of Theorem 2
- Theorem 2 shows that the proposed DR estimator unbiasedly estimates the target quantity in Sec. 4.1 as the sample size grows.
We are eager to hear your feedback. We’d deeply appreciate it if you could let us know whether your concerns have been addressed.
Validation is limited to the OULAD dataset
- We kindly ask the reviewer to refer to the rebuttal we provided to Reviewer M8WL, in which:
- We add extensive experiments comparing our method to more baselines on two new datasets: Law and UCI Adult;
- We also add experiments to analyze the power of our proposed test -- "what is the likelihood that an algorithm violating PCF can pass this test (also known as sensitivity)?" on the above two new datasets.
Reference
[1] Fawkes, Jake, et al. The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning. NeurIPS, 2024.
The authors have done a great job of answering all my questions, and I appreciate the SCM example, which I think will improve the clarity of the paper for people who are more used to this formalism. I think this paper should be accepted, so I am increasing my score.
Thank you for your kind words and for stating that this paper should be accepted. We will definitely include the mentioned SCM formalism and example to enlarge the impact of our work -- thank you so much!
This paper introduces Principal Counterfactual Fairness (PCF) and proposes to unify two approaches,
- Principal Stratification : Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58(1), 21-29. In their 2002 paper "Principal Stratification in Causal Inference," Frangakis and Rubin introduce a framework to address the challenges of adjusting for posttreatment variables in causal studies. They propose the concept of principal stratification, which involves classifying subjects based on the joint potential values of a posttreatment variable under each treatment being compared. This classification creates principal strata that are unaffected by treatment assignment, allowing for the estimation of causal effects within these strata, termed principal effects. (see also Judea Pearl’s 2011 "Principal Stratification – A Goal or a Tool?")
- Counterfactual Fairness : Kusner, M. J., Loftus, J., Russell, C., & Silva, R. (2017). Counterfactual fairness. Advances in neural information processing systems, 30. In their 2017 paper, "Counterfactual Fairness", Kusner et al. introduce a formal framework for evaluating fairness in machine learning models by using the concept of counterfactuals. The central idea is that a model is fair if its predictions do not depend on sensitive attributes, like race or gender, in a way that would change under hypothetical counterfactual scenarios.
Here, the authors introduce some new fairness criteria (Principal Counterfactual Parity, Principal Counterfactual Equalized Odds, and Principal Conditional Counterfactual Fairness) based on principal stratification from causal inference. It refines counterfactual fairness by ensuring fairness only for individuals whose protected attributes have no individual causal effect on the outcome. They derive statistical bounds to assess whether an algorithm satisfies Principal Counterfactual Fairness; they propose an optimization-based evaluation method that detects fairness violations by solving feasibility constraints; they develop a post-processing approach that minimally adjusts algorithmic decisions to enforce fairness while preserving accuracy; and finally, they use doubly robust estimation techniques to ensure efficient estimation of fairness constraints.
Questions for the Authors
None
Claims and Evidence
The paper clearly defines PCF and shows how it extends existing fairness definitions using principal stratification. It derives statistical bounds for verifying fairness, ensuring a solid mathematical foundation; the necessary conditions for fairness violations are rigorously formulated using probability constraints; and finally, the doubly robust estimation approach ensures reliable estimation under specific assumptions.
The authors present an optimization-based approach that adjusts decisions with minimal changes to ensure fairness, and theoretical proofs confirm the optimality of the post-processing adjustments. Furthermore, the study includes both synthetic and real-world datasets (OULAD dataset), improving credibility. In the last section, some performance metrics (Counterfactual Fairness and Principal Counterfactual Fairness) show measurable improvements post-adjustment.
Nevertheless, other claims are not based on convincing evidence. For instance, "Principal Counterfactual Fairness is the best way to define fairness in causal settings": while the paper makes a strong case for PCF, it does not compare its approach against alternative fairness frameworks, such as path-specific counterfactual fairness (Chiappa, 2019), fairadapt based on quantile regressions (Plečko, Bennett & Meinshausen, 2024), sequential transport on graphs (Fernandes Machado, Gallic & Charpentier, 2025) or principal fairness (Imai & Jiang, 2023).
Fernandes Machado, A., Charpentier, A., & Gallic, E. (2024). Sequential conditional transport on probabilistic graphs for interpretable counterfactual fairness. In Proceedings of the AAAI conference on artificial intelligence (Vol. 37).
Imai, K., & Jiang, Z. (2023). Principal fairness for human and algorithmic decision-making. Statistical Science, 38(2), 317-328.
Plečko, D., Bennett, N., & Meinshausen, N. (2024). fairadapt: Causal reasoning for fair data preprocessing. Journal of Statistical Software, 110, 1-35.
The claim "The proposed optimization method reliably detects fairness violations" is not clear. The paper proves necessary conditions for fairness violations but does not establish their sufficiency due to partial identifiability issues. Thus, even if a violation is detected, it is unclear whether the algorithm is truly unfair or if the bound is too loose.
Also, "Post-processing ensures fairness with minimal impact on accuracy", but the paper does not report accuracy metrics before and after fairness adjustments (there are accuracy trade-offs, which are not discussed in details in the paper).
方法与评估标准
Yes. The paper derives statistical bounds to evaluate fairness violations, ensuring a rigorous methodology. Synthetic experiments allow the authors to validate fairness constraints in controlled settings. The real-world dataset (OULAD – Open University Learning Analytics Dataset) provides a practical benchmark for fairness in education, aligning well with fairness applications in admissions and grading. And the use of doubly robust (DR) estimation improves the reliability of fairness assessments (this method is commonly used in causal inference and helps handle estimation biases)
But as mentioned above, the study does not compare PCF against other causal fairness definitions, and it does not report accuracy trade-offs after fairness corrections. Finally, the fairness constraints rely on statistical bounds, meaning they cannot fully determine whether an algorithm is truly unfair. This limitation is acknowledged in the paper, but further discussion on how to improve identifiability would be valuable.
Theoretical Claims
I have reviewed the key proofs supporting the theoretical claims.
The ignorability assumption (Assumption 1) is a classical assumption, useful for deriving theoretical results, but it is very strong and may not always hold in real-world scenarios. The sufficiency of these bounds for detecting unfairness is not guaranteed, meaning fairness violations could still occur without detection.
In Section 4.1, the authors claim that the fairness condition is violated if the feasible domain of optimization constraints is empty or if a principal stratum’s probability is negative. The proof relies on the correct estimation of potential outcomes, which is partially identifiable from data, meaning that fairness violations may not always be detected accurately (false positives/negatives possible).
And finally, in the doubly robust estimation part (Theorem 2 & 3), the doubly robust estimator provides asymptotically consistent estimates of fairness violations. The proof is correct, but real-world applications may suffer from model misspecification issues.
Experimental Design and Analysis
Yes. As already discussed, there are no baseline fairness measures (e.g., standard Counterfactual Fairness) reported for comparisons. And there are no statistical robustness checks (confidence intervals, significance tests), which makes it hard to assess reliability.
Supplementary Material
Quickly.
Relation to Existing Literature
The paper brings Principal Stratification into Fairness Research, which could be seen as a novel contribution, it refines Counterfactual Fairness by applying fairness constraints only where appropriate, and it develops an optimization-based post-processing fairness intervention.
Missing Important References
As mentioned in the introduction, the most important related references are Frangakis & Rubin (2002) and Kusner et al. (2017).
But (at least) two important references are missing :
Imai, K., & Jiang, Z. (2023). Principal fairness for human and algorithmic decision-making. Statistical Science, 38(2), 317-328.
Kilbertus, N., Ball, P. J., Kusner, M. J., Weller, A., & Silva, R. (2020). The sensitivity of counterfactual fairness to unmeasured confounding. In Uncertainty in artificial intelligence (pp. 616-626). PMLR.
Imai & Jiang (2023) is one of the first works to use principal stratification in algorithmic fairness; it would be nice to explain the differences between the two approaches.
Kilbertus et al. (2020) studies how fairness constraints are affected by unmeasured confounders. Since principal stratification accounts for hidden heterogeneity, citing this work would clarify the link between PCF and confounders.
There might also be connections with
Zuo, Z., Khalili, M., & Zhang, X. (2023). Counterfactually fair representation. Advances in Neural Information Processing Systems, 36, 12124-12140.
Rosenblatt, L., & Witter, R. T. (2023, June). Counterfactual fairness is basically demographic parity. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 12, pp. 14461-14469).
Other Strengths and Weaknesses
Unlike standard Counterfactual Fairness (Kusner et al., 2017), this method ensures fairness only in relevant subgroups, making it more context-sensitive.
Unfortunately, no comparison to alternative fairness definitions is considered. Only one real-world dataset (OULAD) is used, and having other popular datasets (Adult, Law) in the supplementary material could have been interesting.
Other Comments or Suggestions
None
Thank you for your encouraging words and valuable feedback! Below, we address your questions and indicate the changes we’ve made thanks to your suggestion.
Unfortunately, no comparison to alternative fairness definitions are considered. Only one real-world dataset (OULAD) is used. Lack of reporting accuracy metrics before and after fairness adjustments (trade-offs).
Thank you for pointing out this issue! We follow your suggestion to add extensive experiments comparing our method to more baselines on two new datasets: Law and UCI Adult.
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF | 3.15 ± 0.80 | 5.13 ± 0.74 | 3.28 ± 0.85 |
| CF Rep. | 1.71 ± 0.51 | 1.18 ± 0.47 | 1.89 ± 0.32 |
| PSCF | 1.84 ± 0.42 | 1.21 ± 0.41 | 2.07 ± 0.48 |
| Principal Fairness | 2.60 ± 0.39 | 4.37 ± 0.65 | 2.05 ± 0.21 |
| Quantile CF | 2.34 ± 0.20 | 2.64 ± 0.31 | 2.19 ± 0.23 |
| DCEVAE | 4.01 ± 1.16 | 5.58 ± 0.87 | 2.81 ± 0.53 |
| Ours | 5.54 ± 1.19 | 3.85 ± 0.90 | 1.97 ± 0.38 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) | Accuracy ↓ on all individuals (%) |
|---|---|---|---|
| CF | 2.89 ± 0.86 | 4.42 ± 1.10 | 2.60 ± 0.70 |
| CF Rep. | 2.30 ± 1.14 | 0.67 ± 0.82 | 1.64 ± 1.00 |
| PSCF | 1.61 ± 1.22 | 1.31 ± 0.87 | 1.13 ± 0.54 |
| Principal Fairness | 2.62 ± 1.28 | 3.12 ± 0.94 | 2.12 ± 0.63 |
| Quantile CF | 1.79 ± 0.40 | 1.56 ± 0.43 | 2.24 ± 1.22 |
| DCEVAE | 3.34 ± 1.07 | 4.67 ± 1.25 | 3.23 ± 1.66 |
| Ours | 4.45 ± 1.36 | 3.38 ± 0.93 | 1.85 ± 0.78 |
- Key observation 1: Our method improves the PCF metric more significantly than the original CF metric; recall that PCF focuses only on individuals whose protected attribute has no individual effect on the outcome, whereas CF covers all individuals.
- Key observation 2: Existing CF methods do not outperform the proposed approach on the PCF metric.
- Key observation 3: Our post-processing approach exhibits very competitive performance in terms of the trade-off between fairness and accuracy -- our PCF results are the best with only a slight decrease in accuracy.
In Section 4.1, the fairness violations may not always be detected accurately (false positives/negatives possible).
- Theoretically, the power of this test depends on the 8 identifiable probabilities. Recap that, by setting , let be the unit polyhedron in the 12-dim space with total edge length 1, and denote the linear transformation in Sec. 4.1 from the 12-dim non-zero to the above 8-dim identifiable probabilities as . Then the power of this test is .
- Empirically, we add experiments reporting the sensitivity and specificity of our PCF test using various estimators.
| Law | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.67 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.72 ± 0.10 | 1.00 ± 0.00 |
| DR | 0.71 ± 0.10 | 1.00 ± 0.00 |
| UCI Adult | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.79 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.77 ± 0.12 | 1.00 ± 0.00 |
| DR | 0.81 ± 0.12 | 1.00 ± 0.00 |
The above experimental results align with our theoretical claims for our proposed PCF test in Sec. 4.1, i.e., our PCF test is a necessary condition, so its false positive rate is 0. We also empirically show that our PCF test has a relatively low false negative rate.
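As an illustration of this kind of optimization-based necessity test, the sketch below checks whether the identifiable probabilities admit any valid principal-strata distribution; the constraint matrix M and the vector p_hat here are purely hypothetical stand-ins, not the actual Sec. 4.1 system.

```python
import numpy as np
from scipy.optimize import linprog

def pcf_consistent(M, p_hat):
    """Necessity test sketch: PCF can hold only if some probability vector q >= 0
    over the (unidentified) principal strata satisfies M @ q = p_hat and sum(q) = 1.
    An empty feasible region flags a PCF violation."""
    k = M.shape[1]
    A_eq = np.vstack([M, np.ones((1, k))])
    b_eq = np.append(p_hat, 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * k, method="highs")
    return res.status == 0  # 0 = feasible; 2 = infeasible, i.e., a violation is detected

# Purely illustrative system (NOT the paper's): 4 strata, 3 identifiable probabilities.
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])
p_hat = np.array([0.6, 0.4, 0.5])
print("consistent with PCF:", pcf_consistent(M, p_hat))
```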
More discussion on the two important references
We appreciate your insightful comments!
- Compared with Imai & Jiang (2023), we highlight the following differences.
- For estimands, they focus on , while we focus on , resulting in different assumptions and techniques.
- For assumptions, they further assume monotonicity holds, while we only assume ignorability, which can be relaxed by leveraging sensitivity analysis such as (Fawkes et al., NeurIPS 2024).
- For techniques, their framework is more like instrumental variables or mediation analysis (see Fig. 1 in Imai & Jiang), while we take advantage of the linear programming approach used by (A. Li and J. Pearl, AAAI 22 & 24).
- For Kilbertus et al. (2020), we remark that:
- Principal stratification can be regarded as an unmeasured confounder (see J. Pearl's "Principal Stratification - A Goal or a Tool?" for details), and Kilbertus et al. (2020) studies how fairness constraints are affected by unmeasured confounders;
- Our paper proposes PCF built on this unmeasured confounder; thus, benefiting from Kilbertus et al. (2020), we can compare the difference in performance between PCF and the original CF.
Please let us know if you have further questions -- thank you so much!
I confirm my Overall Recommendation: this paper should be accepted.
Thank you for confirming your Overall Recommendation. We are glad you support the acceptance of our paper!
In this paper, the authors propose a new fairness notion called Principal Counterfactual Fairness (PCF). The motivation behind this notion is that algorithms only need to be fair to individuals whose protected attribute has no individual effect on the outcome of interest. The authors derive necessary conditions to assess whether an algorithm satisfies principal CF and propose a corresponding optimization-based evaluation method. They also introduce a post-processing algorithm to adjust unfair decisions. The effectiveness of the algorithm is validated on synthetic datasets and one real-world dataset.
Questions for the Authors
- What is the relationship between the proposed notion and Individual Fairness?
- In the conclusion section, the authors suggest that causal discovery might be helpful for PCF. Could the authors elaborate a bit more on this?
- Could the authors provide some interpretation of Theorem 2, which seems to be missing in the current draft?
Claims and Evidence
Yes.
Methods and Evaluation Criteria
Yes.
Theoretical Claims
The theorem seems correct to me, but I didn't check all the proofs.
Experimental Design and Analysis
The experiment setup makes sense to me.
Supplementary Material
I didn't check most of the appendix.
Relation to Existing Literature
This work proposed a novel definition of Counterfactual Fairness.
Missing Important References
N/A
Other Strengths and Weaknesses
My major concerns are as below
Relationship to CF
- Overall, I find it challenging to compare the proposed PCF with the original CF in [1]. Is it possible to define principal CF within the SCM framework? More specifically, what would the causal relationships between , , and look like in a causal graph?
Experiment
- My primary concern is about what the authors aim to justify through the empirical study and whether they achieve that goal. Regarding the first question, would it be useful to test previous CF methods to assess if they are indeed too restrictive for PCF? Regarding the second question, in the current draft, the authors show that their post-processing algorithm can improve CF and PCF. How can we be sure that existing fairness or CF methods wouldn’t perform better than the proposed approach?
- Could the authors provide justification for their choice of dataset? For instance, why not use datasets like Law or UCI Adult, which are more commonly used in CF literature [1][2][3][4]?
- What is the motivation behind using a causal discovery algorithm and creating subgroups based on its results? This part is unclear to me, particularly since the definition of PCF is not based on an SCM.
Significance
I cannot fully acknowledge the significance of the proposed framework due to the following reasons:
- As mentioned above, there is confusion about the comparison between CF and PCF.
- Concerns about the experimental design and its ability to justify the proposed framework’s or method’s contributions.
- It appears that the authors only provide a necessary condition for an algorithm to satisfy PCF, which relies on the minimum and maximum values of the solution to an optimization problem. I’m uncertain about the effectiveness of this criterion—what is the likelihood that an algorithm could pass this test without truly satisfying PCF? This aspect is not discussed, either theoretically or empirically.
[1] Kusner, M. J., Loftus, J., Russell, C., & Silva, R. (2017). Counterfactual fairness. Advances in neural information processing systems, 30.
[2] Zuo, Z., Khalili, M., & Zhang, X. (2023). Counterfactually fair representation. Advances in Neural Information Processing Systems, 36, 12124-12140.
[3] Kim, H., Shin, S., Jang, J., Song, K., Joo, W., Kang, W., & Moon, I. C. (2021, May). Counterfactual fairness with disentangled causal effect variational autoencoder. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 9, pp. 8128-8136).
[4] Rosenblatt, L., & Witter, R. T. (2023, June). Counterfactual fairness is basically demographic parity. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 12, pp. 14461-14469).
Other Comments or Suggestions
N/A
Thank you for your valuable feedback and the time dedicated to reviewing our work. We address your concerns and questions as follows.
Comparison between CF and PCF
- We can define PCF within the SCM framework, which is equivalent to the potential outcome framework used in our original manuscript.
- For notations, , , and are defined the same as in CF [1], and denotes the decision-making from , , and , which broadens the in CF [1]. In SCM, and , so as and in CF [1].
- For fairness metrics, PCF requires the CF condition to hold only for individuals for whom A has no effect on Y, whereas CF requires it to be satisfied for all individuals.
- The main challenges are how to identify the individuals for whom A has no effect on Y (see Judea Pearl's "Principal Stratification – A Goal or a Tool?" for more details within SCM, especially Fig. 1 and Table 1), and how to enforce CF on these individuals instead of all individuals.
More experiments on common datasets
- Motivated by the reviewer, we add experiments comparing more CF methods on the suggested Law and UCI Adult datasets.
| Law | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) |
|---|---|---|
| CF [1] | 3.15 ± 0.80 | 5.13 ± 0.74 |
| CF Rep. [2] | 1.71 ± 0.51 | 1.18 ± 0.47 |
| PSCF | 1.84 ± 0.42 | 1.21 ± 0.41 |
| Principal Fairness | 2.60 ± 0.39 | 4.37 ± 0.65 |
| Quantile CF | 2.34 ± 0.20 | 2.64 ± 0.31 |
| DCEVAE [3] | 4.01 ± 1.16 | 5.58 ± 0.87 |
| Ours | 5.54 ± 1.19 | 3.85 ± 0.90 |
| UCI Adult | PCF ↑ on principal stratum (%) | CF ↑ on all individuals (%) |
|---|---|---|
| CF [1] | 2.89 ± 0.86 | 4.42 ± 1.10 |
| CF Rep. [2] | 2.30 ± 1.14 | 0.67 ± 0.82 |
| PSCF | 1.61 ± 1.22 | 1.31 ± 0.87 |
| Principal Fairness | 2.62 ± 1.28 | 3.12 ± 0.94 |
| Quantile CF | 1.79 ± 0.40 | 1.56 ± 0.43 |
| DCEVAE [3] | 3.34 ± 1.07 | 4.67 ± 1.25 |
| Ours | 4.45 ± 1.36 | 3.38 ± 0.93 |
- Key observation 1: Our method improves the PCF metric more significantly than the original CF metric; recall that PCF focuses only on individuals whose protected attribute has no individual effect on the outcome, whereas CF covers all individuals.
- Key observation 2: Existing CF methods do not outperform the proposed approach on the PCF metric.
Causal discovery in experiment
- Our method for imposing PCF does not require causal discovery. Since real-world datasets do not come with a ground-truth DAG, we apply causal discovery to first obtain a CPDAG and then sample a DAG consistent with it as the ground truth for simulating the counterfactuals.
Necessary condition for testing PCF
- Thank you for your insightful question! Below we supplement both theory and experiments to analyze the power of our proposed test—what is the likelihood that an algorithm violating PCF can pass this test (also known as sensitivity)?
- Theoretically, the power of this test depends on the 8 identifiable probabilities. Recap that, by setting , let be the unit polyhedron in the 12-dim space with total edge length 1, and denote the linear transformation in Sec. 4.1 from the 12-dim non-zero to the above 8-dim identifiable probabilities as . Then the power of this test is .
- In fact, it can be proved that such an optimization-based approach obtains the tightest upper and lower bounds of , indicating the optimality of this test.
- Empirically, we add experiments reporting the sensitivity and specificity of our PCF test using various estimators.
| Law | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.67 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.72 ± 0.10 | 1.00 ± 0.00 |
| DR | 0.71 ± 0.10 | 1.00 ± 0.00 |
| UCI Adult | Sensitivity ↑ | Specificity ↑ |
|---|---|---|
| OR | 0.79 ± 0.12 | 1.00 ± 0.00 |
| IPS | 0.77 ± 0.12 | 1.00 ± 0.00 |
| DR | 0.81 ± 0.12 | 1.00 ± 0.00 |
Causal discovery might be helpful for PCF
- PCF enforces CF on only a subset of individuals; it would also be interesting to enforce path-specific CF (Chiappa, 2019) on some individuals rather than all. To achieve this, a causal diagram learned via causal discovery can help estimate the principal strata direct effect (PSDE) to identify these individuals (see also Sec. 4.1 of Pearl's "Principal Stratification – A Goal or a Tool?").
Interpretation of Theorem 2
- Theorem 2 shows that the proposed DR estimator unbiasedly estimates the target quantity in Sec. 4.1 as the sample size grows (a generic illustration of the doubly robust technique follows below).
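To give a flavor of the technique behind Theorem 2, here is a generic doubly robust (AIPW) sketch for estimating a counterfactual mean under ignorability, assuming binary A and Y; it illustrates the general recipe, not the exact estimator analyzed in Sec. 4.1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dr_counterfactual_mean(X, A, Y, a=1, eps=1e-3):
    """Doubly robust estimate of E[Y(a)]: consistent in large samples if either
    the outcome-regression (OR) model or the propensity (IPS) model is correct."""
    or_model = LogisticRegression(max_iter=1000).fit(X[A == a], Y[A == a])
    mu = or_model.predict_proba(X)[:, 1]                                  # OR prediction of E[Y | X, A = a]
    ps_model = LogisticRegression(max_iter=1000).fit(X, A)
    e = ps_model.predict_proba(X)[:, list(ps_model.classes_).index(a)]    # estimated P(A = a | X)
    return np.mean(mu + (A == a) * (Y - mu) / np.clip(e, eps, None))      # AIPW combination
```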
We are eager to hear your feedback. We’d deeply appreciate it if you could let us know whether your concerns have been addressed -- thank you so much!
Dear authors,
Thank you for the response. My major concerns (relationship to CF) have been addressed. I have updated my score accordingly.
We appreciate your recommendation to support the acceptance of our paper. We will include more comparisons and discussions with CF in our final version. Thank you for helping to improve the clarity and quality of our manuscript!
The authors introduce Principal Counterfactual Fairness (PCF), unifying two existing lines of work in the literature (principal stratification and counterfactual fairness). The empirical results are mostly compelling (the extra results in the rebuttal helped considerably), and the theoretical results seem correct. The authors also plan to add missing citations in a future version. However, there is one concern: the authors must be careful when mixing "observed outcomes" and "ground-truth outcomes", which could be an issue with a biased dataset.