DANCE: Dual Unbiased Expansion with Group-acquired Alignment for Out-of-distribution Graph Fairness Learning
Abstract
Reviews and Discussion
This paper proposes DANCE to improve the fairness of GNNs under distribution shifts. DANCE addresses two key challenges: sensitive group imbalance and the trade-off between fairness and model performance. DANCE uses unbiased mixup to balance sensitive attributes, fairness-aware adversarial learning to improve robustness, and a group-acquired alignment strategy to prioritize fair representations by considering sensitive attributes. Extensive experimental results show the effectiveness of DANCE.
Questions for Authors
Please refer to the suggestions.
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
Yes
Experimental Design and Analyses
Yes
Supplementary Material
Yes, all parts.
Relation to Prior Literature
Graph neural networks (GNNs), fairness in machine learning, and out-of-distribution (OOD) generalization.
Missing Essential References
No
Other Strengths and Weaknesses
Strengths:
- This paper addresses graph fairness under distribution shifts, a problem that remains largely unexplored.
- DANCE expands the training distribution in both the graph structure and feature space, generating challenging yet unbiased virtual data.
- Extensive experiments demonstrate the effectiveness of DANCE.
- This paper provides a theoretical analysis of the graph diffusion process, showing how it controls information propagation across different sensitive groups.
Weaknesses:
- High computational cost. DANCE uses multiple modules (graph expansion, adversarial learning, alignment, and disentanglement), which increase computational costs.
- The authors do not provide case studies that illustrate why DANCE is superior to other methods.
Other Comments or Suggestions
- The authors should provide theoretical analysis or case studies to show why DANCE is superior to other methods.
- The authors should provide a complexity analysis.
We are truly grateful for the time you have taken to review our paper and for your insightful comments and support. Your positive feedback is incredibly encouraging for us! In the following response, we address your major concerns and provide additional clarification.
Q1. High computational cost. DANCE uses multiple modules (graph expansion, adversarial learning, alignment, and disentanglement), which increase computational costs.
Thanks for your comment. We have included the efficiency analysis and comparison below. From the results, we can observe that our method has a competitive computational cost. In particular, we provide the following analysis:
- Complexity analysis. The computational complexity of our framework mainly depends on (a) dual graph expansion, (b) group-acquired alignment, and (c) representation disentanglement. Let $m$ be the number of synthesized nodes and $d$ the hidden dimension. Since logits can be obtained from the previous epoch, (a) needs $O(m)$ extra time for anchor and auxiliary node sampling, $O(md)$ for feature mixup, and $O(m^2)$ for generating synthesized edges; adversarial learning on the synthesized data takes $O(md)$. For (b) and (c), the complexity of the target and sensitive encoders is $O(L(|E|d + |V|d^2))$ with $L$ layers, while the view alignment takes $O(Pd)$ with $P$ total positive and negative pairs. Compared with FatraGNN [1], the additional complexity is $O(m^2 + Pd)$, mainly from the dual graph augmentation and contrastive learning modules, which are carefully controlled in our implementation.
- Practical training overhead. From a practical standpoint, we report the parameter count and per-epoch training time of DANCE to further demonstrate its computational efficiency in comparison with other baselines. The results indicate that our framework maintains a competitive computational cost compared with the baselines.
| Model | Params (M) | Training Time (s/epoch on Credit) |
|---|---|---|
| FatraGNN | 0.140 | 0.263 |
| DANCE | 0.536 | 0.340 |
In summary, our method has a competitive computational cost in comparison with the baseline. We will include the above discussion in our revised version.
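For reference, the numbers above can be reproduced with a simple measurement loop; a minimal sketch, assuming a standard PyTorch `model` and a user-supplied `train_epoch` function (both hypothetical, not part of our released code):

```python
import time
import torch

def profile(model: torch.nn.Module, train_epoch, n_epochs: int = 20) -> None:
    """Report parameter count (in millions) and mean seconds per epoch."""
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    start = time.perf_counter()
    for _ in range(n_epochs):
        train_epoch(model)  # one full optimization pass over the training graph
    sec_per_epoch = (time.perf_counter() - start) / n_epochs
    print(f"Params (M): {n_params:.3f} | Training time (s/epoch): {sec_per_epoch:.3f}")
```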
Q2. The authors do not provide case studies that illustrate why DANCE is superior to other methods.
Thank you for the insightful comment! We have provided t-SNE visualizations that offer a qualitative comparison of the learned representations between DANCE and the benchmark model (FatraGNN). These visualizations help illustrate how DANCE better achieves fair and discriminative representations.
The colors in the plots represent different combinations of sensitive attributes and target labels:
- Pink: sensitive = 1, y = 0
- Green: sensitive = 0, y = 0
- Grey: sensitive = 1, y = 1
- Blue: sensitive = 0, y = 1
FatraGNN [1] (Benchmark) t-SNE Results:
Pokec_z: https://anonymous.4open.science/r/DANCE_ICML_2025-BD70/Fatra_pokec_z_tsne.png
Pokec_n: https://anonymous.4open.science/r/DANCE_ICML_2025-BD70/Fatra_pokec_n_tsne.png
DANCE t-SNE Results:
Pokec_z: https://anonymous.4open.science/r/DANCE_ICML_2025-BD70/DANCE_pokec_z_tsne.png
Pokec_n: https://anonymous.4open.science/r/DANCE_ICML_2025-BD70/DANCE_pokec_n_tsne.png
Compared to FatraGNN, DANCE exhibits a much clearer separation between target labels (i.e., the blue/grey cluster vs. the pink/green cluster), while sensitive attributes are not clustered, indicating reduced sensitive information leakage. In contrast, the benchmark method (FatraGNN) shows all four colors more uniformly mixed, with no clear separation based on target labels, suggesting entanglement of sensitive and target information.
This case study provides intuitive evidence that DANCE achieves better separation of semantic information while mitigating bias from sensitive attributes, thereby demonstrating its superiority over existing methods.
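For reproducibility, the plots above can be generated along the following lines; a minimal sketch, assuming `emb` is a learned $[N, d]$ node-embedding matrix and `sens`, `y` are binary arrays (all three names are placeholders, not variables from our code):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(emb: np.ndarray, sens: np.ndarray, y: np.ndarray, path: str) -> None:
    """Project embeddings to 2D and color by (sensitive attribute, label)."""
    z = TSNE(n_components=2, random_state=0).fit_transform(emb)
    # Color scheme matching the legend above.
    colors = {(1, 0): "pink", (0, 0): "green", (1, 1): "grey", (0, 1): "blue"}
    for (s_val, y_val), c in colors.items():
        mask = (sens == s_val) & (y == y_val)
        plt.scatter(z[mask, 0], z[mask, 1], s=4, c=c,
                    label=f"sensitive={s_val}, y={y_val}")
    plt.legend(markerscale=3)
    plt.savefig(path, dpi=300)
    plt.close()
```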
Reference
[1] Li Y, Wang X, Xing Y, et al. Graph fairness learning under distribution shifts, Proceedings of the ACM Web Conference 2024. 2024: 676-684.
Thanks again for appreciating our work and for your constructive suggestions. Please let us know if you have further questions.
Thanks for your explanation. Most of my concerns have been resolved. I will raise my score.
This paper proposes DANCE, a novel framework for enhancing graph neural network fairness learning under distribution shifts by generating unbiased virtual graph data through dual expansion (structural and feature-based) and aligning node representations. It specifically tackles sensitive group imbalance and fairness-performance conflicts by synthesizing challenging yet unbiased virtual graphs and disentangling sensitive attributes. Experiments demonstrate the method's superiority in fairness and generalization performance over existing fairness-aware GNNs.
Questions for Authors
What are the computational overheads introduced by the proposed approach, particularly in comparison to simpler fairness-aware methods?
I would be interested in more details about the rationale behind prioritizing negative pairs with identical sensitive labels in the group-acquired alignment objective. Also, how significantly does this choice affect fairness outcomes?
Claims and Evidence
Yes
Methods and Evaluation Criteria
Yes
Theoretical Claims
They appear to be generally correct and the intuition makes sense to me as well.
Experimental Design and Analyses
They appear to be carefully designed and I do not have major complaints about the experimental designs and analyses.
Supplementary Material
Yes, mostly A and B
Relation to Prior Literature
This paper is properly positioned as a part of the broader scientific literature with clarifications about the relationship between itself and other works.
Missing Essential References
N/A
Other Strengths and Weaknesses
Strengths:
1 - This paper introduces dual unbiased expansion strategies in both graph and hidden spaces to address distribution shifts.
2 - The idea of group-acquired alignment explicitly focusing on fairness is interesting, particularly handling minority-sensitive groups.
3 - Comprehensive framework including both adversarial modules and representation disentanglement strategies.
4 - Strong empirical validation across multiple benchmark datasets showcasing clear improvements over baselines.
5 - Addresses a realistic and underexplored challenge in graph fairness research by focusing on distribution shifts.
Weaknesses:
1 - The framework involves multiple sophisticated modules, potentially increasing the difficulty of implementation and complexity. Also, the authors did not discuss the time complexity of the proposed method: how expensive is it to adopt in practice? In general, the discussion of computational cost, scalability, and efficiency is limited.
Other Comments or Suggestions
N/A
We are truly grateful for the time you have taken to review our paper and for your insightful comments and support. Your positive feedback is incredibly encouraging for us! In the following response, we address your major concerns and provide additional clarification.
Q1. The framework involves multiple sophisticated modules, potentially increasing the difficulty of implementation and complexity. Also, the authors did not discuss the time complexity of the proposed method: how expensive is it to adopt in practice? In general, the discussion of computational cost, scalability, and efficiency is limited. What are the computational overheads introduced by the proposed approach, particularly in comparison to simpler fairness-aware methods?
Thanks for your comment. We have included the efficiency analysis and comparison below. From the results, we can observe that our method has a competitive computational cost. In particular, we provide the following analysis:
- Complexity analysis. The computational complexity of our framework mainly depends on (a) dual graph expansion, (b) group-acquired alignment, and (c) representation disentanglement. Let $m$ be the number of synthesized nodes and $d$ the hidden dimension. Since logits can be obtained from the previous epoch, (a) needs $O(m)$ extra time for anchor and auxiliary node sampling, $O(md)$ for feature mixup, and $O(m^2)$ for generating synthesized edges; adversarial learning on the synthesized data takes $O(md)$. For (b) and (c), the complexity of the target and sensitive encoders is $O(L(|E|d + |V|d^2))$ with $L$ layers, while the view alignment takes $O(Pd)$ with $P$ total positive and negative pairs. Compared with FatraGNN [1], the additional complexity is $O(m^2 + Pd)$, mainly from the dual graph augmentation and contrastive learning modules, which are carefully controlled in our implementation.
- Practical training overhead. From a practical standpoint, we report the parameter count and per-epoch training time of DANCE to further demonstrate its computational efficiency in comparison with other baselines. The results indicate that our framework maintains a competitive computational cost compared with the baselines.
| Model | Params (M) | Training Time (s/epoch on Credit) |
|---|---|---|
| FatraGNN | 0.140 | 0.263 |
| DANCE | 0.536 | 0.340 |
In summary, our method has a competitive computational cost in comparison with the baseline. We will include the above discussion in our revised version.
Q2. The rationale behind prioritizing negative pairs with identical sensitive labels.
Thanks for your comment. Below we provide both the rationale analysis and empirical evidence to support prioritizing negative pairs with identical sensitive attributes in the group-acquired alignment objective.
- Rationale Analysis. The rationale behind prioritizing these negative pairs is to mitigate the encoder's reliance on sensitive-attribute information during representation learning. When the positive sample shares the anchor's sensitive attribute, both the positive and negative samples have the same sensitive attribute as the anchor node, so the encoder no longer treats sensitive-attribute information as a valuable feature for fairness learning. When the positive sample has a different sensitive attribute from the anchor, it also differs from the negative samples; if the encoder were to learn sensitive-attribute information, the similarity between the positive samples and the anchor would decrease, while the similarity between the negative samples and the anchor would increase, which is contrary to the objective of the loss function.
- Empirical Evidence. To validate this design, we performed a comparative experiment in which negative pairs were sampled solely based on differing target labels, without considering sensitive attributes (denoted Var 1 in the table below). The results are summarized below:
| Dataset | Metric | Var1 | DANCE | Dataset | Metric | Var1 | DANCE |
|---|---|---|---|---|---|---|---|
| Pokec_z | ACC↑ | 71.80 | 70.09 | Pokec_n | ACC↑ | 67.49 | 66.58 |
| | ROC-AUC↑ | 78.16 | 77.42 | | ROC-AUC↑ | 74.32 | 72.88 |
| | △DP↓ | 5.71 | 3.74 | | △DP↓ | 1.77 | 0.83 |
| | △EO↓ | 3.33 | 2.70 | | △EO↓ | 1.57 | 0.29 |
From the results, we find that DANCE significantly outperforms Var 1 on the fairness metrics. This indicates that prioritizing negative pairs with identical sensitive attributes plays a key role in learning representations that are both fair and robust under distribution shifts.
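For concreteness, the sampling rule described above can be sketched as follows (a hypothetical helper, not our exact implementation):

```python
import torch

def group_acquired_pairs(y: torch.Tensor, sens: torch.Tensor, anchor: int):
    """Pair sampling for one anchor node.

    Positives share the anchor's target label (regardless of sensitive
    group); negatives have a different target label but the SAME sensitive
    attribute as the anchor, so sensitive information cannot help the
    encoder separate positives from negatives.
    """
    pos_mask = y == y[anchor]
    pos_mask[anchor] = False  # exclude the anchor itself
    neg_mask = (y != y[anchor]) & (sens == sens[anchor])
    return pos_mask.nonzero(as_tuple=True)[0], neg_mask.nonzero(as_tuple=True)[0]
```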
Reference
[1] Li Y, Wang X, Xing Y, et al. Graph fairness learning under distribution shifts, Proceedings of the ACM Web Conference 2024. 2024: 676-684.
Thanks again for appreciating our work and for your constructive suggestions. Please let us know if you have further questions.
The paper proposes a method to improve fairness in graph neural networks (GNNs) under distribution shifts. It introduces dual graph expansion to generate unbiased virtual graph data, group-acquired alignment to prioritize negative pairs with identical sensitive labels, and representation disentanglement to separate sensitive attributes from task-related information. Experiments show that DANCE improves both fairness and classification performance compared to existing methods.
Questions for Authors
Please refer to the weaknesses.
Claims and Evidence
Yes. The claims are reasonable and convincing.
Methods and Evaluation Criteria
Yes, the methods and evaluation make sense.
Theoretical Claims
Yes. The proofs seem correct.
Experimental Design and Analyses
I have some concerns about the experiments.
- Although the paper focuses on the distribution-shift setting, it would be worthwhile to see the performance under the same distribution.
- The ablation study is very confusing. For example, all of Var1 to Var4 (in Table 3) show significant degradation on the fairness metrics. Does it mean every component, even the mixup, contributes to the fairness results? I doubt it.
- There are some contradictory results. For example, in Table 3, while the C3-Var3 cell shows better performance, the other cells (C-Var3) show significantly lower performance. The authors did not explain this in the paper.
Supplementary Material
Yes.
Relation to Prior Literature
The key contributions of the DANCE framework build upon and extend several lines of research in fair graph learning, out-of-distribution (OOD) generalization, and adversarial learning for fairness. FatraGNN also deals with a similar problem, but this work achieves better performance.
Missing Essential References
No.
Other Strengths and Weaknesses
Strengths:
- The topic and problem are realistic and interesting.
- The experiments are extensive.
Weaknesses:
- Please refer to the concerns about the experiments above.
- The method has limited novelty. Graph expansion and attribute disentanglement have been studied in existing IID fair graph learning, although transferring them to OOD graph learning is novel.
Other Comments or Suggestions
Please use the same precision in the table.
We are truly grateful for the time you have taken to review our paper and for your insightful comments and support. Your positive feedback is incredibly encouraging for us! In the following response, we address your major concerns and provide additional clarification.
Q1. The performance under the same distribution.
Thank you for your comment. We have added the performance of our model and the baseline under the same-distribution setting below. From the results, our method achieves better performance compared with the baseline in most cases. We will add these results to our revised version.
| Dataset | Metric | FatraGNN | DANCE | Dataset | Metric | FatraGNN | DANCE |
|---|---|---|---|---|---|---|---|
| c1 | ACC↑ | 78.11 | 80.43 | c2 | ACC↑ | 77.81 | 80.05 |
| | ROC-AUC↑ | 65.59 | 74.69 | | ROC-AUC↑ | 68.92 | 81.23 |
| | △DP↓ | 6.88 | 4.49 | | △DP↓ | 1.04 | 1.83 |
| | △EO↓ | 4.56 | 3.76 | | △EO↓ | 1.64 | 1.91 |
| c3 | ACC↑ | 72.39 | 80.77 | c4 | ACC↑ | 72.98 | 76.38 |
| | ROC-AUC↑ | 71.24 | 83.21 | | ROC-AUC↑ | 71.76 | 76.80 |
| | △DP↓ | 1.80 | 2.64 | | △DP↓ | 3.09 | 3.47 |
| | △EO↓ | 1.48 | 6.00 | | △EO↓ | 2.07 | 2.60 |
Q2. Does it mean every component, even the mixup, will contribute to the fairness results?
Thank you for the comment. In our framework, each component is designed to address a specific challenge in out-of-distribution (OOD) graph fairness learning, and our ablation results demonstrate that each module makes a meaningful contribution to the final fairness performance.
Specifically, we expand the graph data in both the structural space and the feature space. The unbiased Mixup module enlarges the decision boundary of minority groups, helping to rebalance representation across sensitive attributes. The adversarial learning module simulates worst-case distribution shifts by introducing challenging perturbations in feature space, enhancing the model's robustness. Building on this expanded data, the group-acquired alignment module aligns representations across groups while discouraging the use of sensitive attributes, and the representation disentanglement module explicitly separates sensitive and target information to further reduce bias.
Together, these components form a cohesive design, and the degradation observed in fairness metrics upon removing any one of them confirms that each module plays a critical role in achieving fairness under distribution shifts.
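As an illustration of the adversarial component described above, here is a generic FGSM-style feature perturbation; this is a sketch of simulating a worst-case shift in feature space under standard assumptions, not our exact formulation:

```python
import torch

def perturb_features(x: torch.Tensor, loss: torch.Tensor,
                     eps: float = 0.05) -> torch.Tensor:
    """Shift node features along the loss gradient to create harder inputs.

    `x` must have requires_grad=True and `loss` must be computed from `x`.
    """
    grad = torch.autograd.grad(loss, x, retain_graph=True)[0]
    return (x + eps * grad.sign()).detach()
```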
Q3. In Table 3, while the C3-Var3 cell has better performance, other cells (C-Var3) show significantly lower performance
Thank you for the comment. We have provided the following explanation:
- Complicated Graph Data. Real-world graph datasets (such as Credit-Cs) exhibit considerable variability in structural properties, such as homophily and sensitive-group imbalance. As shown in the table below, these graph datasets have different and complicated characteristics, which can result in performance fluctuations. Overall, our full model performs better than the model variant in most cases, which validates the effectiveness of the proposed method.
| Dataset | Sensitive Homophily | s0 Internal Edge Ratio | s1 Internal Edge Ratio | Cross-group Edge Ratio |
|---|---|---|---|---|
| C1 | 0.821 | 0.777 | 0.044 | 0.179 |
| C2 | 0.807 | 0.774 | 0.033 | 0.193 |
| C3 | 0.755 | 0.702 | 0.053 | 0.245 |
| C4 | 0.737 | 0.681 | 0.057 | 0.262 |
- Previous Observations. Similar cross-dataset variations have been observed in prior studies. For instance, [2] observes that fairness improvements can vary significantly across different datasets when evaluating fairness-aware GNN methods. As shown in Table 1 of [2], RFR exhibits large fluctuations in DP between the Adult (1.0, 2.0) and Adult (3.0, 6.0) subsets, while its DP performance remains relatively stable across the ACS-E (1.0, 2.0) and ACS-E (3.0, 6.0) subsets. This illustrates that the underlying data distribution plays a crucial role, and that fairness improvements achieved on one subset may not generalize consistently across others.
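For completeness, the group-level statistics in the table above can be computed directly from an edge list and binary sensitive labels; a minimal sketch, assuming a PyTorch-Geometric-style `edge_index` tensor of shape `[2, E]`:

```python
import torch

def group_edge_stats(edge_index: torch.Tensor, sens: torch.Tensor):
    """Sensitive homophily plus internal/cross edge ratios for two groups."""
    s_u, s_v = sens[edge_index[0]], sens[edge_index[1]]
    s0_internal = ((s_u == 0) & (s_v == 0)).float().mean().item()
    s1_internal = ((s_u == 1) & (s_v == 1)).float().mean().item()
    homophily = s0_internal + s1_internal   # fraction of same-group edges
    cross = 1.0 - homophily                 # fraction of cross-group edges
    return homophily, s0_internal, s1_internal, cross
```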
Q4. Novelty of DANCE
Thanks for your comment. The innovations of this work include:
- Data-centric perspective. We explore an underexplored yet important OOD graph fairness learning problem from a data-centric perspective, achieving better performance compared to baselines.
- A unified graph fairness learning framework. Our framework is a unified approach that combines dual unbiased graph expansion and group-acquired alignment to minimize the domain gap while ensuring fairness.
- Theoretical analysis. We provide a theoretical analysis demonstrating that our framework can control information propagation between different groups for fairness learning and ensure the convergence of the loss function.
[1] Laclau C, et al. A survey on fairness for machine learning on graphs. arXiv, 2022.
[2] Jiang et al., Chasing fairness under distribution shift: A model weight perturbation approach, NeurIPS 2023.
Thanks again for appreciating our work and for your constructive suggestions.
This paper proposes the DANCE method, which aims to address the problem of fair learning for graph neural networks (GNNs) under distribution shifts. Traditional methods assume that training and testing data are identically distributed, whereas distribution shifts are prevalent in real-world scenarios, leading to degradation of model fairness and performance. The core idea of DANCE is to generate unbiased and challenging virtual graph data in the structure and feature spaces to simulate distribution shifts and enhance model robustness from a data-centric perspective.
Questions for Authors
Please refer to the previous comments.
Claims and Evidence
Experiments on real-world datasets demonstrate that DANCE outperforms baseline methods in both classification performance and fairness metrics.
Methods and Evaluation Criteria
The OOD issue is indeed a common problem in practice.
Theoretical Claims
I am not really sure why Theorem 3.1 shows that the graph diffusion method can precisely control the propagation of information between different groups. Could the authors explain this in further detail?
Experimental Design and Analyses
The experimental design is sound.
Supplementary Material
The appendix is supplemented with some experimental results and definitional proofs.
Relation to Prior Literature
This has implications for the de-biasing of graph representation learning.
Missing Essential References
Some of the work on fair graph learning [1,2,3] also involves data reconstruction, which the authors should include.
[1]"Rethinking fair graph neural networks from re-balancing." Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024. [2] "Disentangled contrastive learning for fair graph representations." Neural Networks 181 (2025): 106781. [3]"Toward fair graph neural networks via real counterfactual samples." Knowledge and Information Systems 66.11 (2024): 6617-6641.
Other Strengths and Weaknesses
Strengths: This paper provides a principled basis for linking graph diffusion theory to fairness, which is an inspiration for the fair graph learning community.
Weaknesses: The author claims ‘The core idea of our DANCE is to generate challenging yet unbiased virtual graph data in both graph and hidden spaces...’ but does not define ‘challenging’. What is to be understood by ‘challenging’ in this context?
Other Comments or Suggestions
None
We are truly grateful for the time you have taken to review our paper and for your insightful review. We address your comments below.
Q1. Why does Theorem 3.1 show that the graph diffusion method can precisely control the propagation of information between different groups?
To clarify the intuition behind the assumption in Theorem 3.1, we rewrite it as follows:
Let $A$ denote the adjacency matrix, let $d(u,v)$ denote the shortest-path distance between nodes $u$ and $v$, and let $S=\sum_{k=0}^{K}\theta_k A^k$ denote the truncated diffusion matrix.

First, we show that $(A^k)_{uv}\neq 0$ implies $d(u,v)\le k$. Expanding the matrix power, $(A^k)_{uv}=\sum_{w_1,\dots,w_{k-1}} A_{uw_1}A_{w_1w_2}\cdots A_{w_{k-1}v}$, so any nonzero term corresponds to a walk $u\to w_1\to\cdots\to w_{k-1}\to v$ of length $k$, and hence to a path from $u$ to $v$ of length at most $k$, i.e., $d(u,v)\le k$.

Then, we prove by mathematical induction that for all $u,v$ with $d(u,v)>k$, we have $(A^k)_{uv}=0$:

Base case ($k=0$): $A^0=I$, and the identity matrix satisfies $(I)_{uv}=0$ for all $u\neq v$, i.e., whenever $d(u,v)>0$.

Inductive step: Assume $(A^k)_{uv}=0$ for all $u,v$ with $d(u,v)>k$. For $k+1$, we have $(A^{k+1})_{uv}=\sum_{w}(A^k)_{uw}A_{wv}$. If $d(u,v)>k+1$, then for every neighbor $w$ of $v$ we must have $d(u,w)>k$ (otherwise $d(u,v)\le d(u,w)+1\le k+1$), so the inductive hypothesis implies $(A^k)_{uw}=0$; for every non-neighbor $w$ of $v$, $A_{wv}=0$ by the definition of $A$. Therefore, every term of the sum vanishes and we can get $(A^{k+1})_{uv}=0$.

By mathematical induction, $(A^k)_{uv}=0$ for every $k\le K$ whenever $d(u,v)>K$, which leads to $(S)_{uv}=\sum_{k=0}^{K}\theta_k(A^k)_{uv}=0$.
This assumption reflects that the graph diffusion method in DANCE can shield minority groups from majority interference while permitting essential cross-group feature learning. This distinction justifies the assumption and highlights the geometric intuition behind Theorem 3.1.
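The zero pattern used in the argument above is easy to verify numerically; a minimal sketch on a 4-node path graph (the coefficients $\theta_k$ here are illustrative, not the paper's):

```python
import numpy as np

# Path graph 0-1-2-3: nodes 0 and 3 are at shortest-path distance 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

for k in range(4):
    print(k, np.linalg.matrix_power(A, k)[0, 3])  # zero for every k < 3

# A truncated diffusion S = sum_{k<=K} theta_k A^k with K = 2 therefore
# propagates no information between nodes more than K hops apart.
theta = [0.5, 0.3, 0.2]
S = sum(t * np.linalg.matrix_power(A, k) for k, t in enumerate(theta))
assert S[0, 3] == 0.0
```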
Q2. Some essential references are not included.
We will include the following discussion into our revised version:
Additionally, FairGB [1] introduces a counterfactual node mixup strategy, generating synthetic samples by interpolating node features and labels across different demographic groups to address demographic group imbalances. Similarly, FDGNN [2] and RFCGNN+ [3] design counterfactual augmentations by varying sensitive or label values while preserving the original adjacency matrices to learn fairer representations. In comparison with these methods, our method specifically focuses on mixing minor sensitive group nodes with those of different sensitive attributes to generate challenging samples that extend decision boundaries, thereby enhancing generalization and fairness.
Q3. What is to be understood by ‘challenging’ in this context?
In our context, we use the term "challenging" to refer to samples that are difficult to classify but strategically designed to improve generalization and fairness. Specifically:
- Synthesized nodes located near class decision boundaries are generated via Mixup (Eq. 7) between minor group nodes and auxiliary nodes. These samples exhibit higher prediction uncertainty and encourage the model to refine its classification boundaries [4].
- Adversarial learning (Sec. 3.2) introduces perturbations that further amplify this uncertainty, thereby enhancing the model’s robustness and fairness under distribution shifts.
In summary, “challenging” refers to samples that are difficult to classify but strategically beneficial for improving both generalization and fairness.
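To make this concrete, below is a generic feature-mixup sketch in the spirit of the description above (not the exact form of Eq. 7; the Beta prior and the `x_minor`/`x_aux` inputs are illustrative assumptions):

```python
import torch

def boundary_mixup(x_minor: torch.Tensor, x_aux: torch.Tensor,
                   alpha: float = 2.0) -> torch.Tensor:
    """Interpolate minority-group node features with auxiliary nodes of a
    different sensitive attribute, yielding boundary-adjacent samples."""
    lam = torch.distributions.Beta(alpha, alpha).sample((x_minor.size(0), 1))
    return lam * x_minor + (1.0 - lam) * x_aux
```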
[1] Rethinking fair graph neural networks from re-balancing. KDD 2024
[2] Disentangled contrastive learning for fair graph representations. Neural Networks, 2025
[3] Toward fair graph neural networks via real counterfactual samples. KAIS, 2024
[4] Self-supervised graph-level representation learning with adversarial contrastive learning, TKDD, 2023
This paper studies the problem of ensuring different notions of group fairness in graph neural networks (GNNs), under the potential presence of distribution shifts. The main idea is to use dual graph expansion to simulate the distribution shift, and then use fairness-aware adversarial training to ensure group fairness. The application of adversarial training is not new, even in the context of GNNs [1]. The authors are encouraged to provide a more thorough discussion of the novelty of their approach and how it compares to existing work. The main technical contribution is the idea of using dual graph expansion to generate unbiased graph data, which I also find interesting.
Overall, the reviewers are positive about the paper, particularly the strong empirical results. Multiple reviewers have pointed out the lack of computational complexity analysis, and the authors have provided a brief response in the rebuttal. I would encourage the authors to expand the discussion on this point in the final version of the paper.
[1]. Information obfuscation of graph neural networks, Liao et al., ICML'21