Disentangling Invariant Subgraph via Variance Contrastive Estimation under Distribution Shifts
We propose to learn disentangled invariant subgraphs via self-supervised contrastive variant-subgraph estimation to achieve satisfactory OOD generalization.
Abstract
Reviews and Discussion
The submission explores the challenge of out-of-distribution generalization in GNNs. The authors propose a novel model that enhances out-of-distribution generalization by explicitly identifying invariant subgraphs and leveraging contrastive learning on variant subgraphs. Their approach consists of three key components: (1) distinguishing invariant and variant subgraphs, (2) applying contrastive learning to variant subgraphs to estimate the degree of spurious correlations, and (3) predicting invariant subgraphs with inverse propensity weighting to mitigate these spurious correlations. The model is evaluated on multiple benchmark datasets under varying degrees of distribution shifts, demonstrating its superiority over existing methods.
Questions to Authors
Could you clarify the mechanism that ensures the accuracy of the variant subgraph identification step? Could the approach be extended to node or link prediction?
Claims and Evidence
The authors claim that this is the first work to explicitly utilize variant subgraphs to help capture invariant subgraphs under distribution shifts and that the three mutually promoted modules significantly enhance performance over state-of-the-art baselines. This claim is supported by their experimental results, which consistently show the superiority of their method across various datasets with varying bias levels.
Methods and Evaluation Criteria
The method aligns well with the existing literature on invariant learning and causality-based OOD generalization. The three proposed modules are well motivated, particularly the introduction of variance contrastive estimation for variant subgraph learning, which differentiates this work from previous approaches that focus only on learning invariant representations.
Theoretical Claims
The problem formulation, which defines the OOD generalization objective and the role of invariant subgraphs, is clearly stated and follows a well-defined causality framework. The use of inverse propensity weighting is a reasonable and theoretically sound approach to mitigating spurious correlations.
Experimental Design and Analysis
The experimental design is well-structured and includes strong baseline comparisons. The authors compare against graph learning methods, including standard GNN architectures (GCN, GIN), pooling-based methods (DiffPool), and recent invariant learning approaches (DIR, LDD, DisC).
Supplementary Material
Yes, all supplementary material is checked.
Relationship to Existing Literature
The work builds upon prior research (invariant learning) effectively.
Missing Important References
The authors should cite or discuss more recent graph OOD papers.
Other Strengths and Weaknesses
This work addresses an important problem in graph machine learning and presents a novel solution with strong theoretical and empirical backing. The proposed approach is innovative in its explicit estimation of spurious correlations and its use of contrastive learning to refine invariant subgraph identification. The extensive experimental validation strengthens its contribution. The method is relatively efficient, as shown by the complexity analysis, which indicates that VIVACE maintains comparable computational cost to existing baselines. The scalability of the method makes it suitable for real-world applications beyond the datasets tested.
One potential limitation is the reliance on the accuracy of the variant subgraph identification step. If the model fails to accurately disentangle variant subgraphs, the effectiveness of the entire approach could be questioned. While the authors' ablation studies suggest that their method is robust, additional discussion of how this challenge is handled would be useful. Another limitation is that the authors could further elaborate on the underperformance of the baselines in the experiments. A more detailed discussion of this issue would provide valuable insight into the limitations of existing methods.
Other Comments or Suggestions
In addition to addressing these limitations, the authors should provide more explicit explanations of the training objective in the main text, particularly for the contrastive learning module.
- Q1. Clarification on the reliance on accurate variant subgraph identification.
We would like to clarify that our method can provably learn accurate variant subgraphs with a theoretical guarantee.
Theorem 1. Denote by $\Phi^*$ the optimal invariant subgraph generator that disentangles the ground-truth invariant subgraph and variant subgraph from the input graph, where $\Phi^*$ satisfies Assumption 2.1, and denote by $\Phi^*_c$ its complement, which extracts the variant subgraph. Assuming the second variance term of Eq. (5) is minimized, the first contrastive loss term is minimized if and only if the invariant subgraph generator equals $\Phi^*$.
Proof sketch.
Denote the first contrastive loss term of Eq. (5) as $\mathcal{L}_{con}$ and the second variance term as $\mathcal{L}_{var}$.
(Sufficiency): To show that the optimal generator $\Phi^*$ minimizes $\mathcal{L}_{con}$, assume for contradiction that there exists another generator $\Phi$ achieving a strictly smaller contrastive loss. This would imply that the variant subgraph produced by $\Phi$ contains part of the ground-truth invariant subgraph information. The contrastive loss among the environments partitioned by the graph label would then depend on the label itself, so the variant representation would carry label-predictive information and inflate the variance term, whereas the variant subgraph produced by $\Phi^*$ excludes all the ground-truth invariant information that is sufficiently predictive of the graph label. This contradicts the assumption that the variance term $\mathcal{L}_{var}$ is minimized. Hence the optimal invariant subgraph generator $\Phi^*$ minimizes the contrastive loss $\mathcal{L}_{con}$.
(Necessity): To show that minimizing the contrastive loss forces the generator to equal $\Phi^*$, assume there exists another invariant subgraph generator $\Phi \neq \Phi^*$ that also minimizes $\mathcal{L}_{con}$. Because $\mathcal{L}_{con}$ is minimized, the variant subgraph produced by $\Phi$ preserves all the intrinsic features of the ground-truth variant subgraph. Since the second variance term is also minimized, only ground-truth variant patterns can be included in this variant subgraph. Therefore, $\Phi$ and $\Phi^*$ produce the same disentanglement, i.e., $\Phi = \Phi^*$. Hence the generator minimizing the first contrastive loss term of Eq. (5) is unique and equals $\Phi^*$.
We will add more detailed proofs in the revised paper.
- Q2. Additional discussions on ablation studies.
We have added additional discussion of the ablation studies. The first two ablated versions, "w/ GCNII" and "w/ GIN", replace the backbone with GCNII and GIN, respectively. The next two ablated versions, "w/o Var." and "w/o IPW", remove the variant subgraph contrastive module and further remove the inverse propensity weighting module. Fig. 3 shows that the performance remains nearly unchanged for the first two ablated versions and drops significantly for the next two, indicating that (1) our method is compatible with other popular GNNs, and (2) it is important to explicitly identify variant subgraphs, estimate their degree of spurious correlation, and remove their effect by reweighting. More details will be added in the revised paper.
- Q3. Why the baselines perform worse.
We have revised the discussion in Sec. 4.2 as follows. Some baselines (e.g., GCN and GIN) have no specific design for generalization under distribution shifts. Some baselines (e.g., FactorGCN and DIR) also did not show good performance, since their assumptions might be invalid under severe bias. In addition, some debiasing methods (e.g., LDD and DisC) do not explicitly capture the spurious correlations for each input graph. Therefore, these baselines perform worse than our method. We will add the discussion above to the revised paper.
- Q4. Explanations of training objective.
We would like to clarify that our method is one joint framework optimizing Eq. (11), which comprises (1) the self-supervised contrastive objective in Eq. (5), which ensures accurate identification of invariant and variant subgraphs, (2) the objective in Eq. (7) for accurately estimating the degree of the spurious correlations, and (3) the objective in Eq. (8) for learning predictions on the invariant subgraph after reweighting.
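Schematically, the joint objective can be viewed as a weighted combination of these three terms (the symbols $\mathcal{L}_{\text{con}}$, $\mathcal{L}_{\text{GCE}}$, $\mathcal{L}_{\text{IPW}}$ and the coefficients $\lambda_1, \lambda_2$ below are illustrative placeholders, not the paper's notation; the exact combination is specified by Eq. (11) in the paper):

$$\min_{\theta}\; \mathcal{L}_{\text{total}} \;=\; \underbrace{\mathcal{L}_{\text{con}}}_{\text{Eq. (5): subgraph identification}} \;+\; \lambda_1\,\underbrace{\mathcal{L}_{\text{GCE}}}_{\text{Eq. (7): spurious-correlation estimation}} \;+\; \lambda_2\,\underbrace{\mathcal{L}_{\text{IPW}}}_{\text{Eq. (8): reweighted invariant prediction}}.$$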
- Q5. Would the approach be extended to node or link predictions?
In this paper, we mainly focus on the graph-level prediction task, but our method can be naturally extended to node- and link-level prediction tasks, which we leave for future work.
This study addresses a critical problem in GNNs regarding their limited generalization capabilities under distribution shifts. Current approaches mainly use correlations in graph patterns rather than discovering fundamental causal substructures for predictions. To overcome this limitation, the paper jointly considers the identification of both invariant and variant subgraphs. Specifically, the method estimates the impact of spurious correlations induced by variant subgraphs and leverages this estimation to enhance the learning of invariant subgraphs. The proposed model demonstrates substantial performance gains over representative baselines, and comprehensive ablation studies confirm the effectiveness of each designed module.
update after rebuttal
I will keep my positive opinion towards the paper after rebuttal.
Questions to Authors
What are the computational trade-offs of using inverse propensity weighting?
What are the differences among the OOD generalization methods listed from line 392 to line 400?
Claims and Evidence
The paper's claims about improved generalization are clear and convincing. The paper claims that the method improves out-of-distribution generalization in graph classification tasks by explicitly modeling spurious correlations through contrastive learning and mitigating their impact via inverse propensity weighting. The empirical results support the claims by outperforming existing graph OOD generalization baselines across five datasets, including CMNIST, CFashion, CKuzushiji, MOLSIDER, and MOLHIV.
Methods and Evaluation Criteria
The proposed method and evaluations make sense for the graph OOD problem. The use of contrastive learning to estimate variant subgraph effects is novel and well motivated. Traditional methods focus only on learning the invariant subgraph, whereas the proposed method learns both the variant and invariant subgraphs; this key idea is inspiring. As noted above, the evaluations include existing graph OOD generalization baselines and five common datasets (CMNIST, CFashion, CKuzushiji, MOLSIDER, and MOLHIV).
Theoretical Claims
The application of inverse propensity weighting is grounded in causal inference literature and is employed in a principled manner to correct for spurious correlations. The theoretical foundations of the method align with prior works on invariant learning and causal representation learning.
Experimental Design and Analysis
I have checked the soundness/validity of the experimental designs or analyses. I think the experimental designs are good but the analyses are a little limited.
Supplementary Material
I reviewed the supplementary material, mainly Sections B and C.
Relationship to Existing Literature
The paper belongs to the literature on OOD generalization.
Missing Important References
There are no additional references that need to be discussed.
Other Strengths and Weaknesses
The other strengths are summarized as follows:
- The work made strong methodological contributions with an interesting idea.
- The method effectively disentangles invariant and variant subgraphs, which is a novel approach to handling distribution shifts.
- The empirical results show consistent improvements across diverse datasets.
The other weaknesses are summarized as follows:
- Figure 1 does not clearly show the method's training procedure.
- The discussion of related work is cursory; in places, the authors simply list references without the necessary discussion.
- Typos: line 412, "5.1" should be removed; line 322, "well handling distribution shifts" should be "well handle distribution shifts".
Other Comments or Suggestions
The authors should address the weaknesses above. The model framework in Figure 1 could be clearer. The discussion of related work is cursory: for the OOD generalization part of the related work, the authors merely list the relevant works (sometimes several per line), which should be revised; the differences among these works should be introduced.
We thank the reviewer for the valuable feedback. We have addressed all the comments; please find our detailed responses below.
- Q1. Figure 1 does not clearly show the method's training procedure.
Thank you for this comment. We would like to follow your suggestion to improve Figure 1 by incorporating additional details to better illustrate our method. Specifically, we have made the following three improvements: (1) We have included the key equations used in the method directly into the figure for the readers to connect the pipeline in the figure with the corresponding details in the text. (2) We have refined the pipeline by adding more detailed step-by-step flows indicated by arrows, making the process clearer and easier to follow. (3) We have highlighted more technical details with concrete examples and included a small legend that explains the meaning of specific symbols, colors, or arrow styles. We will update Figure 1 following your suggestion in the revised paper.
- Q2. Differences among the listed related works (lines 392-400).
Thank you for this question. We have revised the discussion of the related works (lines 392-400) as follows: "Several representative works tackle this problem on graphs by learning subgraphs backed by different theories or assumptions, including causality [1-2], invariant learning [3-6], disentanglement [7], and the information bottleneck [8]. Different from these works, which output explainable or invariant subgraphs under distribution shifts, some works directly learn generalizable graph representations for problems where distribution shifts exist in graph size [9-10] or other structural patterns [11-12], and the learned representations are expected to remain invariant across different environments." We will also add more discussion of related works in the revised paper.
- Q3. There are some typos.
Thank you for this comment. We have carefully proofread the paper and revised the following typos:
- Line 412: we have revised the section name "5.1. Disentangled Graph Neural Network" to "Disentangled Graph Neural Network".
- Line 322: we have revised the expression "well handling distribution shifts" to "well handle distribution shifts".
These corrections will be reflected in the revised paper.
- Q4. What are the computational trade-offs of using inverse propensity weighting?
Thank you for this question. We would like to clarify that using inverse propensity weighting does not incur significant additional computational cost. Specifically, in the variant subgraph contrastive estimation module, the time complexity of estimating the degree of spurious correlation is $O(B^2 d)$, which mainly comes from the pair-wise similarity calculation within a batch of graphs, where $d$ is the representation dimensionality and $B$ is the batch size. After that, we calculate the inverse propensity weights, whose time complexity is $O(B)$. The overall time complexity of inverse propensity weighting is therefore $O(B^2 d)$, which is significantly lower than the complexity of the message-passing GNNs used in our invariant and variant subgraph identification module as well as in the other baselines, whose time complexity is $O(md + nd^2)$, where $m$ and $n$ denote the numbers of edges and nodes. We will add these detailed analyses to the revised paper to clarify the efficiency of our method.
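To make the batched operations behind these costs concrete, here is a minimal, self-contained sketch (the propensity definition below is a toy placeholder rather than the paper's exact Eq. (7), and the variable names are ours, not the paper's):

```python
import torch

B, d = 64, 128                                    # batch size, representation dim
z_var = torch.randn(B, d)                         # variant-subgraph representations

# Pair-wise cosine similarity within the batch: O(B^2 d)
z_norm = torch.nn.functional.normalize(z_var, dim=1)
sim = z_norm @ z_norm.t()                         # (B, B) similarity matrix

# Toy propensity scores derived from the similarities, then the
# inverse propensity weights: O(B) once the scores are available.
propensity = torch.sigmoid(sim.mean(dim=1))       # placeholder per-graph estimate
ipw = 1.0 / propensity.clamp(min=1e-3)            # inverse propensity weights
ipw = ipw / ipw.mean()                            # normalize for stable training
```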
References:
[1] Discovering Invariant Rationales for Graph Neural Networks
[2] Causal Attention for Interpretable and Generalizable Graph Classification
[3] Handling Distribution Shifts on Graphs: An Invariance Perspective
[4] Learning Invariant Graph Representations Under Distribution Shifts
[5] Empowering Graph Invariance Learning with Deep Spurious Infomax
[6] Does Invariant Graph Learning via Environment Augmentation Learn Invariance?
[7] Debiasing Graph Neural Networks via Learning Disentangled Causal Substructure
[8] Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism
[9] Size-Invariant Graph Representations for Graph Classification Extrapolations
[10] SizeShiftReg: A Regularization Method for Improving Size-Generalization in Graph Neural Networks
[11] Graph Out-of-Distribution Generalization with Controllable Data Augmentation
[12] GraphMETRO: Mitigating Complex Distribution Shifts in GNNs via Mixture of Aligned Experts
Thank you for the clarification. I will keep my positive score unchanged.
This paper presents VIVACE for learning invariant subgraphs under distribution shifts using variance contrastive estimation. The authors propose a three-module framework to disentangle invariant and variant subgraphs, estimate the impact of spurious correlations, and employ inverse propensity weighting for predictions. The framework's effectiveness is validated by experiments across multiple benchmarks, which demonstrate the method's superiority over existing approaches. The results indicate that VIVACE achieves better robustness to distribution shifts and effectively captures invariant subgraphs while mitigating the influence of spurious correlations.
update after rebuttal
Thanks for the response. I will keep my score.
Questions to Authors
- Could you discuss more on the computational cost compared to baselines?
- Why was the Generalized Cross-Entropy (GCE) loss chosen rather than Cross-Entropy (CE) loss?
- What impact does the hyperparameter q in GCE have on model performance?
Claims and Evidence
The claims are supported with extensive experimental comparisons. The authors compare VIVACE against baseline GNNs, showing consistent improvements in OOD generalization. The ablation studies provide further evidence for the effectiveness of the proposed modules. To be specific, removing the variant subgraph contrastive module or the inverse propensity weighting module leads to a substantial drop in performance, indicating the important role of these components. The hyperparameter sensitivity analysis further demonstrates that VIVACE is robust across a wide range of settings.
Methods and Evaluation Criteria
The framework combining self-supervised contrastive learning and causal inference techniques is sound. The evaluation criteria are appropriate. The authors report classification accuracy for synthetic datasets (e.g., CMNIST, CFashion, CKuzushiji) and ROC-AUC for real-world datasets (e.g., MOLSIDER, MOLHIV), ensuring fair comparisons with prior works. The inclusion of multiple runs and standard deviation reporting strengthens the statistical significance of the results.
Theoretical Claims
The theoretical foundations of the method are grounded in causality. The authors formalize the problem using causal invariance assumptions, ensuring that the learned subgraphs have stable predictive power across environments. However, a deeper theoretical analysis of the rationale of the approach should be provided.
Experimental Design and Analysis
The experiments are comprehensive, covering both controlled and real-world scenarios. The authors support the significance of the performance gains by reporting results over multiple runs. The method shows consistent improvements across these datasets.
Supplementary Material
I reviewed the supplementary material, which provides further details on experimental setups and pseudocode. These additions enhance reproducibility.
Relationship to Existing Literature
It builds upon prior work on graph neural networks and causality. The discussion of related work is comprehensive and contextualizes the contributions well.
Missing Important References
No critical references appear to be missing.
Other Strengths and Weaknesses
Pros:
- The paper is novel overall: it introduces variance contrastive estimation as a self-supervised learning technique to explicitly model and quantify spurious correlations in graph data.
- Unlike prior work that either assumes predefined environments or relies on heuristic-based disentanglement, VIVACE directly estimates the variant subgraphs, making the process adaptive to various real-world datasets.
- This methodology advances existing invariant learning approaches by integrating contrastive learning with causal inference, effectively modeling and mitigating the impact of spurious correlations.
Cons:
- The detailed analyses of some experimental results are weak (e.g., hyper-parameter sensitivity).
- Since VIVACE introduces additional modules (e.g., variant subgraph contrastive estimation, inverse propensity weighting), it is unclear how much extra computation is required.
- Some designs in the method lack detailed explanations (e.g., the GCE loss in Eq. (7)).
Other Comments or Suggestions
- Clarifying the computational cost compared to baselines would be helpful.
- Eq. (7) introduces the Generalized Cross-Entropy (GCE) loss, but the reason for using GCE instead of the standard cross-entropy is not fully discussed.
- Q1. Theoretical analysis on the rationale of the approach.
We would like to clarify that the rationale of our method is to achieve OOD generalization by accurately disentangling invariant and variant subgraphs. We have added the following theorem.
Theorem 1. Denote by $\Phi^*$ the optimal invariant subgraph generator that disentangles the ground-truth invariant subgraph and variant subgraph from the input graph, where $\Phi^*$ satisfies Assumption 2.1, and denote by $\Phi^*_c$ its complement, which extracts the variant subgraph. Assuming the second variance term of Eq. (5) is minimized, the first contrastive loss term is minimized if and only if the invariant subgraph generator equals $\Phi^*$.
Proof sketch.
Denote the first contrastive loss term of Eq. (5) as $\mathcal{L}_{con}$ and the second variance term as $\mathcal{L}_{var}$.
(Sufficiency): To show that the optimal generator $\Phi^*$ minimizes $\mathcal{L}_{con}$, assume for contradiction that there exists another generator $\Phi$ achieving a strictly smaller contrastive loss. This would imply that the variant subgraph produced by $\Phi$ contains part of the ground-truth invariant subgraph information. The contrastive loss among the environments partitioned by the graph label would then depend on the label itself, so the variant representation would carry label-predictive information and inflate the variance term, whereas the variant subgraph produced by $\Phi^*$ excludes all the ground-truth invariant information that is sufficiently predictive of the graph label. This contradicts the assumption that the variance term $\mathcal{L}_{var}$ is minimized. Hence the optimal invariant subgraph generator $\Phi^*$ minimizes the contrastive loss $\mathcal{L}_{con}$.
(Necessity): To show that minimizing the contrastive loss forces the generator to equal $\Phi^*$, assume there exists another invariant subgraph generator $\Phi \neq \Phi^*$ that also minimizes $\mathcal{L}_{con}$. Because $\mathcal{L}_{con}$ is minimized, the variant subgraph produced by $\Phi$ preserves all the intrinsic features of the ground-truth variant subgraph. Since the second variance term is also minimized, only ground-truth variant patterns can be included in this variant subgraph. Therefore, $\Phi$ and $\Phi^*$ produce the same disentanglement, i.e., $\Phi = \Phi^*$. Hence the generator minimizing the first contrastive loss term of Eq. (5) is unique and equals $\Phi^*$.
- Q2. Detailed analysis on hyper-parameter sensitivity.
We would like to clarify the roles of the two hyperparameters. The first is the coefficient that balances the contrastive loss and the invariance regularizer: a large value encourages invariance among different training environments, while a small value yields informative representations but may not be sufficient to encourage invariance. The second, $q$, is a hyperparameter of the GCE loss that controls the degree of fitting the spurious correlations: a small $q$ pays more attention to correctly classified samples and suffers more from noisy samples, while a large $q$ makes the model less sensitive and prevents overfitting to the spurious correlations. Fig. 4 shows that our method outperforms the best baselines within a wide range of hyperparameter choices.
- Q3. Computation cost of the modules.
Denote the numbers of nodes and edges of the input graph as $n$ and $m$, the representation dimensionality as $d$, and the batch size as $B$. Our method mainly consists of three modules:
- For the invariant and variant subgraph identification module, the time complexity is $O(md + nd^2)$, which comes from the GCN component.
- For the variant subgraph contrastive module, the time complexity is $O(B^2 d)$, which mainly comes from the pair-wise similarity calculation within a batch of graphs.
- For the inverse propensity weighting based invariant prediction module, the time complexity is $O(Bd)$.
Finally, the overall time complexity of our method is mainly induced by the message-passing GNN; the variant subgraph contrastive estimation and inverse propensity weighting modules do not introduce additional higher-order time complexity.
- Q4. Reason to use GCE loss.
We adopt the GCE loss to fit the spurious correlations because prior work has shown that the GCE loss emphasizes spurious correlations.
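For reference, under the standard GCE formulation of Zhang & Sabuncu (2018), which we assume here for illustration, the loss for a sample whose ground-truth class receives predicted probability $p_y$ is

$$\mathcal{L}_{\mathrm{GCE}}(p, y; q) \;=\; \frac{1 - p_y^{\,q}}{q}, \qquad q \in (0, 1],$$

which recovers the standard cross-entropy as $q \to 0$ and the MAE loss at $q = 1$. Its gradient equals the cross-entropy gradient scaled by $p_y^{\,q}$, so samples the model already fits confidently (typically the easy, spuriously correlated ones) receive relatively larger weight, which is why the GCE loss is suited to deliberately capturing spurious correlations.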
- Q5. Impact of $q$.
As shown in Figure 4, $q$, the hyperparameter in the GCE loss that controls the degree of fitting the spurious correlations, has a moderate impact on model performance, but our method is not very sensitive to it: it outperforms the best baselines within a wide range of hyperparameter choices.
This manuscript studies the out-of-distribution generalization issue in graph neural networks. The authors propose learning invariant subgraphs via variant subgraph contrastive estimation, which can handle graph distribution shifts with severe bias. The key innovation is leveraging contrastive learning on variant subgraphs to estimate spurious correlations, whose impact is then mitigated using inverse propensity weighting. This method explicitly addresses the scenario where environment labels are either unavailable or unreliable, significantly enhancing the robustness of GNNs to severe biases in datasets.
Questions to Authors
Can you provide more theoretical analyses of the method to explain why it works? Can you report the time complexity of each module? Do the variant subgraph contrastive estimation and inverse propensity weighting modules introduce unacceptable time complexity? Can you explain how the hyperparameters (the balance coefficient and the GCE hyperparameter $q$) influence the performance?
Claims and Evidence
The claims made by the authors regarding the benefits of leveraging contrastive learning to estimate and mitigate spurious correlations for better generalization are convincing. The provided empirical evidence supports these claims effectively.
Methods and Evaluation Criteria
The proposed approach is reasonable. The inverse propensity weighting, coupled with self-supervised contrastive learning, effectively addresses the identified issues. The evaluation criteria using established datasets and metrics are appropriate and effectively highlight the strengths of the method for the problem.
Theoretical Claims
The assumptions made for invariant subgraph learning are acceptable given the problem setting. However, detailed theoretical discussion and proofs of why this method can solve the OOD problem are missing.
Experimental Design and Analysis
The experiments provide strong empirical validations for the proposed method. Extensive experiments on several graph classification benchmark datasets demonstrate the superiority of the proposed method over baselines. However, the analyses on the important hyperparameters are weak.
Supplementary Material
I checked the supplementary materials; additional details on implementation are helpful.
Relationship to Existing Literature
The paper is well-situated within the literature on graph learning and OOD generalization.
Missing Important References
The important references are fully discussed from my point of view.
Other Strengths and Weaknesses
Strengths:
(1) The research problem is interesting and important to the community. As real-world applications of GNNs continue to expand, improving robustness against out-of-distribution scenarios becomes increasingly crucial.
(2) The proposed method is technically sound. The technical details are easy to understand. The use of variance contrastive estimation and inverse propensity weighting to mitigate spurious correlations is particularly novel.
(3) The comparative results against baselines validate the effectiveness of the approach. The presented experimental results indicate consistent improvements over several state-of-the-art baselines, validating the effectiveness and robustness of the proposed approach under varying degrees of bias.
Weaknesses:
(1) While the paper is methodologically solid and clearly explained, the authors should provide more detailed theoretical analyses of the methods.
(2) The discussions on the experimental results in the experiment section can also be more detailed. Deeper insights into hyperparameter analyses and ablation studies would improve clarity.
(3) The discussions on the time complexity are limited. More comprehensive analysis on each module is missing.
Other Comments or Suggestions
I would suggest that the authors incorporate more rigorous theoretical analyses of the proposed method. I also strongly encourage the authors to include more details on the time complexity of each module.
- Q1. Theoretical analyses.
We have added Theorem 1 to show our method can accurately identify the invariant and variant subgraphs for OOD generalization.
Theorem 1. Denote by $\Phi^*$ the optimal invariant subgraph generator that disentangles the ground-truth invariant subgraph and variant subgraph from the input graph, where $\Phi^*$ satisfies Assumption 2.1, and denote by $\Phi^*_c$ its complement, which extracts the variant subgraph. Assuming the second variance term of Eq. (5) is minimized, the first contrastive loss term is minimized if and only if the invariant subgraph generator equals $\Phi^*$.
Proof sketch.
Denote the first contrastive loss term of Eq. (5) as $\mathcal{L}_{con}$ and the second variance term as $\mathcal{L}_{var}$.
(Sufficiency): To show that the optimal generator $\Phi^*$ minimizes $\mathcal{L}_{con}$, assume for contradiction that there exists another generator $\Phi$ achieving a strictly smaller contrastive loss. This would imply that the variant subgraph produced by $\Phi$ contains part of the ground-truth invariant subgraph information. The contrastive loss among the environments partitioned by the graph label would then depend on the label itself, so the variant representation would carry label-predictive information and inflate the variance term, whereas the variant subgraph produced by $\Phi^*$ excludes all the ground-truth invariant information that is sufficiently predictive of the graph label. This contradicts the assumption that the variance term $\mathcal{L}_{var}$ is minimized. Hence the optimal invariant subgraph generator $\Phi^*$ minimizes the contrastive loss $\mathcal{L}_{con}$.
(Necessity): To show that minimizing the contrastive loss forces the generator to equal $\Phi^*$, assume there exists another invariant subgraph generator $\Phi \neq \Phi^*$ that also minimizes $\mathcal{L}_{con}$. Because $\mathcal{L}_{con}$ is minimized, the variant subgraph produced by $\Phi$ preserves all the intrinsic features of the ground-truth variant subgraph. Since the second variance term is also minimized, only ground-truth variant patterns can be included in this variant subgraph. Therefore, $\Phi$ and $\Phi^*$ produce the same disentanglement, i.e., $\Phi = \Phi^*$. Hence the generator minimizing the first contrastive loss term of Eq. (5) is unique and equals $\Phi^*$.
We will add more detailed proofs in the revised paper.
- Q2.1. Hyperparameter analyses.
We would like to clarify the roles of the two hyperparameters. The first is the coefficient that balances the contrastive loss and the invariance regularizer: a large value encourages invariance among different training environments, while a small value yields informative representations but may not be sufficient to encourage invariance. The second, $q$, is a hyperparameter of the GCE loss that controls the degree of fitting the spurious correlations: a small $q$ pays more attention to correctly classified samples and suffers more from noisy samples, while a large $q$ makes the model less sensitive and prevents overfitting to the spurious correlations. Fig. 4 shows that our method outperforms the best baselines within a wide range of hyperparameter choices.
- Q2.2. Ablation studies.
The first two ablated versions, "w/ GCNII" and "w/ GIN", replace the backbone with GCNII and GIN, respectively. The next two ablated versions, "w/o Var." and "w/o IPW", remove the variant subgraph contrastive module and further remove the inverse propensity weighting module. Fig. 3 shows that the performance remains nearly unchanged for the first two ablated versions and drops significantly for the next two, indicating that (1) our method is compatible with other popular GNNs, and (2) it is important to explicitly identify variant subgraphs, estimate their degree of spurious correlation, and remove their effect by reweighting.
- Q3. Time complexity of each module.
Denote the numbers of nodes and edges of the input graph as $n$ and $m$, the representation dimensionality as $d$, and the batch size as $B$. Our method mainly consists of three modules:
- For the invariant and variant subgraph identification module, the time complexity is $O(md + nd^2)$, which comes from the GCN component.
- For the variant subgraph contrastive module, the time complexity is $O(B^2 d)$, which mainly comes from the pair-wise similarity calculation within a batch of graphs.
- For the inverse propensity weighting based invariant prediction module, the time complexity is $O(Bd)$.
Finally, the overall time complexity of our method is dominated by the message-passing GNN, i.e., $O(md + nd^2)$ per graph; the additional modules do not increase this order.
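As a rough, purely illustrative sanity check of this claim (the sizes $n = 100$, $m = 500$, $d = 128$, $B = 64$ are assumed for this example, not taken from the paper):

$$B^2 d = 64^2 \times 128 \approx 5.2 \times 10^5, \qquad B\,(md + nd^2) = 64 \times (500 \times 128 + 100 \times 128^2) \approx 1.1 \times 10^8,$$

so the per-batch message-passing cost exceeds the contrastive and reweighting overhead by more than two orders of magnitude.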
Thanks for addressing my concerns. I'd like to raise the score to 4.
The paper proposes a novel method to learn disentangled invariant subgraphs via self-supervised contrastive variant subgraph estimation for achieving satisfactory OOD generalization. All reviewers recommended acceptance. Therefore, I followed the reviewers' recommendations and recommend acceptance.