Enhancing Graph Invariant Learning from a Negative Inference Perspective
In this work, we propose a negative inference graph OOD framework (NeGo) to broaden the inference space for environment factors, effectively addressing the limitations that existing methods cannot tackle.
摘要
评审与讨论
This paper focuses on addressing complex environmental shifts, whereas current works focus on extracting causal subgraphs but overlook the broader scope of environmental changes, which extend beyond spurious subgraphs. A negative inference graph OOD framework (NeGo) is proposed to expand the inference space for environmental factors.
优点
- This paper is well-written and offers a comprehensive theoretical analysis. Additionally, the Appendix contains more supplementary analysis.
- A negative inference graph OOD framework (NeGo) is proposed to expand the inference space for environmental factors.
- Environment inference is a critical issue in graph OOD, and this paper contributes to advancing its development.
缺点
- This paper claims that current methods fail to address complex environments when tested with double the scale of spurious substructures. More detailed experiments, including explanations of the datasets and settings, are needed to demonstrate how NeGo outperforms the baselines in addressing this issue. However, the paper only presents two results for NeGo.
- In the experiments, this paper only presents the causal subgraphs extracted by NeGo. To allow for better comparison, it should also include the causal subgraphs extracted by the baseline methods.
- There are some methods like [1] that also considers the environment inference. What is the difference between this paper and those papers considering environments.
- Why GOOD-PCBA in GOOD[2] is not tested?
[1] Learning Invariant Graph Representations for Out-of-Distribution Generalization. NeurIPS2023
[2] Good: A graph out-of-distribution benchmark. NeurIPS2022
问题
- According to Line 51, the statement seems jumpy in logic, for "Existing methods ... focus much of their attention on extracting causal subgraphs. This results in the model being able to extract causal subgraphs only in known environments, leading to failures in unseen complex environments.". The authors should elaborate on this point.
The paper addresses the challenge of out-of-distribution (OOD) generalization in graph learning by focusing on the fundamental cause of data distribution shift—changes in environments. While existing works have made progress in handling this issue, they often struggle with complex environmental shifts. Current methods tend to concentrate heavily on extracting causal subgraphs, overlooking the impact of spurious subgraphs as environment variables. By proposing the Negative Inference Graph OOD framework (NeGo), the paper aims to expand the inference space for environmental factors. Drawing inspiration from prompt learning in large language models, NeGo employs negative prompt environment inference to capture underlying environment information effectively. Additionally, an environment-enhanced invariant subgraph learning method is introduced to leverage inferred environment embedding for robust causal subgraph extraction during environment shifts. Through a thorough evaluation of both real-world and synthetic datasets across domains, NeGo consistently outperforms existing baselines, demonstrating the efficacy of the proposed framework.
优点
- The paper shifts its focus from the invariant subgraph to spurious subgraph inference, offering a novel and insightful perspective. Besides, it borrows the prompt concept from LLM to the graph OOD field, which is very interesting.
- The paper provides a thorough theorem analysis of the proposed method, adding a fundamentalism to its theoretical framework.
- Through extensive experiments across diverse datasets and comprehensive evaluations, the paper offers compelling evidence of the effectiveness of the methods employed.
缺点
- in section 3.2 you give the theorem about the negative inference in equation 3, but I do not understand what you mean by the negative inference, it is related to the causal subgraph or the spurious subgraph ?
What does equation 3 means and the following theorem mean?
Furthermore, it seems that in CIGA and GALA, they have given the expression of this equation by constraining the mutual information of the , what is the difference between yours and their claims?
- since the paper has a lot of equations, the meaning of the equations is not explained very well.
- In equation 3, what dose the means? I am sorry that I can not find the definition before the equation or have I missed some details?
- In equation 6, there is a typo in the .
问题
-
I am still confused about the difference between your theoretical framework, CIGA, and GALA ( the organization of your paper seems so close to them), could you explain it more clearly? such as the conditions, assumptions, theoretical objectives, and implementation algorithms.
-
I am still confused about your practical method, as your paper only gives the equations such as the mutual information or the KL, and so does the algorithm in Algorithm I.
-
What is the usefulness of ? I do not understand what is the effect and the implementation details about
The paper studies graph-level OOD generalization, by utilizing prompt learning paradigm, to first learn spurious features, then utilizes the spurious features to learn invariant features on graphs.
优点
- This work introduces the concept of prompt learning into the domain of OOD generalization on graphs and proposes an innovative method aimed at learning invariant features on graphs by first learning spurious features.
- The method’s effectiveness is demonstrated across multiple datasets, showcasing its effectiveness in OOD gneralization.
缺点
-
In lines 47-53, the term "complex environments" requires clarification. In the provided example, the decline in test performance is primarily attributed to size shift, a well-recognized challenge. I disagree with the authors' attribution that this decline is due to the emphasis of current methods on extracting causal subgraphs.
-
The statement in lines 135-136, "Our approach, which represents a pioneering practice in utilizing negative inference, is distinct from all existing practices in this field," is inaccurate. EQuAD [1] also employs a similar paradigm that first learns spurious features before learning invariant features. This work, however, does not discuss or compare with EQuAD.
-
In Section 3, the core idea of mixing various environments can be interpreted as environment interpolation. However, the linear environment mixing via latent variable may prove ineffective in managing extrapolated (unseen) environments. The claim in lines 233-235, "However, we observe that such a goal cannot be achieved by existing methods," is inaccurate; previous works, such as AIA [2] and GraphSplice [3], address environment extrapolation and may offer a more effective alternative to environment interpolation.
-
The objective in Equation 4 appears unusual. Specifically, minimizing the KL divergence using : it is unclear how utilizing the label distribution of extra-class samples facilitates accurate environment inference, as they appear unrelated to me.
-
In the experimental section, the authors should analyze whether the negative prompter can truly learn embeddings in that accurately capture spurious patterns.
-
The authors claim that the proposed method can resolve both PIIF and FIIF, regardless of the correlation strengths of . However, the experiments are limited to the FIIF case, e.g.,GOODMotif, and all datasets assume . Additional experiments on synthetic datasets are necessary to support this theoretical statement.
-
The EQuAD method also employs a similar learning paradigm by leveraging spurious features for disentanglement to facilitate invariant feature learning. The authors should discuss and compare their approach with EQuAD. Furthermore, AIA and GraphSplice also aim to "broaden the inference space of environments" (line 71). A discussion or comparison with these methods is warranted.
References
- Yao et al., Empowering Graph Invariance Learning with Deep Spurious Infomax, ICML2024
- Sui et al., Unleashing the power of graph data augmentation on covariate distribution shift, NeurIPS 2023
- Li et al., Graph Structure Extrapolation for Out-of-Distribution Generalization, ICML2024
问题
Please see weakness.
The paper introduces a model to enhance OOD generalization in GNNs by leveraging negative inference techniques. The authors propose the NeGo model to broaden the inference space for environment variables in graph data, countering the limitations of existing positive-inference-based approaches. The NeGo model employs a negative prompt environment inference strategy, which draws inspiration from prompt learning in natural language processing, to capture underlying environment factors. This method enables NeGo to improve the extraction of causal subgraphs across complex environment shifts. The paper claims significant performance improvements over baselines across synthetic and real-world datasets.
优点
- This paper studies an interesting research problem which is graph OOD generalization using negative inference techniques to model the environmental factors.
- The paper is relatively easy to read. The paper provides a detailed explanation of the proposed method, including the environment inference strategy via negative inference and an attention-based residual connection in the method.
- The experiment demonstrates the effectiveness of the model. The performance improvement on some comparisons seems to be significant.
缺点
- Although the motivation is clear to me, the novelty is one of my concerns since the environment-enhance strategies for graph OOD problem seems to be relatively not new. I think the authors could address my concern by presenting more difference with related works clearly.
- There are a lot of works that study graph OOD in the survey (Out-Of-Distribution Generalization on Graphs: A Survey). The authors should include them in the discussions in the related works, and choose some representative methods for comparisons in the experiments.
- The theorems are present informally. I suggest the authors use formal math language rather than current imprecise expression, especially for the key concepts (including but not limited to “broader cognitive space”, “G_C can be extracted”, etc.) in the theorems.
问题
Please see weaknesses.
I have read and agree with the venue's withdrawal policy on behalf of myself and my co-authors.