PaperHub
Rating: 6.6/10 (Spotlight)
4 reviewers: minimum 3, maximum 4, standard deviation 0.5
Individual ratings: 4, 3, 3, 4
ICML 2025

Local Identifying Causal Relations in the Presence of Latent Variables

Submitted: 2025-01-22 · Updated: 2025-07-24
TL;DR

We propose an efficient and theoretically complete local method for identifying causal relationships between two variables from observational data, even in the presence of latent variables.


Keywords
Causal Discovery, Latent Variable, Local Method, Causal Relationship, Markov Blanket

Reviews and Discussion

Review
Rating: 4

The paper proposes both sufficient and necessary local characterizations for the invariant ancestor, invariant non-ancestor, and possible ancestor relationships, relying solely on local structure rather than the entire graph, even in the presence of latent variables. A novel algorithm, LocICR, leverages these local characterizations to efficiently identify causal relationships. The algorithm is proven to be sound and complete. Extensive experiments on benchmark networks and two real-world datasets demonstrate the effectiveness and efficiency of LocICR compared to existing methods.

Update after rebuttal

I've decided to raise my score from 3 to 4. The authors have provided clear and detailed responses to my concerns. For Q1, they clarified the learning process of the Markov blanket and incorporated this into the revised manuscript. For Q2, they explained the technical differences between their method and existing approaches like IDA and LV-IDA, emphasizing the advantages of their approach. For Q3, they elaborated on the distinction between the MMB-by-MMB algorithm and their Algorithm 1, highlighting the extension for identifying local characterizations. For Q4, they clarified why certain edges were not preserved, emphasizing the validation process using propositions. These thorough and thoughtful responses demonstrate a strong commitment to addressing my feedback and improving the quality of the manuscript, leading me to adjust my score upward.

Questions for Authors

  • More details about what is the specific difference between the MMB-by-MMB algorithm and Algorithm 1.
  • The edges between G and H, C and F in Fig. 7 (b) are correct, why aren't they preserved in $\mathcal{P}$?

Claims and Evidence

Yes, the claims made in the paper are well-supported by both theoretical proofs and empirical evaluations. The authors provide solid theoretical foundations for their local characterizations (Theorems 1, 2, 3, and 4) and establish the soundness and completeness of the LocICR algorithm (Theorem 6). A comparison with several state-of-the-art methods demonstrates that LocICR outperforms these techniques in terms of accuracy and computational efficiency. Furthermore, extensive experiments on two real-world datasets further validate the practical applicability and effectiveness of the proposed approach.

Methods and Evaluation Criteria

Yes, the proposed local characterizations of causal relationships and the LocICR algorithm are well-suited to address the problem at hand. The proposed method focuses on the local structures to identify causal relationships, which is a practical approach for large-scale datasets with latent variables.

Theoretical Claims

Yes, I have checked the correctness of the proofs for the main theoretical claims made in the paper.

Experimental Designs and Analyses

Yes, I have reviewed the soundness of the experimental designs and analyses. Extensive experiments on both synthetic and real-world datasets demonstrate that LocICR outperforms existing methods in terms of accuracy and computational efficiency.

Supplementary Material

Yes, I have reviewed the source code.

Relation to Existing Literature

The key contributions of the paper are clear. I think they represent valuable research for the causal discovery community and are novel relative to the broader scientific literature on causal discovery. Prior work such as (Fang et al., 2022) and (Xie et al., 2024) also focuses on locally identifying causal relationships. However, the method proposed by (Fang et al., 2022) requires the assumption of causal sufficiency, and the method proposed by (Xie et al., 2024) focuses on relationships between a target variable and its adjacent variables, without generalizing to arbitrary pairs of variables. The proposed LocICR algorithm addresses these limitations.

Essential References Not Discussed

No, the related works are thoroughly summarized and appropriately referenced.

Other Strengths and Weaknesses

Strengths And Weaknesses:

  • The paper provides a novel and efficient approach to causal discovery, addressing significant limitations of existing methods.
  • The theoretical contributions are solid, and the proposed algorithm is practical.
  • The paper is well-written, with clear explanations and examples.
  • Extensive experiments on both synthetic and real-world datasets show that the LocICR algorithm outperforms baselines significantly.
  • The appendix provides helpful clarifications.

Other Comments or Suggestions

  • Additional details regarding the learning of $\mathcal{L}_{V_i}$ should be provided.
  • Detailed technical comparisons between the proposed and compared methods (e.g., IDA, LV-IDA) are not explicitly stated in the manuscript, which should be clarified.

Author Response

We are deeply grateful for the time you devoted to reviewing our manuscript. We appreciate that you think our proposal is a very solid paper. We hope that the following responses adequately address your concerns.

Q1. ``Additional details regarding the learning of $\mathcal{L}_{V_i}$ should be provided.''

A1. Thanks for being thoughtful. Specifically, the learning process of $\mathcal{L}_{V_i}$ first learns the Markov blanket of $V_i$, then performs skeleton learning based on the variables in $MB^{+}(V_i)$, and finally determines the V-structures. As you suggested, we have incorporated this clarification in the revised version.
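
The last of the three steps named in A1 (determining V-structures once a local skeleton and separating sets are available) can be illustrated with a minimal sketch. This is the standard unshielded-collider rule from constraint-based discovery, not the authors' implementation; the function name `find_colliders`, the input layout, and the toy graph are assumptions made for illustration only.

```python
from itertools import combinations


def find_colliders(skeleton, sepset):
    """Return unshielded colliders as triples (a, c, b), meaning a *-> c <-* b.

    skeleton: dict mapping each node to the set of its neighbours (hypothetical layout).
    sepset:   dict mapping frozenset({a, b}) to the set found to separate a and b.
    """
    colliders = []
    for c, nbrs in skeleton.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in skeleton[a]:
                continue  # a and b are adjacent, so the triple is shielded
            s = sepset.get(frozenset((a, b)))
            # Only decide when a separating set was recorded and it excludes c.
            if s is not None and c not in s:
                colliders.append((a, c, b))
    return colliders


# Hypothetical toy input: X - Z - Y, with X and Y separated by the empty set.
skeleton = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}
colliders = find_colliders(skeleton, {frozenset(("X", "Y")): set()})
no_colliders = find_colliders(skeleton, {frozenset(("X", "Y")): {"Z"}})
```

In the first call Z is marked as a collider because it is absent from the separating set of X and Y; in the second call Z separates X and Y, so no collider is reported.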

Q2. ``Detailed technical comparisons between the proposed and compared methods (e.g., IDA, LV-IDA) are not explicitly stated in the manuscript, which should be clarified.''

A2. Thanks for the helpful suggestion. After a PAG (or CPDAG) is learned, LV-IDA (or IDA) enumerates all corresponding MAGs (or DAGs), identifies adjustment sets that satisfy the generalized back-door criterion, and estimates causal effects. If all these estimated effects are consistently nonzero (or consistently zero), an invariant ancestor (or non-ancestor) is determined. By contrast, our method does not require global graph learning or enumeration. We will include the above discussion.
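
The enumeration-based decision rule described in A2 can be sketched as follows. This is a simplified illustration rather than LV-IDA or IDA themselves: it swaps causal-effect estimation for a purely graphical ancestor check over an already enumerated list of graphs, and it represents each graph only by its directed edges. The names `is_ancestor` and `classify_by_enumeration` and the toy graphs are hypothetical.

```python
def is_ancestor(directed_edges, x, y):
    """True if a directed path x -> ... -> y exists (depth-first search)."""
    children = {}
    for a, b in directed_edges:
        children.setdefault(a, set()).add(b)
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child == y:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def classify_by_enumeration(graphs, x, y):
    """Classify x relative to y across every graph in an equivalence class."""
    if not graphs:
        raise ValueError("at least one graph is required")
    flags = [is_ancestor(g, x, y) for g in graphs]
    if all(flags):
        return "invariant ancestor"
    if not any(flags):
        return "invariant non-ancestor"
    return "possible ancestor"


# Hypothetical equivalence class: A reaches E in both members, by different routes.
graphs = [{("A", "B"), ("B", "E")}, {("A", "E"), ("B", "E")}]
a_to_e = classify_by_enumeration(graphs, "A", "E")
e_to_a = classify_by_enumeration(graphs, "E", "A")
a_to_b = classify_by_enumeration(graphs, "A", "B")
```

The cost of this rule grows with the number of graphs in the class, which is the overhead the authors say their local characterizations avoid.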

Q3. ``More details about what is the specific difference between the MMB-by-MMB algorithm and Algorithm 1.''

A3. Thank you for raising this point. The MMB-by-MMB algorithm focuses solely on learning the local structure involving the target variable and its adjacent variables. To identify local characterizations (Theorems 1, 2, 3, 4), we extend the MMB-by-MMB algorithm to learn $\mathcal{P}_{MB^+(X)}$. As a result, we can identify causal relationships between any pair of variables, no longer limited to the target variable and its adjacent variables.

Q4 ``The edges between G and H, C and F in Fig. 7 (b) are correct, why aren't they preserved in $\mathcal{P}$?''

A4. Thank you for your careful observation. We would like to clarify that in each iteration, from the learned $\mathcal{L}_{V_i}$, we utilize Propositions 1 and 2 to identify valid edges and colliders and store them in $\mathcal{P}$. Since the edges between G and H, as well as C and F, are not validated by Propositions 1 and 2, we do not include them in $\mathcal{P}$. In other words, we only retain the information that we can confirm to be correct.

Reviewer Comment

Thank you for addressing the feedback. After re-assessing the manuscript and evaluating the revisions, I have decided to elevate the score from 3 to 4.

Author Comment

Thank you for your supportive feedback and for raising the score.

Review
Rating: 3

The authors propose novel local characterizations that are necessary and sufficient for various types of causal relationships between two variables and bypass the need for global structure learning. Leveraging these local insights, the authors develop efficient and fully localized algorithms that accurately identify causal relationships from observational data. The authors theoretically demonstrate the soundness and completeness of the approach. Extensive experiments on benchmark networks and two real-world datasets further validate the effectiveness and efficiency of the method.

Questions for Authors

Please consider addressing my concerns above during the rebuttal.

Claims and Evidence

Overall, the paper is quite dense and would have been better suited for submission to a journal with a longer review period. The judgments below are based on my (possibly incorrect) understanding of its contributions.

The claims in the paper seem to be well-executed:

  1. The paper clearly defines the problem of local causal structure learning with latent variables and establishes the limitations of previous approaches that assume causal sufficiency.
  2. The paper provides Theorems 1-4, and proves the correctness (though I did not check them in detail).
  3. The effectiveness of the proposed method is validated using synthetic benchmarks (MILDEW, ALARM, WIN95PTS, ANDES) and real-world gene expression data. However, several important baselines are missing.

Methods and Evaluation Criteria

  1. The method section is well-organized and supplemented with visual diagrams, which are very helpful.
  2. However, some important baselines are missing. First, the global methods are somewhat outdated, as only PC/RFCI are considered. I encourage the authors to compare against some more recent baselines, including [1-4]. Comparing a broader range of recent methods could provide a more comprehensive perspective on the proposed solution.
  3. The benchmark datasets appear to be unconventional. If I remember correctly, bnlearn already provides a conditional probability table for forward sampling. What prevents the authors from using the default setting instead of adopting a linear Gaussian parameterization?

[1] On scoring maximal ancestral graphs with the max–min hill climbing algorithm. International Journal of Approximate Reasoning, 2018.

[2] Iterative causal discovery in the possible presence of latent confounders and selection bias. NeurIPS, 2021.

[3] Differentiable causal discovery under unmeasured confounding. AISTATS, 2021.

[4] Greedy equivalence search in the presence of latent confounders. UAI, 2022.

Theoretical Claims

The theoretical claim looks good to me.

Experimental Designs and Analyses

See comment above.

Supplementary Material

No.

Relation to Existing Literature

The paper extends an existing solution from Xie et al. and addresses an important missing piece in local causal discovery: understanding the causal relationships between variables under latent confounding without requiring global causal discovery.

Essential References Not Discussed

Please see my comments above on more recent global causal discovery methods.

Other Strengths and Weaknesses

Strengths:

  • The paper is easy to follow given the provided visual diagram.
  • I enjoyed reading the paper and the notations/terms are consistent with the literature.
  • The motivation is clear and the problem studied in the paper is surely important.

Weaknesses:

  • Please see my comments regarding the evaluation.
  • A more thorough review of the state of the art in global causal discovery with latent confounders is missing. Most of the cited papers were published before 2020.

Other Comments or Suggestions

N/A

Author Response

We sincerely appreciate the time you dedicated to reviewing our paper, as well as your insightful and encouraging comments. Below, we provide our responses to your comments.

Q1. ``I encourage the authors to compare against some more recent baselines''

A1. Thank you for the suggestion. Within the limited time, we have added a performance comparison with M3HC [1] and ICD [2] under the same settings as in our main paper. After learning the graph using ICD and M3HC, we applied Zhang's causal identification criteria (similar to RFCI-Zhang). As shown in the table below, LocICR outperforms other methods on all evaluation metrics and significantly reduces the number of CI tests compared to ICD-Zhang, which focuses on global structure learning with latent variables. Since M3HC is a hybrid method rather than fully CI-based, we did not compare the number of CI tests.

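| Network | Size | Algorithm | WP↑ | WR↑ | WF1↑ | nTest↓ |
|---|---|---|---|---|---|---|
| MILDEW | 1000 | LocICR | 0.91 | 0.91 | 0.91 | 588.45 |
| MILDEW | 1000 | M3HC-Zhang | 0.83 | 0.82 | 0.80 | - |
| MILDEW | 1000 | ICD-Zhang | 0.80 | 0.76 | 0.70 | 2139.98 |
| MILDEW | 3000 | LocICR | 0.96 | 0.96 | 0.96 | 745.24 |
| MILDEW | 3000 | M3HC-Zhang | 0.88 | 0.85 | 0.83 | - |
| MILDEW | 3000 | ICD-Zhang | 0.82 | 0.75 | 0.68 | 2857.56 |
| MILDEW | 5000 | LocICR | 0.98 | 0.98 | 0.98 | 821.37 |
| MILDEW | 5000 | M3HC-Zhang | 0.86 | 0.85 | 0.84 | - |
| MILDEW | 5000 | ICD-Zhang | 0.80 | 0.78 | 0.73 | 3369.91 |
| ALARM | 1000 | LocICR | 0.86 | 0.82 | 0.83 | 454.30 |
| ALARM | 1000 | M3HC-Zhang | 0.85 | 0.79 | 0.80 | - |
| ALARM | 1000 | ICD-Zhang | 0.77 | 0.54 | 0.54 | 1821.39 |
| ALARM | 3000 | LocICR | 0.90 | 0.88 | 0.88 | 503.15 |
| ALARM | 3000 | M3HC-Zhang | 0.85 | 0.79 | 0.80 | - |
| ALARM | 3000 | ICD-Zhang | 0.84 | 0.64 | 0.66 | 2231.66 |
| ALARM | 5000 | LocICR | 0.97 | 0.97 | 0.97 | 535.32 |
| ALARM | 5000 | M3HC-Zhang | 0.88 | 0.82 | 0.83 | - |
| ALARM | 5000 | ICD-Zhang | 0.80 | 0.66 | 0.67 | 2484.25 |
| WIN95PTS | 1000 | LocICR | 0.81 | 0.77 | 0.76 | 2382.69 |
| WIN95PTS | 1000 | M3HC-Zhang | 0.77 | 0.70 | 0.69 | - |
| WIN95PTS | 1000 | ICD-Zhang | 0.68 | 0.56 | 0.51 | 5707.26 |
| WIN95PTS | 3000 | LocICR | 0.86 | 0.84 | 0.84 | 3031.96 |
| WIN95PTS | 3000 | M3HC-Zhang | 0.79 | 0.74 | 0.73 | - |
| WIN95PTS | 3000 | ICD-Zhang | 0.68 | 0.60 | 0.57 | 7008.28 |
| WIN95PTS | 5000 | LocICR | 0.92 | 0.91 | 0.91 | 3302.28 |
| WIN95PTS | 5000 | M3HC-Zhang | 0.82 | 0.78 | 0.77 | - |
| WIN95PTS | 5000 | ICD-Zhang | 0.69 | 0.61 | 0.57 | 7639.69 |
| ANDES | 1000 | LocICR | 0.83 | 0.79 | 0.77 | 3980.06 |
| ANDES | 1000 | M3HC-Zhang | 0.76 | 0.72 | 0.67 | - |
| ANDES | 1000 | ICD-Zhang | 0.80 | 0.71 | 0.65 | 76004.21 |
| ANDES | 3000 | LocICR | 0.85 | 0.80 | 0.78 | 4528.61 |
| ANDES | 3000 | M3HC-Zhang | 0.80 | 0.76 | 0.73 | - |
| ANDES | 3000 | ICD-Zhang | 0.80 | 0.71 | 0.65 | 107370.50 |
| ANDES | 5000 | LocICR | 0.89 | 0.87 | 0.86 | 4931.34 |
| ANDES | 5000 | M3HC-Zhang | 0.84 | 0.79 | 0.77 | - |
| ANDES | 5000 | ICD-Zhang | 0.81 | 0.72 | 0.67 | 122166.73 |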
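
For readers unfamiliar with the WP, WR, and WF1 columns, the sketch below shows one common way to compute weighted precision, recall, and F1 over relationship labels. It assumes support-weighted averaging over the true classes; the discussion does not state the exact weighting used in the paper, so this is an assumption, and the function name `weighted_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the labels in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    wp = wr = wf1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / n  # each class is weighted by its share of true labels
        wp += weight * precision
        wr += weight * recall
        wf1 += weight * f1
    return wp, wr, wf1


# Hypothetical labels for four variable pairs.
y_true = ["ancestor", "ancestor", "non-ancestor", "possible"]
y_pred = ["ancestor", "non-ancestor", "non-ancestor", "possible"]
wp, wr, wf1 = weighted_prf(y_true, y_pred)
```

Under this weighting, the weighted recall equals plain accuracy, which is why WR and WF1 can differ even when most pairs are labeled correctly.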

Q2. ``What prevents the authors from using the default setting instead of adopting a linear Gaussian parameterization?''

A2. Following existing studies—specifically the setups of RFCI, LV-IDA, ITC, and IDA—we adopt a linear Gaussian parameterization, which is also used by M3HC and ICD. Additionally, we tested our method on the ALARM network with its default parameters in bnlearn, keeping other settings unchanged. The results below demonstrate that our method continues to perform well.

| Size | WP↑ | WR↑ | WF1↑ | nTest↓ |
|---|---|---|---|---|
| 1000 | 0.89 | 0.86 | 0.86 | 612.70 |
| 3000 | 0.89 | 0.88 | 0.88 | 674.85 |
| 5000 | 0.94 | 0.94 | 0.94 | 743.23 |

Q3. ``A more thorough review of the state of the art in global causal discovery with latent confounders is missing.''

A3. Thank you for the valuable suggestion. In the revised version, we have included a more comprehensive review of state-of-the-art methods in global causal discovery with latent confounders, highlighting recent advancements (e.g., [1–4]).

[1] On scoring maximal ancestral graphs with the max–min hill climbing algorithm. International Journal of Approximate Reasoning, 2018.

[2] Iterative causal discovery in the possible presence of latent confounders and selection bias. NeurIPS, 2021.

[3] Differentiable causal discovery under unmeasured confounding. AISTATS, 2021.

[4] Greedy equivalence search in the presence of latent confounders. UAI, 2022.

Review
Rating: 3

This paper provides a local causal discovery method for inferring causal relations between a pair of variables. Specifically, given any two variables $X$ and $Y$, the proposed algorithm outputs one of the following four results: $X$ is an invariant non-ancestor of $Y$, $X$ is an explicit invariant ancestor of $Y$, $X$ is a possible ancestor of $Y$, or $X$ is an implicit invariant ancestor of $Y$. The authors provide both theoretical analysis and experimental results to demonstrate the soundness and effectiveness of the proposed algorithm.

Update after rebuttal

The authors' rebuttal addresses most of my concerns, and I have raised my score from 2 to 3.

Questions for Authors

N/A

Claims and Evidence

I think the proofs of some theoretical claims are potentially problematic; please refer to Theoretical Claims.

Methods and Evaluation Criteria

The authors provide many theoretical claims to demonstrate the soundness of the proposed method, but I think the proofs of some theoretical claims are potentially problematic; please refer to Theoretical Claims.

Theoretical Claims

The theoretical claims in Section 5 of this paper rely heavily on theorems in Xie et al. 2024. Therefore, I have also read Xie et al. 2024 carefully, and I found some potential problems, detailed below.

  • Proposition 1 in this paper is exactly Theorem 1 in Xie et al. 2024. The proof of Theorem 1 in Xie et al. 2024 relies on Theorem 1 in Xie & Geng 2008. However, Theorem 1 in Xie & Geng 2008 presents a property of d-separation, while Xie et al. 2024 directly replace d-separation with m-separation without any further clarification. I think this is a non-trivial modification. Specifically, the proof of Theorem 1 in Xie & Geng 2008 relies heavily on $An(u)$ and $An(v)$. While any variable in $An(u) \cup An(v)$ exists in the underlying DAG, some variables in $An(u) \cup An(v)$ may not exist in the underlying MAG because they may be latent variables.

  • The proof of Proposition 2 is confusing. Specifically, to prove $\exists S$ s.t. $V_1 \perp V_2 \mid S$, the authors argue that there are three types of active paths between $V_1$ and $V_2$: the first and the second can be blocked by $A$, while the third can be blocked by $S_{X, V_2}$. This is not valid. First, the authors should not only consider active paths, because some inactive paths may become active given $S$ if $S$ contains some colliders. Second, the authors should prove that all paths are blocked by a single set rather than prove that different paths are blocked by different sets.

  • The proof of Proposition 3 is not clear (at least for me). First, according to the definition of $M_L$, it is obtained by iteratively removing leaf nodes, but when should we stop removing leaf nodes? If we don't stop, it seems that $M_L$ will be an empty graph. Second, I cannot understand why the equality of two marginal distributions ($P_{M_L}(O') = P_M(O')$) implies that continuing this algorithm will not contribute to orienting the undirected edges.

Also, I have a minor question: In Zhang, 2008, PAG and maximally informative PAG are two different concepts. Does PAG in this paper actually refer to maximally informative PAG?

Experimental Designs and Analyses

I have checked the experimental setup and I have no major concern.

Supplementary Material

I didn't review the supplementary material.

Relation to Existing Literature

The authors have discussed this in Impact Statement.

Essential References Not Discussed

No related work that is essential to understanding the context of the paper's key contributions is missing from the citations or discussion.

Other Strengths and Weaknesses

Strengths

  1. This paper investigates a novel problem: how to infer causal relations between two observed variables in the presence of hidden variables.

  2. This paper provides examples for their definitions and theorems, which improves readability substantially.

Weaknesses

  1. Some proofs are not very clear or rigorous; please refer to Theoretical Claims.

  2. This paper relies heavily on Xie et al., 2024. In fact, propositions 1, 2, 3 are all from Xie et al., 2024. This makes the theoretical contribution of this paper limited.

Other Comments or Suggestions

N/A

Author Response

We sincerely appreciate your thorough review and insightful comments. We hope the following response properly addresses your concerns.

Q1 Regarding ``...the theoretical contribution...'':

A1. We would like to clarify that one of our paper's main theoretical contributions is proposing necessary and sufficient local characterizations (Theorems 1–4) that account for latent variables, an aspect not found in Xie et al. (2024). To learn these local characterizations, we extend the method proposed by Xie et al. (2024) to learn $\mathcal{P}_{MB^+(X)}$. Unlike their method, which is restricted to the target variable and its adjacent variables, we generalize it to apply to any two variables (see lines 280–285).

Q2 Regarding Proposition 1:

A2. We would like to clarify that we have double-checked Theorem 1 in Xie et al. (2024) and confirmed its validity. A key reason is the important fact stated in Zhang (2008): given any DAG $\mathcal{G}$ over $\mathbf{V} = \mathbf{O} \cup \mathbf{L}$, where $\mathbf{O}$ denotes the set of observed variables and $\mathbf{L}$ denotes the set of latent variables, there is a MAG over $\mathbf{O}$ alone such that for any disjoint $\mathbf{X}, \mathbf{Y}, \mathbf{Z} \subseteq \mathbf{O}$, $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{Z}$ in $\mathcal{G}$ if and only if they are m-separated by $\mathbf{Z}$ in the MAG. We also found similar conclusions in related works [see page 6 in Akbari et al., 2021; page 6 in Pellet & Elisseeff, 2008a].

  • Zhang J. Causal Reasoning with Ancestral Graphs. JMLR, 2008.
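
The DAG side of the fact quoted in A2 can be checked on a small example. The sketch below tests d-separation with the classical ancestral moral graph construction and applies it to a hypothetical DAG with one latent variable `L`. The toy graph, the function name `d_separated`, and the stated MAG ($X \leftrightarrow Y$, $X \rightarrow Z$ over the observed variables) are illustrative assumptions; m-separation itself is not implemented here.

```python
def d_separated(parents, xs, ys, zs):
    """Check whether xs and ys are d-separated by zs in a DAG.

    parents: dict mapping a node to the set of its parents.
    xs, ys, zs: pairwise disjoint sets of nodes.
    """
    # 1. Restrict to the ancestral set of xs | ys | zs.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: link each node to its parents and marry co-parents.
    adj = {node: set() for node in relevant}
    for node in relevant:
        ps = list(parents.get(node, ()))
        for i, p in enumerate(ps):
            adj[node].add(p)
            adj[p].add(node)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Search from xs while never entering zs; reaching ys means connected.
    stack, seen = list(xs), set(xs)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nxt in adj[node]:
            if nxt not in zs and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


# Hypothetical DAG: latent L -> X, L -> Y, and X -> Z; observed set is {X, Y, Z}.
parents = {"X": {"L"}, "Y": {"L"}, "Z": {"X"}}
xy_given_empty = d_separated(parents, {"X"}, {"Y"}, set())
xy_given_z = d_separated(parents, {"X"}, {"Y"}, {"Z"})
zy_given_x = d_separated(parents, {"Z"}, {"Y"}, {"X"})
```

No observed set separates X and Y, so a MAG over the observed variables has to keep them adjacent (here as $X \leftrightarrow Y$), while Z and Y are separated by X in both representations.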

Q3 Regarding Proposition 2:

A3. Sorry for the confusion. We would like to clarify that if $\mathbf{S}$ contains some colliders that open inactive paths, there will exist vertices belonging to $MB(X)$, similar to $A$, that can be added to $\mathbf{S}$ to block these newly activated paths. Following your suggestion, we can define $\mathbf{S}$ as the specific set $Pa^*(V_1) \cup S_{X,V_2}$. Note that when we verify how $\mathbf{S}$ blocks the three types of active paths, any collider variables introduced are also included in $\mathbf{S}$. Here, all $A_i$ belong to $Pa^*(V_1)$ and $Pa^*(V_1) \subseteq MB^+(X)$, which is observable.

More specifically, the process is as follows:

If there exists an active path of the form $V_1 \leftarrow A_1 \dots V_2$, then $A_1 \in MB(X)$ can block that path. If $A_1$ is a collider on a path $p1$: $V_1 \dots * \rightarrow A_1 \leftarrow * \dots V_2$, then, due to the graph being ancestral and $V_1 \leftarrow A_1$, there must exist $V_1 \leftarrow A_2$ on $p1$. Thus $A_2 \in MB(X)$ also blocks $p1$, and if $A_2$ is also a collider on a path, it falls back to the case where $A_1$ is a collider.

If there exists an active path of the form $V_1 \leftrightarrow A_3 \rightarrow \dots V_2$, then $A_3 \in MB(X)$ can block that path. If $A_3$ is a collider on a path $p2$: $V_1 \dots * \rightarrow A_3 \leftarrow * \dots V_2$, then, due to the graph being ancestral and $A_3 \rightarrow \dots V_2$, there must exist an edge $A_3 \leftarrow * A_4$ on $p2$. Thus $A_4 \in MB(X)$ also blocks $p2$, and if $A_4$ is also a collider on a path, it falls back to the case where $A_3$ or $A_1$ is a collider.

If there exists an active path of the form $V_1 \rightarrow \dots V_2$, then $S_{X,V_2} \subseteq MB(X)$ can block that path due to $X * \rightarrow V_1$ and $V_1 \notin S_{X,V_2}$. Thus, the edge $V_1 - V_2$ is true.

We will incorporate the above discussion to make the proof clearer.

Q4 Regarding Proposition 3:

A4. First, if we do not stop, $M_L$ will indeed become an empty graph. We would like to clarify that lines 1055–1061 illustrate the following fact: by suitably removing leaf nodes, we can derive the local subgraph $M_L$ of interest from the global MAG $M$.

Next, the reason two identical marginal distributions imply that continuing the algorithm will not further orient the undirected edges is that newly added leaf nodes do not affect the joint distribution of the existing variables in $M_L$. In other words, intuitively speaking, these leaf nodes do not introduce new directional information (e.g., V-structures) that could help further orient the undirected edges in the local structure.
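
The stopping question raised by the reviewer can be made concrete with a small sketch of leaf removal that halts once every remaining leaf belongs to a set of variables to keep. This is an illustration on a plain directed graph, assuming a leaf means a node with no children; it is not the construction in the paper's proof, and the name `prune_leaves` and the toy graph are hypothetical.

```python
def prune_leaves(children, keep):
    """Iteratively remove childless nodes that are not in `keep`.

    children: dict mapping every node to the set of its children
              (each node must appear as a key, possibly with an empty set).
    keep:     nodes of interest that must never be removed.
    """
    graph = {node: set(cs) for node, cs in children.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            if node not in keep and not graph[node]:
                del graph[node]
                for cs in graph.values():
                    cs.discard(node)  # drop edges pointing at the removed leaf
                changed = True
    return graph


# Hypothetical global graph: A -> B, B -> C, B -> D, C -> E.
children = {"A": {"B"}, "B": {"C", "D"}, "C": {"E"}, "D": set(), "E": set()}
local = prune_leaves(children, keep={"A", "B"})
emptied = prune_leaves(children, keep=set())
```

With a non-empty `keep` set the procedure stops at the local subgraph over the variables of interest. With an empty `keep` set every node is eventually removed, which matches the authors' remark that the graph becomes empty if removal never stops.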

Q5 Does PAG in this paper actually refer to maximally informative PAG?

A5. Yes, PAG refers to the maximally informative PAG.

Review
Rating: 4

The paper addresses the challenge of locally identifying causal relationships between arbitrary pairs of variables in a causal graph, without assuming the absence of hidden confounders. Existing methods typically rely on access to the entire graph or impose strong assumptions about latent variables or on the pair of variables of interest, making them unsuitable for many real-world settings. The paper provides necessary and sufficient conditions for classifying relationships as invariant ancestor, invariant non-ancestor, or possible ancestor based purely on local structure, even when latent confounders are present. It introduces LocICR, a novel local causal discovery algorithm that determines causal relationships between a given pair of variables. The algorithm is proven to be sound and complete. Extensive experiments on benchmark causal networks and real-world datasets demonstrate that LocICR is both effective and computationally efficient.

Questions for Authors

Enumerating all MAGs within a class and determining that X is an invariant ancestor across all equivalent MAGs is equivalent to saying that X is an invariant ancestor in the PAG. No? Am I missing something? If you disagree, can you please give an example? (My intuition is that PAGs represent uncertainty within a class, and since X is an ancestor in all MAGs, there is no uncertainty, so this information should be visible in the PAG, assuming you are using the complete rules of FCI.) That said, I totally understand that in Malinsky and Spirtes, one way to get identification of a causal effect is by enumerating all MAGs and checking whether there is at least one set satisfying the generalized back-door criterion in each MAG. However, finding a set satisfying the back-door criterion is not the same as finding ancestors. The latter is more complicated because the information it requires is not always visible by looking at the PAG. Maybe I am missing something, so please correct me if you think I am wrong. I have scored the paper assuming that I am missing something related to this; I will update my score based on the rebuttal.

Since you are interested in Local causal discovery, is it possible to reduce the faithfulness assumption into a "local" faithfulness?

Claims and Evidence

The authors claim that no existing method locally identifies causal relationships between arbitrary pairs of variables without assuming the absence of hidden confounders.

They provide both necessary and sufficient local characterizations for invariant ancestor, invariant non-ancestor, and possible ancestor relationships.

They introduce LocICR, a novel algorithm designed to locally infer causal relationships between variable pairs, and prove its soundness and completeness.

They demonstrate the effectiveness and efficiency of their approach on experimental data.

Methods and Evaluation Criteria

Yes. Rigorous experimentation has been carried out: four benchmarks, each containing 100 simulated datasets, were used along with two real-world applications. The experimental section is clearly written and provides all the necessary details, and five other methods were used for comparison. These methods are thoroughly described.

Theoretical Claims

I checked some proofs and they seem to be sound.

Experimental Designs and Analyses

The experimental section consists of both a simulation study and a real-world data application. The simulation study is extensive and demonstrates the superior performance of the proposed method in terms of the weighted F1 score, as well as weighted precision and weighted recall. The authors also provide results on two real-world datasets, explaining these favorable outcomes with clear references.

Supplementary Material

Yes, A and B, small part of C, and G.

Relation to Existing Literature

The authors claim to propose the first method that locally identifies causal relationships between arbitrary pairs of variables without assuming the absence of hidden confounders (to the best of my knowledge, this is true). This means the paper makes causal inference more applicable in real-world settings, especially when the true causal graph is unknown and discovering the entire true graph can be complicated for many reasons.

Essential References Not Discussed

I think the most relevant citations are included. I have suggested a few additional citations in the comment section that may also be relevant.

Other Strengths and Weaknesses

The paper presents a compelling and relevant contribution to the field of causal inference.

It is clearly written and tackles an important problem in the domain.

The theoretical framework appears sound, and the experimental evaluation is rigorous, providing strong empirical support for the proposed approach.

By relying solely on local structure, the method remains computationally efficient and scalable, making it well-suited for large graphs where access to global knowledge is impractical.

I did not spot any major weakness other than a few sentences that I found a bit confusing. More on that below.

Other Comments or Suggestions

I think the paper is very clear. But maybe consider introducing the correct definition of Markov Blanket for MAGs in the main paper (instead of deferring it to the Appendix). This can make the paper even more accessible for readers unfamiliar with MAGs.

The authors cite the seminal paper of Chickering when they first mention the partial ancestral graph (PAG). I think this can be confusing, since Chickering introduces a CPDAG and not a PAG (the difference between a PAG and a CPDAG is substantial). At least, along with this citation, cite papers that truly work on PAGs, such as Spirtes and Richardson, 1996; Ali et al., 2004; Zhang and Spirtes, 2005; Zhao et al., 2005.

In this sentence: "In this paper, we address the challenge of locally identifying the causal relationship between a pair of variables without requiring the learning of a full PAG, the enumeration of MAGs, or the assumption of causal sufficiency." You have a redundancy: if you are focusing on PAGs and MAGs, then of course you are not assuming causal sufficiency.

Minor typo in the citation: "Jonas, P., Dominik, J., and Schölkopf, B. Elements of Causal Inference. MIT Press, 2017." You replaced the last names of the first two authors with their first names.

Typo: In the second line of the simulation table, “WP” should be replaced with “WR.”

Why is the performance of the proposed algorithm on MILDEW is better at small sample sizes (in WR and WP)?

Suggestion: Reference Appendix G.4 When Mentioning Benchmark Networks.

Author Response

We sincerely appreciate your constructive and thoughtful feedback, as well as your recognition of the importance, clarity, and empirical rigor of our work in the field of causal inference. We hope the following responses properly address your concerns.

Q1 ``introducing the correct definition of Markov Blanket for MAGs in the main paper''

A1. Following your suggestion, we have moved the definition of the Markov Blanket for MAGs from the Appendix into the main text.

Q2 ``citing the seminal paper of Chickering when they first mention PAG''

A2. Thank you for noting this. It was indeed a citation error, which we have now corrected in the revised version. We have properly cited works specifically addressing PAGs, including [Spirtes and Richardson, 1996], [Ali et al., 2004], [Zhang and Spirtes, 2005], and [Zhao et al., 2005].

Q3 ``... without requiring the learning of a full PAG, the enumeration of MAGs, or the assumption of causal sufficiency.'' Having a redundancy.

A3. Thank you for your thoughts. We have removed this redundancy in the revised version. Our initial aim was to emphasize that the proposed approach does not rely on the assumption of causal sufficiency.

Q4 ``Why is the performance of the proposed algorithm on MILDEW is better at small sample sizes (in WR and WP)?''

A4. Because our algorithm reduces the number of conditional independence (CI) tests through localization, it helps avoid bias introduced by excessive testing. Consequently, even with smaller sample sizes, our method outperforms other global approaches. Notably, existing local methods often perform worse because they require the assumption of causal sufficiency, while latent variables may be present in the system.

Q5 ``Enumerating all MAGs within a class and determining that X is an invariant ancestor across all equivalent MAGs is equivalent to saying that X is an invariant ancestor in the PAG?''

A5. You are correct! Stating that X is an invariant ancestor in all equivalent MAGs is equivalent to saying that X is an invariant ancestor in the PAG.

Q6 ``My intuition is that PAGs represents uncertainty within a class and since in all MAGs X is an ancestor then there is no uncertainty and so this information should be visible in the PAG, assuming you are using the complete rules of FCI. ... However finding a set satisfying the back-door criterion is not the same as finding ancestors. ''

A6. First, we would like to clarify that although $X$ may be an ancestor in all equivalent MAGs, this fact might not always be explicitly visible (i.e., by checking the directed path) in the PAG. This discrepancy arises because PAGs represent uncertainty within a class of MAGs and do not explicitly encode all ancestral relationships. For instance, as illustrated in Fig. 6, there is no directed path from $A$ to $E$ in the PAG shown in Fig. 6(b), yet $A$ is an ancestor of $E$ in all equivalent MAGs (Fig. 6(c)-(g)).

Furthermore, LV-IDA enumerates all (local) MAGs, identifies adjustment sets that satisfy the generalized back-door criterion, and estimates the possible causal effects accordingly. If all such causal effects are non-zero, the method concludes an invariant ancestor (Fang et al., 2022). For instance, in Fig. 6, LV-IDA calculates the causal effect from $A$ to $E$ for every equivalent MAG (Fig. 6(c)-(g)) as non-zero, thereby concluding that $A$ is an invariant ancestor of $E$.

Q7 ``is it possible to reduce the faithfulness assumption into a "local" faithfulness?''

A7. Thank you for your thoughtful consideration. Local learning, such as our approach, can reduce unnecessary CI tests, which may help mitigate minor violations of the faithfulness assumption [Isozaki, 2014]. We will discuss this point in our paper and plan to explore it more thoroughly in future work.

Isozaki, T. A robust causal discovery algorithm against faithfulness violation. Information and Media Technologies, 9(1): 121-131, 2014.

Thank you again for your careful reading. We have corrected the typos in the citations and simulation table in the revised version.

Final Decision

This paper introduces LocICR, a local method for identifying causal relationships between arbitrary pairs of variables in the presence of latent confounders. Unlike prior approaches that rely on learning a global causal structure such as a PAG or MAG, the proposed method operates entirely locally, leveraging new necessary and sufficient conditions for distinguishing invariant ancestor, non-ancestor, and possible ancestor relationships. The authors demonstrate soundness and completeness of their characterizations and provide an efficient algorithm that avoids the computational overhead of global methods.

The paper is well written and makes a strong theoretical and practical contribution. Several reviewers praised the clarity of the local characterizations and the extensive experimental evaluation, which covers both simulated benchmarks and real-world datasets. The authors also responded thoroughly during the rebuttal phase, addressing theoretical concerns and clarifying their distinctions from prior work. In particular, they provided detailed explanations of their proof dependencies, clarified differences with methods like IDA and LV-IDA, and included additional baseline comparisons with more recent global discovery methods (e.g., M3HC, ICD), strengthening the empirical case.

While one reviewer questioned the novelty of some of the technical components due to their relation to Xie et al. (2024), the authors convincingly showed how their generalization to arbitrary variable pairs and local characterizations under latent confounding go beyond previous work. Other concerns about unclear proofs and experimental setup were addressed to the satisfaction of the reviewers, with several increasing their scores post-rebuttal.

Overall, this is a rigorous and impactful paper that advances the state of local causal discovery, particularly in the challenging setting where latent variables are present. It offers a practical and theoretically grounded alternative to global structure learning and is well-suited for applications where local causal queries are more relevant than recovering the full graph. Therefore, I recommend acceptance.