PaperHub
7.8/10
Spotlight, 4 reviewers
Ratings: 4, 5, 3, 4 (min 3, max 5, std. dev. 0.7)
ICML 2025

Counterfactual Graphical Models: Constraints and Inference

Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

Constraints and inference using graphical models for counterfactual reasoning


Keywords
Counterfactual inference, Graphical Models, Constraints

Reviews and Discussion

Official Review
Rating: 4

The paper studies the identification of counterfactual queries. It studies the constraints induced by the causal graph: consistency, exclusion, and independence. The paper proposes a sound and complete method for testing independencies among counterfactual variables based on constructing a simplified graph (AMWN) and then testing d-separations on that graph. These constraints then lead to a set of sound and complete rules, called counterfactual calculus, for identifying counterfactual queries.

Questions for Authors

Could you please provide some intuitions for the (counterfactual) ancestors in Definition 2.4? Maybe include it after the definition.

Claims and Evidence

Yes, proofs for all theorems are included in the supplementary material.

Methods and Evaluation Criteria

Examples were provided in the paper to illustrate how the counterfactual calculus can be applied to identify counterfactual queries.

Theoretical Claims

I reviewed the proofs in Appendix B.1, B.3, B.5, and C.2 in detail and skimmed through others. They look correct and rigorous to me.

Experimental Designs or Analyses

N/A

Supplementary Material

Yes. I reviewed the proofs (Sec B), Discussion and further examples (Sec C), and Frequently asked questions (Sec E).

Relation to Broader Scientific Literature

The paper proposes a sound and complete method for testing independencies for general counterfactual variables, which improves upon the previous methods including [Balke & Pearl, 1994] (not complete), [Shpitser & Pearl 2007] (not efficient), and [Richardson & Robins, 2013] (restricted to a single world).

Moreover, the paper proposes sound and complete rules for identifying counterfactual queries, which complement the previous algorithmic method in [Correa et al., 2021a]. As mentioned in Appendix E, this is an important contribution since the rules (counterfactual calculus) bring the potential to solve problems under more general setups.

Essential References Not Discussed

I'm not aware of any missing references.

Other Strengths and Weaknesses

  • The paper is clearly structured and well-written.
  • The supplementary material contains an extensive discussion of the connection to other related frameworks.
  • The paper also includes examples (end of Sec. 3), which make it easier to see how the rules can be applied.

Other Comments or Suggestions

  • Figure 3 is a bit confusing to me, since the meaning of the two columns and of the edges is not explained.
  • While this was shown in the Appendix, I think it is worth mentioning in the main paper that counterfactual calculus can remove interventions (do operations) whenever they can be removed using do-calculus. Otherwise, it's unclear how it is complete for identification given the observational distributions.

Typos:

  • Page 2: "We base our analysis on the Structural Causal Model (SCM) paradigm (?)Ch. 7]pearl:2k."
  • Page 6, "Algorithm 1: For each edge $V \in \mathbf{V}$": should "edge" read "variable"?
  • Supplementary Definition D.2: "conditions ??", Section E Q1: "(see (?)Sec. 1.3]bar:etal2020 for details"
Author Response

Thank you for reviewing our work, providing feedback, and giving suggestions.

To answer your question about the intuition for counterfactual ancestors (Definition 2.4): they are the counterfactual variables that are causally relevant to the variable in question. This extends the idea that, in graphical terms, a variable can only affect another if the former is an ancestor of the latter. When counterfactuals are involved, some of those ancestors become irrelevant (by virtue of the exclusion operator). For example, in a simple chain graph such as $X \to Z \to Y$, $X$ is an ancestor of $Y$, but it is not a (counterfactual) ancestor of $Y_z$, because under $do(Z)$, $X$ cannot affect $Y_z$. Similarly, $Z$ is an ancestor of $Y$, but for $Y_x$ the counterfactual ancestor is $Z_x$, not $X$ or $Z$. As suggested, we will include a brief discussion of this after the definition; your suggestion is appreciated.
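A minimal Python sketch of the chain example above. It assumes that Definition 2.4 takes the form used in earlier work on nested counterfactuals, $An(Y_x) = \{ W_z : W \in An(Y)_{G_{\underline{X}}},\ z = x \cap An(W)_{G_{\overline{X}}} \}$, where $G_{\underline{X}}$ drops edges out of $X$ and $G_{\overline{X}}$ drops edges into $X$. This form is our reconstruction rather than a quote of the paper, and the helper names (`ancestors`, `drop_edges`, `ctf_ancestors`, `show`) are hypothetical.

```python
def ancestors(parents, node):
    """Return `node` and all of its ancestors in a DAG given as {child: [parents]}."""
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def drop_edges(parents, cut, outgoing):
    """Copy of the DAG without edges out of `cut` (outgoing=True) or into `cut` (outgoing=False)."""
    if outgoing:
        return {v: [p for p in ps if p not in cut] for v, ps in parents.items()}
    return {v: ([] if v in cut else list(ps)) for v, ps in parents.items()}


def ctf_ancestors(parents, y, x):
    """Counterfactual ancestors of Y_x as a set of (variable, frozenset(subscript)) pairs.

    ASSUMED form of Definition 2.4 (see lead-in); not the paper's verbatim definition.
    """
    x = set(x)
    no_out = drop_edges(parents, x, outgoing=True)   # G with edges out of X removed
    no_in = drop_edges(parents, x, outgoing=False)   # G with edges into X removed
    return {
        # Exclusion: W keeps only the interventions that can still reach it.
        (w, frozenset(x & ancestors(no_in, w)))
        for w in ancestors(no_out, y)
    }


def show(anc):
    """Render pairs as sorted strings such as 'Z_x' (subscripts in lower case)."""
    return sorted(w + ("_" + "".join(sorted(z)).lower() if z else "") for w, z in anc)


chain = {"X": [], "Z": ["X"], "Y": ["Z"]}  # X -> Z -> Y
print(show(ctf_ancestors(chain, "Y", {"Z"})))  # ['Y_z']
print(show(ctf_ancestors(chain, "Y", {"X"})))  # ['Y_x', 'Z_x']
```

On the chain, this reproduces the two statements in the response: $X$ does not appear among the counterfactual ancestors of $Y_z$, and $Y_x$ has $Z_x$ (not $X$ or $Z$) as its counterfactual ancestor. The subscript step is where the exclusion operator acts.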

As you point out, we did not mention in the main paper that ctf-calculus subsumes do-calculus. It is suitable for any task where the latter is complete, such as identification from observational distributions. We will add Lemma C.1 (in the supplemental material) “ctf-calculus subsumes do-calculus” and a brief discussion to the main text to make this fact clear to the reader.

We see your point about Figure 3: the meaning of the columns and edges can be confusing amid so many models in which elements have a well-defined meaning. In this figure, the gray boxes represent data-generating mechanisms that transform a specific unit $U = u$ into a counterfactual event over the observable variables. Each rectangle is a copy of the mechanisms of the structural causal model. Depending on the counterfactual of interest, the mechanisms share some functions (e.g., $f_z$ and $f_y$ in (a)), redefine others ($f_x$ and $f_x'$ in (a)), or contain functions that require the evaluation of a separate set of mechanisms (e.g., $f_x'$ in (b)) to compute a nested counterfactual. Following your suggestion, we will provide a better description of the graphical elements of the figure in the manuscript; thank you.
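To make the "copies of mechanisms" reading concrete, here is a toy Python sketch. The binary mechanisms `f_x`, `f_z`, `f_y` and the unit `u` are invented for illustration and are not taken from Figure 3. The point is only the structure: every copy is evaluated on the same unit, an intervention replaces a mechanism by a constant, and a nested counterfactual such as $Y_{x, Z_{x'}}$ needs a separate copy to compute its inner variable $Z_{x'}$.

```python
# HYPOTHETICAL binary mechanisms for X -> Z -> Y and X -> Y (illustration only).
def f_x(u):
    return u["ux"]


def f_z(x, u):
    return x ^ u["uz"]


def f_y(x, z, u):
    return (x & z) | u["uy"]


def evaluate(u, do=None):
    """Run one copy of the mechanisms on unit u; variables in `do` get constants instead of f_V."""
    do = do or {}
    x = do["X"] if "X" in do else f_x(u)
    z = do["Z"] if "Z" in do else f_z(x, u)
    y = do["Y"] if "Y" in do else f_y(x, z, u)
    return {"X": x, "Z": z, "Y": y}


u = {"ux": 0, "uz": 1, "uy": 0}                    # one fixed unit shared by every copy
factual = evaluate(u)                               # copy M
y_x = evaluate(u, {"X": 1})["Y"]                    # copy M_x: f_x replaced, f_z and f_y shared
z_xp = evaluate(u, {"X": 0})["Z"]                   # separate copy M_{x'} for the inner Z_{x'}
y_nested = evaluate(u, {"X": 1, "Z": z_xp})["Y"]    # nested Y_{x, Z_{x'}}
print(factual, y_x, y_nested)  # {'X': 0, 'Z': 1, 'Y': 0} 0 1
```

With this unit, $Y_x = 0$ but $Y_{x, Z_{x'}} = 1$, because the nested query takes $Z$ from the copy with $X = x'$ rather than from the copy with $X = x$.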

Official Review
Rating: 5

The paper is focused on the graphical modelling and the (symbolic) calculus of counterfactual inference within the framework of Pearlian structural causal models. There are two major contributions: (i) a novel graphical representation called Ancestral Multi-World Networks (AMWN), which efficiently encodes the counterfactual independencies implied by causal diagrams (d-separation is complete with respect to AMWNs); (ii) a new set of inference rules called "counterfactual calculus" that extends Pearl's classical do-calculus to counterfactuals (also sound and complete for those queries).

Questions for Authors

Claims and Evidence

All the claims about the construction and soundness of the new graphical structure and the corresponding calculus are formally proved.

Methods and Evaluation Criteria

This is a theoretical paper with no experiments.

Theoretical Claims

I checked the proofs of the main results but not those of the preliminary lemmas. I believe the results are correct.

Experimental Designs or Analyses

This is a theoretical paper with no experiments.

Supplementary Material

I read the supplementary material but did not check the proofs of the lemmas.

Relation to Broader Scientific Literature

The relation with the broader scientific literature is very clear.

Essential References Not Discussed

I think all the relevant references are properly cited.

Other Strengths and Weaknesses

This is a significant and influential paper for counterfactual inference. The work fills a gap in the earlier literature, and I might imagine lots of applications based on the calculus presented here. Of course, some results are pretty technical, but this is expected and the authors did an excellent job in giving additional information and insights in the supplementary material.

Other Comments or Suggestions

The sentence about "transforming nested ctf into non-nested one" might be misleading, as the transformation involves a sum. There are a few typos in the references (e.g., "(?)Ch. 7]pearl:2k"). Moreover, in Section 2.3, the test query is conditioned on $Z$ under the intervention $X = x'$, not on $Z'$ under the intervention $X = x$. Some of the material in Section 2 is not entirely novel, and the authors should make this explicit in the revised version. Has the lack of completeness of twin networks been explicitly stated in the original paper of Balke & Pearl?

As I understand it, the idea of computing a CTF query in the twin network after the surgery might therefore lead to wrong conclusions. If so, it would be nice to emphasize this point in the paper.

Ethical Review Concerns

Author Response

Thank you for reading our work, pointing out typos, and providing suggestions, which we will incorporate into the manuscript. In particular, we will clarify in the paper that the unnesting corresponds to a transformation that starts with a nested counterfactual and ends with an expression involving a counterfactual with one less level of nesting.
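As an illustration of this kind of unnesting (our own example, assuming a mediator $Z$ between $X$ and $Y$, not an expression quoted from the paper), a nested counterfactual can be rewritten as a sum of joint probabilities over non-nested counterfactuals, which is where the sum mentioned by the reviewer comes from:

```latex
% Assumed setting: Z lies on a path from X to Y; x and x' are two values of X.
P\big(Y_{x,\, Z_{x'}} = y\big) \;=\; \sum_{z} P\big(Y_{x z} = y,\; Z_{x'} = z\big)
```

The event on the left holds exactly when $Z_{x'} = z$ and $Y_{xz} = y$ for some value $z$, and these events are disjoint across $z$. Neither $Y_{xz}$ nor $Z_{x'}$ is nested, so the right-hand side has one less level of nesting.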

As far as we can tell, the original paper describing the Twin Network method by Balke & Pearl does not claim the method’s completeness for using d-separation to assess counterfactual independencies. However, in Shpitser & Pearl (2007), the authors discuss the incompleteness of the method to motivate multi-networks.

As you point out, “the idea of computing a CTF query in the twin network after the surgery might therefore lead to wrong conclusions”. For instance, in the example in Sec. 2.3, related to Figure 5(a,b), some distinct nodes in the Twin Network may refer to counterfactual variables that are deterministically the same. In the example, conditioning on $Z_{x'}$ is the same as conditioning on $Z$. Because the Twin Network does not capture this constraint graphically, d-separation does not capture such independencies. We have also provided further discussion on this in Section C of the supplemental material but will clarify this in the main text as well. Thank you!

Official Review
Rating: 3

The paper presents a novel framework for counterfactual reasoning using graphical models. It introduces two key contributions: Ancestral Multi-World Networks (AMWN), a new graphical representation for counterfactuals, and Counterfactual Calculus (ctf-calculus), a set of rules for transforming counterfactual expressions. The work extends Pearl's do-calculus, allowing for more efficient and general counterfactual reasoning.

AMWN is a graphical model that encodes counterfactual independence relationships. It builds upon Structural Causal Models (SCMs) and replaces traditional Twin Networks, which are computationally expensive. The paper provides an algorithm to construct AMWNs, ensuring they are both sound (correct) and complete (able to capture all relevant relationships). The model enables efficient testing of counterfactual independence using d-separation.

Counterfactual Calculus (ctf-calculus) is a set of three transformation rules that generalize do-calculus to counterfactual settings. First, the consistency rule relates observations and interventions. Second, the independence rule uses d-separation to infer counterfactual independence. Third, the exclusion rule eliminates interventions that do not affect a variable. These rules allow counterfactual expressions (e.g., $P(Y_x \mid X = x')$) to be rewritten in terms of observable or interventional probabilities.
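A short worked derivation may help show how such rules act on $P(Y_x \mid X = x')$. It assumes a graph $Z \to X$, $Z \to Y$, $X \to Y$ with $Z$ observed, no unobserved confounding, and positivity. The rules are applied informally here; their precise graphical conditions are those of the paper's Theorem 3.1, which are not reproduced.

```latex
% Assumed graph: Z -> X, Z -> Y, X -> Y; no unobserved confounders; positivity.
\begin{aligned}
P(Y_x = y \mid X = x')
  &= \sum_z P(Y_x = y \mid Z = z, X = x')\, P(Z = z \mid X = x') \\
  &= \sum_z P(Y_x = y \mid Z = z, X = x)\, P(Z = z \mid X = x')
     && \text{(independence: } Y_x \perp X \mid Z) \\
  &= \sum_z P(Y = y \mid Z = z, X = x)\, P(Z = z \mid X = x')
     && \text{(consistency)}
\end{aligned}
```

The last line contains only observational probabilities, so under these assumptions the counterfactual query is identified.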

The approach is computationally efficient, reducing the complexity of counterfactual queries compared to existing methods.

Questions for Authors

How does your proposed idea compare to the works listed under "Essential References Not Discussed" in this review (Ma et al., 2022; Vo et al., 2023)?

Claims and Evidence

The paper presents several claims that can be summarized in two areas. First, it claims that AMWN is a sound and complete method for counterfactual independence reasoning and improves on existing graphical models (e.g., Twin Networks, Single World Intervention Graphs). These assertions are supported by definitions of consistency and constraints, nested counterfactuals (and their unnesting), exclusion operators, and a hierarchy of counterfactual relations, which are used for the key theorem on counterfactual d-separation (independence). Second, the paper presents ctf-calculus as a generalization of Pearl's do-calculus (Theorem 3.1), resulting from counterfactual d-separation.

Methods and Evaluation Criteria

The proposed methods are theoretical. They introduce AMWN as a new representation of counterfactual graphical models. The paper defines ctf-calculus as a set of inference rules using d-separation in the AMWN framework to determine counterfactual independence. Since the work is theoretical, the evaluation criteria consist of the proofs. This includes a computational complexity comparison with prior methods (Twin Networks, SWIGs, and Multi-Networks).

Theoretical Claims

The key claims are summarized in three theorems, among other lemmas. Theorem 2.5 (Counterfactual d-Separation) proves that d-separation in AMWN is both sound and complete. Theorem 3.1 (Counterfactual Calculus Rules) formally establishes the transformation rules for counterfactual expressions. Theorem 3.2 (Soundness and Completeness of ctf-calculus for Counterfactual Identifiability) shows that ctf-calculus can fully characterize counterfactual identification.

Experimental Designs or Analyses

The paper does not include empirical experiments or simulations.

Supplementary Material

No

Relation to Broader Scientific Literature

The paper extends prior work on Pearl's do-calculus and Structural Causal Models (SCMs). It improves on existing methods: Balke & Pearl (1994), where AMWN overcomes the incompleteness of d-separation in Twin Networks; Richardson & Robins (2013), where AMWN handles multiple interventions simultaneously; and Shpitser & Pearl (2007), where AMWN avoids the exponential graph explosion problem. The literature review is strong, but an empirical comparison with recent causal inference models (e.g., deep learning-based causal models) could be useful.

POST-REBUTTAL

Thank you to the authors for their answer. Their comments can be helpful in improving the paper. I will maintain my positive score for this paper.

Essential References Not Discussed

I suggest discussing the counterfactual d-separation concept in the context of other graphical criteria for independence; see, for instance, Ma et al. (2022) and Vo et al. (2023).

Ma, Jing, et al. "CLEAR: Generative counterfactual explanations on graphs." Advances in Neural Information Processing Systems 35 (2022): 25895-25907.

Vo, Vy, et al. "Feature-based learning for diverse and privacy-preserving counterfactual explanations." Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.

Other Strengths and Weaknesses

Two observations on this work. First, the paper provides a general framework for counterfactual reasoning, unifying different causal inference tools. Second, the method is computationally efficient, as AMWN provides a polynomial-time method for counterfactual independence testing.

Other Comments or Suggestions

Just a suggestion: rename the "Any sep." heading in Table 1 to something more informative, or alternatively explain it in the caption.

Author Response

Thank you for reading our work, providing feedback, and asking questions.

We address the research mentioned in the review below, and thank you for sharing the references. We describe the work as we understand it, but we would be happy to hear more about it from the reviewer. Regarding Ma et al. (2022), the paper makes assumptions about the prior over the latent variables and the availability of an auxiliary variable. Based on these, they aim to identify the SCM that generates the data, and they integrate the information on the SCM into the optimization process of a generative model to produce explanations. Compared to our approach, there are similarities in modeling the data-generating process as an SCM and then inferring counterfactual quantities based on the assumptions made about the SCM. On the other hand, our framework makes no assumptions about the distribution of the latent variables or the functional form of the SCM's mechanisms, but assumes a known causal graph. This setting is called non-parametric in the literature, and it motivated Pearl's 1995 Biometrika paper, which introduced the do-calculus. Moreover, we do not attempt to identify the SCM as a whole, but rather particular (counterfactual) queries of interest in the form of independence constraints or probabilities. These correspond to the two main contributions of our paper: graphical criteria and a calculus for counterfactual identification.

Vo et al. (2023) seems to focus on producing examples that counter the original outcome produced by a model (e.g., a classifier), and the validity of a counterfactual appears to be measured by whether the generated example overturns that outcome. We believe our work is not directly comparable, since this work does not seem to address the causal structure of the underlying data-generating process. That said, we would welcome more details on the reviewer's suggestion, in case it implies a subtle connection we may have missed in our cursory reading.

Reviewer Comment

Thank you for the answer. I will maintain my score.

Official Review
Rating: 4

The paper introduces an efficient graphical construction called Ancestral Multi-world Network, which is sound and complete for interpreting counterfactual independencies from a causal diagram through d-separation. Furthermore, the authors propose the counterfactual (ctf-) calculus, which provides three transformation rules for deriving counterfactual quantities based on the constraints encoded within the diagram.

Questions for Authors

Could you provide a simple example illustrating a scenario where your method succeeds but the approaches listed in Table 1 fail—for instance, demonstrating that the Twin network algorithm is not complete, while yours is?

Claims and Evidence

Yes, the claims presented in the paper are supported by clear and convincing evidence.

Methods and Evaluation Criteria

Yes

Theoretical Claims

I have checked most of the proofs. However, for Lemma 2.3, Definition 2.4, and Theorem 3.1, providing a causal diagram would greatly enhance clarity and ease of understanding.

Experimental Designs or Analyses

There are no experimental designs or analyses.

Supplementary Material

Yes, parts A and B in the supplementary material have been reviewed.

Relation to Broader Scientific Literature

The paper generalizes Pearl's celebrated do-calculus from interventional to counterfactual reasoning. The proposed AMWN improves upon existing frameworks, including Twin Networks, Single World Intervention Graphs, and Multi-Networks. Additionally, the three rules introduced for ctf-calculus are more general than Pearl’s do-calculus and the Potential Outcome Calculus (po-calculus).

Essential References Not Discussed

Although the paper mentions the k-plet Network, it appears that no relevant references are provided.

Other Strengths and Weaknesses

Strengths:

  • Compared to Twin Networks and k-plet Networks, AMWN reduces complexity by requiring fewer variables to represent counterfactual scenarios.
  • Extends Pearl's do-calculus effectively to counterfactual reasoning.

Weaknesses:

Many definitions in the paper overly emphasize detailed explanations of variables. Although the paper compares AMWN to Twin Networks and SWIG, detailed experimental results quantifying performance improvements are not provided.

Other Comments or Suggestions

In definitions and lemmas, clearly explaining each variable would be helpful; additionally, providing accompanying causal diagrams would significantly enhance readability and understanding.

Author Response

Thank you for reviewing our paper.

To address your question about a scenario where our method succeeds but the other approaches mentioned in Table 1 fail, consider whether the causal graph in Figure 4(b) implies $(Y_{xw}, W_{x'} \perp X \mid Z_{x'})$. Figure 5(a) shows a 3-plet (triplet) network (a natural generalization of the twin network to three worlds) for this graph and question. The variables in the query involve three submodels, $\mathcal{M}$, $\mathcal{M}_x$, and $\mathcal{M}_{xw}$, all depicted in the network and sharing explicit unobservable variables. It seems that $X$ is d-connected to $Y_{xw}$ given $Z_{x'}$, because there is an active path $X \gets Z \gets U_z \to Z_{xw} \to Y_{xw}$. However, as discussed in the manuscript (around line 262), due to exclusion restrictions, conditioning on $Z_{x'}$ is equivalent to conditioning on $Z$, meaning the corresponding separation statement holds. This implies that the Twin Network alone leads us to infer a wrong conclusion. Although this example involves three worlds (to take advantage of the figure in the paper), the same argument could be made with only two of them.
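The two-world case can be checked mechanically. The Python sketch below uses a hypothetical graph of our own, not the paper's Figure 4(b): $Z \to X \to Y$ with a latent $U_{zy}$ confounding $Z$ and $Y$, and the query $X \perp Y_x \mid Z_x$. It tests d-separation with the standard moralized-ancestral-graph criterion, first in the twin network and then after merging $Z_x$ into $Z$ (valid here because $Z$ is not a descendant of $X$, so $Z_x = Z$). The merge only mimics the effect of the exclusion operator; it is not the paper's AMWN construction, and all function and node names are illustrative.

```python
from collections import deque
from itertools import combinations


def ancestral_set(parents, nodes):
    """All nodes in `nodes` plus their ancestors, for a DAG given as {child: [parents]}."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """Test xs _||_ ys | zs with the moralized ancestral graph criterion."""
    zs = set(zs)
    keep = ancestral_set(parents, set(xs) | set(ys) | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        pas = list(parents.get(child, ()))
        for p in pas:  # drop edge directions
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pas, 2):  # "marry" parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    # Undirected reachability from xs that never enters a conditioned node.
    frontier = deque(v for v in xs if v not in zs)
    reached = set(frontier)
    while frontier:
        v = frontier.popleft()
        for w in adj[v] - reached - zs:
            reached.add(w)
            frontier.append(w)
    return reached.isdisjoint(ys)


# Twin network: factual copy (Z, X, Y) and the copy under do(X=x)
# (Z_x, Y_x, with X replaced by the constant node "x"), sharing U_zy.
TWIN = {
    "U_zy": [], "U_x": [], "x": [],
    "Z": ["U_zy"], "X": ["Z", "U_x"], "Y": ["X", "U_zy"],
    "Z_x": ["U_zy"], "Y_x": ["x", "U_zy"],
}
# Z is not a descendant of X, so Z_x = Z: merge the two nodes by dropping Z_x.
MERGED = {v: ps for v, ps in TWIN.items() if v != "Z_x"}

print(d_separated(TWIN, {"X"}, {"Y_x"}, {"Z_x"}))  # False: twin network sees a path
print(d_separated(MERGED, {"X"}, {"Y_x"}, {"Z"}))  # True: independence recovered
```

The twin network reports the active path $X \gets Z \gets U_{zy} \to Y_x$, because $Z_x$ is a different node from $Z$; once the two are identified, conditioning blocks that path and the independence is recovered.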

For Single World Intervention Graphs (SWIGs), conditional independence among variables present in the SWIG can be read using d-separation, while the representation itself cannot capture cross-world restrictions on the counterfactual joint distribution. For instance, the separation of $X$ and $Y_x$ given $Z$ cannot be judged using the SWIG for Figure 4(a) and intervention $X = x$, because $Z$ does not appear in the resulting graph. These examples and some discussion of Shpitser & Pearl (2007) can be found in the supplemental material, Section C.

The k-plet Network is a concept we use in the paper to refer to the extension of the Twin Network method to k worlds (2-plet network equals Twin Network). In the paper, we imply that when combined with the exclusion operator, the k-plet network method is complete, but it can further be optimized, leading to the discussion of AMWNs. We added proper discussion and clarification about this point in the paper. Thank you! Also, as you pointed out, several definitions in the paper emphasize explanations about the notation and counterfactual variables. We will try to make those definitions more concise, maintaining the essence of the definition while moving additional explanations elsewhere.

Reviewer Comment

The authors have addressed my concerns. Hence, I will maintain my current score and lean toward accepting the paper.

Final Decision

This paper introduces AMWNs, a novel graphical framework for counterfactual reasoning that is both sound and complete for d-separation-based counterfactual independence. Additionally, it proposes a counterfactual calculus that generalizes Pearl’s do-calculus to handle counterfactual queries, filling a longstanding gap in the causal inference literature.

All four reviewers gave positive assessments, highlighting the paper's theoretical depth, rigorous proofs, and its significance in advancing counterfactual inference. The proposed framework addresses limitations in Twin Networks and SWIGs by avoiding graph blow-up and supporting general counterfactual transformations. The rebuttal addressed the main concerns raised by reviewers.

While the paper does not include experiments—understandable for a theoretical contribution—some reviewers noted that clearer visualizations (e.g., Figure 3), improved explanations of certain concepts (e.g., nested ctf reduction), and minor editorial refinements could enhance accessibility. Additionally, a brief discussion comparing AMWN's d-separation with other graphical independence criteria could strengthen the broader contextual framing.

Overall, the paper presents a significant contribution to the theory of causal inference. I recommend acceptance.