8.2

/10

Poster4 位审稿人

最低4最高6标准差0.7

3.5

置信度

创新性3.5

质量3.8

清晰度3.3

重要性3.3

NeurIPS 2025

Relaxing partition admissibility in Cluster-DAGs: a causal calculus with arbitrary variable clustering

Clément Yvernes,Emilie Devijver,Adèle H. Ribeiro,Marianne Clausel,Eric Gaussier

OpenReview PDF

提交: 2025-05-11更新: 2025-10-29

TL;DR

We extend C-DAGs to support arbitrary (possibly cyclic) clusterings and introduce a sound, atomically complete calculus for cluster-level causal reasoning.

摘要

关键词

CausalityCausal InferenceCluster-DAGsDo-Calculus for Higher-Level abstractionsIdentification

评审与讨论

审稿意见

评分: 4置信度: 32025-06-21

The paper addresses causal effect identification problem in abstract causal graphs, in which each macro-level node is a cluster of the micro-level variables of some underlying causal mixed DAG. The authors extend the existing results in Anand et al. (2023) to allow cycles in the abstract graphs with a reformulation of d-separation and do-calculus rules that are provable sound and complete.

优缺点分析

Strengths:

The paper is well-written and easy to follow, with clear illustrative examples. The authors present solid theoretical results that address the problem at hand. The authors should it is sufficient to search for a compatible graph of size-3 clusters to efficiently apply do-calculus in practice. The method does not require knowledge of the micro-level causal structure but instead assumes known structure at the macro-level. Moreover, the proposed rules are agnostic to specific clustering types.

Weaknesses:

The paper supports causal reasoning up to L2-inference, whereas Anand et al. (2023) additionally support counterfactual inference.
A major limitation is the absence of numerical experiments to validate the theoretical findings. It would significantly strengthen the paper if the authors included a numerical demonstration showing that, given a valid clustering, one can derive the same conclusions of interventional inference from the abstract DAGs as those drawn from the micro-level DAGs.
Additionally, it would be beneficial for the authors to further characterize the algorithm’s computational complexity.

问题

Although I appreciate the theoretical contributions, I remain uncertain about their practical implications. While I am fully aware of the scope of the work, my questions relate primarily to causal discovery, which may help clarify the potential application of the theory.

The results suggest that if the d-separation rules do not hold in a C-DAG, there exists a compatible micro-level graph where they do not hold. What is the implication of this? Does it mean that the current clustering is invalid, and we should instead seek one where do-calculus consistently holds across all compatible graphs?
The paper assumes known abstract structure. In practice, in what scenarios is the macro-level structure known while the micro-level structure remains unknown?
Given a clustering of macro-level variables, there may be multiple compatible micro-level DAGs. Does the true underlying DAG among micro-level variables always belong to this set of compatible graphs?

局限性

Yes.

最终评判理由

It is clear the paper provides a solid theoretical work. I retain my rating where the borderline evaluation pertains to the lack of numerical experiments as empirical evidence that the theory can be reliably applied in practice.

格式问题

N/A

作者回复

2025-07-29

We thank you for your valuable feedback which we will consider for the final version of our paper, if accepted, especially regarding numerical experiments and computational complexity.

Answers to your questions:

Q1: When the clustering fixed, the implication is that the available causal information is insufficient to guarantee d‑separation (and identifiability) across all compatible micro‑graphs. In such cases, our do‑calculus procedure will construct a a micro‑level graph in which the targeted d‑separation condition fails and which can help users understand what causal information is missing. Furthermore, if one is free to choose a different clustering, it may be worthwhile to search for one under which d‑separation holds uniformly across compatible graphs. That said, if d‑separation is genuinely violated in the true underlying micro‑graph, no choice of clustering can restore it.
Q2: There are several real‑world scenarios in which the macro‑level C‑DAG is known but the micro‑level structure remains unknown, as in macroeconomics where sector‑level relationships are known (as consumption→investment) but not the firm‑to‑firm or household‑to‑household causal links, or in neuroscience where functional MRI region interactions can be established but not necessarily the causal links between neurons.
Q3: By Definition 1 (see also lines 119-121), a cluster DAG is assumed to be obtained from an ADMG which corresponds to the true underlying DAG. By construction, this DAG belongs to the set of compatible graphs.

2025-08-01

Thank you for the responses. These have addressed my questions. I retain my rating mainly due to the lack of numerical experiments.

审稿意见

评分: 5置信度: 32025-07-01

In a lot of complex real-world scenarios, access to a fine-grained causal graph is difficult, and often we have to deal with a coarser representation of these graphs. This paper studies one such abstraction of fine-grained causal graph called Cluster-DAG (CDAG) and extends the limitation of previous work, which needed an acyclicity constraint on the C-DAG. They also extend the notion of d-separation and do-calculus for the CDAG.

优缺点分析

Strengths

The paper is well written, and the presentation is clear with supporting examples throughout to give intuition.
The authors provide atomically complete do-calculus for the C-DAG (Theorem 2), which can then be used to answer both Level 1 (observational) and Level 2 (interventional) queries in Pearl’s hierarchy.
They provide a compact representation of the search space for the microlevel DAGs compatible with the C-DAG in the form of an unfolded graph (Definition 6) instead of enumerating over the whole search space. Their definition of a compatible graph (Definition 5) is also constructive, which helps create instances to test for a particular d-separation (Proposition 3).

Weaknesses

(Major) This work assumes that the cardinality of the cluster variable is known, which is then used to create the compatible or unfolded graph. There is a discussion to limit the number of variables to 3 in Theorem 4, showing that this preserves all the relevant dependencies. But, I am not sure if the result will hold if the cardinality is misspecified, which is very likely in the real-world scenarios where we only have access to the coarse C-DAG.
(Minor) This work assumes that when an intervention is performed on a cluster node all the micro-level nodes are also intervened on (the author also mentions this in Line 355). This assumption might be difficult to achieve in practice, where an intervention on such a large number of nodes simultaneously might be infeasible.

问题

In Definition 5 (Line 193), when adding the arrows for V^{C} -> W^{C}, why do we only add one arrow between these two clusters of variables? Why not add as many as possible so that the compatible graph remains acyclic? I understand that this might lead to a question of which edges to choose, and we are adding all those edges anyway in the unfolded graph. Please clarify / give intuition for this.
Theorem 4 is interesting since it shows that using a C-DAG with cluster size at most 3 leads to the same do-calculus results. Please add some intuition on why a cluster size of 3 was necessary and what information is theoretically lost (if any) when we reduce the size of the cluster up to 3.

局限性

Yes

最终评判理由

I retain my very positive rating of 5. I'm not further increasing it to 6, since it isn't clear what happens when the cardinality of the cluster variable is mis-specified, which is very likely in real-world scenarios.

格式问题

None

作者回复

2025-07-29

We thank you for your valuable feedback which we will discuss in the final version of our paper, if accepted, especially regarding clusters of unknown or misspecified cardinality.

Answers to your questions:

Q1: The canonical graph is deliberately constructed to impose the minimal number of topological constraints. For instance, suppose a compatible graph contains an edge $V_{2} \rightarrow W_{w}$ . Acyclicity then forbids $W_{w}$ from being an ancestor of $V_{2}$ in $G^{m}$ , and, by our indexing convention, it also forbids $W_{w}$ from being an ancestor of $V_{1}$ . Instead, by including only the edge $V_{1} \rightarrow W_{w}$ , we eliminate the second constraint.
Q2: First of all, example 4 shows that Theorem 4 may fail if we restrict cluster size to 2. In particular, for rule R1 with $W^{m} = \emptyset$ . When we reduce the size of the cluster up to 3, the resulting C‑DAG no longer corresponds exactly to the original micro‑system. However, from the standpoint of causal reasoning (i.e., the application of do‑calculus rules), no information is lost: the reduced C‑DAG and the original C‑DAG yield the same identification conclusions.

2025-08-01

Thank you for the responses. I retain my rating, since it isn't clear what happens when the cardinality of the cluster variable is mis-specified, which is very likely in real-world scenarios.

2025-08-03

We would like to clarify that the typical use case for C-DAGs assumes access to individual observed variables (i.e., micro-variables), with clusters constructed by grouping these based on interpretability needs and domain knowledge. In such settings, causal and confounding relationships between clusters are explicitly modeled, while dependencies within clusters are left unspecified. Crucially, the specification of a C-DAG does not require the inclusion or modeling of any unobserved variables within a cluster. Unobserved variables are only explicitly modeled when they act as latent confounders between clusters, in which case bi-directed dashed edges are introduced. This means that the cardinality of each cluster corresponds to the number of observed variables it contains, a quantity that is generally known. Therefore, the assumption of known cluster cardinality reflects realistic scenarios and does not compromise the validity of our theoretical results.

Nonetheless, if the cardinality of a cluster is overestimated, then the calculus remains sound, though not necessarily complete. Conversely, if the cardinality is underestimated, the calculus is complete, but soundness is no longer guaranteed. In cases where the true cluster size is unknown, Theorem 4 ensures that assuming a cardinality of 3 yields a sound calculus.

审稿意见

评分: 6置信度: 42025-07-02

This paper extends cluster DAGs (Anand2023) by providing a do-calculus in the presence of cycles, which one would expect to be present between clusters in many applications. The calculus is based on the assumption of an underlying acyclic graph, and the interventions considered act on all members of a cluster. On the basis of a novel d-separation criterion, they show that it is possible to determine whether a causal query would fail in any underlying graph that is compatible with a given cluster DAG in a sound and atomically complete fashion. Moreover, they find that that the result does not change if all clusters of size greater than 3 are reduced to size 3, which greatly improves the computational efficiency of the calculus.

优缺点分析

Strengths

Their results provide a do-calculus for what's arguably the most practically relevant setting: cyclic clusters.
Exceptionally clear, despite the great volume of theoretical results.
Strong result on how to improve the computational efficiency of applying their calculus.

Weaknesses

Some notational issues (see "Questions").
Rather short discussion of limitations and implications of their results.

This is the best paper I have read in a while.

问题

Main points:

line 117: what do you do if the topological ordering is not unique? (also, what is $V$ without index here?)
Definition 4: should it have a name? $\sigma$ is a tuple of sets but treated as a single set in some observations; please clarify. Line 163: should it say "ancestors of or in"?
Theorem 2: In R2, shouldn't it be $\bar{X}$ ? In R3, why the brackets in the union?
line 257: What is $X^m(Z^m)$ ?
Suggestion: add main-text pointers to corresponding proofs in appendix
Suggestion: rename structure of interest to something less generic

Details:

line 28: representations don't have to be learnt
line 80: $Z$ not yet defined
Line 166: what do you mean by "under"?
Line 223: what do you mean by "generally"?
Footnote 1: what is the $\sim$ relation?

局限性

A more extensive discussion on limitations and implications of your work would be welcome.

最终评判理由

This is an excellent paper of high technical quality and remarkable clarity.

格式问题

None.

作者回复

2025-07-29

We thank you for your valuable feedback — in particular, we are proud and pleased that you found this to be “the best paper [you] have read in a while.” We agree with your remarks and provide below clarifications to your questions and comments. In the final version, we will develop the discussion of limitations and implications of our results in the conclusion, by discussing the points raised by the reviewers.

Answers to your questions:

Regarding the topological ordering: Thank you for pointing this out. You are correct that the topological ordering of a DAG is not unique in general. In our context, it suffices to fix an arbitrary topological order of $G^m$ . The only property we rely on is that if $i < j$ , then $V_i$ is not a descendant of $V_j$ in $G^m$ . We will clarify this point explicitly in the final version of the paper.
Regarding Definition 4: we thank the reviewer for this helpful suggestion. We agree that it would be clearer to assign it a name to Definition 4, and we propose 'connecting structure of interest'. For the second point of \sigma being a tuple of sets treated as a single set, we will proofread carefully for the final version.
Regarding the notion of ancestors: we assume that every node is considered as an ancestor of itself, so the term “ancestors” already subsumes “ancestors of or in,”. We will clarify this convention in the final version of the paper.
Regarding Rule 2 and Rule 3: we have carefully checked Rule 2 and believe it is correct as stated. In this step, we apply do-calculus to remove the do-operator on $X^m$ , which results in the mutilation \underline{X^m}. As for Rule 3, you are right, brackets are not necessary since union is an associative operator.
Regarding X^m(Z^m), we defined it in line 251: $\X^m(\Z^m) = X^m \setminus \Anc(\Z^m,\Gm_{\overline{\W^m}})$
Regarding the CRL reference: you are right representations don't have to be learnt. We will improve this paragraph in the final version.
Regarding $\mathcal{Z}$ : thanks for catching this up, we will introduce $\mathcal{Z}$ .
Regarding L166: by "under," we mean "under the conditioning of." We will revise the sentence to clarify this and avoid ambiguity.
Regarding L223:in this context, "generally" is meant to convey that one cannot expect the two operations to commute in general. That is, while the inclusion between the two sets always holds, equality does not. We have updated the sentence to clarify this point. Revised sentence: “It is important to note that, in general, mutilating all graphs compatible with a C-DAG yields a strictly smaller set of graphs than those compatible with the mutilated C-DAG, as illustrated in Example 2”
Regarding Footnote 1: we are defining an equivalence relation in this footnote, denote by tilde. This will be clarified.

评论- Response to Rebuttal

2025-08-01

I thank the authors for thoroughly addressing my questions. I am confident that their proposed changes will clear up the few remaining clarity points in what is already an excellent paper.

审稿意见

评分: 5置信度: 42025-07-03

This paper generalizes Anand et al.'s (2023) result on do-calculus for Cluster-DAGs (C-DAGs), a kind of graphical structure derived from partitioning the vertex set of an underlying acyclic directed mixed graph and adding appropriate types of edge between cells of the partition according to the edges in the underlying graph. While Anand et al. (2023) only considers C-DAGs that do not contain any directed cycle and so effectively rules out certain partitions of vertices, the current work allows C-DAGs that contains cycles and as a result takes all partitions as admissible. In addition to presenting a do-calculus for the less constrained C-DAGs and demonstrating its soundness and atomic completeness, the paper also establishes that for the purpose of checking the applicability of a do-calculus rule to a given C-DAG, it suffices to check it in a simplified C-DAG in which all the clusters of size greater than three (i.e., contains more than three micro-vertices) are treated as if they contain only three micro-vertices.

优缺点分析

Strengths:

Despite being a highly theoretical paper with dense details on graphical definitions and manipulations, it is clearly written and fairly readable.
The central results are elegant.
Although it builds on Anand et al.'s (2023), there is at least one non-trivial innovation (the introduction of the canonical compatible graph), and one interesting finding on the possibility of reducing the computational cost, so the improvement is not merely incremental.

Weaknesses:

No real examples are given to illustrate the utility of the theoretical constructs. As a result, it is hard to estimate how much impact it will make.
The authors often use the terms "micro-variables" and "macro-variables" to describe the contrast between variables represented by the vertices in the underlying graph and those represented by clusters of vertices. However, it is not entirely clear to me that this terminology is appropriate. If the state space of a cluster-variable is merely the Cartesian product of the state spaces of its constituent variables (i.e., there is no coarse-graining of the state space), then referring to it as a macro-variable seems misleading, at least from my perspective.

问题

Minor questions:

I have not tracked recent usage of the term "root" in graph theory, but what are called "roots" in this paper seem to me more appropriately called "leaves".
On p. 3, line 6 from the bottom, what does "the topological order" mean? Topological ordering isn't unique in general, is it?
The informal remark about a "structure of interest" reminds me of the notion of a "d-connecting walk", which is also a kind of combination of a d-connecting path and paths from colliders to conditioning vertices. Is there any connection worth noting here between the two notions?
On p. 5, last line, why is the qualifier "V_v != W_w" needed?
In the statement of Theorem 2, the notion of "density compatible with [a] compatible graph" is not explained previously. Moreover, I have a vague impression that Pearl's do-calculus requires positive density. Do I remember it wrong?

局限性

Yes.

最终评判理由

My evaluation remains positive after taking into account other reviews and feedback.

格式问题

None.

作者回复

2025-07-29

We thank you for your valuable feedback. We agree with your remarks and provide below clarifications to your questions and comments. To better motivate our result, we will include an illustrative example in the final version. Regarding the terminology around micro- and macro-variables, we agree it may be confusing and will revise the paper to consistently refer to variables and clusters of variables instead.

Answers to your questions:

Q1: you are absolutely right that, in graph theory, nodes without descendants are typically referred to as “leaves”. We have adopted here the terminology commonly used in causal inference, where such nodes are often called “roots” (see "Identification of Conditional Interventional Distributions", Ilya Shpitser and Judea Pearl, UAI 2006). We chose to align with this convention to remain consistent with the causal framework.
Q2: Thank you for pointing this out. You are correct that the topological ordering of a DAG is not unique in general. In our context, it suffices to fix an arbitrary topological order of $G^m$ . The only property we rely on is that if $i < j$ , then $V_i$ is not a descendant of $V_j$ in $G^m$ . We will clarify this point explicitly in the final version of the paper.
Q3: We are not aware of a formal notion of a d-connecting walk. If this refers to a walk that satisfies the usual d-connection criterion but allows repeated vertices, then it differs from our structure of interest, which explicitly accounts for additional paths from colliders to conditioning variables. If the reviewer had a different definition in mind or could point us to a relevant reference, we would be very interested in learning more.
Q4: It is needed because the canonical graph is intended to be a compatible graph, and therefore must be a valid ADMG. Without this qualifier, the construction could introduce a self-loop (e.g., on $V_1$ ), which is not permitted in an ADMG.
Q5: You are right, this notion was not clearly defined before Theorem 2. What we meant is a density that admits a Markovian factorization over the observed nodes with respect to the C-DAG. You are also correct that the positivity assumption is needed, similarly to the standard assumptions in Pearl’s framework. We will revise the paper to clarify both points in the final version.

评论- Minor follow-up

2025-08-07

Thank you for the helpful response. By "d-connecting walk", yes, I meant the notion used by Jan Koster and several others, which allows repeated vertices. I mentioned it because a d-connecting walk effectively combines a d-connecting path and paths from colliders to conditioning variables, and for this reason is similar to the structure defined in this paper. I think the similarity is worth acknowledging.

And regarding the qualifier "V_v != W_w" at the bottom of p. 5, do I understand it correctly that the only way for V_v = W_w is that V^C = W^C (because if V^C != W^C, then they are disjoint)? If this understanding is correct, then I think it would be less confusing to state the condition at the cluster level, i.e., V^C != W^C.

评论- Quick answers to your remarks.

2025-08-07

Thank for your valuable feedback! We were not aware of the work you mention and will definitely acknowledge the similarity in the final version of the paper. Concerning Definition 5, it is meant to also encompass the case where V^C = W^C as cycles are allowed in cluster DAGs. That is why we use V_v != W_w and not V^C != W^C.

评论- Thanks

2025-08-07

I see. Thank you for the explanation.

最终决定Accept (poster)

2025-09-17

This paper vastly extends the applicability of Cluster-DAG models, which cluster variables together and model causality at the higher level of clusters, by relaxing the constraint of acyclicity. The key is to develop a proper do-calculus that makes sense in this context and to characterize what graphs are compatible. The paper also shows that it is always possible to work with small compatible graphs to reason at the level of the clusters.

Reviewers appreciated the paper. The presentation is clear, but it could benefit from better guidance about notation and acronyms, e.g., glossary, making sure no acronym is used before it is defined. An illustrative example for the model could also go a long way in establishing its applicability. We urge the authors to address these points and any other clarification made during the discussions.