Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
Abstract
Reviews and Discussion
The paper makes three substantial contributions: conditions for identifiability, derivation of the influence function, and characterization of optimal adjustment sets for WCDE. These are theoretically significant and extend existing work on average treatment effect (ATE) adjustment.
Strengths and Weaknesses
Strengths
The introduction effectively motivates the Weighted Controlled Direct Effect (WCDE) as a generalization of the Controlled Direct Effect (CDE), with clear links to fairness analysis and policy relevance.
The formal criteria for Valid Adjustment Sets (VAS) are well-defined and justified.
The contrast between CDE and WCDE is articulated well, especially regarding the limitations of fixing mediator levels arbitrarily.
Weaknesses
Even though the paper is theoretical, a small simulation study could help validate the efficiency gains from the optimal VAS, enhancing accessibility for applied researchers.
The identifiability and efficiency results hinge on having the true DAG and the absence of unmeasured confounding. Without access to the true DAG, the proposed method may no longer be valid.
Questions
n/a
Limitations
Since fairness is a key motivation, consider sketching a concrete fairness scenario where the WCDE provides insights unattainable through the CDE or ATE. Including a real-world case would make the example even more compelling.
Final Justification
Indeed, as I mentioned in my previous comments, it is a good paper. I’m glad to see it accepted at NeurIPS.
Formatting Concerns
n/a
We thank the reviewer for recognizing the significance of our identification theory, influence function derivation, and optimal adjustment set analysis, as well as our motivation from fairness. We are grateful for the constructive suggestions that will help us further improve the completeness and clarity of our work.
Weakness 1: Simulation to validate efficiency gains.
We appreciate this valuable suggestion. In the revised version, we have conducted a series of simulation studies to evaluate the performance of AIPW estimators for the WCDE under finite sample settings. Details of the simulation setup and results are provided in our response to Reviewer tnMD (Additional Experimental Results). These simulations have been completed and will be incorporated into the revised version of the paper.
Our findings demonstrate that the optimal Valid Adjustment Set substantially reduces variance and improves estimation accuracy when the sample size is limited. In particular, the adjustment set consistently achieves the lowest variance and MSE across both backdoor and mediator adjustment scenarios.
Weakness 2: Discussion of DAG misspecification.
Thank you for highlighting this important limitation. We acknowledge that assuming a known DAG is a strong condition, and we will explicitly clarify this in the revised paper. As the reviewer notes, our framework is aligned with standard theoretical work in causal inference [1, 2, 3], where identification and efficiency results are derived under the assumption of a known graph structure. We will add discussion distinguishing this setting from causal discovery, which addresses DAG estimation under uncertainty [4, 5]. A more detailed analysis of this limitation and its practical implications is provided in our response to Reviewer Nm1A (Weakness 1).
Limitation: Real-world fairness case study.
We thank the reviewer for raising this important point. Indeed, both in theoretical analyses [6] and real-world applications [7], different mediator values can lead to different CDE values, making interpretation challenging across subpopulations. In contrast, the WCDE is uniquely defined, as it averages over the marginal distribution of mediators rather than conditioning on a specific mediator value $m$. This avoids ambiguity and provides a consistent estimand across populations. Additionally, by averaging over different mediator values, it has smaller finite-sample variance. We will clarify this distinction and the practical advantages of WCDE in our revised paper.
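To make the contrast concrete, here is a minimal sketch with made-up numbers. The CDE values and mediator probabilities below are hypothetical and are not taken from the paper; the point is only that the CDE changes with the mediator level that is fixed, while the WCDE collapses the level-specific values into one weighted average.

```python
# Hypothetical CDE values at three mediator levels m (illustrative numbers only).
cde_by_level = {"low": 0.10, "mid": 0.25, "high": 0.40}
# Hypothetical marginal distribution P(M = m) of the mediator.
p_mediator = {"low": 0.5, "mid": 0.3, "high": 0.2}


def weighted_cde(cde, weights):
    """Average the level-specific CDEs over the mediator's marginal distribution."""
    assert abs(sum(weights.values()) - 1.0) < 1e-12, "weights must sum to 1"
    return sum(cde[level] * weights[level] for level in cde)


wcde = weighted_cde(cde_by_level, p_mediator)
print(f"CDE by level: {cde_by_level}")
print(f"WCDE (single weighted summary): {wcde:.3f}")
```

Any choice of a single mediator level would report one of the three CDE values, whereas the weighted summary does not depend on such a choice.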
[1] L. Henckel, E. Perković, and M. H. Maathuis. Graphical Criteria for Efficient Total Effect Estimation via Adjustment in Causal Linear Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(2):579--599, 2022.
[2] J. Runge. Necessary and Sufficient Graphical Conditions for Optimal Adjustment Sets in Causal Graphical Models with Hidden Variables. arXiv preprint arXiv:2102.10324, 2021.
[3] A. Rotnitzky and E. Smucler. Efficient Adjustment Sets for Population Average Causal Treatment Effect Estimation in Graphical Models. Journal of Machine Learning Research, 21(188):1--86, 2020.
[4] J. Maasch, W. Pan, S. Gupta, V. Kuleshov, K. Gan, and F. Wang. Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure--Outcome Pairs. In Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI), 2024.
[5] J. Maasch, K. Gan, V. Chen, A. Orfanoudaki, N.-J. Akpinar, and F. Wang. Local Causal Discovery for Structural Evidence of Direct Discrimination. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 19349--19357, 2025.
[6] T. J. VanderWeele. Controlled Direct and Mediated Effects: Definition, Identification and Bounds. Scandinavian Journal of Statistics, 38(3):551--563, 2011.
[7] W. W. Loh, O. Lüdtke, A. Robitzsch, U. Trautwein, and O. Lüdtke. Estimation of Controlled Direct Effects in Longitudinal Mediation Analyses with Latent Variables in Randomized Studies. Multivariate Behavioral Research, 55(5):763--785, 2020.
It's a good paper, although the current version has no experiments. The added experiments are good, but I will keep my score. Thanks.
This paper presents necessary and sufficient conditions under which the weighted controlled direct effect is identifiable. These conditions are proved to yield a unique weighted controlled direct effect. The paper also proposes a way to identify and verify the optimal adjustment set, which yields the lowest asymptotic variance of the estimated weighted controlled direct effect.
Strengths and Weaknesses
Strengths:
- The paper is written well and easy to understand.
- This paper nicely extends the theoretical results from literature on effect identifiability and optimal adjustment set identification to the case of weighted controlled direct effect which has applications in domains such as fairness.
Weaknesses:
- A major limitation of this paper is the lack of experimental results. While the theoretical results are interesting and useful, experimental results verifying them would significantly improve the paper.
- Considering the complexity of the paper, it would benefit from several illustrative figures explaining its various aspects. For instance, more causal-diagram-based examples are needed to understand Condition 2.3 (4).
- Minor: In L133, the phrase "let $M$ be parents of $Y$" needs a little refinement. Currently it reads like . Similarly in L145, clearly explain what counterfactual independence conditions are.
Questions
See the weaknesses section above and provide your responses for those comments.
Limitations
Limitations are not explicitly presented but future directions are discussed.
Final Justification
All my questions are addressed during the discussion with the authors. I encourage authors to include the results on semi-synthetic data in the revised version.
Formatting Concerns
None.
We thank the reviewer for highlighting the clarity of the exposition and the paper’s relevance to fairness-related applications. We appreciate the reviewer’s thoughtful feedback and apologize for the aspects of the paper that were unclear or incomplete.
Weakness 1: Experimental results. We thank the reviewer for the excellent suggestions. In the revised version, we include new simulation results to empirically validate the efficiency of different valid adjustment sets in estimating the WCDE. Details of the simulation setup and results are provided in our response to Reviewer tnMD (Additional Experimental Results). The results consistently demonstrate that the adjustment set selected via our optimality criterion achieves lower variance and MSE compared to alternative valid sets, especially under finite samples.
Weakness 2: More illustrative figures.
We agree that additional visual aids would significantly enhance clarity. Condition 2.3(4) is indeed abstract, which is why we specifically included Example 2.5 and Figure 1 to illustrate its meaning. In the revised version, we will incorporate more causal diagrams to better explain Condition 2.3—highlighting the roles of the mediators, treatment, and outcome, as well as elucidating the purpose of the fourth condition.
Similarly, following Theorem 4.3, we will add a concrete DAG example to demonstrate how a sequence of variance-reducing operations leads to the construction of the optimal adjustment set (the O-set). This will help readers better understand the motivation behind the O-set and the roles of Lemmas 4.4 through 4.7. All figures will be clearly labeled and integrated into the main text for improved readability.
Weakness 3: Refinements in L133 and L145.
We apologize for the imprecision in the original version. The original notation "$M$ be parents of $Y$" was indeed vague and could cause confusion. In the revision, we will clarify this by explicitly defining $M$ to indicate that only the mediators that are parents of the outcome are included in the weighted contrast. This refinement ensures consistency with the identification logic and makes the derivation more precise.
Additionally, we will revise Line 145 to clearly state the counterfactual independence assumptions used in the derivation. Since WCDE is defined as a weighted version of the Controlled Direct Effect (CDE), it inherits the identification conditions from CDE. Let $Y(a, m)$ denote the potential outcome that would have been observed had the treatment $A$ been set to $a$ and the mediators $M$ set to $m$. Identifying the causal effect of $A$ on $Y$ while holding $M$ fixed at $m$ requires the following standard assumptions:
(1) No unmeasured confounding of the treatment–outcome relationship: $Y(a, m) \perp A \mid Z$.
(2) No unmeasured confounding of the mediator–outcome relationship: $Y(a, m) \perp M \mid A, Z$.
Here, $Z$ represents a valid adjustment set for identifying the CDE. These assumptions ensure that, conditional on $Z$, the joint intervention on $A$ and $M$ yields counterfactual outcomes that are independent of the actual treatment assignment and mediator values. This enables the identification of $\mathbb{E}[Y(a, m)]$ from observed data under standard causal models. We will incorporate these clarifications in the revised paper to enhance the precision and transparency of our derivation.
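For reference, the two conditions above, combined with consistency and positivity (standard assumptions that are not spelled out in this reply), lead to the usual adjustment formula for the controlled direct effect. The sketch below assumes a binary treatment and uses $A$, $M$, $Y$, and $Z$ for the treatment, mediators, outcome, and adjustment set; this notation and the discrete-mediator sum in the last line are our shorthand and may differ from the paper's.

```latex
\begin{align*}
\mathbb{E}[Y(a,m)]
  &= \mathbb{E}_{Z}\big[\mathbb{E}[Y(a,m) \mid Z]\big] \\
  &= \mathbb{E}_{Z}\big[\mathbb{E}[Y(a,m) \mid A=a,\, Z]\big]
     && \text{by condition (1)} \\
  &= \mathbb{E}_{Z}\big[\mathbb{E}[Y(a,m) \mid A=a,\, M=m,\, Z]\big]
     && \text{by condition (2)} \\
  &= \mathbb{E}_{Z}\big[\mathbb{E}[Y \mid A=a,\, M=m,\, Z]\big]
     && \text{by consistency}, \\[4pt]
\mathrm{CDE}(m) &= \mathbb{E}[Y(1,m)] - \mathbb{E}[Y(0,m)], \\
\mathrm{WCDE} &= \sum_{m} \mathrm{CDE}(m)\, P(M=m).
\end{align*}
```

The last line reflects the description of the WCDE as an average of the CDE over the marginal distribution of the mediators.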
I thank the authors for the detailed response. Regarding experiments, what are the real-world use cases of the proposed method? The current experiments are synthetic.
Thank you for your valuable comment. To demonstrate real-world applicability, we conducted additional simulations on three semi-synthetic Bayesian networks from the bnlearn repository, which are widely used in causal inference benchmarks for their realistic structure.
For each DAG, we:
- Selected a treatment–outcome pair and enumerated all valid adjustment sets based on our identification criterion;
- Estimated the WCDE using the AIPW estimator;
- Simulated data from the underlying causal model under sample sizes n = 100, 400, and 1000;
- Computed the variance and MSE for each adjustment set.
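The evaluation loop described in these steps can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a hypothetical toy linear model (variables `C`, `W`, `A`, `M`, `Y` and all coefficients are assumptions) rather than a bnlearn network, and an ordinary least squares coefficient rather than the AIPW estimator. In this linear model without interaction the CDE is the same at every mediator level, so the WCDE equals the coefficient of `A`.

```python
import numpy as np

TRUE_EFFECT = 1.0  # direct effect of A on Y in the toy model (hypothetical value)


def simulate(n, rng):
    """Draw n samples from a toy linear SCM: C -> A, C -> Y, A -> M -> Y, A -> Y, W -> Y."""
    c = rng.normal(size=n)
    w = rng.normal(size=n)
    a = 0.8 * c + rng.normal(size=n)
    m = 0.5 * a + rng.normal(size=n)
    y = TRUE_EFFECT * a + 0.7 * m + 1.0 * c + 2.0 * w + rng.normal(size=n)
    return {"A": a, "M": m, "C": c, "W": w, "Y": y}


def estimate_direct_effect(data, adjustment_set):
    """OLS coefficient of A when regressing Y on A, M and the adjustment set."""
    n = len(data["Y"])
    cols = [np.ones(n), data["A"], data["M"]] + [data[v] for v in adjustment_set]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), data["Y"], rcond=None)
    return coef[1]


def variance_and_mse(adjustment_set, n, reps=300, seed=0):
    """Monte Carlo variance and MSE of the estimator for one adjustment set."""
    rng = np.random.default_rng(seed)
    est = np.array([estimate_direct_effect(simulate(n, rng), adjustment_set)
                    for _ in range(reps)])
    return est.var(), np.mean((est - TRUE_EFFECT) ** 2)


results = {}
for n in (100, 400, 1000):
    for z in (("C",), ("C", "W")):  # two valid adjustment sets in this toy DAG
        results[(z, n)] = variance_and_mse(z, n)
        var, mse = results[(z, n)]
        print(f"n={n:5d}  Z={z}  variance={var:.5f}  MSE={mse:.5f}")
```

Both sets are valid here, but adding the outcome predictor `W` removes residual noise from `Y`, so the larger set gives the lower variance in this toy setup.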
We applied this procedure to the following networks:
- ASIA network: A medical diagnosis model with 8 nodes and 8 edges (max in-degree 2). We set `either` as the treatment and `dysp` as the outcome. Our criterion identifies `bronc` as the O-set, which also achieves the lowest variance and MSE when n = 400, 1000, consistent with our asymptotic theoretical guarantees. When n = 100, although its variance is not minimal, it still achieves the lowest MSE. This may be due to its relatively low bias compared to other adjustment sets, highlighting its robustness even in small sample settings.
- SIGNALING network: A protein signaling DAG with 11 nodes and 17 edges (max in-degree 3). We set `PKC` as the treatment and `Akt` as the outcome. Our method selects `{PKA, Erk}` as the O-set, which performs very well under both n = 400 and n = 1000. When n = 100, the asymptotically optimal O-set is no longer strictly optimal, but its estimation remains competitive and close to the best-performing adjustment set in this low-sample regime.
- ALARM network: An anesthesia monitoring model with 37 nodes and 46 edges (max in-degree 4). With `FIO2` as treatment and `CATECHOL` as outcome, our method identifies `{SAO2, INSUFFANESTH, ARTCO2, TPR}` as the O-set. This adjustment set achieves consistently strong performance across all sample sizes, with variance and MSE lower than most other valid adjustment sets, demonstrating both favorable asymptotic properties and solid finite-sample behavior.
Variance of WCDE estimates on the ASIA network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [bronc] | 0.03093 | 0.00666 | 0.00336 |
| [smoke] | 0.02896 | 0.01664 | 0.00534 |
| [bronc, smoke] | 0.03074 | 0.01133 | 0.00539 |
| [bronc, lung, smoke] | 0.02347 | 0.02125 | 0.01742 |
| [bronc, lung] | 0.02943 | 0.02562 | 0.01780 |
| [lung] | 0.06492 | 0.04206 | 0.01996 |
| [lung, smoke] | 0.03255 | 0.03582 | 0.02083 |
MSE of WCDE estimates on the ASIA network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [bronc] | 0.03269 | 0.00676 | 0.00349 |
| [smoke] | 0.03329 | 0.01720 | 0.00552 |
| [bronc, smoke] | 0.03446 | 0.01134 | 0.00563 |
| [bronc, lung, smoke] | 0.05746 | 0.02333 | 0.01786 |
| [bronc, lung] | 0.05455 | 0.02663 | 0.01781 |
| [lung] | 0.07439 | 0.04286 | 0.01997 |
| [lung, smoke] | 0.06130 | 0.03872 | 0.02124 |
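The remark that the O-set's low MSE at n = 100 may come from low bias can be checked with simple arithmetic on the two ASIA tables. The sketch below assumes the usual decomposition MSE = variance + bias², which holds exactly only if variance and MSE were computed over the same replications with the same normalization; that is an assumption here, not something stated in the reply.

```python
# Variance and MSE at n = 100, copied from the two ASIA tables.
variance = {
    "[bronc]": 0.03093,
    "[smoke]": 0.02896,
    "[bronc, smoke]": 0.03074,
    "[bronc, lung, smoke]": 0.02347,
    "[bronc, lung]": 0.02943,
    "[lung]": 0.06492,
    "[lung, smoke]": 0.03255,
}
mse = {
    "[bronc]": 0.03269,
    "[smoke]": 0.03329,
    "[bronc, smoke]": 0.03446,
    "[bronc, lung, smoke]": 0.05746,
    "[bronc, lung]": 0.05455,
    "[lung]": 0.07439,
    "[lung, smoke]": 0.06130,
}

# Implied squared bias under the assumed decomposition MSE = variance + bias^2.
squared_bias = {k: round(mse[k] - variance[k], 5) for k in variance}

for name, b2 in sorted(squared_bias.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} implied bias^2 = {b2:.5f}")
```

Under that assumption, `[bronc]` has the smallest implied squared bias of the seven sets at n = 100, while `[bronc, lung, smoke]`, which has the smallest variance, has the largest.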
Variance of WCDE estimates on the SIGNALING network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [Erk, PKA] | 0.00812 | 0.00245 | 0.00131 |
| [Erk, PKA, Raf] | 0.00832 | 0.00304 | 0.00148 |
| [Erk, Mek, PKA] | 0.00736 | 0.00244 | 0.00155 |
| [Erk, Mek, PKA, Raf] | 0.00778 | 0.00293 | 0.00155 |
| [Mek, PKA] | 0.01299 | 0.00554 | 0.00352 |
| [Mek, PKA, Raf] | 0.01328 | 0.00615 | 0.00359 |
MSE of WCDE estimates on the SIGNALING network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [Erk, PKA] | 0.00855 | 0.00245 | 0.00131 |
| [Erk, PKA, Raf] | 0.00890 | 0.00304 | 0.00148 |
| [Erk, Mek, PKA] | 0.00799 | 0.00244 | 0.00156 |
| [Erk, Mek, PKA, Raf] | 0.00841 | 0.00293 | 0.00156 |
| [Mek, PKA] | 0.01322 | 0.00660 | 0.00391 |
| [Mek, PKA, Raf] | 0.01344 | 0.00725 | 0.00396 |
Variance of WCDE estimates on the ALARM network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [ARTCO2, INSUFFANESTH, SAO2, TPR] | 0.01191 | 0.00305 | 0.00109 |
| [ARTCO2, PVSAT, SAO2, VENTALV] | 0.00652 | 0.00347 | 0.00168 |
| [ARTCO2, INSUFFANESTH, SAO2, VENTALV] | 0.00790 | 0.00413 | 0.00222 |
| [ARTCO2, INSUFFANESTH, PVSAT, SAO2] | 0.00933 | 0.00349 | 0.00225 |
| [ARTCO2, INSUFFANESTH, SAO2] | 0.00923 | 0.00386 | 0.00231 |
| [ARTCO2, PVSAT, SAO2] | 0.00887 | 0.00351 | 0.00235 |
| [ARTCO2, SAO2, VENTALV] | 0.00751 | 0.00443 | 0.00242 |
| [ARTCO2, SAO2] | 0.00889 | 0.00421 | 0.00257 |
MSE of WCDE estimates on the ALARM network
| Adjustment Set | n = 100 | n = 400 | n = 1000 |
|---|---|---|---|
| [ARTCO2, INSUFFANESTH, SAO2, TPR] | 0.01260 | 0.00306 | 0.00109 |
| [ARTCO2, PVSAT, SAO2, VENTALV] | 0.00678 | 0.00357 | 0.00168 |
| [ARTCO2, INSUFFANESTH, SAO2, VENTALV] | 0.00807 | 0.00415 | 0.00224 |
| [ARTCO2, INSUFFANESTH, PVSAT, SAO2] | 0.00959 | 0.00353 | 0.00226 |
| [ARTCO2, INSUFFANESTH, SAO2] | 0.00954 | 0.00388 | 0.00231 |
| [ARTCO2, PVSAT, SAO2] | 0.00921 | 0.00355 | 0.00235 |
| [ARTCO2, SAO2, VENTALV] | 0.00766 | 0.00444 | 0.00247 |
| [ARTCO2, SAO2] | 0.00920 | 0.00423 | 0.00258 |
I thank the authors for answering my questions on real-world significance of the method. Overall, I am satisfied with the paper and I will increase my score.
This paper studies nonparametric estimation of the Weighted Controlled Direct Effect (WCDE), which is an extension of the standard controlled direct effect (CDE) that averages over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels.
Strengths and Weaknesses
Strengths:
- This paper studies nonparametric estimation of the Weighted Controlled Direct Effect (WCDE), which is an interesting and important extension of the standard controlled direct effect (CDE).
- This paper defines a Valid Adjustment Set and establishes that Condition 2.3 is both necessary and sufficient for identifying a unique WCDE (Lemma 2.4).
- This paper presents the identification theory and formulas for WCDE via Valid Adjustment Set.
- This paper proposes three fundamental advances for WCDE in observational studies:
- Establishing necessary and sufficient conditions for the unique identifiability of WCDE, clarifying when it diverges from CDE.
- Deriving the influence function for nonparametric estimation of WCDE, considering the class of regular and asymptotically linear estimators.
- Characterizing the optimal covariate adjustment set that minimizes asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect estimation.
Weaknesses:
- While the paper provides a strong theoretical foundation, the practical implementation of determining the "O-set" (optimal adjustment set) in real-world scenarios might be complex. The entire framework, including the definition of VASs and the optimal adjustment set, assumes a known Directed Acyclic Graph (DAG) encoding the underlying causal structure. In many real-world applications, the true DAG is unknown and must be learned from data.
- The paper's notion of "optimal" is based solely on minimizing asymptotic variance. While this is a critical statistical property, other practical considerations, such as bias in finite samples, robustness to model misspecification (beyond what's implied by RAL estimators), or computational feasibility for very high-dimensional data, are not discussed in detail.
- The absence of experimental validation limits the immediate practical applicability and empirical understanding of the theoretical advancements presented.
Questions
- Could the authors include simulation studies to demonstrate the practical advantages of the proposed methods?
- How robust are the results (identifiability, influence function, and optimal adjustment set) to misspecification or uncertainty in the underlying DAG? Could the authors discuss the implications if the true DAG is unknown and has to be learned from data?
- Could the authors elaborate on how practitioners can leverage the derived influence function (Theorem 3.4) to build efficient RAL estimators for WCDE?
Limitations
N/A
Final Justification
The authors effectively addressed my concerns regarding the practical implementation of determining the "O-set", and the need for experimental validation. They provided a clear discussion on causal discovery, new simulation results, and a practical explanation of the influence function's use in constructing efficient estimators. All of my concerns have been well-addressed. Thus, I raise my score by one point.
Formatting Concerns
Lack of an experimental section and a conclusion section.
We thank the reviewer for recognizing the importance of WCDE and acknowledging the theoretical contributions of our work, and we appreciate the thoughtful feedback that highlights areas where further clarification is needed. We apologize for any aspects of the paper that were unclear or incomplete.
Weakness 1: Practical feasibility of determining the O-set. We thank the reviewer for raising this important point. Indeed, learning the causal DAG from data is a fundamental and ongoing challenge in causal discovery, often requiring exponential runtime or suffering from poor finite-sample performance [1, 2]. A more data-efficient alternative is to use local causal discovery algorithms that learn only the relative relationships between the exposure and outcome [3, 4]. For example, the algorithm proposed in [4] efficiently recovers an optimal O-set for WCDE without requiring full causal graph recovery under suitable assumptions. However, we note that these papers did not establish the asymptotic optimality of the proposed O-set, which is precisely the goal of this paper. As these papers demonstrate, developing reliable, data-efficient methods to determine the O-set is a nontrivial problem and beyond the scope of this work. Thanks to the reviewer's point, we will add the following discussion at the end of the paper:
In this work, we assume that the underlying causal DAG is known and accurate. While this assumption simplifies the identification of valid adjustment sets, we recognize that in practice, the DAG must often be learned from data, a challenge actively studied in the causal discovery literature. Existing global discovery algorithms can recover DAGs from observational data, but often suffer from poor finite-sample performance or exponential runtime in the worst case [1, 2]. As a computationally efficient alternative, local discovery algorithms focus on identifying the relationships around the exposure and outcome, bypassing the need to learn the full global DAG [3, 4]. In particular, [4] propose a local discovery method tailored for WCDE estimation in fairness applications, under the assumption that the outcome variable is not causally upstream of any observed variable and that all of its direct causes are observed. They show that the proposed method has a polynomial runtime while allowing for unobserved confounders that do not directly affect the outcome. In this work, we establish that the set returned by [4] in their problem setting is indeed asymptotically optimal. Another promising approach to learning the causal DAG from data is to assume an additive noise model (ANM), under which polynomial-time global discovery algorithms are provably consistent [5, 6, 7], potentially enabling efficient identification of adjustment sets.
While these alternatives offer practical avenues for discovering the O-set from data, a key open challenge is understanding how errors from causal discovery propagate to the final estimation of WCDE. Our work establishes the optimality of the O-set assuming a correct DAG, but integrating uncertainty from causal discovery into the efficiency analysis of adjustment strategies remains an important direction for future work. Developing robust frameworks that combine structure learning with optimal adjustment set selection would enable more principled and data-driven applications of WCDE estimation in practice.
Weakness 2: Other criteria beyond asymptotic variance.
We agree with the reviewer that criteria such as finite-sample bias, robustness, and computational efficiency are also important for practical applications. However, we recognize that analytical characterization of finite-sample properties is often intractable, especially under general nonlinear and high-dimensional settings. For this reason, we supplement our theoretical results with extensive simulation studies to empirically evaluate the variance, bias, and MSE of different valid adjustment sets across multiple DAGs and outcome mechanisms. Please see the detailed response to Reviewer tnMD (Additional Experimental Results). We will revise the paper to also include a discussion of related work that seeks to balance bias and variance in finite samples, such as Henckel et al. [8], and recent empirical studies on adjustment set selection in finite samples. These additions aim to strengthen the connection between our asymptotic results and practical performance in finite-sample regimes, providing a more comprehensive view of optimal adjustment strategies.
Weakness 3/Question 1: Experimental Validation.
We thank the reviewer for this insightful comment. To address this point, we have conducted a series of simulation studies to evaluate the performance of AIPW estimators for the WCDE under finite sample settings. Details of the simulation setup and results are provided in our response to Reviewer tnMD (Additional Experimental Results). These simulations have been completed and will be incorporated into the revised version of the paper.
Our findings demonstrate that the optimal Valid Adjustment Set substantially reduces variance and improves estimation accuracy when the sample size is limited. In particular, the adjustment set consistently achieves the lowest variance and MSE across both backdoor and mediator adjustment scenarios.
Question 2: Robustness of results to uncertainty associated with DAG
In addition to our response to Weakness 1, we emphasize that the identifiability and optimal adjustment set results carry over only when the estimated DAG and the true (ground-truth) DAG share the same O-set, a condition that is generally unverifiable in practice. A more practical approach to handling this uncertainty is to examine the resulting WCDE estimates through their bias and variance properties, though this ultimately relates back to the points addressed in Weakness 1.
Regarding the influence function, as shown in Theorem 3.4, while different DAGs may induce different valid adjustment sets, the general form of the influence function depends solely on the target parameter (the WCDE) itself, rather than on the specific DAG structure. We will emphasize this point in the revised paper to clarify the generality and robustness of our derivation.
Question 3: Use of the influence function in practice.
We thank the reviewer for this important question. Indeed, including an exemplary efficient RAL estimator for WCDE would significantly strengthen our paper.
In our experiments, we implemented the augmented inverse probability weighting (AIPW) estimator for WCDE. This estimator is constructed based on the influence function we derived in Theorem 3.4, with each component estimated using plug-in methods. Specifically, we estimated the relevant conditional means and densities, and combined them according to the structure of the influence function to form a doubly robust estimator. This ensures consistency if either the outcome model or the generalized propensity scores are correctly specified.
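As an illustration of how such an estimator can be assembled, the sketch below implements a textbook AIPW score for a joint intervention on a binary treatment and a binary mediator, then weights the level-specific contrasts by the empirical mediator distribution. It is not the authors' implementation, and we do not claim that this score coincides with the influence function of Theorem 3.4. The data-generating model, the function names, and the use of cell means as nuisance estimators are all assumptions of this sketch.

```python
import numpy as np


def aipw_mean_outcome(y, a, m, z, a0, m0):
    """AIPW estimate of E[Y(a0, m0)] with a discrete adjustment variable z.

    Nuisances are estimated by empirical cell means and frequencies
    (assumption of this sketch: every (a0, m0, z) cell is non-empty).
    """
    in_cell = (a == a0) & (m == m0)
    mu = np.zeros_like(y, dtype=float)  # outcome regression E[Y | A=a0, M=m0, Z]
    pi = np.zeros_like(y, dtype=float)  # joint propensity P(A=a0, M=m0 | Z)
    for level in np.unique(z):
        stratum = z == level
        mu[stratum] = y[stratum & in_cell].mean()
        pi[stratum] = in_cell[stratum].mean()
    # Plug-in term plus inverse-probability-weighted residual correction.
    return np.mean(mu + in_cell / pi * (y - mu))


def aipw_wcde(y, a, m, z):
    """Weight the level-specific contrasts (A coded 0/1) by the empirical P(M = m)."""
    total = 0.0
    for m0 in np.unique(m):
        contrast = (aipw_mean_outcome(y, a, m, z, 1, m0)
                    - aipw_mean_outcome(y, a, m, z, 0, m0))
        total += contrast * np.mean(m == m0)
    return total


# Hypothetical toy model: Z confounds A, M and Y; the effect of A varies with M.
rng = np.random.default_rng(0)
n = 20_000
z = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, 0.3 + 0.4 * z)
m = rng.binomial(1, 0.2 + 0.3 * a + 0.2 * z)
y = 1.0 * a + 0.5 * m + 1.0 * a * m + 0.8 * z + rng.normal(size=n)

estimate = aipw_wcde(y, a, m, z)
print(f"AIPW estimate of the WCDE: {estimate:.3f} (true value in this toy model: 1.45)")
```

In this toy model CDE(0) = 1 and CDE(1) = 2, and P(M = 1) = 0.45, so the true WCDE is 1.45. With parametric nuisance models in place of cell means, the same score would remain consistent if either the outcome model or the propensity model is correct.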
We will add a detailed description of this estimator to our revised paper. Moreover, we note that the influence function we provide also enables the construction of other efficient RAL estimators, such as TMLE or DML, using their respective estimation frameworks.
[1] M. Kalisch and P. Bühlmann. Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8:613--636, 2007.
[2] Y.-B. He and Z. Geng. Active Learning of Causal Networks with Intervention Experiments and Optimal Designs. Journal of Machine Learning Research, 9:2523--2547, 2008.
[3] J. Maasch, W. Pan, S. Gupta, V. Kuleshov, K. Gan, and F. Wang. Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure--Outcome Pairs. In Proceedings of the 40th Conference on Uncertainty in Artificial Intelligence (UAI), 2024.
[4] J. Maasch, K. Gan, V. Chen, A. Orfanoudaki, N.-J. Akpinar, and F. Wang. Local Causal Discovery for Structural Evidence of Direct Discrimination. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 19349--19357, 2025.
[5] S. Hiremath, P. Ghosal, and K. Gan. LoSAM: Local Search in Additive Noise Models with Unmeasured Confounders, a Top-Down Global Discovery Approach. In Proceedings of the 41st Conference on Uncertainty in Artificial Intelligence (UAI), 2025.
[6] S. Hiremath, J. Maasch, M. Gao, P. Ghosal, and K. Gan. Hybrid Top-Down Global Causal Discovery with Local Search for Linear and Nonlinear Additive Noise Models. In Proceedings of the Neural Information Processing Systems (NeurIPS), 2024.
[7] P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear Causal Discovery with Additive Noise Models. In Advances in Neural Information Processing Systems, volume 21, 2008.
[8] L. Henckel, E. Perković, and M. H. Maathuis. Graphical Criteria for Efficient Total Effect Estimation via Adjustment in Causal Linear Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(2):579--599, 2022.
Thank you for the thorough rebuttal. My concerns have been fully addressed. The additional discussion on causal discovery, the new simulation studies on finite-sample performance, and the practical explanation of the influence function significantly strengthen the paper's theoretical and practical contributions. I will raise my score accordingly.
The paper considers the estimation of "Weighted Controlled Direct Effect (WCDE)". CDE measures the effect of a treatment holding a mediator to a fixed level. WCDE averages over mediator distribution. The paper provides necessary and sufficient conditions for valid adjustment sets for unique identifiability of WCDE (condition 2.3), gives influence function for WCDE (Theorem 3.4), and shows that the optimal adjustment set that minimizes asymptotic variance is different from the ATE adjustment set.
Strengths and Weaknesses
**Strengths:**
S1. The paper is in general well written.
S2. I have not checked the proofs in the appendix, but proof sketches are provided in the main paper. One contribution is Condition 2.3 (4), which uses interesting ideas for sufficiency and necessity. Example 2.5 is provided to illustrate the condition.
S3. Similarly the proof idea of finding the optimal valid adjustment set is interesting and a proof sketch for 4 steps is given. I have not checked the proof in the appendix.
**Weaknesses:**
Presentation: In some places a better clarity would be helpful.
(a) in the introduction, better explanation can be provided for "unique identifiability" and the optimal adjustment set that minimizes variance, suggesting that there are multiple adjustment sets.
(b) In condition 2.3, line 2.3, M should be defined since A and Y are not defined as treatment and outcome but arbitrary disjoint nodes.
(c) In condition 2.3, backdoor and mediator paths are used for the first time. They should be defined for the sake of completeness.
Minor typo in the 3rd condition in 2.3 -- "are block"
Questions
Please explain unique identifiability.
Please also explain limitations of the assumptions and method in detail.
Satisfactory explanations to these points might increase my score to Accept.
Limitations
Not adequate. The Q/A at the end says that they discuss future work, but I only find a summarization of the contributions and not much future work or limitation. Following the solicitation a clear specification and discussion of limitations and future work will be useful. For instance, the methods seem to be fully dependent on the correctness of causal DAG and correct specification of mediator variables, which may not hold in practice as is largely understood. Studying robustness when the causal DAG is not correct will be a future work.
Final Justification
The rebuttal addressed my concerns and I am increasing the score by 1 point.
Formatting Concerns
NA
We thank the reviewer for the positive feedback, particularly regarding the proof idea for identifying the optimal valid adjustment set and the provided four-step sketch. In what follows, we address all points raised.
Weakness 1/Question 1: Clarify "unique identifiability" and the existence of multiple adjustment sets.
We appreciate the reviewer’s suggestion. In the revision, we will clarify that "unique identifiability" means the WCDE is a well-defined causal quantity that yields the same value in the population regardless of which valid adjustment set is used (under Condition 2.3). Estimator variability arises because different adjustment sets can produce different finite-sample estimates of this quantity due to varying statistical properties (e.g., bias and variance). This distinction motivates our focus on selecting optimal adjustment sets to improve estimation quality while preserving the theoretical identifiability guarantee.
Weakness 2: Define $M$, and clarify the role of $A$ and $Y$ in Condition 2.3.
We apologize for the confusion. The definitions of $A$, $Y$, and $M$ are provided earlier in the subsection where Condition 2.3 appears (Lines 128–133). However, we agree that reiterating them within the condition itself would improve clarity. In the revision, we will explicitly define $M$ as the set of mediator variables in Condition 2.3. We will also restate that $A$ and $Y$ denote the treatment and outcome, respectively, and clarify that the condition is stated in terms of arbitrary disjoint sets to generalize beyond specific treatment-outcome pairs.
Weakness 3: Define "backdoor" and "mediator" paths.
We will include formal definitions of these concepts in the revised paper, referencing Pearl [1] to ensure consistency with the standard causal inference literature. Specifically, we define the following types of paths to clarify our assumptions:
– Directed path: A path in which all arrows point in the same direction, from the starting node to the end node. For example, $A \to M \to Y$ is a directed path from $A$ to $Y$.
– Backdoor path: Any undirected path between $A$ and $Y$ that starts with an arrow into $A$. For instance, $A \leftarrow Z \to Y$ is a backdoor path from $A$ to $Y$ via $Z$.
– Mediator path: Any directed path from $A$ to $Y$ that passes through at least one mediator variable $M$. A typical example is $A \to M \to Y$.
These definitions will be formally added in Appendix A, along with illustrative examples to aid understanding. Additionally, we will correct the minor typo in the third item of Condition 2.3 (“are block”), and again we apologize for any confusion this may have caused.
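As a minimal sketch of how the three path types defined above could be checked mechanically on a DAG: the helper names, the edge-set representation, and the toy graph (with a confounder `Z`) are illustrative assumptions, not taken from the paper.

```python
# A DAG is represented as a set of directed edges (parent, child).
# All function names and the toy graph below are hypothetical.

def is_path(path, edges):
    """True if consecutive nodes are adjacent in the DAG, ignoring direction."""
    return all((u, v) in edges or (v, u) in edges for u, v in zip(path, path[1:]))

def is_directed_path(path, edges):
    """True if every consecutive pair (u, v) on the path is an edge u -> v."""
    return all((u, v) in edges for u, v in zip(path, path[1:]))

def is_backdoor_path(path, edges):
    """True if the path is valid and its first edge points into the start node."""
    return len(path) >= 2 and is_path(path, edges) and (path[1], path[0]) in edges

def is_mediator_path(path, edges, mediators):
    """True if the path is directed and some interior node is a mediator."""
    return is_directed_path(path, edges) and any(v in mediators for v in path[1:-1])

# Toy DAG (assumed for illustration): Z confounds A and Y, M mediates A -> Y.
edges = {("Z", "A"), ("Z", "Y"), ("A", "M"), ("M", "Y"), ("A", "Y")}
mediators = {"M"}

print(is_directed_path(["A", "M", "Y"], edges))             # True
print(is_backdoor_path(["A", "Z", "Y"], edges))             # True
print(is_mediator_path(["A", "M", "Y"], edges, mediators))  # True
print(is_mediator_path(["A", "Y"], edges, mediators))       # False
```

The direct edge from `A` to `Y` is a directed path but not a mediator path, because it has no interior node.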
Question 2: Discuss limitations.
We appreciate this valuable suggestion. In our revision, we will clarify that our method assumes the DAG is known and correctly specified. While in practice the DAG often needs to be estimated from data, we address the implications of this limitation more thoroughly in our response to Reviewer Nm1A (Weakness 1). We will include the study of our method’s robustness to DAG misspecification as an important direction for future research.
Additional Experimental Results
Further, to validate the robustness of our procedure under finite samples, we conducted additional experiments and will add them to our revised paper. We construct two 7-node DAGs with multiple causal pathways, corresponding to the scenarios in Lemmas 1–4 (Figures 3 and 4). Our data is generated as follows:
For each DAG edge, coefficients are sampled independently: with 50% probability from one uniform range, otherwise from a second uniform range, ensuring non-negligible effects. Each variable in the DAG is generated recursively according to a structural equation model:
$$X_j = \sum_{i \in \mathrm{pa}(j)} \beta_{ij}\, f(X_i) + \varepsilon_j,$$
where $\beta_{ij}$ is the sampled edge coefficient from parent $X_i$ to $X_j$. The function $f$ is applied independently to each parent variable before combination, and is selected uniformly at random from the following:
– Identity: $f(x) = x$
– Squared: $f(x) = x^2$
– Exponential: $f(x) = e^x$
The additive noise term is sampled independently from a Gaussian distribution for each node with parents. Root nodes (i.e., nodes with no parents) are sampled from .
For each simulation run, we randomly select one of the three functions above to transform the outcome. The final outcome is generated by applying the chosen function to its parent variables. We repeat this process for 100 randomly generated coefficient sets, simulating each to assess the robustness of WCDE estimation.
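The data-generating process described above might be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the coefficient magnitude range `[0.5, 2.0]` with a random sign, the standard-normal noise and root distributions, the 4-node toy DAG, and all function names are placeholders, and the chosen transformation is applied only to the outcome's parents (one reading of the description).

```python
import numpy as np

# Candidate transformations named in the text.
TRANSFORMS = {
    "identity": lambda x: x,
    "squared": lambda x: x ** 2,
    "exponential": np.exp,
}

def sample_coefficients(edges, rng, low=0.5, high=2.0):
    """Draw one coefficient per edge, negative or positive with probability 1/2.

    The magnitude range [low, high] is an assumed placeholder.
    """
    signs = rng.choice([-1.0, 1.0], size=len(edges))
    mags = rng.uniform(low, high, size=len(edges))
    return {edge: s * m for edge, s, m in zip(edges, signs, mags)}

def simulate(order, edges, outcome, n, rng, noise_sd=1.0):
    """Generate n samples node by node, following a topological order."""
    coefs = sample_coefficients(edges, rng)
    name = str(rng.choice(list(TRANSFORMS)))  # one transformation per run
    data = {}
    for node in order:
        parents = [u for (u, v) in edges if v == node]
        noise = rng.normal(0.0, noise_sd, size=n)
        if not parents:
            data[node] = noise  # root node: noise only (assumed standard normal)
            continue
        # Apply the chosen transformation to the outcome's parents only.
        f = TRANSFORMS[name] if node == outcome else TRANSFORMS["identity"]
        data[node] = sum(coefs[(u, node)] * f(data[u]) for u in parents) + noise
    return data, coefs, name

rng = np.random.default_rng(0)
order = ["Z", "A", "M", "Y"]  # toy DAG, not the 7-node graphs of Figures 3 and 4
edges = [("Z", "A"), ("Z", "Y"), ("A", "M"), ("M", "Y"), ("A", "Y")]
data, coefs, name = simulate(order, edges, "Y", n=1000, rng=rng)
print(name, {k: v.shape for k, v in data.items()})
```

Sampling a magnitude bounded away from zero and then a random sign is one way to obtain the "non-negligible effects" property mentioned above.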
Estimation Method. We implement an AIPW estimator with:
– Q-model: linear regression with spline-transformed features (degree 4, 20 knots)
– g-model: logistic regression for the treatment model.
The detailed form of the AIPW is included in the response to Reviewer Nm1A (Question 3).
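To illustrate how a Q-model and a g-model combine in an AIPW estimator, here is a generic sketch for a binary-treatment mean contrast $E[Y(1)] - E[Y(0)]$ given an adjustment set. It is not the WCDE-specific estimator referenced above. To keep the example NumPy-only, the Q-model is an ordinary least-squares fit with treatment-covariate interactions standing in for the spline regression, and the logistic g-model is fit by Newton's method. The function names, the clipping threshold, and the simulated data are assumptions.

```python
import numpy as np

def fit_logistic(X, a, n_iter=50, ridge=1e-6):
    """Fit P(A=1 | X) by Newton's method with a tiny ridge term for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p) - ridge * beta
        hess = (X * (p * (1 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_contrast(z, a, y, clip=0.01):
    """AIPW estimate of E[Y(1)] - E[Y(0)] adjusting for covariates z (n x d).

    Returns the point estimate and an estimate of its sampling variance.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), z])  # g-model design: intercept + covariates
    g = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))
    g = np.clip(g, clip, 1 - clip)  # guard against extreme propensities

    def design(a_val):
        # Q-model design: [1, z, a, a*z]; a stand-in for spline features.
        a_col = np.full(n, a_val) if np.isscalar(a_val) else a_val
        return np.column_stack([np.ones(n), z, a_col, a_col[:, None] * z])

    coef, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    q1, q0 = design(1.0) @ coef, design(0.0) @ coef

    # Plug-in contrast plus inverse-probability-weighted residual corrections.
    phi = q1 - q0 + a * (y - q1) / g - (1 - a) * (y - q0) / (1 - g)
    return phi.mean(), phi.var(ddof=1) / n

# Simulated check with a true contrast of 2.0 (assumed toy data).
rng = np.random.default_rng(0)
n = 4000
z = rng.normal(size=(n, 1))
a = rng.binomial(1, 1 / (1 + np.exp(-z[:, 0]))).astype(float)
y = 2.0 * a + z[:, 0] + rng.normal(size=n)
estimate, variance = aipw_contrast(z, a, y)
print(estimate, np.sqrt(variance))
```

Passing a different set of columns as `z` is how different adjustment sets would be compared in a sketch like this.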
Evaluation Metrics
For each adjustment set, we report average variance and average MSE across 100 replications. The first two tables correspond to Figure 3 and illustrate the results associated with Lemmas 4.4 and 4.6. The latter two tables correspond to Figure 4, which supports the theoretical findings in Lemmas 4.5 and 4.7.
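One way the reported metrics could be computed is sketched below, assuming that each simulated setting yields repeated estimates of a known true effect and that the per-setting variance and MSE are then averaged. The function names and this aggregation scheme are assumptions; the exact procedure is not spelled out here.

```python
import numpy as np

def variance_and_mse(estimates, truth):
    """Empirical variance and MSE of repeated estimates of a known target."""
    estimates = np.asarray(estimates, dtype=float)
    variance = estimates.var(ddof=0)
    mse = np.mean((estimates - truth) ** 2)
    return variance, mse

def average_metrics(runs):
    """Average variance and MSE over (estimates, truth) pairs, one per setting."""
    metrics = np.array([variance_and_mse(est, truth) for est, truth in runs])
    return metrics.mean(axis=0)

# Toy inputs: the first setting is unbiased, the second has bias 1.
runs = [([1.0, 2.0, 3.0], 2.0), ([2.0, 3.0, 4.0], 2.0)]
avg_var, avg_mse = average_metrics(runs)
print(avg_var, avg_mse)
```

With `ddof=0`, the MSE equals the variance plus the squared bias, so the MSE can never fall below the variance.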
Table 1. Average Variance for Figure 3
| Adjustment Set | $n=100$ | $n=400$ | $n=1000$ | $n=4000$ |
|---|---|---|---|---|
| , | 0.0110 | 0.0018 | 0.0007 | 0.0002 |
| , | 0.0385 | 0.0076 | 0.0031 | 0.0010 |
| , , | 0.0282 | 0.0031 | 0.0011 | 0.0003 |
Table 2. Average MSE for Figure 3
| Adjustment Set | $n=100$ | $n=400$ | $n=1000$ | $n=4000$ |
|---|---|---|---|---|
| , | 0.0112 | 0.0019 | 0.0007 | 0.0002 |
| , | 0.0391 | 0.0077 | 0.0031 | 0.0010 |
| , , | 0.0286 | 0.0032 | 0.0011 | 0.0003 |
Table 3. Average Variance for Figure 4
| Adjustment Set | $n=100$ | $n=400$ | $n=1000$ | $n=4000$ |
|---|---|---|---|---|
| , | 0.0142 | 0.0018 | 0.0007 | 0.0002 |
| , | 0.0615 | 0.0091 | 0.0048 | 0.0014 |
| , , | 0.0496 | 0.0032 | 0.0012 | 0.0003 |
Table 4. Average MSE for Figure 4
| Adjustment Set | $n=100$ | $n=400$ | $n=1000$ | $n=4000$ |
|---|---|---|---|---|
| , | 0.0146 | 0.0018 | 0.0007 | 0.0002 |
| , | 0.0629 | 0.0093 | 0.0049 | 0.0015 |
| , , | 0.0505 | 0.0033 | 0.0012 | 0.0003 |
Our proposed O-set consistently achieves the lowest variance and MSE across all sample sizes, from as small as $n=100$ to as large as $n=4000$, demonstrating its finite-sample robustness even though our theoretical guarantees are asymptotic. The current tables present preliminary simulation results; more extensive experiments are planned for the final version to strengthen the empirical validation and provide a more complete set of results.
[1] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2009.
Thanks for the rebuttal. It addresses my concerns and I increased the score.
This is a reminder that the author-reviewer discussion period is meant to afford the time for a proper discussion. Since the authors have devoted substantial effort to their response, I encourage all of the reviewers to critically engage with the response, which includes reading the other reviews and having an open exchange with the authors.
Thanks all for their valuable efforts so far, and please continue helping to ensure high-quality reviewing at NeurIPS for this year and future years!
Hi reviewers,
First, thank you to those of you who are already participating in discussions! Your engagement is invaluable for the community and helps to ensure that NeurIPS is able to continue as a high-quality conference as it grows larger and larger every year.
I want to highlight some important points shared by the Program Chairs. First, the PCs have noted that many reviewers submitted “Mandatory Acknowledgement” without participating in the Author-Reviewer discussion, and we have been instructed that such action is NOT PERMITTED. As suggested by the PCs, I am flagging non-participating reviewers with “InsufficientReview”; I will remove the flag once reviewers have shown an appropriate level of engagement.
Here is a brief summary of the PCs’ points on this matter:
- It is not OK for reviewers to leave discussion till the last moment.
- If the authors have resolved your questions, do tell them so.
- If the authors have not resolved your questions, do tell them so too.
- The “Mandatory Acknowledgement” button is to be submitted only after the reviewer has read the author rebuttal and engaged in discussions - reviewers MUST talk to the authors and are encouraged to talk to other reviewers.
To facilitate these discussions, the Author-Reviewer discussion period has been extended by 48 hours till August 8, 11:59pm AOE.
Thank you all for your efforts so far, and I look forward to seeing a more engaged discussion in the coming days!
The paper studies estimation of the weighted controlled direct effect (WCDE), which generalizes the controlled direct effect (CDE) and is relevant in fairness analysis. The paper defines the notion of a valid adjustment set (VAS) for the WCDE, which extends the definition of a VAS for the CDE. The paper establishes that the existence of a VAS is a necessary and sufficient condition for the WCDE to be identifiable, and gives an identification formula in terms of the VAS. The authors derive the influence function of this quantity and characterize the "optimal" adjustment set (in the familiar sense of asymptotic efficiency, i.e., minimum asymptotic variance).
Strengths. The reviewers agree that the WCDE is an interesting and important extension of the CDE (Reviewers Nm1A, Hn2a, UNcX) which is sufficiently motivated by links to fairness. The paper is a well-grounded generalization of existing theory and provides several extensions of familiar, important definitions and existing results (valid adjustment sets, identification, influence functions, optimality). The reviewers also largely agreed that the paper was generally well-written, though there were a few suggestions for improving clarity, e.g., correct placement of some definitions such as "backdoor" and "mediator" paths (Rev. tnMD) and the addition of some illustrative figures (Rev. Hn2a).
Weaknesses. A main weakness pointed out by all reviewers was the lack of experiments to validate the theoretical results. The authors added results on semi-synthetic data during the rebuttal period (in their response to Reviewer Hn2a), reporting the variance and MSE of their estimates on three networks from bnlearn. The experiments show that, when the number of samples is relatively large (400 or 1,000), the adjustment set selected by their method yields an estimator with lower variance and MSE than other adjustment sets. Another weakness pointed out by two reviewers (Reviewers Nm1A and UNcX) is that the results hinge on having the true DAG in the absence of unmeasured confounding. The authors acknowledge this point and mention that they will explicitly discuss this limitation in the revision. While this is indeed an important limitation, I don't think it is a major weakness of this work, since it provides the necessary first step for the development of methods which are robust to misspecification or which simultaneously consider both structure learning and effect estimation. A similar comment applies to another one of the limitations, mentioned by Reviewer Nm1A: that "optimality" is only based on asymptotic variance and not finite-sample bias.
Conclusion. The paper makes a solid technical contribution to an important area of causality. Even without the experiments and other improvements from the rebuttal, the paper is ready to be accepted.
Minor comment to the authors: I believe the term "unique identifiability" is redundant. Saying that a quantity is "identifiable" already implies that it is unique. To avoid confusion, I would suggest removing the term "unique", unless you have something different in mind which I've missed.