PaperHub

ICLR 2025 · Decision: Rejected · 4 reviewers
Rating: 4.3 / 10 (individual ratings: 6, 5, 3, 3; min 3, max 6, std 1.3)
Confidence: 4.0 · Correctness: 2.0 · Contribution: 2.0 · Presentation: 2.5

Active Causal Learning for Conditional Average Treatment Effect Estimation

OpenReview · PDF
Submitted: 2024-09-28 · Updated: 2025-02-05

Abstract

Keywords
conditional average treatment effect · dynamic sampling · partially observed Markov decision process

Reviews and Discussion

Review 1 (Rating: 6)

The paper proposes a new problem of covariate selection, aiming to balance effect estimation accuracy and covariate measurement cost. This paper proposes a Partially Observed Markov decision process (POMDP)-based method to learn the dynamic sampling strategy. The experimental results show the effectiveness of the proposed method.

Strengths

  1. The paper defines a novel covariate selection problem, which aims to balance effect estimation accuracy and covariate measurement cost.
  2. The paper provides a POMDP-based solution, and the experiments verify its effectiveness.

Weaknesses

  1. For the problem formulation in Eq. (3), directly minimizing $\hat L_f$ might not be practical enough. For many applications, a hard constraint, i.e., $\hat L_f < \epsilon$, could be better.
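A schematic way to contrast the two formulations (the notation here is illustrative and not the paper's exact Eq. (3); $c(M)$ stands for the measurement cost of a mask $M$ and $\lambda$ for a trade-off weight):

```latex
% Soft trade-off (penalized) form:
\min_{\pi}\; \mathbb{E}\big[\hat L_f\big] + \lambda\,\mathbb{E}\big[c(M)\big]

% Hard-constraint form suggested above:
\min_{\pi}\; \mathbb{E}\big[c(M)\big] \quad \text{s.t.} \quad \hat L_f < \epsilon
```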

Even though the proposed method might be simple, I still appreciate the proposed new setting and problem formulation.

Questions

/

Review 2 (Rating: 5)

This article observes that covariates are themselves costly to collect, so the meaningful question is how to select the most informative covariates to guide causal learning at minimal cost.

Strengths

The motivation for this article is relatively sound.

Weaknesses

This article falls under the umbrella of active learning, and I am concerned that it needs to be contrasted extensively and in depth with existing work on causal active learning, at both the theoretical and the applied level. Secondly, I think the methodology is somewhat oversimplified and the theoretical analysis is thin.

Questions

Could you give more theoretical guarantees about the performance of your new method?

Review 3 (Rating: 3)

The manuscript proposes a cost-efficient conditional average treatment effect (CATE) estimation method that sequentially queries covariates for a new subject. With partially observed covariates, the CATE estimator takes masked full covariates as input to predict potential outcomes. The authors propose to learn a dynamic covariate-sampling strategy that balances the cost of querying covariates against the CATE loss. By formulating the optimization problem as a partially observable Markov decision process, the authors leverage PPO to learn such a sampling strategy.
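For readers unfamiliar with the setup summarized above, the following minimal sketch shows how sequential covariate querying can be framed as an episodic decision process. The names (`CovariateQueryEnv`, `query_cost`, `cate_model`, `surrogate_cate`) and the reward shape are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

class CovariateQueryEnv:
    """Toy episodic environment: at each step either query one more
    covariate (paying its cost) or stop and predict CATE from the
    covariates observed so far. Illustrative only."""

    def __init__(self, x_full, query_cost, cate_model, surrogate_cate):
        self.x_full = np.asarray(x_full, dtype=float)    # complete covariate vector
        self.cost = np.asarray(query_cost, dtype=float)  # per-covariate query cost
        self.cate_model = cate_model                     # f_w: (masked covariates, mask) -> CATE
        self.surrogate = surrogate_cate                  # pseudo-label for this subject
        self.mask = np.zeros_like(self.x_full)           # 1 = covariate already observed

    def observation(self):
        # The agent only sees the masked covariates plus the mask itself.
        return np.concatenate([self.x_full * self.mask, self.mask])

    def step(self, action):
        """action in {0, ..., d-1}: query covariate `action`; action == d: stop."""
        d = len(self.x_full)
        if action < d:                                   # pay to reveal one covariate
            reward = -self.cost[action] * (1 - self.mask[action])
            self.mask[action] = 1.0
            return self.observation(), reward, False
        # Stop: reward is negative squared error against the surrogate CATE label.
        pred = self.cate_model(self.x_full * self.mask, self.mask)
        return self.observation(), -(pred - self.surrogate) ** 2, True
```

An RL algorithm such as PPO would then be trained on episodes generated by an environment of this form, trading per-step query costs against the terminal prediction error.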

Strengths

Active querying in estimating causal effects is an interesting problem in the field. The article is well organized, with extensive numerical studies.

Weaknesses

  • There is no identification result for CATE with masked covariates. The SUTVA condition is only assumed for the full covariates $X_i$. Please discuss how SUTVA and other identification assumptions may need to be modified or extended when dealing with partially observed covariates. It is suggested to provide theoretical justification or empirical evidence for the validity of CATE estimation with masked covariates.

  • The estimation model $f_w$ is trained only with fully observed $X$, i.e., the input is $f_w(\mathcal{X}(X,\mathbf{1}))$. For an observation $\mathcal{X}(X,M)$ with any other mask $M$, the performance of $f_w(\mathcal{X}(X,M))$ can be very poor. Have you considered training $f_w$ with various masked inputs to improve its performance on partially observed data (see the sketch after this list)? How might this affect the overall performance of the proposed method?

  • It seems that the policy in (3) is a stochastic policy, since the authors propose to use PPO to learn it; if so, an expectation over actions $a_t \sim \pi$ is missing from the objective function. Please clarify whether the policy is stochastic and, if so, explain how the expectation over actions is incorporated into the objective. Also, why is there an indicator function in the constraints, given that $M$ is a binary mask vector?

  • In the POMDP setting, the transition should be $\mathbb{P}: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$, and the reward does not depend on $\mathcal{O}$. The policy typically leverages the past history, not only the current observation. I suggest discussing how incorporating past history into the policy might affect the performance of the proposed method.

  • Only when the discount factor $\gamma = 1$ does the RL value function match the objective in (3). Why choose $\gamma < 1$ in the experiments? How does this choice affect the relationship between the RL value function and the original objective?
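To make the second bullet above concrete, one common remedy is to train the outcome model on randomly masked inputs so that it stays calibrated under partial observation. A minimal sketch under that assumption (the masking rate, network architecture, and function names here are illustrative, not the paper's training recipe):

```python
import torch
import torch.nn as nn

def train_with_random_masks(f_w, X, y, epochs=100, p_drop=0.5, lr=1e-3):
    """Train f_w on inputs where a random subset of covariates is hidden,
    so the model sees the same kind of partial observations it will face
    at query time. Illustrative sketch only."""
    opt = torch.optim.Adam(f_w.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        # Sample a fresh Bernoulli mask per example and per epoch.
        mask = (torch.rand_like(X) > p_drop).float()
        inp = torch.cat([X * mask, mask], dim=1)   # masked covariates + mask indicator
        opt.zero_grad()
        loss = loss_fn(f_w(inp), y)
        loss.backward()
        opt.step()
    return f_w

# Example: a small MLP whose input is [masked covariates, mask].
d = 10
f_w = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, 1))
X = torch.randn(256, d)
y = torch.randn(256, 1)                            # stands in for (pseudo-)outcomes
train_with_random_masks(f_w, X, y)
```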

Questions

See weaknesses.

Review 4 (Rating: 3)

Note: I have previously reviewed this paper for NeurIPS. The authors do not appear to have revised the paper, in particular regarding my earlier comments, so I have submitted the same review as before.

This paper considers a setting where the learner can sequentially decide which features to acquire for a given instance. A dataset with complete features is available, which is used to train a data acquisition policy with RL. The focus is on estimating conditional average treatment effects, which are assumed to be identified in the historical data and then estimated with an existing approach.

Strengths

The idea of sequentially acquiring features is well motivated, as in sequential decisions about tests or diagnostics in medicine. The experimental results show improvements from using RL over reasonable baselines like greedy acquisition.

Weaknesses

The idea of using RL to define a policy for sequential feature acquisition is not new. There is a great deal of work on this problem in the supervised setting. A few examples that come up from a quick search:

Li and Oliva. Active feature acquisition with generative surrogate models. ICML 2021.

Ghosh and Lan. DiFA: Differentiable Feature Acquisition. AAAI 2023.

Zheng Yu, Yikuan Li, Joseph Chahn Kim, Kaixuan Huang, Yuan Luo, and Mengdi Wang. Deep Reinforcement Learning for Cost-Effective Medical Diagnosis. ICLR 2023.

The paper doesn't discuss any of this related work or how the proposed methods/problem formulation is related. While this paper focuses on causal estimation, the way that it addresses the problem effectively reduces it to supervised learning: the predictions of an existing causal inference method using the historical (complete) data are used to provide surrogate outcomes, after which the proposed reward function treats it exactly as in the supervised case. I believe that the paper needs to either illustrate a fundamental difference between supervised learning and CATE estimation in this setting, or compare to existing methods in this literature and demonstrate improved performance.
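To spell out the reduction described above: once a CATE model fitted on the complete historical data supplies a pseudo-label per instance, the signal seen by the acquisition policy is indistinguishable from a supervised regression target. A minimal sketch of that reading (the function names and the squared-error form are assumptions for illustration, not the paper's exact reward):

```python
import numpy as np

def pseudo_labels(cate_model_full, X_complete):
    """Surrogate outcomes: predictions of a causal model trained on the
    complete historical covariates (e.g., any off-the-shelf CATE learner)."""
    return np.array([cate_model_full(x) for x in X_complete])

def terminal_reward(f_w, x, mask, pseudo_cate):
    """Reward at stopping time: agreement with the surrogate label, which is
    the same signal a supervised feature-acquisition agent would receive if
    `pseudo_cate` were an observed regression target."""
    pred = f_w(x * mask, mask)
    return -(pred - pseudo_cate) ** 2
```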

Questions

Can you comment on the relationship to this previous work?

AC Meta-Review

This paper proposes a dynamic sampling strategy to estimate conditional average treatment effects (CATE) from observational data while minimizing the cost of acquiring features and ensuring estimation accuracy. The idea of combining the dynamic feature selection and CATE estimation is quite interesting and of practical importance; however, considering the lack of comparison with related work, insufficient theoretical guarantees, and limitations in experimental design, the paper does not meet the acceptance criteria at this stage.

Additional Comments from Reviewer Discussion

Through the discussions, there were no significant changes in the scores. Overall, while the novelty of the proposed method is appreciated, the lack of comparison with existing studies and theoretical guarantees kept the scores low.

Final Decision

Reject