PaperHub
Score: 6.1/10
Poster · 4 reviewers
Ratings: lowest 2, highest 4, standard deviation 0.8
Individual ratings: 4, 2, 3, 4
ICML 2025

Doubly Protected Estimation for Survival Outcomes Utilizing External Controls for Randomized Clinical Trials

Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

We propose a doubly protected integrative estimator for survival outcomes that is more efficient than trial-only estimators and mitigates biases from external data.

Abstract

Keywords
Adaptive learning, monotone coarsening, unmeasured confounding, data heterogeneity

Reviews and Discussion

Review
Rating: 4

Estimating the average treatment effect (ATE) from both trial and external control datasets is challenging due to data heterogeneity, specifically covariate shift and outcome shift. This paper proposes a doubly protected estimation framework to address these challenges.

  1. When the external control dataset is comparable to the trial dataset, the authors propose a doubly protected estimator, which corrects for covariate shift using density ratio weighting of baseline covariates.
  2. When comparability is violated (i.e., outcome shift occurs), the method selectively "borrows" only a subset of the external control dataset that remains comparable to the trial data, improving robustness.

The effectiveness of the proposed method is demonstrated via extensive simulation studies and illustrated through a real-world application in migraine treatment evaluation.

update after rebuttal

Thank you for the response and the additional experiments. I think the new results strengthen the paper and make the overall argument more convincing. I would like to keep my score as Accept.

Questions For Authors

Assumption 3.1 states $q_R(X)<1$, which implies $P(R=1\mid X) < P(R=0 \mid X)$. Should this instead be $P(R=1\mid X) < 1$, allowing for the trial population to be dominant in some covariate regions?

Claims And Evidence

The theoretical and empirical claims made in the paper are well-supported by rigorous derivations, asymptotic properties, and empirical validation through simulations.

Methods And Evaluation Criteria

The proposed methodology and evaluation criteria are appropriate for the problem at hand:

  1. The doubly protected estimator improves efficiency by integrating external controls while mitigating biases using density ratio weighting and DR-Learner.
  2. The paper extends semiparametric efficiency theory to survival analysis, allowing dynamic selection of comparable external controls.
  3. The approach is flexible, accommodating machine learning models to estimate survival curves without strong parametric assumptions.

The ATE estimation is based on restricted mean survival time (RMST). However, when the survival curve remains at a high probability at $\tau$ (as seen in Figure 3A), RMST may underestimate survival differences by neglecting the tail distribution, which can significantly contribute to the total effect. The authors should discuss this limitation in future work.

Theoretical Claims

I briefly reviewed the correctness of the proofs and did not identify any errors. The derivations are rigorous and well-structured.

Experimental Designs Or Analyses

The experimental design is sound and well-structured:

  1. The benchmarking against multiple baselines provides a comprehensive comparison.
  2. The simulation study covers various settings, including different types of bias-generating mechanisms (selection bias, unmeasured confounding, and lack of concurrency).

The only concern I have is that the feature generation process in the simulations assumes independent features and linear relationships (moreover, with all coefficients equal to 1) between the covariates, treatment assignment, and hazard function.

Supplementary Material

There is no supplementary material.

Relation To Broader Scientific Literature

This paper develops a flexible and data-adaptive framework (for covariate shift and outcome shift) that accommodates survival analysis datasets. The consistency and

Essential References Not Discussed

N/A

Other Strengths And Weaknesses

Other strengths

  1. The paper clearly articulates the problem, the proposed solution, and its impact on clinical trials.
  2. The derivations are mathematically sound and insightful, providing clear intuition alongside proofs.
  3. The simulations and real-world application convincingly support the claims.

Other Comments Or Suggestions

I believe the correct term for Assumption 3.2 should be informative censoring, rather than non-informative censoring, as stated in the paper. This assumption acknowledges that event and censoring times are dependent and only become independent when conditioned on covariates. In other words, knowing the censoring time provides information about the event time. However, this is merely a terminological distinction and does not affect the validity of the paper.

Author Response

Thanks for the careful reviews. Here are our detailed responses to your questions.

Methods And Evaluation Criteria

  1. The selection of the cutoff value $\tau$ for computing RMST is crucial in practice since the tail distribution after $\tau$ is neglected. Typically, the event rates at this cutoff value should exceed 10% to ensure sufficient data for model development. A common rule of thumb is to set $\tau$ to be around the 75%-80% quantile of observed event times. However, this choice varies case by case and should be informed by domain expertise with real-world data. A brief discussion on this will be included in the paper.
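To make the role of $\tau$ concrete, the following is a minimal sketch of RMST as the area under a Kaplan-Meier curve on $[0, \tau]$, together with the quantile rule of thumb mentioned above. The function names (`kaplan_meier`, `rmst`, `rule_of_thumb_tau`) and the plain Kaplan-Meier estimator are illustrative assumptions, not the authors' doubly protected estimator.

```python
import numpy as np

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate at each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)             # subjects still under observation
        deaths = np.sum((time == t) & event)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv

def rmst(time, event, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    t, s = kaplan_meier(time, event)
    keep = t < tau                              # everything after tau is ignored
    grid = np.concatenate(([0.0], t[keep], [tau]))
    heights = np.concatenate(([1.0], s[keep]))
    return float(np.sum(heights * np.diff(grid)))

def rule_of_thumb_tau(time, event, q=0.75):
    """Hypothetical helper: the q-quantile of the observed event times."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    return float(np.quantile(time[event], q))
```

Because `rmst` discards the curve beyond `tau`, two arms that differ only after `tau` receive the same RMST, which is the limitation the reviewer raises.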

Experimental Designs Or Analyses

  1. We have included an additional set of simulations, where for the RCT, the hazard is $\lambda_a(t\mid X) = \exp(-0.5a-0.2X_1-0.5X_2 -0.1X_3)$; for the EC, $\lambda_0(t\mid X) = \exp(-0.1X_1-0.5X_2 -0.2X_3)$ (i.e., the coefficients are no longer the same), and the covariates are generated from a multivariate normal distribution with pairwise correlation 0.5 (i.e., non-independent features). The results are presented here (https://anonymous.4open.science/r/ICML2025-7977/fig4.png), and the main findings stay the same as before.

  2. Moreover, transformations of the covariates (e.g., polynomial transformations) can be considered as data pre-processing to capture the non-linear relationship between the covariates and the time-to-event outcomes. Furthermore, our method allows the use of flexible survival models (e.g., survival random forests) for estimating the survival curves (e.g., $S_a(t\mid X)$), which could also capture the non-linear relationship.
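The additional data-generating mechanism described in point 1 can be sketched as follows. The hazard coefficients and the pairwise correlation of 0.5 come from the response; everything else (three covariates with zero mean and unit variance, 1:1 randomization, the sample sizes, the absence of censoring, and the function name `simulate`) is an assumption made only for illustration.

```python
import numpy as np

def simulate(n, beta, treat_effect=0.0, randomize=False, rho=0.5, seed=0):
    """Draw correlated covariates and exponential event times (no censoring)."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    cov = np.full((p, p), rho)          # pairwise correlation rho
    np.fill_diagonal(cov, 1.0)          # unit variances (assumed)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Trial arms are randomized 1:1 (assumed); external controls all have A = 0.
    A = rng.integers(0, 2, size=n) if randomize else np.zeros(n, dtype=int)
    # Time-constant hazard exp(treat_effect * a + beta' X), so T is exponential.
    hazard = np.exp(treat_effect * A + X @ np.asarray(beta))
    T = rng.exponential(scale=1.0 / hazard)
    return X, A, T

# RCT: lambda_a(t | X) = exp(-0.5a - 0.2 X1 - 0.5 X2 - 0.1 X3)
rct = simulate(500, beta=[-0.2, -0.5, -0.1], treat_effect=-0.5, randomize=True, seed=1)
# EC:  lambda_0(t | X) = exp(-0.1 X1 - 0.5 X2 - 0.2 X3)
ec = simulate(1000, beta=[-0.1, -0.5, -0.2], seed=2)
```

Since the two sources use different coefficient vectors, the external controls differ from the trial controls in their conditional outcome distribution, not only in their covariates.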

Other Comments Or Suggestions: We have changed Assumption 3.2 to Informative censoring, stated as $T^{(a)}\perp C\mid X, A=a, R=r$ for $a=0,1$ and $r=0,1$.

Questions For Authors: Thanks for catching this typo. We have changed Assumption 3.1 (ii) to "$0<\pi_A(X),\pi_R(X)<1$ in the support of $X$", where $\pi_A(X)=P(A=1\mid X, R=1)$ and $\pi_R(X)=P(R=1\mid X)$.
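The difference between the original and the revised condition can be written out in one line. This assumes, as the reviewer's reading suggests, that $q_R(X)$ denotes the odds of trial participation; that definition is inferred from the review and is not quoted from the paper.

```latex
q_R(X) = \frac{P(R=1\mid X)}{P(R=0\mid X)} = \frac{\pi_R(X)}{1-\pi_R(X)} < 1
\iff \pi_R(X) < \tfrac{1}{2},
\qquad\text{whereas the revised condition only requires } 0 < \pi_R(X) < 1 .
```

Under this reading, the revised assumption allows covariate regions where trial participants outnumber external controls.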

Reviewer Comment

Thank you for the response and the additional experiments. I think the new results strengthen the paper and make the overall argument more convincing. I would like to keep my score as Accept.

Author Comment

Thank you for taking the time to review our response and additional experiments. Your comments certainly strengthen our paper and will be incorporated in the final version.

Review
Rating: 2

Authors study the estimation of restricted mean survival time in a randomized controlled trial where external controls are leveraged to increase statistical power. Since there may be a conditional shift (outcome drift) between trial controls and external controls, it is well understood that doing this is not trivial and requires adjustment for it (e.g., reweighting external controls to mimic the trial controls' covariate distribution). Authors derive the efficient influence function for that estimand which motivates a doubly-robust estimator. They also propose an estimator (based on selective borrowing) that can operate when there are unmeasured confounders that drive the outcome drift between the trial and external controls.

Questions For Authors

NA

Claims And Evidence

Yes

Methods And Evaluation Criteria

The synthetic experiments seem reasonable, and the authors' methods perform better in the cases where they are expected to. There are a few things I could not quite follow about the real-world experiments, which I elaborate on below (Experimental Designs Or Analyses).

Theoretical Claims

I did not check the proofs in detail, as they are rather lengthy. The parts I skimmed through seemed correct, and the resulting expressions make sense. As the authors mention, most of the techniques for deriving EIFs are borrowed from the literature, so I would not expect any mistakes in the results.

Experimental Designs Or Analyses

Experiments seem exhaustive, but this section can benefit from polishing and organization. For the real-world experiments, what are the CGAI and CGAG trials? Creating a table of abbreviations would help.

I could not really understand the evaluation criteria for real-world experiments through PrSS. Where does that ground-truth threshold (-0.1) come from? Even if we take that as given, I had a hard time understanding what were you looking at. The real-world experiments are extremely important to justify the complicated theory in the paper. I think you have a nice dataset & experimental setup, but just need to be much more clear about what you are doing & how you are evaluating.

Supplementary Material

NA

Relation To Broader Scientific Literature

One of the main drawbacks of this paper for me is its relevance/contributions to the broader machine learning community. It focuses on a very particular estimand that is relevant for causal inference from censored data, and it develops an efficient estimator which can leverage external controls using recipes from the efficient influence functions literature. I do see how this could be useful in practice, but have a feeling that this work would be a great fit for a statistics conference/journal, but it does not contribute much on the methodology/technical side that could be generalized to/used in other ML problems.

Essential References Not Discussed

NA

Other Strengths And Weaknesses

see above (Relation To Broader Scientific Literature)

Other Comments Or Suggestions

NA

Author Response

We thank the reviewer for the thorough assessment. Below we provide a detailed response to each point.

Experimental Designs Or Analyses:

  1. We have refined and reorganized the experiment sections to align with the objectives of the simulation, including how to design the data generation mechanisms accordingly, competitors, evaluation metrics, results, and detailed discussions.

  2. The names of the real-world datasets have been changed to the EVOLVE-1 study and the REGAIN study and made consistent throughout. The EVOLVE-1 study (Evaluation of Galcanezumab vs. Placebo in the Prevention of Episodic Migraine) serves as the randomized clinical trial (CGAG is its protocol name). The REGAIN study (Evaluation of Galcanezumab vs. Placebo in Patients with Chronic Migraine) serves as the external controls (CGAI is its protocol name). The references will be included in the paper.

  3. The threshold $-0.1$ for the real data is chosen by domain knowledge, that is, reducing the RMST at month $\tau=6$ by $0.1$ is considered clinically meaningful. Further, the PrSS could be computed under various other thresholds and similar conclusions could be drawn; see https://anonymous.4open.science/r/ICML2025-7977/fig3.png.

  4. To interpret the results in Panel (C) of Figure 3, consider an example. Suppose that we aim to reach a PrSS of at most $0.6$ at month $6$: our method “adapt” only needs to recruit $100$ patients for the placebo group (solid red line at month $6$), whereas the benchmark method “aipw” needs at least $150$ patients for the placebo group (dashed green line at month $6$). Therefore, our approach could attain similar levels of PrSS with fewer patients by leveraging the external controls and thus shorten the patient enrollment period, which could eventually accelerate drug development for rare diseases.

Relation To Broader Scientific Literature

  1. First, the proposed method can be generalized to any estimand that is a function of the survival function $S_a(t)$ (e.g., the mean or median of the survival time) and is not limited to one particular estimand, as the EIFs are derived for $S_a(t)$. Let the estimand of interest be $\theta_\tau(t)=\Phi_\tau(S_a(t))$; the associated EIF can be obtained as $\psi_\theta(t) = d\Phi_\tau(q)/dq \cdot \psi_{S_a}(t)$ by Taylor expansion, where $\psi_{S_a}(t)$ is the EIF for $S_a(t)$, and the DR-learner is directly applicable to detect the outcome drifts in the estimand of interest. Thus, our proposed method should be useful to any integrative (causal or not) analysis for survival outcomes. We will emphasize this point in the paper.

  2. Survival problems, such as dropout and customer churn, are important in the machine learning (ML) community. In many ML problems, there are heterogeneous data sources that can be integrated for the same task or for domain adaptation; the critical issue is handling data heterogeneity. Our proposed selective borrowing method offers a new perspective on integrative analysis for survival outcomes with a proper way to perform inference, which could be a valuable contribution to the general ML community. We will include such discussions in the paper as well.
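As a worked illustration of the chain-rule argument in point 1, two estimands can be written in the response's notation. These are standard delta-method identities stated under smoothness assumptions (differentiability of $S_a$ and a positive density at the median); they are a sketch and are not quoted from the paper.

```latex
% RMST: a linear functional of the survival curve
\theta_a(\tau) = \int_0^{\tau} S_a(t)\,dt
\quad\Longrightarrow\quad
\psi_{\theta_a}(\tau) = \int_0^{\tau} \psi_{S_a}(t)\,dt .

% Median survival time m_a, defined by S_a(m_a) = 1/2, with density f_a = -S_a'
\psi_{m_a} = \frac{\psi_{S_a}(m_a)}{f_a(m_a)}, \qquad f_a(m_a) > 0 .
```

In both cases the influence function of the new estimand is a transformation of $\psi_{S_a}$, which is why an EIF for the survival curve is enough to cover other survival summaries.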

Review
Rating: 3

The paper introduces a new way to estimate treatment effects in survival analysis using external controls, which is especially helpful when clinical trials have small control groups, like in rare diseases. It introduces a doubly protected estimator for the restricted mean survival time (RMST) difference, combining doubly robust estimation to adjust for covariate shifts and a DR-Learner to mitigate outcome drift. By leveraging machine learning, the method flexibly models survival curves and selectively borrows external data while ensuring robustness. Empirical validation through simulations and real-data application demonstrates its practical utility.

Questions For Authors

  • It would be great if the authors could add a short paragraph on doubly robust estimators to the related work section.

Claims And Evidence

  • Doubly Protected Estimator for RMST Difference:
    • Theorem 3.5 and Theorem 3.6 provide a theoretical foundation for constructing valid confidence intervals and ensuring asymptotic properties, which strengthens the claim.
  • Handling Covariate Shifts and Outcome Drifts:
    • The use of the density ratio to adjust for covariate shifts and the DR-Learner to address outcome drift is conceptually sound. These approaches are grounded in semi-parametric theory and machine learning.
  • Asymptotic Properties:
    • The authors claim to establish asymptotic consistency and efficiency improvements for their estimator. They use the efficient influence function, which is known to provide such guarantees under regularity conditions.
  • Empirical Validation:
    • The claim that the method performs well in simulations and a real-data application is supported by the synthetic and real-data analysis of Galcanezumab for migraine headaches in Section 4.

Methods And Evaluation Criteria

The claim that the method does not require stringent parametric assumptions is plausible, as the framework incorporates flexible machine learning techniques for survival curve approximation.

Theoretical Claims

I have briefly reviewed the correctness of Theorems 3.4, 3.5, and 3.6, and they appear to be correct. The derivations and proofs seem consistent with the theoretical framework and align with established principles in the field.

Experimental Designs Or Analyses

For synthetic data, extensive simulations show robustness and efficiency gains compared to trial-only estimators and other methods. For real data: The method is applied to evaluate the efficacy of Galcanezumab in mitigating migraine headaches, illustrating its practical utility.

Supplementary Material

Yes, I reviewed some of the proofs for the main theorems in the supplementary material, and they appear to be correct.

Relation To Broader Scientific Literature

Doubly robust estimators are well-established in causal inference and missing data literature. These estimators are robust to misspecification of either the outcome model or the propensity score model, making them attractive for handling covariate shifts (e.g., Bang & Robins, 2005; Van der Laan & Rose, 2011). The authors extend doubly robust estimation to the context of survival outcomes with external controls. They use the density ratio of baseline covariates to adjust for covariate shifts and derive the efficient influence function for the restricted mean survival time (RMST) difference. This builds on semi-parametric theory (Tsiatis, 2006) and provides a principled framework for integrating external data while maintaining efficiency and robustness.

Essential References Not Discussed

I think it's well discussed.

Other Strengths And Weaknesses

The theoretical contributions, such as the derivation of the efficient influence function for the restricted mean survival time difference and the establishment of asymptotic properties, provide a rigorous foundation for integrating external data into survival analysis. The application of the method to evaluate the efficacy of Galcanezumab for migraine headaches demonstrates its practical utility.

Other Comments Or Suggestions

  • In the supplementary material, it would be easier to read if the authors restated each theorem and lemma before its proof and, in the main text, used one or two sentences to say in which section of the appendix each proof appears.

  • It would be better to include the synthetic experiment implementation code as supplementary material, rather than stating, "Our implementation codes will be made publicly available after the acceptance of this manuscript." (Additionally, "codes" should be revised to "code".)

Author Response

We thank the reviewer for the comments. Here is a detailed response to your concerns.

Other Comments Or Suggestions

  1. We will restate each theorem and lemma in the supplementary material. In the main text, we will cross-reference the proof of each theorem and lemma in the Appendix. "Theorem 3.4 is proved in Appendix A.1, Theorem 3.5 is proved in Appendix A.2, Theorem 3.6 is proved in Appendix A.3, Lemma 3.7 is proved in Appendix A.4, and Theorem 3.8 is proved in Appendix A.5".

  2. We have already prepared the codes along with one implementation example for the proposed method in the supplementary material.

Questions For Authors

  1. One short paragraph on doubly robust estimators will be included in the related work section: "However, existing integrative methods are limited by the assumption of the Cox model, either on the cause-specific or subdistribution hazard scale, which requires accurately modeling the survival probability. In recent years, semiparametric efficient and doubly robust estimators, which leverage the efficient influence function (Bickel et al., 1993; Tsiatis, 2006; van der Vaart, 2000; van der Laan & Robins, 2003), including estimating equation methodology (Hubbard et al., 2000; Robins & Rotnitzky, 1992; van der Laan & Robins, 2003) and targeted maximum likelihood estimation (van der Laan & Rubin, 2006; Rytgaard et al., 2022), have gained great popularity in many fields and are increasingly used to draw inference about treatment effects."

  1. Bickel, P. J., Klaassen, C. A. J., Ritov, Y., & Wellner, J. A. (1993). Efficient and adaptive inference in semiparametric models, Forthcoming monograph.
  2. Tsiatis, A. A. (2006). Semiparametric theory and missing data (Vol. 4). New York: Springer.
  3. van der Vaart AW (2000) Asymptotic statistics, vol 3. Cambridge University Press, Cambridge.
  4. van der Laan MJ, Robins JM (2003) Unified methods for censored longitudinal data and causality. Springer, Berlin.
  5. Hubbard AE, van der Laan MJ, Robins JM (2000) Nonparametric locally efficient estimation of the treatment specific survival distribution with right censored data and covariates in observational studies. In: Statistical models in epidemiology, the environment, and clinical trials, Springer, Berlin, pp 135–177.
  6. Robins JM, Rotnitzky A (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: AIDS epidemiology, Springer, Berlin pp 297–331
  7. van Der Laan, Mark J., and Daniel Rubin. "Targeted maximum likelihood learning." The international journal of biostatistics 2.1 (2006).
  8. Rytgaard, Helene C., Thomas A. Gerds, and Mark J. van der Laan. Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes. The Annals of Statistics 50.5 (2022): 2469-2491.
Review
Rating: 4

The authors propose a "doubly protected" estimator for treatment-specific restricted mean survival time difference in RCTs, focusing on alleviating biases commonly encountered when employing additional (i.e., non-trial-derived) external control data. Their estimator accounts for both covariate shift and outcome drift, addressing some of the most common issues preventing the usage of external control data for such problems.

Methodologically, the paper makes two main contributions:

  1. Development of an integrative estimator under an assumption of comparability
  2. Extension of the estimator from (1) to settings in which comparability may be violated

In terms of theory, the authors prove two (main) theorems (Theorem 3.6 and Theorem 3.8) establishing the estimation error of the estimator from (1) and the variance of the estimator from (2).

Empirically, the authors consider three simulation scenarios, each focused on a potential bias that might occur with external control data, including selection bias, unmeasured confounders, and lack of concurrency. The authors show that their adaptive estimator (2) performs well and can choose adaptively to what extent and which external control samples should be included for estimation, resulting in comparable bias to the trial-only estimator while achieving moderately reduced MSE and SE.

Lastly, the authors consider an exemplary real-data application of their method on an RCT of a drug for episodic migraines.

Questions For Authors

  1. The second simulation scenario was a bit unclear to me, primarily due to two things:
  • Is there a particular reason why the cond. hazard function is not indexed with $a$? If it is because there is no $-.5a$ term, what is the reason that term was left out?
  • What is the indicator on $R = 0$ doing? AFAIU, $U$ is sampled independently for $R=1$ and $R=0$, so why is the additional indicator needed? (as, also according to the text, $U$ reflects differences in the baseline hazards).
  2. Some terms are never really defined. For things like [lack of] concurrency, having at least an informal definition and/or a citation would make the paper easier to read for people coming from adjacent fields (e.g., survival).

Claims And Evidence

The authors are relatively modest in their claims and do not (IMO) oversell, focusing primarily on the point that "This approach effectively incorporates external controls without introducing biases into the integrative treatment evaluation.", which I agree with based on the presented evidence.

Methods And Evaluation Criteria

The general simulation setup makes sense. The real dataset also seems appropriate. I have several comments regarding the simulation design:

  1. The authors seem to assume time-constant hazards (i.e., data is effectively simulated from an exponential distribution) throughout - coming from a survival perspective, this seems quite restrictive, especially given that some competing methods consider e.g., Weibull distributions in their simulation [1]. Just to be clear, I think this is fine for the censoring hazard function but not necessarily the event hazard function. I don't think the current ablations on $\beta_C$ are enough and would like to see simulations including non-time-constant hazards.
  2. Similarly, given the fact that lack of accounting for time-varying outcome drift and time-varying covariate effects are presented as drawbacks of current methods in related work, I was surprised to not see the simulations directly addressing these.
  3. The simulations in [1] also investigate changes in covariate effects separately and jointly with time-varying baseline hazard differences (see 1 + 2). Unless the authors have strong reasons for not investigating these, I think they would strengthen the simulations and thus the paper.

[1] Li et al. (2023b)

Theoretical Claims

I only skimmed the proofs and have no concerns.

Experimental Designs Or Analyses

  1. Several experimental details regarding the empirical results are either missing or unclear: (i) which penalty term is used for step 2 in 3.3 in the experiments and how is $\lambda$ tuned? (ii) How are the nuisance functions estimated throughout the experiments?

Supplementary Material

I skimmed the proofs in the supplementary and reviewed the additional simulations in detail.

Relation To Broader Scientific Literature

The first part of the proposed estimator (3.2) seems like a relatively straightforward (novelty-wise) extension of [2] to the survival outcome setting. Despite this, I think the adaptive proposed method (i.e., 3.3) which stems from a combination of 3.2 and the DR-Learner framework [3, 4], is interesting, given its low bias and low-moderate improvements in terms of MSE, relative to the trial-only estimator.

[2] Gao et al. (2024)

[3] Kennedy Edward (2020)

[4] Kallus & Oprescu (2023)

Essential References Not Discussed

None that I am aware of.

Other Strengths And Weaknesses

Overall, I think this paper handles an interesting and timely problem, especially in the context of survival outcomes. While novelty could be higher, the paper itself is interesting, especially given the proposed estimators' basically non-existent cost in terms of bias.

Despite this, I think there are several points (especially in the simulations) that need to be extended and/or better explained (see e.g., also 4).

Other Comments Or Suggestions

  • Table 3, simulation scenario 2 is missing a right closed parentheses before the final curly brace.
  • AFAIK, the default ggplot2 color scheme is not particularly color-blind friendly, so I would suggest the authors switch their figures to a different palette.
  • Some colors are not matched between figures (e.g., Fig 1 top has acw in olive, bot. in green) and some figures use colors very close to another figure for something very different (e.g., Figure 2 has blue for sim. scenario two, while Fig. 1 bot. uses it for one of the estimators).
  • Some Figures use panels (Fig. 3) and some top bottom - visually, I think it's easier for readers to have it consistent.
  • The y-axis of Figure 2 is very (too, IMO) tight for "Relative Efficiency", presumably due to being forced to share it with the other facet - I would suggest relaxing this and likely making it symmetric for the relative efficiency. It may also make sense to flip the y-axis, to keep with the high -> good direction of the other facet ("Proportion of Borrowing").
Author Response

Thanks for the careful review and nice words. We hereby provide one-to-one responses to your concerns.

Methods And Evaluation Criteria

  1. The current three settings represent three typical scenarios we often encounter in practice: Setting 1, where all the ECs are comparable after adjusting for covariate shift and should be included; Setting 2, where unmeasured confounding is present, no ECs are comparable, and the external data should not be used; Setting 3, where there is a lack of concurrency, only 1/3 of ECs are comparable, and that portion of the external data should be borrowed.

  2. As suggested, we consider two additional simulation settings, and the results are presented here (https://anonymous.4open.science/r/ICML2025-7977/fig1.png).

  • (Setting 4) different covariate effects: for RCT, $\lambda_0(t\mid X)=\exp(-0.2X_1 -0.2X_2 -0.2X_3)$; for EC, $\lambda_0(t\mid X)=\exp(-0.5X_1-0.5X_2-0)$
  • (Setting 5) different time-varying hazards: for RCT, $\lambda_0(t\mid X) = t \exp(-0.2X_1-0.2X_2 -0.2X_3)$; for EC, $\lambda_0(t\mid X) = 2t \exp(-0.2X_1-0.2X_2 -0.2X_3)$.
  3. Both the proposed estimator and TransCox (Li et al. (2023b)) could handle differences in covariate effects and time-varying hazards. However, TransCox is only valid under the Cox model. If the conditional survival curve $S_a(t\mid X)$ is not a Cox model (e.g., under Settings Two and Three in the paper, if we integrate $U$ (or $\delta$) out of the hazards, the data-generating model will no longer be a Cox model), TransCox will have large biases whereas our proposed estimator still controls the bias due to its double robustness and achieves improved performance.
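Event times under the time-varying hazards of Setting 5 can be drawn by inverting the cumulative hazard. For $\lambda(t\mid X) = c\,t\,e^{\eta}$ with $\eta = -0.2(X_1+X_2+X_3)$, the cumulative hazard is $\Lambda(t\mid X) = c\,e^{\eta}t^2/2$, so $T = \sqrt{2E/(c\,e^{\eta})}$ with $E \sim \mathrm{Exp}(1)$. The sketch below assumes independent standard normal covariates and no censoring, and the function names are hypothetical; none of this is taken from the authors' code.

```python
import numpy as np

def cumulative_hazard(t, X, beta, scale):
    """Lambda(t | X) for the hazard scale * t * exp(X @ beta)."""
    return scale * np.exp(X @ np.asarray(beta)) * t**2 / 2.0

def sample_linear_hazard(X, beta, scale, rng):
    """Inverse-transform sampling: solve Lambda(T | X) = E with E ~ Exp(1)."""
    eta = X @ np.asarray(beta)
    E = rng.exponential(size=X.shape[0])
    return np.sqrt(2.0 * E / (scale * np.exp(eta)))

beta = [-0.2, -0.2, -0.2]
X = np.random.default_rng(0).normal(size=(200, 3))   # assumed covariate law

# scale = 1 for the RCT controls, scale = 2 for the external controls
T_rct = sample_linear_hazard(X, beta, scale=1.0, rng=np.random.default_rng(1))
T_ec = sample_linear_hazard(X, beta, scale=2.0, rng=np.random.default_rng(1))
```

With the same seed, each external-control time equals the corresponding trial time divided by $\sqrt{2}$, which makes the baseline-hazard difference between the two sources easy to see.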

Experimental Designs Or Analyses

  1. The penalty term is chosen to be the adaptive lasso (Zou, 2006). The conditional survival curves $S_a(t\mid X)$ and $S^C(t\mid X)$ for the event and censoring are modeled by the Cox PH model, and the propensities $\pi_R(X)$ and $\pi_A(X)$ are modeled by SuperLearner from the SuperLearner R package, as an ensemble of logistic regression and random forest. These details will be added to the Simulation section.

Other Comments Or Suggestions

  1. The comments on the tables and figures are well received. We have updated the colors of the figures using the Nature palette. Also, the point shapes now differ by estimator and scenario. The updated figures are provided here (https://anonymous.4open.science/r/ICML2025-7977/fig2/fig2_1.PNG; https://anonymous.4open.science/r/ICML2025-7977/fig2/fig2_2.PNG; https://anonymous.4open.science/r/ICML2025-7977/fig2/fig2_3.PNG).

Questions For Authors

  1. The conditional hazard function should be indexed with $a$. The original table only includes the hazard functions that have been modified under each considered setting. To avoid confusion, we explicitly list the hazard function as $\lambda_a(t \mid X, R)$ under each setting.
  • (Setting 1) $\lambda_a(t \mid X, R) = \exp(-0.5a - 1_p^T X \cdot 0.2)$;
  • (Setting 2) $\lambda_a(t \mid X, R) = \exp[-0.5a - 1_p^T X \cdot 0.2 + 3\{U + \mathbf{1}(R=0)\}]$;
  • (Setting 3) $\lambda_a(t \mid X, R) = \exp(-0.5a - 1_p^T X \cdot 0.2 + 3\delta \mathbf{1}(R=0))$.
  1. Under Setting 2, we include $U$ in the hazards for RCT as well to keep the variability of hazards the same level across two datasets. In particular, for $R=1$, $U \sim N(0,1)$ with zero-mean is included in the hazard model, whereas for $R=0$, $U+1 \sim N(1,1)$ with non-zero mean is included, which is expected to introduce more outcome drift for the external controls.

  2. Lack of concurrency will be discussed further in the Introduction, with reference to the FDA's draft guidance on the use of external controls: "Lack of concurrency could occur when RCTs and ECs are collected in different time periods or under varying healthcare settings. Therefore, directly integrating ECs with RCTs without any adjustment could introduce biases into the treatment estimation". The FDA guidance reference will be added to the paper.


  1. Li, Z., Shen, Y., and Ning, J. Accommodating time-varying heterogeneity in risk estimation under the cox model: A transfer learning approach. Journal of the American Statistical Association, 118(544):2276–2287, 2023b.
  2. H. Zou. The adaptive lasso and its oracle properties. Journal of the American statistical association, 101(476):1418–1429, 2006.
Reviewer Comment

I'd like to thank the authors for their rebuttal clarifications, additional simulation experiments, and miscellaneous fixes.

Since these have addressed my main concerns and the additional simulation results are consistent with the ones previously performed in the paper, I am raising my score to Accept.

Author Comment

Thank you for taking the time to review our rebuttal and new additional experiments, as well as for updating your score. Your comments certainly strengthen our paper and will be incorporated in the updated paper.

Final Decision

This work introduces a new way to estimate treatment effects in survival analysis using external controls, which is especially helpful when clinical trials have small control groups, as is the case in rare diseases. The work introduces a doubly protected estimator for the restricted mean survival time (RMST) difference, combining doubly robust estimation to adjust for covariate shifts and a DR-Learner to mitigate outcome drift.

Most reviewers agree that this paper handles an interesting and timely problem, especially in the context of survival outcomes. While novelty could be higher, the paper itself is interesting, especially given the proposed estimators' basically non-existent cost in terms of bias. I think the paper will be of interest to many in the ICML community.