PaperHub
6.8 / 10
Poster · 4 reviewers
Ratings: 6, 8, 7, 6 (lowest 6, highest 8, standard deviation 0.8)
Confidence: 3.5
Correctness: 3.3
Contribution: 3.0
Presentation: 3.3
NeurIPS 2024

Robust Conformal Prediction Using Privileged Information

OpenReview · PDF
Submitted: 2024-05-15 · Updated: 2025-01-09

Abstract

Keywords
Conformal Prediction · Uncertainty Quantification · Distribution Shift · Corrupted Data · Privileged Information

Reviews and Discussion

Review
Rating: 6

This paper focuses on conformal prediction with "privileged information" for constructing prediction intervals when the data suffer from missingness/noise. The proposed quantile-of-quantile approach handles the difficulty that the "privileged information" is not available for test data and achieves finite-sample coverage.

Strengths

The paper is well written and organized, presenting both solid theoretical guarantees and comprehensive simulation studies. The presentation is very good.

Weaknesses

The intuition behind the notion of "privileged information" is not well explained. It would be better if examples of Z in real applications could be introduced and justified at the beginning. There are also hyperparameters, e.g., $\beta$, in the proposed approach, for which ablation studies are not provided.

Questions

  1. [Conditional independence assumption] Consider the scenario with split conformal prediction where Z is a function of X determined using the first half of the training data, e.g., Z is the feature of X that correlates the most with Y. In this case, the conditional independence assumption may not hold precisely, and a robustness result would make the theory more complete.

  2. [Choice of beta] An ablation study on the choice of the hyperparameter $\beta$ in the simulations would be helpful for understanding the role of $\beta$; a tradeoff in terms of accuracy could be expected.

  3. [Definition of Z] In some real applications, the "privileged information" is not pre-defined, and instead, it is usually learned from data. It would be very interesting to investigate the "optimal" partition of information in X into features and "privileged information". Sufficient dimension reduction, self-supervised training, and relevant techniques could be promising.

Limitations

Limitations are discussed in the paper.

Comment

We appreciate your review of our work and thank you for the helpful suggestions and interest in our work. In what follows we respond to your concerns in detail.

The privileged information

We thank the reviewer for raising this point, which is related to the one raised by Reviewer 3YMV. For the convenience of the reviewer, we repeat our answer below. In the updated manuscript, in Example 1 and Example 2 we expanded on the intuition behind the privileged information. Specifically, in the noisy response setup outlined in Example 1, we explained that since the PI, $Z_i$, is the information about the annotator, it is likely to explain the corruption appearances $M_i$. That is, in this example, the features $X_i(0)$ and the ground truth label $Y_i(0)$ do not provide additional knowledge about the corruption indicator given $Z_i$, i.e., $X_i(0), Y_i(0) \indep M_i \mid Z_i=z$.

Additionally, we added the following example inspired by a real-world medical task. Consider a medical setup where patients are being selected for a costly diagnosis, such as an MRI scan. Here, $X_i(0)=X_i(1)$ consists of the standard medical measurements of the $i$-th patient, such as age, gender, medical history, and disease-specific measurements. The PI $Z_i$ is the information manually collected by the doctor to choose whether the patient should be examined by an MRI scan. This information is obtained through, e.g., a discussion of the doctor with the patient, or a physical examination, and could include, for instance, shortness of breath, swelling, blurred vision, etc. The response $Y_i(0)$ is the disease diagnosis obtained by the MRI scan, and $Y_i(1)=\text{'NA'}$. The missingness indicator $M_i$ equals 0 if the doctor decides to conduct an MRI scan, and 1 otherwise. At test time, our goal is to assist the doctors in future decisions before examining the patients, and hence the test PI $Z_\text{test}$ is unavailable. This task is relevant in situations where the number of available doctors is insufficient to examine all patients. Here, $Z_i$ explains the missingness $M_i$, and $M_i$ does not depend on $X_i$ or $Y_i$ given $Z_i$.

Finally, we note that PCP can produce valid uncertainty sets even if the independence assumption $(X \indep M) \mid Z$ is not satisfied. In this case, we instead assume that both $X, Z$ explain the corruption indicator, namely, $(Y \indep M) \mid (Z, X)$, and that the features are uncorrupted, $X(0)=X(1)$. Also, the weights $w_i$ are instead defined as $w_i = \frac{\mathbb{P}(M=0)}{\mathbb{P}(M=0 \mid X=x_i, Z=z_i)}$. We will clarify this point in the revised manuscript as well.

The choice of $\beta$

Thank you for bringing up this important point, which was also raised by reviewer hB3A. First, we emphasize that Theorem 1 holds for any choice of $\beta \in (0,\alpha)$. Therefore, $\beta$ only affects the sizes of the uncertainty sets. Intuitively, as $\beta \rightarrow \alpha$, a higher quantile of the weighted distribution of the scores is taken, and a lower quantile of the $Q_i$'s is taken. Similarly, as $\beta \rightarrow 0$, a lower quantile of the weighted distribution of the scores is taken, and a higher quantile of the $Q_i$'s is taken. An optimal $\beta$ can be considered as the $\beta$ that leads to the narrowest intervals. Such an optimal $\beta$ can be computed in practice over a grid of values for $\beta$ in $(0,\alpha)$, using a validation set. Nonetheless, to keep Algorithm 1 (PCP) simple and intuitive, we did not optimize for $\beta$ in the paper. In our experiments, we chose $\beta$ close to 0, so that the quantile of the weighted distribution of the scores chosen by PCP is close to the quantile chosen by (the infeasible) WCP. In response to this comment, we added this discussion to the text, and we conducted an ablation study on the choice of $\beta$ on a synthetic dataset. The results of the ablation study are provided in the PDF file in the global response. This experiment indicates that the smallest intervals are achieved for $\beta$ close to 0, and that the interval sizes are an increasing function of $\beta$. Yet, it is important to understand that different results could be obtained for different datasets. Therefore, we recommend choosing $\beta$ using a validation set, as explained above. Thank you for the opportunity to discuss this issue.
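For concreteness, the validation-based selection described above could look roughly like the following sketch. The helper `run_pcp`, which is assumed to fit PCP for a given $\beta$ and return the resulting interval widths on a held-out validation set, is a hypothetical placeholder, not the paper's code; since Theorem 1 holds for any $\beta \in (0,\alpha)$, this selection only affects set sizes, not validity.

```python
import numpy as np

def select_beta(run_pcp, alpha, n_grid=20):
    """Pick the beta in (0, alpha) whose PCP intervals are narrowest on a validation set.

    run_pcp(beta) is a user-supplied callable (assumed here) that runs PCP with the
    given beta and returns an array of interval widths on the validation set.
    """
    # Grid of candidate values strictly inside (0, alpha).
    grid = np.linspace(alpha / n_grid, alpha * (1 - 1.0 / n_grid), n_grid)
    avg_width = [np.mean(run_pcp(beta)) for beta in grid]
    return grid[int(np.argmin(avg_width))]

# Hypothetical usage: best_beta = select_beta(run_pcp, alpha=0.1)
```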

Definition of Z

Definitely! We agree that in some tasks it might be difficult to collect privileged information. Indeed, we view the problem of choosing an “ideal” privileged information that satisfies the conditional independence assumption as a thrilling future research direction.

Comment

Thank you for the comments! I'll maintain my score at this moment. I think it is a very interesting paper, and it raises the idea of leveraging side information in (conformal) inference. I'll be more than happy to see a discussion section on (1) how to identify privileged information and (2) potential extensions beyond the conditional independence assumption. Thanks!

Comment

We are appreciative of the reviewer's comments and for acknowledging our previous response. We are thankful for the opportunity to further clarify and expand on the points raised. Below, we address each comment in detail.

Identifying privileged information

Identifying appropriate privileged information is indeed a challenging task. In general, this requires expert knowledge, similar to the unconfoundedness assumption in causal inference applications used for treatment effect estimation, for example. In more detail, our solution lies in the conditional independence assumption: $(X(0), Y(0)) \indep M \mid Z=z$. In words, $Z$ should explain the information about the corruption appearances that is encapsulated in $X(0)$ and $Y(0)$. For instance, in our noisy labels setup in Example 1, the information about the annotator is a good candidate for privileged information, as the noise pattern depends directly on the annotator. Yet, the challenge here is that the conditional independence requirement cannot be directly tested in practice, as we only observe $(X(0), Y(0)) \mid M=0$. In view of unconfoundedness, our work introduces a relaxation of this assumption as we do not assume that $Z$ is observed at test time.

From a practical perspective, we found that PCP can produce valid uncertainty sets even if the independence assumption is not satisfied. For example, in our real-world noisy response experiment from Section 4.3, PCP attained the target coverage rate even though the conditional independence assumption was not confirmed. We believe this is attributed to our formulation of $Z$ as a variable that is highly correlated with $M$. Intuitively, when $Z$ explains the corruption indicator $M$ well, it is sensible to believe that the conditional independence requirement is approximately satisfied.

More formally, below we analyze the setting where the conditional independence assumption is violated and provide a lower bound for the coverage rate attained by PCP. In simple words, this new result reveals that the smaller the violation of the conditional independence assumption, the closer the coverage rate achieved by PCP is to the nominal level.

Extension beyond the conditional independence assumption

We are grateful for the opportunity to discuss this topic. We propose an initial extension of Theorem 1 to a setting where the conditional independence assumption is not fully satisfied. For the simplicity of this initial extension, we assume $X(0) \indep M \mid Z=z$.

The independence assumption $Y(0) \indep M \mid Z=z$ is equivalent to assuming that the density of $Y(0) \mid M=m, Z=z$ is the same for $m \in \{0,1\}$, formally:

$$f_{Y(0) \mid M=0, X=x, Z=z}(y; 0, x, z) = f_{Y(0) \mid M=1, X=x, Z=z}(y; 1, x, z).$$

In our extension, we relax this assumption and instead require that for all $x \in \mathcal{X}$ there exists $\varepsilon_x \in \mathbb{R}$ such that the difference between the two densities is bounded by $\varepsilon_x$:
$$\forall y \in \mathcal{Y},\, z \in \mathcal{Z}: \quad \left| f_{Y(0) \mid M=0, X=x, Z=z}(y; 0, x, z) - f_{Y(0) \mid M=1, X=x, Z=z}(y; 1, x, z) \right| \leq \varepsilon_x.$$

**Theorem [Robustness of PCP to conditional independence violation]**

Suppose that $\{(X_i(0), X_i(1), Y_i(0), Y_i(1), Z_i, M_i)\}_{i=1}^{n+1}$ are exchangeable, $P_Z$ is absolutely continuous with respect to $P_{Z \mid M=0}$, and for all $x \in \mathcal{X}$ there exists $\varepsilon_x \in \mathbb{R}$ such that
$$\forall y \in \mathcal{Y},\, z \in \mathcal{Z}: \quad \left| f_{Y(0) \mid M=0, X=x, Z=z}(y; 0, x, z) - f_{Y(0) \mid M=1, X=x, Z=z}(y; 1, x, z) \right| \leq \varepsilon_x.$$
Then, the coverage rate of the prediction set $C^{\text{PCP}}(X^{\text{test}})$ constructed according to Algorithm 1 is lower bounded by:

$$\mathbb{P}\left(Y^{\text{test}} \in C^{\text{PCP}}(X^{\text{test}})\right) \geq 1 - \alpha - \mathbb{E}_{X,Z}\left[\, \left| C^{\text{PCP}}(X) \right| \, \varepsilon_X \, \mathbb{P}(M=1 \mid X, Z) \right].$$

We omit the proof due to space limitations; we can send it in a separate comment.

This result provides a lower bound for the coverage rate of PCP in the setting where the conditional independence assumption is not exactly satisfied. Intuitively, as $\varepsilon_x$ decreases, i.e., as the two distributions $Y(0) \mid M=m, Z=z$ for $m \in \{0,1\}$ get closer to each other, the lower bound becomes tighter and closer to the target level. Similarly, as the two distributions diverge, the lower bound becomes looser.
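As a purely illustrative calculation with made-up numbers (not taken from the paper or its experiments): if $\alpha = 0.1$ and, almost surely, $|C^{\text{PCP}}(X)| \leq 2$, $\varepsilon_X \leq 0.02$, and $\mathbb{P}(M=1 \mid X, Z) \leq 0.5$, then the expectation in the bound is at most $2 \cdot 0.02 \cdot 0.5 = 0.02$, and hence
$$\mathbb{P}\left(Y^{\text{test}} \in C^{\text{PCP}}(X^{\text{test}})\right) \geq 1 - 0.1 - 0.02 = 0.88,$$
i.e., at most a two-point drop below the nominal 90% coverage under this hypothetical level of violation.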

We once again thank the reviewer for these insightful suggestions, which significantly improve our paper. We will include these discussions in the revised manuscript.

Review
Rating: 8

This paper introduces a method to create prediction sets with guaranteed coverage in the presence of training data that is corrupted by missing or noisy variables. The approach is an extension of conformal prediction that works by assuming access to privileged information available during training. This information is used to account for distribution shifts caused by the corruptions. The proposed method is supported by theoretical coverage guarantees and empirical examples are used to demonstrate that the approach produces more reliable and informative predictions compared to existing methods on real and synthetic datasets.

Strengths

This paper stands out for its originality in addressing issues caused by corrupted training data through a novel extension of conformal prediction. Through a rather common assumption of access to privileged information, the authors are able to develop an effective solution for obtaining conformal prediction sets despite the distribution shift caused by the corrupted data. The main results are rigorously proven and the presentation of the results and proofs is clear. The difficulty in proving Theorem 1 is well explained, and thus the quality and significance of obtaining the result are evident. The practicality of this result is apparent in that it potentially opens new possibilities for applying conformal prediction in high-stakes applications where there may be corrupted data.

Weaknesses

The empirical evaluation, though comprehensive, could benefit from a broader range of real-world datasets. Also some discussion around access to and potential surrogates for the privileged information could better highlight the scope of the work. For example, some discussion on what could constitute privileged information and ideally even what characteristics of the information would make it most useful to the method would be interesting and really enhance the work.

Questions

Could some discussion around what constitutes "ideal" privileged information for this particular method be possible?

Limitations

Yes, sufficiently addressed.

Author Response

We very much appreciate your positive feedback and interest in our work. We thank the reviewer for classifying our contribution as a novel one. We also thank the reviewer for their helpful comments and suggestions. In what follows, we address your comments in detail. $\newcommand{\indep}{\perp\!\!\!\perp}$

The Privileged Information

We thank the reviewer for raising this point, which is related to a comment raised by Reviewer R7TW. In the updated manuscript, in Example 1 and Example 2 we expanded on the intuition behind the privileged information. Specifically, in the noisy response setup outlined in Example 1, we explained that since the PI, $Z_i$, is the information about the annotator, it is likely to explain the corruption appearances $M_i$. That is, in this example, the features $X_i(0)$ and the ground truth label $Y_i(0)$ do not provide additional knowledge about the corruption indicator given $Z_i$, i.e., $X_i(0), Y_i(0) \indep M_i \mid Z_i=z$.

Additionally, we added the following example inspired by a real-world medical task. Consider a medical setup where patients are being selected for a costly diagnosis, such as an MRI scan. Here, $X_i(0)=X_i(1)$ consists of the standard medical measurements of the $i$-th patient, such as age, gender, medical history, and disease-specific measurements. The PI $Z_i$ is the information manually collected by the doctor to choose whether the patient should be examined by an MRI scan. This information is obtained through, e.g., a discussion of the doctor with the patient, or a physical examination, and could include, for instance, shortness of breath, swelling, blurred vision, etc. The response $Y_i(0)$ is the disease diagnosis obtained by the MRI scan, and $Y_i(1)=\text{'NA'}$ is the missing value. The missingness indicator $M_i$ equals 0 if the doctor decides to conduct an MRI scan, and 1 otherwise. At test time, our goal is to assist the doctors in future decisions before examining the patients, and hence the test PI $Z_\text{test}$ is unavailable. This task is relevant in situations where the number of available doctors is insufficient to examine all patients. Here, $Z_i$ explains the missingness $M_i$, and $M_i$ does not depend on $X_i$ or $Y_i$ given $Z_i$.

Finally, we note that PCP can produce valid uncertainty sets even if the independence assumption $(X \indep M) \mid Z$ is not satisfied. In this case, we instead assume that both $X, Z$ explain the corruption indicator, namely, $(Y \indep M) \mid (Z, X)$, and that the features are uncorrupted, $X(0)=X(1)$. Also, the weights $w_i$ are instead defined as $w_i = \frac{\mathbb{P}(M=0)}{\mathbb{P}(M=0 \mid X=x_i, Z=z_i)}$. We will clarify this point in the revised manuscript as well.

Review
Rating: 7

The authors introduce a calibration method called Privileged Conformal Prediction (PCP) to generate prediction sets that guarantee coverage on uncorrupted test data, even when the target label ($Y$) and/or input features ($X$) in the calibration data are corrupted (e.g., missing or noisy variables). The key innovation is leveraging privileged information (PI), additional features ($Z$) available during training but not at test time, to handle distribution shifts induced by corruptions. They assume that the input and target features of clean data $(X(0), Y(0))$ are independent of the corruption indicator variable ($M$) given the privileged information ($Z$). This allows the authors to treat this setting as a specific case of covariate shift (and weighted conformal prediction), with the added challenge that the PI variable is not available at test time ($Z_\text{test}$). To address this, they propose a reformulation of the weighted conformal prediction framework, where an estimate of the non-conformity score threshold without $Z_\text{test}$ is obtained by considering a conservative estimate based on a quantile of the calibration thresholds. Experiments on real and synthetic datasets show that PCP achieves valid coverage rates and constructs more informative predictions than existing methods that lack theoretical guarantees.
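As a rough illustration of this quantile-of-quantile idea, here is a schematic sketch. It is not the paper's Algorithm 1: the inner level $1-\alpha+\beta$, the outer level $1-\beta$, and the conservative handling of the unknown test score are assumptions inferred from the authors' discussion of $\beta$ elsewhere in this thread; the exact construction, including finite-sample corrections, follows eq. (8) in the paper.

```python
import numpy as np

def weighted_quantile(values, weights, level):
    """Smallest value whose normalized cumulative weight reaches `level`."""
    order = np.argsort(values)
    values, weights = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(weights) / np.sum(weights)
    idx = int(np.searchsorted(cum, level))
    return values[min(idx, len(values) - 1)]

def pcp_threshold_sketch(scores, w, alpha, beta):
    """Schematic leave-one-out, quantile-of-quantile threshold.

    scores : non-conformity scores of the clean calibration points (M_i = 0)
    w      : w_i = P(M=0) / P(M=0 | Z=z_i) for the same points
    """
    scores, w = np.asarray(scores, float), np.asarray(w, float)
    n = len(scores)
    Q = np.empty(n)
    for i in range(n):
        # Let calibration point i stand in for the test point, since z_test is
        # unavailable; its unknown score is replaced by +inf (the conservative choice).
        loo = np.delete(np.arange(n), i)
        vals = np.append(scores[loo], np.inf)
        wts = np.append(w[loo], w[i])
        Q[i] = weighted_quantile(vals, wts, 1 - alpha + beta)  # assumed inner level
    return np.quantile(Q, 1 - beta)  # assumed outer level over the Q_i's
```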

Strengths

The paper is well-presented, motivated, clear, and easy to follow. The problem seems relevant and novel within the context of conformal prediction, as far as I know. The authors effectively present the hypothesis and challenges of their problem. Discussing the two-stage approach before introducing the proposed solution is beneficial, as one might initially consider estimating Ztest from the input features and building the weight based on that estimate. I think the proposed method, where the authors estimate the non-conformity score threshold without Ztest by considering a conservative estimate based on a quantile of the calibration thresholds, is well-motivated and supported by theoretical guarantees, as presented in Theorem 1. Moreover, experimental results show the benefits of the proposed approach.

Weaknesses

I do not see major weaknesses with this work; the authors address the limitations of their approach in Section 5.

Questions

.

Limitations

Yes.

Author Response

We are very grateful for the time and effort you put into this review. We are also very appreciative of your encouragement and positive assessment of our work. Your feedback is very important to us! Thank you again for your support.

Review
Rating: 6

This paper is studying the problem of conformal prediction in the presence of data (covariate or label) corruption leveraging privileged data. They build upon the framework of weighted conformal prediction by introducing a novel leave-one-out weighting technique which produces a conservative (upper-bound) estimate of the original threshold of the weighted conformal prediction. They further theoretically and experimentally evaluate the coverage validity of their method.

Strengths

  • The presented framework is very general and can potentially be applied in a range of problems in practice.
  • Their method is very intuitive and insightful in the sense that their leave-one-out technique can potentially be applied to adjacent problems in conformal prediction.
  • The presentation of the paper is very nice. I particularly like the authors' decision to present the "two-stage naive" method first to prepare the readers to fully absorb the underlying challenge of the problem.

Weaknesses

I am willing to give a thumbs up for this paper. However, before that, I would appreciate it if the authors could respond to the following concerns:

  1. I don't understand how one can compute the quantiles ($Q_i$) using eq. (8). It looks like one needs to know the values of $w_i$ in order to compute $Q_i$. However, I cannot find an explanation in the paper on how to compute $w_i$ from data. A formulation of $w_i$ is given in the paper using the probability distribution of $M$ and $Z$. Are the authors assuming that the distribution of $(Z, M)$ is known? If not, how can one compute $w_i$? If $w_i$ is meant to be estimated from data, first, how should that estimation be done, and second, how does that estimation affect the coverage validity theorems? This is an important concern, and either way, this should be clearly stated before presenting the algorithm. In the current format, the algorithm is not complete!

  2. It is interesting, yet concerning, that there is no trace of the choice of $\beta$ in the presented theory. Can the authors comment on the choice of $\beta$ from the points of view of theory and practice? It is odd to me that the algorithm might produce meaningful prediction sets for both $\beta = 0.00001$ and $\beta = 0.99999$! It is then natural to ask how one should tune $\beta$ in practice and how that choice affects the prediction set size and coverage validity.

Questions

  • How to compute $w_i$?
  • How to tune $\beta$? I might ask more questions after I hear the authors' response.

Limitations

There is a limitation section that covers most of the limitations. However, I believe some of the limitations discussed in the very last section of the paper (specifically all the theoretical assumptions) must be presented and discussed much earlier in the paper.

Author Response

We appreciate your positive and valuable feedback and suggestions. In what follows, we address your concerns in detail.

The weights $w_i$

We thank the reviewer for raising this point. As the reviewer suggested, the real ratios of likelihoods, $w_i$, are required to provide the validity guarantee in Theorem 1. While PCP can be applied with estimates of $w_i$, which can be computed with estimates of $\mathbb{P}(M=0 \mid Z=z)$ according to equation (3), the validity guarantee does not hold in this case. This restriction is similar to WCP, which also requires the true weights $w_i$ to provide a validity guarantee. The effect of inaccurate estimates of $w_i$ on the coverage rate attained by PCP could be an exciting future direction to explore, e.g., by borrowing ideas from [1].

Following the reviewer's comment, we updated Algorithm 1 (PCP) in the text and added $w_i$ as an input. Furthermore, in Section 3.2 we explained that if the real weights $w_i$ are unavailable, they can be extracted from estimates of $\mathbb{P}(M=0 \mid Z=z)$ according to equation (3), and that this conditional probability can be estimated from the training data using any off-the-shelf classifier. We also clarified in the updated manuscript that Theorem 1 does not hold if PCP is not used with the oracle weights.

[1] Yonghoon Lee, Edgar Dobriban, and Eric Tchetgen Tchetgen. Simultaneous conformal prediction of missing outcomes with propensity score $\epsilon$-discretization. arXiv preprint arXiv:2403.04613, 2024.

The choice of $\beta$

Thank you for bringing up this important point, which was also raised by reviewer R7TW. First, we emphasize that Theorem 1 holds for any choice of $\beta \in (0,\alpha)$. Therefore, $\beta$ only affects the sizes of the uncertainty sets. Intuitively, as $\beta \rightarrow \alpha$, a higher quantile of the weighted distribution of the scores is taken, and a lower quantile of the $Q_i$'s is taken. Similarly, as $\beta \rightarrow 0$, a lower quantile of the weighted distribution of the scores is taken, and a higher quantile of the $Q_i$'s is taken. An optimal $\beta$ can be considered as the $\beta$ that leads to the narrowest intervals. Such an optimal $\beta$ can be computed in practice over a grid of values for $\beta$ in $(0,\alpha)$, using a validation set. Nonetheless, to keep Algorithm 1 (PCP) simple and intuitive, we did not optimize for $\beta$ in the paper. In our experiments, we chose $\beta$ close to 0, so that the quantile of the weighted distribution of the scores chosen by PCP is close to the quantile chosen by (the infeasible) WCP. In response to this comment, we added this discussion to the text, and we conducted an ablation study on the choice of $\beta$ on a synthetic dataset. The results of the ablation study are provided in the PDF file in the global response. This experiment indicates that the smallest intervals are achieved for $\beta$ close to 0, and that the interval sizes are an increasing function of $\beta$. Yet, it is important to understand that different results could be obtained for different datasets. Therefore, we recommend choosing $\beta$ using a validation set, as explained above. Thank you for the opportunity to discuss this issue.

Limitations

We thank the reviewer for raising this topic. We added a discussion about the limitations of PCP in Section 3.2. Specifically, we clarified that the true conditional corruption probability $\mathbb{P}(M=1 \mid Z=z)$ must be known to provide a theoretical coverage validity guarantee.

We thank the reviewer for these comments, which greatly improve our paper!

Comment

I thank the authors for their responses. The argument and the numerical evaluation for the value of $\beta$ are well taken. I also encourage the revisions with respect to the values of $w_i$, as suggested by the authors.

What is still missing for me is a principled way of estimating $w_i := \frac{\mathbb{P}(M=0)}{\mathbb{P}(M=0 \mid Z=Z_i)}$. In the case of WCP, the estimated ratios are likelihood ratios of two distributions, which can be effectively estimated from unlabeled data (at least when the covariates are low-dimensional), and there is a rich literature on that matter. Here one has to estimate a specific ratio. Therefore, I would appreciate it if the authors could elaborate more on their comment "this conditional probability can be estimated from the training data, using any off-the-shelf classifier".

Comment

We thank the reviewer for their comments and for acknowledging our previous response. Below, we provide a detailed explanation of the approach we recommend for estimating $w_i$, which we also employed in our experiments.

First, we estimate the conditional corruption probability given $Z$, i.e., $\mathbb{P}(M=0 \mid Z=z)$, using the training and validation sets with any off-the-shelf classifier, such as a random forest, XGBoost, or a neural network. This classifier takes the PI $Z$ as an input and outputs an estimate of the conditional corruption probability, which we denote by $\hat{p}(M=0 \mid Z=z)$. Notice that this classifier can be fit on unlabeled data, similar to the approach suggested by the reviewer. In our experiments, we primarily used neural networks, and occasionally random forests or XGBoost. The specific models used for each dataset are detailed in Table 2 of Appendix D1. Next, we estimate the marginal probability $\mathbb{P}(M=0)$ directly from the data:
$$\hat{p}(M=0) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{M_i = 0\}.$$
Finally, the estimated weights are computed according to equation (3), using the estimated probabilities:
$$\hat{w}_i = \hat{w}(z_i) = \frac{\hat{p}(M=0)}{\hat{p}(M=0 \mid Z=z_i)}.$$
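For concreteness, a minimal sketch of this estimation procedure, using scikit-learn's random forest as one of the off-the-shelf classifiers mentioned above; the variable names and data split below are illustrative choices, not the paper's code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def estimate_weights(Z_train, M_train, Z_calib, M_calib):
    """Estimate w_i = P(M=0) / P(M=0 | Z=z_i) for the clean calibration points.

    Z_*: privileged information, shape (n, d); M_*: corruption indicators (0 = clean).
    """
    # 1) Fit a classifier predicting the corruption indicator M from the PI Z.
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(Z_train, M_train)

    # 2) Estimate the marginal probability of observing clean data.
    p_clean = np.mean(np.asarray(M_calib) == 0)

    # 3) Conditional probability P(M=0 | Z=z_i), read off the column of class 0.
    col = list(clf.classes_).index(0)
    p_clean_given_z = clf.predict_proba(Z_calib)[:, col]

    # 4) Weights for the clean calibration points only (M_i = 0).
    keep = np.asarray(M_calib) == 0
    return p_clean / p_clean_given_z[keep]
```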

We appreciate the opportunity to elaborate on this integral aspect of our method and we will clarify this point in the revised manuscript. We apologize for any confusion and hope this discussion resolves the reviewer’s concerns. Please let us know if there are any questions, comments, or concerns left.

Comment

I thank the authors for their detailed responses. My major concerns are addressed and I would like to increase my score and vote for acceptance.

Author Response

Dear Reviewers,

Thank you for your time and effort in reviewing our submission and providing valuable feedback and suggestions.

In response to the reviewers' comments, we have attached to this reply a PDF file containing the results of an ablation study analyzing the effect of $\beta$ on Algorithm 1 (PCP).

Once again, we thank the reviewers for helping us improve our paper!

Final Decision

This paper considers the problem of conformal prediction in the presence of data corruption assuming the availability of some privileged information (i.e. additional features that explain the distribution shift). They build upon the framework of weighted conformal prediction by introducing a novel leave-one-out weighting technique which produces a conservative (upper-bound) estimate of the original threshold of the weighted conformal prediction. The authors evaluate the coverage validity of their method both theoretically and experimentally.

All the reviewers agree that the paper provides a novel set of ideas and methodologies for the important problem of robust CP in the presence of corruption in the training data. Hence, I recommend that the paper be accepted.