PaperHub
Average rating: 4.0 / 10 · Decision: Rejected · 4 reviewers
Ratings: 5, 3, 5, 3 (min 3, max 5, std. dev. 1.0)
Average confidence: 3.5
ICLR 2024

Causal Representation Learning and Inference for Generalizable Cross-Domain Predictions

OpenReview · PDF
Submitted: 2023-09-20 · Updated: 2024-02-11
TL;DR

We propose a causal representation learning framework based on a novel SCM

Abstract

Keywords
Generalizable representation learning; Causal Intervention

Reviews and Discussion

Official Review
Rating: 5

The authors propose a domain generalization algorithm motivated by causality. They specify two latent variables $u_x$ and $u_{xy}$ whose marginals can shift between training and testing. Their approach aims to become invariant to the aforementioned latent variables by intervening on their common child $z_s$, which closes the backdoor path between the input and target.

Strengths

The authors tackle the important problem of DG, and their approach shows promising empirical results. The paper is well-written and clearly motivated. Also, their approach is interesting in that, unlike many existing DG algorithms, it does not require environment labels.

Weaknesses

I found three technical issues with the paper. One is a major issue, and two are minor.

  1. (Major) The algorithm does not perform its stated purpose of being invariant to shifts in $u_x$ and $u_{xy}$. The predictive distribution in Eq. (2) is not invariant to $u_x$ and $u_{xy}$, since it involves an expectation over $p(z_s \mid x)$, which can shift across training and testing (see the display after this list).

  2. (Minor) The posterior is assumed to factorize as $q(z_c, z_s \mid x) = q(z_c \mid x)\, q(z_s \mid x)$, which is at odds with the assumed causal graph.

  3. (Minor) The authors cite Kivva 2022 to claim that their standard normal prior $p(z_c)$ is a one-component Gaussian mixture, and therefore $z_c$ is identifiable (along with the piecewise affine decoder assumption). Calling a standard normal distribution a Gaussian mixture is technically true, but this identifiability argument is a bit tenuous.
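For concreteness, the prediction rule questioned in point 1 can be written as the following display (a reconstruction from the description above, not a verbatim copy of the paper's Eq. (2)):

$$p(y \mid x) \;=\; \mathbb{E}_{p(z_s \mid x)}\big[\,p(y \mid x, \mathrm{do}(z_s))\,\big] \;=\; \int p(y \mid x, \mathrm{do}(z_s))\, p(z_s \mid x)\, \mathrm{d}z_s .$$

Even if the inner interventional term were domain-invariant, the outer averaging distribution $p(z_s \mid x)$ inherits the shift in $p(u_x)$, so the composite rule need not be invariant.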

Questions

Please address my three points above in the "weaknesses" section.

Comment

Factorized variational distribution $q$: While we acknowledge that the true posterior $p(Z_c, Z_s \mid X)$ cannot be factorized under our SCM assumption, it is generally intractable because only $X$ and $Y$ are observed in the SCM, and $p(Z_c, Z_s \mid X)$ is likely to be complex and non-Gaussian. As an alternative, we approximate $p(Z_c, Z_s \mid X)$ by a variational distribution $q(Z_c, Z_s \mid X)$ under Gaussian assumptions. Although using non-factorized Gaussians might slightly improve results, it requires more time to estimate the large covariance matrix; there is always a trade-off between accuracy and efficiency. We believe our assumption, commonly used in many VAEs, is reasonable and effective. Moreover, our empirical results indicate that the factorized Gaussian approximation leads to better OOD generalization. We appreciate your suggestion and will include an analysis of non-factorized Gaussians in our revised paper.
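For illustration, here is a minimal PyTorch sketch of the factorized Gaussian encoder assumption being discussed; the module names and dimensions are hypothetical, not taken from the paper:

```python
import torch
import torch.nn as nn

class FactorizedEncoder(nn.Module):
    """Variational encoder with q(z_c, z_s | x) = q(z_c | x) q(z_s | x),
    each factor a diagonal Gaussian. Illustrative sketch only."""

    def __init__(self, x_dim=784, h_dim=256, zc_dim=16, zs_dim=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        # Separate heads for the two factors; no cross-covariance is modeled.
        self.zc_head = nn.Linear(h_dim, 2 * zc_dim)  # mean and log-variance
        self.zs_head = nn.Linear(h_dim, 2 * zs_dim)

    def forward(self, x):
        h = self.backbone(x)
        zc_mu, zc_logvar = self.zc_head(h).chunk(2, dim=-1)
        zs_mu, zs_logvar = self.zs_head(h).chunk(2, dim=-1)
        # Reparameterized samples from the two independent Gaussian factors.
        zc = zc_mu + torch.randn_like(zc_mu) * (0.5 * zc_logvar).exp()
        zs = zs_mu + torch.randn_like(zs_mu) * (0.5 * zs_logvar).exp()
        return (zc, zc_mu, zc_logvar), (zs, zs_mu, zs_logvar)
```

A non-factorized alternative would predict a joint covariance over the concatenated $[z_c, z_s]$, which is the costlier option the response refers to.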

Identifiability: We emphasize that our proof of identifiability, leveraging the results of [1], is mathematically correct. However, it is important to note that this proof is not our main contribution. Our focus is on designing the conditional distributions in our SCM so that the learned latent representations are identifiable. While Kivva et al. have proven identifiability in a variety of cases, our work requires only a specific instance to effectively learn our SCM. We model $p(Z_c)$ as a standard Gaussian rather than a mixture of Gaussians, chosen for its simplicity and because we lack additional prior information about $Z_c$. This assumption aligns with the prior distributions commonly used in many VAEs.

[1] Kivva, Bohdan, et al. "Identifiability of deep generative models without auxiliary information." Advances in Neural Information Processing Systems 35 (2022): 15687-15701.

Comment

Invariance of $p(Y \mid X, \mathrm{do}(Z_s))$: To address the reviewer's concerns, we first state and rectify the assumptions we make regarding the latent variables $U_x, U_{xy}$. We then explain that under these assumptions, $p(Y \mid X, \mathrm{do}(Z_s))$ is invariant and transportable across domains. Finally, we explain how to obtain $Z_s$ values for computing $p(Y \mid X, \mathrm{do}(Z_s))$ and why approximating an invariant transformation between $Z_s$ and $X$ is a reasonable choice.

  • Assumptions: Upon careful review, we acknowledge that our assertion regarding the variation of $p(U_{xy})$ was overstated. It is imperative that the confounding effects between the source and target domains remain consistent for the proposed interventional distribution to be invariant and transportable. Hence we correct our initial assumption: the confounding effects, encompassing $p(U_{xy})$ and the causal mechanisms among $U_{xy}$, $Z_s$, and $Y$, remain consistent across training and test domains. Moreover, we assume the distribution $p(U_x)$ varies across domains. This results in the variation of $p(Z_s)$, and in turn of $p(X)$. However, we emphasize that the generative mechanism $p(X \mid Z_s, Z_c)$ remains invariant across domains ($X$ is independent of $U_x, U_{xy}$ given $Z_c, Z_s$). Otherwise, it would be impossible to infer $Z$ from any unseen domain.

  • Invariance of $p(Y \mid X, \mathrm{do}(Z_s))$: Our proposed interventional distribution, by setting specific values for $Z_s$ to mitigate the influence of the domain-specific variable $U_x$ on $Z_s$, effectively accounts for the invariant confounding effects and prevents $U_x$ from influencing $Y$. Hence, it is invariant and transportable across domains.

  • The choice of $Z_s$: Under our corrected assumption, the latent confounding effect remains invariant across domains. To infer the label for a test input $x^t$ using our interventional distribution, it is imperative to provide the true value of $Z_s$ for $x^t$ to accurately account for the confounding effects between $x^t$ and $y^t$. The distribution $p(Z_s \mid X)$ used in Eq. (3) can be construed as a proxy distribution that enables us to derive the true values of $Z_s$ from an input $X$. The challenge lies in determining which learned distribution can be employed to approximate this proxy distribution. According to the SCM, $X$ is independent of $U$ given $Z = [Z_c, Z_s]$, rendering $p(X \mid Z)$ invariant and transportable. Therefore, a reasonable choice is to obtain the identifiable $z_s^t$ from the learned variational distribution $q(Z_s \mid X = x^t)$ with a high $p(X = x^t \mid Z_s = z_s^t)$. However, as $q(Z_s \mid X = x^t)$ is a variational estimate of the desired proxy distribution, we average the interventional distribution over multiple samples of $z_s$ drawn from $q(Z_s \mid X = x^t)$; a sketch of this procedure is given below. In practice, a randomly drawn $z_s$ value is likely to yield a low $p(X \mid Z)$ and contribute minimally to the interventional distribution.
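A minimal sketch of this weighted averaging at test time, with hypothetical `encoder`, `decoder`, and `classifier` interfaces standing in for the learned $q(Z \mid X)$, $p(X \mid Z)$, and $p(Y \mid Z_c)$ (an illustration of the idea, not the paper's implementation):

```python
import torch

@torch.no_grad()
def interventional_predict(x, encoder, decoder, classifier, num_samples=32):
    """Monte Carlo estimate of E_{q(z_s|x)}[ p(y | x, do(z_s)) ].
    Samples whose reconstruction likelihood p(x|z) is low receive
    little weight, as described above. Hypothetical interfaces."""
    probs, log_weights = [], []
    for _ in range(num_samples):
        zc = encoder.sample_zc(x)  # assumed helper: z_c ~ q(z_c | x)
        zs = encoder.sample_zs(x)  # assumed helper: z_s ~ q(z_s | x)
        # Compatibility of this latent sample with the input: log p(x | z_c, z_s).
        log_weights.append(decoder.log_prob(x, zc, zs))
        # Classification from the causal factor only: p(y | z_c).
        probs.append(classifier(zc).softmax(dim=-1))
    w = torch.stack(log_weights).softmax(dim=0)  # normalize weights over samples
    return (w.unsqueeze(-1) * torch.stack(probs)).sum(dim=0)
```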

Nevertheless, it is crucial to underscore that, with the corrected assumption, our CIIRL remains innovative and has proven its efficacy across various benchmark distribution-shift datasets: 1) With the revised assumptions, our proposed interventional distribution maintains invariance and transportability, facilitating cross-domain inference. 2) Our training procedure does not explicitly rely on the variation of $p(U_{xy})$. The distinct classes of $U$ can be associated with the diverse domain variable $U_x$, and our training procedure can still effectively distinguish the two types of representations. 3) Empirical results affirm the existence of invariant latent confounders, as evidenced by the overall superior OOD prediction performance of CIIRL compared to predictions based solely on $Z_c$.

Official Review
Rating: 3

This paper aims to solve the problem of out-of-distribution classification using a causal approach. In the problem setting, the features $X$ are caused by causal latent variables $Z_C$ and spurious latent variables $Z_S$ and are correlated with labels $Y$ through both sets of latent variables. A typical classifier predicts $P(Y \mid X)$, using the correlation through both $Z_S$ and $Z_C$. However, under distribution shift, the distribution of unobserved variables affecting $Z_S$ is changed, so using the spurious latent variables for classification can result in incorrect predictions out-of-distribution. Instead, the paper proposes using $P(Y \mid X, \mathrm{do}(Z_S))$ for classification, which severs the correlation between $Y$ and $X$ through $Z_S$ via a causal intervention, thus providing a quantity that is invariant across domains. Estimating this quantity requires learning encoders which map $X$ to $Z_C$ and $Z_S$, a decoder which maps $Z_S$ and $Z_C$ back to $X$, and a classifier $P(Y \mid Z_C)$. This is done by optimizing a variational bound on the log-likelihood of the data. After training, predictions are obtained by computing a linear combination of predictions from $P(Y \mid Z_C)$, weighted by a value indicating the compatibility of $Z_C$ with $X$ (using Monte Carlo sampling to estimate expectations). Experiments demonstrate the effectiveness of the approach.

Strengths

This paper offers a novel take on leveraging causality to solve out-of-distribution classification. To my knowledge, there are no works which consider modeling the problem as done in Fig. 1, where $P(y \mid x, \mathrm{do}(z_S))$ is used as the classifier. The problem setup has interesting implications in terms of the ways that features $X$ and label $Y$ are related. The experimental results also show promise that the approach is effective in practice.

Weaknesses

I am concerned about the soundness of some of the claims:

  1. The path from $U_{xy}$ to $Y$ is not influenced by any intervention on $Z_s$. Hence, if $p^s(U_{xy}) \neq p^t(U_{xy})$, it should also be the case that $p^s(y \mid x, \mathrm{do}(z_s)) \neq p^t(y \mid x, \mathrm{do}(z_s))$. This seems to contradict what is stated at the end of Sec. 3.1.

  2. It is not clear how calculating the expectation of $p(y \mid x, \mathrm{do}(z_s))$ over $p(z_s \mid x)$ (as done in Eq. 2) is considered marginalizing out $z_s$. It is also not clear why this is preferable to just choosing some arbitrary $z_s$ to intervene on.

  3. How are $p(u_x)$ and $p(u_{xy})$ modeled in Eq. 3 if they are unobserved and change between source and target?

  4. What justifies that the learned representations $Z_S$ and $Z_C$ truly follow the causal diagram in Fig. 1? Given the generative process of learning these representations (i.e., through $q(z_s \mid x)$ and $q(z_c \mid x)$), it could be argued that $Z_S$ and $Z_C$ are caused by $X$ rather than the other way around. Further, it is difficult to believe that a learned representation can contain more information about $Y$ than $X$, but this is what is implied by the graph (i.e., $Y$ and $X$ are independent given $Z_S$ and $Z_C$, but $Y$ is not independent of $Z_S$ and $Z_C$ given $X$?).

In addition, there are a few points that could use more elaboration:

  1. At the beginning of Sec. 3.1, it is explained that the consideration of $U_x$ and $U_{xy}$ addresses two types of biases: selection bias and stereotype bias. This seems to be an interesting point and could be expanded.

  2. Under Alg. 1, the paper mentions the necessity of assumptions to compensate for the lack of observations of $Z$ and $U$. These should be explicitly stated, as this seems to be the crux of the reasoning behind why the model works. Further, are some of these assumptions only relevant to certain types of data (e.g., images)?

I cannot recommend acceptance while I have these doubts, but I look forward to having them clarified in the authors’ responses.

Questions

See weaknesses.

Comment

The SCM assumption: We would like to emphasize that the causal mechanisms in the proposed SCM in Figure 1 are all assumptions that are widely adopted in the area of causal representation learning [1, 2], including the following three points:

    1. The latent high-level factors $Z$ can be separated into causal factors $Z_c$ and spurious factors $Z_s$.
    2. The input $X$ is generated by the high-level factors $Z$.
    3. The causal factor $Z_c$ is either a direct cause or an effect of the target $Y$.

We cannot prove that these assumptions always hold in real-world data, and we admit that the effectiveness and soundness of our derived theorem and algorithm are built upon these assumptions. We believe the learned representations $Z_s$ and $Z_c$ are the factors that satisfy the causal graph (a toy sampling sketch of this graph follows the list below) by

    1. parameterizing the joint distribution over all the variables of interest into conditional distributions adhering to the causal mechanisms in Figure 1;
    2. establishing the identifiability of the learned $Z_s, Z_c$.
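To make the assumed graph concrete, here is a toy ancestral-sampling sketch of the mechanisms listed above; all functional forms, dimensions, and coefficients are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
W = rng.normal(size=(2 * D, 2 * D))  # fixed mixing matrix: the invariant mechanism p(X|Z)

def sample_scm(domain_shift=0.0, n=1000):
    """Toy ancestral sampling for the assumed graph:
    U_x -> Z_s, U_xy -> {Z_s, Y}, Z_c -> Y, (Z_c, Z_s) -> X."""
    u_x = rng.normal(domain_shift, 1.0, (n, D))   # domain-specific; its distribution shifts
    u_xy = rng.normal(0.0, 1.0, (n, D))           # latent confounder; assumed invariant
    z_c = rng.normal(0.0, 1.0, (n, D))            # causal factor with a standard normal prior
    z_s = 0.7 * u_x + 0.7 * u_xy + 0.2 * rng.normal(size=(n, D))  # spurious factor
    logits = z_c.sum(axis=1) + u_xy.sum(axis=1)   # Y depends on Z_c and on the confounder
    y = (logits + rng.logistic(size=n) > 0).astype(int)
    x = np.concatenate([z_c, z_s], axis=1) @ W    # X is generated from (Z_c, Z_s) only
    return x, y

x_src, y_src = sample_scm(domain_shift=0.0)  # training domain
x_tgt, y_tgt = sample_scm(domain_shift=2.0)  # test domain with a shifted p(U_x)
```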

We appreciate the suggestion to further elaborate on these two types of biases and will revise the paper accordingly.

We acknowledge the recommendation to underscore the importance of our assumptions and will incorporate this emphasis into the paper. It is worth noting that our method is not confined solely to image data: the validity of our theorem and the effectiveness of the algorithm persist as long as the assumptions we posit apply to the given data. As an illustration, our approach can be extended to text data, demonstrating the versatility of our framework.

Comment

Invariance of $p(Y \mid X, \mathrm{do}(Z_s))$: Upon careful review, we acknowledge that our assertion regarding the variation of $p(U_{xy})$ was overstated. Our proposed interventional distribution, by setting specific values for $Z_s$ to mitigate the influence of the domain-specific variable $U_x$ on $Z_s$, effectively accounts for the invariant confounding effects and prevents $U_x$ from influencing $Y$. Hence, it is invariant and transportable across domains. However, it is important to note that this framework may not generalize to new domains with different and unknown confounding effects.

In light of this, we correct our initial assumption: the confounding effects, encompassing $p(U_{xy})$ and the causal mechanisms among $U_{xy}$, $Z_s$, and $Y$, remain consistent across training and test domains. This invariant-confounding assumption aligns with standard practices widely adopted in works utilizing interventional distributions [1]. We will incorporate this revision into the paper.

Nevertheless, it's crucial to underscore that with the corrected assumption, our CIIRL remains innovative and has proven its efficacy across various benchmark distribution shift datasets:

    1. With the revised assumptions, our proposed interventional distribution maintains invariance and transportability, facilitating cross-domain inference.
    2. Our training procedure does not explicitly rely on the variation of $p(U_{xy})$. The distinct classes of $U$ can be associated with the diverse domain variable $U_x$, and our training procedure can still effectively distinguish the two types of representations.
    3. Empirical results affirm the existence of invariant latent confounders, as evidenced by the overall superior OOD prediction performance of CIIRL compared to predictions based solely on $Z_c$.

[1] Mao, Chengzhi, et al. "Causal transportability for visual recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

The choice of $Z_s$: Under our corrected assumption, the latent confounding effect remains invariant across domains. To infer the label for a test input $x^t$ using our interventional distribution, it is imperative to provide the true value of $Z_s$ for $x^t$ to accurately account for the confounding effects between $x^t$ and $y^t$. The distribution $p(z_s \mid x)$ used in Eq. (3) can be construed as a proxy distribution that enables us to derive the true values of $Z_s$ from an input $X$. The challenge lies in determining which learned distribution can be employed to approximate this proxy distribution. According to the SCM, $X$ is independent of $U$ given $Z = [Z_c, Z_s]$, rendering $p(X \mid Z)$ invariant and transportable. Therefore, a reasonable choice is to obtain the identifiable $z_s^t$ from the learned variational distribution $q(Z_s \mid X = x^t)$ with a high $p(X = x^t \mid Z_s = z_s^t)$. However, as $q(Z_s \mid X = x^t)$ is a variational estimate of the desired proxy distribution, we average the interventional distribution over multiple samples of $z_s$ drawn from $q(Z_s \mid X = x^t)$. In practice, a randomly drawn $z_s$ value is likely to yield a low $p(X \mid Z)$ and contribute minimally to the interventional distribution.

The modeling of $p(U_x)$ and $p(U_{xy})$: Firstly, we revise our assumption so that the distribution $p(U_{xy})$ remains invariant across domains. In our context, $U_x$ denotes any information specific to the domain, a common simplification being to assume that $U_x$ represents the domain index [2]. During training, we utilize a clustering algorithm to estimate the domain index for each training input (a sketch of one possible instantiation follows the reference below). The objective of the training process is to learn encoder distributions $q(Z_s \mid X), q(Z_c \mid X)$ that produce a disentangled representation, which is identifiable and possesses an invariant generative mechanism $p(X \mid Z)$. The interventional distribution accounts for confounding effects and mitigates the influence of the domain variable $U_x$. Consequently, we can make inferences from the interventional distribution without knowledge of $U_{xy}$ and $U_x$ for the target domain.

[2] Lu, Chaochao, et al. "Invariant causal representation learning for out-of-distribution generalization." International Conference on Learning Representations. 2021.
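As one concrete (hypothetical) instantiation of the clustering step mentioned above, pseudo-domain indices could be assigned with k-means over extracted features; the feature extractor and the number of pseudo-domains are assumptions, not details specified by the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_domain_indices(features: np.ndarray, num_domains: int = 4) -> np.ndarray:
    """Assign each training input a pseudo-domain index (a stand-in for U_x)
    by clustering its features. K-means is just one plausible choice here."""
    km = KMeans(n_clusters=num_domains, n_init=10, random_state=0)
    return km.fit_predict(features)  # shape (n_samples,), values in {0, ..., num_domains-1}

# e.g., cluster penultimate-layer features of the training set:
# domain_idx = estimate_domain_indices(feature_extractor(x_train), num_domains=4)
```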

Official Review
Rating: 5

The work proposes a causal representation learning procedure for domain generalization given data from a single domain. An invariance relation is derived based on interventions on the spurious representation. The proposed procedure aims to identify the latent causal and spurious representations and then make predictions according to the invariance relation.

Strengths

  1. The representation learning procedure is novel and interesting, especially the interventions on $Z_s$.

  2. The method outperforms the baselines by a large margin on the CMNIST dataset.

Weaknesses

  1. The latent confounder $U_{xy}$ is assumed to be discrete, which is restrictive. The dependency between $Y$ and $Z_s$ can be more complicated in general.

  2. The identifiability of the representation is a crucial result. From the discussion in Section 4.1, the identifiability results are not trivial. I think they should be written in a formal statement and proved rigorously.

  3. A claim is that $p(Y \mid X, \mathrm{do}(Z_s))$ is invariant across different distributions due to the removed arrows $U_x \to Z_s$ and $U_{xy} \to Y$. However, there is still an arrow $U_{xy} \to Y$, meaning that the marginal distribution of $Y$ can change across different distributions. As a result, $p(Y \mid X, \mathrm{do}(Z_s))$ is not invariant in general.

Questions

  1. Can the assumption of a discrete $U_{xy}$ be relaxed? What are the consequences of a large $J = |U|$?

  2. Does the confounder make the invariance fail as mentioned above?

I may raise my score depending on the response. If the invariance indeed fails, I would recommend rejection.

Comment

Assumptions on $U_x, U_{xy}$: Our assumptions posit $U_x$ and $U_{xy}$ as random variables representing domain-specific and confounding information, respectively. We adhere to the standard practice of simplifying $U_x$ to a domain index. Importantly, we do not constrain $U_x$ and $U_{xy}$ to be exclusively discrete or continuous. During training, a neural network takes in $U_x$ and $U_{xy}$ and generates the parameters, including the means and variances, of the prior distributions of $Z_s$; the network accommodates inputs of any type (a sketch is given below). The training procedure disentangles $Z_s$ and $Z_c$ by employing asymmetric prior distributions. As the dimensionality $J = |U|$ increases, optimization becomes more challenging due to potential inaccuracies in estimating domain indices, a growing number of parameters in the prior distribution, and limited improvements (if any) in disentanglement.
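A minimal sketch of such a conditional prior network; the names, dimensions, and the embedding of a discrete domain index are our assumptions:

```python
import torch
import torch.nn as nn

class ConditionalPrior(nn.Module):
    """Maps a pseudo-domain index u (a stand-in for U_x / U_{xy}) to the
    parameters of the prior p(z_s | u), while z_c keeps a fixed standard
    normal prior -- the asymmetry the response describes. Illustrative only."""

    def __init__(self, num_domains=4, u_dim=8, zs_dim=16):
        super().__init__()
        # An embedding handles a discrete index; a continuous u could be
        # passed through the MLP directly instead.
        self.embed = nn.Embedding(num_domains, u_dim)
        self.net = nn.Sequential(nn.Linear(u_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * zs_dim))

    def forward(self, u_index):
        mu, logvar = self.net(self.embed(u_index)).chunk(2, dim=-1)
        return mu, logvar  # diagonal-Gaussian parameters of p(z_s | u)

# p(z_c) stays N(0, I), e.g. torch.distributions.Normal(0.0, 1.0)
```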

Invariance of $p(Y \mid X, \mathrm{do}(Z_s))$: Upon careful review, we acknowledge that our assertion regarding the variation of $p(U_{xy})$ was overstated. Our proposed interventional distribution, by setting specific values for $Z_s$ to mitigate the influence of the domain-specific variable $U_x$ on $Z_s$, effectively accounts for the invariant confounding effects and prevents $U_x$ from influencing $Y$. However, it is important to note that this framework may not generalize to new domains with different and unknown confounding effects.

In light of this, we correct our initial assumption: the confounding effects, encompassing $p(U_{xy})$ and the causal mechanisms among $U_{xy}$, $Z_s$, and $Y$, remain consistent across training and test domains. This invariant-confounding assumption aligns with standard practices widely adopted in works utilizing interventional distributions [1]. We will incorporate this revision into the paper.

Nevertheless, it's crucial to underscore that with the corrected assumption, our CIIRL remains innovative and has proven its efficacy across various benchmark distribution shift datasets:

    1. With the revised assumptions, our proposed interventional distribution maintains invariance and transportability, facilitating cross-domain inference.
    2. Our training procedure does not explicitly rely on the variation of $p(U_{xy})$. The distinct classes of $U$ can be associated with the diverse domain variable $U_x$, and our training procedure can still effectively distinguish the two types of representations.
    3. Empirical results affirm the existence of invariant latent confounders, as evidenced by the overall superior OOD prediction performance of CIIRL compared to predictions based solely on $Z_c$.

[1] Mao, Chengzhi, et al. "Causal transportability for visual recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Identifiability: We emphasize that our proof of identifiability, leveraging the results of [2], is mathematically correct. However, it is important to note that this proof is not our main contribution. That said, we appreciate the reviewer's suggestion and will consider adding a formal statement with a rigorous proof of identifiability.

[2] Kivva, Bohdan, et al. "Identifiability of deep generative models without auxiliary information." Advances in Neural Information Processing Systems 35 (2022): 15687-15701.

Official Review
Rating: 3

In this paper, the authors investigate the problem of domain generalization, where the target-domain datasets are unobserved during the training phase. To solve this problem, the authors propose a structural causal model with latent variables to model the causal mechanism. Subsequently, the authors conduct an intervention on the spurious representations to remove the spurious correlations and further learn the invariant interventional distribution. The authors evaluate the proposed method on several datasets and achieve strong performance.

Strengths

  1. The authors leverage causal knowledge to address the domain generalization problem.
  2. The authors evaluate the proposed method on several datasets.

Weaknesses

  1. One important issue is the ambiguity about the types of variables in Figure 1. In the domain generalization task, the domain labels are usually observed. However, it is unclear whether $U_x$ and $U_{x,y}$ are observed variables or not.
  2. Moreover, the authors mention that $P^S(Y \mid X, \mathrm{do}(Z_S)) = P^T(Y \mid X, \mathrm{do}(Z_S))$ according to Figure 2(b). But if $U_{x,y}$ is influenced by different domains, the aforementioned equation is not true.
  3. The proposed causal generation process is similar to that of [1]; it is suggested that the authors provide a discussion comparing the proposed causal generation process with that of [1]. Moreover, it seems impossible to conduct do-calculus on the latent variables without identification guarantees for the latent variables.

[1] Kong, Lingjing, et al. "Partial Disentanglement for Domain Adaptation." Proceedings of the 39th International Conference on Machine Learning, PMLR 162:11455-11472, 2022.

Questions

N.A.

Comment

Latent $U_x$ and $U_{xy}$: Our approach is specifically designed to enhance out-of-distribution (OOD) prediction in situations where the domain variable $U_x$ and the confounder $U_{xy}$ are not known. While domain indices are provided for specific tasks/datasets like PACS and VLCS, acquiring them for general real-world tasks poses significant challenges.

Invariance of $p(Y \mid X, \mathrm{do}(Z_s))$: Upon careful review, we acknowledge that our assertion regarding the variation of $p(U_{xy})$ was overstated. Our proposed interventional distribution, by setting specific values for $Z_s$ to mitigate the influence of the domain-specific variable $U_x$ on $Z_s$, effectively accounts for the invariant confounding effects and prevents $U_x$ from influencing $Y$. Hence, it is invariant and transportable across domains. However, it is important to note that this framework may not generalize to new domains with different and unknown confounding effects.

In light of this, we correct our initial assumption: the confounding effects, encompassing $p(U_{xy})$ and the causal mechanisms among $U_{xy}$, $Z_s$, and $Y$, remain consistent across training and test domains. This invariant-confounding assumption aligns with standard practices widely adopted in works utilizing interventional distributions [1]. We will incorporate this revision into the paper.

Nevertheless, it is crucial to underscore that, with the corrected assumption, our CIIRL remains innovative and has proven its efficacy across various benchmark distribution-shift datasets: 1) With the revised assumptions, our proposed interventional distribution maintains invariance and transportability, facilitating cross-domain inference. 2) Our training procedure does not explicitly rely on the variation of $p(U_{xy})$. The distinct classes of $U$ can be associated with the diverse domain variable $U_x$, and our training procedure can still effectively distinguish the two types of representations. 3) Empirical results affirm the existence of invariant latent confounders, as evidenced by the overall superior OOD prediction performance of CIIRL compared to predictions based solely on $Z_c$.

[1] Mao, Chengzhi, et al. "Causal transportability for visual recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Comparison to [2]: We appreciate the reviewer recommending this relevant paper. Its causal graph for the data-generation process closely resembles our SCM. Specifically, our causal graph shares similarities with the one presented in [2] in the following aspects:

    1. We both separate the latent representation into an (invariant) causal representation $Z_c$ and a (variant) spurious representation $Z_s$, and the input $X$ is generated by both types of representation.
    2. The spurious representation varies from domain to domain since it is controlled by a domain-specific variable $U_x$.
    3. There exists a high-level invariant confounder between $Z_s$ and $Y$ ($\tilde{Z}_s$ in their graph).

However, the graph in [2] assumes the causal features are parent variables of the target, whereas we model them as child variables.

Algorithmically, both our method and iMSDA from [2] address the confounding issue. iMSDA utilizes domain index information to estimate the confounder, whereas we focus on constructing the interventional distribution. Furthermore, iMSDA is tailored for domain adaptation tasks and necessitates access to test domain data during training. Given these distinctions, a direct comparison between our method and iMSDA would not be equitable.

[2] Partial disentanglement for domain adaptation Lingjing Kong, Shaoan Xie, Weiran Yao, Yujia Zheng, Guangyi Chen, Petar Stojanov, Victor Akinwande, Kun Zhang Proceedings of the 39th International Conference on Machine Learning, PMLR 162:11455-11472, 2022.

Identifiability of the representations $Z_c, Z_s$: We strongly agree that establishing the identifiability of the latent variables $Z$ is crucial for performing do-calculus on them. Therefore, we leverage the theoretical results presented by Kivva et al. (2022) [3] to demonstrate the identifiability of the $Z$ obtained through our learning framework. Please refer to the paragraph below the algorithm in Section 3.2 for more details.

[3] Kivva, Bohdan, et al. "Identifiability of deep generative models without auxiliary information." Advances in Neural Information Processing Systems 35 (2022): 15687-15701.

AC Meta-Review

The paper deals with domain generalization, where the target domain datasets are unobserved. This work intervenes on spurious representations to remove correlations and learn an invariant distribution. Their method performs well across various datasets, essentially focusing on learning causal and spurious representations to guide predictions based on an invariance relation.

All four reviews lean toward rejection, with ratings of 3, 3, 5, and 5 and confidences of 4, 4, 3, and 3, respectively. The issues raised by the reviewers are critical, including theoretical plausibility (RUwh, kccw, Rf1b) and clarity (ni4b).

Why Not a Higher Score

The issues of theoretical plausibility and clarity.

Why Not a Lower Score

N/A

Final Decision

Reject