PaperHub
Average rating: 5.0/10 (withdrawn, 5 reviewers)
Lowest: 3, highest: 6, standard deviation: 1.1
Individual ratings: 5, 6, 6, 5, 3
Average confidence: 3.2
ICLR 2024

InCo: Enhance Domain Generalization in Noisy Environments

Submitted: 2023-09-22. Updated: 2024-03-26.
TL;DR

This work studies invariant risk minimization (IRM), shows that prior IRM-related approaches may be ineffective in noisy environments, and proposes a new method called InCo to address these challenges.

Abstract

Keywords
causality, invariant risk minimization, domain generalization, noisy environments

Reviews and Discussion

Review
Rating: 5

This work tackles the problem of domain generalization with noisy environments. The authors propose to use the correlation between representation and labels to solve this challenge. Theoretical and empirical results demonstrate the robustness of the proposed algorithm under noisy environments.

Strengths

  1. The paper is well written.

  2. The proposed method is effective in dealing with the noisy environment problem. The illustrative examples in section 2.3 make a convincing comparison between InCo and other methods.

  3. The development of causal relationship in Figure 4 is novel in the IRM setting.

Weaknesses

  1. My biggest concern is that the motivation for the concept of environment noise is unclear, because its difference from environmental features is very subtle. In the dog example, the snow and water can also be understood as generated by the environment. The environmental noise described should already be present in any real-world dataset, even without manually injected noise. I hope the authors can provide a more detailed discussion of the motivation for treating them differently. If the authors can address this point clearly, I am willing to raise the score.

  2. From Section 2.2 alone, it is difficult to see how the algorithm is actually implemented. It would be clearer to present the computation steps in pseudocode, as the computation is not straightforward.

  3. The experimental results are limited because the only type of noise considered is Gaussian. It would be more convincing if more realistic noise were added, but this all comes down to the authors’ definition of environment noise, which is not well defined.

Questions

Is there a reason why only PACS and VLCS are picked from DomainBed?

Review
Rating: 6

This paper introduces a new method, InCo, to improve invariant risk minimization (IRM) in noisy environments. The central idea is to optimize the invariant correlation between the learned representation and labels across different training environments. The authors use a case study to demonstrate that existing IRM methods (IRMv1 and VREx) can fail in noisy settings, whereas InCo succeeds. They then offer theoretical analysis to prove that invariant correlation is a necessary condition for the optimal invariant predictor in such settings. InCo is further evaluated on Colored MNIST, the Circle dataset, and noisy DomainBed benchmarks. It consistently outperforms baseline methods such as ERM, IRMv1, and VREx in noisy conditions.

Strengths

  • [Novelty and Significance] The paper clearly identifies and addresses the problem of IRM methods failing in noisy environments through an intuitive case study and experiments. This issue of handling noise appears to have been overlooked in previous IRM literature, making the paper's focus both novel and significant. The proposed InCo method employs a simple and intuitive approach for optimizing invariant correlation between representations and labels. Essentially, this method extends VREx by modifying the variance regularization term. The rationale behind this loss design is well-illustrated and justified using a two-bit toy example.

  • [Theoretical Guarantee] The authors employ a two-bit toy example to analytically examine different IRM algorithms and provide an intuitive explanation for InCo's performance in noisy settings. Moreover, they theoretically analyze InCo from a causality perspective and prove that invariant correlation is necessary for optimal invariant predictors in noisy conditions.

  • [Empirical Justification] The paper offers comprehensive empirical validation through experiments on diverse image classification datasets, such as Colored MNIST, Circle datasets, and others. InCo consistently outperforms IRM baselines by wide margins in noisy settings, thereby demonstrating its effectiveness.
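
The contrast drawn above between VREx's variance-of-loss term and InCo's variance-of-correlation term can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes squared loss, takes the correlation to be the uncentered expectation E[f(x)·y], and uses hypothetical toy environments in which only the label noise level changes. All function names and the `lam` weight are illustrative.

```python
from statistics import fmean, pvariance

def make_env(noise_scale):
    """Toy environment: x in {-1, +1}, y = x + n with n in {-s, +s}.

    All four (x, n) combinations are enumerated, so x and n are exactly
    uncorrelated in the sample. Hypothetical data, not from the paper.
    """
    return [(x, x + n) for x in (-1.0, 1.0) for n in (-noise_scale, noise_scale)]

def risk(env, f):
    """Mean squared error of predictor f in one environment (assumed loss)."""
    return fmean((f(x) - y) ** 2 for x, y in env)

def correlation(env, f):
    """Uncentered correlation E[f(x) * y] in one environment (assumed form)."""
    return fmean(f(x) * y for x, y in env)

def vrex_penalty(envs, f):
    """VREx-style term: variance of the per-environment risks."""
    return pvariance([risk(env, f) for env in envs])

def inco_penalty(envs, f):
    """InCo-style term: variance of the per-environment correlations."""
    return pvariance([correlation(env, f) for env in envs])

def objective(envs, f, penalty, lam):
    """Average risk plus lam times the chosen penalty (assumed objective form)."""
    return fmean(risk(env, f) for env in envs) + lam * penalty(envs, f)

# Three environments that differ only in label-noise scale.
envs = [make_env(s) for s in (0.0, 0.5, 1.0)]
f = lambda x: x  # predictor that reads the invariant feature directly

print(vrex_penalty(envs, f))  # positive: the per-environment risks are 0, 0.25, 1
print(inco_penalty(envs, f))  # 0.0: the correlation equals 1 in every environment
```

In this toy setting the variance-of-risk term penalizes the invariant predictor simply because the environments have different noise levels, while the variance-of-correlation term does not, which is the intuition the review attributes to the paper's two-bit example.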

Weaknesses

  • [Clarification on Loss Design] The paper would benefit from a more detailed explanation of InCo's loss design in Section 2.2. While it is evident that InCo extends VREx by modifying its variance term (from the variance of the loss to the variance of the correlation between prediction and label), the rationale behind this change could be made explicit. Additionally, it may be worth examining whether the variance term in InCo is equivalent to the variance of a specific loss function, such as the unbounded hinge loss $L = 1 - y \cdot f(x)$. Providing this perspective could offer valuable insight into why the unbounded hinge loss is a superior choice of loss function in this context.

  • [Extensibility to Multi-class Labels] The current version of the InCo method seems to be limited to binary classification. Clarification is needed on whether the method can be extended to handle multi-class scenarios. Given that the DomainNet dataset involves multi-class classification, further details on how InCo is implemented in such a setting are needed.

  • [Noise Implementation for Image Experiments] In the ColorMNIST experiments, the paper states, "There are three training groups in our experiments: {0,0}, {0, N(0, 0.5)}, and {0, N(0, 1)}." I don't understand how the noise is specifically implemented, given that the input is an image rather than a scalar. This question also applies to the DomainNet experiment.
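
The hinge-loss reading raised under [Clarification on Loss Design] can be written out explicitly. The following is a sketch under the assumption that InCo's correlation is the uncentered expectation $\rho_{f,y}^{e}(w) = \mathbb{E}[f(x^e; w)\, y^e]$; the paper's exact definition may differ, and the subscript "hinge" is notation introduced here only for illustration.

```latex
\begin{aligned}
\mathcal{R}^e_{\text{hinge}}(w)
  &= \mathbb{E}\left[\, 1 - y^e f(x^e; w) \,\right]
   = 1 - \rho_{f,y}^{e}(w), \\
\operatorname{Var}_e\!\left( \mathcal{R}^e_{\text{hinge}}(w) \right)
  &= \operatorname{Var}_e\!\left( 1 - \rho_{f,y}^{e}(w) \right)
   = \operatorname{Var}_e\!\left( \rho_{f,y}^{e}(w) \right).
\end{aligned}
```

Under that assumption, applying a VREx-style variance penalty to the unbounded hinge loss would yield the same regularizer as the variance of the correlation, since the constant 1 and the sign do not affect the variance.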

Questions

See weaknesses above.

Review
Rating: 6

This paper proposes a new training objective for OOD generalization: finding the invariant correlation between the prediction and the label. The authors also derive theoretical guarantees for the proposed method. The paper includes case studies that explain why IRMv1 and VREx fail to obtain optimal solutions and why InCo can succeed. The experiments show the merits of the method.

Strengths

For noisy data, the invariant correlation between the prediction and the label across environments is well-motivated and intuitive. The case studies are easy to follow. The experiments are comprehensive and convincing.

Weaknesses

In emphasizing noisy data, this paper seems to focus on the situation of adding perturbations or noise to images. However, noisy data also arise beyond computer vision. The case studies only use toy datasets simulated by simple processes. I am curious how the method would perform on tabular datasets, which may show high "noise-to-signal" ratios. The first time I saw the term noise in the paper, I thought it referred to tabular datasets, which have inherent noise and need no artificial perturbations.

Questions

Please see the Weaknesses section.

Review
Rating: 5

The paper studies how to learn invariant features in noisy environments to improve domain generalization ability, where the noisy environments (inherent noise) can impact the invariant feature, resulting in different inherent losses. The authors propose Invariant Correlation (InCo) regularization to solve the problem. The method was tested on Noisy PACS and Noisy VLCS. The authors also provide a linear regression analysis to support the method's effectiveness and intuition.

Strengths

  • The paper is well written. The introduction is insightful and the comparison to other invariant feature methods is clear.
  • The paper’s analysis and experiments are consistent with each other. The simulation experiments and the analysis are in support of each other. Although I did not check the proof in detail, it seems the intuition is correct in the linear regression setting from my perspective.

Weaknesses

  • Analysis: The analysis part is weak and the setting is simple from my perspective. The paper only considers linear models and linear regression under square loss. Thus, (1) the classifier weights $v$ and representation weights $\Phi$ are merged into $w$, so there is effectively no classifier layer; (2) the analysis under the regression setting may not be general enough to cover the problem defined in the classification setting. It would be good to consider logistic loss with some non-linearity, e.g., a 2-layer neural network.
  • Experiment: The experimental part is weak. (1) In Table 3 about Noisy PACS and Noisy VLCS, the authors define the noise by Gaussian perturbations. This kind of noise is too artificial. If the authors would like to show the method’s effectiveness, InCo should be evaluated on some real noisy domain generalization datasets rather than PACS with Gaussian perturbations. Table 3 can only show that InCo is a good regularization against Gaussian perturbations, but we do not know whether InCo works well in practical noisy environments. (2) Even for Table 3, it would be good to show the performance under different noise power levels. (3) Many of the latest SOTA works are missing in Table 3, e.g., [1,2] and many others.
  • $\lambda$: The method needs a hyper-parameter $\lambda$. The paper only studies it in the simulation experiments. It should be studied in real datasets as well. Also, how to select $\lambda$ efficiently in practice is unknown.
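
The merging of $v$ and $\Phi$ noted in the Analysis point follows in one line. This is a sketch assuming a linear representation $\Phi \in \mathbb{R}^{k \times d}$ and a linear classifier $v \in \mathbb{R}^{k}$; the dimensions $k$ and $d$ are illustrative and not taken from the paper.

```latex
f(x) = v^\top (\Phi x) = (\Phi^\top v)^\top x = w^\top x,
\qquad w := \Phi^\top v .
```

Any pair $(v, \Phi)$ with the same product $\Phi^\top v$ gives the same predictor, which is why no separate classifier layer remains in a purely linear analysis.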

[1] Junbum Cha, Kyungjae Lee, Sungrae Park, and Sanghyuk Chun. Domain generalization by mutual-information regularization with pre-trained models. ECCV 2022.

[2] Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, and Sungrae Park. SWAD: Domain generalization by seeking flat minima. NeurIPS 2021.

Questions

  • It seems $\mathcal{R}^e(w)$ and $\mathcal{R}(x^e, w)$ are the same. What is the difference?
  • Why do the two equations in (2) share the same $\eta^e$, while Figure 4 (b) shows they are different?

Details of Ethics Concerns

NA

Review
Rating: 3

This work proposes an objective function for learning feature representations in noisy environments. Theoretical analysis is provided to show the effectiveness of the proposed method and the failures of the previous methods, while some important aspects are not studied. A major limitation of the work is the restrictive model assumption that the causal features and the label have invariant distributions. The proposed method is shown to be superior over the baselines in the restrictive setting.

Strengths

  1. The idea of invariant correlation is a new perspective for studying the invariance principle.

  2. Sufficient empirical results show the effectiveness of the proposed method.

Weaknesses

  1. In the considered setting, $\widehat{x_{inv}}$ and $\hat{y}$ are assumed to have invariant distributions, which is very restrictive. The idea of invariant correlation should focus on the invariance of the linear relation between $\widehat{x_{inv}}$ and $\hat{y}$ (i.e., $\gamma$), rather than the statistical correlation. In the IRM framework, the only parameter that is required to be invariant is $\gamma$. I think a simple adjustment to InCo is to divide the correlation $\rho_{f,y}^{e}(w)$ by the standard deviation of $f(x^{e};w)$.

  2. Fig. 4 (a) is not what IRM concerns. Check Fig. 3 from (Arjovsky et al., 2019).

  3. Theorem 3.1 shows that the regularization term is zero when $f$ is a linear function of the invariant features, which is a straightforward result. The more important problem is whether a zero regularization term implies that $f$ is a linear function of the invariant features, as shown in Theorem 9 (Arjovsky et al., 2019). Without this result, it is not clear whether solving InCo helps to identify the invariant features.

  4. Regarding the presentation, I think section 2.3 should be shortened. A simple example does not show the general behavior of the proposed method and the baselines. Two pages are quite redundant for such an example.

Questions

A major limitation of the work is the model assumption mentioned above. A question is whether the assumption can be relaxed. I have suggested a simple adjustment. Does the adjustment fix the problem?

I think the idea of invariant correlation is promising, but the work is not ready in its current form.