Causal Discovery with Unobserved Variables: A Proxy Variable Approach

Mingzhou Liu, Xinwei Sun, Yu QIAO, Yizhou Wang

OpenReview PDF

Submitted: 2023-09-22 | Updated: 2024-03-26

TL;DR

We propose a proxy-variable-based method to identify causal relations when unobserved variables are present.

Keywords

causal discovery, unobserved variables, proxy variables, discretization

Reviews and Discussion

Review

Rating: 3 | Confidence: 5 | 2023-10-31

This work proposes a hypothesis test to identify the causal direction in the presence of unobserved variables. Assuming that a proxy of the unobserved variables exists, it tries to extend the discrete-data results of Miao et al. (2018) to continuous data through discretization.

Strengths

This work proposes a proxy variable approach for identifying causal relationships in the presence of unobserved variables.

Weaknesses

The contribution of this work seems somewhat limited, as it is only an extension of the previous work of Miao et al. (2018) via discretization.
This work supposes that the matrix P(W|U,x) is invertible after discretization. However, unlike in the discrete case, this condition may be violated when originally continuous data are discretized, so it is necessary to discuss under what conditions and for which types of relationships such invertibility holds.
Moreover, judging from Miao et al. (2018), several additional assumptions seem to be required beyond matrix invertibility, and these are neither disclosed nor discussed in this work.
In fact, instead of discretizing the data, is it possible to test the independence directly using the continuous information?
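
The invertibility concern raised above can be probed numerically. The following is a minimal simulation sketch, not the authors' procedure. It assumes a toy model in which U is visible to the simulator (in practice it is unobserved), W is a noisy linear function of U, and both are discretized into equal-frequency bins. For simplicity it ignores x and looks at P(W|U) only. The function name `discretized_conditional` and all coefficients are hypothetical.

```python
import numpy as np

def discretized_conditional(w, u, n_bins):
    """Estimate the matrix with entries P(W in bin i | U in bin j)."""
    # Interior quantile edges give equal-frequency bins with indices 0..n_bins-1.
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    w_idx = np.searchsorted(np.quantile(w, qs), w)
    u_idx = np.searchsorted(np.quantile(u, qs), u)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (w_idx, u_idx), 1.0)
    # Normalize each column so that it is a distribution over W bins.
    return counts / counts.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)                          # assumed toy confounder
w_informative = u + 0.3 * rng.normal(size=n)    # proxy strongly tied to U
w_weak = 0.05 * u + rng.normal(size=n)          # proxy carrying little about U

conds = {}
for name, w in [("informative", w_informative), ("weak", w_weak)]:
    matrix = discretized_conditional(w, u, n_bins=4)
    conds[name] = np.linalg.cond(matrix)
    print(name, conds[name])
```

A large condition number indicates that the discretized matrix is close to singular. In this toy setup that happens when the proxy carries little information about U, which is one concrete way the invertibility condition could fail in practice.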

问题

See the weaknesses above.

Review

Rating: 3 | Confidence: 4 | 2023-11-01

Causal discovery methods do not work when there is a hidden confounder between two variables being tested. However, sometimes proxy variables (children of hidden confounders) can provide information about the hidden confounders, which can then be used to correctly identify causal relationships between variables. Previous work has attempted this but only for discrete variables. The current work attempts to find assumptions under which continuous variables can be properly discretised, so that the proxy-based causal discovery of previous work can be applied.
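
To make concrete how a proxy can stand in for a hidden confounder, here is a short derivation sketch for the discrete case. It rests on assumptions that are not spelled out in this review and are stated here only for illustration: U and W each take k values, the null hypothesis is that X is independent of Y given U, the proxy W is independent of X given U, and the k-by-k matrix P(W | U) is invertible. Bold-free matrix notation is used: P(W | U) is a matrix, P(y | U) a row vector, and P(U | x), P(W | x) are column vectors.

```latex
\begin{align*}
P(y \mid x) &= \sum_{u} P(y \mid u)\, P(u \mid x) = P(y \mid U)\, P(U \mid x), \\
P(W \mid x) &= P(W \mid U)\, P(U \mid x)
  \;\Longrightarrow\; P(U \mid x) = P(W \mid U)^{-1}\, P(W \mid x), \\
P(y \mid x) &= \underbrace{P(y \mid U)\, P(W \mid U)^{-1}}_{\text{does not depend on } x}\; P(W \mid x).
\end{align*}
```

Under these assumptions, P(y | x) is the same linear function of the observable vector P(W | x) for every level x, so a departure from linearity is evidence against the null. This is where the invertibility of the proxy matrix enters.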

Strengths

The paper tackles an important problem, that is causal discovery in the presence of hidden confounders.

Weaknesses

The presentation is not great. There are numerous references to assumptions and models that are not well defined. Example 1.1 is entirely unclear; its details can be guessed at after reading the paper, but this is not a good thing. Figure 1b is very unclear.
I'm a bit unsure about the differences from previous works. It seems like the analysis and testing procedure are very similar to previous works. More specifically, it seems like two different works have been combined without much novelty (see questions below).

Questions

What is Figure 1b actually showing? It's not obvious that it shows what you claim it shows.
Figure 1c: what independence is being measured here?
Asm 4.1 is referred to multiple times, but there is no Asm 4.1 in the paper.
It's not clear to me, and it isn't explained, why the discretisation can break the required independence structure. It would be useful if some intuition or reasoning were provided.
Corollary 4.7: Where are models (a)-(b) defined?
Given that the number of bins controls the trade-off between type 1 and type 2 errors, is there a heuristic for choosing this when a user does not have access to the ground truth?
What exactly is the difference in Section 4.2 between your work and Warren (2021)? If the variable is unobserved, why does the theory of the previous work not hold in this section?
In Section 4.3, what is the difference between your work and Miao et al. (2018)? Is it just that you are applying a discretising procedure first?
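
Two of the questions above, why discretisation can break the required independence and how the number of bins matters, can be illustrated with a small simulation. This is a hedged sketch under assumed toy conditions, not the paper's test: U is generated by the simulator and binned directly (in practice U is unobserved and only a proxy would be binned), X and Y each depend on U alone, and within-bin correlation is used as a crude stand-in for a conditional independence measure. The function name `mean_within_bin_corr` and the model coefficients are hypothetical.

```python
import numpy as np

def mean_within_bin_corr(x, y, u, n_bins):
    """Average |corr(X, Y)| inside equal-frequency bins of U."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    idx = np.searchsorted(np.quantile(u, qs), u)
    corrs = [
        abs(np.corrcoef(x[idx == b], y[idx == b])[0, 1])
        for b in range(n_bins)
    ]
    return float(np.mean(corrs))

rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)
x = u + rng.normal(size=n)  # X depends on U only
y = u + rng.normal(size=n)  # Y depends on U only, so X and Y are independent given U

# Conditioning on a bin of U instead of U itself leaves residual dependence,
# because U still varies inside each bin.
residual = {k: mean_within_bin_corr(x, y, u, k) for k in (2, 8, 32)}
for k, r in residual.items():
    print(f"bins={k:2d}  samples per bin={n // k:6d}  mean |corr|={r:.3f}")
```

In this toy model the residual dependence shrinks as the bins get finer, while the number of samples per bin shrinks as well. That is one way to read the trade-off raised in the questions: coarse bins risk spurious rejections, and fine bins leave less data per cell.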

Review

Rating: 6 | Confidence: 3 | 2023-11-01

In this manuscript, a novel proximal-based hypothesis testing method is proposed, accompanied by provable consistency. The authors identify smoothness conditions that are compatible with several causal models, including Additive Noise Models. Experiments have been performed on both synthetic and real-world data to validate the proposed method.

Strengths

The manuscript does an excellent job of articulating the motivation behind the proposed method.
The analysis provided for the discretization is not only easy-to-follow but also enlightening, offering potential insights for readers in the domain.

Weaknesses

The authors themselves have acknowledged a potential avenue of exploration: it would indeed be intriguing to see how the proposed test integrates with existing constraint-based methods. While this is not currently addressed, it presents an interesting direction for future research.
A notable omission is the lack of experimental evaluation or in-depth theoretical discussion concerning the scalability of the proposed method. This oversight might result in some reservations for practitioners considering the implementation of the method in expansive real-world situations.

Questions

Could the authors elaborate on the specific assumption referenced in Sec 5.1? The section mentions, ‘Under Asm. ??, this means …’ but it isn't clear what this refers to.

Review

Rating: 6 | Confidence: 3 | 2023-11-04

The authors propose a method to extend the discrete proxy-based causal discovery method to continuous cases. Their method is based on a comprehensive analysis regarding discretization error. The authors claim that the discretization error can be reduced to an infinitesimal level, provided the proxy is discretized with sufficiently fine bins.

Strengths

The authors present a theoretical analysis of discretization error. They also give a profound theoretical study on the asymptotic validity of the method.

Weaknesses

There are several issues the authors need to address.

The paper is just an extension of an existing method, and the contribution of the paper is limited and incremental.
The experimental study is not sufficient to validate the effectiveness of the method. The authors could provide additional experimental results on multiple real-world datasets to show the benefits of the approach.

Questions

The authors could add more experimental studies on additional real-world datasets to strengthen the paper.