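Average rating: 4.3/10

Decision: Rejected · 4 reviewers

Lowest: 3 · Highest: 6 · Standard deviation: 1.3

Average confidence: 3.3

Correctness: 2.3 · Contribution: 2.0 · Presentation: 2.5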

ICLR 2025

CausalDiffusion: Causally Related Time-Series Generation through Diffusion Models

Giuseppe Masi, Andrea Coletta, Novella Bartolini

Submitted: 2024-09-27 · Updated: 2025-02-05

TL;DR

A novel pipeline capable of generating realistic time-series along with a ground truth causal graph that is generalizable to different fields.

Keywords

time-series generation, causal discovery, diffusion model, benchmark, dataset, synthetic data

Reviews and Discussion

Review

Rating: 5 · Confidence: 3 · 2024-10-29

The authors propose a novel generative framework named CausalDiffusion that generates a causal graph and the time series directly within a diffusion model architecture.

Strengths

The generation pipeline is notable for not assuming stationarity, allowing for the generation of diverse graph types.
To my knowledge, this is the first work addressing causal time-series generation using diffusion processes.
The authors provide an evaluation of causal graphs in the experiments.

Weaknesses

While the method is straightforward, there is excessive focus on dataset descriptions, metrics, and evaluations, leading to a lack of depth in analyzing the rationale behind the method.
I am skeptical about the overarching scheme of learning sparse causal graphs while generating time series. This process resembles causal discovery based on generated time series, which are then used to evaluate other causal discovery algorithms. Essentially, the approach is "discovering a causal graph to generate time series, and then testing causal discovery algorithms," but there's no guarantee that the causal graphs learned during this process closely align with true causal graphs present in nature. Still, I am open to debates. If I am convinced, I will be happy to raise my score.

Questions

In Lines 230-236, after $\operatorname{DEN}_\theta$ outputs the initial steps, subsequent steps are generated using a linear model: $\hat{x}_0[l, i]=\mathbf{c}_l^i \cdot \hat{\mathbf{x}}_0 \left[ l - \tau: l-1,:\right]$. While I understand that the weights evolve over time, can this generation methodology effectively generalize to accommodate any nonlinearity? A more rigorous theoretical analysis is needed.
In Lines 255-257, the authors state that "we aggregate over the time steps by computing the 95th percentile." This suggests that the causal graph $\hat{g}$ may not be a strict causal summary graph. For instance, if the weights along the time axis are $[0, 0, ..., 0, 0, 1]$, the 95th percentile could still be zero. This raises concerns about the causal graph's fidelity to the generated time series.
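The concern in the second question can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes 100 time steps and a single edge whose coefficient magnitude is zero everywhere except the last step, and it uses a hypothetical helper `percentile` that implements the common linear-interpolation definition.

```python
def percentile(values, q):
    """q-th percentile (0..100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


# Hypothetical per-time-step coefficient magnitudes for one edge:
# inactive for 99 steps, active only at the final step.
weights = [0.0] * 99 + [1.0]

p95 = percentile(weights, 95)  # aggregate used to decide whether the edge exists
peak = max(weights)            # an alternative aggregate, shown for contrast
```

With these values the 95th percentile is zero even though the edge is active at one step, so a threshold on that statistic would drop it, whereas an aggregate such as the maximum would keep it.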

Minor:

The notation $x_0[l,i]$ in Line 227 can be confusing, as the superscripts and subscripts do not align with $c_l^i$. Could the notation be improved for clarity, such as using $x_{(l,i)}^0$?

Comment: Response to weaknesses and questions

2024-11-19

We sincerely thank the reviewer for the thoughtful comments, which we believe will significantly help improve the paper.

Weakness 1

Thank you for this comment. We will add theoretical support to the rationale behind our method. Indeed, (Hyvärinen et al., 2010) prove that if the time resolution of the measurements is higher than the time-scale of causal influences, one can estimate a classic autoregressive (AR) model with time-lagged variables and interpret the autoregressive coefficients as causal effects. In particular, they prove that causal effect matrices can be consistently, and computationally efficiently, estimated from the coefficients of the VAR model by classic least-squares methods. Therefore, thanks to this result, incorporating a VAR model in the reconstruction may provide more solid guarantees. We will improve the section explaining the causal reconstruction to highlight the theoretical ground of our approach.
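The least-squares estimation mentioned above can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the procedure of Hyvärinen et al. or of the paper: the 2-variable, lag-1 coefficient matrix `A_true`, the noise scale, and the series length are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lag-1 VAR: x[t] = A_true @ x[t-1] + noise.
# A_true[j, i] is the effect of variable i at t-1 on variable j at t.
A_true = np.array([[0.5, 0.0],
                   [0.4, 0.3]])
T = 5000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + 0.1 * rng.standard_normal(2)

# Ordinary least squares: regress x[t] on x[t-1].
past, present = x[:-1], x[1:]
coef, *_ = np.linalg.lstsq(past, present, rcond=None)
A_hat = coef.T  # lstsq solves past @ coef = present, so coef estimates A_true.T
```

The estimate approaches `A_true` as the series grows, and the near-zero entry `A_hat[0, 1]` corresponds to the absent edge from variable 1 to variable 0. The example only covers a linear setting without latent confounders.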

Weakness 2

Thank you for this comment. In light of what was stated for Weakness 1, the generated VAR representation establishes a clear and coherent relationship between the causal structure and the generated time-series, as described in (Hyvärinen et al., 2010). This leads us to believe that our work improves on the current state of the art, represented by (Cheng et al., 2024), which uses a two-stage process with a posteriori causal graph inference via DeepSHAP.

Answer to Q1

Thank you for this comment. Nonlinearities are not handled, so our samples only include linear causal relationships.
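To make this limitation concrete, the reconstruction rule quoted in Q1 can be sketched as a time-varying linear rollout. The shapes, the random coefficients, and the helper name `rollout` are assumptions for illustration, not the paper's implementation. The check at the end shows that, for a fixed set of coefficients, the generated series is a linear function of the initial window.

```python
import numpy as np


def rollout(init, coeffs):
    """Extend `init` (tau x D) using x[l, i] = sum(coeffs[l, i] * x[l-tau:l, :]).

    coeffs has shape (L, D, tau, D); rows l < tau are unused because
    those steps are copied from `init`.
    """
    tau, D = init.shape
    L = coeffs.shape[0]
    x = np.zeros((L, D))
    x[:tau] = init
    for l in range(tau, L):
        window = x[l - tau:l, :]  # last tau steps, all variables
        # Contract the (tau, D) axes: one dot product per target variable i.
        x[l] = np.tensordot(coeffs[l], window, axes=2)
    return x


rng = np.random.default_rng(0)
L, D, tau = 20, 3, 2
coeffs = 0.3 * rng.standard_normal((L, D, tau, D))  # hypothetical time-varying weights
a = rng.standard_normal((tau, D))
b = rng.standard_normal((tau, D))

# Superposition: rolling out a + b equals the sum of the two rollouts.
is_linear = np.allclose(rollout(a + b, coeffs),
                        rollout(a, coeffs) + rollout(b, coeffs))
```

Letting the coefficients vary with the step index `l` changes which linear map is applied at each step, but it does not by itself introduce a nonlinear dependence on past values.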

Answer to Q2

Our approach to summarizing a causal graph from the VAR coefficients respects the following formal definition of causal relationships. Let $\rho$ be the percentage of causal relationships we want to keep in the synthetic dataset. For a synthetic sample $\hat{\boldsymbol{x}}$, we say that variable $\hat{\boldsymbol{x}}_i$ $\langle\rho,p\rangle$-causes variable $\hat{\boldsymbol{x}}_j$ with a lag of $\tau$ if the $p$-percentile of the coefficients of the VAR model at lag $\tau$, i.e. giving the effect from $\hat{\boldsymbol{x}}_i(t-\tau)$ to $\hat{\boldsymbol{x}}_j(t)$, is among the $\rho\%$ highest values. We underline that $\rho$ and $p$ refer to the dataset and the single sample, respectively. Our approach follows this definition, also employed by (Cheng et al., 2024), but the literature contains other definitions. For instance, (Hyvärinen et al., 2010) define that a variable $x_i$ causes $x_j$ with a lag of $\tau$ if at least one of the coefficients of the VAR model at lag $\tau$, i.e. giving the effect from $x_i(t-\tau)$ to $x_j(t)$, is (significantly) non-zero for $\tau \geq 0$. These definitions can be adopted by setting the related parameter values using domain knowledge or hyper-parameter tuning techniques. In our experiments, our approach turned out to be the best one, and it also allows analysis of the coefficients' distribution to tune a custom threshold and keep only the strongest connections.
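The definition above can be read as a two-stage thresholding. The following is one possible reading, sketched with made-up shapes and the hypothetical function name `rho_p_graph`. Taking absolute values of the coefficients, the index convention, and pooling the scores over all samples in the dataset are assumptions that the response does not state.

```python
import numpy as np


def rho_p_graph(coeffs, rho=5.0, p=95.0):
    """Boolean edges of shape (N, tau, D, D) from VAR coefficients.

    coeffs: (N, L, tau, D, D) array; coeffs[n, l, k, j, i] is the weight
            from variable i at lag k+1 to variable j at step l of sample n.
    p:   per-sample percentile taken over the time axis.
    rho: percentage of highest scores, over the whole dataset, kept as edges.
    """
    scores = np.percentile(np.abs(coeffs), p, axis=1)  # (N, tau, D, D)
    threshold = np.percentile(scores, 100.0 - rho)     # dataset-level cut-off
    return scores > threshold


rng = np.random.default_rng(0)
N, L, tau, D = 8, 50, 2, 3
coeffs = 0.01 * rng.standard_normal((N, L, tau, D, D))
coeffs[:, :, 0, 1, 0] += 1.0  # plant a strong edge 0 -> 1 at lag 1 in every sample

edges = rho_p_graph(coeffs, rho=5.0, p=95.0)
```

With these toy values only the planted edge survives in each sample. The choice of `rho` and `p` directly controls how sparse the resulting graph is.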

Moreover, to deal with non-stationary data and ensure graph fidelity to the time-series, we also executed our benchmark extracting the causal graph, and keeping only the strongest connections, per-sample. In this way, all the synthetic time-series are associated with a non-empty causal graph.

In the revised version of the article, we will add a formalization of causal relationships to motivate the methodological aspects of our settings.

Answer to Q3

Yes.

We thank the reviewer again. We hope the responses will be satisfactory.

Comment: Paper update

2024-11-23

We have updated the article following the discussion. Changes are highlighted in blue. In particular, Section 4.2 and Section 4.3 have been revised to add the theoretical background of our approach. Section 7 now includes a discussion of the limitations of our work. Thank you again.

2024-11-26

I thank the authors for the feedback. However, it seems that nonlinearities are not handled, which is a serious limitation. I will keep my score.

Review

Rating: 6 · Confidence: 3 · 2024-11-03

The paper introduces CausalDiffusion, a generative framework for synthesizing time-series with an associated causal graph, addressing a significant limitation in Time-Series Causal Discovery (TSCD). CausalDiffusion is the first framework to integrate diffusion models for generating causally related multivariate time-series data, alongside a ground-truth causal graph that abstracts temporal dependencies. The model employs a tau-lag Vector Autoregressive (VAR) structure within the diffusion model to naturally encode these causal relationships. The experiments demonstrate superior performance in generating realistic time-series data and more accurately recovering causal graphs compared to existing approaches. The paper also introduces new evaluation metrics and benchmarks various causal discovery algorithms on the generated synthetic datasets.

Strengths

Novelty in Causal Time-Series Generation: The use of diffusion models to generate causally related time series, coupled with simultaneous causal graph extraction, is a novel approach. Unlike previous methods, which rely on post hoc analysis to infer causal graphs, this framework generates the causal graph directly alongside the time series.
Integration of Diffusion with Causal Discovery: The incorporation of VAR structures within the diffusion model enables realistic generation while maintaining explicit causal links. This integration allows CausalDiffusion to support unsupervised learning of causal graphs, making it applicable to a variety of time-series datasets.
Thorough Evaluation Across Real and Synthetic Datasets: The authors provide a comprehensive evaluation of the proposed model on both real-world and synthetic datasets, including Hénon, Rivers, and Air Quality Index (AQI) datasets. The results consistently show that CausalDiffusion outperforms other SOTA generative models like CAUSALTIME and CR-VAE in generating realistic time-series data and more accurately recovering causal relationships.
Benchmarking for TSCD Algorithms: The paper also uses the generated datasets to benchmark causal discovery algorithms such as DYNOTEARS, PCMCI+, and CUTS+. The extensive comparison across different TSCD methods underscores the practical utility of the synthetic datasets for evaluating and improving causal inference models.
Metrics for Causal Evaluation: The introduction of new evaluation metrics, including Granger Causality False Positive Rate (GC-FPR) and Graph False Positive Rate (Graph-FPR), provides more robust tools for assessing the quality of the generated causal graphs.

Weaknesses

Comparison Against State-of-the-Art Approaches: The comparison against state-of-the-art methods could be improved by incorporating a broader range of generative models, including more recent GAN-based and attention-based models used in time-series generation. Furthermore, the evaluation is somewhat limited in showcasing how these approaches would fare against real-world causal discovery problems beyond the chosen datasets.
Majority Strategy for Causal Graph Extraction: The aggregation strategy for extracting causal graphs based on the strongest 95th percentile coefficients might lead to overlooking personalized or localized causal relationships, which could be critical in diverse or non-stationary data. This limitation might impact the framework's ability to capture more nuanced causality specific to different time points or individuals.
Scalability Concerns: Training a diffusion model for high-dimensional time series is computationally demanding. The experiments show that the inference time for generating both synthetic data and causal graphs, especially when using the complete model (OUR W/L2 W/DTW), was substantially longer compared to simpler alternatives such as CR-VAE. This might limit the usability of CausalDiffusion in real-time scenarios or for larger-scale datasets without further optimization.
Dependency on Hyperparameter Tuning: The model's performance is highly dependent on hyperparameter tuning, particularly the regularization terms such as the ℓ2-norm and the Dynamic Time Warping (DTW) loss. The authors present several variants of their model, and their results suggest that a significant degree of manual tuning is required to achieve the best performance. This makes the framework less accessible to practitioners without a deep understanding of the hyperparameter effects.
Limited Analysis of Real-World Applicability: The evaluation of real-world datasets (e.g., AQI) does not explicitly address potential confounders or latent variables that could affect causality. It is important to assess how latent confounders would impact the generated causal graphs, especially in complex real-world applications like healthcare or finance.

Questions

How can the scalability of the CausalDiffusion model be improved to facilitate real-time usage? Could techniques such as implicit diffusion models or parallelization be used to reduce computational overhead?
How does the framework handle latent confounders that might not be explicitly represented in the training data?
How robust is the graph extraction method to noise in the time series? Would adding a denoising or regularization mechanism help in stabilizing the graph generation process?
Beyond the specific datasets used, how generalizable is the model to different time-series domains (e.g., healthcare or financial data)?

Comment: Response to weaknesses and questions

2024-11-19

We sincerely thank the reviewer for helping us improve our paper. Below, we address each comment in detail:

Weakness 1

We evaluated our method against models capable of generating both the time-series and the causal graph. A comparison with a broader range of generative models able to generate time-series could help in assessing the quality of the time-series only, but we are also interested in the causal representation of the synthetic sample. We will stress this aspect in the related work section.

Weakness 2:

These definitions can be adopted by setting the related parameter values by using domain knowledge or through hyper-parameter tuning techniques. In our experiments, our approach turned out to be the best one and it also allows the analysis of the coefficients' distribution to tune a custom threshold and only keep the strongest connections. Moreover, to deal with non-stationary data and ensure graph fidelity to the time-series, we also executed our benchmark extracting the causal graph, and keeping only the strongest connections, per-sample. In this way, all the synthetic time-series are associated with a non-empty causal graph.

In the revised version of the article, we will add a formalization of causal relationships to motivate the methodological aspects of our settings.

Weakness 3

We would like to point out that our model is still considerably faster than CausalTime, the main competitor of our work, since our work is the only one able to produce both the time-series and the associated full-time causal graph. Regarding CR-VAE, it is true that it is very fast, but it only generates Granger Causality matrices. Moreover, its causal matrix is the same for all synthetic samples since it is model-dependent. In addition, several works in the literature, such as (Song et al., 2020), have developed techniques to accelerate the sampling procedure of diffusion models. We will highlight these aspects in the experiment and conclusion sections. Finally, regarding the training phase, the training losses in our experiments converged after a few epochs.

Weakness 4

We acknowledge that hyper-parameter tuning is a well-recognized challenge in deep learning in general, requiring time and effort to tune a model optimally. However, we found that the same hyper-parameters work well across all the datasets involved in our experiments, which may simplify the adoption of our work in different scenarios. Moreover, we will release our code including an automatic hyper-parameter search.

Weakness 5

In our work, we assume causal sufficiency, i.e. no latent confounders in our datasets. We recognize that this may be a strong assumption limiting the approach. However, this working hypothesis is in line with the state of the art for this generation task, and we plan to address it in future work. We would also like to highlight that we drop the stationarity hypothesis made by most state-of-the-art algorithms. In the revised paper, we will add a limitations section to discuss these limitations and future work. Finally, this paper aims to improve, and succeeds in improving, the quality and reliability of the synthetic samples with respect to the current state of the art.

Answer to Q1

See Weakness 3.

Answer to Q2

See Weakness 5.

Answer to Q3

We will provide an experiment in which we add Gaussian noise to the training time-series and assess how robust the extracted graphs are.

Answer to Q4

We conducted experiments in three different scenarios highlighting the ability of our model to be applied in several scenarios. Moreover, we will conduct the experiment to test the robustness of the graph extraction to noisy time-series, verifying the applicability of our framework to more noisy scenarios, like finance.

Comment: Paper update

2024-11-23

Regarding the robustness of the graph extraction to noise in the time-series, we conducted the following experiment. We trained our model in a setting where Gaussian noise $z \sim \mathcal{N}(0, \sigma^2)$ has been added to the training time-series. The Table below reports the results of the Graph-FPR for $\sigma^2 = 0.005$ and $\sigma^2 = 0.01$ showing the robustness of the extraction to noise in the time-series.

| Dataset | Noise variance $\sigma^2=0$ | Noise variance $\sigma^2=0.005$ | Noise variance $\sigma^2=0.01$ |
|---|---|---|---|
| Hénon | $0.05 \pm 0.00$ | $0.02 \pm 0.00$ | $0.02 \pm 0.00$ |
| Rivers | $0.07 \pm 0.00$ | $0.01 \pm 0.00$ | $0.01 \pm 0.00$ |
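For readers unfamiliar with the metric, a false positive rate over graph edges can be computed as below. This is a generic sketch: the exact definition of Graph-FPR used in the paper is not given in this discussion, and the adjacency matrices and the helper name `edge_fpr` are invented for the example.

```python
def edge_fpr(pred, truth):
    """False positive rate FP / (FP + TN) over the entries of two 0/1 adjacency matrices."""
    fp = tn = 0
    for pred_row, true_row in zip(pred, truth):
        for p, t in zip(pred_row, true_row):
            if t == 0:  # only true non-edges can yield false positives
                if p == 1:
                    fp += 1
                else:
                    tn += 1
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical 3-variable graphs; entry [j][i] = 1 means i -> j.
truth = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
pred = [[0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]]

fpr = edge_fpr(pred, truth)  # one spurious edge among seven true non-edges
```

A lower value means fewer spurious edges relative to the number of true non-edges.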

Thank you again.

2024-11-25

Thank you very much for the detailed responses; they clarified most of my concerns.

Review

Rating: 3 · Confidence: 3 · 2024-11-03

The paper proposes time-series datasets generated using diffusion models. Ground-truth Granger-causal graphs can also be extracted from the diffusion model. Experimental results show, using standard metrics and a modified F1 score, that the generated samples and graphs are better.

Strengths

The paper proposes to generate both data and ground-truth causal graphs using diffusion models, which addresses a need in the literature for evaluating different dynamic causal models.
The generated samples are validated to be better.

Weaknesses

Overall, the novelty of the paper seems limited. It builds on the same approach as an existing work (Yuan and Qiao, 2024) and seems to only add L1/L2 regularization on top (solely for the post-processing step that extracts the causal graphs). Technically it is standard.
Moreover, it is not fully clear that the extracted graphs, which are based on L1 and L2 regularization, can serve as ground-truth causal graphs. There seems to be a lack of theoretical justification on this part: under what assumptions can the causal graphs be extracted here, and with what guarantees?
Many of the compared algorithms model more than just Granger causality and are still compared. It is not clear what kind of causal graphs are generated. In addition, some heuristics are also used here (95% and 1% sparsity); it is not clear that these can produce reliable graphs. A more rigorous justification for why the regularization approach produces valid causal graphs is needed.

Other Comments:

L370: only FPR is explained.
There are many evaluation metrics, but they are not mathematically defined. In addition, it would be good to provide a prioritized list of metrics, explaining which are most crucial for evaluating the quality and usefulness of the generated data and graphs. This would help readers understand how to interpret the results more effectively.

Questions

Mainly, how can the causal graphs be regarded as ground truth?

Comment: First comment

2024-11-19

We sincerely thank the reviewer for the comments and valuable suggestions to improve our paper. Below, we address each comment in detail:

Weakness 1

Our work is the first that, by employing a diffusion-based model, emphasizes the necessity of generating causally interpretable time-series. Indeed, unlike the work of (Yuan and Qiao, 2024), our model simultaneously generates multiple time-series and their associated causal graphs. While we build upon an established class of generative approaches, we would like to highlight that from (Yuan and Qiao, 2024) we have taken only the formulation of the forward and backward processes, which is typical of diffusion models and also present in other articles such as (Ho et al., 2020; Coletta et al., 2024).

Weakness 2

Our approach is justified by an acknowledged technique to identify causal relationships from the estimated VAR coefficients. For instance, the work of (Hyvärinen et al., 2010) proves that if the time resolution of the measurements is higher than the time-scale of causal influences, one can estimate a classic autoregressive (AR) model with time-lagged variables and interpret the autoregressive coefficients as causal effects. In particular, they prove that causal effect matrices can be consistently, and computationally efficiently, estimated from the coefficients of the VAR model by means of least-squares methods. Therefore, in agreement with this result, incorporating a VAR model in the reconstruction may provide more solid guarantees. In the revised version of the article, we will highlight the theoretical background of our approach. We also acknowledge the lack of formal proof, which requires the formalization of the most appropriate tuning of the sampling period and scaling of the intensity of the observed phenomena. We will add a thorough discussion of these aspects in the revised version of the paper.

Weakness 3

We compared algorithms modeling more than just Granger causality since our model is able to generate the full-time causal graph, modeling causal relationships with respect to precise values of lag. We will stress this aspect in the benchmarking section.

Regarding the approach to summarizing a causal graph from the VAR coefficients, we use the following formal definition of causal relationship. Let $\rho$ be the percentage of causal relationships we want to keep in the synthetic dataset. For a synthetic sample $\hat{\boldsymbol{x}}$, we say that variable $\hat{\boldsymbol{x}}_i$ $\langle\rho,p\rangle$-causes variable $\hat{\boldsymbol{x}}_j$ with a lag of $\tau$ if the $p$-percentile of the coefficients of the VAR model at lag $\tau$, i.e. giving the effect from $\hat{\boldsymbol{x}}_i(t-\tau)$ to $\hat{\boldsymbol{x}}_j(t)$, is among the $\rho\%$ highest values. We underline that $\rho$ and $p$ refer to the dataset and the single sample, respectively. Our approach follows this definition, also employed by (Cheng et al., 2024), but the literature contains other definitions. For instance, (Hyvärinen et al., 2010) define that a variable $x_i$ causes $x_j$ with a lag of $\tau$ if at least one of the coefficients of the VAR model at lag $\tau$, i.e. giving the effect from $x_i(t-\tau)$ to $x_j(t)$, is (significantly) non-zero for $\tau \geq 0$.

Our definition can be adapted by incorporating expert knowledge of the generated phenomena or by hyper-parameter tuning. In our experiments, our approach turned out to be the best one, and it also allows analysis of the coefficients' distribution to tune a custom threshold and keep only the strongest connections. In the revised version of the article, we will provide the formal definition of $\langle \rho, p \rangle$-causality and motivate the methodological aspects of our settings.

Weaknesses 4&5

Thank you for the comments, we will add missing details.

Answer to question

The causal graphs can be regarded as ground truth since they are extracted from the VAR coefficients that generate the time-series. Indeed, as stated by (Hyvärinen et al., 2010), the autoregressive coefficients can be interpreted as causal effects.

We thank the reviewer again. We hope the responses will be satisfactory.

Comment: Paper update

2024-11-23

Review

Rating: 3 · Confidence: 4 · 2024-11-03

Structure learning from time series and diffusion models have been of high interest to a subset of the community. The current work proposes a diffusion model for the generative task of time series generation and causal structure learning. A key motivation for the work seems to be that "..., this is the first work to incorporate diffusion models for causally related time-series generation." However, being first is not a merit on its own; it should be accompanied by a clear motivation. Here, no specific property of a diffusion model seems to be used beyond the fact that it is a good generative model.

Opportunities for improvement: i) In order to justify the interest of the community it would be good if then at least all possible variations or closely related models of the diffusion model class are studied e.g. flow matching, Generator Matching, consistency models etc and evaluated rather than just being first for a special case of diffusion models. Why was the paper not studied with the more general class of flow matching for example? Why not use a consistency model etc?

ii) The title or short name "CausalDiffusion" is misleading; potentially an abbreviation that includes "time" should be used instead. It just seems to be too optimized for Google hits.

iii) The Markovian condition seems to be a very strong assumption. While it might be true that other works assume it too, it would be good to see some progress in the number of assumptions we need to make when developing new approaches. The same holds true for causal sufficiency, i.e. no latent confounding, and causal stationarity. These are all bottlenecks for practical deployment. At least for the number of assumptions the approach makes, there is no progress to see compared to even very old approaches. Given that these assumptions are very limiting in practical use cases, I would expect to see a clear ablation on the robustness against misspecification of these assumptions when a new algorithm is introduced. For practical relevance, it is much less interesting to achieve SOTA on a leaderboard compared to being robust against assumption violation. See e.g. [1] where this is done for causal discovery algorithms. Without this analysis or better guarantees it is highly unlikely the approach would actually be used in practice, and it should not be up to the reader to perform that analysis.

iv) Moreover it would be good to see the evaluation not wrt. metrics like dimensionality reduction but wrt. especially real world downstream tasks e.g. by training a policy on top of the learned graphs or a classification task on top of the learned graph. Interpreting UMAPS and t-SNEs should only be done if no other metric is available.

v) Rather than tables with the top performance it would be better to show the classical training vs performance curves used in deep learning e.g. training time on the x-axis and performance for a metric on the y-axis. Given that there is no finite sample guarantee or any guarantee, but it is a purely heuristic approach it would be better to shift to these plots similarly to how you show training curves in deep learning.

vi) Where are the results on the Dream3 dataset which is mentioned but does not seem to be evaluated on?

Unfortunately, in its current form, I do not see a clear motivation or a clear benefit of the work and do not think it is ready to be presented at the conference. I did not generate the review with an LLM but spent significant time reading the paper since I find the problem interesting and hope that the authors find the review helpful.

[1] Montagna, Francesco, et al. "Assumption violations in causal discovery and the robustness of score matching." Advances in Neural Information Processing Systems 36 (2024).

Strengths

See above.

Weaknesses

See above.

Questions

One point which is unclear to me but I might have misunderstood that is "using our synthetically generated datasets, highlighting the practical benefits of our data". The benefit as far as I see was simply benchmarking, which could have been achieved with the original data. It would have been a real practical benefit if the model generated synthetic data which could have been used to augment the original real world data for a better performance, but that was not demonstrated. So I unfortunately missed a key contribution or motivation of the work especially since the paper does a very good job in mentioning all the established benchmarks and datasets.

Comment: Response to weaknesses and questions

2024-11-19

We sincerely thank the reviewer for the thoughtful comments, which we believe will significantly help improve the paper. In particular, we will clarify the primary goal of our work: we aim to generate data through an internal VAR representation, learned by reconstructing the original time-series. We find our approach particularly compelling compared to existing work for the following reasons:

Our method generates distinct causal graphs tailored to each time-series, unlike (Li et al. 2023), which assumes a single causal graph for all time-series and relies on the stationarity assumption. With the ability to learn individual graphs, we can instead relax the stationarity assumption on the dataset.
In our approach, the generated VAR representation directly constructs the time-series, establishing a clear and coherent relationship between the causal structure and the generated time-series, as described in (Hyvärinen et al. 2010). In contrast, (Cheng et al. 2024) use a two-stage process with a posteriori causal graph inference via DeepSHAP.
Finally, our method demonstrates superior performance across several metrics compared to existing approaches.

Although many deep generative models could be adapted for this task, we believe our contribution lies in introducing (and providing an example of) a more structurally grounded approach to generating paired causal graphs and time-series. Finally, we would like to highlight that diffusion-based generative models enable conditional generation without the need to retrain the model.

Below, we address each comment in detail:

Weakness 1

We focused on the particular class of diffusion-based generative models because of their excellent performance and the possibility to guide (and condition) the generation of time-series without the need of re-training the model, for instance following the approach of (Coletta et al., 2024).

Weakness 2

We now recognize that the acronym may not fully capture the essence of our contribution; we can find an acronym that conveys the time-series generation aspect.

Weakness 3

We agree with the reviewer that addressing some of the mentioned assumptions is indeed interesting and important. Indeed, our work aimed to advance the State-of-the-Art by not assuming stationarity across the dataset. By generating a distinct graph for each sample, we relax this assumption and enhance the flexibility of our approach. We will include a comprehensive discussion of the limitations in a dedicated section of the paper, along with suggestions for future work to address these challenges.

Weakness 4

Thank you for this comment. The dimensionality reduction metrics are employed only for the time-series evaluation, along with several quantitative metrics like discriminative score, predictive score, authenticity, and others. Regarding the causal graphs, we employed the false positive rate of the causal connections. Moreover, we are actively working to include a new downstream task that involves predicting the $i$-th time-series given the other dimensions and the corresponding causal graph. We believe this task provides a meaningful way to evaluate the realism of the generated causal graphs, reflecting a practical use case where the causal graph is leveraged to estimate the values of a target dimension based on its causal relationships.

Weakness 5

We will add new charts with the training time on x-axis and evaluation metrics on the y-axis.

Weakness 6

The Dream3 dataset was not included since it is limited in size and only Granger Causality matrices are provided as ground truth, as also stated in (Tank et al. 2021). Indeed, it has not been considered by the two works on generating causally related time-series that we compare with. We will add these details when reviewing the datasets involving causal relationships.

Answer to question:

In general, we believe that research on synthetic data offers several key benefits for data sharing, particularly in contexts involving scarce data, biased datasets, privacy concerns, or regulatory constraints. These scenarios, frequently encountered in industry, highlight the potential of our approach to accelerate research and development in causal time-series. We will add a few sentences to better discuss these aspects.

We thank the reviewer again. We hope the responses will be satisfactory.

Comment: Paper update

2024-11-23

Regarding the downstream task that involves the causal graphs, we conducted the following experiment. We considered the task of predicting the $i$-th feature of the multivariate time-series given its other dimensions and the corresponding causal graph. The predictor model consists of a 2-layer LSTM to compute the embedding of the observed time-series and a linear layer $L_{graph}$ to learn the embedding of the associated graph $g$. The two embeddings are summed and passed through a linear layer ($L_{out}$) to output the $i$-th feature of the sample. The model is trained to minimize the $\ell_1$ reconstruction loss. It will be formalized in a dedicated section.
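As a reading aid, the predictor described above can be written compactly as follows. This is only a transcription of the prose under assumed notation: the symbols $h$, $e$, and $x_{\setminus i}$ (the sample without its $i$-th feature) are introduced here, and how the graph $g$ is encoded as a vector is not specified in the response.

```latex
\begin{aligned}
h &= \mathrm{LSTM}_{2\text{-layer}}\left(x_{\setminus i}\right)
  && \text{embedding of the observed time-series} \\
e &= L_{graph}(g)
  && \text{embedding of the associated graph} \\
\hat{x}_i &= L_{out}(h + e)
  && \text{predicted $i$-th feature} \\
\mathcal{L} &= \left\lVert x_i - \hat{x}_i \right\rVert_1
  && \text{$\ell_1$ reconstruction loss}
\end{aligned}
```

Because the two embeddings are summed, they must share the same dimension in this reading.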

We believe this task provides a meaningful way to evaluate the realism of the generated causal graphs, reflecting a practical use case where the causal graph is leveraged to estimate the values of a target dimension based on its causal relationships.

The table below reports the results of evaluating the predictor model in terms of Mean Absolute Error (MAE) on an independent validation set.

| Dataset | Predictor performance (MAE) |
|---|---|
| Hénon | $0.13 \pm 0.01$ |
| Rivers | $0.01 \pm 0.00$ |

Finally, we added new charts with the training epoch on the $x$-axis and evaluation metrics on the $y$-axis in Appendix A.4.5.

Thank you again.

2024-11-25

I thank the authors for the feedback. Some of my concerns have been successfully addressed but are not yet reflected in the text and would potentially require more significant changes. In addition, some concerns from the other reviewers and my own concerns have not yet been addressed and would potentially require more time. I thus maintain my score.

AC Meta-Review

2024-12-20

This paper presents CausalDiffusion, a framework that generates multivariate time-series data alongside associated causal graphs using diffusion models. Reviewers raised concerns regarding the paper’s motivation and novelty, as well as its reliance on strong assumptions, such as causal sufficiency, which limit its applicability. Additional concerns include the lack of thorough exploration of the model’s scalability and robustness to assumption violations. The experimental evaluation was also noted to lack depth, with limited attention to downstream tasks and real-world datasets. While the authors’ rebuttal addressed some concerns, such as providing theoretical justification for VAR-based graph generation and conducting robustness experiments, several key issues remain unresolved.

Additional Comments on Reviewer Discussion

Most reviewers acknowledged the rebuttal. While some concerns have been addressed, others remain unresolved and may require additional time to fully address. Additionally, some changes are not yet reflected in the text and would likely require more substantial revisions.

Final Decision

2025-01-22

Reject