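PaperHub
Rating: 5.3 / 10
Decision: Rejected · 3 reviewers
Lowest: 5 · Highest: 6 · Standard deviation: 0.5
Ratings: 6, 5, 5
Confidence: 3.0
Correctness: 2.3
Contribution: 2.3
Presentation: 3.0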
ICLR 2025

Elucidating the Design Choice of Probability Paths in Flow Matching for Forecasting

Submitted: 2024-09-27 · Updated: 2025-02-05
TL;DR

We introduce a novel probability path model to improve the performance of flow matching for probabilistic forecasting.

Abstract

Keywords
generative modeling, flow matching, dynamical systems, forecasting

Reviews and Discussion

Review
Rating: 6

In this paper, the authors discuss the influence of the choice of probability path in the context of spatio-temporal forecasting. The authors conclude that different probability paths can significantly impact the accuracy and convergence of forecasting models and propose a new probability path model specifically designed for probabilistic forecasting of dynamical systems. Experiments show superior performance and faster convergence for the proposed method.

Strengths

  • The paper clearly and thoroughly discusses and compares the various kinds of probabilistic forecasting models, which are informative and insightful.
  • The proposed probability path model learns to connect consecutive time series samples, leading to faster convergence and more stable training.
  • The experiments are intensive, providing support for the proposed model.

Weaknesses

  • Some notations are a little confusing. For example, do $v_t$ in Algorithm 1 and $v_s$ in Algorithm 2 denote the same vector field?
  • The results on faster convergence and fewer sampling steps are empirical. The paper would benefit from more theoretical derivations or insights into why the proposed probability path yields better results.

Questions

  • How are the $s_n$ determined in the sampling algorithm?
  • What is the reason for setting the highest variance at the middle and lowest variance at both the start and the end?
  • Does the proposed probability path have equivalent form for other variants of diffusion models and bring the same benefits to them, such as the score matching and noise prediction objective?
Comment

We thank the reviewer for recognizing the strengths of our paper. We are glad that you found the insights and contributions of our work to be informative and valuable.

We appreciate your feedback on the clarity of notations in Algorithm 1 and Algorithm 2. The vector field in Algorithm 2 is the trained vector field from Algorithm 1, and we have fixed the typo for the notation to indicate this in Algorithm 2 in the revised version (with $\theta^*$ instead of $\theta$ in line 293).

Comment

We agree that providing a more detailed theoretical analysis to understand training convergence and sampling efficiency is of interest. However, this remains challenging due to the forecasting setting, which involves sequential dependencies and random dynamics that are inherently complex to model rigorously. A complete theoretical analysis requires further research and is beyond the scope of the present paper.

That said, we have already included intuition (see the discussion in Section 4.2) and some theoretical results in the paper by comparing the variance of the vector field corresponding to our proposed probability path model and that of the rectified flow model (see Section C.3 in Appendix and the related discussions). This comparison highlights how the proposed probability path could lead to smaller variance when computing the vector field during gradient descent updates, potentially contributing to smoother training loss curve and more stable training.

Comment

Since Algorithm 2 is specifically tailored for the forward Euler scheme, the step sizes $\{\Delta s_n\}$ are uniform. For example, if we are using 10 steps for sampling, then $\Delta s_n = 0.1$ for all $n$ (recall that we are integrating the ODE from $s=0$ to $s=1$). In principle, the step sizes could also depend on the numerical scheme used and be made non-uniform. We choose to illustrate Algorithm 2 with the forward Euler scheme for simplicity and have made the remark that other numerical schemes could also be used.
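
The uniform-step forward Euler integration described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 2: `euler_sample` and `toy_field` are hypothetical names, and the trained vector field is replaced by a closed-form stand-in so that the snippet runs on its own.

```python
import numpy as np

def euler_sample(v, z0, num_steps=10):
    """Integrate dz/ds = v(s, z) from s=0 to s=1 with forward Euler."""
    ds = 1.0 / num_steps              # uniform step size, e.g. 0.1 for 10 steps
    z = np.asarray(z0, dtype=float)
    s = 0.0
    for _ in range(num_steps):
        z = z + ds * v(s, z)          # one explicit Euler update
        s += ds
    return z

# Stand-in for a trained vector field (assumption, not the learned model):
# a constant velocity that carries z0 to `target` in unit time.
target = np.array([1.0, -2.0])
z0 = np.zeros(2)
toy_field = lambda s, z: target - z0

z1 = euler_sample(toy_field, z0, num_steps=10)
```

With a non-uniform grid, `ds` would simply become a per-step quantity; a higher-order scheme would replace the single update line.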

Comment

The question on the reason for setting the highest variance at the middle and lowest variance at both the start and the end is certainly an interesting one. The variance scheduling, with the highest variance in the middle and the lowest variance at both ends, is designed to balance exploration and stability:

  • Low variance at the start ensures stable initialization, preventing the trajectory from deviating too far from the initial distribution.

  • High variance in the middle allows the model to explore diverse paths in the latent space, avoiding mode collapse and enhancing diversity in the generated trajectories.

  • Low variance at the end sharpens the trajectory, ensuring accurate reconstruction of the desired output.

This strategy is inspired by findings in diffusion models that utilize a forward noising process and a backward denoising process, where such variance patterns have been shown to effectively manage the trade-off between exploration and refinement. We have included this discussion in the revised paper.
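
As a rough illustration of a schedule with this shape, the sketch below evaluates a variance that is smallest at both ends and largest at the midpoint. The functional form `sigma_min**2 + sigma**2 * s * (1 - s)` and the constants are assumptions made for this example; the thread itself does not state the exact schedule.

```python
import numpy as np

def path_variance(s, sigma_min=0.01, sigma=0.1):
    """Hypothetical schedule: sigma_min^2 + sigma^2 * s * (1 - s)."""
    s = np.asarray(s, dtype=float)
    return sigma_min**2 + sigma**2 * s * (1.0 - s)

s_grid = np.linspace(0.0, 1.0, 11)    # path parameter from start to end
variances = path_variance(s_grid)     # peaks at s = 0.5, floors at s = 0 and s = 1
```

Under this assumed form, `sigma_min` sets the variance floor at both endpoints and `sigma` controls how much extra spread is allowed in the middle of the path.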

Comment

The proposed probability path is designed within the context of our specific flow matching framework, but we believe its principles can generalize to other diffusion model variants. For score matching, the path could align the score estimates with a smoother trajectory, potentially enhancing stability during training. For the noise prediction objective, the smoother transition provided by the probability path might reduce the variance in noise predictions, improving convergence and accuracy. We hypothesize that the underlying structure of the probability path—particularly its variance modulation—can be adapted to these objectives to achieve similar benefits.

We hope this answers the reviewer's questions and concerns, and have revised the paper accordingly by taking into account the feedback. We are happy to answer any follow-up question(s) that the reviewer may have.

Comment

Thank you for the responses. I appreciate the clarifications made by the authors and my questions are properly answered.

Though the intuitions and design are good, I still have the feeling that, without further theoretical insights into the variance schedule choice, the proposed probabilistic path is more like a trick built upon some existing methods, rather than a well-grounded, sound novel approach.

Given the above considerations, I support this paper as being above the acceptance threshold, but I am not convinced to provide a higher rating. I will keep my original score at 6.

Comment

Thank you for your thoughtful feedback and for supporting the paper's acceptance!

Review
Rating: 5

The authors examine the impact of the specific choice of the probability path model on the forecasting performance. This is achieved in the context of flow matching, a well-known principle which has not been properly examined in forecasting. There, the mappings are learned via stochastic processes, through random differential equations.

Strengths

The ability to perform spatio-temporal forecasting is a major strength. The paper offers a novel theoretical framework for flow-based forecasting of spatio-temporal data. The probabilistic path is parametrized using a neural network, which casts the task as a second-order regression problem. By taking into account inherent correlations in spatio-temporal data, the proposed model improves upon existing models of this kind. The paper includes extensive performance records and a comprehensive ablation study.

Weaknesses

The main weakness is the choice of the framework. Stochastic-process and ODE-based methods are known to underperform on non-Gaussian distributions and non-stationary data. In particular, Gaussian probability paths cannot deal with real-world data that exhibit fat-tailed distributions. The conditional probability paths and the marginalization of distributions are rather standard in GenAI. The performance metrics are all second-order (MSE, Frobenius norm, peak signal-to-noise ratio, etc.), which is suboptimal given the use of probabilistic models. Too many concepts are put together: encoders, neural networks (MLP), regression using MSE, optimal mass transport, diffusion, ODEs, Gaussian models. Overall, the approach appears to be more of a "system-building" exercise than a deep new framework.

Questions

The authors considered Gaussian probability paths, which are suboptimal for real-world data. Have the authors considered elliptical probability distributions and elliptical mixture models to fix this issue? The authors used second-order performance metrics, yet the approach is probabilistic and requires probabilistic estimates rather than second-order measures. Which higher-order probabilistic metrics would the authors use in this context?

Comment

It is important to choose the right higher-order probabilistic metrics, if we are going to use them in this context. The ideal scenario is when we are able to compute metrics that quantify the full difference between two probability distributions. These metrics include statistical distances such as Wasserstein distance and Kullback-Leibler divergence. Given the constraints of limited data, we are cautious about directly applying higher-order probabilistic metrics without reliable access to full ground truth distributions.

On the other hand, it is possible to compute the continuous ranked probability score (CRPS) [1] given an ensemble of scalar forecasted values and a target value. CRPS is a metric that is often used to measure the compatibility of the cumulative distribution function (CDF) of the forecasts with the target value, taking into account the uncertainty of the prediction. For high-dimensional arrays (such as forecasts for multiple variables or at multiple spatial locations), the CRPS can be extended by treating the multidimensional forecasts as multivariate distributions. However, computing CRPS for high-dimensional forecasts (which is our case here given the high-dimensional spatial resolution of the PDE datasets; e.g., $64 \times 64 = 4096$ dimensions for the fluid flow dataset) using a sufficiently large number of ensemble members (important to obtain a sufficiently accurate estimate of CRPS) is computationally expensive.
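
For reference, a common ensemble estimate of CRPS for a scalar target $y$ is $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, with $X, X'$ drawn from the ensemble. The sketch below applies it independently at each grid point and averages the result; that per-point averaging is a simplifying assumption for illustration and may differ from how the revised paper computes its CRPS numbers. The name `crps_ensemble` is hypothetical.

```python
import numpy as np

def crps_ensemble(ensemble, target):
    """Ensemble CRPS estimate, averaged over all grid points.

    ensemble: array of shape (M, ...) holding M forecast members.
    target:   array of shape (...) holding the observed values.
    """
    x = np.asarray(ensemble, dtype=float)
    y = np.asarray(target, dtype=float)
    # E|X - y|: mean absolute error of the members against the target.
    accuracy = np.abs(x - y).mean(axis=0)
    # 0.5 * E|X - X'|: mean absolute difference over all member pairs.
    # Note the pairwise array has M * M entries per grid point.
    spread = 0.5 * np.abs(x[:, None] - x[None, :]).mean(axis=(0, 1))
    return float((accuracy - spread).mean())

# Two members at 1.0 and 3.0 around a target of 2.0.
score = crps_ensemble(np.array([1.0, 3.0]), 2.0)
```

The pairwise term is what makes the cost grow quadratically in the ensemble size, which is consistent with the expense noted above for 4096-dimensional fields.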

That said, we managed to compute the CRPS for the considered tasks. The CRPS results for all tasks, except for the Navier-Stokes task (which requires additional time for experimentation; we will add these results later), have been included in the revised version of the paper. We hope their addition will help to raise the score of the paper.

[1] Matheson, J. E. & Winkler, R. L. Scoring Rules for Continuous Probability Distributions. Management Science 22, 1087–1096 (1976).

Comment

We have not considered using elliptical probability distributions and elliptical mixture models, which are of course interesting and are natural extensions of our present work. We appreciate the suggestion to explore the broader class of elliptical distributions as alternatives to Gaussian models. While Gaussian models were used in this work due to their simplicity and to make comparisons of different probability path models tractable, we recognize that elliptical distributions (and mixture models) are more flexible and could potentially better capture the fat-tailed nature of real-world data. We have noted this as a future direction in the revised version of the paper, and plan to investigate elliptical distributions and mixtures in future work, as they offer a promising direction for improving the model's ability to handle more complex data distributions.

We hope this answers the reviewer's questions and concerns, and have revised the paper accordingly by taking into account the feedback. We are happy to answer any follow-up question(s) that the reviewer may have.

Comment

We thank the reviewer for the careful reading of our paper and for recognizing the strengths of our work on a flow matching framework for spatio-temporal forecasting.

We would like to emphasize that the present paper focuses on evaluating the effectiveness of a specific framework (flow matching) and exploring the design choices of the models within a controlled setting, for the task of forecasting deterministic PDE dynamics (also see our response to Reviewer 6PGK for a detailed discussion of our motivations). Extending the study to stochastic dynamics, non-Gaussian distributions, and non-stationary data is certainly interesting but is outside the scope of this paper.

While several concepts are integrated within our approach, we have presented them within a unified framework that is designed to be easily accessible and comprehensible (as noted by Reviewer NMVG). This allows readers to understand the core principles of the model and its application to deterministic systems, without the additional complexity of more advanced, generalized scenarios.

Comment

Evaluation metrics such as MSE and PSNR are commonly used in papers on video prediction; see, e.g., (Davtyan et al. 2023). Importantly, we also provide Pearson correlation coefficients to assess the correlation between predicted and true snapshots at various prediction steps. We believe that this metric is sufficiently informative, as the decay of these coefficients can inform us about the long-term predictive capability. However, we are also aware that second-order performance metrics may not fully capture the potential of probabilistic models, which inherently involve higher-order statistical properties. The reason for using second-order metrics in this work is twofold:

  • Data availability: In many real-world forecasting tasks, only a few data samples (sometimes even just one) are available from the ground truth distribution. This makes the computation of higher-order statistics (such as skewness, kurtosis, or higher moments) highly challenging and prone to instability, as we do not have a full distribution over which to compute these moments reliably.

  • Comparison focus: One of the main goals of our work is to compare different choices of probability paths and their impacts on forecasting performance, training convergence and inference efficiency, rather than to fully explore higher-order probabilistic metrics. We argue that the second-order metrics we used provide a solid baseline for evaluating the core aspects of our model, such as the stability and accuracy of predictions, which are critical in many practical forecasting applications.
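
The second-order metrics named in this response (MSE, PSNR, and the per-step Pearson correlation) can be sketched as below. The function names, the `data_range` default, and the synthetic snapshots are assumptions for illustration only; they are not taken from the paper's evaluation code.

```python
import numpy as np

def mse(pred, true):
    """Mean squared error over all entries."""
    return float(np.mean((pred - true) ** 2))

def psnr(pred, true, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed value span."""
    err = mse(pred, true)
    return float("inf") if err == 0 else float(10.0 * np.log10(data_range**2 / err))

def pearson(pred, true):
    """Pearson correlation between two flattened snapshots (non-constant inputs)."""
    p = pred.ravel() - pred.mean()
    t = true.ravel() - true.mean()
    return float(p @ t / np.sqrt((p @ p) * (t @ t)))

# Synthetic stand-in data: 5 prediction steps of 8x8 snapshots.
rng = np.random.default_rng(0)
true = rng.random((5, 8, 8))
pred = true + 0.05 * rng.standard_normal(true.shape)

# One correlation value per prediction step; its decay over steps is the
# quantity the response uses to judge long-term predictive capability.
per_step_corr = [pearson(p, t) for p, t in zip(pred, true)]
```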

Comment

The authors state: "Given the constraints of limited data, we are cautious about directly applying higher-order probabilistic metrics without reliable access to full ground truth distributions." I agree, and I appreciate that they were able to add the higher-order CRPS metric. Unfortunately, the main limitations stem from strong assumptions (the data obey ODEs and are Gaussian). Real-world data do not obey ODEs, are not Gaussian, and are non-stationary. I am therefore inclined to keep my original scores.

Comment

Thank you for taking the time to review our rebuttal and for recognizing our addition of the CRPS results. We acknowledge that many real-world datasets do not strictly obey ODEs, are not Gaussian, and may be non-stationary. However, these assumptions were intentionally chosen as a simplification to create a controlled environment for evaluating the core contributions of our probabilistic model.

The main goal of our research is to develop a novel probabilistic model, and the problems we selected primarily serve as benchmarks to demonstrate the advantages and capabilities of our approach, rather than to advocate for its direct application to specific real-world scenarios. While we considered applying our model to other tasks such as video generation, we found that studying it within the context of dynamical systems provides a more interpretable and focused setting to effectively highlight its strengths.

Review
Rating: 5

This paper addresses spatio-temporal forecasting using latent flow matching. In spatio-temporal forecasting, the performance of flow matching is highly dependent on the choice of the probability path model. To address this, the authors propose a new probability path model that accounts for the inherent continuity and correlation in spatio-temporal data, aiming to shorten the interpolating path.

Strengths

  1. The paper is well-written and accessible.

  2. It presents a unified framework for probability path models in the context of flow matching and diffusion models, providing readers with a clearer overview of current generative model research.

Weaknesses

  1. The main contribution of the paper is a new probability path model for spatio-temporal data. However, the motivation behind this model is presented somewhat vaguely. A more formal and rigorous mathematical illustration would improve clarity.

  2. There are numerous existing diffusion model-based methods for spatio-temporal data; a comparison with these methods would strengthen the paper.

  3. The baselines all seem to use an encoder. An ablation study demonstrating the benefits of modeling in the latent space would be valuable (Please correct me if I missed this).

Questions

If the encoder is trained separately from the flow matching model, what objective function is used to train the encoder?

Comment

We thank the reviewer for recognizing our contribution as valuable and for describing our paper as well-written and accessible. We hope it will inspire further research in generative modeling for spatio-temporal scientific data. (AI for science is an emerging research area.)

Indeed, the core contribution of our paper is a novel probability path model tailored for spatio-temporal scientific data, such as fluid flows. Our investigation reveals a critical insight: when employing flow-matching methods for spatio-temporal forecasting, the choice of the probability path model profoundly affects predictive performance, training convergence, and inference efficiency. This observation led us to ask a fundamental question: Are existing probability path models inherently well-suited for spatio-temporal tasks, or could alternative models better leverage the unique characteristics of such data to achieve improvements across these key aspects?

In response, we propose a new probability path model specifically designed to harness the continuous dynamics intrinsic to spatio-temporal data. By interpolating between consecutive sequential samples, our model aligns directly with the constructed flow, resulting in enhanced predictive performance, more stable training convergence, and greater inference efficiency. The motivation for our model, therefore, results from observed limitations in existing probability path models, which often fail to fully capture the continuous nature of spatio-temporal scientific data. This misalignment with flow-based methods frequently leads to suboptimal results, a gap our model aims to address.
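
To make the idea of interpolating between consecutive samples concrete, here is a minimal sketch of how one training pair could be drawn. It assumes a Gaussian conditional path whose mean moves linearly from the previous sample to the next one and whose variance is `sigma_min**2 + sigma**2 * t * (1 - t)`; that variance form, the constants, and the function name are illustrative assumptions, not details confirmed in this thread. The regression target uses the standard conditional velocity for a Gaussian path $\mathcal{N}(\mu_t, \sigma_t^2 I)$, namely $u_t(z) = \dot{\mu}_t + (\dot{\sigma}_t / \sigma_t)(z - \mu_t)$.

```python
import numpy as np

def sample_path_and_target(x_prev, x_next, t, rng, sigma_min=0.01, sigma=0.1):
    """Draw z_t from N(mu_t, var_t * I) and return (z_t, u_t)."""
    mu_t = (1.0 - t) * x_prev + t * x_next              # linear mean between samples
    var_t = sigma_min**2 + sigma**2 * t * (1.0 - t)     # assumed variance schedule
    eps = rng.standard_normal(x_prev.shape)
    z_t = mu_t + np.sqrt(var_t) * eps
    # d(sigma_t)/dt / sigma_t equals 0.5 * d(var_t)/dt / var_t.
    dvar_dt = sigma**2 * (1.0 - 2.0 * t)
    u_t = (x_next - x_prev) + 0.5 * dvar_dt / var_t * (z_t - mu_t)
    return z_t, u_t

rng = np.random.default_rng(0)
x_prev = np.array([0.0, 1.0, 2.0])    # stand-in for the latent sample at one time step
x_next = np.array([0.5, 1.5, 2.5])    # stand-in for the next time step
z_mid, u_mid = sample_path_and_target(x_prev, x_next, t=0.5, rng=rng)
```

A flow matching loss would then regress a network output at `(z_t, t)` onto `u_t`. At the midpoint the variance term has zero derivative, so the target reduces to the plain difference between consecutive samples.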

Comment

We appreciate the reviewer’s suggestion for a more formal and rigorous mathematical illustration. Our work already includes theoretical analysis and intuitive discussions aimed at providing insights into the behavior of the proposed method. Specifically, in Section 4.2, we provide intuition on the advantages of using a probability path that interpolates between consecutive time series samples, and in Section C.3 of the Appendix, we present a comparative analysis of the variance of the vector field associated with our proposed probability path model and that of the rectified flow model. This theoretical comparison demonstrates how our proposed path can result in smaller variance during gradient descent updates, which may contribute to smoother training loss curves and more stable training dynamics.

That said, we agree that additional theoretical results, particularly those addressing training convergence and sampling efficiency, could further improve the understanding of the proposed method. However, deriving such results in the context of our forecasting setting is particularly challenging due to the presence of sequential dependencies and inherently complex random dynamics. These aspects make it difficult to perform mathematical analysis that fully takes into account the interplay of stochasticity and temporal dependencies.

We are open to specific suggestions from the reviewer that could guide the development of additional formal results within the scope of this work.

Comment

In this paper, we focus exclusively on flow matching methods and address the foundational question of how the choice of probability path models impacts predictive performance, training convergence and inference efficiency, within a controlled setting. While it would indeed be interesting to compare our flow-based method with various existing diffusion-based models for spatio-temporal data, our primary goal is to advance the understanding of probability path models within the context of flow matching. Moreover, flow matching and diffusion-based methods have distinct modeling frameworks and objectives, making a direct comparison challenging without introducing confounding factors. Diffusion models typically rely on stochastic dynamics to generate data, while flow matching emphasizes deterministic mappings guided by learned flows.

Comment

Latent-space approaches have been shown to offer significant advantages in scaling and performance for high-dimensional data. For example, Karras et al. (2022) demonstrate that latent diffusion methods, combined with suitable architectural choices, exhibit superior scaling behavior compared to pixel-space diffusion approaches. These findings support the effectiveness of latent-space modeling for complex, high-dimensional datasets, providing a strong motivation for our design choice.

In our work, the use of an autoencoder to map data into a latent space is similarly motivated by the computational challenges of working directly with the high-dimensional spatial resolution of PDE datasets. Training directly in the ambient space requires substantial GPU memory and computational resources, making it impractical for large-scale or high-resolution datasets. By leveraging a latent-space representation, we achieve significant dimensionality reduction while preserving the essential structure of the data, enabling efficient training and inference with standard hardware configurations.

While we acknowledge that an ablation study could provide additional insights, particularly for lower-dimensional ODE datasets where training in pixel space might be computationally feasible, our focus in this paper is on PDE datasets with high spatial resolution. For these datasets, latent-space modeling provides a critical balance between computational efficiency, scalability, and performance.

We appreciate the reviewer’s suggestion and have incorporated a more explicit discussion of these trade-offs in the revised paper (see App. E.2) to clarify the rationale for our design choices.

Comment

The autoencoder is trained separately using a mean squared error (MSE) loss, which is a standard choice for training autoencoders, to reconstruct the original data from its latent representation. This ensures that the autoencoder learns a compact and meaningful latent space while preserving the essential features of the high-dimensional spatio-temporal data. Training the autoencoder independently with MSE loss allows us to decouple the complexity of representation learning from the flow matching model, simplifying the overall pipeline and focusing the flow matching model on learning dynamics in the latent space.
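
The reconstruction objective described above can be written down in a few lines. In this sketch the encoder and decoder are plain linear maps standing in for the actual networks, which is an assumption made only to keep the example self-contained; `reconstruction_mse` is a hypothetical name.

```python
import numpy as np

def reconstruction_mse(x, w_enc, w_dec):
    """MSE between the data and its encode-then-decode reconstruction."""
    z = x @ w_enc            # encode: (n, d) -> (n, k)
    x_hat = z @ w_dec        # decode: (n, k) -> (n, d)
    return float(np.mean((x_hat - x) ** 2))

# Toy data lying entirely in the first k coordinates, so a projection
# encoder with its transpose as decoder reconstructs it exactly.
n, d, k = 16, 6, 2
rng = np.random.default_rng(0)
x = np.zeros((n, d))
x[:, :k] = rng.standard_normal((n, k))
w_enc = np.eye(d)[:, :k]
w_dec = w_enc.T
loss = reconstruction_mse(x, w_enc, w_dec)
```

Once this loss is minimized, the encoder is frozen and the flow matching model only ever sees the latent codes `z`, which is the decoupling the response refers to.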

We hope this answers the reviewer's questions and concerns, and have revised the paper accordingly by taking into account the feedback. We are happy to answer any follow-up question(s) that the reviewer may have.

Comment

Thank you to the authors for their response. However, the rebuttal does not fully address my concerns regarding the mathematical motivation of the method and the lack of comparison with more advanced spatio-temporal forecasting methods. To clarify, I am not asking for a formal convergence analysis, but I do expect the motivation of the method to have a rigorous foundation rather than relying on vague textual descriptions. As a result, I do not believe the current version meets the acceptance standards for ICLR.

Comment

We thank all the reviewers for their overall positive ratings and constructive feedback. We have uploaded a revised version of the paper, incorporating the reviewers' feedback to enhance its quality and impact. Changes made in the paper are highlighted in blue.

The main updates include:

  • An expanded discussion of the motivations behind our proposed model and the rationale for the choice of the variance schedule in our probability path model.

  • The inclusion of the Continuous Ranked Probability Score (CRPS) as an additional evaluation metric (see also our response to Reviewer is4e), along with results from the considered experiments (see Tables 2-3 in the revised paper), to enhance the assessment of probabilistic forecasting performance.

  • An expanded discussion on the motivations for performing flow matching in the latent space and the advantages of using a pre-trained autoencoder.

Below, we provide detailed responses to each reviewer individually.

AC Meta-Review

The paper introduces a new probability path model under the framework of latent flow matching to improve forecasting performance for spatio-temporal data. The novelty resides in implementing a new probability path for flow matching within the latent space of a pre-trained autoencoder. Considering the existing latent diffusion models and the close relationship between Gaussian flow matching and Gaussian diffusion, the innovation's significance may appear marginal. The distinction of this new model's approach hinges on demonstrating that the choice of probability path significantly impacts results differently than varying noise scheduling in latent diffusion models. However, the paper lacks a thorough theoretical analysis that explains why this specific probability path is optimally suited for spatio-temporal data. Additionally, there is a notable absence of comparisons with relevant diffusion model baselines. These omissions weaken the case for the proposed model's substantial advancement over existing methods.

Additional Comments on Reviewer Discussion

After reviewing the authors' rebuttals, the reviewers continue to express concerns, particularly regarding the mathematical motivation of the method, the theoretical foundations underlying the choice of variance schedule, and the absence of comparisons with diffusion-based spatio-temporal forecasting methods.

Final Decision

Reject