PaperHub
7.2/10
Poster · 4 reviewers
Scores: min 3, max 4, standard deviation 0.4
Individual scores: 4, 3, 4, 4
ICML 2025

Finite-Time Analysis of Discrete-Time Stochastic Interpolants

Submitted: 2025-01-21 · Updated: 2025-07-24

Abstract

Keywords
Stochastic Interpolants, Diffusion Models

Reviews and Discussion

Review
Rating: 4

This work provides a theoretical analysis of time-discretized stochastic interpolant models. Specifically, the authors address convergence with respect to the number of steps in the time discretization, characterizing the error in the modeled distribution in terms of the discretization scheme, the model approximation error, and the prior approximation error. This extends prior analyses that focus solely on diffusion models while remaining consistent with them, and it addresses the effect of time discretization in particular. Motivated by their theoretical results, the authors outline a new time discretization strategy that can theoretically improve the convergence of discrete-time stochastic interpolant solvers, which they demonstrate empirically.

update after rebuttal: I believe my original score holds, as this remains a strong paper and contribution.

Questions For Authors

Is there some way to select or design $\gamma$ to yield efficient sampling?

Claims And Evidence

The primary claim made by the authors concerns the convergence of discrete-time solvers (namely, Euler-Maruyama) for stochastic interpolant models. They provide a bound on the KL divergence between the true and model target distributions, which is a function of the model approximation error, the time discretization error, and the prior approximation “distance”. The proof is explained intuitively, with a detailed proof provided in the Appendix. This bound is then used to design an efficient time discretization schedule that ensures more rapid convergence with respect to the number of time steps. They show theoretically and empirically that their new strategy yields more efficient sampling.

Methods And Evaluation Criteria

While the primary focus of this work is a theoretical characterization of practical methods for computing stochastic interpolants, the authors also provide limited experimental validation of their claims. I find this evaluation very convincing, as they demonstrate the enhanced performance of their scheduling scheme as predicted by their theory. While this is demonstrated on toy datasets, the point is made very clearly.

Theoretical Claims

The authors provide intuitive interpretations and walkthroughs of their theorems and proofs. I found these descriptions intuitive and easy to follow, and they are convincingly backed by rigorous proof.

Experimental Designs Or Analyses

As mentioned above, I believe their limited experiments provide sufficient empirical evidence that their theory is sound. While the level of experimental evaluation is not the same as a typical practical generative modeling paper, I believe their experiments do well to demonstrate the validity of their theoretical results.

Supplementary Material

The supplementary material provides additional background, details regarding the proofs for Sections 4 and 5, and additional experiment details. I believe the main text does well to explain the intuition behind the proofs, and therefore I mainly used the main text descriptions to develop an understanding of the theoretical results. However, the details in the appendix provide a rigorous outline of the theoretical results.

Relation To Broader Scientific Literature

The authors position their work within two main areas of the literature. First, their theoretical analysis focuses on stochastic interpolants, which are related to continuous-time normalizing flows and diffusion. Such approaches have seen widespread application in high-dimensional, structured generative modeling settings (e.g., image generation). Second, their study of stochastic interpolants is analogous to existing convergence and error analyses of diffusion models.

Essential References Not Discussed

I do not believe there is any missing relevant literature.

Other Strengths And Weaknesses

Overall, I believe this would be a very valuable contribution to the community. The authors do well to highlight the importance of considering time discretization in addition to model approximation error in modern generative modeling convergence/error analysis. Moreover, their theoretical insights provide immediate practical considerations that can be employed in stochastic interpolant samplers, as they demonstrate both theoretically and empirically that their proposed sampling scheme can yield better convergence than a simple scheme used in practice, such as uniform discretization. They provide a theoretical basis for designing SDE-based sampling schemes targeting specific error bounds.

Other Comments Or Suggestions

Right Column Line 170: “initial distribution mismatch (i.e., $\text{KL}(\rho(t_0)\Vert\hat{\rho}(t_N))$)” it should be $\text{KL}(\rho(t_0)\Vert\hat{\rho}(t_0))$?

Right Column Lines 361-365: There seems to be a grammatical error here, part of the sentence is repeated.

Right Column Line 373: aspeccts - > aspects

Author Rebuttal

Thank you for your time and effort in reviewing our paper! We are grateful for your constructive suggestions, which have significantly guided our improvements. Please find our responses to your comments below.

Other Comments Or Suggestions:

Right Column Line 170: “initial distribution mismatch (i.e., $\text{KL}(\rho(t_0)\Vert\hat{\rho}(t_N))$)” it should be $\text{KL}(\rho(t_0)\Vert\hat{\rho}(t_0))$?
Right Column Lines 361-365: There seems to be a grammatical error here, part of the sentence is repeated.
Right Column Line 373: aspeccts - > aspects

A: Thanks for your suggestion! We will update them in the final version.

Questions For Authors:

Is there some way to select or design $\gamma$ to yield efficient sampling?

A: Thanks for your insightful question! Our current analysis regarding schedule design and its associated sample complexity is predicated on a fixed latent scale $\gamma$. In Section 5, we adopt $\gamma(t)=\sqrt{at(1-t)}$ due to its prevalence and natural appeal, stemming from its connection to the Brownian bridge process. However, the systematic design of $\gamma$ to optimize sampling efficiency represents a compelling avenue for future research, potentially necessitating a synergistic approach encompassing both theoretical analysis and practical experimentation.
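The Brownian-bridge connection can be checked numerically: for a standard Brownian motion $W$, the bridge $W_t-tW_1$ has variance $t(1-t)$, so $\sqrt{a}\,(W_t-tW_1)$ has standard deviation $\gamma(t)=\sqrt{at(1-t)}$. A minimal Monte Carlo sketch (function names hypothetical):

```python
import numpy as np


def scaled_bridge_samples(t, a, n, rng):
    """Draw n samples of sqrt(a) * (W_t - t * W_1) for a standard Brownian motion W."""
    w_t = rng.normal(0.0, np.sqrt(t), size=n)         # W_t ~ N(0, t)
    incr = rng.normal(0.0, np.sqrt(1.0 - t), size=n)  # W_1 - W_t, independent of W_t
    w_1 = w_t + incr
    return np.sqrt(a) * (w_t - t * w_1)


def gamma(t, a):
    """Latent scale gamma(t) = sqrt(a * t * (1 - t)), vanishing at t = 0 and t = 1."""
    return np.sqrt(a * t * (1.0 - t))
```

The empirical variance of `scaled_bridge_samples(t, a, n, rng)` should match `gamma(t, a) ** 2` up to Monte Carlo error.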


We hope our response addresses your concerns. If so, we wonder if you could kindly consider raising your score? We will also be happy to answer any further questions you may have. Thank you very much!

Reviewer Comment

Thank you for addressing my limited concerns/questions. I believe my original score holds, and that this work represents a useful contribution.

Review
Rating: 3

This paper presents a discrete-time analysis of the stochastic interpolant framework, also known as flow models. Based on its theoretical results, the paper designs schedules for convergence acceleration.

update after rebuttal

My view has not changed, so I maintain my original score.

Questions For Authors

See above.

Claims And Evidence

I think the discrete-time analysis is reasonable.

Methods And Evaluation Criteria

  1. I believe the authors should explain the novelty of the Exponentially Decaying Time Schedule. Given that flow models and diffusion models are not substantially different, it appears that the authors may have merely transferred a time schedule from diffusion to flow models.
  2. If that is the case, the main contribution seems to be theoretically verifying that the Exponentially Decaying Time Schedule offers certain advantages over a Uniform Schedule in flow models.

Theoretical Claims

The theoretical claims are valid.

Experimental Designs Or Analyses

  1. Although the authors provide theoretical justification for their method’s superiority, the experiments themselves appear overly simple. I recommend conducting toy experiments on low-resolution datasets such as CIFAR-10 or ImageNet32, since training flow models on these datasets is relatively fast and computationally manageable.
  2. Experiments conducted solely on Gaussian data only confirm the theoretical foundations. Given that the authors’ theory largely transfers discrete-time analysis to flow models, it would be more convincing to present additional experiments that demonstrate the claimed ability to “design efficient schedules for convergence acceleration.”

Supplementary Material

See Theoretical Claims.

Relation To Broader Scientific Literature

This paper relates to many applications of diffusion models.

Essential References Not Discussed

Flow straight and fast: Learning to generate and transfer data with rectified flow (ICLR 2023)
Flow matching for generative modeling (ICLR 2023)

Other Strengths And Weaknesses

No

Other Comments Or Suggestions

I believe the authors’ statement that the paper focuses on theory is valid. However, if they aim to provide an improved method, they should at least include standard experiments on datasets like CIFAR-10 to support their claims. Relying solely on simple two-dimensional datasets is not sufficient.

Author Rebuttal

Thank you for your time and effort in reviewing our paper! We are grateful for your constructive suggestions, which have significantly guided our improvements. Please find our responses to your comments below.

Methods And Evaluation Criteria

I believe the authors should explain the novelty of the Exponentially Decaying Time Schedule. Given that flow models and diffusion models are not substantially different, it appears that the authors may have merely transferred a time schedule from diffusion to flow models.

A: We first would like to emphasize that our work introduces the first discrete-time convergence bound for SDE-based generative models within the stochastic interpolants framework. Theorem 4.3 provides an explicit upper bound on the estimation error, expressed in terms of the step sizes $h_k$, the dimension $d$, the latent scale $\gamma$, and the distance between the two distributions measured by $\mathbb{E}\Vert x_0-x_1\Vert^p$. This theorem establishes an error bound applicable to arbitrary time schedules, thereby offering insights into the design of schedules that minimize sample complexity. The schedule presented in Section 5 is designed based on this theorem, wherein we stipulate that $h_k$ should be proportional to $\bar{\gamma}_k^2$ to ensure a well-balanced discretization error.

Specifically, for the case where $\gamma(t)=\sqrt{at(1-t)}$, the derived schedule manifests as an exponentially decaying schedule, characterized by a reduction in step size at both ends of the interval $[0,1]$. While this schedule shares similarities with the exponentially decaying schedule employed in diffusion models, a key distinction lies in the fact that diffusion models typically decay the step size on only one side of the interval.

Furthermore, it is noteworthy that for alternative choices of $\gamma$, the optimal schedule may deviate from the above schedule.
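The rule $h_k\propto\bar\gamma_k^2$ can be turned into a concrete grid by taking equal steps in the time change $s(t)=\int\mathrm{d}t/\gamma(t)^2$, since then $\mathrm{d}t/\mathrm{d}s=\gamma(t)^2$. The sketch below assumes that $\bar\gamma_k$ may be approximated by $\gamma$ at the grid point and that the grid runs from $t_0$ to $1-t_0$; the paper's actual schedule (Section 5, Corollary 5.2) may use different constants and endpoints, and `schedule_from_gamma_sq` is a hypothetical helper. For $\gamma^2=at(1-t)$ the grid is uniform in $\operatorname{logit}(t)$, so step sizes shrink geometrically toward both ends of the interval, as described in the answer above.

```python
import numpy as np


def schedule_from_gamma_sq(gamma_sq, t_start, t_end, n_steps, n_fine=200_001):
    """Time grid t_0 < ... < t_N whose step sizes are (to first order) proportional to gamma_sq(t_k).

    Equal increments of s(t) = integral_{t_start}^{t} du / gamma_sq(u) give dt/ds = gamma_sq(t).
    gamma_sq must be positive on [t_start, t_end] and accept numpy arrays.
    """
    u = np.linspace(t_start, t_end, n_fine)
    f = 1.0 / gamma_sq(u)
    # Cumulative trapezoid rule for s(u); s is strictly increasing because f > 0.
    s = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(u))))
    # Equally spaced targets in s, mapped back to t by inverting s with linear interpolation.
    targets = np.linspace(0.0, s[-1], n_steps + 1)
    return np.interp(targets, s, u)


# Example: gamma(t)^2 = t(1 - t) (a = 1; a constant a does not change the grid).
grid = schedule_from_gamma_sq(lambda t: t * (1.0 - t), 1e-3, 1.0 - 1e-3, 20)
steps = np.diff(grid)  # smallest near t = 0 and t = 1, largest near t = 1/2
```

Other latent scales can be passed as `gamma_sq`, e.g. `lambda t: (1 - t) ** 2 * t`; whether this reproduces the schedule used in Appendix D.1 is not confirmed here.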

Experimental Designs Or Analyses

Although the authors provide theoretical justification for their method’s superiority, the experiments themselves appear overly simple. I recommend conducting toy experiments on low-resolution datasets such as CIFAR-10 or ImageNet32 since training flow models on these datasets are relatively fast and computationally manageable.

A: Primarily, this work constitutes a theoretical investigation, wherein we establish the first discrete-time complexity analysis for the stochastic interpolants framework. Our contributions include the derivation of an explicit error bound for the discrete-time sampler, as presented in Theorem 4.3, and the subsequent exploration of schedule design strategies aimed at minimizing sample complexity. The objective of our numerical experiments is to validate our theoretical findings.

While we acknowledge the value of demonstrating our method's efficacy on more complex datasets, such as CIFAR-10 or ImageNet32, we must emphasize that the application of our framework to these datasets introduces a multitude of practical challenges. These challenges encompass the design of effective network architectures, the meticulous tuning of hyperparameters, and the optimization of learning algorithms. Addressing these practical considerations falls outside the primary scope of this theoretical study. However, we recognize the importance of empirical validation on standard datasets and consider it an interesting direction for future research.

Given that the authors’ theory largely transfers discrete-time analysis to flow models, it would be more convincing to present additional experiments that demonstrate the claimed ability to “design efficient schedules for convergence acceleration.”

A: A core contribution of our work lies in the derivation of the first theoretical error bound for distribution estimation within the stochastic interpolants framework, explicitly expressed in terms of the step sizes. Leveraging this result, we theoretically demonstrate the feasibility of designing schedules that accelerate convergence by optimizing the balance between error terms.

To empirically validate the efficacy of our schedule design methodology, we have conducted numerical experiments for both $\gamma=\sqrt{(1-t)t}$ (Section 6) and $\gamma=\sqrt{(1-t)^2t}$ (Appendix D.1), comparing the performance of our specifically designed schedules against that of the standard uniform schedule. While we acknowledge the potential for further experimentation to strengthen our claims, we consider the current numerical results as a solid validation. Furthermore, the exploration of optimal choices for $\gamma$ in practical applications represents an intriguing avenue for future research.

Essential References Not Discussed

A: Thank you for pointing them out! We will discuss them in the revision.


We hope our response addresses your concerns. If so, we wonder if you could kindly consider raising your score? We will also be happy to answer any further questions you may have. Thank you very much!

Reviewer Comment

Thank you for your clarifications. While I acknowledge the paper's theoretical contributions and the exploration of a flow-based model, I remain unconvinced that the proposed “stochastic interpolants” framework significantly differs from standard SDE-based diffusion models. In addition, the experimental results appear limited.

Given the paper’s theoretical focus and the solid proofs provided, I recognize the authors’ contributions. However, these do not fully address my reservations regarding novelty and empirical support. Therefore, I maintain my initial score of 3.

Review
Rating: 4

This paper proposes a finite time analysis of discretization of stochastic interpolants. The paper presents assumptions on the initial and final distributions, score estimators, then provides a complexity bound in KL divergence.

Questions For Authors

  1. The authors assume in Assumption 4.2 that the drift is estimated with high accuracy. Can the authors clarify how this is relevant in practice? We know that $b_F$ can be decomposed into a score term plus a velocity term. For the score term, very good estimators exist in the diffusion-model literature. Can the same be said for the second term? Is this a practical setting?

Please mention some practical estimators for the second term after this assumption, their behaviour, whether it is easy to ensure good training of it, as in the score case.

  2. While the bound in Theorem 4.3 has a favourable dependence on dimension, I feel that some of this is hidden in the term $\mathbb{E}\|x_0 - x_1\|^8$. Please work out this term for two multivariate Gaussians, giving its expression and its dependence on $d$. A similar question extends to Corollary 5.2.

  3. Why does Assumption 4.1 require 8th moments, whereas the literature assumes 4th moments? Please give some intuition following line 190.

  4. Figure 1: is there any issue with $t = 0.001$? Is the estimator $\hat{b}_F$ stable around this region? Please provide the plot around this time for the SDE.

  5. I suggest that the authors not assume uniformly that $b_F$ is estimated $\varepsilon$-accurately. Given the score estimators, you could instead assume an $\varepsilon_{score}$-accurate score and an $\varepsilon_{vel}$-accurate velocity estimate, and thus see the interplay between these two error sources and how they play out in your final bound.

Claims And Evidence

The paper claims to address the problem of analyzing discrete-time stochastic interpolants, and it provides clear assumptions and results to support the claim. There are also numerical experiments that demonstrate the convergence rate. No claims seem problematic.

Methods And Evaluation Criteria

Not applicable as this is mainly a theoretical paper.

Theoretical Claims

While I did give a careful look at statements and proof sketches, I did not check every single line of proofs. Below are my high-level questions about theoretical claims

Experimental Designs Or Analyses

The experimental design is very limited, restricted to 2D examples. Given the explicit dimension dependence in the bound, I suggest that the authors provide experiments with variable $d$ (perhaps the Gaussian case or something similar) and let $d$ grow, to obtain a scaling result showing whether their bound is predictive of what happens as $d$ increases.

Supplementary Material

Briefly, not all parts, not all proofs are checked.

Relation To Broader Scientific Literature

The paper extends the work on stochastic interpolants, particularly Albergo et al. (2023), and provides a discrete-time analysis. However, my general opinion is that, given the assumption of an $\varepsilon$-accurate drift for the SDE from Albergo et al. (2023), the contribution is a bit incremental: it is in essence a discretization analysis, which usually amounts to a standard adaptation of the usual analysis methods, as the authors did.

Essential References Not Discussed

None.

Other Strengths And Weaknesses

The analysis seems to have a very favourable dependence on the dimension and on the accuracy $\varepsilon$.

Other Comments Or Suggestions

  1. Line 149 (left column): "Both $b(t,x)$ and $b_F(t,x)$ can be expressed as linear combinations"; please clarify this for the reader's convenience.

  2. Line 383: $N$ is the number of iterations, while line 375 writes $N(0, I_d)$ for a Gaussian; use $\mathcal{N}$ for the density.

Author Rebuttal

Thank you for your time and effort in reviewing our paper! We are grateful for your constructive suggestions, which have significantly guided our improvements. Please find our responses to your comments below.

Experimental Designs Or Analyses

A: Thanks for the suggestion. We tested the samplers on $d$-dimensional Gaussian mixtures ($\rho_0$ and $\rho_1$) and compared the distribution estimation error for different $d$. The real drift terms used in the sampler are analytically computed [1]. You can check link1 or link2 for results. The first figure compares the error for different $d$ with fixed iterations, and the second shows convergence for higher dimensions. Both results align with our theory and validate the $d$ dependence.

Other Comments Or Suggestions

A: Thank you for your suggestions! We will update them in the revision.

Questions For Authors

A1: We first emphasize that our work primarily focuses on the theoretical analysis of the general stochastic interpolant method. We provide the first discrete-time analysis and propose an upper bound for the distribution estimation error. Additionally, we develop a discretization time schedule that optimizes this bound for a specified $\gamma$.

Theorem 4.3 offers a general upper bound, controlling the error using $\varepsilon_{b_F}^2$, the distance between distributions, the data dimension $d$, and the step sizes $h_k$. Assumption 4.2 quantifies how close $\hat b_F(t,x)$ is to $b_F(t,x)$ and holds for some $\varepsilon_{b_F}^2$ if $\hat b(t,x_t)$ has finite second moments. Theorem 4.3 illustrates how this impacts the final distribution error.

Our numerical experiments validate these findings. We use the loss function $\mathcal{L}[\hat{v}]=\frac{1}{2}\int_{t_0}^{t_N}\mathbb{E}\big[|\hat{v}|^2-2\,\partial_tI\cdot\hat{v}\big]\,\text{d}t$ to train the estimator $\hat{v}(t,x)$ for $v(t,x)$. This choice proved effective and is commonly used (e.g., [1], [2], [3]).

However, creating robust estimators for general applications is challenging, requiring careful designs for network architectures, learning algorithms, and learning rate schedules, which is a direction for future research.
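The training objective quoted in A1 can be estimated by Monte Carlo. The sketch assumes the linear interpolant $I(t,x_0,x_1)=(1-t)x_0+tx_1$ (so $\partial_tI=x_1-x_0$), $x_t=I+\gamma(t)z$, and times drawn uniformly on $[t_0,t_N]$; these choices and the name `velocity_loss` are illustrative assumptions. Since $|\hat v|^2-2\,\partial_tI\cdot\hat v=|\hat v-\partial_tI|^2-|\partial_tI|^2$, the loss is minimized by $\hat v(t,x)=\mathbb{E}[\partial_tI\mid x_t=x]$.

```python
import numpy as np


def velocity_loss(v_hat, x0, x1, z, t, t_start, t_end, gamma):
    """Monte Carlo estimate of L[v_hat] = 1/2 * int E[|v_hat|^2 - 2 dI/dt . v_hat] dt.

    Assumes the linear interpolant I(t, x0, x1) = (1 - t) x0 + t x1, so dI/dt = x1 - x0,
    and x_t = I + gamma(t) z. x0, x1, z have shape (n, d); t has shape (n,), one time
    per sample, drawn uniformly on [t_start, t_end]. v_hat maps (t, x_t) -> (n, d).
    """
    tt = t[:, None]
    x_t = (1.0 - tt) * x0 + tt * x1 + gamma(tt) * z
    v = v_hat(t, x_t)
    dI_dt = x1 - x0
    integrand = np.sum(v * v, axis=1) - 2.0 * np.sum(dI_dt * v, axis=1)
    # With t uniform on [t_start, t_end], the time integral equals the interval length times the mean.
    return 0.5 * (t_end - t_start) * integrand.mean()
```

In training, `v_hat` would be a neural network and this estimate would be minimized with a stochastic optimizer; that part is outside this sketch.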

A2: Regarding the dimension dependence in Theorem 4.3, the key terms contributing to the discretization error are:

$$\varepsilon_{\text{dis}}\lesssim\sum_{k=0}^{N-1}h_k^3\Big(d^3\bar\gamma_k^{-6}+d\sqrt{\mathbb{E}|x_0-x_1|^8}\,\bar\gamma_k^{-2}\Big)+\sum_{k=0}^{N-1}h_k^2\Big(d^2\bar\gamma_k^{-4}+d\sqrt{\mathbb{E}|x_0-x_1|^4}\,\bar\gamma_k^{-2}\Big).$$

For a multivariate Gaussian $z\sim\mathcal{N}(0,I_d)$, $\mathbb{E}|z|^{2p}\le C(p)d^p$. If $x_0$ and $x_1$ are multivariate Gaussians, $\mathbb{E}|x_0-x_1|^8=O(d^4)$. Substituting this shows that the dependence on $d$ is $O(d^3)$ in the $h_k^3$ term and $O(d^2)$ in the $h_k^2$ term. The dimensional dependence is also consistent in Corollary 5.2.
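The moment facts used in A2 can be made concrete. For $z\sim\mathcal{N}(0,I_d)$, $|z|^2$ is chi-square with $d$ degrees of freedom, so $\mathbb{E}|z|^{2p}=d(d+2)\cdots(d+2p-2)\le(2p-1)!!\,d^p$, i.e. one may take $C(p)=(2p-1)!!$. The sketch below assumes independent $x_0,x_1\sim\mathcal{N}(0,I_d)$ (one instance of "multivariate Gaussians") and checks that the $h_k^3$ coefficient $d\sqrt{\mathbb{E}|x_0-x_1|^8}$ grows like $d^3$; function names are hypothetical.

```python
import math


def gaussian_norm_moment(d, p):
    """E|z|^(2p) for z ~ N(0, I_d).

    |z|^2 is chi-square with d degrees of freedom, whose p-th raw moment is
    d (d + 2) ... (d + 2p - 2).
    """
    return math.prod(d + 2 * j for j in range(p))


def eighth_moment_term(d):
    """d * sqrt(E|x0 - x1|^8) for independent x0, x1 ~ N(0, I_d).

    x0 - x1 ~ N(0, 2 I_d) = sqrt(2) z, hence E|x0 - x1|^8 = 2^4 * E|z|^8.
    """
    return d * math.sqrt(2 ** 4 * gaussian_norm_moment(d, 4))
```

For large $d$, `eighth_moment_term(d) / (4 * d ** 3)` approaches 1, matching the $O(d^3)$ statement for the $h_k^3$ term.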

A3: Our work is focused on discrete-time analysis, which is new for the stochastic interpolant framework. Unlike continuous-time error bounds that use the 4th moment assumption (see [1]), our discrete-time analysis requires the 8th moment assumption. In Theorem 4.2, we control $\mathbb{E}|\nabla b_F(t,x_t)\cdot b_F(t,x_t)|^2$, requiring $\mathbb{E}|x_0-x_1|^8<\infty$. This term appears as $h_k^3\bar{\gamma}_k^{-2}d\sqrt{\mathbb{E}|x_0-x_1|^8}$ in the error bound of Theorem 4.3.

However, this term does not dominate when the step size is small; the error bound is then dominated by $h_k^2\bar{\gamma}_k^{-2}d\sqrt{\mathbb{E}|x_0-x_1|^4}$. Relaxing the 8th moment assumption is a future research direction.

A4: It is true that the drift term $b_F(t,x)$ has larger variation near the boundaries, motivating our exponentially decaying schedule in both theory and experiments. Our estimator approximates $b_F(t,x)$ well for $t_0=0.001$, so this choice does not pose a problem. We will add more details and visualizations in the revision.

A5: While analyzing the estimation errors of the score and velocity terms separately is suggested, we find it unnecessary for our analysis. Partitioning $b_F(t,x)=v(t,x)+(\epsilon-\dot{\gamma}\gamma)s(t,x)$ results in $\varepsilon_{b_F}^2\lesssim\varepsilon_v^2+\sup(\epsilon-\dot{\gamma}\gamma)^2\cdot\varepsilon_s^2$, which is similar to [1] for continuous-time stochastic interpolants.
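The inequality in A5 can be spelled out in three lines. This assumes (it is not stated on this page) that the estimate is assembled as $\hat b_F=\hat v+(\epsilon-\dot\gamma\gamma)\hat s$ from separate velocity and score estimates, and that $\varepsilon_{b_F}^2$, $\varepsilon_v^2$, $\varepsilon_s^2$ are the corresponding time-averaged mean-square errors. The only tool is $|a+b|^2\le 2|a|^2+2|b|^2$:

$$
\begin{aligned}
\hat b_F-b_F &= (\hat v-v)+(\epsilon-\dot\gamma\gamma)(\hat s-s),\\
\mathbb{E}|\hat b_F-b_F|^2 &\le 2\,\mathbb{E}|\hat v-v|^2+2\,(\epsilon-\dot\gamma\gamma)^2\,\mathbb{E}|\hat s-s|^2 \quad\text{for each } t,\\
\varepsilon_{b_F}^2 &\le 2\,\varepsilon_v^2+2\,\sup_t(\epsilon-\dot\gamma\gamma)^2\,\varepsilon_s^2 .
\end{aligned}
$$

For $\gamma(t)=\sqrt{at(1-t)}$ the coefficient is explicit, since $\dot\gamma\gamma=\tfrac12\tfrac{\mathrm{d}}{\mathrm{d}t}\gamma^2=\tfrac{a(1-2t)}{2}$.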

References

[1] Albergo, M. S., Boffi, N. M., and Vanden-Eijnden, E. Stochastic interpolants: A unifying framework for flows and diffusions, 2023.

[2] Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., and Le, M. Flow matching for generative modeling, 2023.

[3] Albergo, M. S. and Vanden-Eijnden, E. Building normalizing flows with stochastic interpolants, 2023.


We hope our response addresses your concerns. If so, we wonder if you could kindly consider raising your score? We will also be happy to answer any further questions you may have. Thank you very much!

Reviewer Comment

I posted this comment yesterday as an 'official comment' without realizing that this is apparently not visible to authors. Please read below my response:

Thank you!

Thank you for the plots. Can you elaborate how one would interpret your linear growth plot in link1 in the context of Theorem 4.3? Does it connect with the theoretical result? Also, if you were to fix some and run numerics with increasing , scaling the number of steps necessary for this, would you get the same scaling as in theory?

I am happy to raise my score, if these experiments were conducted with good care. To be honest, I don't expect perhaps the behaviour in these experiments to match theory, as theory is just an upper bound. But this would give a quite good picture whether there is more work to be done, or bounds derived here are tight.

Author Comment

Thank you for your comment! We are glad to further discuss the questions with you. Please see our response below.

In our experiments, the KL error scales almost linearly with $d$. The $O(d^2)$ KL error bound provided by our theorem holds in this case, and the linear growth of the KL error in the experiments may have several causes. For example, Gaussian mixtures may have special properties that do not hold for general distributions, or there may simply exist a sharper bound that has not yet been found. Trying to improve the bound would be interesting future work.

In addition, we have scaled the number of iterations ($N$) in our experiments on Gaussian mixture data, which makes the convergence clearer; see link3 and link4. Moreover, compared to link1, we draw multiple curves in link4 to show the scaling of the KL error with the dimension $d$ for a fixed number of iterations.
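One way to compare the measured $d$-dependence with the $O(d^2)$ bound is to fit the exponent on a log-log scale. The helper below is a hypothetical sketch exercised only on synthetic inputs; it does not reproduce the authors' measurements. Since the theorem gives an upper bound, a fitted exponent below 2 is consistent with it.

```python
import math


def loglog_slope(dims, errors):
    """Least-squares slope of log(error) against log(d).

    A slope near 1 indicates roughly linear growth in d; a slope near 2 would
    match the growth rate of the O(d^2) upper bound.
    """
    xs = [math.log(d) for d in dims]
    ys = [math.log(e) for e in errors]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den
```

The fit is only meaningful when the KL error is well above its Monte Carlo noise floor, so the number of iterations and samples should be fixed across dimensions.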


We hope our response addresses your concerns. If so, we wonder if you could kindly consider raising your score? We will also be happy to answer any further questions you may have. Thank you very much!

Review
Rating: 4

This paper derives a convergence bound for the stochastic interpolants framework. The discretized SDE analyzed in this work is more general than the SDE analyzed in existing diffusion model convergence bounds. As in the diffusion setting, the derived bound suggests that an exponentially decaying discretization schedule is a better choice. The authors show empirically that this is indeed the case.

update after rebuttal

I thank the authors for their clarifications. The more general interpolant analysis remains quite close to existing work but I believe it could be interesting to have the result out there, I therefore increase my score.

Questions For Authors

Could an additional assumption on the target distribution remove the unfortunate initial KL term in the bound?

Claims And Evidence

The central claim of the paper is the convergence bound provided in Theorem 4.3. The proof adapts the result of Chen et al. [A] on the discretization of SDEs to the stochastic interpolant framework. The differences introduced by the slightly more general interpolant $I(x_0, x_1, t)$ are dealt with in Lemmas B.3 and C.1, which are the main technical contribution of the work (could the authors confirm?).

A secondary claim is derived from the theorem which shows that an exponentially decaying schedule cancels out terms appropriately and leads to a simplified bound in Proposition 5.1. This claim is confirmed experimentally in Figure 3.


[A] Chen, Sitan, et al. "Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions."

Methods And Evaluation Criteria

.

Theoretical Claims

The analysis is sound. The only unfortunate element is the need for $t_0 > 0$ and the resulting initialization error. It creates a discrepancy between the bounds in the paper and bounds for diffusion models, whose initialization error can be made exponentially small by increasing the simulation interval $T$.

Experimental Designs Or Analyses

.

Supplementary Material

.

Relation To Broader Scientific Literature

There are some papers already providing analysis of the flow matching framework and the authors clearly compare their techniques with prior work.

Essential References Not Discussed

.

Other Strengths And Weaknesses

The strengths: The paper is clearly written and easy to follow. It adapts prior work and adds to the literature on discretized SDEs used for sampling. The derived bound gives indication on a good choice of step-size schedule and the authors confirm this experimentally.

A minor weakness: The technical extension over prior work might be slightly limited but it can still be worthwhile to have the bound in the literature.

Other Comments Or Suggestions

.

Author Rebuttal

Thank you for your time and effort in reviewing our paper! We are grateful for your constructive suggestions, which have significantly guided our improvements. Please find our responses to your comments below.

Claims And Evidence / Weakness

The differences introduced by the slightly more general interpolant $I(x_0, x_1, t)$ are dealt with in Lemmas B.3 and C.1, which are the main technical contribution of the work (Could the authors confirm?). A minor weakness: The technical extension over prior work might be slightly limited but it can still be worthwhile to have the bound in the literature.

A: The main contributions of our work include the following. First, we propose the first discrete-time analysis for the stochastic interpolant framework, where we propose a new discrete-time sampler and provide a rigorous distribution estimation error bound for the sampler (Theorem 4.3). Second, based on the error bound, we further develop a time schedule for the discretization to achieve lower sample complexity (Corollary 5.2) when the latent scale is defined by $\gamma(t)=\sqrt{at(1-t)}$. Lastly, we validate our theoretical results with numerical experiments.

Among our technical results, Lemma B.3, Lemma C.1 and Appendix B.2 address the discretization error of the drift term $b_F(t,x)$. In this part, the main difference between our result and previous results is that we utilize a more general interpolant $I(t,x_0,x_1)+\gamma(t)z$ between two general distributions. This introduces a new velocity term $v(t,x)$ in the process and demands a new analysis of the estimation error compared to existing results on diffusion models.

Questions

Could an additional assumption on the target distribution remove the unfortunate initial KL term in the bound?

A: Firstly, the initial KL term quantifies the discrepancy between the true initial distribution $\rho(t_0)$ and the estimated distribution $\hat{\rho}(t_0)$ from which $\hat X_{t_0}$ is sampled. We incorporate this term into our theorem to comprehensively cover the scenario where $\hat{\rho}(t_0)$ differs from $\rho(t_0)$, and provide a corresponding error bound in Theorem 4.3.

Next, we look at two examples of how this term affects our error bound. If we define $I$ such that $I(t,x_0,x_1)=x_0$ on the interval $t\in[0,t_0]$, the initial error becomes exactly $0$, because we can directly sample from $\rho(t_0)$ (note that during generation, data from $\rho_0$ is accessible). Furthermore, if $\gamma^2(t)=\Theta(t)$ near $t=0$, we can achieve an $O(t_0)$ initial KL error by choosing $\hat{\rho}(t_0)$ such that $\hat{X}_{t_0}=x_0+\gamma(t_0)z$ for $x_0\sim\rho_0$ and $z\sim\mathcal{N}(0,I_d)$. In this case, the initial KL error is the same as the initial KL error of diffusion models with $T=\Theta(\log(1/t_0))$ [1].
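The $O(t_0)$ claim can be illustrated in a one-dimensional toy case chosen for this sketch (not taken from the paper): $\rho_0=\delta_0$, $\rho_1=\delta_m$, the linear interpolant $I(t,x_0,x_1)=(1-t)x_0+tx_1$, and $\gamma^2(t)=at(1-t)$. Then $\rho(t_0)=\mathcal{N}(t_0m,\gamma^2(t_0))$ while $\hat X_{t_0}=x_0+\gamma(t_0)z\sim\mathcal{N}(0,\gamma^2(t_0))$, and the KL divergence between two Gaussians with equal variance is $(t_0m)^2/(2\gamma^2(t_0))=t_0m^2/(2a(1-t_0))$. The function name is hypothetical.

```python
import math


def initial_kl_point_masses(t0, m, a):
    """KL(rho(t0) || rho_hat(t0)) in a 1-D toy case.

    Toy assumptions: rho_0 = delta_0, rho_1 = delta_m, I(t, x0, x1) = (1 - t) x0 + t x1,
    gamma(t)^2 = a t (1 - t). Then rho(t0) = N(t0 * m, gamma^2) and
    rho_hat(t0) = N(0, gamma^2); for equal variances, KL = (mean gap)^2 / (2 gamma^2).
    """
    gamma_sq = a * t0 * (1.0 - t0)
    return (t0 * m) ** 2 / (2.0 * gamma_sq)
```

Dividing the result by `t0` tends to $m^2/(2a)$ as $t_0\to 0$, which is the linear-in-$t_0$ behaviour described above.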

References

[1] Benton, J., De Bortoli, V., Doucet, A., and Deligiannidis, G. Nearly d-linear convergence bounds for diffusion models via stochastic localization.


We hope our response addresses your concerns. If so, we wonder if you could kindly consider raising your score? We will also be happy to answer any further questions you may have. Thank you very much!

Final Decision

This paper conducts a convergence analysis of the discretization of finite-time stochastic interpolants. The authors further demonstrate how their theoretical results have practical implications for time-step scheduling. All reviewers appear to agree that the paper is theoretically sound. However, two reviewers raise concerns regarding the novelty of the work in relation to diffusion/flow models. All reviewers appear to agree that the experimental results are sufficient to demonstrate the theoretical contribution. However, one reviewer wanted empirical exploration beyond the theory, in particular an investigation of the established bound as the dimension increases. Another reviewer wanted more empirical validation "in the wild", such as on CIFAR-10 or ImageNet32.

In a camera-ready version, I would suggest that the authors consider:

  1. More clearly explain the challenges in transferring previous results on the discretization of diffusion models to the stochastic interpolant framework.
  2. Empirical investigation of the error bound in relation to the dimension.