Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Abstract
Reviews and Discussion
The paper uses stochastic optimal control to derive Doob's h-transform in infinite dimensions, and it shows the relation between solving the optimal control problem and learning diffusion generative models. The approach applies both to bridge sampling and to generative modelling. It is demonstrated on infinite-dimensional problems, including bridges between images and bridges between probability distributions.
Strengths
Strengths:
- well-written and interesting paper
- the stochastic optimal control approach to deriving Doob's h-transform is well-founded and interesting
- the authors derive a Bayesian inference algorithm using a reference measure
- the method is tested on simple examples
Weaknesses
Weaknesses:
- Doob's h-transform in the infinite-dimensional setting has been derived by other methods in previous papers (both in the linear and non-linear cases, e.g. refs [2, 47], https://arxiv.org/abs/math/0610386). I believe the list of contributions and the introduction do not clearly show that the current paper is not the first to do this; e.g. [2] is first mentioned much later in the paper. I am not sure the introduction and the list of contributions adequately reflect this, which should be addressed before acceptance
Questions
No questions.
Limitations
Yes.
We would like to express our gratitude to the reviewer for their thorough evaluation of our work. We appreciate your recognition of the merits of our research.
1. Clarification on prior work
- We agree that clarifying the relationship between previous papers on Doob's h-transform and our work is essential to avoid any unintended confusion. Our contribution is to leverage stochastic optimal control to derive Doob's h-transform and to extend finite-dimensional SOC problems based on conditional diffusion to infinite-dimensional spaces. In the revised manuscript, we will explicitly outline how our contribution compares to previous works, particularly [1].
[1] Baker et al., "Conditioning non-linear and infinite-dimensional diffusion processes."
Thank you for the response. Assuming that the revised manuscript clearly describes your contribution in comparison to the existing literature as you write in the response and as the other reviewers also request, I keep my accept rating.
The authors investigate the notion of h-transform in infinite dimensional state spaces and provide a novel representation (Theorem 2.3) based on connections to stochastic optimal control.
The authors introduce two approaches to using this h-transform derivation: firstly, in something resembling bridge matching, where both marginals are known, and secondly, by simulating the process with a network-parameterized h-transform and taking gradients through the simulation.
The authors then apply this to image super resolution and Bayesian inference tasks in function space.
Strengths
- Derivations appear correct
- Although the h-transform has been described in infinite dimensions via Hilbert spaces in the context of diffusion models in Baker et al. 2024 (https://arxiv.org/pdf/2402.01434), as far as I am aware the connection to optimal control is novel.
- Experiments are reasonable compared to other infinite-dimensional methods (see below) but still not on the same level as fixed-dimension methods for, e.g., super-resolution.
- Spectral diffusion processes, Phillips et al 2023: https://arxiv.org/abs/2209.14125
- Neural Diffusion Processes, Dutordoir et al 2022, https://arxiv.org/abs/2206.03992
- Baker et al 2024 (https://arxiv.org/pdf/2402.01434)
Weaknesses
- Motivation for the infinite-dimensional diffusion bridge is not very strong and the experiments are not very convincing. I am not so familiar with the Bayesian inference experiments and what is SOTA. There are a few baselines missing, as noted below. For the super-resolution task there are stronger and simpler methods which have not been discussed. I think some stronger use case in scientific applications would be needed for a higher score.
- More discussion of Baker et al. 2024 (https://arxiv.org/pdf/2402.01434) would be appreciated
- As the authors note, the second training method for Bayesian learning problems (Alg 2) requires taking gradients through the simulated diffusion, which can be slow, unstable and memory intensive. This goes against much of the diffusion model philosophy of splitting the generative problem into smaller problems through time and solving each jointly. I fear this will not be very scalable beyond 2D.
Experiments
- FID scores or other quantitative metrics are not provided for the super-resolution tasks.
- There are no baselines or comparisons to other methods. There are many super-resolution and infinite-dimensional diffusion methods.
- The authors compare to neural processes, but there are more recent and comparable baselines for similar infinite-dimensional / functional / Bayesian experiments which do not rely on gradients through the simulated process, such as:
- Spectral diffusion processes, Phillips et al 2023: https://arxiv.org/abs/2209.14125
- Neural Diffusion Processes, Dutordoir et al 2022, https://arxiv.org/abs/2206.03992
Questions
See weaknesses.
Limitations
See weaknesses.
We appreciate the recognition of our paper's strengths and extend our thanks to the reviewer for their comprehensive review and insightful comments. Below, we provide detailed responses to address each valuable comment.
1. Motivation, comparison with baselines
- We agree with the reviewer's concerns regarding motivation and experiments. To address these, we have included a PDF file in the general response with additional experiments: unpaired image transfer compared against a finite-dimensional baseline [1], and 1D function generation compared against infinite-dimensional baselines [2, 3]. Additionally, we have further clarified our motivation. Please kindly refer to that section for more details.
2. Comparison with [4]
- [4] primarily focused on developing the conditional diffusion process in function space. To achieve this, they defined Doob's h-transform in infinite-dimensional space using Itô's lemma and Girsanov's theorem. This approach has its merits, such as enabling the conditioning of non-linear SDEs (while our work considers linear SDEs). However, simulating a conditioned non-linear SDE is often challenging because its conditional distribution is generally intractable. Therefore, they require the approximation algorithm presented in [5].
- While our approach shares the idea of developing an infinite-dimensional Doob's h-transform, as the reviewer already mentioned, our primary goal is not merely to derive Doob's h-transform but to generalize various finite-dimensional sampling problems [6, 7] to infinite-dimensional spaces by exploiting the theory of infinite-dimensional stochastic optimal control.
- In practice, the choice of linear SDEs to develop the relevant theory might appear to be a strict limitation for modeling complex distributions. However, as in most recent diffusion-based models, the linear form may be sufficient for modeling. Moreover, this choice can be beneficial, as it allows for more scalable algorithms due to the closed-form solution of the conditional distribution. In this light, by leveraging the theory of stochastic optimal control along with the choice of linear dynamical systems, our contribution also includes proposing tractable learning algorithms for real-world sampling problems.
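To make this tractability concrete, consider the simplest linear example (our notation, purely for illustration; not necessarily the exact reference process used in the paper): a Q-Wiener process $dX_t = \sqrt{Q}\,dW_t$ on $[0,T]$ conditioned to hit $x_T$. Doob's h-transform then gives the bridge

$$
dX_t = \frac{x_T - X_t}{T - t}\,dt + \sqrt{Q}\,dW_t,
\qquad
X_t \mid X_0,\, X_T = x_T \;\sim\; \mathcal{N}\!\Bigl(X_0 + \tfrac{t}{T}(x_T - X_0),\; \tfrac{t(T-t)}{T}\,Q\Bigr),
$$

so both the conditional drift and the bridge marginals are available in closed form. For non-linear drifts such expressions are generally unavailable, which is why conditioning non-linear SDEs requires approximations such as [5].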
3. Computational concerns
- As the reviewer pointed out and as we stated in our paper, Algorithm 2 may induce computational difficulties. While it is possible to consider a more computationally favorable approach, such as implementing the adjoint solver [8] for memory efficiency or using the variance reduction technique proposed in [9], in the current work we focus on the theoretical property that the optimal control still yields Bayesian posterior sampling despite being defined on function space. Proposing a more scalable algorithm will be an interesting direction for future work.
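For reference, below is a minimal sketch of the memory-efficient option mentioned above, using the adjoint SDE solver of [8] as implemented in the torchsde library. The network, state dimension, time grid, and terminal cost are placeholders for illustration, not the actual model or objective used in the paper.

```python
import torch
import torchsde


class ControlledSDE(torch.nn.Module):
    """Toy controlled SDE: the drift is a learned control u_theta(t, x), unit diagonal noise."""
    noise_type = "diagonal"
    sde_type = "stratonovich"

    def __init__(self, dim):
        super().__init__()
        self.control = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
        )

    def f(self, t, y):  # drift = control network evaluated at (t, y)
        return self.control(torch.cat([y, t.expand(y.shape[0], 1)], dim=-1))

    def g(self, t, y):  # diagonal diffusion coefficient
        return torch.ones_like(y)


sde = ControlledSDE(dim=8)
y0 = torch.zeros(16, 8)                # batch of initial states
ts = torch.linspace(0.0, 1.0, 50)      # observation times

# sdeint_adjoint backpropagates by solving an adjoint SDE backwards in time
# instead of storing the full forward computation graph, trading memory for compute.
ys = torchsde.sdeint_adjoint(sde, y0, ts, method="midpoint", dt=1e-2)
loss = ys[-1].pow(2).sum()             # placeholder terminal cost
loss.backward()                        # gradients w.r.t. the control parameters
```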
[1] Peluchetti, “Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling.”
[2] Phillips et al., “Spectral Diffusion Processes”
[3] Dutordoir et al., “Neural Diffusion Processes”
[4] Baker et al., “Conditioning non-linear and infinite-dimensional diffusion processes”
[5] Heng et al., “Simulating Diffusion Bridges with Score Matching”
[6] Zhang et al., “Path Integral Sampler: A Stochastic Control Approach For Sampling”
[7] Shi et al., “Diffusion Schrödinger Bridge Matching”
[8] Li et al., “Scalable Gradients for Stochastic Differential Equations”
[9] Xu et al., “Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations”
Thank you for the response. I believe my review and scores are appropriate.
This article proposes a perspective on diffusion-based generative models based on stochastic optimal control, with objective functions based on the log density ratio between objectives.
Strengths
As far as I could evaluate, the mathematics are correct, and this particular mathematical perspective is new (to the best of my knowledge).
Weaknesses
I found this submission to have a weak presentation. It reads more like a stochastic calculus journal article than a machine learning conference submission.
This perspective is not clearly motivated: what is gained by considering an infinite-dimensional perspective compared to the wide literature already approaching diffusion-based methods through the lens of stochastic optimal control? Since there are several elements that are infinite-dimensional in nature in this problem (distributions of random variables, score matching functions, etc.), some early-on explanation and clarification of the approach considered here would be helpful.
Further, while a lot of the writing is centered around an infinite-dimensional perspective, this is then converted to a parametric model, with finitely many parameters. How much of the infinite-dimensional perspective is then lost? Is this important?
Questions
Thanks to the authors for addressing my comments during the rebuttal.
Limitations
Presentation and motivation - addressed by the authors during the rebuttal.
We gratefully thank the reviewers for their valuable feedback and suggestions. Here, we address the concerns raised by the reviewer.
1. Early-on explanation and clarification
- Following the reviewers’ suggestions, we have further clarified our motivation in the general response. Please kindly refer to that section for more details. We hope this provides the clarity you need.
2. Finite-dimensional approximation
- We would like to point out that a model having a finite number of parameters does not mean that it is a finite-dimensional model; for instance, Gaussian process regression, one of the most popular infinite-dimensional stochastic process models, effectively requires only a finite number of parameters to be estimated from the observed data. Usually, the function is evaluated only at a countable set of sampling points that are assumed to be generated from an infinite-dimensional stochastic process. In this case, we can approximate the infinite-dimensional function by fitting the model to the finite sampling points with finitely many parameters.
- The "finite-dimensional approximation" happening in our model is in the part where we approximate the covariance operator Q. Specifically, we approximate Q via truncation, that is, by keeping a finite number of eigenfunctions out of the infinitely many. This truncation may incur approximation error, but it does not alter the nature of our model as an infinite-dimensional model. To see this, note that our model can handle image data in a resolution-agnostic way; this is possible because it deals with infinite-dimensional stochastic processes and can therefore model any finite set of sampling points (a small illustrative sketch is given below).
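As an aside, here is a minimal sketch of what this truncation looks like in practice. The diagonal eigenbasis and eigenvalue decay below are assumptions made purely for the example, not the actual choice of Q in the paper.

```python
import numpy as np

K = 64                                            # truncation level for Q
k = np.arange(1, K + 1)
lam = (k * np.pi) ** -2.0                         # assumed eigenvalues of Q
xi = np.random.default_rng(0).standard_normal(K)  # one fixed sample of K coefficients


def evaluate(xs):
    """Evaluate the sampled function u = sum_k sqrt(lam_k) * xi_k * e_k at points xs,
    with the assumed eigenfunctions e_k(x) = sqrt(2) sin(k pi x) on [0, 1]."""
    basis = np.sqrt(2.0) * np.sin(np.outer(xs, k) * np.pi)
    return basis @ (np.sqrt(lam) * xi)


coarse = evaluate(np.linspace(0.0, 1.0, 32))    # the same function sample on a coarse grid
fine = evaluate(np.linspace(0.0, 1.0, 1024))    # and on a fine grid
```

The same K truncated coefficients define a single function sample that can be evaluated on a grid of any resolution, which is the sense in which the model remains infinite-dimensional despite the truncation.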
Thank you for your comments and very helpful rebuttal - I have a better understanding now and have upgraded my score.
The paper presents stochastic control in function spaces with applications to diffusion bridges and Bayesian learning. Since the Lebesgue measure does not exist in infinite-dimensional spaces, the authors derive the Doob h-function as a Radon-Nikodym density with respect to a suitable Gaussian measure and conduct bridge matching experiments under this setup.
Strengths
Overall, the paper is well-motivated and well-written. The paper reviews stochastic control in function space and the connection of Doob's h-transform with stochastic control and bridge matching in Section 2. It transitions smoothly to Section 3, where it proposes an algorithm for diffusion bridges in function space, and an extension for Bayesian learning.
Weaknesses
As the theory of stochastic control in function space already exists, and there is recent work on the h-transform [1] and generative models in infinite-dimensional spaces, the novelty mainly lies in the application to bridge matching and Bayesian learning. These applications are interesting and important; however, the weaknesses lie in the discussion of Bayesian learning and the experiments on bridge matching. In particular, there should be a comparison with finite-dimensional bridge matching, and several arguments in Section 3.2 about Bayesian learning need more clarification (see the questions for details).
[1] Baker, Elizabeth Louise, et al. "Conditioning non-linear and infinite-dimensional diffusion processes." arXiv preprint arXiv:2402.01434 (2024).
Questions
Comments and major questions:
- As the paper introduced in Section 2.1, one can instead consider a cylindrical Wiener process on the Cameron-Martin space. What would break down in the current results? Would the cylindrical Wiener process set-up bring convenience to the experiments, as they are implemented in finite dimensions?
- How do you arrive at equation (21)? What assumptions are required, and what are the regularity requirements for the energy function? It would also be good to remind the readers what mu_T is here.
- In equation (25), how are the energy function U and covariance operator Q determined in general? What are the particular choices used in the presented experiments?
- What are the challenges of using time-dependent diffusion processes in the current method?
Minor points:
- The paper repeatedly refers to Lemma 2.2, but the authors seem to be referring to Theorem 2.2.
- The paper should include a proof of Theorem 2.3 for its completeness and rigor.
- Why is it H_0 instead of H in Theorem 3.2?
Limitations
The authors have adequately addressed the limitations.
We sincerely appreciate your interest in our research and acknowledgment of its significant contributions. We are also grateful for the insightful questions raised by the reviewer, to which we have provided detailed responses in the subsequent text.
1. Comparison with finite-dimensional bridge matching
- Thank you for your valuable suggestion to improve our work. In the general response, we have included a PDF file containing additional experiments and a comparison with the finite-dimensional method [1]. We will incorporate these results into the paper.
2. Cylindrical Wiener process
- In theory, selecting a cylindrical Wiener process for infinite-dimensional SDEs can also be a viable option. Indeed, when prior knowledge of the data domain is not available, opting for a cylindrical Wiener process, as noted by the reviewer, can facilitate the construction of a bridge model. However, the choice of Q determines the geometric structure of the Hilbert space where the functional data reside. Therefore, if we model Q so that our Hilbert space reflects characteristics of the target data, such as smoothness or curvature information, it can be beneficial.
- In practice, where our objective is to model complex data as functions, the choice of the covariance operator Q can significantly impact the capability of the learned neural network operators (the control, in our case). For instance, in time-series imputation, our baselines [2, 3] differ only in their selection of Q (technically a "kernel", since they are finite-dimensional models): [2] uses a standard Wiener process while [3] employs an RBF kernel to construct the Wiener process, and the latter leads to performance improvements. Furthermore, in generative modeling, [4] empirically demonstrates that selecting an appropriate operator Q enhances the correctness of generation. Specifically, [4] highlights that opting for a cylindrical Wiener process may lead to issues such as mode collapse.
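To illustrate this point, here is a small sketch of drawing a Q-Wiener increment whose covariance is represented by an RBF kernel Gram matrix on the evaluation grid. The discretization, lengthscale, and jitter are illustrative assumptions rather than the exact construction used in our experiments.

```python
import numpy as np


def q_wiener_increment(xs, dt, lengthscale=0.1, jitter=1e-6, rng=None):
    """Draw W_{t+dt} - W_t ~ N(0, dt * Q) on the grid xs, with Q discretized
    as an RBF kernel Gram matrix (illustrative choice)."""
    rng = np.random.default_rng() if rng is None else rng
    diff = xs[:, None] - xs[None, :]
    K = np.exp(-0.5 * (diff / lengthscale) ** 2)          # RBF covariance
    L = np.linalg.cholesky(K + jitter * np.eye(len(xs)))  # jitter for numerical stability
    return np.sqrt(dt) * L @ rng.standard_normal(len(xs))


xs = np.linspace(0.0, 1.0, 128)
dW = q_wiener_increment(xs, dt=1e-2)  # one smooth noise increment on the grid
```

In contrast to a cylindrical (white-noise) Wiener process, the increments drawn this way are smooth functions, which is the kind of prior structure the choice of Q injects into the bridge.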
3. Regularity condition for U and choice of U and Q
- Equation (21) follows from the Bayes formula of [5, Section 2], where mu_T denotes the Gaussian reference measure at the terminal time and the potential is a measurable mapping for a given observation; the corresponding notation in equation (21), equation (25), and lines 245-246 and 751 contains typos and will be corrected accordingly. We apologize for the typos and for any confusion they may have caused in understanding the paper. The potential is typically chosen as a negative log-likelihood function (also referred to as the potential energy); in our case, we set it as a negative Gaussian log-likelihood. For Bayesian learning, we choose U as this negative Gaussian log-likelihood and the covariance operator Q as an RBF kernel. We have detailed the setting in Appendix A.9.2.
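For reference, the generic form of this function-space Bayes formula (following [5]; the symbols below are our notation for this response and may differ slightly from those in the paper) is

$$
\frac{d\mu^{y}}{d\mu_{0}}(u) \;\propto\; \exp\bigl(-\Phi(u;y)\bigr),
\qquad
\Phi(u;y) \;=\; \tfrac{1}{2}\,\bigl\|\Gamma^{-1/2}\bigl(y - \mathcal{G}(u)\bigr)\bigr\|^{2},
$$

where $\mu_{0}$ is the Gaussian reference (prior) measure, $\mathcal{G}$ the observation operator, and $\Gamma$ the observation noise covariance, so that $\Phi$ is exactly the negative Gaussian log-likelihood up to an additive constant.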
4. Time-dependent SDEs
- The main challenge of using a time-dependent diffusion process in our method is proving the existence and uniqueness of the invariant measure. For an explicit form of the h-function, as stated in Theorem 2.3, we need to define a certain class of Gaussian measures in which the collection of time-dependent Gaussian measures is equivalent to the invariant measure over the long-time horizon. This class of Gaussian measures should be defined by a linear SDE.
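As a concrete time-homogeneous example of the structure described above (our notation, for illustration only; not necessarily the exact coefficients used in the paper), the linear SDE

$$
dX_t = -\tfrac{1}{2}X_t\,dt + \sqrt{Q}\,dW_t,
\qquad
X_t \mid X_0 \;\sim\; \mathcal{N}\!\bigl(e^{-t/2}X_0,\; (1 - e^{-t})\,Q\bigr),
$$

admits the invariant measure $\mathcal{N}(0, Q)$, to which the transition Gaussians converge as $t \to \infty$. With time-dependent coefficients, establishing the existence and uniqueness of such an invariant measure, and the required relation between the time-dependent Gaussian marginals and it, is precisely what becomes non-trivial.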
5. Minor comments
- We appreciate the reviewer pointing out the typos related to Lemma 2.2 and to the norm in Theorem 3.2; the latter is indeed mistyped, and we will make the necessary corrections in the revised manuscript. Moreover, as the reviewer suggested, we will include the proof of Theorem 2.3.
[1] Peluchetti, “Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling.”
[2] Tashiro et al., “CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation”
[3] Bilos et al., “Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion”
[4] Hagemann et al., “Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation“
[5] Hairer et al., "Signal Processing Problems on Function Space: Bayesian Formulation, Stochastic PDEs and Effective MCMC Methods"
Thank you for the extensive responses and additional experiments! I expect the rebuttal/general response to be included in the final submission.
We sincerely appreciate the time and effort the reviewers have dedicated to evaluating our paper. In response to their valuable and insightful feedback, we have provided some general responses that address comments common to all reviewers. The attached PDF file includes relevant figures and tables for additional experiments.
1. Comparison with baselines
- In line with the reviewers' suggestions, we conduct additional experiments to demonstrate the applicability of our method to various real-world problems.
- First, for a comparison with recent infinite-dimensional baselines, we conduct an experiment on a 1D function generation task. We evaluate our method against baselines on three datasets: Quadratic, Melbourne, and Gridwatch, following the setting provided in [1]. For generative modeling, we set the initial distribution as a centered Gaussian distribution with covariance operator Q and the terminal distribution as the target data distribution, and utilize the bridge matching algorithm in Alg 1. We use an RBF kernel for Q. For quantitative evaluation, we employ the power of a kernel two-sample hypothesis test, which attempts to distinguish the dataset from generated samples (a generic sketch of the underlying test statistic is given after this list). Table 2 in the attached PDF file shows that our method is comparable to the baselines. Moreover, we provide a generated sample compared to the ground truth for each dataset in Figure 1.
- Second, we compare our proposed model with a finite (fixed)-dimensional baseline. We conduct an experiment on unpaired image transfer between the MNIST and EMNIST datasets, comparing the performance of [2] and our DBFS. For a fair comparison, we adhere to the iterative training scheme proposed by [2], in which two forward and two backward control models are learned alternately, and we use the same configuration for all methods. For quantitative evaluation, we estimate the FID score between the generated samples and the real datasets. Table 1 in the attached PDF file shows that our method is comparable to the finite-dimensional method. Furthermore, we provide additional generated samples at various unseen resolutions in Figure 2 to demonstrate the resolution-invariant property inherent in the proposed infinite-dimensional model. We note that our method may have slightly worse FID scores than the finite-dimensional baseline. This may reflect the observation in [3] that resolution-agnostic methods often have worse FID scores than resolution-specific methods; they argue this is because resolution-specific methods can incorporate domain-specific design choices into their score networks (e.g., translation equivariance in CNNs for images). An interesting direction for future work would be to develop well-designed score operators for infinite-dimensional diffusion models.
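Regarding the kernel two-sample evaluation used for the 1D function generation experiments above: the full test-power protocol follows [1], but for concreteness the statistic underlying such a test is the unbiased MMD estimate, sketched below in a generic form. The RBF kernel and its lengthscale are illustrative assumptions, not the exact configuration of [1].

```python
import numpy as np


def mmd2_unbiased(X, Y, lengthscale=1.0):
    """Unbiased MMD^2 estimate between two sample sets with an RBF kernel.
    X, Y: arrays of shape (n, d) and (m, d); here each row is one function
    evaluated on a common grid of d points."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * lengthscale ** 2))

    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

A permutation test on this statistic (not shown) then yields the rejection rate, i.e. the test power reported in Table 2.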
2. Motivation
- Traditionally, there has been interest in sampling from a probability measure on an infinite-dimensional Hilbert space, particularly in Bayesian inverse problems [4]. Recently, modeling data as continuous functions has become increasingly popular within the machine learning community. This functional representation avoids the need for discretization, enabling the handling of data at arbitrary resolutions. Consequently, parameterizing these functions with neural networks provides memory efficiency and the flexibility to represent various data forms [5]. For example, we can regard an image as a continuous function that takes a 2-dimensional pixel location as input and outputs grayscale or RGB channel values. It is therefore an infinite-dimensional object, as a continuous function can produce outputs for any 2-dimensional input defined on some domain.
- Since diffusion-based models are powerful inference tools for various tasks, researchers have been working to extend these models to handle functional data representations. To achieve this, they have generalized the framework of previous diffusion models by extending their formulation to infinite-dimensional Hilbert spaces, also known as function spaces [6, 7]. However, previous diffusion-based generative models typically focus on sampling from a target data distribution. This framework cannot easily address various other sampling problems, such as distribution transfer or exact sampling from a posterior distribution (in functional form, as in equation (21)).
- In the finite-dimensional case, these problems can be solved by exploiting the theory of stochastic optimal control (SOC) [8]. This motivates us to extend and generalize finite-dimensional SOC to the infinite-dimensional case to meet the demands of sampling problems from a functional perspective. In practice, by generalizing previous SOC-related problems to infinite-dimensional spaces, our model can naturally achieve resolution-free data transfer between any two image distributions, perform posterior sampling from a distribution over functions such as a GP posterior, and model irregular time series.
[1] Phillips et al., “Spectral Diffusion Processes”
[2] Peluchetti, “Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling.”
[3] Zhuang et al., “Diffusion probabilistic fields”
[4] Hairer et al., “Signal Processing Problems on Function Space: Bayesian Formulation, Stochastic PDEs and Effective MCMC Methods”
[5] Dupont et al., “From data to functa: Your data point is a function and you can treat it like one”
[6] Franzese et al., “Continuous-Time Functional Diffusion Processes”
[7] Lim et al., “Score-based Generative Modeling through Stochastic Evolution Equations in Hilbert Space”
[8] Zhang et al., “Path Integral Sampler: A Stochastic Control Approach For Sampling”
Please participate in the discussion
This paper tackles the problem of extending diffusion models and diffusion bridges to infinite-dimensional function spaces. The authors propose a novel approach that leverages stochastic optimal control (SOC) to derive Doob's h-transform, a key tool for constructing diffusion bridges, and extend it to infinite dimensions. This provides an interesting framework for learning bridges between infinite-dimensional distributions and generating samples from them.
The reviewers acknowledge the paper's significant contributions in bridging the gap between diffusion models, infinite-dimensional function spaces and SOC, and the potential of the method for handling complex problems involving continuous function space representations. However, when preparing the final version of the paper, the authors should carefully address the remaining concerns of the reviewers: better position their work in relation to existing work on Doob's h-transform and include more extensive experiments. This will improve the paper's impact.