Efficient Diffusion Models for Symmetric Manifolds
摘要
评审与讨论
The paper introduces a new framework for designing efficient diffusion models on symmetric manifolds, including torus, sphere, special orthogonal group and unitary group. The paper incorporates a spatially varying covariance structure that allows efficient training without computing the manifold heat kernel. In addition, the forward process involves a projected Euclidean Brownian motion, and thus avoids costly numerical solvers.
给作者的问题
- Could the authors comment on how to generalize the method to non-symmetric manifolds?
论据与证据
The proposed diffusion model based on manifold projection for symmetric manifolds is claimed to be efficient and there exists evidence both in terms of complexity analysis and runtime comparison in experiments. The method seems also improve sample quality on synthetic datasets compared to prior manifold-based diffusion models.
方法与评估标准
The proposed methods are based on properties of symmetric manifolds and appear to be correct. The evaluation is standard as in existing works.
理论论述
Did not closely check the proofs.
实验设计与分析
Experiments seem to verify the benefits of the methods in terms of efficiency and sample quality.
补充材料
Did not check the supplementary material.
与现有文献的关系
Given the need for efficiency and scalability in generative models, the motivation and findings of this work is significant.
遗漏的重要参考文献
NA
其他优缺点
Because I am not an expert in the field of manifold diffusion models, my judgement on the paper may not be less confident. In general, I found the paper to be well-written and the method has merits even though the idea may seem natural (as projection and its inverse are used for efficient training and sampling).
其他意见或建议
NA
Thank you for your valuable comments and suggestions. We are glad that you appreciate that the motivation and findings of this work is significant, and the benefits of our methods in terms of efficiency and sample quality. We answer your specific question below.
Could the authors comment on how to generalize the method to non-symmetric manifolds?
Thank you for this important question. Our theoretical guarantees for runtime and sampling accuracy rely on the following three properties:
-
Exponential map oracle: An oracle for computing the exponential map on the manifold .
-
Projection map oracle: A projection map , where , which can be computed efficiently, along with its Jacobian and the trace of its Hessian , as these appear in our training objective.
-
Lipschitz SDE on : The projection of the time-reversal of Euclidean Brownian motion should satisfy a stochastic differential equation (SDE) on with drift and covariance terms that are -Lipschitz at every point on , with growing at most polynomially in . This ensures accurate and efficient simulation of the reverse diffusion process.
Conditions (1) and (2) can, in principle, be satisfied even when is not a symmetric manifold. For example, consider the setting where is the boundary of a compact convex polytope as a structured non-symmetric domain. is assumed to contain a ball of some small radius centered at some point . While not a Riemannian manifold due to non-smooth points at vertices and edges, the polytope boundary is composed of piecewise flat -dimensional faces. Geodesics within faces are linear and can be computed efficiently, satisfying the spirit of (1). For (2), one can define the projection to be the map which maps a point to the intersection of the ray with the polytope boundary , where is the ray extending from the center and passing through . This projection can be computed efficiently, for example via binary search.
However, condition (3) is harder to guarantee in such cases. The drift of the projected reverse SDE has discontinuities at the vertices (and lower-dimensional faces) of the polytope. In our paper, we rely on continuous symmetries of the manifold to "smooth out" such irregularities and prove the necessary Lipschitz properties (see the paragraph "Showing 'average-case' Lipschitzness" on page 6).
Even among smooth Riemannian manifolds, generalizing beyond symmetric spaces is nontrivial. Examples include surfaces of revolution with non-uniform curvature (e.g., a torus with a varying cross-section) or higher-genus compact manifolds (e.g., a double torus), which lack the high degree of symmetry leveraged by our current analysis.
Extending our framework to such settings is a compelling and challenging direction for future research. We will include this discussion in the final version of the paper.
To improve the efficiency and accuracy of diffusion model on manifold, this work first defined a novel diffusion process on a so-called symmetric manifold by applying the projection map with mild smoothness condition. For the reverse process, instead of considering the manifold's head kernel that has no closed form, they design a new training objective to obtain the drift and covariance term. Furthermore, they obtain the complexity bound of the Wasserstein distance between the generated distribution and target distribution by using the comparison theorem for Riemannian manifolds, which is bounded by .
给作者的问题
- About the Assumption 2.1, what is the norm meaning? Is it the operator norm of a matrix, because I thought is a matrix when extending to ? Furthermore, I am not sure what and exactly mean.
论据与证据
Yes
方法与评估标准
Yes
理论论述
- (Line 179-192) What is the motivation of the symmetry property of ? Here, I don't understand and with . For the example of ,d but not in . Or you mean is a submanifold? Does this concept come from the concept of symmetric space?
实验设计与分析
Yes
补充材料
I have checked the Appendix A of proof outline and Appendix B for examples of classifcal symmetric manifolds.
与现有文献的关系
This work provides a novel approach to consider the diffusion on manifolds. Unlike the previous works that requires the heat kernel on manifolds, which has no closed form so that it requires a lot of calculations, their forward process, reverse process, and training strategy are much more computationally efficient. So it has the potential to be applied to unknown data manifolds.
遗漏的重要参考文献
No.
其他优缺点
Strength:
-
This work provides another approach to consider the diffusion on Riemannian manifold by training the drift and variance terms to avoid the problem of computing the manifold's heat kernel which has no closed form, so it improves the efficiency significantly.
-
The assumptions are not too strict, because it does not require the smoothness of the construction maps , . So it may be applied to considering the learnable for unknown manifold .
Weakness:
-
In general, the problem is how to check a data manifold satisfying the so-called symmetry property.
-
The sampling algorithm needs the previous knowledge of the exponential map, which may prevent the algorithm from being applied to unknown data manifold. The problem are whether we can train an exponential map and how the error effects the final accuracy.
其他意见或建议
- In Theorem 2.2 (Line 171), , and (Line 173) ?
- I thought it is better to provide a brief preliminary knowledge of geometry and diffusion on manifold in the appendix part.
Thank you for your valuable comments and suggestions. We are glad you appreciate the novel approach to diffusion on manifolds, and our significant improvement in computational efficiency. We answer your specific questions below.
…motivation of the symmetry property of ?
Thank you for this thoughtful question. The symmetry property, together with the average-case Lipschitz property for the map (Assumption 2.1), allows us to show that the SDE for the projected reverse diffusion on satisfies a Lipschitz property at every point on (We prove this in Lemma C.6; see also the paragraph Showing "average-case" Lipschitzness in Section 3.2). This everywhere-Lipschitz property in turn allows us to bound the numerical error of our algorithm's SDE solver.
Roughly, the average-case Lipschitz property (Assumption 2.1) says the projection map satisfies a Lipschitz condition which holds on a subset of “average-case” points which contains the Euclidean-space forward diffusion with high probability. Moreover, satisfies a symmetry property: the indicator function which determines if a point is in , is independent of the projection (e.g., for the sphere, depends only on the radial magnitude and not on the projection onto the sphere).
We show Assumption 2.1 holds for symmetric manifolds studied in our paper (see the paragraph following Assumption 2.1, and Lemma C.4).
I don't understand …
We appreciate the opportunity to clarify. The dimension of the Euclidean space is within a small constant factor of the dimension of the manifold (). The dimension of the Euclidean space that contains the 's, and the dimension of , add up to . In our paper we sometimes abuse notation and refer to the manifold's dimension as rather than "", as this does not change the runtime bounds beyond a small constant factor. We will clarify this.
When , each element of is an orthogonal matrix. The map takes as input an upper triangular matrix, and outputs an orthogonal matrix in . As each upper triangular matrix has nonzero entries, the space of upper triangular matrices (in vector form) is with
The dimension of is (as orthogonal matrices have degrees of freedom). As are the diagonal entries of an diagonal matrix, . Thus, , as .
how to check…symmetry property
In this paper, we assume the constraint manifold is known a priori and is one of several standard symmetric manifolds (e.g., sphere, torus, , or their direct products). This is typical in many applications—e.g., molecular data on tori or quantum evolution matrices in . In such applications, numerical methods are not required to verify the symmetry property.
…previous knowledge of the exponential map…
In the setting where the manifold is a symmetric-space, there are closed-form expressions which allow one to efficiently and accurately compute the exponential map. E.g., on or , this map is given by the matrix exponential.
In Theorem 2.2 (Line 171),
We will fix this typo.
…(Line 173) ?
The map , defined in Section 2, is the (restricted) inverse of where for all . {} is the pushforward of w.r.t. .
…preliminary knowledge…appendix.
We will include a brief primer on Riemannian geometry and manifold-based diffusion in the appendix.
… Is it the operator norm
Yes, this is the operator norm (induced 2-norm) of a matrix. We will clarify this in the final version.
…what and exactly mean.
In the decomposition , we define the partial derivative as the derivative of the parameterization with respect to . For instance, when is the special orthogonal group , we have , and the derivative corresponds to projecting onto the tangent space of .
This paper introduces a new method for producing scalable diffusion models on Riemannian manifolds with certain symmetries. The method is constructed by placing a diffusion process on a Euclidean space that can be projected entirely onto the manifold, with a partial inverse.
The training speed of this algorithm is significantly faster than the current best methods for applicable manifolds. It significantly improves on the runtime growth rate with manifold dimension, for special cases to be the same rate as Euclidean diffusion models.
The authors contribute several pieces of new theory to prove results regarding the runtime and accuracy of the model and the training objective.
Experimental validation proves out the theory by showing excellent results in high dimension scaling.
给作者的问题
Please see the relevant sections of the review.
论据与证据
I believe all the claims are well supported.
方法与评估标准
They do.
理论论述
I checked some of the derivations, that of the loss, the training and sampling algorithm.
The results on run time and accuracy are beyond my expertise.
实验设计与分析
The evaluation section is well conceived and demonstrates the advantage of the model well.
One missing experiment would be on real data of some kind - for example the earth sciences dataset typically used for Riemanian diffusion models, or something like the the experiments on molecule generation in papers such as Torsional Diffusion [1]. I suggest this as usually real data is significantly more mode-concentrated than synthetic data and it is useful to see if methods can handle this. I do not see this as a barrier to publication, but practitioners may find it more convincing to use the method if such experiments are included.
补充材料
I checked the sections relevant to the loss, the training and sampling algorithm, and the additional experimental detail.
与现有文献的关系
The methods here relate to the body of work modeling densities on Riemannian manifolds.Typically these methods have struggled to scale to higher dimensions outside of the special case of the torus. This method significantly improves on prior work on the subset of manifolds it applies to.
遗漏的重要参考文献
Not essential, but I would have like to see a mention of [1], as they also develop a model for placing density on the torus with runtimes on the same order as Euclidean space. This method is clearly different and applicable to other settings however.
其他优缺点
I found the presentation of sections 2 & 3 a little confusing. I found it strange to present the results ahead of the derivation of the algorithm and found myself constantly referring forwards. I would perhaps reorder these section to present the method first, followed by the results and then the proof highlights. This is just a suggestion however.
I was also unsure about the use of the word oracle throughout the paper to describe what to me are just known functions, such as the exponential map and the projection maps. Could the authors explain why it is needed to term these maps this way?
其他意见或建议
None
Thank you for your valuable comments and suggestions. We are glad that you appreciate the significant runtime improvement, our contribution of several pieces of new theory, and excellent experimental results in high dimension scaling. We answer your specific questions below.
One missing experiment would be on real data of some kind - for example the earth sciences dataset typically used for Riemanian diffusion models, or something like the the experiments on molecule generation in papers such as Torsional Diffusion [1]. I suggest this as usually real data is significantly more mode-concentrated than synthetic data and it is useful to see if methods can handle this. I do not see this as a barrier to publication, but practitioners may find it more convincing to use the method if such experiments are included.
Thank you for this helpful suggestion. We agree that evaluating our method on real-world datasets would enhance its practical relevance.
We considered the datasets you mentioned. Our theoretical results and experiments on synthetic datasets indicate that our method yields greater gains in sample quality and training runtime as the dimension of the manifold increases—for example, on tori with , and on and for (see Table 1 and Figure 1). As such, low-dimensional datasets (e.g., on ) are less likely to reveal the advantages of our approach.
In contrast, the GEOM-DRUGS dataset from the paper Torsional Diffusion for Molecular Conformer Generation [1] offers a promising setting. It consists of molecules whose torsion angles can be represented as points on tori of varying dimensions (average , with some above ), provided one first applies a preprocessing model [1] that infers torsion angles from 3D molecular structures. However, one cannot directly apply our framework to this dataset, as the torsion angles of different molecules lie on tori of different dimensions. Applying our model to this dataset would require extending our framework (and code) to handle a union of tori of varying dimensions, and adapting it to perform conditional generation based on molecular graphs.
We are actively exploring this direction and will include a discussion of such extensions and limitations in the final version.
Not essential, but I would have like to see a mention of [1], as they also develop a model for placing density on the torus with runtimes on the same order as Euclidean space.
Thank you for pointing us to this work. We will include a discussion of this work in Section 1 (Introduction) and Section 2 (Results) of the final version.
I found the presentation of sections 2 & 3 a little confusing... I would perhaps reorder these section to present the method first, followed by the results and then the proof highlights.
Thank you for this feedback. We will revise the paper so that the algorithmic details precede the presentation of results.
I was also unsure about the use of the word oracle throughout the paper to describe what to me are just known functions, such as the exponential map and the projection maps...
You are right — the term "oracle" is not necessary here, since the exponential and projection maps are known and computable in closed form for the manifolds considered. We will remove the term in the final version.
This work proposes a new efficient algorithm to generate symmetric manifold data, which enjoys gradient evaluation and nearly arithmetic operations (exactly for sphere and torus data) and significantly improves previous results. The main intuition is to take advantage of Ito Lemma and the projection operator to allow the use of closed-form score on the Euclidean space. They also use synthetic experiments to support their theoretical results.
给作者的问题
Please see the weakness part.
论据与证据
Yes.
方法与评估标准
Yes.
理论论述
I partly check the correctness of the proof (proof sketch) for theoretical claims.
实验设计与分析
I have checked the soundness of the analysis and experimental designs.
补充材料
No.
与现有文献的关系
Previous works on the manifold data either have gradient evaluation or exponential arithmetic operations. In this work, they improve these two terms at the same time.
遗漏的重要参考文献
No.
其他优缺点
Strengths 1: The intuition behind the algorithm design is helpful, and the technique novelty is clear (Sec. 3.2.).
Weakness 1: It would be better to add a notation part at the beginning of the appendix.
其他意见或建议
Comment 1: It seems that this work does not introduce the definition of
Thank you for your valuable comments and suggestions. We are glad that you appreciate that our method significantly improves previous results, the intuition behind the algorithm design, and the technique novelty. We answer your specific questions below.
It would be better to add a notation part at the beginning of the appendix.
Thank you for the suggestion. We will add a notation section at the beginning of the appendix in the final version.
It seems that this work does not introduce the definition of
We appreciate the opportunity to clarify. The Jacobian is the matrix whose -th entry is the partial derivative . While this is briefly mentioned at the top of page 5 (end of the first paragraph in the right column), we agree that the definition can be made more explicit. We will revise the explanation on page 5 and also include a formal definition in the new notation section.
This paper introduces a new method for producing scalable diffusion models on Riemannian manifolds with certain symmetries. Previous works in this direction have O(d) complexity of gradient evaluation and around O(d) arithmetic operations. In this work, the authors improve these two terms at the same time.
This paper was reviewed by the four reviewers. The reviewers agreed that the the proposed diffusion model is efficient and there exists evidence both in terms of complexity analysis and runtime comparison in experiments. The method seems also improve sample quality on synthetic datasets compared to prior manifold-based diffusion models.
Based on their 3 "accept"s and 1 “weak accept”, the decision is to recommend the paper for acceptance.
The reviewers did raise some valuable concerns that should be addressed in the final camera-ready version of the paper. The authors are encouraged to make the necessary changes to the best of their ability.
In particular,
- experiments on real world data, demonstrating properties of the method, should really help to promote the method among practitioneers (reviewer zeqM)
- it could be good first to present the algorithm, and then the results concerning its properties, but not vice versa (reviewer zeqM)
- discuss motivation for the symmetry property (reviewer q8R3)
- possibility to generalize to the case of non-symmetric manifolds (reviewer xdLt)
We congratulate the authors on the acceptance of their paper!