Diffusion Generative Modeling on Lie Group Representations
We introduce diffusion-based generative modeling on the (flat) Lie group representation space rather than on the (curved) Lie group itself.
Abstract
Reviews and Discussion
This paper proposes diffusion generative models on Lie groups using group representations, based on generalized score matching. The idea is to separate the group actions (Lie groups) from the data space, and to generate distributions on Lie groups that yield samples in the data space. The model works under some technical assumptions. Some numerical examples are provided for this method.
Strengths and Weaknesses
Strengths:
- The method is explained in a good mathematical manner. Theories are provided for the method to work.
- The discussion of related work is thorough.
- Numerical experiments on different data types, including molecular conformer generation, are provided.
Weaknesses:
- While the core idea appears to be interesting and straightforward, after reading the introduction, I found it difficult to clearly identify the paper’s contributions and novelty in comparison to prior work. I recommend revising the writing to improve clarity.
- Numerical experiments are not convincing or extensive enough to demonstrate the strength of the proposed methods. It can be improved by comparing it with a few recent methods cited in the related work (including equivariant diffusion models and some previous diffusion models on Lie groups), in particular, for the QM9 and CrossDocked2020 examples. Moreover, complexity of training and sampling should also be compared.
Questions
- The map defined in line 35 seems different from that in line 86.
- If the map is defined as in line 35, is the image stated in line 56 indeed correct?
- In Figure 1, why is any point in the data space generated from the origin? Also, the two labels at the bottom of the figure seem to be swapped; is this intended?
- To make lines 86 and 87 consistent, it may be better to adjust the notation in line 87.
- In line 90, the notation appears to contain a typo.
- Is there any empirical comparison to the work of Zhu et al. (2024) and Kong & Tao (2024) cited in line 287 (e.g., for Abelian groups)?
- Line 98: what is the quantity used there? I cannot find its definition before this line. Is it well-defined for any input?
- Line 136: "result" should be "results".
- Line 160, "Beyond this, the formalism remains fully applicable in the non-homogeneous case." I don't quite understand how the formalism remains fully applicable in the non-homogeneous case if the space is not homogeneous for the group.
- What's the benefit or advantage of the proposed method compared to equivariant models? I understand that the model can be unconstrained on Lie groups, but it would also limit the diversity of generated samples, since you only create samples using group actions.
- What if the data space is not Euclidean but a curved manifold, in addition to the curved Lie group?
Limitations
I would suggest that the authors include a paragraph or a section of discussion about the limitations.
Final Justification
I appreciate the efforts that the authors make to address my questions. The additional experiments are praiseworthy.
Formatting Issues
N/A
We thank reviewer PLvS for their time reading and reviewing our manuscript and for their insightful questions and comments.
Weaknesses
- Paper's contribution and novelty. TL;DR: We propose a novel diffusion process governed by SDEs, which operates not directly on the Lie group itself, but on its space of representations. Specifically, the diffusion takes place in the (flat) space of linear operators (i.e., matrices) induced by the group action rather than on the (curved) group manifold.
Formally, a data point $x$ can be written as $x = \rho(g)\,\bar{x}$ for some fixed $\bar{x}$ and $g \in G$, where the map $\rho$ is a matrix map, i.e., $\rho(g)$ is a matrix. For instance, for $G = \mathrm{SO}(2)$, $g$ is the angle $\theta$ and $\rho(\theta)$ is the known rotation matrix in 2d. Given this setup we can explain the three approaches:
- Lie group diffusion extracts $g$ from $x$ and performs diffusion on the (generally curved) group $G$;
- We propose to perform diffusion in the space of the matrices $\rho(g)$ for any group $G$. This matrix-valued diffusion process induces a stochastic flow on the data space by applying the evolving matrices to points in it, $x_t = \rho(g_t)\,\bar{x}$. In this way, the entire process remains within Euclidean space, as the representation elements are matrices acting on a vector space.
- Standard diffusion consists in taking $\rho(g)$ to be the identity matrix (plus a constant vector). This corresponds to the representation of the translation group, which makes it explicitly a subcase of our general formalism.
Compared to diffusion on the Lie group itself, our method is computationally simpler and more general. It avoids the geometric and optimization challenges of curved manifolds, does not require projections, and avoids the difficulty of sampling forward SDE trajectories for non-Abelian groups, where no closed-form solutions exist (but they do in our formalism!). We plan to clarify these points in the revised manuscript following the rebuttal. Please let us know if the above explanation is helpful for the reviewer to understand our work and its novelty in the context of standard Lie group diffusion.
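To make the contrast concrete, here is a minimal toy sketch of our own (not the manuscript's implementation, and the names `rho` and `x_bar` are ours) for the $\mathrm{SO}(2)$ case: the flow coordinate (the angle) performs an ordinary Euclidean random walk with closed-form Gaussian increments, and the evolving representation matrices transport a data point, so the whole process lives in flat space.

```python
import numpy as np

def rho(theta):
    # 2x2 rotation-matrix representation of SO(2)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
x_bar = np.array([1.0, 0.0])   # fixed base point in the data space
theta, dt = 0.0, 0.01
trajectory = []
for _ in range(100):
    # Brownian step on the (flat) flow coordinate: no projections needed
    theta += np.sqrt(dt) * rng.standard_normal()
    # the evolving matrix rho(theta) pushes the base point around in R^2
    trajectory.append(rho(theta) @ x_bar)
```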
- Further experiments and comparison. We performed extra experiments to further extend the benchmarking evaluation of our strategy:
- Synthetic datasets (for the table of results please refer to the answer to reviewer d94q): We performed a quantitative evaluation using the Wasserstein-2 (W2) distance on the synthetic 2D/3D datasets, comparing standard (Fisher) score matching (translation group) to our proposed approach with a rotation-based Lie group. One strong bias of such experiments is due to how close the prior distribution is to the target one, and this affects the performance of the generative process. To mitigate this effect we report a normalized W2 metric, where we divide the W2 distance of the sampled points by the W2 distance of the priors. We noticed that GSM performs on par or better on most datasets, especially the ones where there is a clear inductive bias given by symmetry. On the MoG datasets the standard score matching (with translations) does perform better, which is to be expected since there is no rotation symmetry to be exploited, and translation symmetry is a good inductive bias since it helps identify the centers of the various Gaussian modes. We also notice that the outperformance of GSM is stronger in 3d than in 2d. We postulate that as the dimension grows, the model has a harder time just "memorizing" the distribution, and the symmetry-awareness advantage also increases.
- Further CrossDocked benchmarking (for implementation details please refer to the answer to reviewer d94q): We conducted additional experiments on CrossDocked2020 using a BBDM model, finding that while BBDM and our Lie algebra-based method achieve comparable RMSD, BBDM produces unphysical poses due to linear interpolation in Euclidean space. Unlike our method and RSGM, BBDM fails to preserve the global rotational structure, resulting in a higher mean absolute error between the two pairwise distance matrices, compared to zero error (by design) for both our method and RSGM. These results demonstrate that our method combines the best of both worlds: it maintains the symmetry structure of the data without sacrificing performance, matching unconstrained models in accuracy while outperforming Lie group diffusion approaches.
- Quantitative MNIST evaluation (please refer to the answer to reviewer d94q for the table of results): We evaluated the FID as well as the classification accuracy of the generated images from both our model and the BBDM one. While our model is only slightly better on the FID, it is far superior in the classification accuracy (93% vs 80%); the classifier is a simple network trained on the original (unrotated) MNIST dataset, whose embeddings we also extract for the FID computation. This happens because BBDM sometimes generates bridges between different classes, as it cannot enforce a strict "rigid" rotation. We showed one instance of this occurrence in Figure 6 of the manuscript.
Questions
- The reviewer is correct that our use of notation is a bit sloppy here. In line 35 it is the usual definition of a linear representation on a vector space. In line 86 we generalized this to be a general group action (thus not needing the structure of a vector space for the data space); more importantly, the map in line 86 is the map in line 35 applied at a specific base point. To reconcile the two, the definition in line 86 is obtained from that in line 35 by evaluating the representation at that base point. We will revise the notation and clarify the text accordingly.
- This simply follows from the definition in line 35; taking the differential of that map indeed yields the stated image.
- In Figure 1 we wish to emphasize how elements of the data space are obtained through the representation of the group action. The origin here is not necessarily the classical origin of the space, but any point in the image of the identity element. The two labels are indeed swapped at the bottom of the figure; we thank the reviewer for catching that!
- We appreciate the suggestion of the reviewer to make the notation clearer!
- Thank you for catching that!
- We did not compare our results with the works of Zhu et al. and Kong & Tao. While it is fair to cite these papers, as they also approach group-aware diffusion using the structure of the Lie algebra, these methods are quite new and have not yet been applied to real-world scenarios like molecule generation.
- The flow coordinate is defined in line 93 and, as Figure 2 depicts, it corresponds to the flow on the data space induced by the group action, given a flow (geodesic) on the group corresponding to a Lie algebra element. When the conditions of Section 2.2 are satisfied, it is well-defined. There might still be an ambiguity: for instance, for $\mathrm{SO}(2)$ the flow coordinate is the angle parameter, which is periodic. In such cases, we establish an interval of allowed values.
- Thank you!
- If the space is not homogeneous for the group, then the group action cannot connect all points, and the orbits partition the space. In such cases, our diffusion process will be restricted to the orbit of the prior initial condition, which differs from the standard setting, where each (prior) point can be moved by the dynamics to any target point. However, the framework (score estimation, Langevin dynamics, etc.) still applies on each orbit separately.
- Equivariant models restrict the score function to a fixed representation of the group, limiting its expressivity. In contrast, our method does not enforce equivariance on the score function: it leverages the group action to guide the generative dynamics while allowing the score to be freely learned. For instance, a strictly rotation-equivariant model cannot capture angular variation, as equivariance dictates how the score must transform under rotations. Our approach overcomes this by modeling dynamics under a product group (e.g., rotations combined with dilations), allowing us to capture both angular and radial dependencies. Crucially, the score function remains unrestricted: the group governs the direction of the Langevin dynamics but does not constrain the form of the learned function.
- This is a very good question! We address it already in lines 283-284. In short, our results for generalized score matching also hold for a generic curved manifold. Our Theorem 3.1 requires the flatness of the data space since the ultimate goal of our work is to derive a diffusion process following Lie group trajectories entirely within Euclidean space. Nonetheless, extending our Theorem 3.1 to more generic spaces would be a very nice theoretical direction.
Limitations
In Appendix G, we discussed several limitations of our approach. Another potential challenge lies in the flexibility of our framework. While this flexibility is actually a strength, as it enables accurate inductive-bias modeling of the data structure, it can also require more extensive parameter tuning, particularly when working with novel datasets or symmetry groups, as compared to standard score matching methods. We will add a separate section about the limitations.
We hope that our responses have adequately addressed the reviewer's questions and concerns. We would be grateful if the reviewer could acknowledge the efforts we have made, particularly in generating new benchmarks, by considering an increase of the overall score when updating their review. Should any issues remain unresolved, we would be more than happy to provide further clarification.
I thank the authors for the detailed responses clarifying certain points. Also, I believe the additional experiments will strengthen the paper. I also wanted to let the authors know that I have updated my rating to 5, in case you cannot see the update before the decision. However, I still have one question regarding my Q9: according to your answer, will other methods also apply if you simply restrict the focus to each orbit, such that applicability to the non-homogeneous case is not unique to your method?
We thank the reviewer for the very positive feedback! We truly appreciate that the reviewer finds that the new benchmarks are strengthening the paper. Furthermore, we value that the reviewer acknowledged our efforts and overall work.
Regarding your additional question about homogeneity: absolutely, this is not a special feature required only by our method, nor is our method the only one valid in the absence of homogeneity. If the main text suggests we are somehow implying so, we will reformulate it to avoid misunderstandings.
Having said that, we wish to make a couple of further comments on the topic. The strict notion of homogeneity automatically implies an underlying group structure (since a homogeneous space is always defined with respect to a specific group). So, for instance, if we consider Lie group diffusion (on the group taken as a Riemannian manifold) where the underlying data lives in a non-homogeneous space, we get the same behavior: the model is valid within the chosen orbits.
For standard diffusion models, recreating this behavior is a bit trickier, since $\mathbb{R}^n$ is homogeneous with respect to the translation group (which we showed is the Lie group underlying standard diffusion). However, we can artificially constrain the network to achieve it. Imagine that the data space is the real line with the origin removed, and that the score network is built such that the induced dynamics cannot change the overall sign of the input. Then a prior point starting on the negative axis will never be mapped by the Langevin dynamics (of course, the noise term needs to be handled appropriately as well) to a target point on the positive axis. The model, however, should still be able to learn such "disjoint" probability distributions.
To summarize, being valid in the absence of homogeneity is not exclusive to our formalism; other methods will work as well. We nonetheless listed it as one of our conditions since it mirrors the behavior expected of standard Euclidean diffusion models, namely that any point of the prior can be connected to any point of the target distribution.
We hope that this helps clarify the role of homogeneity in our framework. Should there be any further questions, we are more than happy to answer them throughout the full rebuttal phase. Thank you again for taking the time to review our work so carefully!
Thank you for the clarification. I am satisfied with that and do not have any other questions.
The authors present a new, more general way of presenting score matching, using representations of Lie groups (instead of diffusing on them directly), which enables them to retrieve usual score matching and generalise it to datasets with more structure. They demonstrate the effectiveness of their method on a variety of datasets.
Strengths and Weaknesses
Strengths:
- The paper is very well-presented. Neat, great-quality plots are available for all visualisations, and rather thorough introductions to Lie groups are given (as much as can fit in a page or so).
- The goal is very clear, and the paper is well written in general. Theoretically, the paper is sound, and the results are nicely presented.
- The experiments demonstrate the claims quite clearly.
Weaknesses:
- Section 3 should be restructured slightly; I would include a subsection containing the theory, instead of jumping into it straight ahead (so as to have a 3.1 and a 3.2).
- The experiments are good, but sometimes lack easily understandable metrics, or they are not put forth enough. (And they should be, as the numbers are quite good.) MNIST could use an FID evaluation, for instance. The synthetic experiments would benefit from numbers as well (perhaps the W2 distance?). For QM9 conformers, the standard metrics are reported, but none really exhibit the superiority of the method, I believe, at least in this scenario (which is fine).
- More experiments on higher dimension groups could be useful to verify that the method scales.
The paper is good as such, I believe, but I think a few improvements to its experimental section can make it stronger. Although, perhaps there are no useful examples of higher dimensional Lie groups?
Questions
- Are there any other datasets that could be using higher dimensional groups, and have you tried them out? It seems to me that in low dimensions, quite a lot of methods can work naturally, but, of course, it might be hard to find a good example. Perhaps some material sciences datasets, with space groups? I understand as well it can be out of the expertise of the authors. In general, I would also like to see more evidence that the method outperforms others that do not have such a geometric inductive bias. (Again, please point it out if I have missed it in the existing paper.)
- Are there any other low-dimensional groups you can try out, even on synthetic data? Other groups that are perhaps a bit more challenging than rotations or translations? (Feel free to reply to 1 and 2 together.)
- How "difficult" to satisfy are the conditions that you pose on and ? That is to say, are they high requirements in terms of regularity of and ?
Limitations
I am mostly concerned about the scalability of the method in higher dimensions.
Final Justification
The authors clearly demonstrated that their method works on a variety of settings with a rigorous and thorough methodology. The thoroughness and rigour of the work is not to be doubted.
Formatting Issues
We thank reviewer d94q for their time reading and reviewing our manuscript and for their insightful questions and comments. We are particularly pleased that the reviewer found the goal of the paper clear and the theory sound, and that the experiments demonstrate our claims.
Weaknesses
- The reviewer's suggestion regarding the structure of Section 3 makes absolute sense, and we will implement it in the final version of the manuscript.
- Additional experiments and quantitative benchmarks:
- We include a quantitative evaluation using the Wasserstein-2 (W2) distance on the synthetic 2D/3D datasets, comparing standard (Fisher) score matching (translation group) to our proposed approach with a rotation-based Lie group. One strong bias of these distributions is that all the datasets considered are in fact symmetric with respect to the origin, as is the standard Gaussian, which is the prior for the Fisher score matching models. The similarity of the prior distribution to the target one decisively affects the performance of the generating process. Indeed, both methods produce visually indistinguishable samples across all manifolds tested, as they only differ in the relative weighting given to the different modes (in the updated manuscript we will provide images of all the generated distributions). To at least partially mitigate this effect, we report a normalized W2 metric, obtained by dividing the W2 distance of the sampled points by the W2 distance of the priors. We noticed that GSM performs on par or better on most datasets, especially the ones where there is a clear inductive bias given by symmetry. On the MoG datasets standard score matching (GSM with translations) does perform better than the rotation-based group (which is to be expected since there is no rotation symmetry to be exploited, and translation symmetry is a good inductive bias since it helps identify the centers of the various Gaussian modes). We also notice that the outperformance of GSM is stronger in 3d than in 2d. We postulate that as the dimension grows, the model has a harder time just "memorizing" the distribution, and the symmetry-awareness advantage also increases.
Table: Comparison of GSM (rotation-based group) and standard Fisher score matching (translations) on 2D and 3D synthetic datasets. Best results are in bold. When numbers are too close, we consider them on par.

| Dataset | Group | Normalized W2 |
|---|---|---|
| MoG (2D) | GSM | 0.34 |
| MoG (2D) | standard | |
| Concentric Circles (2D) | GSM | |
| Concentric Circles (2D) | standard | |
| Line (2D) | GSM | |
| Line (2D) | standard | 0.56 |
| MoG (3D) | GSM | |
| MoG (3D) | standard | |
| Torus (3D) | GSM | |
| Torus (3D) | standard | 0.35 |
| Möbius Strip (3D) | GSM | |
| Möbius Strip (3D) | standard | 0.16 |
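For concreteness, a minimal sketch of how the normalized W2 metric above can be computed between equal-size point clouds (our own illustrative implementation via exact optimal assignment; the function names are ours, not from the manuscript):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_w2(a, b):
    # Exact W2 between two equal-size point clouds via optimal assignment.
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

def normalized_w2(samples, prior, target):
    # Divide the sample-to-target W2 by the prior-to-target W2 to discount
    # how close the prior already is to the target distribution.
    return empirical_w2(samples, target) / empirical_w2(prior, target)
```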
- We performed additional experiments on CrossDocked2020 comparing our Lie-algebra induced generalized score matching against standard equivariant Euclidean diffusion (BBDM) using identical network architectures (please see the response to reviewer DmbC for implementation details). While the RMSD metrics are comparable, GSM generates 3D poses that faithfully preserve the transformations, whereas BBDM produces implausible final ligand poses (with deformed geometries, e.g., non-planar aromatic rings or wrongly stretched bonds) that furthermore deviate from proper global rotations. To measure this effect we computed the mean absolute error between the pairwise distance matrices: BBDM yields a nonzero error, while RSGM and our method achieve zero error by design. Thus, our approach offers a clear advantage over other strategies, as it keeps the RMSD of unconstrained diffusion while preserving the exact structure due to the symmetry inductive bias, and outperforms Lie group diffusion. We will include the new experiment in the final version of the manuscript. Also note that the group here is the global rotation group $\mathrm{SO}(3)$, and the representation on which it acts has dimensionality $3N$, where $N$ is the number of atoms in the ligand, so it is of very high dimension, showing the scalability of the method.
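The distance-matrix error referenced above can be sketched as follows (an illustrative helper with our own naming): pairwise internal distances are invariant under rigid rotations and translations, so the MAE between the two distance matrices vanishes exactly when the motion is a global rigid (possibly reflected) transformation.

```python
import numpy as np

def distance_matrix_mae(x, y):
    # x, y: (N, 3) point clouds; the MAE between their internal pairwise
    # distance matrices is zero iff y is a rigid transform of x.
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    return np.abs(dx - dy).mean()
```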
- We also perform additional benchmarking in the MNIST experiment, as the reviewer suggested. In particular, we evaluated the FID as well as the classification accuracy of the generated images from both our model and the BBDM one. While our model is only slightly better on the FID, it is far superior in the classification accuracy (the classifier is a simple network trained on the original (unrotated) MNIST dataset). This happens because BBDM sometimes generates bridges between different classes, as it cannot enforce a strict "rigid" rotation. We showed one instance of this occurrence in Figure 6 of the manuscript.
| Model | Average Accuracy ($\uparrow$) | Average FID ($\downarrow$) |
|---|---|---|
| GSM | $0.93 \pm 0.04$ | $130.8 \pm 22.0$ |
| BBDM | $0.80 \pm 0.10$ | $133.4 \pm 19.0$ |
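As a reference for the FID computed from classifier embeddings, a generic Fréchet-distance sketch (the standard formula, not our exact evaluation code):

```python
import numpy as np
from scipy import linalg

def fid_from_embeddings(emb_real, emb_gen):
    # Frechet distance between Gaussians fitted to the two embedding sets.
    mu1, mu2 = emb_real.mean(0), emb_gen.mean(0)
    s1 = np.cov(emb_real, rowvar=False)
    s2 = np.cov(emb_gen, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):   # numerical noise can yield tiny
        covmean = covmean.real     # imaginary parts; drop them
    diff = mu1 - mu2
    return diff @ diff + np.trace(s1 + s2 - 2.0 * covmean)
```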
Questions
- In the manuscript we already presented a synthetic dataset in 4D, showing that the method straightforwardly applies to higher dimensional groups. The results can be found in Figure 5h of the manuscript, and in Appendix A.6 we provided the necessary mathematical derivation for applying our method to this case. In the short time at our disposal for the rebuttal, we searched for an easy-to-implement dataset from material science, but unfortunately we did not find a standard one on which to run our method (material science unfortunately does not lie within our area of competence, and it seems it would require further knowledge about the data and experience to find "the right questions to ask").
- Unfortunately, most of the continuous group actions relevant for common data types are (products of) rotations ($\mathrm{SO}(n)$), translations, and dilations. These are also the groups tackled in virtually all the papers on Lie group diffusion. Other groups, like the unitary groups, are relevant for quantum mechanics applications, but they are complex groups and the diffusion process needs to take place on a complex manifold, which has not yet been tackled systematically in the literature. This is a very interesting direction for future work, but it would be out of scope for the current manuscript.
- We thank the reviewer for raising this important point. While the formal conditions on the space and the fundamental vector fields may initially appear technical, they are in fact natural and routinely satisfied in common settings involving Lie group actions. We explicitly verified these conditions in every experiment presented in the paper, including those involving rotations, translations, and dilations.
The conditions can be reformulated as a practical series of steps that the practitioner must follow/check when setting up the theoretical framework for learning on a new use case:
- Identify the space where diffusion is performed. This is not necessarily the space where the data resides, but rather the space where the network outputs live. For instance, the diffusion space has dimension 1 for the MNIST experiment (the rotation angle), while the data space has the dimension of the pixel grid. (Pre-condition)
- Dimension matching of the group action: ensure that the dimension of the image of the group action is at least the dimension of the diffusion space. This step is often straightforward, as it involves computing the kernel of the group action (i.e., the group elements whose action is the identity on the data space) and ensuring the dimension of its complement matches or exceeds the required one. (Condition 1)
- Group coverage of the space: verify that the group "covers" the entire space. For example, translations generate all of $\mathbb{R}^n$, and $\mathrm{SO}(2)$ generates all rotations of an image. This property is usually so obvious that it doesn't require explicit mention. (Conditions 1 and 2)
- Lie algebra differential operator commutators: compute the action of the Lie algebra differential operators and check that the commutators vanish; a minimal symbolic check is sketched below. While this calculation can sometimes be lengthy, it follows systematically from the representation of the group action. (Condition 3)
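As an illustration of the last step, here is a minimal symbolic check of Condition 3 for the standard $\mathrm{SO}(2)$ and dilation generators on $\mathbb{R}^2$ (our own example, using the textbook fundamental vector fields, not code from the paper):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.Function("f")(x, y)

def apply_field(field, g):
    # A vector field (a, b) acts on a function g as a*dg/dx + b*dg/dy.
    a, b = field
    return a * sp.diff(g, x) + b * sp.diff(g, y)

rot = (-y, x)   # SO(2) generator on R^2
dil = (x, y)    # dilation generator on R^2

# Commutator [V_rot, V_dil] f = V_rot(V_dil f) - V_dil(V_rot f)
comm = sp.simplify(apply_field(rot, apply_field(dil, f))
                   - apply_field(dil, apply_field(rot, f)))
print(comm)  # prints 0: the differential operators commute (Condition 3)
```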
We hope that this clarifies the intuition behind the conditions and makes it clear that they are natural and mostly very easy to satisfy and check.
We thank the reviewer for their thoughtful and constructive feedback, which has helped us improve the experimental section through the inclusion of new metrics. We hope that our additional work demonstrates our commitment to addressing the reviewer's concerns, and we would be grateful if this could be reflected in an updated review score.
I would like to thank the authors for their thorough and complete response. The experiments confirm the solidity of the paper, and my doubts have been allayed by your answers.
PS: Just as a clarification: I did not mean to demand a material science experiment from the authors, as I fully understand that it is not necessarily within their expertise and could have required a lot of time; it was a mere suggestion for further datasets. Thank you for answering that thoroughly as well.
Dear Reviewer d94q,
Thank you for the positive feedback and for reviewing our work. We're pleased that our responses addressed your concerns and that the new experiments demonstrated the strength of our approach.
We didn't mean to dismiss your material science suggestion; we find it quite interesting and hope other researchers will explore applying our framework to such applications.
Please feel free to ask if any other questions or remarks arise during the remaining rebuttal discussion period. Thank you again for your time and effort.
The paper introduces a new method for applying diffusion models to Lie group representations. It employs a score-based diffusion process with generalized score matching, enabling modeling on any Lie groups, including non-Abelian ones, which have been challenging to handle until now. The diffusion process stays in the original Euclidean space but is guided by Langevin dynamics through Lie algebras. Several examples of Lie algebras are provided, including those with real-world applications, such as generating molecular conformations. The experimental section demonstrates the performance of this model across tasks from different domains, including generating point clouds, predicting molecular conformations or docking poses, and rotating images.
Strengths and Weaknesses
Strengths:
- The paper has solid theoretical foundations, with theorems proven in the Appendix (although I did not review the proofs thoroughly).
- The core concept of the paper is presented in figures, which are helpful in understanding the proposed diffusion process.
- The diffusion model on Lie group representations may have interesting applications in different domains, including molecular geometry prediction, as described in Section 3.1.
- The presented approach is original and unique because generalized score matching has not yet been applied to Lie algebras. This development required working through the mathematical foundations of the model and providing Langevin dynamics for this new SDE.
Weaknesses:
- The experiments lack baselines to motivate the use of the proposed new diffusion model. For example, the generated distributions in Figure 5 should be both quantitatively and qualitatively compared to diffusion models using standard score matching.
- Similarly, can docking poses be predicted as a simple regression task with a similar neural network architecture? It would be recommended to compare these results with other neural docking models, such as DiffDock.
- The paper might be hard to follow for readers unfamiliar with Lie algebras. Many new terms are briefly defined in Section 2 without providing the intuition behind them.
- The labeling of rows in Figure 6 may be confusing, especially for panel (b), where two labels correspond to two rows, and the middle label does not correspond to any row.
- Although the method is very interesting, its significance in real-world applications is unclear. The authors mention modeling molecular geometry using dihedral angles, but then in the experimental section, they employ different, simpler Lie groups, such as rigid body movements.
Questions
- The rotated MNIST images in Figure 6 are blurred for the proposed model. Do you think it is because of the expressivity of the neural network used to generate these images or other problems with the diffusion process?
- For molecular conformation generation, why do you use Lie groups other than those introduced in Section 3.1, which were well-motivated in that section? For the docking example, it would be interesting to compare your approach with DiffDock for semi-flexible docking, which employs a similar strategy of moving diffusion to a different manifold of torsional rotations. For the QM9 example, could you justify using this particular group? If the space is spanned at each atom position, what is the role of the dilation transformation? Does it not cause problems with scaling bond lengths?
- You mentioned that you use the Casimir element to compensate for the deviation of the tangent vector from the orbit. In practice, do you sometimes observe errors that accumulate throughout the diffusion process?
Limitations
The limitations are explained in Appendix G. The authors do not mention any broader impacts, nor do they justify why their research is considered to have no societal impacts in the checklist.
Final Justification
The Authors adequately addressed all my concerns. The paper is strong theoretically, and the experiments have now been significantly improved. I hope that all feedback from the discussion period will be integrated into the final version. Consequently, I increased my score to 5.
Formatting Issues
N/A
We thank the reviewer FJCA for their time reading and reviewing our paper and for their insightful questions and comments.
Weaknesses
- We performed several extra experiments to further extend the benchmarking evaluation of our strategy for three of our experimental setups. In summary:
- Quantification of performance on the synthetic datasets (for the table of results please refer to the answer to reviewer d94q): We conducted a quantitative evaluation using the Wasserstein-2 (W2) distance on synthetic 2D and 3D datasets, comparing standard (Fisher) score matching (translations) with our proposed approach based on rotation-aware Lie groups. A strong bias in such experiments arises from the similarity between the prior and target distributions. To account for this, we report a normalized W2 metric, dividing the W2 distance between samples and target by the W2 distance between the corresponding priors and the target. We observe that GSM performs on par or better in most datasets, particularly where symmetry provides a clear inductive bias. In the MoG datasets, standard score matching outperforms the rotation-based Lie group model, which is expected since no rotational symmetry is present, while translation symmetry effectively helps locate the Gaussian modes. The performance gap becomes even more pronounced in 3D, where GSM shows stronger advantages. We hypothesize that in higher dimensions, memorizing the target distribution becomes more difficult, and models that incorporate symmetry explicitly benefit increasingly from this inductive bias.
- We performed a new experiment on CrossDocked2020 (see the response to reviewer DmbC below for details of the implementation): We evaluated an (equivariant) Brownian Bridge Diffusion Model (BBDM) on CrossDocked2020 and compared it to our Lie algebra-based method. Both achieved similar RMSD, but BBDM suffers from unphysical poses and fails to preserve the global SO(3) rotation (despite being an equivariant network), yielding a nonzero mean absolute error (computed between the pairwise internal distance matrices of the two point clouds) versus zero for our method, indicating that our Lie algebra-induced diffusion offers a clear advantage over standard diffusion models in this bridging problem.
- Quantitative MNIST evaluation: We evaluated the FID as well as the classification accuracy of the generated images from both our model and the BBDM one. While our model is only slightly better on the FID, it is far superior in the classification accuracy (the classifier is a simple network trained on the original (unrotated) MNIST dataset, whose embeddings we also extract for the FID computation). This happens because BBDM sometimes generates bridges between different classes, as it cannot enforce a strict "rigid" rotation. We showed one instance of this occurrence in Figure 6 of the manuscript. We report the evaluation in the following table (best results in bold):
| Model | Average Accuracy ($\uparrow$) | Average FID ($\downarrow$) |
|---|---|---|
| GSM | $\mathbf{0.93 \pm 0.04}$ | $\mathbf{130.8 \pm 22.0}$ |
| BBDM | $0.80 \pm 0.10$ | $133.4 \pm 19.0$ |
- We acknowledge that Lie group and representation theory may be unfamiliar to many in the machine learning community. In response to the reviewer’s suggestion, we added a table in the appendix summarizing key notation, definitions, and intuitive explanations. The full version is available in the response to reviewer DmbC.
- We understand the reviewer's point. We will just have one label per column to improve consistency of notation.
- We understand the reviewer’s concern regarding the real-world significance of our method, and in this regard we wish to share our point of view on the matter and our current/future plans. When preparing this paper, we faced a strategic choice. One option was to focus on a single high-impact application, such as molecular design or protein docking, and demonstrate competitive (out)performance in that specific context. The other was to present a more methodological contribution: a general framework grounded in rigorous theory, validated across multiple data types to illustrate its versatility and correctness. We deliberately chose the second path. Our motivation is that the relevance of symmetry and Lie group structure in data extends far beyond individual applications such as molecular conformer generation or protein docking. By establishing the theoretical foundations and practical applicability of our method in a variety of controlled settings, we aim to provide the machine learning community with a robust and general-purpose tool. While specific applications often require extensive domain-specific tuning, this can obscure the contribution of the underlying method itself. Nonetheless, we agree that a focused, real-world application would further underscore the utility of our approach. We are currently working on a follow-up study dedicated to Lie group-aware molecular generation in protein binding pockets.
Questions
- The blurriness of the images is due to an interpolation layer we added to the sampling process: the pixels lie on an integer grid, but the diffusion process is in continuous angle space. Unless the angle is a multiple of 90 degrees, the transformed pixels will lie outside the grid, and we need an interpolation step to assign values on the original grid. Performing this procedure for each step of the diffusion process (T=10) results in this smoothing effect. To address the blurriness in visualization, we initialize the rotation angle change and iteratively update it during each diffusion sampling step by adding the score network output (properly scaled by the diffusion scheduler) and Brownian noise. At the final timestep, we apply the accumulated rotation change to the original (perturbed) MNIST image. The trajectory figures will be updated in the final manuscript.
- That's a very good question. Given the technical and methodological nature of the paper, the main purpose of the experiment is to show the validity and effectiveness of the method. A full implementation of the method for semi-flexible docking would go beyond the scope of this work, but it is indeed the topic of a follow-up focused on a very similar use case as discussed here. With the QM9 experiment, we aim to validate our formalism (as outlined in Sections 2 and 3) by demonstrating that it holds even in the absence of an exploitable symmetry structure. We compare two group actions, one corresponding to standard diffusion and one to generalized score matching, and show that both models solve the task equally well, with no significant differences in performance or energy. This supports our claim that, as long as the setup satisfies the conditions in Section 2.2, it can be used to learn any unconstrained, unconditional distribution, regardless of whether it aligns with specific degrees of freedom. So we chose on purpose two groups with no data-related symmetry to demonstrate the universality and correctness of the approach, rather than claiming some particular advantage in this case.
- Regarding the Casimir term, we begin with a small clarification: we did not add such a term explicitly by hand. The term arises naturally in both the backward and forward SDEs when imposing the condition that they are paired, namely, that they generate the same time-marginal probability distributions at all times. In the manuscript, we described the intuitive meaning of this term, namely, that it compensates for the orbit deviation due to the curvature, since this is not obvious from just looking at the formulas. To better demonstrate its effect, we sampled from the trained model with and without the Casimir term in the backward process for the $\mathrm{SO}(2)$ components in the 2d synthetic datasets. We then report the average radial distribution of the sampled points and of the original dataset:
| Dataset | With Casimir | Without Casimir | Original data |
|---|---|---|---|
| GMM (2D) | 8.3 | 9.6 | 8.3 |
| Concentric Circles (2D) | 6.4 | 7.3 | 6.4 |
| Lines (2D) | 2.0 | 2.5 | 2.0 |

As we can see from the table, with the Casimir term the model captures perfectly (at least in the mean) the true radial distribution. Sampling without the Casimir term always leads to a systematic overshooting of the radial component (we recall that the $\mathrm{SO}(2)$ Casimir is purely radial, as depicted in Figure 3 of the manuscript).
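For reference, the radial statistic in the table can be reproduced with a one-line helper (illustrative only; the function name is ours):

```python
import numpy as np

def mean_radius(points):
    # Average radial component of a 2D point cloud; the SO(2) Casimir
    # correction acts precisely along this radial direction.
    return np.linalg.norm(points, axis=1).mean()

# compare: mean_radius(samples_with), mean_radius(samples_without), mean_radius(data)
```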
We hope that our responses have adequately addressed the reviewer’s questions and concerns. We would be grateful if the reviewer could acknowledge the efforts we have made, particularly in generating new benchmarks, by considering to increase the overall score when updating their review. Should any issues remain unresolved, we would be more than happy to provide further clarification.
Thank you for your response! I am satisfied with the clarifications and new benchmarks you proposed. I have one final question for the authors: in the MNIST example, how does the FID score (and also accuracy) change with rotation angle? What are the FID scores before and after the modification you described in your response to the first question?
Dear Reviewer FJCA, thank you again for the time and effort during this rebuttal period, and we are very pleased that you are satisfied with our new benchmarks!
For the second question, we would like to recall that the (MNIST-image) representation we are dealing with describes pixel values in a 2d grid, so the rotation representation is a block-diagonal matrix in which each $2 \times 2$ diagonal block is identical, since it applies an identical rotation to all the pixels. In pixel space, we can interpret the forward SDE as applying a shared Brownian noise term to all pixel values through the flow coordinate, which represents the rotation angle; see Eq. (54) in the Appendix of the submitted manuscript. We evaluate three GSM variants of the diffusion sampling process. In what follows, the prior image denotes the MNIST image drawn from the rotated prior distribution.
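For concreteness, the block-diagonal structure can be built explicitly as a Kronecker product (a minimal sketch of our own, with a hypothetical function name):

```python
import numpy as np

def pixel_rotation_rep(theta, n_pixels):
    # Direct sum of one identical 2x2 rotation block per pixel coordinate:
    # R(theta) + R(theta) + ... + R(theta), as a (2n x 2n) matrix.
    c, s = np.cos(theta), np.sin(theta)
    block = np.array([[c, -s], [s, c]])
    return np.kron(np.eye(n_pixels), block)
```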
GSM-v0 rotates intermediate images using the (infinitesimal) change in angle at each step: the (matrix) representation of the rotation (as mentioned before, obtained as a direct sum of identical $2 \times 2$ rotation blocks applied to the individual pixels) is applied to the current image at each timestep. Applying consecutive group actions leads to cumulative interpolation artifacts (due to interpolation in torchvision.transforms.functional.rotate using BILINEAR interpolation), as each intermediate image gets rotated, yielding FID $333.5 \pm 59.7$ and accuracy $0.92 \pm 0.04$.
GSM-v1 accumulates angle changes in the form of generated representations and applies the total rotation, the product of the single (infinitesimal) rotations, to the prior image only at the final predicted step. While this reduces interpolation artifacts compared to GSM-v0, the intermediate trajectory still involves rotated images during the diffusion sampling process (as explained above for GSM-v0), and it achieves FID $130.8 \pm 22.0$ and accuracy $0.93 \pm 0.04$. This version was reported in our first response in the rebuttal period.
GSM-v2 rotates intermediate images (at each timestep) relative to the prior (perturbed) image, applying the rotation accumulated up to that timestep. Note that each of the single matrices is still generated from the previous state. This approach minimizes interpolation artifacts by always rotating from the perturbed prior image rather than from previously rotated intermediate results, achieving the best performance with FID $85.8 \pm 15.7$ and accuracy $0.96 \pm 0.02$. This result will be used in our final updated manuscript.
Note that in all methods the training is completely identical. All methods involve rotating intermediate images during sampling, but differ in their rotation strategies: incremental rotations from previous states (GSM-v0), accumulated rotation applied once (GSM-v1), or progressive rotation from the prior perturbed image (GSM-v2). The progression shows clear improvement as interpolation artifacts are minimized.
In other words, as our method can be seen as a matrix-valued diffusion, where we obtain a stochastic path on the data space by applying the diffusion representation matrix to points in it (see also the response to reviewer PLvS), the different variants differ in how we perform this operation. In GSM-v0 we apply it at every step. In GSM-v2 we apply only the combined (diffused to timestep t) matrix to the point in the data space. GSM-v1 is equivalent to GSM-v0, but for visualization purposes we apply the strategy of GSM-v2 to the final image.
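The three variants can be summarized in the following sketch, written under our own hypothetical interfaces `score_net` and `scheduler` (the actual implementation details are in the manuscript):

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def sample_rotation(score_net, x_prior, scheduler, T=10, mode="v2"):
    # score_net(x, t): predicted (infinitesimal) change in angle for step t
    # scheduler[t]:    noise scale of the Brownian term at step t
    x, theta = x_prior, 0.0
    for t in range(T):
        d_theta = float(score_net(x, t)) + float(scheduler[t]) * torch.randn(()).item()
        theta += d_theta
        if mode in ("v0", "v1"):
            # rotate the current intermediate: interpolation artifacts accumulate
            x = TF.rotate(x, d_theta, interpolation=InterpolationMode.BILINEAR)
        else:  # v2: always re-rotate the prior by the accumulated angle
            x = TF.rotate(x_prior, theta, interpolation=InterpolationMode.BILINEAR)
    if mode == "v1":
        # discard the artifact-laden trajectory; one final rotation of the prior
        x = TF.rotate(x_prior, theta, interpolation=InterpolationMode.BILINEAR)
    return x
```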
Updated table for Accuracy and FID scores comparing GSM against BBDM
| Model | Average Accuracy (↑) | Average FID (↓) |
|---|---|---|
| GSM-v0 | 0.92 ± 0.04 | 333.5 ± 59.7 |
| GSM-v1 | 0.93 ± 0.04 | 130.8 ± 22.0 |
| GSM-v2 | 0.96 ± 0.02 | 85.77 ± 15.7 |
| BBDM | 0.80 ± 0.10 | 133.4 ± 19.0 |
Regarding the first question, we save intermediate results along the diffusion trajectory at times $t/T \in \{0.25, 0.5, 0.75, 1.0\}$ for BBDM and GSM-v2 and evaluate the samples on accuracy and FID. As expected for GSM, the closer the diffusion sampling approaches the final state, the more the accuracy and FID scores improve.
For the BBDM model, we observe that throughout the diffusion sampling trajectory the mean accuracy does not improve over time, since the intermediates do not resemble MNIST digits, such that the classifier may only predict the mean class, while the FID score improves. (A similar example is in the new CrossDocked2020 experiments, where intermediate conformers are unphysical due to the linear interpolation of two rotated point clouds.) Note that our model is superior to BBDM on both metrics at all time points.
Model performance on generated samples from specific timepoints
| Model | Image at $t/T$ | Average Accuracy ↑ | Average FID ↓ |
|---|---|---|---|
| GSM-v2 | 0.25 | 0.8404 ± 0.07 | 164.1 ± 36.0 |
| GSM-v2 | 0.50 | 0.9384 ± 0.03 | 98.3 ± 20.9 |
| GSM-v2 | 0.75 | 0.9517 ± 0.02 | 88.7 ± 15.9 |
| GSM-v2 | 1.0 | 0.9554 ± 0.02 | 85.8 ± 15.7 |
| BBDM | 0.25 | 0.8230 ± 0.09 | 199.7 ± 43.6 |
| BBDM | 0.50 | 0.8299 ± 0.08 | 131.4 ± 19.4 |
| BBDM | 0.75 | 0.8212 ± 0.09 | 126.1 ± 17.1 |
| BBDM | 1.0 | 0.8048 ± 0.10 | 133.4 ± 19.0 |
As a final remark, we would like to point out that the MNIST experiment was not designed with the goal of optimizing the FID score, but to check whether our proposed method does what we intend it to do: gradually and correctly aligning a randomly rotated image to its true state, while respecting the symmetry of the problem. We are however grateful to the reviewer for giving us a chance to improve the sampling strategy, and with it the FID and classification accuracy altogether!
Instead of allowing pixel values to change entirely throughout a sampling trajectory as in BBDM, our method is robust: even after half of the sampling process, the model is able to correctly align MNIST digits such that an external classifier can predict the corresponding digit with roughly 94% accuracy and moderate FID, which steadily improves up until the final time, reaching an overall average accuracy of $0.96 \pm 0.02$ and FID of $85.8 \pm 15.7$, far superior to the BBDM model.
We hope our responses have comprehensively addressed your two questions. If there are any open remarks or questions on your side, we are more than happy to engage in further discussion during the full rebuttal period. Beyond this, we also hope that we have sufficiently addressed your concerns, and if so, we would truly appreciate it if you would consider adjusting your evaluation accordingly. Thank you again for taking the time to review our work and provide constructive feedback.
Dear reviewer FJCA,
We thank you again for taking the time to review our work and engage in this constructive discussion. Since we haven't heard back following our last response, we hope this indicates that our replies have adequately addressed your concerns and questions. With time remaining in the rebuttal period, we're happy to address any further questions or clarifications.
We are encouraged by the positive feedback you've provided throughout the rebuttal process, and we believe the additional experiments have significantly strengthened our paper's contributions. We would be grateful if the improvements demonstrated through our responses and the enhanced experimental validation could be considered in your final evaluation.
Thank you once more for your thoughtful review and valuable feedback.
Yes, you have adequately addressed my comments. I believe all concerns are now resolved. Thank you again for your responses. I have also read the discussion with other reviewers, and it seems that everyone agrees that this paper makes a strong contribution. I will increase my score to 5.
The authors present a framework for constructing diffusion models on data spaces equipped with Lie group actions. By leveraging the infinitesimal action of Lie algebras, the authors define forward diffusion processes that respect the symmetries of the data space. These dynamics yield fundamental vector fields and associated differential operators. A generalized score matching framework is proposed minimizing the generalized Fisher divergence, which includes the standard score matching as a special case. Experiments were conducted on rotated MNIST, molecular conformer generation (QM9), and molecular docking (CrossDocked2020), demonstrating the framework's applicability to structured data with symmetries.
Strengths and Weaknesses
Strengths:
- The proposed framework is mathematically grounded, extending denoising score matching to data with Lie group symmetry using infinitesimal generators.
- A set of sufficient conditions for suitable Lie groups for score matching and Langevin dynamics are given.
- The paper introduces a class of solvable SDEs that govern the Lie group diffusion via Euclidean coordinates.
- Choosing a suitable group G, the learning process is significantly simplified, effectively reducing the dimensionality of the problem.
- While related to previous works on diffusion models on Riemannian manifolds and latent spaces, this work articulates how the Lie algebra induces vector fields that guide the forward SDE. The link to fundamental vector fields and their role as a Jacobian-like structure is well presented.
- Experiments were run on three different datasets.
Weaknesses:
-
The paper is densely packed and hard to follow, especially for someone not familiar with the topic. While figures are provided to help understand the concepts used in the proposed approach, they are not well explained in the main text and their captions are not very explanatory either. Having a table in the appendix with a summary of the notation would be helpful.
-
It would be helpful to clarify when Lie algebra-induced diffusion offers a clear advantage over standard diffusion models. A direct comparison with conventional Euclidean diffusion -- both with and without built-in equivariance-- would strengthen the motivation and help identify specific scenarios where the proposed approach is most beneficial.
Questions
Please refer to weaknesses.
Limitations
Yes.
Final Justification
We would like to thank the authors for the detailed responses clarifying many points raised by all the reviewers. We believe that the table will help readers to follow the paper. We also encourage the authors to try to improve the figures presentation and their captions. Thanks also for the extra experiments, which significantly strengthen the paper.
Formatting Issues
None.
We thank the reviewer DmbC for their time reading our manuscript and for their insightful comments, which we will address below.
- We understand that Lie group and representation theory is very technical and not common knowledge among the machine learning community. We appreciate the reviewer's suggestion, and we added a table in the appendix with the notation of the main concepts used in this work, together with their definitions and a more intuitive explanation of the ideas behind them. The full table is presented here:
| Symbol | Name | Definition | Intuition |
|---|---|---|---|
| $G$ | Lie group | A group that is also a smooth manifold | A continuous symmetry group, e.g., rotations ($\mathrm{SO}(n)$), translations ($\mathbb{R}^n$), scalings. Encodes the structure of transformations acting on the data. |
| $e$ | Identity element of $G$ | $eg = ge = g$ for all $g \in G$ | The identity transformation leaving everything unchanged. |
| $\mathfrak{g}$ | Lie algebra of $G$ | $\mathfrak{g} = T_e G$ | Tangent space at the identity; represents infinitesimal group transformations. |
| $X$ | Data manifold | | The space where the data lives, often $\mathbb{R}^n$, but can be more general or even discrete (e.g., graph for molecules, grid for images, etc.). |
| $\rho$ | Group action | $\rho: G \times X \to X$ | Specifies how each abstract group element transforms data points in $X$ via matrix multiplication. |
| $G \cdot x$ | Orbit of $x$ under $G$ | $\{\rho(g)\,x : g \in G\}$ | The set of all points reachable from $x$ via group actions. Captures the "symmetry class" of $x$. |
| $G_x$ | Stabilizer subgroup at $x$ | $\{g \in G : \rho(g)\,x = x\}$ | Subgroup of $G$ that leaves $x$ unchanged. Describes residual symmetries at that point. |
| $\mathrm{d}\rho$ | Infinitesimal action | $\mathfrak{g} \to \mathfrak{X}(X)$ | Maps infinitesimal transformations to vector fields on $X$; captures how a tiny "step" in $G$ moves a point in $X$. |
| $\exp$ | Exponential map | $\exp: \mathfrak{g} \to G$ | Point on the geodesic path on $G$ determined by the direction and length given by the vector $\xi$. |
| $\gamma_\xi$ | Flow on $X$ induced by $\xi$ | | Path on $X$ corresponding to a geodesic path on $G$ determined by $\xi$. |
| $X_\xi$ | Fundamental vector field from $\xi$ | | A vector field on $X$ generated by a direction in the Lie algebra; describes how $x$ moves under an infinitesimal group transformation. |
- Additional experiments/benchmarks:
- We performed additional experiments on CrossDocked2020 using a Brownian Bridge diffusion model (BBDM), similar to the MNIST experiment in Section 5. We sample a rotated ligand endpoint using the Riemannian Score-Based Generative Models (RSGM) scheduler, with the original ligand as the starting point, and sample intermediates by bridging between the two endpoints over the diffusion steps. As the Euclidean baseline, we train an equivariant Fisher score network with full atomic degrees of freedom to predict the ground-truth pose. Similar to MNIST, intermediate BBDM samples display unphysical 3D poses due to the linear interpolation and noising. We use the same network architecture as in the GSM and RSGM experiments to learn the correct rotation. Unlike the existing experiments (our method and RSGM), the Euclidean BBDM in this setting attempts to learn only the global rotation, neglecting translation. Since the problem is implicitly 3-dimensional but the equivariant score network predicts all ligand atom coordinates, final samples with implausible coordinate trajectories tend to have higher energies due to unphysical poses, including bond stretching, non-planar aromatic rings, and deformed rings. In terms of mean/std RMSD on the CrossDocked2020 test set, our method (Lie algebra) is comparable with BBDM. However, since BBDM models all atomic coordinates, the overall dynamics does not follow a global SO(3) rotation, achieving a nonzero mean absolute error between the pairwise distance matrices, while RSGM and our method achieve zero by design. This indicates that Lie algebra induced diffusion offers a clear advantage over standard diffusion models in this particular bridging problem. We will include the new experiment in the final version of the manuscript.
- Synthetic datasets (please refer to the response to reviewer d94q for the full table of results): We ran a quantitative evaluation using the Wasserstein-2 (W2) distance on synthetic 2D and 3D datasets, comparing standard (Fisher) score matching (translations) with our proposed method based on rotation-aware Lie groups. One challenge in these experiments is that the similarity between the prior and target distributions can affect the results. To reduce this bias, we report a normalized W2 metric, which divides the W2 distance between the samples and the target by the W2 distance between the prior and the target. We find that GSM performs as well or better on most datasets, especially in cases where symmetry gives a useful inductive bias. In the MoG datasets, standard score matching (translations) performs better than the rotation-based Lie group model, which makes sense since these data don’t have rotational symmetry, and translation symmetry helps the model find the centers of the Gaussian modes. The difference becomes even clearer in 3D, where GSM shows stronger performance. We believe that in higher dimensions it is harder for a model to memorize the distribution, and the benefits of symmetry-aware biases become more important.
- Quantitative MNIST evaluation (the table with numerical results can be found in the response to reviewer d94q): We evaluated the FID as well as the classification accuracy of the generated images from both our model and the BBDM one. While our model is only slightly better on the FID, it is far superior in the classification accuracy (93% vs 80%); the classifier is a simple network trained on the original (unrotated) MNIST dataset, whose embeddings we also extract for the FID computation. This happens because BBDM sometimes generates bridges between different classes, as it cannot enforce a strict "rigid" rotation. We showed one instance of this occurrence in Figure 6 of the manuscript.
We hope our responses have effectively addressed the reviewer’s questions and concerns. We would sincerely appreciate it if the reviewer could take into account the additional efforts made, particularly in generating new benchmarks, by updating the overall score in their review. If any issues remain unclear or need further discussion, we would be happy to provide additional clarification.
Dear Reviewer,
Thank you for submitting the mandatory acknowledgement. Please note that, in addition to this, a brief response to the authors' rebuttal is required. While you do not need to indicate whether your score has changed, you should clarify whether any concerns remain unresolved or if you have further questions for the authors.
Best regards, Your AC
We would like to thank the authors for the detailed responses clarifying many points raised by all the reviewers. We believe that the table will help readers to follow the paper. We also encourage the authors to try to improve the figures presentation and their captions. Thanks also for the extra experiments, which significantly strengthen the paper.
Dear Reviewer DmbC,
we sincerely appreciate your positive feedback on our responses, and we are very happy to hear that our revisions and experiments have strengthened the paper. In the final version of the paper, we will make sure that the readability of the figures and captions is improved, also by adding more explanation of the related points in the main text.
Since you noted that the paper has improved due to the extra explanations and experiments during the rebuttal phase, we hope you might consider updating your evaluation to reflect these improvements.
Thank you once again for your time and thoughtful comments that helped us improve the paper!
The paper introduces a new framework for diffusion-based generative modeling on Lie group representation spaces rather than on the curved Lie group itself. The contribution is theoretically sound, with careful derivations using generalized score matching and paired SDEs, and it provides a general-purpose formalism that includes standard score matching as a special case. Reviewers agreed that the work is original, mathematically rigorous, and of potential broad impact across domains where symmetry plays a central role.
During the rebuttal, the authors strengthened the paper with additional experiments, including quantitative benchmarks on synthetic datasets, docking poses, and rotated MNIST with FID and classification metrics. These additions addressed concerns about missing baselines and clarified the practical advantages of the method, particularly in preserving symmetry structures and avoiding unphysical samples. Reviewers noted that the explanations of notation and conditions, along with a new summary table, substantially improved clarity.
Remaining concerns centered mostly on accessibility and exposition, which the authors have committed to improving in the final version through clearer figures and expanded explanations. Overall, the consensus among reviewers is that the paper is technically solid, original, and timely, and it provides a framework likely to influence future work on generative modeling with symmetries.
Given its strong theoretical contributions, thorough responses, and improved experimental validation, I recommend acceptance. The work stands out as both foundational and broadly relevant, and I suggest it be nominated for a spotlight presentation.