Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
Flow Matching between distributions over distributions on the Wasserstein space for generation of Gaussians & general distributions.
Abstract
Reviews and Discussion
The paper shows how Riemannian Flow Matching (RFM) (Chen & Lipman, 2023) can be applied to the Wasserstein space, the space of distributions endowed with the Wasserstein metric. For RFM to work, one needs a parametric vector field that, when integrated over the [0,1] time interval, transports a source (Gaussian) distribution to a target distribution, together with an appropriate metric and a family of conditional flows. The paper shows that, for the Wasserstein metric space, we can use McCann interpolation, whose time derivative naturally serves as the vector field to match, while the Wasserstein metric acts as the Riemannian metric. The authors coin the approach Wasserstein Flow Matching (WFM). The paper further discusses a special subspace of the Wasserstein space, the subspace of non-degenerate Gaussian distributions, called the Bures-Wasserstein space, where the authors show that the McCann interpolation, its time derivative, and all other key quantities can be computed in closed form by exploiting the analytical form of Gaussian distributions. The Bures-Wasserstein space has an application in single-cell genomics, where cellular microenvironments or fine-grained clusters can be captured with just means and covariances. For the case of general distributions, since optimal transport (OT) maps are intractable, the authors propose to rely on entropic OT (Cuturi, 2013), which approximates OT and can be efficiently computed on GPUs via Sinkhorn's algorithm (Sinkhorn, 1964). Experiments were conducted over the family of Gaussians and the family of general distributions using synthetic and real data, where WFM was shown to be competitive against existing baseline approaches.
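To make the closed-form Bures-Wasserstein case concrete, here is a minimal sketch (not the authors' implementation; the function name is illustrative) of the McCann interpolant between two Gaussians: the means interpolate linearly, and the covariances follow the interpolated optimal transport map.

```python
import numpy as np
from scipy.linalg import sqrtm

def bw_mccann(mu0, S0, mu1, S1, t):
    """McCann interpolant between N(mu0, S0) and N(mu1, S1) at time t in [0, 1]."""
    d = len(mu0)
    S0_half = np.real(sqrtm(S0))
    S0_inv_half = np.linalg.inv(S0_half)
    # Linear part of the optimal transport map between the two Gaussians.
    T = S0_inv_half @ np.real(sqrtm(S0_half @ S1 @ S0_half)) @ S0_inv_half
    A = (1 - t) * np.eye(d) + t * T   # interpolated map: identity at t=0, T at t=1
    mu_t = (1 - t) * mu0 + t * mu1    # means interpolate linearly
    S_t = A @ S0 @ A                  # covariance pushed forward by the symmetric map A
    return mu_t, S_t
```

At t=0 this recovers (mu0, S0) and at t=1 it recovers (mu1, S1), so the curve is a geodesic in the Bures-Wasserstein geometry.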
Update after rebuttal
My review is already positive for the paper. The additional clarifications from the authors are helpful. I've kept my score.
Questions for Authors
N/A
Claims and Evidence
The main claim, which is the instantiation of RFM on the Wasserstein metric space, is supported by clear and convincing evidence.
There is one small claim (end of Section 1) that "...current approaches cannot scale to high-dimensional distributions and fail when distributions are realized as point-clouds with variable sizes. Conversely WFM succeeds in these challenging settings, enabling generative modeling in new domains like synthesizing tissue microenvironments...". I acknowledge that the authors have a very interesting application. However, I am not totally convinced by the evidence. There is nothing in the Wasserstein geometry or RFM that deals with point-cloud-related computational costs. What I have seen in the paper is that every point cloud under consideration is sampled with 1,000 points. This is too small to represent a realistic point cloud in industry. In addition, WFM in this case has to rely on the approximate approach of entropic OT, for which the relationship between point-cloud size and computational cost is not clear in the paper.
Methods and Evaluation Criteria
The methods and evaluation criteria are reasonable to me. In the experiments over the family of Gaussians, the results make sense.
My concern is with the experiments over the general family of point clouds where, if I understand correctly, all point-cloud sample sizes are 1,000 points, which in my view is too low in practice. There is no study of the impact of point-cloud size on overall WFM performance in terms of both accuracy and speed. In my view, this seems to be a substantial weak point because WFM in this case has to rely on entropic OT due to the intractability of the true OT.
Theoretical Claims
I did check the proposed proofs. They appear valid to me and they are not that hard to derive.
Experimental Design and Analysis
N/A
Supplementary Material
I checked appendices A, B, E, F, G.
I do not understand the need for Lemma B.1. The Wasserstein distance function is a metric (Villani, 2009). That means it satisfies the 4 metric axioms: identity of indiscernibles, positivity, symmetry and triangle inequality. A premetric only requires 2 axioms: identity of indiscernibles and positivity. By definition, every metric is a premetric. RFM only requires a premetric. Thus, why do we need Lemma B.1 here?
Relation to Broader Literature
From a theoretical point of view, the contribution here is interesting to branches dealing with synthesizing 2D and 3D point clouds. It might be good to relate to Dirichlet distributions as well for classification applications. From a practical point of view, applications that can stem from this work like modelling tissue biology and cellular microenvironment distributions can benefit from the work.
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
N/A
Other Comments or Suggestions
I have some concerns above but this is a good paper overall because it has interesting applications and its theoretical contribution is sound, although not very technically difficult.
Response to Reviewer 4 (euNm)
We thank the reviewer for their insightful feedback and positive assessment of our work. We address each point below.
"...What I have seen in the paper is that every point cloud in consideration are sampled with 1,000 points. This is too small to represent a realistic point cloud in industry."
In our revised Table 3 (see response to reviewer 3 - ToSj), we now apply WFM to point-clouds with 2048 particles, which is the standard size used by other generative models for shapes in the literature (see PVD and PSF). In this regime, WFM performs on par with or better than other methods, while being significantly more computationally efficient. We note that while training is performed on 2048-sized realizations, inference can be done at any scale. For biological applications like cellular microenvironments, niche sizes are typically on the order of tens of cells, so our methodology is more than sufficient for these important use cases.
"...In addition, WFM in this case has to rely on the approximate approach of entropic OT, in which the relationship between point cloud size and computational cost is not clear in the paper."
As we discuss in Appendix A, the complexity of entropic OT between point-clouds with n particles is O(n²/ε), where ε is the entropic regularisation strength: each Sinkhorn iteration costs O(n²), and the number of iterations grows as ε shrinks. We also discuss the trade-offs between true and entropic OT and how the regularisation parameter provides a straightforward way to balance computational efficiency and OT accuracy.
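As an illustrative sketch (not the paper's code; names are ours), a plain Sinkhorn iteration for entropic OT between two uniform point clouds, whose per-iteration cost is quadratic in the number of particles:

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.5, n_iters=500):
    """Entropic OT plan between uniform point clouds x (n, d) and y (m, d).
    Each iteration is O(n*m); smaller eps needs more iterations (and risks underflow)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost matrix
    K = np.exp(-C / eps)                                # Gibbs kernel
    u = np.full(len(x), 1.0 / len(x))                   # uniform source marginal
    v = np.full(len(y), 1.0 / len(y))                   # uniform target marginal
    a, b = u.copy(), v.copy()
    for _ in range(n_iters):                            # alternating marginal projections
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :]                  # transport plan with given marginals
```

In practice one would use a log-domain or GPU implementation (e.g. POT or OTT-JAX), but the quadratic cost per iteration is the same.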
"I do not understand the need for Lemma B.1. The Wasserstein distance function is a metric (Villani, 2009). That means it satisfies the 4 metric axioms: identity of indiscernibles, positivity, symmetry and triangle inequality. A premetric only requires 2 axioms: identity of indiscernibles and positivity. By definition, every metric is a premetric. RFM only requires a premetric."
The reviewer correctly points out that the classical definition of a premetric requires only positivity and identity of indiscernibles, both of which are satisfied by any metric. However, the RFM framework (Chen & Lipman, 2023) uses a slightly different definition that includes a third necessary condition:
- Positivity: d(x, y) ≥ 0 for all x, y
- Identity of indiscernibles: d(x, y) = 0 iff x = y
- Non-degeneracy: ∇d(x, y) ≠ 0 iff x ≠ y
Indeed, if a (squared-)metric is differentiable, then it is easy to sketch out that condition (3) holds. We found it pedagogically useful to explicitly write out this third condition, which introduces the logarithmic and exponential maps.
The paper addresses the problem of learning generative models of high-dimensional distributions, i.e., where each sample from the model is itself a distribution. The authors propose Wasserstein flow matching (WFM), which builds on recent advances in the Riemannian flow matching (RFM) framework (Chen & Lipman, 2023) and extends it to the Wasserstein geometry (Ambrosio et al., 2008). The authors show that WFM can generate high-dimensional distributions represented either analytically as Gaussians or empirically as point clouds, and derive valid flow-matching objectives for these cases. The authors then apply the method to single-cell and spatial transcriptomics datasets. Additionally, the method is applied to 3D shape generation on point-cloud data (ShapeNet & ModelNet), on which it performs similarly to, although not better than, other methods.
Questions for Authors
- Can the authors discuss the "lower" performance results in Table 3?
Claims and Evidence
The authors claim to introduce WFM, extending the Flow-matching framework to the space of probability distributions. To my knowledge, this is a novel contribution. The authors provide a sound theoretical foundation for the method and demonstrate its effectiveness on various tasks. Empirical results suggest that it outperforms other approaches (Table 2) in its intended application and is on par with current state-of-the-art models for 3d shape generation (Table 3) while allowing flexible choice of number of particles (Table 4).
Methods and Evaluation Criteria
The proposed method is well-motivated, and the evaluation is thorough.
Theoretical Claims
The authors prove the validity of the WFM objective in Appendix B. The proof is sound and largely builds upon previous work on FM (Lipman et al., 2022, Chen & Lipman, 2023).
Experimental Design and Analysis
Experimental design and analysis are well thought out and the results are presented clearly and concisely. The experiments are well motivated and the results are convincing.
Supplementary Material
NA
Relation to Broader Literature
Related work is discussed in Section 2.2 and is, to my knowledge, comprehensive.
Essential References Not Discussed
NA
Other Strengths and Weaknesses
Strengths:
- Very well written and structured.
- Extensive experiments on toy datasets, standard benchmarks, and fitting scientific applications in genomics.
Weaknesses:
- While Tab 3 reveals that WFM performs similarly to other methods on ShapeNet and ModelNet, it never outperforms them.
Other Comments or Suggestions
- Page 3, line 110 typo "moodal data".
Response to Reviewer 3 (ToSj)
We are grateful to the reviewer for their positive assessment of our work.
"While Tab 3 reveals that WFM performs similarly to other methods on ShapeNet and ModelNet, it never outperforms them."
All reviewers expressed concerns about our performance in 3D shape generation experiments (Table 3). We've addressed this by making the following improvements:
- Fixed a minor inference bug: We corrected a time-stepping loop that previously ran from t=0 to t=1+Δt, overshooting the unit interval by one step, instead of the correct t=0 to t=1.
- Standardized evaluation metrics: We now use the same approximate EMD implementation as other benchmark methods. Our original results used true EMD calculations, creating an unintended evaluation discrepancy. Using consistent benchmarking code provides a fairer comparison.
- Matched point cloud size: We've increased our point cloud size from 1000 to 2048 points to align with other methods' evaluations. This directly addresses reviewer concerns about scalability while enabling more direct comparisons.
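The time-stepping fix above amounts to an off-by-one in the Euler loop; a hypothetical sketch (illustrative, not the actual codebase) of the corrected version:

```python
import numpy as np

def integrate_flow(x0, vector_field, n_steps=100):
    """Euler integration of dx/dt = v(x, t) from t=0 to t=1.
    The loop index stops at t = 1 - dt, so the final state is at t = 1,
    not t = 1 + dt (the bug was taking one extra step past the interval)."""
    x = x0
    dt = 1.0 / n_steps
    for k in range(n_steps):        # t takes values 0, dt, ..., 1 - dt
        t = k * dt
        x = x + dt * vector_field(x, t)
    return x                        # state at t = 1
```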
These changes have significantly improved our results:
| Method | Airplane CD ↓ | Airplane EMD ↓ | Chair CD ↓ | Chair EMD ↓ | Car CD ↓ | Car EMD ↓ |
|---|---|---|---|---|---|---|
| PointFlow | 75.68 | 70.74 | 62.84 | 60.57 | 58.10 | 56.25 |
| SoftFlow | 76.05 | 65.80 | 59.21 | 60.05 | 64.77 | 60.09 |
| DPF-Net | 75.18 | 65.55 | 62.00 | 58.53 | 62.35 | 54.48 |
| Shape-GF | 80.00 | 76.17 | 68.96 | 65.48 | 63.20 | 56.53 |
| PVD | 73.82 | 64.81 | 56.26 | 53.32 | 54.55 | 53.83 |
| PSF | 71.11 | 61.09 | 58.92 | 54.45 | 57.19 | 56.07 |
| WFM (ours) | 69.88 | 64.44 | 57.62 | 57.93 | 53.41 | 58.10 |
WFM now achieves state-of-the-art CD performance on 2/3 datasets while maintaining competitive performance on all metrics. Importantly, WFM achieves these results with substantially lower computational requirements (~120 GPU hours on an 80GB GPU versus ~400 GPU hours for PSF and PVD).
"Page 3, line 110 typo 'moodal data'."
Thank you for catching this typo. It has been fixed to 'multi-modal data'.
I thank the authors for their response and for addressing my concern.
I will maintain my already positive evaluation of the work.
This paper proposes Wasserstein Flow Matching, a method for building a generative model where the datapoints themselves are distributions. The basic idea is to treat the Wasserstein space as a manifold and use Riemannian flow matching techniques. The model is studied in two settings: (1) the Bures-Wasserstein space, and (2) empirical measures / point clouds. The method is evaluated on synthetic data, 3D point clouds, and some applications in single-cell genomics and spatial transcriptomics.
update after rebuttal
After the author's rebuttal, I will maintain my positive score of weak accept. I think the core idea of the paper is interesting and novel, but stronger theoretical and/or empirical results would be necessary to further strengthen the paper and raise my score.
Questions for Authors
See above comments on claims
Claims and Evidence
There are two claims that I feel are not entirely justified.
- The authors claim that existing approaches for 3D point cloud generation cannot handle point clouds of variable sizes. However, a method like PSF, for example (https://arxiv.org/abs/2212.01747), should in principle be able to do this, as long as a suitable architecture is selected. The ability to handle variable-size point clouds seems to have less to do with the proposed methodology than with the specific architectural choice (in this case, just a transformer).
- The use of Equation 11 / Proposition 3.1 does not seem to be properly justified, at least in the general setting. In particular, Riemannian flow matching makes an important assumption that the manifold is finite-dimensional. In this setting one is thus generally justified in using densities, e.g., those appearing in Equation 19 of this submission. However, in the setting of this paper, the Wasserstein manifold is in general an infinite-dimensional space, where the derivations in Appendix B no longer work without significant extra steps.
Methods and Evaluation Criteria
The methods and evaluation criteria make sense.
Theoretical Claims
See previous comment
Experimental Design and Analysis
The experimental analyses seem appropriate to me.
Supplementary Material
I reviewed the appendix but not the code.
Relation to Broader Literature
- In relationship to the existing literature, this work can be viewed as an extension of Riemannian Flow Matching to the case of manifolds of distributions. This involves taking known results about the geometry of Wasserstein space and combining these tools with RFM to obtain their model.
- Prior work on flow matching for point cloud generation (that the authors cite) develop similar approaches, but do not provide the nice formalism the authors here do.
- The relationship between the submission and Meta Flow Matching could be described in more detail.
Essential References Not Discussed
n/a
Other Strengths and Weaknesses
Strengths
- The general framework of FM on Wasserstein space is very clear and compelling, and lends itself nicely to the methodology developed.
- The synthetic experiments demonstrate the importance of appropriately accounting for the geometry of the space.
Weaknesses
- Some of the claims are not entirely justified (see above).
- The proposed method has a fairly high computational cost, as each training step requires solving an OT problem.
- While this is fine if it leads to better models, I am not convinced that it does. On the 3D point cloud experiments for instance, the proposed method seems to do a bit worse than PSF, which is essentially the same technique as the one proposed in this work, except without using an inner optimal transport solver.
Other Comments or Suggestions
- Equation 11 is critical to the proposed method; dedicating more space to the derivation and justification of this in the main paper might be useful.
- There appear to be some typos in Section 2.3.1 -- namely the use of a, b, A, B to (I think?) represent means/covariances.
Response to Reviewer 2 (C2Yo)
Thank you for your review and insightful comments. We address each of your concerns below:
"The authors claim that existing approaches for 3D point cloud generation cannot handle point clouds of variable sizes. However, a method like PSF for example should in principle be able to do this, as long as a suitable architecture is selected."
PSF fundamentally relies on a 'Euclidean' distance between point clouds, which only exists for fixed-size point clouds. Such an interpolation between point-clouds of different sizes is not possible. While one might hypothetically modify PSF to handle variable sizes, such a modification would likely require implementing some form of optimal transport to align points between differently realised distributions, which is precisely what WFM does by design.
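To illustrate the kind of alignment being discussed (a sketch in our own notation, not the paper's implementation), an entropic OT plan yields a barycentric map that lets one displacement-interpolate between clouds of different sizes:

```python
import numpy as np

def entropic_map_interp(x, y, t, eps=1.0, n_iters=300):
    """Interpolate a source cloud x (n, d) toward a target cloud y (m, d) at time t,
    with n != m allowed, via the entropic (barycentric) Brenier map.
    Note: very small eps can underflow the kernel; a log-domain solver avoids this."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
    K = np.exp(-C / eps)
    u = np.full(len(x), 1.0 / len(x))
    v = np.full(len(y), 1.0 / len(y))
    a, b = u.copy(), v.copy()
    for _ in range(n_iters):                            # Sinkhorn iterations
        a = u / (K @ b)
        b = v / (K.T @ a)
    P = a[:, None] * K * b[None, :]                     # entropic transport plan
    T_x = (P @ y) / P.sum(1, keepdims=True)             # barycentric projection of each x_i
    return (1 - t) * x + t * T_x                        # McCann-style interpolation
```

The output always has the source cloud's size, so no downsampling of either cloud is needed.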
"The use of Equation 11 / Proposition 3.1 do not seem to be properly justified, at least in the general setting. In particular, Riemannian flow matching makes an important assumption that the manifold is finite dimensional."
This is an important assumption, which we do indeed make. In the papers you mentioned (such as Functional Flow Matching), the authors simply assume that a notion of continuity equation holds at the level of the flow defined over functionals. We readily adopt this approach and assume the continuity equation holds in the Wasserstein space as well. Another possible interpretation can be inferred from Appendix B --- we are simply averaging the original flow matching loss (not the conditional flow matching loss) over pairs of measures.
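For reference, the continuity equation in question, in its classical form for a curve of measures driven by a velocity field (the lifting of this identity to distributions over the Wasserstein space is assumed rather than derived, as discussed above):

```latex
\partial_t \mu_t + \nabla \cdot \left( \mu_t \, v_t \right) = 0
```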
"The proposed method has a fairly high computational cost, as each training step requires solving an OT problem"
The reviewer is correct that there is a computational complexity incurred due to the OT solver. However, as Table 3 shows (see response to reviewer 3 - ToSj), WFM achieves competitive performance while requiring significantly less compute than competing methods such as PVD and PSF.
"The relationship between the submission and Meta Flow Matching could be described in more detail."
We expand upon the comparison here. The main difference is that Meta Flow Matching (MFM) requires paired couplings of distributions on the Wasserstein space while WFM operates with unpaired couplings. WFM is a general generative framework for creating new samples from a learned distribution of distributions, while MFM specifically learns transformations between paired distributions.
"Equation 11 is critical to the proposed method; dedicating more space to the derivation and justification of this in the main paper might be useful."
We provide a complete derivation of Eq 11 in Appendix B. We now dedicate more background on its motivation in the main text to make it clearer to the reader.
"There appear to be some typos in Section 2.3.1...
Fixed, we thank the reviewer for spotting this typo.
Thanks for these detailed responses.
variable size point clouds...
Yes, I agree with you here in the sense that a given source point cloud and target point cloud must have the same number of points for PSF -- but this can vary across different clouds. I had originally interpreted your comments (e.g., Section 4.2.1) in the sense of "all clouds have the same, fixed number of points".
While your method can allow for variable sizes in the source/target, it is still somewhat unclear to me where this is beneficial -- if you are using a noise distribution as your source, you can freely choose the number of points to be equal to that of the target point cloud. Otherwise, simply downsampling one of the clouds might be fairly effective.
finite dimensional...
I think with this assumption everything should work (at least, for the Bures-Wasserstein and point cloud examples, assuming there is an upper bound on the number of points). However, to avoid over-claiming, I think it is important to spell out that the derivations are only informal in the general setting which the paper considers.
Thank you for your thoughtful follow-up comments.
On variable-sized point clouds: You are right that PSF might be able to handle point clouds of different sizes across examples through downsampling/matching techniques. We'll soften our claims about this aspect of novelty. Still, WFM's ability to handle different-sized source/target pairs can offer benefits where maintaining original densities matters or where downsampling would lose structural information.
Regarding the finite-dimensional assumption: We did not intend to over-claim. Our revision will explicitly mention that our derivations and thus results are informal.
This paper proposes Wasserstein flow matching, a generative model which can be treated as a variant of Riemannian Flow Matching on the Wasserstein manifold. This model allows for working with families of distributions and is tested in a variety of experiments.
After the rebuttal.
I kept my initial positive score and explained the reasons in my answer to the authors.
Questions for Authors
- Is it possible to conduct experiments with point clouds in higher dimensions and assess the method's performance w.r.t. the other baselines? I understand that many baselines cannot operate in high dimensions, but maybe such a comparison is possible for moderate dimensions (>2D, 3D)?
Claims and Evidence
Overall, the paper is well-written and most of the claims are well-supported.
Methods and Evaluation Criteria
Overall, the proposed method and evaluation criteria make sense. However, the paper lacks an experiment showing that the proposed approach beats its competitors in a non-Gaussian and at least moderately high-dimensional setting.
Theoretical Claims
I skimmed through the provided proofs. The main issue here is the gap between the established theoretical results, which consider the classic OT problem, and the practical implementation of the approach, which utilizes solutions of the entropy-regularized OT problem (in the experiments with general distributions). While the authors show that the error of estimating OT maps with entropic Brenier maps converges to zero as the number of samples increases (Appendix A.3), this issue should be directly stated in the main text, possibly in a limitations subsection.
Experimental Design and Analysis
Experiments are valid but have some issues:
- Practically, the method demonstrates strong performance mainly for distributions which can be approximated by Gaussian families. In the point-cloud experiments, the method shows fair performance in low dimensions (3D), while in high dimensions there is a lack of baseline comparisons, which makes it difficult to assess its effectiveness. The experiments related to biology also do not clarify this point for me, since I am not an expert in this field.
Supplementary Material
I skimmed through the appendix session which is well-structured.
Relation to Broader Literature
The paper applies the Riemannian flow matching approach to the Wasserstein space. The theoretical results are strongly connected with the original paper (Chen, R. T. and Lipman, Y, 2023).
Chen, R. T. and Lipman, Y. Riemannian flow matching on general geometries. arXiv preprint arXiv:2302.03660, 2023.
Essential References Not Discussed
N/A
Other Strengths and Weaknesses
Strengths. The paper is well-written, has a nice structure and combines theoretical and experimental results. It proposes an approach which can be treated as an application of Riemannian flow matching to Wasserstein space, is justified theoretically and has interesting practical properties.
Weaknesses. I reviewed this paper at one of the previous conferences. Overall, the paper has become a lot better, since many important theoretical aspects are now clarified. Previously, I expressed concerns regarding the lack of tractable experiments in moderate or high dimensions where the proposed approach shows superior results. These concerns still hold, since new experiments were not added.
Other Comments or Suggestions
N/A
Response to Reviewer 1 (8NGf)
Thank you for your thoughtful review and constructive feedback.
"Overall, the proposed method and evaluation criteria make sense. However, the paper is lacking an experiment which shows that the proposed approach beats its competitors in the case of non-Gaussian and at least moderate-dimensional experiment."
Our updated Table 3 (see response to reviewer 3 - ToSj) now demonstrates that WFM outperforms competing methods on 2/3 ShapeNet datasets according to the CD metric. Importantly, WFM achieves these results with approximately 70% lower computational requirements (~120 GPU hours vs. ~400 GPU hours for PSF and PVD), making it substantially more efficient.
"The main issue here is the gap between the established theoretical results which where classic OT problem is considered and practical implementation of the approach which utilizes the solutions of the entropy-regularized OT problem (in the experiments with general distributions)."
We acknowledge this important point. Computing exact OT maps between general distributions is not possible in most circumstances. Outside of univariate measures, product measures, and multivariate Gaussians, closed-form expressions of optimal transport maps are not known to us. Entropic OT provides the only feasible approach with both computational efficiency and statistical guarantees; see Pooladian and Niles-Weed (2021). We added a clarifying comment in the main text to address this approximation gap. In short, an approximation scheme (like entropic OT) is consistent with other geometric methods (including RFM) which rely on numerical approximations when exact solutions aren't available for complex geometries (see Algorithm 1 in Chen & Lipman, 2024).
The new text now reads (line 248, revision in bold):
The condition of "inner continuity" is fairly mild, as this is ensured for any distribution with density. For Gaussian distributions, inner continuity holds naturally. For general distributions, we assume continuity but work with point-clouds as empirical realizations to approximate OT maps with statistical guarantees (see Appendix A.3) and computational efficiency (Flamary et al., 2021; Cuturi et al., 2022). We note there exists a gap between our theoretical results which consider classic OT and our practical implementation which uses entropy-regularized OT approximations. This approach aligns with other geometric methods (see Algorithm 1 in Chen & Lipman, 2024) that rely on numerical approximations when exact solutions are not tractable. The "outer continuity" condition is purely technical, and it serves the same role as in prior work. Our training algorithm is described in Algorithm 1, and Appendix E contains precise details on our neural network design.
"Is it possible to conduct the experiments with point-clouds in higher dimensions and assess the method's performance w.r.t. the other baselines? I understand that many baselines can not operate in high dimensions but maybe such kind of comparison is possible for moderate dimensions (>2d,3d)?"
We appreciate this thoughtful suggestion. We explored adapting other methods to moderate dimensions (4-10D), but the computational requirements proved prohibitive as even our most efficient implementations would require several weeks of computation time per experiment. Instead, we've focused on demonstrating WFM's effectiveness through Table 4, which shows ~60% 1-NN accuracy across all metrics (with 50% being optimal), and Figures 4-6, which showcase realistic generation of variable-sized and high-dimensional distributions.
"Experiments related to biology also do not clarify this point for me since I am not an expert in this field"
We would like to elucidate the biological application of WFM in spatial transcriptomics, a technique that combines molecular profiling of gene-expression with spatial localization of cells within tissues. In these applications, cellular microenvironments are naturally represented as distributions in gene-expression space based on neighboring cells within a tissue. WFM enables generative modeling of cellular niches, synthesizing biologically plausible microenvironments for analysis and study.
By modeling the relationship between cell phenotypes and their environments, WFM could help researchers better understand potential tissue-level responses to cellular changes. WFM addresses the inherently distributional nature of microenvironments by lifting Flow Matching to the Wasserstein space, making it well-suited for this application.
I thank the authors for their answers and the improvements provided in Table 3. According to the new results, the approach now provides better performance in small dimensions (3D point clouds). Still, 1) in the updated Table 3, it beats competitors only w.r.t. the CD metric and on 2 of 3 datasets; 2) its performance in high dimensions remains not entirely clear due to the absence of comparison with other approaches (while I see that there are computational obstacles to performing such a comparison).
Thus, I think that my current positive score represents a fair assessment of the paper.
Thank you for acknowledging the improvements in our 3D point cloud results and for your positive evaluation. We appreciate your constructive feedback throughout the review process.
In this study, the authors propose a Wasserstein flow matching method for learning generative models of distributions/probability measures. All the reviewers, including the AC, acknowledge the solidity of the theoretical part and the novelty of the proposed method.
However, the proposed method may inevitably suffer from the curse of dimensionality, and its computational complexity seems too high for some practical applications, which harms its feasibility in practice.
Moreover, there is also related work on modeling exchangeable data and neural processes [1, 2, 3], which should be included.
Therefore, I tend to weakly accept this work.
[1] Korshunova, I., Degrave, J., Huszár, F., Gal, Y., Gretton, A., and Dambre, J. BRUNO: A deep recurrent model for exchangeable data. In Advances in Neural Information Processing Systems, pp. 7188-7197, 2018.
[2] Garnelo, M., Schwarz, J., Rosenbaum, D., Viola, F., Rezende, D. J., Eslami, S. M., and Teh, Y. W. Neural processes. arXiv preprint arXiv:1807.01622, 2018.
[3] Yang, M., Dai, B., Dai, H., and Schuurmans, D. Energy-based processes for exchangeable data. In International Conference on Machine Learning, pp. 10681-10692. PMLR, 2020.