Dear reviewer 8Atg,

We sincerely appreciate your support and insightful comments, and we are grateful that you raised your evaluation of our work. Below we address your additional concerns, which we hope will support a clearer understanding and a more in-depth discussion.

I would like to point out that flow matching and diffusion models are, to some extent, unified (meaning the score function and vector field are somewhat equivalent).

We fully agree that the vector field and the score are equivalent to some extent, as established by Proposition 3.3. The main difference is that diffusion has a simple Gaussian prior at one end, whereas both ends of bridge matching are data distributions. As a result, the diffusion SDE is controlled by a single variable (e.g., the score function), while bridge matching must be described by two variables (e.g., and in our work). More specifically, both reverse-time diffusion and bridge matching can be expressed by the following SDE, where we use the notation of [1]:

where is the score function at diffusion time , as in [2] while in our bridge matching framework. Although the score function appears in both methods, the drift term takes a decidedly non-trivial form in bridge matching. In fact, most of our effort has been dedicated to proving that, after incorporating the Boltzmann constraint, holds (Proposition 3.2), which is not considered at all in the diffusion setting. We believe this is the main challenge and contribution of incorporating the Boltzmann constraint into bridge matching.

Additionally, in your rebuttal, the bridge matching works because and are treated as distinct delta distributions. However, my question assumes that and follow Boltzmann distributions. In this case, introducing bridge matching does not seem ideal. It might be more convincing to condition on and sample directly from the Boltzmann distribution.

Our rebuttal may have caused some misunderstanding, which we would like to clarify here. Our statement is that, given the bridge pinned down to two endpoints , i.e., in the training process, the learned forward vector field , which is equivalent to , will converge to when the diffusion time converges to 0. Therefore, in the ideal case, the model learns the expectation of the delta distribution when after the whole training process. That is, , where is the transition probability estimated on the training dataset. Hence, if we assume " and follow Boltzmann distributions" in the ideal case, our model will learn the transition probability of the Boltzmann distribution accordingly. We therefore believe that the assumption of Boltzmann distributions and the bridge matching framework are indeed compatible.

We also believe that, when data pairs are randomly selected from the original MD trajectories for training, the assumption that both ends follow the Boltzmann distribution is unlikely to hold. In such non-ideal circumstances, introducing the Boltzmann constraint helps bring the generated distribution closer to the Boltzmann distribution, which is why we incorporate force guidance into the bridge matching framework.

We hope that our response helps to further address your concerns.

Best regards,

Authors of #4521

References

[1] Albergo M S, Boffi N M, Vanden-Eijnden E. Stochastic interpolants: A unifying framework for flows and diffusions[J]. arXiv preprint arXiv:2303.08797, 2023.

[2] Wang Y, Wang L, Shen Y, et al. Protein conformation generation via force-guided SE(3) diffusion models[J]. arXiv preprint arXiv:2403.14088, 2024.