PaperHub
Score: 6.1/10
Poster · 4 reviewers
Ratings: 3, 3, 4, 3 (min 3, max 4, std 0.4)
ICML 2025

Smooth Interpolation for Improved Discrete Graph Generative Models

OpenReview · PDF
Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

A discrete graph generative model based on Bayesian Flow Networks

Abstract

Keywords
Graph Generation; Deep Generative Models; Bayesian Flow Networks;

Reviews and Discussion

Review (Rating: 3)

This work proposes a new graph generative model, GraphBFN, which extends Bayesian Flow Networks (BFNs) to graph generation. The paper further proposes an advanced sampling strategy and new time-scheduling techniques to boost the performance of BFNs. Empirical evaluation shows that GraphBFN consistently achieves strong performance and higher efficiency.

Update after rebuttal

The rebuttal has addressed my concerns, so I have raised my score.

Questions for Authors

See the weaknesses mentioned above.

Claims and Evidence

The claims are adequately supported by empirical evidence.

Methods and Evaluation Criteria

The proposed method and evaluation criteria make sense.

Theoretical Claims

Strength 1 (theoretical results): This paper theoretically shows the permutation invariance of the loss function and the density function (Proposition 3.2) by proving the permutation equivariance of the Bayesian update rule, and the proof looks sound.
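As a reading aid, the claimed property can be restated compactly. The sketch below uses our own shorthand (π for a node permutation, h for the Bayesian update with accuracy α), assumed rather than taken from the paper:

```latex
% Hedged restatement of Proposition 3.2's claim (our notation):
% for any node permutation \pi acting jointly on the adjacency matrix
% (rows and columns) and on node features of a graph G,
\mathcal{L}(\pi \cdot G) = \mathcal{L}(G), \qquad
p_\theta(\pi \cdot G) = p_\theta(G),
% which follows once the Bayesian update h is permutation-equivariant:
h(\pi \cdot \theta,\ \pi \cdot y,\ \alpha) = \pi \cdot h(\theta, y, \alpha).
```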

Weakness 1 (marginal contribution): The technical contribution of this paper seems a bit marginal. Most of the method design and theoretical results are just mimicking the original BFN paper (Graves et al., 2023).

Experimental Design and Analysis

Weakness 2 (efficiency): Although this paper claims that GraphBFN is efficient, the experimental results do not fully support this claim. In Table 11, GraphBFN is the slowest method in terms of sampling time, slower than DiGress and GruM. Furthermore, later work such as continuous-time graph diffusion has been proposed to further improve the efficiency of DiGress (see Weakness 3 below). It is unclear how GraphBFN compares with continuous-time graph diffusion models in terms of sampling time.

Supplementary Material

I have briefly checked the entire supplementary material.

Strength 2 (detailed analyses): This paper provides detailed and comprehensive analyses in the supplementary material. The analyses help readers understand GraphBFN more thoroughly.

Strength 3 (reproducibility): The authors promise to release their code to facilitate reproducibility. (Unfortunately, I cannot check their code because the anonymous link has expired.)

Relation to Existing Literature

Strength 4 (new direction): This work is the first to extend BFN to graph generation. It opens up a new interesting and promising direction in the area of graph generation.

Missing Essential References

Weakness 3 (missing baselines): Continuous-time graph diffusion models [1,2] can also be viewed as "smooth interpolation." They are proposed to improve generation efficiency as well by reducing the number of discretization steps. However, the authors did not discuss or compare GraphBFN with continuous-time graph diffusion models.

  • [1] Xu et al. Discrete-state continuous-time diffusion for graph generation. NeurIPS, 2024.
  • [2] Siraudin et al. Cometh: A continuous-time discrete-state graph diffusion model. arXiv:2406.06449.

Other Strengths and Weaknesses

There are no other strengths or weaknesses that I want to point out especially.

Other Comments or Suggestions

  • You mentioned "ICLR" in the "Ethics Statement" section. It is not relevant to ICML and should be removed.
  • The required "Impact Statement" section is missing. Please add it in the revised version of the paper.
Author Response

W1: Towards the non-trivialness of our work

Thank you for raising this concern. We would like to further clarify the technical contribution of our approach. While our method builds on the BFN framework, naively applying it fails to perform well. Our work bridges the gap between the theoretical framework and complex real-world graph generation settings. For instance, in our ablation of Adaptive Flowback sampling, disabling it causes a drastic performance drop: the valid sample rate on Planar falls from 96.2% to 14.1%. Techniques like this improve the model framework itself and are hence task-agnostic; they may generalize to other complex applications of BFNs.

W2&3: Regarding GraphBFN's efficiency

Thank you for raising this important point. We would like to clarify that, similar to continuous-time diffusion models, GraphBFN achieves efficiency not through lower per-step runtime but through its ability to maintain high sample quality with fewer discretization steps.

Table 5 indicates that GraphBFN is not much slower per step than baselines. Rather, its key advantage lies in Table 7, where GraphBFN can maintain high sample quality with only ~10% of the original sampling steps.

Since GraphBFN and continuous-time models gain efficiency in a similar way, we compare their performance directly in the table below. GraphBFN consistently maintains the strongest performance at every level of step reduction, which reinforces GraphBFN's efficiency.

Dataset — QM9 without H

| Steps | GraphBFN | DisCo-GT | Cometh |
|-------|----------|----------|--------|
| 500   | 99.7     | 99.3     | 99.6   |
| 100   | 99.6     | 98.75    | 99.4   |
| 10    | 98.6     | 95.3     | 88.7   |

We also provide a runtime comparison with DisCo under identical settings (Cometh's implementation is not available). DisCo's longer runtime stems from its larger default architecture; when reduced to our setup, its sample quality noticeably drops (see the last table below), suggesting GraphBFN maintains competitive per-step efficiency.

| Method        | Time to sample 10k (QM9 with H) |
|---------------|---------------------------------|
| DisCo-GT      | 66 min                          |
| DisCo-MPNN    | 46 min                          |
| GraphBFN-GT   | 44 min                          |
| GraphBFN-MPNN | 35 min                          |

We also experimented with the faster but less expressive MPNN-based backbone suggested by DisCo and found that GraphBFN also enjoys this speed-up. However, consistent with DisCo's findings, we observe that the faster architecture results in a decline in sample quality. Even then, GraphBFN-MPNN still outperforms the baselines.

Dataset — QM9 with H

| Method                | Valid | Unique | Atom. Stab. | Mol. Stab. |
|-----------------------|-------|--------|-------------|------------|
| DisCo (matching size) | 96.7  | 97.9   | 97.5        | 68.9       |
| DisCo-GT              | 98.1  | 97.8   | 97.8        | 70.0       |
| DiGress (GT)          | 95.4  | 97.6   | 98.1        | 79.8       |
| GraphBFN-GT           | 99.2  | 94.9   | 99.4        | 94.7       |
| GraphBFN-MPNN         | 97.3  | 97.4   | 98.8        | 87.8       |

W2&3: Clarifying the Conceptual Difference between Continuous-time Diffusion and "Smooth Interpolation"

Thank you for pointing out their relation to "smoothness".

We want to clarify that there are generally two dimensions to consider when discussing smooth interpolation.

  1. continuous transition of latent space
  2. latent space itself being continuous-state and interpretable (our claim)

Continuous-time modeling (e.g., GraphBFN, DisCo, and Cometh) addresses the first dimension by allowing the latent distribution to evolve smoothly over continuous time. This allows the model to learn how to make predictions at any point in time, reducing discretization errors. This property explains why these models can sample with far fewer discretization steps than the discrete-time model DiGress [2].

However, the "smooth interpolation" we emphasize in our paper primarily refers to the second dimension: the latent space being continuous in state, not just in time. This is a unique property of GraphBFN, thanks to the probabilistic modeling of the BFN framework. Under this continuous and normalized latent space, changes in latent variables between nearby timesteps are significantly more stable than in discrete latent spaces like those of DiGress, which enables smoother transitions in spectral features. In graph generation tasks, the model heavily relies on the conditioning of these features; hence, having a stable and consistent latent evolution is crucial for generation quality, as illustrated in Appendix G.2.
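To make the continuous-state latent concrete, here is a minimal sketch, based on the discrete-data Bayesian update of Graves et al. (2023) rather than the authors' released code, of how a per-entry categorical parameter evolves smoothly on the simplex:

```python
import numpy as np

# Minimal sketch of the discrete-data Bayesian update from Graves et al.
# (2023); illustrative only, not the authors' released code. theta stays on
# the probability simplex and changes smoothly as accuracy alpha grows.

def bayesian_update(theta, x_onehot, alpha):
    K = len(theta)
    # Sender sample: noisy observation of the one-hot class x.
    y = alpha * (K * x_onehot - 1) + np.sqrt(alpha * K) * np.random.randn(K)
    # Multiplicative Bayesian update, renormalized onto the simplex.
    theta_new = np.exp(y) * theta
    return theta_new / theta_new.sum()

K = 4                         # e.g., number of edge types
theta = np.full(K, 1.0 / K)   # uniform prior for one adjacency entry
x = np.eye(K)[2]              # ground-truth class of that entry
for alpha in np.linspace(0.005, 0.05, 50):   # small accuracy increments
    theta = bayesian_update(theta, x, alpha)
print(theta)                  # mass concentrates smoothly on class 2
```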

Though continuous-time dynamics help reduce transition noise in models like DisCo and Cometh, their latent variables still reside in the discrete graph space, which inherently leads to fluctuating latent transitions. To validate this, we conducted the analysis in Appendix G.1 on DisCo. The results show that DisCo’s trajectory behaves similarly to DiGress, with higher fluctuation.

| Method   | MASD | MLSTD | MVar |
|----------|------|-------|------|
| DiGress  | 1.14 | 0.78  | 0.65 |
| GruM     | 1.43 | 0.79  | 0.77 |
| DisCo    | 1.05 | 0.76  | 0.56 |
| GraphBFN | 0.39 | 0.25  | 0.26 |
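The exact definitions of MASD, MLSTD, and MVar are not reproduced in this thread; a plausible reading of the first is a mean absolute successive difference over a feature trajectory, sketched hypothetically below:

```python
import numpy as np

# Hypothetical sketch of a trajectory-fluctuation statistic in the spirit of
# Appendix G.1; the exact MASD/MLSTD/MVar definitions are assumptions here.

def masd(traj):
    """Mean absolute successive difference of a 1-D feature trajectory."""
    traj = np.asarray(traj, dtype=float)
    return np.abs(np.diff(traj)).mean()

smooth = np.linspace(0.0, 1.0, 100)      # smoothly interpolated latent
jumpy = np.random.randint(0, 2, 100)     # discrete flips, as in masking
print(masd(smooth), masd(jumpy))         # smooth trajectory scores far lower
```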

We also distinguish our method from models like GruM. Though GruM also uses a continuous latent space, it is neither normalized nor structured in a way that makes spectral features interpretable. As a result, these models cannot effectively utilize the latent space to aid generation in the way GraphBFN can.

Reviewer Comment

Thank you for your response! My concerns have been addressed. I will raise my score.

Author Comment

Thank you for your speedy reply! And thank you so much for providing valuable feedback and recognizing our efforts!

Review (Rating: 3)

The paper introduces Graph Bayesian Flow Networks (GraphBFN), a novel framework for discrete graph generation that enables smooth interpolation between graph states. Unlike discrete diffusion models, GraphBFN leverages probabilistic adjacency matrices and Bayesian flow updates to improve stability and efficiency. Several techniques are also proposed to improve the proposed frameworks.

Update after rebuttal

The author rebuttal brings additional insights into the broader implications of the methodology, but it does not change my overall assessment of the paper (which was positive in the first place).

Questions for Authors

  1. The paper clearly articulates the issue of discrete diffusion models for graph generation. However, there seems to be a slight misalignment between the identified issue and the proposed solution. The issue concerns limitations in discrete diffusion, but the proposed method falls under the flow-based family rather than addressing diffusion directly. Could you provide further clarification on this choice? Is there a specific reason for moving away from the diffusion framework rather than improving it?

  2. I also have a question regarding the motivation behind the proposed approach in the broader context of graph generation models. Existing methods include continuous diffusion models [1] and discrete diffusion models [2]. Since graph structure generation is inherently discrete, a purely continuous approach may not be suitable. At the same time, the identified limitations of discrete diffusion models suggest that a fully discrete approach is also problematic. If this interpretation is correct, then an intermediate approach—one that balances discrete and continuous modeling—seems necessary. How should we characterize such a hybrid model, and why does the proposed method fit this characterization?

  3. There are two existing research directions (see references) that attempt to bridge the gap: (1) making discrete diffusion models more continuous and (2) making continuous diffusion models more discrete. These approaches also seem like potential solutions to the identified problem. How does the proposed method compare to these? What advantages does it offer over existing solutions?

References:

[1] Jo, J., Lee, S., and Hwang, S. J. Score-based generative modeling of graphs via the system of stochastic differential equations. arXiv preprint arXiv:2202.02514, 2022.

[2] Vignac, C., Krawczuk, I., Siraudin, A., Wang, B., Cevher, V., and Frossard, P. Digress: Discrete denoising diffusion for graph generation. arXiv preprint arXiv:2209.14734, 2022.

Claims and Evidence

The claims are supported.

Methods and Evaluation Criteria

The evaluations make sense.

Theoretical Claims

I did not check the proof in detail, but the results make sense.

Experimental Design and Analysis

The experimental design and analysis are sound.

Supplementary Material

I did not review the supplementary material.

Relation to Existing Literature

The paper proposes a novel flow-based graph generation method.

Missing Essential References

The paper identifies the limitations of discrete processes in graph generation and proposes a flow-based smooth alternative. This problem aligns with two existing research directions:

Extending discrete state-space models to continuous representations (e.g., [1]).

Enabling continuous diffusion processes to model discrete features (e.g., [2]).

References:

[1] Xu, Zhe, et al. "Discrete-state Continuous-time Diffusion for Graph Generation." arXiv preprint arXiv:2405.11416 (2024).

[2] Chen, Ting, Ruixiang Zhang, and Geoffrey Hinton. "Analog bits: Generating discrete data using diffusion models with self-conditioning." arXiv preprint arXiv:2208.04202 (2022).

Other Strengths and Weaknesses

Strengths

The paper is well-written, with a clear identification of the motivation and limitations of existing methods.

The proposed method is sound, and the empirical results look promising.

Weakness

The proposed method does not perfectly align with the identified motivation (see the questions above for more details).

Other Comments or Suggestions

There seem to be typos in Eq. (2).

Author Response

Q1: Towards the choice of Generative Framework

Thank you for your thoughtful and insightful comments. We first clarify our motivation for choosing BFN as our model framework. As you mentioned, we use discrete diffusion as a case study to reveal a specific challenge for discrete graph generative models, i.e., the fluctuation in the training dynamics. Firstly, this challenge is not limited to the scope of discrete diffusion models: for multi-step generative models such as discrete diffusion and discrete flow matching [1], the fluctuation in the training trajectory originates from the discrete state space and the properties of discrete graphs. Moreover, the challenge can be generalized as creating a trajectory from the prior to the target distribution of a discrete graph that smoothly decomposes information and provides dense supervision.

There are several possible solutions, but this actually remains an open research problem. Within the diffusion framework, encoding graphs into a latent space and hybridizing latent and discrete diffusion could be one way to keep the discrete nature while building smooth trajectories. However, such approaches generally require training a well-performing encoder/decoder to work in practice, which brings a large computational cost and system complexity. Compared to the diffusion framework, we chose BFN mainly for its simplicity. BFN achieves smooth interpolation through its parameter space: analogous to the latent space of diffusion models, the parameter space can be seen as a special non-parametric latent space, induced by a Bayesian update instead of a trained encoder. Our empirical studies show that the parameter space enables the interpretation of natural data properties. We agree that there is room for further exploring and tackling these challenges under various model frameworks, and we will add this discussion to the updated version.

[1] Lipman et al., 2023 “Flow Matching for Generative Modeling.”

Q2: Towards how to characterize a hybrid model

Thanks for your insightful comments! This is a very good question. Firstly, we discuss how to characterize a hybrid model. As mentioned in our response to Q1, this is an open research area, and we believe there are several possible solutions. The key point is that a latent space is generally needed to capture the continuous properties of discrete graphs. The latent space could be represented by learned neural networks or constructed in a non-parametric manner (as in GraphBFN). Feasible solutions could either jointly learn the latent distribution and the discrete distribution, or design the latent as an intermediate state between continuous and discrete space and model only in the latent space, as GraphBFN does.

Next, we elaborate on the parameter space of GraphBFN. As shown in the response to Q1, we can interpret the parameter space as a non-parameterized latent space. However, although points in the parameter space take continuous values, the parameter space of $K$-dimensional one-hot discrete data is actually a $(K-1)$-simplex, because it carries the constraint that each sample's dimensions must sum to 1. The no-free-lunch trade-off here is that, although the parameter space can be seen as an intermediate between the continuous and the discrete, the simplex constraint makes it non-trivial to design generative models over such a space.
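For concreteness, the constraint can be written out using the standard simplex definition (our notation, not quoted from the paper):

```latex
% The parameter space for K-class one-hot data is the (K-1)-simplex
% (standard definition, assumed rather than quoted from the paper):
\Delta^{K-1} = \left\{ \theta \in \mathbb{R}^{K} \;:\; \theta_k \ge 0,\ \sum_{k=1}^{K} \theta_k = 1 \right\}
% continuous-valued, yet normalized like a categorical distribution, hence
% an intermediate between fully continuous and fully discrete spaces.
```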

We will add this discussion to the updated version to better illustrate the possible solutions and trade-offs.

Q3: Towards the trade-off between "continuous" and "discrete"

Thank you for eliciting such an insightful discussion. We discuss the trade-off in the practical setting of graph generation. First, similar to our response to Reviewer HH53, the benefit of being more "continuous" lies in two aspects:

  1. continuous transition of latent space

  2. latent space itself being continuous-state and interpretable

Due to the space limit, please refer to the last response to Reviewer HH53 for a detailed discussion of these benefits. We apologize!

Regarding the notion of being more “discrete,” we are skeptical about the potential benefits of applying Bit Diffusion [3], given concerns similar to those observed with GruM. While the discrete latent space in Bit Diffusion may appear to align naturally with the inherently discrete structure of graphs, its binary representation lacks interpretability and structure. This limits the model’s ability to capture high-level graph semantics or guide the generation process meaningfully. Nonetheless, we believe Bit Diffusion may still be beneficial in other application domains where simplicity and generic data representation are more important than structured latent semantics.

[1] "DiGress: Discrete Denoising diffusion for graph generation."

[2] "Graph Generation with Diffusion Mixture."

[3] "Analog bits: Generating discrete ..."

Review (Rating: 4)

Graph Bayesian Flow Networks (GraphBFN) introduce a new way to generate graphs using continuous latent variables. Unlike traditional models, GraphBFN smoothly transitions from a starting state to the desired graph by mapping these variables to a probability matrix. This approach efficiently models complex graphs, supports various features, and is faster than previous methods. Experiments show GraphBFN creates realistic and accurate graphs.

Questions for Authors

All of my questions are above.

Claims and Evidence

The paper is easy to follow, and extensive experiments demonstrate the effectiveness of the Bayesian flow method for graph modeling.

Methods and Evaluation Criteria

The proposed method follows the widely used evaluation process.

Theoretical Claims

I have checked the algorithm related to the Bayesian flow method in Appendix E.

Experimental Design and Analysis

This paper provides extensive experiments on graph modeling tasks, as well as a detailed ablation study on the schedule, crucial hyperparameters, and sampling efficiency.

Supplementary Material

To verify a fair comparison, I have checked the data processing and evaluation parts of the code.

Relation to Existing Literature

Graph generation has always been a meaningful topic in machine learning and generation tasks, and it has a broader impact on scientific discovery. This paper provides a practical implementation of the Bayesian flow method for graph modeling, as well as a theoretical foundation for it.

Missing Essential References

Bayesian flow based

Other Strengths and Weaknesses

Strengths

  • The paper proposes a Bayesian-flow-based graph generative framework, a novel graph generative model attempting to capture discrete data distributions with smooth interpolation.
  • The paper makes a substantial effort to adapt Bayesian Flow Networks to graph modeling through careful design of the schedule and sampling strategy.
  • Flowback is interesting.
  • Extensive experiments show GraphBFN's ability on various graph modeling tasks, and the model exhibits good performance across all benchmarks.

Weaknesses

  • Lack of error bars in general graph generation.
  • The choice of a Bayesian flow network is unclear.

Other Comments or Suggestions

  • ICLR -> ICML in Ethics Statement
  • In appendix G.3, how the number of steps in Topology Recovery.
Author Response

W1: Lack of error bars in results

Thank you for the catch. We apologize for the inconsistent appearance of error terms in the paper. In fact, all reported results are averages of 3 runs; we omitted error terms for readability and because baseline results in prior work did not include them. Detailed results with error bars for all datasets are summarized in the tables below.

Dataset — Planar

| Deg. | Clus. | Orbit | Spec. | V.U.N. |
|------|-------|-------|-------|--------|
| 0.0005±0.0002 | 0.0294±0.0163 | 0.0002±0.0001 | 0.0046±0.0013 | 90.0±3.3 |

Dataset — SBM

| Deg. | Clus. | Orbit | Spec. | V.U.N. |
|------|-------|-------|-------|--------|
| 0.0005±0.0001 | 0.0560±0.0392 | 0.0370±0.0163 | 0.0053±0.0005 | 87.5±4.0 |

Dataset — QM9 without H

| Valid | FCD | NSPDK | Scaf. |
|-------|-----|-------|-------|
| 99.73±0.16 | 0.101±0.017 | 0.0002±0.0000 | 0.9386±0.0980 |

Dataset — ZINC250k

| Valid | FCD | NSPDK | Scaf. |
|-------|-----|-------|-------|
| 99.22±0.49 | 2.116±0.327 | 0.0013±0.0006 | 0.5304±0.1980 |

Dataset — MOSES

| Val. | Unique | Novel | Filters | FCD | SNN | Scaf. |
|------|--------|-------|---------|-----|-----|-------|
| 88.5±2.9 | 99.8±0.2 | 89.0±3.3 | 98.3±0.8 | 1.07±0.11 | 0.59±0.11 | 10.0±1.8 |

Dataset — QM9 with H

| Val.(%)↑ | Uni.(%)↑ | A. Stab.(%)↑ | M. Stab.(%)↑ |
|----------|----------|--------------|--------------|
| 99.2±0.1 | 94.9±0.5 | 99.4±0.0 | 94.7±0.4 |

W2: Motivations for Bayesian Flow Networks (BFN)

Thank you for the valuable feedback. We agree that the motivation for choosing BFN can be further clarified.

Our rationale stems from the desire to design a "smooth" generation/training trajectory, motivated by recent advancements in diffusion-based approaches [1,2,3], where the signal-to-noise ratio is widely set as a smooth transformation with respect to timesteps. This implies that information should be gradually decomposed in the forward transformation to provide dense supervision. However, for graphs that lie in a discrete state space, it is challenging to gradually decompose information using discrete-state diffusion models (like DiGress and DisCo [1,3]), where noisy graphs are created by operations such as deletion (masking) and replacement, and these small perturbations can lead to abrupt structural changes (e.g., loss of connectivity or altered clustering).

BFN offers a compelling alternative by operating in a continuous categorical parameter space, which aligns naturally with the discrete form yet continuous property of graph data.

Furthermore, from the perspective of utilizing the inductive bias of graph data, the most advanced graph generation methods [1,4] have demonstrated the effectiveness and importance of utilizing the graph spectrum as either an auxiliary condition or a learning objective. The smooth transformation in GraphBFN respects the local geometric and topological properties of the graph, meaning that similar nodes remain similar throughout the transformation; hence, it provides better spectrum conditioning for the neural network. From the perspective of neural network optimization, smooth transformations avoid abrupt changes and facilitate more stable training dynamics: gradients are less likely to explode or vanish, leading to more reliable and efficient learning.
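As one concrete instance of such a smooth schedule, the BFN paper uses a quadratic accuracy schedule for discrete data; it is reproduced below for illustration and is not necessarily the exact schedule GraphBFN adopts:

```latex
% Accuracy schedule for discrete data in Graves et al. (2023); shown for
% illustration, not necessarily the schedule GraphBFN uses:
\beta(t) = \beta(1)\, t^{2}, \qquad
\alpha(t) = \frac{\mathrm{d}\beta(t)}{\mathrm{d}t} = 2\,\beta(1)\, t,
% so information is revealed continuously in t rather than in abrupt jumps.
```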

[1] Vignac, et al., 2022 "DiGress: Discrete Denoising diffusion for graph generation."

[2] Jo, et al., 2024 "Graph Generation with Diffusion Mixture."

[3] Xu, et al., 2024 "Discrete-state Continuous-time Diffusion for Graph Generation."

[4] Martinkus, et al., 2022 "SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators."

Towards additional comments

Thank you for your comments. We will incorporate them in future revisions.

Regarding your second comment, could you please clarify the intended question?

If you’re referring to the difference between “timesteps” in Figure 5 and “sampling steps” in Table 7, we’re happy to clarify. In Figure 5, Spec. MMD is evaluated on noisy intermediate samples during generation; “timestep” refers to a specific point in a sampling trajectory, and each point on a curve corresponds to a sample at a different noise level.

Table 7 shows an ablation study on how reducing the number of sampling steps affects final sample quality. Each row corresponds to a run with a different step count. The first column shows the percentage of default steps used, standardized across datasets. For example, (10, Planar) means using 10% × 1000 = 100 steps.

If you’re asking about the DiGress and GruM curves in Appendix G.3, we apologize for the confusion. That study replicates GruM [1]’s analysis of graph topology recovery during generation. The corresponding curves for GruM, DiGress, and other baselines appear in Figure 2 of the GruM paper [1]. We’ll clarify this in the figure caption in the next version.

[1] Jo et al., 2024. Graph Generation with Diffusion Mixture.

Reviewer Comment

Thank you for your response, which has clarified several aspects of your work.

For the motivations for BFN, you mention that "BFN offers a compelling alternative by operating in a continuous categorical parameter space, which aligns naturally with the discrete form yet continuous property of graph data", which is very interesting and similar in some ways to another recent work [1]. [1] models the discrete data in graphs with a continuous model while also aligning with the sparsity of discrete data due to the nature of beta diffusion. I believe both works focus on how to further explore discrete data modeling with continuous methods while keeping discrete properties, so it would be interesting for you to discuss the differences and similarities.

[1] Advancing Graph Generation through Beta Diffusion, Liu et al., ICLR 2025

Author Comment

Thank you for introducing the highly related work and eliciting the thoughtful discussion!

Beta diffusion models [1,2] present a compelling alternative to the widely used Gaussian diffusion models [3,4]. Like their Gaussian counterparts, they admit analytical forms for the marginal and conditional distributions of the diffusion process, but they offer the added advantage of modeling variables within a bounded domain between 0 and 1, which Gaussian diffusion does not naturally support. This boundedness is particularly beneficial in discrete settings such as graph generation, where the data is inherently constrained. Therefore, you are absolutely right that Graph Beta Diffusion (GBD) [2] is similar to GraphBFN in the sense that both preserve the constrained nature of discrete graphs while modeling the evolution of continuous quantities.

Nevertheless, we would like to highlight the key similarities and distinctions between GraphBFN and GBD.

In the case of generic/abstract graph generation, where there are only two edge types (edge vs. no edge), it suffices to represent each entry of the adjacency matrix with a single variable in the range [0, 1]. In this simple setting, the variables modeled by GBD can be interpreted as the probability of edge existence. Therefore, both GBD and GraphBFN share the property of modeling a smooth interpolation of probabilistic matrices, although they differ in the specific probability trajectories. We compare the performance of GBD and GraphBFN on two abstract graph datasets below:

Dataset — Planar

| Method   | Deg. ↓ | Clus. ↓ | Orbit ↓ | Spec. ↓ | V.U.N. ↑ |
|----------|--------|---------|---------|---------|----------|
| DiGress  | 0.0003 | 0.0372  | 0.0095  | 0.0106  | 75       |
| GBD      | 0.0003 | 0.0353  | 0.0135  | 0.0059  | 92.5     |
| GraphBFN | 0.0005 | 0.0294  | 0.0002  | 0.0046  | 96.7     |

Dataset — SBM

| Method   | Deg. ↓ | Clus. ↓ | Orbit ↓ | Spec. ↓ | V.U.N. ↑ |
|----------|--------|---------|---------|---------|----------|
| DiGress  | 0.0013 | 0.0498  | 0.0434  | 0.0400  | 74.0     |
| GBD      | 0.0013 | 0.0493  | 0.0446  | 0.0047  | 75.0     |
| GraphBFN | 0.0005 | 0.0560  | 0.0370  | 0.0053  | 87.5     |

As the results indicate, both GBD and GraphBFN obtain superior performance compared to discrete-state-space models (e.g., DiGress [5]) in the abstract graph setting, which justifies the advantages of smooth interpolation, as we claimed.

However, for graphs with multiple edge types (e.g., molecular graphs), the modeling in GraphBFN and GBD differs non-negligibly. When the adjacency matrix contains k edge types, GraphBFN represents each matrix entry as a k-dimensional probability vector that lies on the simplex (i.e., the vector components sum to 1), making each dimension interpretable as the probability of a particular edge type. In contrast, GBD models each dimension independently as a value in [0, 1], without enforcing normalization. The trajectory over simplexes makes GraphBFN a more natural choice for modeling multi-class discrete variables, as it makes the different dimensions contrastive. We further provide a performance comparison on the molecular dataset ZINC250k below:

Dataset — ZINC250k

| Method   | Valid ↑ | FCD ↓ | NSPDK ↓ | Scaf. ↑ |
|----------|---------|-------|---------|---------|
| DiGress  | 94.99   | 3.482 | 0.0021  | 0.4163  |
| GBD      | 97.87   | 2.248 | 0.0018  | 0.5042  |
| GraphBFN | 99.22   | 2.116 | 0.0013  | 0.5304  |

As the results indicate, GraphBFN achieves notably stronger performance in this setting, which justifies the potential of simplex-based approaches for modeling multi-class discrete graphs.
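The contrast between the two parameterizations can be summarized compactly (our notation, assumed):

```latex
% Per adjacency entry (i, j) with k edge types (our notation, assumed):
\text{GraphBFN:}\ \ \theta_{ij} \in \Delta^{k-1}
  = \{\theta \in \mathbb{R}^{k} : \theta_c \ge 0,\ \textstyle\sum_{c} \theta_c = 1\},
\qquad
\text{GBD:}\ \ \theta_{ij} \in [0, 1]^{k}\ \ \text{(dimensions independent, unnormalized)}.
```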

[1] Zhou, et al., 2023 "Beta Diffusion."

[2] Liu, et al., 2025 "Advancing Graph Generation through Beta Diffusion."

[3] Ho, et al., 2020 "Denoising Diffusion Probabilistic Models."

[4] Jo et al., 2022 "Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations."

[5] Vignac, et al., 2022 "DiGress: Discrete Denoising diffusion for graph generation."

Review (Rating: 3)

The paper introduces Graph Bayesian Flow Networks (GraphBFN), a generative model for graphs based on Bayesian Flow Networks. The key idea is to represent the graph structure in a continuous latent space to generate discrete graphs more smoothly. In other words, instead of working directly with the adjacency matrix of a graph, the model creates a continuous representation of the graph structure using a probabilistic distribution. The paper also introduces a new sampling method, Adaptive Flowback, to reduce the train-test discrepancy.

Update after rebuttal

The rebuttal brings some helpful clarifications but does not change my overall assessment of the paper, which was already positive.

Questions for Authors

The sentence “each entry P_{i,j} could be seen as the parameter of the Bernoulli distribution” suggests that the final discretization of the probabilistic matrix is performed through Bernoulli sampling. However, it is not explicitly stated whether this is the actual step or if other strategies (e.g., thresholding, structural constraints) are applied.

  • Are there additional post-processing steps (e.g., ensuring connectivity, removing isolated nodes)?
  • Have you compared different discretization strategies to assess their effect on the generated graphs?

Claims and Evidence

The paper claims that GraphBFN improves graph generation compared to discrete diffusion-based models, demonstrating competitive performance in experimental benchmarks. However, some aspects require further clarification, such as spectral conditioning.

Methods and Evaluation Criteria

The experiments are conducted on standard benchmarks such as Planar, SBM, QM9, and ZINC250k, which are standard datasets for evaluating generative graph models. The evaluation metrics are consistent with the existing literature.

Theoretical Claims

The proof of Proposition 3.2 in Appendix J, concerning the invariance of the model’s optimization objective and generative density function under node permutations, appears to be correct, although I have not analyzed the entire proof in detail.

Experimental Design and Analysis

The experimental design should be explained more precisely in the appendix rather than simply stating that the setting from another paper was adopted.

Supplementary Material

No supplementary material

Relation to Existing Literature

The paper is well connected to recent works on diffusion-based models for graph generation.

Missing Essential References

Huang, H., Sun, L., Du, B., Fu, Y., & Lv, W. (2022, November). Graphgdp: Generative diffusion processes for permutation invariant graph generation. In 2022 IEEE International Conference on Data Mining (ICDM) (pp. 201-210). IEEE.

Minello, G., Bicciato, A., Rossi, L., Torsello, A., & Cosmo, L. Generating Graphs via Spectral Diffusion. In The Thirteenth International Conference on Learning Representations.

Yang, L., Huang, Z., Zhang, Z., Liu, Z., Hong, S., Zhang, W., ... & Zhang, L. (2024). Graphusion: Latent diffusion for graph generation. IEEE Transactions on Knowledge and Data Engineering

Zhou, C., Wang, X., & Zhang, M. (2024). Unifying generation and prediction on graphs with latent graph diffusion. Advances in Neural Information Processing Systems, 37, 61963-61999.

Other Strengths and Weaknesses

The final discretization process of the adjacency matrix requires a more detailed explanation.

Spectral conditioning is mentioned in Conditioned Graph Features on Output Distributions, but it is unclear which features are used and how they improve the model.

Other Comments or Suggestions

  • In Table 1, I suggest underlining the second-best results to help the reader with comparisons.
  • I recommend adding more detailed descriptions in the captions, at least in the main text.
  • From the Experiment Details section, it is unclear how the experiments were conducted. For instance, when computing MMD, you compare a generated graph set with a reference set. How large are these sets?
  • I suggest moving Related Work after the Introduction.
  • There are several instances where capital letters appear incorrectly after a comma or colon (e.g., “Due to space limit, We defer…”). It would be beneficial to check for these typos.
  • It is unclear where and when the graph spectrum is used. The Abstract states: “Though typically represented by the discrete node and edge attributes, the graph topological information can be sufficiently captured by the graph spectrum in a continuous space.” This initial sentence could suggest that the model operates directly on the matrix of eigenvectors, and this impression remains in the reader’s mind, even though the rest of the text does not support it. I recommend clarifying it throughout the text, especially at the beginning, and explicitly distinguishing when the spectrum is used as part of the method and when it is only used for evaluation, as in Figure 1.
  • What do the node colors represent in Figure 1?
  • In Figure 5, “A comparison of the transformation of graph topology between GraphBFN and the diffusion-based models, DiGress and GruM,” are the curves for DiGress and GruM missing?
  • There is a visualization issue: a floating letter “t” appears above Table 5 and Table 6.
  • Is this sentence correct? “A few entries are left as blank because the corresponding experiment results are presented or implemented in the original work.” What does it mean?
  • “To improve conciseness, we use the adjacency matrix E” is an unconventional choice, as “A” is typically used for the adjacency matrix, while “E” is commonly associated with the edge list.
  • The following metrics require better explanation: FCD, SNN, Scaffold Similarity (Scaf), NSPDK, Novelty, and Validity.
  • Figure 1 could be improved.
Author Response

W2: Towards how spectral features are used and how they improve the model

We apologize for the confusion. Our use of extra features, both spectral and structural, follows the setup in prior work [1]. The key difference is that GraphBFN computes these features over the output distribution $\phi(G^{\theta_t}, t)$, a probabilistic matrix, instead of over noisy intermediate samples.

We first clarify how extra features are used and implemented in the model. Each feature is categorized into node-level features and graph-level features. For graph-level features, we use

  1. graph-level counts of cycles,
  2. the multiplicity of the eigenvalue 0, and
  3. the first 5 nonzero eigenvalues of the graph Laplacian.

For node-level features, we use

  1. node-level counts of cycles,
  2. the eigenvectors corresponding to eigenvalue 0, and
  3. the first two eigenvectors associated with nonzero eigenvalues.

For molecular graphs, we include an additional graph-level feature — the molecular weight — and a node-level feature — the valency of each atom.

These choices follow DiGress [1]; more details are in their Appendix B.

How the extra features are used is described in Appendix L.1, "Experiment Details". In short, we compute the extra features of intermediate samples and append them to the network input.
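As a rough illustration, such graph-level spectral features could be computed from a soft adjacency matrix along the following lines (our sketch with made-up function name and tolerance, not the released implementation):

```python
import numpy as np

# Rough sketch (ours, not the released code) of graph-level spectral features
# of the kind listed above, computed from a soft adjacency matrix P whose
# entries lie in [0, 1], e.g. the model's output distribution over edges.

def graph_level_spectral_features(P, k=5, tol=1e-5):
    A = (P + P.T) / 2                       # symmetrize the soft adjacency
    L = np.diag(A.sum(axis=1)) - A          # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)         # ascending, real eigenvalues
    n_zero = int((np.abs(eigvals) < tol).sum())    # multiplicity of eigenvalue 0
    nonzero = eigvals[np.abs(eigvals) >= tol][:k]  # first k nonzero eigenvalues
    nonzero = np.pad(nonzero, (0, k - len(nonzero)))
    return np.concatenate([[n_zero], nonzero])

P = np.random.rand(8, 8)
np.fill_diagonal(P, 0.0)
print(graph_level_spectral_features(P))     # [multiplicity, lambda_1..lambda_5]
```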

Next, we demonstrate the importance of extra-feature conditioning via additional ablation studies. Specifically, we evaluate both GraphBFN and DiGress with and without extra-feature conditioning on the challenging QM9 with explicit hydrogens dataset. The results are summarized in the table below.

| Model                 | Val.(%)↑ | Uni.(%)↑ | A. Stab.(%)↑ | M. Stab.(%)↑ |
|-----------------------|----------|----------|--------------|--------------|
| Train. Set            | 97.8     | 100      | 98.5         | 87.0         |
| DiGress + no feature  | 92.3     | 97.9     | 97.3         | 66.8         |
| DiGress               | 95.4     | 97.6     | 98.1         | 79.8         |
| GraphBFN + no feature | 96.9     | 98.1     | 99.1         | 90.5         |
| GraphBFN              | 99.2     | 94.9     | 99.4         | 94.7         |

We observe that extra feature conditioning improves performance across all evaluation metrics for both models, showing its impact on enhancing molecular graph generation.

[1] Vignac, et al., 2022 "DiGress: Discrete Denoising diffusion for graph generation."

W1&Q1: Regarding the final discretization of probabilistic matrix and post-processing

Thank you for pointing out the ambiguity in our implementation details. Yes, that’s correct: at the final step, discrete samples are obtained by directly sampling from the predicted probabilistic matrix (or node probability vectors). We do not employ any specialized discretization technique for any dataset; instead, we sample faithfully from the categorical distribution defined by the output distribution $\phi(G^{\theta_1}, 1)$. We also did not filter or curate our samples (e.g., discard unconnected samples), in order to remain consistent with the evaluation procedures used in prior works.

The sampling details are depicted in Algorithm 2. Specifically, we sample from the output distribution $\phi(G^{\theta_1}, 1)$ of the graph, which is split into two components: $V[\phi(G^{\theta_1}, 1)]$ for nodes and $E[\phi(G^{\theta_1}, 1)]$ for edges. The node tensor $V[\phi(G^{\theta_1}, 1)]$ has shape $[bs, N, c_v]$, where $c_v$ is the number of node types. For instance, $V[\phi(G^{\theta_1}, 1)][1, 1]$ is a categorical distribution representing how the first node of the first sample is distributed. Similarly, the edge tensor $E[\phi(G^{\theta_1}, 1)]$ has shape $[bs, N, N, c_e]$, where $c_e$ denotes the number of edge types. Sampling is performed independently for each node and edge from the corresponding categorical distribution defined by the last dimension.

We have not explored alternative discretization strategies beyond direct sampling from the categorical distribution. This choice preserves the unbiased nature of our probabilistic modeling and maintains a good balance between sample quality and diversity. However, we believe that exploring alternative sampling strategies, like top-k or top-p, may be useful for tasks with specific demands, such as favoring higher fidelity at the expense of diversity.
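A minimal sketch of this final discretization (shapes follow the description above; the symmetrization across (i, j) and (j, i) is our assumption, as the text does not specify how undirected pairs are handled):

```python
import numpy as np

# Minimal sketch of the final discretization described above; shapes follow
# the text, but the per-pair symmetrization is our assumption.

def sample_graph(node_probs, edge_probs, rng=None):
    """node_probs: [N, c_v]; edge_probs: [N, N, c_e]; rows sum to 1."""
    rng = rng or np.random.default_rng()
    N, c_v = node_probs.shape
    c_e = edge_probs.shape[-1]
    nodes = np.array([rng.choice(c_v, p=p) for p in node_probs])
    edges = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(i + 1, N):           # one draw per unordered pair
            p = (edge_probs[i, j] + edge_probs[j, i]) / 2
            edges[i, j] = edges[j, i] = rng.choice(c_e, p=p / p.sum())
    return nodes, edges
```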

Towards Additional Comments

Thank you very much for your detailed feedback. We will carefully incorporate your suggestions in future revisions.

Node colors in Figure 1:

  • Apologies for the confusion. The node color reflects a spectral metric used to highlight clusters, originally meant to enhance visualization for structured graphs like SBM samples. We recognize this may be misleading and will either remove the coloring or clarify it in the caption.

Missing curves for DiGress and GruM in Figure 5:

  • Apologies for the confusion. The study in Appendix G.3 replicates GruM [1]’s analysis of topology recovery. The curves for GruM, DiGress, and others can be found in Figure 2 of the GruM paper [1]. We’ll clarify this in the caption in the next version.

Typo in “A few entries…”

  • You’re right—this was a typo. There should be a NOT before "reported." We’ll correct this in the next draft.

[1] Jo, et al., 2024 "Graph Generation with Diffusion Mixture."

Reviewer Comment

I’ve read the authors’ response, in particular the clarification:

“For node-level features, we use node-level counts of cycles, the eigenvectors corresponding to eigenvalue 0, and the first two eigenvectors associated with nonzero eigenvalues.”

However, I’m not sure I understand the motivation behind using the eigenvectors corresponding to the eigenvalue 0 (these are constant vectors and should not provide any discriminative node-level information). Could the authors please clarify this point?

Author Comment

Thank you for pointing this out, and we apologize for the lack of clarification.

The eigenvector(s) corresponding to the zero eigenvalue(s) (of the graph Laplacian) reveal each node’s membership in its respective connected components [1]. Specifically, the i‑th entry of each eigenvector indicates whether the i‑th node belongs to that component. In practical implementation, we only include one of these eigenvector(s)—specifically, the one associated with the largest component, which is typically the eigenvector with the most frequently appearing mode. So you’re absolutely right: if the graph is fully connected, then this vector is constant, as all nodes belong to the same component.

However, during the intermediate stages of the generation process, the graph is often not fully connected. In those cases, this eigenvector is no longer constant—it captures node-level connectivity information by distinguishing between different connected components. This helps guide the model by providing structural context as the graph gradually forms.
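A quick numerical check of this point, using our own toy example of two disjoint triangles:

```python
import numpy as np

# Toy check (ours): two disjoint triangles give a Laplacian null space of
# dimension 2, spanned by the indicator vectors of the two components.

A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
w, V = np.linalg.eigh(L)
null_vecs = V[:, np.abs(w) < 1e-8]   # two zero eigenvalues => two components
print(np.round(null_vecs, 2))        # columns span {1_{0,1,2}, 1_{3,4,5}},
                                     # i.e. constant on a component (up to an
                                     # orthonormal rotation of the null space)
```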

We hope this addresses your concern, and we'd be happy to discuss this further if you have any additional questions or suggestions!

[1] Vignac, et al., 2022 "DiGress: Discrete Denoising diffusion for graph generation."

Final Decision

This paper develops Graph Bayesian Flow Networks (GraphBFN), a generative model for graphs based on Bayesian Flow Networks. The paper is well-written, the method sound, and an interesting advance in graph generation. Experiments demonstrate good performance across all benchmarks. The reviewers unanimously recommend acceptance given the contribution made by the paper.