Symmetry-Aware GFlowNets
Abstract
This paper analyzes the bias inherent in GFlowNets and proposes a reward-scaling method to address the issue.
Reviews and Discussion
This paper aims to rectify equivalent actions in the graph generation process of GFlowNet. In GFlowNet’s graph generation, actions that result in identical graphs are considered distinct actions, leading to incorrect learning of their probabilities. The paper presents a theoretical framework for these actions and proposes a solution using reward scaling to correct them. The effectiveness of this approach is validated through both synthetic and real-world experiments.
Questions for Authors
My questions for this paper are listed in the preceding points.
Claims and Evidence
The theoretical claims presented in this paper are substantiated by detailed proofs presented in the appendix. The experiment settings are outlined in the main text, while the experiment details are further elaborated upon in the appendix.
Methods and Evaluation Criteria
Graph generation is a common use case of GFlowNets, and the existence of equivalent actions is a common problem in related tasks like atom-based and fragment-based molecule generation. The authors propose reward scaling as a simple and effective solution to this issue. However, I have a potential concern with this approach: its exponential complexity could limit its scalability to large and symmetric graphs. In contrast, the positional encoding approach proposed by Ma et al. has polynomial time complexity. While Ma et al. didn’t provide their code, it would be helpful for the authors to compare the proposed reward scaling approach with Ma et al.’s approach at different scales. Specifically, the authors could compare the following two methods:
- The proposed reward scaling approach, which requires computing the automorphism group of terminal graphs.
- Computing the random walk positional encodings of each graph in the trajectory, e.g., using the torch_geometric.transforms.AddRandomWalkPE function from the PyG package. Since Ma et al. didn’t provide their code implementation, the authors don’t need to use positional encodings to check for orbit-equivalent actions.
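For concreteness, here is a minimal, dependency-free sketch of the quantity AddRandomWalkPE computes, namely the diagonal return probabilities of powers of the random-walk matrix P = D^{-1}A (the function name and toy graph are my own illustration, not from PyG):

```python
def random_walk_pe(adj, k):
    """adj: dense 0/1 adjacency matrix (list of lists). Returns, for each
    node i, the list [(P^1)_ii, ..., (P^k)_ii] where P = D^{-1} A is the
    random-walk transition matrix."""
    n = len(adj)
    # Row-normalize the adjacency matrix into the random-walk matrix P.
    deg = [max(sum(row), 1) for row in adj]
    P = [[adj[i][j] / deg[i] for j in range(n)] for i in range(n)]
    pe = [[] for _ in range(n)]
    Pk = P  # current power P^k
    for _ in range(k):
        for i in range(n):
            pe[i].append(Pk[i][i])  # return probability after k steps
        # Pk <- Pk @ P
        Pk = [[sum(Pk[i][m] * P[m][j] for m in range(n)) for j in range(n)]
              for i in range(n)]
    return pe

# 4-cycle: every node is structurally equivalent, so all PEs coincide,
# and return probabilities vanish at odd steps (the graph is bipartite).
cycle4 = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]
pe = random_walk_pe(cycle4, 3)
print(pe[0])  # [0.0, 0.5, 0.0]
```

Nodes (or edges) whose PE vectors coincide are candidates for equivalent actions, which is what makes this a polynomial-time approximation.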
Theoretical Claims
I reviewed the theoretical proofs and didn’t find any errors. By the way, in Theorem 4.4, when you mention that the state-action flow constraints are satisfied, do you mean that they should be satisfied for every possible action that leads from one state to the next? It would be clearer if you could clarify this in the theorem statement to prevent any confusion.
Experimental Design and Analysis
I’ve thoroughly examined the soundness of all experiment settings. The proposed method appears to be effective in generating synthetic and fragment-based molecules. However, it shows limited improvement in generating atom-based molecules. My understanding is that symmetries are uncommon in real-world molecule graphs, and as the graph size increases, these symmetries become even rarer. Consequently, the proposed method may not scale well to large graphs. The authors could validate this by measuring the ratio of equivalent actions in their tasks.
Additionally, I have some concerns regarding the synthetic experiment results:
- In Figure 3(c), why is the performance of reward scaling consistently lower than transition and orbit correction? I suspect that the reward scaling approach only corrects the terminal probabilities, while the intermediate probabilities remain incorrect. Therefore, for methods that rely on intermediate probabilities, such as detailed balance, the performance would still be negatively impacted. This limitation is particularly significant considering that detailed balance outperforms trajectory balance in your task.
- In Figure 3(a), why are there some outliers that deviate from the straight line? From your theoretical analysis, both reward scaling and transition correction are expected to yield perfect terminal probabilities. Consequently, I would anticipate the predicted probabilities in Figure 3(a) to form a perfect straight line.
Supplementary Material
I reviewed the code provided in the supplementary material and didn’t find any problems.
Relation to Prior Literature
This paper makes a significant and broad contribution to the application of GFlowNets in graph generation tasks. In theory, any application of GFlowNets in this domain should incorporate these correction methods to ensure accurate prediction of rewards.
Missing Essential References
I couldn’t find any missing essential references.
Other Strengths and Weaknesses
The strengths and weaknesses of this paper have been thoroughly discussed in the preceding points, and I have nothing further to contribute.
Other Comments or Suggestions
A minor point on Appendix H.2: you mentioned augmenting edge-level representations with shortest-path lengths. However, shortest-path lengths aren’t very expressive, and there are many edge cases where non-equivalent edges in the same graph can have the same shortest-path lengths. I suggest augmenting edge-level representations with the off-diagonal elements of the powers of the random walk matrix. As per the results of Ma et al., this approach should yield nearly perfect performance.
Thank you for your helpful feedback and for recognizing our work as a significant contribution to the community!
Time Comparison with the Positional Embedding (PE)
Since PE must be computed at every transition, its computational cost scales poorly with the trajectory horizon. To evaluate how each method scales with the number of transitions, we sampled random graphs with varying transition requirements. We measured the time spent only on the major components of each method (e.g., computing automorphisms for Reward Scaling). 8-dimensional PEs were computed for the PE method. The table below shows the additional computational cost incurred by each method.
Plot link: https://drive.google.com/file/d/1hhB0TVyHKmclFJ65O__BlX74fRUvNtrS/view?usp=sharing
| Method | 10 Transitions (ms) | 50 Transitions (ms) | 100 Transitions (ms) |
|---|---|---|---|
| Transition Correction | 24.32 ± 6.28 | 1148 ± 240.3 | 7354 ± 1288 |
| PE (Ma et al., 2024) | 5.49 ± 0.58 | 186.7 ± 29.87 | 997.5 ± 133.9 |
| Orbit Correction (Ours) | 0.215 ± 0.025 | 2.106 ± 0.116 | 6.975 ± 0.421 |
| Reward Scaling (Ours) | 0.024 ± 0.002 | 0.063 ± 0.004 | 0.111 ± 0.008 |
The experiments clearly show that our methods (Orbit Correction, Reward Scaling) scale significantly better. Note that the reported time is required for each trajectory and accumulates over the course of training.
This indicates that the exponential worst-case complexity of our method is not the current bottleneck in GFlowNets. We can compute the size of the automorphism group in just a few milliseconds, even for graphs with thousands of nodes—far beyond the scale of current applications, which involve fewer than a hundred nodes.
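To make the claim concrete, below is a brute-force reference counter for |Aut(G)|. This is illustrative only: it enumerates all n! permutations, whereas bliss and nauty (which we actually rely on) use far more efficient canonical-search algorithms.

```python
from itertools import permutations

def aut_size(n, edges):
    """Count automorphisms of an undirected graph on nodes 0..n-1 by
    brute force. Reference implementation for tiny graphs only."""
    eset = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        # perm is an automorphism iff it maps every edge to an edge.
        if all(frozenset((perm[a], perm[b])) in eset for a, b in edges):
            count += 1
    return count

# Triangle (complete graph K3): all 3! = 6 permutations preserve edges.
# 4-cycle: the dihedral group of order 8.
print(aut_size(3, [(0, 1), (1, 2), (0, 2)]))          # 6
print(aut_size(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 8
```

In practice the count is needed only once per trajectory, at the terminal graph, which is why Reward Scaling is so cheap in the table above.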
Clarification of Theorem 4.4
Your understanding of the theorem is correct: the flow constraints must be satisfied for all actions and states. Thank you for giving us the opportunity to clarify this, and we will further revise the manuscript in the updated version.
On Atom-based Generation Task
You conjecture that symmetries are uncommon in large, real-world molecular graphs. Indeed, we observed that the average symmetry in the synthetic environment is 2.6, whereas molecules in the QM9 training dataset have an average symmetry of 1.3. As you pointed out, this partly explains the limited improvement in the atom-based task. However, a few important points are worth noting:
- Our method does not inherently guarantee improved sample diversity, as this depends on the reward landscape.
- The average symmetry of molecules in the ZINC250k dataset is 2.44, despite containing larger molecules than QM9 (38 atoms vs. 9 atoms). Additionally, the Spearman correlation between the number of atoms and symmetry is 0.08 in ZINC250k, indicating that real-world molecular graphs do not necessarily exhibit fewer symmetries as they grow larger.
- For larger graphs, where only fragment-based generation is practical, what matters is the symmetry of fragments, as demonstrated in our paper.
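For completeness, the Spearman coefficient we report is simply the Pearson correlation of tie-averaged ranks. A minimal sketch follows; the atom counts and symmetry values below are made-up illustrative numbers, not ZINC250k data.

```python
def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the
    (tie-averaged) ranks of x and y."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            # Extend j over the group of tied values.
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical atom counts vs. symmetry sizes: a weak monotone
# relationship yields a small positive coefficient.
rho = spearman([9, 12, 20, 38], [1.2, 1.1, 2.4, 1.3])
```

A coefficient near zero, as in ZINC250k (0.08), indicates that molecule size tells us almost nothing about how symmetric the molecule is.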
On the Performance of Reward Scaling in Figure 3(c)
You are correct in pointing out that reward scaling only corrects the terminal probabilities, while the intermediate probabilities remain incorrect. In contrast, both transition and orbit correction methods utilize the correct intermediate probabilities. This explains the performance gap between Reward Scaling and Orbit Correction with detailed balance, as it depends on intermediate probabilities. As mentioned in the paper, this provides additional signal, leading to faster convergence.
On Outliers in Figure 3(a)
We are happy to elaborate on this. It turns out that the previous experiment stopped training too early. We resumed training from the last checkpoint with a larger batch size and a lower learning rate, obtaining an almost straight line!
See the plot in link: https://drive.google.com/file/d/1IHdLn5Ht4zlZ_frqeEOk31X-F-rrIKzC/view?usp=sharing
In addition, the outliers in Fig. 3(a) correspond to graphs that are challenging to represent using GNNs; compare Figure 3(a) with Figure 10. As mentioned in the paper, the expressive power of the GNN is another source of incorrect learning, as it can give the same representation to actions in different orbits.
Your suggestion to use the off-diagonal elements of the random walk matrix is insightful. However, in this simple environment, shortest-path lengths provide sufficient information to distinguish different actions. The main limitation was that the information is processed by a simple MLP with only one hidden layer. We observed similar learning dynamics with the random walk matrix. While other approaches could address this issue, the current setup already serves its illustrative purpose.
We hope this clarifies our points and addresses your concerns.
Thank you for your detailed response, and I’ve adjusted my score accordingly. I suggest the authors include the discussion on intermediate probabilities in the related work and experiment sections, as it’s currently absent from the paper. Specifically, reward scaling significantly improves computational efficiency, but it comes at the cost of inaccurate intermediate probabilities and a slight performance drop for detailed balance.
Thank you very much for your thoughtful follow-up, for updating your score, and for your support. We sincerely appreciate your engagement with our work and your valuable suggestions throughout the review process.
We appreciate your recommendation to clarify the role of intermediate probabilities and the trade-off between computational efficiency and performance. We will revise the manuscript to incorporate this discussion more clearly in the related work and experimental sections, as suggested.
Thank you again for your time and constructive feedback, which have helped us improve the clarity of our results.
The authors claim that when applying GFlowNets to the graph generation problem, the presence of equivalent actions—where different actions produce the same graph structure—introduces additional bias. Unlike previous works that require computationally expensive calculations at each iteration, the authors propose a simple yet effective approach to compute the number of graph automorphisms only in the final iteration. This is achieved by modifying the Trajectory Balance (TB) and Detailed Balance (DB) losses based on their theoretical proof. The authors demonstrate the effectiveness of their proposed loss function using both synthetic data and the benchmark molecular generation task.
Update after rebuttal
My remaining concerns have been well addressed.
Questions for Authors
- I have a remaining concern regarding the method's reliance on 'bliss' for calculating the number of automorphisms. I believe it may occasionally produce incorrect outputs. Could you provide a sensitivity analysis to assess the impact of incorrect automorphism calculations and demonstrate the robustness of the proposed method?
- Could you provide the total training time compared to exact transition correction or prior work using positional encoding to demonstrate the significance of this method?
Claims and Evidence
I agree that the problem identified by the authors is important but has not been well studied. I also find the proposed method to be theoretically sound and a practical approach to addressing graph symmetry.
Methods and Evaluation Criteria
The illustrative example and synthetic graphs effectively demonstrate that the proposed Reward Scaling mitigates the bias present in vanilla GFlowNets, which do not account for graph symmetry. Additionally, they show that this strategy performs as well as transition correction while requiring significantly fewer computational resources.
Theoretical Claims
The theoretical claim that handling only orbit equivalence, instead of transition equivalence, is sufficient is valid. Additionally, their modified TB and DB losses, derived through ratio-based automorphism correction, are sound.
Experimental Design and Analysis
The experiments using illustrative examples and synthetic graphs validate the proposed reward scaling method, showing competitive results compared to transition correction, which requires high computational cost. Additionally, experiments on molecule generation demonstrate the practicality of this method.
Supplementary Material
I checked the supplementary materials A, B, C, D, and I.
Relation to Prior Literature
Since this work addresses the challenges commonly encountered in graph generation tasks using GFlowNets and successfully solves them with minimal computational cost, I believe it has the potential to make a significant impact in this area.
Missing Essential References
N/A
Other Strengths and Weaknesses
The presentation of this paper is clear and easy to follow, making it accessible to readers with varying levels of expertise in the field.
Other Comments or Suggestions
N/A
We sincerely appreciate your thoughtful feedback and your recognition of our work's potential impact in this area!
Impact of Incorrect Automorphism Counting
While we are uncertain about the conditions under which the bliss algorithm might fail, we believe it produces exact outputs given the current scale of GFlowNet applications (with fewer than 100 nodes). Moreover, reliance on bliss is not essential, as popular alternatives such as nauty are available.
That said, incorrect automorphism calculations can result in slow convergence or an inaccurate policy, with the exact impact depending on the reward landscape. We consider two specific cases separately.
Random Counting: We consider a scenario where the software randomly miscounts the size of the automorphism group in 10% of cases—doubling it 5% of the time and halving it 5% of the time. Since the GFlowNet objectives are constructed in log space, this amounts to zero-mean random noise of ±log 2, which incurs no bias.
Systematic Miscounting: We implemented an approximate automorphism counter using the Weisfeiler-Lehman (WL) algorithm. If the final coloring has m distinct color classes, and class i has size k_i, then the approximation is computed as the product of the k_i! over all classes. This assumes that nodes within each class can be permuted freely, which yields an upper bound on the true automorphism count.
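A minimal sketch of this WL-based upper bound (our actual implementation may differ in minor details):

```python
from collections import Counter
from math import factorial

def wl_aut_upper_bound(n, edges):
    """Approximate |Aut(G)| from above: refine colors with 1-WL until
    stable, then assume nodes sharing a final color permute freely,
    giving the product of k_i! over color-class sizes k_i."""
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    color = [0] * n
    while True:
        # Signature = own color + multiset of neighbor colors.
        sig = [(color[v], tuple(sorted(color[u] for u in nbrs[v])))
               for v in range(n)]
        relabel = {s: i for i, s in enumerate(sorted(set(sig)))}
        new = [relabel[s] for s in sig]
        if new == color:  # coloring is stable
            break
        color = new
    bound = 1
    for k in Counter(color).values():
        bound *= factorial(k)
    return bound

# Triangle: bound 3! = 6 equals the true count. 4-cycle: WL cannot split
# the single color class, so the bound is 4! = 24 while the true count is
# only 8 -- exactly the kind of crudeness discussed above.
print(wl_aut_upper_bound(3, [(0, 1), (1, 2), (0, 2)]))       # 6
print(wl_aut_upper_bound(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 24
```

The 4-cycle example shows why this approximation can be too crude: highly regular graphs keep large color classes, inflating the bound.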
Experimental Results
We conducted a simple experiment in a relatively small environment using the TB objective with Reward Scaling. As shown in the plot linked below, Random Counting converges to the exact method but does so slowly. The approximation using the WL algorithm proved too crude for this environment; a more accurate approximation would align better with our method.
https://drive.google.com/file/d/12m4uqiZdPxcpsaUYowD7ZWUh3wYCGz99/view?usp=sharing
Additionally, we implemented the PE method of Ma et al. (2024), which gives an approximate solution to the problem. The experiments were conducted on two synthetic environments: Cliques and Cycles. While it underperformed compared to ours, it proved to be far better than the Vanilla GFlowNets, which highlights the importance of removing bias.
| Method | Cliques (L1 Error) | Cycles (L1 Error) |
|---|---|---|
| Vanilla | 0.693 ± 0.001 | 0.463 ± 0.001 |
| PE (Ma et al., 2024) | 0.476 ± 0.026 | 0.233 ± 0.005 |
| Reward Scaling (Ours) | 0.464 ± 0.011 | 0.189 ± 0.002 |
We obtained similar results in the fragment-based generation task (reported on page 8), where we proposed an approximate solution.
Overall, miscalculating automorphisms can degrade performance. However, as long as we can approximate them well, we should always apply the necessary corrections to eliminate bias.
Scalability Comparison
Our methods scale far better than other correction methods. We evaluated each method, varying the number of transitions per trajectory. We sampled 100 random graphs for each horizon-length category. We measured the time spent only on the major components of each method (e.g., computing automorphisms for Reward Scaling). The table below shows the additional computational cost (compared to Vanilla GFlowNets) incurred by each method per trajectory.
Plot link: https://drive.google.com/file/d/1hhB0TVyHKmclFJ65O__BlX74fRUvNtrS/view?usp=sharing
| Method | 10 Transitions (ms) | 50 Transitions (ms) | 100 Transitions (ms) |
|---|---|---|---|
| Transition Correction | 24.32 ± 6.28 | 1148 ± 240.3 | 7354 ± 1288 |
| PE (Ma et al., 2024) | 5.49 ± 0.58 | 186.7 ± 29.87 | 997.5 ± 133.9 |
| Orbit Correction (Ours) | 0.215 ± 0.025 | 2.106 ± 0.116 | 6.975 ± 0.421 |
| Reward Scaling (Ours) | 0.024 ± 0.002 | 0.063 ± 0.004 | 0.111 ± 0.008 |
While the cost increases for all methods as the number of transitions grows, our method clearly scales better. It is important to note that the time differences accumulate over the entire training duration. Our implementation of positional encoding is adapted from torch_geometric.transforms.AddRandomWalkPE.
Training Time Comparison
We report the total training time for each method, including positional encoding (PE) method of Ma et al. (2024). The results for each training step are summarized in the table below.
| Method | 1000 Steps (s) | 3000 Steps (s) | 5000 Steps (s) |
|---|---|---|---|
| Transition Correction | 1338 ± 79 | 4122 ± 168 | 6859 ± 80 |
| PE (Ma et al., 2024) | 1322 ± 44 | 4001 ± 131 | 6604 ± 152 |
| Orbit Correction (Ours) | 1176 ± 32 | 3577 ± 100 | 5987 ± 168 |
| Reward Scaling (Ours) | 1178 ± 31 | 3584 ± 97 | 5999 ± 146 |
While PE is faster than Transition Correction (which performs several isomorphism tests), our methods (Orbit Correction, Reward Scaling) are faster still. We used a single machine with one 24GB TITAN RTX GPU and an Intel Xeon Silver 4216 CPU.
Overall, our method has lower computational cost compared to other methods and scales better.
Thank you for your thorough response to my concerns. All of these concerns have been well addressed.
We are happy to hear that your concerns have been fully addressed, and we appreciate your thoughtful follow-up and continued support. Your feedback throughout the review process has helped us improve the clarity of our work.
The authors propose a formal characterization of the equivalent action problem, first outlined by Ma et al. (2024), along with a simple fix by re-scaling the reward to account for automorphisms.
Questions for Authors
N/A
Claims and Evidence
To my best judgment, the work is theoretically sound, and the authors provide compelling evidence that their method fixes the pathology they set up to solve—with a few caveats pointed out below.
Methods and Evaluation Criteria
The comparison should include Ma et al. (2024)'s approach, including comparisons regarding computing time.
Theoretical Claims
I didn't check the proofs closely, but I found the notation overloaded and sometimes confusing. For instance, the same symbol is used for sets, edges, and graphs. Another symbol is used both to denote a set of ordered pairs of graphs and to denote a function outputting the set of graphs reachable from a given graph in a single transition. There are also things that are not formally defined, such as a notation that I guess assumes an equivalence class of graphs.
Experimental Design and Analysis
I like the synthetic experiment showcasing the pathology in question. However, I would also like to point out that analyzing the results in Table 1 alone might be misleading, as the quantities presented do not measure sampling correctness. A measure like [1]'s FCS would better proxy distributional correctness.
Supplementary Material
I skimmed through the appendices.
Relation to Prior Literature
This work addresses a well-known issue in the GFlowNet literature with an easy and provably correct fix --- which could become a standard when implementing GFlowNets to sample from distributions over anonymous graphs.
Missing Essential References
To the best of my knowledge, the authors properly recognize the prior art.
Other Strengths and Weaknesses
Strengths
- Proposes a formal treatment of the equivalent actions problem, along with an easy fix by rescaling the reward
- The manuscript is well written, and the developments are mostly easy to follow
- Nice illustration in a synthetic experiment, along with experiments in molecule generation
Weaknesses
- Missing direct comparison with Ma et al.'s (2024) original fix
- Measures of performance for the molecule generation experiments do not reflect goodness-of-fit (i.e., distributional correctness)
- Notation is sometimes confusing (see "Theoretical Claims" above)
Other Comments or Suggestions
N/A
We appreciate your thoughtful feedback and your recognition of our work as important.
Comparison with Ma et al. (2024)
Implementation of Positional Encoding (PE) method
To better position our work, we implemented the PE method to compare with Ma et al. (2024) and made our best effort to reproduce their approach. We computed 8-dimensional PEs for each node, with edge-level PEs obtained by summing the corresponding node-level PEs. Finally, actions sampled from the policy were compared with other forward/backward actions based on their PEs to identify equivalent actions.
Performance Comparison
Since PE is an approximate solution, it underperforms compared to our method in terms of error. We validated this in two synthetic environments: the Cliques and Cycles environment. The table below presents the results for the Trajectory Balance objective.
| Method | Cliques (L1 Error) | Cycles (L1 Error) |
|---|---|---|
| PE (Ma et al., 2024) | 0.476 ± 0.026 | 0.233 ± 0.005 |
| Reward Scaling | 0.464 ± 0.011 | 0.189 ± 0.002 |
For the Detailed Balance objective, however, it is difficult to predict whether Reward Scaling will outperform, as the PE method corrects intermediate probabilities similarly to Orbit Correction. In our experiments, PE performed on par with Orbit Correction.
| Method | Cliques (L1 Error) | Cycles (L1 Error) |
|---|---|---|
| PE (Ma et al., 2024) | 0.129 ± 0.007 | 0.190 ± 0.009 |
| Orbit Correction (Ours) | 0.122 ± 0.008 | 0.184 ± 0.005 |
| Reward Scaling (Ours) | 0.150 ± 0.005 | 0.188 ± 0.025 |
However, the PE method, as proposed in Ma et al. (2024), requires task-specific tuning and is not applicable to fragment-based generation. Therefore, we do not recommend using PE. Orbit Correction offers clear advantages over PE: exact correction, faster computation, broader applicability, and ease of implementation.
Scalability Comparison
Since PE must be computed at every transition, its computational cost scales poorly with the trajectory horizon. We evaluated each method, varying the number of transitions per trajectory. We sampled 100 random graphs for each horizon length category. The table below shows the additional computational cost (compared to Vanilla GFlowNets) incurred by each method per trajectory.
Plot link: https://drive.google.com/file/d/1hhB0TVyHKmclFJ65O__BlX74fRUvNtrS/view?usp=sharing
| Method | 10 Transitions (ms) | 50 Transitions (ms) | 100 Transitions (ms) |
|---|---|---|---|
| Transition Correction | 24.32 ± 6.28 | 1148 ± 240.3 | 7354 ± 1288 |
| PE (Ma et al., 2024) | 5.49 ± 0.58 | 186.7 ± 29.87 | 997.5 ± 133.9 |
| Orbit Correction (Ours) | 0.215 ± 0.025 | 2.106 ± 0.116 | 6.975 ± 0.421 |
| Reward Scaling (Ours) | 0.024 ± 0.002 | 0.063 ± 0.004 | 0.111 ± 0.008 |
While the cost increases for all methods as the number of transitions grows, our method clearly scales better. It is important to note that the time differences accumulate over the entire training duration.
Training Time Comparison
While PE is faster than Transition Correction (which performs several isomorphism tests), our methods (Orbit Correction, Reward Scaling) are faster. We present the actual running time of each method on the synthetic environment presented in the paper.
| Method | 1000 Steps (s) | 3000 Steps (s) | 5000 Steps (s) |
|---|---|---|---|
| Transition Correction | 1338 ± 79 | 4122 ± 168 | 6859 ± 80 |
| PE (Ma et al., 2024) | 1322 ± 44 | 4001 ± 131 | 6604 ± 152 |
| Orbit Correction (Ours) | 1176 ± 32 | 3577 ± 100 | 5987 ± 168 |
| Reward Scaling (Ours) | 1178 ± 31 | 3584 ± 97 | 5999 ± 146 |
We used a single machine with one 24GB TITAN RTX GPU and an Intel Xeon Silver 4216 CPU.
Measure of Goodness-of-Fit
We emphasize that the purpose of the molecule generation task is to assess whether the correction methods can improve the generation of high-reward samples in realistic reward settings, rather than to demonstrate the correctness of our method. Therefore, we included the correlation plot only for fragment-based generation (Fig. 4). We obtained improved results with correction for atom-based generation as well, which will be incorporated into the revised version.
We used correlation as a metric because it is a widely adopted measure of goodness-of-fit in several previous works (Malkin et al., 2022; Malkin et al., 2023; Madan et al., 2023). While we considered your suggestion to use the recently proposed FCS score (Silva et al., 2025), we concluded that further investigation is needed to determine the best practice for its application.
On Notations
While we summarized our notations in Appendix A, we acknowledge that some of our notation can be confusing. We are carefully reviewing our notation and theoretical explanations to make the paper more readable in the revised version.
We hope this addresses your concerns regarding the weaknesses of our paper.
This paper addresses the equivalent action problem in GFlowNets for graph generation, providing a theoretical foundation and solution to a bias previously identified in Ma et al. (2024). While the paper offers valuable theoretical insights and formal proofs, the core solution (reward scaling based on automorphism counts) appears to be fundamentally the same as what was previously proposed. The paper's novelty is therefore primarily in its theoretical analysis rather than in the proposed method itself, raising questions about sufficient contribution for publication.
Questions for Authors
na
Claims and Evidence
The paper's claims are mathematically proven in the appendix. The authors establish that:
- GFlowNets without correction exhibit systematic bias toward graphs with fewer symmetries in atom-based generation and toward symmetric components in fragment-based generation.
- This bias can be corrected by scaling rewards based on the automorphism group size.
- This correction is sufficient for both TB and DB objectives.
These claims are supported by mathematical proofs and experimental validation.
Methods and Evaluation Criteria
The proposed reward scaling method is mathematically sound and the evaluation metrics used are relevant:
- L1 error for synthetic experiments
- Diversity, Top K diversity, reward metrics for molecule generation
These are appropriate for measuring both the theoretical correctness and practical utility of the approach.
Theoretical Claims
These claims are supported by mathematical proofs that appear correct.
Experimental Design and Analysis
The illustrative example with uniform reward is particularly effective at visualizing the bias. The experimental settings span synthetic graphs and molecule generation tasks, providing good coverage of potential applications.
One weakness is the lack of direct experimental comparison with the positional encoding approach from Ma et al. (2024), which would have strengthened the paper's positioning.
Supplementary Material
skimmed only
Relation to Prior Literature
The paper's relationship to Ma et al. (2024) is the most critical aspect to evaluate. The authors acknowledge that Ma et al. (2024) was the first to identify the equivalent action problem in GFlowNets and propose methods to address it.
However, the core solution in both papers appears to be fundamentally the same: accounting for graph automorphisms to correct the sampling bias. While this paper provides a more rigorous theoretical foundation and generalizes to fragment-based generation, the central mechanism of the solution is not novel.
The paper states: "our work provides the first rigorous theoretical foundation for the correction," which is accurate, but the actual solution method appears to be a formalization of what was already proposed rather than a new approach.
Missing Essential References
na
Other Strengths and Weaknesses
Appendix C contains an unclear explanation. The statement "permuting nodes 3 and 4 yields the same graph" doesn't seem accurate for the example shown. A more complex permutation (e.g., 1→4, 2→5, 3→1, 4→3, 5→6, 6→2) would be needed to map between the two graphs.
Otherwise, the paper is well-written and easy to follow
Other Comments or Suggestions
na
We sincerely appreciate your thoughtful comments and valuable feedback. In particular, we hope that our response helps the evaluation recognize our contributions, especially in providing a rigorous theoretical foundation and conducting extensive experiments on the problem.
Clarification on the Differences from Ma et al. (2024)
Our proposed approach is methodologically different from that of Ma et al. (2024). Their method explicitly identifies actions that lead to the same next state to compute exact transition probabilities, using positional encoding as an approximate way to find such actions. In contrast, our approach is based on Theorem 4.6 and does not rely on identifying equivalent actions. This brings clear operational advantages over the approach of Ma et al. (2024).
- Exactness: While our method provides an exact bias correction, the PE method can only approximate the target distribution.
- Faster Computation: Our method is orders of magnitude faster than the PE method, as PE must be computed for every transition (see Appendix I and the comparison below).
- Broader Applicability: Our method is applicable to any graph type and can be extended to fragment-based generation with minimal modifications. In contrast, the PE method as proposed in Ma et al. (2024) cannot be applied to graphs with edge types or fragment-based generation.
- Straightforward Implementation: Our method requires only scaling the final reward, making it straightforward to implement. In contrast, the PE method must identify both forward and backward equivalent actions and sum their probabilities, which may require substantial modifications to existing codebases.
One of our key theoretical findings is that graph automorphism is central to this problem, and in this sense, all solutions may appear fundamentally similar in hindsight. However, our approach offers clear theoretical and practical advantages, providing exact bias correction while maintaining broad applicability across different graph types. We further validate these arguments in the following comments.
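To illustrate the straightforward-implementation point: assuming the correction amounts to multiplying the terminal reward R(x) by |Aut(x)| (the exact factor follows from Theorem 4.6), the TB objective changes by a single additive term in log space. The sketch below is illustrative, not our actual training code.

```python
import math

def tb_loss_reward_scaled(log_Z, log_pf, log_pb, log_reward, aut_size):
    """Squared Trajectory Balance residual for one trajectory ending in
    graph x, with the reward-scaling correction applied in log space.
    log_pf / log_pb: summed log forward / backward action probabilities
    along the trajectory; aut_size: |Aut(x)| of the terminal graph."""
    scaled_log_reward = log_reward + math.log(aut_size)  # R(x) * |Aut(x)|
    delta = log_Z + log_pf - scaled_log_reward - log_pb
    return delta ** 2

# With |Aut(x)| = 1 the correction vanishes, so a balanced trajectory
# gives zero loss; a symmetric terminal graph shifts the target upward.
loss = tb_loss_reward_scaled(0.0, math.log(0.5), 0.0, math.log(0.5), 1)
```

Because the correction touches only the terminal reward, no equivalent-action bookkeeping is needed along the trajectory, which is the operational difference from the PE approach.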
Experimental Comparison with Ma et al. (2024)
To better position our work, we implemented the PE method to compare with Ma et al. (2024) and made our best effort to reproduce their approach. Implementation details, as well as plots for additional experiments, can be viewed through the following link: https://docs.google.com/document/d/1GhQgcaxQ32uXM7s6xfteGWsBovdm-JvQNW_m3hAG30Q/edit?usp=sharing
Performance Comparison
Since PE is an approximate solution, it underperforms compared to ours in terms of error. We validated this in two synthetic environments: Cliques and Cycles. The results are presented using the Trajectory Balance objective, while the results for the Detailed Balance objective are provided in the linked document.
| Method | Cliques (L1 Error) | Cycles (L1 Error) |
|---|---|---|
| PE (Ma et al., 2024) | 0.476 ± 0.026 | 0.233 ± 0.005 |
| Reward Scaling (Ours) | 0.464 ± 0.011 | 0.189 ± 0.002 |
Scalability Comparison
Since PE must be computed at every transition, its computational cost scales poorly with the trajectory horizon. We evaluated each method, varying the number of transitions per trajectory. We sampled 100 random graphs for each horizon length category. We measured the time spent on only the major components of each method (e.g., computing automorphisms for Reward Scaling). The table below shows the additional computational cost incurred by each method per trajectory.
| Method | 10 Transitions (ms) | 50 Transitions (ms) | 100 Transitions (ms) |
|---|---|---|---|
| PE (Ma et al., 2024) | 5.49 ± 0.58 | 186.7 ± 29.87 | 997.5 ± 133.9 |
| Orbit Correction (Ours) | 0.215 ± 0.025 | 2.106 ± 0.116 | 6.975 ± 0.421 |
| Reward Scaling (Ours) | 0.024 ± 0.002 | 0.063 ± 0.004 | 0.111 ± 0.008 |
While the cost increases for all methods as the number of transitions grows, our method clearly scales better. It is important to note that the time differences accumulate over the entire training duration.
Training Time Comparison
We measured the actual training time with the same setting used for our paper.
| Method | 1000 Steps (s) | 3000 Steps (s) | 5000 Steps (s) |
|---|---|---|---|
| PE (Ma et al., 2024) | 1322 ± 44 | 4001 ± 131 | 6604 ± 152 |
| Orbit Correction (Ours) | 1176 ± 32 | 3577 ± 100 | 5987 ± 168 |
| Reward Scaling (Ours) | 1178 ± 31 | 3584 ± 97 | 5999 ± 146 |
Our method is faster than the PE method in terms of actual training time.
Appendix C: We appreciate the feedback and will correct this in the revised version.
Conclusion
Overall, our method offers clear advantages over Ma et al. (2024), which are validated through additional experiments. This demonstrates the novelty of our solution. We compared our method with Ma et al. (2024) to better position our work, and we will incorporate these results into the revised version.
This paper addresses the symmetry-induced bias in GFlowNets by proposing a reward scaling method based on graph automorphisms. While this paper is heavily inspired by a prior work (Ma et al., 2024), the authors clearly demonstrate improved scalability and methodological contributions on being able to correct biases without requiring transition-level equivalence checks. Empirical results on synthetic and molecular tasks support the method’s effectiveness. Overall, I recommend acceptance.