MetaGFN: Exploring Distant Modes with Adapted Metadynamics for Continuous GFlowNets
We develop an improved exploration strategy for continuous GFlowNets inspired by metadynamics enhanced sampling.
Abstract
Reviews and Discussion
This paper introduces MetaGFN, an exploration algorithm for continuous Generative Flow Networks (GFlowNets) that combines metadynamics with GFlowNets to improve exploration in continuous domains. The authors prove the consistency of their adapted metadynamics approach and conduct experiments to showcase the performance of the proposed MetaGFN method.
Strengths
- The writing is clear and well-structured
- The methodology is explained in sufficient detail
- Addresses an important challenge in training GFlowNets
Weaknesses
- Limited and unconvincing experimental evaluation:
- The experimental evaluation relies exclusively on simple toy environments that fail to capture the complexity and challenges of real-world applications, especially the small-scale line and grid tasks. The authors could consider the more complex molecule generation and biological sequence design tasks from (Bengio et al., 2021) and (Jain et al., 2022), which involve high-dimensional and complex spaces.
- The authors have not demonstrated the method's effectiveness in scenarios where exploration is genuinely challenging or where reward signals are naturally rare
- The results exhibit high variance (column 2; row 2 in column 3; rows 1-2 in column 4; rows 1-2 in column 5 of Fig. 4) and high L1 loss values (all panels in Fig. 4), contradicting the paper's claims about robustness
- Insufficient empirical evidence for claims:
- The paper claims that MetaGFN "is the most robust" (line 375) variant, yet this assertion is questionable given the high L1 loss values and large variances shown in the results (Fig. 4)
- Scalability concerns:
- No evaluation in high-dimensional spaces, which is crucial for real-world applications
- Missing discussion of potential limitations in high-dimensional settings
- Novelty concerns:
- The paper primarily combines two existing methods (metadynamics and GFlowNets), but the experimental results do not convincingly demonstrate advantages over simpler alternatives, and the practical utility of the method remains uncertain given the limited evaluation. Therefore, the contribution seems incremental without strong empirical support
Questions
- How does MetaGFN perform in more challenging exploration settings? This is crucial for practical applications but isn't addressed in the current experiments.
- How does the method scale to higher-dimensional problems? The current experiments are limited to low-dimensional spaces.
The authors work on exploration for GFlowNets in the continuous setting. Specifically, they combine continuous GFlowNets with metadynamics, an exploration strategy based on Langevin dynamics. They validate their method on a few small-scale tasks.
Strengths
- Connecting continuous GFlowNets and metadynamics is novel.
- Authors provide theoretical motivation for their approach.
- The proposed method works on the provided environments.
Weaknesses
My biggest concern is that the authors fail to show their method's applicability in larger environments. The authors state this in their limitations, but the method then loses much of its significance, especially since the novelty is rather limited. Even in the provided experiments, the results are not significantly better than the baselines. I would also like to see a runtime analysis against the baselines, which might make the work more appealing. Finally, while I understand the importance of the theoretical presentation, an intuitive explanation would greatly improve the quality of the work.
Questions
- How does the method work for flow matching?
- What is the condition for terminating states in the hypergrid? In the work, I see , but to what extent does the approximation hold?
This paper adapts the metadynamics technique from molecular modelling to improve mode exploration in continuous GFlowNets. The approach works in the setting of black-box reward distributions without access to reward gradients, employing kernel density estimates in a lower-dimensional collective variable space. The experimental evaluation is carried out on three continuous environments, showing that the proposed MetaGFN approach outperforms various existing GFlowNet exploration techniques.
Strengths
This paper tackles a challenge that arises in GFlowNet training — mode exploration, which has received especially little attention in the literature in the case of continuous environments. Adapting metadynamics technique from molecular modelling, the paper falls into the important category of cross-disciplinary research.
The key strength of the paper, in my opinion, is the thorough experimental evaluation in the presented environments. In addition to the proposed MetaGFN approach, the paper presents results for a number of baseline techniques that were proposed in previous works for discrete GFlowNets and adapted by the authors to the continuous case. A faithful comparison is carried out, showing that MetaGFN has superior performance on average, while also clearly stating its shortcomings in some cases. I found Figure 5 really insightful.
Weaknesses
I believe that the presentation of the proposed approach and the overall clarity of the paper can be greatly improved.
Without prior knowledge of metadynamics, it is challenging to understand the proposed approach from the text. Firstly, I suggest moving Section D of the Appendix to the main text, especially Algorithm 3, since it is crucial for understanding the overall structure of the MetaGFN algorithm. Moreover, even though it is possible to understand the exact method from Algorithms 1, 2 and 3, I still have very little intuition about it. Before going into low-level details in Section 3, I would suggest adding a more high-level discussion explaining how the method works, some intuition behind it, and why it should improve exploration in continuous GFlowNets (Section 2.4 contains some, but it still doesn't give much high-level intuition). This would make the paper more accessible to the broader ML community.
In addition, some important details are unclear from the description of the experiments. Firstly, I failed to find in the text how collective variables are chosen for the line and grid environments. The choice of collective variables is a crucial point in the algorithmic design of the proposed method, so emphasizing this is critical. Secondly, clear descriptions of the choice of forward and backward kernels are missing. Very little detail is given about the grid environment in particular. Is it the same as the one from [1]? It is possible that I missed some of these points while looking through the appendix, but then that is one more reason to highlight them.
Next, from my understanding, the efficiency of the method directly depends on the choice of the lower-dimensional collective variable space. However, the spaces of terminal states in all experiments are themselves very low-dimensional (the maximum dimension is 4), so the big question is whether the proposed method can perform well in higher-dimensional tasks. I therefore suggest adding a higher-dimensional environment to the experiments. The method can still be interesting and viable if it only performs well in lower dimensions, but then this should be clearly stated as a limitation, or experimentally demonstrated otherwise.
A minor detail: the PDF renders very slowly around Figure 4. I think reducing the density of the grids used to plot the training curves could help.
If all my concerns are addressed during rebuttal, I'm open to increasing my score.
References:
[1] Salem Lahlou, Tristan Deleu, Pablo Lemos, Dinghuai Zhang, Alexandra Volokhova, Alex Hernández-García, Léna Néhale Ezzine, Yoshua Bengio, Nikolay Malkin. A theory of continuous generative flow networks. ICML 2023
Questions
- Some experimental details are missing, see Weaknesses.
- Could the proposed approach be applied to diffusion sampling tasks, e.g. on environments from [2]? If yes, adding some to the experimental evaluation would further strengthen the paper.
- [2] introduces an approach where the Metropolis-adjusted Langevin algorithm (MALA) is used to explore the space of terminal states, store the results in a replay buffer, and then use the backward policy to sample trajectories ending in points stored in the buffer. It thus shares many similarities with the proposed method. Do I understand correctly that it cannot be employed in your setup because MALA requires access to the reward gradient? I believe it is important to add a discussion about this to the paper, so that the work is clearly positioned relative to previous literature.
- In the definition of Measurable pointed graph, line 122, is it enough to bound the number of steps required to be able to reach any measurable set to guarantee acyclicity of the environment? The construction in [1] explicitly states that for some , the support of should contain only the sink state.
References:
[2] Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin. Improved off-policy training of diffusion samplers. NeurIPS 2024
This paper investigates exploration strategies for continuous Generative Flow Networks (GFlowNets) by proposing a variant of metadynamics, termed Adapted Metadynamics. The proposed method, MetaGFN, applies to arbitrary black-box reward functions over continuous domains. The authors prove that Adapted Metadynamics is consistent and converges to standard metadynamics in a certain limit. They also empirically show that MetaGFN outperforms existing GFlowNet exploration strategies on continuous domains.
Strengths
The paper introduces a variant of metadynamics that enables exploration with black-box rewards in continuous spaces. It proves that the method reduces to standard metadynamics in a limit and provides empirical evidence that MetaGFN outperforms existing GFlowNet exploration strategies. The paper is well written and has a clear structure.
Weaknesses
- The process for selecting the low-dimensional collective variables is not clear. Could the authors give an example illustrating how to obtain these CVs?
- If the CVs can only be obtained from expert knowledge, does this limit the applicability of the method to certain domains?
- The choice of hyperparameters of KDE such as Gaussian width and Gaussian height is not fully explained. Could the authors give more details?
- Although the authors introduce adaptive metadynamics on pages 5-6, a more explicit breakdown of the differences and potential advantages of adaptive metadynamics over standard metadynamics would be helpful.
- The experiments only reach a maximum dimension of 4 in the grid environment. Is this truly representative of high-dimensional cases? Expanding the experiments to higher dimensions would be more convincing.
- The terms "Adaptive Metadynamics" and "Adapted Metadynamics" are used interchangeably in the paper. If both terms refer to Algorithm 1, it would be helpful to choose one term consistently throughout to avoid potential confusion.
Questions
The questions are the same as in the weaknesses section.
We thank the reviewer for their comments and for acknowledging the good presentation and soundness of the research. The reviewer pointed out several weaknesses, which we address below.
"[..] selecting collective variables is not clear. Could the authors give an example illustrating how to obtain these CVs?"
We have made major modifications to Section 2.4 (metadynamics and collective variables). As part of these changes, we explicitly highlight the three key features required of good collective variables. We also reference the classical TiCA method, which is one of the most popular ways of identifying CVs within the molecular dynamics literature. We include a reference to a popular review paper (Sidky et al., 2020), which offers a comprehensive overview of CV-learning approaches.
Note that in simple environments with high degrees of symmetry (such as the line or grid environment), suitable CVs can coincide with the natural (Cartesian) coordinates of these systems. We have now made that explicit (lines 286-387 and lines 415-416).
"If the CVs can only be obtained from expert knowledge, does this limit the applicability of the method to certain domains?"
CVs can be obtained in a data-driven approach without expert knowledge (see answer above). The concept of a CV is related to the intrinsic data dimension, which could be extracted in numerous ways e.g. PCA, TiCA, manifold learning algorithms etc. We believe that our rewording of Section 2.4 now addresses this point.
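As a purely illustrative sketch of the data-driven route mentioned above, the snippet below uses PCA on 2D points and takes the projection onto the leading principal axis as a 1D CV. It is not taken from the paper: the function names `leading_principal_axis` and `collective_variable`, the synthetic data, and the use of power iteration are all assumptions made for this example.

```python
import math

def leading_principal_axis(points, iters=100):
    """Return (mean, unit axis) of the top principal component of 2D points.

    Uses power iteration on the 2x2 covariance matrix. Assumes the points
    are not all identical, otherwise the normalisation divides by zero.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    v = (1.0, 0.5)  # arbitrary starting direction
    for _ in range(iters):
        x = cxx * v[0] + cxy * v[1]
        y = cxy * v[0] + cyy * v[1]
        norm = math.hypot(x, y)
        v = (x / norm, y / norm)
    return (mx, my), v

def collective_variable(p, mean, axis):
    """Project a 2D point onto the principal axis to get a scalar CV."""
    return (p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]

# Hypothetical data lying close to the line y = 2x.
points = [(i / 10, 2 * i / 10 + 0.01 * (-1) ** i) for i in range(-10, 11)]
mean, axis = leading_principal_axis(points)
cv_values = [collective_variable(p, mean, axis) for p in points]
```

Here the recovered axis is close to the direction (1, 2), so the single scalar CV captures almost all of the variation in the data. In a real problem the points would be samples of terminal states, and a method such as TiCA or a manifold-learning algorithm may be more appropriate than plain PCA.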
"The choice of hyperparameters of KDE such as Gaussian width σ and Gaussian height w is not fully explained."
Overall, the method is not too sensitive to the specific choices of w and σ; the right order of magnitude is sufficient for reasonable results.
Setting σ: We mentioned in Section 3 that σ is determined by the length-scale of variation of the problem, which is usually known or can be estimated in practice; for the line environment we set σ = 0.1 (smallest variance peak was 0.1) and for the grid environments σ = 2 (variance of peaks is 2). No tuning was required; these ballpark values immediately gave the expected metadynamics behaviour.
Setting w: This parameter controls the deposition rate of the Gaussian bias and is linked to freqMD, which determines how often biases are applied. Simulations with w and freqMD such that w/freqMD = const show similar behaviour. If w is too low, exploration slows, and if too high, metadynamics becomes unstable. Intermediate w values yield sensible results. We set w by examining an on-policy training run and setting w such that metadynamics explores new modes in a similar timescale to the time it takes the policy to learn to correctly sample the previous mode. As such, it didn't require any extensive hyperparameter tuning.
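To make the roles of w, σ and freqMD concrete, here is a minimal sketch of generic Gaussian hill deposition in one dimension. It illustrates standard metadynamics bookkeeping rather than the paper's Algorithm 1; the helper names `bias_potential` and `deposit_hills`, the stationary walker, and the specific values of w and freqMD are assumptions chosen for illustration (only σ = 0.1 comes from the text above).

```python
import math

def bias_potential(z, centers, w, sigma):
    """Sum of Gaussian hills of height w and width sigma, evaluated at z."""
    return sum(w * math.exp(-((z - c) ** 2) / (2.0 * sigma ** 2)) for c in centers)

def deposit_hills(trajectory, freq_md):
    """Return hill centers: one hill is deposited every freq_md steps."""
    return [z for step, z in enumerate(trajectory, start=1) if step % freq_md == 0]

# Hypothetical walker that sits at z = 0 for 1000 steps.
trajectory = [0.0] * 1000
sigma = 0.1  # value reported above for the line environment

# Two illustrative settings with the same ratio w / freq_md = 0.001.
results = {}
for w, freq_md in [(0.01, 10), (0.02, 20)]:
    centers = deposit_hills(trajectory, freq_md)
    results[(w, freq_md)] = bias_potential(0.0, centers, w, sigma)
```

For this stationary walker, both settings accumulate essentially the same bias height at z = 0 after the same number of steps, which is the sense in which runs with w/freqMD held constant are expected to behave similarly.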
"[..] a more explicit breakdown of the differences and potential advantages of adaptive metadynamics over standard metadynamics would be helpful."
The key difference is the reliance on gradients: standard metadynamics requires access to both a potential function and its gradient, while adapted metadynamics requires only a reward oracle. The potential landscape in adapted metadynamics is inferred using a time-dependent KDE, which replaces the potential and its gradients. Beyond this, the two algorithms are identical.
However, it is not obvious that substituting the potential with a dynamically updating KDE would preserve metadynamics' exploration properties (e.g., eventual uniform sampling over V(x)). This critical point is rigorously established in Theorem 3.1.
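The gradient-free idea can be sketched as follows. This is not the estimator defined in the paper: it assumes a simple kernel-weighted average of oracle rewards (a Nadaraya-Watson style smoother), the convention V = -log R, and a finite-difference force, all of which are illustrative choices. The names `estimated_potential`, `estimated_force` and the toy `reward` function are hypothetical.

```python
import math

def kernel(z, c, sigma):
    """Unnormalised Gaussian kernel centred at c."""
    return math.exp(-((z - c) ** 2) / (2.0 * sigma ** 2))

def estimated_potential(z, samples, sigma, eps=1e-12):
    """Kernel-weighted average of observed rewards, mapped to V = -log R."""
    num = sum(r * kernel(z, c, sigma) for c, r in samples)
    den = sum(kernel(z, c, sigma) for c, _ in samples)
    return -math.log(num / (den + eps) + eps)

def estimated_force(z, samples, sigma, h=1e-4):
    """Central finite difference of the estimated potential.

    Only reward values at visited points are used; no reward gradient.
    """
    v_plus = estimated_potential(z + h, samples, sigma)
    v_minus = estimated_potential(z - h, samples, sigma)
    return -(v_plus - v_minus) / (2.0 * h)

def reward(z):
    """Hypothetical black-box reward with a single mode at z = 0."""
    return math.exp(-z ** 2 / 2.0)

# Oracle queries at previously visited points (here: a regular grid).
samples = [(c / 10, reward(c / 10)) for c in range(-20, 21)]
force_right = estimated_force(0.5, samples, sigma=0.2)
force_left = estimated_force(-0.5, samples, sigma=0.2)
```

In this toy setting the estimated force points back towards the mode from either side, so a Langevin-type walker could be driven using reward evaluations alone. In the time-dependent setting described above, the sample set would grow as the walker moves, and the estimate would be updated accordingly.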
To address ambiguities, we have improved the clarity of Section 2.4 (introduction to metadynamics) and Section 3 (introduction to MetaGFN) in the revised manuscript. We believe these revisions make the differences between adapted metadynamics and standard metadynamics more explicit.
"The experiments only reach a maximum dimension of 4 [..]. Is this truly representative of high-dimensional cases? [..]"
We acknowledge that dimensionality scaling is a limitation of metadynamics, and higher-dimensional problems seem infeasible for now. However, we would like to emphasize:
- All exploration strategies struggled even with the (relatively low) 4-dimensional problem, and MetaGFN typically performed best.
- Popular alternative strategies, such as Nested Sampling, rely on exhaustively sampling the reward space before training and thus have a cost that grows exponentially with dimensionality. MetaGFN does not require this pre-training.
- The 4-dimensional problem is genuinely hard (< 0.0025% of the domain has a reward greater than 0.02).
- Low to medium dimensionality CVs are a standard setup in many molecular sampling problems, where training amortised samplers (e.g. GFlowNets) is potentially very valuable.
"The terms "Adaptive Metadynamics" and "Adapted Metadynamics" are used interchangeably."
Thank you for pointing this out. We now refer to the method as "Adapted Metadynamics" throughout.
This paper identifies that exploration in the continuous setting remains underdeveloped for GFlowNets. On this basis, it incorporates Adapted Metadynamics as an exploration strategy for continuous GFlowNets. Experiments demonstrate that the proposed method can accelerate convergence and provide more effective exploration.
Strengths
- The theoretical results are sound and comprehensive.
- The research problem is fundamental and related to various subjects.
Weaknesses
- The research problem and research gap are not very clear in the introduction. The authors are encouraged to highlight the research gap clearly. Are there any existing works that conduct off-policy exploration in the continuous setting? I found some related statements in lines 60-61, but the narrative in this paragraph is not clear enough to identify the true research gap with confidence.
- The research motivation is not very clear. As the authors mention, exploration strategies from the discrete setting can be adapted to the continuous setting. The authors should motivate developing MetaGFN, rather than adapting existing exploration strategies, by highlighting its unique advantages.
- The complexity and running cost should be analyzed or evaluated.
Questions
See the weaknesses above.
This paper proposes MetaGFN, which adapts metadynamics for exploration in continuous GFlowNets through a novel approach using collective variables. While the reviewers acknowledged the paper's theoretical soundness and the potential value of improving exploration strategies for continuous GFlowNets, significant concerns were raised about the method's practical applicability and empirical validation. The primary limitations include the restriction to low-dimensional problems (maximum dimension of 4), the lack of convincing demonstrations in more challenging real-world scenarios beyond toy examples, and the high variance in experimental results that questions the claimed robustness of the method. Although the authors provided detailed responses and made efforts to improve the manuscript's clarity, particularly in explaining the intuition behind metadynamics and collective variables, the core concerns about dimensionality scaling and limited experimental validation remain unaddressed. The selection of collective variables also heavily relies on domain knowledge, which may limit the method's general applicability.
Given these substantial limitations and the preliminary nature of the empirical results, I recommend rejection, encouraging the authors to strengthen the work by demonstrating effectiveness in higher-dimensional problems or more complex real-world applications.
Additional Comments on Reviewer Discussion
The primary limitations include the restriction to low-dimensional problems (maximum dimension of 4), the lack of convincing demonstrations in more challenging real-world scenarios beyond toy examples, and the high variance in experimental results that questions the claimed robustness of the method. Although the authors provided detailed responses and made efforts to improve the manuscript's clarity, particularly in explaining the intuition behind metadynamics and collective variables, the core concerns about dimensionality scaling and limited experimental validation remain unaddressed. The selection of collective variables also heavily relies on domain knowledge, which may limit the method's general applicability.
Reject