Genetic-guided GFlowNets for Sample Efficient Molecular Optimization
We propose a genetic-guided GFlowNet, which integrates a genetic search to guide GFlowNets to explore high-reward regions efficiently, achieving SOTA performance in an official molecular optimization benchmark.
Abstract
Reviews and Discussion
In this paper, the authors present a sample-efficient molecular optimization method using GFlowNets and a genetic algorithm. The authors demonstrate its effectiveness in generating inhibitors against SARS-CoV-2, with fewer reward calls than other methods, as well as on the PMO benchmark, where the method outperforms all other methods on the 23 tasks (at least on top-1, -10, and -100 AUC). The paper is strong, novel, and well-written, with very nice, well-annotated figures. Sadly, no code is provided.
Strengths
- A molecular optimization method that actually computes the sample efficiency - well done! And great results outperforming SOTA with a well-motivated strategy.
- The authors postulate that genetic algorithms can more effectively navigate the chemical space through domain-specific genetic operations, something which deep generative models generally lack. As such, their approach leverages the ability of GFlowNets to generate diverse molecules while also leveraging the optimization power of genetic algorithms. Neat idea.
- The ability for users to control the score-diversity trade-off via the inverse temperature parameter is very useful. Nonetheless, I believe that some of the other methods shown in Figure 3 also have that feature (e.g., REINVENT4 also has a controllable temperature parameter), which may be misleading in the current text/plot that suggests only the genetic GFN has it.
Weaknesses
- How sensitive is the genetic GFN to the hyperparameters? I saw that many defaults were chosen to compare directly to REINVENT; did these also happen to be good options for GGFN, or could even better results have been obtained? Some discussion around hyperparameter sensitivity would be interesting, perhaps I missed it. Similarly, a discussion around the computational complexity of the model would be interesting.
Questions
- In Table 1, the authors demonstrate that their approach outperforms MolGA and REINVENT on most tasks from the PMO benchmark when using the AUC top-10 metric. Do these results still hold for the AUC top-1 (Table 14 suggests they do), and if so, is it by similar amounts or are the gaps wider between the top 3 models?
- How does the model scale with the number of atoms? For instance, can it scale well to other modalities or macromolecules?
- It is interesting to see that for task #22 (valsartan_smarts), only the genetic GFN and REINVENT perform somewhat well on this. Do the authors have any idea why the genetic GFN and REINVENT are the only models which do not completely fail on this task? I think it would be interesting to dig more deeply into some of the specific oracles/tasks and look at the molecules being generated by the genetic GFN and REINVENT - for instance, do they find the same solutions here?
Limitations
- Despite the excellent results, the authors fail to provide any accompanying code, which is a shame (the link in the paper points to a non-existing anonymized repo). The lack of openly-shared code casts doubt on the performance of the model and some of the claims made in the paper, hence the reason for the low score. Hard to say how well-documented the code is or how useful it will be to researchers.
Thanks for the valuable comments. We address concerns as follows.
Nonetheless, I believe that some of the other methods shown in Figure 3 also have that feature (e.g., REINVENT4 also has a controllable temperature parameter), which may be misleading in the current text/plot that suggests only the genetic GFN has it
Thanks for the suggestion, and we agree with your feedback. We will include an explanation of the temperature-like parameter in REINVENT4.
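For intuition, here is a minimal sketch of how an inverse temperature beta controls the score-diversity trade-off in a reward-proportional sampler such as a GFlowNet (toy code for illustration only; the exact parameterization used in our method is given in the paper):

```python
import numpy as np

def target_probs(rewards, beta):
    """Target distribution proportional to R(x)**beta. A larger beta concentrates
    probability on high-reward candidates (higher score, lower diversity), while
    a smaller beta flattens the distribution (lower score, higher diversity)."""
    p = np.power(np.asarray(rewards, dtype=float), beta)
    return p / p.sum()

rewards = [0.9, 0.8, 0.3, 0.1]
print(target_probs(rewards, beta=1.0))   # relatively flat -> diverse sampling
print(target_probs(rewards, beta=16.0))  # sharply peaked -> score-focused sampling
```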
How sensitive is the genetic GFN to the hyperparameters? I saw that many defaults were chosen to compare directly to REINVENT; did these also happen to be good options for GGFN, or could even better results have been obtained?
In the manuscript, we searched only the additional parameters (GA parameters and the number of replay training steps); the results are provided in Appendix F.6.
As commented, we further conducted a hyperparameter sensitivity analysis (batch size, learning rate, and the number of layers) with 3 independent runs. Since we have not searched all hyperparameters, there might be a better combination of hyperparameters.
| batch 64, lr 0.0005 (main) | batch 128, lr 0.0005 | batch 64, lr 0.001 |
|---|---|---|
| 16.088 ± 0.025 | 15.801 ± 0.016 | 15.900 ± 0.035 |
| 2 layers | 3 layers (main) | 4 layers |
|---|---|---|
| 15.628 ± 0.021 | 16.088 ± 0.025 | 16.012 ± 0.030 |
The results show that our method consistently achieves competitive performance under the different hyperparameter setups.
Similarly, a discussion around the computational complexity of the model would be interesting.
Unfortunately, rigorous analysis of computational complexity presents significant challenges for the following reasons:
- The number of iterations is performance-dependent, terminating either when the maximum reward calls are reached (where repeated samples do not necessitate additional calls) or when early termination conditions are met.
- Graph GA involves RDKit API calls for tasks such as converting molecules to SMILES or removing certain components (e.g., kekulization).
Do these results still hold for the AUC top-1, and if so, is it by similar amounts or are the gaps wider between the top 3 models?
Here, we provide the results of the top-3 models. For each metric, the performance gaps are similar.
| | AUC Top1 | AUC Top10 | AUC Top100 |
|---|---|---|---|
| Genetic GFN | 16.527 ± 0.043 | 16.213 ± 0.042 | 15.516 ± 0.041 |
| Mol GA | 16.001 ± 0.027 | 15.686 ± 0.025 | 15.021 ± 0.025 |
| SMILES-REINVENT | 15.686 ± 0.035 | 15.185 ± 0.035 | 14.306 ± 0.033 |
| | Avg. Top1 | Avg. Top10 | Avg. Top100 |
|---|---|---|---|
| Genetic GFN | 17.924 ± 0.054 | 17.760 ± 0.054 | 17.481 ± 0.054 |
| Mol GA | 17.252 ± 0.032 | 17.116 ± 0.030 | 16.816 ± 0.029 |
| SMILES-REINVENT | 17.345 ± 0.040 | 17.149 ± 0.042 | 16.763 ± 0.043 |
How does the model scale with the number of atoms? For instance, can it scale well to other modalities or macromolecules?
The maximum length of SMILES is set to 140, consistent with the setting used in REINVENT. This length corresponds to approximately 70-100 atoms, depending on the molecule. For JNK3 (#10), which involves relatively large molecules, the generated SMILES lengths range from 50 to 130, with the number of atoms varying from approximately 35 to 100. In contrast, for isomers_c7h8n2o2 (#8), the generated molecules typically contain about 10-15 atoms.
It is interesting to see that for task #22 (valsartan_smarts), only the genetic GFN and REINVENT perform somewhat well on this. ... I think it would be interesting to dig more deeply into some of the specific oracles/tasks and look at the molecules being generated by the genetic GFN and REINVENT - for instance, do they find the same solutions here?
Thanks for the insightful comments. The valsartan SMARTS oracle targets molecules containing a SMARTS pattern related to valsartan while being characterized by physicochemical properties corresponding to the sitagliptin molecule [1]. It measures the arithmetic mean of several scores, including (1) a binary score for whether the molecule contains a certain SMARTS structure, (2) LogP, (3) TPSA, and (4) the Bertz score. Since we utilize a TDC oracle function for evaluations, we provide our empirical observations here.
- Difficulty of the task: Due to the binary score (1 if the certain SMARTS pattern is included), many attempts terminate with a score of 0. Especially with a limited number of oracle calls, generating molecules containing a certain substructure is notoriously hard. Other literature shows that other methods achieve high scores with more oracle calls [2]. With 10K calls, even REINVENT and Genetic GFN only succeeded in finding non-zero-score molecules once out of five independent runs.
- Another observation is that the methods achieving non-zero scores (REINVENT, Genetic GFN, and GEGL) all generate SMILES with RNN-based models. Thus, we conjecture that SMILES generation is effective for producing a certain SMARTS pattern.
We provide examples of generated molecules with non-zero valsartan_smarts scores. Note that the other four seeds failed. Each run generates similar molecules (see the Top-1, -10, and -100 samples in Fig. 3 of the additional material), but the samples from the two runs (REINVENT and Genetic GFN) have different structures (the molecular distance between the Top-1 samples is 0.854).
[1] Brown et al. (2019). GuacaMol: benchmarking models for de novo molecular design. Journal of chemical information and modeling.
[2] Hu et al. (2024). De novo drug design using reinforcement learning with multiple GPT agents. Advances in Neural Information Processing Systems, 36.
Despite the excellent results, the authors fail to provide any accompanying code, which is a shame (the link in the paper points to a non-existing anonymized repo).
We apologize for the inconvenience caused. There was an error in the provided link; please refer to this link (https://anonymous.4open.science/r/genetic_gfn).
Thank you to the authors for the detailed response and clarification, and for updating the link to the code!
Limitation: Despite the excellent results, the authors fail to provide any accompanying code, which is a shame (the link in the paper points to a non-existing anonymized repo). The lack of openly-shared code casts doubt on the performance of the model and some of the claims made in the paper, hence the reason for the low score. Hard to say how well-documented the code is or how useful it will be to researchers.
Given your observation that our main limitation resulted in a lower score, we sincerely ask if you could reconsider and potentially raise our score.
Thank you for your response. I think the score is fine where it is.
This paper presents a novel approach called Genetic-guided GFlowNets (Genetic GFN) for sample-efficient molecular optimization. The method integrates domain-specific genetic algorithms to guide a GFlowNet policy toward higher-reward molecular samples.
Strengths
- The paper is very pedagogical and easy to read, clearly explaining the proposed method and its rationale.
- Extensive experiments demonstrate the effectiveness of Genetic GFN, showing state-of-the-art performance on benchmark tasks.
- The approach offers a promising direction for enhancing sample efficiency in molecular design.
- The method shows promising results in designing SARS-CoV-2 inhibitors, demonstrating potential real-world impact.
- The approach offers controllability of the score-diversity trade-off, which is valuable for practical applications.
Weaknesses
- The proposed method is limited to molecular optimization and is not readily generalizable to other domains.
- While the graph GA is based on prior work, the paper would benefit from being more self-contained by including an algorithm or visualization of the mutation and crossover steps. For example, how does the GA ensure the validity of the molecules?
- A schematic diagram illustrating the entire training process, including both pretraining and GFlowNet training, would improve clarity.
- The anonymous link to code provided is not accessible, limiting reproducibility.
- Lack of efficiency/runtime analysis and comparison with the baseline methods
Questions
- Is the GA implemented on CPU or GPU? If it's on CPU, how slow is it?
- Would the model's performance improve if combined with modern RNN architectures like Mamba or xLSTM?
- How sensitive is the graph genetic algorithm to the way molecules are fragmented and the resolution of fragmentation?
- How might this approach be adapted to other discrete optimization problems beyond molecular design? (such as TSP in combinatorial optimization. A quick intuition would be sufficient)
- How does the method's performance scale with the size and complexity of the molecular space being explored?
- Is there potential for incorporating human feedback or preferences into the optimization process?
Limitations
See above.
Thanks for the valuable comments.
The proposed method is limited to molecular optimization and is not readily generalizable to other domains.
Our method focuses on integrating strong domain-specific search heuristics into deep neural network policies using the off-policy nature of GFlowNets for sample-efficient optimization. This approach is adaptable to any task where a powerful domain-specific search heuristic is available. For example, in jailbreaking tasks on LLMs, one of the state-of-the-art methods is a genetic algorithm [1]; we could use this to train GFlowNets as jailbreaking policies. We also agree that developing automated genetic algorithm methods that can be applied across general domains, leveraging deep learning, is a promising direction for future work.
[1] Liu et al. "Autodan: Generating stealthy jailbreak prompts on aligned large language models." arXiv preprint arXiv:2310.04451 (2023).
the paper would benefit from being more self-contained by including an algorithm or visualization of the mutation and crossover steps.
Thanks for the helpful suggestion. We will include the new figure, provided in the attached additional material (Fig. 2), and a more detailed explanation of Graph GA.
A schematic diagram illustrating the entire training process would improve clarity.
Thanks for the helpful suggestion. We will include the new diagram, which is provided in the attached additional material (Fig. 1).
The anonymous link to code provided is not accessible, limiting reproducibility.
We apologize for the inconvenience caused. There was an error in the link; please refer to this link (https://anonymous.4open.science/r/genetic_gfn).
Lack of efficiency/runtime analysis and comparison with the baseline methods
In sample-efficient molecular optimization, the main computational bottleneck is evaluating oracle functions, so it is common to compare efficiency based on the number of samples. Also, note that the main metric, the area under the curve (AUC), is defined to measure sample efficiency (a higher AUC score means higher sample efficiency).
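For concreteness, here is a rough sketch of how an AUC top-k score can be computed from a logged oracle-call history (the helper name and checkpointing details are illustrative; the official PMO implementation may differ slightly):

```python
import numpy as np

def auc_top_k(history, k=10, max_calls=10000, freq=100):
    """history: oracle scores in the order they were queried. Every `freq` calls,
    take the mean of the best k scores seen so far, then average over all
    checkpoints; discovering high scores earlier thus yields a higher AUC."""
    checkpoint_means = []
    for c in range(freq, max_calls + 1, freq):
        seen = history[:c]  # if a run terminated early, its best-so-far persists
        checkpoint_means.append(np.mean(sorted(seen, reverse=True)[:k]))
    return float(np.mean(checkpoint_means))
```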
Our average runtimes are as follows. Though we tested all algorithms using similar computational environments, we did not rigorously control the computational resources.
| | Avg. runtime (sec) |
|---|---|
| Genetic GFN | 827.50 |
| SMILES-REINVENT | 252.88 |
| Mol GA | 9803.26 |
| Graph GA | 165.00 |
| GPBO | 1519.65 |
- The runtime of Mol GA is significantly longer than that of Graph GA, despite both utilizing the same crossover and mutation operations. This difference arises because Mol GA has a much smaller offspring size, and we observed a tendency to repeatedly generate already-discovered molecules (i.e., early convergence before reaching the maximum number of calls).
- Compared to REINVENT, our runtime is longer, but it is not significantly longer than that of the other baselines.
- Note that the methods have different early termination rules, complicating direct comparisons.
Is the GA implemented on CPU or GPU? If it's on CPU, how slow is it?
The GA is implemented on the CPU (we adopt the implementation of Graph GA without parallel computation). As shown in the table above, Graph GA has a relatively short runtime. The runtime increase compared to REINVENT comes from replay training, which takes roughly twice as long as the genetic search.
Would the model's performance improve if combined with modern RNN architectures like Mamba or xLSTM?
Thank you for the suggestion. We believe that utilizing modern RNN architectures like Mamba or xLSTM can indeed enhance performance. Recent work [2] has demonstrated that Mamba outperforms vanilla RNNs in their methodology.
[2] Guo & Schwaller (2024). Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation. arXiv preprint arXiv:2405.17066.
How sensitive is the graph genetic algorithm to the way molecules are fragmented and the resolution of fragmentation?
First of all, we adopt the original implementation of Graph GA. The Graph GA crossover divides each parent molecule into two fragments, either by cutting within a ring or arbitrarily along non-ring edges. Mutations are applied mostly at the atom level.
How might this approach be adapted to other discrete optimization problems beyond molecular design?
We can utilize a constructive RL policy, like AM [3], which sequentially adds elements to a partial solution until a complete solution is formed. Notably, there is work that trains AM with a GFN [4]. To guide GFN training, we can utilize domain-inspired genetic algorithms, such as edge assembly crossover [5] or hybrid genetic search [6].
[3] Kool et al. (2019). Attention, learn to solve routing problems! ICLR.
[4] Kim et al. (2023). Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization. ICML.
[5] Nagata & Kobayashi (2013). A powerful genetic algorithm using edge assembly crossover for the traveling salesman problem. INFORMS Journal on Computing.
[6] Vidal et al. (2012). A hybrid genetic algorithm for multidepot and periodic vehicle routing problems. Operations Research.
How does the method's performance scale with the size and complexity of the molecular space being explored?
The maximum SMILES length is set to 140, consistent with REINVENT, corresponding to approximately 70-100 atoms. For JNK3 (#10), consisting of relatively large molecules, SMILES lengths range from 50 to 130, with 35 to 100 atoms. In contrast, isomers_c7h8n2o2 (#8) typically contain about 10-15 atoms.
Is there potential for incorporating human feedback or preferences into the optimization process?
Our method follows the (unsupervised) pretraining and fine-tuning framework, similar to approaches like RL with human feedback (RLHF) and direct preference optimization. One possible approach is incorporating the reward model used in RLHF as our oracle function and fine-tuning the policy with Genetic GFN.
Thanks for the response. I'll maintain my score.
This work proposes a variant of GFlowNets, called Genetic GFN, for molecular property optimization. Specifically, the authors use genetic search to guide the GFlowNet to explore high-reward regions, addressing the over-exploration issue in GFlowNets. Besides, the authors incorporate some effective training strategies to further improve performance. The proposed Genetic GFN achieves state-of-the-art performance on the practical molecular optimization (PMO) benchmark.
Strengths
- This paper studies an important scientific problem, i.e., molecular optimization.
- The paper is well-organized and easy to follow.
- The experimental results are comprehensive and strong.
Weaknesses
- The novelty is somewhat limited, as the proposed Genetic GFN simply combines some existing techniques that have been studied independently.
- In Table 4, Genetic GFN is worse than Graph GA and MARS on GSK3 + JNK3. The reason should be discussed.
- I am more curious about the difference between adding a KL term to the loss function and directly using the logP (log-probability) of the reference distribution as part of the reward (please see equation (4) in DPO [1]). In this situation, what's the difference between GFlowNet and PPO?
[1] Direct preference optimization: Your language model is secretly a reward model
Questions
See weaknesses
Limitations
None
Thanks for the valuable comments. We address concerns as follows.
The novelty is somewhat limited, as the proposed Genetic GFN simply combines some existing techniques that have been studied independently.
This method is novel because it is the first to combine 1D sequence generation using GFlowNets with 2D molecular graph search via genetic algorithms. This approach allows the policy to generate 1D sequences, which are easier to train, while utilizing a genetic algorithm to explore 2D molecular graphs, reaching regions that might be inaccessible to the 1D policy alone. Thanks to the off-policy nature of GFlowNets, we can leverage the insights from the genetic algorithm's 2D molecular graph search to enhance the training of the 1D sequence policy. Our experimental results empirically support these claims; please see Tables 2 and 3.
Combining existing methods in innovative ways is both important and novel, as it creates synergies that significantly enhance performance and efficiency, achieving breakthroughs that neither method could accomplish independently.
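For concreteness, here is a highly simplified sketch of this interaction (all helper names below are illustrative stand-ins, not functions from our code release):

```python
import random

# Toy stand-ins for the actual components (illustration only).
def sample_smiles(n): return ["CCO"] * n                  # 1D: GFN policy rollout
def genetic_search(pool, scores): return pool             # 2D: Graph GA crossover/mutation
def oracle(pool): return [random.random() for _ in pool]  # reward calls (the scarce resource)
def gfn_update(replay_buffer): pass                       # off-policy trajectory balance step

replay_buffer = []
for iteration in range(3):
    on_policy = sample_smiles(64)                             # the policy proposes SMILES strings
    offspring = genetic_search(on_policy, oracle(on_policy))  # GA refines 2D molecular graphs
    replay_buffer += list(zip(offspring, oracle(offspring)))  # GA discoveries enter the buffer
    gfn_update(replay_buffer)                                 # and, off-policy, train the 1D policy
```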
In Table 4, Genetic GFN is worse than Graph GA and MARS on GSK3 + JNK3. The reason should be discussed.
| | GSK3b + JNK3 | GSK3b + JNK3 + QED + SA |
|---|---|---|
| Graph GA | 0.368 ± 0.020 | 0.335 ± 0.021 |
| MARS | 0.418 ± 0.095 | 0.273 ± 0.020 |
| HN-GFN | 0.669 ± 0.061 | 0.416 ± 0.023 |
| Genetic GFN | 0.718 ± 0.138 | 0.642 ± 0.053 |
Thanks for pointing this out. The results are directly taken from the work of Zhu et al. [1], and there were some mistakes in the numbers for Graph GA and MARS on the GSK3b + JNK3 task. The table provided here shows the correct numbers. Please also refer to Fig. 3 in the HN-GFN paper (https://arxiv.org/pdf/2302.04040).
[1] Zhu, Yiheng, et al. "Sample-efficient multi-objective molecular optimization with gflownets." Advances in Neural Information Processing Systems 36 (2024). (The numbers are from https://openreview.net/forum?id=ztgT8Iok130)
I am more curious about the difference between adding a KL term to the loss function and directly using the logP (log-probability) of the reference distribution as part of the reward (please see equation (4) in DPO [1]). In this situation, what's the difference between GFlowNet and PPO?
Thanks for the engaging discussion on this important topic, which has broad relevance across many domains (e.g., KL-constrained optimization in RLHF).
There are indeed several ways to incorporate KL constraints during optimization. One effective approach is to include a logP prior within the trajectory balance loss, treating it as a soft reward, as you suggested. The following table shows that both variants achieve strong performance (AUC Top-10). Note that the results are obtained from three independent runs and that our hyperparameters were searched with Genetic GFN (KL).
| Genetic GFN (KL) | Genetic GFN (logP prior) |
|---|---|
| 16.088 ± 0.025 | 15.777 ± 0.018 |
When using explicit KL loss terms, it is important to note the differences between PPO and GFlowNets. PPO aims for reward maximization, whereas GFlowNets aim for reward matching, generating samples proportional to the reward. Even with KL constraints, PPO will seek a unimodal maximum reward within the KL constraint region, while GFlowNets will sample diverse, high-reward modes within the KL constraint region.
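In symbols (a sketch up to weighting constants; the exact formulation is given in the paper), with forward policy $P_F$, learned partition function $Z_\theta$, and prior $p_{\mathrm{prior}}$, the two variants are:

$$\mathcal{L}_{\mathrm{KL}}(\tau) = \Big(\log Z_\theta + \sum_{t}\log P_F(s_{t+1}\mid s_t;\theta) - \beta\log R(x)\Big)^{2} + \lambda\, D_{\mathrm{KL}}\big(P_F \,\Vert\, p_{\mathrm{prior}}\big),$$

$$\mathcal{L}_{\mathrm{prior}}(\tau) = \Big(\log Z_\theta + \sum_{t}\log P_F(s_{t+1}\mid s_t;\theta) - \beta\log R(x) - \alpha\log p_{\mathrm{prior}}(x)\Big)^{2}.$$

The first penalizes divergence from the prior explicitly, while the second reshapes the target distribution to be proportional to $R(x)^{\beta}\, p_{\mathrm{prior}}(x)^{\alpha}$; in both cases, GFlowNet training still performs reward matching rather than the reward maximization of PPO.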
I'm glad to see that the authors' response has addressed my concerns. I maintain my score, which leans toward acceptance.
This paper proposes a combination of a genetic algorithm (GA) and GFlowNets for molecular optimization, with an emphasis on sample efficiency (achieving high property values with a small number of reward calls). The key motivation is that a GA can incorporate domain-specific knowledge through the design of its mutation operations, which is key to improving sample efficiency, while GFlowNets model the overall distribution of the molecular space in a data-driven manner. The pipeline thus consists of two components: a SMILES-based generative model using GFlowNets and a graph-based genetic search. Several techniques for training this pipeline are proposed, including unsupervised pretraining, an experience buffer for off-policy training, and TB + KL loss functions. Experiments on the PMO benchmark and in silico design of SARS-CoV-2 inhibitors demonstrate the effectiveness of the proposed method.
Strengths
- The proposed method is simple and easy to follow. Though there is a small gap between the motivation and the proposed method (see Weaknesses), overall I buy the story of combining domain-specific knowledge with pure data-driven methods; the proposed combination of GA and GFlowNets makes sense to me.
- The authors clearly have put a lot of effort into conducting and designing the experiments, for both the main results and the ablation studies. The proposed study on designing inhibitors against SARS-CoV-2 looks interesting to me. Though it is hard for me to judge whether the in silico score functions indeed have a strong correlation with real-world performance in the biological sense, including such efforts/results is already intriguing from a machine learning perspective.
- The paper is in good shape. The authors did a good job of describing the necessary background knowledge for GFlowNets. Sufficient technical details are provided in the paper.
Weaknesses
- One weakness is the gap between "domain-specific knowledge" and "GA method" mentioned in the introduction. When I read the introduction, I expected the authors to propose some new genetic operators that are well-designed and specific to domain-specific tasks, with a clear indication that these operators have a strong correlation with domain-specific knowledge. It turns out that the authors are still using the standard genetic operators such as crossover and mutation. Was any special care taken that I may have missed? For example, are the crossover fragments not purely random but collected from motifs specific to tasks? If not, I feel the statement that "GA methods can encode domain-specific knowledge" is very vague, or at least not discussed comprehensively, even if previous works have explored it.
- Another thing missing in the paper is enough qualitative results. I can only see that Figs. 4 & 5 contain some final generated molecules for two targets. It would be great if the authors could include more visual results, especially the trajectory of the sampled molecules, with highlights on what fragments have been changed during the process. Some analysis of why certain fragments are favored (if any) or remain in the final optimized structure would also be very helpful for readers to verify the effectiveness of the proposed method.
- Another interesting study would be to show how the proposed pipeline can be incorporated with grammar-based representations, such as STONED, SELFIES, and many more (search "molecular grammar"), rather than SMILES. Since these manually designed grammars contain more "domain-specific knowledge" compared to pure atom-based SMILES strings, I would expect the experimental results to be better. It would be great to include such an analysis in the paper to provide more evidence for the key motivation of the paper.
Questions
- line 55, please remove "expectional"
- line 118, no references to "previous works"
Limitations
See weaknesses.
Thanks for the valuable comments. We address concerns and questions as follows.
One weakness is the gap between "domain-specific knowledge" and "GA method" mentioned in the introduction. When I read the introduction, I expected the authors to propose some new genetic operators... It turns out that the authors are still using the standard genetic operators such as crossover and mutation.
We identify two potential but distinct approaches for integrating genetic algorithms with deep learning: (1) automating genetic algorithms (GA) using deep learning and (2) incorporating domain-specific GA's search capabilities into deep learning as an inductive bias. Our research focuses on the latter, leveraging powerful GAs designed based on chemical domain knowledge to enhance deep neural network policies for more sample-efficient molecular optimization strategies. We also acknowledge that the first approach, automating genetic algorithms with deep learning, could be a valuable direction for future research. We will revise our manuscript to articulate these points clearly.
Was any special care taken that I may have missed? For example, are the crossover fragments not purely random but collected from motifs specific to tasks? If not, I feel the statement that "GA methods can encode domain-specific knowledge" is very vague, or at least not discussed comprehensively, even if previous works have explored it.
Crossover and mutation operations are conducted according to predefined patterns. For instance, when altering bond order (one type of mutation), the possible transformations are specified to ensure valid changes that adhere to chemical rules. Defining these valid operations requires domain knowledge and careful design of the logic.
Graph GA employs two crossover operations and seven mutation operations based on SMARTS patterns (e.g., inserting an atom into a double bond: [*;!H0:1]~[*:2]>>[*:1]=X-[*:2]). Creating and utilizing these patterns requires expertise in chemistry to ensure accurate representation and manipulation of molecular structures. Additionally, validation and sanitization steps ensure that only chemically plausible and stable molecules are considered. Further details about genetic operations can be found in the original Graph GA paper.
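As a concrete illustration, the following minimal RDKit snippet applies the insert-atom pattern quoted above, with the placeholder X instantiated as a carbon atom (our illustrative choice) and chemically invalid products discarded via sanitization:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Insert-atom mutation from the SMARTS above, with X -> C (illustrative substitution).
rxn = AllChem.ReactionFromSmarts("[*;!H0:1]~[*:2]>>[*:1]=C-[*:2]")
mol = Chem.MolFromSmiles("CCO")  # toy input molecule

valid_children = set()
for (product,) in rxn.RunReactants((mol,)):
    try:
        Chem.SanitizeMol(product)  # validation step: raises for chemically invalid products
        valid_children.add(Chem.MolToSmiles(product))
    except Exception:
        continue  # discard invalid products, as described above
print(valid_children)  # the set of valid mutated SMILES
```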
To make our manuscript self-contained, we will include a detailed explanation of how the crossover and mutation operations are designed to guarantee molecular validity in the Appendix, accompanied by Fig. 2 in our supplementary material.
Another thing missing in the paper is enough qualitative results. I can only see that Figs. 4 & 5 contain some final generated molecules for two targets. It would be great if the authors could include more visual results, especially the trajectory of the sampled molecules, with highlights on what fragments have been changed during the process.
We provide additional visual results in Fig. 4 of our supplementary material. Due to space constraints, only the top three samples for 50, 100, 500, and 1000 steps are reported.
Our observations from the generated candidates for both targets are as follows.
- Many molecules include heterocyclic rings, which contain nitrogen or other non-carbon atoms. These structures may play a role in the molecules' interactions with the target protein.
- Benzene rings with various substituents (e.g., methyl, hydroxyl) are frequently observed. These substitutions could provide diverse interaction points with the target protein, although their exact contribution to binding affinity needs further investigation.
- There seems to be a trend of increasing molecular complexity and functional diversity over iterations. For example, at step 100, more complex substituents on aromatic rings are introduced compared to the generated candidates at step 50. After 1000 steps, we observe the addition of bulkier groups, such as tert-butyl and sulfone groups.
We plan to provide more visual results in the revised manuscript.
Another interesting study would be to show how the proposed pipeline can be incorporated with grammar-based representations, such as STONED, SELFIES, and many more (search "molecular grammar"), rather than SMILES. ... It would be great to include such an analysis in the paper to provide more evidence for the key motivation of the paper.
We have included the results of Genetic GFN with SELFIES in Appendix F.5. As pointed out in the work of Gao et al. [1], despite the clear benefits of SELFIES, SMILES often shows competitive or better performance in sample-efficient molecular optimization tasks. For instance, SMILES-REINVENT outperforms SELFIES-REINVENT, and SMILES-LSTM-HC (hill climbing) outperforms SELFIES-LSTM-HC; please see Table 3 and the analysis in Section 3.2 of the work of Gao et al. [1].
We additionally provide experiments incorporating STONED (GA with SELFIES) as an exploration strategy to guide GFN training instead of Graph GA. Note that STONED only utilizes mutations (designing a valid crossover with string representations is difficult).
| | AUC Top1 | AUC Top10 | AUC Top100 |
|---|---|---|---|
| Genetic GFN | 16.527 ± 0.043 | 16.213 ± 0.042 | 15.516 ± 0.041 |
| Genetic GFN (STONED) | 15.806 ± 0.037 | 15.439 ± 0.037 | 14.870 ± 0.036 |
| Mol GA | 16.001 ± 0.027 | 15.686 ± 0.025 | 15.021 ± 0.025 |
| SMILES-REINVENT | 15.686 ± 0.035 | 15.185 ± 0.035 | 14.306 ± 0.033 |
line 55, please remove "expectional"
Thanks for the suggestion. We removed the term in the revised manuscript.
line 118, no references to "previous works"
Thank you for bringing this to our attention. The "previous works" refers to the string-generation approaches that are usually high-ranked in the benchmark, such as REINVENT, SMILES-LSTM hill climbing, and GEGL. We added these references in the revised version.
References
[1] Gao et al. "Sample efficiency matters: a benchmark for practical molecular optimization." Advances in neural information processing systems.
This is a reminder that our discussion ends in less than two days. If you have any remaining concerns, please let us know. If your concerns have been resolved, we kindly ask you to consider increasing your score.
The rebuttal adequately resolved my concerns. I will raise my score to 6. The authors shall integrate the results provided in the rebuttal to the manuscript.
We are pleased to hear that our responses have effectively addressed the concerns raised. Once again, we will ensure that all these discussions are thoroughly incorporated into the revised version of the manuscript. Thank you for the response and for re-scoring our work.
General Response
We are sincerely grateful to the reviewers for their valuable feedback on our manuscript. We are pleased to hear that the reviewers found our paper well-written (4tbn, gNyQ, kTKB, FCer) and supported by extensive experiments with state-of-the-art performance (4tbn, gNyQ, kTKB, FCer). We appreciate the recognition of the usefulness (4tbn, gNyQ), simplicity (4tbn, kTKB), novelty (4tbn), and rationale (gNyQ) of our method. Additionally, we are encouraged by the acknowledgment of our research as well-motivated and promising with potential real-world impact (4tbn, gNyQ).
Answers to common concerns and feedback
- Further explanation of Graph GA: We adopt the original implementation of Graph GA [1]. To make the manuscript self-contained, we will include detailed explanations of Graph GA along with figures; please see the summary of Graph GA and Fig. 2 in the supplementary material.
- Provided link does not work: We apologize for the inconvenience. Our code is available at https://anonymous.4open.science/r/genetic_gfn (directories: PMO in pmo/main/genetic_gfn, multi-objective in multi_objective/genetic_gfn, SARS-CoV-2 in sars_cov2/genetic_gfn).
- Analysis of generated molecules: We will include more detailed explanations along with visual results (Fig. 3 and 4 in the supplementary material).
Additionally, in our supplementary material, we provide a semantic overview of pretraining and fine-tuning with Genetic GFN (Fig. 1), examples of Graph GA operations (Fig. 2), visual results for valsartan_SMARTS (oracle ID: #22) in Fig. 3, and visual results of SARS-CoV-2 inhibitor designs in Fig. 4.
Summary of Graph GA: Graph GA is implemented by utilizing predefined SMARTS patterns for operations. During crossover, the algorithm randomly selects either non_ring_crossover or ring_crossover with equal probability. Non_ring_crossover involves cutting an arbitrary non-ring edge of two parent molecules and combining the subgraphs, while ring_crossover cuts two edges within a ring and combines the subgraphs from different parents. For mutations, the algorithm randomly applies one of seven modifications: atom_deletion, atom_addition, atom_insertion, atom_type_change, ring_bond_deletion, ring_bond_addition, and bond_order_change. Invalid molecules resulting from mutation are discarded and the mutation is re-applied.
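A compact sketch of this reproduction logic (the function and argument names are ours for illustration; the operator callables stand in for the Graph GA operations listed above):

```python
import random
from rdkit import Chem

def reproduce(parent_a, parent_b, crossover_ops, mutation_ops,
              mutation_rate=0.1, max_tries=10):
    """One Graph-GA-style reproduction step as summarized above. crossover_ops and
    mutation_ops are lists of callables on RDKit mols; invalid children are
    discarded and the operation is re-applied (the rate and retry cap are ours)."""
    for _ in range(max_tries):
        child = random.choice(crossover_ops)(parent_a, parent_b)  # equal-probability choice
        if child is not None and random.random() < mutation_rate:
            child = random.choice(mutation_ops)(child)            # one of the seven mutations
        if child is None:
            continue
        try:
            Chem.SanitizeMol(child)  # validity check; raises for chemically invalid molecules
            return child
        except Exception:
            continue                 # discard and re-apply, as described above
    return None
```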
The paper combines genetic algorithms and GFlowNets, which gives state-of-the-art results in the field of generative modeling for molecular data. This is a natural marriage, as GFlowNet is an off-policy method that allows to naturally introduce genetic algorithms into the sampling procedure.
All reviewers voted for acceptance, highlighting strong empirical results and combining the strengths of deep learning (high reward) and genetic algorithms (using domain-specific knowledge that helps navigate chemical spaces). As such, it is my pleasure to recommend acceptance of the paper.