Average rating: 3.5 / 10 (Rejected; 4 reviewers)
Individual ratings: 3, 1, 3, 7 (min 1, max 7, std. dev. 2.2)
Confidence: 4.8 · Correctness: 2.5 · Contribution: 2.0 · Presentation: 2.0
NeurIPS 2024

PMechRP: Interpretable Deep Learning for Polar Reaction Prediction

Submitted: 2024-05-15 · Updated: 2024-11-06

Abstract

Keywords
chemistry, deep learning, interpretable, transformers, reaction prediction, mechanisms

Reviews and Discussion

Review
Rating: 3

This paper attempts to address the low interpretability of reaction prediction methods by proposing to model step-wise polar reactions. To model such mechanisms it uses an existing dataset, PMechDB. The authors propose an approach that first selects the right atoms to react from the input molecules using learned models and then reacts them.
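
For readers less familiar with this setup, a minimal sketch of such a select-then-react pipeline might look as follows (the `site_model` and `apply_polar_step` helpers are hypothetical stand-ins for illustration, not the authors' implementation):

```python
def predict_elementary_steps(reactants, site_model, apply_polar_step, top_k=5):
    """Two-step sketch: rank reactive atoms, then apply the polar step."""
    # Step 1: score every atom as a potential electron source or sink.
    sources = site_model.score_sources(reactants)  # [(atom_idx, score), ...]
    sinks = site_model.score_sinks(reactants)      # [(atom_idx, score), ...]

    # Step 2: pair high-scoring source/sink atoms, apply the arrow-pushing
    # step to generate candidate products, and rank by a combined score.
    candidates = []
    for src, s_src in sources[:top_k]:
        for snk, s_snk in sinks[:top_k]:
            product = apply_polar_step(reactants, source=src, sink=snk)
            if product is not None:
                candidates.append((s_src * s_snk, product))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [product for _, product in candidates[:top_k]]
```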

Strengths

  • Several existing chemistry reaction prediction models are benchmarked on the PMechDB dataset.
  • A way to integrate reaction mechanism information is introduced.

Weaknesses

  • The paper has substantial clarity problems:
    • Table captions are insufficiently informative, requiring the reader to go deeper into the text to understand which results are actually presented (e.g. 'Table 3: Top-N Accuracy of Trained Models').
    • Figures 5 and 6 are formatted inconsistently with the rest of the file.
  • Citation quality is poor:
    • More references to prior work could be provided overall; e.g., Section 3.3 describes prior work on sequence-to-sequence modelling without any references.
    • PMechDB is introduced in a way that makes it unclear whether the database is a contribution of this work or not.
  • Novelty is not prominent. The method in [18] (OrbChain) already addresses a similar task on a similar dataset.
  • Evaluation is insufficient:
    • Source code for reproduction has not been provided.
    • The resulting models have not been evaluated on global datasets, making it unclear whether the fine-tuning specified in this work improves performance in general rather than only on the PMechDB test set.
    • Error bars are not provided.
    • The method is not benchmarked against the comparable method referenced in [18].

Questions

What would the performance of the models tuned on this dataset be on general reaction prediction benchmarks? Would it improve the performance as compared to models without interpretability steps or not?

Limitations

The authors address some of the limitations of their work.

Author Response

Reviewer Comment: The paper has substantial clarity problems: Table captions are insufficiently informative, requires going deeper into the text to understand what results are actually presented (e.g. 'Table 3: Top-N Accuracy of Trained Models'). Figures 5 and 6 are formatted inconsistently with the rest of the file.

Response: Table captions have been revised to better clarify the results being presented. Figures 5 and 6 have been updated to make their formatting more consistent. In the attached PDF, we have provided Figure 3, which contains the revised versions of Figures 5 and 6.

Reviewer Comment: Citation quality is poor: Could provide more references to prior work overall. e.g. section 3.3 describes prior work on sequence to sequence modeling without any references.

Response: References have been added to section 3.3.

Reviewer Comment: Novelty is not prominent: Method in [18] (OrbChain) is already working with similar task on a similar dataset.

Response: For novel ML architectures, we present the two-step transformer. However, the major contribution of the manuscript is to address the prediction of polar reactions in an explainable way. Polar reactions are an extremely complex, commonly encountered, and fundamental group of chemical reactions in organic chemistry. Predicting polar reactions is one of the most difficult problems in AI applied to science, and understanding them is necessary for designing synthetic pathways. We train models on a newly introduced dataset and are able to provide interpretable predictions of polar mechanisms with high accuracy.

Reviewer Comment: Evaluation is insufficient: Source code for reproduction has not been provided. The resulting models have not been evaluated on the global datasets, making it unclear whether the fine tuning as specified in this work improves the performance in general rather than on the test set of PMechDB. Error bars are not provided.

Response: The source code for the two-step models cannot be provided as it uses licensed OpenEye Scientific Software to do the chemoinformatics processing.

Evaluating the models on the global datasets would be testing them on a task which they are not trained to do. The models are designed to predict elementary steps, not overall transformations like those contained in standard benchmarking datasets.

The standard evaluation metric for reaction prediction models is top-n accuracy. The models were trained once on the training set and then evaluated on the test set, so there are no error bars to report.
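
For reference, top-n accuracy simply counts how often the recorded product appears among a model's n highest-ranked candidates; a minimal computation (with a hypothetical data layout) could look like this:

```python
def top_n_accuracy(predictions, references, n=5):
    """predictions: ranked candidate lists per reaction; references: true products."""
    hits = sum(ref in preds[:n] for preds, ref in zip(predictions, references))
    return hits / len(references)

# Toy example: the true product is within the top 2 guesses for 2 of 3 reactions.
preds = [["A", "B"], ["C", "D"], ["E", "F"]]
refs = ["B", "X", "E"]
print(top_n_accuracy(preds, refs, n=2))  # 0.666...
```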

Reviewer Comment: Not benchmarked against a comparable method, referenced in [18].

Response: Due to space limitations, we cannot compare to all existing methods. We selected a representative subset of relevant methods; we do not claim to compare against all existing methods.

Reviewer Comment: What would the performance of the models tuned on this dataset be on general reaction prediction benchmarks? Would it improve the performance as compared to models without interpretability steps or not?

Response: This is a good point, but it is a complex question for several reasons. First, it is difficult to choose a benchmarking dataset. The most popular benchmarking datasets involve overall transformations, which are series of elementary steps chained together.

Second, our method is computationally expensive. To predict an overall transformation, a series of elementary-step predictions must be chained together. This requires a branching tree search, and as the depth of the tree grows, the runtime increases exponentially. It is difficult to assess performance on overall transformations without knowing the depth (number of steps) of the pathway, which is needed to set a reasonable stopping point. We have provided several example pathways that we recovered as Figure 2 of the attached PDF. We are actively collecting a dataset of pathways extracted from organic chemistry textbooks; in preliminary experiments, we recover the target products in 65% of reactions.
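
For illustration only, the kind of depth-limited search over chained elementary steps described above might be sketched as follows (the `predict_steps` model call, beam width, and target check are hypothetical; this is not the authors' implementation):

```python
from collections import deque

def search_pathways(reactants, predict_steps, targets, max_depth=3, beam=5):
    """Depth-limited breadth-first search over chained elementary steps.

    predict_steps(state) returns the model's ranked next states; runtime grows
    roughly as beam**max_depth, hence the need for a sensible stopping depth.
    """
    frontier = deque([(reactants, [])])
    pathways = []
    for _ in range(max_depth):
        next_frontier = deque()
        while frontier:
            state, path = frontier.popleft()
            for next_state in predict_steps(state)[:beam]:
                new_path = path + [next_state]
                if next_state in targets:
                    pathways.append(new_path)   # reached a target product
                else:
                    next_frontier.append((next_state, new_path))
        frontier = next_frontier
    return pathways
```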

We cannot say what the performance would be on general reaction prediction benchmarks, as we did not evaluate on overall-transformation datasets.

Review
Rating: 1

Current models for chemical reaction prediction lack interpretability. This paper evaluates various machine learning models on the PMechDB dataset, which contains polar elementary steps. In addition, the paper proposes a new system, PMechRP, which achieves the highest top-5 accuracy.

Strengths


  1. A new benchmark has been introduced, which improves the interpretability and causal understanding of chemical reactions.

  2. Several methods are evaluated.

Weaknesses


  1. This paper reads like a technical report.

  2. The main conference track is not suitable for this paper; I think the datasets & benchmarks track is more suitable.

  3. The writing is poor.

Questions

N/A

Limitations

N/A

Author Response

Reviewer Comment: This paper reads like a technical report. The main conference track is not suitable for this paper; I think the datasets & benchmarks track is more suitable. The writing is poor.

Response: We address a very important problem in Chemistry, the prediction of polar reactions. Polar reactions are an extremely diverse, complex, and fundamental subset of chemical reactions which are widely observed in important synthetic pathways. Predicting polar reactions is one of the most difficult problems in AI applied to science. At the moment, there is no AI that predicts chemical reactions at the level of an expert. The major contribution of the manuscript is to address the prediction of polar reactions in an explainable way. We train models on a newly introduced dataset and are able to provide interpretable predictions of polar mechanisms with high accuracy.

Review
Rating: 3

Previous reaction prediction models formulate forward chemical reaction prediction in an end-to-end manner, considering only the input and output states while ignoring the intermediate states that describe electron redistribution. This work tries different models on a new benchmark dataset, PMechDB. Experimental results demonstrate the effectiveness of the transition-state information in the new benchmark dataset.

Strengths

The motivation is quite clear. This reviewer agrees on the importance of exploring intermediate electron transfer. This is particularly important for chemical reaction simulation, benefiting the understanding of reaction mechanisms.

Weaknesses

(1) The technical contribution of this work is very limited. This reviewer does not see enough improvements from the algorithm side. Also, it seems the dataset is not proposed by this work. The contribution of this work is overall limited.

(2) If this work intends to propose a new benchmark, then much more comprehensive reaction models should be covered. Currently, two important reaction models are not discussed: "non-autoregressive electron redistribution modeling for reaction predictions" and "A Generative Model For Electron Paths." In addition, the evaluation metric and the new task are not clearly described. More detailed descriptions should be provided for clarity.

(3) The presentation of this work is not very clear. This reviewer does not fully understand how the multi-step information helps the reaction modeling. A good example illustrating the significance of the intermediate step information is required. At this stage, this reviewer thinks the multi-step transition information can be easily captured by recursive modeling of single-step reaction models. Currently, this reviewer does not see what new challenges are brought by the intermediate step.

(4) This work is very similar to the published paper "AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning." This reviewer does not see many differences between the submitted work and this prior work.

Questions

Was this paper published at NeurIPS 2023 as "AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning"? The content seems very similar. If not, what are the differences between this work and that previous work?

Limitations

Constructing the benchmark dataset with ground-truth multi-step electron transition states is very hard. This may hinder the further development of this direction.

Author Response

Reviewer Comment: The technical contribution of this work is very limited. This reviewer does not see enough improvements from the algorithm side. Also, it seems the dataset is not proposed by this work. The contribution of this work is overall limited.

Response: For novel ML architectures, we present the two-step transformer. However, the major contribution of the manuscript is to address the prediction of polar reactions in an explainable way. Polar reactions are an extremely complex, commonly encountered, and fundamental group of chemical reactions in organic chemistry. Predicting polar reactions is one of the most difficult problems in AI applied to science, and understanding them is necessary for designing synthetic pathways. We train models on a newly introduced dataset and are able to provide interpretable predictions of polar mechanisms with high accuracy.

Reviewer Comment: If this work intends to propose a new benchmark, then much more comprehensive reaction models should be covered. Currently, two important reaction models are not discussed: "non-autoregressive electron redistribution modeling for reaction predictions" and "A Generative Model For Electron Paths." In addition, the evaluation metric and the new task are not clearly described. More detailed descriptions should be provided for clarity.

Response: The number of model comparisons we could include was limited by the NeurIPS page limit. However, the reviewer is correct that these models are interesting; their modeling of electron flows makes them relevant to this project. The NERF model from the "non-autoregressive electron redistribution modeling for reaction predictions" paper will be included in the updated manuscript.

Reviewer Comment: The presentation of this work is not very clear. This reviewer does not fully understand how the multi-step information helps the reaction modeling. A good example illustrating the significance of the intermediate step information is required.

Response: An example illustrating the importance of intermediate steps will be added to the manuscript. We have included a figure demonstrating this as Figure 1 of the attached PDF. This figure illustrates the creation of an unwanted side product in an intermediate step of the synthetic pathway for the drug Deucravacitinib, which led to a decrease in the overall purity of the products. By modeling the overall transformation as a series of elementary steps, competing pathways can be identified and the synthetic pathway can be optimized to reduce unwanted side products and increase purity and yield.

Reviewer Comment: At this stage, this reviewer thinks the multi-step transition information can be easily captured by recursive modeling of single-step reaction models. Currently, this reviewer does not see what new challenges are brought by the intermediate step.

Response: We are unsure what the reviewer is asking.

If the “single-step” reaction models refer to the mechanistic models: This is exactly what we are doing. The mechanistic models are designed so that they can be recursively applied to generate a series of elementary steps which can describe an overall transformation.

If the “single-step” reaction models refer to overall-transformation reaction models: the task of predicting elementary steps is inherently different from the task of predicting overall transformations. To make a simple analogy, consider the reactants to be a set of cooking ingredients: the overall-transformation model predicts the dish being cooked, while the mechanistic model predicts the product of each step of the recipe. Chaining the overall-transformation model will not produce a pathway because it does not predict intermediate steps; it only predicts the final product.

Reviewer Comment: Is this paper published at NeurIPS 2023 as "AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning"? It seems the content is very similar. If not, what are the differences between this work and the previous work?

Response: The main contribution of this work is to present a predictor for elementary polar reaction steps. The AI for Interpretable Chemistry paper addresses the prediction of radical reaction steps. Polar reactions are a much more diverse group of reactions than radical reactions. We train on a larger training set, containing a different type of reaction, whose mechanisms are considerably more diverse and complex.

Review
Rating: 7

The paper describes a new approach to predict polar reaction mechanisms, which are the most important class of chemical reaction mechanisms. This can be quite useful for chemical reaction prediction.

This reviewer's rating is based on the current presentation of the manuscript; if the authors are willing to enhance the clarity of the manuscript, this reviewer is willing to increase their score.

Strengths

  • Addressing an underexplored but important problem
  • Decent results
  • Interesting and in some cases surprising results with pre-trained methods (T5 does not seem to work as well despite multi-task pretraining)

Weaknesses

  • Model and data processing descriptions are quite short; they should be expanded and presented coherently in one location in the manuscript. From the description in the manuscript I would likely not be able to re-implement the method.
  • It is not immediately clear which ensemble is shown in Table 4
  • Maybe not so much innovation from the ML side?

Questions

  1. What could be the reason the T5 model does not work so well, even though it is pre-trained on multiple tasks?
  2. Why did the authors not consider employing more advanced GNN models or the models from Kayala et al.?

Tiny detail: line 105, "We utilize the innovative text-based reaction predictor, Molecular Transformer" - the Molecular Transformer is now a five-year-old model; maybe do not call it innovative anymore?

Related prior work by Segler (https://doi.org/10.1002/chem.201605499) and in particular by Bradshaw (https://openreview.net/forum?id=r1x4BnCqKX) should be cited.

Limitations

ok

Author Response

Reviewer Comment: Model and data processing descriptions are quite short and should be expanded, and presented coherently in one location in the manuscript. From the description in the manuscript I would likely not be able to re-implement the method

Response: The transformer models are publicly available. The two-step models are described in detail in the Kayala et al. paper. We will add an additional section in the appendix to further clarify the basis for the two-step methods.

Reviewer Comment: It is not immediately clear which ensemble is shown in table 4

Response: In Table 4, all of the ensembles are of the two-step transformer architecture. The table caption has been updated to clarify this.

Reviewer Comment: maybe not so much innovation from the ML side?

Response: For novel ML architectures, we present the two-step transformer. However, the major contribution of the manuscript is to address the prediction of polar reactions in an explainable way. Polar reactions are an extremely complex, commonly encountered, and fundamental group of chemical reactions in organic chemistry. Predicting polar reactions is one of the most difficult problems in AI applied to science, and understanding them is necessary for designing synthetic pathways. We train models on a newly introduced dataset and are able to provide interpretable predictions of polar mechanisms with high accuracy.

Reviewer Comment: What could be the reason the t5 model is not working so well, even though it's pre-trained on multiple tasks?

Response: One possible reason T5Chem is outperformed by Chemformer on the forward prediction task could be the pretraining. The Chemformer model is pre-trained on roughly 4.5 times more forward reactions than the T5Chem model.

However, it is worth noting that the performance of T5Chem is consistent with the original T5Chem paper. In the T5Chem paper, they compare the model to Molecular Transformer, where it offers an improvement of 2% to top-5 accuracy. In our paper, we observe a similar improvement of 4% to top-5 accuracy between T5Chem and Molecular Transformer. They do not compare their model to Chemformer in their paper.

Reviewer Comment: Why did the authors not consider employing more advanced GNN models or the models from Kayala et al?

Response: The models from Kayala et al were implemented and used for the two-step prediction as well as the two-step transformer.

We had applied more advanced GNNs to the data at the time of submission, but the results were incomplete. We will include GNN results in the updated version of the manuscript.

Reviewer Comment: Tiny details: line 105. "We utilize the innovative text-based reaction predictor, Molecular Transformer" - the MT is now a 5 year old model, maybe not call it innovative anymore?

related prior work by Segler https://doi.org/10.1002/chem.201605499 and in particular by Bradshaw https://openreview.net/forum?id=r1x4BnCqKX should be cited

Response: Both of these points have been fixed in the manuscript.

Comment

Thank you for your reply and your effort to address the points raised; I will adjust my score.

I'd still recommend improving the clarity of the writing, as other reviewers have also pointed out. The manuscript addresses an important and timely topic, which unfortunately does not come across as clearly as it should.

Author Response

We have carefully considered all comments given by the reviewers, and wish to thank them for helping us enhance the quality of the manuscript. In response to the reviewers’ feedback, we have prepared an additional pdf with figures to address specific comments.

Final Decision

The submission focuses on explainable polar reaction prediction. The problem has been recognized as interesting and challenging. However, the novel contribution of the work from the perspective of neural learning machinery is marginal. The review discussion has highlighted similarities with recently published works, and the authors' response on the differentiating factors does not highlight a substantially novel contribution to support acceptance.