PaperHub

Overall rating: 5.8/10 · Poster · 4 reviewers
Ratings: 7 / 5 / 6 / 5 (min 5, max 7, std 0.8)
Confidence: 3.5 · Correctness: 2.8 · Contribution: 3.3 · Presentation: 3.3
NeurIPS 2024

Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning

OpenReview · PDF
Submitted: 2024-05-16 · Updated: 2024-11-06

Keywords

AI for science; molecule structure generation; diffusion model; physical prior

Reviews and Discussion

Official Review
Rating: 7

This paper presents a method to improve different tasks simultaneously by combining datasets of different fidelities and focuses, utilizing scientific laws that connect the tasks. Predicting the energy and equilibrium structure of molecules is used as an example. Two forms of consistency losses are developed based on (1) the rule that equilibrium structure minimizes energy, and (2) the relation between the probability distribution of structure and energy. The benefits of consistency losses in multi-task learning are demonstrated on quantum chemistry datasets.

Strengths

  • Developing consistency loss functions is an elegant approach to incorporating scientific laws into machine learning.
  • The proposed method does not require additional data to improve performance.

Weaknesses

  • The demonstration of the method is limited to one scenario (energy + equilibrium structure), where the formulation of consistency losses is ad hoc. The broader applicability is thus questionable.
  • The review of related works is not comprehensive (see Questions).

Questions

  • This work essentially incorporates scientific laws by modifying loss functions. Other works, e.g., PINNs, follow a similar approach; should they be discussed as related works?
  • In the experiments, the demonstrated benefits of consistency learning are mainly about accuracy. While for structure prediction via diffusion, efficiency (e.g., the diffusion steps required to attain a near-equilibrium structure) is also important. Could you comment on the efficiency?
  • Training on 8× Nvidia V100 GPUs for a week is a considerable computational cost. How does the consistency loss affect computational cost and model convergence?

Limitations

Discussed in Sec. 5.

Author Response

Thank you for your dedicated effort in reviewing our paper! We deeply appreciate your acknowledgement of our contributions, as well as your informative feedback and suggestions.

Broader applicability

Thank you for the opportunity to elaborate on this point. Please refer to the global rebuttal (item 1).

Related work about PINN

Thank you for the suggestion! We have included a discussion on related works on PINN as follows:

Physics-informed neural networks (PINNs) [R1] are another example of incorporating physical laws into neural network training. The principle of PINNs is to represent the unknown target function with a neural network and optimize it using a loss function derived from a system of partial differential equations (PDEs), such as the variational form of the PDE [R2]. This approach has shown promise in solving PDEs across various applications, including higher-dimensional problems [R3, R4]. PINNs have also been applied to inverse problems by optimizing the parameters of PDEs [R5, R6]. More relevant to our case, PINNs can also tackle data heterogeneity. For instance, HFM [R7] uses the Navier-Stokes equations to connect the concentration of contrast agents in the bloodstream with dynamic quantities of blood flow such as velocity and pressure, and thereby infers the latter from concentration variations observed in medical imaging. Similarly, PhySR [R8] utilizes the physical laws underlying a system, such as those governing the 2D Rayleigh-Bénard convection system, to reconstruct high-resolution results from low-resolution data.
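For concreteness, the PINN principle of turning a PDE into a training loss can be sketched in a few lines. The sketch below is a toy collocation example of our own, not the setup of any cited work: it fits a polynomial surrogate (standing in for a neural network, for brevity) to the boundary-value problem u''(x) = 2 on [0, 1] with u(0) = u(1) = 0, whose exact solution is u(x) = x² − x, by gradient descent on the squared PDE and boundary residuals.

```python
# Collocation points in (0, 1); the PDE is u''(x) = 2 with u(0) = u(1) = 0.
xs = [0.1, 0.3, 0.5, 0.7, 0.9]

def loss_and_grad(a, b, c):
    """Squared PDE + boundary residuals for the surrogate u = a + b*x + c*x^2."""
    loss = 0.0
    ga = gb = gc = 0.0
    for _x in xs:
        r = 2.0 * c - 2.0        # PDE residual u'' - 2 (u'' = 2c for this surrogate)
        loss += r * r
        gc += 4.0 * r
    r0, r1 = a, a + b + c        # boundary residuals u(0) and u(1)
    loss += r0 * r0 + r1 * r1
    ga += 2.0 * r0 + 2.0 * r1
    gb += 2.0 * r1
    gc += 2.0 * r1
    return loss, ga, gb, gc

# Plain gradient descent on the physics-informed loss.
a = b = c = 0.0
for _ in range(20000):
    _, ga, gb, gc = loss_and_grad(a, b, c)
    a -= 0.01 * ga
    b -= 0.01 * gb
    c -= 0.01 * gc

u_half = a + b * 0.5 + c * 0.25  # exact solution u(x) = x^2 - x gives -0.25 here
```

The same loss structure (PDE residual at collocation points plus boundary terms) is what a PINN optimizes, just with a network in place of the polynomial.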

While sharing the same spirit, our work has some technical differences. Physical laws in molecular properties often do not come in the form of PDEs (unless solving the Schrödinger equation in its original form) but as algebraic or statistical equations. Our setting requires no spatial grids, but involves multiple quantities, e.g., energy and structure in our case, so the laws are used to bridge different prediction models rather than to learn a single model. Moreover, neural networks can be treated as black-box models in PINNs, while in our case, in-depth analyses are needed to connect model outputs to the desired quantities: Sec. 3.2 analyzes how to produce a rough structure prediction from the output of the denoising model for optimality consistency, and Sec. 3.3 analyzes how to compute the score using the denoising model for score consistency.

[R1] Raissi M, Perdikaris P, Karniadakis G E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations[J]. Journal of Computational Physics, 2019, 378: 686-707.

[R2] Yu B. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems[J]. Communications in Mathematics and Statistics, 2018, 6(1): 1-12.

[R3] Lu L, Meng X, Mao Z, et al. DeepXDE: A deep learning library for solving differential equations[J]. SIAM review, 2021, 63(1): 208-228.

[R4] Han J, Jentzen A, E W. Solving high-dimensional partial differential equations using deep learning[J]. Proceedings of the National Academy of Sciences, 2018, 115(34): 8505-8510.

[R5] Lu L, Pestourie R, Yao W, et al. Physics-informed neural networks with hard constraints for inverse design[J]. SIAM Journal on Scientific Computing, 2021, 43(6): B1105-B1132.

[R6] Yu J, Lu L, Meng X, et al. Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems[J]. Computer Methods in Applied Mechanics and Engineering, 2022, 393: 114823.

[R7] Raissi M, Yazdani A, Karniadakis G E. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations[J]. Science, 2020, 367(6481): 1026-1030.

[R8] Ren P, Rao C, Liu Y, et al. PhySR: Physics-informed deep super-resolution for spatiotemporal data[J]. Journal of Computational Physics, 2023, 492: 112438.

Efficiency for structure prediction via diffusion

Thank you for bringing up the efficiency consideration. Our methods introduce some additional cost in the training stage in exchange for going beyond the accuracy level of the training data, but they do not alter how structures are generated in the inference stage, so the efficiency of structure prediction is unaffected. To improve efficiency, one can directly leverage prevailing techniques that reduce the number of diffusion steps, e.g., DDIM [43], Heun's method [R9], and DPM-Solver [R10].
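As a minimal illustration of such step-reduction techniques, the deterministic DDIM update can be sketched as follows. Everything here is a toy assumption of ours, not the paper's model: a 1-D point-mass data distribution, a linear-beta schedule, and a closed-form stand-in for the trained denoiser (for which the optimal noise prediction is known exactly).

```python
import math
import random

# Linear-beta variance schedule: alpha_bar[t] for t = 0..T-1.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bar = []
prod = 1.0
for beta in betas:
    prod *= 1.0 - beta
    alpha_bar.append(prod)

x0_true = 1.7  # toy 1-D "equilibrium structure"

def eps_model(x_t, t):
    # Hypothetical stand-in for a trained denoiser: for point-mass data
    # the optimal noise prediction has this closed form.
    ab = alpha_bar[t]
    return (x_t - math.sqrt(ab) * x0_true) / math.sqrt(1.0 - ab)

def ddim_sample(num_steps, seed=0):
    """Deterministic DDIM sampling on a sub-schedule of num_steps steps."""
    rng = random.Random(seed)
    ts = [round(i * (T - 1) / num_steps) for i in range(num_steps, 0, -1)]
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for i, t in enumerate(ts):
        eps = eps_model(x, t)
        # Predict the clean sample, then jump to the next (coarser) time step.
        x0_pred = (x - math.sqrt(1.0 - alpha_bar[t]) * eps) / math.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[ts[i + 1]] if i + 1 < len(ts) else 1.0
        x = math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps
    return x

# For this idealized denoiser, even 5 DDIM steps recover x0 to high precision.
```

The point of the sketch is that the sampler only changes how the trained denoiser is queried, which is consistent with the reply above: consistency training leaves inference untouched, so such samplers can be layered on directly.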

[R9] A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas. Gotta go fast when generating data with score-based models. CoRR, abs/2105.14080, 2021.

[R10] Lu C, Zhou Y, Bao F, Chen J, Li C, Zhu J. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems. 2022 Dec 6;35:5775-87.

Computational cost

Thank you for your careful read! Consistency training indeed introduces more training effort, but we adopted the implementation design described in Appendix B.3, so the additional cost remains manageable. The main reason we need such a computational budget is that we used a relatively large model (around 130M parameters) to sufficiently capture the information in PM6, which is the largest public molecular dataset with DFT-level energy labels.

Comment

I appreciate the authors' comprehensive response, which addresses my concerns well. With the clarification questions answered, I could raise the score for Presentation to 4.

Comment

Thank you for your update! We are glad to know that our reply has addressed your concerns. We will include these additional clarifications in our paper.

Official Review
Rating: 5

The paper proposes a scientific-consistency-based improvement to the molecular structure and energy prediction tasks. On top of the diffusion process for structure prediction, the authors incorporate energy-guided losses, which enable direct information exchange between the two tasks. On two benchmark datasets, the proposed consistency losses improve over the naive multi-task learning framework.

Strengths

  • The Methods section is well written.
  • It is notable that prediction performance is improved by exploiting the scientific correlation between molecular structure and energy, without additional data.

Weaknesses

  • The title and abstract of the paper are too general in relation to the specific task performed.
  • The proposed method's utility is limited as it can only be applied to tasks with a strong correlation, such as molecular structure and energy.
  • The paper specifies 200 test molecules, but it does not explain the criteria for their selection or why the number 200 was chosen.
  • In the experimental process, PM6 was used for pretraining, and identical molecules from PCQ and QM9 were removed; however, there is no mention of removing structurally similar molecules.

Questions

  • Do the authors think the proposed method would be useful for other tasks besides molecular structure and energy? If not, how about limiting the scope of the paper to molecular structure?
  • How was the number 200 determined for the test molecules?
  • What do the authors think the results would be if molecules from PCQ and QM9 with a similarity above a certain threshold (e.g., Tanimoto similarity 0.7) were additionally removed and the experiments were conducted?

Limitations

  • The effectiveness of the proposed method is uncertain as there are no experiments applying it to the latest diffusion-based methods.

Author Response

Thank you for your devoted effort in evaluating our paper! We appreciate your informative feedback and solid suggestions.

About the title and abstract

Thank you for your feedback. Please refer to the global rebuttal (item 2).

Utility of the proposed method

Thank you for the opportunity to elaborate on this point. Please refer to the global rebuttal (item 1).

About test molecules

The 200 test molecules are uniformly randomly selected from the PCQ (or QM9) dataset to guarantee that they are in-distribution with the whole PCQ (or QM9) dataset, while excluding molecules that also appear in the PM6 (training) dataset. This setting, including the number 200, follows previous mainstream structure generation works, including ConfGF [39], GeoDiff [53], and DMCG [62].

Structurally dissimilar test molecules

Thank you for the insightful suggestion! It makes sense to exclude structurally similar molecules from the test set. Nevertheless, we found that our selected 200 test molecules are already sufficiently dissimilar from the PM6 (training) dataset. For each test molecule, we take the fraction of PM6 molecules that have a Tanimoto similarity larger than 0.7 with the test molecule as a measure of its similarity to the PM6 dataset. Figure R2 in the pdf file from the global rebuttal shows the distribution of this measure. We can see that most (almost all) of the 200 molecules have a fraction of similar molecules in PM6 below 1e-7 (2.5e-7). This indicates that the presented results already reflect performance on structurally dissimilar test molecules.
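The filtering statistic described above can be sketched as follows, assuming fingerprints are represented as bit sets. The actual pipeline presumably uses a cheminformatics toolkit (e.g., RDKit with Morgan/ECFP fingerprints); `similar_fraction` is our hypothetical helper name, not from the paper.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 1.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def similar_fraction(test_fp, train_fps, threshold=0.7):
    """Fraction of training-set molecules with Tanimoto > threshold to test_fp."""
    hits = sum(1 for fp in train_fps if tanimoto(test_fp, fp) > threshold)
    return hits / len(train_fps)
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` is 2/4 = 0.5, below the 0.7 threshold; a test molecule is "dissimilar" when its `similar_fraction` over the training set is (near) zero.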

Applicability to latest diffusion-based methods

We'd like to mention that our consistency learning methods are model-agnostic and can be applied to any diffusion-based model. This is because such models all predict the score at each diffusion time step (predicting the noise or the clean sample (denoising) is equivalent to predicting the score), so at small time steps, the score should align with the energy gradient (score consistency), and at larger time steps, the score can be used to predict the equilibrium structure through the denoising formulation (Eq. 4) (optimality consistency).
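The small-time relation underlying score consistency, score(R) = ∇ log p(R) = −∇E(R) for a Boltzmann distribution p(R) ∝ exp(−E(R)), can be checked numerically. The sketch below is a toy 1-D harmonic potential of our own (temperature factor absorbed into the energy), not the paper's model:

```python
import math

K = 3.0  # hypothetical force constant of a toy 1-D harmonic potential

def energy(x):
    return 0.5 * K * x * x

def log_p_unnorm(x):
    # Boltzmann distribution p(x) ∝ exp(-E(x)).
    return -energy(x)

def score_fd(x, h=1e-5):
    # Finite-difference score d/dx log p(x); the normalizing constant drops out.
    return (log_p_unnorm(x + h) - log_p_unnorm(x - h)) / (2.0 * h)

x = 0.8
force = -K * x  # negative energy gradient, i.e., the force
# score_fd(x) ≈ force: the equality that the score-consistency loss enforces
# for the model's predicted score at small diffusion times.
```

This is exactly why force labels (negative energy gradients) can supervise the score head of any diffusion-based structure model, as argued above.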

Due to limited time and computational resources during the rebuttal period, we are unable to provide results on more diffusion-based methods, but we'd be happy to try if you could specify the method in your mind.

Comment

Thank you for the time and effort of the authors. The authors' rebuttal has addressed most of the concerns, but there are still doubts regarding the applicability of the paper and structurally dissimilar molecules between PM6 and training set of PCQ (or QM9). For this reason, I will maintain my original score.

Comment

Thank you for sharing your updated comments! We are glad to know that our reply has addressed most of your concerns.

Regarding the applicability of the paper

We have made a clarification in the global rebuttal (item 1). We'd like to further highlight that beyond the utility of improving equilibrium structure prediction with energy, in Secs. 3.4 and 4.3 we also showed the utility of leveraging force labels and off-equilibrium structures to further improve structure prediction. This type of data heterogeneity is perhaps more ubiquitous than it seems.

On one hand, these tasks bear a central importance and cover most problems in molecular science. The equilibrium structure provides a direct understanding of important properties of a molecule, e.g., a quick judgement of whether it can bind to a protein target, and is the prerequisite for calculating many properties, e.g., the phonon spectrum. Energy and force are central to molecular dynamics simulation, which is perhaps the most important way to study the functions and macroscopic properties of a molecule.

On the other hand, data heterogeneity between energy, force and equilibrium structure is ubiquitous. As we mentioned in Lines 35-40, generating an equilibrium structure data point requires repeated energy calculations, which is inherently orders of magnitude more costly than generating an energy label, so structure data are usually generated using a less costly but also less accurate method (there is a long-standing accuracy-efficiency trade-off in data generation methods), causing data heterogeneity.

Moreover, we have explained in the Conclusion and in the global rebuttal that the proposed methods can be directly applied to leverage energy and force to improve thermodynamic ensemble sampling, which is a different task from structure prediction but also widely concerned since it can estimate statistical properties and functions of molecules.

We would be more than happy if you could specify your doubts regarding the applicability.

Regarding evaluation on structurally dissimilar molecules

There seems to be a misunderstanding (if not a misspelling) in your description: we did not include any PCQ (or QM9) molecules in training (which only uses PM6 molecules). The test molecules are those from PCQ (or QM9) that do not appear in the PM6 dataset.

In the previous reply, we have shown that our results already constitute an evaluation on dissimilar molecules. Figure R2 in the pdf file attached to the global rebuttal shows that the test molecules have very few, if any, similar (Tanimoto similarity > 0.7, as you suggested) molecules in the training dataset. In particular, 49% of the PCQ test molecules do not have any similar molecules in the PM6 (training) dataset, and more than 80% of them have a fraction of similar molecules in PM6 below 0.000008% (8e-8).

We'd also like to point out that even if there were identical molecules (in terms of the same molecular graph, or equivalently, SMILES) in the test set, a well-learned model would not necessarily achieve a good evaluation result by memorization: only the less accurate PM6-level equilibrium structures (recall that an equilibrium structure refers to the 3D structure of a molecule; see Lines 125-126; note "A molecule (a given SMILES) in physical reality can take different structures") are available in training, while the evaluation compares model-predicted structures against DFT-level (more accurate) equilibrium structures. So the improvement in the evaluation results is solid evidence that the proposed consistency learning takes effect.

For a completely sanitized evaluation on dissimilar molecules, below we provide the results evaluated on the 49% of PCQ test molecules that do not have any similar molecules in the PM6 (training) dataset. Due to limited time, we provide the results (in terms of both RMSD and Coverage (see Appendix C.1)) only in the denoising generation setting corresponding to Tables 1 and 2, and will provide results in other settings in the revision.

| Training Set | Method | Mean RMSD (Å) ↓ | Min RMSD (Å) ↓ | Mean Cov ↑ | Median Cov ↑ |
|---|---|---|---|---|---|
| PM6 (Table 1) | Multi-Task | 1.175 | 0.642 | 0.613 | 0.675 |
| | Consistency | 1.135 | 0.625 | 0.644 | 0.745 |
| PM6 + SPICE force (Table 2) | Multi-Task | 1.136 | 0.609 | 0.639 | 0.735 |
| | Consistency | 1.121 | 0.579 | 0.672 | 0.790 |
| PM6 + subset force (Table 2) | Multi-Task | 1.174 | 0.653 | 0.612 | 0.660 |
| | Consistency | 1.099 | 0.616 | 0.697 | 0.830 |

We can see that the proposed consistency learning method still universally outperforms the baseline. The improvement is even larger than in Tables 1 and 2. This result is direct and solid verification that the improvement does not come from memorizing training data.
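For reference, the two metrics in the table above can be sketched as follows. This is a simplified version of ours: it assumes matching atom order and prior structural alignment, and the coverage threshold `delta` is a hypothetical value; the paper's actual definitions are in its Appendix C.1.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two conformations.

    Simplified: assumes matching atom order and prior alignment.
    """
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def coverage(min_rmsds, delta=1.25):
    """Fraction of reference structures matched within threshold delta (Å).

    delta = 1.25 is a hypothetical threshold, not taken from the paper.
    """
    return sum(r < delta for r in min_rmsds) / len(min_rmsds)
```

For example, two 2-atom conformations differing by 1 Å on one atom give an RMSD of sqrt(1/2) ≈ 0.707 Å, and coverage is simply the fraction of per-reference minimum RMSDs below the threshold.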

We would be more than happy if you could specify your doubts if there are still any.

Official Review
Rating: 6

To handle heterogeneity in molecular data and different computational costs, authors propose to exploit molecular tasks that have scientific laws connecting them. Their results show that the more accurate energy data can improve the accuracy of structure prediction. Authors highlight that in contrast to conventional machine learning tasks defined by data, scientific tasks originate in fundamental scientific laws. Scientific laws impose explicit constraints between tasks defining the scientific consistency between model predictions on these tasks. By enforcing such consistency, model predictions for different tasks are connected and can explicitly share the information in the data of one task to the prediction for other tasks, hence bridging data heterogeneity. Authors demonstrate the practical value of the scientific consistency between energy prediction and equilibrium structure prediction. Authors demonstrate the advantages of incorporating the proposed consistency losses into multi-task learning. Authors design consistency losses to enforce scientific laws between inter-atomic potential energy prediction and equilibrium structure prediction.

Strengths

To handle heterogeneity in molecular data and different computational costs, authors propose to exploit molecular tasks that have scientific laws connecting them. Their results show that the more accurate energy data can improve the accuracy of structure prediction. Authors highlight that in contrast to conventional machine learning tasks defined by data, scientific tasks originate in fundamental scientific laws. Scientific laws impose explicit constraints between tasks defining the scientific consistency between model predictions on these tasks. By enforcing such consistency, model predictions for different tasks are connected and can explicitly share the information in the data of one task to the prediction for other tasks, hence bridging data heterogeneity. Authors demonstrate the practical value of the scientific consistency between energy prediction and equilibrium structure prediction. Authors demonstrate the advantages of incorporating the proposed consistency losses into multi-task learning. Authors design consistency losses to enforce scientific laws between inter-atomic potential energy prediction and equilibrium structure prediction.

Weaknesses

As the authors point out, this work is limited to the consistency between energy and molecular structure prediction, while more consistency laws could be considered in molecular science, and the significance of the improvement in this work is still limited by the abundance of the involved data.

Questions

As the authors point out, the significance of the improvement in this work is still limited by the abundance of the involved data. Can the authors provide a measure to quantify the abundance of data for molecular equilibrium structure prediction?

Limitations

As the authors point out, this work is limited to the consistency between energy and molecular structure prediction, while more consistency laws could be considered in molecular science, and the significance of the improvement in this work is still limited by the abundance of the involved data.

Author Response

Thank you for your effort in evaluating our paper. Your informative feedback is greatly appreciated.

Consistency Beyond Energy and Structure

Thank you for the opportunity to elaborate on this point. Please refer to the global rebuttal (item 1).

Abundancy of involved data

In consistency training, the energy landscape matters: the energy model needs to rank different structures of each molecule in optimality consistency, and to provide the gradient (slope) in score consistency. Although the PM6 dataset [31] is already the largest public DFT-labeled molecular dataset (to the best of our knowledge), it provides energy on only one structure per molecule, which may be insufficient for learning the landscape. So we tried leveraging more data (Secs. 3.4, 4.3): we generated force (negative energy gradient) labels on a subset of PM6 molecules, and leveraged the force labels of multiple structures per molecule in the SPICE dataset [11]. They indeed improve the energy landscape (Table 7: better energy prediction results) and lead to more accurate structure prediction with consistency (Table 2 vs. Table 1), but data abundance is still limited: the generated data do not cover multiple structures, and there is a mismatch in the DFT settings of the SPICE labels. We did not find public datasets providing energy or force labels under the same DFT setting as the PM6 dataset on multiple structures per molecule, and we did not have sufficient resources to generate such data. We expect more significant benefits of consistency training if such datasets become available.

We'll include these discussions in our next version.

Official Review
Rating: 5

 

The authors consider the multitask learning setting for molecular structure and energy prediction where the fidelity of the labels differs between tasks [1]. The authors note that they can leverage the relationship between high fidelity labels (energy) and low fidelity labels (structure) to design loss functions, a) the optimality consistency loss and b) the score consistency loss that operate as inductive biases in the multitask learning setting. Given that the proposed method is straightforward and appears to work well empirically, I consider the paper borderline at the moment since the code has not been supplied to reproduce the experimental results. I will revise my score if the code can be provided to ensure the reproducibility of the reported results.

 

Strengths

 

The method introduced by the authors is straightforward and appears to work well empirically.

 

Weaknesses

 

MAJOR POINTS

 

  1. The title is not fully descriptive of the authors' contribution. I would recommend the authors revise the title to be more descriptive of the paper content. Specifically, the contribution does not apply to molecular science as a whole, but rather to structure and energy prediction. From the title alone, it is also unclear what the meaning of scientific consistency is, given that this appears to be a neologism coined by the authors.

 

MINOR POINTS

 

  1. It would be great if the references appeared in numbered order.

  2. Line 2, typo, "at scale".

  3. There are some discrepancies in capitalization in the references e.g. 3d and 3D.

  4. Line 38, typo, "hundreds of times more costly".

  5. The original paper on molecule generation with VAEs [2] should probably be cited at some point in the text.

  6. Given that the standard deviations are crucial to establishing the improvement afforded by the consistency loss approach, I would recommend providing them in the main text instead of in the appendix. It may also be possible to perform a paired t-test to assess the statistical significance of the results e.g. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html

  7. The axis labels of Figure 2 are somewhat confusing. The y-axis label is the predicted energy on the model-generated structure whereas the x-axis label is the energy of the equilibrium structure in the dataset. Would it be more appropriate to label the x-axis ground truth energy? The axis labels are confusing because the "predictor" for both energies is different.

  8. In Figure 2, why is there a systematic deviation in the predicted energy of R_pred relative to R_eq?

  9. ADAM, reference 22, was published at ICLR 2015.

  10. The details of the model should be provided in the main paper rather than the appendix.

  11. In Section 4.4 the authors refer to the models from Section 4.2. It is not clear what these models are or how they were pre-trained. I would suggest including this in the main text.

  12. Line 375, "by the abundance of the data involved".

  13. It would be worth mentioning [3] as a reference source for multitask learning at some point in the text.
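The paired t-test suggested in minor point 6 can be sketched without scipy; the t statistic below matches what `scipy.stats.ttest_rel` computes (obtaining the p-value additionally requires the Student-t CDF, which scipy provides directly):

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of a paired t-test.

    Null hypothesis: the mean of the paired differences is zero.
    scipy.stats.ttest_rel computes the same t and also the p-value
    from the Student-t distribution with these degrees of freedom.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Toy paired results (e.g., per-molecule metrics of two methods):
t, df = paired_t_statistic([1.0, 3.0, 1.0, 3.0], [1.0, 1.0, 1.0, 1.0])
# differences [0, 2, 0, 2]: t = sqrt(3) ≈ 1.732, df = 3
```

Pairing by molecule is what makes the test appropriate here: both methods are evaluated on the same test set, so the differences, not the raw scores, carry the signal.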

 

REFERENCES

 

[1] Peherstorfer, B., Willcox, K. and Gunzburger, M., 2018. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. Siam Review, 60(3), pp.550-591.

[2] Gómez-Bombarelli, R., Wei, J.N., Duvenaud, D., Hernández-Lobato, J.M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T.D., Adams, R.P. and Aspuru-Guzik, A., 2018. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2), pp.268-276.

[3] Caruana, R., 1997. Multitask learning. Machine learning, 28, pp.41-75.

 

Questions

 

  1. In Section C.1, the authors state that Tables 4 and 5 contain the std of the predictions, however, the caption states that the table contains test coverage values. The caption of Table 4 also indicates that lower is better. Is this the correct direction?

  2. Could the authors produce standard errors for the results presented in Table 7 of the appendix?

 

Limitations

 

  1. One limitation of the work is the realm of applicability of the method since it applies specifically to molecular structure and energy prediction.

  2. At the current point the authors have yet to provide code to reproduce the results.

 

Author Response

Thank you for your dedicated effort in evaluating our paper! We appreciate your careful read, and are grateful for your feedback and suggestions.

About the title

Thank you for your informative feedback. Please check item 2 of the global rebuttal.

The minor points

  1. Thank you for the professional suggestion. We have revised the references in numbered order.

  2. Thank you for pointing it out. We have revised it.

  3. Thank you for your careful read! We have revised the capitalization.

  4. Thank you. We have revised it.

  5. Thank you for your suggestion. We have cited the paper in Introduction.

  6. Thank you for your suggestion. We have managed to insert standard deviation (in a smaller font) in the main tables.

    Following your suggestion, we also conducted paired t-tests on all results in the main tables (1-3). Please check Table R1 in the pdf file from the global rebuttal. Only 4 out of 32 cases have a p-value > 0.05, and those are all under the "Min" case, where multi-task learning also has a chance to hit the target structure. The means are indeed close in those 4 cases. We have included these results in the revision.

  7. Thank you for your feedback. We'd like to clarify that for the x-axis of Figure 2, the structures are from the PCQ dataset (vs. predicted by the model for the y-axis), but the energies of the structures are predicted by the energy prediction model (the same as for the y-axis). The purpose of this figure is to verify that the improved structure prediction accuracy from consistency training is indeed due to the predicted structures achieving a lower DFT-level energy (instead of, e.g., a better fit to the PM6 structures); hence it is the consistency learning that makes the model predict structures closer to DFT-level equilibrium structures. We did not compare ground-truth energies since we do not have DFT energy labels for the model-predicted structures. The energy prediction model is trained on DFT-level energy labels on PM6 structures, hence can be used as a surrogate to evaluate DFT-level energy.

  8. Alignment of predicted energy of R_pred and R_eq (i.e., points are on the diagonal) means the model predicts the same as R_eq, which is not what we expected, since the model does not see any DFT-level equilibrium structures in training (note only the less accurate, PM6-level equilibrium structures are available for training, while R_eq are DFT-level equilibrium structures from the PCQ dataset as the ground truth for evaluation). Instead, the point of Figure 2 is that consistency learning (still no DFT-level equilibrium structure data!) drives the model to predict structures closer to R_eq, which is verified in Figure 2 as the orange points (consistency) lie closer to the diagonal than the blue points (multi-task).

  9. Thank you. We have revised the citation.

  10. Thank you for your suggestion. We have added model details in Sec. 4.1.

  11. Thank you for your feedback. We meant that the pre-trained models for the finetuning experiments in Sec. 4.4 are those trained under the settings described in Sec. 4.2 (the models corresponding to the results in Table 1). The detailed training settings for Sec. 4.2 are provided in Appendix B.3, which we have moved to Sec. 4.1 in the revision.

  12. Thank you. We have revised it.

  13. Thank you for your suggestion. We have added the reference in the introduction of multitask learning.

About the questions

  1. Thank you for your careful read! We apologize for the confusion. We have revised Sec. C.1 such that Tables 4 and 5 contain the coverage results, and revised the caption such that it is the higher the better. The corresponding std results are presented in Tables 9 and 10.

  2. As the scale of energy depends on the size of molecule (energy is an extensive quantity), which also affects the standard deviation of the error, we provide a box plot (Figure R1 in the pdf file from the global rebuttal) for the energy MAE in each case corresponding to Table 7, which provides more details about the error distributions. Comparing results in each column, we see that training with consistency losses does not hurt energy prediction. Comparing results in each row, we find including force data in training leads to a more accurate energy model, which explains that consistency learning performs better in this case (comparing Table 2 to Table 1). This aligns with the observations from Table 7.

About the limitations

  1. Please refer to the global rebuttal (item 1) for broader applicability.

  2. As we mentioned in the Paper Checklist (item 5), releasing code requires an internal asset release review process within our organization. We have started and have been pushing the process, but unfortunately it is not yet complete, and we cannot guarantee availability during the review period. We have provided implementation details in Appendix B and will provide more to ensure reproducibility, and we will push further to guarantee the release by the time of publication.

Comment

 

Many thanks to the authors for their rebuttal. I will confine my response to the outstanding points given that the remainder have been addressed by the authors.

 

  1. I would recommend using "null hypothesis" in place of "postulate" for the paired t-test.
  2. I think the confusion arises because the model-generated structure is denoted as R_pred. As far as I understand there is a) the model-generated molecule and b) the model-predicted energy of the molecule. Currently both the molecule itself is referred to as a prediction in addition to its energy being referred to as a prediction. Perhaps a notation such as R_gen for the model-generated molecule would help disambiguate these cases?
  3. Rather than the meaning of the figure, I was inquiring more as to the systematic deviation present in the figure. In other words all predicted energies lie higher than the predicted energy of the equilibrium structure. Could the authors remind me as to whether there is a constraint that enforces this in the model prediction?

 

In terms of code release, it is unfortunate that the authors are subject to an internal review process which will delay the release of the code. Please notify me if the internal review process completes before the end of the rebuttal period. One option might be to provide a blank anonymous GitHub repository link and to update it post-rebuttal once the internal code review completes. In that way the code release may still be accounted for prior to the final decision on the paper.

 

Comment

Thank you for your attention to our response, and for sharing your further thoughts and suggestions! We are glad to know that we have addressed most of your concerns, and are happy to discuss the remaining points further.

  1. Thank you for your suggestion! We will adopt the standard term "null hypothesis" when explaining the meaning of the numbers presented in the table.

  2. Thank you for your informative suggestion. In the submission, we treated "model-predicted structure" and "model-generated structure" as the same. We will switch to the latter term and correspondingly change the label R_pred to R_gen, as this reduces ambiguity.

    (In case of potential misunderstanding, we would like to mention that the energy, denoted $E_\mathcal{G}(\mathbf{R})$ in our paper, is a function of both a molecule (in terms of its molecular graph $\mathcal{G}$) and a 3D structure $\mathbf{R}$ of the molecule (the coordinates of the atoms in the molecule). In Figure 2, each point corresponds to one molecule $\mathcal{G}$ in the PCQ dataset (as the test dataset). Its x- and y-coordinates are the model-predicted energies of the ground-truth equilibrium structure $\mathbf{R}_\mathrm{eq}$ of the molecule $\mathcal{G}$ (available from the dataset) and of the model-generated structure $\mathbf{R}_\mathrm{gen}$ (formerly denoted $\mathbf{R}_\mathrm{pred}$) of the molecule $\mathcal{G}$, respectively. Your description is accurate if "the model-generated molecule" is meant to be "the model-generated structure of a given molecule".)

  3. To understand why the predicted energy of the model-generated structure $\mathbf{R}_\mathrm{gen}$ is higher than the predicted energy of the equilibrium structure $\mathbf{R}_\mathrm{eq}$ for all the molecules, please recall that the equilibrium structure is by definition the structure that achieves the minimal energy (Line 74). In Figure 2, the equilibrium structures are from the PCQ dataset and are ground-truth DFT-level equilibrium structures (produced by carrying out actual DFT calculations), and the energy-prediction model is trained on DFT-level energy labels from the PM6 dataset. If the energy-prediction model is well learned, then no structure (including the model-generated structure) can have a lower energy than the ground-truth DFT-level equilibrium structure, for any molecule.

    Note that the structure-generation model is trained on the PM6 dataset, whose structures are not DFT-level equilibrium structures, so the model-predicted energies (approximating DFT-level energies) of model-generated structures are systematically higher than those of the ground-truth DFT-level equilibrium structures. With the proposed consistency-learning technique, the systematic deviation becomes smaller, indicating that the model-generated structures become closer to the corresponding ground-truth DFT-level equilibrium structures.
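As a purely illustrative sketch of the argument above (this is not the authors' model: `predict_energy` is a toy quadratic stand-in for the learned energy, whose minimum plays the role of the equilibrium structure):

```python
# Toy illustration of the ordering behind Figure 2: if the energy model
# is well learned, no structure scores below the equilibrium structure.
import random

def predict_energy(coords):
    # Toy stand-in for a learned energy model E_G(R): a quadratic bowl
    # whose minimum at the origin plays the role of R_eq.
    return sum(x * x for atom in coords for x in atom)

random.seed(0)
R_eq = [(0.0, 0.0, 0.0)] * 5  # "equilibrium" structure (energy minimum)
# A "generated" structure: the equilibrium perturbed by Gaussian noise.
R_gen = [tuple(random.gauss(0.0, 0.1) for _ in range(3)) for _ in range(5)]

E_eq, E_gen = predict_energy(R_eq), predict_energy(R_gen)
# Points above the diagonal in Figure 2 correspond to E_gen >= E_eq;
# consistency learning shrinks (but cannot invert) this gap.
assert E_gen >= E_eq
```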

Thank you for the suggestion regarding code release. We have set up an anonymous GitHub repository, and will upload the code there once we have completed the internal review process. According to the NeurIPS review policy, this link should be sent to the area chair for verification before we can provide it to you. We will provide the link once we hear back from the area chair.

Comment

 

Many thanks to the authors for their clarifications. In terms of point 8, it now makes a great deal of sense that, under the assumption that the energy model is performant, the predicted energy of the equilibrium structure should be lower than that of the generated structure.

In terms of the code release please keep me updated regarding the response from the Area Chair regarding verification of the anonymous GitHub link. My score will be upgraded automatically to a 6 with addition of the code and the authors may be assured that I will note this in the reviewer-AC discussion phase.

 

Comment

Thank you for taking the time. We are glad that our clarification regarding point 8 has resolved your question, and we appreciate your willingness to upgrade the score based on the code release. However, as the deadline draws near, we have not yet received permission from the Area Chair regarding the code link; we have reached out but have not received a reply. If possible, could you assist by asking the Area Chair for the code link during the reviewer-AC discussion period? We would be grateful if you could consider raising your score accordingly, and we look forward to continued discussion in the reviewer-AC discussion phase.

Author Response

We thank all the reviewers for their careful read, informative feedback, and sincere suggestions. We provide responses to two common questions in this global rebuttal.

  1. Applicability beyond energy and structure (for Reviewers B2Rx, fhSX, p18M, 3Qs1)

    Within the presented content, beyond energy and equilibrium structure, we would like to mention that we also showed (Secs. 3.4 and 4.3) that leveraging force labels on off-equilibrium structures in the proposed consistency-learning methods can further improve the accuracy of equilibrium-structure prediction.

    As we mentioned in Sec. 5, the proposed methods can also connect energy and the thermodynamic distribution (going beyond predicting a single equilibrium structure to generating a thermodynamic ensemble of structures), since the score consistency still holds, and the optimality consistency can be adjusted to match model-derived structure statistics to macroscopic observations. This consistency training can potentially improve the accuracy of the distribution beyond data-based training, since data samples are often available only from unconverged simulations (hence biased). We left this as future work because it requires trickier and more elaborate training settings, evaluation protocols, and benchmarks, and chose to develop and demonstrate the methods in the structure-prediction scenario, which already fills the capacity of a conference paper.

    We also mentioned broader possibilities following the same idea. Molecular properties (e.g., energy) are derived from the electronic structure of the molecule following physical laws, and the coarse-grained structure distribution is a partial integration of the fine-grained structure distribution. We can design consistency training losses according to such laws to connect these tasks, tackling data heterogeneity in these cases.

  2. About the title (for Reviewers fhSX, p18M)

    To better describe the content, we plan to revise the title to "Tackling Data Heterogeneity in Molecular Energy and Structure by Enforcing Physical Consistency". We have also revised the abstract to focus on molecular energy and structure and their consistency.

    • The new title highlights the major scenario ("Energy and Structure") for the proposed consistency techniques. We'd like to mention that we also involved force and off-equilibrium structure data (Sec. 3.4, 4.3), extending the relevance beyond energy and (equilibrium) structure.

    • We hope that "Enforcing Physical Consistency" could convey the sense that there is a physical law connecting the two quantities, which defines a consistency between the two quantities and can be enforced by a loss. It is challenging to convey the precise meaning in the title, and we'd be more than glad if you have any suggestions.

Comment

Dear Area Chair,

We would like to express our sincere gratitude for overseeing the review process and for your time and effort in ensuring a fair and thorough evaluation of our submission.

As the discussion deadline approaches, we would like to remind you that we have provided an anonymous code link in the official comment section for your review. We kindly request your approval to share this link with reviewer fhSX.

We appreciate the feedback we have received from three of the reviewers in response to our rebuttal. We have addressed most of their concerns and are thankful for their valuable insights and suggestions.

However, we have not yet heard back from reviewer B2Rx. This reviewer acknowledged our contribution and raised a significant question regarding data abundance, which we have addressed in our response. We are eager to engage in further discussion on this topic.

Additionally, while we have thoroughly addressed reviewer p18M's concerns, some doubts remain regarding two specific points: the applicability of our approach, and the treatment of molecules in PM6 that are structurally dissimilar to the training sets of PCQ (or QM9). In our follow-up comments, we clarified the applicability of our approach and offered additional experimental results on dissimilar structures between the training and test sets. We also emphasized the structural dissimilarity of similar or identical molecules generated at two different levels of theory.

We are keen to continue the discussion, though we recognize that less than 24 hours remain. Reviewers have demanding schedules, which may lead to unavoidable delays; even so, we would sincerely appreciate any action you could take to help us obtain a response from these reviewers. Your help in this matter would mean a great deal to us.

Thank you once again for your dedication and support.

Best regards,
Authors

Final Decision

The paper introduces the novel approach of leveraging scientific laws to enforce scientific consistency between tasks in multi-task learning. The current work applies this approach to energy prediction and structure prediction, exploiting the relationship between structure and energy, and demonstrates improvement over vanilla multi-task learning. While the presented application is limited to energy prediction and structure prediction, the idea has broad applicability to other fields, especially for machine learning in scientific domains. Concerns regarding presentation and generalization to new chemical space are also sufficiently addressed.