PaperHub
Rating: 6.3/10 (Poster; 3 reviewers, scores 3/3/4; min 3, max 4, std 0.5)
ICML 2025

DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra

Submitted: 2025-01-22 · Updated: 2025-07-24
TL;DR

DiffMS incorporates discrete graph diffusion for de novo generation from mass spectra and reaches state-of-the-art performance.

Abstract

Keywords
AI4Science · Mass Spectra · Diffusion · Graph Neural Networks

Reviews and Discussion

Review (rating: 3)

The paper introduces DiffMS, a diffusion-based framework for generating molecular structures from mass spectra. DiffMS combines existing approaches to discrete graph diffusion (DiGress) with a pretraining framework built on an encoder-decoder transformer architecture. The authors conducted experiments and evaluations on two generation datasets, CANOPUS and MassSpecGym.

Questions for the Authors

Can the chemical formula derived using tools like SIRIUS be ambiguous or incorrect (e.g., due to low-resolution spectra)? If so, are there mechanisms to mitigate or improve such inaccurate formulae?

Claims and Evidence

Yes.

Methods and Evaluation Criteria

Yes.

Theoretical Claims

No.

Experimental Design and Analyses

Yes.

Supplementary Material

No.

Relation to Prior Literature

The method is built on discrete graph diffusion (DiGress) but integrates a pretraining strategy and the molecular formula as an additional constraint.

Essential References Not Discussed

No.

Other Strengths and Weaknesses

Strengths:

  • Strong performance: The authors show strong performance and outperform the previous approaches in the two benchmarks (CANOPUS and MassSpecGym).
  • Effectiveness of pretraining strategy: The paper includes ablation studies that demonstrate the effectiveness of the pretraining strategy on overall performance, indicating that pretraining on a large number of molecules is crucial for achieving high performance.

Weaknesses:

  • Reliance on external tools for formula determination: The method relies on external tools (e.g., SIRIUS) for formula determination, which could introduce errors into the predicted formula. While the paper argues that chemical formulae can be determined with sufficient accuracy, it does not address the potential errors from these tools or where they might fail.
  • Lack of discussion on computational requirements: The paper lacks a discussion of the runtime and resource requirements of the proposed method.

Other Comments or Suggestions

No.

Author Response

We appreciate the reviewer’s thoughtful feedback and for highlighting the novelty of our discrete diffusion method and pretraining strategies. Below, we address the concerns regarding the accuracy of formula annotation:

Reliance on external tools for formula determination: The method relies on using external tools (e.g., SIRIUS) for formula determination, which could lead to some errors in the predicted formula. While the paper argues that chemical formulae can be determined with sufficient accuracy, it does not address the potential errors from these tools or where they might fail.

We thank the reviewer for raising an important question about the difficulty of chemical formula inference. Firstly, we would like to point out that the top performing baseline models on CANOPUS, MIST + Neuraldecipher and MIST + MSNovelist, also require the chemical formula to be known. However, to show that this precondition is not restrictive, we provide a new experiment on the CANOPUS dataset which shows that DiffMS achieves state-of-the-art performance even without the ground truth formulae. Specifically, we use MIST-CF [1], a formula prediction tool, to label chemical formulae from spectra, where we find that it achieves 92% top-5 formula annotation accuracy on CANOPUS. To have a fair comparison, we still sample 100 molecules, but split across the top-5 candidate formulae. As before, we order the generated molecules by frequency to obtain the top-10 DiffMS predictions. Below are the results, where we find that DiffMS performance does not significantly deteriorate without access to ground truth formulae:

| Model | ACC@1 | MCES@1 | Tanimoto@1 | ACC@10 | MCES@10 | Tanimoto@10 |
| --- | --- | --- | --- | --- | --- | --- |
| DiffMS: Predicted Formulae | 7.03% | 11.81 | 0.36 | 14.98% | 9.39 | 0.48 |
| DiffMS: True Formulae | 8.34% | 11.95 | 0.35 | 15.44% | 9.23 | 0.47 |

Altogether, we demonstrate that DiffMS can recover true molecules using established formula inference tools; and, under parallel formula strategies, DiffMS can still yield molecules with high structural similarities even when the true formula is not highly ranked. We will include these new results in the revised manuscript.
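The frequency-based ranking described above can be sketched in a few lines. This is a minimal illustration under our own assumptions: `rank_by_frequency` and the toy SMILES strings are ours, and a real pipeline would first canonicalize each sampled structure (e.g., with RDKit) so that identical molecules compare equal as strings.

```python
from collections import Counter

def rank_by_frequency(sampled_smiles, top_k=10):
    """Rank sampled candidate structures by how often they recur.

    Assumes `sampled_smiles` are already canonicalized, so identical
    molecules have identical string representations.
    """
    counts = Counter(sampled_smiles)
    return [smi for smi, _ in counts.most_common(top_k)]

# Toy illustration: the 100 samples split across candidate formulae
# would all be pooled into one list before ranking.
samples = ["CCO"] * 5 + ["CCN"] * 3 + ["CCC"] * 2
print(rank_by_frequency(samples, top_k=2))  # ['CCO', 'CCN']
```

Pooling samples across the top-5 candidate formulae before ranking, as in the experiment above, lets frequently regenerated structures rise to the top regardless of which formula produced them.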

Lack of discussion on computational requirements: The paper lacks a discussion of the runtime and resource requirements of the proposed method.

We thank the reviewer for asking this important question about the computational requirements. DiffMS is a relatively lightweight model. All DiffMS experiments were run on NVIDIA 2080 Ti GPUs, which have 11GB of memory. On these GPUs, training DiffMS takes around 1.5 minutes per epoch on CANOPUS and 45 minutes per epoch on MassSpecGym. We train DiffMS for 50 epochs on CANOPUS and 15 epochs on MassSpecGym, for total training times of 1.25 and 11.25 hours, respectively. Sampling from DiffMS takes around 4 minutes for all 100 molecules used to rank the top-10 predictions. We will include a table describing DiffMS computational requirements in the revised manuscript.

Can the chemical formula derived using tools like SIRIUS be ambiguous or incorrect (e.g., due to low-resolution spectra)? If so, are there mechanisms to mitigate or improve such inaccurate formulae?

We run a sample experiment demonstrating the use of MIST-CF [1], which achieves high formula accuracy and outperforms SIRIUS by a considerable margin. MIST-CF and other formula annotation tools are not perfect, and additional information from MS1 data collection, such as isotope information and a higher-resolution precursor peak, can further improve annotation considerably. In cases where formula inference is not accurate, MIST’s subformula annotation module can still produce plausibly accurate subformula annotations for individual peaks even with an incorrect formula; accordingly, we expect the fingerprints output by MIST under incorrectly provided subformulae to still carry informative substructural knowledge. Indeed, this is reinforced by the performance of unrestricted-formula DiffMS inference, where, although exact match accuracies slightly worsen, structural similarities are largely preserved. We hope to explore further mitigation strategies for formula misannotation in future iterations of the method.

[1] MIST-CF: Chemical Formula Inference from Tandem Mass Spectra, Goldman et al., https://pubs.acs.org/doi/10.1021/acs.jcim.3c01082

Review (rating: 3)

The paper introduces DiffMS, a diffusion-based model for generating molecular structures from mass spectra, addressing the "inverse" MS problem. It uses a pretraining-finetuning framework with large-scale fingerprint-structure datasets and achieves state-of-the-art performance on benchmarks like CANOPUS and MassSpecGym. The model incorporates chemical formula constraints and discrete graph diffusion, enabling accurate and diverse molecular generation.

Questions for the Authors

None

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

None

Experimental Design and Analyses

Yes, all

Supplementary Material

No

Relation to Prior Literature

The discrete diffusion model was proposed by DiGress (ICLR 2023).

Essential References Not Discussed

None

Other Strengths and Weaknesses

Strengths:

  • DiffMS is the first to apply discrete diffusion for molecular generation from mass spectra, handling permutation invariance and formula constraints effectively. This approach generates chemically plausible molecules, even when spectra underspecify the exact structure.

  • The model leverages large-scale fingerprint-structure datasets (2.8M pairs) for pretraining, improving performance with increased data. This scalable approach allows for future enhancements by expanding the pretraining dataset.

Weaknesses:

  • DiffMS struggles to predict molecules with high accuracy, as seen in its lower exact-match performance. This suggests limited applicability in real scenarios.

  • The model relies on accurate chemical formula inference, which may fail in low-resolution spectra or complex mixtures. This dependency could limit its applicability in real-world scenarios.

Other Comments or Suggestions

see above

Author Response

We appreciate the reviewer’s thoughtful feedback and for highlighting the novelty of our discrete diffusion method and pretraining strategies. Below, we address the reviewer’s concerns about the applicability of DiffMS and show that DiffMS performs comparably well without formula annotations:

DiffMS struggles with predicting molecules [with high accuracy]… this limits its applicability in real-world scenarios.

Though exact matching is a universally challenging task for de novo structural elucidation, the chemical similarity metrics, including MCES and Tanimoto similarity, indicate that the candidates DiffMS proposes are of strong structural value in the analytical chemistry pipeline. Table 3 in the Appendix reports further similarity metrics established by another elucidation method, MS2Mol [1]. These “close match” and “meaningful match” metrics were derived from an empirical scoring study in which chemists were asked to rate predicted and actual structures as one of these two labels, or as not similar. These labels capture what expert practitioners find to be structurally useful candidates, since practitioners may take such structurally similar candidates and conduct further filtering or refinement using orthogonal information. Altogether, obtaining similar but not exact structural matches is still valuable to the elucidation pipeline, and DiffMS generates molecules with high meaningful- and close-match percentages. Specifically, DiffMS generates over 16x more meaningful matches on MassSpecGym than the best baseline model.
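For reference, Tanimoto similarity between binary fingerprints reduces to a set-overlap ratio. Here is a minimal sketch under our own simplification: fingerprints are represented as sets of on-bit indices rather than, say, RDKit bit vectors.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two binary fingerprints, given as
    sets of on-bit indices: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention: two all-zero fingerprints are identical
    return len(a & b) / len(a | b)

print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 0.4 (2 shared bits / 5 total)
```

The Tanimoto@k numbers in the tables are averages of this quantity over predicted/true fingerprint pairs, so values well above random indicate structurally informative candidates even without exact matches.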

Additionally, we agree it is too early to claim that de novo generation is “solved”. We believe DiffMS demonstrates a feasible technical pathway toward de novo generation, whereas other baselines struggle with near-zero accuracies on the challenging MassSpecGym benchmark. We believe this paper, together with the training and testing code to be released, will also attract more attention from the machine learning community to mass spectrometry, a growing field with great potential for scientific impact.

The model relies on accurate chemical formula inference, which may fail in low-resolution spectra or complex mixtures.

We thank the reviewer for raising an important question about the difficulty of chemical formula inference. Firstly, we would like to point out that the top performing baseline models on CANOPUS, MIST + Neuraldecipher and MIST + MSNovelist, also require the chemical formula to be known. However, to show that this precondition is not restrictive, we provide a new experiment on the CANOPUS dataset which shows that DiffMS achieves state-of-the-art performance even without the ground truth formulae. Specifically, we use MIST-CF [2], a formula prediction tool, to label chemical formulae from spectra, where we find that it achieves 92% top-5 formula annotation accuracy on CANOPUS. To have a fair comparison, we still sample 100 molecules, but split across the top-5 candidate formulae. As before, we order the generated molecules by frequency to obtain the top-10 DiffMS predictions. Below are the results, where we find that DiffMS performance does not significantly deteriorate without access to ground truth formulae:

| Model | ACC@1 | MCES@1 | Tanimoto@1 | ACC@10 | MCES@10 | Tanimoto@10 |
| --- | --- | --- | --- | --- | --- | --- |
| DiffMS: Predicted Formulae | 7.03% | 11.81 | 0.36 | 14.98% | 9.39 | 0.48 |
| DiffMS: True Formulae | 8.34% | 11.95 | 0.35 | 15.44% | 9.23 | 0.47 |

Altogether, we demonstrate that DiffMS can recover true molecules using established formula inference tools; and, under parallel formula strategies, DiffMS can still yield molecules with high structural similarities even when the true formula is not highly ranked. We will include these new results in the revised manuscript.

[1] MS2Mol: A transformer model for illuminating dark chemical space from mass spectra, Butler et al., https://chemrxiv.org/engage/chemrxiv/article-details/6492f28ea2c387fa9ab2a465

[2] MIST-CF: Chemical Formula Inference from Tandem Mass Spectra, Goldman et al., https://pubs.acs.org/doi/10.1021/acs.jcim.3c01082

Reviewer Comment

Thanks for the rebuttal. I keep my score.

Review (rating: 4)

The paper introduces DiffMS, a novel diffusion-based generative model for de novo molecular structure prediction from mass spectra. This work addresses the inverse mass spectrometry (MS) problem, which involves reconstructing molecular structures based on experimental mass spectra data.

Questions for the Authors

No.

Claims and Evidence

The authors support their claims well; however, some points may need stronger treatment:

  1. While the DiffMS encoder leverages transformers for mass spectrum embeddings, no ablation is performed on the impact of different conditioning strategies. Does spectral conditioning significantly impact generation, or would an MLP-based conditioning method work just as well?

  2. How does the performance of fingerprint prediction influence the final generation accuracy? The authors show an ablation study on different numbers of pretraining samples but do not report the performance of the MS encoder, which could give some insights.

Methods and Evaluation Criteria

  1. CANOPUS and MassSpecGym are widely used in mass spectrometry applications, making them reasonable choices. However, it would be good to include the NIST dataset, although the current benchmark datasets are also sensible.

  2. No ablation study on alternative conditioning strategies (e.g., how much the transformer-based spectrum encoder improves performance).

  3. How does the performance vary with different initializations of the molecular graph?

Theoretical Claims

I didn't see any issues.

Experimental Design and Analyses

  1. I would suggest the authors add experiments with different initializations of the graph edges: how is the performance influenced by an empty graph versus a fully connected graph?
  2. The authors utilized MIST in the first stage to label the peaks' formulae. How well does that perform? Should it be considered an oracle function? Some references or discussion would be great.
  3. The graph decoder is pretrained conditioned on fingerprints, so I assume that in the end-to-end framework, the fingerprint prediction of the MS encoder is fed as the condition to the decoder. But the authors state, "We extract the final embedding corresponding to the precursor peak as the structural condition y for the diffusion decoder." It does not make sense if the diffusion model is pretrained with a 0/1 condition (fingerprint) but trained with float values (the final embedding).

Supplementary Material

No

Relation to Prior Literature

It is important for de novo molecule generation guided by mass spectra.

Essential References Not Discussed

No

Other Strengths and Weaknesses

See above.

Other Comments or Suggestions

No.

Author Response

We appreciate the reviewer’s thoughtful feedback and suggestions, and respond accordingly below:

While the DiffMS encoder leverages transformers for mass spectrum embeddings, no ablation is performed on the impact of different conditioning strategies. Does spectral conditioning significantly impact generation, or would a MLP-based conditioning method work just as well?

Regarding the effectiveness of utilizing transformers to produce a spectral embedding, MIST [1] has been established as a powerful model for extracting structural information from mass spectra. Additionally, the MIST paper compares against an MLP model (denoted FFN in their paper) and observes that MLPs perform significantly worse than MIST across all metrics. In Section 4.4, we demonstrate that DiffMS benefits from a powerful pretrained encoder; thus, given the findings in [1], simpler MLP-based spectral conditioning is not expected to perform well.

How's the performance of fingerprint prediction influence the final generation accuracy? The authors show the ablation study on different number of pre-train samples, but didn't report the performance of ms encoder, which may give some insights.

As stated above, MIST has been shown to outperform simple MLP baselines on general, fingerprint-relevant tasks. In [1], the MIST encoder achieves state-of-the-art results on Tanimoto similarity, cosine similarity, and log likelihood for predicted fingerprints. We demonstrate in Section 4.4 that pretraining the encoder on spectra-to-fingerprint prediction improves the performance of the end-to-end finetuned DiffMS model. Ultimately, given the results shown in the MIST paper and the empirical success of DiffMS in leveraging MIST to help generate chemical matches, we think the MIST architecture is a well-suited choice of encoder for this task.

However, it would be good to include the NIST dataset, although the current benchmark datasets are also sensible.

We do not plan to prioritize training DiffMS on NIST at this time, as NIST is not publicly available without purchasing a license.

I would suggest the authors add experiments with different initializations of the graph edges: how is the performance influenced by an empty graph versus a fully connected graph?

We thank the reviewer for asking this important question. We provide some experiments below on the CANOPUS dataset with different graph initialization strategies:

| Model | ACC@1 | MCES@1 | Tanimoto@1 | ACC@10 | MCES@10 | Tanimoto@10 |
| --- | --- | --- | --- | --- | --- | --- |
| DiffMS: Fully Connected Initialization | 3.36% | 12.67 | 0.28 | 7.60% | 9.56 | 0.40 |
| DiffMS: Empty Graph Initialization | 6.60% | 11.55 | 0.34 | 14.94% | 9.07 | 0.47 |
| DiffMS | 8.34% | 11.95 | 0.35 | 15.44% | 9.23 | 0.47 |

We observe that fully connected initialization performs poorly. Empty graph initialization performs similarly to random initialization from the marginal prior distribution on Tanimoto similarity and MCES (which aligns with the intuition that bond connectivity is inherently sparse and thus closer to an empty graph than to a fully connected one), though its accuracies are worse, suggesting that the marginal prior distribution is still optimal. We will include these additional experiments in the revised manuscript.
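The three initialization strategies compared above can be sketched as follows. This is a toy illustration: the edge-type categories and the marginal probabilities in `MARGINAL` are hypothetical placeholders, not the paper's actual training-set statistics.

```python
import random

# Hypothetical marginal prior over edge types (index 0 = "no bond",
# then single/double/triple/aromatic). In practice these marginals
# would be estimated from the training data, where "no bond" dominates
# because bond connectivity is sparse.
MARGINAL = [0.90, 0.07, 0.02, 0.005, 0.005]

def init_edges(n_atoms, strategy="marginal", seed=0):
    """Assign an initial edge type to each upper-triangular atom pair
    under one of the three strategies compared in the ablation."""
    rng = random.Random(seed)
    edges = {}
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            if strategy == "empty":
                edges[(i, j)] = 0   # start with no bonds at all
            elif strategy == "full":
                edges[(i, j)] = 1   # every atom pair singly bonded
            else:                   # sample each edge from the marginal prior
                edges[(i, j)] = rng.choices(
                    range(len(MARGINAL)), weights=MARGINAL)[0]
    return edges

assert set(init_edges(4, "empty").values()) == {0}
assert set(init_edges(4, "full").values()) == {1}
```

Because the marginal prior is dominated by "no bond", a draw from it resembles an empty graph far more than a fully connected one, which matches the ablation's trend.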

The authors utilized MIST in the first stage to label the peaks' formulae. How well does that perform? Should it be considered an oracle function? Some references or discussion would be great.

MIST labels the peaks’ formulae using a combinatorial enumeration of possible substructures. It is reasonable to consider this as an oracle function given the formula of the full molecule, and as mentioned in our response to reviewer udS8, the overall molecular formulae can be annotated with over 90% accuracy.

The authors stated, "We extract the final embedding corresponding to the precursor peak as the structural condition y for the diffusion decoder." It does not make sense if the diffusion model is pretrained with a 0/1 condition (fingerprint) but trained with float values (the final embedding).

We thank the reviewer for raising this point and will clarify this wording in the revised manuscript. When finetuning the end-to-end model, we initialize the decoder input to be the predicted (binary) fingerprints from the encoder; however, we do not apply any auxiliary loss to enforce that the encoder submodule continues to output 0/1 fingerprints, so the end-to-end model can ultimately learn different intermediate representations.
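The conditioning handoff described above can be made concrete with a toy sketch (all names and values here are hypothetical, not from the paper): during pretraining the decoder sees thresholded binary fingerprints, while during finetuning it receives the encoder's raw real-valued output, which is free to drift off {0, 1}.

```python
def threshold(values, t=0.5):
    """Binarize a real-valued fingerprint prediction, as seen by the
    decoder during fingerprint-conditioned pretraining."""
    return [1 if v >= t else 0 for v in values]

# Hypothetical encoder output for one spectrum (real-valued).
encoder_output = [0.93, 0.11, 0.88, 0.97, 0.04]

# Pretraining-style condition: the binarized fingerprint.
pretrain_condition = threshold(encoder_output)
print(pretrain_condition)  # [1, 0, 1, 1, 0]

# Finetuning-style condition: the raw embedding itself. With no loss
# pulling it back to {0, 1}, the decoder's condition can become any
# learned real-valued representation.
finetune_condition = encoder_output
```

Because the encoder is pretrained to output near-binary fingerprints, the distribution shift between the two conditioning regimes is small at the start of finetuning.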

[1] Annotating Metabolite Mass Spectra with Domain-Inspired Chemical Formula Transformers, Goldman et al., https://www.nature.com/articles/s42256-023-00708-3

Reviewer Comment

Thanks to the authors for the clear rebuttal; I don't have any other concerns. I believe this work is a good contribution to the metabolomics domain. By the way, it looks like MADGEN has updated its results; please consider updating accordingly.

Overall, I believe this paper should be accepted. Thanks again for the authors' efforts.

Author Comment

We thank reviewer iMA4 for their insightful comments and for improving their score. We will integrate all reviewer feedback as well as the updated MADGEN results into our revised manuscript.

Final Decision

This paper presents a diffusion model for generating molecular graphs conditioned on mass spectra. The authors propose several architectural contributions that encode the domain knowledge in mass spectra, as well as a discrete graph diffusion model. Results show that the proposed model outperforms previous approaches on de novo molecule generation.