PETIMOT: A Novel Framework for Inferring Protein Motions from Sparse Data Using SE(3)-Equivariant Graph Neural Networks
We present a new formulation of protein flexibility and learn protein motions from sparse experimental data
Abstract
Reviews and Discussion
This paper presents PETIMOT (Protein sEquence and sTructure-based Inference of MOTions), a novel framework that infers protein motions from sparse experimental data. The framework addresses the long-standing problem of approximating protein conformational ensembles under physiological conditions by learning continuous, compact representations of protein conformational changes. PETIMOT integrates SE(3)-equivariant graph neural networks with pre-trained protein language models and handles data symmetries, including scaling and permutation operations, through a task-specific loss function. Training and evaluation on the Protein Data Bank (PDB) show that PETIMOT outperforms existing flow-matching methods and traditional physics-based models in both speed and accuracy.
Strengths and Weaknesses
Strengths
- The paper presents a novel perspective on protein conformational diversity by learning linear subspaces of protein motions, which provides new tools for understanding protein function.
- Compared to existing methods, PETIMOT demonstrates performance improvements across multiple metrics, which is of great significance to biomedical research.
- The authors designed several loss functions (least squares, squared sinusoidal, and independent subspace losses) specifically tailored for learning linear subspaces of protein motions, showing innovation.
Weaknesses
- The paper is highly specialized in its field. It is suggested to add examples of demo illustrations to showcase the scientific questions of the research.
Questions
- The model training only uses structural data from the PDB. Can other types of data be utilized? Or are there any data augmentation techniques being considered? For example, augmented datasets were used in AlphaFold3.
- Why do AlphaFlow and ESMFlow require such a long time as shown in Table 1?
- The PETIMOT model predicts conformational changes of proteins by learning the linear subspaces of protein motions. Do these subspaces have biological significance?
Limitations
Yes
Final Justification
I think another round of submission and revision would make the paper more ready for publication; I will keep my original score.
Formatting Issues
No
Weaknesses:
- The paper is highly specialized in its field. It is suggested to add examples of demo illustrations to showcase the scientific questions of the research.
W1: We thank the reviewer for this comment. In the revised version of the manuscript, we have added specific illustrative examples, drawn from the structural collections extracted from the PDB, that demonstrate distinct functional conformations of proteins.
Questions:
- The model training only uses structural data from the PDB. Can other types of data be utilized? Or are there any data augmentation techniques being considered? For example, augmented datasets were used in AlphaFold3.
Q1: We thank the reviewer for this question. Indeed, other types of structural data can be utilised for training, such as motions derived from MD and Normal Mode Analysis simulations, as well as conformations predicted by AlphaFold or other state-of-the-art protein structure predictors. While we have not yet explored data augmentation techniques for training enhancement, our current results suggest that such expansion may not be immediately necessary.
Connected to this point, and following suggestions from the other reviewers, we added an evaluation of PETIMOT against long MD trajectories from the ATLAS dataset (DOI: 10.1093/nar/gkad1084). We identified 400 protein chains common to both the ATLAS set and our dataset, providing an independent molecular dynamics benchmark. Our approach achieved a 60% success rate on this MD data – significantly higher than our 41% success rate on experimental structures – with an average minimum LS error of 0.55 ± 0.19, an average minimum magnitude error of 0.17 ± 0.11, and an average global SS error of 0.60 ± 0.16. These results demonstrate that PETIMOT generalises to MD data without retraining or fine-tuning.
- Why do AlphaFlow and ESMFlow require such a long time as shown in Table 1?
Q2: AlphaFlow and ESMFlow are much bigger models. The Evoformer of AlphaFlow contains about 90M parameters and multiple layers with operations scaling as L^3, where L is the protein length. According to Algorithm 2 “Inference” of the AlphaFlow paper, during flow matching the model calls AlphaFold N=10 times at inference, making it 10 times slower than vanilla AlphaFold for a single prediction. For 50 predictions (called “samples” in the code), AlphaFlow is therefore 500 times slower than vanilla AlphaFold. Indeed, the AlphaFlow code simply repeats the same computation ‘samples’ times: “for j in tqdm.trange(args.samples): prots = model.inference(batch, as_protein=True, noisy_first=args.noisy_first, no_diffusion=args.no_diffusion, schedule=schedule, self_cond=args.self_cond)”. The same reasoning applies to ESMFlow, which is built on the pretrained protein language model ESM with 650M parameters.
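The timing argument above can be condensed into a back-of-the-envelope cost model (the function name and the convention of counting one vanilla forward pass as the unit cost are our own illustration, not AlphaFlow code):

```python
# Back-of-the-envelope cost model for flow-matching inference: each of the
# S independent samples re-runs the full structure predictor N times, so the
# total cost is N * S vanilla forward passes.
def relative_cost(n_flow_steps: int, n_samples: int) -> int:
    """Cost relative to a single vanilla AlphaFold forward pass."""
    return n_flow_steps * n_samples

# With N = 10 denoising steps and 50 samples, inference is ~500x one
# vanilla forward pass.
print(relative_cost(10, 50))  # 500
```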
- The PETIMOT model predicts conformational changes of proteins by learning the linear subspaces of protein motions. Do these subspaces have biological significance?
Q3: This is a very important concern and we thank the reviewer for bringing it up. Ref 15 (cited in lines 38-40) provides supporting evidence for the biological significance of linear subspaces of protein motions inferred from experimental 3D structures. The authors showed that held-out conformations can be reconstructed with reasonable accuracy by sampling along a linear interpolation trajectory between extreme known conformations in the learnt PCA subspace.
This concept is currently being exploited in Cryo-EM reconstruction. For example, DOI 10.1016/j.jsb.2015.05.007 paper experimentally demonstrates that the first principal component of the 70S ribosome data reveals the expected conformational changes, and the first principal component of the RdRP dataset reveals a conformational change in the two dimers of the RdRP. The DOI 10.1107/S2059798321002291 paper overviews the PCA applications in Cryo-EM reconstruction, analyses volume covariance matrices, and concludes that PCA of Cryo-EM data should be considered as a way to describe large conformational changes.
To clarify this aspect, we will add the following statement to the revision: “Interpolation trajectories along the linear manifold can recapitulate known intermediate conformations, supporting its biological significance.”
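As a toy illustration of this reconstruction-by-interpolation idea (synthetic coordinates, not the paper's pipeline or data):

```python
import numpy as np

# Fit a low-dimensional PCA subspace on flattened, pre-aligned conformations,
# then linearly interpolate between two extreme conformations in that subspace.
rng = np.random.default_rng(0)
n_conf, n_atoms = 20, 50
X = rng.normal(size=(n_conf, 3 * n_atoms))      # toy aligned conformations
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V = Vt[:2].T                                    # top-2 principal motions

def project(conf):
    return (conf - mean) @ V                    # coordinates in the subspace

def reconstruct(z):
    return mean + z @ V.T                       # back to ambient space

# Interpolation trajectory between two extreme conformations
z0, z1 = project(X[0]), project(X[-1])
trajectory = [reconstruct((1 - t) * z0 + t * z1) for t in np.linspace(0, 1, 5)]
```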
That being said, we acknowledge that some of the motions in our training dataset may not be biologically relevant, due to experimental artifacts (e.g., of crystallographic origin, or due to sequence engineering). Nevertheless, automatically and reliably distinguishing artifactual motions from biologically relevant ones remains infeasible. Our working hypothesis is that a part of the conformational manifold represents functionally relevant motions.
This limitation in our dataset may contribute to PETIMOT's relatively modest success rate. The additional evaluation we performed on MD data, which likely contains fewer artifacts and represents a more limited conformational diversity, further supports this hypothesis. Indeed, on this type of data, the success rate increases from 41% to 60% – see our answer to Reviewer JsMv's Weakness 1 above.
I'd like to thank the authors for the rebuttal. After reading all the comments and responses, I'll keep the score.
This paper introduces PETIMOT, a framework for predicting linear residue motions from 3D protein structures using an SE(3)-equivariant GNN on top of protein language model embeddings. The experiments show that PETIMOT recovers such linear motions better than existing physics- and flow-based models in this space.
Strengths and Weaknesses
The paper is actually quite well written, and I enjoyed reading it as a whole. The different parts of the loss make sense (even if they don’t affect the final results too much, as seen in Figure 2). The experiments and metrics seem reasonable.
My main concerns are on the data aspect (the authors also discuss this in the text). First, given that the PDB consists of proteins which could be crystallized in the first place (thus, more rigid proteins rather than intrinsically disordered ones), what is the real conformational diversity that you aim to capture? Second, how reasonable are linear motions for describing such conformational ensembles in the PDB? My feeling is that allosteric transitions are far more complex than simple linear motions, since the motions of these residues are all correlated with one another in very complex ways. I think without answering these key questions well, the paper is a little incomplete.
I think this paper could have much more impact with a study on intrinsically disordered proteins (e.g., from the IDRome) or macrocycles (e.g., from CREMP) that do have much more interesting conformational diversity which is currently quite challenging to model.
Questions
- Do you have experiments on generalization across length scales, in the sense that you can learn dynamics from shorter proteins/peptides and transfer to larger ones?
- Figure 1: typo in “Aggregarion” -> “Aggregation”?
Limitations
Yes
Final Justification
I believe the author's additional results on ATLAS helped convince me about the utility of their method. In short, I am not super sure whether their method is capturing biologically relevant phenomena, but I also understand that evaluating on such criteria can be challenging. I like the simplicity of their model and I think this paper should be accepted.
Formatting Issues
None
Weaknesses:
- My main concerns are on the data aspect (the authors also discuss this in the text). First, given that the PDB consists of proteins which could be crystallized in the first place (thus, more rigid proteins rather than intrinsically disordered ones), what is the real conformational diversity that you aim to capture?
W1: We understand the reviewer’s concern regarding conformational diversity limitations in crystallographic data. However, we would like to clarify that our dataset encompasses conformations solved by multiple experimental techniques, including 56,866 Cryo-EM structures (30.5% of the dataset) and 2,187 NMR structures (about 1.5%). This ensures representation of diverse conformational states beyond those accessible to crystallization alone.
Moreover, our dataset is not limited to rigid-body motions from globular proteins. It contains opening-closing motions of different types (shear, hinge, allosteric, complex) and also fold-switching motions of metamorphic proteins (see our answer to Reviewer JsMv's Q2 above for more details). We mention here two additional illustrative examples demonstrating the extensive conformational diversity captured in our dataset. (1) Apoflavodoxin from Anabaena. This collection comprises 25 conformations, including one corresponding to the equilibrium intermediate of the protein's thermal unfolding solved by NMR (PDB id: 2KQU). The first principal motion reflects the transition between this partially unfolded state and the folded state (represented by the remaining 24 X-ray conformations), explaining 97% of the observed conformational heterogeneity. (2) Anthrax toxin protective antigen. This collection includes conformations solved by both X-ray crystallography and cryo-EM, exhibiting extensive conformational plasticity corresponding to the conversion from the prepore state (PDB id: 1TZO) to the translocating pore state (PDB id: 6UZD). The associated conformational change is substantial, with 30 Å RMSD between extreme conformations and large secondary structure rearrangements of an 80-residue loop.
These examples illustrate that our dataset captures biologically relevant large-scale motions, including partial unfolding and dramatic conformational transitions. More broadly, it features a maximum pairwise RMSD above 7 Å in about one quarter of the collections, and up to 70 Å. The main motions are of very different types depending on the collection, from loop deformation involving only a few residues (highly localised) to large domain-domain motions involving almost the entire protein (highly collective).
- Second, how reasonable are linear motions to describe such conformational ensembles in the PDB? My feeling is that allosteric transitions are far more complex than simple linear motions, since the motions of these residues are all correlated with one another in very complex ways. I think without answering these key questions well, the paper is a little incomplete.
W2: We acknowledge the reviewer's important point about the complexity of allosteric transitions. While protein motions are indeed inherently nonlinear, linear approximations through PCA have proven remarkably effective for capturing the dominant modes of conformational change, as extensively demonstrated in the literature. In Ref 15, the authors demonstrated that in about half of the collections, computed over the whole PDB, only one or two linear motions are sufficient to explain almost all of the observed conformational heterogeneity (>90% of the positional variance explained). Moreover, the vast majority of the collections (>90%, Fig. 2a) require fewer than 8 linear motions. These results demonstrate that low-dimensional linear manifolds are a reasonable means for describing most of the conformational diversity observed in the PDB.
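The variance-explained criterion quoted above can be sketched as follows (toy data; the 90% threshold mirrors the discussion, not the paper's exact code):

```python
import numpy as np

# How many principal motions are needed to explain 90% of the positional
# variance of a conformational collection?
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 60))                    # toy conformation matrix
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
var_ratio = s**2 / np.sum(s**2)                  # variance explained per motion
cum = np.cumsum(var_ratio)
k90 = int(np.searchsorted(cum, 0.90)) + 1        # number of motions for 90%
```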
However, we recognize that a few protein families require a high number of linear motions (>10). This high complexity may reflect nonlinear structural deformations, for instance those involved in allosteric transitions, or seemingly random fluctuations. Beyond predictive performance, our approach can be used to identify such cases where the reviewer's concern about complex allosteric correlations becomes critical and warrants further in-depth characterisation with nonlinear manifolds.
- I think this paper could have much more impact with a study on intrinsically disordered proteins (e.g., from the IDRome) or macrocycles (e.g., from CREMP) that do have much more interesting conformational diversity which is currently quite challenging to model.
W3: We thank the reviewer for this suggestion and we agree with them that the dynamics of these systems are highly interesting and challenging to model. Nevertheless, we would like to stress that a meaningful study of IDPs and macrocycles would require extensive additional considerations.
IDPs present a fundamentally different learning problem: they lack stable structural templates and we expect them to exhibit much more diffuse conformational sampling compared to folded proteins. To quantify this difference, we performed PCA on the 28,058 MD trajectories from IDRome. On average, 20 ± 5 principal motions are necessary to explain 90% of the observed conformational heterogeneity. The first principal motion explains only 24.1 ± 2.4%. These values are substantially shifted from what we observe in our PDB collections. While our method learns conformational transitions from sparse, distinguishable, experimentally validated protein states, there exist no such validation data for IDRs. IDRome entries correspond to poorly resolved protein segments extracted from the AlphaFold database that lack proper intra-molecular constraints and cellular context (e.g., partners), which are nevertheless crucial determinants of IDR functional conformations. The associated paper (DOI: 10.1038/s41586-023-07004-5) exploits the trajectories only to derive global properties such as compaction, not to gain detailed insights into motions and conformations. From a practical perspective, the simulations are coarse-grained (only C-alpha atoms) and thus not adapted to our architecture. A meaningful training and evaluation of our approach on IDPs would require developing entirely different metrics and training paradigms.
Macrocycles have completely different input features (synthetic sequences, non-natural amino acids, cyclization constraints) that do not match our training distribution of natural proteins. This would essentially require extensive adaptation of our model architecture and training procedure. In addition, CREMP entries are synthetic molecules lacking evolutionary context. The protein language model embeddings leveraged by our approach would be inappropriate for them.
We acknowledge these would be valuable extensions. Our empirical analysis of IDPs demonstrates the fundamental methodological challenges involved, while our current work on folded proteins provides the necessary foundation for tackling these more complex systems in future research.
--
Questions:
- Do you have experiments on generalization across length scales, in the sense that you can learn dynamics from shorter proteins/peptides and transfer to larger ones?
Q1: We thank the reviewer for this question. We did not perform a training/evaluation procedure based on a protein length split. Nevertheless, we did analyse the success and failure cases and we did not find any correlation between PETIMOT's performance and the protein length (number of residues), nor the motion collectivity (number of residues involved in the motion).
- Figure 1: typo in “Aggregarion” -> “Aggregation”?
Q2: We thank the referee and apologize for the typo, we have fixed it in the revised version of the manuscript.
I thank the authors for their response. I think the main concerns I have right now go back to two points:
- How biologically relevant are the linear motions identified by PETIMOT? Do you have some examples in your test set that are particularly interesting/challenging? I just wonder if the PDB dataset is just too simple, but if you can explain your ATLAS MD experiments a little bit better, I think I will be convinced.
- Can you talk more about false positives, e.g., motions that PETIMOT identified that just correspond to thermal fluctuations?
I like the method, especially its simplicity. I just want to see some more evidence that it is actually capturing something non-trivial and useful, if that makes sense.
We thank the reviewer for their feedback. We would like to clarify that, to maximise the likelihood of having functionally relevant motions in our training set, we excluded collections with a max RMSD between any two conformations smaller than 2Å – this reduces the risk of learning typical thermal fluctuations (~0.5-1Å). Furthermore, only 25% of our test proteins exhibit motions that can be inferred “easily” (by a simple elastic network model, NMA) from an input 3D structure. Among the more challenging cases, PETIMOT captures motions with extremely high accuracy (LS error < 0.35; we visually inspected many cases to set up meaningful thresholds) for 59 proteins. With a median max pairwise RMSD of 5Å and up to 40Å, these well-captured motions include well-documented functional open-closed transitions such as those of the bacterial Gln-binding protein (1GGG, 8EYZ), of the Brain-type Creatine Kinase (DOI: 10.1016/j.febslet.2008.10.039), and of the Nickel Sensing protein NikR (DOI: 10.1016/j.jmb.2005.03.017), among others.
To further identify particularly challenging cases from our dataset, we looked for collections where both Alpha/ESMFlow and the NMA were unable to capture any motion while PETIMOT achieved extremely high accuracy. We identified 11 such test proteins. Among those, the pore-forming toxin FraC was extensively described in different functional states in DOI: 10.1038/ncomms7337. The collection in our dataset displays a max pairwise RMSD of 15Å with a large conformational change where the 30 N-terminal residues detach from the rest of the protein, captured by PETIMOT with high precision: min LS error 0.22 from the extended state and 0.42 from the compact state.
Another particularly interesting example is the ATPase NSF, whose experimental structures correspond to ATP/ADP-bound states and the 20S supercomplex from cryo-EM studies (DOI: 10.7554/eLife.38888, DOI: 10.1038/nature14148). The functionally relevant heterogeneity requires 4 linear motions (the first motion alone explains 57%). PETIMOT successfully captures this complex motion subspace with a min LS error as low as 0.32 and a global SS error of 0.30, demonstrating its ability to predict not just single motions but biologically meaningful motion subspaces. Hence, our PDB dataset offers interesting/challenging motions that PETIMOT successfully captures.
Turning to the ATLAS database: it comprises over 1K proteins with unique ECOD class X and captures a wide range of motions (trajectories available online for inspection). It is well accepted in the ML community; Alpha/ESMFlow and others are trained or fine-tuned on its trajectories. We focused our evaluation on 400 protein chains, from 39 to 1,023 residues, shared between the ATLAS set and our PDB dataset. The ATLAS simulations for these proteins display a wide range of motion amplitudes, with max pairwise RMSD from 1.5Å to 31Å, and a wide range of motion complexity, requiring 1-70 linear motions to explain 80% of the variance. To avoid data leakage, we re-trained PETIMOT using a rigorous 5-fold cross-validation procedure over the full PDB dataset (36,675 samples). For each of the 400 selected ATLAS protein chains, we used the corresponding PETIMOT_5folds model trained on the fold where that specific chain was held out from training. Overall, PETIMOT_5folds achieves a 60% success rate on this MD data without being trained or fine-tuned on it and without having seen similar proteins (sequence- and structure-wise) during training – see our answer to Reviewer JsMv's Weakness 1. PETIMOT_5folds captures with high accuracy small-amplitude motions, such as those of Amyloid-beta precursor protein (2FMAA, max RMSD of 2.2Å, min LS error of 0.30), as well as high-amplitude motions, such as those of CCHC-type domain-containing protein (7C45A, max RMSD of 23.9Å, min LS error of 0.19). Overall, we observe a weak dependence of PETIMOT's performance on the max RMSD and on the variance explained by the first mode (R of -0.25 and 0.28, respectively).
Finally, regarding false positives, we observe cases where all methods produce very high errors. Visual inspection reveals reference motions that are unlikely to be functional, for instance those induced by insertions/deletions of protein segments or very high amplitude displacements of the protein extremities. In addition, in about 30% of the full PDB dataset, we observed that both PETIMOT and NMA are wrong but agree with each other. We suspect that some of these predicted motions may actually be true positives, but that the limited PDB sampling does not allow us to observe them among the top 4 PCA modes. For instance, this scenario happens for adhesion receptor integrins. NMA and PETIMOT are both able to capture motions in the AlphaV Beta3 Integrin's collection (4G1EA, min LS errors of 0.28 and 0.48 with the reference, and 0.19 between them), but these motions are not exhibited by the integrin β3 headpiece's collection (6BXBA). These results suggest that PETIMOT could be used to discover new motions, for example for singleton proteins.
I thank the authors for their additional experiments on ATLAS. I had some time to think about this paper: I believe that this paper is worthy of acceptance based on the authors' results. I think the scientific community will be able to judge the broader utility of their model once released. I will increase my score.
We are grateful to the reviewer for taking the time to reconsider our work and for their positive feedback. We appreciate their constructive comments and we are glad that our additional experiments addressed their concerns effectively.
The authors formulate the problem of predicting protein conformations as one of predicting the principal components of motion. Their method couples a graph neural network architecture with a collection of loss functions that encourage the learning of motion subspaces that align with reference subspaces extracted from a dataset they curate from deposited PDB structures. They demonstrate that relative to diffusion generative models and normal mode analysis, their method predicts dominant motions more accurately at a fraction of the cost.
Strengths and Weaknesses
Strengths:
The paper proposes an interesting new formulation of a classic NMA / dimensionality reduction approach to protein conformation analysis as a machine learning problem. The loss functions appear well-thought out, emphasizing different properties of the predicted motion subspaces, and are supported by the literature. The resulting model provides a fast way for practitioners to assess what modes of variability might be present in their system, which can be helpful in designing further experiments or validating heterogeneous structure determination workflows. The authors additionally present comprehensive ablations covering losses, graph construction, embedding models, etc.
Weaknesses:
The authors note the ability to generate conformational ensembles by sampling along the predicted components of variation. While I understand that the method is not designed to produce ensembles in a thermodynamic sense, it would enhance the value and validation of the method to sample an ensemble and compare it against the reference using metrics like precision/recall/diversity as is done in AlphaFlow.
Additionally, as the success rate is modest (~44%), it would be helpful if the authors shared an analysis of failure modes. Are the motions in unsuccessful cases not well-described by linear subspaces?
Questions
- In line 168, why would the opposite not be true, that the SS loss is 0 for identical subspaces and 1 for mutually orthogonal subspaces?
- The IS loss is intended to maximize the rank of the predicted subspace. Can you clarify the dependence of this loss on the reference components?
- In Figure C.2, I understand (b)(d) might be artificially influenced by the constraints of the matching procedure. But in (a)(c), K=8 appears to clearly outperform K=4. Why was the default chosen to be K=4?
- Do you note any patterns in the unsuccessful test cases? Are these motions not described well by linear subspaces?
- Would it be meaningful to generate an ensemble of structures from the predicted components, and compare it to the reference collection using simple metrics like precision and recall?
Limitations
Yes
Final Justification
While there remains a question of whether the training set captures biologically relevant motions, the new results on MD simulations partly alleviates this concern. Furthermore, the problem formulation and resulting method is elegant, and could be extended to better training datasets including MD in the future. A direct comparison to AlphaFlow/ESMFlow is difficult since the methods are solving different tasks, but I think this is ok. PETIMOT-generated structures, while not being accurate in a thermodynamic sense, could still be useful for practitioners to quickly assess possible dynamics or seed other workflows like heterogeneous cryo-EM reconstruction. I believe this paper should be accepted on its technical merits, and agree with reviewer UWSK that the broader scientific community can judge the extent to which this model is useful for their workflows.
Formatting Issues
None
Weaknesses:
- The authors note the ability to generate conformational ensembles by sampling along the predicted components of variation. While I understand that the method is not designed to produce ensembles in a thermodynamic sense, it would enhance the value and validation of the method to sample an ensemble and compare it against the reference using metrics like precision/recall/diversity as is done in AlphaFlow.
W1: We thank the reviewer for this suggestion. However, direct comparison using AlphaFlow-style metrics is not appropriate for our approach due to fundamental differences in the underlying data and objectives.
Our reference conformational collections do not represent thermodynamic ensembles. Instead, the distribution of protein states in our dataset reflects sampling biases in the PDB due to experimental conditions or to researchers' interests. Our approach is specifically designed to cope with these biases. Indeed, the main motions extracted by PCA are those explaining the most positional variance; thus, these calculations are not impacted by the relative frequencies of protein states. For instance, while the collection associated with adenylate kinase comprises 35 conformations representing the closed ligand-bound state and 7 conformations for the open apo state, the main motion, explaining 99% of the variance, describes the transition between the open and closed states. As a consequence, our approach is able to describe transitions between the distinct protein states that have been captured in experiments.
This property of our reference makes it inappropriate to compare with conformational ensembles generated from our predicted motions. Alternatively, we did the following experiment: (1) project the reference conformations onto the predicted motion manifold, (2) reconstruct the conformations in the ambient space from the projections, and then (3) compute the lDDT or RMSD error between ground-truth and reconstructed conformations. However, conformations close to the input query always show low reconstruction error regardless of manifold quality, creating a bias toward artificially good metrics. When these conformations dominate the ensemble, like the adenylate kinase's closed state, the average lDDT is always high (and the average RMSD low). Moreover, this setup establishes a bijective relation between generated and reference conformations, making it impractical to compute precision and recall as in AlphaFlow. Another option would be to compare with MD ensembles. This experiment was not feasible for this response, due to time and resource constraints. Nevertheless, we are currently running MD ensemble comparisons and will provide these results in our next response within the rebuttal period.
- Additionally, as the success rate is modest (~44%), it would be helpful if the authors shared an analysis of failure modes. Are the motions in unsuccessful cases not well-described by linear subspaces?
W2: PETIMOT’s relatively modest success rate may be partially explained by limitations of our dataset. Indeed, we acknowledge that some of the motions in our training dataset may not be biologically relevant, due to experimental artifacts (e.g., of crystallographic origin, or due to sequence engineering). Nevertheless, automatically and reliably distinguishing artifactual motions from biologically relevant ones remains infeasible. Our working hypothesis is that a part of the conformational manifold represents functionally relevant motions.
The additional evaluation we performed on MD data, likely containing fewer sampling artifacts and representing a more limited conformational diversity, further supports this hypothesis. Indeed, on this type of data, the success rate increases from 41% to 60% – see our answer to Reviewer JsMv's W1 above.
Questions:
- In line 168, why would the opposite not be true, that the SS loss is 0 for identical subspaces and 1 for mutually orthogonal subspaces?
Q1: We thank the reviewer for detecting this typo. If the two subspaces are identical, the SS loss is 0 and it becomes 1 in the orthogonal case. We have corrected the statement in the revised version.
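For intuition, a principal-angle-based squared sinusoidal loss with exactly this behaviour can be sketched as follows (a hedged illustration; the paper's exact SS formulation may differ):

```python
import numpy as np
from scipy.linalg import subspace_angles

# Squared sinusoidal subspace loss via principal angles: 0 for identical
# subspaces (all angles 0), 1 for mutually orthogonal ones (all angles pi/2).
def ss_loss(X, Y):
    """X, Y: (n, k) matrices whose columns span the compared subspaces."""
    theta = subspace_angles(X, Y)               # principal angles in [0, pi/2]
    return float(np.mean(np.sin(theta) ** 2))

n = 10
A = np.eye(n)[:, :3]                            # span{e1, e2, e3}
B = np.eye(n)[:, 3:6]                           # an orthogonal 3D subspace
```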
- The IS loss is intended to maximize the rank of the predicted subspace. Can you clarify the dependence of this loss on the reference components?
Q2: The IS loss extends the SS loss by adding a term that indeed aims at maximizing the rank of the predicted ‘x’ subspace, and more generally, at maximizing the orthogonality of the ‘x’ subspace. This orthogonalization term operates solely on the predicted components ‘x’ and does not explicitly involve the reference components ‘y’. The other term, directly inherited from the SS loss, aims at minimising the discrepancy between predicted ‘x’ and reference ‘y’ subspaces and thus, explicitly depends on the ‘y’ reference components. Let us remind the reviewer that the ‘y’ reference subspace is itself orthogonal.
The IS loss may lead to instabilities in orthogonalizing the predicted ‘x’ subspace in the case where the number of predicted motions is higher than the number of reference motions, i.e., the ‘x’ subspace has higher dimensionality than the ‘y’ subspace. In such cases, components of ‘x’ that lie outside the reference subspace ‘y’ contribute equally (but with opposite signs) to both terms of the IS loss, leading to zero gradient and slow convergence.
However, we are not in this situation here, and thus we believe our current IS formulation is stable. Additionally, IS optimization may converge more slowly than SS, because each subspace component is iteratively rotated individually, whereas in the SS loss the whole subspace is orthogonalized and then rotated at once.
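Since the exact IS formulation is not reproduced in this response, the behaviour of the orthogonalization term can be illustrated with a minimal sketch. The function name and the Gram-matrix penalty form below are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def orthogonality_penalty(X):
    """Illustrative penalty operating only on the predicted components 'x':
    the sum of squared off-diagonal Gram-matrix entries, which is zero if and
    only if the (unit-normalized) predicted components are mutually orthogonal."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    G = Xn @ Xn.T                 # (K, K) matrix of pairwise inner products
    off = G - np.eye(len(G))      # diagonal is 1 after normalization
    return (off ** 2).sum()
```

Minimizing such a term pushes the predicted subspace toward full rank, while the SS-inherited term pulls it toward the (already orthogonal) reference subspace 'y'.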
- In Figure C.2, I understand (b)(d) might be artificially influenced by the constraints of the matching procedure. But in (a)(c), K=8 appears to clearly outperform K=4. Why was the default chosen to be K=4?
Q3: We thank the reviewer for this comment, which made us realise that the description of this experiment was not precise enough. An important point here is that we keep the number of ground-truth components fixed to L=4.
The key insight from Figure C.2 is that minimum-based metrics (a,c) and assignment-based metrics (b,d) measure different aspects of subspace quality. Minimum metrics measure the best possible match between any predicted and ground-truth component. These improve with more predicted components (from 1 to 8) because having more candidates increases the likelihood of finding at least one good match with each ground-truth component. Optimal assignment metrics measure overall subspace alignment by finding the best one-to-one matching between predicted and ground-truth components. Here, models with fewer predicted components (1-2) perform better because they face fewer constraints in the assignment problem - each predicted component can be matched to the best available ground-truth component without competition.
The 8-component model maintains the best performance overall, as having more candidate vectors provides flexibility while still capturing the 4-dimensional ground-truth subspace effectively.
The default was set to K=4 because of our data augmentation strategy: we consider at least 5 conformations per collection, and we use at most 5 conformations as input queries during training.
We have reformulated the description of the experiment in Appendix C of the revised manuscript.
- Do you note any patterns in the unsuccessful test cases? Are these motions not described well by linear subspaces?
Q4: To answer the reviewer’s question, we conducted a systematic analysis of the dependency of PETIMOT performance on collection properties. We could not identify a clear correlation with the protein length (number of amino acids), collection cardinality (number of members), motion complexity (number of modes required to explain most of the variance), motion amplitude (maximum pairwise RMSD between two conformations), or motion collectivity (number of residues involved in the motion). Nevertheless, we do observe a number of collections with large-amplitude motions at the protein extremities, sometimes induced by sequence insertions and deletions across the conformations of the collection, or seemingly random fluctuations, casting doubt on the biological relevance of these motions. We also observe that the four methods tend to agree and that the NMA shows a bias toward collective motions (see answer to Weakness 2. from Reviewer JsMv).
- Would it be meaningful to generate an ensemble of structures from the predicted components, and compare it to the reference collection using simple metrics like precision and recall?
Q5: We thank the reviewer for this suggestion; this point was also raised by Reviewer XsLc. However, direct comparison using AlphaFlow-style metrics between ensembles generated from our predicted motions and the reference collections does not seem appropriate to us. Indeed, we would generate Gaussian distributions while the reference collection may be biased toward specific protein states due to experimental conditions or researchers’ interest. For instance, the collection associated with adenylate kinase comprises 35 conformations representing the closed ligand-bound state and 7 conformations for the open apo state. If we generate conformations from the closed state and with a small amplitude, both the precision and the recall will be high, regardless of the predicted motion manifold quality, creating a bias toward artificially good metrics. Another option would be to compare with MD ensembles. This experiment was not feasible for this response, due to time and resource constraints. Nevertheless, we are currently running MD ensemble comparisons and will provide these results in our next response within the rebuttal period.
I thank the authors for their detailed response and additional experiments. While the proposed method captures variability that is different from AlphaFlow/ESMFlow, I still believe it is elegantly formulated and potentially useful for practitioners to quickly assess motions or produce initial structures for other workflows. Therefore, I will raise my score.
We thank the reviewer for taking the time to reconsider our work and for their positive feedback. We appreciate their constructive comments throughout the review process and we are glad that our response and additional experiments addressed their concerns effectively.
PETIMOT employs an SE(3)-equivariant graph neural network to model protein dynamics, directly predicting principal components of conformational changes through its geometrically aware architecture. The authors compiled a custom dataset using protein structures from the PDB, leveraging evolutionary information from multiple sequence alignments to infer diverse conformational states for individual proteins, enabling robust dynamics modeling.
Strengths and Weaknesses
Strengths:
- Introduces a novel formulation to predict principal components of protein conformational changes using SE(3)-equivariant networks.
- Proposes innovative loss functions for protein dynamics modeling that may inspire future methodologies in related fields.
Weaknesses:
See limitations
Questions
- The dataset construction process in this paper is controversial (please add references as support); using more MD simulation datasets is recommended. Maybe you can test your PETIMOT system on some long MD simulation case studies?
- More test cases could be included in this paper, including several structure subsets such as Apo-Holo pairs and fold-switching proteins.
- The evaluation metrics currently used are hard to understand and are only used in this paper. It would be helpful to add some widely used metrics, including pairwise distance, JS divergence in tICA/PCA space, pairwise RMSD, RMSF MAE, etc.
- Adding a comparison to conventional long MD simulation in the results would be preferred. I understand that large-scale simulation is not possible, but it could be done for a small subset of the test set.
Limitations
- Lacks sufficient testing and evaluation; the test dataset is relatively limited
- Results did not show similarities and systematic differences with known models, including ESMFlow/AlphaFlow
Final Justification
Based on the following concerns, mentioned in my rebuttal reply, I will keep my score.
- As mentioned in Q3, it is good that the authors designed some new metrics for this task, but without well-accepted metrics, especially tICA- and RMSD-based ones, it is hard to compare the method's real performance with conventional MD and other related methods.
- As mentioned in Q4, although the authors have employed the ATLAS dataset, 100 ns conventional MD should not be considered "long simulation" for this task. Some long simulations or large-scale structural changes should be considered, including D. E. Shaw's fast-folding proteins, the long BPTI simulation, or some multi-state cryo-EM resolved structures.
- In limitation 2, the reply still cannot convince me of the difference in performance between this method and ESMFlow. In general, I appreciate the work done by the authors, especially the model design and new metrics (losses), but the evaluation is not well-defined and the performance is not convincing enough.
I have read the authors' reply to my second-round comments, but it still does not resolve my concerns.
Formatting Issues
No formatting concerns
Weaknesses:
- Lacks sufficient testing and evaluation; the test dataset is relatively limited
W1: To address the reviewer’s concern, we conducted 5-fold cross-validation over the full PDB dataset (36,675 samples, 5 by collection). For each fold, we implemented a rigorous data partitioning protocol ensuring that the training set (80%) did not contain any protein chain sharing significant structural similarity (Foldseek, e-value 1e-2) or sequence similarity (MMseqs2, 30% identity) with the test set (20%). This strict protocol prevents data leakage and provides robust evaluation across our complete dataset.
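The collection-level fold construction can be sketched as follows. This is an illustrative sketch only, with a hypothetical helper name; the Foldseek/MMseqs2 similarity filtering described above happens on top of this grouping and is not shown:

```python
import numpy as np

def group_folds(groups, n_folds=5, seed=0):
    """Assign each sample to a fold such that all samples sharing a group
    label (here: a conformational collection) land in the same fold."""
    rng = np.random.default_rng(seed)
    uniq = rng.permutation(np.unique(groups))        # shuffle group labels
    fold_of = {g: i % n_folds for i, g in enumerate(uniq)}
    return np.array([fold_of[g] for g in groups])
```

Keeping whole collections in a single fold prevents near-duplicate conformations of the same protein from straddling the train/test boundary.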
To assess the impact of this new training scheme, we re-evaluated our original test set of 824 protein chains, excluding chains with structural or sequence similarity to the training data. On the remaining 473 independent protein chains, the newly trained PETIMOT models ("PETIMOT_5folds") and the original model achieved consistent performance of 41 and 42% success rate, with the same average min LS Error of 0.63 ± 0.21. These results, nearly identical to those reported in the manuscript, demonstrate that our original evaluation was representative despite the smaller test set, and PETIMOT's performance is robust and generalizable across different data partitions.
To further assess PETIMOT's robustness, we evaluated it on long MD trajectories from the ATLAS dataset (DOI: 10.1093/nar/gkad1084), as suggested by the reviewer. We identified 400 protein chains common to both the ATLAS set and our dataset, providing an independent MD benchmark. To ensure rigorous evaluation without data leakage, for each ATLAS protein chain we used the corresponding PETIMOT_5folds model trained with that specific chain held out (ensuring no training exposure). PETIMOT_5folds achieved a 60% success rate on this MD data – significantly higher than our 41% success rate on experimental structures – with a min LS error of 0.55 ± 0.19, min magnitude error of 0.17 ± 0.11, and global SS Error of 0.60 ± 0.16. These results demonstrate that PETIMOT generalises to MD data without re-training or fine-tuning.
- Results did not show similarities and systematic differences with known models, including ESMFlow/AlphaFlow
W2: Following the reviewer's suggestion, we conducted a systematic analysis of the similarities and differences between PETIMOT and the baselines. The differences between PETIMOT and AlphaFlow are strongly correlated with those between PETIMOT and ESMFlow (Pearson R = 0.7) and also, but to a lesser extent, with those between PETIMOT and NMA (Pearson R = 0.6). Furthermore, three quarters (76%) of PETIMOT failure cases (min LS loss above 0.6) also represent failure cases for all the baselines, AlphaFlow, ESMFlow and the NMA. This result shows that the four methods tend to agree.
PETIMOT consistently performs better than the baselines across the whole test set: its min LS error is lower than AlphaFlow in 65% of the cases, than ESMFlow in 67% and than NMA in 69%. The remaining cases do not show any enrichment with respect to protein length (number of amino acids), collection cardinality (number of members), or motion amplitude (maximum pairwise RMSD between two conformations). The only clear trend is a dependency of the NMA on motion collectivity (number of residues involved in the motion). The NMA success cases are enriched in collective motions and depleted in localised motions. PETIMOT does not share this limitation and tends to approximate localised motions better than the NMA.
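Motion collectivity, used in the analysis above, can be quantified with the degree-of-collectivity measure of Brüschweiler (1995). A minimal sketch, assuming the mode is given as per-residue displacement vectors (the function name is ours):

```python
import numpy as np

def collectivity(mode):
    """Degree of collectivity (Bruschweiler, 1995) of a motion mode given as
    an (N, 3) array of per-residue displacement vectors. Returns ~1 when all
    residues move with equal amplitude and ~1/N when a single residue moves."""
    u2 = (mode ** 2).sum(axis=1)        # squared displacement per residue
    p = u2 / u2.sum()                   # normalize to a distribution
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return np.exp(-(p * logp).sum()) / len(p)
```

Multiplying this value by N gives an effective number of residues involved in the motion, which is how collectivity is interpreted in our analysis.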
Over the whole test set, there are only 5, 4 and 2 cases, respectively, where PETIMOT produces highly inaccurate predictions (min LS loss above 0.7) and AlphaFlow, ESMFlow and the NMA, respectively, are clearly successful (min LS loss below 0.4). A couple of cases exhibit a highly localised loop motion (the peptidoglycan peptidase, 6JN8A, AlphaFlow better) or a rigid-body domain motion (the chromosomal replication initiator protein dnaA, 2HCBC, NMA better) involved in the function of the protein, but the majority exhibit large amplitude motions at the protein extremity, sometimes induced by insertions/deletions across the members of the collection, casting doubt on their functional relevance.
Questions:
- The dataset construction process in this paper is controversial (please add references as support); using more MD simulation datasets is recommended. Maybe you can test your PETIMOT system on some long MD simulation case studies?
Q1: The use of experimental structure collections to infer protein dynamics through PCA is well-established in the literature (Refs 14-16, cited in lines 38-40). Ref 14 directly validates our dataset construction. The authors inferred dynamic properties from sets of experimentally determined structures of highly similar proteins called “High Sequence-similarity Protein Data Bank ensembles”, which are conceptually similar to the collections of our dataset. They demonstrated that these inferred properties faithfully reproduce experimental NMR measurements reflecting the behaviour of the protein in solution, and showed that even a modest number of experimental structures is sufficient to capture the native conformational heterogeneity.
This experimental validation has stimulated the development of efficient computational tools (Refs 15-16) for extracting principal modes of motions from conformational collections. Ref 15 demonstrated that interpolation trajectories performed in PCA space inferred from experimental ATPase structures can recapitulate intermediate states, further supporting the validity of our dataset construction.
As explained above, in the answer to Weakness 1., we tested PETIMOT on some long MD simulations from the ATLAS dataset (DOI: 10.1093/nar/gkad1084).
- More test cases could be included in this paper, including several structure subsets such as Apo-Holo pairs and fold-switching proteins.
Q2: We thank the reviewer for the suggestion; we have conducted an evaluation of PETIMOT on two additional datasets. The first one is the iMod benchmark (DOI: 10.1093/bioinformatics/btr497), comprising pairs of open-closed conformations for a few dozen proteins that represent a wide variety of motions (hinge, shear, allosteric, and complex motions). The open-closed transitions are associated with ligand or partner binding in most of the cases. PETIMOT_5folds achieved an 86% success rate on this dataset with an average min LS error of 0.41 ± 0.18, average min magnitude error of 0.14 ± 0.07, and average global SS Error of 0.64 ± 0.12.
We compiled the second dataset from Ref 7 cited on lines 25-26. It comprises six metamorphic proteins with six pairs of structures representing fold switches. PETIMOT_5folds achieved a success rate of 37% on this dataset, with a min LS error of 0.67 ± 0.17, min magnitude error 0.25 ± 0.14, and global SS Error 0.78 ± 0.09. Our approach performed particularly well on KaiB, also highlighted in Ref 7. The min LS error is 0.45 starting from the ground state (2QKEC) and 0.57 starting from the FS state (5JYTA). The respective global SS errors are 0.68 and 0.69. By contrast, PETIMOT achieves high accuracy for the RfaH transition only when starting from the beta-sheet active state (6C6SD, min LS error of 0.37). The error is much higher (0.98) when starting from the inactive alpha-helix bundle state (5ONDA).
These additional evaluations demonstrate PETIMOT's versatility across different types of conformational changes. The lower performance on metamorphic proteins is expected given their dramatic secondary structure rearrangements. Nevertheless, PETIMOT shows promise even for these extreme transitions.
- The evaluation metrics currently used are hard to understand and are only used in this paper. It would be helpful to add some widely used metrics, including pairwise distance, JS divergence in tICA/PCA space, pairwise RMSD, RMSF MAE, etc.
Q3: While we understand the reviewer’s concern, we would like to stress that we carefully chose metrics that specifically reflect the agreement between individual motions or the overlap between motion subspaces, which is the core challenge of our prediction task.
Our LS error (Equations 5 and 7 on page 4) computes the weighted pairwise least-square difference between ground-truth and predicted motion directions. The LS error, together with MAE, are among the most accepted metrics for regression tasks. We have only specifically adapted it to the challenge of evaluating directional motion vectors rather than static coordinates, and scaled between 0 and 1 for better training, interpretability and usability.
Moreover, our SS loss (Equation 8 on page 5) relies on subspace coverage metrics established in the literature (Refs 54-56, cited in lines 162-163). Refs 54-56 define the Root Mean Square Inner Product (RMSIP) as a global measure of principal motion subspace overlap and demonstrate its effectiveness for comparing motions derived from MD simulations, Anisotropic Network Model and a geometrical rigid cluster decomposition algorithm. Our SS loss is conceptually similar to RMSIP, ensuring that our evaluation is grounded in established methodology.
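For reference, the RMSIP measure from Refs 54-56 can be computed in a few lines. This is a sketch of the standard definition, not our SS loss itself, which is only conceptually similar to it; note that 1 - RMSIP² is 0 for identical subspaces and 1 for mutually orthogonal ones, matching the SS behaviour described above:

```python
import numpy as np

def rmsip(A, B):
    """Root Mean Square Inner Product between two motion subspaces whose
    orthonormal basis vectors are the rows of A and B, both of shape (D, 3N).
    Returns 1 for identical subspaces and 0 for mutually orthogonal ones."""
    inner = A @ B.T                     # (D, D) pairwise inner products
    return np.sqrt((inner ** 2).sum() / A.shape[0])
```

A useful property is invariance to the choice of basis within each subspace: rotating the basis vectors of B inside the same subspace leaves the RMSIP unchanged.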
The SS loss is also conceptually similar to the comparison of angles between subspaces. A few recent examples of such subspace comparisons from other ML domains include, e.g., Zhu et al., NeurIPS 2021: 14306-14318; Feng et al., NeurIPS 2023: 80644-80660; Hawke, Ma, and Li, NeurIPS 2024: 74034-74057; Schlaginhaufen and Kamgarpour, NeurIPS 2024: 21461-21501; and Chen, Miao, and Qiu, NeurIPS 2023: 73995-74020.
- Adding a comparison to conventional long MD simulation in the results would be preferred. I understand that large-scale simulation is not possible, but it could be done for a small subset of the test set.
Q4: As explained above, in the answer to Weakness 1., we tested PETIMOT on some long MD simulations from the ATLAS dataset.
Thanks for the comprehensive rebuttal. But based on the following concerns, I will keep my scores.
- As mentioned in Q3, it is good that the authors designed some new metrics for this task, but without well-accepted metrics, especially tICA- and RMSD-based ones, it is hard to compare the method's real performance with conventional MD and other related methods.
- As mentioned in Q4, although the authors have employed the ATLAS dataset, 100 ns conventional MD should not be considered "long simulation" for this task. Some long simulations or large-scale structural changes should be considered, including D. E. Shaw's fast-folding proteins, the long BPTI simulation, or some multi-state cryo-EM resolved structures.
- In limitation 2, the reply still cannot convince me of the difference in performance between this method and ESMFlow.
In general, I appreciate the work done by the authors, especially the model design and new metrics (losses), but the evaluation is not well-defined and the performance is not convincing enough.
We thank the reviewer for their feedback. We respectfully emphasize that PETIMOT introduces a fundamentally new paradigm for protein motion prediction that cannot be fairly evaluated using metrics designed for time-series analysis. Our geometric approach to motion prediction necessitates appropriate geometric metrics. We believe the field would benefit from this methodological advancement rather than constraining innovation to existing evaluation frameworks.
(1) tICA is not an appropriate metric, since PETIMOT does not generate time-dependent simulations. Please refer to our first answer for a detailed explanation regarding RMSD-based metrics. We did compute RMSD metrics; however, we do not believe they are appropriate here for a fair comparison with AlphaFlow and ESMFlow.
(2) While we acknowledge that some specialized studies achieve microsecond timescales, 100ns simulations represent the practical reality for most protein systems, are widely used in the literature for conformational sampling, and are sufficient to capture many functionally relevant conformational transitions. The ATLAS dataset provides a valuable compromise between simulation length and dataset diversity, allowing us to evaluate across hundreds of protein systems rather than a handful of exceptionally long simulations. Some of the ATLAS simulations against which we assessed PETIMOT contain large amplitude motions – max pairwise RMSD of 7.05 ± 4.83Å, up to 30.92Å. Regarding multi-state Cryo-EM resolved structures, we indeed have them in our dataset. A particularly interesting example is the ATPase NSF whose experimental structures correspond to ATP/ADP-bound states and 20S supercomplex conformations from cryo-EM studies (DOI: 10.7554/eLife.38888, DOI: 10.1038/nature14148). The functionally relevant motions involve large-amplitude rigid-body domain movements and loop rearrangements. The first linear PCA mode explains 57% of the variance and 4 modes are required to explain 90%. PETIMOT successfully captures this complex motion subspace with min LS error as low as 0.32 and global SS error of 0.30, demonstrating its ability to predict not just single motions but biologically meaningful motion subspaces.
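The mode counting in the NSF example can be reproduced with standard linear PCA on superposed conformations. A minimal sketch, assuming the structures are already aligned to a common frame (the helper names are ours, not the paper's implementation):

```python
import numpy as np

def explained_variance(conformations):
    """conformations: (M, N, 3) array of M aligned structures with N atoms.
    Returns the fraction of total variance explained by each PCA mode."""
    X = conformations.reshape(len(conformations), -1)   # flatten to (M, 3N)
    X = X - X.mean(axis=0)                               # center the ensemble
    # Singular values of the centered data matrix give the mode variances.
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def modes_needed(conformations, threshold=0.9):
    """Smallest number of PCA modes explaining at least `threshold` variance."""
    frac = np.cumsum(explained_variance(conformations))
    return int(np.searchsorted(frac, threshold) + 1)
```

Applied to the NSF collection, such an analysis yields the figures quoted above (first mode: 57% of the variance; 4 modes for 90%).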
(3) We would appreciate further clarification on this point. Our answer to Reviewer’s Weakness/Limitation 2 clearly demonstrates that PETIMOT outperforms ESMFlow on most motions in our dataset, being much faster at the same time. We believe this represents a clear methodological contribution that advances the field's ability to predict biologically relevant protein motions from structural data alone. We would be happy to provide additional analysis or address specific concerns if the reviewer could elaborate on what aspect of our comparison they find unconvincing.
This paper proposes PETIMOT, a SE(3)-equivariant graph neural network for inferring protein motions from sparse PDB data, claiming superior performance over baselines like AlphaFlow/ESMFlow and normal mode analysis (NMA). While reviewers appreciated the novel formulation and innovative loss functions, they raised concerns about evaluation metrics, dataset limitations, and biological relevance, with scores ranging from rejection to borderline acceptance. Upon reviewing the paper, rebuttals, and discussions, the AC identified several critical flaws that undermine the claims. First, the primary experimental comparison is unfair, as PETIMOT predicts motion direction vectors while AlphaFlow/ESMFlow generate complete 3D conformers—tasks of vastly different complexity. This renders the ~8000x speedup comparison (4 vectors vs. 50 full structures) meaningless, and the primary LS error metric (which PETIMOT optimizes directly) biases results toward it over methods trained for realistic structure generation. Moreover, the rebuttal reveals that the authors are aware that PETIMOT performs a different task from AlphaFlow/ESMFlow, yet they still choose metrics that unilaterally favor PETIMOT. Such misuse of baselines and unfair metrics cannot be justified by the authors' claim of "introducing a new paradigm." Second, the evaluation lacks comprehensiveness by omitting comparisons with molecular dynamics (MD) simulations or other physics-based dynamics methods beyond NMA. While the authors added ATLAS dataset evaluation in their rebuttal, they only demonstrated PETIMOT's performance on MD benchmarks without actually comparing against MD methods or other physics-based approaches. Third, Figure 3a's sorting by PETIMOT's LS error produces random distributions for baselines, indicating the metric fails to enable meaningful cross-method comparisons and actually weakens the authors' validity claims. 
Fourth, while the authors cite references to address the validity of using sequence-clustered experimental structures for dataset construction, they fail to explain why they exclusively use experimental data rather than simulation data. This means the significance of the fourth contribution claimed in the paper remains unclear. Given these issues and the mixed reviewer feedback, the AC recommends rejection.