RNA FrameFlow: Flow Matching for de novo 3D RNA Backbone Generation
Abstract
Reviews and Discussion
The paper introduces RNA-FRAMEFLOW, a novel generative model designed for de novo 3D RNA backbone generation. The model adapts SE(3) flow matching, previously used for protein backbone design, to handle RNA's structural complexities. RNA-FRAMEFLOW represents RNA backbones using rigid-body frames and leverages auxiliary losses to enhance training effectiveness. The model incorporates several evaluation metrics to ensure self-consistency and structural realism. Additionally, it introduces novel data preparation strategies to enhance both the diversity and novelty of generated RNA structures.
Strengths
- Successfully adapts SE(3) flow matching to RNA, addressing RNA’s complex and flexible conformations.
- It achieves a validity rate of 41%, indicating good global and local structural accuracy.
- Implements techniques like structural clustering and cropping augmentation to enhance training diversity, partially addressing the scarcity of 3D RNA datasets.
- Compared to diffusion models like MMDiff, RNA-FRAMEFLOW generates RNA backbones more quickly, making it more efficient in practical applications.
Weaknesses
- The cropping strategy, meant to increase dataset diversity, can introduce unrealistic subsequences that may not fold correctly, reducing the validity of generated structures.
- The reliance on folding and inverse-folding models, which have their own biases, affects RNA-FRAMEFLOW’s overall performance and the validity of generated designs.
- It would be beneficial to include more detailed results on the generation process (see Q1).
- Further ablation studies could be conducted to evaluate the impact of the proposed auxiliary losses on model performance (see Q2).
- The scTM only reflects the self-consistency of C4’ atoms; other atoms should also be compared (see Q3).
Questions
- As the number of timesteps increases, the performance in terms of validity, diversity, and novelty decreases. Could you analyze how the number of sampling steps impacts the quality of the generated samples?
- Could you conduct ablation studies to demonstrate how the added auxiliary losses contribute to performance improvements?
- The C3’ atom primarily reflects the accuracy of translational performance, while the aspects of rotations and angles are not assessed by the current validity metric. Could you compute the RMSD between the generated structure and the folded structure for a more comprehensive evaluation?
Dear Reviewer FB4S,
Thank you for your time and valuable feedback on our paper. We appreciate the criticisms which go into improving the quality of our submission. We respond to the questions and weaknesses brought up above. We have also uploaded a new PDF for your reference.
We ablate the number of sampling steps in Table 1 in Section 4.1. To summarize, increasing the number of sampling steps beyond an optimal amount does not yield any performance improvement. The number of sampling steps is conventionally a hyperparameter in flow matching that is tuned for a specific dataset and model – it is known that having more steps does not always mean higher quality samples – and we found the optimal steps to be .
Could we please clarify the difference between “number of sampling steps” vs “number of timesteps”? For our Flow Matching model, the number of sampling steps during inference is equal to the number of interpolation steps during training, which is the common practice.
Could you conduct ablation studies to demonstrate how the added auxiliary losses contribute to performance improvements?
Indeed. We provide ablations of the auxiliary losses and their impact on performance (i.e., the local and global metrics) in Table 6 in Appendix C2. To do so, we drop or include each loss in our total training loss for each re-run and compute the evaluation metrics. We also visually show artifacts from generated samples when ablating each loss in Figure 8 in Appendix C2.
To summarize, we observe that including ALL the losses offers the best performance. As for the relative importance of each loss, we believe it is in this order: to place the frames with the right orientation in 3D, to ensure the frames are contiguous, and to ensure intra-frame atoms are placed properly with minimal clashes.
Could you compute the RMSD between the generated structure and the folded structure for a more comprehensive evaluation?
We include the scTM and scRMSD scores in Section 4.1 for our best-performing model. However, we believe RMSD might not be the best metric since our structure predictor, RhoFold, predicts backbones that may not perfectly correspond to the crystal structure, as shown in Figure 7 in Appendix B.2.
The reliance on folding and inverse-folding models, which have their own biases, affects RNA-FRAMEFLOW’s overall performance and the validity of generated designs.
We include an assessment of the loss in validity/self-consistency due to using gRNAde and RhoFold in Appendix B.3. We notice that RNA-FrameFlow generates backbones that closely match the validity of gRNAde sequences.
We hope this clarifies the questions. Please let us know!
Hello Reviewer FB4S,
We additionally include the scRMSD computed using the frame atoms {C4', O3, C3'} to factor in the rotation component, not just the translation:
| Sequence Length | Median scRMSD | Mean scRMSD | StdDev scRMSD |
|---|---|---|---|
| 40 | 3.81916 | 6.01004 | 4.15092 |
| 50 | 3.66592 | 9.36061 | 10.6806 |
| 60 | 5.12685 | 9.36558 | 9.98278 |
| 70 | 3.74416 | 6.88861 | 8.85833 |
| 80 | 6.06382 | 8.43335 | 5.95096 |
| 90 | 10.3897 | 11.718 | 6.74968 |
| 100 | 13.2794 | 13.5417 | 6.65053 |
| 110 | 14.3694 | 13.3328 | 6.43336 |
| 120 | 1.89943 | 3.03471 | 3.89094 |
| 130 | 17.4543 | 16.2355 | 5.82752 |
| 140 | 11.1751 | 13.067 | 4.90893 |
| 150 | 20.2802 | 20.2932 | 4.67098 |
We observe that these scRMSD values correlate well with those in our original submission. For sequence lengths with high % validity, we see relatively lower scRMSD values and higher scTM scores.
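For readers unfamiliar with how a self-consistency RMSD of this kind is typically obtained, the following is a minimal sketch of an RMSD after optimal rigid superposition (the Kabsch algorithm). It is not taken from the paper's code: the function name `kabsch_rmsd` and the use of NumPy are assumptions for illustration, and the coordinates would be the chosen atoms of the generated backbone and of the refolded structure, stacked in matching order.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Hypothetical helper: P could hold the selected atoms of a generated
    backbone and Q the same atoms of the refolded prediction, same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translation by centering both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Correct for a possible reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the residual deviation.
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Using three frame atoms per nucleotide rather than one simply means passing arrays with three times as many rows; the superposition then becomes sensitive to the orientation of each nucleotide as well as its position.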
Do let us know if we can clarify anything else!
Dear Reviewer FB4S,
We also include the all-atom scRMSD values using all 13 backbone atoms along the generated chains:
| Sequence Length | Median scRMSD | Mean scRMSD | StdDev scRMSD |
|---|---|---|---|
| 40 | 4.2231 | 6.36835 | 3.88851 |
| 50 | 4.05122 | 9.73204 | 10.4827 |
| 60 | 5.4111 | 9.6583 | 9.85863 |
| 70 | 4.19358 | 7.27077 | 8.71512 |
| 80 | 6.20196 | 8.70575 | 5.78062 |
| 90 | 10.5345 | 11.8731 | 6.65065 |
| 100 | 13.2689 | 13.6572 | 6.5722 |
| 110 | 14.3971 | 13.4845 | 6.25824 |
| 120 | 2.82262 | 3.85901 | 3.67436 |
| 130 | 17.4762 | 16.3113 | 5.74994 |
| 140 | 11.2988 | 13.1918 | 4.85217 |
| 150 | 20.2991 | 20.3286 | 4.64406 |
We see similar correlations with the scTM scores from our paper.
Do let us know if we can provide any other information. Thank you!
This paper is the first to apply FrameFlow to RNA backbone generation, i.e., the design of RNA structure. It also provides comprehensive experiments and reasonable analysis.
Strengths
This is the first paper to utilize frame flow for designing the RNA backbone. It also includes a complete experimental section.
Weaknesses
Main Weaknesses:
- The innovation in the methodology section is limited. It applies frame flow to the backbone design of RNA. However, the application of frame flow to RNA design, in my view, lacks novelty, as frame flow is a model for generating frames (although the authors only provide experiments with proteins), and RoseTTAFoldNA has already offered a method to represent RNA as frames. Therefore, applying frame flow to RNA seems like a combination and does not appear to be a significant enough contribution to serve as the primary innovative point for a top-tier conference paper; it could only be considered a minor innovation and a baseline. However, this paper presents it as the main innovation. I think this paper needs to highlight some other insights, such as what problems arise when applying frame flow to RNA and how they should be addressed. The paper does indeed point out some insights, like how to address the issue of insufficient data and how to represent RNA as frames. However, I have questions about these insights. First, the lack of 3D structural data for RNA is a well-known problem [1], and it is not an insight. To solve this problem, the authors propose data augmentation methods, such as Structural clustering and Cropping augmentation. But these Structural clustering[3] and Cropping augmentation[2] are very common [2,3] and cannot be considered major insights or innovations.
[1] RDESIGN: HIERARCHICAL DATA-EFFICIENT REPRESENTATION LEARNING FOR TERTIARY STRUCTURE-BASED RNA DESIGN
[2] AlphaFold3 & AlphaFold2
[3] SE(3) diffusion model with application to protein backbone generation
- Another innovation in the methodological part is their proposal of a way to characterize RNA backbones as frames. However, I find this questionable. First, in RoseTTAFoldNA, a method to characterize RNA backbones as frames has already been proposed, and it is highly similar to the one presented in this paper. Therefore, this paper is not the first to propose such a method. Second, while RoseTTAFoldNA constructs frames based on the phosphate group (P, OP1, and OP2), this paper bases its construction on C3′−C4′−O4′. However, the authors do not provide compelling reasons or experimental results to demonstrate that choosing C3′−C4′−O4′ is a superior option. The authors do mention in the paper that C3′−C4′−O4′ is more centrally located. Nevertheless, even so, I do not think this would lead to a significant improvement in model performance, as it essentially just adds a fixed offset to the RNA, and neural networks should be capable of fitting this offset. Thus, I do not consider this a valid innovation. If, however, the authors can provide experimental results showing that using C3′−C4′−O4′ over P, OP1, and OP2 significantly enhances the model's performance, along with a compelling analysis of the underlying reasons, I would be willing to regard this as a major innovation.
Minor Weaknesses: To my knowledge, the reliability of RNA structure prediction results is not as high as that of protein structure predictions. As a result, although the experimental outcomes have some reference value, they are not very precise. Of course, I understand this is a difficult issue to resolve, and my evaluation of this paper does not hinge on this point. This issue has a minimal impact on my rating of the paper, and the authors can disregard it. I am merely pointing out this objective fact in the interest of rigor. It would be better if the authors could provide additional experimental data, such as some molecular dynamics software that can now estimate RNA energy, which would also serve as a helpful evaluation tool.
Questions
See weaknesses
Dear Reviewer Bgio,
Thank you for your time and valuable feedback on our paper. We appreciate the criticisms which go into improving the quality of our submission. We respond to the questions and weaknesses brought up above:
RoseTTAFoldNA constructs frames based on the phosphate group (P, OP1, and OP2), this paper bases its construction on C3′−C4′−O4′…the authors do not provide compelling reasons…
We pick the atoms {C4’, C3’, O4’} to create the frame comprising the rigid rotations and translation of the C4’ atom. Aligning with the medicinal chemistry literature [1], we wanted the atoms to be close to the centroid of the nucleotide to prevent the accumulation of errors when autoregressively placing the non-frame atoms. It also allows us to implicitly capture intra-nucleotide motions like ring puckering that occurs in the ribose sugar ring containing our frame atoms. We document the choice of frame atoms in more detail in Section 2.1.
RF2NA’s choice of frame is actually completely arbitrary (no biological motivation) and has the following drawbacks: Assuming rigid/fixed geometry at P, OP1, OP2 is not supported by literature, unlike our choice of C4’, C3’, O4’ [1]. Placing the remaining atoms via predicted torsion angles w.r.t. the P atom can lead to large error accumulation as the P, OP1, OP2 (phosphate group) is on one extreme of a nucleotide.
I think this paper needs to highlight some other insights, such as what problems arise when applying frame flow to RNA and how they should be addressed.
In Appendix B.3., we now have a section on the upper bounds of performance of bringing over the self-consistency pipeline seen in protein design (as used by RFDiffusion [2] and FrameDiff [3]) and porting it to RNA design. Given the current state of RNA folding and inverse folding tools, we observe that the generated backbones from RNA-FrameFlow closely retain the validity and self-consistency of gRNAde sequences.
We also include ablations on the auxiliary losses that capture the more complex nucleotide structure (with 13 atoms) in Appendix C.1., documenting how naively applying FrameFlow to RNA design might not immediately yield results and requires introducing RNA-specific inductive biases that account for structural differences to proteins.
We hope this clarifies the questions. Please let us know!
[1] Ribose puckering: structure, dynamics, energetics, and the pseudorotation cycle. Harvey & Prabhakaran. 1986.
[2] De novo design of protein structure and function with RFdiffusion. Watson, J.L., Juergens, D., Bennett, N.R., Trippe, B.L., Yim, J., Eisenach, H.E., Ahern, W., Borst, A.J., Ragotte, R.J., Milles, L.F. and Wicky, B.I.. Nature, 620(7976), pp.1089-1100. 2023
[3] SE(3) diffusion model with application to protein backbone generation. Yim, J., Trippe, B.L., De Bortoli, V., Mathieu, E., Doucet, A., Barzilay, R. and Jaakkola, T.. 2023
Thank you for your reply. However, my issue remains unresolved. Regarding the selection of the phosphate group (P, OP1, and OP2), I have not seen any experimental results that demonstrate such a choice can significantly outperform RoseTTAFoldNA. I believe that this method of establishing frames will not show a significant difference in final performance compared to RoseTTAFoldNA. Moreover, this paper does not propose a completely new method for frame establishment; it merely adjusts the method used by RoseTTAFoldNA. Therefore, I think the contribution of this paper regarding the frame part is very limited and does not meet the criteria for acceptance.
Concerning other contributions, such as the self-consistency pipeline and auxiliary losses, these elements have already existed in previous protein design work and are not newly proposed methods; they are just being applied to RNA tasks. What I hope to see is an effective and previously unproposed method, rather than transplanting methods from other tasks onto RNA tasks. If the authors have indeed proposed such a method, I hope they could emphasize it in their next response.
Regarding the experiment in Appendix C.1, I understand that it is an experiment about auxiliary supervision. However, this experiment yields a rather awkward result: although adding non-frame supervision improves Validity, the Diversity and Novelty drop significantly.
This paper proposes RNA-FrameFlow, a flow matching framework for RNA backbone generation. The authors utilize three key atoms to determine the frame of each nucleotide, and leverage the FrameFlow[1] method with proper auxiliary losses to generate 3D backbone structures. The authors establish a generation pipeline for benchmarking the performance of RNA generation models, and results demonstrate the proposed models' capability to yield valid backbone structures with specific lengths.
[1] Yim, Jason, et al. "Fast protein backbone generation with SE (3) flow matching."
Strengths
- This paper focuses on a novel problem of RNA backbone generation, and is the pioneer to represent RNA structures as frames.
- The experiments are thorough. Notably, the paper provides a detailed exploration of the limitations, offering insightful analyses and preliminary solutions. This openness about current challenges reflects a constructive effort toward advancing the field, even if these issues remain to be fully resolved.
Weaknesses
- For the evaluation pipeline, self-consistency scores could be affected by cumulative errors introduced by the inverse folding and structure prediction models. To better understand the potential limitations, it would be helpful to estimate an upper bound for this pipeline. One approach could involve sampling a similar distribution of structures from the test set and computing self-consistency scores using the inverse folding–structure prediction pipeline.
- As the model may exhibit bias toward common patterns in the training set (tRNAs or 5S ribosomal RNAs), it would be beneficial to provide the training distribution and compare it with the TM-score distribution across different sequence lengths for better analysis.
Questions
- Does the selection of frame atoms involve some rigid assumptions? If so, how do these assumptions impact the results, and are there comparisons with other possible selections?
- Could you provide the average scTM for each method in addition to the validity metrics? The validity may be sensitive to the chosen threshold (0.45).
Dear Reviewer Dcv1,
Thank you for your time reading our submission and providing feedback on enhancing the quality of our work. We respond to the critiques brought up above. We have uploaded a new PDF for your reference.
Does the selection of frame atoms involve some rigid assumptions?
Yes. We pick the atoms {C4’, C3’, O4’} to create the frame comprising the rigid rotations and translation of the C4’ atom. Aligning with the medicinal chemistry literature [1], we wanted the atoms to be close to the centroid of the nucleotide to prevent the accumulation of errors when autoregressively placing the non-frame atoms. It also allows us to implicitly capture intra-nucleotide motions like ring puckering that occurs in the ribose sugar ring containing our frame atoms. We document the choice of frame atoms in more detail in Section 2.1.
Could you provide the average scTM for each method in addition to the validity metrics?
We have included the average scTM and scRMSD metrics in Section 4.1 (colored red). We have also included the per-sequence-length scTM and validity in Figure 8 in Appendix B.3.
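The threshold sensitivity raised in the question can be checked with a few lines of code. The sketch below treats validity as the fraction of samples whose scTM meets a threshold (0.45 in the discussion above) and recomputes it at neighbouring thresholds. Whether the comparison is `>=` or `>` in the paper is an assumption here, the function names are hypothetical, and any scores passed in would come from the self-consistency pipeline.

```python
from statistics import mean

def validity(sc_tm_scores, threshold=0.45):
    """Fraction of samples with scTM at or above the threshold."""
    if not sc_tm_scores:
        raise ValueError("need at least one scTM score")
    return sum(s >= threshold for s in sc_tm_scores) / len(sc_tm_scores)

def threshold_sweep(sc_tm_scores, thresholds=(0.35, 0.45, 0.55)):
    """Validity at several thresholds, reported next to the mean scTM."""
    return {
        "mean_scTM": mean(sc_tm_scores),
        "validity": {t: validity(sc_tm_scores, t) for t in thresholds},
    }
```

Reporting the mean scTM alongside the swept validity values shows at a glance whether a ranking between methods depends on the particular cutoff.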
It would be beneficial to provide the training distribution and compare it with the TM-score distribution across different sequence lengths for better analysis.
We provide this in Figure 3 in Section 4.1. We show the scTM scores across sequence lengths. Do let us know if there’s something specific we can elaborate on here. We also include the average scTM and scRMSD scores for our generated backbones in Section 4.1.
One approach could involve sampling a similar distribution of structures from the test set and computing self-consistency scores using the inverse folding–structure prediction pipeline.
We thank the reviewer for this suggestion. We have now included a section on assessing our self-consistency pipeline in Appendix B.3 (colored red), quantifying how well gRNAde and RhoFold influence the validity of samples. We apply our pipeline to ground truth samples from RNAsolo to measure recovery.
We hope this clarifies the questions. Please let us know!
[1] Ribose puckering: structure, dynamics, energetics, and the pseudorotation cycle. Harvey & Prabhakaran. 1986.
Thanks for the detailed response. I would like to keep my original score for acceptance and raise my confidence to 4.
Thank you, Reviewer Dcv1. Do let us know if there's anything else that can be clarified!
This paper introduces a generative model for designing the tertiary structure backbone of RNA, offering a novel approach to structure-based RNA design. The authors establish a set of evaluation metrics to assess the designability and rationality of the generated structures. Experimental results demonstrate that the proposed model can successfully generate reasonable designs for RNA of a certain length.
Strengths
- This is the first generative model specifically proposed for RNA 3D backbone design.
- A comprehensive set of evaluation metrics is defined to assess the quality of generated RNA structures.
- The flow matching technique, originally applicable only to proteins, is extended to RNA, contributing at the engineering level.
- The proposed model demonstrates successful design outcomes on several common RNA types.
- The writing logic of the paper is clear and straightforward, making it easy to follow.
Weaknesses
- The paper employs gRNAde for inverse folding to verify the quality of RNA backbone generation from a self-consistency perspective. However, the inherent capabilities and errors of gRNAde may significantly impact this evaluation metric. This should be thoroughly explained and discussed in the paper.
- Unlike the protein field, RNA structure prediction accuracy is relatively low, with many challenges remaining. Relying solely on RhoFold for RNA structure prediction may introduce bias. It would be more comprehensive and reasonable to incorporate additional structure prediction models, such as trRosettaRNA and Alphafold3, to demonstrate performance.
- For the Earth Mover's Distance (EMD) in Table 1, the results of the 50/50 split of the training set should be included as a reference for the numerical scale. The numerical range of EMD varies greatly and is easily influenced by hyperparameter settings.
- From an RNA biology perspective, the current diversity metric is relatively meaningless. A more effective approach might be to explore whether the generated structures encompass diverse RNA types or families (e.g., by calculating structural or sequence similarity), rather than focusing on the most common tRNA and rRNA.
- The current quality assessment of the generated structure includes both global and local geometry, which can be further enhanced by further checking whether the generated RNA structures contain some common tertiary interaction motifs, such as pseudoknots and base multiplets, rather than just base pairing and stacking.
- For the case where clashes appear in the generated 3D backbones, there is a lack of quantitative metrics to illustrate the frequency of this phenomenon, such as the average proportion of nucleotides with clashes in RNA.
Questions
Please refer to the details in Weaknesses.
- Could you elaborate on the limitations and potential errors of gRNAde that might impact the self-consistency evaluation of backbone generation? How do these factors influence the reliability of your results?
- Consider incorporating results from multiple structure prediction models to provide a more comprehensive evaluation of your method's performance.
- Include the results of the 50/50 split calculation of the training set as a reference to provide a clearer understanding of the EMD's numerical scale.
- Explore whether the generated structures include diverse RNA types or families by calculating structural or sequence similarity to enhance the diversity metric's biological significance.
- Enhance the quality evaluation by checking for common tertiary interaction motifs to provide a more comprehensive evaluation of the generated RNA structures.
- How frequently do the generated 3D backbones exhibit clashes, and what is the average proportion of nucleotides involved in these clashes?
Dear Reviewer WjH5,
Thank you for your time reading our submission and providing feedback on enhancing the quality of our work. We respond to the critiques brought up above. We have also updated our submission PDF.
Could you elaborate on the limitations and potential errors of gRNAde that might impact the self-consistency evaluation of backbone generation?
We include an assessment of gRNAde and RhoFold on the RNAsolo training set to see how it influences self-consistency in Appendix B.3. We observe that the generated backbones from RNA-FrameFlow closely retain the validity and self-consistency of gRNAde sequences.
Additionally, gRNAde has been experimentally validated in the wet lab with a success rate of 50% at forming the target backbone structure across diverse RNAs; see https://openreview.net/forum?id=lvw3UgeVxS – this is close to our computationally determined validity rates at different sequence lengths.
Include the results of the 50/50 split calculation of the training set as a reference to provide a clearer understanding of the EMD's numerical scale.
We now include EMD scores for the local structural descriptors from the 50/50 training set random split in Table 2 in Section 4.2 (colored red). We see very high similarity and low variance in the statistics from both these sets.
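The role of the 50/50 split as a scale reference can be illustrated with a small sketch. It computes a one-dimensional Earth Mover's Distance between two random halves of the same sample; for equal-sized samples this reduces to the mean absolute difference between sorted values. This is an assumption-laden simplification: the paper's EMD may be computed on histograms or on multi-dimensional descriptors, and the names `emd_1d` and `split_half_emd` are hypothetical.

```python
import random

def emd_1d(a, b):
    """1D Earth Mover's Distance between two equal-sized samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # With equal sizes, optimal transport pairs the sorted values in order.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def split_half_emd(values, seed=0):
    """EMD between two random halves of one sample (a noise-floor reference)."""
    vals = list(values)
    random.Random(seed).shuffle(vals)
    half = len(vals) // 2
    # Drop one element if the sample size is odd so both halves match.
    return emd_1d(vals[:half], vals[half:2 * half])
```

An EMD between generated and training descriptors is then interpreted relative to this split-half value; for samples of unequal size, a general routine such as `scipy.stats.wasserstein_distance` would be needed instead.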
Consider incorporating results from multiple structure prediction models to provide a more comprehensive evaluation of your method's performance.
We use Chai-1 [1] as an alternative structure predictor and include the self-consistency results in Table 7 in Appendix C.3. We see similar validity scores to RhoFold across sequence lengths.
Additionally, several recent benchmarks of 3D RNA structure prediction (latest example) have found that open-source models such as RhoFold, trRosettaRNA, RF2NA, etc. all perform very similarly due to architectural and training data similarities, with minor differences for specific targets/families.
We have also recently obtained the academic-use weights for AlphaFold3 and are incorporating it into our pipeline.
How frequently do the generated 3D backbones exhibit clashes, and what is the average proportion of nucleotides involved in these clashes?
We include a detailed analysis of steric clashes in Appendix D.4. We describe how we define a steric clash (aligning with popular tools like MolProbity [2]) and measure the clashes from RNAsolo and the generated backbones across sequence lengths.
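As a rough illustration of the quantity the reviewer asked about, the sketch below computes the proportion of nucleotides involved in at least one clash. The clash criterion used here (any two atoms from non-neighbouring nucleotides closer than a fixed distance cutoff) and its default values are placeholders, not the definition given in Appendix D.4, and `clash_fraction` is a hypothetical name.

```python
import math

def clash_fraction(nucleotides, cutoff=2.0, min_separation=2):
    """Fraction of nucleotides with at least one clashing atom.

    nucleotides: list of nucleotides, each a list of (x, y, z) atom positions.
    cutoff: distance in angstroms below which two atoms count as clashing
            (placeholder value, an assumption for illustration).
    min_separation: only nucleotides at least this far apart in sequence are
            compared, so covalently bonded neighbours are not flagged.
    """
    n = len(nucleotides)
    if n == 0:
        raise ValueError("need at least one nucleotide")
    clashing = set()
    for i in range(n):
        for j in range(i + min_separation, n):
            if any(math.dist(a, b) < cutoff
                   for a in nucleotides[i] for b in nucleotides[j]):
                clashing.update((i, j))
    return len(clashing) / n
```

The same function applied to reference structures gives a baseline rate against which the generated backbones can be compared per sequence length.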
A more effective approach might be to explore whether the generated structures encompass diverse RNA types or families (e.g., by calculating structural or sequence similarity)
We also use a metric called pdbTM to show how structurally different our generated backbones are from existing samples in RNAsolo, which aligns with the reviewer’s comments on “calculating structural similarity”. The best-performing RNA-FrameFlow model has an average pdbTM of 0.54 (1 being perfect). We define how we measure pdbTM in Section 3 on evaluation metrics.
can be further enhanced by further checking whether the generated RNA structures contain some common tertiary interaction motifs, such as pseudoknots and base multiplets, rather than just base pairing and stacking.
These interaction motifs like pseudoknots, bulges, etc are determined by interactions between the atoms in the bases, and not the backbone atoms. The reviewer’s comment on this is valid but we cannot compute this as we only deal with backbone atoms, not base atoms. We have included structural clustering as a data augmentation that only deals with the tertiary structure and not any base interactions.
We hope this clarifies the questions. Please let us know!
[1] Chai-1: Decoding the molecular interactions of life. Chai Discovery, Boitreaud, J., Dent, J., McPartlon, M., Meier, J., Reis, V., ... & Wu, K. 2024.
[2] MolProbity: More and better reference data for improved all-atom structure validation. Williams et al. 2018.
Thanks for the author's response, which partially addressed my concern. I will raise my score to 6 due to the improvement in the quality of the paper.
Thank you for your time and consideration! Do let us know if there's anything we can clarify.
The paper presents RNA-FrameFlow, a novel generative model tailored to design de novo 3D RNA backbones. Building on SE(3) flow matching methods developed for protein backbone modeling, the authors introduce an adapted framework to accommodate RNA’s structural flexibility, characterized by 13 backbone atoms per nucleotide versus the typical four in proteins. They establish a protocol for handling RNA-specific challenges like limited 3D structural data, which they address with structural clustering and augmentation. Evaluation includes newly defined metrics for assessing the self-consistency and structural fidelity of generated RNA backbones. RNA-FrameFlow outperforms baselines like MMDiff in validity, diversity, and sampling efficiency, showing potential in RNA structural modeling.
Strengths
- Adapting the SE(3) flow matching approach to RNA is novel and technically complex. The model navigates challenges in structural flexibility and the need for RNA-specific design by introducing RNA-tailored frame parameters and torsion-based atom placement.
- Development of comprehensive evaluation protocols for RNA backbone generation, including both local and global structural metrics.
- Figures, particularly those explaining the RNA frame, frame transformations, and evaluation metrics, effectively illustrate the approach and results. The structural consistency and Ramachandran-like angle plots are insightful, showing the model’s fidelity in RNA structural features.
Weaknesses
- The model’s training set, derived from a limited pool of 3D RNA structures, restricts RNA-FRAMEFLOW’s generalization to less frequent RNA folds. While clustering and cropping augmentations aim to mitigate this, they lead to a decrease in sample validity. Additionally, over-represented lengths like those common to tRNA and 5S rRNA may cause the model to rely on memorized structural features, limiting its novelty in unseen scenarios.
- The evaluation pipeline uses RhoFold for forward folding, but RhoFold’s length bias favors structures with specific lengths, potentially skewing evaluations for other sequence lengths. The reliance on scTM and pdbTM as primary evaluation metrics may thus disproportionately favor over-represented sequence lengths, and alternative, less biased evaluation methods could improve fairness.
Questions
- How might explicit modeling of RNA base pairing and stacking interactions (e.g., incorporating hydrogen bonding) impact RNA-FRAMEFLOW’s ability to produce realistic structures?
- Given the length bias of RhoFold, do the authors have plans to explore alternative evaluation tools that might yield less length-biased results?
- Could the clustering and cropping data augmentation methods be adjusted to avoid introducing invalid folds? How might alternative data augmentation methods improve diversity without sacrificing validity?
Dear Reviewer Vgi9,
Thank you for your time reading our submission and providing feedback on enhancing the quality of our work. We respond to the critiques brought up above. We have also uploaded a new PDF for your reference.
The model’s training set, derived from a limited pool of 3D RNA structures, restricts RNA-FRAMEFLOW’s generalization ... limiting its novelty in unseen scenarios.
The points raised in the Weaknesses are valid; however, we believe they can be characterized as general limitations of our current pipeline, which we have honestly brought to light in our submission. We hope the broader community will work towards addressing some of these interesting scientific questions about data preparation and rigorous evaluation of generative models for relatively niche, non-protein biomolecules like RNA.
How might explicit modeling of RNA base pairing and stacking interactions (e.g., incorporating hydrogen bonding) impact RNA-FRAMEFLOW’s ability to produce realistic structures?
We have tried incorporating base stacking and pairing by including the base atom N1/N9 in our auxiliary backbone losses. This makes the loss place more emphasis on the coordinates of, and pairwise distances between, nearby N atoms (which mark the starting position of the base), thereby explicitly emphasizing base pairing and stacking interactions. The results are included in Table 5 in Appendix C.1. For the loss including the nitrogenous base atom, we see an increase in validity (i.e., realism) from 41.0% to 46.7% but with a decrease in diversity and novelty. We believe combining such an approach with data augmentations and novel folds could mitigate the hit to diversity and novelty.
Given the length bias of RhoFold, do the authors have plans to explore alternative evaluation tools that might yield less length-biased results?
Yes. Two open-source structure prediction tools that support RNA have recently emerged: Chai-1 [1] and Boltz-1 [2]. We compare against Chai-1, a recent open-source structure prediction tool with similar results to AlphaFold2, and include these self-consistency results in Table 7 in Appendix C.3. We do not notice a significant change in validity scores across sequence lengths. In fact, several recent benchmarks of 3D RNA structure prediction (latest example) have found that open-source models such as RhoFold, trRosettaRNA, RF2NA, etc. all perform very similarly due to architectural and training data similarities, with minor differences for specific targets/families.
We have also acquired AlphaFold3’s academic-access weights [3] and are incorporating the model into our self-consistency pipeline.
Could the clustering and cropping data augmentation methods be adjusted to avoid introducing invalid folds?
Yes, we are currently experimenting with a cropping augmentation that refolds the extracted subsequence during batching rather than blindly snipping it from the original structure. We believe this may introduce more diverse and realistic folds into the training set. However, this is still limited by the availability of an accurate structure predictor.
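To make the difference between the two cropping variants concrete, here is a minimal sketch under stated assumptions. The function `crop_example`, its arguments, and the `refold` callable are hypothetical names introduced for illustration; `refold` stands in for whatever external structure predictor would be used and is not a real API.

```python
import numpy as np

def crop_example(coords, sequence, crop_len, rng, refold=None):
    """Crop a contiguous window from one training example.

    coords:   array of shape (num_residues, num_atoms, 3)
    sequence: string of nucleotides, one character per residue
    refold:   hypothetical callable mapping a sequence to predicted
              coordinates (a stand-in for an external structure predictor)
    """
    n = len(sequence)
    if crop_len >= n:
        # Nothing to crop: return the example unchanged.
        return coords, sequence

    # Pick a random window start so the crop fits inside the chain.
    start = int(rng.integers(0, n - crop_len + 1))
    sub_seq = sequence[start:start + crop_len]

    if refold is None:
        # Plain cropping: keep the coordinates the window had in the full structure.
        sub_coords = coords[start:start + crop_len]
    else:
        # Refolding variant: re-predict coordinates for the window on its own.
        sub_coords = refold(sub_seq)
    return sub_coords, sub_seq
```

With `refold=None` the crop inherits a conformation that may only be stable in the context of the full chain, which is the validity concern raised in the review; passing a predictor trades that risk for dependence on the predictor's accuracy.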
We hope this clarifies the questions. Please let us know!
[1] Chai-1: Decoding the molecular interactions of life. Chai Discovery, Boitreaud, J., Dent, J., McPartlon, M., Meier, J., Reis, V., ... & Wu, K. 2024.
[2] Boltz-1: Democratizing Biomolecular Interaction Modeling. Wohlwend, J., Corso, G., Passaro, S., Reveiz, M., Leidal, K., Swiderski, W., ... & Barzilay, R. bioRxiv, 2024.
[3] Accurate structure prediction of biomolecular interactions with AlphaFold 3. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., ... & Jumper, J. M. Nature, 1-3. 2024.
Thank you for the authors' response and the additional supporting experiments addressing my questions. While these partially resolved my concerns, I will maintain my original score for acceptance and increase my confidence level to 3.
Thank you. We noticed that your confidence has yet to be updated. Would you mind increasing it?
Sorry, my mistake. It should be updated now.
The paper addresses RNA backbone generation, for which it curates training data and designs evaluation protocols with several metrics. Methodologically, it adopts FrameFlow, a recent generative model for protein generation, by representing RNAs as frames and expanding the limited 3D RNA structural data through data augmentation. The results are meaningful across various metrics.
Reviewers acknowledge the novelty of the design of RNA tertiary structures, the adaptation of FrameFlow to RNAs with additional augmentations to mitigate low structural data, the comprehensive evaluation setup, and the clarity of the writing and presentation.
On the other hand, their main concerns include: limited technical novelty, as the proposed method is an adaptation of an existing method to RNA with standard techniques for frame representation and data augmentation; the reliability of the evaluation metrics, e.g., the reliance on computational structure prediction and inverse folding methods that can be inaccurate or biased for RNAs; the ability to generate unobserved lengths and structures; and the validity of the augmented dataset.
As the paper is borderline, the AC carefully considered the paper, the initial reviews, the authors’ rebuttals, and the discussions with the authors and afterwards. Overall, the AC believes the paper has clear merits, as it would be the first to start a line of research with important applications. However, in its current form, the paper does not meet the acceptance bar at ICLR: as a technical paper, it lacks sufficient technical novelty, and as an application paper, it does not offer new empirical insights. So, the AC recommends rejection.
The AC recommends submitting an improved version to a future venue based on the wealth of fair, precise and thorough feedback provided by the expert reviewers on both the accept and reject sides. As a first application paper in this direction without technical novelty as the main focus, the AC also suggests benchmarking different (existing) approaches to each problem when moving from proteins to RNAs, to create more empirical insights for the reader.
Additional Comments on Reviewer Discussion
Five expert reviewers have reviewed the paper thoroughly, offering excellent feedback on the strengths and weaknesses of the work. The authors provided a thorough rebuttal to each review, in which some concerns regarding generalization from the training data were mitigated and some new experiments were provided to analyze the bias of the computational prediction models in the evaluation. However, after the discussion, while three reviewers were more satisfied and leaned toward acceptance, two reviewers remained concerned about the lack of technical novelty and of clear, significant insights. The AC, after careful consideration, sided with the latter group and believes the paper can become much stronger by incorporating the feedback into the next version.
Reject