CryoGEM: Physics-Informed Generative Cryo-Electron Microscopy
Abstract
Reviews and Discussion
In this paper, the authors introduce a method to generate large annotated cryo-EM datasets from a small number (100) of real micrographs. The method combines a physics-based model of image formation with a contrastive learning strategy. The authors show that their method can be used to improve the quality of the results obtained on downstream tasks such as particle picking, homogeneous reconstruction and pose estimation.
Strengths
Although acquiring more data in cryo-EM is usually easy, obtaining annotated data is hard. CryoGEM combines a physics-based model with a contrastive learning strategy to generate realistic annotated data. This is particularly interesting because, due to the low SNR of cryo-EM images, most downstream tasks benefit significantly from pretraining (pose estimation) or fine-tuning (particle picking). The method introduced in this paper holds the potential to improve the accuracy of cryo-EM reconstruction pipelines. Notably, the authors explicitly showed that CryoGEM can improve the performance obtained on downstream tasks.
I particularly appreciated the following points:
- the method is described clearly;
- the experiments are fully described and seem reproducible;
- all claimed contributions are illustrated with an experiment; I appreciated the effort made by the authors to evaluate the quality of particle picking, pose estimation and homogeneous reconstruction with CryoGEM.
Weaknesses
I find that some parts of Section 3 (description of the method) lack clarity and that information on the 3D models used by CryoGEM's simulator is missing (see "Questions").
Questions
Mutual information extraction. This paragraph of the method was unclear to me. Why are and not indexed by while is indexed by in (5)? Why does (5) correspond to the probability of "selecting" a positive sample?
Origin of coarse models. CryoGEM needs a coarse 3D model to generate synthetic images. For the experiments conducted in this paper, where do these models come from? I did not find this information in the paragraph "Datasets".
Resolution of coarse models. What is the resolution of the coarse models used in this paper? What is the influence of the resolution of the initial model on the accuracy obtained on downstream tasks?
Pose accuracy. What does correspond to in Eq. (15)?
Limitations
Yes, limitations and potential negative impacts are discussed in the paper.
Thank you for your appreciation and insightful comments. We will improve the clarity of the paper based on them.
Clarification of Notations
Thank you for thoroughly reading our paper and the supplementary material. We truly appreciate your suggestions.
For mutual information extraction (Equation 5), we followed CUT [1], where represent an arbitrary pair of feature vectors of the query and the positive sample from the whole set. However, in our case, we have pairs, and it will be clearer to rewrite to . We will fix this in the revised version.
The "selecting" operation in Equation 5 corresponds to a multi-class classification task: it aims to maximize the probability of correctly matching the corresponding positive, from a set consisting of that positive and the negatives, given the query . Therefore, we formulate Equation 5 as minimizing a -class cross-entropy loss.
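The classification view of the contrastive term can be made concrete with a small pure-Python sketch of a CUT-style (PatchNCE/InfoNCE) loss for a single query. This is an illustration under assumptions, not CryoGEM's code: the function names, the list-based feature vectors and the temperature of 0.07 are our choices. The logits are temperature-scaled dot products of the query with its positive (placed at index 0) and with each negative, and the loss is the cross-entropy of choosing index 0. Features are typically L2-normalized first.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query: Sequence[float],
                  positive: Sequence[float],
                  negatives: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> float:
    """Cross-entropy of picking the positive among {positive} + negatives.

    Logit 0 is the query-positive similarity and logits 1..N are the
    query-negative similarities, so this is an (N+1)-way classification
    whose correct class is index 0.
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # -log softmax(logits)[0], computed stably via log-sum-exp
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


q = l2_normalize([1.0, 0.1])
good = info_nce_loss(q, l2_normalize([1.0, 0.0]),
                     [l2_normalize([0.0, 1.0]), l2_normalize([-1.0, 0.0])])
bad = info_nce_loss(q, l2_normalize([0.0, 1.0]),
                    [l2_normalize([1.0, 0.0]), l2_normalize([-1.0, 0.0])])
```

When the query is much closer to its positive than to any negative, the loss approaches 0 (`good`); when a negative is closer, the loss grows (`bad`). If the positive and all N negatives are identical, every logit is equal and the loss is log(N+1), i.e., chance level for an (N+1)-way choice.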
In Equation 13, is a unit vector .
We will clarify all of these points in the revision.
[1] Park T, Efros A A, Zhang R, et al. Contrastive learning for unpaired image-to-image translation. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX. Springer International Publishing, 2020: 319-345.
The Origin of Coarse Models
Thank you for your attention. We will add this description to the Datasets section for better clarity. The coarse 3D model is obtained by running CryoSPARC's ab-initio reconstruction with its default settings, followed by cryoDRGN to handle heterogeneous cases. We train cryoDRGN for 50 epochs using particles at a resolution matching that of the low-resolution 3D volume.
I thank the authors for providing clarifications. They have addressed my main concern in the rebuttal. I will keep my positive rating.
The paper introduces CryoGEM, an innovative method combining physics-based cryo-EM simulation with unpaired noise translation via contrastive learning to generate high-quality synthetic cryo-EM datasets. The approach significantly improves the visual quality of generated images and enhances downstream tasks like particle picking and pose estimation, leading to better 3D reconstructions.
Strengths
- Extensive experiments demonstrate that CryoGEM produces high-quality synthetic cryo-EM images that significantly outperform existing methods like CycleGAN and CUT. The visual quality of the generated images is notably superior, preserving structural details and realistic noise patterns.
- The synthetic datasets generated by CryoGEM improve the performance of downstream tasks, such as particle picking and pose estimation. The paper reports substantial improvements in these tasks, leading to better resolution in the final 3D reconstructions.
Weaknesses
The physics-based simulation in CryoGEM relies on a coarse result as an input. This requirement can be a significant limitation in scenarios where obtaining a reliable coarse result is challenging, such as with very small or highly dynamic molecules.
Questions
Is there any reference indicating whether the Gaussian noise distribution accurately represents the actual physical noise?
In practice, it is relatively easy to obtain a large number of observed samples of the target particle in transmission images. Even if the proposed approach enhances the results, could we not simply collect more samples of the target particle with minimal effort?
Limitations
See Weaknesses.
Thank you for your thoughtful suggestions. We will improve the paper based on them.
On the Relationship between Gaussian and Actual Physical Noise
We follow the common practice in the literature of using Gaussian noise to model the reconstruction problem in cryo-EM. For example, cryoDRGN [1] and RELION [2] model image noise in the Fourier domain as zero-mean, independent Gaussian noise. When this noise has the same variance at every frequency (white noise) and the Fourier transform is unitary, its inverse Fourier transform is also i.i.d. Gaussian noise in real space.
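A small numerical check of the last statement, written as a sketch under stated assumptions: a 1D signal of even length, a unitary (1/sqrt(n)-scaled) DFT written out in pure Python, and white noise with the same variance at every frequency. The function names are ours. Drawing Hermitian-symmetric complex Gaussian coefficients and inverting them yields real-valued noise whose per-pixel variance matches the Fourier-domain variance and whose neighboring pixels are uncorrelated. With a frequency-dependent (colored) variance, the real-space noise would still be Gaussian but no longer independent across pixels.

```python
import cmath
import math
import random
from typing import List


def white_fourier_noise(n: int, sigma: float, rng: random.Random) -> List[complex]:
    """Hermitian-symmetric complex Gaussian noise for a real 1D signal of even length n.

    Every frequency bin gets total variance sigma**2, i.e. the noise is white.
    """
    assert n % 2 == 0
    spec = [0j] * n
    # DC and Nyquist bins of a real signal are purely real
    spec[0] = complex(rng.gauss(0.0, sigma), 0.0)
    spec[n // 2] = complex(rng.gauss(0.0, sigma), 0.0)
    half = sigma / math.sqrt(2.0)  # split the variance between real and imaginary parts
    for k in range(1, n // 2):
        spec[k] = complex(rng.gauss(0.0, half), rng.gauss(0.0, half))
        spec[n - k] = spec[k].conjugate()  # Hermitian symmetry -> real-valued inverse
    return spec


def unitary_idft(spec: List[complex]) -> List[complex]:
    """Inverse DFT with 1/sqrt(n) scaling (orthonormal convention)."""
    n = len(spec)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(spec[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n))
            for x in range(n)]


def real_space_noise(n: int, sigma: float, rng: random.Random) -> List[float]:
    values = unitary_idft(white_fourier_noise(n, sigma, rng))
    return [v.real for v in values]  # imaginary parts vanish up to rounding
```

Averaging many draws of `real_space_noise(32, 2.0, rng)` gives a per-pixel variance close to 4.0 and a lag-1 correlation close to 0, as expected for i.i.d. Gaussian noise.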
We'd like to stress that CryoGEM can also accommodate other noise models, such as signal-dependent Poisson noise [3], by replacing the current Gaussian noise model with the new one during training. If space permits, we will include an example of this.
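One way to keep the noise model swappable is to pass it to the simulator as a function. The sketch below is hypothetical: the names `gaussian_noise`, `poisson_noise` and `apply_noise` and the `dose` parameter are ours, not CryoGEM's. Additive Gaussian noise has the same variance everywhere, whereas Poisson shot noise, here applied to expected electron counts `dose * intensity` and rescaled, has a variance that grows with the signal.

```python
import math
import random
from typing import Callable, List, Sequence

# A noise model maps a clean image (flattened to a list) to a noisy one.
NoiseModel = Callable[[Sequence[float], random.Random], List[float]]


def gaussian_noise(sigma: float) -> NoiseModel:
    """Additive, signal-independent Gaussian noise with standard deviation sigma."""
    def apply(clean: Sequence[float], rng: random.Random) -> List[float]:
        return [v + rng.gauss(0.0, sigma) for v in clean]
    return apply


def _poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler; adequate for the small expected counts used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_noise(dose: float) -> NoiseModel:
    """Signal-dependent shot noise: counts ~ Poisson(dose * v), rescaled by 1 / dose.

    Assumes non-negative clean intensities; each output pixel has mean v and variance v / dose.
    """
    def apply(clean: Sequence[float], rng: random.Random) -> List[float]:
        return [_poisson(dose * v, rng) / dose for v in clean]
    return apply


def apply_noise(clean: Sequence[float], model: NoiseModel, seed: int = 0) -> List[float]:
    """Run any noise model with a reproducible random generator."""
    return model(clean, random.Random(seed))
```

Swapping `apply_noise(image, gaussian_noise(0.3))` for `apply_noise(image, poisson_noise(10.0))` changes only the noise model, leaving the rest of the simulation untouched.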
[1] Zhong E D, Bepler T, Davis J H, et al. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. In: 8th International Conference on Learning Representations (ICLR 2020), 2020.
[2] Scheres S H W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology, 2012, 180(3): 519-530.
[3] Vulović M, Ravelli R B G, van Vliet L J, et al. Image formation modeling in cryo-electron microscopy (supplementary material). Journal of Structural Biology, 2013, 183(1): 19-32.
On the Effort to Capture More Samples
We agree that capturing more micrographs can improve the resolution of the final results. However, this approach requires longer acquisition time on expensive cryo-EM equipment and substantial computational resources for iterative optimization, often taking a human expert several days.
Complementarily, CryoGEM aims to enhance downstream results without requiring a large amount of raw data, thereby improving the final resolution of the resolved structures. This aligns with recent generation-based reconstruction approaches [1], in which sparse-view reconstruction is achieved with generative models.
In terms of resource consumption, CryoGEM is a lightweight generative model that can be trained on 100 real micrographs using a single NVIDIA RTX 3090 GPU in just two hours. After training, it can rapidly generate annotated synthetic datasets at minimal additional cost. Therefore, best practice is to combine efficient data capture with advanced data processing, for example by feeding as many samples as possible into a CryoGEM-improved pipeline for optimal reconstruction results.
[1] Wang S, Leroy V, Cabon Y, et al. DUSt3R: Geometric 3D vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 20697-20709.
Thanks to the authors for their feedback. I will keep my positive rating.
In this paper, the authors introduce Physics-Informed Generative Cryo-Electron Microscopy (CryoGEM), a novel generative model for cryo-electron microscopy (Cryo-EM) micrographs. CryoGEM is trained to produce micrographs that accurately replicate the ice gradient, point spread function (PSF), and noise characteristics of experimental micrographs. The model offers two main applications in Cryo-EM analysis through the annotations provided by the generated micrographs: a) The generated micrographs include precise positional annotations for particles (2D projections of proteins in the micrograph). b) In addition to positional information, CryoGEM provides data on particle orientations and conformations. This information can be utilized with methodologies such as CryoFIRE to train models that distinguish particle orientations and conformations in experimental micrographs. The authors propose this innovative generative model to assist Cryo-EM data analysts in a) Fine-tuning particle picking models, and b) Training deep learning models for heterogeneous 3D reconstruction. This approach has the potential to significantly enhance the particle-picking process in experimental micrographs, ultimately leading to improved 3D reconstructed volumes of proteins.
Strengths
Originality: CryoGEM presents an innovative approach by integrating physics-informed modeling with generative techniques, addressing a significant gap in cryo-EM analysis.
Quality: The study features well-designed experiments that demonstrate the model's capabilities. The application of CryoFIRE provides additional validation of CryoGEM's practical utility.
Clarity: The paper is written precisely, offering detailed explanations and a well-structured layout that aids reader comprehension.
Significance: CryoGEM has considerable potential to impact cryo-EM analysis. It provides valuable tools that can improve particle picking accuracy and the quality of 3D reconstruction, potentially advancing structural biology research.
Weaknesses
Code Availability: The code is currently not available, which may limit the accessibility of CryoGEM to the broader research community.
Resolution and Time Requirements: There is a lack of detailed discussion on the resolution of the initial coarse cryo-EM density map obtained from the ab-initio reconstruction of CryoSPARC, as well as the time required for this process.
Pipeline Efficiency: The time consumption for fine-tuning Topaz through CryoGEM raises concerns about the practicality of the proposed pipeline compared to manually picking a small number of micrographs.
Comparison with Template Matching: The particle picking approach of CryoGEM is not compared with template matching from the coarse cryo-EM input map, which could provide a more comprehensive evaluation of its advantages.
Incomplete Quantitative Comparisons: The quantitative comparisons for pose estimation are incomplete without a baseline in which CryoFIRE is fine-tuned using the particle poses estimated from the coarse cryo-EM map obtained with CryoSPARC.
Questions
Limitations
The authors have adequately addressed the limitations of their work, discussing potential areas for future improvement, such as generalizing the model to different experimental conditions and the need for a coarse cryo-EM map as input, which may be hard to produce. However, a more explicit discussion of CryoGEM's accessibility to the wider research community, of the efficiency of the proposed pipeline, and of code availability would strengthen the paper.
Thank you for your valuable suggestions. We appreciate the opportunity to address these concerns.
On the Pipeline Efficiency
We acknowledge that CryoGEM takes longer than manual annotation, as shown in the following table. However, particle picking is a tedious, labor-intensive, and time-consuming task for technicians: even with a blob picker, identifying the target particles among the candidates requires considerable effort. CryoGEM improves productivity by automatically generating annotated data for particle picking with high precision, albeit at the expense of speed.
The bottleneck in CryoGEM's pipeline is the particle labeling (generation) time, which includes the ab-initio reconstruction used to obtain the coarse volume as well as CryoGEM's training and inference time. Currently, training and inference run on a single RTX 3090 GPU. To improve speed, we are developing a more parallelized GPU version, which could potentially accelerate the process by an order of magnitude. Additionally, AlphaFold3 could serve as an alternative to CryoSPARC's ab-initio reconstruction for obtaining the coarse volume, which could further speed up this step.
| Method | Labeling Time | Topaz Fine-tune Time | Reconstruction Time | Total Time | AUPRC (↑) | Res. (Å, ↓) |
|---|---|---|---|---|---|---|
| Manual | 1h26m41s | 16m27s | 1h41m7s | 3h24m15s | 0.776 | 3.59 |
| Blob Picker | 14m31s | 12m12s | 2h26m33s | 2h53m16s | 0.684 | 4.57 |
| Ours | 2h56m39s | 10m0s | 2h28m13s | 5h34m52s | 0.796 | 3.25 |
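The Total Time column is the sum of the three stage times. A small helper for checking that arithmetic (hypothetical, written only for this purpose) parses the `XhYmZs` strings used in the table and adds them up:

```python
import re

_PART = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Parse strings such as '1h26m41s' or '10m0s' into seconds."""
    parts = _PART.findall(duration)
    # reject anything that is not made solely of <number><unit> pieces
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return sum(int(n) * _SECONDS[u] for n, u in parts)


def to_hms(seconds: int) -> str:
    """Format seconds back into the table's style, omitting hours when zero."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h{m}m{s}s" if h else f"{m}m{s}s"


def total(*durations: str) -> str:
    return to_hms(sum(to_seconds(d) for d in durations))
```

For example, `total("2h56m39s", "10m0s", "2h28m13s")` returns `"5h34m52s"`, matching the Ours row; the Manual and Blob Picker rows check out the same way.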
Comparison with Template Matching
Thank you for suggesting a comparison with template matching from the coarse cryo-EM input map; it is indeed an excellent point. The following table shows the suggested quantitative comparison between our fine-tuned Topaz (Ours, fine-tuned on CryoGEM's synthetic annotated datasets) and CryoSPARC's template-based picker (Template Picker, using the coarse volume as input). CryoGEM outperforms the Template Picker in both AUPRC and resolution on nearly all datasets, the exceptions being the AUPRC on Proteasome and the resolution on Integrin. We will add Template Picker as a picking baseline in the revision.
| Metric | Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Avg. |
|---|---|---|---|---|---|---|---|
| AUPRC (↑) | Template Picker | 0.547 | 0.742 | 0.585 | 0.873 | 0.393 | 0.628 |
| AUPRC (↑) | Ours | 0.490 | 0.797 | 0.606 | 0.915 | 0.562 | 0.674 |
| Res. (Å, ↓) | Template Picker | 2.71 | 3.58 | 4.95 | 9.63 | 10.91 | 6.36 |
| Res. (Å, ↓) | Ours | 2.68 | 3.25 | 5.54 | 7.16 | 7.74 | 5.27 |
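AUPRC summarizes the precision-recall curve of a picker's ranked candidates. The sketch below computes it as step-wise average precision over candidates that have already been labeled as true or false positives, for example by matching picks to annotated particle centers within some radius. The function name and that matching step are assumptions; the exact evaluation protocol behind the table may differ.

```python
from typing import Optional, Sequence


def average_precision(scores: Sequence[float], is_true: Sequence[bool],
                      n_positives: Optional[int] = None) -> float:
    """Area under the precision-recall curve, computed as step-wise average precision.

    scores: picker confidence per candidate.
    is_true: whether each candidate matched a ground-truth particle.
    n_positives: number of ground-truth particles; defaults to the number of matched
        candidates. Pass the true count so that missed particles lower the recall.
    """
    total = sum(bool(t) for t in is_true) if n_positives is None else n_positives
    if total == 0:
        return 0.0
    ranked = sorted(zip(scores, is_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t:
            hits += 1
            ap += hits / rank  # precision at each recall step; each step has width 1/total
    return ap / total
```

For candidates ranked true, false, true, false, the precisions at the two hits are 1 and 2/3, so the average precision is 5/6; a perfect ranking gives 1.0.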
Comparisons with Pre-trained CryoFIRE
Thank you for your insightful comments. We did not include this comparison because we focused on evaluating the improvement of our method (Ours) over the original CryoFIRE. However, it is indeed more convincing to also compare against a pre-trained CryoFIRE (CryoFIRE*), pre-trained on particle images whose poses were estimated during ab-initio reconstruction. As shown in the following table, the pre-trained CryoFIRE shows a reasonable improvement over the original CryoFIRE. However, our method still outperforms both schemes by a large margin in most cases.
| Metric | Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Average |
|---|---|---|---|---|---|---|---|
| Res. (px., ↓) | CryoFIRE | 5.94 | 16.92 | 13.87 | 17.23 | 6.98 | 12.18 |
| Res. (px., ↓) | CryoFIRE* | 2.96 | 8.09 | 7.95 | 3.89 | 9.04 | 6.386 |
| Res. (px., ↓) | Ours | 2.59 | 4.27 | 4.88 | 5.54 | 6.55 | 4.29 |
| Rot. (rad., ↓) | CryoFIRE | 1.55 | 0.64 | 0.93 | 0.75 | 1.53 | 1.08 |
| Rot. (rad., ↓) | CryoFIRE* | 1.43 | 0.52 | 0.73 | 0.69 | 1.54 | 0.98 |
| Rot. (rad., ↓) | Ours | 0.41 | 0.32 | 0.88 | 0.43 | 1.42 | 0.69 |
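The Rot. (rad.) rows report a rotation error in radians. A common way to measure such an error is the geodesic distance between the estimated and ground-truth rotation matrices, θ = arccos((tr(R_est^T R_gt) - 1) / 2), averaged over particles. We assume this form purely for illustration; the pose-accuracy definition used in the paper may differ, and the helper names below are ours.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rot_z(angle: float) -> Matrix:
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle: float) -> Matrix:
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def geodesic_angle(r_est: Matrix, r_gt: Matrix) -> float:
    """Angle (radians) of the relative rotation R_est^T R_gt."""
    # trace(A^T B) equals the element-wise sum of A_ij * B_ij
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding errors


def mean_rotation_error(est: Sequence[Matrix], gt: Sequence[Matrix]) -> float:
    """Average geodesic error over paired estimated / ground-truth rotations."""
    return sum(geodesic_angle(a, b) for a, b in zip(est, gt)) / len(est)
```

For instance, `geodesic_angle(rot_z(0.1), rot_z(0.6))` is 0.5 rad, and the largest possible error, a half-turn, is π.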
On the Code Availability
All of our code and datasets will be released to the public upon acceptance for further evaluation of CryoGEM. In the meantime, we will provide an anonymous version of our code for training and evaluating CryoGEM to the area chair.
After considering the authors' responses and reviewing the other evaluations and rebuttals, I've decided to maintain my original assessment.
Global Response
We are grateful that all reviewers recognize that CryoGEM showcases the usefulness of generative AI in structural biology. We will soon release the code and data for the community to experiment with and improve. The reviewers have provided many insightful suggestions that we will incorporate into the revision. Below, we first address the question raised by several reviewers and then respond to their individual comments.
On the resolution of the coarse model (w8uj, D477, B6fS)
The reviewers are correct that CryoGEM uses a coarse model as input, which can be obtained by ab-initio reconstruction, e.g., in CryoSPARC. In our experiments, the resolution of the coarse models ranges from 7.05 Å to 21.22 Å. We have shown that even when initialized with such low-resolution models, CryoGEM can still generate data with accurately controlled particle locations, poses, and conformations (we kindly refer to Figure 8 in our paper for visual illustrations). The generated data significantly benefit downstream tasks such as particle picking and pose estimation, which in turn improves the final resolution of the reconstruction. The following table shows the ab-initio and final resolutions for several examples, as well as the time required for ab-initio reconstruction.
It is worth mentioning that ab-initio reconstruction may fail on the first attempt. A common practice is to conduct another round of data collection or cleaning to obtain a better ab-initio initialization. In extreme cases where it fails completely, a potential solution is to use structure prediction models, such as AlphaFold3 [1], to generate a density volume that serves as a de facto ab-initio reconstruction. This approach is part of our immediate future work. We will clarify these points in the revision.
| Dataset | Res. of the coarse model (Å, ↓) | Time of ab-initio reconstruction | Final resolution (Å, ↓) |
|---|---|---|---|
| Proteasome | 7.79 | 1h0m27s | 2.68 |
| Ribosome | 7.05 | 1h56m39s | 3.25 |
| Integrin | 9.65 | 1h24m21s | 5.54 |
| PhageMS2 | 11.78 | 1h6m28s | 7.16 |
| HumanBAF | 21.22 | 42m49s | 7.74 |
Note that the resolution of the coarse model is calculated by splitting the real particle dataset into halves and conducting ab-initio reconstruction independently [2]. The coarse models are then aligned to calculate the spatial resolution in CryoSPARC. For the heterogeneous dataset, Integrin, we utilized cryoDRGN to obtain the neural volume. This process took 8 hours, 5 minutes, and 24 seconds after the ab-initio reconstruction.
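The half-map procedure can be sketched with NumPy: Fourier transform both half-maps, correlate their coefficients shell by shell, and read off the resolution where the curve first drops below a threshold. This is an illustrative sketch, not CryoSPARC's implementation. It assumes cubic, already aligned, unmasked maps, integer-radius shells, and the 0.143 threshold commonly used for gold-standard half-map FSC, and it reports the last shell above the threshold without interpolation. The function names are ours.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """FSC between two cubic half-maps; entry k is the correlation in shell k (k = 0..n//2)."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("expected two cubic volumes of the same size")
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    k = np.fft.fftfreq(n) * n  # integer frequency index along one axis
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    cross = np.bincount(shell, weights=np.real(f1 * np.conj(f2)).ravel())
    power1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    keep = n // 2 + 1  # shells up to Nyquist
    return cross[:keep] / np.sqrt(power1[:keep] * power2[:keep])


def fsc_resolution(fsc: np.ndarray, box_size: int, pixel_size: float,
                   threshold: float = 0.143) -> float:
    """Resolution (Å) of the last shell before the FSC first drops below the threshold."""
    for k in range(1, len(fsc)):
        if fsc[k] < threshold:
            return box_size * pixel_size / (k - 1) if k > 1 else float("inf")
    return box_size * pixel_size / (len(fsc) - 1)  # never dropped: Nyquist limit
```

Shell k corresponds to a resolution of box_size × pixel_size / k Å, so a curve that stays above 0.143 everywhere reports the Nyquist limit, 2 × pixel_size.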
[1] Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 2024, 630: 493-500. https://doi.org/10.1038/s41586-024-07487-w
[2] Van Heel M, Schatz M. Fourier shell correlation threshold criteria. Journal of Structural Biology, 2005, 151(3): 250-262.
(5,5,7) In this paper, the authors introduce a method to generate synthetic cryo-EM images from a small number (100) of real micrographs. The method combines a physical simulation of the image formation model with a contrastive learning approach for learning noise statistics. The authors show that their method can be used to improve the quality obtained with downstream tasks such as particle picking, homogeneous reconstruction, and pose estimation. Reviewers were positive, noting the novelty of the task.