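Average rating: 5.7/10 (Poster, 3 reviewers)

Lowest rating: 5, highest: 7, standard deviation: 0.9

Average confidence: 4.0

Soundness: 3.3, Contribution: 3.0, Presentation: 3.3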

NeurIPS 2024

CryoGEM: Physics-Informed Generative Cryo-Electron Microscopy

Jiakai Zhang, Qihe Chen, Yan Zeng, Wenyuan Gao, Xuming He, Zhijie Liu, Jingyi Yu


Submitted: 2024-05-13, Updated: 2025-01-06


Keywords: Image Synthesis, Contrastive Learning, Cryo-EM

Reviews and Discussion

Review

Rating: 7, Confidence: 4, Date: 2024-07-08

In this paper, the authors introduce a method to generate large annotated cryo-EM datasets from a small number (100) of real micrographs. The method combines a physics-based model of image formation with a contrastive learning strategy. The authors show that their method can be used to improve the quality obtained with downstream tasks such as particle picking, homogeneous reconstruction and pose estimation.

Strengths

Although acquiring more data in cryo-EM is usually easy, getting access to annotated data is hard. CryoGEM combines a physics-based model with a contrastive learning strategy to generate realistic annotated data. This is particularly interesting because, due to the low SNR in cryo-EM images, most downstream tasks benefit significantly from pretraining (pose estimation) or fine-tuning (particle picking). The method introduced in this paper holds the potential to improve the accuracy of cryo-EM reconstruction pipelines. Notably, the authors explicitly showed that CryoGEM can improve the performance obtained on downstream tasks.

I particularly appreciated the following points:

the method is described in a clear way;
the experiments are fully described and seem reproducible;
all the contributions claimed are illustrated with an experiment; I appreciated the effort made by the authors to evaluate the quality of particle picking, pose estimation and homogeneous reconstruction with CryoGEM.

Weaknesses

I find that some parts of Section 3 (description of the method) lack clarity and that information on the 3D models used by the CryoGEM simulator is missing (see "Questions").

Questions

Mutual information extraction. This paragraph of the method was unclear to me. Why are $\mathbf{v}$ and $\mathbf{v}^+$ not indexed by $q$ while $\mathbf{v}^-$ is indexed by $k$ in (5)? Why does (5) correspond to the probability of "selecting" a positive sample?

Origin of coarse models. CryoGEM needs a coarse 3D model to generate synthetic images. For the experiments conducted in this paper, where do these models come from? I did not find this information in the paragraph "Datasets".

Resolution of coarse models. What is the resolution of the coarse models used in this paper? What is the influence of the resolution of the initial model on the accuracy obtained on downstream tasks?

Pose accuracy. What does $v$ correspond to in Eq. (15)?

Limitations

Yes, limitations and potential negative impacts are discussed in the paper.

Author Response

2024-08-07

Thank you for your appreciation and insightful comments. We will improve the clarity of the paper based on them.

Clarification of Notations

Thank you for thoroughly reading our paper and the supplementary material. We truly appreciate your suggestions.

For mutual information extraction (Equation 5), we followed CUT [1], where $\boldsymbol{v}, \boldsymbol{v}^+ \in \mathbb{R}^{N}$ represent an arbitrary pair of feature vectors of the query and the positive sample from the whole set. However, in our case, we have $Q$ pairs, and it is clearer to rewrite $\boldsymbol{v}, \boldsymbol{v}^+ \in \mathbb{R}^{N}$ as $\boldsymbol{v}_q, \boldsymbol{v}_q^+ \in \mathbb{R}^{N}$. We will fix this in the revised version.

The "selecting" operation in Equation 5 corresponds to a multi-class classification task: given the query $\boldsymbol{v}_q$, the goal is to maximize the probability of correctly picking the corresponding positive $\boldsymbol{v}_q^+$ out of a set made up of that positive and $K$ negatives. Therefore, we formulate Equation 5 as the minimization of a $(K+1)$-class cross-entropy loss.
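The classification view described above can be sketched in a few lines of NumPy. This is only an illustration of a CUT-style contrastive loss for a single query, not the authors' implementation: the function name `info_nce_loss`, the dot-product similarity, and the temperature `tau=0.07` are assumptions that do not appear in the response.

```python
import numpy as np

def info_nce_loss(v_q, v_pos, v_negs, tau=0.07):
    """(K+1)-class cross-entropy for one query (illustrative sketch).

    v_q:    (N,)   query feature vector
    v_pos:  (N,)   positive feature vector
    v_negs: (K, N) negative feature vectors
    tau:    temperature; 0.07 is an assumed value, not taken from the paper
    """
    # Class 0 is the positive; classes 1..K are the negatives.
    logits = np.concatenate(([v_q @ v_pos], v_negs @ v_q)) / tau
    logits = logits - logits.max()  # subtract the max for numerical stability
    # Log-probability that the softmax "selects" the positive.
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos

# Toy example: the positive matches the query, the negatives are orthogonal.
query = np.array([1.0, 0.0, 0.0])
positive = np.array([1.0, 0.0, 0.0])
negatives = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(info_nce_loss(query, positive, negatives))
```

The loss is close to zero when the positive is far more similar to the query than any negative, and equals $\log(K+1)$ when all $K+1$ candidates are equally similar, which is the chance level of the classification task.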

In Equation 13, $v$ is a unit vector $(0,0,1)$ .

We will clarify all of these points in the revision.

[1] Park T, Efros A A, Zhang R, et al. Contrastive learning for unpaired image-to-image translation. In: Computer Vision – ECCV 2020, Part IX. Springer International Publishing, 2020: 319-345.

The Origin of Coarse Models

Thank you for your attention. We will add this description to the Datasets section for better clarity. The coarse 3D model is obtained by running an ab-initio reconstruction of CryoSPARC with its default setting, followed by cryoDRGN to handle heterogeneous cases. We train cryoDRGN for 50 epochs using particles at a resolution aligned with the low-resolution 3D volume.

2024-08-14

I thank the authors for providing clarifications. They have addressed my main concern in the rebuttal. I will keep my positive rating.

Review

Rating: 5, Confidence: 4, Date: 2024-07-09

The paper introduces CryoGEM, an innovative method combining physics-based cryo-EM simulation with unpaired noise translation via contrastive learning to generate high-quality synthetic cryo-EM datasets. The approach significantly improves the visual quality of generated images and enhances downstream tasks like particle picking and pose estimation, leading to better 3D reconstructions.

Strengths

1. Extensive experiments demonstrate that CryoGEM produces high-quality synthetic cryo-EM images that significantly outperform existing methods like CycleGAN and CUT. The visual quality of the generated images is notably superior, preserving structural details and realistic noise patterns.

2. The synthetic datasets generated by CryoGEM improve the performance of downstream tasks, such as particle picking and pose estimation. The paper reports substantial improvements in these tasks, leading to better resolution in the final 3D reconstructions.

Weaknesses

The physics-based simulation in CryoGEM relies on a coarse result as an input. This requirement can be a significant limitation in scenarios where obtaining a reliable coarse result is challenging, such as with very small or highly dynamic molecules.

Questions

Is there any reference indicating whether the Gaussian noise distribution accurately represents the actual physical noise?

In practice, it is relatively easy to acquire a large number of observed samples of the target in transmission images. Even if the proposed approach improves the results, could one not simply collect more samples of the target particle with minimal effort?

Limitations

See Weaknesses.

Author Response

2024-08-07

Thank you for your thoughtful suggestions. We will improve the paper based on them.

On the Relationship between Gaussian and Actual Physical Noise

We follow the common practice in the literature of using Gaussian noise to model the reconstruction problem in cryo-EM. For example, cryoDRGN [1] and RELION [2] model image noise in the Fourier domain as zero-mean, independent Gaussian distributed noise. The inverse Fourier transform of this noise in the real domain is also i.i.d. Gaussian noise.
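The last statement can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of any cited package: it draws i.i.d. circularly symmetric complex Gaussian coefficients in the Fourier domain, applies a unitary inverse FFT (`norm="ortho"`), and inspects the real part. Taking the real part stands in for imposing the Hermitian symmetry that a real-valued image would require; the image size and variance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256  # image side length (arbitrary choice for this sketch)

# Fourier-domain noise: zero-mean, independent complex Gaussian coefficients
# with unit variance (real and imaginary parts each have variance 1/2).
noise_f = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

# Unitary inverse FFT back to real space; keep the real part.
real_part = np.fft.ifft2(noise_f, norm="ortho").real

mean_real = real_part.mean()
var_real = real_part.var()
# Sample correlation between horizontally adjacent pixels.
corr_adjacent = np.corrcoef(real_part[:, :-1].ravel(), real_part[:, 1:].ravel())[0, 1]

print(f"mean={mean_real:.4f}, variance={var_real:.4f}, adjacent corr={corr_adjacent:.4f}")
```

Because the orthonormal Fourier transform is unitary, it maps white Gaussian noise to white Gaussian noise: the real-space samples should have mean near 0, variance near 1/2, and negligible correlation between neighbouring pixels.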

We'd like to stress that CryoGEM can also accommodate other noise models, such as signal-dependent Poisson noise [3], by replacing the current Gaussian noise model with the new one during training. If space permits, we will include an example of this.

[1] Zhong E D, Bepler T, Davis J H, et al. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. In: 8th International Conference on Learning Representations (ICLR), 2020.

[2] Scheres S H W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology, 2012, 180(3): 519-530.

[3] Vulović M, Ravelli R B G, van Vliet L J, et al. Image formation modeling in cryo-electron microscopy (Supplementary Material). Journal of Structural Biology, 2013, 183(1): 19-32.

On the Effort to Capture More Samples

We agree that capturing more micrographs can improve the resolution of the final results. However, this approach leads to longer capture time on expensive cryo-EM equipment and demands substantial computational resources for iterative optimizations, often taking several days for a human expert.

Complementarily, CryoGEM aims to enhance downstream results without the need for a large amount of raw data, thereby improving the final resolution of resolved structures. This aligns with the recent generation-based reconstruction approaches [1], where sparse view reconstruction is achieved by generative models.

In terms of resource consumption, CryoGEM is a lightweight generative model that can be trained on 100 real micrographs using a single NVIDIA RTX 3090 GPU in just two hours. After training, it can rapidly generate annotated synthetic datasets with minimal additional cost. Therefore, the best practice should combine efficient data capture with advanced data processing, such as feeding as many samples as possible into a CryoGEM-improved pipeline for optimal reconstruction results.

[1] Wang S, Leroy V, Cabon Y, et al. DUSt3R: Geometric 3D vision made easy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 20697-20709.

2024-08-13

Thanks for the authors' feedback. I will keep my positive rating.

Review

Rating: 5, Confidence: 4, Date: 2024-07-13

In this paper, the authors introduce Physics-Informed Generative Cryo-Electron Microscopy (CryoGEM), a novel generative model for cryo-electron microscopy (Cryo-EM) micrographs. CryoGEM is trained to produce micrographs that accurately replicate the ice gradient, point spread function (PSF), and noise characteristics of experimental micrographs. The model offers two main applications in Cryo-EM analysis through the annotations provided by the generated micrographs: a) The generated micrographs include precise positional annotations for particles (2D projections of proteins in the micrograph). b) In addition to positional information, CryoGEM provides data on particle orientations and conformations. This information can be utilized with methodologies such as CryoFIRE to train models that distinguish particle orientations and conformations in experimental micrographs. The authors propose this innovative generative model to assist Cryo-EM data analysts in a) Fine-tuning particle picking models, and b) Training deep learning models for heterogeneous 3D reconstruction. This approach has the potential to significantly enhance the particle-picking process in experimental micrographs, ultimately leading to improved 3D reconstructed volumes of proteins.

Strengths

Originality: CryoGEM presents an innovative approach by integrating physics-informed modeling with generative techniques, addressing a significant gap in cryo-EM analysis.

Quality: The study features well-designed experiments that demonstrate the model's capabilities. The application of CryoFIRE provides additional validation of CryoGEM's practical utility.

Clarity: The paper is articulated precisely, offering detailed explanations and a well-structured layout that enhances reader comprehension.

Significance: CryoGEM shows considerable potential to impact cryo-EM analysis significantly. It provides valuable tools that can improve particle picking accuracy and the quality of 3D reconstruction processes, potentially advancing structural biology research.

Weaknesses

Code Availability: The code is not currently available, which may limit the accessibility of CryoGEM to the broader research community.

Resolution and Time Requirements: There is a lack of detailed discussion on the resolution of the initial coarse cryo-EM density map obtained from the ab-initio reconstruction of CryoSPARC, as well as the time required for this process.

Pipeline Efficiency: The time consumption for fine-tuning Topaz through CryoGEM raises concerns about the practicality of the proposed pipeline compared to manually picking a small number of micrographs.

Comparison with Template Matching: The particle picking approach of CryoGEM is not compared with template matching from the coarse cryo-EM input map, which could provide a more comprehensive evaluation of its advantages.

Incomplete Quantitative Comparisons: The quantitative comparisons for pose estimation are incomplete without the fine-tuning of CryoFIRE using pose estimations of particles from the coarse cryo-EM map input from CryoSPARC.

Questions

Originality: CryoGEM presents an innovative approach by integrating physics-informed modeling with generative techniques, addressing a significant gap in cryo-EM analysis.

Quality: The study features well-designed experiments that effectively demonstrate the model's capabilities. The application of CryoFIRE provides additional validation of CryoGEM's practical utility.

Clarity: The paper is articulated with precision, offering detailed explanations and a well-structured layout that enhances reader comprehension.

Significance: CryoGEM shows considerable potential to impact cryo-EM analysis significantly. It provides valuable tools that can improve both particle picking accuracy and the quality of 3D reconstruction processes, potentially advancing structural biology research.

Limitations

The authors have adequately addressed the limitations of their work, discussing potential areas for future improvements, such as generalizing the model to different experimental conditions and the need for a coarse cryo-EM map as an input, which may be hard to produce. However, a more explicit discussion of code availability, the accessibility of CryoGEM to the wider research community, and the efficiency of the proposed pipeline would strengthen the paper.

Author Response

2024-08-07

Thank you for your valuable suggestions. We appreciate the opportunity to address these concerns.

On the Pipeline Efficiency

We acknowledge that CryoGEM indeed takes longer than manual annotation, as shown in the following table. However, particle picking is a tedious, labor-intensive, and time-consuming task for technicians. Even with a blob picker, identifying target particles among the candidates requires considerable effort. CryoGEM enhances productivity by automatically generating annotated data for particle picking with high precision, although at the expense of speed.

The bottleneck in CryoGEM's pipeline is the particle labeling (generation) time, which includes ab-initio reconstruction to get the coarse volume, as well as the training and inference time. Currently, CryoGEM's training and inference are conducted on a single RTX 3090 GPU. To improve speed, we are developing a more parallelized GPU version, which could potentially accelerate the process by an order of magnitude. Additionally, AlphaFold3 serves as a potential alternative to CryoSPARC, which could further speed up ab-initio reconstruction.

| Method | Labeling Time | Topaz Fine-tune Time | Reconstruction Time | Total Time | AUPRC (↑) | Res. (Å) |
|---|---|---|---|---|---|---|
| Manual | 1h26m41s | 16m27s | 1h41m7s | 3h24m15s | 0.776 | 3.59 |
| Blob Picker | 14m31s | 12m12s | 2h26m33s | 2h53m16s | 0.684 | 4.57 |
| Ours | 2h56m39s | 10m0s | 2h28m13s | 5h34m52s | 0.796 | 3.25 |
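
As a quick consistency check on the timing table above, the per-stage durations can be summed and compared with the "Total Time" column. The snippet is a small sketch: the helper names `parse_duration` and `format_duration` are hypothetical, and the durations are copied verbatim from the table.

```python
import re
from datetime import timedelta

def parse_duration(text):
    """Parse strings such as '1h26m41s' or '10m0s' into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def format_duration(delta):
    """Format a timedelta back into the table's 'XhYmZs' notation."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h{minutes}m{seconds}s"

# Labeling, Topaz fine-tuning, and reconstruction times from the table.
stages = {
    "Manual": ["1h26m41s", "16m27s", "1h41m7s"],
    "Blob Picker": ["14m31s", "12m12s", "2h26m33s"],
    "Ours": ["2h56m39s", "10m0s", "2h28m13s"],
}

totals = {
    method: format_duration(sum((parse_duration(p) for p in parts), timedelta()))
    for method, parts in stages.items()
}
for method, total in totals.items():
    print(f"{method}: {total}")
```

The three sums reproduce the reported totals, and labeling accounts for roughly half of the CryoGEM pipeline's total time, in line with the bottleneck described above.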

Comparison with Template Matching

"The particle picking approach of CryoGEM is not compared with template matching from the coarse cryo-EM input map, which could provide a more comprehensive evaluation of its advantages."

Thank you for the suggestion; it is an excellent point. The following table compares our fine-tuned Topaz (Ours, fine-tuned on CryoGEM's synthetic annotated datasets) with cryoSPARC's template-based matching method (Template Picker, using the coarse volume as input). CryoGEM outperforms the Template Picker in both AUPRC and resolution on all datasets, except for the AUPRC of Proteasome and the resolution of Integrin. We will add Template Picker as a picking baseline in the revision.

AUPRC (↑):

| Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Avg. |
|---|---|---|---|---|---|---|
| Template Picker | 0.547 | 0.742 | 0.585 | 0.873 | 0.393 | 0.628 |
| Ours | 0.490 | 0.797 | 0.606 | 0.915 | 0.562 | 0.674 |

Res. (Å, ↓):

| Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Avg. |
|---|---|---|---|---|---|---|
| Template Picker | 2.71 | 3.58 | 4.95 | 9.63 | 10.91 | 6.36 |
| Ours | 2.68 | 3.25 | 5.54 | 7.16 | 7.74 | 5.27 |
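
The averages and the two exceptions mentioned in the text can be recomputed directly from the per-dataset entries. This is only a bookkeeping sketch; the numbers are copied from the table above, and the variable names are illustrative.

```python
datasets = ["Proteasome", "Ribosome", "Integrin", "PhageMS2", "HumanBAF"]

# Per-dataset values copied from the comparison table.
auprc = {
    "Template Picker": [0.547, 0.742, 0.585, 0.873, 0.393],
    "Ours": [0.490, 0.797, 0.606, 0.915, 0.562],
}
resolution = {
    "Template Picker": [2.71, 3.58, 4.95, 9.63, 10.91],
    "Ours": [2.68, 3.25, 5.54, 7.16, 7.74],
}

def mean(values):
    return sum(values) / len(values)

auprc_avg = {method: round(mean(v), 3) for method, v in auprc.items()}
res_avg = {method: round(mean(v), 2) for method, v in resolution.items()}

# Datasets where the Template Picker is better: higher AUPRC, lower resolution value.
template_better_auprc = [
    d for d, t, o in zip(datasets, auprc["Template Picker"], auprc["Ours"]) if t > o
]
template_better_res = [
    d for d, t, o in zip(datasets, resolution["Template Picker"], resolution["Ours"]) if t < o
]

print(auprc_avg, res_avg)
print(template_better_auprc, template_better_res)
```

The recomputed means match the "Avg." columns, and the only cases in which the Template Picker comes out ahead are the AUPRC of Proteasome and the resolution of Integrin.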

Comparisons with Pre-trained CryoFIRE

Thank you for your insightful comments. We did not include this comparison because we focused on evaluating the improvement of our method (Ours) over the original CryoFIRE. However, it is indeed more convincing to also compare our method with the pre-trained CryoFIRE (CryoFIRE*), using the particle images whose poses are estimated at the ab-initio reconstruction stage. As shown in the following table, the pre-trained CryoFIRE exhibits a reasonable performance improvement over the original CryoFIRE. However, our method still outperforms both schemes by a large margin in most cases.

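Res. (px., ↓):

| Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Average |
|---|---|---|---|---|---|---|
| CryoFIRE | 5.94 | 16.92 | 13.87 | 17.23 | 6.98 | 12.18 |
| CryoFIRE* | 2.96 | 8.09 | 7.95 | 3.89 | 9.04 | 6.386 |
| Ours | 2.59 | 4.27 | 4.88 | 5.54 | 6.55 | 4.29 |

Rot. (rad., ↓):

| Method | Proteasome | Ribosome | Integrin | PhageMS2 | HumanBAF | Average |
|---|---|---|---|---|---|---|
| CryoFIRE | 1.55 | 0.64 | 0.93 | 0.75 | 1.53 | 1.08 |
| CryoFIRE* | 1.43 | 0.52 | 0.73 | 0.69 | 1.54 | 0.98 |
| Ours | 0.41 | 0.32 | 0.88 | 0.43 | 1.42 | 0.69 |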

On the Code Availability

All of our code and datasets will be released to the public upon acceptance for further evaluation of CryoGEM. In the meantime, we will provide an anonymous version of our code for training and evaluating CryoGEM to the area chair.

2024-08-12

After considering the authors' responses and reviewing the other evaluations and rebuttals, I've decided to maintain my original assessment.

Author Response

2024-08-07

Global Response

We are grateful that all reviewers recognize that CryoGEM showcases the usefulness of generative AI in structural biology. We will soon release the code and the data for the community to experiment with and improve. The reviewers have provided many insightful suggestions that we will incorporate into the revision. Below, we first address the common question raised by the reviewers and then their individual comments.

On the resolution of the coarse model (w8uj, D477, B6fS)

The reviewers are correct that CryoGEM uses a coarse model as an input, which can be obtained by ab-initio reconstruction, e.g., with CryoSPARC. In our experiments, the resolution of the coarse models ranges from 7.05 Å to 21.22 Å. We have shown that even with such low-resolution initializations, CryoGEM can still achieve accurate location-, pose-, and conformation-controlled data generation (we kindly refer to Figure 8 in our paper for visual illustrations). The generated data significantly benefits downstream tasks such as particle picking and pose estimation, subsequently improving the final resolution of the reconstruction. The following table shows the ab-initio and final resolutions of several examples, as well as the time required for ab-initio reconstruction.

It is worth mentioning that ab-initio reconstruction may fail on the first attempt. A common practice is to conduct another round of data collection or cleaning to obtain a better ab-initio initialization. In extreme cases where it completely fails, a potential solution is to leverage structure prediction models, such as AlphaFold3[1], to initialize a density volume as a de facto ab-initio reconstruction. This approach is part of our immediate future work. We will clarify these points in the revision.

| Dataset | Res. of the coarse model (Å, ↓) | Time of ab-initio reconstruction | Final Resolution (Å, ↓) |
|---|---|---|---|
| Proteasome | 7.79 | 1h0m27s | 2.68 |
| Ribosome | 7.05 | 1h56m39s | 3.25 |
| Integrin | 9.65 | 1h24m21s | 5.54 |
| PhageMS2 | 11.78 | 1h6m28s | 7.16 |
| HumanBAF | 21.22 | 42m49s | 7.74 |

Note that the resolution of the coarse model is calculated by splitting the real particle dataset into halves and conducting ab-initio reconstruction independently [2]. The coarse models are then aligned to calculate the spatial resolution in CryoSPARC. For the heterogeneous dataset, Integrin, we utilized cryoDRGN to obtain the neural volume. This process took 8 hours, 5 minutes, and 24 seconds after the ab-initio reconstruction.
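
The half-set procedure described above is commonly summarized by a Fourier shell correlation (FSC) curve between the two aligned half-maps. The following NumPy sketch illustrates that calculation and is not CryoSPARC's implementation: the function names are hypothetical, and the 0.143 cutoff is an assumed, conventional threshold for half-map FSC that the response itself does not state.

```python
import numpy as np

def fourier_shell_correlation(vol_a, vol_b):
    """FSC curve (one value per integer-radius shell) of two aligned cubic volumes."""
    if vol_a.ndim != 3 or vol_a.shape != vol_b.shape or len(set(vol_a.shape)) != 1:
        raise ValueError("expected two cubic 3D volumes of identical shape")
    n = vol_a.shape[0]
    fa, fb = np.fft.fftn(vol_a), np.fft.fftn(vol_b)

    # Integer shell index (radius in Fourier voxels) of every coefficient.
    freqs = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    n_shells = n // 2  # ignore the corners beyond the Nyquist radius
    keep = shell < n_shells

    def shell_sum(values):
        return np.bincount(shell[keep], weights=values.ravel()[keep], minlength=n_shells)

    cross = shell_sum((fa * np.conj(fb)).real)
    power = np.sqrt(shell_sum(np.abs(fa) ** 2) * shell_sum(np.abs(fb) ** 2))
    return cross / np.maximum(power, 1e-12)

def resolution_from_fsc(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (same unit as pixel_size) at the first shell with FSC below threshold."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size  # never drops below the threshold: report Nyquist
    if below[0] == 0:
        return float("inf")
    return box_size * pixel_size / below[0]

# Synthetic stand-ins for two half-maps: a shared signal plus independent noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((32, 32, 32))
half_a = signal + 0.5 * rng.standard_normal(signal.shape)
half_b = signal + 0.5 * rng.standard_normal(signal.shape)
fsc_curve = fourier_shell_correlation(half_a, half_b)
print(resolution_from_fsc(fsc_curve, box_size=32, pixel_size=1.5))
```

Shell $s$ corresponds to a spatial frequency of $s / (n \cdot \text{pixel size})$, so the reported resolution is its reciprocal. With real half-maps, the two volumes must be aligned first, as the response notes.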

[1] Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 2024, 630: 493-500. https://doi.org/10.1038/s41586-024-07487-w

[2] Van Heel M, Schatz M. Fourier shell correlation threshold criteria. Journal of Structural Biology, 2005, 151(3): 250-262.

Final Decision: Accept (poster)

2024-09-25

(5,5,7) In this paper, the authors introduce a method to generate synthetic cryo-EM images from a small number (100) of real micrographs. The method combines a physical simulation of the image formation model with a contrastive learning approach for learning noise statistics. The authors show that their method can be used to improve the quality obtained with downstream tasks such as particle picking, homogeneous reconstruction, and pose estimation. Reviewers were positive, noting the novelty of the task.