PaperHub
Overall rating: 6.8/10 (Poster, 4 reviewers; min 5, max 8, std 1.1)
Ratings: 7, 5, 7, 8
Confidence: 4.5
Correctness: 3.3 · Contribution: 3.0 · Presentation: 3.5
NeurIPS 2024

R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

OpenReview · PDF
Submitted: 2024-05-10 · Updated: 2024-11-06
TL;DR

We discover an inherent problem in 3DGS and develop a novel 3DGS-based framework for tomographic reconstruction.

Abstract

3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Experiments on synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art approaches in accuracy and efficiency. Crucially, it delivers high-quality results in 4 minutes, which is 12$\times$ faster than NeRF-based methods and on par with traditional algorithms.
Keywords
3D Gaussian Splatting, 3D Reconstruction, CT Reconstruction, Tomographic Reconstruction

Reviews and Discussion

Review
Rating: 7

The paper aims to achieve high tomographic reconstruction performance with a limited number of views in a time-efficient manner. To this end, the paper modifies 3DGS for X-ray projection by adjusting the rendering equation, correcting 2D projection errors, and using voxelizers for regularization. The experiments demonstrate the efficacy of the proposed method.

Strengths

First of all, the paper is well written. The problem in the current tomographic representation is clearly stated, and the authors' objectives are sufficiently addressed. The overall structure is easy to understand, and the ablation study covers most of the arising questions.

Weaknesses

One critical weakness of this paper is the existence of prior work using 3DGS for X-ray projection. Although the paper is structurally well written, it appears less novel given X-Gaussian, which has been accepted to ECCV 2024 and employs a similar method. More importantly, X-Gaussian achieved a PSNR of 43 in human organ reconstructions, whereas this paper (R$^2$-Gaussian) achieved a PSNR of 36, despite slightly different experimental settings. If the authors provide a persuasive explanation of the novelty of this paper, I am willing to raise my score.

Additionally, it would be helpful to augment the related works and baseline models. For example, C^2RV (CVPR 2024) is another recent tomographic representation model.

Questions

In the SAX-NeRF paper, the performance gap between SAX-NeRF and NAF is significant. However, in the R^2 Gaussian paper, the performance gap between them is not significant. Could you elaborate on why this happens?

Limitations

The paper clearly states the limitations and the potential societal impact of the work.

Author Response

We thank the reviewer for the detailed review. The comments and suggestions are helpful in improving our paper.

Q3.1: Novelty comparison w.r.t. X-Gaussian.

Our method demonstrates considerable novelty compared to the concurrent work X-Gaussian for the following reasons:

  • Broader task scope: Our R$^2$-Gaussian is designed for both X-ray view synthesis and direct 3D CT reconstruction, whereas X-Gaussian only supports 2D X-ray novel view synthesis.

  • Theory-supported model design: Our R$^2$-Gaussian successfully extends 3DGS to 3D CT reconstruction with a theoretically sound approach, including new Gaussian kernels, new splatting equations, and a voxelization strategy, all grounded in careful theoretical derivations. In contrast, X-Gaussian empirically modifies and extends 3DGS for X-ray view synthesis applications, with limited novel theoretical contribution.

  • Theory contribution: Our method provides novel and original theoretical results, including the new derivation of X-ray rasterization and the identification (and remedy) of the previously overlooked integration bias in the standard 3DGS technique. X-Gaussian, on the other hand, does not offer theoretical contributions in this regard, although it does represent the first successful, albeit empirical, adaptation of 3DGS to X-ray view synthesis.

  • Efficient CT reconstruction: Our method directly outputs 3D CT volumes. In contrast, X-Gaussian first synthesizes novel-view projections and then relies on existing CT algorithms for the actual reconstruction.

In summary, our method offers notable technical and theoretical contributions compared to X-Gaussian. The methodological comparison between the two methods has been discussed in L89-94. We will further highlight our novelty and contributions w.r.t. X-Gaussian in the revised manuscript.

Q3.2: X-Gaussian achieved a PSNR of 43, whereas this paper achieved a PSNR of 36.

Actually, the reported 43 (in Tab. 1 of the X-Gaussian paper) is the 2D PSNR of novel-view image rendering quality, rather than the 3D PSNR of the CT volume reconstruction quality. As a matter of fact, X-Gaussian only reported a 3D PSNR of 30.56 dB (from the 5+95-view setting, as shown in Tab. 2 of the X-Gaussian paper), which is significantly lower than our 3D PSNR of 36 (or, more precisely, 36.89 dB for human organs, as shown in Tab. 3 of our paper).
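To make the distinction concrete, both metrics use the same PSNR formula and differ only in what they are computed over; the helper below is an illustrative sketch (the exact data ranges and normalization used by each paper may differ):

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio; identical formula for 2D projections and 3D volumes."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# 2D PSNR: averaged over rendered novel-view projections (what Tab. 1 of X-Gaussian reports).
# 3D PSNR: computed once on the reconstructed CT volume against the ground-truth volume.
```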

Q3.3: It would be helpful to augment the related works and baseline models, such as C$^2$RV (CVPR'24).

Thank you for this suggestion. We will include these recent SOTA works in our revised manuscript; doing so will better contextualize our contribution.

Regarding baseline selection, we did not include supervised learning methods like C$^2$RV because they require external datasets for pre-training. Our primary focus is on evaluating the method's representation capability for arbitrary objects without pre-training. For a fair comparison, we chose self-supervised methods that require only the X-ray projections of the object (L483-484). To our knowledge, SAX-NeRF [7] (CVPR'24) is the latest SOTA work of this kind, and we have included it in our baseline methods. Experiments show that our method also outperforms SAX-NeRF, with a 0.93 dB PSNR increase and 78$\times$ faster training.

Please also note that C$^2$RV has not released its code and models (the GitHub repository is empty). We tried to contact the authors but received no response. Due to the limited time available for the rebuttal, we unfortunately could not reproduce their method and compare against it experimentally.

Q3.4: Could you elaborate on why the SAX-NeRF vs. NAF performance gap differs between the SAX-NeRF paper and your paper?

We use the official code of SAX-NeRF [7] and NAF [62] to perform experiments without changing the networks or hyperparameters. In Tab. C we show human organ results from the SAX-NeRF paper [7] and our paper, which use the same data source. SAX-NeRF performs consistently in both papers, while NAF achieves a higher PSNR in our paper. Nevertheless, both the SAX-NeRF paper and our paper conclude that SAX-NeRF achieves better results than NAF. Therefore, the slight performance inconsistency does not harm the arguments made in our paper.

Table C. PSNR values in SAX-NeRF paper [7] and our paper.

| Scene | NAF in [7] | SAX-NeRF in [7] | NAF in ours | SAX-NeRF in ours |
| --- | --- | --- | --- | --- |
| Jaw | 34.14 | 35.47 | 35.01 | 35.37 |
| Foot | 31.63 | 32.25 | 31.65 | 31.90 |
| Head | 36.46 | 39.70 | 38.90 | 39.51 |
| Chest | 33.05 | 34.38 | 33.99 | 34.45 |
| Average | 33.71 | 35.44 | 34.85 | 35.29 |
Comment

I appreciate the authors for their detailed rebuttal. It resolved most of the critical concerns. Therefore, as long as the authors incorporate the explanations from the rebuttal into the final version, I will raise my score accordingly.

Comment

Thank you for your positive feedback and for the effort you put into reviewing our rebuttal. We are glad that our explanations have addressed most of your concerns. We will ensure that these clarifications are fully integrated into the final version of our paper.

Review
Rating: 5

The paper introduces R2-Gaussian, a framework for tomographic reconstruction using 3D Gaussian splatting (3DGS). This framework aims to address the limitations of traditional 3DGS in volumetric reconstruction, specifically for tasks like X-ray computed tomography (CT).

Strengths

  • Identifying and addressing the integration bias in standard 3DGS formulation for volumetric reconstruction
  • The proposed R2-Gaussian framework is developed with tailored Gaussian kernels, rectified projection techniques, and a CUDA-based differentiable voxelizer.
  • The paper provides simulated X-ray validation, comparing the proposed method against state-of-the-art techniques

Weaknesses

  • The proposed method shows results in controlled experimental settings. However, its performance in real-world clinical or industrial scenarios is not thoroughly examined, for instance when real X-ray images are given and the CT scan must be reconstructed from them.

  • I am not sure if 75, 50, and 25 views of X-rays are considered sparse-view reconstruction and if it is practical to have 75, 50, or 25 views of X-rays for CT reconstruction.

  • In the proposed method, the kernel formulation removes view-dependent color. However, it cannot model scattering effects in X-ray imaging.

  • In Fig. 7, it is unclear how to get X-3DGS slices and what "queried from three views" means in detail. Why not implement the voxelization on X-3DGS (or vanilla 3DGS) for a fair comparison?

Questions

  • After the voxelization from Gaussians, can the final density volume be compatible with traditional CT scans and be viewed in CT software?

Limitations

Yes.

Author Response

We thank the reviewer for the detailed review.

Q4.1: The performance in real-world clinical or industrial scenarios is not thoroughly examined. For instance, real X-ray images are given to reconstruct the CT scan.

We further evaluate our method on real-world data. We use FIPS [b], a public dataset providing real 2D X-ray projections. FIPS includes three objects (pine, seashell, and walnut). Each case has 721 projections in the range of $0^{\circ}\sim 360^{\circ}$ captured by a Hamamatsu Photonics C7942CA-22 detector. Since ground truth volumes are unavailable, we use FDK to create pseudo-ground-truth CT volumes with all views and then subsample 75/50/25 views for sparse-view experiments. We report the quantitative and qualitative results in Tab. 1 and Fig. 1 in the attached PDF file. Our method outperforms baseline methods by a large margin in the 75- and 50-view scenarios. In the 25-view scenario, our method slightly underperforms IntraTomo but is 11$\times$ faster. Overall, our method shows superior performance and efficiency in the presence of real-world noise and scattering effects. We will include these results in the revised manuscript.
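For concreteness, the protocol above (dense scan, FDK pseudo-ground truth, evenly subsampled sparse inputs) can be summarized as follows. This is only an illustrative sketch assuming projections are stored in acquisition order with evenly spaced angles; `fdk_reconstruct` is a hypothetical placeholder for an FDK implementation such as TIGRE's, not an interface defined in the paper.

```python
import numpy as np

def subsample_views(projections, angles, n_views):
    """Select n_views evenly spaced projections (and their angles) from a dense scan."""
    idx = np.linspace(0, len(angles), num=n_views, endpoint=False).astype(int)
    return projections[idx], angles[idx]

# Hypothetical usage (fdk_reconstruct is a stand-in, e.g. for TIGRE's FDK):
#   pseudo_gt = fdk_reconstruct(projections, angles)        # all 721 views
#   for n in (75, 50, 25):
#       sparse_proj, sparse_angles = subsample_views(projections, angles, n)
```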

Q4.2: I am not sure if 75, 50, and 25 views of X-rays are considered sparse-view reconstruction and if it is practical to have 75, 50, or 25 views of X-rays for CT reconstruction.

  • Practicality: As mentioned by Reviewer sKx6, in industrial and medical applications, CT machines typically take hundreds to thousands of X-ray projections for high-quality details [a]. Therefore, configuring existing CT machines to acquire 75/50/25 views is practical and convenient; it only requires a coarser scanning interval.
  • Rationale: The community considers fewer than 100 projections as sparse-view CT (SVCT). We list the numbers of projections used in some published papers, as shown in Tab. D. These papers use 20-180 views. Accordingly, we set our projections to 75, 50, and 25 views. Additionally, there are works investigating extremely sparse-view CT (ESVCT), which uses 2-10 views [58,30,10]. ESVCT is a severely ill-posed problem that cannot be solved without fine-grained prior knowledge, such as pretraining with external datasets. Therefore, ESVCT is out of the scope of our study, as we only use projections (completely self-supervised). Overall, our experimental setting aligns with previous work and should be considered sparse-view.

Table D. Number of projections used in published papers.

| Paper | Publisher | Number of projections |
| --- | --- | --- |
| DD-NET [b] | TMI'20 | 60-180 |
| IntraTomo [61] | ICCV'21 | 20 |
| NEAT [c] | TOG'22 | 25-50 |
| NAF [62] | MICCAI'22 | 50 |
| SAX-NeRF [7] | CVPR'24 | 50 |

Q4.3: It cannot model the scattering effect in X-ray.

We follow most CT reconstruction work [13,2,50,61,62,7] in assuming that the target radiodensity field is isotropic and treating scattering as a noise source on the 2D detector.

Although we do not explicitly model scattering effects, we take them into consideration in the experiments. When generating X-ray projections for the synthetic datasets, we model scattering noise with a Poisson distribution. We also evaluate our method on real-world data that contains scattering effects (Q4.1). All results demonstrate our method's superior performance and robustness to scattering noise.
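As an illustration of this noise convention (not the exact TIGRE settings used in the paper; the incident photon count `i0` below is an arbitrary illustrative value), Poisson noise can be applied to detector photon counts obtained from the clean line integrals via the Beer-Lambert law and then mapped back to noisy line integrals:

```python
import numpy as np

def add_photon_noise(line_integrals, i0=1e5, seed=0):
    """Apply Poisson photon-counting noise to clean X-ray line integrals."""
    rng = np.random.default_rng(seed)
    counts = i0 * np.exp(-line_integrals)            # Beer-Lambert: expected detector counts
    noisy_counts = rng.poisson(counts).astype(np.float64)
    noisy_counts = np.clip(noisy_counts, 1.0, None)  # avoid log(0) on fully absorbed pixels
    return -np.log(noisy_counts / i0)                # back to (noisy) line integrals
```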

Q4.4: It is unclear how to get X-3DGS slices and what "queried from three views" means in detail.

We will improve Fig. 7 captions and relevant descriptions for better understanding.

  • X-3DGS slice: After recovering the 3D density of each Gaussian (L251-252), we use the same voxelizer (Sec. 4.2.2) as R$^2$-Gaussian to extract CT volumes. We then show slices of these volumes in Fig. 7 to demonstrate the reconstruction quality.
  • "Queried from three views" means that we show reconstruction results from three different views to demonstrate the view inconsistency in X-3DGS.

Q4.5: Why not implement the voxelization on X-3DGS (or vanilla 3DGS) for a fair comparison?

For X-3DGS, we apply the same voxelizer (Sec. 4.2.2) as in R$^2$-Gaussian to extract a CT volume, so the comparison is fair in this respect. We will clarify this in the revised manuscript.
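For intuition about what this voxelization computes, the following is a naive NumPy reference for isotropic Gaussians; it is only a conceptual sketch (the actual voxelizer is the CUDA-accelerated, differentiable implementation of Sec. 4.2.2, and all parameter names here are illustrative):

```python
import numpy as np

def voxelize_gaussians(means, sigmas, densities, grid_res=64, extent=1.0):
    """Sample the summed Gaussian density field at the centers of a regular voxel grid."""
    axis = np.linspace(-extent, extent, grid_res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    volume = np.zeros(grid.shape[0])
    for mu, sigma, rho in zip(means, sigmas, densities):
        sq_dist = np.sum((grid - mu) ** 2, axis=1)
        volume += rho * np.exp(-0.5 * sq_dist / sigma ** 2)  # densities simply sum, no ordering
    return volume.reshape(grid_res, grid_res, grid_res)
```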

Q4.6: After the voxelization from Gaussians, can the final density volume be compatible with traditional CT scans and be viewed in CT software?

Yes, the reconstructed volume is compatible with traditional CT viewers. We show a screenshot of the reconstructed volumes inspected with the Weasis DICOM medical viewer in Fig. 4 (PDF).

Q4.7: Ethics review.

All data used in our experiments are from open-source datasets. We have properly cited the relevant references (Appx. B) and adhered to the respective data licenses.


Reference

[a] Villarraga-Gómez, Herminso, and Stuart T. Smith. "Effect of the number of projections on dimensional measurements with X-ray computed tomography." Precision Engineering 66 (2020): 445-456.

[b] Zhang, Zhicheng, et al. "A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution." IEEE Transactions on Medical Imaging 37.6 (2018): 1407-1417.

[c] Rückert, Darius, et al. "Neat: Neural adaptive tomography." ACM Transactions on Graphics (TOG) 41.4 (2022): 1-13.

Comment

Thanks for the response. The authors have addressed my concerns. I will raise my score accordingly.

Comment

Thank you for your valuable feedback and the time you took to review our paper. We’re pleased that our response addressed your concerns. We will ensure these clarifications are incorporated into the final version of the paper.

Review
Rating: 7

Motivation

  • The authors propose to adapt 3D Gaussian Splatting (3DGS) to sparse-view tomographic reconstruction, i.e., to recover a radiodensity 3D volume from a small set of X-ray images and corresponding sensor information. This is relevant for various clinical and industrial applications.

Contributions

  • Their R2-Gaussian model iterates over existing 3DGS solutions tuned for XR/CT imaging, proposing a 3DGS initialization scheme better suited for tomographic reconstruction.
  • The authors also correct an integration bias in 3DGS (meant for faster image inference but causing ambiguities in volumetric reconstruction) and provide the corresponding CUDA patch.
  • The proposed R2-Gaussian includes other adaptations (custom densification parameters, voxel-based regularization, simplified isotropic kernels), resulting in an end-to-end CT reconstruction system.

Results

  • The authors provide extensive qualitative evaluation and quantitatively compare to other NeRF-based and traditional CT reconstruction methods, showing that their method provides a better trade-off between volume accuracy and reconstruction time.

Relevance

  • Effort in applying 3DGS to XR/CT imaging has grown the past year [6, 27, 39], as 3DGS appears to be a well-suited representation for such applications (due to its compactness, fast convergence, etc.). This work is a meaningful iteration and could benefit the community.

Strengths

(somewhat ordered from most to least important)

S1. Convincing Comparative Evaluation and Qualitative Results

  • The quantitative comparison to state-of-the-art CT reconstruction methods appears convincing, with the proposed solution demonstrating a better trade-off between volume accuracy and reconstruction time.
  • The authors provide a lot of meaningful qualitative results, to illustrate theoretical contributions, to highlight the benefits of their method, but also to showcase its limitations. This makes reading this paper all the more interesting.
  • An ablation study w.r.t. some of the key contributions and w.r.t. some hyperparameters is also provided.
  • Though limited in number (15 volumes), the authors evaluate on different categories (animal, vegetal, and synthetic targets).

S2. Iterative yet Meaningful Contributions Towards 3DGS for XR/CT

  • The discussion w.r.t. the integration bias in vanilla 3DGS is interesting, and the technical solution brought by the authors appears valuable to the community. As mentioned in the paper, their corrected CUDA implementation could benefit other 3DGS works targeting volumetric reconstruction.
  • The authors propose an initialization scheme dedicated to volumetric tomography, as usual 3DGS initialization techniques (e.g., SfM) are not applicable here. This is a relevant contribution, properly described and evaluated (qualitative + quantitative evaluation).
  • The proposed system, tackling xrays-to-CT reconstruction in an end-to-end differentiable manner, is indeed novel. Existing 3DGS models [6, 27, 39] for XR/CT imaging focus on digitally-reconstructed-radiograph (DRR) novel-view synthesis (NVS) rather than CT reconstruction.

S3. Sound Theory and Reproducibility

  • Background theory is well described by the authors, and the scientific/technical insight of the authors w.r.t. the identified integration bias could benefit the community.
  • The authors provide their model implementation, which appears sound and well-structured (note that I did not try to run the code, but had a look at key files).

S4. Detailed Discussion of Limitations and Contributions

  • The authors put significant effort in discussing and illustrating some of their method's limitations (needle-like artifacts inherent to 3DGS, varying convergence time, limited extrapolation ability, etc.), as well as summarizing its impact (possible clinical/industrial applications, benefit of their CUDA code to the CV community, etc.) in Appendices G and H.

S5. Well-Written and Illustrated Paper

  • Overall, the paper is nicely structured, written, and illustrated. E.g., Figures 2-4 are helpful to understand the methodology at a glance.

Weaknesses

(somewhat ordered from most to least important)

W1. Lack of Consideration for Real-World Noise and Anisotropic Effects

  • The authors claim that "X-ray attenuation depends only on isotropic density" [L140] to justify their radiodensity-based model, but this is not entirely correct. While most models generating digitally reconstructed radiographs (DRRs) from CT volumes indeed consider X-ray attenuation as an isotropic phenomenon, this is an approximation. Some X-ray transport effects, such as Compton scattering, are actually anisotropic (but because CT volumes do not inherently contain the material information necessary to accurately simulate Compton scattering, DRR models ignore the anisotropic part) [a]. However, the authors do claim that they trained their model on DRRs generated using TIGRE [5] configured to simulate Compton scattering [L217-224]. Does it mean that the authors preprocessed the CT volumes to replace the attenuation values by material information (e.g., mapping HU values to a set of predefined materials)? More information on the data generation process would be helpful here. If indeed the model was trained on DRRs containing anisotropic residual noise (cf. the Compton effect's residual impact on XR imaging), but the proposed algorithm itself only considers isotropic attenuation, how does it impact the results? E.g., it could be interesting to generate 2 sets of input DRRs (one generated with the approximated/simplified attenuation model and one more realistic) and compare the final accuracy of the reconstructed CT volumes.
  • The fact that the method is only applied to synthetic inputs (DRRs generated by TIGRE) rather than real, usually noisier, X-ray images is also problematic. The paper would benefit from a real-world evaluation, or at least a discussion on why it was not performed.

W2. Lack of References/Comparisons to SOTA on XR-3DGS

  • The authors mention some of the existing 3DGS solutions applied to CT/XR imaging (e.g., X-Gaussian [6], GaSpCT [39], Li et al.'s model [27]) [L89-94] but do not perform any form of comparison with those. The lack of qualitative/quantitative comparison is fair (the authors argue that these models "cannot generate CT models" [L91-92], which is somewhat true; though a comparison on the XR-NVS task could have made this paper stronger). But I would argue that the authors should have better referenced some of these works in the Methodology. E.g., even if less formalized, radiative Gaussians are already presented in X-Gaussian [6]; and even though it is performed in the pixel domain rather than the voxel one, GaSpCT [39] already proposes a total variation loss to regularize their XR-3DGS. While I still believe that the proposed work is a valuable iteration over these works (by better formalizing and adapting the Gaussian properties and rasterization to CT data), I think the authors should be more transparent w.r.t. the SOTA.
  • References w.r.t. total variation (TV) theory are also missing, making it hard to contextualize the scope of the authors' contribution w.r.t. the objective function.

W3. Somewhat Overstated Contributions

  • The CUDA implementation of the radiodensity voxelization seems to be more a technical feat than a scientific contribution. While possibly valuable to the community, I do not see any novelty in this module (maybe not as GPU-optimized, but differentiable point-cloud-to-voxel-grid tools already exist, e.g., in PyTorch3D).
  • Changes to the adaptive control are unclear/minor, according to the authors' descriptions [L210-214] (i.e., changing the size threshold w.r.t. pruning large Gaussians, editing the density of cloned/split Gaussians).
  • The positive impact of the TV regularization does not appear that statistically significant (+0.32dB for PSNR, +0.009 for SSIM, +1m33s for convergence). Maybe some qualitative results could help grasp its contribution?

W4. Limited Dataset Size

  • The quantitative evaluation is performed on only 15 samples, even though varied. Larger CT datasets are available, e.g. CTPelvic1K [b].

W5. Limited Impact of Integration Bias Correction (?)

  • The authors claim that "this integration bias, though having a negligible impact on imaging rendering, leads to significant inconsistency in density retrieval" [L184-186], but they only provide qualitative imaging results (Fig. 6) to justify their un-biasing contribution. Additional results could better contextualize the corresponding claims.

Additional Reference:

[a] Gao, Zhongpai, et al. "DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering." arXiv preprint arXiv:2406.02518 (2024).

[b] Liu, Pengbo, et al. "Deep learning to segment pelvic bones: large-scale CT datasets and baseline models." International Journal of Computer Assisted Radiology and Surgery 16 (2021): 749-756.

Questions

see Weaknesses for key questions/suggestions.

Q1. Typo?

  • [L223] Do the authors mean "Compton scatter" rather than "ponton scatter"?

Limitations

Some limitations and societal impacts are discussed in detail (see S4 above).

Author Response

We appreciate your positive review and valuable feedback.

Q2.1: Did the authors preprocess the CT volumes?

Yes, we convert raw volumes from HU to attenuation coefficients. Following [62, 7, 27], we then normalize voxel values to [0,1] for balanced evaluation across modalities. We will add more details in the revised manuscript.
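A minimal sketch of this preprocessing, assuming the standard HU-to-attenuation relation with the attenuation of air approximated as zero (the reference value for water below is illustrative and energy-dependent; the exact constants used by the authors are not specified in this response):

```python
import numpy as np

MU_WATER = 0.02  # illustrative linear attenuation of water (mm^-1); depends on beam energy

def hu_to_attenuation(hu):
    """Invert the HU definition (with mu_air ~ 0): HU = 1000 * (mu - mu_water) / mu_water."""
    mu = MU_WATER * (1.0 + hu / 1000.0)
    return np.clip(mu, 0.0, None)

def normalize01(volume):
    """Scale a volume to [0, 1] for evaluation across modalities."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin + 1e-8)
```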

Q2.2: How does anisotropic residual noise impact the results?

Q2.3: The paper would benefit from a real-world evaluation.

We address the two questions together since they both relate to scattering effects. We agree that anisotropic effects, such as Compton scattering, occur in real-world X-ray imaging. We follow most CT reconstruction work [13,2,50,61,62,7] in assuming that the target radiodensity field is isotropic and treating scattering as a noise source on the detector.

We do consider scattering in the experiments.

  • When preparing X-ray projections, we follow common conventions [a] and model scattering noise with a Poisson distribution.
  • As requested, we further evaluate our method on real-world data containing scattering effects. We use FIPS [b], a public dataset providing real 2D X-ray projections. FIPS includes three objects (pine, seashell, and walnut). Each case has 721 projections in the range of $0^{\circ}\sim 360^{\circ}$. Since ground truth volumes are not available, we use FDK to create pseudo-ground truth with all views and then subsample 75/50/25 views for sparse-view experiments. We report the quantitative and qualitative results in Tab. 1 (PDF) and Fig. 1 (PDF). Our method outperforms baseline methods in the 75- and 50-view scenarios. In the 25-view scenario, our method slightly underperforms IntraTomo but is 11$\times$ faster. Overall, our method shows superior performance and efficiency in the presence of real-world noise and scattering effects.

Q2.4: The authors should better reference existing X-ray 3DGS works.

We will enhance the comparison with existing X-ray 3DGS methods by adding more details in the related work section. Please note that all X-ray 3DGS works were preprinted/under review before the NeurIPS submission deadline.

We would also like to clarify the following points regarding the comments:

  • Radiative Gaussian: Although X-Gaussian and our method coincidentally use the same term, the motivations and formulations are quite different.

    • X-Gaussian replaces view-dependent spherical harmonics with a view-independent feature vector based on the isotropic assumption. It retains color and opacity, which do not physically represent the radiodensity field. Besides, it uses alpha-blending, which contradicts the unordered nature of X-ray imaging.
    • We define the Gaussian kernel as a local radiodensity field and derive new rendering equations (Eq. 7), demonstrating that summation should be used instead of alpha-blending (L176-178); a short sketch of this argument is given after this list. See Q3.1 (Reviewer P8A3) for more details.
  • Total variation (TV): Our work does not claim contributions to TV regularization. Instead, we use TV to demonstrate the possibility of applying voxel-based supervision to Gaussians, thanks to the differentiable voxelizer. To our knowledge, we are the first to do so.
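For reference, a compact way to see why unordered summation is legitimate under the isotropic-attenuation assumption is the Beer-Lambert model; the following is a sketch of the standard argument, not a reproduction of the paper's Eq. 7:

```latex
% Beer-Lambert attenuation along a ray r(t), with the density field written as a
% sum of Gaussian kernels (isotropic-attenuation assumption):
I(\mathbf{r}) = I_0 \exp\!\left(-\int \sigma(\mathbf{r}(t))\,\mathrm{d}t\right),
\qquad
\sigma(\mathbf{x}) = \sum_i \rho_i\, G_i(\mathbf{x}).
% Taking the negative log gives an order-independent sum of per-Gaussian line
% integrals, so rasterization can sum contributions instead of alpha-blending them:
-\log\frac{I(\mathbf{r})}{I_0} = \sum_i \rho_i \int G_i(\mathbf{r}(t))\,\mathrm{d}t.
```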

Q2.5: References w.r.t. TV are missing.

We will add reference [c] w.r.t. TV.

Q2.6: I do not see novelty in the radiodensity voxelization.

We regard the technical novelty of our voxelizer as being the first differentiable, CUDA-accelerated voxelizer for 3D Gaussians. The voxelizer offers opportunities to apply other voxel-based losses (such as SDF supervision) to 3D Gaussians, which can benefit the community.

Q2.7: Changes to the adaptive control are unclear/minor.

We made minor modifications to adaptive control to suit X-ray imaging. We do not intend to claim a contribution regarding adaptive control. We will clarify it in the revised manuscript.

Q2.8: Some qualitative results could help grasp the contribution of TV regularization.

We show qualitative results with and without TV in Fig. 12. It is clear that adding TV promotes smoothness and homogeneity. However, needle-like artifacts still persist, consistent with the general acknowledgment that high-level priors such as TV do not significantly improve performance. As mentioned in Q2.6, we do not claim contributions to TV but rather to a novel strategy for applying 3D losses.

Q2.9: Larger CT datasets are available.

While there are larger CT datasets, such as CTPelvic1K, they only cover human organs with similar structures and materials. We focus on the method's representation capability for arbitrary objects. Therefore, we chose data across various modalities, aiming at diversity rather than quantity. Besides, NeRF-based baseline methods typically require hours of training (SAX-NeRF needs 13 hours), and we unfortunately do not have sufficient resources to support experiments on thousands of volumes.

Compared with previous work, our dataset has the same size as SAX-NeRF (15), and is larger than NAF (5) and X-Gaussian (5).

Q2.10: Additional results could better contextualize the claims of integration bias correction.

We show more quantitative and qualitative results regarding integration bias in Tab. 2 (PDF) and Fig. 2 (PDF). Our method achieves better results than X-3DGS in both 2D and 3D. This suggests that correcting integration bias improves both image rendering and volume reconstruction in CT. Note that this conclusion slightly differs from L253-254, and we will update it in the revised manuscript.

Q2.11: Typo: "Compton scatter" or "ponton scatter"?

We use Poisson to model photon statistics on the detector, which also includes Compton scattering. We will use "photon statistics" or "Compton scattering" for clarity.


Reference

[a] Zhu, Lei, et al. "Noise suppression in scatter correction for cone‐beam CT." Medical physics (2009).

[b] Siltanen, Samuli, et al., "FIPS: Open X-ray Tomographic Datasets.", Zenodo (2022)

[c] Rudin, Leonid I., et al. "Nonlinear total variation based noise removal algorithms." Physica D: nonlinear phenomena (1992).

Comment

I thank the authors for their thorough response, as well as my fellow reviewers for their insightful comments. I appreciate the author's effort to address my (overall minor) concerns and questions, and I lean towards maintaining my current score (accept).

I do hope that, were the paper accepted, the authors would account for the reviewers' remarks, as summarized by the authors in their global response, e.g.:

  • Including results on real-world data c.f. DGKr and w7Ci (me). These new experiments gathered by the authors demonstrate the real-world applicability of their work (c.f. Tab. 1 and Fig. 1 of rebuttal PDF).
  • Better discussing/referencing existing XR-GS works c.f. P8A3 and w7Ci (me). The results and discussion provided in the authors' response would benefit the readers, by better contextualizing their work.

I would also suggest:

  • Clarifying the isotropic simplification at the core of some claims/contributions. E.g., the authors' claims that "[X-Gaussian] uses alpha-blending, which contradicts the unordered nature of X-ray imaging" [response] and that "we can individually integrate each 3D Gaussian to rasterize an X-ray projection" [L162 + Equation 5] are only correct in the context of the isotropic simplification of X-ray imaging. I.e., if actual physics effects, such as Compton scattering, were to be considered, then the ordering of the Gaussians would matter (see preprint [i] for contributions to XR-GS orthogonal to R$^2$-Gaussian's, as well as [ii, iii] w.r.t. why ordering matters in GS-based scattering simulation). I do agree with the authors that most CT reconstruction models adopt the isotropic simplification; and, therefore, that their rasterization simplification is legitimate. However, readers should be more explicitly made aware of the basis of the authors' claims (isotropic approximation [L100, L140, L162]).
  • Clarifying the benefits of proposed GS voxelizer compared to existing solutions, e.g., PyTorch3D PC-to-voxel solution, which is also CUDA-based and differentiable but may require a few tweaks to work on Gaussians.
  • Including discussed references, e.g. [c] (TV).

Reference:

[i] Gao, Zhongpai, et al. "DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering." arXiv preprint arXiv:2406.02518 (2024).

[ii] Zhou, Yang, Songyin Wu, and Ling-Qi Yan. "Unified Gaussian Primitives for Scene Representation and Rendering." arXiv preprint arXiv:2406.09733 (2024).

[iii] Condor, Jorge, et al. "Volumetric Primitives for Modeling and Rendering Scattering and Emissive Media." arXiv preprint arXiv:2405.15425 (2024).

Comment

Thank you for your careful review of our rebuttal. Your detailed comments are crucially helpful in improving our paper, and we sincerely appreciate your support. We will incorporate the reviewers' remarks into the final version. We will also thoroughly address the points you raised, especially clarifying the isotropic simplification and the benefits of our proposed voxelizer.

Review
Rating: 8

This paper presents a 3D reconstruction method for sparse-view computed tomography using 3D Gaussian Splatting. The core contribution is the reformulation of the volumetric rendering equation to include view-independent central density estimation. Additionally, the paper introduces a differentiable voxelizer that converts a set of 3D Gaussians into a voxel grid of densities, proving effective in computed tomography tasks.

Strengths

  • The paper reveals the view-dependent integration bias in 3D Gaussian Splatting (3DGS), which, to my knowledge, has not previously been reported in the community. This discovery may have a high impact on computer vision.
  • While this is not the first paper to apply 3DGS to computed tomography, it is the first to accurately reconstruct 3D volumes with an image formation model tailored specifically for this task.
  • The proposed method is well evaluated and compared against other baseline methods.
  • Exposition is clear. The paper really reads well.

Weaknesses

I don’t see any particular weakness of the paper.

Questions

I have a couple of questions.

The paper focuses on sparse-view computed tomography. I am curious about how the method would perform in dense-view scenarios. In industrial CT scanning, capturing a few thousand projections for high-quality microscale geometric details is not uncommon. With a sufficient number of projections, the FDK algorithm typically performs well. How does the proposed method compare to FDK when a large number of projections are used? At what point might it start to underperform, if at all? Would it still outperform in dense-view scenarios?

Additionally, what are the implications of correcting the integration bias in image-based 3D reconstruction tasks? Would this correction lead to improved 3D reconstructions as well? If not, why?

Limitations

The only limitation I can think of is the scope of this work; computed tomography represents a relatively niche area in the fields of machine learning and computer vision.

Author Response

We appreciate your recognition of our work and your valuable feedback.

Q1.1: How does the proposed method compare to FDK when a large number of projections are used?

We further evaluate FDK and our method with 500 to 2000 views. Results in Tab. A show that our method outperforms FDK by a large margin in all settings. Additionally, our method achieves a peak PSNR of around 39.85 dB while FDK is at approximately 37 dB.

Table A. Quantitative results of FDK and our method under dense-view scenarios.

| No. views | PSNR (FDK) $\uparrow$ | SSIM (FDK) $\uparrow$ | PSNR (Ours) $\uparrow$ | SSIM (Ours) $\uparrow$ | Time (Ours) |
| --- | --- | --- | --- | --- | --- |
| 50 (reference) | 26.5 | 0.422 | 37.98 | 0.952 | 8m14s |
| 500 | 34.04 | 0.755 | 39.73 | 0.964 | 8m33s |
| 1000 | 36.67 | 0.899 | 39.84 | 0.963 | 9m5s |
| 1500 | 36.89 | 0.913 | 39.84 | 0.963 | 8m49s |
| 2000 | 37.00 | 0.919 | 39.85 | 0.963 | 9m22s |

Q1.2: What are the implications of correcting the integration bias in image-based 3D reconstruction tasks?

While RGB-based 3D reconstruction is out of our scope, we share some preliminary findings. We compare vanilla 3DGS and rectified one (R-3DGS) on NeRF synthetic dataset. We define the geometry field as the sum of Gaussian opacities, the same as SUGAR (CVPR'24). For vanilla 3DGS, we compute the mean of recovered 3D opacities of all training views. We then use our voxelizer (Sec. 4.2.2) to query opacity volumes and extract meshes using marching cubes (MC). Note that because the actual iso-value of the surface is unknown, we report chamfer distances (CD) with three MC thresholds.
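A concrete sketch of this mesh-extraction step, assuming scikit-image's marching cubes (an assumption on our part, not necessarily the tooling used by the authors); the iso-values mirror the three MC thresholds reported in Tab. B, and `opacity_volume` stands in for whatever the voxelizer returns:

```python
from skimage.measure import marching_cubes

def extract_iso_meshes(opacity_volume, levels=(5.0, 10.0, 20.0), spacing=(1.0, 1.0, 1.0)):
    """Extract one surface mesh per iso-value from a queried opacity volume."""
    meshes = []
    for level in levels:
        verts, faces, normals, _ = marching_cubes(opacity_volume, level=level, spacing=spacing)
        meshes.append((verts, faces, normals))
    return meshes
```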

Results in Tab. B and Fig. 3 (PDF) show that correcting the integration bias does not harm 2D rendering. Furthermore, it improves 3D reconstruction, though less significantly than in volumetric CT reconstruction. We suspect three reasons. First, for opaque objects, Gaussians are trained to be flat, so integration values ($\mu$ in L182) do not change significantly in front views. Second, Gaussians are close to the surface, allowing for reasonable surface extraction using only positions. Third, the splatting technique involves many simplifications in rendering equations, which may have more impact than integration bias on 3D reconstruction.

Since these findings are preliminary, we do not include them in this paper. We will make efforts to develop a bias-free 3DGS for RGB-based reconstruction in future research.

Table B. Quantitative results of vanilla 3DGS and rectified one (R-3DGS) on NeRF-synthetic dataset.

| Metric | Vanilla 3DGS | R-3DGS |
| --- | --- | --- |
| 2D PSNR $\uparrow$ | 31.46 | 31.28 |
| 2D SSIM $\uparrow$ | 0.966 | 0.967 |
| No. Gaussians | 285k | 345k |
| CD (MC=5.0) $\downarrow$ | 0.0182 | 0.0202 |
| CD (MC=10.0) $\downarrow$ | 0.0179 | 0.0147 |
| CD (MC=20.0) $\downarrow$ | 0.0172 | 0.0141 |
Author Response (Global)

Dear Reviewers,

Thank you for your insightful comments and constructive suggestions. We appreciate Reviewers sKx6 and w7Ci for recognizing our paper's solid technical contribution, high impact on related areas, excellent evaluation, and good writing. We are grateful for the positive feedback from all reviewers on our novel CT reconstruction framework and the discovery of integration bias.

Based on these valuable suggestions, we have added more experimental analysis and clarified important concepts. Please note that we add some figures and tables in the PDF file. Here is a summary of the changes:

  • Experiments on real-world data (Reviewer w7Ci and P8A3). We further evaluate our method on the real-world dataset FIPS [a]. The results in Tab. 1 (PDF) and Fig. 1 (PDF) show that our method outperforms baseline methods in the presence of real-world noise and scattering effects.
  • Clarification of our contribution w.r.t. existing X-ray 3DGS works (Reviewers w7Ci and DGKr). We summarize our primary contribution as the first theory-supported 3DGS framework for direct CT reconstruction, and the discovery and remedy of previously overlooked integration bias. We provide a detailed comparison between our method and X-Gaussian in Q3.1 (Reviewer P8A3).
  • Demonstration of integration bias (Reviewers w7Ci and DGKr). We have added more quantitative and qualitative results (Tab. 2 and Fig. 2 in PDF) to demonstrate the necessity of correcting integration bias.

We hope our response has addressed the initial concerns. Please let us know if you have any other questions.

Kind Regards,

Authors


Reference

[a] Siltanen, Samuli, et al., "FIPS: Open X-ray Tomographic Datasets.", Zenodo (2022)

Final Decision

This paper initially received mixed scores with two positive (strong accept, accept) and two negative (borderline reject, borderline reject) reviews. Some reviewers praised the clarity of the writing and felt the proposed application of 3DGS to medical imaging could have significant impact. Reviewers also raised concerns about novelty compared to X-Gaussian and evaluation on real-world data. The authors provided a rebuttal, and after the discussion period the reviewers felt that their concerns were addressed. The paper is accepted, and the authors should incorporate the results and clarifications from the rebuttal into the paper.