GSDF: 3DGS Meets SDF for Improved Neural Rendering and Reconstruction
We propose GSDF, a dual-branch system that improves rendering and reconstruction simultaneously by leveraging mutual geometry regularization and guidance between Gaussian primitives and a neural surface.
Abstract
Reviews and Discussion
The paper introduces GSDF, a new approach for both novel view synthesis and surface reconstruction that relies on a dual-branch architecture combining 3D Gaussian Splatting (3DGS) with neural Signed Distance Functions (SDF).
Details:
The paper aims to improve the quality of both the rendering compared to vanilla 3DGS and the reconstructed surface compared to neural SDF approaches like NeuS. To this end, it uses two branches:
- 3DGS Branch: Inspired by Scaffold-GS, this branch outputs RGB images with splatting-based, rasterized rendering.
- Neural SDF Branch: Inspired by Instant-NSR, a custom implementation of NeuS using hash grids (similar to Instant-NGP) to accelerate optimization. This branch outputs RGB images using volumetric rendering.
The method establishes mutual guidance and joint supervision between the two representations, relying on three main ideas (a rough code sketch of these mechanisms follows the list):
- Efficient Sampling: Using depth maps rasterized with Gaussian splats to guide the volumetric rendering in the SDF branch, allowing for much more efficient sampling along rays. In practice, the sampling range for a ray going through a pixel is computed using the SDF value of the point obtained by back-projecting the 3DGS depth at that pixel.
- Guided Densification and Pruning: Using the SDF branch to guide the densification and pruning processes in the 3DGS representation. The underlying idea is straightforward: Densification should be stronger near the surfaces, and Gaussians far from the surfaces should be pruned. In practice, the paper proposes to compute a criterion depending on both the gradient of Gaussian primitives (similarly to Scaffold-GS) and the SDF value of the center of a Gaussian. The higher the gradient and the closer the Gaussian is to the surface, the more likely it is to be densified. In the meantime, Gaussians located far from the surface are pruned.
- Geometric Alignment: Aligning geometric properties (depth and normal maps) of both branches to ensure the consistency of the two branches as well as better geometric accuracy of the 3DGS representation, as Gaussians generally tend to cheat on the geometry to allow for better rendering.
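A rough, self-contained sketch of how these three mechanisms could be implemented (PyTorch-style; all function names, thresholds, and the exact forms of the criteria are illustrative assumptions rather than the paper's implementation):

```python
import torch
import torch.nn.functional as F

def depth_guided_interval(gs_depth, rays_o, rays_d, sdf_fn, margin=1.0):
    """Efficient sampling: restrict the SDF branch's ray samples to a band
    around the surface suggested by the rasterized GS depth. `margin` is a
    hypothetical scale on the |SDF| at the back-projected point."""
    pts = rays_o + gs_depth.unsqueeze(-1) * rays_d      # back-project the GS depth
    half_width = margin * sdf_fn(pts).abs()             # |SDF| bounds the uncertainty
    t_near = (gs_depth - half_width).clamp(min=0.0)
    t_far = gs_depth + half_width
    return t_near, t_far                                # sample only inside [t_near, t_far]

def sdf_guided_density_control(grad_accum, centers, sdf_fn,
                               grow_thresh=2e-4, sigma=0.1, prune_dist=0.5):
    """Guided densification/pruning: grow Gaussians whose accumulated gradient
    is large *and* whose centers lie near the zero level set; prune Gaussians
    far from it. Thresholds are illustrative."""
    dist = sdf_fn(centers).abs()
    surface_weight = torch.exp(-(dist / sigma) ** 2)     # ~1 near the surface, ->0 far away
    grow_mask = grad_accum * surface_weight > grow_thresh
    prune_mask = dist > prune_dist
    return grow_mask, prune_mask

def geometry_alignment_loss(depth_gs, depth_sdf, normal_gs, normal_sdf):
    """Geometric alignment: encourage both branches to agree on depth and
    normal maps (L1 on depth, cosine distance on normals)."""
    l_depth = (depth_gs - depth_sdf).abs().mean()
    l_normal = (1.0 - F.cosine_similarity(normal_gs, normal_sdf, dim=-1)).mean()
    return l_depth + l_normal
```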
The proposed approach limits floaters and ensures better alignment of 3D Gaussian primitives with the underlying surface. The overall geometric quality and details of the 3DGS representation are improved, reaching higher performance than concurrent state-of-the-art methods. Moreover, the paper explains that the 3DGS branch allows for accelerated convergence (and better performance) of the SDF branch thanks to much more efficient sampling along rays.
Strengths
- The paper is clear and well-written. The details are easy to follow.
- The proposed approach shows excellent performance on quantitative benchmarks, effectively outperforming concurrent state-of-the-art works.
- The qualitative results are impressive and clearly demonstrate the improvement in geometric accuracy and surface details brought by the approach.
- As the paper explains, “[The] framework is versatile and can be easily adapted to incorporate future advanced methods for each branch” (lines 121-122). In other words, from a high-level perspective, I believe the paper offers a simple plug-in strategy that may allow jointly optimizing any Gaussian-based approach with any Neural SDF approach. I would be very curious to know if the authors tried their strategy with a regular, vanilla 3DGS representation rather than Scaffold-GS; I have no doubt that the results would be worse, but for generalization purposes, it would be interesting to know if the Scaffold-GS structural properties are essential to the approach, or if it can easily extend to 3DGS-based representations that do not rely on anchor points and MLP-decoders for the parameters.
Weaknesses
- Concerning Speedup Contribution: It is undeniable that the proposed approach allows for better reconstruction quality. However, it is unclear how the dual-branch system helps to make the SDF optimization faster, as claimed in the paper (see subsection 4.2.2, for example, line 246: “our method optimized the SDF field significantly faster than previous methods”). Although the paper claims a speedup over NeuS (which requires 8 hours for DTU scenes), the proposed approach uses a custom implementation of NeuS, called Instant-NSR, that is supposed to be able to train NeuS models in 10 minutes on some scenes. Is the speedup over NeuS really due to the dual-branch system (and the benefits each approach brings to the other), or is it due to the Hash-grid architecture? In the end, it appears that the proposed approach actually slows down the optimization compared to a single branch relying on Instant-NSR (or a single branch relying on Gaussians, as the optimization time is longer than the concurrent 2DGS approach). Theoretically, the speedup claimed in the paper would make sense; however, in practice, it is not clear how using two branches allows for establishing a virtuous circle and speeding up optimization. On the contrary, it might slow down optimization compared to using a method relying on a single branch.
- Surface extraction method: I did not see any discussion about the mesh extraction method for evaluating the surface reconstruction. Since the approach relies on two different representations that are supposed to become consistent with each other, I assume it would be possible to leverage one representation or the other (or both) to extract a surface mesh. I suppose the authors use a Marching Cubes algorithm on the neural SDF, but I might be wrong. If Gaussians are indeed consistent with the SDF and well-aligned with the surface, would it be possible to extract a mesh using Poisson reconstruction, similar to SuGaR, or TSDF, similar to 2DGS (although it would not scale well to background regions)? Would it be worse than applying a marching algorithm on the SDF? An ablation regarding the surface extraction method would have been very interesting, especially in a setup where two representations are available at the same time.
- Memory Footprint: 3DGS has a pretty high memory footprint in itself. Scaffold-GS might help reduce this footprint, but what about the proposed dual-branch approach? The approach sounds very heavy memory-wise, as it simultaneously optimizes two NVS models. The paper explains that a single NVIDIA A100 GPU with 80GB of VRAM (which is a lot) was used, but does not give details about the memory consumption of the approach. What is the minimum memory requirement to optimize the proposed model? Memory (and time) consumption are very important criteria for graphics-based applications.
- Novelty of the Approach: The approach heavily relies on two existing works (Scaffold-GS and Instant-NSR), and might look like a combination of these works with limited novelty. However, I think the combination of both works is not trivial, and the paper proposes a satisfying strategy to make each model benefit from the other. The proposed work is also quite reminiscent of NeuSG, a CVPR 2024 paper that proposes to jointly optimize a set of 3D Gaussians and a neural SDF. Even though the novelty of the proposed work might not be so great, this is not grounds for rejection in my opinion, as the work is technically solid and the results are of high quality.
Questions
- Why rely on Scaffold-GS? I understand that the underlying voxel+anchor-based structure of Scaffold-GS helps enforce a better structure of 3D Gaussians as well as quickly identifying which Gaussians could be pruned or densified. However, it would be really interesting to have an ablation using vanilla 3DGS rather than Scaffold-GS for the 3DGS branch, to know if the underlying regularization coming from the Scaffold-GS structure is essential to the overall approach, or if it can generalize to unstructured 3D Gaussians.
- What resources (time and memory) are needed for optimizing the model? The paper explains that the approach requires 2 hours on DTU compared to 8 hours with NeuS. However, the paper uses Instant-NSR’s implementation, which makes NeuS converge much faster (10 minutes for scenes from the Blender dataset, for example). Is the speedup really due to the proposed method, or is it due to the hash-grid implementation from Instant-NSR?
- What is the optimization time (and memory requirement) for unbounded scenes, like Mip-NeRF 360 or Tanks&Temples?
- What surface extraction method is used? Is it possible to use the Gaussians for extracting the surface? Or both branches?
- (Bonus question!) Looking at Figure 4, it is undeniable that the proposed method achieves sharper results than the concurrent 2DGS approach. However, in the particular case of the Truck scene from Tanks&Temples (2nd row of Figure 4), it seems that 2DGS better reconstructs fine holes in the topology, looking at the back and the top of the truck. I’m curious to know if the authors have an idea of what component in their approach causes such a limitation in this particular example.
Limitations
The authors addressed most of the limitations of their work. However, they do not discuss the memory requirements of the approach (relying on two different models at the same time), which might be much higher than concurrent approaches such as 2DGS, for example.
Moreover, the limitations are only discussed in the supplementary material, which is a problem in my opinion. I encourage the authors to try to move the limitations to the main paper in the final version.
Thanks for your efforts and valuable comments. Below we address the concerns for each question. Common concerns are addressed in detail in the global rebuttal. Additional figures and tables are provided in the attached PDF, indexed as Figures A-D and Tables A-B.
Q1. Why use Scaffold-GS; generalization of the framework.
A1: We used Scaffold-GS because it manages 3D Gaussians more efficiently, resulting in less memory consumption, more accurate depth predictions, and better rendering quality.
Following the reviewer's suggestion, we tested our framework's generalization by switching the GS branch to vanilla 3DGS. As shown in Figure D, GSDF_3DGS achieves better reconstruction quality than using only the SDF branch (Instant-NSR). Additionally, the rendering PSNR improved from 28.21 to 28.31, as shown in Table B.
Q2. Time consumption and memory usage.
A2: Thanks for pointing this out. Indeed, the speedup partially comes from the hash-grid implementation. We additionally compared the reconstruction quality of GSDF and the SDF-branch alone (i.e., Instant-NSR) with regard to training time, as illustrated in Figure A. It shows that GSDF achieves higher reconstruction quality than the baseline when trained for the same amount of time and iterations. Please also refer to the global response.
Regarding resources, we recorded the memory usage of our method and other methods in Table A. GSDF indeed uses more memory; we will clarify this and move the limitations discussion from the supplementary material to the main paper.
Q3. The surface extraction method and the geometry of the GS-branch.
A3: We extract the mesh using Marching Cubes on the SDF branch. As described in the paper (L26-30), we identified that enforcing Gaussian primitives to align with scene surfaces by restricting their shape and position can compromise rendering quality. Therefore, we keep the diversity of 3D Gaussians in pursuit of better rendering quality. In addition, the Gaussian primitives of our GS-branch are indeed placed closer to the potential surfaces than those of the GS-branch trained alone, as illustrated in Figure B.
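A minimal sketch of Marching Cubes extraction from an SDF network (the grid resolution, bounding-box handling, and the scikit-image call are illustrative assumptions rather than the exact implementation):

```python
import numpy as np
import torch
from skimage import measure  # Marching Cubes implementation

@torch.no_grad()
def extract_mesh(sdf_fn, bbox_min, bbox_max, res=512, chunk=262144):
    """Evaluate the SDF on a dense grid and extract its zero level set."""
    axes = [torch.linspace(bbox_min[i], bbox_max[i], res) for i in range(3)]
    pts = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
    sdf = torch.cat([sdf_fn(p) for p in torch.split(pts, chunk)])  # chunked to bound memory
    sdf = sdf.reshape(res, res, res).cpu().numpy()

    spacing = [(bbox_max[i] - bbox_min[i]) / (res - 1) for i in range(3)]
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0, spacing=spacing)
    verts += np.asarray(bbox_min)  # voxel coordinates -> world coordinates
    return verts, faces, normals
```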
Bonus Question: fine holes in 2DGS.
Answer: 2DGS optimizes discrete 2D Gaussians, while our SDF branch optimizes a continuous field. Compared to a global representation, discrete primitives are more flexible for representing holes, whereas a global representation is better at capturing continuous surfaces. Additionally, we introduced a curvature term in the loss function when optimizing the SDF to avoid spurious holes. However, this term can lead to over-smoothing, which may explain the missing fine holes in the Truck scene.
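As an illustration, a curvature-style regularizer of this kind is often written as a penalty on how quickly the SDF normal changes under a small perturbation (a sketch under that assumption; the exact term may take a different form):

```python
import torch
import torch.nn.functional as F

def sdf_normal(sdf_fn, pts):
    """Unit normal as the normalized autograd gradient of the SDF."""
    pts = pts.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(sdf_fn(pts).sum(), pts, create_graph=True)
    return F.normalize(grad, dim=-1)

def curvature_loss(sdf_fn, pts, eps=1e-3):
    """Penalize normal variation between a point and a nearby perturbed point;
    larger weights smooth the surface more (and can over-smooth fine holes)."""
    n0 = sdf_normal(sdf_fn, pts)
    n1 = sdf_normal(sdf_fn, pts + eps * torch.randn_like(pts))
    return (1.0 - (n0 * n1).sum(dim=-1)).mean()
```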
I would like to thank the authors for the clarifications as well as the efforts they made during the rebuttal.
The rebuttal provides convincing additional experiments and addresses all of my concerns.
While the overall strategy of the paper is not particularly novel (combining 3DGS and SDF branches for surface reconstruction and high-quality rendering), I believe the authors presented a technically solid work with convincing quantitative and qualitative results.
I also appreciate the additional experiment consisting of replacing Scaffold-GS with vanilla 3DGS; the high-quality extracted surface obtained with vanilla 3DGS (see the Barn reconstruction in the rebuttal PDF) shows that the proposed regularization may indeed act as a more general pipeline for combining neural SDFs and 3DGS-based radiance fields, and inspire future work.
For these reasons, I decide to increase my rating.
Thanks for your comment, we appreciate your effort to help us validate the generalization of the proposed framework.
This paper proposes to jointly optimize 3D Gaussian Splatting (3DGS) and SDF (like NeuS). During GS optimization, the Gaussians are aligned to the zero-level set (and normals) of SDF. During the NeuS-like optimization, Gaussians are used to limit the range of ray sampling, resulting in efficient optimization. Experiments show that the proposed method achieves better reconstruction and rendering compared with SDF-based methods (e.g., NeuS) and GS-based methods, including recent SuGaR and 2D-GS, which explicitly align Gaussians to surfaces.
Strengths
- This paper proposes a nice combination of 3DGS and SDF (but not the first to do so, which I'll detail in the weaknesses section). Both Gaussians and SDF representations provide merit to each other, resulting in accurate reconstruction and rendering.
- Experiments are well done. Very recent methods, like 2D Gaussian Splatting and SuGaR, are adequately evaluated, showing the merit of jointly optimizing Gaussians and SDF, rather than just aligning Gaussians to the surfaces.
Weaknesses
Related work
This method is not the first one combining 3DGS and SDF, as mentioned in the related work (NeuSG [5]). Regarding the NeurIPS CfP, "papers that appeared online within two months of a submission will generally be considered "contemporaneous" in the sense that the submission will not be rejected on the basis of the comparison to contemporaneous work."
In other words, NeuSG, which was submitted to arXiv on Dec 1, 2023, has to be regarded as an "official" existing work. Thus, the paper should discuss the technical differences from NeuSG in more detail. I basically agree with the argument in L81-86 that mentions the subjective merit of the proposed method compared to NeuSG; the proposed method does significantly more for tighter integration of 3DGS and SDF.
Meanwhile, readers should wonder whether the proposed method achieves better results than NeuSG or not. Unfortunately, the code of NeuSG does not seem to be available, and an official comparison may be difficult. I have not gone through the NeuSG paper in depth and may be mistaken, but I wonder if the authors could show a quick ablation using a baseline that somewhat mimics a simplified version of NeuSG, by only using normal supervision from the SDF and forcing the Gaussians to be flat.
Questions
- Although I may be missing some information, an interesting potential merit of the proposed method compared to SDF-based methods is the efficient ray sampling, while the overall optimization should take longer than vanilla 3DGS. I would like to see a discussion of training time compared to those methods.
Limitations
I did not find notable negative social impacts.
Thanks for your efforts and valuable comments. Below we address the concerns for each question. Common concerns are addressed in detail in the global rebuttal. Additional figures and tables are provided in the attached PDF, indexed as Figures A-D and Tables A-B.
Q1. Comparison to NeuSG.
A1: Please refer to the common responses for a detailed explanation.
Following the reviewer's recommendation ("by only using normal supervision from the SDF and forcing the Gaussians to be flat"), we implemented NeuSG. Figure D shows that GSDF achieves better reconstruction than NeuSG. Additionally, NeuSG's rendering quality decreased from 28.77 to 28.63 PSNR, while GSDF increased PSNR to 28.93.
Q2. Time consumption.
A2: Please refer to the common responses for a detailed explanation.
Thanks for the rebuttal and additional results. I would also like to discuss these with the other reviewers.
Thanks for your comment, we appreciate your effort.
This paper tackles the challenge of representing 3D scenes from multiview images by introducing a novel dual-branch architecture named GSDF, which combines 3D Gaussian Splatting and neural Signed Distance Fields. This architecture enhances both rendering and reconstruction through mutual guidance and joint supervision. By aligning Gaussian primitives with potential surfaces and speeding up SDF convergence, the method achieves finer geometric reconstructions and minimizes rendering artifacts. Demonstrating robustness and accuracy, the approach is effective in both synthetic and real-world scenarios.
Strengths
The paper employs a dual-branch approach for simultaneous scene rendering and mesh reconstruction. It utilizes an SDF (Signed Distance Field) branch to guide the geometric optimization of the Gaussian branch, including operations such as adding and removing points. By leveraging bidirectional optimization, the method simultaneously enhances the reconstruction and rendering quality of both branches. This approach ensures high rendering quality while effectively reconstructing the scene's mesh, achieving smooth reconstruction results on both object-level and scene-level datasets. Additionally, the method has also demonstrated commendable performance in novel view synthesis tasks.
Weaknesses
It appears that in the reconstruction and rendering branches, the authors have merely pieced together appropriate solutions. Although bidirectional optimization has been applied to both branches, the unity of the method seems lacking. The interrelation between the two branches is weak, resembling an incremental improvement rather than a cohesive, integrated advancement.
Questions
- From which branch are the rendering and reconstruction results mentioned in the paper derived? Are both rendering and reconstruction results obtained from the 3DGS branch, or is rendering performed by the 3DGS branch while the mesh is obtained from the Instant-NSR branch?
- Can using pseudo-depth and normals (obtained from other estimation algorithms) directly enhance Instant-NSR or Gaussian, potentially achieving good reconstruction and rendering results without the need for bidirectional optimization?
- Which branch's rendering results are displayed in the ablation study? If it is the Instant-NSR branch, then different strategies for adding and removing points should not affect its rendering results. Similarly, if it is the 3DGS branch, should depth-guided sampling also not affect its rendering outcomes?
Limitations
Yes
Thanks for your efforts and valuable comments. Below we address the concerns for each question. Common concerns are addressed in detail in the global rebuttal. Additional figures and tables are provided in the attached PDF, indexed as Figures A-D and Tables A-B.
Q1. Output of each branch.
A1: In our framework, we use the SDF-branch to reconstruct accurate geometry and the GS-branch to render the images. Please refer to the common responses for a detailed explanation.
Q2. Pseudo-depth and normals from other estimation algorithms.
A2: Using pseudo-depth and normals from other estimation algorithms is possible. However, it would make the method highly dependent on the quality of these algorithms. In contrast, our framework allows the two branches to mutually promote each other, resulting in a more robust approach.
Q3. Ablations on sampling process.
A3: The rendering results in the ablation study are from the GS branch, and the reconstruction results are from the SDF branch. 'Depth-guided sampling' and other operations affect both branches due to the 'Mutual Geometry Supervision' mechanism.
I would like to thank the authors for the clarifications as well as the efforts they made during the rebuttal.
Thanks for your comment, we appreciate your effort.
This paper introduces GSDF, which utilizes joint optimization of a GS branch and an SDF branch to constrain the inherent geometric issues of 3DGS. Furthermore, it proposes three mutual guidance mechanisms to ensure satisfactory outcomes in both rendering and reconstruction. Extensive experiments on datasets such as DTU, Tanks and Temples (T&T), and Mip-NeRF 360 demonstrate that the method can achieve high rendering quality while obtaining better geometric results.
Strengths
- The article is clearly written, allowing one to easily understand the motivation in designing the two branches.
- The author conducted extensive experiments across various datasets to illustrate the high quality of rendering and geometric results achieved by GSDF.
- Utilizing the SDF-branch to supervise the inaccurate geometry or depth of 3DGS seems to be reasonable.
Weaknesses
- Instead of sampling based on the predicted SDF, GSDF samples near the depth from the GS branch. However, I suspect the imprecise depth of 3DGS could influence the SDF branch. I am curious whether there is a difference between the results of the SDF branch and the baseline's SDF-branch results.
- Which branch outputs the final result of the GSDF method? If the GS branch outputs both the rendering and geometry results, what about the rendering and geometry outputs of the other branch?
- It seems that the time consumption is missing from Table 1 and Table 2, and the quantitative result of NeuS is missing from Figure 4. Additionally, as reported on lines 248-249, the time consumption of GSDF (2 hours) is significantly higher compared to 2DGS (5-10 minutes) and the original 3DGS. This substantial gap may be attributed to the volumetric rendering of the SDF branch; thus, the improved geometry results might be benefiting from this branch. It may be necessary to compare it with some neural implicit reconstruction methods (like Neuralangelo) to better evaluate its performance given the additional computational cost.
- In Table 3 of the ablation study, if depth-guided sampling is removed (w/o depth-guided sampling), one would need to consider whether the SDF branch is now completely identical to the baseline NeuS. At this point, it would be important to assess whether the results of the SDF branch are close to those of the baseline SDF branch. Furthermore, as mentioned in lines 248-249, one should investigate whether the time consumption has significantly increased due to the absence of depth-guided sampling, given that this might affect the efficiency of the volumetric rendering performed by the SDF branch.
- Regarding the results in Table 1, the CD results of NeuS and 2DGS are not as good as those reported in their papers. Additionally, there might be some ambiguity in the symbolic expressions presented in the text, e.g., is F_sdf in line 154 the same as F_s in line 139?
Questions
Please refer to the weaknesses section.
Limitations
The authors didn't explicitly address the limitations of their approach.
Thanks for your efforts and valuable comments. Below we address the concerns for each question. Common concerns are addressed in detail in the global rebuttal. Additional figures and tables are provided in the attached PDF, indexed as Figures A-D and Tables A-B.
Q1. Imprecise depth; SDF-branch vs. Baseline's SDF-branch.
A1: As we described in the paper (L124-133), the GS-branch is effective at locating the sampling area, and accurate depth is not a strict requirement. The depth-guided sampling uses the SDF value at the back-projected depth point to set the sampling interval, considering both the depth from the GS-branch and the predicted SDF from the SDF-branch. Additionally, to enhance robustness, we warm up the GS-branch to provide a coarse depth. Extensive experiments demonstrate that GSDF remains robust even if the depth from the GS branch is not precise.
There are noticeable differences between the results of SDF-branch and the baseline's SDF-branch. In Figure 4 of the main paper, the first column is the results of the baseline's SDF-branch and the last column shows the results of our SDF-branch.
Q2. Each branch's output.
A2: Note that, our framework features a two-branch design, where each branch excels in its respective task while benefiting from mutual supervision: the SDF-branch specializes in accurate geometry reconstruction, while the GS-branch focuses on high-quality image rendering. This mechanism enables our approach to achieve superior results in both individual methods (L24-27).
Q3. Time consumption; comparison with NeuS and Neuralangelo.
A3: We did not include NeuS results in Figure 4 of the main paper because NeuS struggles with reconstructing complex scenes. We present several object-level cases in Figure C, showing GSDF consistently outperforming NeuS. We included the time consumption for GSDF and single-branch methods in Table A. Although GSDF is slower per iteration, it achieves faster SDF convergence compared to the SDF branch alone in terms of both iteration and training time (see Figure A). Importantly, GSDF yields significantly better quality results.
Following the reviewer's suggestion, we compared GSDF with Neuralangelo. Neuralangelo requires over 12 hours on 2 GPUs and produces inferior results compared to GSDF, as shown in Figure D. Additionally, the actual time for 2DGS in real scenes is longer than 5-10 minutes, and its rendering quality is degraded, as shown in Tables A and B.
Q4. Ablation of the sampling process.
A4: In Table 3 of the main paper, when depth-guided sampling is removed, we switch to SDF-guided sampling, as used in most NeuS-based reconstruction methods. We compared the SDF convergence speed between GSDF and our SDF branch alone. As shown in Figure A, GSDF achieves faster SDF convergence in both training iterations and time. Specifically, following [1], we use kernel size as an indicator of reconstruction quality, where a smaller kernel size indicates better geometry. Our method consistently achieves better results with faster convergence.
Reference: [1] Wang Z, Shen T, Nimier-David M, et al. Adaptive shells for efficient neural radiance field rendering. ACM Trans. Graph., 42, 2023.
Q5. The configuration of the reported results and notation.
A5: Since our method is not purely a reconstruction method, we split the dataset into training and test sets, with 1 test view for every 8 views. This differs from the default settings in NeuS and 2DGS, which used all images for training. Therefore, we used their officially released code to train the models with our settings.
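For illustration, the every-8th-view convention mentioned above could look like the following (variable names and the exact indexing offset are illustrative):

```python
def split_train_test(num_images, hold_every=8):
    """Hold out one test view for every `hold_every` views; train on the rest."""
    test_ids = [i for i in range(num_images) if i % hold_every == 0]
    train_ids = [i for i in range(num_images) if i % hold_every != 0]
    return train_ids, test_ids
```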
Regarding the notation, F_sdf and F_s are indeed the same MLP. We will clarify this in the revised version of the paper.
Thanks for your clarification. My concerns are partially resolved, and I raised my rating.
Thanks for your comment, we appreciate your effort.
We thank all reviewers for their valuable feedback. We are encouraged that reviewers find
- our two-branch design is novel and effective in boosting reconstruction and rendering quality simultaneously;
- our analysis and experiments are useful and comprehensive.
We will release our code for reproduction and future research. Here we address some common concerns.
1. Function of each branch
Note that, our framework features a two-branch design, where each branch excels in its respective task while benefiting from mutual supervision: the SDF-branch specializes in accurate geometry reconstruction, while the GS-branch focuses on high-quality image rendering.
We emphasize that achieving good quality in both rendering and reconstruction is extremely hard, and to the best of our knowledge no recent method attains this goal. As described in the paper (lines 24-26), a single representation alone struggles to achieve good quality in both reconstruction and rendering. Through thorough analysis, we noticed that a naive integration (e.g., guidance merely through losses) hardly balances the learning priorities during training; thus, it can only boost the quality of either rendering or reconstruction at the cost of sacrificing the other.
Instead, we dug deeper into the architectural characteristics of the two branches and propose tight guidance at the architecture level, e.g., using the predicted depth from the GS-branch to guide the ray sampling of the SDF-branch (significantly accelerating the convergence of the SDF-branch) and using the SDF field to guide the densification of the GS-branch (geometry-based density control rather than a heuristic strategy). The effectiveness is confirmed through comprehensive evaluations, showing improved reconstruction and rendering quality.
2. Time consumption
Inference time remains unaffected as each branch can be used individually.
Regarding training time:
- In each iteration, the training overhead of our two-branch design is higher than that of single-branch designs (Table A). However, GSDF achieves better reconstruction quality compared to the SDF-branch alone when trained for the same amount of time/iterations, as illustrated in Figure A.
- The improvements in both rendering and reconstruction quality are non-negligible. The effectiveness of our proposed framework comes from the combination of local and global optimization. By introducing mutual optimization, the training time inevitably increases. However, without much overhead on training efficiency, we achieve improvements in both rendering and reconstruction, while our design does not affect the inference stage, guaranteeing efficiency in downstream applications.
- The core contribution is our design of the two-branch (geometry + rendering) framework, where each branch can be updated to a more efficient version in the future. We demonstrated the generalizability of our framework by switching the GS-branch from Scaffold-GS to 3DGS, which still exhibited superior rendering and reconstruction quality (Figure D, Table B).
3. Comparison with Concurrent work
The concurrent work NeuSG aims for improved reconstruction; it augments the SDF branch with an SDF loss that encourages the SDF values of Gaussian centers and MVS points to be zero, and a normal loss that encourages normal consistency between the SDF and the Gaussians. Extra regularization losses (including flattening the Gaussian shape) are introduced to make the Gaussians more geometry-friendly, regardless of the potential sacrifice in rendering quality.
GSDF has critical differences compared to NeuSG:
- Instead of only focusing on reconstruction, GSDF aims to improve both geometry and rendering, whose effectiveness has been verified by extensive experiments.
- Beyond loss-based guidance, we investigate the combination inside the model architecture and propose a tightly-coupled two-branch design including depth-guided sampling and SDF-guided densification.
- Moreover, GSDF does not require an extra MVS process to provide accurate geometry, making ours a more versatile method requiring only coarse initialization.
The ratings for this paper include two accepts (LpsC and H3YA) and two borderline accepts (96yp and yTkf). All reviewers acknowledge the effectiveness of the method presented in this paper. The combined 3DGS and SDF branches mutually enhance each other, improving both reconstruction and rendering quality. Reviewers H3YA and LpsC have concerns about the paper's contribution and originality. All reviewers raised some technical concerns. The rebuttal addressed most of these concerns, so the AC recommends acceptance.