Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering
We propose a normal-involved rendering strategy for 3DGS, termed Normal-GS, which enhances both rendering quality and normal estimation accuracy.
Abstract
Reviews and Discussion
The authors propose a novel appearance modeling technique for an anchor-based 3D Gaussian Splatting representation based on normal information and an incident-lighting parametrization with MLPs. In addition to the representation, several regularization techniques are used to stabilize normal and incident-light optimization. The central claim of the paper is to achieve competitive rendering quality while recovering more detailed geometric information. The authors show experiments on the Mip-NeRF 360, Tanks&Temples, Synthetic-NeRF, and Deep Blending datasets and provide a quantitative comparison of rendering quality and normal accuracy.
Strengths
- The paper is easy to read and all components are well explained.
- The conducted experiments use the most relevant datasets and metrics to assess image quality: LPIPS, SSIM, and PSNR.
- The underlying research problem of reconstructing accurate appearance and geometry is relevant for the field of neural rendering and a very active research area.
Weaknesses
- The experimental evaluation only contains 3DGS-based methods. Even though there are inherent advantages of 3DGS methods, I'd still expect a comparison to NeRF-based methods on the same research problem, e.g. Ref-NeRF and follow-ups. The central question here would be: how good is the proposed method compared to the best neural-field-based method?
- The general definition of normals in the context of 3D Gaussians sounds a bit vague. The authors say that they define the normal of a Gaussian primitive as the shortest axis of the 3D Gaussian without actually defining a surface. How do you define the normals when Gaussians overlap or are semi-transparent?
- The authors claim to improve geometric quality and show the comparison in normal accuracy in Table 2. From a 3D reconstruction perspective, high normal accuracy can indeed indicate high quality; however, metrics on the reconstructed geometry, such as Chamfer distance or F1-score, might be a better indicator of geometric quality.
Questions
It would be great if the authors could provide explanations for the weaknesses.
Limitations
The authors addressed valid limitations in the discussion section.
We thank the reviewer for helpful comments and suggestions. We are glad to address the issues raised in the review.
Q1: Comparison to NeRF-based methods
We would like to include comparisons to NeRF-based methods here and in the final version. It is important to highlight that 3DGS-based methods, including ours, are relatively fast and can support real-time rendering compared with NeRF-based methods. As illustrated in Table C, on the Mip-NeRF 360 dataset, our method performs second best in terms of PSNR and achieves comparable SSIM and LPIPS scores. The proposed IDIV provides better novel view synthesis quality compared with Ref-NeRF, which further supports the novelty of our design. Please also note that the training time for Zip-NeRF is > 10 hours on high-performance GPUs, which is much longer than for 3DGS-based methods, including ours.
| Method | PSNR | SSIM | LPIPS |
|---|---|---|---|
| 3DGS [2] | 28.691 | 0.870 | 0.182 |
| Mip-NeRF 360 [26] | 29.231 | 0.844 | 0.207 |
| Ref-NeRF [22] | 28.553 | 0.849 | 0.196 |
| Zip-NeRF [30] | 30.077 | 0.876 | 0.170 |
| Ours | 29.341 | 0.869 | 0.194 |
Table C. Rendering quality comparisons on the Mip-NeRF 360 dataset.
Q2: Definition of normals in the context of 3D Gaussians.
We acknowledge that we did not define normals for true surfaces but rather established a normal direction for each Gaussian. As discussed on L202-203, we define the normal for a 3D Gaussian based on its geometric properties. Specifically, during optimization, 3D Gaussians often exhibit flatness around surface areas, as observed in [5, 7, 45]. For a near-flat Gaussian ellipsoid, the shortest axis thus functions as the normal vector. Consequently, we use the shortest axis of each Gaussian as its normal attribute. For areas overlapped by multiple Gaussians, the normal is the weighted average of the overlapping Gaussians' normals. As shown in Fig. 1, we found that with our IDIV design, the model can render a reasonable surface normal even for the semi-transparent cover under this definition. Moreover, defining physically correct normals for semi-transparent areas remains an interesting open problem, which we leave for future work.
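This definition can be sketched in a few lines. The function names and the use of front-to-back alpha-compositing weights for the overlap case are our illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def gaussian_normal(rotation, scale):
    """Normal of one 3D Gaussian: the ellipsoid axis with the smallest scale.

    rotation: (3, 3) rotation matrix whose columns are the ellipsoid axes.
    scale: (3,) per-axis scales of the Gaussian.
    """
    shortest = np.argmin(scale)
    return rotation[:, shortest]

def blended_normal(normals, alphas):
    """Weighted average of per-Gaussian normals along a ray, using
    standard front-to-back alpha-compositing weights (our assumption)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(transmittance * a)
        transmittance *= 1.0 - a
    n = np.sum(np.array(weights)[:, None] * np.asarray(normals, dtype=float), axis=0)
    norm = np.linalg.norm(n)
    return n / norm if norm > 0 else n
```

For a near-flat Gaussian the shortest axis dominates, so this reduces to the paper's per-Gaussian normal attribute; the blending step is one plausible reading of "weighted average of overlapped Gaussians".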
Q3: Metrics on the reconstructed geometry other than the normal accuracy.
We agree that the overall geometry quality is also important, in addition to normal accuracy. We follow your suggestion and add experiments evaluating the reconstruction quality of our method on the DTU dataset in terms of mean Chamfer distance. We also report the rendering quality indicated by PSNR to demonstrate the advantages of our method. The results of 3DGS [2], SuGaR [8], and 2DGS [44] are adopted from Table 3 of 2DGS [44]. For a fair comparison, we conduct experiments under the same setting as 2DGS [44]. As Table A illustrates, our method achieves outstanding rendering quality while maintaining better geometric quality than 3DGS.
| Method | Mean Chamfer Distance | PSNR |
|---|---|---|
| 3DGS [2] | 1.96 | 35.76 |
| SuGaR [8] | 1.33 | 34.57 |
| 2DGS [44] | 0.80 | 34.52 |
| Ours | 0.94 | 37.63 |
Table A. Geometric and rendering quality comparisons on the DTU dataset.
We appreciate your recommendations regarding the comparisons to NeRF-based methods, clarification of defining normals, and measurements of the overall geometry quality. We will add them in the revised version.
Dear Reviewer Gwpd,
Thanks again for your thoughtful review, which helped us improve the quality and clarity of our paper. We sincerely hope that our rebuttal has addressed your questions and concerns. If possible, please let us know whether there are any additional clarifications that we can offer. We appreciate your valuable suggestions.
Thank you very much for your time,
Best Regards.
Authors of Submission 8220
Dear authors,
Thank you, I appreciate your efforts in answering my concerns. The provided results show that the actual reconstructed geometry is accurate, and I strongly encourage the authors to add the answer to Q3 to the main paper. The comparison against the NeRF baselines provides evidence that it is also competitive with this line of work. As the authors resolved my main concerns, I increase my rating to weak accept.
Best,
Reviewer Gwpd
Dear Reviewer Gwpd,
We are glad to include the suggested measurements of the overall geometry quality, clarifications regarding the normal definition, and comparisons to NeRF-based methods. Your valuable feedback has significantly improved the quality and completeness of our paper. We thank you again for your time and efforts during the review process.
Best Regards,
Authors of Submission 8220
Normal-GS successfully combines color estimation with surface normal optimization, achieving remarkable surface normal prediction without sacrificing view synthesis quality compared to previous methods. In calculating diffuse color, the traditional approach of extracting the surface normal from the incident-light integral is used, so the diffuse color is expressed as a simple product of the diffuse albedo, the normal, and the integrated incident light (referred to as IDIV in the paper). Based on Scaffold-GS, which predicts multiple nearby 3DGS parameters by passing a single anchor neural Gaussian to a global MLP, IDIV is calculated for each anchor and used for diffuse color prediction. Additionally, by integrating the IDE proposed in Ref-NeRF into Scaffold-GS, the specular component is successfully represented with respect to the normal. Finally, geometry is regularized by comparing the rendered surface normal with the normal obtained from the depth map.
Strengths
The idea of approximating the diffuse color equation in terms of the surface normal by extracting the normal from the integral is both simple and powerful.
Additionally, the geometry inaccuracy that can occur due to view-dependent effects is effectively handled by using IDE to manage specular artifacts.
The proposed method compares well with other recent methods that demonstrate robust surface normals in view synthesis.
Detailed explanations of IDE and surface normal estimation are well-documented in the supplementary materials, making it easy to follow.
Weaknesses
Due to the lack of explanation of Scaffold-GS in the related work section, the description of the network output is confusing. While it is unnecessary to provide additional details about Scaffold-GS, it is important to clearly explain the network output.
As I understand it: In Scaffold-GS, an anchored feature f_v is fed into a global MLP to initially predict the parameters of ‘k’ adjacent 3DGS. In Normal-GS, according to line 194, global MLP theta_l is used to additionally predict IDIV. For diffuse color, it is directly calculated using the predicted IDIV and diffuse albedo. For specular color, it is calculated by passing IDE, normal, and feature into the color MLP theta. However, there is no mention of how components like opacity and diffuse albedo are calculated. It is unclear whether these components are embedded and stored like the original 3DGS, or if they are calculated through a separate MLP. Therefore, it would be helpful to specify which MLP is responsible for each component, similar to IDIV, to clarify the process.
There are no quantitative ablation results. Additionally, there is no ablation of the depth-normal loss.
Personally, I think this paper, due to its contributions through the refactorization of the rendering equation, would be more suitable for a computer vision or graphics conference.
Questions
How is "3DGS w/ IDIV" in the ablation studies (section 4.2) (b) implemented? Did you create an additional anchor neural Gaussian to the original 3DGS and use an global MLP to predict IDIV, or did you place it as an optimization component like opacity in each 3DGS and use it for rendering? Additionally, in (c), "ours" generally refers to the complete Normal-GS, so the meaning of "w/ IDIV" is unclear. If (c) refers to Normal-GS without the specular component, it would be better to label it as "Ours w/o L_s."
In Figure 2, it seems that there is a typo where the representation of the "vector" in the "IDIV vector" is duplicated.
Limitations
As mentioned in the paper, learning the depth of distant objects, like the sky, consistently is extremely challenging. Therefore, it seems that learning surface normal based on depth will also be very difficult.
Additionally, as described in the weaknesses section, clearer network outputs would be beneficial.
We thank the reviewer for appreciating our work and making constructive comments.
Q1: Clearly explain the network output, and how components like opacity and diffuse albedo are calculated.
We would like to clarify this here and in the final version. The procedure for calculating IDIVs and the specular color is exactly as you describe in the second paragraph of the Weaknesses section and as we explain on L194-197. As for the opacity, because the updating strategy of Scaffold-GS depends on the opacity value, we followed Scaffold-GS and predicted it using a global MLP. As for the diffuse albedo, we reused the color MLP of Scaffold-GS to predict it.
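As a minimal sketch of how such a diffuse term could be assembled — the function name, the clamp for back-facing illumination, and treating IDIV as a single 3-vector shared across channels are our simplifying assumptions, not the paper's exact formulation:

```python
import numpy as np

def diffuse_color(albedo, normal, idiv):
    """Diffuse color as albedo times the dot product of the surface
    normal with the Integrated Directional Illumination Vector (IDIV).

    albedo: (3,) per-channel diffuse albedo
    normal: (3,) unit surface normal
    idiv:   (3,) IDIV, here a single vector shared across channels
    """
    # Clamping negative irradiance to zero is our assumption for
    # back-facing illumination.
    irradiance = max(float(np.dot(normal, idiv)), 0.0)
    return albedo * irradiance
```

The key point is that the photometric loss on the output color now backpropagates directly into the normal through the dot product.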
Q2: How is “3DGS w/ IDIV” in the ablation studies (section 4.2) (b) implemented?
For “3DGS w/ IDIV” in Fig. 5 (b), we attached the IDIV as an additional attribute to each 3D Gaussian without using global MLPs. As described on L299, in this setting we applied the Laplacian and Total-Variation loss to IDIVs. This was done by first rendering IDIVs following the 3DGS rendering pipeline and then applying image-space regularizers. Please note that this setting was “sensitive to the tuning of loss weights” as discussed on L302 and motivated us to use the locally shared structure, such as Scaffold-GS, to regularize IDIVs.
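The image-space regularizers on the rendered IDIV map might look like the following NumPy sketch; the exact kernels and loss weights used in the paper may differ:

```python
import numpy as np

def tv_loss(idiv_map):
    """Total-variation penalty on a rendered (3, H, W) IDIV image:
    mean absolute difference between neighboring pixels."""
    dh = np.abs(idiv_map[:, 1:, :] - idiv_map[:, :-1, :]).mean()
    dw = np.abs(idiv_map[:, :, 1:] - idiv_map[:, :, :-1]).mean()
    return dh + dw

def laplacian_loss(idiv_map):
    """Discrete 4-neighbor Laplacian smoothness penalty on the same map."""
    lap = (
        -4.0 * idiv_map[:, 1:-1, 1:-1]
        + idiv_map[:, :-2, 1:-1] + idiv_map[:, 2:, 1:-1]
        + idiv_map[:, 1:-1, :-2] + idiv_map[:, 1:-1, 2:]
    )
    return np.abs(lap).mean()
```

Both terms vanish on a spatially constant IDIV map, which is why their weights need careful tuning against the photometric loss, matching the sensitivity noted above.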
Q3: Quantitative Ablations.
Thanks for the constructive suggestions. To better verify the effectiveness of the components, we add quantitative results on DTU, comparing the mean Chamfer distance and PSNR for geometry and rendering quality evaluation. Please refer to Table B. Our base model is Scaffold-GS. 1) We add the depth-normal loss to Scaffold-GS; the results of (b) in Table B show that the depth-normal loss significantly improves the geometry quality but degrades the rendering quality by more than 2 dB PSNR. 2) We then introduce the specular component into (b), which improves the rendering quality without compromising the geometry and further confirms the importance of involving normals in the rendering. 3) We add the proposed IDIV into the model to get our final model (d). Our full model further improves the PSNR by more than 0.2 dB with similar geometry quality. Compared with the base model (a) and the results of 3DGS (CD: 1.96, PSNR: 35.76), our full model achieves a better balance between geometry quality and rendering fidelity.
| Model | Details | Mean Chamfer Distance | PSNR |
|---|---|---|---|
| (a) | Scaffold-GS | 1.84 | 38.14 |
| (b) | (a) + Depth-normal loss | 0.95 | 35.90 |
| (c) | (b) + Specular component | 0.93 | 37.37 |
| (d) | (c) + IDIV (Full model) | 0.94 | 37.63 |
Table B. Quantitative ablation studies on the DTU dataset.
Q4: Some typos.
Thank you for pointing out these typos. We will fix them in the final version.
We will also refine our writing as suggested and include the mentioned experiments in the revised version. We appreciate the reviewer's valuable help and comments.
Dear Reviewer cnbL,
Thanks again for your thoughtful review, which helped us improve the quality and clarity of our paper. We sincerely hope that our rebuttal has addressed your questions and concerns. If possible, please let us know whether there are any additional clarifications that we can offer. We appreciate your valuable suggestions.
Thank you very much for your time,
Best Regards.
Authors of Submission 8220
I somewhat agree with the comment from reviewer 5i8r that the motivation for directly optimizing the surface normals, which are byproducts of geometry, without targeting the geometry itself, is weak.
From the perspective of geometry estimation, I believe that comparisons with methods like SuGaR and 2DGS, which were conducted in the rebuttal, should be included in the paper.
However, since the calculation of normals is essential for using geometry as a rendering component, I still believe this research holds value.
Assuming that the implementation details and comparisons with existing surface reconstruction methods are supplemented, I will maintain my score.
Dear Reviewer cnbL,
We are glad to include the implementation details, suggested quantitative ablations, and comparisons with existing surface reconstruction methods. We sincerely appreciate your constructive comments and feedback, which have significantly enhanced the quality and completeness of our paper. Thank you again for your time and efforts during the reviewing process.
Best Regards,
Authors of Submission 8220
This paper proposes a novel appearance model for 3D Gaussian Splatting (3DGS) targeting both accurate appearance representation and geometry reconstruction. Existing 3DGS methods suffer from a trade-off between appearance and geometry accuracy due to the disconnection between surface geometry and rendering. To address this problem, this paper reformulates the rendering equation for fast physically-based 3D Gaussian (Normal-GS) rendering, directly connecting surface normals and rendered color. For stable optimization, Normal-GS uses anchor-based MLPs that implicitly account for the local smoothness of local illumination. The experimental results show that Normal-GS reconstructs accurate surface normals while retaining the rendering quality of 3DGS.
Strengths
- Propose a novel physically-based rendering method for 3D Gaussian Splatting that can use shading cues for normal estimation, backpropagating photometric loss to the surface normal of 3D Gaussian.
- Exploit anchor-based 3DGS to represent and stably optimize local incident light without time-consuming ray tracing.
- Experimentally show that the proposed method can achieve both high-quality rendering and accurate normal estimation even in complex lighting and specular scenes where existing methods degrade.
Weaknesses
- The illumination of diffuse and specular components is independent of each other. This is physically implausible.
- The anchor-based regularization would prevent the reconstruction of detailed geometry like bicycle spokes in Fig. 6. Furthermore, since IDIV depends on the surface normal, it also requires a smooth surface normal.
Questions
- How plausible are the optimized diffuse and specular reflection components? What if only either of them is rendered?
- Can the IDIV be visualized to validate that it captures local incident light? For example, in outdoor scenes, it is oriented to the direction of the sun and other directions outside and within the shadow region, respectively.
- Scaffold-GS uses enhanced view-dependent features instead of directly using a local feature. Which features does this method use? Regarding both geometry and appearance modeling, view-independent Gaussian attributes would be desirable.
Limitations
Yes.
We thank the reviewer for appreciating our work and providing constructive comments.
Q1: The illumination of diffuse and specular components is independent of each other. This is physically implausible.
We agree that the diffuse and specular components cannot be physically independent. However, our main purpose is to strike a better balance between rendering and geometry quality. Our insight is to keep the model simple without solving the highly unconstrained inverse rendering problem, which would require many extra regularizers. We do believe our framework could be further improved by introducing additional parameters such as metallicness [16, 17, 18, 45] for downstream applications.
Q2: Visualization of Diffuse, Specular and IDIVs.
Thanks for the helpful suggestions. We are glad to provide more results to show the plausibility of our method. We show the visualization of our specular and diffuse components in Figure A of the one-page PDF. We choose scenes containing specular materials to better demonstrate the effectiveness of our method. From the visualization, we can observe that our method successfully disentangles the diffuse and specular components. We also visualize IDIVs in Figure B of the one-page PDF. We choose an outdoor scene for better visualization. We can observe that IDIVs approximately align with the sunlight in bright regions (on the table) and roughly diverge in the shadow region (under the table), which indicates that our IDIVs capture local incident lighting information.
Q3: The anchor-based regularization would prevent the reconstruction of detailed geometry like bicycle spokes in Fig. 6.
We acknowledge that detailed geometry like bicycle spokes is not fully modeled by our method. Thin structures are a challenging problem for discrete Gaussian representations and are also related to the aliasing issue. We leave this for future work.
Q4: Furthermore, since IDIV depends on the surface normal, it also requires a smooth surface normal.
We agree with your comment. A smooth and correct normal is important for many applications, including estimating lighting and capturing specular effects. We applied the depth-normal loss to regularize the estimated normals.
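A common way to implement such a depth-normal consistency term is sketched below under our own assumptions (finite-difference tangents on the backprojected depth map and a cosine penalty); the paper's exact loss may differ:

```python
import numpy as np

def normal_from_depth(points):
    """Pseudo-normals from a backprojected depth map.

    points: (H, W, 3) camera-space 3D points from the rendered depth map.
    Returns (H-2, W-2, 3) unit normals from the cross product of the
    horizontal and vertical finite-difference tangents.
    """
    dx = points[1:-1, 2:] - points[1:-1, :-2]
    dy = points[2:, 1:-1] - points[:-2, 1:-1]
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n

def depth_normal_loss(rendered_normals, depth_normals):
    """Penalize disagreement between splatted normals and depth-derived
    normals via 1 - cosine similarity, averaged over pixels."""
    cos = np.sum(rendered_normals * depth_normals, axis=-1)
    return np.mean(1.0 - cos)
```

The loss is zero when the rendered and depth-derived normals agree everywhere, so gradients flow only where the two geometry cues disagree.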
Q5: Scaffold-GS features.
We agree that view-independent Gaussian attributes are desirable. However, during our experiments, we found that view-dependent features would produce slightly better rendering quality (~0.1-0.2 dB). We believe that the viewing direction could help determine the inside-outside direction of the surface, thus benefiting the rendering quality.
We are glad to include the suggested results in the revised version. Thank you again for your valuable feedback.
Dear Reviewer 68er,
Thanks again for your thoughtful review, which helped us improve the quality and clarity of our paper. We sincerely hope that our rebuttal has addressed your questions and concerns. If possible, please let us know whether there are any additional clarifications that we can offer. We appreciate your valuable suggestions.
Thank you very much for your time,
Best Regards.
Authors of Submission 8220
I appreciate the authors addressing all my questions. I raise my rating from Borderline Accept to Weak Accept.
Dear Reviewer 68er,
We appreciate your time and valuable feedback during the reviewing process. We believe your suggestions help improve the clarity and plausibility of our paper. We will include the suggested results and clarifications in our final version.
Best Regards,
Authors of Submission 8220
This paper addresses the challenge of achieving high rendering quality and accurate geometry in computer vision and graphics. While recent advancements in 3D Gaussian Splatting (3DGS) have enabled real-time high-fidelity novel view synthesis, the discrete and noisy nature of 3D Gaussian primitives hinders accurate surface estimation. The authors propose Normal-GS to integrate normal vectors into the 3DGS rendering pipeline. Surface colors are re-parameterized as the product of normals and a specially designed Integrated Directional Illumination Vector (IDIV).
Strengths
- The writing of this paper is good. It is easy to follow and understand the technical details.
- The usage of PBR for optimizing normals of 3DGS is interesting, though it is not new.
- Although the experimental results show only slightly better quantitative measurements compared to SpecGaussian, the quality of the normals is significantly improved.
Weaknesses
- L102-103, the authors claim that PBR-based 3DGS methods show lower rendering quality than the original 3DGS, which is not correct; please refer to the tables in GaussianShader [7].
- The most closely related work to this paper is 2DGS [44], but it is surprisingly not mentioned or included in the experiment section, which is quite perplexing. 2DGS simplifies 3DGS by aligning it with the surface normal direction, allowing the scale parameters to be directly optimized based on the appearance loss. This omission raises confusion since 2DGS has direct relevance and could potentially provide insights or comparisons for the proposed method.
- The integration of Physically Based Rendering (PBR) techniques, specifically the Bidirectional Reflectance Distribution Function (BRDF), with 3D Gaussian Splatting (3DGS) is not a novel concept. As mentioned by the authors, previous works have already explored this combination and achieved satisfactory results. Moreover, existing methods mostly enable "relighting" while the proposed method does not. In the proposed method, the authors introduce a simplification of the integral over the upper half-space of a point, referred to as IDIV, which represents the relationship between illumination and incident light direction. They decode this IDIV using a Multi-Layer Perceptron (MLP) under the assumption of Lambertian reflectance. However, the authors also draw inspiration from Ref-NeRF and incorporate a specular component to account for specular effects. This additional complexity and the combination of different techniques result in confusion and difficulty in understanding the underlying purpose, raising doubts about the technical novelty of the method, as it appears to be a combination of existing approaches (an "A+B" manner).
- In order to assess the quality of the geometry (indirectly optimized by normals, which is the motivation stated by the authors), there are established metrics and datasets available, such as DTU, that can be utilized. It is perplexing that the authors solely evaluate the Mean Angular Error (MAE) of normals, as the quality of normals is not always the primary factor when evaluating geometry. Instead, the overall geometry quality holds greater importance and should be the focus of evaluation.
Questions
I have expressed my concerns regarding the paper above, and I strongly recommend that the authors address them by enhancing the motivation, technical contributions, and exposition. These improvements are necessary to enhance the quality of the paper. As it stands, I am inclined to reject the current form due to the unclear motivation, absence of baseline methods, and limited applicability. I am also open to revising my rating based on the response and other reviewers' feedback.
Limitations
NA
Thank you very much for your review and valuable feedback. Below we address the concerns raised in the review.
Q1: L102-103, the authors claim that PBR-Based 3DGS methods show lower rendering quality than original 3DGS, which is not correct, please refer to the tables in Gaussian Shader [7].
We appreciate the reference to Gaussian Shader [7], which indeed demonstrates improvements in rendering quality on synthetic datasets. We agree that [7] demonstrates better performance under controlled, synthetic conditions. However, our primary concern is Gaussian Shader's reliance on a global environment map, which does not align with real-world scenarios, as analyzed on L165-174 of our paper. In our experiments, as illustrated in Figure 4 and Table 1, [7] faced difficulties in modeling complicated real-world scenes. These experiments were re-run directly using their released code.
Q2: 2DGS [44] is not mentioned or included in the experiment section.
We would like to clarify that the omission of 2DGS [44] from our experiments was not intentional. We discussed and cited 2DGS on L94-95 of our paper. However, it is important to note that 2DGS was recently released (with codes available on May 3rd). This timing is the primary reason for its exclusion from our experiments. Additionally, as noted on L94-95, 2DGS sacrifices some rendering fidelity in novel view synthesis due to the lack of a clear relationship between geometry and appearance, as shown in Table 3 and Table 4 of [44]. We also want to emphasize that 2DGS and our method are conceptually orthogonal and could potentially be combined. Unlike 2DGS, our method explicitly addresses the interactions between lighting and normals. We have also added comparisons with 2DGS, focusing on geometric reconstruction quality and rendering quality. Please refer to our response to Q3 and Table A for these comparisons.
| Method | Mean Chamfer Distance | PSNR |
|---|---|---|
| 3DGS [2] | 1.96 | 35.76 |
| SuGaR [8] | 1.33 | 34.57 |
| 2DGS [44] | 0.80 | 34.52 |
| Ours | 0.94 | 37.63 |
Table A. Geometric and rendering quality comparisons on the DTU dataset.
Q3: The quality of normals is not always the primary factor when evaluating geometry. Instead, the overall geometry quality holds greater importance and should be the focus of evaluation.
We agree that overall geometry quality is important, but we believe that accurate normals are also very useful for many applications and have nice properties such as strong generalization capability [A, B]. We appreciate the reviewer's recognition of the accuracy of our normal predictions. To address the importance of overall geometry quality, we followed your suggestion and further evaluated our method's performance on the DTU dataset, focusing on both reconstruction and rendering quality. We assessed the mean Chamfer distance and PSNR, adopting results for 3DGS [2], SuGaR [8], and 2DGS [44] directly from Table 3 of 2DGS [44]. For a fair comparison, we followed the evaluation protocol of 2DGS. As shown in Table A, our method achieves outstanding rendering quality while achieving a better mean Chamfer distance than 3DGS.
[A] Bae, Gwangbin, and Andrew J. Davison. "Rethinking Inductive Biases for Surface Normal Estimation." CVPR. 2024.
[B] Zhai, Guangyao, et al. "MonoGraspNet: 6-DoF Grasping with a Single RGB Image." ICRA. 2023.
Q4: The integration of PBR … with 3DGS is not a novel concept. ...existing methods mostly enable “relighting” while the proposed method is not.
We acknowledge the significant contributions of previous PBR-based methods [7, 16, 17, 45], especially as they enable “relighting” editability. However, as we discussed on L44-51 and L98-101, prior attempts often require approximations and meticulous regularization terms, which compromise either rendering quality or geometric accuracy. This motivates us to design a straightforward method to integrate normal information explicitly and simply into 3DGS. We also directly compared with the PBR-based method, Gaussian Shader [7], in our experiment section and our method achieves better rendering results and normals.
Q5: The motivation and novelty of our work.
Thank you for your feedback. We respectfully disagree with the characterization of our method as merely a combination of existing approaches. Our primary motivation is to address the balance issue between rendering and geometry quality, rather than focusing on relighting or intrinsic decomposition. From a physically-based rendering perspective, we identify that the core issue arises from the lack of interaction between appearance and normal estimation. Solving this problem is challenging, and our approach aims to simplify the model while avoiding the complexities of highly unconstrained inverse rendering problems, which often require numerous additional regularizers. Our method introduces only a few additional IDIV and reflection parameters, yet achieves remarkable improvements in normal estimation and maintains rendering quality. We believe our approach provides valuable insights into the issue and establishes a new baseline that could benefit the community.
Q6: ... under the assumption of Lambertian reflectance. ... incorporate specular components to account for specular effects.
We would like to clarify that our approach does not assume objects are Lambertian. As stated on L151-153, our initial consideration is for a simple ideal case with Lambertian objects. For more complex materials, including those with specular effects, we address this in Section 3.4 of our paper. As illustrated in Figure 2 of our paper, our method is designed to handle both diffuse and specular components. This allows us to account for a range of material properties beyond the Lambertian assumption.
We thank you again for your valuable feedback and we will include suggested experiments in the final version.
Dear Reviewer 5i8r,
Thanks again for your thoughtful review, which helped us improve the quality and clarity of our paper. We sincerely hope that our rebuttal has addressed your questions and concerns. If possible, please let us know whether there are any additional clarifications that we can offer. We appreciate your valuable suggestions.
Thank you very much for your time,
Best Regards.
Authors of Submission 8220
I appreciate the authors' efforts during the rebuttal phase. After carefully reviewing their responses and the comments from other reviewers, I believe my initial justification is correct and valid, and I stand by my evaluation leaning towards rejection. IMO this work still requires further improvement to meet the standards of NeurIPS, including the motivation, technical novelty, experiments, etc. This paper may be better suited for graphics venues instead of machine learning conferences. Some specific comments:
- I disagree with the assertion that the proposed method surpasses the inverse rendering technique for its 'environment map representation.' The proposed method also utilizes PBR-based rendering, albeit in a simplified form without the functionality of relighting or material editing. Furthermore, inverse rendering methods demonstrate better performance in novel view synthesis compared to 3DGS.
- I maintain my evaluation that this work is incremental, as there are already numerous existing studies on 3DGS employing PBR rendering with simplified BRDF, such as split-sum. Recent works, such as [1][2], have incorporated more accurate PBR representations within 3DGS, and I think this direction is more promising as it significantly improves the accuracy of PBR rendering in 3DGS.
[1] Unified Gaussian Primitives for Scene Representation and Rendering
[2] 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes
Dear Reviewer 5i8r,
Thank you for your feedback and the opportunity to address your concerns. We fully acknowledge the significant contributions of previous PBR methods [7, 16, 17, 45] targeting inverse rendering, and we appreciate your references to recent methods combining ray tracing with 3DGS.
Our approach, however, addresses the core balance issue between rendering and geometry quality through a simple yet effective design, without complex regularizers, assumptions, or ray tracing, as detailed in our paper (L32-36, L52-53, Figure 1) and response to Q5. This focus differentiates our approach from inverse rendering techniques that often require additional regularizers or assumptions. In the Experiment section of our paper, we directly compared our method with a prior inverse rendering method [7] and showed our superior rendering quality and normal accuracy. We believe our method establishes a new baseline that could benefit the community. We respectfully emphasize that this explanation is crucial to understanding the motivation and advantages of our method.
Regarding the “environment map representation” and “split sum” [A], both of which assume the existence of a global environment map, we would like to clarify that our method does not use this representation, because, as comprehensively outlined in our main paper (L165-174, Figure 4, experiments) and in our response to Q1, this assumption has limited applicability in real-world scenarios. Figure 4 clearly illustrates the disadvantage of relying on global environment maps: [7] fails to capture specular effects, whereas our method succeeds. Moreover, on the “Dr. Johnson” scene in “Deep Blending”, which spans multiple rooms and therefore has no single global environment map, the previous method [7] fails, as shown in Table 1 of our paper.
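The distinction drawn above can be made concrete with a minimal sketch (all names and values are assumptions for illustration): a global environment map is a function of direction only, so every point in the scene sees identical incident radiance along a given direction, whereas a locally parametrized incident light also conditions on position, so two rooms can be lit differently.

```python
import numpy as np

def env_map_radiance(direction, env_fn):
    # Global environment map: radiance depends on direction alone,
    # implicitly assuming all light arrives from infinitely far away.
    return env_fn(direction)

def local_radiance(position, direction, light_fn):
    # Locally parametrized incident light: radiance depends on both
    # position and direction, so nearby emitters and occluders
    # (e.g. separate rooms) can be modeled.
    return light_fn(position, direction)

up = np.array([0.0, 0.0, 1.0])
room_a = np.array([0.0, 0.0, 0.0])
room_b = np.array([10.0, 0.0, 0.0])

env = lambda d: 1.0                              # one map for the whole scene
local = lambda p, d: 1.0 if p[0] < 5.0 else 0.2  # room B is dimmer

# The environment map cannot distinguish the two rooms along the same
# direction; the position-conditioned model can.
same = env_map_radiance(up, env) == env_map_radiance(up, env)
differ = local_radiance(room_a, up, local) != local_radiance(room_b, up, local)
```

This is the failure mode the multi-room “Dr. Johnson” scene exercises: no single direction-only function can represent the lighting of both rooms at once.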
We appreciate the suggestion that our work might be “better suited for graphics venues.” However, given the success of related work in the field of neural rendering and geometry modeling, such as IDR [B], Neural-PIL [C], NeuS [D], VolSDF [E], SAMURAI [F] and NDRMC [G] at NeurIPS, we believe our contributions align well with the conference’s scope, particularly in “Machine Vision.”
Thank you again for your thorough review.
Best regards,
Authors of Submission 8220
[A] Brian Karis. Real shading in Unreal Engine 4. SIGGRAPH 2013 Course: Physically Based Shading in Theory and Practice, 2013.
[B] Yariv, Lior, et al. "Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance." NeurIPS (2020).
[C] Boss, Mark, et al. "Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition." NeurIPS (2021).
[D] Wang, Peng, et al. "NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction." NeurIPS (2021).
[E] Yariv, Lior, et al. "Volume Rendering of Neural Implicit Surfaces." NeurIPS (2021).
[F] Boss, Mark, et al. "SAMURAI: Shape and Material from Unconstrained Real-World Arbitrary Image Collections." NeurIPS (2022).
[G] Hasselgren, Jon, et al. "Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising." NeurIPS (2022).
We sincerely thank all the reviewers for their constructive and insightful comments, and especially for recognizing the following strengths of our work:
- Clear presentation. "easy to follow and understand the technical details" (5i8r), "well-documented" (cnbL), "excellent presentation" (68er), "easy to read and all components are well explained" (Gwpd).
- Soundness of design. "Approximating diffuse color in terms of the surface normal by extracting the normal from the integral equation is both simple and powerful", "handle view-dependent effects by using IDE" (cnbL), "a novel method that can use shading cues for normal estimation, backpropagating photometric loss to the surface normal of 3D Gaussian", "stably optimize local incident light without time-consuming ray tracing." (68er). "Good soundness" (5i8r, Gwpd).
- Superior experimental results. "the quality of the normals is significantly improved" (5i8r), "demonstrate robust surface normal in view synthesis", "achieving remarkable surface normal prediction" (cnbL), "the proposed method can achieve both high-quality rendering and accurate normal even in complex lighting where existing methods degrade." (68er).
We will polish and revise the paper according to the reviewers' suggestions. Please find our detailed responses below. We sincerely welcome additional feedback and further discussions.
Best regards,
Authors of Submission 8220
The paper introduces a new approach to integrating surface normals into the 3D Gaussian Splatting pipeline, aimed at improving both rendering quality and geometric accuracy while maintaining real-time performance. Reviewers generally found the paper well-written and the proposed method innovative, particularly in how it balances rendering and geometry quality without the need for complex regularization.
However, some reviewers initially criticized the paper for being incremental, questioning its novelty and the clarity of certain aspects. There were also concerns about the lack of comparisons with NeRF-based methods and the absence of detailed ablation studies. The authors addressed these points by providing additional experiments, comparisons, and clarifications in their rebuttal, which helped alleviate most concerns.
Despite one reviewer maintaining a stance against acceptance, the other reviewers appreciated the improvements made during the review process and raised their ratings accordingly. Given the solid technical contribution, clarity, and relevance to the field, the paper is recommended for poster acceptance.