PaperHub
Score: 7.3/10 — Spotlight, 4 reviewers
Ratings: 4, 4, 5, 5 (min 4, max 5, std 0.5)
Confidence: 4.3
Novelty: 2.0 · Quality: 2.5 · Clarity: 3.0 · Significance: 3.0
NeurIPS 2025

LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering

OpenReview · PDF
Submitted: 2025-05-10 · Updated: 2025-10-29
TL;DR

LODGE delivers outstanding quality and superior rendering speeds in large-scale 3D scenes, enabling real-time rendering even on mobile devices.

Abstract

Keywords
3D Gaussian splatting, Level-of-Detail

Reviews & Discussion

Review (Rating: 4)

This paper presents LODGE, a hierarchical level-of-detail (LOD) method for 3D Gaussian Splatting (3DGS), designed to enable real-time rendering of large-scale 3D scenes even on memory-constrained devices such as mobile phones. The core approach constructs a chunk-based spatial partitioning with depth-aware smoothing and importance-based pruning to reduce the number of active Gaussians. By precomputing Gaussian sets per chunk and employing an opacity blending mechanism, the method ensures smooth transitions between chunks during rendering. Experimental results demonstrate state-of-the-art rendering quality and speed while achieving notable memory and computational efficiency. However, despite its practical benefits, the method draws heavily on prior LOD-based 3DGS work, with limited technical novelty in its core contributions.

Strengths and Weaknesses

Strengths

  1. The paper is well-organized and easy to follow, making the technical approach accessible to the reader.

  2. The chunk-based rendering strategy improves rendering efficiency and significantly reduces GPU memory usage, which is crucial for real-time mobile deployment.

  3. The proposed opacity blending mechanism between adjacent chunks is an effective solution for minimizing visual discontinuities, a common artifact in LOD-based rendering.

  4. Experimental results show clear advantages over existing baselines, including in terms of runtime and memory consumption.

Weaknesses

  1. Limited Technical Novelty

The proposed method largely builds on existing LOD-based 3D Gaussian Splatting approaches such as Octree-GS [1] and H3DGS [2], without introducing fundamentally new mechanisms. The chunk-based design appears more as an engineering integration of existing ideas rather than a conceptual innovation, and the contributions listed in the paper partially overlap and should be clarified.

  2. Inadequate Analysis of Prior Work

While the paper mentions that existing LOD methods are inefficient or memory-heavy, it lacks a detailed technical analysis of why these approaches fall short. A more rigorous comparison—especially regarding memory scheduling, rendering consistency, and scalability—is needed to motivate the proposed solution.

  3. Chunk Boundary Issues and Visual Artifacts

The method encounters significant challenges when handling Gaussians near chunk boundaries. Although opacity blending helps, the requirement to load the union of adjacent Gaussian sets leads to memory spikes. In scenarios where the camera moves diagonally across chunks, visible artifacts may still occur due to insufficient blending robustness.

  4. Depth Threshold Selection and Adaptability

The depth threshold is selected via a greedy empirical search, lacking theoretical support or dynamic adaptability. This raises concerns about generalization and performance under changing viewpoints or in dynamic scenes. Moreover, the current design does not address spatial load imbalance, which may result in localized computational bottlenecks.

  5. Methodological Clarity and Justification

Key methodological choices—such as the use of K-means clustering for chunk division and the opacity blending function—are insufficiently justified. The geometric intuition behind blending is unclear, and the influence of clustering parameters on rendering quality and performance is not analyzed.

  6. Evaluation Limitations and Fairness

The evaluation setup lacks fairness in GPU memory comparisons: competing methods are assumed to keep all Gaussians in GPU memory, while the proposed method benefits from streaming. A consistent memory evaluation protocol is necessary. Additionally, only two scenarios from the H3DGS dataset are used, limiting claims of generalizability.

  7. Missing Ablation and Sensitivity Analyses

The ablation study does not separately examine the impact of the threshold selection or number of LOD levels. These factors directly influence rendering quality and efficiency, and their effects should be studied quantitatively.

  8. Missing Related Work

A relevant and recent work, LOD-GS [4], is not cited or discussed. This omission weakens the positioning of the paper in the context of current research and should be addressed with a proper comparison and discussion.

[1] Ren K, Jiang L, Lu T, et al. Octree-gs: Towards consistent real-time rendering with lod-structured 3d gaussians[J]. arXiv preprint arXiv:2403.17898, 2024.

[2] Kerbl B, Meuleman A, Kopanas G, et al. A hierarchical 3d gaussian representation for real-time rendering of very large datasets[J]. ACM Transactions on Graphics (TOG), 2024, 43(4): 1-15.

[3] Seo Y, Choi Y S, Son H S, et al. FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering[J]. arXiv preprint arXiv:2408.12894, 2024.

[4] Shen J, Qian Y, Zhan X. LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup[C]//Proceedings of the Computer Vision and Pattern Recognition Conference. 2025: 671-680.

Questions

  • Given that the core LOD representation follows established 3DGS approaches [1-2], could the authors more clearly articulate what specific gaps in prior work motivated the proposed chunk-based rendering scheme? In particular, what limitations of existing LOD methods does this chunk-based strategy uniquely address?

  • How does this work advance the field beyond incremental improvements to prior LOD-based 3DGS systems? Are there any fundamental rendering challenges that this chunk-based paradigm solves that were previously intractable with conventional LOD methods?

  • Given that loading the union of Gaussian sets from adjacent chunks may cause prohibitive memory peaks, could the authors discuss potential optimization strategies to maintain real-time performance on resource-constrained mobile devices?

  • The method appears sensitive to oblique camera trajectories that deviate from inter-chunk axes. Could the authors analyze the theoretical bounds of acceptable camera motion angles before artifacts occur? Would a view-dependent blending weight (e.g., incorporating camera direction) improve robustness?

  • Static thresholds seem suboptimal for dynamic scenes. What prevents the implementation of runtime threshold adjustment based on real-time performance monitoring? Would a feedback loop from the renderer to the LOD controller be feasible?


Limitations

The current limitations section is too narrow. Beyond Gaussian streaming assumptions, the authors should discuss: Scalability to dynamic scenes, Sensitivity to camera paths, Runtime memory spikes, Generalization of depth thresholds, Training overhead, Failure cases and performance degradation. Moreover, recently published related works such as LOD-GS [4] should be discussed to situate the contribution more thoroughly in the current landscape.

Final Justification

The authors have provided a careful and thorough response to my comments and have satisfactorily addressed my concerns. Therefore, I would like to raise my score.

Formatting Issues

None

Author Response

We thank the reviewer for the constructive feedback and will adjust the paper based on their comments and the rebuttal.

**[W1, W2, Q1, Q2] Clarifying relation to prior work, limited technical novelty**

**[W2, Q1, Q2] Which gaps in prior work motivated LODGE**
The focus of our method is to enable rendering of large-scale 3DGS scenes on memory-restricted devices. Unlike traditional mesh-based rendering pipelines, where LOD reduces the number of primitives that must be loaded and stored in memory, existing 3DGS LOD approaches (e.g., Octree-GS and H3DGS) focus only on rendering speed and ignore the memory aspect of the LOD representation. While rendering speed is key in many applications, the large memory requirements imposed by existing 3DGS LOD methods are a limiting factor preventing 3DGS from truly scaling to large scenes and small-memory devices (as shown in Tab. 4). We believe we are the first to address the LOD problem from the practical perspective of reducing the number of Gaussians that need to be kept in memory, and thus to enable rendering on memory-restricted devices - something which cannot be achieved by existing approaches.

**[W1, W2, Q1] What is the novelty w.r.t. Octree-GS and H3DGS**
Existing approaches (both Octree-GS and H3DGS) are fundamentally different from ours in that, for each rendered frame, they first need to compute a subset of Gaussians to be processed by the renderer. To this end, they require all Gaussians to be loaded in memory, which is prohibitive/infeasible for systems with restricted memory, e.g., mobile devices, as we show in Table 4. To resolve this issue, we propose a novel LOD rendering strategy with chunk-based caching. Our approach is technically novel and based on neither Octree-GS nor H3DGS. Detailed experiments show the effectiveness of our approach.

**[W1] Overlapping contributions**
The contributions stated are the following: 1) LOD representation, 2) automatic LOD split selection procedure, 3) per-chunk caching, 4) opacity blending. We believe there are no overlaps as each describes a different contribution as can be seen in Table 3 where we selectively add each one to the base model.

**[W3, Q3] Chunk boundary issues and visual artifacts**

**[W3, Q3] Memory spikes when crossing chunk centers**
In our experience, opacity blending does not lead to memory spikes in practical implementations. Without opacity blending, the point where we need to reload the set of Gaussians would be when crossing chunk boundaries. Therefore we would have to load both chunks into memory and we would have a sharp change of appearance. With opacity blending, we shifted the point where we need to reload Gaussians to the center of the chunk. Close to the center, there are no sharp changes if we only use the chunk’s set of Gaussians, and, therefore, we can unload the previous chunk and load the next one without the memory spike. We will add this more in-depth discussion to the paper.

**[W3] Diagonal camera movement**
Regarding the diagonal movement across chunks - we agree that there could be artifacts visible in certain cases. However, in commonly used datasets we do not observe artifacts (we tested on H3DGS, Zip-NeRF, Mip-NeRF 360 datasets and on additional non-public data).
We believe the reason to be the following:

  • Close to the training camera distribution, the same viewpoint when rendered from the closest chunks will be almost the same as the different chunks are well conditioned by training cameras and artifacts are minor even without opacity blending.
  • The viewpoints outside of this distribution are on the boundary of the chunk-based representation and only the two closest chunks (from inside the training camera distribution) have a sizable effect on it. Therefore, using 2 closest chunks is sufficient in this case.

We will extend the discussion in the paper.

**[W4] Depth threshold selection and adaptability**

The depth threshold is chosen to minimize the number of processed Gaussians, which we identified as the bottleneck of 3DGS rendering. A theoretical analysis of the 3DGS renderer is not realistic, as the rendering speed depends on CUDA characteristics, tiny implementation details, and, most importantly, on the 3D scene structure itself. Therefore, we believe only empirical evaluation is possible. To support the greedy search, in Figure 4 we show the convex shape of the number of processed Gaussians as a function of the depth thresholds.
Regarding the spatial load imbalance, we would like to argue that all current NVS approaches consider and are evaluated on a set of test cameras whose distribution matches the training camera distribution. Our approach is no different in that it also assumes that the set of cameras at inference will follow the same distribution, and therefore the automatic procedure produces good thresholds. If the camera moves outside of this distribution, the threshold selection might become suboptimal, but at train time the training camera distribution is the only information available.
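Because the count of processed Gaussians is empirically unimodal in the threshold (the convex shape referenced above), a greedy scan over candidate thresholds can stop at the first local minimum. A minimal sketch of this idea, where the candidate grid and measured counts are hypothetical inputs rather than the paper's exact procedure:

```python
def greedy_threshold(counts):
    """Pick the index of the depth threshold minimizing the measured
    number of processed Gaussians.

    `counts[i]` is the (empirically measured) number of processed
    Gaussians for the i-th candidate threshold. Assuming a unimodal
    profile, the linear scan can stop as soon as the count starts
    increasing: the first local minimum is the global one.
    """
    best = 0
    for i in range(1, len(counts)):
        if counts[i] >= counts[best]:
            break  # past the minimum of a unimodal profile
        best = i
    return best
```

With a unimodal profile such as `[9, 7, 5, 6, 8]`, the scan stops after the third candidate and returns index 2.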

**[W5] Methodological clarity and justification**

Justification of K-means
Please refer to the answer to Q2 for reviewer 3YiK.

Intuition behind blending
Please see the answer to W1 for reviewer 3YiK. The justification for opacity blending - blending the alphas of two sets of Gaussians - is to interpolate between the two 3D scene representations of the two chunks. Using the projection of the camera position onto the line connecting the two chunk centers was chosen so that the interpolation weight is exactly 1 when passing through a chunk, even if the camera does not pass exactly through the chunk center. We believe different choices of this function are also possible as long as this property holds and the function is smooth. In our experiments our choice worked well and we did not explore other options.
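To make the geometric intuition concrete, here is a minimal sketch of one interpolation function with the stated property; the `margin` parameter and the smoothstep shape are our own assumptions, not the paper's exact function:

```python
import numpy as np

def blend_weights(cam_pos, c_a, c_b, margin=0.1):
    """Opacity-blending weights for the two closest chunks.

    Projects the camera position onto the line segment between the two
    chunk centers and maps the normalized coordinate through a smoothstep
    that saturates `margin` before each center. The nearer chunk's weight
    is thus exactly 1 well before the camera reaches that center, where
    the far chunk can be swapped out of memory without a visible pop.
    """
    d = c_b - c_a
    t = np.dot(cam_pos - c_a, d) / np.dot(d, d)    # normalized projection
    # Remap so the weight saturates inside [margin, 1 - margin].
    s = np.clip((t - margin) / (1.0 - 2.0 * margin), 0.0, 1.0)
    s = s * s * (3.0 - 2.0 * s)                    # smoothstep
    return 1.0 - s, s                              # (weight_a, weight_b)
```

Any smooth weight that reaches 1 strictly before the chunk center would satisfy the same property; the smoothstep is just one convenient choice.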

**[W6] Evaluation limitations and fairness**

Fairness in GPU memory comparisons
We report the number of Gaussians needed when running the rendering pass. Ultimately, evaluating GPU memory usage is difficult as there are many possible choices. For example, we could measure peak GPU usage, but that depends heavily on the actual rasterizer implementation rather than on fundamental properties of the methods. Note that some methods can use a lower SH degree or compress the attributes (OctreeGS, ScaffoldGS), which will reduce the memory usage, but a similar compression can be applied to other approaches and is not a fundamental property of the approach.
We believe reporting the number of Gaussians required to be loaded in memory when rendering is a fair proxy, as it is comparable across various implementations and compression techniques and measures a more fundamental property of the various LoD approaches.

Competing methods are assumed to keep all Gaussians in GPU memory
The competing methods compute LOD split on the fly - unlike our chunk-based strategy - and, therefore, need at least the positions of all Gaussians in memory at all times. Therefore, streaming Gaussians into memory is not feasible for these methods which was the main motivation for our approach - to enable rendering in cases when not all Gaussians fit into the memory (e.g. mobile devices).

Using only two scenarios from H3DGS
We used the only two H3DGS 3D scenes that are public (see the official H3DGS project webpage). We do not have access to the private Waywe data used in H3DGS.

**[W7] Missing ablation and sensitivity analyses**

In the ablation study in Table 3, rows 2, 3, 4, 5 quantitatively evaluate the impact of the number of LOD levels on rendering quality, rendering speed, and memory consumption. Comparing rows 2 and 6 shows the impact of threshold selection.

**[W8] Missing related work**

LOD-GS was published at CVPR 2025. The paper does not seem to have been publicly available before the CVPR 2025 proceedings were published (June 12th, about a month after the NeurIPS submission deadline). We are happy to add a discussion of this work, but it was impossible to be aware of it at the time of submission.

**[Q4] Sensitivity to trajectories outside inter-chunk axes**

Oblique camera trajectories deviating from inter-chunk axes
Please refer to the answer to Q2 for reviewer 3YiK.

View dependent blending weights
The chunks were constructed based on camera positions (distances to chunk centers) only. Therefore, we don’t think there would be any benefit in using viewing directions in opacity blending.

**[Q5] Dynamic scenes and on-the-fly threshold adjustment**

Dynamic scenes with LODGE
In our work we focused only on static scenes. Under this assumption, we build the chunks, which then stay fixed for rendering. In most dynamic scenes, only a small portion of the 3D scene moves and the rest stays static (e.g., see [4]). In this setup, the dynamic object is likely small compared to the static part of the scene and does not need any LOD strategy, so our approach is directly applicable.

Real-time threshold adjustment
After adjusting thresholds, LOD chunks need to be recomputed. We cannot do that for every frame, because we would end up in the same situation as prior works which require Gaussians to be loaded in memory at all times. However, once in a while we can rebuild LOD chunks based on new thresholds. In our case, we didn’t need to consider this scenario as we didn’t see a practical use for it.

Comment

The authors have provided a careful and thorough response to my comments and have satisfactorily addressed my concerns. Therefore, I would like to raise my score.

Review (Rating: 4)

This paper modifies the existing LoD rendering mechanism of 3DGS, enabling improved efficiency and rendering quality. It also facilitates the real-time high-fidelity rendering of 3DGS on low-end devices. The key ideas include pruning-based LoD generation, chunk-based LoD rendering, and visibility filtering, and the authors provide a good implementation.

Strengths and Weaknesses

Quality: The qualitative and quantitative comparison provides solid support for the superiority of the proposed method. The test on low-power devices is also impressive. However, in the ablation (Tab. 3), the performance difference of some model designs seems to be minimal, especially for PSNR and time cost. It is hard to distinguish the performance gain from random noise. The authors should report the average results of multiple runs to get convincing conclusions.

Clarity: Good, easy to read.

Originality: Relatively weak. The key idea, pruning-based LoD generation, chunk-based LoD rendering, and visibility filtering, has been discussed in CityGaussian. Other techniques like importance pruning and 3D filter also come from previous works.

Significance: A meaningful step towards LoD-based 3DGS rendering on low-end devices.

Questions

See "Strengths And Weaknesses".

Limitations

Yes, the limitation has been adequately addressed.

Final Justification

My concern has been resolved. Therefore I would like to raise the score to borderline accept.

Formatting Issues

Not any.

Author Response

We thank the reviewer for the constructive feedback and will adjust the paper based on their comments and the rebuttal.

**[W1] Minimal performance difference in Tab. 3**

PSNR and time cost seem similar across the rows of Tab. 3
The contributions proposed in the paper were not intended to increase the reconstruction quality (as measured by PSNR, LPIPS, and SSIM). Instead, the main focus is to a) reduce rendering times, and b) reduce the number of Gaussians that need to be loaded in memory to enable deployment on memory-restricted (mobile) devices. To this end, we show that the PSNR stays similar to the full representation after adding each contribution, while we either reduce the rendering time and/or the memory requirements. The aim of Tab. 3 is to show the following:

  • As discussed on lines 247-257, rows 1-5 show that the PSNR stays the same while adding LOD levels decreases the rendering time. At a certain point, adding further LOD levels does not gain much in terms of rendering speed, which validates our design choice of only using 3 layers of LOD (2 depth thresholds)
  • Row 6 and row 3 demonstrate that our automatic threshold selection procedure achieves the same quality while rendering significantly faster (L255)
  • Rows 7 and 8 further validate the chunk-based rendering since it reduces rendering time, and much more importantly, also drastically reduces the number of Gaussians loaded into memory
  • Finally, comparing rows 8 and 9, we show that the opacity blending procedure does not reduce quality (PSNR) and the increase in rendering time is not large. The numbers do not fully demonstrate the need for the opacity blending. However, the qualitative results in Figure 7 and especially the video in the supplementary material clearly show the need for the opacity blending procedure, which removes artifacts on chunk borders.

Authors should report average over multiple runs
As for the metrics, we follow standard practice in the literature [1, 2, 3, 4, 6, 7, 9, 10, ...] and report the average results over the test set, as it is infeasible to train multiple times on the same scene due to the large computational requirements.

**[W2] Relatively weak originality**

While pruning-based LoD generation was tackled in CityGaussian (which used LightGaussian to build lower-quality representations of the same 3D scene), we believe our approach is a more principled solution to the problem: we explicitly reason about and condition on the smallest possible size of Gaussians for the LoD level (in contrast, LightGaussian builds LoD levels by simply pruning Gaussians with a low importance score, without constraints on Gaussian sizes). We use a 3D filter (same as in Mip-Splatting), but our use is very different from Mip-Splatting. While Mip-Splatting computes a 3D filter for each Gaussian based on the distance to the closest camera, we instead apply a 3D filter to the entire LOD level as a whole to restrict the size of the smallest Gaussians according to the Nyquist sampling theorem. Note that the Mip-Splatting 3D filter changes during training for each Gaussian, while in our case the 3D filter is constant and the same for all Gaussians in the LoD level. To the best of our knowledge, using the 3D filter to build a LoD representation is a novel idea that has not been explored before.
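As a rough illustration of the distinction, a per-level filter applies one constant dilation to every Gaussian of an LOD level. The sketch below mirrors Mip-Splatting's variance-addition form, but it is a simplified assumption of ours: `min_scale` is a hypothetical per-level constant, and the opacity rescaling that Mip-Splatting also performs is omitted.

```python
import numpy as np

def apply_lod_filter(scales, min_scale):
    """Dilate all Gaussian scales of one LOD level by a single constant
    isotropic variance, so that no Gaussian in the level falls below the
    level's minimum size (derived, per the rebuttal, from the Nyquist
    criterion for the level's intended viewing distance). Unlike
    Mip-Splatting's per-Gaussian, training-time filter, this constant is
    shared by every Gaussian in the level."""
    scales = np.asarray(scales, dtype=float)
    return np.sqrt(scales ** 2 + min_scale ** 2)
```

A Gaussian with scale 0 is raised to `min_scale`, while already-large Gaussians are barely changed, which is the qualitative behavior a low-pass filter on the level should have.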

Comment

Thanks for the rebuttal. Most of my concerns are addressed. I would consider raising the score to borderline accept.

Review (Rating: 5)

LODGE proposes a systematic solution to large-scale scene NVS task. The core designs are: an LOD structure for managing the rendering overhead according to view distance, a cluster-based chunk partition to reduce the number of loaded gaussians, and an opacity interpolation scheme for removing the visual artifacts when moving across different chunks. In the experiments, the proposed method achieves most efficient rendering speed with SOTA rendering quality in both indoor and outdoor scenes.

Strengths and Weaknesses

Strengths:

+: The experimental results are solid across different scenes. And both the visual quality and rendering speed are impressive, exhibiting an applicable solution for large-scale scene rendering.

+: The pipeline of constructing the LOD structure is interesting, from the finest level to the coarse level.

+: The paper are well-presented, making it easy to follow.

Weakness:

-: Large-scale reconstruction is a classic topic and many previous works on this topic are not compared, like CityGS [1], Level of Gaussians, etc. Also, for reconstructing the LOD from the finest level, the authors should cite [2].

-: It's not clear how the authors obtain the full representation model in Table 3, whose quality seems to be higher than all other baselines.

[1] CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

[2] PRoGS: Progressive Rendering of Gaussian Splats

Questions

Please add comparisons with CityGS.

Limitations

Yes.

Final Justification

After reading the rebuttal and discussion, I keep my original score.

Formatting Issues

No

Author Response

We thank the reviewer for the constructive feedback and will adjust the paper based on their comments and the rebuttal.

**[W1, Q1] Comparison with CityGS and Level of Gaussians; including PRoGS in Related Work**

Comparison with CityGS
We thank the reviewer for the suggestion of comparing with CityGS [17]. We evaluated CityGS on all scenes (given the time constraint, we did our best to tune hyperparameters). The results are shown in the tables below and will be included in the paper. We show the results compared to some of the baselines (for the complete set of baseline results, please see Tables 1 and 2). As can be seen, our method outperforms CityGS in most metrics except for PSNR on H3DGS/Campus and ZipNeRF/Alameda, but in these cases the difference in PSNR is small while we achieve much faster rendering and use fewer Gaussians.

H3DGS results

| Method | SC (PSNR / SSIM / LPIPS / #G / FPS) | Campus (PSNR / SSIM / LPIPS / #G / FPS) |
| --- | --- | --- |
| Zip-NeRF [1] | 24.78 / 0.770 / 0.381 / — / 0.09 | 21.34 / 0.768 / 0.422 / — / 0.20 |
| H3DGS [10] | 26.42 / 0.807 / 0.331 / 7093 K / 38.07 | 24.60 / 0.798 / 0.396 / 6186 K / 34.32 |
| FLOD [28] | 24.82 / 0.758 / 0.429 / 497 K / 208.41 | 24.10 / 0.777 / 0.453 / 595 K / 120.61 |
| OctreeGS [27] | 25.98 / 0.807 / 0.326 / 1008 K / 120.27 | 25.22 / 0.800 / 0.408 / 642 K / 119.21 |
| **CityGS [17] w/o LOD** | 25.96 / 0.787 / 0.379 / 1743 K / 97.29 | 24.58 / 0.798 / 0.415 / 1254 K / 102.81 |
| **CityGS [17]** | 25.29 / 0.772 / 0.401 / 2615 K / 114.07 | 24.82 / 0.794 / 0.419 / 1881 K / 121.67 |
| Ours | 26.57 / 0.815 / 0.325 / 877 K / 257.46 | 24.75 / 0.803 / 0.394 / 1464 K / 218.96 |

Zip-NeRF results

| Method | PSNR (A) | SSIM (A) | FPS (A) | PSNR (L) | SSIM (L) | FPS (L) | PSNR (N) | SSIM (N) | FPS (N) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Zip-NeRF [1] | 22.97 | 0.738 | 0.13 | 26.76 | 0.822 | 0.13 | 28.21 | 0.845 | 0.13 |
| H3DGS [10] | 22.21 | 0.739 | 27.82 | 26.34 | 0.823 | 30.49 | 27.28 | 0.849 | 33.11 |
| FLOD [28] | 21.35 | 0.666 | 276.52 | 24.38 | 0.753 | 195.06 | 25.01 | 0.781 | 260.85 |
| OctreeGS [27] | 22.94 | 0.734 | 119.83 | 26.04 | 0.817 | 153.06 | 27.05 | 0.839 | 146.29 |
| **CityGS [17] w/o LOD** | 22.49 | 0.727 | 144.73 | 25.76 | 0.808 | 55.63 | 26.47 | 0.837 | 122.59 |
| **CityGS [17]** | 22.43 | 0.729 | 174.07 | 25.87 | 0.809 | 190.10 | 26.55 | 0.839 | 144.91 |
| Ours | 22.41 | 0.741 | 229.99 | 26.34 | 0.818 | 252.58 | 27.40 | 0.849 | 280.22 |

Comparing with Level of Gaussians
We considered including Level of Gaussians (LoG) in the Related Work section/comparison, but since there is no paper yet, only a public implementation, we believe the work is still in progress and the method itself (and the results obtained from the current implementation) might be subject to (significant) changes. We hope to include LoG in a future revision of the LODGE paper (once there is a paper or preprint for LoG).

Including PRoGS in Related Work
Thank you for the suggestion, we will cite PRoGS in the revised version.

**[W2] Performance of the full representation in Table 3**

The experiments in Table 3 were conducted on the H3DGS/SmallCity scene. The last row (our full model) corresponds to the entry “ours” in Table 1. Our base model (row 1 in Tab. 3) is similar to H3DGS, but it has a higher PSNR. This is caused by 1) not using depth maps (which may be a source of noise in H3DGS, as suggested by the reported numbers), and 2) using the importance pruning approach from RadSplat, which also reduces floaters without the need for depth supervision. We will clarify this in the paper.

Review (Rating: 5)

The authors propose a novel Level-Of-Detail (LOD) method for 3D Gaussian Splatting, which effectively reduces computational and memory overhead for large-scale scene rendering while maintaining high-fidelity results. At the core of the method is a hierarchical LOD representation combined with a novel scene partitioning based on the Voronoi scene clustering. To further enhance rendering quality and efficiency, the paper introduces a depth-aware 3D smoothing filter and importance-based pruning for the LOD representation. To address potential boundary artifacts caused by scene clustering, an opacity-blending mechanism is incorporated.

One of the key advantages of the proposed method is that it eliminates the need for per-frame recomputation of active Gaussians, enabling dynamic loading of only the relevant splats for each view. The method is evaluated on two scenes from the Hierarchical 3DGS dataset and three scenes from the Zip-NeRF dataset, achieving state-of-the-art rendering performance while outperforming concurrent approaches such as H3DGS, OctreeGS, and FLOD in terms of memory usage and rendering efficiency. The authors also demonstrate the practical efficiency of their approach by testing it on multiple mobile devices, including iPhones and laptops.

Strengths and Weaknesses

Strengths

  • The proposed scene clustering strategy offers significant efficiency improvements over prior LOD representations such as OctreeGS and H3DGS, which require per-frame computation to extract active Gaussians. By eliminating the need for these computations, the method enables dynamic loading of only the relevant Gaussians, resulting in substantial gains in both computational speed and memory usage. The practical efficiency is convincingly demonstrated through deployment on mobile devices, including iPhones and laptops.
  • The method shows notable improvements in rendering quality, particularly in challenging regions such as far-distance views and close-up reflective surfaces. As illustrated in Figures 5 and 6, the proposed method captures subtle visual details that are missed by concurrent approaches.
  • The experimental evaluation and ablation studies are well-structured and sufficiently comprehensive. The paper provides clear comparisons against strong baselines (e.g., H3DGS, OctreeGS), and the effectiveness of individual components is properly evaluated to support the overall claims.

Weaknesses

  • The mechanism for dynamic loading at chunk boundaries is not described in sufficient detail. Since the paper emphasizes rendering and memory efficiency, it would be valuable to explain how chunk-level loading is managed to avoid artifacts and ensure consistent performance, especially near boundaries.

Questions

  • How large is each chunk, and what is the computational or memory overhead associated with dynamic chunk loading? Is this overhead negligible, or does it impact the overall efficiency of the system? It would be helpful to include additional evidence or quantitative data to support the claimed efficiency of the dynamic loading mechanism.

  • Is K-means clustering sufficient for effective scene partitioning? Given that scene visibility is heavily influenced by camera orientation, not just position, relying solely on camera positions for clustering may be suboptimal, especially in cases where visible regions are distant and viewing directions vary widely. Could you clarify whether your method accounts for camera orientation during clustering or selection? Additionally, can you provide more evidence that your partitioning strategy generalizes well to viewpoints with orientations different from those seen during training or from the cluster centers?

Limitations

yes

Final Justification

Most of my concerns have been adequately addressed through the rebuttal process, and I will maintain my current score. While I still have a few questions, as mentioned in my review, they are minor compared to the overall contribution of this paper.

Formatting Issues

The paper is well-formatted.

Author Response

We thank the reviewer for the constructive feedback and will adjust the paper based on their comments and the rebuttal.

**[W1] Dynamic loading not described in sufficient detail**

As described in the section “Opacity blending for smooth cross-chunk transitions” (L179-196), we take the two closest chunk centers and use their sets of Gaussians (as active Gaussians), on which we apply opacity blending. Dynamic loading means we load the active Gaussians’ properties (means, colors, etc.) into GPU memory. When the two closest chunks change (after passing the center of a chunk), we remove the Gaussians of the previous chunk, keep the ones from the closest chunk, and load the Gaussians of the next closest. At this point, opacity blending assigns weights close to 1 to all Gaussians of the closest chunk, so there are no artifacts while loading the next chunk. We will add these details to the paper.
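The loading policy described above can be sketched as a small cache that always keeps the two nearest chunks resident; the `ChunkCache` class and its `load_chunk` callback are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

class ChunkCache:
    """Keep the Gaussians of the two chunks nearest the camera loaded.

    `load_chunk` stands in for whatever fetches a chunk's Gaussian
    attributes (means, colors, ...) into GPU memory; by default it just
    returns the chunk id as a placeholder payload.
    """
    def __init__(self, chunk_centers, load_chunk=lambda i: i):
        self.centers = np.asarray(chunk_centers, dtype=float)
        self.load_chunk = load_chunk
        self.loaded = {}  # chunk id -> loaded Gaussian set

    def update(self, cam_pos):
        # The two nearest chunk centers define the active Gaussian set.
        dists = np.linalg.norm(self.centers - cam_pos, axis=1)
        nearest = set(np.argsort(dists)[:2].tolist())
        for i in set(self.loaded) - nearest:      # unload the stale chunk
            del self.loaded[i]
        for i in nearest - set(self.loaded):      # load the new neighbor
            self.loaded[i] = self.load_chunk(i)
        return sorted(self.loaded)
```

Because the nearest pair only changes when the camera passes a chunk center, at most one chunk is swapped per update, which is what keeps the memory footprint flat.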

**[Q1] Memory overhead of dynamic chunk loading**

The chunk size depends on the 3D scene, but for outdoor scenes the chunk sizes range between 3-5 meters. The overhead ultimately depends on the concrete implementation. In our experience (CUDA, WebGPU), the memory move is fast (15.26 ms on CUDA with 877K Gaussians), and dynamic loading can be done asynchronously with rendering, so it does not slow down the rendering process.

**Q2: Effectiveness of K-means, dependency on camera orientations, generalization to novel viewpoints**

**Using orientations in LOD chunk construction**
For K-means clustering, we only use the camera positions, not the orientations. Note that the projected size of a Gaussian depends only on the camera position (distance), not the viewing direction; therefore, only camera positions need to be taken into account for LOD chunk construction. We additionally use visibility filtering (originally proposed in RadSplat), but it is not as important as the LOD Gaussian selection: while it increases performance (compare rows 7 and 8 in Table 4), it is not critical for the method to work (compare rows 6 and 7). We believe that taking orientations into account could reduce the number of Gaussians by a large factor (e.g., Gaussians behind the camera would not be included in the chunk, effectively halving the number of required Gaussians). However, reloading chunks would then not be feasible under very fast camera rotations (as common in AR/VR). This is also the reason why orientations were not used in RadSplat, which first proposed the idea of visibility filtering. We will clarify this in the paper and extend the discussion.

**Is K-means clustering sufficient for effective scene partitioning?**
We believe other strategies, e.g., a regular grid, could work just as well for some types of captures where the camera density is constant in the 3D scene. However, K-means is very general and can be applied to any type of capture/3D scene as it directly operates on camera poses and does not assume any capture trajectory.

**Does the chunking strategy generalize outside cluster centers?**
Based on the evidence in Figure 7 and in the attached video, we conclude that the partitioning strategy generalizes to positions outside of cluster centers. Even when the selection is suboptimal (further from the cluster center), more Gaussians are selected than strictly necessary, to account for the camera not being at the cluster center. This is because we use “depths offset by the chunk radius (distance to the next closest chunk center) to ensure sufficient resolution for all camera positions inside the chunk” (L171). This means that we usually take a larger set of Gaussians than strictly necessary, to ensure sufficient coverage according to the Nyquist sampling theorem. While some Gaussians may project to sub-pixel sizes, the smallest Gaussians of each LOD level will cover at most a pixel, avoiding blurring at the target resolution.

**Does the chunking strategy generalize outside the training camera distribution?**
Regarding generalization outside the training camera distribution, unfortunately we do not have any guarantees on reconstruction quality there. However, we would like to point out that most NVS methods [1,2,3,4,6,7,9,10,...] are evaluated inside the training camera distribution and fail when the camera moves (or rotates) away from it. We cannot guarantee good quality outside the training camera distribution even with our base model, since there are no constraints to guide the reconstruction there. Our LOD strategy selects the visible Gaussians based on the training cameras and, as shown in Table 3, it will neither “fix” the errors in underconstrained parts of the scene nor make the problem worse. E.g., if the concern is that LOD will prune some Gaussians from a certain viewpoint because the scene was never seen from that viewpoint during training, the rendering quality from that viewpoint will be low anyway, since there was no supervision for the base representation.

Comment

Thank you to the authors for providing detailed rebuttals. Many of my concerns regarding dynamic loading and the chunking strategy have been addressed. However, I still have a concern about the limitation of using a chunking strategy based purely on position, without considering orientation. While the authors sufficiently explained the filtering of invisible parts based on orientation and argued that orientation-based loading is often unnecessary, my point concerns the need to load far visible chunks based on orientation. The proposed strategy appears to load only nearby chunks in terms of distance, but depending on the scene extents, there may be cases where very distant objects also need to be loaded. If we load only nearby chunks (e.g., with sizes of 3–5 meters), how can the system render faraway buildings or background objects, such as those 100 meters away?

Overall, this paper makes a valuable contribution to the area of efficient scene rendering, presenting a well-motivated approach with clear technical details and promising experimental results. The proposed method addresses important challenges in dynamic loading with practical solutions, and the writing is clear and well-structured. I believe the paper will be of interest to both researchers and practitioners in this field.

Comment

Thank you for your positive feedback. We believe there may be a slight misunderstanding regarding how the system renders faraway objects, so let us clarify.

In the section on chunk-based rendering (L162), we explain a strategy to reduce the number of Gaussians loaded into memory.

  • We assume we already have our LOD representation (described on L114, LOD representation). This means that we have multiple sets of Gaussians representing the same scene at increasingly lower levels of detail; the higher the level, the fewer Gaussians it uses to represent the entire 3D scene. Finally, for each level $i$, we expect to obtain pixel-perfect resolution when viewed from a distance of at least $d_i$.
  • Next, for each chunk, we build the list of indices of all Gaussians which would be visible if the camera were located at the chunk center and LOD rendering were used^*. However, we also account for the camera being further from the center by increasing the depth thresholds used for LOD Gaussian selection (to ensure pixel-perfect resolution). This list selects Gaussians as follows:
    1. First, it selects all Gaussians from level 0 (the full representation) which are within $d_1 + r$ of the chunk center, where $r$ is the radius of the chunk and $d_1$ is the first distance threshold.
    2. Next, we progressively add “rings”: for each level $i$, the Gaussians which are between $d_i + r$ and $d_{i+1} + r$ away from the chunk center.
    3. Finally, we add all Gaussians from the lowest level of detail which are further than $d_L + r$ from the chunk center ($L$ being the number of added LOD levels). Note that the higher the LOD level, the larger the region from which we select the Gaussians: for level 0 we select a small ball, and with increasing level the volume of the ring grows. At the same time, the higher the level, the fewer Gaussians it contains. Therefore, we effectively reduce the number of Gaussians needed inside the chunk.
  • To answer your concern that "there may be cases where very distant objects also need to be loaded": indeed, distant objects need to be loaded. However, distant objects need not be represented at the highest resolution (which would result in Gaussians much smaller than a pixel and a large number of Gaussians to process). The Gaussians selected for each chunk represent the entire scene, not just a small neighborhood around the chunk center (i.e., not just the Gaussians inside the chunk). Therefore, faraway objects are still loaded, but they are represented with low-resolution Gaussians whose projected size matches or exceeds a pixel. This avoids the waste of storing and rendering extremely small Gaussians for distant geometry.
  • In the actual implementation, we do not copy the Gaussians for each chunk; instead, each chunk only stores a set of indices selecting Gaussians from the union of all LOD levels.
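The ring-based selection described above can be sketched as follows. Array layouts and names are illustrative (the real implementation precomputes index lists per chunk on the trained LOD hierarchy), but the distance logic follows the steps listed: level 0 within $d_1 + r$, one ring per intermediate level, and the coarsest level beyond $d_L + r$:

```python
import numpy as np

def chunk_gaussian_indices(level_means, depth_thresholds, chunk_center, radius):
    """Select, per LOD level, the Gaussian indices kept for one chunk.

    level_means:      list of (N_l, 3) position arrays; index 0 is the
                      full-detail level, the last entry the coarsest.
    depth_thresholds: [d_1, ..., d_L], one threshold per level transition.
    radius:           chunk radius r; all thresholds are offset by r so
                      any camera inside the chunk keeps pixel-level detail.
    """
    L = len(depth_thresholds)
    assert len(level_means) == L + 1
    # Distance bands: [0, d_1+r), [d_1+r, d_2+r), ..., [d_L+r, inf).
    bounds = [0.0] + [d + radius for d in depth_thresholds] + [np.inf]
    selected = []
    for lvl, means in enumerate(level_means):
        dist = np.linalg.norm(np.asarray(means) - chunk_center, axis=1)
        lo, hi = bounds[lvl], bounds[lvl + 1]
        selected.append(np.nonzero((dist >= lo) & (dist < hi))[0])
    return selected
```

Note how distant geometry is always covered: the last band extends to infinity, so faraway objects are included, just from the coarsest level, whose Gaussians project to at least a pixel.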

We will expand the chunk-based rendering section to make it clearer.


^* Note that some artifacts could arise from this assumption; we resolve them in the next section, “Opacity blending for smooth cross-chunk transitions.”

Final Decision

The paper presents a novel and practical approach to large-scale 3D Gaussian Splatting with an effective LOD mechanism, dynamic chunk-based rendering, and memory-efficient strategies, validated through thorough experiments on indoor and outdoor datasets and low-power devices. All four reviewers acknowledge the solid technical contributions, strong evaluation, and practical impact, with minor clarifications addressed satisfactorily in the rebuttal. The decision to accept this paper has been approved by the SAC. We recommend that the authors incorporate the reviewers’ suggestions regarding chunk-based rendering, opacity blending, orientation handling, and clarifications in tables and methodology into the final version to ensure clarity and completeness.