GeGS-PCR: Fast and Robust Color 3D Point Cloud Registration with Two-Stage Geometric-3DGS Fusion
We propose GeGS-PCR, a two-stage point cloud registration method that integrates geometric, color, and Gaussian information, achieving robust performance and state-of-the-art results even in low-overlap scenarios.
Abstract
Reviews and Discussion
This manuscript proposes a novel colored point cloud registration method that jointly incorporates color, coordinates, and 3DGS. Specifically, the method is built upon ColorPCR with three key improvements:
- Adding an MLP for better color feature extraction.
- Adding Gaussian representation of superpoint neighborhoods, which enhances feature extraction, position embedding during GeoTransformer-type superpoint attention, and differentiable rendering for pose refinement.
- Adding a differentiable rendering loss (i.e., photometric loss) for pose refinement.
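To make the third component concrete, here is a minimal NumPy sketch of a photometric (color-consistency) loss evaluated at a candidate pose. It is a simplified stand-in, not the paper's method: it compares each transformed source point's color with the color of its nearest target point instead of differentiably rendering Gaussians, and the function name `photometric_loss` and the brute-force nearest-neighbor search are illustrative assumptions.

```python
import numpy as np

def photometric_loss(src_xyz, src_rgb, tgt_xyz, tgt_rgb, R, t):
    """Mean squared RGB difference between each transformed source point and
    its nearest target point (hypothetical stand-in for a rendering-based loss)."""
    moved = src_xyz @ R.T + t                                       # apply candidate pose (R, t)
    d2 = ((moved[:, None, :] - tgt_xyz[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    nn = d2.argmin(axis=1)                                          # nearest target per source point
    diff = src_rgb - tgt_rgb[nn]
    return float((diff ** 2).sum(axis=1).mean())

# Usage: five points on a line with a red-channel ramp.
xyz = np.array([[float(i), 0.0, 0.0] for i in range(5)])
rgb = np.array([[i / 4.0, 0.0, 0.0] for i in range(5)])
print(photometric_loss(xyz, rgb, xyz, rgb, np.eye(3), np.zeros(3)))                 # 0.0 at the true pose
print(photometric_loss(xyz, rgb, xyz, rgb, np.eye(3), np.array([1.0, 0.0, 0.0])))   # positive when misaligned
```

A pose-refinement stage would minimize such a loss over (R, t); the brute-force distance matrix is only practical for small toy clouds.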
Strengths and Weaknesses
Strengths:
- The paper is well structured and written. It is easy to follow the authors’ ideas. The idea is original and moderately significant: applying 3DGS to superpoint matching is novel and possibly took inspiration from GMM-based registration methods. The paper also includes a theoretical proof of convergence guarantees for the second-stage pose refinement.
- The core idea of applying Gaussians during superpoint matching makes a lot of sense to me. A superpoint represents a small piece of local geometry that is highly likely to be anisotropic. This property can be well captured with the Mahalanobis distance instead of the Euclidean distance, in favor of superpoint matching and alignment.
- The design space is well explored, although I regret the addition of many tunable parameters.
Weaknesses:
- While performance increments are consistent, the tested benchmarks are quite saturated, and the absolute improvements are not significant.
- Ablation experiments are not well conducted, which IMHO is very important for validating the effectiveness of all your added components. See Q2 for details.
- Many niche designs have lots of parameters but lack corresponding ablation experiments. See Q3 for details.
Advice (not a strength or weakness):
- It is viable to use KITTI-360 for coloring surround-view KITTI point clouds, although this would create an alternative benchmark and requires re-running the baselines. I would also recommend using other autonomous-driving datasets such as nuScenes or Waymo with surround-view color images.
Questions
- Design of Eq. 5 is questionable. The concatenated components have different meanings along different dimensions, but are compressed together with a single top-k operation. After the top-k, the model therefore cannot know the meaning of a specific dimension, which is likely to cause confusion.
- Ablation is under-optimized and needs explanation. Here I assume that your work is completely based on ColorPCR (which I think is self-evident, but please point out if otherwise). While the ablation only removes single components from GeGS-PCR, in retrospect we can also view the ablated models as incomplete increments from ColorPCR to GeGS-PCR. However, all the ablated models perform much worse than ColorPCR, comparing the ablation figures in Table 3 and Table 8 against the ColorPCR figures in Tab. 2. For example, does Tab. 3 line (a) imply that ColorPCR + Color Decoder + 3DGS + Geometric PE deteriorates registration performance compared to barebone ColorPCR? Maybe you can address this concern by adding a fair baseline to Tab. 3 and describing the ablated components more clearly.
- Many designs are not ablated. E.g., does the design of Eq. 3 achieve the optimal color encoding performance, compared to using no alpha, or applying alpha on all layers? What is the effect of tuning lambda in Eq. 4, k in Eq. 5, lambda in L213 (duplicate notation), gamma in Eq. 6, gamma in Eq. 10 (duplicate notation), lambda in Eq. 11 (2nd duplicate notation), \sigma_{GS} in Eq. 19, and lambda in Eq. 21 (3rd duplicate notation)? The list continues. Maybe you can address this concern by both reducing the number of manually tuned parameters and running ablations where necessary. For less important parameters, you can simply describe feasible value ranges from your experience. However, leaving all those parameters with no value specification is not acceptable.
Overall, the paper presents a novel idea that seems to work consistently across experiments. However, the severely insufficient and inconsistent ablation greatly undermines the credibility of these experimental results, causing me to rate it borderline reject. I would be happy to raise my rating if my major concerns are addressed.
Limitations
Yes
Final Justification
Currently, my main concern about ablation consistency is solved and I would like to raise my rating to 4 (borderline accept). I would like to further engage in discussion and keep the right to change the figures later.
Paper Formatting Concerns
- Appendix L495: ‘3DGS-Self-Attention’ should be a separate paragraph.
- Figure 2: What is the term for processing 3DGS features, Decoder (Fig. 2) or Encoder (L183)?
- There are many duplicate notations for tunable parameters. See Q3 for examples.
We are grateful for the reviewer’s deep understanding and highly value their expertise in the relevant areas.
S.1: The property can be well captured with Mahalanobis distance instead of Euclidean distance in favor of superpoint matching and alignment.
Table 1 compares Mahalanobis and Euclidean distances in coarse registration. The Mahalanobis distance with anisotropic modeling improves accuracy in scenes with strong anisotropy, while the Euclidean distance works well in regular or symmetric scenes at lower cost. Our method will adaptively select the distance metric based on scene characteristics to balance accuracy and efficiency.
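To illustrate the anisotropy argument, the following NumPy sketch fits a Gaussian (mean and covariance) to a superpoint's neighbor points and compares Euclidean and Mahalanobis distances for two offsets of equal length. The elongated toy neighborhood, the covariance regularizer `eps`, and the helper names are illustrative assumptions, not the GeGS-PCR implementation.

```python
import numpy as np

def neighborhood_gaussian(points, eps=1e-6):
    """Fit a 3D Gaussian (mean, covariance) to a superpoint's neighbor points."""
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False) + eps * np.eye(3)  # small ridge keeps cov invertible
    return mu, cov

def euclidean(x, mu):
    return float(np.linalg.norm(x - mu))

def mahalanobis(x, mu, cov):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))    # sqrt(d^T Sigma^{-1} d)

# Toy neighborhood elongated along x (e.g., a thin edge or pole).
xs = np.linspace(-1.0, 1.0, 21)
offsets = [(0.05, 0.05), (0.05, -0.05), (-0.05, 0.05), (-0.05, -0.05)]
pts = np.array([[x, oy, oz] for x in xs for oy, oz in offsets])
mu, cov = neighborhood_gaussian(pts)

along = np.array([0.5, 0.0, 0.0])    # offset along the principal axis
across = np.array([0.0, 0.5, 0.0])   # same-length offset across it
print(euclidean(along, mu), euclidean(across, mu))               # both about 0.5
print(mahalanobis(along, mu, cov), mahalanobis(across, mu, cov)) # "across" is far larger
```

The Euclidean distance cannot tell the two offsets apart, while the Mahalanobis distance penalizes the offset that leaves the local structure, which is the behavior the anisotropic modeling relies on.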
Table 1. Comparison of Superpoint Matching Strategies
| Method | RR-5cm(%) | RR-10cm(%) | InlierRatio(%) | FeatureMatchingRecall(%) |
|---|---|---|---|---|
| EuclideanDistance | 67.2±0.8 | 88.1±0.5 | 58.1±1.2 | 64.7±1.0 |
| MahalanobisDistance | 71.9±0.6 | 82.4±0.7 | 65.9±0.9 | 72.2±0.7 |
W.1: The absolute improvements are not significant.
Limited gains were observed on C3DM, where ColorPCR already performs well due to high overlap and rich textures. GeGS-PCR targets low-overlap, sparse, and ambiguous scenes, showing stronger adaptability on challenging datasets such as C3DLoMatch and ColorKITTI. Preliminary tuning suggests that further improvements are possible with more resources, but time constraints prevented full hyperparameter optimization.
Advice (neither a weakness nor a strength): It is viable to use KITTI-360 to color surround-view KITTI point clouds.
To evaluate the downsampling strategy in real-world complex scenes, we analyzed three datasets: ScanNet, Waymo, and KITTI-360.
- ScanNet consists mainly of indoor scenes with uniform point density and simple geometry, lacking the challenges of uneven density and color disturbances.
- Waymo provides RGB data but lacks accurate LiDAR registration ground truth.
- KITTI-360 offers large-scale outdoor point clouds in real urban environments, with highly uneven density (dense foreground and sparse background) and well-aligned RGB and LiDAR data.
Table 2 shows that GeGS-PCR outperforms the baselines on KITTI-360, demonstrating the effectiveness of Geometric-3DGS (the colorized KITTI-360 data will be released).
We will mitigate information loss in high-density areas through local enhancement strategies such as region weighting and multi-scale geometric features.
Table 2. Registration results on CKitti-360
| Method | IR(%) | RR(%) | Total-time(s) | RRE(°) | RTE(cm) |
|---|---|---|---|---|---|
| YOHO | 10.99 | 48.13 | 8.31 | ~25.9 | ~0.55 |
| SpinNet | 14.86 | 33.41 | 73.1 | ~9.9 | ~0.47 |
| Geotransformer | 8.4 | 25.30 | 1.66 | ~7.4 | ~0.37 |
| ColorPCR | 12.5 | 45.01 | 1.29 | ~1.58 | ~0.35 |
| Ours | 15.03 | 56.88 | 1.18 | ~1.26 | ~0.28 |
Q.1: Design of top-k is questionable.
In our implementation, we mitigate this by first computing top-k scores separately for the geometry and color modalities, then normalizing and weighting them before the final selection from the combined candidate set. This design balances simplicity and performance, and the normalization aligns scale differences between the modalities.
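One possible reading of this per-modality selection is sketched below in NumPy: take the top-k candidates by geometric score and by color score separately, min-max normalize each score vector so the scales are comparable, and pick the final k from the union by a weighted sum. The weights `w_geo`/`w_col`, min-max normalization, and the function names are assumptions for illustration; the actual normalization and weighting in GeGS-PCR may differ.

```python
import numpy as np

def minmax(s):
    """Rescale scores to [0, 1]; a constant vector maps to zeros."""
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s, dtype=float)

def fused_topk(geo_scores, col_scores, k, w_geo=0.6, w_col=0.4):
    g_idx = np.argsort(-geo_scores)[:k]      # top-k by geometry
    c_idx = np.argsort(-col_scores)[:k]      # top-k by color
    cand = np.union1d(g_idx, c_idx)          # combined candidate set
    fused = w_geo * minmax(geo_scores)[cand] + w_col * minmax(col_scores)[cand]
    return cand[np.argsort(-fused, kind="stable")[:k]]

geo = np.array([0.9, 0.1, 0.8, 0.2])       # geometric scores in [0, 1]
col = np.array([10.0, 50.0, 20.0, 40.0])   # color scores on a different scale
print(fused_topk(geo, col, k=2))           # [2 0]
```

Without the normalization, the color scores (tens) would swamp the geometric scores (below 1) in the weighted sum; candidate 2, which is strong in both modalities, wins only once the scales are aligned.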
Tables 3 and 4 show that this strategy is effective and robust across datasets. In future work, we plan to introduce a semantic alignment module (e.g., a learnable projection) before fusion to improve feature consistency.
Table 3. Quantified performance on ColorKitti.
| Model | Model Time (s) | Pose Time (s) | Total Time (s) | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|---|---|
| SpinNet | 6.025 | 0.039 | 6.064 | 60.64 | 0.016 | 3-4 |
| Predator | 0.008 | 0.520 | 0.528 | 52.8 | 1.893 | 3-4 |
| GeoTransformer | 0.018 | 0.322 | 0.340 | 14.5 | 2.913 | 3-4 |
| ColorPCR | 0.016 | 0.550 | 0.566 | 18.6 | 2.263 | 3-4 |
| GeGS-PCR (ours) | 0.020 | 0.327 | 0.347 | 15.3 | 2.540 | 2-4 |
Table 4. Quantified performance on C3DM.
| Model | #Sample | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|
| FCGF | 5000 | 33.78 | 0.296 | 3-4 |
| D3Feat | 5000 | 31.12 | 0.321 | 3-4 |
| SpinNet | 5000 | 60.646 | 0.016 | 3-4 |
| Predator | 5000 | 51.52 | 0.194 | 3-4 |
| CoFiNet | 5000 | 17.50 | 0.544 | 3-4 |
| GeoTransformer | 5000 | 15.33 | 0.613 | 2-4 |
| PEAL | 5000 | 18.65 | 0.537 | 3-4 |
| ColorPCR | 5000 | 17.12 | 0.585 | 3-4 |
| Ours | 5000 | 16.72 | 0.588 | 2-4 |
| CoFiNet | all | 19.1 | 0.699 | 3-4 |
| GeoTransformer | all | 15.4 | 0.953 | 2-4 |
| PEAL | all | 17.3 | 0.752 | 3-4 |
| ColorPCR | all | 18.8 | 0.775 | 3-4 |
| Ours | all | 16.8 | 1.010 | 2-4 |
We evaluated the impact of noise on the Top-k mechanism through color disturbance tests (Table 5). Despite random ±10% hue shifts in HSV space across all datasets, GeGS-PCR showed minimal performance drop and outperformed color-dependent methods like ColorPCR. These results confirm that the Top-k mechanism preserves geometric structure and performs well even under noise.
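The hue-shift perturbation can be reproduced with a short Python sketch using the standard-library `colorsys` module. We assume here that "±10%" means a uniform offset of up to 0.1 on the [0, 1) hue circle, drawn once per point cloud (mimicking a camera white-balance mismatch), with saturation and value left unchanged; the exact sampling used for Table 5 is not stated, so treat this as an illustration only.

```python
import colorsys
import random

def random_hue_shift(rgb_colors, max_shift=0.10, seed=None):
    """Apply one random hue offset in [-max_shift, max_shift] (fraction of the
    hue circle) to every RGB color (floats in [0, 1]); S and V are preserved."""
    rng = random.Random(seed)
    dh = rng.uniform(-max_shift, max_shift)   # one shift per point cloud (assumption)
    shifted = []
    for r, g, b in rgb_colors:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        shifted.append(colorsys.hsv_to_rgb((h + dh) % 1.0, s, v))  # wrap around the circle
    return shifted

cloud_colors = [(1.0, 0.0, 0.0), (0.2, 0.6, 0.4), (0.5, 0.5, 0.5)]
print(random_hue_shift(cloud_colors, seed=0))
```

Gray points (zero saturation) have no hue and are unaffected, so such a perturbation mainly stresses color-dependent matching on saturated surfaces.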
Table 5. Registration results under random hue shift (±10%) on all datasets.
| Method | C3DM-5000(NoShift) | C3DM-5000(HueShift) | Drop(%) | C3DLM-5000(NoShift) | C3DLM-5000(HueShift) | Drop(%) | KITTI(NoShift) | KITTI(HueShift) | Drop(%) |
|---|---|---|---|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | 88.5±0.3 | 84.7±0.4 | -3.8 | 95.4±0.2 | 93.2±0.3 | -2.2 |
| ColorPCR | 97.5±0.2 | 92.7±0.3 | -4.8 | 93.8±0.3 | 89.1±0.3 | -4.7 | 96.1±0.3 | 91.8±0.4 | -4.3 |
| Ours(GeGS-PCR) | 97.9±0.1 | 96.8±0.2 | -1.1 | 94.7±0.2 | 93.5±0.2 | -1.2 | 99.7±0.2 | 98.9±0.5 | -0.5 |
Q.2: Ablation is under-optimized and needs explanation.
Table 3 shows results for our complete Geometric-3DGS model, not a combination with ColorPCR. These experiments test the synergy and robustness of our components, not all possible combinations against ColorPCR (Table 2). Modules like 3DGS need joint training with differentiable rendering for optimal performance. We conducted standalone ablation studies (Table 6) based on ColorPCR, confirming that our modules consistently improve performance, even with weaker baselines.
Table 6. Ablation results based on ColorPCR baseline.
| Method | C3DM-PIR(%) | C3DM-FMR(%) | C3DM-IR(%) | C3DM-RR(%) | C3DLM-PIR(%) | C3DLM-FMR(%) | C3DLM-IR(%) | C3DLM-RR(%) |
|---|---|---|---|---|---|---|---|---|
| (a)ColorPCR(baseline) | 89.2 | 99.5 | 80.5 | 96.5 | 62.7 | 96.5 | 56.6 | 88.5 |
| (b)w/o ColorEncoder | 89.4 | 99.5 | 80.6 | 96.6 | 62.8 | 96.6 | 56.8 | 88.6 |
| (c)w/o 3DGS | 89.5 | 99.6 | 80.7 | 96.7 | 63.0 | 96.7 | 56.9 | 88.8 |
| (d)w/o differentiable rendering | 89.6 | 99.5 | 80.8 | 96.8 | 63.1 | 96.8 | 57.0 | 88.9 |
| (e)w/o color | 86.1 | 97.9 | 77.3 | 92.7 | 55.2 | 89.8 | 46.3 | 77.9 |
| (f)w/o Geometric PE | 88.8 | 99.3 | 80.2 | 96.3 | 62.4 | 96.0 | 56.4 | 88.0 |
| (g) w/o LoRA | 88.9 | 99.3 | 80.1 | 96.2 | 62.4 | 96.1 | 56.3 | 88.4 |
| (h)Geometric-3DGS (all) | 90.0 | 99.6 | 82.4 | 97.6 | 63.9 | 97.4 | 58.7 | 90.2 |
The results show no performance drop after adding the GeGS-PCR modules; minor fluctuations are due to module removal. We clarified in the table caption that all methods start from ColorPCR, and added explanations in the text to clarify each row and comparison target, preventing misunderstandings.
Q.3: Many designs need to be ablated.
1. Ablation of α (Eq. 3)
Our model does not explicitly model color confidence weights, but its Geometric-Color fusion module implicitly learns modality confidence (e.g., via α in Eq. 3). To improve robustness under severe hue shifts (±10%), we calculate local color stability via RMSE and dynamically adjust α, boosting the color influence in stable areas and reducing it in noisy regions.
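A minimal NumPy sketch of this idea follows: measure local color stability as the RMSE between a point's color and the colors of its k nearest neighbors, map it to a confidence α in (0, 1] with an exponential, and scale the color features by α before adding them to the geometric features. The exponential mapping, the temperature `tau`, additive fusion, brute-force neighbor search, and all function names are assumptions; the text above does not specify how the RMSE is turned into α.

```python
import numpy as np

def color_confidence(xyz, rgb, k=8, tau=0.1):
    """Per-point alpha in (0, 1]: close to 1 where colors are locally stable."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]                   # k nearest, skipping the point itself
    rmse = np.sqrt(((rgb[nn] - rgb[:, None, :]) ** 2).sum(-1).mean(axis=1))
    return np.exp(-rmse / tau)                                # high RMSE -> low confidence

def fuse(geo_feat, col_feat, alpha):
    """Boost color features in stable regions, suppress them in noisy ones."""
    return geo_feat + alpha[:, None] * col_feat

xyz = np.column_stack([np.arange(10.0), np.zeros(10), np.zeros(10)])
stable = np.full((10, 3), 0.5)     # uniform color: alpha equals 1
noisy = stable.copy()
noisy[::2] = 0.9                   # alternating colors: alpha below 1
print(color_confidence(xyz, stable, k=3))
print(color_confidence(xyz, noisy, k=3))
```

The sketch assumes distinct point positions so that each point's own zero distance sorts first; a KD-tree would replace the O(N²) distance matrix for real scans.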
Table 7 shows that GeoTransformer struggles under hue shifts, and that using a fixed α or applying α only at the last layer worsens performance. Applying α at all layers maintains stability, while GeGS-PCR with color confidence weighting is the most robust, with a minimal performance drop.
Future work will explore explicit color confidence modeling and enhanced multimodal fusion.
Table 7. Registration results under color confidence weights.
| Method | NoHueShift | HueShift | Drop(%) | Remarks | ColorConfidenceInfo |
|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | No color | No color confidence |
| Ours (no α) | 96.9±0.1 | 95.1±0.2 | -1.7 | No color confidence, no dynamic fusion | Static color feature usage, no confidence weights |
| Ours (full-layer α) | 97.6±0.1 | 96.2±0.3 | -1.4 | α applied at all layers | No explicit confidence weighting, α for all layers |
| Ours (final-layer α) | 92.5±0.3 | 91.0±0.3 | -1.5 | α applied to final-layer features only | No color, geometric features only |
| Ours (w/o ) | 97.6±0.2 | 96.5±0.3 | -1.0 | With dynamic color confidence | Color confidence dynamically adjusted, weight learned |
| Ours | 97.9±0.2 | 96.8±0.3 | -1.0 | α applied as in Eq. 3 | Color features weighted with α in final layer |
2. Parameter ablation
We sincerely thank the reviewer for the feedback on the design details. We conducted ablation experiments and verified the tuning range for these symbols (Table 8).
These parameters were initially designed for flexibility across tasks, with tuning tailored per scenario. Following the reviewer’s suggestion, we plan to reduce redundant parameters to improve robustness and practicality, and will design an adaptive adjustment mechanism in the revised version.
Due to time constraints, full optimization of all parameters was not completed. However, we are confident that further tuning in future versions will improve performance and better showcase our method’s strengths. We thank the reviewer for their understanding and guidance.
Table 8. Parameter ablation study (RR for each tested value)
| Parameter | Value Range | Tested Values | RR (per value) | Conclusion |
|---|---|---|---|---|
| λ (Eq. 4) | [0.01, 1.0] | 0.01, 0.05, 0.1, 0.5, 1.0 | 0.91, 0.96, 0.90, 0.91, 0.92 | Stable (0.05) |
| k (Eq. 5) | [2, 5] | 2, 3, 4, 5 | 0.87, 0.97, 0.97, 0.89 | Tunable |
| λ (L213) | [0.05, 0.2] | 0.05, 0.1, 0.15, 0.2 | 0.88, 0.97, 0.97, 0.91 | Tunable |
| γ (Eq. 6) | [1, 3] | 1, 1.5, 2, 3 | 0.97, 0.92, 0.92, 0.89 | Tunable |
| γ (Eq. 10) | [1, 3] | 1, 1.5, 2, 3 | 0.89, 0.97, 0.97, 0.91 | Tunable |
| λ (Eq. 11) | [0.1, 1] | 0.1, 0.3, 0.7, 1.0 | 0.88, 0.97, 0.97, 0.85 | Tunable |
| σ_GS (Eq. 19) | [0.01, 0.1] | 0.01, 0.03, 0.07, 0.1 | 0.86, 0.96, 0.96, 0.88 | Tunable |
| λ (Eq. 21) | [0.01, 1.0] | 0.1, 0.5, 0.8, 1.0 | 0.90, 0.97, 0.96, 0.92 | Stable (0.5) |
Paper Formatting Concerns
We sincerely thank the reviewer for the valuable suggestions. We revised the layout, standardized the naming as “Encoder,” and unified the symbols throughout the paper and appendix.
Thank you very much for providing such a long and insightful reply. Some of my concerns are addressed as follows:
Q2. Ablation experiment not complete: A table consistent with Tab. 1 in the paper is provided, with proper and reasonable performance increments with module additions.
Q3. Many design choices need ablation: The authors provided a detailed ablation of the adaptive fusion weight (Eq. 3). Value specifications of all other parameters are provided, and their notations are now distinguished from one another.
However, the author response and the evidence supporting the reply to Q1 do not convince me. Tab. 3 and Tab. 4 demonstrate inference speed and cost, while Tab. 5 shows the method's robustness to hue shifts; none of them shows the effect of removing or changing k in the top-k operation of Eq. 5. While this is indeed a minor problem, I encourage the authors to conduct related experiments in the revision.
Currently, my main concern about ablation consistency is solved and I would like to raise my rating to 4 (borderline accept).
We sincerely appreciate your recognition of our work and your revised rating. We will include the ablation study on the top-k operation in our revised version. Thank you once again for your constructive suggestions on our work.
The paper introduces GeGS-PCR, a novel two-stage method for 3D point cloud registration that integrates geometric, color, and Gaussian (3DGS) information to achieve robust performance, particularly in low-overlap and incomplete scenarios. The method employs a dedicated color encoder to extract noise-robust color features, a Geometric-3DGS module to encode local neighborhood information of superpoints, and a joint photometric loss with differentiable rendering for refined registration. The authors also create a new dataset, ColorKitti, by colorizing the Kitti dataset to validate the method's generalization. Experimental results on Color3DMatch, Color3DLoMatch, and ColorKitti demonstrate state-of-the-art performance.
Strengths and Weaknesses
Strengths:
- The paper presents a high-quality contribution with rigorous experimental validation. The proposed GeGS-PCR method achieves improvements over state-of-the-art methods (e.g., ColorPCR, GeoTransformer) across multiple metrics (FMR, IR, RR, RRE, RTE) on Color3DMatch and Color3DLoMatch datasets. The creation of ColorKitti enhances the evaluation scope, addressing the scarcity of colored point cloud datasets. The reproducibility is well-supported with open-access code, datasets, and detailed experimental settings.
- The paper is well-structured, with clear descriptions of the problem, method, and experimental setup. The pipeline (Fig. 2) effectively illustrates the coarse-to-fine approach, and mathematical formulations (e.g., covariance matrix, photometric loss) are precise and well-explained. The use of figures (e.g., Fig. 3, 7, 8) and tables (e.g., Tables 1, 2, 7, 8) enhances understanding of qualitative and quantitative results.
Weaknesses:
- The performance improvement over ColorPCR, while notable, is marginal in some metrics (e.g., 0.4% increase in RR on C3DM, 1.3% in IR), raising questions about the practical significance of the added complexity. The computational complexity of the Geometric-3DGS module, particularly in high-overlap scenarios, is acknowledged but not quantified. The paper lacks a detailed comparison of computational resources (e.g., GPU memory usage, training time) against baselines, which is critical for assessing practical deployability. Additionally, the reliance on superpoint downsampling may limit performance in scenarios with irregular point density, which is not explored. The ablation study in Table 3 is not sufficiently clear, making it difficult to discern the precise impact of each component (e.g., color encoder vs. 3DGS) due to overlapping metric improvements and lack of detailed analysis.
- The significance is somewhat limited by the focus on specific datasets (Color3DMatch, Color3DLoMatch, ColorKitti), with no evaluation on other real-world datasets (e.g., ScanNet, Waymo). The marginal performance gains over ColorPCR may reduce the perceived impact, especially given the increased computational complexity.
Questions
See Weaknesses.
Limitations
The authors adequately address the limitations and potential negative societal impacts. They acknowledge that GeGS-PCR’s reliance on superpoint downsampling leads to high memory and computational overhead in high-overlap scenarios and note the limitation of focusing on superpoint-level rather than scene-level registration. These are discussed in Section A.6, with future directions proposed (e.g., leveraging semantic scene understanding). The societal impact section highlights positive applications (e.g., autonomous navigation, urban planning) and negative risks (e.g., privacy concerns in surveillance, data manipulation), with mitigation strategies like gated releases suggested. The transparency in addressing these points is commendable and aligns with ethical research practices.
Paper Formatting Concerns
No.
We truly appreciate the reviewer’s constructive feedback and professional analysis.
W.1: Detailed Performance Evaluation
1. Improvement is marginal in some metrics
The performance improvement on the C3DM dataset is relatively small because it contains scenes with high viewpoint overlap and rich texture, where methods like ColorPCR already perform well. In contrast, GeGS-PCR is tailored for more challenging scenes with low overlap, sparse geometry, and ambiguous or varied color textures. Therefore, improvements are more significant on datasets like C3DLoMatch and ColorKITTI. Our geometric-3DGS module and color-guided encoder provide clear advantages in these difficult conditions.
Overall, our method demonstrates stronger adaptability and robustness in realistic and complex scenarios, which are more relevant to practical applications. Due to submission deadlines, hyperparameter tuning was limited; initial experiments indicate that further improvements are possible with more thorough tuning and additional computational resources.
2. Quantified GPU memory usage and training time
Our paper includes high-overlap scenes with visual overlaps of 60.8%, 79.7%, 41.8%, and 82.5% (see the qualitative results). While our focus is on low-overlap scenes, the computational cost of Geometric-3DGS in high-overlap cases requires further evaluation.
Furthermore, we acknowledge that the original version of our manuscript lacked a detailed quantification of the runtime memory overhead of 3DGS. To address this, we have conducted additional performance evaluations to demonstrate the deployment feasibility of GeGS-PCR in practical scenarios. The results are summarized in Tables 1 and 2, including:
- FPS: 2.540 on the ColorKitti dataset, 1.010 on the C3DM dataset
- Latency: 15.3 ms on ColorKitti, 16.8 ms on C3DM
- GPU Memory Usage: 2–4 GB on both datasets
- Lower model and pose estimation time on ColorKitti compared to competing methods
All measurements were taken on an RTX 4070 GPU, and the results indicate that GeGS-PCR is practical to deploy and competitive with these approaches.
Table 1. Quantified performance on ColorKitti.
| Model | Model Time (s) | Pose Time (s) | Total Time (s) | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|---|---|
| SpinNet | 6.025 | 0.039 | 6.064 | 60.64 | 0.016 | 3-4 |
| Predator | 0.008 | 0.520 | 0.528 | 52.8 | 1.893 | 3-4 |
| GeoTransformer | 0.018 | 0.322 | 0.340 | 14.5 | 2.913 | 3-4 |
| ColorPCR | 0.016 | 0.550 | 0.566 | 18.6 | 2.263 | 3-4 |
| GeGS-PCR (ours) | 0.020 | 0.327 | 0.347 | 15.3 | 2.540 | 2-4 |
Table 2. Quantified performance on C3DM.
| Model | #Sample | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|
| FCGF | 5000 | 33.78 | 0.296 | 3-4 |
| D3Feat | 5000 | 31.12 | 0.321 | 3-4 |
| SpinNet | 5000 | 60.646 | 0.016 | 3-4 |
| Predator | 5000 | 51.52 | 0.194 | 3-4 |
| CoFiNet | 5000 | 17.50 | 0.544 | 3-4 |
| GeoTransformer | 5000 | 15.33 | 0.613 | 2-4 |
| PEAL | 5000 | 18.65 | 0.537 | 3-4 |
| ColorPCR | 5000 | 17.12 | 0.585 | 3-4 |
| Ours | 5000 | 16.72 | 0.588 | 2-4 |
| CoFiNet | all | 19.1 | 0.699 | 3-4 |
| GeoTransformer | all | 15.4 | 0.953 | 2-4 |
| PEAL | all | 17.3 | 0.752 | 3-4 |
| ColorPCR | all | 18.8 | 0.775 | 3-4 |
| Ours | all | 16.8 | 1.010 | 2-4 |
3. Reliance on superpoint downsampling may limit performance
We acknowledge that superpoint downsampling may cause local information loss in high-density regions, potentially affecting registration accuracy. Our method has mainly been tested on datasets (C3DM, C3DLoMatch, ColorKITTI) where point cloud distributions are relatively regular. However, in more complex scenarios, such as cluttered indoor environments, vegetation, or occlusions, point densities can be irregular. The C3DLoMatch dataset contains occluded, low-overlap fragments with irregular densities, yet our method still performs well, indicating a degree of robustness in GeGS-PCR. Similar methods such as GeoTransformer and ColorPCR also use superpoint downsampling without showing evident performance limitations on these benchmarks.
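To illustrate why downsampling hits dense regions hardest, here is a minimal NumPy sketch of voxel-grid downsampling, in which all points in a voxel collapse to their centroid. Treating superpoint extraction as a single voxel-centroid step is a simplifying assumption (KPConv-style backbones, as used by GeoTransformer, subsample over several levels), and the voxel size and toy scene are illustrative.

```python
import numpy as np

def voxel_downsample(points, voxel):
    """Replace all points falling in the same voxel by their centroid.
    Returns the centroids and how many input points each one absorbed."""
    keys = np.floor(points / voxel).astype(np.int64)
    uniq, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)                  # flatten: shape varies across NumPy versions
    sums = np.zeros((len(uniq), points.shape[1]))
    np.add.at(sums, inverse, points)               # accumulate points per voxel
    return sums / counts[:, None], counts

# Dense foreground patch (100 points) plus a sparse background (5 points).
dense = np.array([[0.1 * i, 0.1 * j, 0.0] for i in range(10) for j in range(10)])
sparse = np.array([[20.0 + 5.0 * i, 0.0, 0.0] for i in range(5)])
centroids, counts = voxel_downsample(np.vstack([dense, sparse]), voxel=1.0)
print(len(centroids), counts)   # 6 voxels; the dense patch collapses into one
```

The 100 foreground points, which carry most of the local detail, survive as a single centroid, while each sparse background point survives on its own; region weighting or multi-scale features aim to recover what such a collapse discards.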
To evaluate the downsampling strategy in real-world complex scenes, we analyzed three datasets: ScanNet, Waymo, and KITTI-360.
- ScanNet consists mainly of indoor scenes with uniform point density and simple geometry, lacking the challenges of uneven density and color disturbances.
- Waymo provides RGB data but lacks accurate LiDAR registration ground truth; it has a large data volume and significant viewpoint differences between sensors, making it unsuitable for registration evaluation.
- KITTI-360 offers large-scale outdoor point clouds in real urban environments with highly uneven density (dense foreground and sparse background), well-aligned RGB and LiDAR data, and many geometrically similar but texturally distinct objects.
Therefore, we selected KITTI-360 for extended evaluation (the colorized KITTI-360 data will be released). The results in Table 3 show that our method outperforms ColorPCR and other baselines under color variations and in complex scenes, demonstrating the effectiveness of Geometric-3DGS.
In future work, we plan to mitigate information loss in high-density areas through local enhancement strategies such as region weighting and multi-scale geometric features.
Table 3. Registration results on CKitti-360
| Method | IR(%) | RR(%) | Total-time(s) | FPS | GPU Usage (GB) | RRE(°) | RTE(cm) |
|---|---|---|---|---|---|---|---|
| YOHO | 10.99 | 48.13 | 8.31 | 0.193 | 2-4 | ~25.9 | ~0.55 |
| SpinNet | 14.86 | 33.41 | 73.1 | 0.025 | 3-4 | ~9.9 | ~0.47 |
| Geotransformer | 8.4 | 25.30 | 1.06 | 0.997 | 2-4 | ~7.4 | ~0.37 |
| ColorPCR | 12.5 | 45.01 | 1.29 | 0.875 | 3-4 | ~1.58 | ~0.35 |
| Ours | 15.03 | 56.88 | 1.18 | 1.102 | 2-4 | ~1.26 | ~0.28 |
Nevertheless, we recognize that extending our method to the ScanNet and Waymo datasets would further demonstrate its applicability. Despite our best efforts, time constraints prevented us from testing on these datasets. We plan to incorporate them in future work, re-adapt the method, and run the necessary baseline methods for evaluation.
4. Ablation study is not sufficiently clear
The ablation results in Table 3 are based on our full Geometric-3DGS model, not combined with ColorPCR baseline. These tests verify the synergy and robustness of our system components, not the individual improvement of each module in all combinations, so direct comparison with Table 2 baselines isn’t possible.
Some modules (e.g., 3DGS) require joint training with differentiable rendering for best results. Removing parts like differentiable rendering weakens features and supervision, hurting fine registration (see Table 3a). We also studied removing color or the entire 3DGS component.
To disentangle the overlapping metric improvements, we ran independent ablations on ColorPCR (Table 4), controlling module order and data synergy. The results show consistent gains even on weaker baselines, indicating that our module design is sound, generalizable, and optimizable.
Table 4. Ablation results based on ColorPCR baseline
| Method | C3DM-PIR(%)↑ | C3DM-FMR(%)↑ | C3DM-IR(%)↑ | C3DM-RR(%)↑ | C3DLM-PIR(%)↑ | C3DLM-FMR(%)↑ | C3DLM-IR(%)↑ | C3DLM-RR(%)↑ |
|---|---|---|---|---|---|---|---|---|
| (a)ColorPCR(baseline) | 89.2 | 99.5 | 80.5 | 96.5 | 62.7 | 96.5 | 56.6 | 88.5 |
| (b)w/o ColorEncoder | 89.4 | 99.5 | 80.6 | 96.6 | 62.8 | 96.6 | 56.8 | 88.6 |
| (c)w/o 3DGS | 89.5 | 99.6 | 80.7 | 96.7 | 63.0 | 96.7 | 56.9 | 88.8 |
| (d)w/o differentiable rendering | 89.6 | 99.5 | 80.8 | 96.8 | 63.1 | 96.8 | 57.0 | 88.9 |
| (e)w/o color | 86.1 | 97.9 | 77.3 | 92.7 | 55.2 | 89.8 | 46.3 | 77.9 |
| (f)w/o Geometric PE | 88.8 | 99.3 | 80.2 | 96.3 | 62.4 | 96.0 | 56.4 | 88.0 |
| (g) w/o LoRA | 88.9 | 99.3 | 80.1 | 96.2 | 62.4 | 96.1 | 56.3 | 88.4 |
| (h)Geometric-3DGS (all) | 90.0 | 99.6 | 82.4 | 97.6 | 63.9 | 97.4 | 58.7 | 90.2 |
Based on these results, adding our modules does not degrade performance in GeGS-PCR. The slight fluctuations in the intermediate rows of the table are due to the removal of certain modules. Following the reviewer's suggestion, we have clarified in the table caption that all methods use ColorPCR as the starting point, and we have added explanations in the main text to clarify the meaning of each row and the comparison targets to avoid any misunderstandings.
W.2: The significance is limited by the focus on specific datasets, and performance needs to be quantified on real-world datasets
In addition to the real-world KITTI-360 evaluation in Table 3 (W.1), we supplemented the following experiments to systematically evaluate the impact of lighting conditions, sensor errors, and color distortions in real-world scenarios. Color Disturbance Robustness Experiment: we introduced severe random hue drift (±10%) in the HSV color space of ColorKitti, C3DM, and C3DLM to simulate common real-sensor noise, sensor mismatches, and color loss. The results in Table 5 show that, although the color-dependent ColorPCR degrades, GeGS-PCR maintains strong registration accuracy owing to the adaptive fusion design of its Geometric-3DGS backbone.
Across multiple datasets, GeGS-PCR shows stronger robustness and a smaller accuracy drop than existing methods. This indicates that coarse registration can be achieved using only the geometric branch, with color information primarily serving as a refinement signal when it is reliable.
Table 5. Registration results under random hue shift (±10%) on all datasets.
| Method | C3DM-5000(NoShift) | C3DM-5000(HueShift) | Drop(%) | C3DLM-5000(NoShift) | C3DLM-5000(HueShift) | Drop(%) | KITTI(NoShift) | KITTI(HueShift) | Drop(%) |
|---|---|---|---|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | 88.5±0.3 | 84.7±0.4 | -3.8 | 95.4±0.2 | 93.2±0.3 | -2.2 |
| ColorPCR | 97.5±0.2 | 92.7±0.3 | -4.8 | 93.8±0.3 | 89.1±0.3 | -4.7 | 96.1±0.3 | 91.8±0.4 | -4.3 |
| Ours(GeGS-PCR) | 97.9±0.1 | 96.8±0.2 | -1.1 | 94.7±0.2 | 93.5±0.2 | -1.2 | 99.7±0.2 | 98.9±0.5 | -0.5 |
Limitations
We acknowledge the absence of a dedicated limitations section in the current version. In future revisions, we will explicitly include a detailed discussion of the limitations of our method.
I appreciate the authors’ response. My questions have been adequately addressed, and I will retain my original score.
We sincerely thank the reviewer for meticulously reviewing our work and providing valuable suggestions. Your feedback has greatly improved the quality of our research, and we are truly grateful.
This paper addresses the challenge of robust 3D point cloud registration, particularly in scenarios with low overlap and incomplete geometry, by leveraging both geometric and color information. The authors propose GeGS-PCR, a novel two-stage method integrating a dedicated color encoder and a Geometric-3DGS (3D Gaussian Splatting) module. The approach tightly fuses geometric and color data via a dual-modal encoder and models the local neighborhood with 3D Gaussian parameters, further optimized by LoRA to reduce computational cost. A differentiable rendering and a joint photometric loss are introduced in the fine registration stage for further refinement. The method is validated on both indoor (Color3DMatch, Color3DLoMatch) and outdoor (ColorKitti) colored point cloud datasets, demonstrating state-of-the-art performance, especially under low-overlap conditions. Ablation studies and qualitative results are presented to justify each module's effectiveness.
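For readers less familiar with LoRA, the NumPy sketch below shows the standard low-rank adaptation of a linear layer: the pretrained weight W stays frozen and only two small matrices A and B are trained, so a 256×256 layer with rank r = 4 needs 2,048 trainable parameters instead of 65,536. The layer size, rank, scaling, and class name are illustrative assumptions; the review does not say which GeGS-PCR layers use LoRA or with what rank.

```python
import numpy as np

class LoRALinear:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trainable."""

    def __init__(self, d_in, d_out, r=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
        self.A = 0.01 * rng.standard_normal((r, d_in))               # trainable down-projection
        self.B = np.zeros((d_out, r))                                 # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

layer = LoRALinear(256, 256, r=4)
print(layer.trainable_params(), layer.W.size)   # 2048 65536
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only moves it within a rank-r subspace.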
Strengths And Weaknesses
Strengths:
Technical Novelty & Originality: The combination of geometric and color features at a deep level, the introduction of the Geometric-3DGS module, and the use of differentiable rendering for point cloud registration are all novel. The integration of LoRA for efficient parameterization is innovative and relevant for scalability.
Quality & Thoroughness: The methodology is clearly described, including mathematical formulation, network architecture, and optimization strategies. Both coarse and fine registration pipelines are well justified.
Empirical Evaluation: Extensive experiments on multiple benchmarks (Color3DMatch, Color3DLoMatch, ColorKitti) convincingly demonstrate the method's superiority, especially in low-overlap and noisy scenarios. Ablation studies support the necessity of each component.
Clarity: The paper is generally well-organized. Key ideas, architecture, and modules are well-illustrated and justified with both qualitative and quantitative results.
Reproducibility: Implementation details, hyperparameters, dataset splits, and a public code repository are provided. Training/test procedures are described clearly.
Weaknesses:
Scalability / Computational Cost: The paper acknowledges that using superpoints and 3DGS parameterization may incur significant memory and computation overhead in high-overlap or very large scenes, but lacks a deeper discussion or experiments on scalability to truly large-scale datasets or real-time constraints.
Limited Statistical Analysis: While results are comprehensive, the paper does not provide error bars or statistical significance analysis, making it harder to assess the robustness of the improvements.
Ablation on Individual Modalities: Although ablation is done, there is limited exploration into failure cases where color might hurt registration (e.g., severe color noise, lighting changes, or sensor mismatch), or where geometric features are insufficient.
Potential Overfitting to Synthetic Colorization: ColorKitti is constructed by colorizing KITTI point clouds via projection, which may not reflect real-world sensor noise or missing color. The generalizability to native colored LiDAR or different sensors is not verified.
Broader Impacts & Safeguards: The broader societal impact is discussed, but concrete safeguards or risk mitigation for data/model misuse are not detailed.
Questions
Scalability Experiments: Have the authors evaluated the memory and computation bottlenecks of GeGS-PCR on very large-scale or real-time datasets (e.g., full-scale outdoor driving datasets with hundreds of millions of points)? What strategies could further mitigate memory usage?
Robustness to Color Noise/Distortion: How does the method perform if the color channels are corrupted (e.g., illumination changes, sensor noise, partial missing color)? Would incorporating color confidence weights improve robustness?
Generalization Beyond Synthetic Colorization: Since ColorKitti is constructed by projecting color onto LiDAR points, have the authors tested the method on naturally colored point cloud datasets (e.g., from RGB-D sensors or mobile mapping platforms)? How does the method generalize to such data?
Failure Modes: Are there scenarios where color information may harm registration, for example, in the presence of repetitive textures or objects with similar color but different geometry? How does GeGS-PCR handle such cases?
Statistical Significance: Can the authors provide statistical analysis (e.g., multiple runs, error bars) to demonstrate the significance and robustness of their improvements?
Limitations
The authors have adequately discussed the limitations, especially regarding memory usage and computation overhead in high-overlap scenarios, as well as the dependency on superpoint extraction. They also note the need for scene-level registration and semantic integration in future work.
Final Justification
I appreciate the additional experiments and quantitative analyses, which address several of my initial concerns. Based on the author response and new evidence, I believe the authors have adequately addressed my main concerns. I will maintain my borderline accept (score: 4). Should further improvements be incorporated (e.g., more diverse datasets), the paper could be rated higher in the future.
Paper Formatting Concerns
No major formatting issues observed.
We sincerely thank the reviewer for their valuable suggestions and unique insights.
W.1: Scalability / Computational Cost
We acknowledge that the original version of our manuscript lacked a detailed quantification of the runtime and memory overhead of 3DGS. To address this, we conducted additional performance evaluations to demonstrate the deployment feasibility of GeGS-PCR in practical scenarios. The results are summarized in Tables 1 and 2:
- FPS: 2.94 on the ColorKitti dataset, 1.010 on the C3DM dataset
- Latency: 16.7 ms on ColorKitti, 15.3 ms on C3DM
- GPU Memory Usage: 2–4 GB on both datasets
- Lower total time on ColorKitti than SpinNet, Predator, and ColorPCR, and comparable to GeoTransformer
These results, measured on an RTX 4070 GPU, indicate that GeGS-PCR is practical to deploy and competitive with these approaches.
Table 1. Quantified performance on ColorKitti.
| Model | Model Time (s) | Pose Time (s) | Total Time (s) | Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|---|---|
| SpinNet | 6.025 | 0.039 | 6.064 | 60.64 | 0.016 | 3-4 |
| Predator | 0.008 | 0.520 | 0.528 | 52.8 | 1.893 | 3-4 |
| GeoTransformer | 0.018 | 0.322 | 0.340 | 14.5 | 2.913 | 3-4 |
| ColorPCR | 0.016 | 0.550 | 0.566 | 18.6 | 2.263 | 3-4 |
| GeGS-PCR (ours) | 0.020 | 0.327 | 0.347 | 15.3 | 2.540 | 2-4 |
Table 2. Quantified performance on C3DM.
| Model | #Sample | Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|
| FCGF | 5000 | 33.78 | 0.296 | 3-4 |
| D3Feat | 5000 | 31.12 | 0.321 | 3-4 |
| SpinNet | 5000 | 60.646 | 0.016 | 3-4 |
| Predator | 5000 | 51.52 | 0.194 | 3-4 |
| CoFiNet | 5000 | 17.50 | 0.544 | 3-4 |
| GeoTransformer | 5000 | 15.33 | 0.613 | 2-4 |
| PEAL | 5000 | 18.65 | 0.537 | 3-4 |
| ColorPCR | 5000 | 17.12 | 0.585 | 3-4 |
| Ours | 5000 | 16.72 | 0.588 | 2-4 |
| CoFiNet | all | 19.1 | 0.699 | 3-4 |
| GeoTransformer | all | 15.4 | 0.953 | 2-4 |
| PEAL | all | 17.3 | 0.752 | 3-4 |
| ColorPCR | all | 18.8 | 0.775 | 3-4 |
| Ours | all | 16.8 | 1.010 | 2-4 |
| Quantization (16-bit) | all | 14.6 | 1.072 | 1.5-3 |
| Sparse LoRA Attention | all | 15.1 | 1.022 | 1.6-3 |
| Downsampled 3DGS | all | 14.2 | 1.097 | 1.3-2.5 |
| Shared Color-Geo Encoder | all | 15.0 | 1.035 | 1.4-3 |
| Efficient Attention | all | 14.9 | 1.110 | 1.2-2.8 |
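The thread does not describe the protocol behind the latency and FPS columns. Below is a minimal standard-library sketch of one common protocol: warm-up runs are excluded, and wall-clock time is measured per registration pair. `benchmark` and its arguments are hypothetical.

```python
import statistics
import time
from typing import Any, Callable, Sequence, Tuple

def benchmark(fn: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 2) -> Tuple[float, float]:
    """Return (mean latency in ms, pairs per second) for fn applied to each input."""
    for x in inputs[:warmup]:
        fn(x)  # warm-up calls (caches, lazy init) are not timed
    times = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - start)
    mean_s = statistics.fmean(times)
    return mean_s * 1000.0, 1.0 / mean_s

# Usage with a dummy workload standing in for one registration call.
latency_ms, fps = benchmark(lambda n: sum(range(n)), [100_000] * 10)
print(f"{latency_ms:.3f} ms/pair, {fps:.1f} pairs/s")
```

GPU kernels launch asynchronously. A PyTorch pipeline would therefore call `torch.cuda.synchronize()` before reading the timer, and `torch.cuda.max_memory_allocated()` reports peak allocated memory. Under this single-pair protocol, FPS is the reciprocal of mean latency.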
We analyzed three candidate datasets (ScanNet, Waymo, and KITTI-360) to select a suitable extended evaluation set.
- ScanNet features indoor scenes with uniform point density and simple geometry, lacking challenges like non-uniform density and color disturbances.
- Waymo provides RGB images but lacks precise LiDAR registration ground truth, and its LiDAR-RGB alignment is poor, which limits fusion quality.
- KITTI-360 offers large-scale outdoor scenes with non-uniform density, well-aligned RGB-LiDAR data, and many geometrically similar but texturally distinct objects, making it ideal for robust registration evaluation.
We chose KITTI-360 for extended tests (the colorized KITTI-360 data will be released). As Table 3 shows, our method outperforms the baselines under color variations and in complex scenes.
KITTI-360 mainly contains daytime scenes with limited lighting variations; nighttime and semantic scene understanding are left for future work.
Table 3. Registration results on CKitti-360.
| Method | IR(%) | RR(%) | Total-time(s) | FPS | GPU Usage (GB) | RRE(°) | RTE(cm) |
|---|---|---|---|---|---|---|---|
| YOHO | 10.99 | 48.13 | 8.31 | 0.193 | 2-4 | ~25.9 | ~0.55 |
| SpinNet | 14.86 | 33.41 | 73.1 | 0.025 | 3-4 | ~9.9 | ~0.47 |
| Geotransformer | 8.4 | 25.30 | 1.06 | 0.997 | 2-4 | ~7.4 | ~0.37 |
| ColorPCR | 12.5 | 45.01 | 1.29 | 0.875 | 3-4 | ~1.58 | ~0.35 |
| Ours | 15.03 | 56.88 | 1.18 | 1.102 | 2-4 | ~1.26 | ~0.28 |
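This thread does not define the IR, RRE, and RTE columns. The sketch below uses the definitions common in GeoTransformer-style evaluations, which are assumed here and not confirmed for this paper:
- RRE is the angle of R_gtᵀR_est.
- RTE is the Euclidean distance between translations, in the unit of the inputs.
- IR is the fraction of putative correspondences whose residual under the ground-truth pose falls below `tau`. The value 0.1 m is the usual 3DMatch choice.

Function names are hypothetical.

```python
import numpy as np

def relative_rotation_error(R_est, R_gt):
    """Rotation error in degrees: the angle of the residual rotation R_gt^T R_est."""
    cos = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding

def relative_translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return float(np.linalg.norm(t_est - t_gt))

def inlier_ratio(src_corr, tgt_corr, R_gt, t_gt, tau=0.1):
    """Fraction of correspondences whose residual under the GT pose is below tau."""
    residual = np.linalg.norm(src_corr @ R_gt.T + t_gt - tgt_corr, axis=1)
    return float(np.mean(residual < tau))

# A 90-degree rotation about z has an RRE of 90 degrees against the identity.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(relative_rotation_error(Rz, np.eye(3)))
```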
Based on preliminary experiments (Table 2), we plan to reduce memory usage with the following strategies:
- 16-bit Quantization: Compresses weights from 32-bit to 16-bit, cutting GPU memory by ~30% and slightly speeding up inference.
- Sparse LoRA Attention: Prunes redundant attention parameters, saving ~20% memory without accuracy loss.
- Downsampled 3DGS: Uses stride-based downsampling to reduce memory by ~25% without accuracy drop.
- Shared Color-Geometry Encoder: Combines encoders to share weights, reducing memory by ~15%.
- Efficient Attention: Replaces standard attention with efficient approximations, further lowering memory and improving FPS.
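The first strategy can be checked with simple arithmetic. Storing a weight in 16 bits instead of 32 halves the memory of the weights themselves, at the cost of precision. The sketch below computes weight storage and round-trips a value through IEEE 754 half precision using the 'e' format of Python's `struct`. Helper names are hypothetical. One plausible reason the reported overall saving (~30%) is below 50% is that weight-only quantization does not shrink activations and other buffers.

```python
import struct

def weight_memory_bytes(num_params: int, bits: int) -> int:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits // 8

def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision ('e' format in struct).

    Half precision keeps roughly 3 significant decimal digits; magnitudes above 65504 overflow.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]

n = 10_000_000  # hypothetical parameter count, for illustration only
print(weight_memory_bytes(n, 32) / 2**20, "MiB in fp32")
print(weight_memory_bytes(n, 16) / 2**20, "MiB in fp16")
print(to_fp16(0.1))  # slightly off from 0.1 after rounding
```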
W.2: Limited Statistical Analysis
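We conducted ten independent runs with different random seeds to strengthen statistical reliability; Table 4 reports the mean and standard deviation to assess robustness.
Results show that GeGS-PCR achieves the lowest RMSE in every setting and matches or exceeds all other methods in RR, with the reported improvements significant at p < 0.05. Its IR is the highest in 7 of the 10 settings, with ColorPCR ahead on C3DLM-1000, C3DLM-500, and C3DLM-250. Higher RR and IR correspond to lower RMSE, and the RMSE standard deviation of GeGS-PCR stays at about 0.01 across runs, confirming its stability. Although error bars are omitted here due to space limits, a visualized statistical analysis will be added in the revised version.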
Table 4. Registration results (RR and IR in %; RMSE as mean ± std) on C3DM and C3DLM over 10 random seeds (p < 0.05).
| Method | Metric | C3DM-5000 | C3DM-2500 | C3DM-1000 | C3DM-500 | C3DM-250 | C3DLM-5000 | C3DLM-2500 | C3DLM-1000 | C3DLM-500 | C3DLM-250 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CoFiNet | RR | 98.1 | 98.3 | 98.1 | 98.2 | 98.3 | 83.1 | 83.5 | 83.3 | 83.1 | 82.6 |
| CoFiNet | IR | 49.8 | 51.2 | 51.9 | 52.2 | 52.2 | 24.4 | 25.9 | 26.7 | 26.8 | 26.9 |
| CoFiNet | RMSE | 0.156±0.01 | 0.159±0.01 | 0.161±0.01 | 0.164±0.01 | 0.167±0.01 | 0.248±0.01 | 0.253±0.01 | 0.256±0.01 | 0.261±0.01 | 0.264±0.01 |
| GeoTransformer | RR | 97.9 | 97.9 | 97.9 | 97.9 | 97.6 | 88.3 | 88.6 | 88.8 | 88.6 | 88.3 |
| GeoTransformer | IR | 71.9 | 75.2 | 76.0 | 82.2 | 85.1 | 43.5 | 45.3 | 46.2 | 52.9 | 57.7 |
| GeoTransformer | RMSE | 0.142±0.01 | 0.146±0.01 | 0.150±0.01 | 0.155±0.01 | 0.160±0.01 | 0.222±0.01 | 0.228±0.01 | 0.233±0.01 | 0.238±0.01 | 0.245±0.01 |
| PEAL | RR | 99.0 | 99.0 | 99.1 | 99.1 | 98.8 | 91.7 | 92.4 | 92.5 | 92.9 | 92.7 |
| PEAL | IR | 72.4 | 79.1 | 84.1 | 86.1 | 87.3 | 45.0 | 50.9 | 57.4 | 60.3 | 62.2 |
| PEAL | RMSE | 0.132±0.01 | 0.137±0.01 | 0.140±0.01 | 0.142±0.01 | 0.146±0.01 | 0.194±0.01 | 0.202±0.01 | 0.207±0.01 | 0.212±0.01 | 0.218±0.01 |
| ColorPCR | RR | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 | 96.5 | 96.5 | 97.0 | 97.0 | 96.7 |
| ColorPCR | IR | 75.0 | 80.5 | 84.7 | 86.5 | 87.8 | 51.2 | 56.6 | 63.1 | 66.0 | 68.0 |
| ColorPCR | RMSE | 0.118±0.01 | 0.123±0.01 | 0.126±0.01 | 0.129±0.01 | 0.133±0.01 | 0.182±0.01 | 0.188±0.01 | 0.194±0.01 | 0.200±0.01 | 0.205±0.01 |
| Ours | RR | 99.5 | 99.6 | 99.5 | 99.7 | 99.6 | 97.6 | 97.4 | 97.1 | 97.2 | 97.0 |
| Ours | IR | 76.3 | 82.4 | 86.3 | 86.6 | 89.1 | 53.4 | 58.0 | 60.3 | 63.4 | 65.5 |
| Ours | RMSE | 0.106±0.01 | 0.111±0.01 | 0.114±0.01 | 0.117±0.01 | 0.115±0.01 | 0.165±0.01 | 0.172±0.01 | 0.176±0.01 | 0.181±0.01 | 0.186±0.01 |
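The rebuttal does not state which test produced p < 0.05. The sketch below assumes per-seed scores for each method and an exact two-sided permutation test on the difference of means. A Welch t-test would be a common alternative. Function names are hypothetical, and the usage numbers are toy values rather than results from the paper. With 10 runs per method, the enumeration covers C(20, 10) = 184,756 splits, which stays tractable.

```python
import itertools
import statistics

def mean_std(runs):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.fmean(runs), statistics.stdev(runs)

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of means of a and b."""
    pooled = list(a) + list(b)
    n_a, n_total = len(a), len(pooled)
    observed = abs(statistics.fmean(b) - statistics.fmean(a))
    total = sum(pooled)
    extreme = 0
    splits = 0
    # Enumerate every way to relabel n_a of the pooled scores as group "a".
    for idx in itertools.combinations(range(n_total), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs((total - s_a) / (n_total - n_a) - s_a / n_a)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        splits += 1
    return extreme / splits

print(mean_std([1.0, 2.0, 3.0]))                   # (2.0, 1.0)
print(permutation_p_value([1, 2, 3], [4, 5, 6]))   # 0.1
```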
W.3: Ablation on Individual Modalities
To evaluate the effects of lighting changes, sensor errors, and color distortions, we conducted the following tests:
- Color disturbance with ±10% hue drift on ColorKitti, C3DM, C3DLM, showing GeGS-PCR’s robustness over color-dependent methods (Table 5);
- Geometric insufficiency using color-similar but geometrically different 3DMatch scenes;
- Color-ambiguous scene experiments (Table 6), in which GeGS-PCR's adaptive fusion with a learnable fusion weight balances geometry and color for stable registration.
Results indicate that GeGS-PCR is more robust, with smaller accuracy loss across datasets. Coarse registration relies mainly on geometry, with color aiding the optimization when it is reliable.
Table 5. Registration results under random hue shift (±10%) on all datasets.
| Method | 3DM-5000(No Shift) | 3DM-5000(Hue Shift) | Drop(%) | 3DLM-5000(No Shift) | 3DLM-5000(Hue Shift) | Drop(%) | KITTI(No Shift) | KITTI(Hue Shift) | Drop(%) |
|---|---|---|---|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | 88.5±0.3 | 84.7±0.4 | -3.8 | 95.4±0.2 | 93.2±0.3 | -2.2 |
| ColorPCR | 97.5±0.2 | 92.7±0.3 | -4.8 | 93.8±0.3 | 89.1±0.3 | -4.7 | 96.1±0.3 | 91.8±0.4 | -4.3 |
| Ours | 97.9±0.1 | 96.8±0.2 | -1.1 | 94.7±0.2 | 93.5±0.2 | -1.2 | 99.7±0.2 | 98.9±0.5 | -0.5 |
Table 6. Registration results on different scenes.
| Method | Scene A(Geo Insufficient) | Scene B(Objects With Similar Color) | Scene C(Repeated Textures) | Average |
|---|---|---|---|---|
| GeoTransformer | 88.4±0.3 | 85.1±0.4 | 86.2±0.3 | 86.6 |
| ColorPCR | 90.3±0.3 | 84.9±0.3 | 83.5±0.4 | 86.2 |
| Ours | 94.7±0.2 | 91.1±0.2 | 92.6±0.2 | 92.8 |
GeGS-PCR does not explicitly model color confidence weights, but its Geometric-Color fusion module implicitly learns modality confidence through the learnable fusion weight in Eq. 3. To improve robustness under severe hue shifts (±10%), we compute local color stability via RMSE and dynamically adjust this weight, increasing color influence in stable regions and reducing it in noisy ones.
Table 7 shows that the geometry-only GeoTransformer struggles under hue shifts. Applying a fixed fusion weight, or applying the weight only at the last layer, degrades performance. Applying the weight at all layers maintains stability, while GeGS-PCR with color confidence weighting is the most robust, with the smallest performance drop.
Future work will explore explicit color confidence modeling and enhanced multimodal fusion.
Table 7. Registration results under color confidence weights.
| Method | No Shift | Hue Shift | Drop(%) | Remarks | Color Confidence Info |
|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | No color | No color confidence |
| Ours (no fusion weight) | 96.9±0.1 | 95.1±0.2 | -1.7 | No color confidence, no dynamic fusion | Static color feature usage, no confidence weights |
| Ours (fusion weight at all layers) | 97.6±0.1 | 96.2±0.3 | -1.4 | Fusion weight applied at all layers | No explicit confidence weighting; fusion weight used at all layers |
| Ours (fusion weight at final layer only) | 92.5±0.3 | 91.0±0.3 | -1.5 | Fusion weight applied at final layer only | No color; geometric features only |
| Ours (color confidence) | 97.6±0.2 | 96.5±0.3 | -1.0 | With dynamic color confidence | Color confidence dynamically adjusted; weight learned |
| Ours | 97.9±0.2 | 96.8±0.3 | -1.0 | Fusion weight applied as in Eq. 3 | Color features weighted by the fusion weight in the final layer |
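The color-confidence idea can be sketched as follows, under stated assumptions:
- Local color stability is the RMSE between a point's RGB and the RGB of its neighbors.
- Confidence is `exp(-rmse / tau)`.
- Color features are added to geometric features after scaling by that confidence.

This thread does not reproduce Eq. 3. The additive form and every name here (`color_stability_rmse`, `color_confidence`, `fuse`, `tau`) are therefore hypothetical. The sketch requires each point to have at least one neighbor.

```python
import math

def color_stability_rmse(colors, neighbors):
    """Per-point RMSE between its RGB and its neighbors' RGB (colors in [0, 1])."""
    out = []
    for i, nbrs in enumerate(neighbors):
        sq = [sum((colors[i][c] - colors[j][c]) ** 2 for c in range(3)) for j in nbrs]
        out.append(math.sqrt(sum(sq) / (3 * len(sq))))  # mean over neighbors and channels
    return out

def color_confidence(rmse, tau=0.1):
    """Map color instability to a confidence in (0, 1]: stable colors give values near 1."""
    return [math.exp(-r / tau) for r in rmse]

def fuse(geo_feat, color_feat, conf):
    """Hypothetical fusion: geometric feature plus confidence-scaled color feature."""
    return [[g + w * c for g, c in zip(gf, cf)] for gf, cf, w in zip(geo_feat, color_feat, conf)]

colors = [[0.0, 0.0, 0.0], [0.3, 0.3, 0.3]]
conf = color_confidence(color_stability_rmse(colors, [[1], [0]]))
print(conf)  # both points disagree with their neighbor, so confidence is well below 1
```

Points in uniformly colored regions receive confidence near 1. Points whose colors disagree with their neighbors, for example under noise, are pushed toward the purely geometric feature.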
W.4: Native Colored LiDAR
We made our best efforts to find natively colored point cloud data. However, the data from devices such as the Hesai XT32 and GigaMesh used in our project have not been open-sourced. We therefore tested GeGS-PCR on real-world data from KITTI-360.
W.5: Safeguards
We discussed the applications and clarified that the method uses anonymized public data and focuses on low-risk tasks. It will be open-sourced under restrictive licenses to prevent misuse, and we encourage community evaluation to address bias and security concerns.
Q.1: Scalability
See W.1.
Q.2: Color Noise/Distortion
See W.3.
Q.3: Generalization Beyond Synthetic Colorization
See W.4.
Q.4: Failure Modes
See W.3.
Q.5: Statistical Significance
See W.2.
Thank you to the authors for the detailed and constructive rebuttal. I appreciate the additional experiments and quantitative analyses, which address several of my initial concerns.
Scalability / Computational Cost: The authors have provided explicit runtime, latency, and GPU memory numbers for both indoor (C3DM) and outdoor (ColorKitti) datasets, showing that GeGS-PCR is practical on current GPUs and comparable to recent baselines. They also propose strategies for further reducing memory usage, which increases my confidence in the deployability of the method.
Robustness to Color Noise/Distortion: The new results under random hue shifts (±10%) and the extended evaluation on KITTI-360 demonstrate the robustness of the method to color channel disturbances and challenging real-world conditions. The explicit comparison to other methods is convincing.
Generalization & Dataset Diversity: The authors have justified their dataset choices and added results on KITTI-360. While further evaluation on more diverse or natively colored LiDAR datasets (e.g., ScanNet, Waymo) would be ideal, the presented results on large-scale and non-uniform data are adequate for this stage.
Ablation & Statistical Significance: The response provides statistical results (mean ± std over 10 seeds) and new ablation studies, which improve the credibility and completeness of the experimental validation.
Failure Modes: Additional discussion and experiments show that the system is robust even in the presence of color ambiguities and geometrically similar structures.
Remaining Concerns: The only remaining limitations are minor, such as full testing on more diverse datasets and additional exploration of explicit color confidence modeling, both of which are acknowledged as future work.
Final Recommendation: Based on the author response and new evidence, I believe the authors have adequately addressed my main concerns. I will maintain my borderline accept (score: 4). Should further improvements be incorporated (e.g., more diverse datasets), the paper could be rated higher in the future.
We sincerely thank you for your thoughtful and detailed response, as well as the substantial improvements. We truly appreciate the time and expertise you dedicated to the review process.
The paper introduces GeGS-PCR, a two-stage pipeline for pairwise 3D point-cloud registration under low overlap and incomplete geometry. Stage 1 performs coarse alignment with a Color-Enhanced Geometric-3D Gaussian-Splatting (3DGS) representation: a learned color encoder is fused with down-sampled super-points that are parameterised as anisotropic Gaussians; a dedicated 3DGS self-attention transformer yields an overlap-invariant global feature. Stage 2 refines the pose by jointly minimising a differentiable photometric-plus-geometric loss, rendered from the fused 3DGS scene. Efficiency is maintained with LoRA compression of attention weights. Extensive experiments on Color3DMatch / Color3DLoMatch and a new outdoor ColorKITTI benchmark show state-of-the-art registration recall and markedly lower RTE/RRE compared to recent baselines. The appendix provides reproducibility materials, ablations, theoretical convergence proofs, and a discussion of memory overhead and scene-level extensions.
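The following minimal NumPy sketch shows what parameterising a superpoint neighborhood as an anisotropic Gaussian can mean. It assumes that the mean and covariance are initialized from the sample statistics of the neighborhood. Full 3DGS instead parameterizes the covariance as R S Sᵀ Rᵀ with learned rotation and scale, and also stores opacity and color. Function names are hypothetical. The example shows why a Mahalanobis distance captures anisotropy that a Euclidean distance misses.

```python
import numpy as np

def fit_gaussian(neighborhood, reg=1e-6):
    """Mean and (regularized) sample covariance of an (N, 3) superpoint neighborhood."""
    mu = neighborhood.mean(axis=0)
    centered = neighborhood - mu
    cov = centered.T @ centered / max(len(neighborhood) - 1, 1)
    return mu, cov + reg * np.eye(3)  # regularizer keeps flat patches invertible

def mahalanobis(x, mu, cov):
    """Distance from x to the Gaussian, measured in units of its spread along each direction."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# An elongated patch: wide along x, thin along y and z.
patch = np.array([[-1, 0, 0], [1, 0, 0], [0, -0.1, 0], [0, 0.1, 0], [0, 0, -0.1], [0, 0, 0.1]], dtype=float)
mu, cov = fit_gaussian(patch)
# Both query points are 1 unit away in Euclidean terms, but not in Mahalanobis terms.
print(mahalanobis(np.array([1.0, 0, 0]), mu, cov), mahalanobis(np.array([0, 1.0, 0]), mu, cov))
```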
Strengths And Weaknesses
Strengths
- Comprehensive evaluation across three benchmarks, large ablation suite (removal of color, 3DGS, photometric loss). Well-designed loss with proof of Lipschitz convergence.
- Clear problem set-up and modular description; appendix lists hyper-parameters and scripts.
- Brings 3D Gaussian Splatting, hot in 3D rendering, to registration; delivers new SOTA on low-overlap indoor/outdoor data; public ColorKITTI is likely to spur follow-up work.
- Novel combination of colour encoding + 3DGS + joint photometric optimisation; first to apply 3DGS attention to registration.
Weaknesses
- Runtime/memory not fully quantified: only qualitative note that 3DGS adds “some” overhead; no FPS numbers on KITTI. Coarse-to-fine pipeline cannot yet perform scene-level global registration, acknowledged by authors.
- Main text occasionally defers key algorithmic details (e.g., 3DGS attention formulation) to appendix, hampering standalone readability.
- Impact may be limited for lidar-only pipelines without colour.
- Method builds on several recent ideas (Gaussian-Splatting, transformer attention, LoRA) in an incremental fashion; theoretical novelty is moderate.
Questions
- Please report end-to-end latency (ms) and peak VRAM for both stages on KITTI and 3DMatch, and compare against GeoTransformer/ColorPCR. This will clarify practical deployability.
- Have you tested performance when colour channels are synthetically perturbed (e.g., random hue shift) or in night-time lidar scans? An ablation would show whether the geometric branch alone suffices.
- Table 3 shows faster convergence, but how much accuracy is lost if LoRA is removed? Reporting final metrics with/without LoRA would justify this choice.
Limitations
- Memory & Latency Overhead. The 3D Gaussian-Splatting representation adds mean/covariance/opacity parameters, pushing GPU usage to ~11 GB in dense indoor scenes; no FPS or memory numbers are reported for KITTI, so deployment on edge devices remains uncertain.
- Strong Reliance on RGB. Both stages leverage colour cues; performance under poor illumination, night-time, or colour-missing LiDAR‐only data is untested, leaving robustness to colour noise unclear.
- Pairwise-only Rigid Alignment. The pipeline estimates a single SE(3) transform per pair; multi-view or scene-level registration (e.g., full SLAM) is not evaluated, limiting direct applicability to sequential mapping tasks.
Paper Formatting Concerns
N/A
We sincerely appreciate the reviewer’s valuable and constructive comments, which have been instrumental in improving the quality of our manuscript.
W.1: Runtime/memory not fully quantified.
We acknowledge that the original version of our manuscript lacked a detailed quantification of the runtime and memory overhead of 3DGS. We have conducted additional performance evaluations; the results are summarized in Tables 1 and 2:
- FPS: 2.94 on the ColorKitti dataset, 1.010 on the C3DM dataset
- Latency: 16.7 ms on ColorKitti, 15.3 ms on C3DM
- GPU Memory Usage: 2–4 GB on both datasets
- Lower total time on ColorKitti than SpinNet, Predator, and ColorPCR, and comparable to GeoTransformer
These metrics, measured on an RTX 4070 GPU, indicate that GeGS-PCR is practical to deploy and competitive with mainstream approaches.
Table 1. Quantified Performance on ColorKitti.
| Model | Model Time (s) | Pose Time (s) | Total Time (s) | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|---|---|
| SpinNet | 6.025 | 0.039 | 6.064 | 60.64 | 0.016 | 3-4 |
| Predator | 0.008 | 0.520 | 0.528 | 52.8 | 1.893 | 3-4 |
| GeoTransformer | 0.018 | 0.322 | 0.340 | 14.5 | 2.913 | 3-4 |
| ColorPCR | 0.016 | 0.550 | 0.566 | 18.6 | 2.263 | 3-4 |
| GeGS-PCR (ours) | 0.020 | 0.327 | 0.347 | 15.3 | 2.540 | 2-4 |
Table 2. Quantified performance on C3DM.
| Model | #Sample | End-to-End Latency (ms) | FPS | GPU Usage (GB) |
|---|---|---|---|---|
| FCGF | 5000 | 33.78 | 0.296 | 3-4 |
| D3Feat | 5000 | 31.12 | 0.321 | 3-4 |
| SpinNet | 5000 | 60.646 | 0.016 | 3-4 |
| Predator | 5000 | 51.52 | 0.194 | 3-4 |
| CoFiNet | 5000 | 17.50 | 0.544 | 3-4 |
| GeoTransformer | 5000 | 15.33 | 0.613 | 2-4 |
| PEAL | 5000 | 18.65 | 0.537 | 3-4 |
| ColorPCR | 5000 | 17.12 | 0.585 | 3-4 |
| Ours | 5000 | 16.72 | 0.588 | 2-4 |
| CoFiNet | all | 19.1 | 0.699 | 3-4 |
| GeoTransformer | all | 15.4 | 0.953 | 2-4 |
| PEAL | all | 17.3 | 0.752 | 3-4 |
| ColorPCR | all | 18.8 | 0.775 | 3-4 |
| Ours | all | 16.8 | 1.010 | 2-4 |
W.2: Key algorithmic details deferred to the appendix.
We agree that some key algorithmic details are explained only in the appendix. Due to page limits, we had to omit them from the main text. In the revised version, we will move these details into the main text to improve its standalone readability.
W.3: Limited impact for LiDAR-only pipelines without color.
GeGS-PCR is primarily designed based on geometric features, with additional Gaussian features incorporated for enhanced registration. During our investigation, we further observed that color information also plays a critical role in improving registration accuracy. We proposed Geometric-3DGS, which globally unifies geometric, Gaussian, and color features into a consistent representation. In cases where color information is unavailable from the beginning (i.e., color-deficient scenarios), GeGS-PCR gracefully falls back to relying on the joint geometric and Gaussian features for registration. The absence of color does not impose inherent limitations or side effects on the method’s applicability. Tables 3 and 4 present the registration performance of GeGS-PCR on standard datasets without color information, and Table 3 directly compares these results with those in Table 1 of the main paper. These additional results will be included in the revised version. In the main text, the ablation study shown in Table 3, line (e), indicates a performance drop when color is removed from the 3DGS model. This is because disabling the color channel prevents the construction of a complete 3DGS representation and affects the differentiable rendering module. This degradation is caused by the internal dependency of the original 3DGS architecture on color, not due to a lack of color information in the dataset itself.
Table 3. Registration results without color (3DM, 3DLM) and with color (C3DM, C3DLM).
| Metric | 3DM-5000 | C3DM-5000 | 3DM-1000 | C3DM-1000 | 3DLM-5000 | C3DLM-5000 | 3DLM-1000 | C3DLM-1000 | 3DLM-250 | C3DLM-250 |
|---|---|---|---|---|---|---|---|---|---|---|
| Feature Matching Recall (%) | 99.5 | 99.5 | 99.4 | 99.5 | 97.4 | 97.6 | 97.0 | 97.1 | 97.0 | 97.0 |
| Inlier Ratio (%) | 76.0 | 76.3 | 86.2 | 86.3 | 53.3 | 53.4 | 66.6 | 66.9 | 70.1 | 70.3 |
| Registration Recall (%) | 97.8 | 97.9 | 97.3 | 97.5 | 90.5 | 90.7 | 90.2 | 90.4 | 89.7 | 89.8 |
Table 4. Registration results on KITTI without color (No-) and with color (Color-), using RANSAC-50k or LGR for pose estimation.
| Model | No-RTE (cm) | Color-RTE (cm) | No-RRE (°) | Color-RRE (°) | No-RR (%) | Color-RR (%) |
|---|---|---|---|---|---|---|
| GeGS-PCR (RANSAC-50k) | 6.7 | 6.3 | 0.18 | 0.16 | 99.8 | 99.9 |
| GeGS-PCR (LGR) | 5.9 | 5.7 | 0.15 | 0.13 | 99.8 | 99.9 |
Q.1: Runtime/memory not fully quantified.
See W.1.
Q.2: Test performance when color channels are synthetically perturbed.
Evaluating the robustness of the system under color variations such as random hue shifts is important. We conducted additional experiments applying random perturbations (±10%) in the HSV color space to simulate color distortions caused by lighting inconsistencies or sensor variations. As Table 5 shows, GeGS-PCR maintains high registration accuracy and robustness under such hue shifts. These new experiments will be included in the revised version to better demonstrate the model's generalization ability in the presence of color distortions.
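Below is one way to implement such a perturbation. It assumes that "±10%" means a single hue offset per point cloud, drawn uniformly from [-0.1, 0.1] of the hue circle (±36°), with saturation and value unchanged. The helper names are hypothetical. The sketch uses Python's standard `colorsys` module, which operates on RGB values in [0, 1].

```python
import colorsys
import random

def hue_shift(colors, shift):
    """Rotate the hue of each RGB color by `shift` (fraction of the hue circle), wrapping around."""
    out = []
    for r, g, b in colors:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))  # saturation/value untouched
    return out

def random_hue_shift(colors, max_shift=0.1, rng=None):
    """Apply one hue offset drawn uniformly from [-max_shift, max_shift] to the whole cloud."""
    rng = rng or random.Random()
    return hue_shift(colors, rng.uniform(-max_shift, max_shift))

cloud_rgb = [(1.0, 0.0, 0.0), (0.2, 0.6, 0.4), (0.5, 0.5, 0.5)]
print(random_hue_shift(cloud_rgb, rng=random.Random(0)))  # gray stays gray: it has no hue
```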
Table 5. Registration results under random hue shift (±10%) on all datasets.
| Method | 3DM-5000(NoShift) | 3DM-5000(HueShift) | Drop(%) | 3DLM-5000(NoShift) | 3DLM-5000(HueShift) | Drop(%) | KITTI(NoShift) | KITTI(HueShift) | Drop(%) |
|---|---|---|---|---|---|---|---|---|---|
| GeoTransformer | 92.0±0.3 | 89.5±0.3 | -2.5 | 88.5±0.3 | 84.7±0.4 | -3.8 | 95.4±0.2 | 93.2±0.3 | -2.2 |
| ColorPCR | 97.5±0.2 | 92.7±0.3 | -4.8 | 93.8±0.3 | 89.1±0.3 | -4.7 | 96.1±0.3 | 91.8±0.4 | -4.3 |
| Ours(GeGS-PCR) | 97.9±0.1 | 96.8±0.2 | -1.1 | 94.7±0.2 | 93.5±0.2 | -1.2 | 99.7±0.2 | 98.9±0.5 | -0.5 |
To evaluate the robustness of our method under varying illumination and potential nighttime scenarios, we conducted a comprehensive analysis of three candidate datasets: ScanNet, Waymo, and KITTI-360.
- ScanNet primarily consists of indoor scenes with uniform point cloud density and relatively simple geometry, lacking the challenging conditions of non-uniform density and color disturbances that our method aims to address.
- The Waymo Open Dataset, although it provides RGB images, is designed mainly for detection and tracking tasks. It lacks precise registration ground-truth between LiDAR frames, making it unsuitable for evaluating registration accuracy. Additionally, it has a massive data volume, high preprocessing overhead, and significant viewpoint discrepancies between LiDAR and RGB sensors, which hinders high-quality fusion of color and geometric features.
- In contrast, KITTI-360 offers multiple advantages: large-scale outdoor point clouds captured in real urban environments, highly non-uniform point density (dense foreground and sparse background), well-aligned RGB and LiDAR data through multi-sensor fusion, and the presence of many geometrically similar but texturally distinct objects such as buildings, pedestrians, and vehicles.
We therefore selected KITTI-360 as the extended evaluation dataset to better assess the real-world applicability and robustness of our method (the colorized KITTI-360 data will be released). As shown in Table 6, our method consistently outperforms ColorPCR and other baselines even in the presence of color variations and complex scene structures. This demonstrates the effectiveness of the Geometric-3DGS strategy in practical scenarios. We also acknowledge that KITTI-360 was collected primarily during daytime and contains only limited lighting variations or low-light scenes. Due to time constraints, we leave a comprehensive evaluation under nighttime conditions for future work. We also plan to incorporate semantic-level scene understanding to further improve the generalization of our method.
Table 6. Registration results on CKitti-360.
| Method | IR(%) | RR(%) | Total-time(s) | RRE(°) | RTE(cm) |
|---|---|---|---|---|---|
| YOHO | 10.99 | 48.13 | 8.31 | ~25.9 | ~0.55 |
| SpinNet | 14.86 | 33.41 | 73.1 | ~9.9 | ~0.47 |
| Geotransformer | 8.4 | 25.30 | 1.66 | ~7.4 | ~0.37 |
| ColorPCR | 12.5 | 45.01 | 1.29 | ~1.58 | ~0.35 |
| Ours | 15.03 | 56.88 | 1.18 | ~1.26 | ~0.28 |
Q.3: Test performance when LoRA is removed.
We have added an ablation study to evaluate the impact of the LoRA module, including a variant where LoRA is removed. As shown in row (f) of Table 7, removing LoRA leads to a slight drop in registration performance. For instance, the registration recall (RR) decreases from 97.5% to 97.2% on C3DM, and from 89.3% to 89.2% on C3DLM. These results suggest that LoRA primarily helps accelerate convergence during training, while also providing a modest yet consistent improvement in accuracy. We will incorporate these results into the main text in the revised version of the manuscript to enhance the completeness of our analysis.
Table 7. Performance of ablation experiments on C3DM and C3DLM.
| Method | C3DM-PIR(%) | C3DM-FMR(%) | C3DM-IR(%) | C3DM-RR(%) | C3DLM-PIR(%) | C3DLM-FMR(%) | C3DLM-IR(%) | C3DLM-RR(%) |
|---|---|---|---|---|---|---|---|---|
| (a) ColorPCR(baseline) | 89.2 | 99.5 | 80.5 | 96.5 | 62.7 | 96.5 | 56.6 | 88.5 |
| (b) w/o ColorEncoder | 89.4 | 99.5 | 80.6 | 96.6 | 62.8 | 96.6 | 56.8 | 88.6 |
| (c) w/o 3DGS | 89.5 | 99.6 | 80.7 | 96.7 | 63.0 | 96.7 | 56.9 | 88.8 |
| (d) w/o differentiable rendering | 89.6 | 99.5 | 80.8 | 96.8 | 63.1 | 96.8 | 57.0 | 88.9 |
| (e) w/o color | 86.1 | 97.9 | 77.3 | 92.7 | 55.2 | 89.8 | 46.3 | 77.9 |
| (f) w/o LoRA | 92.0 | 99.3 | 86.3 | 97.5 | 63.8 | 97.4 | 59.9 | 90.5 |
| (g)Geometric-3DGS(Full) | 92.0 | 99.6 | 82.4 | 97.6 | 63.9 | 97.4 | 58.7 | 90.2 |
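For reference, the standard LoRA formulation keeps the pretrained weight W frozen and learns a low-rank update BA, scaled by alpha/r. B is initialized to zero, so the adapted layer initially matches the original one. The NumPy sketch below illustrates this formulation and the resulting parameter savings. The rank `r=4`, `alpha=8`, and the class name are placeholders, not values from the paper.

```python
import numpy as np

class LoRALinear:
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and only A, B trainable."""

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                     # frozen pretrained weight (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in)) # down-projection, random init
        self.B = np.zeros((d_out, r))                  # up-projection, zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size

W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, r=4)
print(layer.trainable_params(), "trainable vs", W.size, "frozen")  # 512 vs 4096
```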
L.1: Memory & Latency Overhead.
See W.1.
L.2: Strong Reliance on RGB.
See W.3.
L.3: Multi-view or Scene-level registration.
Our focus is on accurately estimating rigid transformations between point clouds to achieve stable and reliable registration; we have not yet explored streaming data or SLAM tasks. We have supplemented our evaluation with the KITTI-360 dataset. We plan to extend our method to scene-level registration, integrating semantic scene understanding to enhance adaptability and generalization in complex real-world environments.
During the rebuttal, the authors have effectively addressed most of the reviewers' concerns, including performance evaluations on additional datasets and efficiency analyses. After reading the submission, I believe it was well written, and the technical contribution is solid. Given the positive final ratings from all reviewers, I am inclined to recommend acceptance.