PaperHub rating: 5.3/10 (Poster, 4 reviewers)
Individual ratings: 5, 5, 6, 5 (min 5, max 6, std. dev. 0.4)
Confidence: 3.5
Correctness: 3.0
Contribution: 2.3
Presentation: 2.8
NeurIPS 2024

Learning 3D Equivariant Implicit Function with Patch-Level Pose-Invariant Representation

Submitted: 2024-05-08; Updated: 2024-12-19
TL;DR

We design a patch-level pose-invariant 3D feature representation for 3D shapes, which enables implicit displacement estimation for 3D query points based on local patch-level pose-invariant representations.


Keywords
Equivariant implicit neural representation; Pose-invariant representation; Generalizability to 3D objects; Robustness to transformation

Reviews and Discussion

Official Review
Rating: 5

The paper addresses 3D surface reconstruction from point clouds. It proposes a patchwise rotation-equivariant neural network that maps query points to their 3D displacement to the surface. The local rotation equivariance allows weight sharing between similar patches at different orientations, and displacement fields have been shown to outperform occupancy and distance fields. Experiments show the method outperforms the baselines on several datasets.

Strengths

S1) Using equivariant models to promote better use of model capacity is a great idea. Point clouds often present patches that are similar up to rotation so this design choice makes a lot of sense.

S2) Results are strong and seem to be state-of-the-art for surface reconstruction from point clouds.

Weaknesses

W1) As far as I understand, the ideas in the paper are not novel, so the contributions are around combining existing ideas. a) NVF [1] introduced the idea of using vector instead of distance fields, b) E-GraphONet [2] uses the idea of rotation-equivariant models for implicit surface representation, c) Zhao et al [3] uses the particular way of achieving equivariance through SVD on point sets. This might not be a deal-breaker since the results are good but more novel ideas would make for a stronger submission.

W2) I think the PCA-based alignment is not very robust. While it is perfectly rotation equivariant given the exact same point cloud patch in a different orientation, I think in practice we would see slightly different patches, so the alignment is not guaranteed. Moreover, the way the ambiguity in axis orientation is resolved seems to rely on the position of the furthest point, so moving a single point slightly might change the orientation drastically. There are other methods that use equivariant layers which seem more appropriate, such as Vector-Neurons [4] and SE(3)-transformers [5]; why weren't they considered?

W3) Given W2, I found the design decision of using the PCA-alignment quite arbitrary. Given that the most related works are NVF and E-GraphONet, I believe a more natural choice would be to modify E-GraphONet to predict vector fields instead of occupancy fields; would it perform better than the proposed method? If the goal is to show that the PCA-alignment can be better than equivariant layers, a comparison against E-GraphONet on occupancy field prediction should have been performed.

References:

[1] Yang et al, "Neural Vector Fields: Implicit Representation by Explicit Learning", CVPR'23.

[2] Chen et al, "3D Equivariant Graph Implicit Functions", ECCV'22.

[3] Zhao et al, "Rotation invariant point cloud classification: where local geometry meets global topology", Pattern Recognition, 2021.

[4] Deng et al, "Vector Neurons: A general framework for SO(3)-equivariant networks", ICCV'21.

[5] Fuchs et al, "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks", NeurIPS'20.

Questions

Q1) Poisson surface reconstruction [6] is a classical method for surface reconstruction that is quite robust, how does the proposed method compare to it and similar follow-up works?

Typos:

L123: P -> P_i

L48, L77: PEFI -> PEIF

References:

[6] Kazhdan et al, "Poisson Surface Reconstruction", SGP'06.

Limitations

The limitation regarding robustness of PCA-alignment should be addressed more clearly (see W2).

Author Response

Thanks for the questions and comments. Please see the following responses.

Q1: Novelty and relevant methods discussion.

Thanks for this question. The main novelty of our approach is learning equivariant implicit vector fields for 3D reconstruction, motivated by the idea of designing a patch-based intrinsic feature learning network. Our framework is inspired by the observation that local patches of 3D shapes repeatedly appear on 3D surfaces once their poses are removed. We therefore perform pose normalization and design the patch feature learning module and a learnable memory bank to learn intrinsic patch geometry features. With these components, the learned implicit vector fields are equivariant. These designs enable us to achieve state-of-the-art reconstruction results while remaining robust to rotations of the input points, as shown in the experiments.

Compared with [1], our approach achieves equivariance for vector field prediction. Compared with [2], our model achieves equivariance through intrinsic patch geometric feature learning instead of the vector neurons used in [2]. Compared with [3], we tackle a different task (classification in [3] vs. 3D reconstruction here), and the model designs are significantly different. We will discuss the relation to these works and our novel contributions more clearly in the paper.

Q2: The robustness of PCA-based alignment and comparison with alternatives such as Vector-Neurons [4] and SE(3)-transformers [5].

Thanks for this good question. In our approach, we compute PCA over local point patches. As suggested, we conduct experiments to test the robustness of the PCA-based alignment and compare it with alternatives based on Vector-Neurons [4] and SE(3)-transformers [5].

(1) To test the robustness of PCA, we randomly perturb the patch point coordinates with Gaussian noise (σ = 0.001; the average distance from the kNN points to the patch center point is about 0.004) before computing PCA, resulting in perturbed PCA matrices. The results on the ABC dataset in Table R4-1 show that PEIF is robust to these PCA perturbations.

Table R4-1. Comparison of results for random perturbations of rotation matrices.

Methods | CD↓ | EMD↓ | NC↑ | F-Score↑
w/ perturbation | 0.250 | 2.680 | 0.960 | 0.990
w/o perturbation | 0.241 | 2.672 | 0.969 | 0.998
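A minimal sketch of this kind of robustness check, under stated assumptions: the patch is synthetic (54 anisotropic Gaussian points whose spread is on the order of the quoted 0.004 kNN distance), the axes come from an eigendecomposition of the patch scatter matrix, and axis sign is ignored. It reports how far each principal axis turns when σ = 0.001 noise is added. This is not the authors' evaluation code; `perturbation_angles` and the other names are hypothetical.

```python
import numpy as np


def principal_axes(patch):
    """Eigenvectors of the patch scatter matrix as columns, sorted by decreasing variance."""
    centered = patch - patch.mean(axis=0)
    _, evecs = np.linalg.eigh(centered.T @ centered)  # eigh returns ascending eigenvalues
    return evecs[:, ::-1]


def axis_angle_deg(a, b):
    """Angle between two axes in degrees, ignoring sign, so the result lies in [0, 90]."""
    cos = abs(float(a @ b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(min(cos, 1.0))))


def perturbation_angles(patch, sigma, trials, seed=0):
    """For each trial, add N(0, sigma^2) noise and measure how far each principal axis turns."""
    rng = np.random.default_rng(seed)
    ref = principal_axes(patch)
    out = []
    for _ in range(trials):
        noisy = principal_axes(patch + rng.normal(scale=sigma, size=patch.shape))
        out.append([axis_angle_deg(ref[:, i], noisy[:, i]) for i in range(3)])
    return np.array(out)  # shape (trials, 3)


rng = np.random.default_rng(1)
# Hypothetical anisotropic patch: 54 points with a spread of a few thousandths.
patch = rng.normal(size=(54, 3)) * np.array([0.006, 0.003, 0.0005])
angles = perturbation_angles(patch, sigma=0.001, trials=100)
print("mean angle per axis (deg):", angles.mean(axis=0).round(2))
```

Large angles for an axis indicate that two of its variances are close (a near-degenerate patch). The sign ambiguity raised in W2 is a separate issue that this check deliberately ignores.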

(2) On the same GPU, we replace the layers of our network with Vector-Neurons [4]-based layers and set the hyper-parameters to fit in GPU memory. The results are in Table R4-2. Table R4-3 lists the hyper-parameters and resource consumption of our model and of the Vector-Neurons [4]-based implementation.

Table R4-2. The results of PEIF with Vector-Neurons [4] equivariant layers.

Methods | CD↓ | EMD↓ | NC↑ | F-Score↑
Vector-Neurons | 0.379 | 2.769 | 0.895 | 0.955

Table R4-3. Hyper-parameters and resource consumption of the Vector-Neurons [4] and pose-normalization (PCA) based PEIF on the ABC dataset, using a single NVIDIA 4090 GPU.

Cost | Vector-Neurons [4] | PCA-based
Feature dim | 64 | 128
Hidden feature dim | 32 | 256
Params (M) | 0.71 | 7.65
Training time (s per epoch) | 283.59 | 34.56
Training memory (GB) | 15.97 | 17.75
Testing time (s per shape) | 664.96 | 40.98
Testing memory (GB) | 3.54 | 1.56

(3) We also attempted to use SE(3)-transformers [5] to replace the patch feature extraction (SRM, PFEM) in our model. On the same GPU, even with a batch size of 1 and a feature dimension of 64, training the SE(3)-transformer-based implementation ran out of memory.

In summary, PCA-based patch pose normalization enables us to design a lightweight network for equivariant implicit function learning, which achieves state-of-the-art results in our experiments.

Q3: Further comparison with E-GraphONet [2].

For a fair comparison, we used PEIF to predict the occupancy field (OF), changing only its loss to regress OF, and used the same training/test datasets as E-GraphONet. The reconstruction results are reported in Table R4-4. We also trained E-GraphONet to predict the vector field; despite our best efforts, the learned model could not reasonably reconstruct object surfaces. Jointly training on both occupancy and vector field prediction is worth exploring in the future.

Table R4-4. The reconstruction results of PEIF predicting the occupancy field on the ABC dataset.

Methods | CD↓ | EMD↓ | NC↑ | F-Score↑
PEIF (OF) | 0.874 | 4.7199 | 0.709 | 0.675
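The change described above swaps only the training target and loss. As an illustration, assuming a plain squared-L2 loss on displacement vectors and a binary cross-entropy loss on occupancy logits (the exact PEIF losses are not restated in this thread), the two supervision signals for a unit sphere look like this; `displacement_loss` and `occupancy_loss` are hypothetical names.

```python
import numpy as np


def displacement_loss(pred, target):
    """Mean squared L2 error between predicted and target displacement vectors, shape (N, 3)."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def occupancy_loss(logits, occupied):
    """Binary cross-entropy on occupancy logits (N,) vs. 0/1 labels, in the numerically
    stable form max(x, 0) - x * y + log(1 + exp(-|x|))."""
    x, y = logits, occupied.astype(float)
    return float(np.mean(np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))))


# Hypothetical ground truth for a unit sphere.
rng = np.random.default_rng(0)
q = rng.uniform(-1.5, 1.5, size=(1000, 3))   # query points
r = np.linalg.norm(q, axis=1, keepdims=True)
target_disp = q / r - q                        # vector to the nearest surface point
target_occ = r[:, 0] < 1.0                     # inside (True) / outside (False)
```

A displacement target places the surface point directly at `q + target_disp`, whereas an occupancy target only labels inside versus outside, so the surface has to be recovered afterwards (for example by iso-surface extraction).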

Q4: Comparison with PSR [6] and similar follow-up works on the ABC dataset.

As suggested, we compare with PSR [6], OccNet [7], and SAP [8] on the ABC dataset in Table R4-5. Our PEIF achieves the best results.

Table R4-5. The reconstruction results on the ABC dataset.

Methods | CD↓ | EMD↓ | NC↑ | F-Score↑
PSR [6] | 1.207 | 4.137 | 0.535 | 0.584
OccNet [7] | 0.691 | 3.261 | 0.797 | 0.636
SAP [8] | 0.376 | 2.765 | 0.946 | 0.962
PEIF (Ours) | 0.241 | 2.672 | 0.969 | 0.998
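The tables report CD, EMD, NC, and F-Score without restating their definitions. For readers reproducing them, here is a brute-force sketch of two of them under one common convention: unsquared nearest-neighbor distances averaged in both directions for CD, and an F-Score at a distance threshold `tau`. The exact convention, scale factor, and threshold behind these numbers are not given in this thread, so treat all three as assumptions; EMD and NC are not covered.

```python
import numpy as np


def nn_dist(a, b):
    """For each point in a (N, 3), the distance to its nearest neighbor in b (M, 3). O(N*M)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.sqrt(d2.min(axis=1))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean NN distance a->b plus mean NN distance b->a."""
    return float(nn_dist(a, b).mean() + nn_dist(b, a).mean())


def f_score(pred, gt, tau):
    """Harmonic mean of precision (pred near gt) and recall (gt near pred) at threshold tau."""
    precision = float(np.mean(nn_dist(pred, gt) < tau))
    recall = float(np.mean(nn_dist(gt, pred) < tau))
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


rng = np.random.default_rng(0)
gt = rng.uniform(size=(500, 3))                        # hypothetical ground-truth samples
pred = gt + rng.normal(scale=0.002, size=gt.shape)     # hypothetical reconstruction samples
print(chamfer_distance(pred, gt), f_score(pred, gt, tau=0.01))
```

The dense distance matrix is only practical for a few thousand points; a k-d tree is the usual choice at larger scale.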

Q5: Typos.

Thanks and we will fix these typos.

[1] Yang et al. Neural vector fields: Implicit representation by explicit learning. CVPR. 2023.
[2] Chen et al. 3D Equivariant Graph Implicit Functions. ECCV. 2022.
[3] Zhao et al. Rotation invariant point cloud classification: where local geometry meets global topology. Pattern Recognition. 2021.
[4] Deng et al. Vector neurons: A general framework for SO(3)-equivariant networks. ICCV. 2021.
[5] Fuchs et al. SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks. NeurIPS. 2020.
[6] Kazhdan et al. Poisson surface reconstruction. Proceedings of the fourth Eurographics symposium on Geometry processing. 2006.
[7] Mescheder et al. Occupancy networks: Learning 3D reconstruction in function space. CVPR. 2019.
[8] Peng et al. Shape as points: A differentiable Poisson solver. NeurIPS. 2021.

Comment

Dear reviewer FF8y,

Thanks for your questions and suggestions. We have carefully responded to your insightful comments and will include these revisions in our paper. Our responses cover: (1) novelty and discussion of relevant methods; (2) the robustness of PCA-based alignment; (3) implementing our network with Vector-Neurons/SE(3)-transformers for equivariance, which resulted in decreased performance and increased memory and computing cost, respectively; (4) further comparison with E-GraphONet by predicting occupancy with our model. We hope these responses address your concerns, and we look forward to discussing any additional questions with you during this author-reviewer discussion phase.

Kind regards,

All authors

Comment

W1) Thanks for the reply. I think it confirms that the method is more a combination of previous ideas than a proposal of new ones. Again, this does not prevent acceptance, but I see it as a weakness.

W2) Thanks for the comparison against Vector-Neurons and SE(3)-Transformers. I can see that the SE(3) transformers are much more expensive but I did not expect Vector-Neurons to be so, perhaps some explanation of the reasons is warranted. I think the sentence "the PCA-based patch pose normalization enables us to design a light-weight network achieving equivariant implicit function learning" is the motivation that I found lacking before. Please rewrite accordingly, acknowledging that there are other more expensive equivariant methods that could be used but the simple PCA alignment could be an efficient way to achieve similar or even better performance.

W3) Thanks. It seems that when predicting occupancy fields, the proposed method performs worse than E-GraphONet. Is this a fair comparison between the equivariant architectures themselves, since the task now has the same inputs and targets? Can we then conclude that PCA-alignment can be faster but less expressive than VN-based models?

Q4) Thanks for including the PSR experiment! It is nice to see how far the learning methods can improve results.

Conclusion: I appreciate the rebuttal and new experiments. I think the motivation of PCA-alignment being a simple and fast way to achieve equivariance suits the paper. I now lean towards acceptance. My score is not higher because W1 still stands (novelty is limited).

Comment

Thank you for the positive feedback on our rebuttal, and the additional comments/questions. We further respond to these questions as follows.

(1) Vector neuron-based methods construct equivariant neural layers by extending neurons from 1D scalars to 3D vectors, which ensures equivariance when SO(3) actions are applied in the vector-based feature space. This higher-dimensional, vector-based representation incurs a higher memory cost and requires more matrix operations and vector transformations to maintain geometric properties and rotation invariance, which increases the computational cost.
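The memory argument can be made concrete with a minimal NumPy sketch of the two basic Vector Neuron operations from [4], a linear layer and the VN-ReLU nonlinearity. It is written from the published formulation rather than taken from the VN code, and all shapes and names are illustrative. Each of C channels holds a 3D vector, so a per-point feature has C × 3 entries instead of C; the weights act on the channel axis while rotations act on the coordinate axis, which is why the layers commute with rotation.

```python
import numpy as np


def vn_linear(v, w):
    """VN linear layer: v is (C_in, 3) vector features, w is (C_out, C_in); returns (C_out, 3)."""
    return w @ v


def vn_relu(v, w_feat, w_dir, eps=1e-8):
    """VN-ReLU: keep q where it agrees with a learned direction k,
    otherwise remove q's component along k."""
    q = w_feat @ v                                      # (C_out, 3)
    k = w_dir @ v                                       # (C_out, 3) learned directions
    k_unit = k / (np.linalg.norm(k, axis=1, keepdims=True) + eps)
    along = np.sum(q * k_unit, axis=1, keepdims=True)   # <q, k/|k|>
    return np.where(along >= 0, q, q - along * k_unit)


rng = np.random.default_rng(0)
v = rng.normal(size=(8, 3))       # one point, 8 vector channels = 24 floats
A = rng.normal(size=(16, 8))      # hypothetical weights
W = rng.normal(size=(16, 16))
U = rng.normal(size=(16, 16))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:          # flip one column so R is a proper rotation
    R[:, 0] = -R[:, 0]

out = vn_relu(vn_linear(v, A), W, U)
out_rot = vn_relu(vn_linear(v @ R.T, A), W, U)   # rotate the input first
print(np.allclose(out_rot, out @ R.T))           # rotation commutes with the layers
```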

As suggested by the reviewer, we will analyze and discuss the computational cost relative to other, more computationally expensive equivariant alternatives, and clarify our motivation for using PCA for patch pose normalization as a simple yet effective strategy for learning an equivariant implicit function that yields good experimental results.

(2) Thanks for the question on the relative expressiveness of PCA-alignment and VN-based representations. We think the current experiments can hardly disentangle the effects of PCA/VN from the rest of the network design, so they do not support a conclusion about which is more expressive. In our model, the patch-level PCA normalization is tied to our specific network design. After removing the poses of local patches by PCA-alignment, we use the "patch feature extraction module" and the "intrinsic patch geometry extractor" to learn patch-level intrinsic geometric features, and the "spatial relation module" to encode offset-based features between the query point coordinates and its neighboring patch (kNN). These designs are integrated with the PCA-alignment to estimate the offset vector field. Comparing PCA-alignment with VN is an interesting topic, and we are interested in designing experiments that disentangle the effects of PCA/VN from specific network designs. Due to limited time, we leave this to future work.

Official Review
Rating: 5

This paper studies a simple task: given a dense point cloud as input, output an implicit surface reconstruction of the geometry. To achieve this, the model uses an "equivariant" network to predict the displacement field. Since the input point cloud is dense, the paper crops the patch of the surface point cloud nearest to the query point and uses PCA to canonicalize it. Once aligned, a transformer predicts the query point's displacement vector to the surface. Since the PCA canonicalization is known, the displacement can be transformed back. This simple task is evaluated on shape and scene data.
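The pipeline summarized above can be sketched as follows, under explicit assumptions: `toy_predictor` is a hypothetical, deliberately non-equivariant stand-in for the learned transformer, the sign of each principal axis is fixed by the point with the largest projection (the kind of rule questioned in W2 of the first review), and k = 54 is borrowed from the K used in the MGN experiments. The point of the sketch is that canonicalizing the kNN patch, predicting in that frame, and rotating the result back makes the overall map rotation-equivariant, as long as the patch frame is well defined (distinct variances, no ties).

```python
import numpy as np


def pca_frame(patch):
    """Rows are the patch's principal axes; a proper rotation (det = +1)."""
    centered = patch - patch.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    for i in range(2):
        # Assumed sign rule: each axis points toward the point with the largest |projection|.
        proj = centered @ vt[i]
        if proj[np.argmax(np.abs(proj))] < 0:
            vt[i] = -vt[i]
    vt[2] = np.cross(vt[0], vt[1])  # third axis fixed by handedness
    return vt


def toy_predictor(q_canon, patch_canon):
    """Hypothetical stand-in for the learned network; deliberately NOT equivariant on its own."""
    nearest = patch_canon[np.argmin(np.linalg.norm(patch_canon - q_canon, axis=1))]
    return (nearest - q_canon) * np.array([1.0, 0.9, 1.1])  # anisotropic on purpose


def displacement(query, points, k=54):
    """Canonicalize the kNN patch, predict in that frame, rotate the result back."""
    idx = np.argsort(np.linalg.norm(points - query, axis=1))[:k]
    patch = points[idx]
    center = patch.mean(axis=0)
    frame = pca_frame(patch)
    q_c = (query - center) @ frame.T   # query in canonical coordinates
    p_c = (patch - center) @ frame.T   # patch in canonical coordinates
    return toy_predictor(q_c, p_c) @ frame  # back to world coordinates


def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation by `angle` radians about `axis`."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    k = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


rng = np.random.default_rng(0)
uv = rng.uniform(-1, 1, size=(2000, 2))
points = np.column_stack([uv, 0.1 * np.sin(3 * uv[:, 0])])  # hypothetical wavy sheet
query = np.array([0.1, -0.2, 0.15])
R = rotation_matrix([1.0, 2.0, 3.0], 0.7)
t = np.array([0.3, -1.0, 2.0])
d = displacement(query, points)
d_moved = displacement(query @ R.T + t, points @ R.T + t)
print(np.allclose(d_moved, d @ R.T))
```

Subtracting the patch center also makes the displacement translation-invariant, which is why the translated and rotated input yields exactly the rotated displacement.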

Strengths

  • The insight of reusing elementary shapes with different poses is good.
  • This paper, although straightforward, considers using equivariance to model such elementary shapes/local intrinsic patterns.

Weaknesses

  • Heuristic baseline: I have a feeling that the model may depend heavily on dense KNN queries of the surface point cloud. In other words, if the point patch is very dense, the network may be learning a potentially too-easy task: just finding the nearest point (or interpolating nearby points) in the patch and computing a displacement to it. A heuristic baseline could simply fit a small parametric surface (a polynomial, or even a plane) to the nearest patch and compute the displacement analytically, or directly find the nearest point to produce the displacement vector.
  • Noise/Sparse/Partial data? The task might be too easy by the standards of the current literature. Since the input is a completely dense point cloud, the geometry is almost given, and this work still depends on surface point KNN queries for canonicalization. What happens if the input observation is partial, sparse, or noisy?
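The heuristic baseline proposed in the first weakness takes only a few lines. This sketch (not something evaluated in the paper or rebuttal; `k = 20` is an arbitrary choice) fits a least-squares plane to the k nearest input points and returns the vector from the query to its projection on that plane, alongside the even simpler nearest-input-point vector.

```python
import numpy as np


def knn_patch(query, points, k):
    """The k input points closest to the query (brute force)."""
    idx = np.argsort(np.linalg.norm(points - query, axis=1))[:k]
    return points[idx]


def plane_fit_displacement(query, points, k=20):
    """Vector from the query to its projection on a least-squares plane through its kNN patch."""
    patch = knn_patch(query, points, k)
    center = patch.mean(axis=0)
    _, _, vt = np.linalg.svd(patch - center, full_matrices=False)
    normal = vt[-1]  # direction of least variance = plane normal
    return -np.dot(query - center, normal) * normal


def nearest_point_displacement(query, points):
    """Even simpler baseline: vector from the query to the closest input point."""
    return points[np.argmin(np.linalg.norm(points - query, axis=1))] - query


rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(-1, 1, size=(500, 2)), np.zeros(500)])  # flat toy surface
query = np.array([0.1, 0.2, 0.3])
print(plane_fit_displacement(query, plane), nearest_point_displacement(query, plane))
```

On flat regions the plane fit is exact; on curved regions or sharp edges it is biased, which is where a learned predictor would be expected to help.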

Questions

Please see Weaknesses.

Limitations

Some limitations are discussed at the end of the paper.

Author Response

Thanks for the valuable comments and suggestions. Please see below for the responses.

Q1: A heuristic baseline.

We aim to learn an equivariant implicit function that outputs, for each query point, the vector to its nearest point on the unknown continuous 3D surface. Since only discrete points on the surface are observed, inferring this vector from the discrete points is challenging. Instead of using the local kNN points to regress a local surface function (e.g., a polynomial) for each query point, our neural network-based approach estimates the vector of each query point by learning intrinsic geometry features of the local patch. We also design and exploit rotation-invariant patch features to achieve equivariant implicit function learning. This design enables us to achieve state-of-the-art 3D reconstruction performance. In Q2, we also evaluate the performance under degraded input point clouds.

Q2: The performance of PEIF on sparse/noisy/partial data.

As suggested, we evaluate the performance of our PEIF on different data degradations (sparse, noisy, and partial) on the ABC dataset.

In the following experiments, only the test input point clouds are degraded; the results are reported in Tables R3-1 to R3-3. In Tables 1-3 of the PDF uploaded with the “general response”, we also report results when both the training and test point clouds are degraded.

(1) Sparse point cloud. In the experiments of the original submission, all compared methods use the same number of input points (10k) per shape at test time, as in NVF. Here we randomly select a subset of N input points (N = 5k or 2k), and the results are in Table R3-1.

Table R3-1. The reconstruction results of sparse data on the ABC dataset.

Methods | N | CD↓ | EMD↓ | NC↑ | F-Score↑
NVF | 5k | 0.297 | 2.706 | 0.935 | 0.979
GeoUDF | 5k | 0.306 | 2.726 | 0.940 | 0.985
GridFormer | 5k | 0.292 | 2.694 | 0.952 | 0.982
PEIF (Ours) | 5k | 0.269 | 2.679 | 0.945 | 0.988
------------------------------
NVF | 2k | 0.409 | 2.725 | 0.932 | 0.946
GeoUDF | 2k | 0.399 | 2.711 | 0.935 | 0.952
GridFormer | 2k | 0.369 | 2.703 | 0.945 | 0.956
PEIF (Ours) | 2k | 0.360 | 2.685 | 0.938 | 0.960

(2) Noisy point cloud. We added Gaussian noise with standard deviation σ = 0.005 and 0.01 to the input points.

Table R3-2. The reconstruction results from noisy input on the ABC dataset.

Methods | σ | CD↓ | EMD↓ | NC↑ | F-Score↑
NVF | 0.005 | 0.512 | 3.257 | 0.712 | 0.924
GeoUDF | 0.005 | 0.496 | 3.268 | 0.732 | 0.911
GridFormer | 0.005 | 0.839 | 3.321 | 0.793 | 0.805
PEIF (Ours) | 0.005 | 0.480 | 3.132 | 0.745 | 0.952
------------------------------
NVF | 0.01 | 0.792 | 3.687 | 0.723 | 0.693
GeoUDF | 0.01 | 0.785 | 3.428 | 0.710 | 0.655
GridFormer | 0.01 | 1.132 | 3.379 | 0.759 | 0.510
PEIF (Ours) | 0.01 | 0.773 | 3.358 | 0.715 | 0.702
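Both degradations above are simple to reproduce. A minimal sketch, assuming shapes are already normalized so that σ is in the same units as the coordinates (the normalization used for ABC is not restated here); the unit sphere is a hypothetical stand-in for a 10k-point input.

```python
import numpy as np


def subsample(points, n, rng):
    """Sparse input: a uniformly random subset of n points, drawn without replacement."""
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx]


def add_gaussian_noise(points, sigma, rng):
    """Noisy input: i.i.d. N(0, sigma^2) noise on every coordinate."""
    return points + rng.normal(scale=sigma, size=points.shape)


rng = np.random.default_rng(0)
dense = rng.normal(size=(10_000, 3))
dense /= np.linalg.norm(dense, axis=1, keepdims=True)  # hypothetical input: 10k points on a unit sphere
sparse_5k = subsample(dense, 5_000, rng)
sparse_2k = subsample(dense, 2_000, rng)
noisy = add_gaussian_noise(dense, sigma=0.005, rng=rng)
```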

(3) Partial point cloud. We remove a fraction p of the input points to form a partial point cloud. Specifically, we use farthest point sampling to select a set of center points and remove their K nearest neighbors so that the removed fraction equals p.

Table R3-3. The reconstruction results from partial points on the ABC dataset.

Methods | p | CD↓ | EMD↓ | NC↑ | F-Score↑
NVF | 10% | 0.264 | 2.697 | 0.943 | 0.992
GeoUDF | 10% | 0.268 | 2.695 | 0.959 | 0.994
GridFormer | 10% | 0.267 | 2.706 | 0.964 | 0.982
PEIF (Ours) | 10% | 0.246 | 2.692 | 0.960 | 0.996
------------------------------
NVF | 20% | 0.274 | 2.710 | 0.940 | 0.990
GeoUDF | 20% | 0.275 | 2.745 | 0.947 | 0.991
GridFormer | 20% | 0.298 | 2.746 | 0.946 | 0.987
PEIF (Ours) | 20% | 0.249 | 2.697 | 0.956 | 0.995
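The partial-input protocol (farthest point sampling for hole centers, then removing nearest neighbors until a fraction p is gone) might be implemented as below. The number of holes (`n_centers`), the even split of the removal budget across holes, and removing neighbors among the points still remaining are all assumptions; the rebuttal does not specify them.

```python
import numpy as np


def farthest_point_sampling(points, m, rng):
    """Greedy FPS: start from a random point, then repeatedly add the point farthest from the chosen set."""
    chosen = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def make_partial(points, p, n_centers, rng):
    """Remove round(p * N) points as holes around FPS centers (hypothetical hole count)."""
    n_remove = int(round(p * len(points)))
    centers = points[farthest_point_sampling(points, n_centers, rng)]
    keep = np.ones(len(points), dtype=bool)
    per_hole = int(np.ceil(n_remove / n_centers))
    removed = 0
    for c in centers:
        k = min(per_hole, n_remove - removed)  # never exceed the total budget
        remaining = np.flatnonzero(keep)
        d = np.linalg.norm(points[remaining] - c, axis=1)
        keep[remaining[np.argsort(d)[:k]]] = False
        removed += k
    return points[keep]


rng = np.random.default_rng(0)
cloud = rng.uniform(-1, 1, size=(10_000, 3))  # hypothetical input cloud
partial_10 = make_partial(cloud, p=0.10, n_centers=4, rng=rng)
partial_20 = make_partial(cloud, p=0.20, n_centers=4, rng=rng)
```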
Comment

After reading the reviews and rebuttals, I appreciate the authors' effort in providing additional experimental results. The partial/noisy data experiments are convincing. I keep my original positive recommendation.

Comment

Thanks for your inspiring questions and positive comments on our work. We will include these additional results and revisions in the paper (main body or appendix).

Official Review
Rating: 6

The authors introduce the 3D Patch-level Equivariant Implicit Function (PEIF), leveraging a 3D Patch-level Pose-Invariant Representation (PPIR) to address the surface reconstruction task. To overcome the limitation that existing Implicit Neural Representations (INRs) are not equivariant to 3D rotation, they develop PEIF to encode both equivariant and invariant information, thereby enhancing generalization to unseen 3D rotations. The SE(3)-equivariant implicit function is optimized using displacement optimization loss and patch discrimination loss with ground-truth 3D models. Experimental results on surface reconstruction datasets validate the effectiveness of PEIF.

Strengths

  1. Motivation: The study is well-motivated, addressing the redundancy in existing INR-based methods concerning local orientation-normalized patches. Moreover, the current methods are weak against unseen rotations of local shapes.

  2. Technical Novelty and Soundness: The introduction of local pose-invariant representation for SE(3) equivariant implicit function is novel for 3D surface reconstruction. Patch-based pose normalization facilitates efficient training without the need for rotation augmentation.

  3. Verification of Rotational Robustness: The authors demonstrate the rotational robustness of the proposed method, as shown in Table 4.

  4. Performance Improvement: The proposed PEIF achieves superior performance compared to both equivariant and non-equivariant surface reconstruction methods on the ShapeNet, ABC, and SyntheticRooms datasets, as indicated in Tables 1 and 2.

Weaknesses

  1. Rotational Robustness: Further clarification is needed regarding the experimental settings in Table 4. Specifically, it is unclear whether "w/o rotation" and "w/ rotation" refer to rotation augmentation during training or testing.

  2. Missing Citations: A similar approach exists in 2D pixel-level correspondence, as detailed in the work by Lee et al. (CVPR 2023). This study also utilizes local-level dominant orientation from rotation-equivariant features and normalizes the equivariant feature using the dominant orientation for an invariant descriptor. It would be beneficial to cite this work and discuss the similarities and differences.

[A] Learning Rotation-Equivariant Features for Visual Correspondence (Lee et al., CVPR 2023)

  3. Computational Cost: There is a lack of discussion regarding the computational cost of the proposed PEIF. Information on computation time and memory consumption, and a comparison with E-GraphONet, would be valuable.

Questions

Further Research Direction: The concept introduced could potentially be extended to few-shot training scenarios, where the local embedding might capture various types of 3D rotations. Did the authors explore this direction?

Limitations

I recommend a weak accept score for this paper due to its strong motivation and technical novelty in addressing the limitations of existing INRs with respect to 3D rotation equivariance. The introduction of the 3D Patch-level Equivariant Implicit Function (PEIF) and its verification of rotational robustness demonstrate a significant advancement in 3D surface reconstruction, achieving state-of-the-art performance on multiple datasets.

However, there are some limitations that should be addressed. The experimental settings regarding rotational robustness need further clarification. Additionally, the paper lacks citations to related works in 2D pixel-level correspondence that employ similar techniques, which would strengthen the discussion of novelty and prior art. Finally, the computational cost of PEIF, in terms of computation time and memory consumption, is not discussed, leaving questions about its practical applicability compared to existing methods. Addressing these points would enhance the overall contribution and clarity of the paper.

Author Response

We thank the reviewer for the positive comment that our approach is well-motivated and novel. Please see below for responses.

Q1: The experimental settings of "w/o rotation" and "w/ rotation" in Table 4.

In Table 4, "w/ rotation" and "w/o rotation" mean that the test input point cloud is or is not arbitrarily rotated, respectively. This will be clarified in the paper.

Q2: Missing citations of related works in 2D pixel-level.

Thanks for this comment. RELF [1] uses group-equivariant CNNs to extract discriminative rotation-invariant descriptors from 2D images. In contrast, our framework is designed for 3D reconstruction, and we design equivariant implicit function learning for that task. We will include references to equivariant networks in 2D and a discussion of RELF [1] in our paper.

Q3: Computational cost.

We report the computational cost on the ABC dataset, including training time per epoch, training memory, testing time per 3D shape, and testing memory, in Table R2-1. GeoUDF and GridFormer each consist of two stages (upsampling/reconstruction and reconstruction/refinement, respectively), so for these methods each table cell lists two values (written as a + b), one per stage. We will include these details in the appendix.

Table R2-1. The comparison of computational cost.

Method | Training time (s per epoch) | Training memory (GB) | Testing time (s per shape) | Testing memory (GB)
NVF | 28.06 | 6.70 | 73.9 | 0.66
GeoUDF | 61.20 + 58.78 | 14.99 + 14.99 | 124.73 | 2.27
GridFormer | 26.48 + 26.78 | 6.59 + 6.59 | 13.32 | 0.31
E-GraphONet | 37.82 | 16.11 | 1.92 | 1.49
PEIF (Ours) | 34.56 | 17.75 | 40.98 | 1.56

Q4: Further research direction.

It is a good suggestion to extend our approach to few-shot training scenarios. Along this direction, our patch-based pose invariant representation can be taken as a foundation network for pre-training, followed by fine-tuning on few-shot examples. In the pre-training step, we may learn a general representation of intrinsic 3D patch features, and the fine-tuning may adapt these representations to the given few-shot examples. We will include this direction as a future work in the conclusion section.

[1] Lee et al. Learning rotation-equivariant features for visual correspondence. CVPR. 2023.

Comment

Dear reviewer 6vbf,

Thanks for the valuable questions and suggestions, and for the overall positive comments on the motivation and novelty. We have carefully responded to your questions and will include these revisions in our paper. In the rebuttal phase, we responded on the following aspects: (1) the setting of w/o or w/ rotation; (2) the missing citation and corresponding discussion; (3) more details on the computational cost; (4) the suggested future research direction. If you have additional questions, we look forward to discussing them with you in this author-reviewer discussion phase.

Kind regards,

All authors

Comment

Thank you for your rebuttal.

W1) The authors have addressed my question by clarifying the meaning of the table.

W2, W4) The authors should discuss existing similar methods related to equivariant-to-invariant mapping in 2D image matching [1] in their final revision.

W3) The rebuttal has resolved my concerns.

Overall, I am satisfied with the author’s rebuttal, and I am scoring a weak accept for this paper.

Comment

Thank you for the valuable questions and the positive feedback. As suggested, we will discuss the related work of equivariant/invariant mapping in 2D image matching [1] in the final revision.

Official Review
Rating: 5

In this paper, the authors address the task of surface reconstruction. They propose a patch-level pose-invariant representation of 3D objects, which is employed in the design of a patch-level equivariant implicit function. The proposed PEIF framework is composed of three modules: the spatial relation module, the patch feature extraction module, and the intrinsic patch geometry extractor. The authors demonstrate the effectiveness of the proposed framework for the surface reconstruction task through comprehensive experimental evaluations.

Strengths

  • The authors introduce a novel pose normalization scheme and a displacement predictor that employs the proposed pose normalization scheme, accompanied by rigorous proofs
  • The proposed method shows state-of-the-art performance in the surface reconstruction task, surpassing another equivariant method (i.e., E-GraphONet)
  • The proposed method shows state-of-the-art performance in the cross-domain evaluation setting

Weaknesses

  • The proposed method exhibits a significantly longer inference time than other equivariant methods (E-GraphONet). It appears that most of this increase results from the computation of the SVD. A detailed analysis of the inference time would give a more comprehensive understanding of the proposed method
  • The proposed method shows performance comparable to GeoUDF (which is not an equivariant method). This raises questions about the necessity of using an equivariant method for the surface reconstruction task
  • The authors claim that the proposed pose-invariant property is intended to enhance cross-domain generalization. To validate this, the cross-domain experiment should include the result of E-GraphONet (which is also pose-invariant)
  • In Table 4, other algorithms also seem quite robust to rotation changes. The authors are encouraged to provide further explanation of this observation.
  • An ablation study concerning the three modules (i.e., the spatial relation module, the patch feature extraction module, and the intrinsic patch geometry extractor) is not provided. The inclusion of such a study would be valuable.

Questions

  • Could the authors provide a visualization of the learned memory bank? This would aid in comprehending the proposed intrinsic patch geometry extractor.
  • What value of K is used in the MGN experiments?
  • Minor comments
    • L57: 3D construction -> 3D reconstruction
    • L62: introduces Transformer -> introduces transformer
    • Figure2: displacement Predictor -> displacement predictor
    • L180: multi-head memory bank index starts from 1, but it starts from 0 in Figure 3
    • L191: displacement estimate -> displacement estimation
    • L234: distance(CD -> distance (CD

Limitations

The authors have adequately addressed the limitations and societal impact in the main paper.

Author Response

Thanks for these comments. We address the concerns and questions as follows.

Q1: The visualization of the learned memory bank.

We provide two visualizations of the learned memory bank. Please refer to Figure 1 in the attached PDF uploaded with the top “general response”.

(1) We visualize the point patches with the highest weights to each element of the memory bank (the weights are computed by Eqn. (12)); these patches are highlighted in color in the examples. Patches with high weights to the same memory element have similar geometric structures.

(2) We further visualize (by t-SNE) the features of the point patches with the highest weights (Eqn. (12)) to different elements of the learned memory, colored by element. Patches with high weights to different memory elements form clusters in the feature space.
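Procedure (1) amounts to ranking patches by their weight to each memory element. Eqn. (12) is not reproduced in this thread, so the sketch below assumes an attention-style form (a softmax over dot products between patch features and memory entries); `memory_weights` and `top_patches_per_element` are hypothetical names. The selected indices are what one would color in the renderings, or whose features one would pass to t-SNE for procedure (2).

```python
import numpy as np


def memory_weights(patch_feats, memory, temperature=1.0):
    """ASSUMED form of Eqn. (12): softmax over patch-memory dot products.
    patch_feats: (P, D), memory: (M, D) -> weights (P, M); each row sums to 1."""
    logits = patch_feats @ memory.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def top_patches_per_element(weights, n):
    """Indices of the n patches with the highest weight to each memory element -> (M, n)."""
    return np.argsort(-weights, axis=0)[:n].T


rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 32))   # hypothetical patch features
memory = rng.normal(size=(8, 32))     # hypothetical 8-entry memory bank
top = top_patches_per_element(memory_weights(feats, memory), n=20)  # (8, 20) patch indices
```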

We will include these visualizations in the appendix of our paper.

Q2: What value of K is used in the MGN experiments?

K is set to 54 when testing on the MGN dataset, using the model trained with K = 54 on the Synthetic Rooms dataset. As shown in Table R1-1, when K is changed to 48 or 32 for training on Synthetic Rooms, the test results on MGN with the corresponding K remain stable. We also presented ablation studies on K in Table 5 of the manuscript.

Table R1-1. The impact of K when training on Synthetic Rooms and testing on MGN.

K | CD↓ | EMD↓ | NC↑ | F-Score↑
48 | 0.247 | 2.724 | 0.961 | 0.998
32 | 0.252 | 2.735 | 0.959 | 0.991

Q3: Minor comments on typos.

Thanks and we will correct them.

Q4: A detailed analysis of the inference time.

In Table R1-2, we report the time consumption of each operator in PEIF to process 10,000 query points. Specifically, the operations include SVD (Singular Value Decomposition), PE (Point-wise Feature Extraction), SRM (Spatial Relation Module), PFEM (Patch Feature Extraction Module), IPGE (Intrinsic Patch Geometry Extractor) and Others (other Conv layers).

Table R1-2. The time to process 10,000 points on the ABC dataset using one NVIDIA 4090 GPU.

Operator | SVD | FE | SRM | PFEM | IPGE | Others | Total
Time (s) | 0.4473 | 0.1773 | 0.0297 | 0.0004 | 0.0008 | 0.02 | 0.6688

We will add these results to the main body or appendix of our paper.
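Since SVD accounts for 0.4473 s of the 0.6688 s total, one way to study this cost is to time batched frame computation in isolation. The sketch below runs on CPU with NumPy, so its timings are not comparable to the reported 4090 numbers, and the patch size of 54 points is borrowed from the K used elsewhere in this discussion. It times a batched SVD of 10,000 centered patches against an eigendecomposition of their 3 × 3 scatter matrices, which yields the same axes up to sign; whether such a substitution would speed up PEIF on GPU is untested here.

```python
import time

import numpy as np


def timed(fn, *args, repeats=3):
    """Return (best wall-clock seconds over `repeats` runs, last result)."""
    best, out = float("inf"), None
    for _ in range(repeats):
        t0 = time.perf_counter()
        out = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best, out


def frames_svd(patches):
    """Batched SVD of centered (B, K, 3) patches; returns (B, 3, 3) with rows = principal axes."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return np.linalg.svd(centered, full_matrices=False)[2]


def frames_cov_eigh(patches):
    """Same axes (up to sign) from the eigendecomposition of the (B, 3, 3) scatter matrices."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    cov = np.einsum("bki,bkj->bij", centered, centered)
    _, evecs = np.linalg.eigh(cov)                # ascending eigenvalues, vectors in columns
    return np.swapaxes(evecs[:, :, ::-1], 1, 2)   # rows sorted by decreasing variance


rng = np.random.default_rng(0)
patches = rng.normal(size=(10_000, 54, 3)) * np.array([3.0, 2.0, 0.5])  # hypothetical patches
t_svd, f_svd = timed(frames_svd, patches)
t_eig, f_eig = timed(frames_cov_eigh, patches)
print(f"SVD: {t_svd:.3f}s  scatter-matrix eigh: {t_eig:.3f}s")
```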

Q5: Comparison with GeoUDF and necessity of equivariance.

As shown in Tables 1 and 2 of the manuscript and Tables R3-1 to R3-3 in our response to Reviewer H6Ua, our results are better than those of GeoUDF, especially on degraded data. Moreover, Table R2-1 in our response to Reviewer 6vbf shows that our time and memory costs are lower than those of GeoUDF.

(1) Learning an equivariant implicit representation makes the reconstructed surfaces robust to rotations of the input points. Figure 2 of the PDF uploaded with the "general response" shows that GeoUDF produces 3D surfaces with noise artifacts after rotation, whereas our results are smoother and more stable under rotation.

(2) Equivariance/invariance is important to our model's performance. We learn the equivariant implicit vector field from intrinsic patch geometric features obtained by removing patch poses. This is inspired by the observation that local patches appear repeatedly on 3D shapes once their poses are removed. This idea is essential for improving reconstruction accuracy while remaining robust to rotations. As shown in Table 5, removing the equivariance design (i.e., w/o pose normalization) noticeably decreases the 3D reconstruction accuracy.
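How removing patch poses yields an equivariant displacement field can be checked numerically. This is a minimal sketch under stated assumptions, not PEIF itself: the pose is a PCA frame from the SVD of the centered patch, axis signs are fixed by pointing toward the point farthest from the centroid (one possible rule; the paper's exact rule may differ), and `toy_head` is a hypothetical stand-in for the learned network that is deliberately not equivariant on its own.

```python
import numpy as np


def local_frame(patch: np.ndarray):
    """Centroid c and right-handed PCA frame R of a (K, 3) patch.

    The columns of R are the principal axes, so R.T @ (p - c) gives the
    pose-normalized coordinates of a point p.
    """
    c = patch.mean(axis=0)
    X = patch - c
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    axes = vt.copy()                               # rows sorted by decreasing variance
    far = X[np.argmax(np.linalg.norm(X, axis=1))]  # farthest point from the centroid
    for i in range(2):                             # assumed sign rule: point toward `far`
        if axes[i] @ far < 0:
            axes[i] = -axes[i]
    axes[2] = np.cross(axes[0], axes[1])           # force det(R) = +1
    return c, axes.T


def toy_head(local_patch: np.ndarray, local_query: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the network; the per-axis tanh breaks equivariance."""
    offsets = local_patch - local_query
    w = np.exp(-np.sum(offsets ** 2, axis=1))
    w /= w.sum()
    return np.tanh(3.0 * (w[:, None] * offsets).sum(axis=0))


def predict_displacement(patch, query, head=toy_head):
    c, R = local_frame(patch)
    d_local = head((patch - c) @ R, (query - c) @ R)  # pose-invariant inputs
    return R @ d_local                               # rotate back to the world frame


rng = np.random.default_rng(0)
patch = rng.normal(size=(32, 3)) * np.array([3.0, 1.5, 0.5])  # distinct variances
query = rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                                          # make Q a proper rotation

d = predict_displacement(patch, query)
d_rot = predict_displacement(patch @ Q.T, query @ Q.T)          # rotate patch and query
equivariant = np.allclose(d_rot, Q @ d)                          # v(QP, Qq) = Q v(P, q)
```

The property holds because the frame rotates with the input (R becomes QR), so the network sees identical canonical coordinates. It is exact only while the frame is well defined: distinct singular values and sign tests that are not close to zero.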

Q6: The comparison of cross-domain generalization ability with E-GraphONet.

As suggested, we compare with E-GraphONet in the cross-domain experiment as reported in Table R1-3. Our PEIF achieves better quantitative results on the MGN dataset. These results will be included in Table 3 of our paper.

Table R1-3. Cross-domain evaluation on the real MGN dataset, with models pre-trained on the Synthetic Rooms dataset.

| Method | CD↓ | EMD↓ | NC↑ | F-Score↑ |
|---|---|---|---|---|
| E-GraphONet | 0.433 | 3.817 | 0.863 | 0.920 |
| PEIF (Ours) | 0.241 | 2.672 | 0.969 | 0.998 |

Q7: Further clarification on the robustness of the compared algorithms to rotation.

Thanks for the question. We have further evaluated the rotation robustness of the methods in Table 4 under different rotation angles, as reported in Table R1-4. Figures 2 and 3 of the PDF uploaded with the "general response" show that NVF and GeoUDF exhibit noise and holes after rotation, while our results are smoother and more robust to rotation. In Q5, we also discussed why the equivariance/invariance of the vector field and patch features is important for our model's performance.

Table R1-4. Results for different rotation angles (0°/90°/180°/270°) on the ABC dataset.

| Method | CD↓ | EMD↓ | NC↑ | F-Score↑ |
|---|---|---|---|---|
| NVF | 0.245/0.260/0.263/0.262 | 2.685/2.683/2.685/2.687 | 0.963/0.950/0.944/0.952 | 0.996/0.994/0.993/0.993 |
| GeoUDF | 0.245/0.256/0.253/0.263 | 2.688/2.691/2.698/2.683 | 0.964/0.956/0.958/0.966 | 0.997/0.993/0.996/0.994 |
| E-GraphONet | 0.432/0.441/0.436/0.445 | 2.688/2.696/2.702/2.690 | 0.910/0.906/0.906/0.897 | 0.906/0.896/0.909/0.906 |
| PEIF (Ours) | 0.241/0.249/0.247/0.243 | 2.672/2.675/2.678/2.676 | 0.969/0.966/0.964/0.968 | 0.998/0.996/0.998/0.998 |
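One simple way to quantify rotation robustness from Table R1-4 is the spread of CD across the four angles. The snippet below only re-computes that spread from the reported values; the choice of measure (range and range relative to the best CD) is ours, not the authors'.

```python
# CD at 0/90/180/270 degrees, copied from Table R1-4.
cd = {
    "NVF":         [0.245, 0.260, 0.263, 0.262],
    "GeoUDF":      [0.245, 0.256, 0.253, 0.263],
    "E-GraphONet": [0.432, 0.441, 0.436, 0.445],
    "PEIF":        [0.241, 0.249, 0.247, 0.243],
}

spread = {m: max(v) - min(v) for m, v in cd.items()}         # absolute CD range
relative = {m: spread[m] / min(v) for m, v in cd.items()}    # range / best CD

for m in cd:
    print(f"{m:12s} range={spread[m]:.3f} relative={relative[m]:.1%}")
```

By this measure PEIF has the smallest absolute CD range (0.008, versus 0.018 for NVF and GeoUDF). Relative to its best CD, E-GraphONet's range is similar (about 3%), though at a much higher CD level.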

Q8: An ablation study of the three modules.

We conduct an ablation study of the three modules (i.e., SRM, PFEM, and IPGE) in PEIF on the ABC dataset. Table R1-5 reports the quantitative results of PEIF with each module removed. The intrinsic patch geometry extractor (IPGE) contributes the most to the performance of PEIF.

Table R1-5. Ablation study of the three main modules proposed in PEIF.

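| Setting | CD↓ | EMD↓ | NC↑ | F-Score↑ |
|---|---|---|---|---|
| w/o SRM | 0.243 | 2.686 | 0.962 | 0.997 |
| w/o PFEM | 0.244 | 2.699 | 0.961 | 0.996 |
| w/o IPGE | 0.276 | 2.715 | 0.959 | 0.992 |
| PEIF (Full) | 0.241 | 2.672 | 0.969 | 0.998 |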
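The claim that IPGE contributes the most can be made concrete as the relative CD increase when each module is removed. The snippet simply re-computes this from Table R1-5; no new results are involved.

```python
full_cd = 0.241  # PEIF (Full), Table R1-5
ablated_cd = {"w/o SRM": 0.243, "w/o PFEM": 0.244, "w/o IPGE": 0.276}

# Relative CD increase over the full model (higher = module matters more).
increase = {k: (v - full_cd) / full_cd for k, v in ablated_cd.items()}
for k, r in increase.items():
    print(f"{k:9s} CD +{r:.1%}")
```

Removing IPGE raises CD by about 14.5%, versus under 1.5% for removing SRM or PFEM.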
Comment

Dear reviewer dfBj,

Thank you for your inspiring suggestions and questions. We have carefully considered your questions and concerns and responded in the following respects. (1) We have provided visualizations of the learned memory bank, showing that its elements represent patch-level 3D geometric patterns. (2) We have presented more details on the inference time, the setting of K, etc. (3) We have provided additional experiments justifying the necessity of equivariance and demonstrating robustness to rotations. (4) We have conducted ablation studies on the key modules and a cross-domain comparison with E-GraphONet. Given the limited time remaining in the author-reviewer discussion period, we would welcome further discussion if you have any additional concerns.

Kind regards,

All authors

Comment

I appreciate the authors' effort in providing additional results. The rebuttal effectively clarified and addressed the key issues I had with the paper. I lean towards acceptance.

Comment

Thank you for the positive feedback. We will incorporate the corresponding revisions into our paper.

Author Response

General Response

We appreciate the reviewers' positive comments on the novelty (especially Reviewers dfBj, 6vbf, H6Ua), motivation (especially Reviewers 6vbf, FF8y, H6Ua), and performance gain (especially Reviewers 6vbf, H6Ua, FF8y). We have responded to each reviewer's questions and suggestions, and will incorporate the corresponding revisions into the main body or appendix of our paper.

Along with this general response, we have uploaded a PDF containing the figures and additional tables referred to in our responses to each reviewer's comments.

Comment

Dear Reviewers,

The authors have posted their rebuttals to the reviews. Could you please respond to the rebuttals? Please engage in the discussion with the authors. Your help is much appreciated.

Thanks,

AC

Comment

We would like to thank ACs for handling our paper, and the reviewers for their inspiring questions and generally positive comments on the motivation/novelty/experimental performance of our approach.

We briefly summarize the contributions of our paper. The major novelty lies in modeling intrinsic geometry at the patch level, achieved by learning pose-normalized intrinsic patch geometric features, enhanced by learnable memory banks whose elements model representative local 3D geometric features. These designs are motivated by the goal of exploiting the repetition of patches after per-patch pose normalization, and of obtaining an intrinsic geometric representation of 3D shapes.

According to the reviewers' suggestions, we have conducted some additional experiments and a more detailed analysis of our model. We will incorporate them in the main body or appendix of the revised manuscript.

Thanks again for the valuable reviews in improving our work.

Final Decision

Most reviewers agreed on the soundness and good performance of the proposed work. There were some differences among the reviewers about the significance of the novelty; the issue was that the main idea could be seen as somewhat combinatorial. The AC reviewed the paper in detail and reached the following conclusion:

It is true that the concepts of neural vector fields and rotation equivariance are borrowed from other works, and the use of SVD for rotation estimation is common in the field. However, these serve either as a broader framework or as basic tools. On the other hand, designing an implicit function based on a patch-wise representation (for rotation equivariance) is interesting and novel. This representation is not straightforward and is not a direct combination of building blocks, so there is a non-trivial contribution. Accordingly, an accept decision is recommended.