PaperHub
Overall rating: 4.3/10 (Rejected; 4 reviewers, min 2, max 5, std 1.3)
Ratings: 5, 5, 5, 2
Confidence: 4.5
Correctness: 2.8 | Contribution: 2.3 | Presentation: 2.8
NeurIPS 2024

NeCGS: Neural Compression for 3D Geometry Sets

OpenReview | PDF
Submitted: 2024-05-11 | Updated: 2024-11-06
TL;DR

A highly effective neural compression scheme for 3D geometry sets.

Abstract

This paper explores the problem of effectively compressing 3D geometry sets containing diverse categories. We make the first attempt to tackle this fundamental and challenging problem and propose NeCGS, a neural compression paradigm, which can compress hundreds of detailed and diverse 3D mesh models ($\sim$684 MB) by about 900 times (0.76 MB) with high accuracy and preservation of detailed geometric details. Specifically, we first represent each irregular mesh model/shape in a regular representation that implicitly describes the geometry structure of the model using a 4D regular volume, called TSDF-Def volume. Such a regular representation can not only capture local surfaces more effectively but also facilitate the subsequent process. Then we construct a quantization-aware auto-decoder network architecture to regress these 4D volumes, which can summarize the similarity of local geometric structures within a model and across different models for redundancy elimination, resulting in more compact representations, including an embedded feature of a smaller size associated with each model and a network parameter set shared by all models. We finally quantize and encode the resulting features and network parameters into bitstreams through entropy coding. After decompressing the features and network parameters, we can reconstruct the TSDF-Def volumes, where the 3D surfaces can be extracted through the deformable marching cubes. Extensive experiments and ablation studies demonstrate the significant advantages of our NeCGS over state-of-the-art methods both quantitatively and qualitatively. We have included the source code in the Supplemental Material.
Keywords
geometry compression

Reviews and Discussion

Review (Rating: 5)

The manuscript introduces a neural compression paradigm for effectively compressing diverse sets of 3D geometry models. The authors propose a two-stage framework that first converts irregular mesh models into a regular 4D TSDF-Def volume representation and then employs a quantization-aware auto-decoder network to achieve redundancy elimination and compact representation. The method claims to compress a large number of 3D mesh models with high accuracy and preservation of geometric details, outperforming state-of-the-art methods both quantitatively and qualitatively.

Strengths

  • The paper presents a unique method for compressing 3D geometry sets by leveraging neural networks, which is a significant advancement in the field. NeCGS achieves an impressive compression ratio, which is a critical metric for 3D geometry data compression.
  • The method maintains high accuracy and preserves detailed geometric structures even at high compression ratios. The authors have conducted comprehensive experiments and ablation studies across various datasets, demonstrating the effectiveness of their approach.
  • The inclusion of source code in the supplemental material enhances the reproducibility and transparency of the research.
  • The paper is well-organized, with clear explanations of the methodology and results.

Weaknesses

  • The manuscript mentions that the optimization process for TSDF-Def volumes is time-consuming (over 15 hours), which could be a limitation for practical applications. The manuscript should address the long optimization time required for the TSDF-Def volumes. Future work could focus on accelerating this process to make the method more practical.
  • While the method performs well on the tested datasets, it is unclear how well it generalizes to other, more complex, or varied 3D geometry sets, such as geometries with thin structures or open boundaries (e.g., cloth).
  • The choice of an auto-decoder network is effective, but the paper could benefit from a more detailed explanation of why this architecture was chosen over others.
  • While the method outperforms existing techniques, a more thorough comparison in terms of trade-offs, especially related to computational resources, would be insightful.
  • The paper could provide more insights into how the method scales with the size and complexity of the 3D geometry sets. The paper should include scalability tests to understand how the method performs with larger and more complex datasets.

Questions

The manuscript presents a contribution to the field of 3D geometry data compression with the introduction of NeCGS. The innovative approach of using a neural network for compression and the high compression ratios achieved are commendable. However, there are several areas where the manuscript could be improved: computational efficiency, scalability on various data, and other minor issues. In conclusion, the manuscript is well-written and presents a promising new direction for 3D geometry compression. Addressing the above points will significantly enhance the manuscript's contribution to the field. I am on the fence, and looking forward to the reply and other reviews.

Limitations

NA

Author Response

Comment 1. The manuscript ... more practical.

Response:

  • Thanks for the valuable suggestion. First, we clarify that the optimization process of converting 3D models into TSDF-Def 4D volumes is efficient, as shown in the table below, whereas the 15 hours refers to the time consumed by the whole compression process, including the TSDF-Def representation process and the optimization of both the auto-decoder and the features.
  • The process of converting the 3D models of a dataset into TSDF-Def volumes can be parallelized across multiple GPUs. In our experiment, we utilized 8 NVIDIA RTX 3090 GPUs. The table below details the time consumed for processing the mixed dataset.

    | Resolution | Time per Shape (s) | Total Time (h) |
    |-----|-----|-----|
    | 32 | 22.39 | 0.46 |
    | 48 | 24.34 | 0.50 |
    | 64 | 26.67 | 0.55 |
    | 128 | 28.03 | 0.58 |
    | 256 | 41.51 | 0.86 |

  • Second, we want to emphasize that our NeCGS is designed for the offline compression of 3D geometry datasets to save storage space, where the optimization/compression time should not be a key factor. Instead, we should concentrate more on the decompression/decoding speed, because users expect to obtain the 3D models promptly when querying the dataset. As shown in Table 3, when the resolution is 128, the decompression time of our NeCGS is only 98.95 ms, satisfying the real-time requirement.
  • Moreover, there are various potential solutions to accelerate the optimization. At the software level, more efficient convolution (e.g., using multiple 1-D or 2-D convolutions to approximate the 3-D convolution, or using 3D sparse convolution to replace the traditional 3D convolution) can be used to speed up the process. At the hardware level, the optimization can be run on multiple GPUs or other more efficient hardware. Take NeRF as an example: the initial algorithms required several days for optimization, and subsequently improved methods have reduced the optimization time to a few minutes or even seconds.
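A minimal sketch of the separable-convolution idea mentioned above, assuming a PyTorch decoder; the channel counts and kernel size are illustrative, not the actual decoder configuration:

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Approximate a dense k x k x k 3D convolution with three 1-D convolutions
    applied along D, H, and W, reducing the per-layer kernel cost from k^3 to 3k."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        self.conv_d = nn.Conv3d(in_ch, out_ch, (k, 1, 1), padding=(p, 0, 0))
        self.conv_h = nn.Conv3d(out_ch, out_ch, (1, k, 1), padding=(0, p, 0))
        self.conv_w = nn.Conv3d(out_ch, out_ch, (1, 1, k), padding=(0, 0, p))

    def forward(self, x):
        return self.conv_w(self.conv_h(self.conv_d(x)))

# Drop-in replacement for a dense nn.Conv3d(16, 16, 3, padding=1):
x = torch.randn(1, 16, 32, 32, 32)
y = SeparableConv3d(16, 16)(x)
print(y.shape)  # torch.Size([1, 16, 32, 32, 32])
```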

Comment 2. While the method ... open boundaries (cloth).

Response:

  • Our NeCGS can be adapted to 3D models newly added to the dataset. Given a new 3D model, we first represent it as a TSDF-Def volume through Algorithm 1. Then, following the optimization process in Sec. 3.2, we only optimize its corresponding embedded feature while keeping the trained decoder unchanged. The visual results are shown in Fig. R1 of the uploaded PDF, demonstrating its generalization ability to new geometry data. Moreover, in this situation, the optimization cost is significantly lower because only the embedded features are optimized.
  • The dataset we used indeed consists of complex shapes, as shown in Fig. 1, where the complex structures remain in the decompressed models. In Fig. R5 of the uploaded PDF file, we demonstrate additional complex shapes.
  • Because the signs in a TSDF are ambiguous for models with open boundaries, our method cannot directly compress models of this category. However, a straightforward alternative is to utilize a UDF (unsigned distance field) rather than an SDF to represent them in our method, which allows for the processing of models with open boundaries.

Comment 3. The choice of an ... over others.

Response:
In the ablation study, we indeed compared our auto-decoder framework with the auto-encoder, a widely used framework. The visual results shown in Fig. 8(a) demonstrate the superiority of our decoder-based structure. In the auto-encoder framework, the embedded features are adjusted by optimizing the encoder, which is less flexible than the auto-decoder, where the embedded features are optimized directly. In the final version, we will provide more explanation.

Comment 4. While the ..., would be insightful.

Response:
We refer the reviewer to the 4th response to Reviewer qNSe for comparison of compression time.

Comment 5. The paper ... complex datasets.

  • Thank you for the insightful comments. In the final version, we will add more discussion of this point. In our experiments, the embedded features and the decoder are optimized over a fixed number of epochs with a constant batch size. Consequently, the overall optimization time and computational expense scale proportionally with the number of geometric shapes being compressed.
  • To validate this, in addition to the mixed dataset utilized in the experiments (600 shapes), we create two additional mixed datasets of different sizes by selecting 100 and 300 shapes from the remaining three datasets. This results in mixed datasets comprising 300 and 900 shapes, respectively. The table below shows the optimization times for the various Mixed datasets.

    | # Shapes | Optimization Time (h) |
    |-----|-----|
    | 300 | 8.25 |
    | 600 | 16.32 |
    | 900 | 24.37 |

  • More importantly, we also want to emphasize that our NeCGS is designed for the offline compression of 3D geometry datasets to save storage space, where the optimization/compression time should not be a key factor. Instead, we should concentrate more on the decompression/decoding speed, because users expect to obtain the 3D models promptly when querying the dataset. As shown in Table 3, when the resolution is 128, the decompression time of our NeCGS is only 98.95 ms, satisfying the real-time requirement.
  • Besides, there are various potential solutions to accelerate the optimization. At the software level, more efficient convolution (e.g., using multiple 1-D or 2-D convolutions to approximate the 3-D convolution, or using 3D sparse convolution to replace the traditional 3D convolution) can be used to speed up the process. At the hardware level, the optimization can be run on multiple GPUs or other more efficient hardware. Take NeRF as an example: the initial algorithms required several days for optimization, and subsequently improved methods have reduced the optimization time to a few minutes or even seconds.
Comment

Thanks for your great efforts! After reading the response, some major issues have been addressed well. I would keep my original score. Thanks!

Comment

It is wonderful to get your further feedback mentioning that our initial responses have effectively tackled your concerns. We appreciate your recognition of our efforts.

Comment

Dear Reviewer 18DA

Thanks for your time and effort in reviewing our manuscript and the favorable recommendation. In our previous response, we addressed your remaining concerns directly and comprehensively. We very much look forward to your further feedback on our responses.

Best regards,

The authors

Review (Rating: 5)

This paper proposes a neural compression algorithm, NeCGS, to significantly compress geometry datasets. The algorithm mainly consists of two components: 1) regular geometry representation, an optimization algorithm that optimizes the TSDF field such that the error between the original geometries and the geometries reconstructed by the deformable marching cubes algorithm is minimized; and 2) compact neural representation, which regresses the optimized TSDF-Def fields from compressed latent states, quantizes the latent states, and compresses them further into bitstreams. The trained decoder can then be used to reconstruct the TSDF-Def fields, and the geometries can be reconstructed using the DMC algorithm.

Strengths

The NeCGS algorithm can provide high compression ratios with impressive reconstruction capability of the geometries. Better geometry representations can be achieved using the proposed optimization algorithm. This is evident from the ability of the DMC method to accurately reconstruct surfaces. The DMC algorithm is also significant and seems to provide better reconstruction of detailed structure in the geometries. Overall, the developed compression method has high potential and the results presented in the paper are very impressive.

Weaknesses

The biggest weakness of the proposed approach is the computational cost of the method. The exorbitantly large times required to compress the datasets reduce the value proposition. Additionally, it is not clear how much the computational cost scales with the size of the geometry dataset.

Questions

  • The reconstruction results from the GPCC method are very close to those of NeCGS. Would it be possible to compare the compression times as well for all the baseline methods?
  • In the ablation study section, the authors compare reconstruction accuracy for resolutions of 64, 128 and 256, and it seems the reconstruction quality does not vary by much. What happens if the resolution is reduced? It would certainly reduce the optimization costs. It would be interesting to find out how low a resolution can be used that still outperforms the baselines, and at what resolution the reconstruction accuracy significantly deteriorates.
  • What happens in the scenario where the geometry dataset needs to be modified or more geometries need to be added? Would the optimization cost be similar or significantly lesser?
  • It would be interesting to see how the optimization cost scales with the size of the geometry dataset?
  • Are the latent vectors of the auto decoder randomly sampled? Can more details be provided regarding that.
  • More details related to the DMC algorithm need to be provided. The workings of the algorithm are not entirely clear from the explanation in the paper.
  • In Fig. 4, is there an upper limit to the compression ratios achieved by the NeCGS? How do the results compare if you increase it?
  • Figure 6 is before 5 in the paper.

Limitations

NA

Author Response

Comment 1. The reconstruction results ... baseline methods?

Response:

  • Actually, when zooming in on Fig. 5 of the manuscript, the shapes decompressed by our NeCGS exhibit superior quality to those by GPCC, with significantly smoother surfaces.

  • We refer the reviewer to the 4th response to Reviewer qNSe for comparison of compression time.

Comment 2. In the ablation ... significantly deteriorates?

Response:

  • 3D models reconstructed from very low-resolution TSDF-Def volumes exhibit substantial errors, leading to a notable increase in compression distortion.

  • In addition to the resolution examined in the ablation study, we also experimented with lower resolutions, specifically 32 and 48. The distortions are presented in the table below. It is evident that decreasing the resolution from 64 to 48 results in a significant increase in distortion. Visual representations can be found in Fig. R2 of the uploaded PDF file.
    | Res. | Size (MB) | Com. Ratio | CD (1e-3) | NC | F1-0.005 | F1-0.01 |
    |-----|-----|-----|-----|-----|-----|-----|
    | 32 | 1.037 | 364.898 | 16.982 | 0.872 | 0.215 | 0.542 |
    | 48 | 1.269 | 298.187 | 12.240 | 0.895 | 0.314 | 0.705 |
    | 64 | 1.408 | 268.75 | 4.271 | 0.927 | 0.721 | 0.966 |
    | 128 | 1.493 | 253.45 | 3.436 | 0.952 | 0.842 | 0.991 |
    | 256 | 1.627 | 232.58 | 3.234 | 0.962 | 0.870 | 0.995 |

Comment 3. What happens ... or significantly lesser?

Response: Our NeCGS can be adapted to 3D models newly added to the dataset. Given a new 3D model, we first represent it as a TSDF-Def volume through Algorithm 1. Then, following the optimization process in Sec. 3.2, we only optimize its corresponding embedded feature while keeping the trained decoder unchanged. The visual results are shown in Fig. R1 of the uploaded PDF, demonstrating its generalization ability to new geometry data. Moreover, it is worth noting that in this situation, the optimization cost is significantly lower because only the embedded features are optimized.

Comment 4. It would be interesting ... geometry dataset?

Response:

  • In the experiment, the embedded features and the decoder are optimized over a fixed number of epochs with a constant batch size. Consequently, the overall optimization time and computational expense scale proportionally with the number of geometric shapes being compressed.

  • To validate this, in addition to the mixed dataset utilized in the experiments (600 shapes), we create two additional mixed datasets of different sizes by selecting 100 and 300 shapes from the remaining three datasets. This results in mixed datasets comprising 300 and 900 shapes, respectively. The table below displays the optimization times for the various Mixed datasets.

    | # Shapes | Optimization Time (h) |
    |-----|-----|
    | 300 | 8.25 |
    | 600 | 16.32 |
    | 900 | 24.37 |

Comment 5. Are the latent vectors ... regarding that.

Response: Before the optimization, the latent vectors (embedded features) are initialized as random Gaussian noise with a mean of 0 and a standard deviation of 1/3. In the experiment, with the quantization limits set at 1 and -1, the 1/3 standard deviation ensures that nearly every value of the initialized latent vectors falls within the [-1, 1] range, in line with the three-sigma rule of the Gaussian distribution.
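A minimal sketch of this initialization, assuming a PyTorch implementation; the number of shapes, feature dimension, quantization levels, and the straight-through quantizer are illustrative assumptions rather than the released code:

```python
import torch

num_shapes, feat_dim = 600, 256   # illustrative sizes, not the paper's actual configuration

# Initialize embedded features as Gaussian noise with mean 0 and std 1/3, so that by
# the three-sigma rule nearly every value starts inside the quantization range [-1, 1].
features = torch.randn(num_shapes, feat_dim) / 3.0
features.requires_grad_(True)

def quantize(z, levels=256):
    """Quantization-aware rounding (assumed straight-through estimator): clamp to
    [-1, 1], snap to a uniform grid, and keep identity gradients for z."""
    z = z.clamp(-1.0, 1.0)
    step = 2.0 / (levels - 1)
    z_q = torch.round((z + 1.0) / step) * step - 1.0
    return z + (z_q - z).detach()   # forward: quantized values, backward: identity

z_hat = quantize(features[:4])      # quantized features for a mini-batch of 4 shapes
```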

Comment 6. More details ... in the paper.

Response:

  • Our DMC is modified from Marching Cubes and is used to extract surfaces from TSDF-Def volumes. In addition to the TSDF value, we assign a deformation to each corner of the cubes, allowing the cubes to adapt to detailed structures of the shapes, as shown in Fig. 3. The triangle extraction within each cube is the same as in the original Marching Cubes; the only difference is the coordinates of the cube corners. We will add a more detailed description of DMC in the future version.

  • Algorithm 1 summarizes the whole optimization process for obtaining the TSDF-Def volume of a given 3D model. Given a 3D model $\mathbf{S}$, we can optimize and obtain its corresponding TSDF-Def volume $\mathbf{V}$ through Algorithm 1. Initially, we distribute grid points $\mathbf{G}$ uniformly across space, serving as the corners of the cubes utilized in DMC. Before optimization, we initialize $\mathbf{V}[...,0]$ as the ground-truth TSDF at the locations of $\mathbf{G}$ and the deformation as $\mathbf{V}[...,1:3]=0$. During the optimization, the optimal TSDF-Def volume is obtained by minimizing the difference between the reconstructed shape $\mathrm{DMC}(\mathbf{V})$ and the original shape $\mathbf{S}$. In the future version, we will clarify the unclear descriptions to make the optimization process easier to understand.
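A minimal sketch of this fitting loop, assuming a differentiable DMC surface extractor and a Chamfer-distance loss; `sample_gt_tsdf`, `dmc_extract`, `chamfer_distance`, and `sample_points` are hypothetical placeholders, not the authors' released API:

```python
import torch

def fit_tsdf_def_volume(gt_mesh, res=128, iters=500, lr=1e-2):
    """Algorithm-1-style fitting of a TSDF-Def volume to one shape (a sketch).
    The helper functions referenced below are placeholders, not a released API."""
    V = torch.zeros(res, res, res, 4)           # channel 0: TSDF; channels 1-3: corner offsets
    V[..., 0] = sample_gt_tsdf(gt_mesh, res)    # initialize with the ground-truth TSDF at G
    V[..., 1:4] = 0.0                           # initialize the deformation to zero
    V.requires_grad_(True)

    opt = torch.optim.Adam([V], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pred_pts = dmc_extract(V)               # surface points from (differentiable) DMC
        loss = chamfer_distance(pred_pts, sample_points(gt_mesh, 100_000))
        loss.backward()
        opt.step()
    return V.detach()
```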

Comment 7. In Fig. 4, is ... increase it?

Response:

  • The compression ratio is highly related to the reconstruction error, i.e., a larger compression ratio generally introduces more serious reconstruction errors. Discussing the compression ratio alone does not make much sense; in practice, we need to balance compression ratio and reconstruction accuracy according to the requirements.
  • To answer your question, we further increased the compression ratio, and the quantitative results on the Mixed dataset are shown in Fig. R4 of the uploaded PDF file. Obviously, when increasing the compression ratio, our NeCGS is still better than the baseline methods.

Comment 8. Figure 6 is before 5 in the paper.

Response: Thanks. We will carefully check the layout.

Comment

Dear Reviewer jc1U

Thanks for your time and effort in reviewing our manuscript and the favorable recommendation. In our previous response, we addressed your remaining concerns directly and comprehensively. We very much look forward to your further feedback on our responses.

Best regards,

The authors

Comment

Thanks for addressing many of my concerns in the rebuttal. I wanted to follow up on comment 3 and the authors' response to it, because that seems to be the biggest weakness of the paper at the moment. I really appreciate the work that the authors have put in, in such a short period of time, to perform all the additional experiments. However, I have some additional comments that are important to address.

As stated in the paper, the main objective is to propose a mechanism to compress large datasets and not to have this mechanism generalize to other datasets. Because the objective is to compress existing datasets, the question of compression time is important. From the results it seems that to achieve a reasonable accuracy, a resolution of 128 or 256 is required, and the optimization requires about 24 hrs for a dataset containing just 600 geometries (roughly 400 MB). Now, if we consider datasets containing 60,000 geometries, where compression is truly required, this method becomes computationally prohibitive and has no utility. The ability to add new samples to the existing dataset without spending much time on optimization would have been an important feature to prove the value of this method. However, from the results provided in Fig. R1, it seems that the reconstruction accuracy of the new geometries is not as good as the reconstruction accuracy of the training geometries, even when the new geometries seem to be somewhat close to the training distribution. Why is that the case? This is a problem because it alludes to the fact that either the decoder is overfit to the training geometries or the decoder does not have the capacity to represent these geometries. Can you provide the error curves of this optimization to verify that these losses are actually decreasing and it is able to find the correct embedded vectors representing this geometry? Would it be possible to add more diverse geometries to this experiment and report the reconstruction accuracy in each case? How does GPCC perform for the same geometries? Also, the authors state that the cost of optimization is significantly less; can the authors quantify that? In my experience the number of iterations required for this optimization to converge can be significantly larger. I think these details are important to improve the value proposition of the method.

Comment

It is great to receive your further feedback, showing that our initial responses have addressed many of your concerns. In the following, we will address your remaining questions.

  1. From the results it seems that to achieve a reasonable accuracy, a resolution of 128 or 256 is required and the optimization requires about 24 hrs for a dataset containing just 600 geometries (roughly 400Mb).
  • As demonstrated in Table 3 of our manuscript, a resolution of 128 is sufficiently precise for the reconstructed meshes, requiring approximately 16 hours to finalize the optimization process.

  • In the response, we have gathered the compression times of different methods. It is worth noting that VPCC necessitates around 40 hours to finalize compression, which is much more time-consuming than our method. Additionally, the training process for method PCGCv2 is also time-consuming (PCGCv2 requires numerous hours for training. In the Table of the initial response, the time only counts the inference time without considering the training time.). Besides, we have explored multiple techniques to expedite the optimization procedure.

  2. The ability to add new samples to the existing dataset without spending much ... training distribution. Can you provide the error curves of this optimization to verify that these losses are actually decreasing and it is being able to find the correct embedded vectors representing this geometry? Would it be possible to add more diverse geometries to this experiment and report the reconstruction accuracy in each case? How does GPCC perform for the same geometries?
  • The table below displays the reconstruction accuracy for unseen meshes during optimization. Through iterative processes, the precision of the reconstructed model is steadily enhanced.
| Epoch | CD (1e-3) | NC | F1-0.005 | F1-0.01 |
|-----|-----|-----|-----|-----|
| 100 | 6.397 | 0.932 | 0.506 | 0.890 |
| 200 | 5.618 | 0.942 | 0.676 | 0.944 |
| 300 | 4.699 | 0.947 | 0.709 | 0.956 |
| 400 | 4.568 | 0.948 | 0.722 | 0.959 |
  • Thingi10K comprises a total of 10,000 distinct meshes. Consequently, the unseen meshes sourced from the Thingi10K dataset exhibit greater diversity, with their reconstructed outcome depicted in Figure R1 of the provided PDF.

  • We evaluate the accuracy of the generalized new meshes, as illustrated in the table below. It is evident that the reconstructed new meshes exhibit greater errors compared to the training meshes. Nonetheless, our method excels in reconstructing the overall shapes and decompressing unseen meshes more accurately than GPCC.

| Data | CD (1e-3) | NC | F1-0.005 | F1-0.01 | Opt. Time (min per mesh) |
|-----|-----|-----|-----|-----|-----|
| Seen | 3.436 | 0.952 | 0.842 | 0.991 | 1.60 |
| Unseen, Ours | 4.568 | 0.948 | 0.722 | 0.959 | 1.01 |
| Unseen, GPCC | 11.941 | 0.912 | 0.551 | 0.854 | 0.06 |
  3. Now, if we consider datasets containing 60000 geometries where compression is truly required this method becomes computationally prohibitive and has no utility.
  • The simplest and most straightforward approach involves grouping the samples of the dataset, compressing each group separately in parallel, thereby cutting down on compression time.

  • By solely optimizing the embedded features and keeping the decoder weights fixed for new meshes, the average optimization time per new mesh is significantly reduced, offering a fresh approach to compressing extensive geometric data. To achieve this: 1) initially, a small subset of the data is chosen for optimizing the embedded features and decoder weights; 2) subsequently, the decoder weights are frozen, and only the embedded features are optimized for the remaining meshes, enabling rough reconstruction after optimization; 3) the embedded features of all meshes are then refined, along with the decoder weights, over several epochs. This three-stage optimization strategy, as opposed to directly optimizing all embedded features and decoder weights, results in considerable time savings.
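A toy sketch of this three-stage schedule, with a stand-in decoder, random target volumes, and illustrative subset sizes and epoch counts rather than the actual configuration:

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 16**3))
features = nn.Parameter(torch.randn(900, 64) / 3.0)   # one embedded feature per shape
targets = torch.randn(900, 16**3)                      # stand-in (flattened) target volumes

def run_stage(idx, epochs, train_decoder):
    """Optimize the selected features, optionally together with the decoder weights."""
    params = [features] + (list(decoder.parameters()) if train_decoder else [])
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(features[idx]), targets[idx])
        loss.backward()
        opt.step()

run_stage(torch.arange(100), epochs=200, train_decoder=True)       # stage 1: small subset
run_stage(torch.arange(100, 900), epochs=50, train_decoder=False)  # stage 2: features only
run_stage(torch.arange(900), epochs=20, train_decoder=True)        # stage 3: joint refinement
```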

Comment

Dear Reviewer jc1U

Thank you for dedicating your time and effort to reviewing our submission. We hope our thorough responses have effectively addressed your additional comments. We would greatly appreciate hearing from you before the impending discussion deadline.

Best regards,

The authors

Review (Rating: 5)

This paper proposes a method to compress 3D geometry of diverse categories of objects. In the first step, the paper proposes a method to convert an irregular mesh into a regular representation, a 4D TSDF-Def volume, that implicitly describes the geometry. After this, an auto-decoder is trained that learns to reconstruct the 4D TSDF-Def volume from a compressed feature vector which is unique for each shape. Hence, with this design the model can summarize the similarity of local geometric structures within and across different 3D meshes, resulting in a compact representation. Results on the AMA, DT4D and Thingi10K datasets show that the model can achieve compression of 3D models to a reasonable extent.

Strengths

  1. Clarity: the paper is well written with each component of the method explained clearly which is easy to understand.
  2. Reproducibility: All the details to replicate the results are provided along with the code and architecture details in the supplementary material.

Weaknesses

  1. The intuition behind preferring the TSDF-Def 4D volume over a TSDF 3D volume is unclear, even though an ablation study shows better reconstruction for thin structures. The quantitative results in Table 2 only show marginal improvements. A brief intuitive explanation of the design choice would be helpful.
  2. There are a lot of methods that try to compress a neural field, e.g., Triplanes [1], HashGrid [2], Vector Quantization [3], TensoRF [4], Dictionary Fields [5]. It is not very clear why this method does not compare with these techniques, which can also be used for compression.
  3. Can this method generalize? Can I use the trained auto-decoder setting to compress a new 3D mesh on which the model is not trained? How about the other methods with which this method is compared?
  4. The paper does not do a relative comparison of the compression time with the baseline methods. Given the optimization time shown in Table 3, I have concerns about the practical usage of this method.

[1] Peng, Songyou, et al. "Convolutional occupancy networks." ECCV, 2020.
[2] Müller, T., Evans, A., Schied, C., & Keller, A. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM TOG, 2022.
[3] Takikawa, Towaki, et al. "Variable bitrate neural fields." ACM SIGGRAPH, 2022.
[4] Chen, Anpei, et al. "Tensorf: Tensorial radiance fields." ECCV, 2022.
[5] Chen, Anpei, et al. "Dictionary fields: Learning a neural basis decomposition." ACM TOG, 2023.

Questions

Please refer to weakness above.

Line 123, "mplicitly" should be "implicitly".

Limitations

Limitations are adequately discussed.

Author Response

Comment 1. The intuition behind preferring TSDF-Def 4D volume over TSDF 3D volume is unclear, ... The quantitative results in Table 2 only show marginal improvements. An brief intuitive explanation of the design choice is helpful.

Response:

  • A geometry dataset usually contains 3D models with both simple and complex structures. TSDF can represent 3D models with simple structures but fails to accurately represent those with complex structures when the resolution is relatively limited. In contrast, our proposed TSDF-Def 4D volume assigns an additional offset to each corner of the cubes, so that it can represent 3D models with both simple and complex structures well (a minimal layout sketch follows this list).
  • As Fig. 7 shows, compared to the TSDF volume, our TSDF-Def volume can handle thin structures of the shapes and demonstrates much better performance. In Table 2, the quantitative advantage of our TSDF-Def over TSDF is marginal because the results are averaged over the entire dataset.
  • To address your concern, we divide the dataset into two parts when evaluating compression performance: (1) shapes with thin and fine structures (20 shapes); and (2) shapes without detailed structures.
  • The quantitative results are shown in the following table, where it can be seen that our TSDF-Def volumes significantly reduce the distortion for models with thin structures. The advantage of our TSDF-Def 4D volume over the TSDF 3D volume would clearly be even more significant on datasets with more complex and fine structures. Fig. R3 shows the selected models.

    | Representation | Data | CD (1e-3) | NC | F1-0.005 | F1-0.01 |
    |----|----|----|----|----|----|
    | TSDF | All shapes | 5.015 | 0.944 | 0.662 | 0.936 |
    | TSDF | Shapes w/ thin structures | 11.628 | 0.454 | 0.728 | 0.861 |
    | TSDF | Shapes w/o thin structures | 4.786 | 0.670 | 0.943 | 0.947 |
    | TSDF-Def | All shapes | 4.913 | 0.947 | 0.674 | 0.943 |
    | TSDF-Def | Shapes w/ thin structures | 8.751 | 0.506 | 0.800 | 0.874 |
    | TSDF-Def | Shapes w/o thin structures | 4.783 | 0.680 | 0.948 | 0.950 |
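A minimal sketch of the TSDF-Def layout described above, assuming a tensor of shape (res, res, res, 4); bounding the offset to half a cell via tanh is an illustrative assumption, not necessarily the paper's exact parameterization:

```python
import torch

res = 64
V = torch.zeros(res, res, res, 4)       # channel 0: TSDF; channels 1-3: per-corner offset

# Regular cube-corner coordinates in [0, 1]^3.
axis = torch.linspace(0.0, 1.0, res)
grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)

# Deformed corners: each corner may shift (here by up to half a cell) so that the cubes
# can align with thin structures before the deformable marching cubes extraction.
cell = 1.0 / (res - 1)
deformed_corners = grid + 0.5 * cell * torch.tanh(V[..., 1:4])

# A DMC-style extractor would then triangulate using V[..., 0] as the TSDF values and
# `deformed_corners` as the cube-corner positions (the extractor itself is not shown).
```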

Comment 2. There are lot of methods which try to compress a neural field. For e.g. Triplanes[1], HashGrid [2], Vector Quantization [3], TensoRF [4], Dictionary Fields [5]. It is not very clear why this method does not compare with all these techniques which can be used for compression?

Response:
The methods in [1-5] focus on the compact representation of feature volumes, where the continuous feature of any point in space can be obtained through bilinear or trilinear interpolation. Although these methods could be used to compress geometry models, the required storage space can be as high as several MBs, as shown in Fig. 7 of [2], Fig. 5 of [5], and so on, which is much larger than that of our method (averaging only a few KBs per model). When the feature dimension is reduced so that the features occupy only a few KBs, these methods fail to recover shapes from their implicit fields. We believe that these algorithms could be combined with compression techniques and additional effort in the future to achieve more efficient compression. For these reasons, our current baseline methods do not include them.
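For reference, a minimal sketch of the trilinear feature-volume query that such representations rely on, using `torch.nn.functional.grid_sample`; the grid and feature sizes are illustrative:

```python
import torch
import torch.nn.functional as F

feat_volume = torch.randn(1, 32, 16, 16, 16)     # (N, C, D, H, W) feature grid
points = torch.rand(1, 1000, 3) * 2.0 - 1.0      # query points in [-1, 1]^3

# grid_sample expects coordinates shaped (N, D_out, H_out, W_out, 3);
# with a 5-D input, mode="bilinear" performs trilinear interpolation.
coords = points.view(1, 1000, 1, 1, 3)
feats = F.grid_sample(feat_volume, coords, mode="bilinear", align_corners=True)
feats = feats.view(1, 32, 1000).permute(0, 2, 1)  # (N, num_points, C) continuous features
```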

Comment 3. Can this method generalize? Can I use the trained auto-decoder setting to compress a new 3D mesh on which the model is not trained on? How about other methods with which the method compares.

Response:

  • Thank you for your insightful comment! Yes, our NeCGS can be generalized to new 3D models. Specifically, given a new 3D model, we first represent it as a TSDF-Def 4D volume through Algorithm 1. Then, following the optimization process in Sec. 3.2, we only optimize its corresponding embedded feature while keeping the trained decoder unchanged.
  • The visual results are shown in Fig. R1 of the uploaded PDF, showing the ability to generalize. It is not surprising that our NeCGS can be generalized to new 3D models: after fitting 3D models with various structures, the decoder learns prior knowledge of diverse geometry data, and it can represent unseen models by optimizing only their embedded features. The methods under comparison can also be adapted to new 3D models.
  • GPCC, VPCC, and Draco are traditional compression methods that do not require training, enabling them to compress new meshes. PCGCv2 can utilize the trained encoder to compress new models directly. QuantDeepSDF operates as an auto-decoder framework, allowing it to generalize to new models similarly to our NeCGS.

Comment 4. The paper does not do a relative comparison of the compression time with the baseline methods. Given ..., I have concerns about the practical usage of this method.

Response:

  • As shown in the table below, compared to compression methods for single shapes, i.e., GPCC, PCGCv2, and Draco, our method requires more time. Compared with VPCC and QuantDeepSDF, which can compress the entire dataset at once, the compression process of our NeCGS is faster. We also want to note that, according to the quantitative comparison in Fig. 4 and the qualitative comparison in Fig. 5 of the manuscript, the compression performance of our method is significantly better than that of the baseline methods.

    | Method | Compression Time (h) |
    |-----|-----|
    | GPCC | 0.625 |
    | VPCC | 39.34 |
    | PCGCv2 | 1.76 |
    | Draco | 0.03 |
    | QuantDeepSDF | 18.91 |
    | Ours | 16.32 |

  • Again, we want to emphasize that our NeCGS is designed for the offline compression of 3D geometry datasets to save storage space, where the optimization/compression time should not be a key factor. Instead, we should concentrate more on the decompression/decoding speed, because users expect to obtain the 3D models promptly when querying the dataset. As shown in Table 3, when the resolution is 128, the inference time of our NeCGS is only 98.95 ms, satisfying the real-time requirement.

Comment 5. Line 123, "mplicitly" should be "implicitly".

Response: Thanks. We will correct this typo.

Comment

Dear Reviewer qNSe

Thanks for your time and effort in reviewing our manuscript. In our previous response, we addressed your concerns directly and comprehensively. We very much look forward to your further feedback on our responses. Let us discuss.

Best regards,

The authors

Comment

I thank the authors for the rebuttal. After carefully reading the rebuttal by the authors and the comments by the reviewers, I can confirm that my concerns about generalization and compression time are resolved, i.e., although the model has a long training time, it can compress the entire dataset at once. Further, as mentioned and shown in the PDF, the model can also generalize significantly faster to new shapes. This is beneficial. However, I am not fully convinced by the authors' response on compressed neural fields (comment 2). The authors pointed out visual results in Fig 5 of [5]. However, I don't feel it is an apples-to-apples comparison, as the geometry complexity of the model in Fig 5 of [5] is much higher than the complexity of the models shown in this paper. Hence, I strongly suggest the authors show a comparison with at least one method (the best one) for further insights. In addition to this, I also suggest the authors benchmark the model's generalization vs. compression time trade-off on a larger and more complex pool of shapes to get a better picture. Having said this, most of my major concerns have been resolved and hence I am willing to increase my score to borderline accept!

Comment

It is wonderful to get your further feedback mentioning that our initial responses have effectively tackled your concerns. The authors appreciate your favorable recommendation with the highest confidence. Moving forward, we will tackle the remaining questions you have.

Comment 1. However, I don't feel it is an apple to apple comparison as the geometry complexity of the model in Fig 5 of [5] is much higher than the complexity of models shown in this paper.

Response: As noted in our earlier response, we have preliminarily conducted experiments with [2] and [5] using their released code on the Thingi10K dataset; when the feature dimension is reduced so that the model/feature size is only a few KBs, we cannot extract meshes from them. We will also explore them comprehensively and fairly in the final version.

Comment 2. In addition to this, I also suggest the authors to benchmark the models generalization vs compression time trade off on a larger and complex pool of shapes to get a better picture.

Response: Thanks for your valuable comments! We will test the compression performance and generalization of our algorithm on larger and more complex shapes in the final version. Additionally, we will build a benchmark from these data to support the advancement of this field.

Finally, we appreciate the valuable comments and timely feedback from the reviewers.

Review (Rating: 2)

This paper looks at the problem of compressing 3D shapes (especially geometry). It proposes a two-stage approach: the first stage is regular geometry representation; the second stage is compact neural compression. Results show some improvements.

Strengths

  1. Compressing 3D shapes is important to many applications.

Weaknesses

  1. This paper overclaims what it does. In L1-3, it says that they made the first attempt to tackle the problem of compressing 3D geometry sets containing diverse categories. This isn't true; there are at least two papers doing geometry compression of 3D geometry [a], [b].

[a] On the Effectiveness of Weight-Encoded Neural Implicit 3D Shapes. https://arxiv.org/abs/2009.09808
[b] Neural Progressive Meshes. https://arxiv.org/abs/2308.05741

  2. [a] and [b] are very important references, but they are neither cited nor discussed. It's not necessary to compare the proposed method with [a] and [b], but at least the authors should acknowledge the existence of these two papers.

  3. Optimization time is too long.

  4. It is unclear whether the proposed method is reproducible.

  5. Typo L43: Matching cubes -> Marching cubes.

Questions

see comments above

Limitations

yes

Author Response

Comment 1. this paper over claims what it does. in L1-3, it says that they made the first attempt to tackle the problem of compressing 3D geometry sets containing diverse categories. this isn't true. there are at least two papers doing geometry compression of 3D geometry [a], [b].

Response: We strongly disagree with you, due to the following facts.

  • After a comprehensive survey, we confirm that all previous methods for 3D geometry compression, including the mentioned [a] and [b] for processing single 3D shapes, have primarily focused on either individual or sequential/dynamic 3D models. Different from previous methods, our NeCGS is designed for the offline compression of 3D geometry sets comprising various unrelated 3D shapes.
  • Technically, our NeCGS is significantly different from [a] and [b]. Specifically, [a] employs an individual MLP to regress the implicit field of a single model, using the MLP weights to represent the model. Meanwhile, [b] is designed for mesh simplification and restoration through triangle merging and division, with the simplified meshes serving as representations of the original meshes. In contrast, to handle each model of a dataset that usually contains 3D models with various structures, we propose innovative TSDF-Def volumes to represent 3D models as structured 4D tensors. TSDF-Def can not only represent 3D models with relatively simple structures, as TSDF does, but also preserve the intricate details of 3D models with complex and fine structures. After optimizing each shape into a TSDF-Def volume, we design an auto-decoder structure to regress these tensors, where the embedded features and decoder weights are quantized during the optimization and encoded into bitstreams to represent the whole geometry set.

Comment 2. [a] and [b] are very important references but they are not cited nor discussed. it's not necessary to compare the proposed method with [a] and [b], but at least the authors should acknowledge the existence of these two papers

Response:

  • The authors confirm that [a] and [b] are not the most relevant references. But we are fine with citing these two papers in the final version.

  • In addition to the response to Comment 1 that has clearly elaborated the differences between ours and [a][b], we further clarify that both [a] and [b] emphasize geometric representation over compression techniques. Besides, there are so many methods for compressing single 3D models, and we have cited well-known methods; however, due to page limitations, it is impossible to cite all of them.

Comment 3. optimization time is too long

Response:

  • We must emphasize that our NeCGS is focused on the offline compression of geometry datasets to save storage space, where the optimization/compression/encoding time should not be a key factor. Instead, we should concentrate more on the decompression/decoding speed, because users expect to obtain the 3D models promptly when querying the dataset. As shown in Table 3, the decompression process of our NeCGS is extremely fast, allowing for real-time invocation.

  • Moreover, there are various potential solutions to accelerate the optimization. At the software level, more efficient convolution (e.g., using multiple 1-D or 2-D convolutions to approximate the 3-D convolution, or using 3D sparse convolution to replace the traditional 3D convolution) can be used to speed up the process. At the hardware level, the optimization can be run on multiple GPUs or other more efficient hardware. Take NeRF as an example: the initial algorithms required several days for optimization, and subsequently improved methods have reduced the optimization time to a few minutes or even seconds.

Comment 4. it is unclear whether the proposed method is reproducible

Response:

  • We have included the source code in the submitted supplementary material, as highlighted in the abstract.
  • We have provided sufficient implementation details in Sec. 4.1. Based on the information provided, we believe any qualified researcher can replicate the results in our paper.
  • Reviewer qNSe acknowledged the reproducibility aspect by stating, 'Reproducibility: All the details to replicate the results are provided along with the code and architecture details in the supplementary material'. Reviewer 18DA also recognized the reproducibility by stating, 'The inclusion of source code in the supplemental material enhances the reproducibility and transparency of the research'.

Comment 5. typo L43: Matching cubes -> Marching cubes

Response: Thanks for this valuable comment. We will correct the typo.

Comment

Dear Reviewer tAuE

Thanks for your time and effort in reviewing our manuscript. In our previous response, we addressed your concerns directly and comprehensively. We very much look forward to your further feedback on our responses. Let us discuss.

Best regards,

The authors

Comment

RE: Response to comment 1.

I do not think the authors fully understand the references I pointed out. Both [a] and [b] can do offline compression and were tested on Thingi10K which consists of various unrelated 3D shapes.

We further clarify that both [a] and [b] emphasize geometric representation over compression techniques

I do not see how representations and compression techniques are disentangled in this case. Choosing the right representation is part of the compression technique.

RE: Response to comment 3.

It doesn't really matter whether your method is online or offline. If one wants to apply your method to some new shapes and they don't want to lose accuracy, they'll have to go through the optimization process and this process takes a lot of time. It doesn't really matter if your method can uncompress fast.

Comment

Dear Reviewer tAuE

It is great to receive your further feedback so that we can discuss your misunderstandings comprehensively.

1) We believe we understood references [a] and [b] exactly, but you have seriously misunderstood them. We suggest you read the two papers carefully again. In your first post, you mentioned 'If you look at Figure 4 of [a], you'll see that their method is not one MLP for one shape.' Note it was posted first at 3:36 pm, but deleted at 3:42 pm. Thus, this sentence is not on the Openreview but appears in the email automatically sent to authors. It is true that [a] was applied to compress a set of 3D models (i.e., 10,000 3D models) from Thingi10K. [a] compresses the 3D models one by one, i.e., an MLP is regressed for a 3D model independently, so there are 10,000 MLPs regressed for the 10,000 3D models. Moreover, we refer you to the GitHub link of [a] (https://github.com/u2ni/ICML2021/tree/main/thingi10k-weightEncoded), where 10,000 files are presented to store the optimized parameters for each shape. [b] utilizes an encoder-decoder framework to achieve mesh simplification and restoration, as shown in Fig. 2 of [b], and it simplifies the meshes one by one, like the baseline method PCGCv2. By contrast, our method regresses a single auto-decoder (i.e., a 3D CNN) for all 3D models involved in a set. Such a shared network by all 3D models can explore the redundancy/local similarity among different 3D models to some extent.

2) In a standard compression framework, the input is the raw data and the output is its bitstream, where technologies such as quantization and entropy coding are used to reduce the storage. [a] utilizes a TensorFlow framework, where the network parameters are stored in h5 files (see L214 at https://github.com/u2ni/ICML2021/blob/main/neuralImplicitTools/src/model.py). An h5 file is different from the bitstream used in the compression field, since it also includes the names of variables and other attributes. [b] compresses/represents the raw data as simplified triangle meshes, which is significantly different from a bitstream. Thus, we clarify that both [a] and [b] emphasize geometric representation over compression techniques.

3) More importantly, in your post, you mentioned '...Choosing the right representation is part of the compression technique', but you have completely ignored/overlooked one of our contributions, i.e., the proposed TSDF-Def representation converting any irregular 3D meshes into regular 4D volumes with fine structures well preserved.

4) ' ...If one wants to apply your method to some new shapes and they don't want to lose accuracy, they'll have to go through the optimization process and this process takes a lot of time.' We also disagree with this comment. We have conducted additional experiments to demonstrate the generalization of the trained decoder to new meshes. Specifically, given a new mesh, we only optimize the embedded feature for the new mesh and keep the weights of the trained decoder fixed. Such an optimization process is very time-saving, and the decompressed models are accurate enough, as shown in Fig. R1 of the uploaded one-page PDF file. As discussed in our rebuttal, it is not surprising that our NeCGS can be generalized to new 3D models: after fitting 3D models with various structures, the decoder learns prior knowledge of diverse geometry data, and it can represent unseen models by optimizing only their embedded features. The methods under comparison can also be adapted to new 3D models.

5) 'It doesn't really matter if your method can uncompress fast.' We argue that the decompression speed DOES matter. Imagine that, given a set of 3D models stored as compressed bitstreams to save space, you want to get the 3D models for downstream analysis or applications; a slow decompression process will seriously limit the efficiency of the downstream tasks.

Comment

[b] utilizes an encoder-decoder framework to achieve mesh simplification and restoration, as shown in Fig. 2 of [b], and it simplifies the meshes one by one, like the baseline method PCGCv2. By contrast, our method regresses a single auto-decoder (i.e., a 3D CNN) for all 3D models involved in a set. Such a shared network by all 3D models can explore the redundancy/local similarity among different 3D models to some extent.

If you could look at [b], they have one shared network for all 3D models.

I really really do not think this paper is ready for publication. Every reviewer raised different questions and it requires the authors to post such long responses to clarify things. This means that this paper needs a few rounds of major revision. Not to mention, in this case, we already have a few back-and-forth discussions but there is still a lot of confusion. I strongly suggest the authors revise the way of presentation.

Comment

Thanks for your quick action.

Comment 1. If you could look at [b], they have one shared network for all 3D models.

Response: You still misunderstand [b]. Could you please take the time to read it carefully? As replied in our previous posts, [b] simplifies the meshes one by one, like the compared baseline method PCGCv2. More specifically, [b] trains a shared encoder to simplify the raw meshes one by one, like PCGCv2. [b] and our NeCGS have totally different working mechanisms: our NeCGS utilizes an auto-decoder framework, where the embedded features and decoder weights are optimized to represent the whole dataset. Besides, our method concentrates on implicit representation, while [b] concentrates on explicit representation, i.e., triangle merging and division. We believe your confusion is completely caused by your misunderstanding of these methods and a lack of knowledge of this topic.

Comment 2. Every reviewer raised different questions and it requires the authors to post such long responses to clarify things.

Response: We wonder whether you have read and fully understood their comments and our responses. We answered all questions directly and comprehensively and provided additional necessary experiments.

Comment 3. Not to mention, in this case, we already have a few back-and-forth discussions but there is still a lot of confusion. I strongly suggest the authors revise the way of presentation.

Response: We believe all the confusion is due to your lack of relevant foundational knowledge and failure to carefully read the related papers. We believe our explanations have been sufficiently clear, and we strongly recommend that you carefully read the paper, supplement the necessary foundational knowledge, and become a more professional reviewer, so that the entire community can progress more effectively.

Author Response

We thank the reviewers for the time and effort in reviewing our work, as well as your recognition of the novelty of our work.

We are grateful to the reviewers for acknowledging our NeCGS algorithm: 1. Reviewers qNSe, jc1U, and 18DA have all noted the significant compression performance of our NeCGS on various datasets, showcasing its effectiveness; 2. Reviewer 18DA recognizes our method as a notable advancement in the field of geometry compression; 3. Reviewers qNSe and 18DA have acknowledged the reproducibility of our NeCGS; 4. Reviewers qNSe and 18DA have praised the clarity and accessibility of our writing.

In this work, we present a new compression framework, namely NeCGS. Different from previous compression methods, which compress either individual 3D models or sequential/dynamic 3D sequences, our NeCGS is the first method focused on the offline compression of 3D geometry datasets that usually contain diverse and unrelated 3D models with various structures. Technically, to handle each model of a dataset well, we propose innovative TSDF-Def volumes to represent 3D models as structured 4D tensors. TSDF-Def excels not only at depicting simply structured 3D models, as TSDF does, but also at accurately preserving the intricate details of complex and finely structured 3D models. Extensive experiments demonstrate the significant superiority of our NeCGS over state-of-the-art methods.

In the following, we will respond to the main comments that the reviewers are concerned about.

Compression/Optimization/Encoding Time

  • We have to emphasize that our NeCGS is focused on the offline compression of geometry datasets to save storage space, where the compression (or encoding) time should not be considered a key factor. Instead, we should concentrate more on the decompression speed, because users expect to obtain the 3D models promptly when querying the dataset. As shown in Table 3, the decompression process of our NeCGS is extremely fast, allowing for real-time invocation.

  • Moreover, there are various potential solutions to accelerate the optimization. At the software level, more efficient convolution (e.g., using multiple 1-D or 2-D convolutions to approximate the 3-D convolution, or using 3D sparse convolution to replace the traditional 3D convolution) can be used to speed up the process. At the hardware level, the optimization can be run on multiple GPUs or other more efficient hardware. Take NeRF as an example: the initial algorithms required several days for optimization, and subsequently improved methods have reduced the optimization time to a few minutes or even seconds.

  • As shown in the table below, compared to compression methods for single shapes, i.e., GPCC, PCGCv2, and Draco, our method requires more time. Compared with VPCC and QuantDeepSDF, which can compress the entire dataset at once, the compression process of our NeCGS is faster.

  • Finally, we also want to note that, according to the quantitative comparison in Fig. 4 and the qualitative comparison in Fig. 5 of the manuscript, the compression performance of our method is significantly better than that of the baseline methods. Such advantages were also acknowledged by all reviewers.

| Method | Compression Time (h) |
|-----|-----|
| GPCC | 0.625 |
| VPCC | 39.34 |
| PCGCv2 | 1.76 |
| Draco | 0.03 |
| QuantDeepSDF | 18.91 |
| Ours | 16.32 |

Generalize Unseen Models

  • We confirm that our NeCGS can be generalized to new 3D models. Specifically, given a new 3D model, we first represent it as a TSDF-Def 4D volume through Algorithm 1 of our manuscript. Then, following the optimization process in Sec. 3.2, we only optimize its corresponding embedded feature while keeping the trained decoder unchanged.

  • In Fig. R1 of the uploaded PDF file, we also experimentally demonstrate this generalization ability. It is not surprising that our NeCGS can be generalized to new 3D models because after fitting 3D models with various structures, the decoder can learn the prior knowledge of various geometry data, and it can represent unseen models by only optimizing its embedded features. The methods under comparison can also be adapted to new 3D models.

Last but not least, we will make the reviews and author discussion public regardless of the final decision. Besides, we will include the newly added experiments and analysis in the final manuscript/supplementary material.

Thanks again for your time and effort on our submission. We appreciate any further questions and discussion.

Comment

Dear Reviewers qNSe and 18DA,

Thanks for your time and effort in reviewing our manuscript. In our previous response, we have addressed your comments directly and comprehensively. We are nearing the deadline for the discussion phase between the reviewers and authors. The authors understand that you may have a very busy schedule and would appreciate any further feedback you could provide.

Best regards, The authors

Final Decision

This paper proposes a neural compression paradigm based on a TSDF-Def volume and an encoder-decoder design for compressing sets of 3D geometries with diverse categories. The proposed method outperforms state-of-the-art methods both quantitatively and qualitatively. The paper received mixed scores: three borderline accepts and one strong reject. Initially, the reviewers had common concerns about the compression/optimization/encoding time of the proposed method compared to other baselines and about the generalization ability of the method. The authors addressed most of the issues well in the rebuttal and discussion stage; however, concerns remain about fair comparisons with existing methods and the ability to add new samples to existing datasets without spending too much time on optimization. Therefore, although the AC considers this to be a potentially great paper, many significant revisions to the current manuscript are required to meet NeurIPS publication standards. The AC regrets not accepting this paper but does commend its quality and encourages the authors to revise the paper and resubmit it.