PaperHub
Rating: 5.5/10
Decision: Rejected (4 reviewers)
Lowest rating 3, highest 8, standard deviation 1.8
Individual ratings: 3, 8, 5, 6
Confidence: 3.8
ICLR 2024

CSI: Enhancing the Robustness of 3D Point Cloud Recognition against Corruption

Submitted: 2023-09-19 · Updated: 2024-02-11

Keywords
Corruption Robustness, Point Cloud Classification

Reviews and Discussion

Review
Rating: 3

This paper introduces CSI, which combines DAS and SEM to identify the essential subset of the point cloud for representation learning. This improves the model's robustness against data corruption.

Strengths

  • The focused topic is crucial for real-world applications of point clouds.
  • The introduced SEM demonstrates its effectiveness.

Weaknesses

  • The novelty of the proposed density-aware sampling may be limited as similar ideas have already been explored by previous methods [1,2], but the authors do not provide any comparisons with them.
  • The writing is hard to read and follow. For example,
    • Too many sentences that are too long.

    Similarly, in medical imaging, where point cloud data aids in 3D reconstructions from MRI or CT scans, the presence of artifacts, noise, and incomplete data — arising from limited resolution, patient movement, or implants — poses substantial challenges.

    By studying the robustness among various 3D architectures including PointNet (Charles et al., 2017), PointNet++ (Qi et al., 2017), DGCNN (Wang et al., 2019), etc., they revealed that Transformers, specifically PCT (Guo et al., 2021), can significantly enhance the robustness of point cloud recognition.

    • Inconsistent usage of section references. For instance, Section 3.2 (SELF-ENTROPY MINIMIZATION) uses different reference styles such as §3.1, §1, and Section 2; see also d_a and d_e in Equation 4.
  • The performance is not satisfactory. It overclaims to "significantly outperform state-of-the-art methods by 5.2% and 4.2% on the respective benchmarks." This also weakens the motivation, as the authors believe that data augmentation is inadequate in countering data corruption, leading to the proposal of CSI. However, it turns out that CSI performs worse than certain data augmentation techniques.
    • In Table 1, PCT+CSI achieves an ER of 18.4 in ModelNet40-C, which is clearly inferior to PCT+PointCutMix-R (16.3) and PCT+PointCutMix-K (16.5). These two configurations are from the method [3] that originally introduces the ModelNet40-C dataset. I observe that the authors perform the experiment with PCT+PointCutMix-R+CSI in Table 2, but the comparison is unfair because it involves comparing A+B+C against A+B, and the overall improvement is minimal (0.4%).
    • Similar observations can be found in Table 2 of the Appendix, where PCT+PointCutMix-R+CSI fails to outperform PCT+WOLFMix.

References:

[1]: Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution. AAAI 2020.

[2]: Density-adaptive Sampling for Heterogeneous Point Cloud Object Segmentation in Autonomous Vehicle Applications. CVPRW 2019.

[3]: Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions.

Questions

  1. Did the authors try using PCT+DAS in Table 1? It would be more illustrative to include this result.
  2. I am still confused about how SEM works. The authors mention that "Applying SEM to the row-wise embeddings in S amplifies the importance of the most crucial point-level feature corresponding to feature row i." Can the authors provide visualizations of attention maps to show which areas of the point cloud are salient?
  3. Why did the authors suddenly switch to using mOA as a metric in Table 2 of the Appendix instead of ER?
  4. Have the authors tried using the mCE metric from PointCloud-C [4] to directly compare the results of this paper with those in [4]?

References:

[4]: Benchmarking and Analyzing Point Cloud Classification under Corruptions.

Comment

We are glad that the reviewer found our topic crucial for real-world applications of point clouds and our method effective. We appreciate the opportunity to address the points you have raised.

The novelty of the proposed density-aware sampling may be limited as similar ideas have already been explored by previous methods [1,2], but the authors do not provide any comparisons with them.

We appreciate the reviewer's concerns regarding the novelty of our proposed Density-Aware Adaptive Sampling (DAS). Directly comparing it with the methods in references [1] and [2] is challenging. The method in [1] fundamentally differs from ours in output format and in how it integrates with subsequent processing layers. Specifically, the sampling in [1] outputs a format of (S, S, H) and is tightly coupled with spherical voxel convolution layers, making it incompatible as a direct replacement for FPS or DAS in PCT architectures. A direct comparison is therefore not feasible given these structural and functional differences.

Regarding the method in [2], we faced the challenge of no open-sourced code being available. Despite this, we endeavored to reproduce their method to the best of our ability. Due to time constraints, we implemented a version of Grid Density Sampling (GDS) with a grid size of 8, as larger grid sizes would have necessitated significantly longer training times. The results of this implementation are presented:

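| ModelNet40-C | mER | Occlusion | LiDAR | Density Inc. | Density Dec. | Cutout | Uniform Gaussian | Impulse Upsampling | Background | Rotation | Shear | FFD | RBF | Inv. RBF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PCT+CSI (GDS) | 21.96 | 54.54 | 59.85 | 11.46 | 14.4 | 13.1 | 11.65 | 11.3 | 22.43 | 13.48 | 45.28 | 19.04 | 14.6 | 13.32 |
| PCT+CSI (DAS) | 18.38 | 56.23 | 60.2 | 12.37 | 12.73 | 11.97 | 10.83 | 10.73 | 12.3 | 11.63 | 11.19 | 17.41 | 11.98 | 12.05 |

| PointCloud-C | mCE | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|
| PCT+CSI (GDS) | 0.996 | 1.17 | 0.528 | 0.815 | 1.043 | 1.166 | 1.244 | 1.009 |
| PCT+CSI | 0.757 | 1.128 | 0.472 | 0.56 | 0.889 | 0.329 | 0.924 | 0.995 |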

While it is true that the underlying motivation of our DAS may share similarities with these previous works, the implementation and the operational context significantly differ. A key distinction of our DAS is that it is not a grid-based approach, unlike GDS. We believe that grid-based operations can potentially disrupt the inherent structure of point clouds. Our ablation studies on ModelNet40-C and PointCloud-C demonstrate that our non-grid-based approach is more effective, supporting the unique contribution and novelty of our DAS methodology in preserving the integrity of point cloud structures while enhancing sampling efficiency.

The writing is hard to read and follow. For example, too many sentences that are too long, inconsistent usage of section references.

We greatly appreciate your feedback regarding the readability of our manuscript. We recognize that clear and accessible writing is crucial for effective communication of our research. To address your concerns, we will undertake a comprehensive review of the manuscript to simplify and shorten complex sentences, ensuring they are more reader-friendly. Additionally, we will systematically revise the usage of section references to maintain consistency throughout the text. Our goal for the final version is to enhance overall clarity and ease of reading, making our research more accessible to a broader audience. We are committed to improving these aspects to ensure our work is communicated as clearly and effectively as possible.

Comment

The performance is not satisfactory. It overclaims to "significantly outperform state-of-the-art methods by 5.2% and 4.2% on the respective benchmarks." This also weakens the motivation, as the authors believe that data augmentation is inadequate in countering data corruption, leading to the proposal of CSI. However, it turns out that CSI performs worse than certain data augmentation techniques.

  • In Table 1, PCT+CSI achieves an ER of 18.4 in ModelNet40-C, which is clearly inferior to PCT+PointCutMix-R (16.3) and PCT+PointCutMix-K (16.5). These two configurations are from the method [3] that originally introduces the ModelNet40-C dataset. I observe that the authors perform the experiment with PCT+PointCutMix-R+CSI in Table 2, but the comparison is unfair because it involves comparing A+B+C against A+B, and the overall improvement is minimal (0.4%).
  • Similar observations can be found in Table 2 of the Appendix, where PCT+PointCutMix-R+CSI fails to outperform PCT+WOLFMix.

The major limitation of data augmentation is that different augmentation techniques have varying degrees of effectiveness against distinct types of corruption. This is because augmentation often relies on heuristic approaches, which may not align with the underlying data distribution. Motivated by the observation that robustness can also be improved from the model architecture perspective, our proposed CSI aims to exploit the distribution shift in the unseen data itself rather than rely on heuristics. However, when CSI is combined with data augmentation, the margin of improvement is relatively modest. This can be attributed to the inherent complexity of data augmentation methods for 3D point clouds, which often blend point clouds from different classes. This blending creates ambiguity that makes it challenging to identify a critical subset in the augmented point cloud. We appreciate you pointing out the limitation of CSI when combined with augmentation. A solution to this remains unexplored and is a potentially promising avenue for our future work.

Did the authors try using PCT+DAS in Table 1? It would be more illustrative to include this result.

Details of the ablation study (CE) on PointCloud-C are listed below:

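| Model | mCE | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|
| PCT | 0.925 | 0.872 | 0.87 | 0.528 | 1 | 0.78 | 1.385 | 1.042 |
| PCT+DAS | 0.784 | 1.032 | 0.566 | 0.556 | 0.879 | 0.319 | 1.073 | 1.065 |
| PCT+CSI | 0.757 | 1.128 | 0.472 | 0.56 | 0.889 | 0.329 | 0.924 | 0.995 |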

Results (ER) on ModelNet40-C are shown below:

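| Model | mER | Occlusion | LiDAR | Density Inc. | Density Dec. | Cutout | Uniform Gaussian | Impulse Upsampling | Background | Rotation | Shear | FFD | RBF | Inv. RBF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PCT | 25.5 | 56.6 | 76.7 | 11.8 | 14.3 | 14.5 | 12.1 | 13.9 | 39.1 | 17.4 | 57.9 | 18.1 | 11.5 | 12.4 |
| PCT+DAS | 19.59 | 57.82 | 66.5 | 12.78 | 13.24 | 12.07 | 11.49 | 11.69 | 12.98 | 13.4 | 11.56 | 18.78 | 12.47 | 12.91 |
| PCT+CSI | 18.38 | 56.23 | 60.2 | 12.37 | 12.73 | 11.97 | 10.83 | 10.73 | 12.3 | 11.63 | 11.19 | 17.41 | 11.98 | 12.05 |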

I am still confused about how SEM works. The authors mention that "Applying SEM to the row-wise embeddings in S amplifies the importance of the most crucial point-level feature corresponding to feature row i." Can the authors provide visualizations of attention maps to show which areas of the point cloud are salient?

Visualizations of the attention maps in the last self-attention module, before and after applying SEM, are provided in the appendix. After applying SEM, only the critical point-wise correlations are maintained, and the others are filtered out.
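
As an illustration of the filtering effect described above, here is a minimal sketch of row-wise entropy on an attention matrix. It is not the authors' implementation: it assumes that SEM can be approximated by penalizing the Shannon entropy of each softmax-normalized attention row, and the names `row_softmax` and `mean_row_entropy` are hypothetical. A diffuse (uniform) row has high entropy, while a row concentrated on a single point has entropy near zero, so minimizing this quantity pushes each row toward a few dominant correlations.

```python
import numpy as np

def row_softmax(scores):
    """Softmax over each row, with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_row_entropy(attn, eps=1e-12):
    """Mean Shannon entropy (in nats) over the rows of an attention matrix."""
    return float(-(attn * np.log(attn + eps)).sum(axis=1).mean())

n = 4
# Assumed toy inputs: identical scores give uniform attention rows,
# a strongly diagonal score matrix gives sharply peaked rows.
uniform_attn = row_softmax(np.zeros((n, n)))
peaked_attn = row_softmax(20.0 * np.eye(n))

loss_uniform = mean_row_entropy(uniform_attn)  # close to log(n)
loss_peaked = mean_row_entropy(peaked_attn)    # close to 0
print(loss_uniform, loss_peaked)
```

In a training setup, a term like `mean_row_entropy` would typically be added to the classification loss with a small weight; the weight and the layer it is applied to are choices the sketch does not cover.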

Comment

Why did the authors suddenly switch to using mOA as a metric in Table 2 of the Appendix instead of ER?

We used mOA to stay aligned with the metric in Table 9 of the PointCloud-C paper. We will change the metric from mOA to ER in our final version. The ER results are listed below:

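| Model (%) ↓ | Clean | ER | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|---|
| DGCNN | 7.4 | 23.6 | 9.4 | 31.6 | 24.8 | 20.7 | 29.5 | 27.5 | 21.5 |
| PointNet | 9.3 | 34.2 | 11.9 | 20.3 | 12.4 | 22.2 | 87.9 | 43.8 | 40.9 |
| PointNet++ | 7.0 | 24.9 | 8.2 | 37.2 | 15.9 | 37.3 | 18.1 | 27.3 | 30.2 |
| RSCNN | 7.7 | 26.1 | 10.1 | 37.0 | 20.0 | 31.4 | 21.0 | 31.7 | 31.8 |
| SimpleView | 6.1 | 24.3 | 8.2 | 22.6 | 30.8 | 28.1 | 29.0 | 23.2 | 28.3 |
| GDANet | 6.6 | 21.1 | 7.8 | 26.5 | 19.7 | 18.5 | 25.7 | 28.5 | 21.1 |
| CurveNet | 6.2 | 22.1 | 8.2 | 22.9 | 17.6 | 21.2 | 39.7 | 27.5 | 17.4 |
| PAConv | 6.4 | 27.0 | 8.5 | 46.3 | 24.8 | 20.8 | 32.0 | 35.7 | 20.8 |
| RPC | 7.0 | 20.5 | 7.9 | 28.2 | 12.2 | 16.5 | 27.4 | 27.8 | 23.2 |
| PCT | 7.0 | 21.9 | 8.2 | 27.5 | 13.1 | 20.7 | 23.0 | 38.1 | 22.4 |
| PCT+CSI | 7.3 | 16.3 | 10.5 | 15.2 | 13.9 | 18.4 | 9.5 | 25.3 | 21.3 |
| DGCNN+OcCo | 7.8 | 23.4 | 15.1 | 20.6 | 22.4 | 21.5 | 42.6 | 23.3 | 18.0 |
| Point-BERT | 7.8 | 30.7 | 8.8 | 39.8 | 17.1 | 23.8 | 57.0 | 39.6 | 28.5 |
| PN2+PointMixUp | 8.5 | 21.5 | 15.7 | 22.5 | 19.9 | 37.5 | 13.5 | 16.9 | 24.3 |
| DGCNN+PW | 7.4 | 19.1 | 8.7 | 27.3 | 24.5 | 18.1 | 23.8 | 21.0 | 10.3 |
| DGCNN+RSMix | 7.0 | 16.1 | 12.4 | 27.6 | 16.2 | 12.2 | 8.3 | 17.3 | 18.7 |
| DGCNN+WOLFMix | 6.8 | 12.9 | 9.3 | 22.6 | 17.3 | 11.9 | 8.4 | 11.4 | 9.7 |
| PointNet+WOLFMix | 11.6 | 25.7 | 19.9 | 15.0 | 14.3 | 22.4 | 65.7 | 19.3 | 23.2 |
| PCT+WOLFMix | 6.6 | 12.7 | 9.4 | 27.0 | 9.4 | 10.2 | 8.8 | 13.9 | 10.5 |
| GDANet+WOLFMix | 6.6 | 12.9 | 8.5 | 27.9 | 13.2 | 11.4 | 9.0 | 11.4 | 8.8 |
| RPC+WOLFMix | 6.7 | 13.5 | 9.5 | 30.6 | 10.5 | 10.6 | 9.8 | 13.2 | 10.3 |
| PCT+CSI+PointCutMix-R | 7.2 | 12.9 | 11.3 | 11.5 | 11.3 | 15.9 | 8.2 | 12.2 | 16.3 |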

Have the authors tried using the mCE metric from PointCloud-C [4] to directly compare the results of this paper with those in [4]?

A direct comparison of mCE is listed below:

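| Model | OA ↑ | mCE ↓ | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|---|
| DGCNN | 0.926 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| PointNet | 0.907 | 1.422 | 1.266 | 0.642 | 0.5 | 1.072 | 2.98 | 1.593 | 1.902 |
| PointNet++ | 0.93 | 1.072 | 0.872 | 1.177 | 0.641 | 1.802 | 0.614 | 0.993 | 1.405 |
| RSCNN | 0.923 | 1.13 | 1.074 | 1.171 | 0.806 | 1.517 | 0.712 | 1.153 | 1.479 |
| SimpleView | 0.939 | 1.047 | 0.872 | 0.715 | 1.242 | 1.357 | 0.983 | 0.844 | 1.316 |
| GDANet | 0.934 | 0.892 | 0.83 | 0.839 | 0.794 | 0.894 | 0.871 | 1.036 | 0.981 |
| CurveNet | 0.938 | 0.927 | 0.872 | 0.725 | 0.71 | 1.024 | 1.346 | 1 | 0.809 |
| PAConv | 0.936 | 1.104 | 0.904 | 1.465 | 1 | 1.005 | 1.085 | 1.298 | 0.967 |
| PCT | 0.93 | 0.925 | 0.872 | 0.87 | 0.528 | 1 | 0.78 | 1.385 | 1.042 |
| RPC | 0.93 | 0.863 | 0.84 | 0.892 | 0.492 | 0.797 | 0.929 | 1.011 | 1.079 |
| PCT+CSI | 0.927 | 0.757 | 1.128 | 0.472 | 0.56 | 0.889 | 0.329 | 0.924 | 0.995 |
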
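
For readers relating the ER and mCE numbers in this thread, the following sketch recomputes the PCT+CSI mCE from the per-corruption error rates the authors reported on PointCloud-C. It assumes the usual PointCloud-C convention, in which each corruption error is the model's error rate divided by the DGCNN baseline's error rate on the same corruption, and mCE is the mean of those ratios. The helper names are hypothetical, and small differences from the reported values are expected because the inputs are rounded to one decimal.

```python
# Per-corruption error rates (%) copied from the ER table posted in this thread.
CORRUPTIONS = ["Scale", "Jitter", "Drop-G", "Drop-L", "Add-G", "Add-L", "Rotate"]
ER_DGCNN = [9.4, 31.6, 24.8, 20.7, 29.5, 27.5, 21.5]
ER_PCT_CSI = [10.5, 15.2, 13.9, 18.4, 9.5, 25.3, 21.3]

def corruption_errors(model_er, baseline_er):
    """Normalize each error rate by the baseline's error on the same corruption."""
    return [m / b for m, b in zip(model_er, baseline_er)]

def mean(values):
    return sum(values) / len(values)

ce = corruption_errors(ER_PCT_CSI, ER_DGCNN)
mce = mean(ce)               # reported as 0.757 in the thread
mean_er = mean(ER_PCT_CSI)   # reported as 16.3 in the thread

for name, value in zip(CORRUPTIONS, ce):
    print(f"{name}: {value:.3f}")
print(f"mCE: {mce:.3f}, mean ER: {mean_er:.1f}")
```

A ratio above 1 (as for Scale here) means the model is worse than the DGCNN baseline on that corruption, which is the pattern the reviewer points to.
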
Comment

I appreciate the authors' rebuttal. Here is my response.

The novelty of the proposed density-aware sampling may be limited as similar ideas have already been explored by previous methods [1,2], but the authors do not provide any comparisons with them.

Although the usage and architecture differ between [1] and this paper, the authors could still compare the motivation or working mechanism in theory. Additionally, I appreciate the discussion provided on the method in [2].

The performance is not satisfactory.

This paper (PCT+CSI) is clearly inferior to PCT+data_aug in both ModelNet40-C and PointCloud-C, which hinders its contribution.

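| ModelNet40-C | ER |
|---|---|
| PCT | 25.5 |
| PCT+CSI | 18.4 |
| PCT+RSMix | 17.3 |
| PCT+PointCutMix-K | 16.5 |
| PCT+PointCutMix-R | 16.3 |

| PointCloud-C | mCE | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|
| PCT | 0.925 | 0.872 | 0.870 | 0.528 | 1.000 | 0.780 | 1.385 | 1.042 |
| PCT+CSI | 0.757 | 1.128 | 0.472 | 0.56 | 0.889 | 0.329 | 0.924 | 0.995 |
| PCT+WOLFMix | 0.574 | 1.000 | 0.854 | 0.379 | 0.493 | 0.298 | 0.505 | 0.488 |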

In addition, the authors claim that the major limitation of data augmentation is that different data augmentation techniques have varying degrees of effectiveness against distinct types of corruption. However, according to table 2, we observe that PCT+WOLFMix consistently enhances the original PCT across all types of corruption except for Scale. On the other hand, PCT+CSI fails to improve performance in both Scale and Drop-G. This finding further weakens the authors' motivation.

Comment

Although the usage or architecture differs between [1] and this paper, the authors may provide comparisons on the motivation or working mechanism in theory.

The motivation behind the method proposed in [1] is to address a bias that occurs when point clouds are uniformly sampled into regular spherical voxels. Specifically, points around the pole appear sparser than those around the equator in spherical coordinates, which skews the resulting spherical voxel signals. Therefore, their method aims to adjust this density discrepancy caused by spherical coordinates to better serve subsequent spherical voxel convolution layers. On the other hand, the motivation for our proposed DAS is to automatically filter outliers that are a result of the conventional FPS method. In simple terms, their method is more of a specialized approach designed to improve rotation robustness, while our proposed method aims to enhance robustness against a wider range of corruptions.

In addition, the authors claim that the major limitation of data augmentation is that different data augmentation techniques have varying degrees of effectiveness against distinct types of corruption. However, according to table 2, we observe that PCT+WOLFMix consistently enhances the original PCT across all types of corruption except for Scale. On the other hand, PCT+CSI fails to improve performance in both Scale and Drop-G. This finding further weakens the authors' motivation.

Firstly, we acknowledge that WOLFMix is an efficient data augmentation strategy. However, directly comparing PCT+CSI with data augmentation methods is not a like-for-like comparison, because PCT+CSI is trained without any data augmentation. It would be more appropriate either to compare PCT+CSI with other standalone architectures or to compare a data-augmented PCT+CSI with other data augmentation methods. Our aim is not to replace data augmentation but to explore ways to improve robustness from the perspective of model architecture. Even though our proposed CSI does not show a significant boost when combined with data augmentation, it still achieves an ER of 15.9 on ModelNet40-C and an mCE of 0.632 on PointCloud-C, which are competitive results. We acknowledge that CSI's gains are smaller with data augmentation than without it, and that this is a limitation of our current work. We plan to investigate this issue further in future research.

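| ModelNet40-C | ER |
|---|---|
| PCT | 25.5 |
| PCT+CSI | 18.4 |
| PCT+RSMix | 17.3 |
| PCT+PointCutMix-K | 16.5 |
| PCT+PointCutMix-R | 16.3 |
| PCT+PointCutMix-R+CSI | 15.9 |

| PointCloud-C | mCE | Scale | Jitter | Drop-G | Drop-L | Add-G | Add-L | Rotate |
|---|---|---|---|---|---|---|---|---|
| PCT | 0.925 | 0.872 | 0.870 | 0.528 | 1.000 | 0.780 | 1.385 | 1.042 |
| PCT+CSI | 0.757 | 1.128 | 0.472 | 0.56 | 0.889 | 0.329 | 0.924 | 0.995 |
| PCT+CSI+PointCutMix-R | 0.632 | 1.202 | 0.361 | 0.456 | 0.773 | 0.278 | 0.458 | 0.898 |
| PCT+WOLFMix | 0.574 | 1.000 | 0.854 | 0.379 | 0.493 | 0.298 | 0.505 | 0.488 |
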
Review
Rating: 8

The paper presents an important contribution addressing the challenge of robustness in 3D point cloud recognition. The proposed CSI method shows promising results and demonstrates improvements over existing methods. The authors propose a novel critical subset identification (CSI) method that utilizes the set property of point cloud data to enhance recognition robustness. The CSI framework consists of two components: density-aware sampling (DAS) and self-entropy minimization (SEM), which cater to static and dynamic CSI, respectively. Experimental results show that the CSI approach outperforms state-of-the-art methods on corruption robustness benchmarks.

Strengths

(1). The paper introduces a novel method, CSI, to enhance the robustness of 3D point cloud recognition against data corruption. This is an innovative and practical contribution that addresses an important challenge in the field.

(2). The CSI framework incorporates two components, DAS and SEM, which provide a comprehensive approach to critical subset identification. The combination of these two techniques allows for both static and dynamic CSI, improving the robustness of recognition models in different scenarios.

(3). The paper presents thorough evaluations of the proposed CSI method on two corruption robustness benchmarks. The experimental results demonstrate significant improvements over state-of-the-art methods, validating the effectiveness of the approach.

Weaknesses

(1). The paper could improve the clarity of exposition. Some parts of the paper, particularly in the methodology section, are not explained in a clear and concise manner, which may impede the reader's understanding.

(2). The paper could benefit from more detailed evaluation and ablation studies. While the experimental results show the superiority of the CSI method, it would be valuable to have a more in-depth analysis of its performance and a comparison with widely-known baselines in the field.

Questions

I think the methods proposed in the ModelNet40-C and PointCloud-C papers should also be compared and analyzed.

Details of Ethics Concerns

N/A

Comment

We are glad that the reviewer found that our method demonstrates improvements. We appreciate the opportunity to address the points you have raised.

The paper could improve the clarity of exposition. Some parts of the paper, particularly in the methodology section, are not explained in a clear and concise manner, which may impede the reader's understanding.

We appreciate the feedback on the clarity of our exposition, especially in the methodology section. We understand that clear and concise communication is crucial for the reader's comprehension. To address this, we will undertake a thorough revision of our manuscript, with a specific focus on enhancing the clarity and conciseness of the methodology section. We aim to ensure that our final version presents our methods and findings in a more accessible and understandable manner. This revision will include refining complex explanations, simplifying technical jargon where possible, and providing additional context or examples to facilitate better understanding.

The paper could benefit from more detailed evaluation and ablation studies. While the experimental results show the superiority of the CSI method, it would be valuable to have a more in-depth analysis of its performance and a comparison with widely-known baselines in the field.

Thank you for the constructive feedback. We acknowledge the importance of in-depth analysis and comprehensive comparisons in our research. Although our experimental section already evaluates numerous baselines within this domain, we understand the need for deeper insights into the performance of the CSI method. In light of your suggestions, we will expand our analysis to include more detailed evaluations and additional ablation studies. These enhancements will focus on a finer comparison with well-established baselines in the field, and on dissecting the individual contributions and impacts of the various components of the CSI method. Our aim is to provide a clearer, more comprehensive understanding of why and how CSI demonstrates superiority over these baselines, thereby enriching the overall value and contribution of our work to the field.

I think the methods proposed in the ModelNet40-C and PointCloud-C papers should also be compared and analyzed.

Thank you for emphasizing the importance of comparative analysis with ModelNet40-C and PointCloud-C. We have indeed conducted such comparisons and included them in our manuscript. Specifically, Table 1 in the main text and Tables 1 and 2 in the appendix present a comprehensive comparison of our method against others mentioned in the original papers of ModelNet40-C and PointCloud-C. These tables detail the performance metrics and demonstrate how our method stacks up against existing methods under similar conditions. We believe this thorough comparison effectively illustrates the strengths and limitations of our proposed approach in the context of these well-established datasets.

Comment

I have reviewed all the feedback from the reviewers, and I believe that the newly added results provide sufficient support for the authors' claims. As a result, I am upgrading my rating.

Review
Rating: 5

This article proposes a critical subset identification (CSI) method for bolstering recognition robustness in the face of data corruption. It consists of two parts: density-aware sampling (DAS) and self-entropy minimization (SEM). DAS uses local density weighting to better sample point cloud data. During training, SEM applies an entropy minimization objective to the significance values calculated by self-attention, which increases the model's attention to points with higher significance values. The authors then conduct experiments on two corruption benchmarks, ModelNet40-C and PointCloud-C, showing that their method can effectively improve the robustness of the Point Cloud Transformer (PCT) model while maintaining performance on clean data.

Strengths

  • This paper is well-written, especially Section 3 providing clear and easily understandable explanations of the CSI framework.
  • The idea of introducing Entropy minimization in Self-Attention Modules is simple and effective, and it integrates the significance values of different points into the model training process.
  • The experiments are extensive in terms of implemented models. Especially the exploration of the impact of hyperparameters in the ablation study demonstrates the key parameters that affect the method.

Weaknesses

  • The proposed method is somewhat ad hoc: the authors need to specify that DAS is only suitable for models that include a sampling and aggregation module. The justification for the proposed method is also not well established, and the motivation is weak.
  • The experiments on CSI are not persuasive enough, from the method comparison to the diversity of datasets. The authors should consider comparing with other train-time point cloud robust methods. The selected datasets are all derived from ModelNet40, indicating a lack of experimental diversity.

Questions

  1. Please explain the statement “local density of a point positively correlated with its significance”. Previous work has indicated that significant points are usually outward points in the point cloud, which are generally sparse [1]. Also, traditional sampling methods like FPS tend to find points with low density as representations.
  2. The authors should compare with other SOTA train-time methods under the same settings in the main experiment, such as data augmentation or module additions to the model.
  3. The authors present the SEM method for Global Feature in Table 5. Given that the authors are making a critical point selection, what is the purpose of this?
  4. The experiments should consider including datasets not based on ModelNet40.
  5. Tables 1 and 2 in Supplementary show that CSI cannot effectively handle point dropping (e.g., occlusion) and transformations (e.g., rotation), and may even be harmful. Does this indicate the limitations of CSI in terms of generalizability? Therefore, the reviewer points out that the results derived from the ModelNet40-C can be misleading.

[1] Zheng, Tianhang, et al. "Pointcloud saliency maps." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

Comment

We are glad that the reviewer found our study valuable to this field and that our method was simple yet effective. We appreciate the opportunity to address the points you have raised.

Please explain the statement “local density of a point positively correlated with its significance”? Previous work has indicated that significant points are usually outward points in the point cloud, which are generally sparse[1]. Also, traditional sampling methods like FPS tend to find points with low density as representations.

The observation that significant points are predominantly located at the outer regions of the point cloud does not hold true in scenarios involving data corruption, as demonstrated in Figure 3. In such cases, Farthest Point Sampling (FPS) tends to retain an excessive number of noisy data points. This limitation of FPS is a key motivator for our proposal of the Density-Aware Sampling (DAS) method. DAS is specifically designed to mitigate the adverse effects caused by FPS, particularly its tendency to sample a higher number of noisy points in the presence of various forms of corruption, such as background noise and global noise.
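
The FPS behavior described above can be reproduced on a toy point cloud. The sketch below is only an illustration under stated assumptions: the thread does not give the exact DAS formulation, so the density score (inverse mean distance to the k nearest neighbors) and the filter-then-FPS step in `density_filtered_fps` are hypothetical stand-ins, not the authors' method. It shows that plain FPS picks an isolated noise point almost immediately, while dropping the lowest-density points first keeps that point out of the sample.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    chosen = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

def knn_density(points, k=8):
    """Hypothetical density score: inverse mean distance to the k nearest neighbors."""
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    pairwise.sort(axis=1)  # column 0 is the zero distance to the point itself
    return 1.0 / (pairwise[:, 1:k + 1].mean(axis=1) + 1e-12)

def density_filtered_fps(points, m, k=8, drop_frac=0.1):
    """Drop the lowest-density fraction of points, then run FPS on the rest."""
    n_drop = int(len(points) * drop_frac)
    keep = np.argsort(knn_density(points, k))[n_drop:]
    local = farthest_point_sampling(points[keep], m)
    return keep[local].tolist()

rng = np.random.default_rng(0)
cluster = rng.random((64, 3))            # object points inside the unit cube
noise = np.array([[10.0, 10.0, 10.0]])   # one isolated background point
cloud = np.vstack([cluster, noise])
OUTLIER = len(cloud) - 1

fps_idx = farthest_point_sampling(cloud, 8)
das_idx = density_filtered_fps(cloud, 8)
print("outlier picked by FPS:", OUTLIER in fps_idx)
print("outlier picked after density filtering:", OUTLIER in das_idx)
```

The hard cutoff via `drop_frac` is a simplification chosen to keep the example deterministic; a weighting scheme that softly down-weights sparse points would be another way to express the same idea.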

The authors should compare with other SOTA train-time methods under the same settings in the main experiment, such as data augmentation or modules addition to the model.

We acknowledge the suggestion to benchmark our approach against other state-of-the-art (SOTA) train-time methods under equivalent experimental conditions, including factors like data augmentation and module additions. In response, we direct attention to Table 2 in the main manuscript, which provides a comprehensive comparison where all evaluated methods, including ours, are integrated with the data augmentation technique PointCutMix-R. Additionally, for a more extensive comparison, Table 2 in the supplementary material, ranging from the entry 'DGCNN+OcCo' to 'PCT+CSI+PointCutMix-R', offers a detailed analysis of various other train-time methods. This inclusion ensures a thorough and fair evaluation of our method against current SOTA techniques under consistent experimental settings.

The authors present the SEM method for Global Feature in Table 5. Given that the authors are making a critical point selection, what is the purpose of this?

The incorporation of the SEM method for the global feature, as detailed in Table 5, is intended as an additional ablation study. Similar to the critical point selection process, the application of SEM is designed to enhance the distinctiveness of global features. The experiments presented in Table 5 investigate the individual and combined effects of critical point selection and SEM application. Our findings indicate that while applying SEM to both local and global features yields performance improvements, the point-level selection demonstrates a more significant benefit. This comparison underscores the unique contribution of each technique to the overall model performance.

The experiments should consider including datasets not based on ModelNet40.

In addressing the recommendation to include datasets beyond ModelNet40 in our experiments, we emphasize our focus on point cloud recognition within this work. ModelNet40, along with its two corrupted variants, ModelNet40-C and PointCloud-C, are extensively recognized and employed in related research. These datasets are particularly relevant and suitable for the objectives of our study. Our decision to utilize these datasets is grounded in their widespread acceptance in the field, ensuring that our findings are comparable and relevant to current standards in point cloud recognition research.

Tables 1 and 2 in Supplementary show that CSI cannot effectively handle point dropping (e.g., occlusion) and transformations (e.g., rotation), and may even be harmful.

We appreciate the reviewer's observation regarding the performance of CSI under conditions of point dropping and transformations, as shown in Tables 1 and 2 of the supplementary material. We acknowledge that these results may highlight a limitation in the generalizability of the CSI approach, particularly in handling occlusions and rotations. This limitation is indeed an area that warrants further investigation and improvement in future iterations of our work. Regarding the use of ModelNet40-C, we understand the concern that the results derived from this dataset might present a skewed perspective. We will consider this point critically in our analysis and discussion, ensuring that we clearly communicate the potential limitations of our findings in the context of different datasets and conditions. However, we argue that it is hard for a method to cure all corruption types. This feedback is invaluable for guiding our ongoing research efforts to refine and enhance the robustness and applicability of our approach.

Comment

Thank you to the authors for providing new experimental results and additional interpretations. However, some concerns remain unaddressed.

These datasets are particularly relevant and suitable for the objectives of our study.

ModelNet40-C and PointCloud-C might be insufficient due to significant overlaps between these datasets, such as "add global" in PointCloud-C and "background" in ModelNet40-C. The authors might consider incorporating a wider variety of noise datasets to strengthen the argument.

Table 2 in the supplementary material, ranging from the entry 'DGCNN+OcCo' to 'PCT+CSI+PointCutMix-R', offers a detailed analysis of various other train-time methods...

The additional data provided by the authors are greatly appreciated. However, these data suggest that the pure CSI method does not outperform methods with data augmentation (PCT+CSI vs. PCT+WOLFMix), and even the enhanced version (PCT+CSI+PointCutMix-R) fails to achieve the best results.

Finally, I understand and agree with the authors' statement that it is challenging for a method to address all types of corruption. The suggestion is for the authors to focus on specific types of corruption to further explore and demonstrate the unique capabilities of CSI.

Comment

We sincerely thank the reviewer for the reply and we would like to further clarify:

ModelNet40-C and PointCloud-C might be insufficient due to significant overlaps between these datasets, such as "add global" in PointCloud-C and "background" in ModelNet40-C. The authors might consider incorporating a wider variety of noise datasets to strengthen the argument.

To the best of our knowledge, ModelNet40-C and PointCloud-C are currently the most comprehensive and relevant datasets available for this purpose. However, we recognize the value of diversifying the datasets used for evaluation to strengthen our argument further. Unfortunately, the field lacks a wide variety of standardized noise datasets specifically tailored for 3D object recognition.

The additional data provided by the authors are greatly appreciated. However, these data suggest that the pure CSI method does not outperform methods with data augmentation (PCT+CSI vs. PCT+WOLFMix), and even the enhanced version (PCT+CSI+PointCutMix-R) fails to achieve the best results.

It is accurate that in the specific comparisons mentioned, the PCT+CSI method does not surpass the performance of PCT+WOLFMix, and similarly, the enhanced version PCT+CSI+PointCutMix-R does not achieve the top results. This outcome highlights a crucial aspect of our research, which is the exploration of the balance between innovative interpolation techniques and traditional data augmentation strategies in enhancing model performance.

The primary aim of incorporating CSI was to explore a novel approach in the context of point cloud processing, focusing on the subspace properties of the data. While the results may not currently exhibit superior performance compared to all existing data augmentation methods, they do provide valuable insights into the potential and limitations of subspace interpolation techniques in this domain.

We acknowledge that further refinement and combination with other methods might be necessary to fully realize the potential of CSI. This also opens up avenues for future research, where the synergies between subspace interpolation methods and data augmentation techniques can be further explored and optimized.

Review
6

The paper proposes a "critical subset identification (CSI)" framework for robust point cloud perception, which comprises 1) a new point sampling strategy, "density-aware sampling (DAS)", that locates high-density point areas for anchors, and 2) a new optimization objective, "self-entropy minimization (SEM)", that encourages high-confidence predictions.
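To make the DAS idea in this summary concrete, here is a minimal sketch of one way to pick anchors from high-density regions. It is not the authors' implementation: it assumes density is estimated as the inverse of the mean distance to the k nearest neighbors, and the names `knn_density` and `density_aware_sample` are hypothetical.

```python
import numpy as np

def knn_density(points: np.ndarray, k: int = 8) -> np.ndarray:
    """Assumed density estimate: inverse mean distance to the k nearest neighbors."""
    # Pairwise Euclidean distances, shape (N, N). Fine for small N; O(N^2) memory.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # After sorting each row, column 0 is the point itself (distance 0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def density_aware_sample(points: np.ndarray, num_anchors: int, k: int = 8) -> np.ndarray:
    """Return indices of the num_anchors points with the highest estimated density."""
    density = knn_density(points, k)
    return np.argsort(-density)[:num_anchors]
```

Under this assumption, isolated noise points have large neighbor distances and therefore low density, so they are unlikely to be chosen as anchors, whereas farthest point sampling tends to select exactly such outliers.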

Strengths

  1. The two proposed techniques are clear and reasonable to me, and the design choices are backed up by concrete examples. Figure 3 shows a concrete example where Farthest Point Sampling and Random Sampling fail and the new Density-Aware Sampling succeeds. Both techniques should be easy to implement in practice.

  2. The ablation studies in the paper are thorough. They help to understand the effect of the neighbor number k in DAS and the layer position of the SEM loss.

  3. A significant all-around robustness improvement is achieved. As shown in the supplementary table, the model gains better robustness to not only global noise injection but also various other types of corruption.

Weaknesses

  1. SEM is mostly based on previous knowledge that entropy minimization helps classification robustness, which slightly undermines the significance of the proposed techniques. Nonetheless, the paper provides a detailed discussion of how entropy minimization should be applied to transformer-based point classifiers in both attention layers and the classification head, accompanied by sufficient ablation studies.

  2. It is not clear how general DAS is and how it affects the classifier's robustness to types of corruption other than the global noise addition shown in Figure 3. Table 1 and Table 2 in the supplementary material ablate CSI as a whole, so they cannot show the effect of DAS. It would be better if DAS could be individually studied on different types of corruption.


Minor suggestion

The title could be more informative in my opinion. It might be better to use "critical subset identification" in place of "CSI".

Questions

Please address the questions in the Weaknesses section.

Comment

We are glad that the reviewer found our method effective. We appreciate the opportunity to address the points you have raised.

SEM is mostly based on previous knowledge that entropy minimization helps classification robustness, which slightly undermines the significance of the proposed techniques. Nonetheless, the paper provides a detailed discussion of how entropy minimization should be applied to transformer-based point classifiers in both attention layers and the classification head, accompanied by sufficient ablation studies.

Previous work has indeed investigated how entropy minimization helps classification robustness, but those works only studied how to minimize entropy during inference. For example, Tent [1] minimizes the Shannon entropy of the classification predictions over several iterations at test time. The biggest difference is that we give a thorough analysis of how to apply entropy minimization during training and of which features contribute most, conducting ablation studies not only on the transformer self-attention maps but also on intermediate features.

[1] Wang, Dequan, et al. "Tent: Fully test-time adaptation by entropy minimization." arXiv preprint arXiv:2006.10726 (2020).
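For readers unfamiliar with the quantity being minimized, the following is a minimal sketch of the Shannon entropy of a softmax distribution computed from logits. It only illustrates the objective; the helper names `softmax` and `shannon_entropy` are illustrative, and neither Tent's update procedure nor the SEM loss placement discussed above is reproduced here.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def shannon_entropy(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Entropy H(p) = -sum_i p_i log p_i of the softmax distribution over logits."""
    p = softmax(logits, axis)
    # The small epsilon guards against log(0) for near-zero probabilities.
    return -(p * np.log(p + 1e-12)).sum(axis=axis)
```

A uniform prediction over C classes attains the maximum entropy log C, while a confident (peaked) prediction has entropy near zero, so adding this term to a loss pushes the model toward high-confidence outputs. The same function applies row-wise to an attention map if each row is treated as a distribution, which is an assumption about how such a term might be applied rather than a description of the paper's method.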

It is not clear how general DAS is and how it affects the classifier's robustness to more types of corruptions other than global noise addition shown in Figure 3. Table 1 and Table 2 in the supplementary material ablate CSI as a whole so they can not show the effect of DAS. It would be better if DAS could be individually studied on different types of corruption.

More visualization results are available in the appendix of our revised manuscript. One can observe the effectiveness of DAS compared with FPS (Farthest Point Sampling) and RS (Random Sampling). Besides, our quantitative experimental results have also validated that DAS has outperformed other baseline methods in improving corruption robustness.

AC Meta-Review

This paper proposes a new approach to improve the robustness of 3D point cloud recognition. The method includes two components: density-aware sampling (DAS) and self-entropy minimization (SEM). The two components are used for static and dynamic critical subset identification, respectively. Experimental results show the effectiveness of the proposed method on corruption robustness benchmarks.

This is a borderline paper (6, 5, 8, 3). Most reviewers appreciate the effectiveness of the proposed method. However, after the detailed author-reviewer discussion, Reviewer xKH7 and Reviewer Jv1W still have concerns about the motivation and superiority of the paper. Reviewer Jv1W further pointed out that density-aware sampling techniques have already been published in earlier papers, and considers the performance not strong enough given that this is not a theoretical paper.

Why not a higher score

N/A

Why not a lower score

N/A

Final decision

Reject