Target-Guided Adversarial Point Cloud Transformer Towards Recognition Against Real-world Corruptions
Abstract
Reviews and Discussion
This paper introduces a framework consisting of two modules: the Adversarial Significance Identifier, which selects tokens of high importance, and the Target-guided Promptor, which selectively drops important tokens to achieve more generalized performance. This approach aims to mitigate overfitting to specific patterns.
Strengths
S1. The idea of digging for sub-optimal patterns that could contribute to the final performance is interesting.
S2. The framework appears to be a general idea that could be applied to other methods as supplementary modules.
Weaknesses
W1. The presentation needs to be improved.
W2. The experiments are not strong enough.
Questions
O1. The term "local pattern" is ambiguous and lacks a formal definition. Does it refer to a graph constructed from the Point Cloud set tokens or the importance ranking of the tokens? The paper should provide a clear and formal definition of "local pattern." Regarding Figure 1, the confusion matrices in parts (a) and (b) appear indistinguishable to the reviewer. Clarification on their differences would be beneficial. The abstract is difficult to understand, particularly in describing the two modules. It requires significant improvement for readability. The description in lines 42-52 is much clearer than the abstract and could serve as a model for revision.
O2. The paper employs the mini-PointNet method to generate point cloud tokens. However, there are other point cloud tokenization methods available. The paper should evaluate the adversarial mechanism using different tokenization methods to validate its robustness.
O3. The proposed framework appears general and potentially applicable to various methods, which is a positive aspect. Can this method be applied to other state-of-the-art (SOTA) methods by incorporating the "digging sub-optimal patterns" mechanism? If so, the reviewer recommends including experiments demonstrating the broader applicability of the proposed method.
O4. The notations used in the focal tokens identification section are unclear. The paper uses m = 1, ..., D when introducing the F_topk, which is a R^{k \times C} matrix. Is D equivalent to C? The reviewer did not find a definition for D. Additionally, M in equation (2) is introduced as a vector rather than a matrix. The paper should revise the notation for clarity. The process for selecting the top-k tokens is not adequately described. Detailed explanation on how these tokens are chosen should be included.
Limitations
The paper states that its limitation is that the optimal utilization remains unexplored; the authors plan to address this issue in future work.
We sincerely appreciate your detailed and insightful reviews. We hope our response can address your concerns.
Q1.1: The term "local pattern" is ambiguous and lacks a formal definition. Does it refer to a graph constructed from the Point Cloud set tokens or the importance ranking of the tokens? The paper should provide a clear and formal definition of "local pattern".
Sorry for the confusion about the definition. Local pattern refers to the geometric structure of a small region within the point cloud, which is captured by a subset of tokens. The clear and formal definition will be included in the final manuscript.
Q1.2: Regarding Figure 1, the confusion matrices in parts (a) and (b) appear indistinguishable to the reviewer. Clarification on their differences would be beneficial.
Sorry for the confusion. We have provided a visualization in the authors' response document that highlights the differences more clearly. The dominant red along the diagonal underscores our approach's superior performance compared to the standard transformer. We will incorporate it in the final manuscript to ensure the distinctions are clear.
Q1.3: The abstract is difficult to understand, particularly in describing the two modules. It requires significant improvement for readability. The description in lines 42-52 is much clearer than the abstract and could serve as a model for revision.
Sorry for the confusion about the abstract. As you suggested, we have carefully revised the abstract for readability. The revised abstract will be included in the final manuscript.
Q2: The paper employs the mini-PointNet method to generate point cloud tokens. The paper should evaluate the adversarial mechanism using different tokenization methods to validate its robustness.
Thanks for your advice! As you suggested, we compare our method with two alternative tokenization methods: mini-DGCNN and mini-PCT. The results are shown in the table below.
As we can see from the results, the performance of our adversarial mechanism is relatively stable across different tokenization methods. This indicates that our method is not sensitive to the specific tokenization method used and can effectively improve the robustness of point cloud models. Additionally, we would like to mention that the choice of tokenization method may affect the performance of the model on different tasks and datasets. Therefore, we recommend exploring different tokenization methods and choosing the most suitable one for specific applications. We will add the results in the final version.
| Methods | mCE (%, ↓) |
|---|---|
| mini-DGCNN | 71.1 |
| mini-PCT | 72.4 |
| mini-PointNet (Ours) | 72.2 |
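For context, the sketch below illustrates what a mini-PointNet-style tokenizer typically looks like: points are grouped around sampled centers with kNN, and each group is embedded by a shared MLP followed by max-pooling. The class name, layer sizes, and grouping details are illustrative assumptions, not our exact implementation.

```python
import torch
import torch.nn as nn

class MiniPointNetTokenizer(nn.Module):
    """Illustrative sketch of a PointNet-style tokenizer (not the paper's exact code):
    group points around sampled centers, then embed each local group with a
    shared MLP and channel-wise max-pooling."""

    def __init__(self, embed_dim=384, group_size=32):
        super().__init__()
        self.group_size = group_size
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, embed_dim, 1),
        )

    def forward(self, centers, points):
        # centers: (B, G, 3) group centers (e.g. from farthest point sampling)
        # points:  (B, N, 3) input point cloud
        B, G, _ = centers.shape
        dist = torch.cdist(centers, points)                          # (B, G, N)
        idx = dist.topk(self.group_size, largest=False).indices      # kNN indices (B, G, k)
        grouped = torch.gather(
            points.unsqueeze(1).expand(B, G, -1, 3), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, 3))                 # (B, G, k, 3)
        grouped = grouped - centers.unsqueeze(2)                     # shift to local frame
        x = grouped.reshape(B * G, self.group_size, 3).transpose(1, 2)
        tokens = self.mlp(x).max(dim=-1).values                      # (B*G, C) per-group embedding
        return tokens.reshape(B, G, -1)                              # (B, G, C) point tokens
```

Mini-DGCNN and mini-PCT presumably differ mainly in this local embedding step (edge convolutions or local self-attention instead of the shared MLP), which is consistent with the adversarial mechanism being largely insensitive to the choice of tokenizer.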
Q3: The proposed framework appears general and potentially applicable to various methods, which is a positive aspect. Can this method be applied to other state-of-the-art (SOTA) methods by incorporating the "digging sub-optimal patterns" mechanism?
Thank you for your suggestion! As you suggested, we have extended the 'digging sub-optimal patterns' mechanism to two state-of-the-art (SOTA) methods, PointM2AE and PointGPT, on the ModelNet-C dataset. The results are promising and demonstrate the general applicability of our approach.
As shown in the table below, incorporating our 'digging sub-optimal patterns' mechanism into PointM2AE and PointGPT resulted in a significant reduction in mCE. These results suggest that our approach can effectively enhance the robustness of various point cloud recognition models. By encouraging the model to explore and utilize a broader range of patterns, our method enables the models to better generalize to corrupted data.
| Methods | mCE (%, ↓) |
|---|---|
| PointM2AE | 83.9 |
| + digging sub-optimal patterns | 82.9 (↓1.0) |
| PointGPT | 83.4 |
| + digging sub-optimal patterns | 82.0 (↓1.4) |
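For intuition, the following minimal sketch shows how a token-dropping step of this kind could be attached to another transformer backbone. The drop ratio, the source of the importance scores, and the function name are assumptions made for illustration; the sketch does not reproduce the exact APCT mechanism.

```python
import torch

def drop_dominant_tokens(tokens, importance, drop_ratio=0.1):
    """Suppress the most important tokens during training so the backbone is
    pushed to exploit less dominant (sub-optimal) patterns.

    tokens:     (B, G, C) point tokens from any backbone (e.g. PointM2AE, PointGPT)
    importance: (B, G) per-token significance scores (e.g. from an auxiliary head)
    """
    B, G, _ = tokens.shape
    num_drop = max(1, int(G * drop_ratio))
    drop_idx = importance.topk(num_drop, dim=1).indices      # indices of dominant tokens
    mask = torch.ones(B, G, 1, device=tokens.device)
    mask.scatter_(1, drop_idx.unsqueeze(-1), 0.0)            # zero out the dominant tokens
    return tokens * mask                                     # remaining tokens must carry the signal
```

In practice such a step would only be applied during training; at inference all tokens are kept.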
Q4.1: The notations used in the focal tokens identification section are unclear. The paper uses m = 1, ..., D when introducing the F_topk, which is a R^{k \times C} matrix. Is D equivalent to C? The reviewer did not find a definition for D.
Sorry for the confusion. This is a typo. The variable D in F_topk should indeed be equivalent to C, which represents the number of feature channels in the tokens. We will revise the paper to correct this error and ensure consistency in our notation.
Q4.2: Additionally, M in equation (2) is introduced as a vector rather than a matrix.
You are correct that M in equation (2) is a vector rather than a matrix. We will revise the equation and the corresponding text to reflect this correction.
Q4.3: The process for selecting the top-k tokens is not adequately described. Detailed explanation on how these tokens are chosen should be included.
Sorry for the confusion. We will provide a more detailed explanation of how the focal tokens are selected.
- Feature Response Calculation: For each token, we compute the feature response for each channel with the help of the auxiliary supervisory process. This involves assessing how strongly each token responds in each of the C channels.
- Sorting Tokens by Channel Responses: Once the feature responses are calculated, we sort the tokens based on their response values within each channel. This step ensures that tokens with higher responses are ranked higher.
- Selecting Focal Tokens: After sorting, we select the top k tokens for each channel, i.e., the k highest-ranked tokens based on their feature responses in that channel.
By following these steps, we ensure that only the most significant tokens, in terms of their feature responses, are retained for further processing.
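To make the selection concrete, here is a simplified sketch of per-channel top-k selection, written by us for illustration (it condenses the three steps above and is not the paper's pseudo-code):

```python
import torch

def select_focal_tokens(features, k):
    """features: (B, G, C) token features produced under the auxiliary supervision.
    Returns F_topk of shape (B, k, C): for each of the C channels, the responses
    of the k tokens with the highest response in that channel."""
    # Steps 1-2: per-channel responses are the feature values themselves;
    # topk sorts tokens within each channel by response, descending.
    topk_idx = features.topk(k, dim=1).indices        # (B, k, C) token indices per channel
    # Step 3: gather the selected responses channel by channel.
    focal = torch.gather(features, 1, topk_idx)       # (B, k, C)
    return focal, topk_idx
```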
For further clarity, we kindly refer you to Section A.6 of the Supplementary Material, where we provide pseudo-code of the focal token identification process.
The reviewer is satisfied with the rebuttal and has increased the score. Please incorporate the rebuttal content into the final manuscript. Thank you.
Thanks for your satisfaction with our reply! We greatly appreciate your positive evaluation. We will incorporate the additional experiments and improve the paper in the final version. If you have any further concerns or questions, please do not hesitate to reach out. We are committed to addressing any remaining issues promptly and thoroughly. Thank you again for your valuable feedback and best wishes!
The paper proposes a novel architecture called Target-Guided Adversarial Point Cloud Transformer (APCT) for robust 3D perception in the presence of corrupted data. The APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor to augment global structure capture and enhance the model's resilience against real-world corruption. The paper presents extensive experiments on multiple benchmarks, demonstrating the effectiveness and state-of-the-art performance of the proposed method.
Strengths
- The paper introduces a novel architecture, APCT, that addresses the challenge of robust 3D perception in the presence of corrupted data.
- The APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor, which effectively improve the resilience of point cloud models against various types of corruptions.
- The paper presents extensive experiments on multiple benchmarks, including ModelNet-C and ScanObjectNN-C, demonstrating the effectiveness and state-of-the-art performance of the proposed method.
Weaknesses
- Since data augmentation methods like PointMixUp and PointCutMix can improve robustness, the corresponding experiments should be performed.
- Similar previous works like PointDP are not compared.
- There are some typos in Equation 3.
I would be willing to increase my score if the authors address these issues.
Questions
How about the performance on the SOTA MVImgNet dataset?
Limitations
N/A
We sincerely appreciate your detailed and insightful reviews. We hope our response can address your concerns.
Q1: Since augmentation methods like PointMixUp and PointCutMix can improve robustness, the corresponding experiments should be performed.
Thanks for your advice! As you suggested, we further evaluate the performance of our approach with various data augmentation methods on the ModelNet-C dataset; the results are as follows. Besides the discussed PointMixup [1] and PointCutMix [2], we additionally include experiments with the data augmentation techniques PointWOLF [3], RSMix [4], and WOLFMix [5].
Among these, PointMixup, PointCutMix, and RSMix fall under the category of mixing augmentation, where several point clouds are mixed following pre-defined rules. PointWOLF belongs to the deformation techniques, which non-rigidly deform local parts of an object. WOLFMix combines both mixing and deformation augmentations: it first deforms the objects and subsequently mixes the deformed objects together rigidly.
As shown in the table, data augmentation methods further improve the robustness of our method against point cloud corruptions. Employing mixing or deformation augmentation techniques independently enhances the robustness of the model, e.g., our model with PointWOLF (67.0% mCE) and with PointMixup (66.2% mCE). When the two techniques are combined, as in WOLFMix, the robustness is further improved (64.7% mCE). Additionally, these experiments demonstrate the compatibility of our method with various data augmentation techniques, further underscoring its potential in addressing data corruption. We will add the results in the final version.
| Methods | mCE (%, ↓) |
|---|---|
| APCT (Ours) | 72.2 |
| + PointMixup | 66.2 |
| + PointCutMix-R | 69.7 |
| + PointWOLF | 67.0 |
| + RSMix | 71.3 |
| + WOLFMix | 64.7 |
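As a point of reference for the mixing family, below is a minimal sketch of a PointCutMix-R-style augmentation: a simplified version that randomly swaps a fraction of points between each sample and a shuffled partner and mixes the one-hot labels in proportion. The published recipes differ in details (e.g. region-based selection and point matching), so the function below is an illustrative assumption rather than a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

def random_point_mix(points, labels, num_classes, beta=1.0):
    """points: (B, N, 3) point clouds; labels: (B,) integer class labels.
    Replaces a lam-fraction of each cloud's points with points from a shuffled
    partner and mixes the labels with the same weight (illustrative sketch)."""
    B, N, _ = points.shape
    lam = torch.distributions.Beta(beta, beta).sample().item()
    perm = torch.randperm(B)                               # partner assignment
    replace_idx = torch.randperm(N)[: int(N * lam)]        # random points to swap
    mixed = points.clone()
    mixed[:, replace_idx] = points[perm][:, replace_idx]
    y = F.one_hot(labels, num_classes).float()
    mixed_labels = (1 - lam) * y + lam * y[perm]           # soft targets for training
    return mixed, mixed_labels
```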
Reference:
[1] Pointmixup: Augmentation for point clouds. ECCV 2020.
[2] Pointcutmix: Regularization strategy for point cloud classification. Neurocomputing 2022.
[3] Point Cloud Augmentation With Weighted Local Transformations. ICCV 2021.
[4] Regularization Strategy for Point Cloud via Rigidly Mixed Sample. CVPR 2021.
[5] Benchmarking and Analyzing Point Cloud Classification under Corruptions. ICML 2022.
Q2: Similar previous works like PointDP are not compared.
Thanks for your advice! The primary objective of our approach is to enhance the model's robustness against real-world corruptions. Consequently, most of our experiments are centered around this goal. The experimental results presented in the paper demonstrate the effectiveness of our method in achieving this objective.
Improving the model's defense against point cloud attacks is a secondary goal. We made significant efforts to include comparisons with relevant methods such as PointDP and IF-Defense. However, we were unfortunately unsuccessful in these attempts.
- As PointDP is not an open-source model, we were unable to obtain its implementation to evaluate its performance against our baseline. Therefore, we could not include a direct comparison within the constraints of this submission.
- Additionally, due to time limitations, we were unable to conduct experiments on other related methods such as IF-Defense in time. However, we are committed to addressing this in the final manuscript and will make every effort to include these comparisons.
However, during the rebuttal period, we have conducted additional experiments on the ModelNet40-C dataset. We kindly refer you to our response to Q2 of Reviewer K9Vn for details.
We appreciate your understanding and will strive to improve our manuscript based on your valuable feedback.
Q3: Some typo errors in equation 3.
Sorry for the typo errors. We have carefully revised the paper to fix all typos.
Q4: How about the performance on the SOTA MVImgNet dataset?
Thanks for your advice! We agree that it is valuable to evaluate APCT on the more challenging MVImgNet [6] dataset.
In our paper, we have experimented on five datasets covering different tasks: ModelNet-C and ScanObjectNN-C (classification against corruption), ShapeNet-C (part segmentation against corruption), ScanObjectNN (classification), and ModelNet (attack defense). Across benchmarks from various domains, our APCT attains performance competitive with existing specialist models.
As you suggested, we further evaluate the performance of our approach on one additional dataset, MVImgNet [6]. It is a challenging benchmark for real-world point cloud classification, containing 64,000 training and 16,000 testing samples. As shown in the table, our approach achieves 86.6% OA and exhibits good generalization capacity in real-world scenarios.
| Methods | OA (%) |
|---|---|
| PointNet | 70.7 |
| PointNet++ | 79.2 |
| DGCNN | 86.5 |
| PAConv | 83.4 |
| PointMLP | 88.9 |
| APCT (Ours) | 86.6 |
Reference:
[6] Mvimgnet: A large-scale dataset of multi-view images. CVPR 2023.
I am satisfied with the rebuttal and thus keep my rating unchanged.
Thanks for your satisfaction with our reply! We greatly appreciate your positive evaluation. We will incorporate the additional experiments and improve the paper in the final version. If you have any further concerns or questions, please do not hesitate to reach out. We are committed to addressing any remaining issues promptly and thoroughly. Thank you again for your valuable feedback and best wishes!
The paper introduces a novel architecture called the Adversarial Point Cloud Transformer (APCT). This model aims to enhance the robustness of 3D perception models against real-world corruptions. The APCT integrates two core components: the Adversarial Significance Identifier and the Target-guided Promptor. The Adversarial Significance Identifier identifies significant tokens by analyzing global context, while the Target-guided Promptor focuses the model's attention on less dominant tokens, effectively broadening the range of patterns the model learns. Extensive experiments demonstrate that APCT achieves state-of-the-art results on multiple corruption benchmarks, proving its effectiveness in handling various types of data corruptions.
Strengths
The paper introduces a novel approach by combining adversarial training with point cloud transformers. The experiments are comprehensive and robust, demonstrating the effectiveness of the proposed method across various corruption scenarios. The paper is well-written, with clear and detailed explanations and a logical flow. Visual aids effectively support the textual content. The research addresses a critical issue in 3D point cloud recognition, providing valuable insights and practical solutions that can be applied in real-world scenarios.
Weaknesses
The paper could explore additional complex corruption scenarios beyond those covered. The impact of the proposed method on computational overhead is not thoroughly discussed, which could be important for practical implementations. While the method is validated on several datasets, more diverse and larger-scale datasets could further strengthen the findings [a].
[a] Benchmarking and Improving Robustness of 3D Point Cloud Recognition against Common Corruptions
Questions
I would like to see results on the mentioned ModelNet40-C dataset.
Limitations
N/A
We sincerely appreciate your detailed and insightful reviews. We hope our response can address your concerns.
Q1: A discussion of the impact of the proposed method on computational overhead could be important for practical implementations.
Thanks for your advice! We agree that the computational overhead of the proposed method deserves discussion. As you suggested, we analyze the impact of the proposed method on computational overhead below, covering memory, training speed, and inference speed. Experiments are conducted on one GeForce RTX 3090.
As seen in the table, our method incurs no additional memory overhead. Compared with the baseline, our method brings a slight decrease in training speed (~6% slowdown) and inference speed (~6% slowdown), while delivering a significant 4.0-point mCE reduction.
| Method | Memory (G) | Train speed (samples/s) | Infer speed (samples/s) | mCE (%, ↓) |
|---|---|---|---|---|
| Baseline | 10.9 | 415.2 | 1111.7 | 76.2 |
| +Ours | 10.9 | 383.4 | 1045.8 | 72.2 |
| Δ | - | ↓~6% | ↓~6% | ↓4.0 |
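For reproducibility, the sketch below shows how such throughput and peak-memory numbers are commonly measured on a single GPU; the warm-up count, iteration count, and function name are illustrative assumptions rather than our exact protocol:

```python
import time
import torch

@torch.no_grad()
def measure_inference(model, batch, warmup=10, iters=50):
    """Returns inference throughput (samples/s) and peak GPU memory (GB)."""
    model.eval().cuda()
    batch = batch.cuda()
    torch.cuda.reset_peak_memory_stats()
    for _ in range(warmup):                  # warm-up to exclude one-off setup costs
        model(batch)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()                 # wait for all kernels before timing stops
    elapsed = time.time() - start
    samples_per_s = iters * batch.shape[0] / elapsed
    peak_mem_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return samples_per_s, peak_mem_gb
```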
Q2: Results on the mentioned ModelNet40-C dataset.
Thanks for your advice! As you suggested, we further evaluate the performance of our approach on the ModelNet40-C [1] dataset. It is a comprehensive benchmark for 3D point cloud corruption robustness, consisting of 15 common and realistic corruptions.
As shown in the table, our approach exhibits remarkable robustness on the ModelNet40-C dataset: it outperforms the strong PCT by 1.4 points and achieves an ER_cor of 24.1. The results on ModelNet40-C demonstrate that our APCT has excellent robustness to various point cloud corruptions. We will add the results in the final version.
| Model | ER_cor ↓ | Occlusion | LiDAR | Density Inc. | Density Dec. | Cutout | Uniform | Gaussian | Impulse | Upsampling | Background | Rotation | Shear | FFD | RBF | Inv. RBF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet | 28.3 | 52.3 | 54.9 | 10.5 | 11.6 | 12.0 | 12.4 | 14.4 | 29.1 | 14.0 | 93.6 | 36.8 | 25.4 | 21.3 | 18.6 | 17.8 |
| PointNet++ | 23.6 | 54.7 | 66.5 | 16.0 | 10.0 | 10.7 | 20.4 | 16.4 | 35.1 | 17.2 | 18.6 | 27.6 | 13.4 | 15.2 | 16.4 | 15.4 |
| DGCNN | 25.9 | 59.2 | 81.0 | 14.1 | 17.3 | 15.4 | 14.6 | 16.6 | 24.9 | 19.1 | 53.1 | 19.1 | 12.1 | 13.1 | 14.5 | 14.0 |
| RSCNN | 26.2 | 51.8 | 68.4 | 16.8 | 13.2 | 13.8 | 24.6 | 18.3 | 46.2 | 20.1 | 18.3 | 29.2 | 17.0 | 18.1 | 19.2 | 18.6 |
| PCT | 25.5 | 56.6 | 76.7 | 11.8 | 14.3 | 14.5 | 12.1 | 13.9 | 39.1 | 17.4 | 57.9 | 18.1 | 11.5 | 12.4 | 13.0 | 12.6 |
| SimpleView | 27.2 | 55.5 | 82.2 | 13.7 | 17.2 | 20.1 | 14.5 | 14.2 | 24.6 | 17.7 | 46.8 | 30.7 | 18.5 | 17.0 | 17.9 | 17.2 |
| Ours | 24.1 | 54.9 | 54.7 | 11.7 | 12.9 | 14.2 | 12.1 | 12.6 | 26.3 | 13.4 | 80.6 | 18.3 | 12.1 | 13.0 | 12.7 | 12.2 |
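For reference, both metrics boil down to averaging error rates over corruption types: ER_cor is a plain mean of per-corruption error rates, while the mCE used on ModelNet-C additionally normalizes each corruption's error by that of a reference model (DGCNN). A simplified sketch that ignores the averaging over severity levels:

```python
def mean_corruption_error(err_model, err_reference=None):
    """err_model: dict mapping corruption name -> error rate (%) of the model.
    err_reference: same dict for the reference model used for normalization;
    if None, the plain average (ER_cor-style) is returned."""
    if err_reference is None:
        return sum(err_model.values()) / len(err_model)                # ER_cor: mean error
    ratios = [err_model[c] / err_reference[c] for c in err_model]      # per-corruption CE
    return 100.0 * sum(ratios) / len(ratios)                           # mCE in %
```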
Reference:
[1] Benchmarking and Improving Robustness of 3D Point Cloud Recognition against Common Corruptions. arXiv 2022.
I have raised my rating to 6 and thanks for the rebuttal.
Thanks for your satisfaction with our reply! We greatly appreciate your positive evaluation. We will incorporate the additional experiments and improve the paper in the final version. If you have any further concerns or questions, please do not hesitate to reach out. We are committed to addressing any remaining issues promptly and thoroughly. Thank you again for your valuable feedback and best wishes!
We sincerely appreciate all reviewers and community members for their efforts in evaluating the paper and writing suggestions that greatly help us improve the work! Please find our responses to your individual questions below. We look forward to discussing any issues further should you have any follow-up concerns!
This paper introduces a novel architecture designed to enhance robustness in 3D point cloud recognition against real-world corruptions. The method, APCT, incorporates two key components: the Adversarial Significance Identifier, which selects important tokens based on global context, and the Target-guided Promptor, which focuses attention on less significant tokens, broadening the range of patterns learned by the model. Experiments demonstrate that APCT outperforms existing methods on multiple benchmarks, achieving state-of-the-art results for 3D recognition robustness.
The paper introduces a technically sound, novel method with strong empirical validation across diverse benchmarks. While there are areas for improvement in terms of presentation and exploration of alternative methods, the authors addressed most concerns effectively in the rebuttal. The contributions are relevant to real-world applications in 3D perception and are likely to impact future research. Thus, the paper should be accepted.