PaperHub

Overall: 5.3/10 · Poster · 4 reviewers
Ratings: 5, 5, 6, 5 (min 5, max 6, std 0.4)
Confidence: 4.0 · Correctness: 2.5 · Contribution: 2.5 · Presentation: 2.8

NeurIPS 2024

PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

OpenReview · PDF
Submitted: 2024-04-23 · Updated: 2024-11-06

Abstract

Keywords
Continual Test-Time Adaptation, Point Cloud Understanding

Reviews and Discussion

Review
Rating: 5

This paper presents an innovative, pioneering framework for Continual Test-Time Adaptation in multi-task point cloud understanding, enhancing the model's transferability to the continually changing target domain.

Strengths

Introducing CTTA into a multi-task 3D vision setting is practical and realistic. The implementation of all modules is based on existing challenges in CTTA.

Weaknesses

The integration of CTTA with multi-task point clouds is not well developed, and the experimental design is not entirely reasonable. The innovative method does not clearly distinguish itself within this specific setting.

Questions

  1. In the Introduction section, "On the other hand, few works like MM-CCTA" should be changed to "On the other hand, few works like MM-CTTA."
  2. The representation of each learnable prototype in Figure 3(a), which pairs once with all source prototypes, lacks clarity and does not reflect the motivation well.
  3. The described new domain leans more towards a new dataset, and in experiments, this domain change occurs only once. Is this specific CTTA or traditional TTA?
  4. What are the specific challenges of combining CTTA with point cloud multi-task learning, especially since the method in this paper is quite general and can also be applied to 2D tasks?
  5. What pretrained models were used in methods like CTTA? How did they utilize two source domains? Why is there such a significant difference in experimental accuracy? Can you provide a detailed analysis of the reasons behind this large gap, which I believe is not solely due to the method proposed in this paper?
  6. This paper does not clearly explain how learnable prototypes are obtained and learned. What about their specific quantities, and does this quantity also affect the experimental results? Could you provide ablation experiments?
  7. There is a lack of experiments and detailed evaluations to assess the effects of error accumulation and forgetting resistance.

Limitations

The challenge of the task, namely CTTA for point clouds, is unclear, and the authors did not explain why existing image-based methods (such as Tent, CoTTA, RMT) are not appropriate for the task. The novelty is also confusing: the reviewer cannot understand the necessity of building a complicated graph instead of directly injecting prototype information into the current data point. If I missed something important in the paper, please let me know.

Author Response

We greatly appreciate your detailed review. We're glad to see your kind recognition of the innovation of our pioneering framework for Continual Test-Time Adaptation in multi-task point cloud understanding.

In the following, we will address each of your concerns:

Q1: Typo. Thanks. We will carefully proofread the paper and revise all typos in the revision.

Q2: Learnable prototype in Fig 3(a). Sorry for the confusion. Each learnable prototype is designed to capture distinct features and characteristics of the target domain, paving the way for handling subsequent unknown testing data. These learnable prototypes are dynamic and adjusted during the adaptation process to better align with the source prototypes, which represent the source domains’ features.

Our APM pairs and mixes based on the similarity between the source-learnable prototypes and the current target, effectively incorporating all source domain information. Catastrophic forgetting is effectively mitigated by explicitly fusing inherent source representations (prototypes) and applying graph attention mechanisms to target features, thus achieving good stability of the models.
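To make the pairing-and-mixing idea concrete, here is a minimal, hypothetical PyTorch sketch; the function name, tensor shapes, and the 50/50 mixing ratio are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn.functional as F

def mix_prototypes(source_protos, learnable_protos, target_feat):
    """Illustrative sketch (hypothetical names/shapes): pair each frozen
    source prototype with its most similar learnable prototype, then mix
    the pairs with weights given by similarity to the current target.

    source_protos:    (S, D) pre-cached, frozen source-domain prototypes
    learnable_protos: (P, D) trainable prototypes for the target domains
    target_feat:      (D,)   pooled feature of the current test sample
    """
    # target-to-source similarity -> mixing weights over source prototypes
    w = torch.softmax(
        F.cosine_similarity(source_protos, target_feat.unsqueeze(0), dim=-1), dim=0
    )                                                     # (S,)
    # pair every source prototype with its nearest learnable prototype
    pair_sim = F.cosine_similarity(
        source_protos.unsqueeze(1), learnable_protos.unsqueeze(0), dim=-1
    )                                                     # (S, P)
    partner = learnable_protos[pair_sim.argmax(dim=1)]    # (S, D)
    # mix each pair (ratio assumed), then aggregate with the target weights
    mixed = 0.5 * source_protos + 0.5 * partner           # (S, D)
    return (w.unsqueeze(1) * mixed).sum(dim=0)            # (D,) mixed prototype
```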

Q3: CTTA setting. Strictly following typical CTTA methods [9, 37], we perform continual test-time adaptation on one additional new dataset consisting of two target domains. Our testing samples' sequence changes randomly and continuously between these two target domains, ensuring a fair comparison in line with the CTTA setting. We will clarify this in the revision.

Q4: Specific challenges of PCoTTA. Simply combining CTTA with point cloud learning faces great challenges. Firstly, compared to grid-structured 2D images, 3D point clouds are unordered and more challenging in CTTA. To address this, we specifically designed a graph attention mechanism to fully learn token sequences within a complete graph structure. This effectively captures contextual relationships and semantic features between unordered and irregular point cloud patches. Secondly, the catastrophic forgetting issue is underexplored for point cloud multi-task learning in continually varying domains, where the model would inevitably forget the knowledge of previously learned tasks while adapting to new ones. Particularly, in 3D point cloud understanding, different tasks are more challenging due to variations in data density and distribution, and it is harder to obtain a unified model that generalizes over raw 3D points compared to grid-like 2D images. To address this, we build a new CTTA benchmark for multi-task point cloud understanding, devise task-specific prototype banks including both source and learnable prototypes, and design the APM to explicitly fuse inherent source prototypes with the current target and apply graph attention mechanisms to target features, thus achieving good stability of the models. Note that our PCoTTA is specifically designed for 3D point clouds, but it has the potential to be adapted to 2D tasks, which we will explore in future work.

Q5: More analysis of the experimental results. We reproduced the CTTA method using PIC as the backbone network. In our benchmark, all sources are treated as one expanded dataset for PIC pre-training, enabling the integration of multi-domain information. Notably, our PCoTTA forms prompt-query pairs from two different sources.

PIC shows significant results and excels in multi-task scenarios, but it lacks specific designs for multi-domain learning. In contrast, our PCoTTA effectively addresses this by aligning the testing data features with the familiar source prototypes and dynamically updating learnable prototypes through the GSFS module.

To the best of our knowledge, no existing CTTA methods support multi-task learning. We propose to integrate CTTA and PIC to facilitate multi-task and multi-domain learning, and our novel design achieves significant improvements. In particular, our task-specific prototype bank enhances multi-task learning capabilities, and the Gaussian Splatted-based Graph Attention efficiently refines target data representation to align with source domains.

Q6: Learnable prototypes. Learnable prototypes are initialized as trainable parameters and designed to capture semantic features across target domains. In CPR, these prototypes become more distinct from each other through the repulsion loss L_pr, while the most similar prototype to the current target is drawn closer to the target features.

In general, this process can be viewed as end-to-end unsupervised clustering, with separation based on the number of potential target domains. Consequently, the number of learnable prototypes ideally approximates the number of target domains, though this is not strictly required.
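For intuition only, the sketch below shows one common way to write such an attract/repel objective in PyTorch; it is an assumed InfoNCE-style stand-in for the L_pr described above, with hypothetical names and temperature:

```python
import torch
import torch.nn.functional as F

def prototype_repulsion_loss(learnable_protos, target_feat, tau=0.1):
    """Assumed stand-in for the repulsion objective: attract the prototype
    most similar to the target, repel prototypes from one another.

    learnable_protos: (P, D) trainable prototypes
    target_feat:      (D,)   feature of the current test sample
    """
    z = F.normalize(learnable_protos, dim=-1)      # (P, D) unit vectors
    t = F.normalize(target_feat, dim=-1)           # (D,)
    pos = (z @ t).max() / tau                      # nearest prototype: attract
    # pairwise prototype similarities with the diagonal masked out: repel
    pair = (z @ z.t()) / tau                       # (P, P)
    mask = ~torch.eye(z.shape[0], dtype=torch.bool, device=z.device)
    neg = torch.logsumexp(pair[mask], dim=0)
    return neg - pos                               # minimizing separates prototypes
```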

We conducted an additional ablation study on the number of learnable prototypes, as shown in Table C of the rebuttal PDF, and the results indicate minimal changes. Additionally, we show the case with no learnable prototypes (i.e., a quantity of 0), where our method degrades to aligning the target feature by considering only the source prototypes' similarities. While this case achieves some degree of test-time adaptation, its performance falls short of our full PCoTTA.

Q7: More experiments to verify catastrophic forgetting and error accumulation. Thanks for your valuable suggestions. Please refer to Q3@Reviewer SkYG for more details.

Comment

I appreciate the detailed responses from the authors, and they solved my concerns. Some further comments:

  1. If no source prototype is given due to privacy protection, will the method still work?
  2. Although the graph network is used for 3D data, it is not the main contribution of the paper, and graph attention networks are not novel. I suggest the authors improve their presentation and make readers aware of how CTTA differs between 3D and 2D data. Otherwise, it would be a general method for any classification task.
Comment

Q1: Thanks for the comment. We'd like to point out that many recent TTA/CTTA papers [I, II, III] reveal that the use of source prototypes, such as features, tokens, and statistics from the source domain, does not pose privacy issues, and these can be utilized to further enhance adaptability on target data. Firstly, TTAC [I] calculates category-wise and global statistics of source domains as anchors and pre-saves them for streaming online test-time adaptation. Similarly, the published work [II] generates class-wise source prototypes before model deployment and presents an auxiliary task based on nearest source prototypes to align the source and target features. Besides, TPS [III] computes per-class prototypes of source domains, enabling the prototypes to be cached and reused for all subsequent test-time predictions.

Since our work is inspired by them, we strictly follow the same setting. In line with [I, II, III], our source prototypes are pre-cached before deployment and stored as constant parameters (token-like vectors) alongside the pre-trained source model. After deployment, our method does not access the source data, ensuring no interaction with the source data is involved in the test-time adaptation stage. Different from them, our method focuses on domain-level prototypes instead of class-level prototypes, avoiding pseudo labeling of categories and instead highlighting the inherent features of the entire domain. Please note that [I, II, III] are three examples of this common practice, and many more have emerged in recent years. We just wish to convey that this is indeed a well-recognized practice in the community.

Similar to [I, II, III], source prototypes are the key and indispensable information that we exploit to devise our methodology. Without them, our method is incomplete in addressing the test-time feature shifting during continual adaptation and would yield weaker results. We have also analyzed the cases without source prototypes in our ablation studies, as shown by models B and C in Table B of the rebuttal PDF. In this case, our method relies solely on learnable prototypes (i.e., an incomplete framework), achieving a certain degree of adaptation. Although this reduces our method's effectiveness, it still outperforms CoTTA [37]. Specifically, CoTTA achieves 58.3, 56.7, and 55.2 over the 3 different rounds, while our PCoTTA achieves 36.8, 36.2, and 35.7 (lower is better), demonstrating superiority and state-of-the-art performance in continual test-time adaptation for 3D point clouds.

In revision, we will improve the clarity regarding it in the paper.

[I] Su et al. Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering. In NeurIPS 2022.

[II] Choi et al. Improving test-time adaptation via shift-agnostic weight regularization and nearest source prototypes. In ECCV 2022.

[III] Sui et al. Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv preprint, 2024.

Q2: Thanks for the suggestion. Firstly, we would like to stress that our main contribution is that we introduce a new multi-task continual test-time adaptation task for point cloud understanding and present the first, pioneering framework, PCoTTA, for this new task, rather than merely a graph attention module. The proposed PCoTTA improves the point cloud model's adaptability and robustness in continuously changing domains by aligning the target domains with all source domains. Besides, we also present a new multi-task benchmark for this new CTTA setting.

Secondly, although graph attention has been previously studied in the domain adaptation field, it has not been exploited in continual test-time adaptation, particularly for 3D data. Graph attention is effective at capturing contextual relationships between nodes in 2D images, as pixels are regularly distributed, but it may be less effective on unordered 3D data. Considering this, we formulate a Gaussian kernel function to calculate attention coefficients based on token similarity, which does not require ordered input and accommodates unordered point cloud patches, thus characterizing 3D point cloud data. We then devise Gaussian Splatted-based Graph Attention that takes the calculated attention coefficients as input and fuses source-learnable prototype pairs with the target data features, aligning them with the sources. This enables comprehensive, patch similarity-based adaptation for effective CTTA in 3D point cloud understanding.
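As a rough illustration of the Gaussian-kernel coefficients described above (not the paper's exact module), consider this minimal sketch; the kernel bandwidth and the row normalization are assumptions:

```python
import torch

def gaussian_attention_coeffs(tokens, sigma=1.0):
    """Illustrative sketch: attention coefficients over a complete graph of
    patch tokens via a Gaussian kernel on pairwise distances. Because the
    kernel depends only on similarity, no ordering of patches is required.

    tokens: (N, D) unordered point cloud patch tokens
    """
    d2 = torch.cdist(tokens, tokens, p=2).pow(2)      # (N, N) squared distances
    coeffs = torch.exp(-d2 / (2.0 * sigma ** 2))      # closer tokens weigh more
    return coeffs / coeffs.sum(dim=-1, keepdim=True)  # row-normalized weights

# the coefficients could then aggregate neighbor features, e.g.
# fused = gaussian_attention_coeffs(tokens) @ tokens
```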

Lastly, the key difference between a 3D point cloud and a 2D image is that 3D data is disordered, unstructured, and sparsely distributed, making traditional 2D image methods less effective or even inapplicable. As aforementioned, our method involves specific designs for 3D point cloud data, which may need extra adjustments when applied to 2D images and may lead to weaker performance there.

Comment

Thank you again; we'd like to provide a little further clarification here.

Below we show that our setting belongs to CTTA and justify the exploitation of source prototypes. (Our method, though incomplete without source prototype information, still works and still outperforms CoTTA [37]; please see our last response above.)

This can be evidenced by recent CTTA works [IV, V, VI] that involve source prototype information during continual test-time adaptation. For example, RMT [IV] extracted source prototypes of each class and pre-cached them before the adaptation, then used the source prototypes to calculate a contrastive loss during continual test time. Similarly, SANTA [V] also pre-computed the source prototypes before the adaptation and used them for target alignment during continual test-time adaptation. Besides, OBAO [VI] proceeded in a similar manner and acknowledged that this is fair in the CTTA setting. As pointed out on Page 8 of this ECCV paper [VI], "Some previous methods [IV, V] directly penalize the movement of target domain samples in the feature space relative to the source prototypes. This can be broadly interpreted as penalizing the movement of corresponding elements between ˆV and Vt in our defined CRG."

Since we are inspired by and follow these works, our setting belongs to the CTTA category. We will revise the paper to re-clarify this issue and eliminate misunderstandings.

Besides, we reproduce two CTTA methods, RMT [IV] and SANTA [V], which use source prototypes, and present the comparison results in the table below, demonstrating that our method still outperforms these CTTA methods with source prototypes. The reasons for the superiority over these methods lie in three aspects, shown below.

  1. Usually, these methods rely heavily on the student-teacher architecture to realize consistency regularization. As a result, they inevitably introduce pseudo label noise, leading to error accumulation. Although they use symmetric cross-entropy or other techniques to alleviate the pseudo label noise, such problems still exist and cannot be fundamentally addressed. In contrast, our PCoTTA framework does not use any online or offline pseudo labeling techniques [9, 37], which inherently avoids the risk of error accumulation. In Table 1 of the manuscript, the results across 3 continuous rounds also illustrate the effectiveness in avoiding error accumulation.

  2. These methods are specifically designed for CTTA on 2D images and perform well there. However, compared to 2D images, 3D point cloud data is disordered, unstructured, and sparsely distributed, making these 2D image-based CTTA methods less effective or even inapplicable. Our method involves specific designs for 3D point cloud data, e.g., Gaussian Splatted-based Graph Attention for comprehensive, patch similarity-based adaptation, which is well-suited to 3D data and achieves better performance than these methods.

  3. These methods often focus on single tasks and all lack specialized designs for multi-task learning, which may lead to gradient conflicts in the optimization process of continual test-time adaptation. Instead, our PCoTTA devises task-specific prototype banks where individual source-learnable prototype pairs are used for different adaptations in each task, thus favoring multi-task learning in our setting.

(Each cell reports Rec./Den./Reg.; lower is better.)

| Methods | Round 1: ModelNet40 | Round 1: ScanObjectNN | Round 2: ModelNet40 | Round 2: ScanObjectNN | Round 3: ModelNet40 | Round 3: ScanObjectNN |
| --- | --- | --- | --- | --- | --- | --- |
| RMT [IV] | 31.2/44.0/34.3 | 47.4/59.6/39.9 | 30.6/43.5/33.9 | 45.6/53.0/35.8 | 30.4/42.7/33.8 | 45.9/51.1/36.4 |
| SANTA [V] | 32.3/42.1/37.8 | 44.9/55.2/38.6 | 31.7/41.9/37.4 | 42.0/53.4/35.6 | 30.1/41.6/36.4 | 40.6/52.9/34.7 |
| Ours | 6.3/21.4/15.4 | 8.9/28.3/20.7 | 5.5/19.9/14.6 | 8.5/26.9/19.6 | 5.4/18.6/12.1 | 8.2/25.2/19.3 |

[IV]. Döbler et al. Robust Mean Teacher for Continual and Gradual Test-Time Adaptation. In CVPR 2023.

[V]. Chakrabarty et al. SANTA: Source Anchoring Network and Target Alignment for Continual Test Time Adaptation. In TMLR 2023.

[VI]. Zhu et al. Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation. In ECCV 2024.

Comment

Dear Reviewer 5Vdz,

We thank you for your time in providing further comments.

We have carefully responded to them. Could you please take a few minutes to review these responses? If anything is unclear, we are happy to clarify further.

Thank you

Comment

Dear Reviewer 5Vdz,

We thank you so much for raising your rating and for your support. We will revise the paper based on your constructive comments.

Review
Rating: 5

The paper introduces an innovative and unified framework for Continual Test-Time Adaptation in multi-task point cloud understanding, covering reconstruction, denoising, and registration. The framework integrates three new modules for different purposes: automatic prototype mixture for preventing catastrophic forgetting, Gaussian Splatted feature shifting for mitigating error accumulation, and contrastive prototype repulsion for implicitly learning distinctive features. Experimental results on public datasets demonstrate the state-of-the-art performance of the proposed framework and the effectiveness of the proposed modules.

Strengths

  1. Writing quality is good. The paper is well-structured and clearly written.
  2. SOTA performance. The proposed framework outperforms the state of the art by an impressively large margin on three 3D point cloud understanding tasks: reconstruction, denoising, and registration.
  3. Ablations. Ablation experiments are provided to verify the effectiveness of the proposed modules.

Weaknesses

  1. Insufficient explanations and verifications. Although the results presented in Table 3 reveal the performance improvement achieved by each proposed module, their claimed purposes, such as preventing catastrophic forgetting and error accumulation, are not directly evident from these numbers. More specifically, the reader may not be able to judge from these values why the APM can resolve catastrophic forgetting. The authors should provide deeper analysis and more effective verification/visualization to support the claimed effects.
  2. Counterintuitive "source prototype estimation". If I did not get it wrong, the tokens output from PointMAE should be unordered and irregular. How and why can the prototype be calculated by averaging all tokens without considering their permutations?

Questions

Please refer to the weaknesses above.

Limitations

The limitation is included in the paper.

Author Response

We deeply appreciate your thorough review. We are pleased to see your kind recognition of the innovation of our framework for Continual Test-Time Adaptation in multi-task point cloud understanding and our three new modules. Additionally, we appreciate your acknowledgment of the effectiveness demonstrated through extensive experimental results.

In the following, we will address each of your concerns:

Q1: More experiments to verify catastrophic forgetting and error accumulation. As for the issue of catastrophic forgetting, we have verified it with both quantitative and qualitative experiments. As shown in Table 1 of the manuscript, our continual test-time adaptation setting is similar to the one in CoTTA [37], where we track the continuous performance across 3 independent evaluation rounds, with samples shuffled randomly in each round. The results in 3 rounds demonstrate the stability, i.e., robust continuous online learning abilities, and resilience against catastrophic forgetting in continually varying target domains. Moreover, we provide T-SNE visualizations for three independent validation rounds in Figure A and task-specific visualizations for each round in Figure B in the rebuttal PDF. Our method remains stable across continuous rounds, demonstrating that our proposed APM and GSFS effectively mitigate catastrophic forgetting by explicitly leveraging constant source prototypes and source domain representations, thereby avoiding over-reliance on adaptively learned information. Finally, our APM pairs and mixes based on the similarity between the source-learnable prototypes and the current target, effectively incorporating all source domain information. As a result, catastrophic forgetting is effectively mitigated by explicitly fusing inherent source representations (prototypes) and applying graph attention mechanisms to target features, thus achieving good stability of our models.

As for the issue of error accumulation, we have verified it both theoretically and empirically. Firstly, as evidenced by [38], pseudo labels tend to introduce pseudo label noise and could lead to error accumulation in long-term adaptation. In contrast, in our PCoTTA framework, we do not use any online or offline pseudo labeling techniques [9, 37], which inherently avoids the risk of error accumulation. In Table 1 of the manuscript, the results across 3 continuous rounds also illustrate the effect in avoiding error accumulation.

Q2: Source prototype estimation. Sorry for the confusion. Following Point-MAE [26], our method reshuffles the patch-wise tokens after the Transformer decoder to maintain their order. Additionally, similar to PIC [9], during the testing, our mask is applied to the query target (i.e., the test output), and therefore, patch-wise token shuffling is not needed. Averaging all tokens to form a source prototype ensures a general and comprehensive representation of source domains, avoiding bias from individual samples. This permutation-invariant method effectively summarizes the domain’s information. We will improve the clarity in the revision.
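A minimal sketch of this permutation-invariant estimation is given below, under assumed interfaces; `model.encode` and the loader contents are hypothetical placeholders, not the paper's actual code:

```python
import torch

@torch.no_grad()
def estimate_source_prototype(model, source_loader, device="cuda"):
    """Illustrative sketch: a domain-level prototype as the mean of all
    patch tokens over a source domain. Mean pooling is permutation-
    invariant, so the token order within each sample is irrelevant.
    """
    total, count = None, 0
    for points in source_loader:                   # (B, N, 3) point clouds
        tokens = model.encode(points.to(device))   # assumed: (B, T, D) tokens
        s = tokens.mean(dim=1).sum(dim=0)          # average tokens, sum batch
        total = s if total is None else total + s
        count += tokens.shape[0]
    return total / count                           # (D,) domain prototype
```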

Comment

I am glad to have the authors' feedback. The rebuttal somewhat addressed my concerns (Q2). However, Q1 remains unresolved, since I was asking about evidence of the underlying effect of each component, not the ultimate architecture. For instance, there should be a visualization comparison between w/ and w/o APM to see if the samples indeed avoid unsatisfactory alignments. Thus I hold my rating.

Comment

Dear Reviewer ni1v,

We thank you for your time in reviewing our paper.

We have carefully responded to your concerns in the rebuttal. If possible, could you please take a few minutes to review these responses? If anything is unclear, we are happy to clarify further.

Thank you

Comment

Thank you for your valuable comment.

As per your comment, we have just conducted the experiment with and without APM using T-SNE visualization, which clearly indicates undesirable alignment between targets and sources when APM is not used. This is also evidenced by the quantitative results of models B and C in Table B of the rebuttal PDF.

Since the T-SNE visualization figure cannot be attached here, we will include it in the revision.

Review
Rating: 6

This paper introduces a novel framework designed to enhance model transferability in continually changing target domains for multi-task point cloud understanding. The framework, termed PCoTTA, includes three key components: Automatic Prototype Mixture (APM), Gaussian Splatted Feature Shifting (GSFS), and Contrastive Prototype Repulsion (CPR). These components work synergistically to prevent catastrophic forgetting, mitigate error accumulation, and ensure the distinguishability of prototypes during adaptation. The authors present comprehensive experimental results demonstrating the superiority of PCoTTA over existing methods across multiple tasks, including point cloud reconstruction, denoising, and registration.

Strengths

[1] Innovative Framework: The introduction of PCoTTA is pioneering in the field of continual test-time adaptation for multi-task point cloud understanding. The framework's design is both practical and realistic, addressing a significant gap in the current state of research.
[2] New Benchmark: The creation of a new benchmark for practical continual test-time adaptation in multi-task point cloud understanding is a valuable contribution to the field, facilitating future research and comparison.
[3] Experimental Validation: The paper provides extensive experimental results across multiple tasks and domains, demonstrating the effectiveness and superiority of the proposed method.
[4] Writing Quality: This paper is well written and organized.

Weaknesses

[1] Limited Task Variety: While the paper demonstrates the framework's effectiveness, it could benefit from including a broader range of point cloud tasks to further validate its versatility and robustness, e.g., a traditional domain adaptation task or a classification task on ModelNet -> ScanObjectNN.
[2] Real-World Application: Although the framework is tested on both synthetic and real-world datasets, more discussion of real-world applicability and potential challenges in diverse practical scenarios would strengthen the paper.
[3] Efficiency Metrics: The paper primarily focuses on the effectiveness of the framework. It would be beneficial to provide more analysis of throughput/inference speed.
[4] Comparison with Broader Techniques: It would be interesting to include more point cloud understanding methods, like PointNext, and domain adaptation techniques.

Questions

Refer to the weaknesses above. I will raise my rating if the weaknesses are addressed.

Limitations

Refer to the weaknesses above.

Author Response

We greatly appreciate your thorough review and valuable feedback. We are pleased that you recognized the novelty and practical application of our PCoTTA framework for continual test-time adaptation in multi-task point cloud understanding. We appreciate your acknowledgment of our new benchmark aimed at advancing future research. Additionally, we thank you for your confirmation that our experimental validation and clear writing effectively demonstrated the superiority of our method.

In the following, we will address each of your concerns:

Q1: Other tasks. Thanks for your valuable suggestions. Following PIC [11], our PCoTTA is fundamentally designed for regression tasks, making them 'unified' with a position output (x, y, z) and a single loss. This focus on regression tasks inherently limits its applicability to discrimination tasks such as classification. Honestly, multi-task learning in point clouds is still at an early stage, and our focus does not lie in unifying as many tasks as possible. In future work, we would like to specifically enhance the diversity of tasks by developing models applicable to other tasks like classification.

Q2: Real-world application. Thanks. Our method shows potential for many real-world applications, e.g., autonomous driving and virtual reality, as indicated by the analysis of its computational efficiency, e.g., number of parameters and running time, in Table D. Since our PCoTTA is an end-to-end test-time adaptation method that does not employ a teacher-student model or pseudo labeling technique, it is more efficient and suitable for real-time deployment.

However, other tasks like classification are not considered, since we follow PIC's multi-task setting. As future work, we would like to investigate how to enhance the diversity of point cloud understanding tasks within a single framework. In addition, though our model is efficient (0.06 seconds at inference), we still need to consider the computing power of devices and may need to further enhance its efficiency by reducing the size of the backbone model.

Q3: Efficiency metrics. We present an analysis of model parameters and running time in Table D of the rebuttal PDF. The results show that our method can infer target data at a fast speed, and our model has the fewest parameters compared to other CTTA methods.

Q4: More comparisons. As suggested, we compare our PCoTTA with PointNext. Under the same setting, we evaluate it on our new benchmark, where it is trained on all source domains and tested on unseen targets over 3 independent evaluation rounds. Moreover, we reproduced ViDA [1], a specialized method for CTTA. Table A shows our method's superiority.

[1] Liu et al. ViDA: Homeostatic Visual Domain Adapter for Continual Test-Time Adaptation. In ICLR 2024.

Comment

Thanks for the responses. I keep my rating. Good luck!

Comment

We thank you so much for your positive support. We will definitely revise our paper accordingly based on your valuable comments.

Review
Rating: 5

This paper presents a new point cloud benchmark for Continual Test-Time Adaptation (CTTA) and compiles relevant 3D datasets. Additionally, this paper devises three innovative modules for PCoTTA, including automatic prototype mixture (APM), Gaussian splatted feature shifting (GSFS), and contrastive prototype repulsion (CPR) strategies, to collectively address the issues of catastrophic forgetting and error accumulation in CTTA tasks.

Strengths

  1. The point cloud CTTA dataset compiled in this paper is highly significant.

  2. The idea of using Automatic Prototype Mixture to avoid catastrophic forgetting is sensible, but it does not align well with the standard CTTA setting, which cannot access source domain data.

  3. Good writing ensures that the contributions of the paper are clearly understandable.

Weaknesses

  1. My main concern is that the method design violates the basic setting of the CTTA task. The CTTA task stipulates that source domain data cannot be accessed, to better simulate real-world applications and ensure data privacy. However, this paper utilizes source prototypes, which are derived from both the source domain data and the source model, thus not complying with this restriction. Additionally, the comparison is unfair: if this paper is allowed to use source prototypes, then other CTTA methods should also be allowed to use them.

  2. The paper is severely missing related works and the necessary CTTA baselines for comparison, including [a], [b], [c], [d], etc.
     [a] EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-distilled Regularization.
     [b] ViDA: Homeostatic Visual Domain Adapter for Continual Test-Time Adaptation.
     [c] Towards Stable Test-Time Adaptation in Dynamic Wild World.
     [d] Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.

  3. Relying solely on T-SNE and the improvement in main experiment accuracy to validate the method's solution to catastrophic forgetting and error accumulation is insufficient. More extensive experiments and theoretical proof are needed to substantiate your claims.

  4. The ablation study should include scores for using only the contrastive prototype repulsion or Gaussian splatted feature shifting individually to highlight the significance of each contribution. Additionally, contrastive prototype repulsion should not be considered an independent contribution, as it is a well-established technique.

  5. The visualizations in the main text should include comparisons with other CTTA methods.

Questions

If the authors address the issues mentioned in the weaknesses, I am willing to increase the rating.

Limitations

yes

Author Response

We would like to express our sincere gratitude for your review. We are pleased that you recognized the significance of our compiled point cloud CTTA dataset and the novelty of our three modules to address catastrophic forgetting and error accumulation. Additionally, we are glad that the clarity of our writing ensured a clear understanding of our paper's contributions.

In the following, we will address each of your concerns:

Q1: CTTA setting. Thanks. We acknowledge that the CTTA task prohibits access to source domain data to ensure real-world applicability and data privacy. Strictly following this protocol, our PCoTTA, which utilizes source prototypes, is designed carefully to ensure a fair comparison. The source prototypes are derived exclusively from the source model, which is pre-trained before any adaptation. Source prototypes leverage the structured knowledge embedded within the source model, aiming to improve the robustness and efficiency of the adaptation without compromising data privacy or violating the setting of CTTA. This is analogous to utilizing the parameters (prompts) of a pre-trained source model [9] in CTTA, and is a common practice in CTTA.

Q2: Missing related works. Following your advice, we will include these papers in the revised version. Since [a], [c], and [d] do not have official open-source code or models available, we were unable to compare our results with them. We reproduced [b] in our setting and evaluated it on our proposed benchmark. The results are shown in Table A, indicating that [b] just performs similarly to CoTTA [37], and our method outperforms it significantly. We attribute this to its focus on domain adaptation without specific designs for multi-task learning, which prevents it from outperforming our method.

Q3: More experiments to verify catastrophic forgetting and error accumulation. As for the issue of catastrophic forgetting, we have verified it with both quantitative and qualitative experiments. As shown in Table 1 of the manuscript, our continual test-time adaptation setting is similar to the one in CoTTA [37], where we track the continuous performance across 3 independent evaluation rounds, with samples shuffled randomly in each round. The results in 3 rounds demonstrate the stability, i.e., robust continuous online learning abilities, and resilience against catastrophic forgetting in continually varying target domains. Moreover, we provide T-SNE visualizations for three independent validation rounds in Figure A and task-specific visualizations for each round in Figure B in the rebuttal PDF. Our method remains stable across continuous rounds, demonstrating that our proposed APM and GSFS effectively mitigate catastrophic forgetting by explicitly leveraging constant source prototypes and source domain representations, thereby avoiding over-reliance on adaptively learned information. Finally, our APM pairs and mixes based on the similarity between the source-learnable prototypes and the current target, effectively incorporating all source domain information. As a result, catastrophic forgetting is effectively mitigated by explicitly fusing inherent source representations (prototypes) and applying graph attention mechanisms to target features, thus achieving good stability of our models.

As for the issue of error accumulation, we have verified it both theoretically and empirically. Firstly, as evidenced by [38], pseudo labels tend to introduce pseudo label noise and could lead to error accumulation in long-term adaptation. In contrast, in our PCoTTA framework, we do not use any online or offline pseudo labeling techniques [9, 37], which inherently avoids the risk of error accumulation. In Table 1 of the manuscript, the results across 3 continuous rounds also illustrate the effect in avoiding error accumulation.

Q4: More ablations and clarification of CPR. Thanks. As per your constructive advice, we present additional ablation studies in Table B, evaluating the use of CPR and GSFS individually. These results clearly demonstrate the incremental benefits of each component and their combined effect on improving performance.

While contrastive learning is a well-established technique, our CPR leverages domain prototypes (the new knowledge), introducing a novel aspect to the method. Traditional contrastive learning focuses on instance-level representations, whereas our approach innovates on the use of domain-level learnable prototype interactions. Equipped with the proposed CPR and learnable prototypes, our method provides structured and informative representations of various target domains to guide the adaptation process. We will revise this in the new version.

Q5: Visualizations comparisons with other CTTA methods. Thanks. We have provided the visual results of other CTTA methods in Appendix A.3. Our PCoTTA excels in producing high-quality predictions across multiple tasks, even as the target domain changes. This is due to our three innovative modules, which minimize discrepancies between source and target domains, enhancing overall prediction quality. We will include these visualizations in the revised paper.

Comment

I appreciate the author’s response and the additional experiments, but I still have some unresolved concerns.

1) CTTA setting problem: First, I am very clear about the description of the CTTA setting in papers [9] and [37]. In CTTA, it is permissible to use source domain pre-trained model parameters. So, I would like to ask whether the source prototypes you used are the features/tokens extracted by the model or the model's parameters/token-wise visual prompts. As you described in Line 158, "we save all source prototypes Zs derived from the model at the last epoch," so you not only used the model parameters but also the source prototypes. In the CTTA setting, only the pre-trained source model is allowed to be accessed; you cannot access any part of the source model's training process, as this would still involve interacting with the source data. How did you obtain the source prototypes? For example, if the source model is already deployed on an end-device, how do you obtain the source prototypes? Existing source-free TTA and source-free CTTA methods [35, 37, a, b] do not access any source domain features/tokens. Therefore, I suggest that the method proposed in this paper should not be compared with source-free methods.

2) Paper [a] has official open-source code available at https://github.com/Lily-Le/EcoTTA.

3) What are the details of the reproduction of paper [b] in Table A? Why is it performing worse than the source model?

If my remaining concerns can be addressed, I am willing to improve my rating.

Comment

Thanks for your comments. We would like to further address your concerns as follows:

Q1: Sorry for causing confusion. We confirm that the source prototypes we used are the tokens extracted by the model during the source pretraining stage, which are pre-computed and pre-saved on the device for test-time adaptation. We'd like to stress that this does not violate the fairness of the CTTA setting.

In fact, many recent TTA/CTTA papers [I, II, III] reveal that the use of source prototypes, such as features, tokens, and statistics from the source domain, does not pose privacy issues, and these can be utilized to further enhance adaptability on target data. Firstly, TTAC [I] calculates category-wise and global statistics of source domains as anchors and pre-saves them for streaming online test-time adaptation. Similarly, the published work [II] generates class-wise source prototypes before model deployment and presents an auxiliary task based on nearest source prototypes to align the source and target features. Besides, TPS [III] computes per-class prototypes of source domains, enabling the prototypes to be cached and reused for all subsequent test-time predictions.

Since our work follows and is inspired by them, we strictly adhere to the same setting. In line with [I, II, III], our source prototypes are pre-cached before deployment and stored as constant parameters (token-like vectors) alongside the pre-trained source model. After deployment, our method does not access the source data, ensuring no interaction with the source data is involved in the test-time adaptation stage. Different from them, our method focuses on domain-level prototypes instead of class-level prototypes, avoiding pseudo labeling of categories and instead highlighting the inherent features of the entire domain.
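In code terms, such pre-cached prototypes behave like constant buffers shipped with the checkpoint, in contrast to the trainable target prototypes; the class below is a hypothetical sketch of this distinction, not the paper's implementation:

```python
import torch
import torch.nn as nn

class PrototypeBank(nn.Module):
    """Hypothetical sketch: frozen source prototypes stored as buffers
    (saved with the checkpoint, never updated, no source data needed at
    test time), next to trainable target-domain prototypes."""

    def __init__(self, source_protos, num_learnable, dim):
        super().__init__()
        self.register_buffer("source", source_protos)  # (S, D), constant
        self.learnable = nn.Parameter(torch.randn(num_learnable, dim) * 0.02)
```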

As per your constructive advice, we will carefully re-clarify this issue in the revised manuscript to eliminate misunderstandings.

[I] Su et al. Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering. In NeurIPS 2022.

[II] Choi et al. Improving test-time adaptation via shift-agnostic weight regularization and nearest source prototypes. In ECCV 2022.

[III] Sui et al. Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv preprint, 2024.

Q2: Thank you for the information. We have also noticed this link; however, it is a community implementation rather than the official code. Additionally, according to the issue mentioned at https://github.com/Lily-Le/EcoTTA/issues/1#issuecomment-1667177037, the repository's author noted that the replication was unsatisfactory and unsuccessful. Nonetheless, we still tested it on our new benchmark and obtained sub-optimal results. Therefore, we chose not to include these results in the rebuttal to avoid any biased comparisons. We have cited this inspiring work in the revision and will endeavor to reproduce EcoTTA for comparisons in the future.

Q3: Apologies for the confusion. Firstly, PointNext in Table A is another point cloud understanding method that we reproduced under the multi-task setting, as suggested by Reviewer ynhY for additional comparisons. It is not the source model for ViDA [b].

Since we are following the same settings as CoTTA [37] for reproduction, we also use PIC [9] as the source model for ViDA and thus equip ViDA with multi-tasking and multi-domain learning capabilities. As such, all these CTTA methods will start from the same source pre-trained model and ensure fair comparisons.

From the table, we can observe that ViDA indeed achieves better results than PIC, but it does not outperform our method. The reasons for the superiority of our PCoTTA over ViDA lie in several aspects. Firstly, ViDA employs a teacher-student framework and uses a consistency loss where the teacher model's predictions serve as the pseudo labels for the student model. However, as evidenced by [38], pseudo labels tend to introduce pseudo label noise and could lead to error accumulation in long-term adaptation. In contrast, our PCoTTA framework does not use any online or offline pseudo labeling techniques [9, 37], which inherently avoids the risk of error accumulation. Secondly, ViDA is specifically designed for CTTA on 2D images and cannot effectively tackle CTTA on 3D data. This is because, compared to grid-structured 2D images, 3D point clouds are unordered and high dimensional, which is more challenging in CTTA. To address this, we specifically designed a graph attention mechanism to fully learn token sequences within a complete graph structure. This effectively captures contextual relationships and semantic features between unordered and irregular point cloud patches.

Comment

Thank you for your response.

  1. The paper "Continual Test-Time Domain Adaptation [37]" was the first to set the CTTA problem, and subsequent works have followed its setup. In [37] Table 1, it is clearly stated that the CTTA setting involves "No Source" and "No Train stage." Therefore, I believe that in the correct CTTA setting, the source domain model's training process should not be accessed. Additionally, once the source prototypes are obtained, is it possible to infer information about the source domain data? Therefore, I suggest comparing this paper with non-source-free TTA/CTTA methods. BTW, you could directly compare it with papers that use source features, as the baselines compared in this paper do not access source features.

  2. The reproduction details for paper [b] are still unclear. For example, where is the adapter injected? How is uncertainty obtained from the 3D data?

Comment

Q1: Thank you! In the CTTA setting of [37] (Table 1), "No Source" means no source data (x, y) is accessible during the test-time adaptation stage, and for 3D point cloud task, such data indicates point cloud inputs with coordinates (x, y, z) which are prohibited for use in the adaptation stage. As a matter of fact, it does not restrict the use of prototype features extracted by the source pre-trained model. We would like to state that our setting is fair since we do not use source data and we follow the common practice of [I, II, III] that source prototypes are pre-cached before deployment and stored as constant parameters alongside the pre-trained source model, and then use them for test-time adaptation. In general, the pre-cached source prototypes are derived from the source pre-trained model and can be regarded as part of the source pre-trained model. Please note [I, II, III] are three examples of this common practice, and there are many emerging in recent years. We just wish to convey that this is indeed a well-recognized practice in the community.

On the other hand, our proposed PCoTTA method still shows great superiority without the use of source prototypes, to the original CoTTA [37]. Take the point cloud reconstruction task as an example, as shown by model B (without source prototypes) in Table B of the rebuttal PDF and CoTTA’s results in Table 1 of the manuscript, where CoTTA [37] achieves 58.3, 56.7, 55.2 during the 3 different rounds, while our PCoTTA achieves 36.8, 36.2, and 35.7, demonstrating superiority and the state-of-the-art performance in continual test-time adaptation for 3D point cloud.

Thank you very much, and we hope the above can address your concerns.

Q2: Sorry for the confusion. We reproduce ViDA [b] for 3D point cloud understanding, strictly following its original setup. High-rank and low-rank ViDAs are injected into all layers of the source model (i.e., the PIC pre-trained model) and scaled using scale factors. Since ViDA is a 2D image method, we apply typical 3D data augmentations like rotation and scaling for training the teacher-student model. Like ViDA, we calculate uncertainty values and scale factors using the mean and variance of model outputs over several augmentations. However, instead of using predicted probabilities as in classification tasks, we employ the position offsets of outputs for our point cloud understanding tasks. Furthermore, given that the Chamfer Distance (CD) loss is used in our point cloud understanding tasks (essentially regression tasks), we optimize the teacher-student model using the CD loss as the consistency loss, instead of the cross-entropy loss typically used in classification tasks. We will add these implementation details in the revision.

Comment

The authors' response still hasn't resolved my concern. The three articles you cited [I, II, III] are not about CTTA, so how can you claim that "many recent papers of TTA/CTTA" use this approach? I have not found any recent CTTA papers that use source features; if I am mistaken, please correct me.

Meanwhile, I am not questioning the effectiveness of your method. You can use source features/tokens, but your method is no longer in a traditional CTTA setting. You need to compare it with baseline methods that use source features/tokens. If you can supplement the experiment and still validate the advantages of your method, I believe it could resolve my main concern. However, a detailed description of method reproduction and comparison is necessary.

However, I find the experiment provided in Table B of the rebuttal PDF very meaningful, as it demonstrates that PCoTTA can still work without using the source prototype. That said, putting aside the use of the source prototype, why does it perform so much better than CoTTA [37]? In the CoTTA reproduction, how was the test-time augmentation of the teacher model reproduced, and how many forward passes did the teacher model perform? How was CoTTA's test-time augmentation strategy of resizing the input resolution implemented in 3D? How were the pseudo labels selected?

Comment

Thank you for the further comments. We would like to address your concerns from 4 aspects as follows:

Q1: About the CTTA setting. Apologies if there is any confusion. Certainly, we fully agree that our manner is not exactly the same as in the original paper [37], given that [37] does not use source prototypes. We meant to convey that [I, II, III] and further published CTTA works [IV, V, VI] confirm this common practice of using source prototypes, and that our setting still belongs to the CTTA category. This can be evidenced by the recent CTTA works [IV, V, VI] that involve source prototype information during continual test-time adaptation.

For example, RMT [IV] extracted source prototypes of each class and pre-cached them before the adaptation, then used the source prototypes to calculate a contrastive loss during continual test time. Similarly, SANTA [V] also pre-computed the source prototypes before the adaptation and used them for target alignment during continual test-time adaptation. Besides, OBAO [VI] proceeded in a similar manner and acknowledged that this is fair in the CTTA setting. As pointed out on Page 8 of this ECCV paper [VI], "Some previous methods [IV, V] directly penalize the movement of target domain samples in the feature space relative to the source prototypes. This can be broadly interpreted as penalizing the movement of corresponding elements between ˆV and Vt in our defined CRG."

Since we are inspired by and follow these works, we think our setting belongs to the CTTA category. We will revise the paper to re-clarify this issue and eliminate misunderstandings. Thank you again.

[IV]. Döbler et al. Robust Mean Teacher for Continual and Gradual Test-Time Adaptation. In CVPR 2023.

[V]. Chakrabarty et al. SANTA: Source Anchoring Network and Target Alignment for Continual Test Time Adaptation. In TMLR 2023.

[VI]. Zhu et al. Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation. In ECCV 2024.

Q2: About the comparisons with CTTA methods w source prototypes. As per your constructive suggestions, we are now reproducing several recent methods that leverage source prototypes in CTTA setting. We will post the results later with the reproduction details here once finished. We will also include them and more comparisons in the revision.

Q3: About the rationale of prototypes in CTTA. Following the typical prompt-based CTTA method VDP [9], we also pretrained the source model on the source domains for three epochs to initialize and pre-cache the learnable prototypes before the target adaptation (as mentioned in Section 4.1, line 233 of the manuscript). As such, the learnable prototypes have some knowledge basis of the source domains and can guide the target data shifting without the source prototypes, resulting in comparable performance. However, it is worth noting that our PCoTTA suffers a considerable performance drop when source prototypes are not used, indicating the effectiveness of our exploitation of source prototype information.

Q4: About more reproduction details. For teacher-student model-based 2D CTTA methods, including CoTTA [37] and ViDA [b], we replace test-time augmentations for 2D images, like resizing and flipping, with typical 3D data augmentation techniques such as rotation and scaling. We follow the official settings for the number of teacher-model forward passes, as indicated in their official code: 32 for CoTTA and 10 for ViDA. Similar to the ViDA reproduction, the pseudo labels in CoTTA's 3D implementation are the output point clouds, as it handles 3D point regression tasks. We treat each point as a category-like pseudo label and use the Chamfer Distance (CD) loss to optimize the teacher-student model.
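For concreteness, a minimal sketch of the symmetric Chamfer Distance used as such a consistency loss is shown below; the usage line in the trailing comment is hypothetical, not the exact reproduction code:

```python
import torch

def chamfer_distance(pred, target):
    """Symmetric Chamfer Distance between two point sets, usable as a
    teacher-student consistency loss in place of cross-entropy.

    pred, target: (B, N, 3) point clouds
    """
    d = torch.cdist(pred, target, p=2).pow(2)          # (B, N, N) pairwise
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

# hypothetical consistency step: teacher output as the regression target
# loss = chamfer_distance(student(augment(points)), teacher(points).detach())
```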

Comment

Thank you for your response. I look forward to your comparison with other methods that utilize source knowledge/information. I believe this is the fairest experiment to highlight the effectiveness of your approach, as the amount of information used by each method is consistent.

Comment

Thank you!

We reproduce two CTTA methods, RMT [IV] and SANTA [V], which use source prototypes. In revision, we will include them and add more comparisons.

Q1. About the reproduction details. The reproduction details of these methods are as follows:

  1. We replace their class-level source prototypes with our domain-level token-like prototypes for our point regression tasks.
  2. In RMT [IV], we use symmetric Chamfer Distance to match their symmetric cross-entropy in L_SCE. Like other teacher-student CTTA methods we reproduced, we replace 2D image augmentation with 3D data augmentation (i.e., rotation and scaling) for their contrastive learning and teacher model training. We follow the official code for other settings in the teacher-student model, but we reduce the warm-up phase to 5000 samples since our benchmark is much smaller than ImageNet.
  3. For SANTA [V], we use similar 3D augmentation (rotation, scaling) to generate augmented samples for their Source Guided Target Alignment (a source prototype-based contrastive learning similar to RMT). Following the setting outlined in their paper, we only update the BatchNorm layer parameters in the source model during adaptation.

Q2: About comparison results and analysis. We present the comparison results in the table below, demonstrating that our method still outperforms these CTTA methods with source prototypes. The reasons for the superiority over these methods lie in three aspects, shown below.

  1. Usually, these methods rely heavily on the student-teacher architecture to realize consistency regularization. As a result, they inevitably introduce pseudo label noise, leading to error accumulation. Although they use symmetric cross-entropy or other techniques to alleviate the pseudo label noise, such problems still exist and cannot be fundamentally addressed. In contrast, our PCoTTA framework does not use any online or offline pseudo labeling techniques [9, 37], which inherently avoids the risk of error accumulation. In Table 1 of the manuscript, the results across 3 continuous rounds also illustrate the effectiveness in avoiding error accumulation.
  2. These methods are specifically designed for CTTA on 2D images and perform well there. However, compared to 2D images, 3D point cloud data is disordered, unstructured, and sparsely distributed, making these 2D image-based CTTA methods less effective or even inapplicable. Our method involves specific designs for 3D point cloud data, e.g., Gaussian Splatted-based Graph Attention for comprehensive, patch similarity-based adaptation, which is well-suited to 3D data and achieves better performance than these methods.
  3. These methods often focus on single tasks and all lack specialized designs for multi-task learning, which may lead to gradient conflicts in the optimization process of continual test-time adaptation. Instead, our PCoTTA devises task-specific prototype banks where individual source-learnable prototype pairs are used for different adaptations in each task, thus favoring multi-task learning in our setting.

(Each cell reports Rec./Den./Reg.; lower is better.)

| Methods | Round 1: ModelNet40 | Round 1: ScanObjectNN | Round 2: ModelNet40 | Round 2: ScanObjectNN | Round 3: ModelNet40 | Round 3: ScanObjectNN |
| --- | --- | --- | --- | --- | --- | --- |
| RMT [IV] | 31.2/44.0/34.3 | 47.4/59.6/39.9 | 30.6/43.5/33.9 | 45.6/53.0/35.8 | 30.4/42.7/33.8 | 45.9/51.1/36.4 |
| SANTA [V] | 33.0/42.9/39.5 | 45.3/57.9/36.2 | 31.9/42.4/37.1 | 42.2/55.8/34.9 | 31.3/41.6/36.5 | 41.4/54.3/33.6 |
| Ours | 6.3/21.4/15.4 | 8.9/28.3/20.7 | 5.5/19.9/14.6 | 8.5/26.9/19.6 | 5.4/18.6/12.1 | 8.2/25.2/19.3 |

[IV]. Döbler et al. Robust Mean Teacher for Continual and Gradual Test-Time Adaptation. In CVPR 2023.

[V]. Chakrabarty et al. SANTA: Source Anchoring Network and Target Alignment for Continual Test Time Adaptation. In TMLR 2023.

Comment

There are serious problems with the reproduction of SANTA. Your baseline model is PIC, which is a transformer-based model, and each transformer block includes layer normalization. However, in the reproduction details, you claim, "we only update the BatchNorm layer parameters in the source model during adaptation." This reproduction and the associated experiments have significant flaws, and I also have concerns about the reproduction results for the comparison method TENT [35].

Comment

Thanks for your comment. We believe there may be a misunderstanding here. We made sure to use PIC [8] as the source model for all reproduced methods. In PIC, the model comprises a token embedding module (Encoder) and Transformer blocks, where the Encoder contains BatchNorm layers (shown below). Following the original works SANTA [V] and TENT [35], their BatchNorm layers are updated. This ensures fairness in the reproduction. Please see the network details below; we hope this clarifies your concerns.

…
(MAE_encoder): MaskTransformer(
    (encoder): Encoder(
      (first_conv): Sequential(
        (0): Conv1d(3, 128, kernel_size=(1,), stride=(1,))
        (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv1d(128, 256, kernel_size=(1,), stride=(1,))
      )
      (second_conv): Sequential(
        (0): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv1d(512, 384, kernel_size=(1,), stride=(1,))
      )
    )
    (blocks): TransformerEncoder(
      (blocks): ModuleList(
        (0): Block(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU(approximate=none)
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=False)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
        )
…
Comment

I do not have any misunderstanding about the reproduction. I checked the official code of PIC, and the model only has two Batch Normalization layers, which are in front of the transformer encoder and the transformer decoder. The BN parameters are minimal. Following previous transformer-based CTTA methods, I believe that for the reproduction of SANTA and TENT, the Layer Normalization layers should be updated. Therefore, I think the reproduction and experiment are incorrect and unfair.

Comment

Thanks for further comment.

As per your comment, we further update the LayerNorm parameters for SANTA [V], TENT [35], and AdaBN [18]. AdaBN [18] was compared in our paper and involves BatchNorm updates. The results are shown in the table below, indicating that updating LayerNorm brings only a slight improvement, and our method still clearly outperforms them. Our method does not update the source model (including the Transformer blocks).

We will discuss the above in the revision.

(Each cell reports Rec./Den./Reg.; lower is better.)

| Methods | Round 1: ModelNet40 | Round 1: ScanObjectNN | Round 2: ModelNet40 | Round 2: ScanObjectNN | Round 3: ModelNet40 | Round 3: ScanObjectNN |
| --- | --- | --- | --- | --- | --- | --- |
| AdaBN + LN [18] | 58.7/52.1/37.7 | 64.1/76.8/57.2 | 58.9/51.5/37.2 | 64.1/74.2/53.9 | 56.8/50.3/35.5 | 62.1/71.7/51.1 |
| TENT + LN [35] | 57.9/50.6/36.8 | 64.8/76.4/55.0 | 57.8/50.0/36.7 | 64.7/73.5/51.1 | 55.2/48.4/35.0 | 62.1/69.2/49.7 |
| SANTA + LN [V] | 32.3/42.1/37.8 | 44.9/55.2/38.6 | 31.7/41.9/37.4 | 42.0/53.4/35.6 | 30.1/41.6/36.4 | 40.6/52.9/34.7 |
| Ours | 6.3/21.4/15.4 | 8.9/28.3/20.7 | 5.5/19.9/14.6 | 8.5/26.9/19.6 | 5.4/18.6/12.1 | 8.2/25.2/19.3 |

[V]. Chakrabarty et al. SANTA: Source Anchoring Network and Target Alignment for Continual Test Time Adaptation. In TMLR 2023.

[18] Li et al. Revisiting Batch Normalization for Practical Domain Adaptation. arXiv preprint, 2016.

[35] Wang et al. Tent: Fully test-time adaptation by entropy minimization. In ICLR 2021.

Comment

Thank you for your detailed responses. However, I still believe that the most accurate CTTA setting should not rely on any source information beyond the source model itself, including source features and tokens. This is because CTTA simulates the continual adaptation process post-deployment on the edge, where there is no opportunity to access the source model’s training phase or store source features. However, this reflects only my personal view, shared by some other researchers, and does not represent the views of all researchers. Additionally, the description of the baseline methods' reproduction in the paper is not entirely accurate or complete, and I hope the authors can address these issues. Finally, I appreciate the authors' efforts in addressing most of my concerns and conducting numerous additional experiments, and I will raise my rating to "Borderline Accept."

Comment

We appreciate your time in reviewing our work and making our paper more comprehensive. Thank you for your support and for raising your rating. We will consider those points and revise our paper accordingly in the new version.

Author Response

We would like to thank the AC and all reviewers for their efforts and time in reviewing our paper. We appreciate their constructive and valuable comments. We are pleased to see reviewers’ acknowledgement of the significance of our compiled point cloud CTTA dataset or new benchmark (Reviewer SkYG, Reviewer ynhY, Reviewer 5Vdz), the novelty of the method (Reviewer SkYG, Reviewer ynhY, Reviewer ni1v, Reviewer 5Vdz), good writing/organization (Reviewer SkYG, Reviewer ynhY, Reviewer ni1v), and significantly superior performance (Reviewer ynhY, Reviewer ni1v, Reviewer 5Vdz).

For each reviewer, we have separately submitted a rebuttal accordingly. We addressed all concerns from each reviewer there. We will update our paper accordingly.

Final Decision

This paper introduces a novel point cloud benchmark for Continual Test-Time Adaptation and a set of 3D datasets. Furthermore, it proposes three innovative modules for PCoTTA to tackle the challenges of catastrophic forgetting and error accumulation in CTTA tasks. After a fruitful discussion, the reviewers lean towards accepting the paper. In the camera-ready version, the authors should include the results presented during the rebuttal, as well as clarify the relation to the CTTA setting and other presentation issues raised by the reviewers.