PaperHub rating: 6.6/10 · Spotlight · 4 reviewers
Individual ratings: 4, 3, 3, 4 (min 3, max 4, std. dev. 0.5)
ICML 2025

Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios

Submitted: 2025-01-18 · Updated: 2025-07-24


Keywords
Multi-view Clustering; Contrastive Learning; Noisy Scenarios

Reviews and Discussion

Official Review
Rating: 4

In this paper, the authors propose AIRMVC, a novel multi-view clustering method for noisy scenarios. They formulate noise identification as an anomaly detection problem. In addition, a noise-robust contrastive loss is designed to improve model performance. Experiments on six datasets show the effectiveness of the proposed method.

Questions for Authors

Please see Weaknesses and suggestions.

Claims and Evidence

The motivation of AIRMVC is clearly articulated and supported by experimental validation. Additionally, extensive supplementary experiments are provided in the appendix to further reinforce the findings.

Methods and Evaluation Criteria

The authors investigate the problem of multi-view clustering in noisy scenarios, a challenge widely encountered in real-world applications. The proposed method is well-aligned with the stated motivation and is designed to enhance model robustness under noisy conditions, offering practical insights for real-world implementations.

Theoretical Claims

The authors provide a theoretical proof for the noise-robust contrastive learning to support the rationale behind this approach.

Experimental Designs or Analyses

The authors conducted extensive experiments, with most baseline methods from 2024, effectively demonstrating the efficacy of the proposed approach.

Supplementary Material

The authors supplemented the main text experiments in the appendix, providing relevant mathematical proofs and additional details on the experimental setup.

Relation to Broader Scientific Literature

The paper presents a comprehensive and well-rounded literature review.

Essential References Not Discussed

The related work is well-presented and the literature review is thorough.

Other Strengths and Weaknesses

Strengths:

1. The transformation of noise identification into an anomaly detection problem is an intriguing approach.

2. This paper conducts extensive experiments comparing the proposed method with state-of-the-art approaches from 2024, and the comprehensive experimental results validate its effectiveness.

3. Theoretical analysis demonstrates the robustness of the proposed contrastive learning mechanism in noisy environments.

Weaknesses:

1. In the noise identification module, both a projector and a classifier are designed. Are these components shared across views, or do they operate independently? The authors should clearly explain this aspect.

2. The authors have not discussed the limitations of AIRMVC or outlined potential directions for future research.

3. Although the authors conducted explanatory experiments in Figure 3 to support their motivation, they only used the relatively small-scale BBCSport dataset. It is recommended to perform experiments on larger datasets to further validate the motivation.

4. The authors should release the source code to enhance the reproducibility of the study.

Other Comments or Suggestions

1. The notation II in Equation (10) should be explicitly defined for clarity.

2. On page 7, line 364, there is a typographical error in the quotation marks for "w/o D&R&Con."

Author Rebuttal

Explanation for projector and classifier: Thanks. We perform feature mapping and transformation in the latent space using a projector, and the sample predictions are obtained through a classifier. The projector and classifier are shared across different views. We will include the corresponding description in the final version.

Limitations & Future directions of AIRMVC: Thanks. In AIRMVC, we made an initial attempt to identify and rectify noise in an unsupervised setting and designed a robust contrastive learning method to further enhance the robustness of the model. However, the correction process relies heavily on the accuracy of the predicted distribution, which is the primary limitation of AIRMVC. In the future, improving the accuracy of the predicted distribution and exploring other reliable supervisory signals will be promising research directions.

Large-scale motivation experiments: Thanks. Following your suggestions, we conducted experiments on the Caltech101 and STL10 datasets with a 10% noise ratio. The experimental results are presented in Tab.1 and Tab.2. These results lead to the same conclusions as those described in the submitted version:

  1. Simply merging noisy multi-view data results in the most degraded clustering performance. This is primarily due to the absence of a noise rectification mechanism, which causes the negative effects of noisy views to compound each other. Moreover, the fusion process intensifies the influence of noise, leading to a scenario where multi-view clustering performs even worse than using a single view alone.

  2. In comparison to directly correcting noise based on the first view, our proposed AIRMVC demonstrates superior performance. This advantage arises because direct correction from a single view tends to enforce uniformity across views, potentially suppressing essential complementary information. In contrast, our noise detection and rectification strategy effectively removes noisy samples from each view while preserving beneficial cross-view diversity, thereby enhancing the overall clustering performance.

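Tab.1 Motivation experiments on STL10 dataset.

| Metric | Ours | Directly Rectify | Single View | Noisy data |
|---|---|---|---|---|
| ACC | 28.81 | 26.41 | 22.22 | 15.05 |
| NMI | 25.04 | 24.25 | 19.04 | 10.78 |
| PUR | 29.01 | 26.80 | 23.18 | 13.78 |

Tab.2 Motivation experiments on Caltech101 dataset.

| Metric | Ours | Directly Rectify | Single View | Noisy data |
|---|---|---|---|---|
| ACC | 21.45 | 18.62 | 15.26 | 11.25 |
| NMI | 37.16 | 30.82 | 22.29 | 18.61 |
| PUR | 34.69 | 29.52 | 20.18 | 16.28 |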

Code: Thanks. Following your suggestion, we will release the code in the final version.

Notation & Typos: Thanks. We will add the definitions of the notation in Eq.10 and correct the typos on page 7. Furthermore, we will review the entire paper to improve the overall presentation.

Reviewer Comment

Thanks to the authors for the rebuttal. My concerns and confusions have been well addressed, and thus I would like to increase my score.

Author Comment

Dear Reviewer,

Thank you for increasing the score. We greatly appreciate the time and effort you have dedicated to reviewing our work!

Sincerely,

The Authors

Official Review
Rating: 3

This paper addresses the challenge of noisy data in multi-view clustering by proposing a method called AIRMVC. Specifically, AIRMVC first formulates noise identification and employs a Gaussian Mixture Model (GMM) to achieve this. It then introduces a hybrid rectification strategy with an interpolation mechanism to mitigate the adverse effects of noisy data. The paper validates the effectiveness of AIRMVC on six multi-view clustering datasets.
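The review summarizes AIRMVC as a GMM-based identification step followed by a hybrid rectification with interpolation. A minimal NumPy sketch of what such a rectification could look like is given below. It assumes that all views are already mapped into a common latent space (so their embeddings have the same width), that view 0 is the clean reference, and that a per-sample noise probability is available for every other view. The function name `hybrid_rectify`, the replacement threshold `tau`, and the interpolation cap `alpha` are hypothetical and not taken from the paper.

```python
import numpy as np


def hybrid_rectify(latents, noise_prob, tau=0.9, alpha=0.5):
    """Hypothetical hybrid rectification of noisy view embeddings.

    latents:    list of (n, d) arrays; latents[0] is treated as the clean reference.
    noise_prob: list of (n,) arrays, one per non-reference view, holding each
                sample's estimated probability of being noisy.
    tau:        samples whose noise probability exceeds tau are replaced outright.
    alpha:      maximum interpolation weight toward the reference otherwise.
    """
    reference = np.asarray(latents[0], dtype=float)
    rectified = [reference.copy()]
    for z, p in zip(latents[1:], noise_prob):
        z = np.asarray(z, dtype=float)
        p = np.asarray(p, dtype=float)[:, None]
        # Soft branch: pull each sample toward the reference in proportion to p.
        lam = alpha * p
        soft = (1.0 - lam) * z + lam * reference
        # Hard branch: confidently noisy samples take the reference embedding.
        rectified.append(np.where(p > tau, reference, soft))
    return rectified
```

Replacing only the confident cases and interpolating the rest is one way to keep view-specific information for uncertain samples; how AIRMVC actually combines the two branches is not stated in this thread.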

Questions for Authors

Please see the weaknesses.

Claims and Evidence

Not all claims are fully supported. For instance, the paper asserts that no prior work has developed dedicated frameworks for identifying and rectifying noisy data. However, MVCAN (Xu et al., 2024) appears to be an earlier attempt at addressing this issue but is not appropriately acknowledged.

Methods and Evaluation Criteria

While the method is designed to address the noisy view problem, the chosen datasets and experimental settings are not sufficiently justified.

Theoretical Claims

No

Experimental Designs or Analyses

Not all experiments are well-designed. Firstly, the datasets used are relatively small (fewer than 13,000 samples), which limits the evaluation of the method’s scalability and generalizability. Secondly, the evaluations are conducted with hand-crafted noise rather than real-world noise, potentially affecting the practical applicability of the method.

Supplementary Material

Yes, I have reviewed the supplementary material, including notations, related works, and additional performance comparisons.

Relation to Broader Scientific Literature

The paper designs a method for handling the noisy view issue in the multi-view clustering task.

Essential References Not Discussed

Although related works are cited, the paper lacks a sufficient discussion on MVCAN (Xu et al., 2024), which may lead to an overstatement of its contributions.

Other Strengths and Weaknesses

Strengths:

The proposed method achieves state-of-the-art performance on the six selected datasets.

Weaknesses:

The novelty of the paper is questionable, as it overclaims its contribution to handling noisy views. MVCAN (Xu et al., 2024) may already have laid the groundwork for this problem.

The experimental evaluation is insufficient, relying on small-scale datasets and artificially introduced noise instead of real-world noisy data.

I would consider improving my rating if the authors could give more clarifications or experiment results to address my concerns.

Other Comments or Suggestions

I look forward to seeing the method evaluated on large-scale datasets with real-world noise for a more comprehensive assessment of its effectiveness.

Author Rebuttal

Additional experiments: Thanks. Following your suggestions, we conducted experiments on an NVIDIA A6000 GPU with the CIFAR10 dataset, which contains 60,000 samples, 4 views, and 10 classes. In addition, since YouTube is a comprehensive video platform, we extracted facial images from its videos as a real-world data source. These multi-view facial images may include low-quality samples, which are treated as noise in our analysis. The YouTube dataset comprises 38,654 samples, 4 views, and 10 classes. Detailed statistics of the datasets are given in Tab.1. From the results shown in Tab.2 and Tab.3, we conclude that AIRMVC achieves reliable performance on both the large-scale dataset and the real-world dataset, demonstrating its generalization capability.

Tab.1 Statistics of the datasets.

| Dataset | Classes | Samples | Views |
|---|---|---|---|
| YouTube | 10 | 38,654 | 4 |
| CIFAR10 | 10 | 60,000 | 4 |

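Tab.2 Experiment on YouTube dataset. OOM denotes out-of-memory during training.

| Metric | CANDY | RMCNC | TGM-MVC | SCE-MVC | MVCAN | DIVIDE | Ours |
|---|---|---|---|---|---|---|---|
| ACC | 62.86 | 53.05 | 58.26 | 60.54 | OOM | 60.16 | 66.23 |
| NMI | 70.06 | 65.27 | 55.91 | 64.22 | OOM | 65.38 | 70.94 |
| PUR | 70.20 | 63.81 | 60.12 | 65.54 | OOM | 63.01 | 75.10 |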

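Tab.3 Experiment on CIFAR10 dataset at noise rates 0.1, 0.3, 0.5, and 0.7.

| Method | ACC (0.1) | NMI (0.1) | PUR (0.1) | ACC (0.3) | NMI (0.3) | PUR (0.3) | ACC (0.5) | NMI (0.5) | PUR (0.5) | ACC (0.7) | NMI (0.7) | PUR (0.7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CANDY | 20.16 | 12.35 | 21.21 | 18.25 | 11.82 | 18.01 | 16.04 | 9.54 | 16.05 | 14.17 | 9.06 | 14.64 |
| RMCNC | 19.25 | 11.52 | 20.58 | 18.26 | 10.64 | 19.25 | 16.45 | 8.68 | 16.25 | 15.05 | 8.14 | 14.99 |
| TGM-MVC | 17.82 | 10.02 | 18.99 | 15.29 | 7.91 | 14.25 | 13.42 | 6.04 | 13.57 | 11.05 | 5.71 | 12.52 |
| SCE-MVC | 18.25 | 10.55 | 19.54 | 18.02 | 10.00 | 18.57 | 15.15 | 8.02 | 16.05 | 14.23 | 7.64 | 13.16 |
| MVCAN | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| DIVIDE | 20.57 | 11.26 | 21.27 | 18.69 | 10.06 | 19.95 | 16.84 | 8.32 | 17.62 | 14.05 | 6.05 | 14.85 |
| Ours | 22.62 | 13.71 | 23.34 | 21.67 | 13.24 | 22.52 | 20.08 | 12.49 | 20.83 | 17.68 | 10.25 | 18.26 |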

Discussion with MVCAN: Thanks. We discuss AIRMVC with MVCAN from three key perspectives:

  1. Optimization Strategy: MVCAN adopts a two-level iterative optimization framework, consisting of T-level and R-level optimization, to refine the network. In contrast, AIRMVC focuses on noise detection and correction using a Gaussian Mixture Model (GMM) and directly optimizes the network.

  2. Soft Label Acquisition: MVCAN employs a parameter-decoupled model to obtain view-specific representations and soft labels, mitigating the influence of noisy views. AIRMVC leverages a GMM trained with a shared projector and classifier to generate soft labels.

  3. Module Design: MVCAN incorporates unshared parameters, distinct clustering optimization functions, and a two-level iterative optimization approach. In comparison, AIRMVC introduces a dedicated noise detection and correction mechanism, along with a noise-robust contrastive learning framework to enhance model robustness.

Novelty of AIRMVC: Thanks. The novelty of AIRMVC lies mainly in the following aspects.

  1. Leveraging a GMM, we reformulate noise identification as an anomaly identification problem and propose a hybrid rectification strategy to automatically correct the noisy data.

  2. We design a noise-robust contrastive mechanism to generate more reliable representations. Theoretically, we have demonstrated that the features generated by this mechanism are more beneficial for downstream tasks.

  3. We conduct extensive experiments on different benchmark datasets to verify the effectiveness and robustness of AIRMVC.
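Reformulating noise identification as anomaly identification with a GMM, as in point 1 above, is commonly done by fitting a two-component Gaussian mixture to a per-sample score and reading off the posterior of the high-score component. The NumPy sketch below fits such a 1-D mixture with EM in log space. What quantity AIRMVC actually feeds to its GMM is not specified in this thread; the sketch assumes a hypothetical per-sample score where larger values are more suspicious, and `fit_gmm_noise_posterior` is an illustrative name.

```python
import numpy as np


def fit_gmm_noise_posterior(scores, n_iter=100, eps=1e-8):
    """Fit a two-component 1-D Gaussian mixture with EM.

    scores: (n,) per-sample anomaly scores (hypothetical; larger = more suspicious).
    Returns each sample's posterior probability of belonging to the
    higher-mean component, interpreted here as the "noisy" component.
    """
    x = np.asarray(scores, dtype=float)
    # Start the two means at the lower and upper quartiles of the scores.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    var = np.full(2, x.var() + eps)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step in log space to avoid underflow for far-away samples.
        log_dens = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp(log_dens[:, 0], log_dens[:, 1])
        resp = np.exp(log_dens - log_norm[:, None])
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0) + eps
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
    noisy = int(np.argmax(mu))
    return resp[:, noisy]
```

Thresholding the returned posterior (for example at 0.5) yields a noisy/clean split; keeping the soft posterior instead allows it to weight a later rectification step.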

Reviewer Comment

Thank you for the detailed responses. I appreciate the additional experiments on large-scale and real-world noisy datasets, which address my second major concern. Furthermore, the detailed discussion on MVCAN highlights the novelty of the proposed method. I would like to raise my rating and maintain a positive stance on the paper.

Author Comment

Dear Reviewer,

Thank you for increasing the score and acknowledging our approach. We sincerely appreciate the time and effort you dedicated to reviewing our work. Based on your suggestions, we will make further improvements in the final version.

Best,

The Authors

Official Review
Rating: 3

The paper considers the problem of multi-view clustering in the presence of noise. In particular, a new approach is proposed that aims to detect noisy samples, characterized as outliers, and to rectify them based on the assumption that the first view is noise-free. In addition, the construction of the pairs in the contrastive loss is improved by taking into consideration the soft clustering labels.

Update after rebuttal

After the additional clarifications provided by the authors, I have decided to increase my rating. While the method still relies on relatively strong assumptions, which should be thoroughly discussed in a limitations section if the paper is accepted, it could serve as an initial exploration of this setting.

Questions for Authors

Please elaborate on the experimental setup and comment on the effect of the assumption on the first view. The work further relies on another assumption, namely the presence of balanced clusters in Section 3.1. Does this overly bias the model toward clustering settings with balanced classes?

Claims and Evidence

Yes

Methods and Evaluation Criteria

While the setup is mostly reasonable, the type of noise added to the samples is not described. Based on the introduction and problem definition, the noise considered in this work is due to the image being corrupted, while previously referenced work mostly considers noise in the sense of alignment (whether the two views represent the same object). However, how this noise was added and how X% of the data in the experiments was corrupted are unclear.

Related to this point, the proposed approach builds on the strong assumption of having one clean view for the rectification step. While the authors state that this follows previous work, with references provided, to the reviewer's knowledge these prior works make a different assumption, where some data is known to be aligned and some misaligned, and do not assume that there is one completely uncorrupted view. What is the effect if this assumption does not hold?

In summary, additional clarifications on this setup are needed to ensure that the evaluation is fair.

Theoretical Claims

The paper includes a theoretical interpretation of the noise-robust loss in Theorem 4.1, which appears to be correct.

Experimental Designs or Analyses

As mentioned above, the experimental design is somewhat unclear when it comes to the addition of noise and clarifications are required. Beyond this, the experimental design appears sound. However, it would be beneficial to also report the clean performance for reference, despite the focus being mostly on the noisy setting.

Supplementary Material

Yes, all of it.

Relation to Broader Scientific Literature

Within the extended multi-view clustering literature, there has been some work on designing more robust approaches. However, these have mostly been focusing on the design of approaches that are able to handle partial view alignment or incomplete views. While there are certain approaches that aim to address robustness to noise (such as Xu et al, CVPR 2024), it is a less explored domain and the paper contributes a new approach toward it.

Essential References Not Discussed

Overall, the reviewer believes that prior work is cited adequately, but believes that the paper would benefit from a clearer discussion of what noise problems the different baselines address, as the baselines are designed for different types of noise.

Other Strengths and Weaknesses

Overall, the paper addresses the interesting problem of corrupted views in deep multi-view clustering, is mostly well-written, and presents a set of relevant ablation studies highlighting the necessity of the different components. However, the presentation of the problem, and of how it relates to previous works on robustness in the multi-view space (which generally focus on another type of noise), should be improved. In addition, the method is based on a key assumption (the first view is noise-free), and the effect of this assumption should be discussed, as it appears to differ from the assumptions in prior works and thus benefits the proposed approach.

Other Comments or Suggestions

The presentation of the problem formulation seems to have been moved out of Sec. 3, while it still is mentioned in Line 139 (can be removed). In line 304, "sub-optimal" should maybe be "runner-up" or "second best". For Table 4, state explicitly that this is the 10% noise scenario.

Author Rebuttal

Explanation for adding noise: Thanks. Unlike the noisy-alignment setting, we simulate noisy scenarios by injecting standard Gaussian noise into the original views, excluding the first view. Specifically, we generate random Gaussian noise with the same shape as each view and inject it into the original view at a ratio of x%. The parameter x% scales the generated Gaussian noise, thereby simulating different levels of noise contamination.
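The response leaves open exactly how x% enters the noise. One natural reading, that the noise is additive and multiplied by x/100, is sketched below in NumPy; the additive form, the function name `inject_view_noise`, and the fixed seed are assumptions for illustration, not the authors' script.

```python
import numpy as np


def inject_view_noise(views, ratio, seed=0):
    """Return copies of `views` with scaled standard Gaussian noise added.

    views: list of (n, d_v) arrays; views[0] is kept clean, as in the setup.
    ratio: noise scale, e.g. 0.1 for the 10% setting (assumed additive).
    """
    rng = np.random.default_rng(seed)
    noisy = [np.array(views[0], dtype=float)]  # np.array copies by default
    for x in views[1:]:
        x = np.asarray(x, dtype=float)
        # Noise with the same shape as the view, scaled by the ratio.
        noisy.append(x + ratio * rng.standard_normal(x.shape))
    return noisy
```

Under this reading, the noise magnitude is independent of each view's feature scale, so views with small feature values are corrupted relatively more at the same ratio.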

Experiments on clean data: Thanks. Following your suggestion, we conduct experiments on clean data with six datasets. The results are shown in Tab.1. Due to character limitations, more results can be found in Tab.1 at https://anonymous.4open.science/r/Res-11B2. From the results, we find that AIRMVC achieves promising performance in both clean and noisy scenarios.

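Tab.1 Clean data performance

| Method | UCI-digit ACC | UCI-digit NMI | UCI-digit PUR | WebKB ACC | WebKB NMI | WebKB PUR | STL10 ACC | STL10 NMI | STL10 PUR |
|---|---|---|---|---|---|---|---|---|---|
| CANDY | 85.45 | 77.99 | 85.45 | 35.15 | 10.55 | 34.74 | 28.15 | 22.68 | 28.15 |
| RMCNC | 40.51 | 23.16 | 35.68 | 79.05 | 21.84 | 79.99 | 23.05 | 15.28 | 24.64 |
| TGM-MVC | 64.35 | 65.76 | 69.40 | 79.64 | 16.61 | 79.64 | 28.18 | 20.86 | 28.51 |
| SCE-MVC | 84.55 | 76.48 | 86.15 | 78.65 | 19.04 | 77.54 | 28.64 | 24.59 | 29.05 |
| DIVIDE | 89.25 | 81.52 | 89.45 | 69.61 | 20.20 | 78.12 | 29.68 | 23.63 | 28.95 |
| Ours | 94.55 | 90.10 | 94.55 | 80.65 | 21.54 | 80.65 | 30.26 | 24.88 | 30.95 |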

Explanation for assumption: Thanks. We provide explanation from three perspectives.

  1. Unlike the noisy-alignment setting, AIRMVC extends the definition of noise to include noise within views. Since we propose a method for detecting and correcting noise, an "ideal" view is required as a reference standard. In unsupervised multi-view clustering, no label information is available. Therefore, we assume that the first view serves as the ideal view, acting as pseudo-supervision to correct noise in the other views.

  2. We illustrate this assumption with a real-world multi-view scenario, in which the ideal view supplements and corrects the other views. For example, consider a case where the first view consists of high-resolution images, while the second view consists of low-resolution images. In the super-resolution field, it is common to use high-resolution images (ideal view) to supplement information and guide the learning of low-resolution images (other view), e.g., 2019-ICCV-Guided Super-Resolution as Pixel-to-Pixel Transformation and 2021-CVPR-Robust Reference-based Super-Resolution via C2-Matching. Similarly, we select one view as the reference ("ideal") view to supplement and correct the other views.

  3. Previous works have regarded partially aligned data as noisy. During testing, they use an alignment strategy to align the other v-1 views to the first view and then fuse the multi-view features for clustering, e.g., CANDY (line 53 of https://github.com/XLearning-SCU/2024-NeurIPS-CANDY/blob/main/model.py) and RMCNC (line 236 of https://github.com/sunyuan-cs/2024-TKDE-RMCNC/blob/main/RMCNC_main/sure_inference.py). This alignment operation implies that these papers treat the first view as an ideal view. Therefore, although the scenario settings differ, to maintain generality we also treat the first view as an ideal view.

Explanation for baselines: Thanks. Previous studies regard partially aligned data as noisy. In our work, we extend the definition of noise and explore a more common noisy scenario, in which noise exists within individual views. MVCAN is the only recent work that explores the issue of noisy views, leaving no other methods available for direct comparison. MVCAN includes comparisons with numerous contrastive learning-based methods. Following this setup of MVCAN, we evaluated the performance of various algorithms under our proposed noisy setting. To further validate the effectiveness of our approach, we included the latest multi-view clustering methods from 2024 in Tables 1 and 2 of our submitted version. Moreover, we selected a substantial number of contrastive learning-based methods because contrastive learning can enhance both the model's robustness and its discriminative capability. Therefore, in the absence of directly comparable methods, we select contrastive learning-based methods to demonstrate the effectiveness of our method.

Explanation for balanced clusters: Thanks. Cluster balance is a widely adopted default assumption in clustering problems, and we follow this common assumption as well. Additionally, to further verify the cluster balance of samples, we conduct statistical analyses on the datasets used in AIRMVC. The results indicate that the sample classes in the utilized datasets are nearly balanced. Due to space limitations, detailed results can be found in Tab.2 in https://anonymous.4open.science/r/Res-11B2.
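A class-balance check of the kind described above can be reproduced in a few lines of standard-library Python. The two summary statistics below (the ratio of the largest to the smallest class size, and the label entropy normalized by log k) are illustrative choices; the response does not say which statistics appear in the linked Tab.2, and `class_balance` is a hypothetical helper.

```python
import math
from collections import Counter


def class_balance(labels):
    """Summarise how evenly samples are spread over classes."""
    counts = Counter(labels)
    n, k = sum(counts.values()), len(counts)
    sizes = sorted(counts.values())
    entropy = -sum(c / n * math.log(c / n) for c in sizes)
    return {
        "n_classes": k,
        "min_size": sizes[0],
        "max_size": sizes[-1],
        # 1.0 means every class has the same number of samples.
        "imbalance_ratio": sizes[-1] / sizes[0],
        # Entropy divided by log(k): 1.0 for a perfectly uniform split.
        "normalized_entropy": entropy / math.log(k) if k > 1 else 1.0,
    }
```

An imbalance ratio close to 1 and a normalized entropy close to 1 together indicate a nearly balanced dataset.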

Typos & Presentation: Thanks. Following your suggestion, we will correct the typos and further improve the presentation.

Reviewer Comment

I would like to thank the authors for these clarifications and for providing the additional results. Could the authors clarify why the performance reported for the benchmark methods on clean data seems significantly lower than that reported in the original publications (i.e., CANDY, DIVIDE, and SCE-MVC)? Additionally, what is the intuition behind AIRMVC performing better than the baselines when no noise is present?

While I certainly agree that following the assumption of one "ideal" view as a reference is useful and simplifies the task, this assumption is more of a limitation here than the setup in prior work: there, one is generally aware whether data is present or not, whereas the presence of noise in the data is more subtle. In addition, a missing view removes all of its information, while added noise only degrades it. While I do not necessarily think this is a major problem, I believe it is a limitation worth discussing, potentially pointing to future work.

Author Comment

Explanation for experimental results: Thanks for your comment. From the publicly available code of CANDY (line 11 of https://github.com/XLearning-SCU/2024-NeurIPS-CANDY/blob/main/dataset_loader.py) and DIVIDE (line 11 of https://github.com/XLearning-SCU/2024-AAAI-DIVIDE/blob/main/dataset_loader.py), it is evident that the datasets they used contain only two data views. In contrast, the datasets we employed, i.e., Caltech101 and Reuters, consist of five views. Therefore, the datasets used in our experiments are not the same. We directly report the performance obtained by reproducing their original code with our multi-view datasets, which accounts for the observed differences.

Regarding SCE-MVC (https://openreview.net/pdf?id=xoc4QOvbDs), we used different clustering metrics, i.e., ACC, NMI, and PUR for AIRMVC, whereas SCE-MVC employs ACC, NMI, and ARI. Since the authors of SCE-MVC have not released their code, we reproduced their results based on the descriptions in their paper, which introduced some discrepancies. The experimental results demonstrate that AIRMVC achieves promising performance in the clean setting, rather than necessarily achieving SOTA performance, which aligns with our previous response.
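Because metric choice is one source of the discrepancy discussed above, a small reference implementation of two of the metrics may help readers compare numbers. The plain-Python sketch below computes purity (PUR) and NMI from label lists. It assumes the arithmetic-mean normalization 2·I(Y;C)/(H(Y)+H(C)) for NMI, which is one of several conventions in use, and it omits ACC, which additionally requires an optimal one-to-one cluster-to-class matching (typically the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`).

```python
import math
from collections import Counter


def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(y_true, y_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(y_true)


def _entropy(counts, n):
    return -sum(c / n * math.log(c / n) for c in counts.values() if c)


def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalization: 2 I(Y;C) / (H(Y) + H(C))."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items())
    denom = _entropy(ct, n) + _entropy(cp, n)
    return 2 * mi / denom if denom > 0 else 1.0
```

Both metrics are invariant to relabeling of clusters, which is why they can be computed without the matching step that ACC needs.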

Explanation for clean performance: Thanks for your comment. As our reported results show, AIRMVC achieves promising, but not uniformly state-of-the-art, performance in clean scenarios; on some datasets it does not achieve SOTA performance. We further analyzed the reasons behind this performance. In particular, we design a contrastive learning mechanism to enhance the model's discriminative ability. Specifically, we employ a high-confidence threshold to improve the quality of the positive and negative sample pairs used in contrastive learning. Furthermore, we provide a concise theoretical analysis to justify the design of this contrastive learning mechanism.
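A high-confidence threshold on soft cluster assignments, as described above, can be turned into positive and negative pair masks roughly as follows. This NumPy sketch is an illustration under stated assumptions, not AIRMVC's loss: the threshold of 0.9, the rule that two confident samples with the same predicted cluster form a positive pair, and the supervised-contrastive-style loss `masked_info_nce` are all hypothetical choices.

```python
import numpy as np


def confident_pair_masks(soft_labels, threshold=0.9):
    """Positive/negative pair masks from soft cluster assignments.

    soft_labels: (n, k) array whose rows sum to 1.
    Pairs that involve a low-confidence sample appear in neither mask.
    """
    conf = soft_labels.max(axis=1)
    hard = soft_labels.argmax(axis=1)
    keep = conf >= threshold
    both = keep[:, None] & keep[None, :]
    same = hard[:, None] == hard[None, :]
    not_self = ~np.eye(len(hard), dtype=bool)
    positives = both & same & not_self
    negatives = both & ~same
    return positives, negatives


def masked_info_nce(z, positives, negatives, temperature=0.5):
    """Contrastive loss that only uses the pairs selected by the masks."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = np.exp(z @ z.T / temperature)
    losses = []
    for i in range(len(z)):
        if not positives[i].any():
            continue  # anchors without a confident positive are skipped
        pos = sim[i, positives[i]].sum()
        neg = sim[i, negatives[i]].sum()
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses)) if losses else 0.0
```

Dropping low-confidence samples from both masks trades fewer training pairs for fewer false positives and false negatives, which is the intuition the response gives for the mechanism.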

Core idea of AIRMVC: The core idea of AIRMVC is to explore the noise problem in unsupervised multi-view scenarios. The experimental results in the submitted version demonstrate the effectiveness of AIRMVC in noisy scenarios. Although AIRMVC may not achieve SOTA performance on all datasets in the clean scenario, its promising performance there suggests that it generalizes beyond noisy settings.

Future work: Thanks for your comment. Noisy views are a prevalent challenge in real-world multi-view scenarios. However, existing MVC research has largely overlooked this issue, and standardized methodologies for simulating noisy datasets are still lacking. In AIRMVC, we provide an initial exploration of the noisy view problem in an unsupervised setting. We are delighted that our use of an "ideal view" as a reference has received your recognition. Identifying a suitable reference view in an unsupervised scenario and designing more realistic noisy-view simulation strategies are promising directions for future research. We fully agree that this is a worthwhile topic of discussion and, following your insightful suggestions, we will continue to explore this problem in greater depth.

According to this year's ICML policy, we are not permitted to engage in multiple rounds of discussion. Please trust that we have carefully considered, and made every effort to address, the concerns you raised. We hope our response addresses your concerns, and we greatly appreciate the time and effort you have dedicated to reviewing our work!

Official Review
Rating: 4

To mitigate the impact of noisy data on multi-view clustering models, this paper proposes a method capable of automatically identifying and correcting noise. Specifically, the authors reformulate noise identification as an anomaly detection problem. Then, they design a hybrid correction strategy to enhance model robustness. Extensive experimental results demonstrate the effectiveness of the proposed approach.

Questions for Authors

See above.

Claims and Evidence

In the submitted version of the paper, the motivation for handling noise is clearly defined and illustrated in Figure 1. Additionally, the authors conduct experiments to verify that the presence of noise adversely affects multi-view clustering performance. The submitted version effectively clarifies the research problem.

Methods and Evaluation Criteria

In this paper, the authors conduct comprehensive experiments on six widely used benchmark datasets. The experimental results demonstrate that the proposed method effectively mitigates the impact of noise on clustering performance.

Theoretical Claims

In Appendix A.2, the authors provide a mathematical proof, which theoretically supports the proposed method and enhances the credibility of the study.

Experimental Designs or Analyses

In this paper, the authors conducted extensive experiments, including comparative analyses under different noise ratios, comprehensive ablation studies, and sensitivity analysis experiments. Additionally, the methods compared in Tables 1 and 2 are all from 2024, ensuring a fair and up-to-date evaluation.

Supplementary Material

The supplementary materials include related work, experimental results, hyperparameter tables, and more. These comprehensive supplementary materials provide support for the findings presented in the paper.

Relation to Broader Scientific Literature

Compared to previous studies, this paper proposes a more effective approach to handling noisy data. The experimental results further validate this conclusion.

Essential References Not Discussed

The comparison algorithms in the paper are primarily from 2024, incorporating the latest research methods.

Other Strengths and Weaknesses

S:
I. The paper investigates novel methods to mitigate the impact of noise on models, which is a practical area of research.
II. From the submitted version, it is evident that the authors provide theoretical analysis and conduct extensive experiments.
III. The proposed method is clearly described, making it easy to follow.

W:
I. The authors have conducted detailed experimental validation; however, the time and space consumption of the proposed method is not evaluated. I recommend that the authors add experiments to address this.
II. Figure 2 presents the overall framework of the paper. In the upper part of view 2, does the dark blue color represent noise? I suggest adding definitions and descriptions of the differently colored data to the legend.
III. The authors divide the experimental section into four parts, with the fourth being the parameter sensitivity analysis. This part is placed in Appendix A.3.3, but it is labeled RQ3 instead of RQ4, which needs to be corrected.

Other Comments or Suggestions

I. The paper contains a large number of formulas, and the vast majority of definitions and explanations follow standard conventions. However, Equation 8 on page 4 is too wide and extends beyond the page; it needs to be adjusted.
II. It is recommended to add more experimental details for the visualization experiments in Section 5.4.

Author Rebuttal

Experiments on time and space cost: Thanks. Following your suggestion, we conducted time and space cost experiments on the six datasets used, with a 10% noise ratio. Specifically, we measure the training time per epoch of all baselines in seconds. The space cost experiments are conducted on an NVIDIA A6000 GPU and measured in gigabytes (GB). The results are presented in Tab.1 and Tab.2. From these results, we observe that the time and space costs of AIRMVC remain within an acceptable range. In summary, AIRMVC demonstrates promising clustering performance while maintaining a reasonable computational cost.

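Tab.1 Time cost for AIRMVC (training time per epoch, in seconds).

| Method | Venue | BBCSports | WebKB | Reuters | UCI-digit | Caltech101 | STL10 | Avg. |
|---|---|---|---|---|---|---|---|---|
| CANDY | NeurIPS 2024 | 0.0657 | 0.1206 | 0.2861 | 0.3325 | 3.2500 | 3.6200 | 1.2792 |
| RMCNC | TKDE 2024 | 0.1536 | 0.5148 | 0.4785 | 0.8962 | 3.9800 | 5.6800 | 1.9505 |
| TGM-MVC | ACM MM 2024 | 0.1206 | 0.2546 | 0.5931 | 0.6752 | 4.4200 | 6.2805 | 2.0573 |
| SCE-MVC | NeurIPS 2024 | 0.1521 | 0.2675 | 0.6428 | 0.6028 | 4.0255 | 6.0865 | 1.9629 |
| MVCAN | CVPR 2024 | 0.1525 | 0.2756 | 0.4429 | 0.7636 | 4.3580 | 6.6210 | 2.1023 |
| DIVIDE | AAAI 2024 | 0.0795 | 0.1568 | 0.3524 | 0.3326 | 3.1350 | 3.6248 | 1.2802 |
| AIRMVC | Ours | 0.0825 | 0.1486 | 0.3058 | 0.3390 | 3.0800 | 3.5200 | 1.2460 |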

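Tab.2 Space cost for AIRMVC (GPU memory, in GB).

| Method | Venue | BBCSports | WebKB | Reuters | UCI-digit | Caltech101 | STL10 | Avg. |
|---|---|---|---|---|---|---|---|---|
| CANDY | NeurIPS 2024 | 1.91 | 1.79 | 2.11 | 2.21 | 2.36 | 3.03 | 2.24 |
| RMCNC | TKDE 2024 | 1.88 | 2.55 | 1.96 | 2.16 | 2.46 | 2.99 | 2.33 |
| TGM-MVC | ACM MM 2024 | 1.47 | 1.64 | 2.06 | 2.20 | 2.54 | 2.35 | 2.04 |
| SCE-MVC | NeurIPS 2024 | 1.57 | 1.72 | 2.30 | 2.26 | 2.70 | 2.45 | 2.17 |
| MVCAN | CVPR 2024 | 1.56 | 1.57 | 1.66 | 1.28 | 1.44 | 1.47 | 1.50 |
| DIVIDE | AAAI 2024 | 2.02 | 1.78 | 2.05 | 2.19 | 2.36 | 2.97 | 2.23 |
| AIRMVC | Ours | 1.60 | 1.63 | 1.71 | 1.34 | 1.55 | 1.48 | 1.55 |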
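Per-epoch time and peak memory of the kind reported in the two tables above can be collected with a small wrapper like the one below. It is a standard-library sketch: `train_one_epoch` stands for any user-supplied training function (hypothetical), and `tracemalloc` only tracks Python-heap allocations. GPU numbers such as those measured on the A6000 would instead need a framework counter (for example PyTorch's `torch.cuda.max_memory_allocated`), and pending GPU work should be synchronized (e.g. `torch.cuda.synchronize()`) before reading the clock.

```python
import time
import tracemalloc


def profile_epoch(train_one_epoch, *args, **kwargs):
    """Run one epoch and return (elapsed_seconds, peak_bytes).

    Peak memory covers Python-heap allocations traced by tracemalloc only.
    """
    tracemalloc.start()
    try:
        start = time.perf_counter()
        train_one_epoch(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return elapsed, peak


def average_profile(train_one_epoch, n_epochs, *args, **kwargs):
    """Mean seconds per epoch and the largest peak memory in GiB (1024**3 bytes)."""
    runs = [profile_epoch(train_one_epoch, *args, **kwargs) for _ in range(n_epochs)]
    mean_seconds = sum(t for t, _ in runs) / n_epochs
    peak_gib = max(p for _, p in runs) / 1024 ** 3
    return mean_seconds, peak_gib
```

Averaging over several epochs smooths out warm-up effects in the first epoch, while taking the maximum peak reflects the memory a run actually needs.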

Explanation for symbol in Fig.2: Thanks. In Fig. 2(a), (b), and (c), the dark blue color represents noisy data. Following your suggestion, we will provide a more detailed explanation in the final version.

Typos & Format: Thanks. Following your suggestion, we will revise RQ3 and Eq.8 in the final version and review similar issues to enhance the overall presentation.

Details for visualization experiments: Thanks. We visualized the latent space features extracted by the encoder from the UCI-Digit dataset using the t-SNE algorithm. The visualization was performed every 20 epochs for the first 200 epochs. The experiments were conducted on an NVIDIA A6000 platform. We will provide additional descriptions in the future.

Final Decision

This paper investigates robust multi-view clustering in noisy scenarios. The reviewers acknowledge some strengths:

1) The research addresses an important problem in the field.
2) A novel robust deep contrastive method for noisy multi-view clustering is proposed.
3) The paper provides comprehensive theoretical analysis and sufficient experimental validation.

Following rebuttal and discussion, all reviewers unanimously recommend acceptance.