PaperHub
Overall rating: 4.3/10 · Rejected · 4 reviewers
Reviewer scores: 5, 3, 6, 3 (min 3, max 6, std 1.3; average 4.3)
Confidence
Correctness: 3.0
Contribution: 2.3
Presentation: 2.5
ICLR 2025

Dual-level Affinity Induced Embedding-free Multi-view Clustering with Joint-alignment

OpenReview · PDF
Submitted: 2024-09-13 · Updated: 2025-02-05

Abstract

Keywords
Multi-view Clustering, Large-scale Clustering, Anchor Clustering

Reviews and Discussion

Review (Rating: 5)

This paper proposes a dual-level affinity induced embedding-free multi-view clustering method with joint alignment, called DLA-EF-JA. Building on previous anchor-based multi-view clustering, it further considers the relations among anchors by learning an affinity matrix that is used to guide the anchor matrix learning through a graph Laplacian. The multi-view anchors are adaptively aligned, and the discrete cluster indicator is also jointly learned.

Strengths

  1. This paper is easy to read.
  2. Extensive experiments are conducted to show both the effectiveness and the efficiency of the method.

Weaknesses

  1. The novelty of this paper is incremental. The authors consider the relations among samples by self-expression affinity learning, and add a graph-based Laplacian for anchor matrix regularization. However, self-expression affinity learning and the graph-based Laplacian are widely used in existing subspace clustering works. It also remains unclear why the anchor self-expression enhances the quality of anchors.

  2. Why learn an anchor affinity matrix $\mathbf{S}_p$ for each view separately? It seems to overlook inter-view interactions. Why not directly learn a consensus anchor affinity matrix? Will it improve the performance?

  3. How do you set the number of anchors $k$? What is the influence of it?

  4. The experimental results are not convincing. For instance, OrthNTF achieves 69.4% Acc and 68.6% NMI values on the Reuters dataset, while this paper only reports 28.67% Acc and 3.07% NMI.

Questions

See weaknesses.

Comment

Q3: How do you set the number of anchors? What is the influence of it?

A3: Thanks. In experiments, we set the number of anchors to be equal to the number of clusters. The reasons are as follows.

When updating the variable $\mathbf{T}_p$, the objective function is $\operatorname{Tr}\left( \mathbf{T}_{p}^{\top} \mathbf{G}_p \mathbf{T}_{p} \left( \lambda \mathbf{H}_{p} + \boldsymbol{\alpha}_p^2 \mathbf{M}_p - 2\lambda \mathbf{S}_{p}^{\top} \right) - 2\boldsymbol{\alpha}_p^2 \mathbf{T}_{p}^{\top} \mathbf{J}_{p} \right)$, which has the form $\mathbf{A}^\top \mathbf{B} \mathbf{A} \mathbf{C} + \mathbf{A}^{\top} \mathbf{D}$. Besides, the feasible region $\{ \mathbf{T}_p^{\top} \mathbf{1}=\mathbf{1}, \mathbf{T}_p \mathbf{1}=\mathbf{1}, \mathbf{T}_p \in \{0,1\}^{m \times m} \}$ is discrete. These properties make this optimization problem hard to solve. To this end, we adopt a traversal search over one-hot vectors to obtain the optimal solution. The traversal search takes $\mathcal{O}(m!)$ computing overhead, where $m$ is the number of anchors, so too large an $m$ would induce intensive time cost. Therefore, in all experiments, we set $m$ to the number of clusters $k$. More ingenious solving schemes could be investigated in the future.
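To make the traversal search concrete, here is a minimal NumPy sketch of this step (an illustration under our notation, not the released implementation); it assumes $\mathbf{G}_p$, $\mathbf{H}_p$, $\mathbf{M}_p$, $\mathbf{S}_p$, $\mathbf{J}_p$, $\lambda$ and $\boldsymbol{\alpha}_p$ are already computed from the other variables.

```python
import itertools
import numpy as np

def solve_Tp(G, H, M, S, J, lam, alpha):
    """Traversal search for the T_p subproblem: enumerate all m x m
    permutation matrices and keep the minimizer of the trace objective.
    Feasible only for small m (O(m!) candidates), hence m = k."""
    m = G.shape[0]
    coeff = lam * H + alpha**2 * M - 2.0 * lam * S.T  # fixed right-hand factor
    best_T, best_val = None, np.inf
    for perm in itertools.permutations(range(m)):
        T = np.eye(m)[:, list(perm)]                  # columns are one-hot vectors
        val = np.trace(T.T @ G @ T @ coeff - 2.0 * alpha**2 * T.T @ J)
        if val < best_val:
            best_val, best_T = val, T
    return best_T
```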

Q4: The experimental results are not convincing. For instance, OrthNTF achieves 69.4% Acc and 68.6% NMI values on the Reuters dataset, while this paper only reports 28.67% Acc and 3.07% NMI.

A4: Thanks. We apologize for any unnecessary confusion caused to Reviewer zrPV.

This is mainly because the default hyper-parameter settings in the released code are inconsistent with those in the paper. We have carefully corrected the relevant parameter settings according to the suggestions in their paper and re-executed the experiments. Please check the clustering result comparison table. In particular, for OrthNTF on the Reuters dataset, the anchor number is automatically 100 in their experiments while it is automatically 93 in ours; the reason is that the dataset versions adopted in the experiments differ, and accordingly the generated clustering results differ. We execute OrthNTF under anchorRate $\in \{0.1, 0.2, 0.3, \cdots, 1.0\}$, $p \in \{0.1, 0.2, 0.3, \cdots, 1.0\}$, and $\lambda \in \{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000\}$, and report the highest results.
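For reference, the grid traversal described above can be sketched as follows; `run_orthntf` is a hypothetical placeholder for a call into the released OrthNTF code (it is not a real API of that code) that returns the accuracy of one run.

```python
import itertools

def run_orthntf(anchor_rate, p, lam):
    # Hypothetical placeholder: wire this to the released OrthNTF code.
    return 0.0

anchor_rates = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
p_values = [round(0.1 * i, 1) for i in range(1, 11)]      # 0.1, 0.2, ..., 1.0
lambdas = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10,
           50, 100, 500, 1000, 5000, 10000]

# Traverse the full grid and keep the best-performing configuration.
best_cfg = max(itertools.product(anchor_rates, p_values, lambdas),
               key=lambda cfg: run_orthntf(*cfg))
```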


Thanks very much to Reviewer zrPV for bringing this point to our attention!

Comment

Dear reviewer zrPV,

We greatly value your insightful and constructive feedback. We hope that our response has addressed your concerns. If you have any further suggestions or questions, please do not hesitate to share them. We are more than willing to discuss them with you.

We fully understand that you are extremely busy, so would greatly appreciate your time in this process.

Best wishes,

The authors of 195

Comment

We sincerely thank Reviewer zrPV for the insightful feedback and guidance on revising this manuscript. All concerns have been carefully responded to point by point. We sincerely hope these issues have been resolved.

Q1: The self-expression affinity learning and graph-based Laplacian are widely used in existing subspace clustering works. It remains unclear why the anchor self-expression enhances the quality of anchors.

A1: Thanks. In subspace clustering, self-expression affinity learning is utilized to construct the full-size sample-sample affinity. Inspired by this, we explicitly extract the global structure between anchors via self-expression learning, and meanwhile feed it into the anchor-sample affinity so as to better exploit the manifold characteristics hidden within samples. (Kindly note that in this work, we did not calculate the sample-sample relations through self-expression learning.) In addition, our work designs a joint-alignment mechanism which does not involve selecting the baseline view and meanwhile can cooperate with the learning of anchors. Moreover, a solving scheme with linear complexity enables our framework to effectively tackle MVC tasks.


Anchor self-expression learning helps extract the geometric characteristics between anchors, and meanwhile facilitates the learning of anchors owing to the joint-optimization mechanism. To validate this point, we organize four groups of ablation experiments, i.e., No self-expression + No learning (NSNL), No self-expression + Having learning (NSHL), Having self-expression + No learning (HSNL), Having self-expression + Having learning (HSHL, i.e., Ours). The comparison results are reported in the following table.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NSNL | 60.42 | 43.88 | 27.96 | 14.33 | 25.32 | 19.73 | 41.27 |
| NSHL | 71.51 | 49.05 | 30.35 | 16.75 | 47.05 | 26.69 | 52.15 |
| HSNL | 65.64 | 64.59 | 30.24 | 16.68 | 27.20 | 24.08 | 47.21 |
| HSHL | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NSNL | 64.37 | 35.68 | 5.88 | 1.01 | 1.38 | 12.57 | 44.79 |
| NSHL | 83.97 | 40.21 | 6.02 | 2.53 | 23.19 | 15.48 | 58.13 |
| HSNL | 69.84 | 37.95 | 33.54 | 1.06 | 1.43 | 12.98 | 47.07 |
| HSHL | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NSNL | 63.76 | 48.43 | 28.79 | 23.42 | 33.87 | 16.86 | 37.64 |
| NSHL | 73.79 | 51.25 | 30.42 | 28.54 | 43.04 | 17.70 | 46.77 |
| HSNL | 69.33 | 61.54 | 30.40 | 24.43 | 35.25 | 18.03 | 41.43 |
| HSHL | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

As seen, HSNL consistently outperforms NSNL, and HSHL consistently outperforms NSHL, which demonstrates that the anchor self-expression facilitates clustering performance. Additionally, NSHL consistently outperforms NSNL, and HSHL consistently outperforms HSNL, which illustrates that the anchor learning helps improve the clustering results. Therefore, we can conclude that anchor self-expression learning enhances the quality of anchors and thereby improves the clustering results.

Comment

Q2: Why learn an anchor affinity matrix $\mathbf{S}_p$ for each view separately? It seems to overlook inter-view interactions. Why not directly learn a consensus anchor affinity matrix? Will it improve the performance?

A2: Thanks. The motivation is that each view typically has exclusive characteristics, and learning an anchor affinity matrix for each view can better exploit the features of each view itself.


About the inter-view interactions, all anchor-anchor and anchor-sample affinities on the views can communicate with each other via the shared cluster indicator matrix $\mathbf{C}$ owing to the joint-optimization mechanism. (Kindly note that $\mathbf{L_s}$ consists of the anchor-anchor affinity $\mathbf{S}_p$.)


Directly learning a consensus anchor affinity matrix for all views could omit some informative features of certain views. Of course, this is also a flexible scheme for MVC. We conduct some experiments to validate the clustering performance in this situation, as shown in the following table, where 'CAA' denotes the results based on the consensus anchor affinity. (We do not include the variance terms since they are all zero because of the embedding-free property.)

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CAA | 81.23 | 71.33 | 49.73 | 24.97 | 48.46 | 22.36 | 53.38 |
| Ours | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CAA | 82.76 | 42.26 | 39.87 | 6.03 | 29.89 | 13.43 | 51.97 |
| Ours | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CAA | 80.64 | 69.72 | 42.28 | 25.21 | 46.13 | 19.58 | 52.21 |
| Ours | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

It can be seen that we obtain preferable results in most cases.

Review (Rating: 3)

This paper aims to address these problems: (1) existing methods focus only on the affinity relationship between anchors and samples, while overlooking that between anchors; (2) the cluster order is inconsistent across views, and accordingly anchors encounter a misalignment issue due to the lack of data labels. The proposed method explicitly exploits the geometric properties between anchors via self-expression learning, and utilizes a topology learning strategy to feed the captured anchor-anchor features into the anchor-sample graph so as to explore the manifold structure hidden within samples more adequately. Experiments on multiple publicly available datasets confirm the effectiveness of the proposed method.

Strengths

  1. The proposed method considers the affinity relationship between anchors.
  2. The proposed method devises a joint-alignment mechanism that not only eliminates the need for selecting the baseline view but also coordinates well with the generation of anchors.
  3. The proposed method has linear complexity for the loss function.

Weaknesses

  1. The novelty of this work is limited since the involved components have been widely used for anchor learning and spectral clustering. The authors only apply these components to the anchor data.
  2. The authors do not compare the proposed method with these popular deep learning ones.

Questions

The authors should check the data for some methods, such as OrthNTF and GSC, since the performance of these new methods is very poor.

Comment

We sincerely thank Reviewer 2FNR for the profound comments and guidance on revising this manuscript. All concerns have been carefully responded to point by point. We sincerely hope these issues have been resolved.

Q1: The novelty of this work is limited since the involved components have been widely used for anchor learning and spectral clustering.

A1(1): Thanks. We emphasize that current alignment strategies typically require selecting the baseline view and are also separated from anchor generation. Different from them, we learn to align based on the characteristics of each view itself and meanwhile jointly conduct anchor generation and anchor alignment, which enables them to negotiate with each other. Besides, we explicitly take into account the geometric properties between (aligned) anchors, and feed them into the anchor-sample affinity to extract the manifold structure hidden within the original samples more adequately. In particular, we also give a feasible solving scheme with linear complexity to optimize the resulting objective.

To demonstrate the strengths of our alignment strategy, we conduct experiments comparing with several remarkable alignment algorithms, namely FMVACC [1], 3AMVC [2], and AEVC [3].

FMVACC utilizes feature information and structure information of the bipartite graph generated by fixed anchors to build the matching relationship, and regards the first view of each dataset as the baseline view.

3AMVC gets rid of prior knowledge by identifying and selecting discriminative anchors within a single view using hierarchical searching, and takes the view exhibiting the highest anchor graph quality as the baseline view.

AEVC narrows the spatial distribution of anchors on similar views by leveraging the inter-view correlations to enhance the expression ability of anchors, and treats the view concatenated by column as the baseline view.

The comparison results are shown in the following table.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FMVACC | 74.13 (±3.36) | 53.34 (±2.84) | 47.10 (±4.07) | 22.95 (±0.89) | 42.31 (±3.17) | 25.58 (±0.66) | 55.44 (±2.25) |
| 3AMVC | 62.60 (±5.50) | 38.17 (±3.58) | 44.73 (±3.74) | 33.31 (±1.19) | 49.56 (±2.45) | 26.38 (±0.93) | 57.24 (±2.26) |
| AEVC | 91.82 (±3.78) | 44.61 (±4.31) | 39.68 (±1.36) | 29.85 (±0.02) | 50.88 (±0.24) | 27.56 (±1.11) | 54.89 (±0.63) |
| Ours | 85.47 (±0.00) | 80.66 (±0.00) | 52.44 (±0.00) | 26.22 (±0.00) | 54.26 (±0.00) | 26.83 (±0.00) | 57.36 (±0.00) |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FMVACC | 80.78 (±4.44) | 38.41 (±2.92) | 33.50 (±2.56) | 9.94 (±1.54) | 28.50 (±2.29) | 12.86 (±0.67) | 57.82 (±0.93) |
| 3AMVC | 57.43 (±3.28) | 41.45 (±4.42) | 26.63 (±3.37) | 11.68 (±2.11) | 31.03 (±1.66) | 14.02 (±0.71) | 58.61 (±1.62) |
| AEVC | 86.62 (±1.95) | 49.15 (±0.86) | 17.79 (±0.67) | 6.44 (±0.02) | 24.47 (±0.06) | 13.32 (±0.50) | 53.55 (±0.34) |
| Ours | 89.97 (±0.00) | 45.25 (±0.00) | 43.70 (±0.00) | 6.25 (±0.00) | 31.87 (±0.00) | 15.64 (±0.00) | 59.21 (±0.00) |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FMVACC | 80.15 (±7.13) | 41.01 (±4.20) | 38.20 (±1.89) | 23.79 (±0.77) | 43.86 (±2.61) | 17.07 (±0.35) | 48.78 (±1.94) |
| 3AMVC | 56.16 (±4.80) | 38.28 (±3.01) | 32.61 (±2.53) | 27.58 (±1.50) | 41.13 (±1.30) | 17.40 (±0.40) | 47.61 (±1.62) |
| AEVC | 87.94 (±3.70) | 46.16 (±1.37) | 26.57 (±1.24) | 22.19 (±0.01) | 36.19 (±0.63) | 17.15 (±0.27) | 45.99 (±0.53) |
| Ours | 87.92 (±0.00) | 78.12 (±0.00) | 41.12 (±0.00) | 28.55 (±0.00) | 44.84 (±0.00) | 20.64 (±0.00) | 51.37 (±0.00) |

From this table, one can observe that our results are more desirable in most cases, which illustrates the advantage of our proposed alignment strategy.

[1] Wang et al., Align then fusion: Generalized large-scale multi-view clustering with anchor matching correspondences, NeurIPS, 2022.

[2] Ma et al., Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering, ACM MM, 2024.

[3] Liu et al., Learn from view correlation: An anchor enhancement strategy for multi-view clustering, IEEE CVPR, 2024.

Comment

A1(2): Additionally, we also conduct experiments to validate the effectiveness of our alignment strategy, as shown in the following table, where 'Wo-A' and 'WA' denote the results without and with alignment, respectively.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wo-A | 80.73 | 76.59 | 31.65 | 16.67 | 45.29 | 25.91 | 53.68 |
| WA | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wo-A | 82.53 | 39.55 | 35.41 | 3.32 | 24.77 | 15.30 | 56.47 |
| WA | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wo-A | 79.47 | 72.23 | 30.69 | 21.14 | 42.59 | 17.90 | 47.41 |
| WA | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

As seen, our alignment strategy is effective and clearly improves the clustering results.


Further, we also perform alignment separately, as current methods do, and the comparison results are reported in the following table, where 'SA' denotes the results based on separate alignment.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 72.37 | 62.43 | 45.78 | 22.87 | 44.36 | 22.98 | 51.22 |
| Ours | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 71.97 | 41.24 | 34.76 | 5.78 | 27.31 | 15.73 | 52.73 |
| Ours | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 70.38 | 59.32 | 33.47 | 25.46 | 39.84 | 17.96 | 46.31 |
| Ours | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

Evidently, our joint-alignment mechanism yields more impressive clustering results.


Furthermore, we also conduct experiments to demonstrate that the geometric features between anchors are beneficial to clustering performance. The comparison results are presented in the following table, where 'NAA' denotes the results without considering the characteristics between anchors.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NAA | 71.51 | 49.05 | 30.35 | 16.75 | 47.05 | 26.69 | 52.15 |
| Ours | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NAA | 83.97 | 40.21 | 6.02 | 2.53 | 23.19 | 15.48 | 58.13 |
| Ours | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NAA | 73.79 | 51.25 | 30.42 | 28.54 | 43.04 | 17.70 | 46.77 |
| Ours | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

It can be seen that our results involving anchor-anchor characteristics are more encouraging.

Q2: The authors do not compare the proposed method with these popular deep learning ones.

A2(1): Thanks. We conduct comparative experiments with the deep learning methods AdaGAE [1], DEMVC [2], MFLVC [3], and DSMVC [4].

AdaGAE utilizes a graph auto-encoder to extract the potential high-level information behind data and the non-euclidean structure, and avoids the collapse by building the connections between sub-clusters before they become thoroughly random in the latent space.

DEMVC generates the embedded feature representations by deep auto-encoders, and adopts the auxiliary distribution generated by $k$-means to refine the deep auto-encoders and clustering soft assignments for all views.

MFLVC learns different levels of features from the raw features in a fusion-free manner to alleviate the conflict between learning consistent common semantics and reconstructing inconsistent view-private information.

DSMVC concurrently exploits complementary information and discards the meaningless noise by automatically selecting features to reduce the risk of clustering performance degradation caused by view increase.

[1] Li et al., Adaptive Graph Auto-Encoder for General Data Clustering, IEEE TPAMI, 2022.

[2] Xu et al., Deep Embedded Multi-view Clustering with Collaborative Training, Information Sciences, 2021.

[3] Xu et al., Multi-Level Feature Learning for Contrastive Multi-View Clustering, IEEE CVPR, 2022.

[4] Tang et al., Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase, IEEE CVPR, 2022.
Comment

A2(2): The comparison results are reported in the following table.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AdaGAE | 67.88 (±0.99) | 42.20 (±0.94) | 23.45 (±0.29) | 19.43 (±1.76) | --- | --- | --- |
| DEMVC | 40.50 (±0.88) | 54.41 (±1.76) | 30.54 (±1.64) | 24.58 (±1.84) | 53.05 (±0.91) | 26.75 (±1.16) | 51.28 (±0.95) |
| MFLVC | 80.73 (±0.47) | 43.42 (±0.26) | 31.02 (±0.82) | 25.42 (±1.47) | --- | --- | --- |
| DSMVC | 72.35 (±0.96) | 41.66 (±1.30) | 28.88 (±0.93) | 25.04 (±0.47) | 53.66 (±0.82) | 21.28 (±0.99) | 55.33 (±0.58) |
| Ours | 85.47 (±0.00) | 80.66 (±0.00) | 52.44 (±0.00) | 26.22 (±0.00) | 54.26 (±0.00) | 26.83 (±0.00) | 57.36 (±0.00) |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AdaGAE | 78.47 (±0.36) | 39.28 (±0.19) | 5.23 (±0.68) | 3.22 (±0.27) | --- | --- | --- |
| DEMVC | 31.03 (±0.11) | 16.70 (±0.64) | 6.34 (±0.30) | 4.84 (±0.90) | 34.21 (±0.35) | 16.18 (±0.90) | 59.74 (±0.91) |
| MFLVC | 81.23 (±0.10) | 58.74 (±0.15) | 12.97 (±0.14) | 3.25 (±0.90) | --- | --- | --- |
| DSMVC | 76.15 (±0.14) | 36.68 (±0.18) | 8.14 (±0.55) | 4.36 (±0.41) | 35.43 (±0.10) | 8.82 (±0.46) | 55.33 (±0.97) |
| Ours | 89.97 (±0.00) | 45.25 (±0.00) | 43.70 (±0.00) | 6.25 (±0.00) | 31.87 (±0.00) | 15.64 (±0.00) | 59.21 (±0.00) |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AdaGAE | 67.74 (±0.79) | 50.51 (±0.41) | 23.68 (±0.14) | 19.61 (±1.23) | --- | --- | --- |
| DEMVC | 41.80 (±1.04) | 50.60 (±1.59) | 27.66 (±1.52) | 22.69 (±1.14) | 56.39 (±1.68) | 23.68 (±1.57) | 48.39 (±0.66) |
| MFLVC | 73.92 (±1.63) | 52.68 (±1.43) | 32.41 (±1.05) | 25.13 (±0.67) | --- | --- | --- |
| DSMVC | 73.79 (±1.89) | 51.00 (±1.28) | 30.14 (±1.12) | 25.01 (±0.22) | 56.85 (±1.54) | 21.01 (±1.85) | 55.03 (±1.86) |
| Ours | 87.92 (±0.00) | 78.12 (±0.00) | 41.12 (±0.00) | 28.55 (±0.00) | 44.84 (±0.00) | 20.64 (±0.00) | 51.37 (±0.00) |

As seen, even against deep learning methods, our results are still comparable.

Q3: The authors should check the data for some methods, such as OrthNTF and GSC, since the performance of these new methods is very poor.

A3: Thanks. We deeply apologize for causing any confusion to Reviewer 2FNR. We have carefully checked these results. The reason for this phenomenon is that the default hyper-parameter settings in the released code are inconsistent with those in the paper. We have corrected these parameters according to the guidance in their paper and re-run the comparative experiments. Please check the result comparison table. We sincerely thank Reviewer 2FNR for pointing this out, which significantly helps us improve the manuscript.

Comment

Dear reviewer 2FNR,

We greatly value your insightful and constructive feedback. We hope that our response has addressed your concerns. If you have any further suggestions or questions, please do not hesitate to share them. We are more than willing to discuss them with you.

We fully understand that you are extremely busy, so would greatly appreciate your time in this process.

Best wishes,

The authors of 195

Review (Rating: 6)

This paper presents the DLA-EF-JA model, a multi-view clustering technique that leverages dual-level affinity to capture both anchor-sample and anchor-anchor relationships within data. The model introduces a joint-alignment mechanism to address the anchor misalignment problem across views, which eliminates the need for a baseline view. Unlike traditional embedding methods, DLA-EF-JA generates cluster labels directly, reducing variance and improving clustering stability. Extensive experiments across diverse datasets demonstrate that the proposed model achieves competitive performance compared to existing multi-view clustering methods.

Strengths

  1. The model’s dual-level affinity mechanism effectively captures both anchor-sample and anchor-anchor relationships, enhancing clustering accuracy by leveraging a fuller view of the data structure.
  2. The flexible joint-alignment method addresses anchor misalignment issues without requiring a fixed baseline view, making the model versatile for clustering data from different sources.
  3. The model's effectiveness is demonstrated through comprehensive evaluation on multiple datasets, highlighting its adaptability and strong performance across different data types and views.

Weaknesses

  1. Limited Learning of Cross-View Complementarity: While the model integrates anchor relations, it lacks complex constraints like the Schatten p-norm that could help capture deeper cross-view complementarities. This may limit the model’s ability to fully leverage unique, complementary information in views with highly distinct features or dimensions. How does the model handle scenarios where the quality of anchors varies significantly across different views?

  2. Necessity of Anchor Alignment: The reliance on anchor alignment to maintain cross-view consistency introduces additional computational steps. Although this approach appears beneficial, some recent multi-view clustering methods successfully avoid alignment through feature space fusion or shared representations. It would be useful for the authors to elaborate on the essential role of anchor alignment in this model and under what conditions it might be adapted or simplified. Are there specific conditions or datasets where the necessity of anchor alignment might be relaxed or modified?

  3. Complexity of the Model: The model is somewhat complex, introducing more variables and mathematical processes. A more detailed explanation of the transition from Equation 2 to Equation 3 would enhance reader understanding of the methodology.

  4. Hyperparameter Tuning Requirement: The model’s performance is sensitive to carefully tuned hyperparameters, such as λ and β. Can the authors provide further insights into the potential effects of anchor noise and how it could be mitigated to improve robustness?

Questions

Same as the weaknesses section.

Comment

Q3: The model is somewhat complex, introducing more variables and mathematical processes. A more detailed explanation of the transition from Eq.(2) to Eq.(3) would enhance reader understanding of the methodology.

A3: Good suggestion! We explain the objective transition here in as much detail as possible.

The objective function in Eq.(2) is $\sum_{p=1}^{v} \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{Z}_{p} \right\|_F^2 + \lambda \left\| \mathbf{A}_{p} - \mathbf{A}_{p} \mathbf{S}_{p} \right\|_F^2 + \beta \sum_{i,j=1}^{m} \left\| [\mathbf{Z}_p]_{i,:} - [\mathbf{Z}_p]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$.

The objective function in Eq.(3) is $\sum_{p=1}^{v} \boldsymbol{\alpha}_p^2 \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C} \right\|_F^2 + \lambda \left\| \mathbf{A}_{p} \mathbf{T}_p - \mathbf{A}_{p} \mathbf{T}_p \mathbf{S}_{p} \right\|_F^2 + \beta \operatorname{Tr}\left( \mathbf{B}_p^{\top} \mathbf{L_s} \mathbf{B}_p \mathbf{C} \mathbf{C}^{\top} \right)$.

Firstly, considering that the essence of anchor misalignment is that the order of anchors on different views is not identical, we can eliminate the misalignment problem by rearranging anchors. Specifically, we associate each view with a learnable matrix $\mathbf{T}_p$ to flexibly transform anchors according to the characteristics of the respective view itself. (Kindly note that, owing to transforming anchors in the original view space, this well preserves the view diversity.) Accordingly, the anchor matrix $\mathbf{A}_p$ on each view is reformulated as $\mathbf{A}_p \mathbf{T}_p$, the self-expression affinity learning term $\left\| \mathbf{A}_{p} - \mathbf{A}_{p} \mathbf{S}_{p} \right\|_F^2$ becomes $\left\| \mathbf{A}_{p} \mathbf{T}_p - \mathbf{A}_{p} \mathbf{T}_p \mathbf{S}_{p} \right\|_F^2$, and the reconstruction error term $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{Z}_{p} \right\|_F^2$ becomes $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_p \mathbf{Z}_{p} \right\|_F^2$.

Then, since variance arises from the construction of the embedding, we avoid forming the embedding and choose to directly learn the cluster indicators. We factorize the anchor graph as a basic coefficient matrix and a consensus matrix, and utilize binary learning to optimize the consensus matrix. Therefore, the reconstruction error term $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_p \mathbf{Z}_{p} \right\|_F^2$ is reformulated as $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_p \mathbf{B}_{p} \mathbf{C} \right\|_F^2$, and the point-point guidance $\sum_{i,j=1}^{m} \left\| [\mathbf{Z}_p]_{i,:} - [\mathbf{Z}_p]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$ is reformulated as $\sum_{i,j=1}^{m} \left\| [\mathbf{B}_p \mathbf{C}]_{i,:} - [\mathbf{B}_p \mathbf{C}]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$, which can be equivalently written in the matrix trace form $\operatorname{Tr}(\mathbf{B}_p^{\top} \mathbf{L_s} \mathbf{B}_p \mathbf{C} \mathbf{C}^{\top})$, where $\mathbf{L_s} = \mathbf{D}_p - \mathbf{S}_p$ and $\mathbf{D}_p = \operatorname{diag}(\sum_{j=1}^{m} [\mathbf{S}_p]_{i,j} \mid i = 1, \cdots, m)$. (Kindly note that the consensus cluster indicator matrix $\mathbf{C}$ provides a common structure for anchors on all views, inducing them to rearrange towards the common structure.)

Finally, considering that views typically have different levels of importance, we introduce a learnable weighting variable for each view to automatically measure its contribution. Therefore, $\sum_{p=1}^{v} \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_p \mathbf{B}_{p} \mathbf{C} \right\|_F^2$ is reformulated as $\sum_{p=1}^{v} \boldsymbol{\alpha}_p^2 \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_p \mathbf{B}_{p} \mathbf{C} \right\|_F^2$.

At this point, we obtain the objective function as in Eq.(3).
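As a numerical sanity check on the trace rewriting above, the following NumPy snippet verifies the identity for a symmetric affinity (for symmetric $\mathbf{S}_p$ the point-point sum equals twice the trace term; the constant factor can be absorbed into $\beta$). The symmetry assumption here is ours, made only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 6, 3, 10
S = rng.random((m, m)); np.fill_diagonal(S, 0.0)
S = (S + S.T) / 2.0                               # symmetric affinity, zero diagonal
B = np.linalg.qr(rng.standard_normal((m, k)))[0]  # B_p with B_p^T B_p = I_k
C = np.eye(k)[:, rng.integers(0, k, n)]           # one-hot cluster indicator columns

Z = B @ C                                         # rows indexed by anchors
lhs = sum(S[i, j] * np.sum((Z[i] - Z[j]) ** 2)
          for i in range(m) for j in range(m))
L_s = np.diag(S.sum(axis=1)) - S                  # L_s = D_p - S_p
rhs = 2.0 * np.trace(B.T @ L_s @ B @ C @ C.T)
print(np.isclose(lhs, rhs))                       # True
```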

For the feasible region, $\{ \mathbf{T}_p^{\top} \mathbf{1}=\mathbf{1}, \mathbf{T}_p \mathbf{1}=\mathbf{1}, \mathbf{T}_p \in \{0,1\}^{m \times m} \}$ means only rearranging anchors without changing the anchor values. $\{ \mathbf{S}_{p}^{\top} \mathbf{1}=\mathbf{1}, \mathbf{S}_{p} \geq 0, \sum_{i=1}^{m} [\mathbf{S}_{p}]_{i,i}=0 \}$ means expressing each anchor using the other anchors while preventing an anchor from expressing itself. $\{\mathbf{B}_p^{\top} \mathbf{B}_p = \mathbf{I}_k\}$ means learning discriminative basic coefficients. $\{ \sum_{i=1}^{k} \mathbf{C}_{i,j} = 1, j = 1, 2, \dots, n, \mathbf{C} \in \{0,1\}^{k \times n} \}$ means that each column has only one non-zero element, that is, each sample belongs to only one cluster. $\{ \boldsymbol{\alpha}^{\top} \mathbf{1} = 1, \boldsymbol{\alpha} \geq 0 \}$ normalizes the view weights and meanwhile avoids trivial solutions.

Comment

Q4: The model’s performance is sensitive to carefully tuned hyperparameters, such as $\lambda$ and $\beta$. Can the authors provide further insights into the potential effects of anchor noise and how it could be mitigated to improve robustness?

A4: Thanks. These parameters control critical trade-offs in the model. $\lambda$ governs the balance between the reconstruction loss and the anchor self-expression regularization, while $\beta$ influences inter-view consistency. Mis-tuning these parameters could lead to sub-optimal performance or instability, especially when noisy anchors are introduced.


Anchor noise could have the following potential effects:

  • it might induce the model to capture the noise patterns instead of the underlying true relationships in the data, leading to over-fitting.

  • it might cause the loss surface to become irregular, complicating optimization and making the choice of hyper-parameters λ\lambda and β\beta even more crucial.

  • it might bias the model towards spurious correlations, reducing the ability to generalize to unseen data.

  • it might distort the guidance provided to the model, leading to incorrect representations in subsequent optimization.


To mitigate anchor noise, some possible schemes are as follows:

  • introduce a pre-filtering mechanism or utilize confidence-based thresholds to exclude potentially noisy anchors during learning (see the sketch after this list).

  • adopt multiple anchor sets from different initialization or sampling strategies and combine their outputs to mitigate individual noise effects.

  • carefully pre-process the datasets to identify and remove instances of anchor noise using the outlier detection techniques.

  • incorporate some prior knowledge to adaptively tune λ\lambda and β\beta during learning based on the noise levels.
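As an illustration of the first scheme, below is a hypothetical confidence-based pre-filtering sketch (not part of the proposed model); the nearest-sample scoring rule and the keep ratio are assumptions made purely for illustration.

```python
import numpy as np

def filter_anchors(A, X, keep_ratio=0.8, n_nearest=10):
    """Hypothetical confidence-based anchor pre-filtering (illustration only).
    A: (m, d) anchors as rows; X: (n, d) samples as rows."""
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # (m, n) squared distances
    # Confidence = negative mean distance to the n_nearest closest samples.
    conf = -np.sort(d2, axis=1)[:, :n_nearest].mean(axis=1)
    keep = np.argsort(conf)[::-1][: max(1, int(keep_ratio * len(A)))]
    return A[np.sort(keep)]
```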

Comment

Dear reviewer 4XPe,

We greatly value your insightful and constructive feedback. We hope that our response has addressed your concerns. If you have any further suggestions or questions, please do not hesitate to share them. We are more than willing to discuss them with you.

We fully understand that you are extremely busy, so would greatly appreciate your time in this process.

Best wishes,

The authors of 195

Comment

We sincerely thank Reviewer 4XPe for the thoughtful suggestions and guidance on revising this manuscript. All concerns have been carefully responded to point by point. We sincerely hope these issues have been resolved.

Q1: It lacks complex constraints like the Schatten p-norm that could help capture deeper cross-view complementarities. How does the model handle scenarios where the quality of anchors varies significantly across different views?

A1: Thanks! This is a very promising research direction. The tensor Schatten p-norm is commonly regarded as a good means of exploiting the complementary information between views, as in [1], [2], [3], [4], [5], [6], etc. Our model at present does not involve the Schatten p-norm, and adopts the shared cluster indicator matrix to capture view-complementary information. Including the Schatten p-norm could further enhance the clustering ability of our model; we will make efforts to explore this in the future. We sincerely appreciate Reviewer 4XPe's fairly constructive suggestions!

About handling scenarios where the quality of anchors varies significantly across views: in this work, we utilize a learnable view-wise weighting scheme to adaptively adjust the contribution of each view and thereby balance the importance of the anchors on that view. Perhaps an anchor-wise weighting scheme would be more advisable, since anchors within the same view could also have diverse importance. We will investigate this further in the future. We thank Reviewer 4XPe for bringing this to our attention.

Q2: The reliance on anchor alignment to maintain cross-view consistency introduces additional computational steps. Some methods avoid alignment through feature space fusion or shared representations. It would be useful for the authors to elaborate on the essential role of anchor alignment and under what conditions it might be adapted or simplified. Are there specific conditions or datasets where the necessity of anchor alignment might be relaxed or modified?

A2(1): Thanks. The anchor alignment indeed introduces additional computational steps due to the need for optimizing the permutation variables.


The works based on feature-space fusion or shared representations usually extract one group of unified anchors rather than multiple groups of view-specific anchors to construct the similarity relationship. Although it avoids alignment, this paradigm cannot effectively exploit the complementary information between views since the unified anchors are shared across all views. To further illustrate this point, we organize comparison experiments between unified anchors (UA) and view-specific anchors (VSA, i.e., ours). The results are presented in the following table.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UA | 81.23 | 71.33 | 49.73 | 24.97 | 48.46 | 22.36 | 53.38 |
| VSA | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UA | 82.76 | 42.26 | 39.87 | 6.03 | 29.89 | 13.43 | 51.97 |
| VSA | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UA | 80.64 | 69.72 | 42.28 | 25.21 | 46.13 | 19.58 | 52.21 |
| VSA | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

As seen, our results are preferable to the counterparts adopting unified anchors. The reason could be that the view-exclusive complementary information carried by the view-specific (aligned) anchors outweighs the view-common consensus information carried by the unified anchors.

[1] Guo et al., Logarithmic Schatten-p Norm Minimization for Tensorial Multi-View Subspace Clustering, IEEE TPAMI, 2023.

[2] Xia et al., Tensorized Bipartite Graph Learning for Multi-view Clustering, IEEE TPAMI, 2023.

[3] Feng et al., Federated Fuzzy C-means with Schatten-p Norm Minimization, ACM MM, 2024.

[4] Li et al., Label Learning Method Based on Tensor Projection, ACM KDD, 2024.

[5] Sun et al., Improved Weighted Tensor Schatten p-Norm for Fast Multi-view Graph Clustering, ACM MM, 2024.

[6] Wang et al., Bi-Nuclear Tensor Schatten-p Norm Minimization for Multi-View Subspace Clustering, IEEE TIP, 2023.
Comment

A2(2): About the role of anchor alignment: in this model, it aims at rearranging anchors to build pure self-expression affinities. Without alignment, the structure of the generated anchor-anchor affinity on each view becomes chaotic, which in turn deteriorates the anchor-sample relationship and hinders the clustering performance. To validate this point, we conduct comparison experiments without alignment, as shown in the following table, where 'NA' denotes the results based on no alignment.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NA | 80.73 | 76.59 | 31.65 | 16.67 | 45.29 | 25.91 | 53.68 |
| Ours | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NA | 82.53 | 39.55 | 35.41 | 3.32 | 24.77 | 15.30 | 56.47 |
| Ours | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NA | 79.47 | 72.23 | 30.69 | 21.14 | 42.59 | 17.90 | 47.41 |
| Ours | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

It can be observed that our results with alignment evidently outperform those without alignment.


About the specific conditions or datasets where the necessity of anchor alignment might be relaxed or modified: multi-view data typically contains both complementary view information and consensus view information, and thereby describes the instances in more detail. If the consensus information outweighs the complementary information, we could build unified anchors, as previous methods do, to construct the similarity and thereby avoid or relax alignment. However, due to the complexity of multi-view data, effectively measuring the complementary and consensus information is a challenging task. In the future, we will try to do this from the perspective of information theory.

Review (Rating: 3)

In this work, a multi-view clustering method with joint anchor alignment is developed, which introduces dual-level affinity and achieves embedding-free clustering. The work is designed to address several problems arising from anchor misalignment. The authors introduce a permutation mechanism for each view to jointly adjust the anchors. Besides, the method avoids learning an embedding by constructing the cluster labels directly from the original samples. A self-expression learning structure is applied to the anchors, together with a topology learning strategy that feeds the captured anchor-anchor features into the anchor-sample graph. Extensive experiments validate the effectiveness of the proposed method.

Strengths

  1. The paper is well-structured, and the authors conduct a relatively comprehensive review of the existing literature.
  2. The experimental results demonstrate the effectiveness of the work.

Weaknesses

  1. A core idea of the work is to introduce an anchor permutation matrix, while this idea has been widely adopted by previous works. Hence, the novelty of the paper might not be sufficient for publication.
  2. The comparison methods lack some of the latest works. Since the work is an anchor-alignment-based method, more related works with anchor alignment should be compared. For example, the reference Liu 2024 (in line 581) is discussed in this paper and includes an anchor alignment mechanism, but it is not compared with the proposed work.
  3. In Table 1, several compared methods exhibit extremely poor performance on some datasets (e.g., PMSC on Cora, AMGL on DeRMATO). It might be better if the authors could explain the possible reasons.
  4. Table 5 does not include all the symbols. The Methodology section might be too brief; it should be presented in more detail by explaining the reasons for the design of each component.

Questions

  1. What is the difference between the anchor alignment module and those of existing works?
  2. Why do some compared methods exhibit extremely poor performance on some datasets?
Comment

A4(2): Subsequently, considering that the variance arises from the construction of the embedding, we avoid forming the embedding and choose to directly learn the cluster indicators. Specifically, we factorize the anchor graph as a basic coefficient matrix $\mathbf{B}_p$ and a consensus matrix $\mathbf{C}$, and utilize binary learning to optimize the consensus matrix. Therefore, the term $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{Z}_{p} \right\|_F^2$ is reformulated as $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C} \right\|_F^2$. The point-point guidance term $\sum_{i,j=1}^{m} \left\| [\mathbf{Z}_p]_{i,:} - [\mathbf{Z}_p]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$ is reformulated as $\sum_{i,j=1}^{m} \left\| [\mathbf{B}_p \mathbf{C}]_{i,:} - [\mathbf{B}_p \mathbf{C}]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$, which can be equivalently transformed into the matrix trace form $\operatorname{Tr}(\mathbf{B}_p^{\top} \mathbf{L_s} \mathbf{B}_p \mathbf{C} \mathbf{C}^{\top})$, where $\mathbf{L_s} = \mathbf{D}_p - \mathbf{S}_p$ and $\mathbf{D}_p = \operatorname{diag}(\sum_{j=1}^{m} [\mathbf{S}_p]_{i,j} \mid i = 1, \dots, m)$. This paradigm not only makes the consensus matrix $\mathbf{C}$ successfully represent the cluster indicators, but also provides a common structure for anchors on all views, inducing them to rearrange towards the corresponding matching relationship.

Lastly, since views generally have different levels of importance, we assign a weighting variable to each view to adaptively adjust its contribution. Accordingly, the term $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C} \right\|_F^2$ is further reformulated as $\boldsymbol{\alpha}_p^2 \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C} \right\|_F^2$.

Based on the above analysis, the objective is formulated as $\sum_{p=1}^{v} \boldsymbol{\alpha}_p^2 \left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C} \right\|_F^2 + \lambda \left\| \mathbf{A}_{p} \mathbf{T}_p - \mathbf{A}_{p} \mathbf{T}_p \mathbf{S}_{p} \right\|_F^2 + \beta \operatorname{Tr}\left( \mathbf{B}_p^{\top} \mathbf{L_s} \mathbf{B}_p \mathbf{C} \mathbf{C}^{\top} \right)$.

The first term aims at building the similarity by minimizing the reconstruction error. The second term represents the self-expression affinity of the aligned anchors. The third term feeds the anchor-anchor characteristics into the anchor-sample affinity.

Further, the constraints $\boldsymbol{\alpha}^{\top} \mathbf{1} = 1$ and $\boldsymbol{\alpha} \geq 0$ perform normalization and meanwhile avoid trivial solutions. $\{\mathbf{B}_p^{\top} \mathbf{B}_p = \mathbf{I}_k\}$ aims at learning discriminative basic coefficients. $\{ \mathbf{T}_p^{\top} \mathbf{1}=\mathbf{1}, \mathbf{T}_p \mathbf{1}=\mathbf{1}, \mathbf{T}_p \in \{0,1\}^{m \times m} \}$ rearranges anchors while guaranteeing that the anchor values are unchanged. $\{ \sum_{i=1}^{k} \mathbf{C}_{i,j} = 1, j = 1, 2, \dots, n, \mathbf{C} \in \{0,1\}^{k \times n} \}$ guarantees that there is only one non-zero element in each column, i.e., one sample belongs to only one cluster. $\{ \mathbf{S}_{p}^{\top} \mathbf{1}=\mathbf{1}, \mathbf{S}_{p} \geq 0, \sum_{i=1}^{m} [\mathbf{S}_{p}]_{i,i}=0 \}$ guarantees that each anchor is expressed through the other anchors while avoiding self-representation.
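To make the pieces concrete, here is a minimal NumPy sketch that evaluates this objective for given variables (shapes follow the symbol table in A4(1)); it is an evaluation helper written by analogy with the formulas above, not our optimization algorithm.

```python
import numpy as np

def objective_eq3(Xs, As, Ts, Bs, Ss, C, alphas, lam, beta):
    """Evaluate the Eq.(3) objective over all views.
    Xs[p]: (d_p, n), As[p]: (d_p, m), Ts[p]: (m, m), Bs[p]: (m, k),
    Ss[p]: (m, m), C: (k, n), alphas: (v,)."""
    total = 0.0
    for X, A, T, B, S, a in zip(Xs, As, Ts, Bs, Ss, alphas):
        L_s = np.diag(S.sum(axis=1)) - S          # L_s = D_p - S_p
        total += (a**2 * np.linalg.norm(X - A @ T @ B @ C, 'fro')**2
                  + lam * np.linalg.norm(A @ T - A @ T @ S, 'fro')**2
                  + beta * np.trace(B.T @ L_s @ B @ C @ C.T))
    return total
```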

Q5: What is the difference between the anchor alignment module and those of existing works?

A5: Thanks. The main differences are as follows:

  • We do not require selecting the baseline view.

  • We can coordinate with the generation of anchors.

The selection of the baseline view not only brings a complicated solving procedure but also affects the clustering performance: if the baseline view is not well selected, the graph structure will be inaccurately fused. Unlike this, we do not require a baseline view, and anchors can be automatically rearranged according to the characteristics of each view.

Besides, our module can coordinate with the anchors in the unified framework and thereby facilitates the learning of anchors, which enables view information to interact across different levels.

Comment

Dear reviewer Ccq1,

We greatly value your insightful and constructive feedback. We hope that our response has addressed your concerns. If you have any further suggestions or questions, please do not hesitate to share them. We are more than willing to discuss them with you.

We fully understand that you are extremely busy, so would greatly appreciate your time in this process.

Best wishes,

The authors of 195

Comment

Q4: Table 5 does not include all the symbols. The Methodology section might be too brief, which should be introduced with more details by explaining the reasons for the design of each component.

A4(1): Thanks. We have updated Table 5, please check it.

The symbols in this manuscript are as follows:

| Symbol | Meaning |
| --- | --- |
| $n$ | the number of samples |
| $m$ | the number of anchors |
| $v$ | the number of views |
| $k$ | the number of clusters |
| $d_p$ | the data dimension on view $p$ |
| $\mathbf{X}_p \in \mathbb{R}^{d_p \times n}$ | the data matrix on view $p$ |
| $\mathbf{A}_p \in \mathbb{R}^{d_p \times m}$ | the anchor matrix on view $p$ |
| $\mathbf{T}_p \in \mathbb{R}^{m \times m}$ | the permutation matrix on view $p$ |
| $\mathbf{B}_p \in \mathbb{R}^{m \times k}$ | the basic coefficient matrix on view $p$ |
| $\mathbf{C} \in \mathbb{R}^{k \times n}$ | the cluster indicator matrix |
| $\mathbf{S}_p \in \mathbb{R}^{m \times m}$ | the anchor self-expression matrix on view $p$ |
| $\mathbf{D}_p \in \mathbb{R}^{m \times m}$ | the degree matrix of $\mathbf{S}_p$ on view $p$ |
| $\boldsymbol{\alpha} \in \mathbb{R}^{v \times 1}$ | the view weighting vector |
| $\mathbf{Z}_p \in \mathbb{R}^{m \times n}$ | the anchor graph on view $p$ |
| $\mathbf{L_s} \in \mathbb{R}^{m \times m}$ | the Laplacian matrix of $\mathbf{S}_p$ |
| $\mathbf{E}_p \in \mathbb{R}^{m \times n}$ | $\mathbf{T}_{p} \mathbf{B}_{p} \mathbf{C}$ |
| $\mathbf{F}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{T}_{p} - \mathbf{T}_{p} \mathbf{S}_{p}$ |
| $\mathbf{G}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{A}_p^{\top} \mathbf{A}_p$ |
| $\mathbf{H}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{S}_p \mathbf{S}_p^{\top}$ |
| $\mathbf{M}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{B}_{p} \mathbf{C} \mathbf{C}^{\top} \mathbf{B}_{p}^{\top}$ |
| $\mathbf{J}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{A}_p^{\top} \mathbf{X}_p \mathbf{C}^{\top} \mathbf{B}_p^{\top}$ |
| $\mathbf{Q}_p \in \mathbb{R}^{m \times m}$ | $\mathbf{T}_{p}^{\top} \mathbf{A}_{p}^{\top} \mathbf{A}_{p} \mathbf{T}_{p}$ |
| $\mathbf{Z} \in \mathbb{R}^{n \times k}$ | $2\sum_{p=1}^{v} \boldsymbol{\alpha}_p^2 \mathbf{X}_p^{\top} \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p}$ |
| $\mathbf{W} \in \mathbb{R}^{k \times k}$ | $\sum_{p=1}^{v} \boldsymbol{\alpha}_p^2 \mathbf{B}_{p}^{\top} \mathbf{T}_{p}^{\top} \mathbf{A}_{p}^{\top} \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{B}_{p} + \beta \mathbf{B}_p^{\top} \mathbf{L_s} \mathbf{B}_p$ |

For the methodology, we provide more details here to explain the reasons for the design of each component.

First of all, to exploit the geometric characteristics between anchors, inspired by the concept of subspace reconstruction, we introduce self-expression learning for anchors. Specifically, we utilize the paradigm $\left\| \mathbf{A}_p - \mathbf{A}_p \mathbf{S}_p \right\|_F^2$ to explicitly extract the global structure between anchors.
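A minimal projected-gradient sketch of this self-expression step is given below; it is an illustrative solver for $\min_{\mathbf{S}} \left\| \mathbf{A} - \mathbf{A}\mathbf{S} \right\|_F^2$ under the constraints $\mathbf{S}^{\top}\mathbf{1} = \mathbf{1}$, $\mathbf{S} \geq 0$ and a zero diagonal, not the update rule used in the paper.

```python
import numpy as np

def simplex_project(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def self_expression(A, iters=300):
    """Projected gradient for min_S ||A - A S||_F^2 with each column of S
    on the simplex and a zero diagonal (illustrative, not the paper's solver)."""
    m = A.shape[1]
    G = A.T @ A
    step = 0.5 / np.linalg.norm(G, 2)        # 1/L for the gradient 2(G S - G)
    S = np.full((m, m), 1.0 / (m - 1)); np.fill_diagonal(S, 0.0)
    for _ in range(iters):
        S = S - step * 2.0 * (G @ S - G)     # gradient of ||A - A S||_F^2
        for j in range(m):                   # project each column, keep diagonal zero
            off = np.arange(m) != j
            S[off, j] = simplex_project(S[off, j])
            S[j, j] = 0.0
    return S
```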

After obtaining the anchor-anchor characteristics $\mathbf{S}_p \in \mathbb{R}^{m \times m}$, we need to integrate them into the anchor-sample affinity so as to exploit the manifold features inside the samples. To this end, we adopt the idea of point-point guidance to adjust the anchor graph. Note that the rows of the anchor graph $\mathbf{Z}_p \in \mathbb{R}^{m \times n}$ correspond to anchors, and thus we utilize the element $[\mathbf{S}_p]_{i,j}$ to guide $[\mathbf{Z}_p]_{i,t}$ and $[\mathbf{Z}_p]_{j,t}$, $t = 1, \cdots, n$, which can be formulated as $\sum_{i,j=1}^{m} \left\| [\mathbf{Z}_p]_{i,:} - [\mathbf{Z}_p]_{j,:} \right\|_2^2 [\mathbf{S}_p]_{i,j}$ and aims at restricting similar features to maintain consistency.

Then, to alleviate anchor misalignment, considering that the nature of the misalignment is that the order of anchors on different views is not identical, we alleviate the issue by rearranging anchors. Particularly, we associate each view with a learnable permutation matrix $\mathbf{T}_p$ to freely transform anchors in the original dimension space according to the characteristics of the respective view. In addition to not involving the selection of a baseline view, our mechanism can also coordinate with the anchors in the unified framework and thereby facilitates the learning of anchors. Correspondingly, the anchor matrix $\mathbf{A}_p$ is reformulated as $\mathbf{A}_p \mathbf{T}_p$. The self-expression term $\left\| \mathbf{A}_p - \mathbf{A}_p \mathbf{S}_p \right\|_F^2$ and the reconstruction term $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{Z}_{p} \right\|_F^2$ are reformulated as $\left\| \mathbf{A}_p \mathbf{T}_p - \mathbf{A}_p \mathbf{T}_p \mathbf{S}_p \right\|_F^2$ and $\left\| \mathbf{X}_{p} - \mathbf{A}_{p} \mathbf{T}_{p} \mathbf{Z}_{p} \right\|_F^2$, respectively.

Comment

A1(2): Besides, we also conduct separate alignment (SA) as current methods do, and the comparison results are summarized in the following table. (We omit the variance terms since they are all zero.)

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 72.37 | 62.43 | 45.78 | 22.87 | 44.36 | 22.98 | 51.22 |
| Ours | 85.47 | 80.66 | 52.44 | 26.22 | 54.26 | 26.83 | 57.36 |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 71.97 | 41.24 | 34.76 | 5.78 | 27.31 | 15.73 | 52.73 |
| Ours | 89.97 | 45.25 | 43.70 | 6.25 | 31.87 | 15.64 | 59.21 |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SA | 70.38 | 59.32 | 33.47 | 25.46 | 39.84 | 17.96 | 46.31 |
| Ours | 87.92 | 78.12 | 41.12 | 28.55 | 44.84 | 20.64 | 51.37 |

Evidently, our joint-alignment achieves preferable results in most cases.

Q2: The comparison methods lack some of the latest works. For example, the reference Liu 2024 (in line 581) is discussed in this paper and includes an anchor alignment mechanism, but it is not compared with the proposed work.

A2: Thanks. We conduct some new comparison experiments in which [1], [2] and [3] are all anchor-alignment-based methods. ([3] is the aforementioned reference Liu 2024.)

Ref [1] utilizes feature information and structure information of the bipartite graph generated by fixed anchors to build the matching relationship, and regards the first view of each dataset as the baseline view.

Ref [2] gets rid of prior knowledge by identifying and selecting discriminative anchors within a single view using hierarchical searching, and takes the view exhibiting the highest anchor graph quality as the baseline view.

Ref [3] narrows the spatial distribution of anchors on similar views by leveraging the inter-view correlations to enhance the expression ability of anchors, and treats the view concatenated by column as the baseline view.

Please see the above table in A1(1) for experimental results.

Q3: In Table 1, several compared methods exhibit extremely poor performance on some datasets. It might be better if the authors could explain the possible reasons.

A3: Thanks. We sincerely apologize for causing confusion to Reviewer Ccq1. The reason is that the default parameter settings in the released code are inconsistent with those in the paper. We have carefully corrected these according to the guidance in their paper, and re-run the comparison experiments. Please check them. We deeply appreciate Reviewer Ccq1 for reminding us of this, which significantly helps us improve the quality of this manuscript.

In particular, PMSC, AMGL and MLRSSC still show inferior performance in certain scenarios. The possible reasons are as follows: PMSC reaches the consensus clustering under the premise that the basic partitions realize the ground truth and meanwhile treats every view equally; in AMGL, the factors generated by the cluster indicator with orthogonal constraints impair the discriminability of some graphs; and MLRSSC linearly combines the generated representation matrices and only utilizes a truncation operation to determine the penalty parameters.

[1] Wang et al., Align then fusion: Generalized large-scale multi-view clustering with anchor matching correspondences, NeurIPS, 2022.

[2] Ma et al., Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering, ACM MM, 2024.

[3] Liu et al., Learn from view correlation: An anchor enhancement strategy for multi-view clustering, IEEE CVPR, 2024.
Comment

We sincerely thank Reviewer Ccq1 for the constructive comments and guidance on revising this manuscript. All concerns have been carefully responded to point by point. We sincerely hope these issues have been resolved.

Q1: A core idea of the work is to introduce an anchor permutation matrix, while this idea has been widely adopted by previous works.

A1(1): Thanks. Current permutation strategies generally require selecting the baseline view, such as [1], [2], [3]. Moreover, in them the anchor generation, the anchor transformation and the graph construction are separated from each other, which hinders the interaction of view information across different levels. Unlike them, in this work we associate a learnable permutation with each view to freely rearrange anchors in their original space, successfully unify anchor generation, anchor transformation and graph construction within one common framework, and meanwhile provide a feasible solving scheme with linear complexity. Owing to the joint-alignment mechanism, we do not involve selecting the baseline view, and anchors can be permuted automatically according to the characteristics of each view. Also, this paradigm can coordinate with the learning of anchors.

Particularly, we conduct some experiments against [1], [2] and [3] to demonstrate the advantages of our alignment mechanism, as shown in the following table.

ACC (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ref [1] | 74.13 (±3.36) | 53.34 (±2.84) | 47.10 (±4.07) | 22.95 (±0.89) | 42.31 (±3.17) | 25.58 (±0.66) | 55.44 (±2.25) |
| Ref [2] | 62.60 (±5.50) | 38.17 (±3.58) | 44.73 (±3.74) | 33.31 (±1.19) | 49.56 (±2.45) | 26.38 (±0.93) | 57.24 (±2.26) |
| Ref [3] | 91.82 (±3.78) | 44.61 (±4.31) | 39.68 (±1.36) | 29.85 (±0.02) | 50.88 (±0.24) | 27.56 (±1.11) | 54.89 (±0.63) |
| Ours | 85.47 (±0.00) | 80.66 (±0.00) | 52.44 (±0.00) | 26.22 (±0.00) | 54.26 (±0.00) | 26.83 (±0.00) | 57.36 (±0.00) |

NMI (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ref [1] | 80.78 (±4.44) | 38.41 (±2.92) | 33.50 (±2.56) | 9.94 (±1.54) | 28.50 (±2.29) | 12.86 (±0.67) | 57.82 (±0.93) |
| Ref [2] | 57.43 (±3.28) | 41.45 (±4.42) | 26.63 (±3.37) | 11.68 (±2.11) | 31.03 (±1.66) | 14.02 (±0.71) | 58.61 (±1.62) |
| Ref [3] | 86.62 (±1.95) | 49.15 (±0.86) | 17.79 (±0.67) | 6.44 (±0.02) | 24.47 (±0.06) | 13.32 (±0.50) | 53.55 (±0.34) |
| Ours | 89.97 (±0.00) | 45.25 (±0.00) | 43.70 (±0.00) | 6.25 (±0.00) | 31.87 (±0.00) | 15.64 (±0.00) | 59.21 (±0.00) |

Fscore (%)

| Method | DERMATO | CALTE7 | Cora | REU7200 | Reuters | CIF10Tra4 | FasMNI4V |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ref [1] | 80.15 (±7.13) | 41.01 (±4.20) | 38.20 (±1.89) | 23.79 (±0.77) | 43.86 (±2.61) | 17.07 (±0.35) | 48.78 (±1.94) |
| Ref [2] | 56.16 (±4.80) | 38.28 (±3.01) | 32.61 (±2.53) | 27.58 (±1.50) | 41.13 (±1.30) | 17.40 (±0.40) | 47.61 (±1.62) |
| Ref [3] | 87.94 (±3.70) | 46.16 (±1.37) | 26.57 (±1.24) | 22.19 (±0.01) | 36.19 (±0.63) | 17.15 (±0.27) | 45.99 (±0.53) |
| Ours | 87.92 (±0.00) | 78.12 (±0.00) | 41.12 (±0.00) | 28.55 (±0.00) | 44.84 (±0.00) | 20.64 (±0.00) | 51.37 (±0.00) |

As seen, our results are more desirable in most cases.

[1] Wang et al., Align then fusion: Generalized large-scale multi-view clustering with anchor matching correspondences, NeurIPS, 2022.

[2] Ma et al., Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering, ACM MM, 2024.

[3] Liu et al., Learn from view correlation: An anchor enhancement strategy for multi-view clustering, IEEE CVPR, 2024.
Comment

Dear SAC, AC and Reviewers,

We sincerely appreciate your precious time and profound comments. Your expertise and thorough evaluation have greatly enhanced the quality and clarity of our research. The constructive criticism and thoughtful suggestions have been instrumental in strengthening our work and refining our ideas!

In this work, we devise a joint-alignment mechanism to alleviate the anchor mismatching issue; it does not require selecting a baseline view as current methods do, and it can coordinate with the generation of anchors. It flexibly rearranges anchors in their original dimension space according to the characteristics of each view itself, and enables view information to interact across different levels.

Moreover, we explicitly take into account the geometric characteristics between (aligned) anchors, and successfully feed them into the anchor-sample affinity to exploit the manifold structure hidden within the original samples more fully.

Further, we directly learn the consensus cluster indicators that bridge all anchors, permutations and views. This not only gathers multi-view information at the cluster-label level, but also provides a common structure for anchors on different views, inducing them to rearrange towards the correct alignment.

Meanwhile, a feasible solving scheme with linear complexity enables our model to work effectively and efficiently.

In addition, we also organized some new experiments against the latest alignment methods and deep learning methods. The comparison results demonstrate that our proposed method provides preferable clustering performance.

Thanks once again for your invaluable contributions and support throughout the review process. Please don't hesitate to contact us if you have any questions.

Best wishes,

The authors of 195

AC Meta-Review

The paper introduces DLA-EF-JA, a multi-view clustering method designed to address challenges such as anchor misalignment, instability, and overlooked anchor relationships. By leveraging self-expression and topology learning, the method explores underlying data structures and constructs cluster labels directly using a binary strategy to enhance stability.

The reviewers provided mixed feedback, raising several concerns, including: (1) the work lacks sufficient novelty; (2) the absence of popular deep learning methods as baselines, which raises questions about the reliability of experimental results; (3) limited improvement in performance compared to existing methods. Based on these considerations, the paper is not recommended for acceptance at this time.

Additional Comments from the Reviewer Discussion

During the rebuttal period, the reviewers' opinions remained unchanged.

Final Decision

Reject