PaperHub
5.5
/10
Rejected4 位审稿人
最低5最高6标准差0.5
6
5
5
6
3.5
置信度
正确性2.5
贡献度2.5
表达2.8
ICLR 2025

Navigating Conflicting Views: Harnessing Trust for Learning

OpenReviewPDF
提交: 2024-09-26更新: 2025-02-05
TL;DR

Trust Discounting based Trust Fusion for Conflict Evidential Multi-view Classification

摘要

关键词
Multi-view LearningConflict Multi-view LearningReliable Multi-view Learning

评审与讨论

审稿意见
6

The paper introduces a novel approach to resolving conflicts between the various views in a multi-view learning scenario. The authors propose a computational, trust-based discounting method that enhances the existing Evidential Multi-view framework when the various views conflict with each other. The paper also introduces a new metric, Multi-View Agreement with Ground Truth, for measuring the reliability of the predictions.

优点

The paper introduces a novel approach to improving learning in the multi-view scenario, which has many real-world applications, from self-driving cars to medical diagnosis. The work appears to be original, and its empirical evaluation shows that it may have a significant practical impact. The paper's quality and clarity are the focus of the next section.

缺点

The main weakness of the paper the overall quality and clarity of the presentation: in trying to turn a 22-page paper into a 12-page one without removing/compressing any content, the authors made some choices/trade-offs that negatively impact the paper's readability. In its current form, after a reasonable Abstract and Introduction, the paper goes straight into 3.5-page theory section, followed by related work (which should precede it) and then the experimental section. The main contributions of the paper (Algorithms 1 & 2, and even the definition of the novel metric, MVAGT) are relegated to the 10-page APPENDICES, which also include many other details.

In this reviewer's opinion, the paper's quality & readability would be greatly improved by moving most of Section 2 to the APPENDICES. The new structure would be: Introduction, Related Work, Intuitive end-2-end running example (which will re-user some parts of the current Section 2), Algorithms & Metrics, and Experiments. In this form, the authors will have all the space required to INTUITIVELY explain the key concepts and to compare & contrast with existing approaches. Even though this requires adding a new section, it is worth doing, as it will greatly impact the paper's impact & penetration.

For the Related Work section:

  • 1st pgf: please explain intuitively (i) why don't noise-robust representation "fully resolve conflicts", and (ii) the main ideas behind cluster-based, self-representation-based, and partially view-alligned approaches
  • 2nd pgf: please compare-and-contrast TMC vs your approach
  • 3rd pgf: please explain intuitively why (i) CMVC "may inadvertently lead incorrect predictions converge to incorrect ones", and (ii) why this is NOT the case for your approach

For the new "Intuitive end-2-end running example" section:

  • please modify the title/term of TDTF, as it is confusing because it uses the word TRUST twice in a 5-gram; may change to "Trust Fusion based on Trust Discounting" or to something else that is easy to parse & understand.
  • the new section could be based on the existing Titanic domain, but it should have at least three examples/instances: (i) one in which there is no conflict, (ii) one in which your approach solves the conflict, and (iii) one in which your approach fails to solve the conflict. If (ii) and (iii) have more then one major use-case, you should add one example for each of them
  • Table 1 is a good start for a single example, but it should be expanded to all the examples mentioned in the bullet above. It should also explain (rather than rely on the reader to deduce) how the top part of the table is created from Figure 1 (one sentence would suffice), and how the bottom part is computed.
  • please provide an intuitive explanation of BCF and how it is applied to the expanded version of Table 1 (which will include 3+ examples)
  • please do NOT intermix the BCF story with A-CBF and ABF. If it is important to compare-and-contrast, do it at the end of this new section or add it to the related work section (in which case it should follow rather than precede this new section). Otherwise, relegate it to the Appendix.
  • please explain how the values in the to-be-expanded Table 2 (to 3+ examples) are computed
  • you should apply algorithms 1 & 2 to the examples in tables 1 & 2, and discus the results

问题

  1. lines 178-179: the last row in Table 1 is Polar Bear rather then Dolphin; also, please explain where is the trust score of 0.95 coming from?

  2. line 408: please explain the statement "Caltech101 is a dataset with fewer conflicts ...." How did you quantify it? Please be specific about the performance gap that you are are talking about. Also, please quantify the conflicts in each of the six datasets, then analyze the results (eg, how does this "measure of conflicts" explain/correlate-with the results on other datasets?)

  3. line 423: In Table 4, the performance gap between ETF and F-Mode is a lot larger on CUB compared to Caltech101. What is the explication of this gap? Also add an intuitive explanation of why does ETF massively outperforms F-Mode on CUB in Table 3, but situation is flipped in Table 4.

  4. Overall, you should choose the most important results in Section 4 and discuss them in depth (for example, AUROC & ABLATION could easily be moved to the APPENDICES)

  5. It would be interesting to discuss (intuitively & in-depth) why, for Handwritten, all accuracies are above 99%, while Fleiss' Kappa and MVAGT are spread over a much wider range. Please provide/discuss/analyze specific hypotheses for explaining this phenomenon; if necessary, run additional experiments to confirm/infirm them. Feel free to use "the 5-WHYs rule" that requires you to repeatedly ask WHY? until you identify the true root-cause(s).

伦理问题详情

N/A

评论

Dear Reviewer gZnp,

Thank you for your valuable feedback. We are carefully reviewing your suggestions and are working on preparing a detailed response. We appreciate your patience and look forward to addressing each of your points thoroughly.

审稿意见
5

This paper aims to address conflicting predictions among different views through a method based on computational trust discounting. Specifically, computational trust is modeled through an evidence network that operates on a view-specific and instance-wise basis. Extensive experiments on six real-world datasets show that the proposed method achieves better performance.

优点

  1. Addressing conflicts among different views is crucial for enhancing model reliability.
  2. Empirical and theoretical analyses provide an evaluation of the proposed method’s effectiveness.

缺点

Major Issues

Method

  1. Proposition 2 introduces an important point: the prediction uncertainty for conflicting views is higher than that obtained through traditional evidence-based fusion, which can help flag incorrect predictions in multi-view fusion with conflicting views. However, there are some issues. In equation (7), the authors use KL divergence as a regularization term for incorrect evidence. During optimization, particularly on the feature-based datasets used in this study, this term may overlook the influence of correct predictions. This often drives evidence for all classes towards zero, resulting in a high uncertainty score even for trusted predictions. The slight increase in uncertainty (as shown in Table 2) does little to improve the identification of incorrect predictions.

  2. If I understand correctly, this work aims to assess the trustworthiness of each view’s prediction using referral networks. When distrust is indicated, it implies a conflict with the ground truth, rather than with other views. How can the method theoretically guarantee an improvement in classification performance when dealing with conflicting views, especially if the referral networks themselves have poor performance?

  3. Based on the points above, I would like to discuss with the authors how these two propositions theoretically guarantee the effectiveness of the proposed method.

Experiments

The experimental analysis needs to be more comprehensive. Specifically:

  1. In Table 2, the results show that the proposed approach enhances the reliability of evidence-based fusion methods (i.e., BCF, A-CBF, and ABF). However, the experiments lack a direct comparison of these methods with and without the proposed approach in identical settings.

  2. The selection of comparison methods is limited. It is recommended to include additional evidence-based multi-view methods, such as TMDOA [1], CCML [2], and TMNR [3].

  3. All six datasets used are feature-based. To introduce more challenging tasks, consider adding multi-view datasets with text or image data. Additionally, incorporating real-world cases could visually demonstrate conflicts among different views and showcase the effectiveness of this method.

Clarifications

  1. Line 322 states: “However, this approach may inadvertently lead correct predictions to converge towards incorrect ones, potentially jeopardizing model stability.” Please provide support for this point.

Minor Issues

  1. In Table 1, the trust score for Dolphin (referral) is 0.90, but Line 179 states, “we know the trust score for Dolphin’s opinion is 0.95.” Please clarify this discrepancy.

[1] Liu W, Yue X, Chen Y, et al. Trusted multi-view deep learning with opinion aggregation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2022, 36(7): 7585-7593.

[2] Liu Y, Liu L, Xu C, et al. Dynamic Evidence Decoupling for Trusted Multi-view Learning[J]. arXiv preprint arXiv:2410.03796, 2024.

[3] Xu C, Zhang Y, Guan Z, et al. Trusted Multi-view Learning with Label Noise[J]. arXiv preprint arXiv:2404.11944, 2024.

问题

See in the weakness.

评论

Dear Reviewer rYTz,

Thank you for your valuable feedback. We are carefully reviewing your suggestions and are working on preparing a detailed response. We appreciate your patience and look forward to addressing each of your points thoroughly.

审稿意见
5

The paper presents a trust-based discounting method to improve multi-view classification (MVC) by addressing conflicts between different views. It extends the Evidential Multi-view framework using subjective logic, allowing for more reliable predictions. The approach involves an instance-wise, probability-sensitive trust discounting mechanism to evaluate and integrate view-specific predictions. The method is validated on six real-world datasets, showing improved performance in scenarios with significant discrepancies among views. Key contributions include a trust-discounting mechanism, a stage-wise training strategy, and comprehensive experimental validation demonstrating enhanced conflict resolution and prediction reliability.

优点

Originality

The paper introduces a trust-based discounting mechanism that innovatively integrates subjective logic with instance-wise probability-sensitive trust discounting to enhance the reliability of multi-view predictions. This approach offers a perspective in the field of multi-view learning, especially in dealing with conflicting information from various views.

Quality

The paper conducts robust experiments on six real-world datasets and uses a comprehensive set of evaluation metrics, including a new metric for multi-view agreement with ground truth, to validate the method's effectiveness and generalizability.

Clarity

The paper is organized with explanations of technical concepts that are generally clear. The use of figures and tables to support the key points is adequate.

Significance

The paper addresses conflicting information in multi-view classification and improve the reliability and accuracy of predictions, with broad applicability in the field of multi-view learning.

缺点

  1. The paper presents an incremental innovation in the field of multi-view learning by introducing a trust-based discounting mechanism that leverages subjective logic for uncertainty estimation. While the approach offers a novel perspective, the methodological contribution appears to be primarily theoretical, with the actual implementation relying on existing techniques such as the loss functions from the Trusted Multi-View Classification. This suggests that the paper may not offer substantial improvements over current methods in terms of loss function optimization.

  2. In terms of experimental validation, while the paper does provide a comprehensive set of evaluation metrics and tests the method on six real-world datasets, there is room for expansion. The datasets used may not fully capture the challenges associated with uncertainty in multi-view learning. For example, Caltech101 dataset, which has the highest number of categories among the tested datasets, also happens to be the one where your model underperforms. This raises a concern about whether the model may struggle with datasets that have a large number of categories. It would be beneficial to include larger and more diverse datasets to better reflect the complexities and uncertainties inherent in multi-view classification tasks. This would provide a more robust assessment of the method's effectiveness and its ability to handle increased data volume and the number of views.

  3. The motivation behind the paper is somewhat broad, and it does not clearly differentiate itself from existing methods that also address view conflicts, such as ECML in comparative methods. The paper could benefit from a more detailed discussion on how its approach differs from and potentially outperforms these methods, especially in scenarios with a large number of categories or high-volume data.

  4. The paper lacks a detailed description of the view-specific neural network's architecture and implementation, which is crucial for understanding the model's performance and reliability. Without this information, the reproducibility of the method is compromised, and it is difficult to assess the specific contributions of the proposed method to the overall performance.

  5. The ablation study is limited to hyperparameter tuning and does not provide a thorough analysis of the key components of the proposed method. A more comprehensive ablation study that systematically removes or modifies key components would provide clearer insights into their individual contributions to the model's performance. This could include varying the trust discounting factor or the influence of the evidence network on the final predictions.

问题

The paper does not provide information on the number of views in each of the datasets used. How does the trust discounting mechanism perform as the volume of data and the number of views increase?

Furthermore, the Caltech101 dataset in the appendix, which has the highest number of categories among the tested datasets, also happens to be the one where your model underperforms. This raises a concern about whether the model may struggle with datasets that have a large number of categories.

The others can see the weakness part

评论

Dear Reviewer AgTK,

Thank you for your valuable feedback. We are carefully reviewing your suggestions and are working on preparing a detailed response. We appreciate your patience and look forward to addressing each of your points thoroughly.

审稿意见
6

Not all views can be trusted to the same degree in the field of Multi-View Classification. Trust Discounting-Based Trust Fusion (TDTF) is proposed to model computational trust, emphasizing trustable views in each instance. This trust discounting mechanism aims to prioritize views that are more likely to align with the ground truth while discounting views with greater uncertainty or variable accuracy.

The method is tested across a variety of datasets and compare it to several baseline models. In addition to traditional MVC methods, TDTF is compared with baselines that incorporate static and weighted fusion techniques to asses its relative performance.

The results show TDTF based methods outperforming baselines in most scenarios. An ablation study also examines the effects of warm-up epochs on stabilizing trust scores, and the paper emphasizes the importance and necessity of these warm-up epochs.

优点

The paper is well-written, clearly setting up the problem and explaining the original proposed method, with mathematical justifications provided. The approach introduces a dynamic weighting mechanism that adjusts the influence of each view based on the instance, a relevant adaptation in MVC contexts where view reliability can vary. This instance-wise adaptability demonstrates originality, improving upon to static weighting systems commonly used in MVC. The method performs consistently well against several baselines, underscoring its robustness and flexibility across diverse scenarios. By prioritizing trustworthy views, this approach has significant potential to improve decision-making reliability in settings where the importance of each view shifts with changing conditions. The paper mentions domains such as autonomous driving as examples where this technique could improve accuracy.

缺点

The first weakness that stands out is the placement of the algorithm in the appendix. Understanding the method, as well as details about the training process are deferred to the end of the paper, making comprehension somewhat difficult. The process is key to understanding the paper and its results, and placing it in the appendix is a significant flaw, in my view.

The placement of the related work interrupts the flow of the paper - in my view, the context for MVCs and where this paper's contributions fit should come earlier.

The ablation study, while welcome, feels out of place without the inclusion of the algorithm within the body of the paper. The process loses significant clarity in doing so, and much of the results become harder to parse without it.

The original methods add significant time - with the exception of MGP, the author's methods take significantly longer to train.

The diversity of the instances of the datasets is not mentioned, making it difficult to know how well the trust mechanism works in diverse contexts.

There are minor spelling/grammatical errors, but none that affect understanding.

问题

It's unclear how this work scales in dynamic situations, such as the autonomous driving example cited early on, where there may be a variety of instances that differ significantly. The appendix has characteristics of said datasets, but the paper does not discuss how similar or varied the instances are, apart from a brief reference to Caltech101 having lower conflict. Could such data be provided?

The information that Table 1 provides is not clear. Is there real data being used to generate the table? If not, could you clarify what information the table is meant to convey?

Could you clarify the impact of the warm-up epochs on the overall performance? Based on Figure 3, there appears to be limited improvement at best. If only 1 dataset shows improvement, the inclusion of the ablation study seems suspect. Are there simpler procedures that might be used if warm-up epochs do not significantly enhance results?

Could you provide additional context on how MVAGT complements standard MVC metrics? It's explained in the appendix but I would appreciate further detail.

评论

Dear Reviewer Yd3r,

Thank you for your valuable feedback. We are carefully reviewing your suggestions and are working on preparing a detailed response. We appreciate your patience and look forward to addressing each of your points thoroughly.

AC 元评审

The authors study learning from multiple views. The technical contribution uses an existing fusion method [1] to deal with conflicting views which has also been explored in [2] but it does not really become obvious what the advantage/difference over [2] is. Also the empirical evaluation offers room for improvement. The data is rather standard and may not be well-suited for the studied problem of conflicting views (e.g., in contrast to the validation in [2]). In sum, the paper addresses an interesting and important topic but is not yet matured enough for publication at ICLR.

审稿人讨论附加意见

The authors provided many new results and explanations during the rebuttal which were (in most of the cases) acknowledge by the reviewers.

最终决定

Reject