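Average rating: 3.0/10

Decision: Rejected (3 reviewers)

Lowest rating: 3, highest rating: 3, standard deviation: 0.0

Average confidence: 3.7

Correctness: 2.0

Contribution: 2.3

Presentation: 2.0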

ICLR 2025

Feature Level Instance Attribution

Zhiyu Zhu, Jiayu Zhang, Xinyi Zhang, Zhibo Jin, Jiahao Huang, Jianlong Zhou, Fang Chen

Submitted: 2024-09-19, updated: 2025-02-05

Keywords: Interpretability, attribution

Reviews and Discussion

Review

Rating: 3, Confidence: 4, 2024-10-29

This paper finds that artificially manipulating attribution scores by modifying samples can substantially change the measured importance of training samples, and that the intervention process yields feature-level explainability results. The authors propose a new method, Feature Level Instance Attribution (FLIA), to locate the crucial features of training data that significantly impact causality. Although theoretical analysis and experimental results are provided to support the three arguments that underpin the core logic of FLIA, direct experimental validation of FLIA's effectiveness is lacking. The practical applicability of FLIA requires further explanation and verification.

Strengths

The paper finds that TDIA results can be altered by very small perturbations, and that these changes can significantly affect the model's decision-making process without altering the sample's confidence. It also proposes a new method called Feature Level Instance Attribution (FLIA).
The authors provide a rigorous mathematical derivation, introduce the calculation of the IL value, and derive the contribution of samples to the IL value at the fine-grained feature level.

Weaknesses

The validation of FLIA relies entirely on indirect evidence, so its practical applicability requires further explanation or experimental verification.
The experimental results are not convincing enough. In experiment A, the results are not based on repeated experiments, and no significance analysis is reported. In experiment B, the evaluation metric is self-defined and lacks a clear formula.
The writing of this paper needs to be strengthened. For example, Figures 1 and 2 are referenced in the wrong order. Moreover, Figure 1 contains a lot of content but lacks a detailed explanation, which makes it difficult to understand.

Questions

In addition to the weaknesses I have already listed, I have the following questions:

The authors claim that FLIA implements feature-level analysis and is the first study to do so. However, this claim is questionable: according to the related work section, existing feature attribution methods also aim to compute the contribution of training data features to model decisions, so why do these methods fail to achieve fine-grained feature-level analysis? To support this argument more clearly, it is recommended that the authors provide a detailed comparison of FLIA with existing feature attribution methods, specifically clarifying the unique aspects of FLIA in analyzing the impact of training data features on model decisions, and the limitations of other methods in this context. Such a comparison would help readers better understand the innovation and contributions of FLIA.
In the related work section, the authors introduce a categorization of TDIA methods, stating that they can be divided into retraining-based methods and gradient-based impact analysis methods, with the latter further consisting of both static and dynamic methods. However, upon reviewing this section, I found that the structure does not entirely align with the initial categorization presented. Specifically, the discussions in Sections 2.3 and 2.4 are not clearly linked to the stated categories. I recommend that the authors either revise their initial categorization to explicitly include the methods from Sections 2.3 and 2.4 or clarify how these additional methods relate to the TDIA framework they've outlined.

Review

Rating: 3, Confidence: 3, 2024-11-02

The paper presents a novel approach to analyzing the training data influence of individual instances, called the FLIA algorithm. Most notably, it allows for understanding how the feature values of each instance contribute to the training data influence value, which is not seen in previous works in the area. The paper presents three core arguments regarding how instance level testing data influence analysis values can be modified, methods for computing the IL values, and some experiments supporting the arguments.

Strengths

The paper is well written with few writing issues, and the introduction and related work provide a good high-level overview of the task and why it is important. The need for a novel feature-level understanding of samples is well motivated.

Weaknesses

While I believe the method has merit and is interesting, there are structural problems with the paper in its current state.

Major Issues:

The concept of model confidence in a sample is introduced and used without any explanation, yet it serves as a core piece of evidence supporting argument 1. The confidence difference correlation index is then introduced in the next section to support argument 2, and it is unclear whether these refer to the same metric. Furthermore, the description of this metric doesn't provide a clear understanding of what it actually does; the sentence on lines 405 to 407 is particularly unclear.
The method section covers both prior works and the newly proposed FLIA algorithm, and it is not made explicitly clear what is original contribution and what is prior work. If the IL term as a product of two gradients is novel, please make this more explicit. In particular, the IL value is introduced in the context of previous works (98-103, 264-266), the experiments consistently refer to IL values, and it is unclear when FLIA is actually being evaluated.
I think the core arguments need to be more clearly defined and rigorous: what does it mean to be modified, and why is it desirable at all? Additionally, the terminology is inconsistent (e.g. Figure 1 uses the term 'altered' but argument 1 uses 'modified').

Minor issues:

The code is difficult to understand: it is quite dense and has not a single comment, even in the example notebook provided.

Overall, I think the paper needs to set out and define its aims more clearly, particularly the definitions and evaluation of the core arguments presented.

Questions

Were other random seeds tested for the experimental results?
What is the new evaluation metric CDCI? How is it implemented, and what are its properties?

Review

Rating: 3, Confidence: 4, 2024-11-05

This paper proposes to combine feature attribution and instance attribution and develops a feature-level instance attribution (FLIA) method. Specifically, the proposed method extends the TracIn method and identifies important features whose perturbation leads to large change in instance-level influence value. The proposed method is evaluated on a set of image classification settings with a variety of evaluation metrics.
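The idea summarized above can be illustrated with a minimal sketch. This is not the authors' FLIA implementation: it assumes a logistic-regression model, a single checkpoint with the learning rate omitted, and a central finite-difference estimate in place of an analytic gradient. All function names (`loss_grad`, `influence`, `feature_sensitivity`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad(w, x, y):
    """Gradient of the logistic loss w.r.t. weights w for one sample (x, y), y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def influence(w, x_train, y_train, x_test, y_test):
    """TracIn-style score at one checkpoint: dot product of train and test loss gradients."""
    g_train = loss_grad(w, x_train, y_train)
    g_test = loss_grad(w, x_test, y_test)
    return sum(a * b for a, b in zip(g_train, g_test))


def feature_sensitivity(w, x_train, y_train, x_test, y_test, eps=1e-5):
    """Estimate d(influence)/d(x_train[j]) for each feature j by central differences."""
    sens = []
    for j in range(len(x_train)):
        up = list(x_train)
        down = list(x_train)
        up[j] += eps
        down[j] -= eps
        diff = (influence(w, up, y_train, x_test, y_test)
                - influence(w, down, y_train, x_test, y_test))
        sens.append(diff / (2 * eps))
    return sens


# Toy example (assumed values): zero weights, two features.
w = [0.0, 0.0]
score = influence(w, [1.0, 2.0], 1, [1.0, 0.0], 1)
sens = feature_sensitivity(w, [1.0, 2.0], 1, [1.0, 0.0], 1)
```

In this toy setting, features of the training sample with a large absolute sensitivity are the ones whose perturbation changes the instance-level influence score the most, which is the kind of feature the review describes the method as identifying.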

Strengths

This paper investigates an interesting problem that combines feature and instance attribution.
The proposed method is well-motivated.
The authors conducted extensive experiments.

Weaknesses

This paper misses a directly relevant work [Pezeshkpour et al. 2022] that combines feature and instance attribution in exactly this way, which significantly undermines the novelty of this work.
The clarity of the writing could be improved. For example, the arguments in Section 3 and their connection to the "core logic" aren't very clear to me.
The experiment setups, especially that of experiment A, should be better clarified. Section 4.2 goes straight to the experiment results without a clear description of the setup, making the results difficult to interpret.

References

[Pezeshkpour et al. 2022] Pouya Pezeshkpour, Sarthak Jain, Sameer Singh, Byron Wallace. Combining Feature and Instance Attribution to Detect Artifacts. Findings of ACL 2022.

Questions

See Weaknesses.

AC Meta-Review

2024-12-21

The paper presents a method to combine feature attribution and instance attribution by perturbing samples to learn better explainability results. The paper demonstrates three experiment results on subselected datasets.

Strengths (based on reviewers' input):

Tackling an important problem with a well-motivated method
Paper includes three experiments and reproducibility code

Weaknesses

Concerns about the paper not engaging sufficiently with related work
Experimental setups were unclear
Experiment analysis does not include a significance calculation
In the methodology section, several concepts are not well explained, which makes the paper's arguments hard to follow.
Separation between contributions and prior work is at times unclear

Because of extensive concerns from the reviewers, I am advocating for a reject. I hope the authors are able to take the feedback and clarify their arguments for a stronger submission in future venues.

Additional Comments on Reviewer Discussion

The authors did not include a rebuttal response.

Final Decision: Reject

2025-01-22