Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets
Abstract
Reviews And Discussion
This paper proposes TeaRGIB to enhance the robustness of dynamic graph representation learning methods by using the Information Bottleneck (IB) principle to reduce redundancy and noise in dynamic graphs. TeaRGIB first employs von Neumann graph entropy (VNGE) to model the evolution of dynamic graphs, and it decomposes the dynamic graph learning problem into learning a compact structure and representation given the previous temporal graphs. Experimental results show that the proposed method achieves good link prediction accuracy and is robust to several types of noise.
Questions For Authors
See above.
Claims And Evidence
The reviewer is confused about the motivation of the structural evolution loss. The authors need to elaborate on why von Neumann graph entropy (VNGE) can capture structural evolution, and explain the motivation for aligning VNGE with mutual information. Is there any theoretical justification for this alignment?
Methods And Evaluation Criteria
The decomposition of the dynamic graph information bottleneck objective in Section 4.1 follows GIB [1] and utilizes the local dependency assumption, which is similar to Dynamic GIB [2]. The author needs to discuss the difference between Dynamic GIB and the proposed method to highlight the novelty.
[1] Graph Information Bottleneck. NeurIPS 2020. [2] Dynamic Graph Information Bottleneck. WWW 2024.
Theoretical Claims
The reviewer checked most of the proofs, and they seem correct. However, further theoretical justification for the structural evolution in Section 4.2 is necessary.
Experimental Design And Analyses
The proposed TeaRGIB only adopts the Graph Attention Network as the backbone network. The generalization ability of TeaRGIB to other (dynamic) graph neural network architectures is questionable.
Supplementary Material
I checked the proofs and the ablation study.
Relation To Broader Scientific Literature
The key contribution is to extend the Graph Information Bottleneck principle to the dynamic graph learning domain. The proposed method could enhance the robustness of dynamic graph learning models. However, the proposed method is similar to Dynamic GIB, as pointed out in the Methods And Evaluation Criteria part. The authors are encouraged to further discuss the difference between the two works; otherwise the contribution would be limited.
Essential References Not Discussed
[1] Dynamic Graph Information Bottleneck. WWW 2024. [2] TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery. NeurIPS 2024.
Other Strengths And Weaknesses
Please see above.
Other Comments Or Suggestions
The author is encouraged to address the concerns in the above parts.
Thank you for your review. However, it seems that you may have mistakenly pasted review comments intended for another paper here, so we will wait for your correction.
This paper examines the phenomenon of spurious correlations induced by the cooperative rationalization approach, where a generator selects informative subsets of data as rationales for predictors. It reveals that even clean datasets can develop unintended spurious correlations due to the cooperative game between generator and predictor. The paper introduces an adversarial inspection and instructional mechanism to identify and mitigate these correlations, achieving significant empirical improvements on various text and graph classification tasks.
Questions For Authors
See weaknesses.
Claims And Evidence
The paper mentions that "such a cooperative game could unintentionally introduce a sampling bias during rationale extraction", which can be a very interesting finding. However, the claim seems too broad, and the authors did not present comprehensive insights into how this can generalize to a wider range of scenarios. I would suggest narrowing down the scope of the paper's claims to the focused areas, i.e., the text-based and graph-based ones.
Methods And Evaluation Criteria
Yes
Theoretical Claims
They appear to be generally correct and the intuition makes sense to me as well.
Experimental Design And Analyses
They appear to be carefully designed and I do not have major complaints about the experimental designs and analyses.
Supplementary Material
Yes, most of the appendix.
Relation To Broader Scientific Literature
This paper is properly positioned as a part of the broader scientific literature with clarifications about the relationship between itself and other works.
Essential References Not Discussed
N/A
Other Strengths And Weaknesses
Strengths:
1 - This paper highlights a previously underexplored source of spurious correlations introduced by cooperative rationalization, even in clean datasets.
2 - It also proposes a practical adversarial inspection and instructional method to mitigate identified biases effectively.
3 - The paper demonstrates the effectiveness of the proposed approach across diverse datasets and multiple model architectures (GRUs, BERT, GCN).
Weaknesses:
1 - This paper involves broad claims that may not hold in domains not discussed in this paper.
2 - I did not see much analysis about complexity and running time. How does the proposed method scale to large datasets? Extra adversarial inspections and instructions could significantly increase training time and computational resources.
3 - Although the paper provides theoretical explanations, it does not thoroughly explore conditions under which the adversarial intervention might fail or succeed universally.
Other Comments Or Suggestions
Typos exist, e.g., Figure ?? in Appendix B.3.
Thank you for taking the time to carefully review our work and provide constructive feedback.
Claims & Weakness 1. The claim seems too broad.
A1. Thanks a lot for the valuable suggestion. We will narrow down the scope to the text and graph domains to ensure rigor.
We'd like to kindly clarify that we have mentioned that rationalization is mainly used for NLP (the right part of L93–94). Although our experiments are conducted on text and graph data, the theoretical analysis in Sec. 4 is based on abstract variables and is not restricted to a specific scenario. Could you kindly clarify in more detail why you believe the analysis may not generalize to other scenarios? This would help us address your concern more effectively.
We believe such correlations may also arise in the image domain. For example, consider a cat-vs-dog classification dataset in which every original image contains a green pixel. If the generator selects a subset of pixels as the rationale, it may (during this selection process) consistently include the green pixel as part of the rationale for all images labeled 1, while excluding it from images labeled 0. This kind of sampling bias would introduce a spurious correlation.
However, our focus in this paper is solely on explaining why such situations can arise (i.e., identifying a potential risk or vulnerability). Assessing how likely this is to occur in real-world settings is somewhat beyond the scope of this paper. We appreciate your suggestion and will clarify and narrow down the scope in our revision.
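To make this concrete, below is a minimal, purely illustrative NumPy sketch (not from the paper; the dataset and the degenerate selection policy are hypothetical) of the green-pixel thought experiment above: the feature "the image contains a green pixel" carries no label information in the raw data, yet once a label-dependent selection decides whether the pixel enters the rationale, "the green pixel appears in the rationale" becomes perfectly predictive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, size=n)  # 1 = cat, 0 = dog (hypothetical labels)

# Every raw image contains the green pixel, so the raw feature is constant
# and cannot carry any label information.
green_in_image = np.ones(n, dtype=int)

# A degenerate (hypothetical) selection policy: keep the green pixel in the
# rationale for label-1 images and drop it for label-0 images.
green_in_rationale = labels.copy()

# Best accuracy achievable from the raw feature alone is chance level ...
acc_raw = max((green_in_image == labels).mean(), (green_in_image != labels).mean())
# ... while the selected feature predicts the label perfectly.
acc_rat = (green_in_rationale == labels).mean()

print(f"'green pixel in image'     -> accuracy {acc_raw:.2f}")  # about 0.50
print(f"'green pixel in rationale' -> accuracy {acc_rat:.2f}")  # 1.00
```

The spurious signal here comes entirely from the selection step, not from the data itself, which is exactly the kind of risk the paper aims to expose.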
Weakness 2. Complexity and running time.
A2. Thanks for the valuable suggestion. Below is a comparison between the complexity of vanilla RNP and RNP+A2I (referred to as A2I for short). In general, A2I introduces an additional attacker module, and as a result, A2I has approximately 1.5 times the number of parameters of vanilla RNP. However, we would like to note that introducing extra modules is common practice in the field of rationalization. For example, our A2I model has a comparable number of parameters to Inter-RAT, a previously proposed variant of RNP.
Below are the practical time and memory consumption of RNP and A2I on the Beer-Appearance dataset (we add the results of Inter-RAT as a reference).
| Beer-Appearance | batch size | lr | epochs | memory (MB) | RTX3090 hours |
|---|---|---|---|---|---|
| Inter-RAT | 256 | 0.001 | 20 | 3660 | 2.34 |
| RNP | 128 | 0.0001 | 200 | 2184 | 0.37 |
| A2I | 128 | 0.0001 | 200 | 2784 | 0.61 |
| A2I | 128 | 0.0001 | 100 | 2784 | 0.31 |
| A2I | 256 | - | - | 3664 | - |
Per epoch, RNP consumes only about 60% of the training time compared to RNP+A2I. However, A2I converges significantly faster than vanilla RNP and typically requires fewer training epochs (about 50% on most datasets). Since memory usage is influenced by batch size, we further assign A2I a batch size of 256 (the same as Inter-RAT). Under this setting, the memory usage is 3664 MB, which is comparable to that of Inter-RAT.
Regarding the second question, "How does the proposed method scale to large datasets?", we are not entirely sure about the specific concern. Are you referring to datasets with more samples, or datasets with longer text inputs? If you mean the former, we believe the table in the Appendix addresses this concern: the largest dataset we use (Hotel-Cleanliness) contains 150k text samples. If you mean the latter, we follow one of our baselines, Inter-RAT, and test our A2I on the Movie dataset, which has much longer texts (the average length is about 775, and we set the max length to 1024). The results are as follows (the results of other baselines are copied from Inter-RAT):
| Movie | S (sparsity) | P (precision) | R (recall) | F1 |
|---|---|---|---|---|
| RNP | 20.0 | 35.6 | 21.1 | 24.1 |
| A2R | 20.0 | 48.7 | 31.9 | 34.9 |
| INVRAT | 20.0 | 33.9 | 24.3 | 28.3 |
| Inter-RAT | 20.0 | 35.7 | 35.8 | 35.7 |
| A2I | 20.6 | 45.0 | 31.9 | 37.1 |
Our A2I still significantly outperforms RNP in terms of F1 score on the long-text dataset.
Weakness 3. It does not thoroughly explore conditions under which the adversarial intervention might fail or succeed universally.
A3. We are sorry, but we are not entirely sure about your specific concern. Could you please specify it in more detail?
We would like to clarify that our analysis already covers both sides of cooperative rationalization and how our attack succeeds in each case. Specifically, in the first part of Sec 4.3 (L241–316), we discuss how our attacker can correct the predictor and generator when they select the wrong rationales. Then, from L318 to the end of Sec4.3, we discuss how, when the predictor and generator select the correct rationales, our attacker does not introduce any negative impact. We think these two cases together cover both sides of the issue.
Typos. Thank you very much for the careful reading; it refers to Fig. 3. We will fix it in our revision.
The paper examines the unintended biases in self-rationalizing models, where a generator selects key input segments for a predictor. It reveals that cooperative training can introduce spurious correlations even in clean datasets. An adversarial inspection method is proposed to detect and mitigate these biases, improving interpretability and performance across multiple classification tasks.
Questions For Authors
n/a
Claims And Evidence
The paper claims that cooperative rationalization can introduce spurious correlations even in clean datasets. It shows this through a theoretical analysis demonstrating how the generator’s selection process can create dependencies between extracted rationales and labels. Empirical evidence supports this claim by showing that predictors trained on randomly selected rationales still achieve high accuracy, indicating reliance on trivial patterns (Figure 3).
The paper claims that its proposed adversarial method (A2I) can detect and mitigate spurious correlations. It shows this through an attack-based inspection method that successfully identifies trivial patterns (high attack success rate in Figure 6) and an instruction mechanism that reduces reliance on these patterns, leading to improved rationale quality across multiple datasets (Tables 1–4).
Methods And Evaluation Criteria
The proposed methods effectively address spurious correlations in rationalization frameworks using adversarial inspection and instruction. Evaluation criteria, including benchmark datasets and rationale quality metrics, align well with the problem. The results are strong for both text and graph tasks.
Theoretical Claims
Yes, in Appendix C.
Experimental Design And Analyses
The paper compares its method (A2I) with multiple baselines, including RNP, FR, Inter-RAT, and NIR. The attack success rate is used as a measurement criterion. The experiments span different types of datasets: text (BeerAdvocate, HotelReview) and graphs (BA2Motifs, GOODMotif). The method is tested with both BERT (a pretrained Transformer) and non-pretrained models like GRUs and GCNs.
Supplementary Material
Yes, the appendix.
Relation To Broader Scientific Literature
The paper builds on Rationalizing Neural Predictions (RNP), a cooperative framework where a generator selects rationales that a predictor then uses for classification.
Essential References Not Discussed
N/A
Other Strengths And Weaknesses
Strengths: Unlike prior work that focuses on spurious correlations inherent in datasets, this study uncovers how rationalization frameworks themselves can introduce biases, even in clean data. The paper provides mathematical proofs (Appendix C) and extensive empirical validation across multiple benchmarks (Tables 1–4).
Weaknesses: There is no ablation study on how each component of the proposed algorithm contributes to its success; the effectiveness of the method seems to rely on the attacker's ability to identify trivial patterns.
Other Comments Or Suggestions
n/a
Thank you deeply for taking the time to thoroughly review our paper.
If we understand correctly, the only weakness you mentioned is about the lack of ablation study.
Q1. There is no ablation study on how each component of the proposed algorithm contributes to its success; the effectiveness of the method seems to rely on the attacker's ability to identify trivial patterns.
A1. We think this is a misunderstanding. Compared to the standard rationalization framework RNP, our A2I introduces only one extra component (the attacker). So, the results of RNP can directly serve as the ablation study of our RNP+A2I. Also, the results of FR can serve as the ablation study of our FR+A2I.
And we have designed a special experiment to verify the effectiveness of the attacker. The attack success rate of RNP in Figure 6 implies that our attacker can successfully identify the trivial patterns learned by the predictor. And the low attack success rate of our RNP+A2I implies that our attacker-based instruction can effectively deter the predictor from adopting trivial patterns.
This paper addresses a crucial issue in rationalization frameworks. They find that even if the original dataset does not have spurious correlations, the cooperative generator-predictor setup can cause spurious correlations that the predictor can exploit. This paper identifies the cause of this issue and proposes an attacker-based method A2I. A2I introduces an attacker which selects trivial patterns to fool the predictor. They empirically show that this introduction of an attacker can significantly improve the performance of rationalization, and furthermore show that the attacker successfully captures the trivial patterns recognized by the predictor and that the inspection prevents the predictor from adopting the trivial patterns.
Questions For Authors
In Equation (11), the meaning of [0.5, 0.5] is unclear. Does this represent a random variable?
Claims And Evidence
Their claims are convincing.
Methods And Evaluation Criteria
The authors’ attacker-based strategy is well motivated and practically sensible. By introducing an attacker that selects trivial patterns to flip predicted labels, the approach mitigates spurious correlations introduced by the generator–predictor interplay. This design choice is well explained in Section 4.
Theoretical Claims
The paper does not present formal theorems or statements in the main part of the paper. I did not review the appendix.
Experimental Design And Analyses
Experiment of Figure 3: I am not fully convinced that this experiment provides evidence that the generator-predictor interaction creates spurious correlations. Here is why: since the generator is not trained with the ground-truth label information, for a given trained g, Z and Y are independent. The independence of Z from Y given g means that (in the notation of Equation (5)) P[Y=1|Z=t+, g] = P[Y=1]. Therefore, the orange curve should not include the spurious correlation. This contradicts the statement in the paper.
Supplementary Material
I did not review the supplementary material.
Relation To Broader Scientific Literature
The authors shed light on the spurious correlations induced by the generator–predictor framework. These differ sharply from the spurious correlations embedded in the dataset itself, which have been discussed in prior works.
Essential References Not Discussed
I am not aware of any papers which should be discussed.
Other Strengths And Weaknesses
Strengths: They perform an extensive empirical study to support their idea and the performance of the proposed algorithm. In particular, they examine the performance on multiple text classification datasets, multiple sparsity regimes, and different encoder architectures. Furthermore, the study of attack success rate successfully supports their ideas on the spurious correlations and attackers.
Weaknesses: While they claim their algorithm's performance is comparable to LLama3.1-8b-instruct, the LLM is relatively small, and it would be better to compare it with larger models or other models. I cannot judge whether their algorithm is state of the art.
Other Comments Or Suggestions
The notation "g" is used in the first section without a definition. In a later section, I noticed that the paper defines it as the generator. It might be better to define it earlier.
Equation (8) can be stated in a more mathematically precise way. I believe the current form does not make sense, though I understand what the authors want to state.
We sincerely thank you for dedicating your time and expertise to review our paper. Your insightful comments and suggestions are highly valued and appreciated.
Q1. Experiment of Figure 3: I am not fully convinced that this experiment provides evidence that the generator-predictor interaction creates spurious correlations. Here is why: since the generator is not trained with the ground-truth label information, for a given trained g, Z and Y are independent. The independence of Z from Y given g means that (in the notation of Equation (5)) P[Y=1|Z=t+, g] = P[Y=1]. Therefore, the orange curve should not include the spurious correlation. This contradicts the statement in the paper.
A1. We are sorry for causing such a misunderstanding. We agree that g is not trained with Y, but we cannot say that P[Y=1|Z=t+, g] = P[Y=1]. Here is a simple, intuitive toy example.
Suppose there is a box containing approximately 200 cat photos and 200 dog photos. The cat images are labeled as 1, and the dog images are labeled as 0. For both cats and dogs, half of the images have a red background and the other half have a green background. In this dataset, background color is considered a trivial pattern and is independent of the label Y.
Now, suppose a "hand" randomly samples 100 cat images and 100 dog images from the box to form a new dataset. Although the hand is a random hand and never uses the labels during sampling, biases may still occur in the specific sample. For example, the sampled cat images might include 50 with red backgrounds, while the sampled dog images might only include 30 with red backgrounds. As a result, under this particular sampling by the hand, background color and label are no longer independent.
Returning to rationalization. We consider that in the original dataset, all text samples end with a period. However, one possible scenario is that in the random sample drawn by g, 51% of the positive samples include the period as part of Z, while only 49% of the negative samples include the period in Z. As a result, "whether Z contains a period" is no longer independent of the label Y. This spurious correlation is caused by the sampling bias of g. This is because once the generator, originally a random variable, is instantiated into a specific value g, it inevitably contains bias.
In the example above, the spurious correlation caused by bias may not seem very severe, as we only considered a single trivial pattern. However, given that text should be regarded as a high-dimensional vector, chances are that the accumulation of biases from multiple different trivial patterns can become quite significant.
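For illustration only (not part of the paper), the following sketch simulates this accumulation effect under the stated assumption that each trivial pattern is included in Z independently of the label: any single pattern shows only a tiny chance correlation on a finite sample, yet a predictor that can combine many of them fits the accumulated sampling noise far above chance. The dataset sizes and pattern counts below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_trivial = 400, 500  # hypothetical: 500 trivial patterns per text

# Labels and trivial patterns are drawn independently, so in the population
# every single trivial pattern is uncorrelated with the label.
y = rng.integers(0, 2, size=n_samples)
# z[i, j] = 1 if trivial pattern j happens to be included in rationale i
z = rng.integers(0, 2, size=(n_samples, n_trivial))

# Chance correlation of any single trivial pattern is small ...
single_acc = np.maximum((z == y[:, None]).mean(axis=0),
                        (z != y[:, None]).mean(axis=0))
print(f"best single trivial pattern accuracy: {single_acc.max():.2f}")  # typically ~0.55-0.60

# ... but a predictor combining many of them can fit the accumulated
# sampling bias of this particular finite sample.
clf = LogisticRegression(max_iter=2000).fit(z, y)
print(f"predictor accuracy on these samples:  {clf.score(z, y):.2f}")   # close to 1.0
```

The high fit comes purely from finite-sample noise in the inclusion of trivial patterns, mirroring the point that individually mild biases can accumulate into a significant spurious signal.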
Q2. While they claim their algorithm's performance is comparable to LLama3.1-8b-instruct, the LLM is relatively small, and it would be better to compare it with larger models or other models. I cannot judge whether their algorithm is state of the art.
A2. We are sorry to say that our RTX3090 and RTX 4090 GPUs cannot afford LoRA fine-tuning for models larger than 8B.
Our method is SOTA compared to previous rationalization methods, but we acknowledge that it cannot outperform more powerful LLMs like GPT-4. However, training those LLMs usually involves extensive human alignment, which can be very expensive. Our model is small, and we do not use human-annotated rationales for training. So, our model is much cheaper and can work in places with limited resources.
Setting aside the question of SOTA results, our two major contributions still stand. First, we identify a new kind of spurious correlation, which may open up new avenues for future research. Second, we propose a direction for mitigating this issue with attacks, which may inspire others to develop better methods.
Q3. In Equation (11), The meaning of [0.5,0.5] is unclear. Does this represent a random variable?
A3. Yes, it represents random noise. Thanks for the reminder; we will clarify this in the revision.
Q4. Suggestions about the notation g and Eq. (8).
A4. Thank you for your suggestion. We will do it in our revision.
All reviewers agree that this is a valuable contribution to understanding and eliminating spurious correlations in cooperative rationalization.