I suggest framing the method in terms of discrimination (cf. GANs) rather than "attacking" throughout the paper, since no attacking entity is considered in this work. "An attacker" is confusing in the context of actually improving interpretability / robustness.
Figure 4: Using the accuracy metric may be misleading, as there is no information regarding the class (im)balance. Please either plot a horizontal line showing the class ratio, comment on it in the figure's caption, or report a metric that is less sensitive to class imbalance, such as F1.
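To make the concern about Figure 4 concrete, here is a minimal sketch with entirely hypothetical labels (a 70/30 class split that I made up, not the paper's data). It shows that a predictor which always outputs the majority class matches the majority-class baseline in accuracy while its macro-averaged F1 is poor, which is why a baseline line or an F1 curve would help readers interpret the plot.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical, imbalanced labels: 70 positive and 30 negative examples.
y_true = [1] * 70 + [0] * 30
# A degenerate predictor that always outputs the majority class.
y_pred = [1] * 100

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(y_true.count(c) for c in set(y_true)) / len(y_true)
acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)

print(f"majority baseline: {majority_baseline:.2f}")
print(f"accuracy:          {acc:.2f}")
print(f"macro F1:          {mf1:.2f}")
```

Here the accuracy equals the majority baseline even though the predictor has learned nothing about the minority class; only the baseline line or the F1 score reveals that.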
Assumption 1 becomes unrealistic in scenarios with multiple classes. Binary sentiment classification is an oversimplified example.
Overall, I like Section 4.2. However, I disagree with the rationale given in L262–270 regarding the result in Figure 4:

"Does this strange result stem from the fact that the 10% randomly selected patterns already contain enough sentiment inclination for classification? The answer is no. [...] We observe that the green line indicates a significantly lower accuracy (about 58%), implying that the randomly selected patterns contain only minimal sentiment information."

In my opinion, the answer is that we don't know:

A) It is probable that the predictor trained using the full texts (green line) itself learns spurious correlations (shortcuts) that are different from those contained in the 10% randomly selected patterns.
B) It is probable that the predictor learns variable interactions, e.g. one important word lies inside the 10%, and another one lies inside the 90%; access to both is required for accurate prediction (rationale).

Thus, the implication seems incorrect.
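
As a toy illustration of scenario A, consider the following sketch. Everything in it is a hypothetical construction of mine (the two-token inputs, the masking convention, and both predictors), not the paper's data or models. A full-text predictor that relies only on a token outside the selected subset scores at chance when shown the subset alone, even though a predictor trained on that subset could reach 90% accuracy.

```python
MASK = None  # hypothetical marker for a token hidden from the predictor

# 100 toy examples with balanced binary labels.
labels = [i % 2 for i in range(100)]

# Token inside the selected subset: agrees with the label in 90% of cases.
inside = [1 - y if i % 10 == 0 else y for i, y in enumerate(labels)]
# Token outside the subset: agrees with the label in all cases (a shortcut).
outside = list(labels)


def full_text_predictor(inside_tok, outside_tok):
    """Assumed to have learned to rely only on the outside token.

    When that token is masked, it falls back to class 0.
    """
    return outside_tok if outside_tok is not MASK else 0


def subset_predictor(inside_tok):
    """Assumed to have been trained on the selected subset only."""
    return inside_tok


def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


# Full-text predictor evaluated on full inputs.
acc_full = accuracy(
    [full_text_predictor(a, b) for a, b in zip(inside, outside)], labels
)
# Full-text predictor evaluated on the selected subset only.
acc_full_on_subset = accuracy(
    [full_text_predictor(a, MASK) for a in inside], labels
)
# Dedicated predictor evaluated on the selected subset only.
acc_subset = accuracy([subset_predictor(a) for a in inside], labels)

print(acc_full, acc_full_on_subset, acc_subset)
```

In this construction, the low accuracy of the full-text predictor on the subset reflects which features that predictor uses, not how much label information the subset contains.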

Other feedback

L36: please define the abbreviation "XAI" at its first use
L83–85: "This phenomenon then leads to a trust concern: whether the extracted rationale is really responsible for the label in the original dataset. This problem is important because explanations should also be aligned with their social attribution (Jacovi & Goldberg, 2020; 2021)." I disagree: explanations don't have to be aligned with their social attribution; rather, they should be faithful to the model. Confirmation bias is a real threat to progress in research on interpretability.
RW: The authors might be interested in a closely related work: "Post hoc explanations may be ineffective for detecting unknown spurious correlation" (ICLR 2022).
L92: What does the letter "g" denote, the generator? It is never introduced.
L127: typo, missing reference
L193: typo, methods constrain
L232: please rephrase "it will sometimes results in some problems."
L242/Fig.3: "a local of the causal graph" sounds odd
Eq. 8: missing spaces next to " & "
L281: use another letter instead of "n", which was used before to denote the number of variables (T_1, ..., T_n)
Figure 5: wrong wording in "Attack to Inspection and Instruction"; do you mean "for" or "as"?
L310: "Inspection" seems to be a new concept introduced here, but the paper's Introduction gives no intuition of what it really means "to inspect". Also, "the trivial patterns learned by the predictor can be inspected through attack" reads as defining "inspection" by using the word "inspect". I like the sentence in L331 explaining that "an attacker can identify uninformative trivial patterns and classify them into the opposite class", and I recommend moving this explanation to the beginning of Sec. 4.3 or even to the Introduction.
L342: wrong wording in "The situation of a text X contains"; do you mean "if", "when", or "containing"?