PaperHub
Overall rating: 6.0/10 (Poster, 4 reviewers; lowest 5, highest 7, std 0.7)
Individual ratings: 6, 5, 6, 7
Confidence: 4.0
COLM 2025

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

OpenReview | PDF
Submitted: 2025-03-21 | Updated: 2025-08-26
TL;DR

We propose VILA, a scalable and efficient unlearning method for LLMs that addresses FILA's limitations by improving parameter importance estimation and reducing computational overhead.

Abstract

Keywords
Machine Unlearning, Large Language Model

Reviews and Discussion

Review

Rating: 6

This paper studies LLM unlearning with LoRA-based fine-tuning. It identifies two problems with FILA, a LoRA-based unlearning method: incorrect Fisher information estimation and the high time cost of collecting importance estimates. The proposed method, VILA, corrects the Fisher information computation and employs a low-rank approximation for importance-map estimation to improve efficiency. Experimental results on three unlearning benchmarks show that it outperforms FILA and has better efficiency.

Reasons to Accept

  • This work identifies the incorrect estimation of the previous FILA method and proposes a simple and intuitive method to correct the error and improve efficiency.
  • Comprehensive experiments involving three unlearning benchmarks and a group of different LLMs, which show much better efficiency compared to FILA.

Reasons to Reject

  • The experiment design is missing in isolating the two proposed solutions. From my understanding, the proposed two solutions can be separately applied to the baseline FILA. However, there are no clear experimental results showing how each component affects unlearning performance.

Questions for Authors

  • Considering the sensitivity of LLM unlearning, it would be better to report the forget performance trajectory along training instead of only selecting one point in the table.
  • Minor:
    • Line 151: should this read "we initialize the LoRA" or "FILA initializes the LoRA"?
    • The title "Overhaul of parameter-efficient knowledge unlearning" may be a bit overstated from my understanding, since this paper mainly targets improving a specific method FILA. However, there are many other parameter-efficient unlearning methods, such as the task vector methods.
Comment

We appreciate Reviewer ZX4F's insightful comments, which help clarify and strengthen our paper. Below, we address each point individually.

1. Isolation of Proposed Solutions

"The experiment design is missing in isolating the two proposed solutions... However, there are no clear experimental results showing how each component affects unlearning performance."

We thank Reviewer ZX4F for this insightful comment, which prompts us to conduct additional ablation experiments clearly isolating each proposed solution component:

  • FI Correction (Eq. 5): Correcting the Fisher Information estimation.
  • LoRA Approximation (Eq. 9): Approximating the full-model Fisher Information using LoRA adapter parameters to enhance computational efficiency.

We provide detailed ablation results using the Phi-1.5B model below:

| Method | Forget 1% | Forget 5% | Forget 10% | Average Gain |
| --- | --- | --- | --- | --- |
| GD + FILA | -2.17 | -10.23 | -13.84 | |
| w/ FI Correction | -1.27 | -9.61 | -9.54 | 1.94 |
| w/ LoRA approx. | -2.17 | -9.61 | -13.54 | 0.30 |
| w/ Both (VILA) | -1.54 | -9.61 | -10.80 | 1.43 |
| NPO + FILA | -2.17 | -6.09 | -8.83 | |
| w/ FI Correction | -1.85 | -6.34 | -5.85 | 1.02 |
| w/ LoRA approx. | -2.17 | -6.58 | -9.30 | -0.32 |
| w/ Both (VILA) | -2.17 | -5.17 | -9.30 | 0.15 |
| IHL + FILA | -2.17 | -5.40 | -1.79 | |
| w/ FI Correction | -1.54 | -0.85 | -0.47 | 2.17 |
| w/ LoRA approx. | -2.17 | -4.53 | -0.19 | 0.82 |
| w/ Both (VILA) | -1.85 | -1.17 | -0.83 | 1.84 |
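For reference, the Average Gain column appears to be the mean improvement over the corresponding FILA baseline across the three forget ratios. A minimal Python sketch, using the GD rows above:

```python
def average_gain(baseline, variant):
    # Mean improvement of a variant over its FILA baseline,
    # averaged across the forget ratios (1%, 5%, 10%).
    return sum(v - b for b, v in zip(baseline, variant)) / len(baseline)

gd_fila = [-2.17, -10.23, -13.84]   # GD + FILA row
gd_fi_corr = [-1.27, -9.61, -9.54]  # w/ FI Correction row
print(round(average_gain(gd_fila, gd_fi_corr), 2))  # 1.94
```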

Key observations from these experiments are:

  • FI Correction alone consistently yields the highest unlearning performance across most settings, confirming that correcting Fisher Information significantly improves unlearning effectiveness.
  • LoRA Approximation alone provides efficiency gains but achieves limited or no improvement over the FILA baseline.
  • Our proposed method, VILA (combining both), achieves performance close to the FI Correction alone scenario while substantially reducing computational overhead. Thus, VILA effectively balances high unlearning performance with computational efficiency.

2. Forget Performance Trajectory

"Considering the sensitivity of LLM unlearning, it would be better to report the forget performance trajectory along training instead of only selecting one point."

We fully agree with the reviewer that reporting forget performance trajectories throughout training is crucial for comprehensive evaluation. To address this, we conduct analysis on the TOFU benchmark, explicitly tracking the trajectories of forget quality and model utility throughout the unlearning process. As OpenReview does not support embedding figures directly in review responses, we provide these detailed trajectory plots via the following external link (see Figure 5): Llama2-7B Unlearning Trajectories.

The results indicate that our method (VILA) consistently achieves improved forgetting performance compared to FILA, while notably preserving comparable model utility. While FILA shows comparable forget performance with IHL in some cases, VILA consistently achieves a better balance between forgetting and utility, maintaining higher Forget Quality with minimal loss in Model Utility across diverse settings. Moreover, our method accomplishes this performance at significantly lower computational and storage costs compared to FILA, making VILA particularly advantageous in practical large-scale unlearning scenarios.

3. Minor Comments

  • Line 151 (Initialization Clarification): Thank you for pointing out this ambiguity. Line 151 describes FILA's initialization process. We will explicitly revise this sentence as follows:

"FILA initializes the LoRA matrices B and A using the weighted low-rank approximation (WLRA) objective based on the forget importance map."

  • Title Overstatement: We appreciate this remark. Our primary contribution indeed focuses on addressing critical limitations of FILA. To align the title more closely with the actual contributions of the paper, we propose revising it to:

"Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning."

This revised title accurately reflects the scope of our work without overstating its claims. We will update the manuscript accordingly.

Comment

Dear Reviewer ZX4F,

Thank you very much for your thoughtful comments and for taking the time to review our work. We truly appreciate your feedback and are glad that the additional experiments addressed your concerns.

As suggested, we will incorporate the isolation ablation studies into the final version. Your input has been very helpful in improving the quality and clarity of the paper.

Best regards,
Authors

Comment

Thanks for the detailed response and additional experiments. I have raised my rating and kindly request that authors include the isolation experiments in the paper for future versions.

Review

Rating: 5

The paper proposes a parameter-efficient unlearning method. One key observation is that when the forget set has a different distribution from the full dataset, the estimation of Fisher information should be rectified by considering the score function on the forget set. The method achieves both a good speedup and good unlearning performance.

Reasons to Accept

The experiment results are good. The paper is well-written and easy to follow.

Reasons to Reject

  • regarding line 173 - 176, the current unlearning benchmarks choose forget subsets different from remain subsets, but the general definition of unlearning doesn't make such assumption (D_f could be arbitrary in line 113), so can we compare FILA and VILA when removing some in-domain data.
  • The right-hand side of Equation 5 doesn't depend on \Delta W. Another question is that when D_f is different from D, the expectation of the score function is no longer zero, can we only take it as an importance score? (e.g., some ablation analyses on Equation 5).
  • It seems that some approximation assumptions in Theorem 1 are not justified. For example, 1) line 416 drops the impact of the initial value of LoRA, so it would be better to add a test that unlearns with zero-initialized A and B rather than the standard (Gaussian, zero) initialization. 2) the notations Var_D in line 417-419 conflicts with Equation 5 which is the variance of the score function. 3) line 423, zero covariance doesn't imply full independence. I understand that we do need some conditions to make things work as expected, but it would also be important to check (empirically) to what extent those assumptions stand and explicitly discuss the limitations of those conditions.
Comment

We sincerely thank Reviewer 5shu for providing insightful comments and constructive feedback, which significantly helped us improve the clarity and rigor of our manuscript. Below, we provide clear and structured responses to each comment individually.


1. In-domain Unlearning Scenario

"Regarding line 173–176, the current unlearning benchmarks choose forget subsets different from remain subsets, but the general definition of unlearning doesn't make such assumption (D_fD\_f could be arbitrary in line 113), so can we compare FILA and VILA when removing some in-domain data?"

To address this insightful comment, we conduct additional experiments designed explicitly for the in-domain unlearning scenario:

  • Experimental Setup: In practice, reproducing the "D_f could be arbitrary" assumption would require access to the language model's entire pretraining corpus, which is not available. Therefore, we approximate the in-domain setting as follows:
    • Forget‐Set Construction: We randomly select 10% of question-answer pairs from the TOFU dataset to form the forget set, ensuring its distribution closely aligns with the overall dataset.
    • Metric Selection: Because our forget set is randomly sampled, many forgotten Q&A pairs lack perturbed answers, so we cannot compute the Truth Ratio. Instead, we evaluate both Model Utility (MU) and Forget Quality (FQ) using the harmonic mean of ROUGE and probability scores.
    • Model Selection: Based on this metric, we perform hyperparameter tuning 15 times and subsequently select the final configuration that minimizes FQ while maintaining at least 95% of the original MU.
    • Method: To compare FILA and our method fairly, we compute the importance map without using LoRA and apply only the Fisher Information correction.
    • Models: Phi-1.5B and Llama2-7B
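The harmonic-mean aggregation mentioned in the setup can be sketched as follows; this is a minimal illustration with hypothetical component scores, not the authors' exact evaluation code:

```python
def harmonic_mean(scores):
    # Harmonic mean of component scores (e.g., ROUGE and probability);
    # it is dragged toward the weakest component, so a model must do
    # reasonably well on every component to score well overall.
    return len(scores) / sum(1.0 / s for s in scores)

print(harmonic_mean([0.8, 0.4]))  # ≈ 0.533, closer to the weaker score
print(harmonic_mean([0.5, 0.5]))  # 0.5 when components agree
```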

For brevity, we present results only for the IHL setting (Phi-1.5B and Llama2-7B). The experimental results are summarized below:

| Method | Llama2-7B MU↑ | Llama2-7B FQ↓ | Phi-1.5B MU↑ | Phi-1.5B FQ↓ |
| --- | --- | --- | --- | --- |
| IHL | 0.95 | 0.65 | 0.88 | 0.69 |
| IHL + FILA | 0.93 | 0.50 | 0.89 | 0.55 |
| IHL + OURS | 0.94 | 0.52 | 0.88 | 0.57 |

Key observations:

  • Under this in-domain scenario, FILA and VILA exhibit relatively similar performance, with only minor differences observed. This aligns well with our theoretical expectation, as the primary advantage of VILA arises when there is a distribution mismatch between the forget set and the entire set.
  • When the forget set closely matches the entire set, the benefit of Fisher Information correction diminishes, naturally leading to comparable performance between FILA and VILA, as confirmed by our results.

Additional clarification regarding practical scenarios:

We also emphasize that, in practical applications, the forget set typically does not represent the overall data distribution. Thus, methods explicitly accounting for distributional differences between the forget set and entire data can provide substantial advantages.

We fully acknowledge the reviewer’s valid concern regarding our general definition of unlearning (line 113), which currently reads:

"The goal of unlearning is to effectively eliminate the knowledge associated with a specified forget set mathcalD_f\\mathcal{D}\_f in the LLM, without retraining the model from scratch."

To explicitly reflect this practical nuance, we propose revising line 113 as follows:

"The goal of unlearning is to effectively eliminate the knowledge associated with a specified forget set mathcalD_f\\mathcal{D}\_f in the LLM without retraining from scratch. In practical scenarios, the distribution of the forget set mathcalD_f\\mathcal{D}\_f often differs from that of the entire data."

We sincerely appreciate Reviewer 5shu’s insightful feedback, which helps us better clarify the conditions and practical applicability of our proposed approach.

Comment

5. Approximation Assumptions in Theorem 1

We sincerely thank the reviewer for highlighting important issues regarding the approximation assumptions in our theoretical analysis. We fully acknowledge and appreciate these insightful comments. To address them, we have revised and expanded our proof, explicitly stated four assumptions (A.1–A.4), and provided clear empirical validation.

For clarity and readability, here we highlight the key improvements. The full revised proof with detailed validations is provided in the following supplementary document (Full Proof). Please note that reviewing this supplementary proof is not necessary to understand our main response; it is included simply to provide additional details for completeness.

(1) LoRA Initialization (A.1)

Purely zero-initialized LoRA parameters (A = 0, B = 0) yield zero gradients, making meaningful importance estimation impossible. Empirically, we observe that overly small initialization (e.g., std = 0.01) leads to unstable gradients, whereas overly large values (std ≥ 0.40) violate Assumption A.1 by introducing excessive noise. Hence, we treat the LoRA initialization as a hyperparameter tuned equally across all methods within 15 validation trials, confirming that effective and stable initialization values are easily identifiable (e.g., std = 0.05).
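Why fully zero-initialized adapters are degenerate can be seen in a toy one-dimensional sketch (a hypothetical scalar LoRA, not the paper's actual model): with effective weight w0 + b·a, each adapter's gradient is proportional to the other factor, so a = b = 0 zeroes both.

```python
# Toy scalar LoRA: effective weight w = w0 + b * a, squared loss L = (w*x - y)^2.
# By the chain rule, dL/da = (dL/dw) * b and dL/db = (dL/dw) * a,
# so the fully zero initialization a = b = 0 kills both gradients.

def lora_grads(w0, a, b, x, y):
    w = w0 + b * a
    dldw = 2.0 * (w * x - y) * x
    return dldw * b, dldw * a  # (dL/da, dL/db)

print(lora_grads(1.0, 0.0, 0.0, 2.0, 0.0))   # (0.0, 0.0): no learning signal at all
print(lora_grads(1.0, 0.05, 0.0, 2.0, 0.0))  # (0.0, 0.4): a Gaussian-init a revives dL/db
```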

(2) Empirical Validation and Discussion of Approximation Assumptions

We fully agree with the reviewer regarding the necessity of empirically validating our theoretical assumptions. Therefore, we explicitly revisit and empirically validate each assumption as follows:

  • Assumption A.1 (Negligible Magnitude of Initial Matrices): We empirically confirm that the initial matrices B_0, A_0 and their associated terms (B_0 A_0, B_0 ΔA, ΔB A_0) are consistently at least four orders of magnitude smaller than the learned update term ΔB ΔA. This empirically justifies our approximation.

  • Assumption A.2 (Independence between ΔB and ΔA): We empirically confirm that the resulting gradients follow approximately Gaussian distributions. We also measure the element-wise covariance between ΔB and ΔA across 500 gradient samples, separately for the forget and retain sets. The covariances are sharply concentrated near zero, supporting our approximation that the two matrices are effectively independent.

  • Assumption A.3 (Independence among Distinct Elements within Each Matrix): Empirical measurements similarly confirm that off-diagonal covariance terms among distinct elements are negligible. Since each element follows an approximately Gaussian distribution, this further supports the assumption of statistical independence, validating the practical reasonability of this approximation.

  • Assumption A.4 (Negligible Expectation Values): We find that the squared expectations are small but not strictly negligible. Still, we assume them to be negligible to significantly reduce memory and computation. This leads to a 100× memory and 40× speed improvement, with minimal impact on performance. Future work may revisit this assumption for further gains.
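A covariance check of the kind used for A.2/A.3 can be sketched as follows; the Gaussian draws below are synthetic stand-ins for the 500 per-element gradient samples of ΔB and ΔA:

```python
import random

def sample_covariance(xs, ys):
    # Unbiased sample covariance between two streams of gradient samples
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)
db = [random.gauss(0.0, 0.05) for _ in range(500)]  # stand-in for dB samples
da = [random.gauss(0.0, 0.05) for _ in range(500)]  # stand-in for dA samples

# For independent draws the covariance concentrates near zero,
# which is the empirical signature the assumption check looks for.
print(sample_covariance(db, da))
```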


We hope this thoroughly addresses Reviewer 5shu's insightful comments. We greatly appreciate the reviewer's constructive feedback and will incorporate all improvements in our final manuscript.

Comment

2. Clarification on Equation 5

"The right-hand side of Equation 5 doesn't depend on ΔW."

We appreciate the reviewer pointing out this issue. It was indeed a typo. We have corrected Equation 5 as follows to clearly reflect the dependence on ΔW:

$$\mathrm{Var}_{\mathcal{D}}[\Delta W] := \mathbb{E}_{\mathcal{D}}\left[\left(\frac{\partial}{\partial W}\log p_W(\mathcal{D})\right)^2\right] - \left(\mathbb{E}_{\mathcal{D}}\left[\frac{\partial}{\partial W}\log p_W(\mathcal{D})\right]\right)^2.$$
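Numerically, the correction in Equation 5 amounts to subtracting the squared mean of the score from FILA's raw second moment. A minimal sketch with hypothetical per-parameter gradient samples:

```python
# Hypothetical per-parameter gradient (score) samples on the forget set.

def uncorrected_fi(grads):
    # FILA-style estimate: second moment E[g^2], which implicitly assumes E[g] = 0
    return sum(g * g for g in grads) / len(grads)

def corrected_fi(grads):
    # Corrected estimate (Eq. 5): Var[g] = E[g^2] - (E[g])^2
    mean = sum(grads) / len(grads)
    return uncorrected_fi(grads) - mean * mean

# When the forget set's distribution differs from the full data,
# the mean gradient is non-zero and the two estimates diverge sharply.
biased = [0.9, 1.1, 1.0, 0.8, 1.2]
print(uncorrected_fi(biased))  # ≈ 1.02
print(corrected_fi(biased))    # ≈ 0.02
```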


3. Validity of the Expectation as an Importance Score

"Another question is that when D_fD\_f is different from DD, the expectation of the score function is no longer zero, can we only take it as an importance score?"

To investigate this, we conduct additional experiments on two alternative importance measures:

  • ExpILA: Uses the magnitude of the expected score function.
  • AbsILA: Uses the expected magnitude of the score function.
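The three candidate importance scores can be written side by side; a minimal per-parameter sketch (function names mirror the variants above; the gradient samples are hypothetical):

```python
def expila_score(grads):
    # ExpILA: magnitude of the expected score, |E[g]|
    return abs(sum(grads) / len(grads))

def absila_score(grads):
    # AbsILA: expected magnitude of the score, E[|g|]
    return sum(abs(g) for g in grads) / len(grads)

def vila_score(grads):
    # VILA: variance of the score, E[g^2] - (E[g])^2
    n = len(grads)
    mean = sum(grads) / n
    return sum(g * g for g in grads) / n - mean * mean

grads = [0.5, -0.5, 1.5, 0.5]
print(expila_score(grads))  # 0.5
print(absila_score(grads))  # 0.75
print(vila_score(grads))    # 0.5
```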

For brevity, we present results only for the NPO setting (Phi-1.5B and Llama2-7B), but similar trends are consistently observed in the GD and IHL settings as well.

| Model | Method | Forget 1% | Forget 5% | Forget 10% | Average Gain |
| --- | --- | --- | --- | --- | --- |
| Phi-1.5B | FILA | -2.17 | -6.09 | -8.83 | 1.12 |
| | ExpILA | -1.85 | -5.62 | -10.54 | 2.74 |
| | AbsILA | -1.85 | -5.86 | -9.06 | 1.23 |
| | VILA | -1.85 | -1.17 | -0.83 | 7.68 |
| Llama2-7B | FILA | -3.30 | -11.18 | -11.06 | 0.72 |
| | ExpILA | -1.27 | -12.18 | -5.48 | 4.72 |
| | AbsILA | -2.52 | -12.18 | -4.26 | 3.92 |
| | VILA | -1.54 | -4.32 | -4.59 | 5.74 |

Key observations from the above experiments:

  • ExpILA and AbsILA generally achieve comparable or better performance compared to FILA, particularly when the forget set size is small (Forget 1%).
  • As the forget set size grows (Forget 5% and Forget 10%), VILA consistently demonstrates better performance than ExpILA and AbsILA.
  • Importantly, these results empirically confirm our claim: when the distribution of the forget set differs from that of the entire data, the expectation values of the score function become different from zero. Thus, correcting the original Fisher Information estimation (as done in VILA) becomes essential for robust and accurate importance estimation.

We sincerely thank Reviewer 5shu for raising this insightful point. We will incorporate these empirical results and their analyses into our final manuscript to further clarify our method’s validity and effectiveness.


4. Clarification of Notations and Contributions

Reviewer 5shu pointed out a potential confusion regarding the notation Var_D used in both Equation 5 (variance of gradients for the full model parameters) and lines 417–419 (variance approximation under the LoRA-based efficiency assumption). This feedback enabled us to significantly improve not only our notation clarity but also the overall structure of our manuscript. We sincerely appreciate this valuable input.

Clarification: In our manuscript, the operator Var_D consistently denotes the empirical variance computed over the dataset D. The difference between Equation 5 and lines 417–419 lies solely in their arguments (score functions):

  • Equation 5: variance computed for gradients w.r.t. the original full-model parameters W.
  • Lines 417–419: variance computed specifically for gradients w.r.t. the LoRA adapter parameters A and B.

We will explicitly clarify in lines 417–419 that the variance calculation is performed specifically on the LoRA adaptor parameters' score functions.

Cause and resolution: We believe the underlying reason for the confusion was not merely due to insufficient notation clarification, but rather because our manuscript did not explicitly emphasize the distinct contributions of our proposed method:

  • FI Correction: We correct FILA's incorrect assumption of a zero-expectation in Fisher Information estimation.
  • Efficiency Improvement: We efficiently approximate Fisher Information using only LoRA adaptor parameters, rather than the full model.

These two contributions are developed separately and then combined into our final proposed method, VILA. However, our original manuscript did not clearly present this process. We believe that if this had been clearly explained, the confusion regarding notation would have been greatly reduced. Therefore, we will restructure our manuscript to present these two contributions individually and then explain how they are integrated into VILA.

We again thank Reviewer 5shu for this insightful comment, which has significantly improved the clarity of our manuscript.

Review

Rating: 6

This paper proposes a new parameter-efficient model optimization technique, Variance-based Importance estimation and efficient Low-rank Adaptation (VILA), for LLM unlearning. This work builds on the limitations identified in a prior work called Fisher-Initialization of Low-rank Adapters (FILA). The authors argue that 1) FILA does not accurately represent parameter importance in the unlearning setting, and 2) FILA is not scalable: although treated as parameter-efficient, it still requires computing gradients for all model parameters to construct the importance map of parameters with respect to the forget set. The authors then propose a corrected technique that shows improvement on the WMDP, MUSE Books, and TOFU datasets. They apply their method to fine-tune Llama2-7B and Phi-1.5B.

  • I am increasing my support for the paper as the authors' response was convincing in terms of more analysis and clarifying the contributions' importance.

Reasons to Accept

  • The paper is well-written and easy to follow.
  • The proposed method investigates the limitations of FILA in a theoretical manner and suggests modifications that are intuitive and improve the results.

Reasons to Reject

  • Although the improvements on top of FILA seem to work, as the results demonstrate, the gains (except on MUSE Books) do not seem transformational. This reads as incremental work with limited contribution. One way to compensate would have been to dig deeper in the analysis section to uncover some exciting learnings. That part is missing, which makes it another paper that reports better numbers but offers no other insights. The methodology is well written and I can understand what has been done, but I do not know the micro-level impact of these design choices.
Comment

We sincerely thank Reviewer WfNG for the thoughtful and insightful comments. Below, we explicitly address each key concern.

1. Incremental Contribution

"Although the improvements on top of FILA seem to work, the improvement does not seem transformational. This work seems incremental, with limited contribution."

We appreciate the reviewer’s candid evaluation. While we agree that our primary contribution is indeed to achieve improved unlearning performance with computational efficiency, we would like to emphasize another important contribution of our work: We explicitly identify and correct the improper use of Fisher Information (FI) in the unlearning context for the first time.

Previous unlearning methods, such as:

  • Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening (AAAI 2024)
  • Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025)
  • Unlearning Targeted Information via Single Layer Unlearning Gradient (ICML 2025)

have used FI without considering the distributional shift between the forget set and the entire data, leading to inaccurate importance estimations. This critical issue has been overlooked in the unlearning literature. Our work explicitly highlights this problem and provides a theoretically grounded correction.

To empirically demonstrate the significance of our FI correction, we conduct extensive ablation studies (summarized in the next section). These studies indicate that without the FI correction, unlearning performance deteriorates, thus empirically validating the necessity of our proposed correction.

Therefore, our work is not limited to merely improving a single existing method (FILA); it contributes broadly to the unlearning field by correcting a fundamental theoretical oversight.

Comment

2. Micro-level Impact Analysis

"What is the micro level impact of these design choices?"

We thank Reviewer WfNG for this insightful comment, which prompts us to conduct additional ablation experiments clearly isolating each proposed solution component:

  • FI Correction (Eq. 5): Correcting the Fisher Information estimation.
  • LoRA Approximation (Eq. 9): Approximating the full-model Fisher Information using LoRA adapter parameters to enhance computational efficiency.

We provide detailed ablation results using the Phi-1.5B model below:

| Method | Forget 1% | Forget 5% | Forget 10% | Average Gain |
| --- | --- | --- | --- | --- |
| GD + FILA | -2.17 | -10.23 | -13.84 | |
| w/ FI Correction | -1.27 | -9.61 | -9.54 | 1.94 |
| w/ LoRA approx. | -2.17 | -9.61 | -13.54 | 0.30 |
| w/ Both (VILA) | -1.54 | -9.61 | -10.80 | 1.43 |
| NPO + FILA | -2.17 | -6.09 | -8.83 | |
| w/ FI Correction | -1.85 | -6.34 | -5.85 | 1.02 |
| w/ LoRA approx. | -2.17 | -6.58 | -9.30 | -0.32 |
| w/ Both (VILA) | -2.17 | -5.17 | -9.30 | 0.15 |
| IHL + FILA | -2.17 | -5.40 | -1.79 | |
| w/ FI Correction | -1.54 | -0.85 | -0.47 | 2.17 |
| w/ LoRA approx. | -2.17 | -4.53 | -0.19 | 0.82 |
| w/ Both (VILA) | -1.85 | -1.17 | -0.83 | 1.84 |

Key observations from these experiments are:

  • FI Correction alone consistently yields the highest unlearning performance across most settings, confirming that correcting Fisher Information significantly improves unlearning effectiveness.
  • LoRA Approximation alone provides efficiency gains but achieves limited or no improvement over the FILA baseline.
  • Our proposed method, VILA (combining both), achieves performance close to the FI Correction alone scenario while substantially reducing computational overhead. Thus, VILA effectively balances high unlearning performance with computational efficiency.
Comment

3. Additional Experiments and Insights

"One way to compensate would have been to dig down the analysis section to uncover some exciting learnings. That part is missing, making it another paper giving better numbers without deeper insights."

We fully acknowledge the concern and have conducted six additional experiments to investigate the micro-level impacts of our design choices. Below, we briefly summarize key insights from these analyses. More detailed experimental results and further analyses are available in the supplementary appendix (see Sections B–G) at the following link: Additional Experimental Results. Please note that reviewing these supplementary results is not necessary to understand our main response; they are included simply to provide additional details.

  • In-domain Unlearning (Expectation ≈ 0): When the distribution of the forget set closely matches that of the entire dataset, the performance gains from FI correction diminish. This explicitly indicates that FI correction is particularly effective for scenarios with significant distribution mismatch. Notably, most unlearning scenarios inherently involve such distributional mismatches, underscoring the broad applicability and necessity of our FI correction.

  • Expectation as an Importance Score: Using the raw expectation or its absolute value (ExpILA/AbsILA) provides reasonable performance for small forget sets but deteriorates as the forget set size increases. Importantly, these experiments empirically confirm that the expectation values of the FI score function become non-zero under realistic distribution mismatches. Thus, treating expectation values as zero, as done by existing approaches, is invalid. The expectation itself provides meaningful signals for parameter importance, clearly validating our corrected FI estimation approach (VILA).

  • LoRA Initialization Sensitivity (Sigma Ablation): Our ablation study on LoRA parameter initialization shows that extremely small or large initialization standard deviations harm FI estimation accuracy and performance.

  • Selective Application of Importance Map (25% layers): Interestingly, applying the importance map selectively to only the top 25% of layers achieves better unlearning performance than applying it to all layers uniformly. This demonstrates that our importance estimation is able to identify specific layers crucial for effective unlearning.

  • Training Trajectories Analysis: Through detailed analysis of the unlearning trajectories, we demonstrate that VILA consistently maintains stable and superior performance across the entire training process. This indicates that VILA improves not only final performance but also robustness and stability during the unlearning procedure.

  • Empirical Validation of Independence Assumptions (LoRA Parameters): Our theoretical analysis assumes independence both between the LoRA update matrices ΔB and ΔA (Assumption A.2) and among distinct elements within each matrix (Assumption A.3). Empirical covariance calculations across all parameters confirm that covariance terms, both between and within matrices, are consistently negligible (close to zero). These empirical observations support the validity of our independence assumptions.

Taken together, these micro-level insights clearly demonstrate that the superior performance and efficiency of our method stem from precise design decisions.

Remarks

We sincerely appreciate Reviewer WfNG's insightful comments and critiques. They have greatly assisted us in clarifying and strengthening our work. We will explicitly incorporate these insights into our revised manuscript to highlight both theoretical significance and empirical contributions clearly.

Comment

Thanks for all the additional experiments and explanation. I have increased my score. I hope you will revise the paper with the additional experiments.

Comment

Dear Reviewer WfNG,

Thank you very much for your thoughtful and well-reasoned review. We greatly appreciate the time and effort you dedicated to carefully evaluating our responses. Your insightful feedback prompted us to conduct additional ablation studies, significantly improving the quality and clarity of our work.

As suggested, we will incorporate all additional experiments into the final manuscript. Once again, we sincerely appreciate your valuable contribution to shaping our paper.

Best regards,
Authors

Review

Rating: 7

This work introduces a scalable and efficient unlearning technique, VILA, for large language models, which addresses the limitations of FILA by enhancing importance estimation and reducing computational overhead. Instead of retrieving the Fisher Information (FI) from the full model parameters, this work proposes to retrieve FI from the LoRA parameters. Experiments on the TOFU, WMDP, and MUSE benchmarks demonstrate that the proposed VILA consistently outperforms existing approaches in unlearning performance while preserving model utility with higher efficiency.

Reasons to Accept

  1. This work studies an existing weakness in the FILA method, and proposes a fix that could improve both the performance and efficiency.

  2. The experiments on multiple datasets show that the proposed method is effective.

Reasons to Reject

  1. Though the experiments show the effectiveness, more ablation studies are desired on how the variance of the gradient of the model parameters can be approximated with those of the LoRA adapters. For example, if the importance map is varied from 0% to 100% of the calculated one, how will the final performance change?
Comment

We sincerely appreciate Reviewer Fpuh’s valuable feedback and thoughtful suggestions regarding additional ablation studies.

1. Sensitivity of VILA to the extent of importance map usage

"More ablation studies are desired for how the variance of the gradient of the model parameter can be approximated with those of LoRA adapters. For example, if the importance map is changed from the calculated one from 0% to 100%, how will the final performance change?"

To address this insightful comment, we conduct additional ablation experiments to investigate the sensitivity of our method (VILA) to the extent to which the importance map is applied. Specifically, we vary the proportion of layers to which our calculated importance map is applied. We compute the average importance score per layer and then select the top n% of layers (n = 25, 50, 75) ranked by this score. The calculated importance map is applied to these selected layers (top n%), while the remaining layers are updated uniformly without importance weighting.
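The selection step can be sketched as follows (hypothetical layer names and importance values; the real maps come from the computed importance scores):

```python
def select_top_layers(layer_importance, fraction):
    # Rank layers by their mean importance score and keep the top `fraction`;
    # the remaining layers would be updated uniformly, without weighting.
    avg = {name: sum(v) / len(v) for name, v in layer_importance.items()}
    ranked = sorted(avg, key=avg.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])

importance = {  # hypothetical per-layer importance scores
    "layer0": [0.10, 0.20],
    "layer1": [0.90, 1.10],
    "layer2": [0.40, 0.60],
    "layer3": [0.05, 0.15],
}
print(select_top_layers(importance, 0.25))  # {'layer1'}
```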

The following table summarizes the results obtained on the TOFU benchmark (Forget 10% setting with Llama2-7B):

| Method | 0% (Baseline) | 25% layers | 50% layers | 75% layers | 100% (VILA) |
| --- | --- | --- | --- | --- | --- |
| GD + VILA | -16.61 | -0.23 | -0.83 | -0.47 | -1.18 |
| IHL + VILA | -7.70 | -0.01 | -0.03 | -0.29 | -0.40 |
| NPO + VILA | -13.84 | -3.94 | -4.76 | -5.29 | -4.59 |

From these experiments, we draw the following key observations:

  • Interestingly, applying the importance map selectively to only the top 25% of layers yields better unlearning performance compared to applying it to all layers.

  • This result suggests that our method (VILA) is able to identify layers that are more critical to unlearning, thus achieving improved forgetting performance when updates are restricted to fewer layers.

  • One possible interpretation of this phenomenon is that restricting updates to a smaller set of more relevant layers reduces unnecessary parameter changes, potentially helping preserve model utility and achieve more targeted unlearning.

  • This observation is consistent with recent findings indicating that selective layer-based unlearning can yield better outcomes, as demonstrated by Agrawal & Huang (2025):

    Agrawal, Saransh, and Kuan-Hao Huang. "SHA256 at SemEval-2025 Task 4: Selective Amnesia—Constrained Unlearning for Large Language Models via Knowledge Isolation." arXiv preprint arXiv:2504.12996 (2025).

We sincerely thank Reviewer Fpuh for highlighting this important aspect, which we had previously overlooked. We will carefully incorporate these insights and experimental results into our final manuscript to provide a more comprehensive understanding.

Comment

Thanks for your response. I think the reported results should be considered to be included in a revision. I'll keep my score unchanged.

Comment

Dear Reviewer Fpuh,

Thank you sincerely for taking the time to thoroughly review our paper. Your comments encouraged us to conduct an additional experiment applying our method progressively rather than across all layers at once, which led to some very interesting findings.

We truly appreciate your contribution, and we will make sure to clearly present the results of this experiment in the final manuscript.

Best regards,
Authors

Comment

We sincerely thank all reviewers for carefully reviewing our manuscript and providing valuable, constructive feedback. We particularly appreciate the reviewers' recognition of our contributions:

  • Clear and Accessible Presentation (Reviewers **5shu**, **WfNG**): The manuscript is well-written and easy to follow.
  • Rigorous Correction of FILA's Weaknesses (Reviewers **Fpuh**, **WfNG**, **ZX4F**): Our method effectively identifies and corrects FILA's limitations, improving both unlearning performance and computational efficiency.
  • Comprehensive Empirical Validation (Reviewers **5shu**, **Fpuh**, **WfNG**, **ZX4F**): Extensive experiments across multiple benchmarks demonstrate that VILA consistently outperforms existing methods while preserving model utility.
  • Significant Efficiency Gains (Reviewers **Fpuh**, **ZX4F**): Our low-rank approximation substantially reduces computational cost compared to FILA, as validated on multiple LLM benchmarks.

We have carefully revised our manuscript to address all the reviewers' comments. Below, we summarize our responses clearly and comprehensively. For reviewers interested in additional details and further experimental analyses, we provide a supplementary document available at the following link: Supplementary document.

We emphasize that reviewing this supplementary material is not necessary to understand our responses. Rather, it is provided purely for reviewers who may wish to explore the analyses in greater depth. Specifically, we provide the following new or revised materials:

  • Additional experiments on the In-domain Unlearning Scenario (Reviewer 5shu Q1, Reviewer WfNG Q2)
  • Clarification on Equation 5 (Reviewer 5shu Q2)
  • Validation of the Expectation as an Importance Score (Reviewer 5shu Q3, Reviewer WfNG Q2)
  • Clarification of Notations and Contributions (Reviewer 5shu Q4)
  • Explicit description of Approximation Assumptions in Theorem 1 (Reviewer 5shu Q5)
  • Sensitivity Analysis of VILA to the extent of importance map usage (Reviewer Fpuh Q1, Reviewer WfNG Q2)
  • Further clarification of our method's Novelty and Contributions (Reviewer WfNG Q1)
  • Isolation of Proposed Solutions: FI Correction and LoRA Approximation (Reviewer ZX4F Q1, Reviewer WfNG Q2)
  • Analysis of the Forget Performance Trajectory (Reviewer ZX4F Q2, Reviewer WfNG Q2)
  • Correction of minor Typos (Reviewer ZX4F Q3)
  • Paper Title Change (Reviewer ZX4F Q4)

We thank all reviewers once again for their insightful comments, which have significantly enhanced our manuscript.

Final Decision

This paper addresses the challenge of efficiently erasing sensitive information from LLMs without costly retraining. The authors propose VILA, an alternative framework that corrects Fisher-Information-based parameter importance estimation using LoRA adapters. VILA outperforms FILA in both accuracy and efficiency across several benchmarks, with stronger theoretical grounding and empirical evidence of improvement.

All reviewers agree that VILA makes a notable theoretical correction to FILA's Fisher Information estimation by accounting for the distribution shift induced by the forget set. They all highlight significant accuracy and efficiency gains across multiple datasets and LLM architectures. The writing is clear and well structured.

However, some reviewers point out that the proposed work is incremental rather than transformative and does not provide fine-grained qualitative insights. They also call for more rigorous ablation studies that isolate the independent effects of the Fisher Information correction and the LoRA approximation.

While the proposed work may be incremental, its gains in efficiency and accuracy improve significantly on FILA, providing a solid framework for domains and tasks that require machine unlearning in LLMs. I strongly encourage the authors to incorporate the additional discussion from the rebuttal and to add qualitative examples for further insight. Provided these revisions are made, this work will offer both immediate practical value and theoretical clarity for unlearning.