We thank the reviewer for their detailed feedback, and for appreciating the distributional approach we used, as well as the potentially broad applicability of our approach.

Based on their comments, we have made changes to the manuscript to enhance clarity.

We now address the individual concerns raised by reviewer ZvuL.

W1 A: The main weakness of this paper is the limited understanding of how keeping the HiFi part results in keeping the knowledge of learned models. Otherwise, how tuning the HiFi part results in forgetting the specific learned knowledge.

In this work, the 'knowledge' refers to the data distributions learned by the model - thus, by the definition of HiFi components (Section 4.1, lines 245-288), keeping HiFi channels in well-trained models ensures that the learned distributions are maintained. The reconstruction error based formulation proposed in Section 4 is principled and rigorously derived. Thus, by maintaining the HiFi components, it follows that the distributional information regarding the entire data distribution (for pruning), or the remain classes (for unlearning) is maintained.

W1 B: At the conceptual level of understanding, it is quite convincing that the components showing similar distributional behaviors with the layer outputs are probably the crucial parts of the knowledge. However, it is not guaranteed theoretically.

We point out that the knowledge learned by the model is the distribution generating the data, and as such, retaining HiFi components ensures that the model's knowledge is retained. Since retaining the HiFi components are rigorously derived from the minimization of the layer-wise output reconstruction error, our CoBRA-P and CoBRA-U algorithms (Section 6) are principled, theoretically motivated approach that ensures that knowledge is retained (CoBRA-P) or removed (CoBRA-U) as needed.

W2: Quality of presentation

We thank the reviewer for the feedback. We have addressed the typographical errors in the work, and have fixed the size of images in Figure 5. Additionally, we have made other improvements to the text, highlighted in blue.

W3 A: Although the authors have provided the 'Related Work' part in the Appendix ... in-depth analysis of the prior works investigating the importance of weights or sensitivity measures of weights should be considered.

The setting of our work, wherein we need to edit models (specifically for pruning and classwise unlearning), without access to the training data or loss function, is unique, with very few technically similar works available. We point out to the reviewer our significant literature survey into structured pruning methods, most of which analyze the importance of weights. For instance, in Appendix A.3, we discuss works that use gradients to measure weight importance for structured pruning [Molchanov et al, 2019], Weight Norms [Li et al, 2019], reconstruction error [Luo et al, 2017], feature map ranks [Lin et al, 2020], and a variety of other methods (see the survey [Hoefler et al, 2021]). In Appendix A.2, we note [Wang et al, 2022], which uses class-discriminative scores for unlearning in the federated setting. However, we have clarified Appendix A from Lines 894 to 906 to further highlight different measures of weight importance used in the literature.

W3 B: I think that HiFi is another viewpoint to measure the importance of weights so that it has the potential to show further impact on continual learning (also without data of the past tasks) and explainability.

We are glad that you point out that our notion of HiFi components is applicable other important model editing tasks. However, the scope of this work is limited to using HiFi components to combine structured pruning and classwise unlearning under the model editing umbrella.

Q1: In the "What is Model Editing" part on line 176, to my understanding, 'B' is the number of components (not an individual weight, but a group of weights) in the model. Therefore, the equation, |θ|−B, looks wrong because |θ| is commonly used for the number of weights, not components. Would you clarify the equations?

should indeed represent the number of components in the model. We have amended the main document by defining , the total number of components in the network in place of . The change is highlighted in blue.