PaperHub
Score: 6.3/10
ICML 2025 · Poster · 3 reviewers
Ratings: 2, 4, 4 (min 2, max 4, std 0.9)

Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors

Submitted: 2025-01-14 · Updated: 2025-07-24
TL;DR

Guiding Data-Free Model Merging via Task Vectors

Abstract

Keywords
Model Merging, Data-Free, Multi-task Learning

Reviews and Discussion

Official Review
Rating: 2

The paper offers very satisfying results and a method in a nice writeup that could nonetheless be communicated much better.

Questions for Authors

In your method you relearn all of the weights, so in some sense you have no reason to fall within any convex hull of the original models (like AdaMerge, for example). Is that an important feature of the method? Do you think that is the main gain? More understanding of what this optimization procedure actually finds might be illuminating (both on what could still be improved and on what is being done here successfully).

Is the choice of RoBERTa related to what you could train yourselves? Because if so, maybe this can help: https://huggingface.co/Lots-of-LoRAs is quite a large set of LoRA models all trained similarly. If instead you use RoBERTa because the method doesn't work with LoRAs, this is something you should state much more clearly.

What methods do you see as addressing the least similar problem to yours? Your method starts from a merged model; what if you started from a better merged model (from a dissimilar method)? Would that result in another boost in performance?

Claims and Evidence

You refer to Task Arithmetic as data-free, which is mostly true, except that it reweights the models using data. Classic merging does not, and TIES also has a scaling factor, but it is less sensitive to it, so the default is more likely to be just fine.

Methods and Evaluation Criteria

The paper mainly focuses on multi-task learning (not generalization, new capabilities, etc.). Somewhat awkwardly, it focuses on task arithmetic quite heavily, although this is not even the idea behind task arithmetic (there the idea is compositionality of tasks, not orthogonality and lack of interference, which are closer to the first merging works, to TIES, to https://proceedings.neurips.cc/paper_files/paper/2023/file/d28077e5ff52034cd35b4aa15320caea-Paper-Conference.pdf, etc.).

Theoretical Claims

There are proofs (and I also commented on some things that were unclear to me). In general, this takes up a meaningful part of the paper, but I don't feel it is a main contribution of the paper.

Experimental Design and Analysis

The NLP baselines are outdated: RoBERTa is a better hyperparameter-tuned (slightly larger) BERT, the first model to pretrain (parallel to ELMo); a lot has changed since.

Beyond this, the method performs extremely well; it is compared with a lot of baselines and (while for some reason not highlighted) almost reaches multitask learning, all without data, which is a massive result.

Supplementary Material

Mainly the one with the graph, not the equations

Relation to Prior Literature

The paper states well its relations to previous merging methods. It explains well what the main changes and difficulties in preexisting methods are and covers many baselines.

I do wonder how this method works on LoRA, and whether KnoTS would be irrelevant here or whether pre-aligning would still be useful even though you learn all the weights, but this is just something I am curious about and not a strong lack in the paper.

You mention not using external data, and that others do not either, but other methods do (especially task arithmetic). Usually, they use a validation set to tweak some sizes (noteworthy and very recent in that regard is arxiv.org/pdf/2310.02575). It is unclear how exactly you ran those methods and what you did yourselves (did you use some validation set? where?). Also recent, but not as recent as the above, is KnoTS; I wonder whether, because you are assuming all kinds of linear structure, it would further improve your method like it does TIES, i.e., is it complementary or conflicting?

Essential References Not Discussed

The linearity claims remind me of works like https://arxiv.org/abs/2303.09435 and https://arxiv.org/abs/2303.08112 from interpretability, and of works like https://arxiv.org/abs/2407.15845 on reconstructing data from weights, or of membership attacks.

The claim in line 140 should refer to https://arxiv.org/abs/2302.04863; this is the paper that deals with this (while Task Arithmetic does many nice things, it is not the main source of knowledge regarding that claim).

Maybe it would be useful to cite the works proposing model merging when you first use the term? I think you did cite (most?) of them somewhere else (https://arxiv.org/abs/2203.05482, https://arxiv.org/abs/2111.09832, https://arxiv.org/abs/2204.03044).

The MoErging literature is not essential, but it is quite lacking (maybe Arrow and PHATGOOSE are the clearest cases to cite? There is also a survey if it helps: https://arxiv.org/abs/2408.07057).

Other Strengths and Weaknesses

The paper is generally clear, but there are many small missing pieces that are left for the reader to reinvent, and this makes following the paper very hard. I tried to highlight those cases (what is an input, what distribution are we discussing, etc.) and explain where the confusion is, but in its current state the paper looks really good when skimming yet makes life really hard when following along.

It is unclear from the figure/intro what you mean by input vector. For an LLM there is the overall input for all of the training (but it is not a vector), the input to a given layer, or the input to the whole network (it looks like the second).

I do not understand the claim in l140, "This directional consistency suggests that task vectors converge to local optima". Local and not global? (That is obvious.) Optima and not just anywhere? Well, that is what convergence means. Are you trying to say the local optimum is the same and unique? This is only partially true (see Guetta et al. 2023, which I also shared in the related work section). Moreover, the fact that models converge to similar solutions doesn't mean that the differences between them are noise (interference they are for sure not, as the literature thinks of interference as places where the different task vectors ruin things for each other, regardless of whether it is good for the original task). By the definition of the delta, it seems like you are thinking of interference as any change from the task vector, which is obviously not true; the whole point of task arithmetic was that you can "add" things together, and merging methods also show cases where you can get disentangled vectors that do not interfere (although they do change) to some extent.

Other Comments or Suggestions

  • constitude -> constitute.
  • "Crucially, since the learning rate and total optimization steps always remain constrained during standard fine-tuning procedures, the corresponding inputs for individual samples remain consistent across successive update iterations." I can't parse this sentence; the inputs are consistent? In what sense?
  • l61: indicate"s"?
  • "As shown in Figure A.3": A.3 is an appendix, not a figure. Also, the appendix has only subsections and no sections; why?
  • l137 (right col): what is p? A probability over the distribution of inputs of the training data in task i? All tasks? Something else?
  • l160 (right): "lemma1" missing space; same sentence, "a individual" wrong determiner.
  • l179: "highly consistency".
  • l194, eq. 1: missing space.
  • l198: why does the preposition have a "." instead of a space? (There are many typos; you should pass over the text again, Grammarly might find some of them as well.)
  • In Theorem 1, if the point is that you want a space that doesn't jump lines, use ~ instead of a regular space (it is a space that counts as a regular character and not a word boundary).
  • l243: "the" unnecessary?
  • l256 (right): \citet, not \cite or \citep, when using the citation as part of the sentence.
  • l262 (right): I am not sure I would call it analysis, maybe just evaluation or performance? We don't learn anything, just compare.
  • What are the red delta numbers in the tables? Difference from what?

Author Response

We appreciate the reviewers’ valuable feedback and have addressed each point as follows:

Concern 1: The Use of Validation Data

  1. Whether Task Arithmetic is data-free: We categorize TA as a data-free method because it can select empirical parameters for merging without requiring data. Although it can be improved by searching for a rescaling coefficient, it is reasonable to categorize TA as data-free.
  2. How we run other methods: We run other methods in the same way as described in their papers and report the results accordingly when the experimental settings are the same.
  3. Whether we use validation data: Our proposed method is entirely data-free and does not require any validation data. This is because our approach is rescale-free and relies solely on our data-free loss.

Concern 2: Focus on Multi-Task Learning

Our work focuses on model merging rather than model editing. In the current model merging field, multi-task learning is the primary goal [1,2,3,4,6,7], and TA and task vectors are fundamental concepts for analyzing interference and conflict [1,2,6].

Concern 3: Related References

  1. Linearity claims: Our work focuses on the linear layers rather than linearizing the whole model. Therefore, these linearization works are not highly relevant to ours.
  2. Reconstructing data: Our work uses reconstruction as an intermediate step in our derivation and does not use reconstructed data in our method. Therefore, the data reconstruction attack works mentioned are not highly relevant to ours.
  3. Other works: We appreciate the references you provided and will incorporate their discussion in our revision. The conclusions and methods in these works do not significantly overlap with ours, so we consider them "Related References Not Discussed" rather than "Essential References Not Discussed".

W1: The Claim of Local Optima and Definition of Interference Vector

  1. The claim of local optima: What we want to express is that the task vectors for the same task are at similar positions, which indicates that the models are fine-tuned to convergence, so we consider them optimized to local optima. However, these task vectors may not be the optimal solution in the entire parameter space, so "local optima" is a more rigorous statement. Also, we have not stated that this local optimum is unique, and this has no significant connection with our method.
  2. The definition of the interference vector: We do not claim that any change to the task vector constitutes interference; we define interference as the difference between the outputs of the merged model and the expert model, which takes the form $\delta x$. If $\delta$ is orthogonal to $x$, then $\delta$ does not cause interference. Similar concepts are adopted in methods such as Eq. 7 in AlphaEdit [5] and Eq. 1 in RegMean [4]. Furthermore, the success of TA and other methods doesn't mean "you can get disentangled vectors that do not interfere", since current model merging methods are far from allowing merging without reducing any performance.

Q1: Convex Hull

Not falling in the convex hull is not unique to our method, as other methods also do not constrain the sum of rescaling coefficients to equal 1. The reason our method is effective compared to previous data-free methods is that it implicitly leverages input data information via task vectors.

Q2: The Choice of RoBERTa and LoRA Results

We selected RoBERTa because it is a commonly used benchmark, as in works such as [3]. We further provide results for other widely adopted LoRA benchmarks [1][2][3]. Due to the character limit, please see the results in Tab 1.1 and Tab 1.2 in our rebuttal to reviewer VcLV.

Q3: The Least Similar Problem to Our Work

  1. Least similar problem: Most existing methods focus on reducing interference and conflicts [1,2,3,4,6,7], so it is hard to say which problem is the least similar to ours.
  2. Combining with other methods: Since our objective is a simple optimization problem, we expect different initializations to have only a limited influence. The results in Tab 3.1 demonstrate that using different methods as initialization yields similar results.

Tab 3.1 Results of using different methods as initialization

| Method | Ours | Ties+Ours | Adamerging+Ours |
|---|---|---|---|
| Acc. | 85.2 | 84.9 | 85.0 |

We thank the reviewer again for their valuable feedback and hope our detailed responses address the concerns. We look forward to further discussion.

[1] TIES-Merging: Resolving Interference When Merging Models

[2] Parameter Competition Balancing for Model Merging

[3] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

[4] Dataless Knowledge Fusion by Merging Weights of Language Models

[5] AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

[6] Task Singular Vectors: Reducing Task Interference in Model Merging

[7] DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Official Review
Rating: 4

This paper introduces WUDI-merging, a new data-free model merging method. The authors provide a theory-backed idea that task vectors for a linear layer represent a linear subspace corresponding to its inputs. They use this knowledge to construct a merging method that aims to minimize the interference of the merged model's weights on their corresponding tasks. They also introduce a per-task weighting corresponding to the Frobenius norm of the task vector. When compared to other methods on a variety of benchmarks, WUDI-merging obtains a significantly higher normalized average multi-task score than other data-free merging baselines and even test-time adaptation methods.
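For concreteness, a minimal sketch of the per-layer interference objective described in this summary (the squared-Frobenius reduction, the mean-of-task-vectors initialization, and the optimizer settings are assumptions inferred from the summary and the authors' rebuttal below, not the paper's reference implementation):

```python
import torch

def wudi_merge(task_vectors, steps=300, lr=1e-5):
    """Sketch: optimize a merged task vector tau_m for one linear layer so that the
    interference (tau_m - tau_i) has minimal overlap with each task vector tau_i,
    weighted by 1 / ||tau_i||_F^2. Hyperparameters here are illustrative only."""
    tau_m = torch.stack(task_vectors).mean(dim=0).clone().requires_grad_(True)
    opt = torch.optim.Adam([tau_m], lr=lr)
    weights = [1.0 / tv.norm(p="fro").pow(2) for tv in task_vectors]
    for _ in range(steps):
        # assumed reduction: squared Frobenius norm of (tau_m - tau_i) @ tau_i^T
        loss = sum(w * ((tau_m - tv) @ tv.T).pow(2).sum()
                   for w, tv in zip(weights, task_vectors))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return tau_m.detach()
```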

Questions for Authors

In the "Selection of Linear Subspace" paragraph of section 4.4:

  1. How exactly are a combination of task vectors and random vectors used to optimize the loss? Do you replace some neurons in the task vector with random vectors and run WUDI-merging as normal?
  2. How are the random vectors selected?

It would be good to include this information in at least the appendix.

Claims and Evidence

The claims are supported by evidence. The core observation, namely that task vectors for a linear layer represent a linear subspace corresponding to its inputs, and the derivation of the optimization problem are backed by theory. WUDI-merging is compared to a variety of data-free merging methods and TTA methods on a variety of benchmarks, where it achieves SoTA performance. Sufficient ablation studies are conducted on their method.

One potential caveat, however, is that experiments are missing that explicitly test some of the theoretical results in the paper. This would include seeing how close the inputs to a linear layer are to the subspace corresponding to its task vector (i.e. experimental exploration of the reconstruction error in equation (13)). The theoretical analysis, while sound, involves the use of constants bounding the norm of gradients, so the bound on the error might be large if those constants are large. Since the main purpose of these theoretical results was to motivate the construction of WUDI-merging, this does not significantly detract from the main contribution of this paper.

Methods and Evaluation Criteria

Evaluation of average normalized accuracy on multi-way merges for vision models, discriminative text models, and generative text models makes sense for evaluation of a merging method.

Theoretical Claims

I skimmed the proofs and did not find any issues. However, I did not examine them in detail.

Experimental Design and Analysis

I didn't review anything in detail, but the experimental designs and analyses make sense to me.

Supplementary Material

I did not review the supplementary material.

Relation to Prior Literature

The paper categorizes existing merging methods into data-free, test-time adaptation, and MoE-like merging. The differences and advantages/disadvantages of these methods are explained, and the categorization of WUDI-merging as a data-free method makes sense.

Essential References Not Discussed

None that I am aware of.

Other Strengths and Weaknesses

Strengths:

  • Theoretical results are used to motivate the method.
  • Method is non-trivial and novel.
  • Method is data-free and relatively computationally cheap.
  • Writing is clear.

Weaknesses:

  • Computation times and memory requirements are not provided for baseline merging methods.

Other Comments or Suggestions

Typos:

  • line 175, the "Where" should not be capitalized
  • Table 2, "date-free" should be "data-free"
  • line 994: "that he task vector" should be "that the task vector"
Author Response

We appreciate the reviewers’ valuable feedback and have addressed each point as follows:

W1: Reconstruction Error

For calculating the reconstruction error in Equation (13), we first obtain the input for each layer from a set of samples and then compute the reconstruction coefficients using the least-squares method with the corresponding task vector. The reconstructed vector, $x_{\text{recon}}$, is derived from these coefficients and the task vector. We then calculate the Relative Reconstruction Error (RRE) for a sample from task $i$ as follows:

$$\text{Relative Reconstruction Error (RRE)} = \frac{\|x - x_{\text{recon}}\|}{\|x\|}, \quad \text{where} \quad x_{\text{recon}} = \tau_i (\tau_i^{\top}\tau_i)^{-1}\tau_i^{\top} x .$$

Table 2.1 shows the results for different layers and tasks, which demonstrate that the relative reconstruction errors across different layers are extremely small, further validating our theoretical analysis.

Table 2.1: Reconstruction Error Results on eight tasks of different layers

| Task | SUN397 | Cars | RESISC45 | EuroSAT | SVHN | GTSRB | MNIST | DTD |
|---|---|---|---|---|---|---|---|---|
| Layer 1 | 1.3e-5 | 1.3e-5 | 1.3e-5 | 3.3e-3 | 7.1e-3 | 3.6e-3 | 5.8e-3 | 1.3e-5 |
| Layer 3 | 1.1e-5 | 1.1e-5 | 1.2e-5 | 1.4e-5 | 1.6e-5 | 1.3e-5 | 1.3e-5 | 1.1e-5 |
| Layer 6 | 1.0e-5 | 1.1e-5 | 1.1e-5 | 1.3e-5 | 1.2e-5 | 1.2e-5 | 1.2e-5 | 1.1e-5 |
| Layer 12 | 9.8e-6 | 1.1e-5 | 1.2e-5 | 1.1e-5 | 1.2e-5 | 1.1e-5 | 1.2e-5 | 1.0e-5 |
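A minimal sketch of how such an RRE number could be computed for one input (variable names and the lstsq-based projection are illustrative; it follows the formula above, projecting onto the column space of the task vector):

```python
import torch

def relative_reconstruction_error(x, tau_i):
    """Least-squares projection of an input x onto the column space of the task
    vector tau_i, then ||x - x_recon|| / ||x|| as in the RRE formula above."""
    # solve tau_i @ c ~= x for the reconstruction coefficients c
    c = torch.linalg.lstsq(tau_i, x.unsqueeze(-1)).solution
    x_recon = (tau_i @ c).squeeze(-1)
    return ((x - x_recon).norm() / x.norm()).item()
```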

W2: Resource Consumption Comparison

We report the computational time and GPU memory usage of different methods on the ViT-B-32 tasks. In comparison to the Adamerging method, our approach not only improves performance but also significantly reduces computational cost. The details are summarized in the following table:

Table 2.2: Detailed computational time and GPU memory requirements on ViT-B-32 tasks.

| Method | Accuracy (%) | Time | GPU Memory (GB) |
|---|---|---|---|
| Ties Merging | 72.4 | 4 s | 0 |
| Adamerging | 81.1 | 127 min | 17.1 |
| WUDI-Merging-CFS (CPU) | 84.4 | 5 s | 0 |
| WUDI-Merging-CFS (GPU) | 84.4 | 2 s | 1.8 |
| WUDI-Merging | 85.2 | 1 min 54 s | 4.0 |

Q1 & Q2: How to Use Random Vectors or Subsets of the Task Vector for Optimization

For the subset of task vectors, we randomly sample a subvector from the original task vector as follows:

$$\tau^{\text{sub}}_i = \tau_i[\text{rand\_index}, :]$$

The corresponding loss is computed as:

$$\mathcal{L}_{\text{sub}} = \sum_{i=1}^{n} \frac{1}{\|\tau_i\|_F^2}\, \delta_i (\tau^{\text{sub}}_i)^\top = \sum_{i=1}^{n} \frac{1}{\|\tau_i\|_F^2}\, (\tau_m - \tau_i)(\tau^{\text{sub}}_i)^\top$$

For random vectors, we sample from a Gaussian distribution whose mean and standard deviation are computed from the original task vectors:

$$\tau^{\text{random}}_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \quad \text{where} \quad \mu_i = \text{mean}(\tau_i) \quad \text{and} \quad \sigma_i = \text{std}(\tau_i)$$

The loss in this case is given by:

$$\mathcal{L}_{\text{random}} = \sum_{i=1}^{n} \frac{1}{\|\tau_i\|_F^2}\, \delta_i (\tau^{\text{random}}_i)^\top = \sum_{i=1}^{n} \frac{1}{\|\tau_i\|_F^2}\, (\tau_m - \tau_i)(\tau^{\text{random}}_i)^\top$$
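A minimal sketch of these two ablation losses for one linear layer (assuming the products above are reduced with a squared Frobenius norm, which the equations leave implicit; the row-sampling ratio is an illustrative parameter):

```python
import torch

def ablation_losses(tau_m, task_vectors, keep_ratio=0.5):
    """Sketch of the subset / random-vector ablation losses described above."""
    loss_sub = torch.zeros(())
    loss_rand = torch.zeros(())
    for tau_i in task_vectors:
        w = 1.0 / tau_i.norm(p="fro").pow(2)
        delta = tau_m - tau_i                               # (tau_m - tau_i)
        # subset: randomly sampled rows of the task vector, tau_i[rand_index, :]
        rand_index = torch.randperm(tau_i.shape[0])[: int(keep_ratio * tau_i.shape[0])]
        tau_sub = tau_i[rand_index, :]
        loss_sub = loss_sub + w * (delta @ tau_sub.T).pow(2).sum()
        # random: Gaussian vectors matching the task vector's mean and std
        tau_rand = torch.normal(tau_i.mean().item(), tau_i.std().item(), size=tau_i.shape)
        loss_rand = loss_rand + w * (delta @ tau_rand.T).pow(2).sum()
    return loss_sub, loss_rand
```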

All reported results are averaged over 5 sampling runs. We thank the reviewers for their suggestions and will include these additional details in the revision.


We thank the reviewer again for their constructive feedback and hope that our detailed responses address the concerns. We look forward to further discussion.

Official Review
Rating: 4

This paper proposes WUDI-Merging, a data-free model merging method where the merged model weights are optimized via SGD using the Adam optimizer. The optimization objective leverages the insight that task vectors form an approximate linear subspace of the corresponding input space. Additionally, the authors provide a closed-form alternative (WUDI-Merging-CFS) for scenarios with limited computational resources. Empirical evaluations demonstrate that WUDI-Merging achieves state-of-the-art performance across vision and NLP tasks and is rescaling-free.

update after rebuttal

I would like to thank the authors for the rebuttal. While I think the performance of WUDI is impressive and the new results on Flan-T5 and Qwen LoRAs are reassuring, I believe the authors should make a huge effort to improve the presentation of the paper. Having read the authors' responses to the other reviewers as well, I found some of the responses were not very informative or persuasive. For example, as for the linear claims, I think using the Lipschitz constant to explain them is too superficial. Also, Reviewer 7sbW's questions about how the authors run other methods that require data are valid, but not answered. WUDI would be a great addition to the community if these problems are addressed, and I strongly encourage the authors to handle these requests properly.

Questions for Authors

N/A

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

I have reviewed the theoretical claims in the main paper. Lemma 1 regards input consistency, Proposition 1 approximates task vectors as a linear combination of inputs, and Theorem 1 provides an upper bound on interference. The proofs appear to be correct.

Experimental Design and Analysis

Both vision and language model merging experiments follow the standard designs of previous model merging work. The analyses in Sections 4.2 and 4.4 are sound and valid.

Supplementary Material

I've skimmed the appendix, including proofs, experiment details, and numerical evidence.

Relation to Prior Literature

This work may inspire researchers to further explore SGD-based optimization strategies for data-free model merging, potentially shifting the focus from traditional closed-form solutions or heuristic-based approaches.

Essential References Not Discussed

N/A

Other Strengths and Weaknesses

Strengths

  • The optimization objective is theoretically motivated and can be efficiently optimized using Adam within the current framework (e.g., PyTorch). The corresponding computation cost is discussed.
  • An alternative solution, WUDI-Merging-CFS, is also provided for scenarios with limited GPU resources, along with a sensitivity analysis on the regularization coefficient.
  • WUDI-Merging is rescaling-free, as demonstrated in Figure 4 (b).
  • A comprehensive benchmark evaluation on both vision and language models supports the superiority claims of the proposed method.
  • The experiments are well-designed, and the analyses provide further understanding of WUDI-Merging.

Weaknesses

  • The experiment for generative language models is limited to merging three Llama2 models, which is relatively insufficient. Expanding the evaluation to include merging more models, and more model architectures (e.g., encoder-decoder, decoder-only) would provide a more comprehensive understanding of the general applicability and potential limitations of WUDI.
  • The experimental study of "input consistency" and the model merging performance results are based on fully fine-tuned models. The paper does not explore scenarios where task vectors are derived from PEFT methods (e.g., LoRA). Further analysis of this would provide a clearer understanding of the method's applicability.
  • The potential limitations of proposed method are not well-discussed.

Other Comments or Suggestions

  • The analysis stems from the assumption/literature observation that "the task vector in the linear layer encapsulates most of the capabilities of the expert models. As shown in Figure A.3, an expert model utilizing only the task vector of the linear layer achieves performance comparable to that of the full expert model. Therefore, we primarily focus on the linear layers of the model." I assume this may be model-dependent observation and thus I am particularly interested in the case when using only the task vector of the linear layers is not enough? Will WUDI be less effective in those cases?
  • While WUDI-Merging is described as hyperparameter-efficient, it would be helpful to provide a more detailed discussion on how different learning rates or optimization settings affect the quality of the merged model.
  • Please proofread the manuscript once again, as there are several typos/mistakes throughout, e.g., line 133 "linear linear", line 993 "magnitude still very small", etc.
Author Response

We appreciate the reviewers’ valuable feedback and have addressed each point as follows:


W1: Results on More Models and LoRA

To further demonstrate the generalizability of our method to different models and LoRA, we supplemented the experiments with Flan-T5-base and Qwen-14B. For merging LoRA, we first restore the LoRA update back into the original matrix ($\tau_i = B_i A_i$), then apply WUDI-Merging directly to $\tau_i$ to obtain $\tau_m$, and then merge it into $\theta_{\text{base}}$. The experimental results obtained from merging Flan-T5-base (LoRA fine-tuned) models and Qwen-14B (LoRA fine-tuned) models are shown in the tables below:
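A minimal sketch of this LoRA-handling step for one layer (`wudi_merge` is a placeholder for the merging procedure itself; names are illustrative):

```python
import torch

def merge_lora_experts(theta_base, lora_factors, wudi_merge):
    """Restore each LoRA update to a full delta tau_i = B_i @ A_i, merge the deltas
    data-free, and add the merged delta tau_m onto the pre-trained weight."""
    task_vectors = [B @ A for (B, A) in lora_factors]   # tau_i = B_i A_i
    tau_m = wudi_merge(task_vectors)                     # data-free merging of the deltas
    return theta_base + tau_m                            # theta_merged = theta_base + tau_m
```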

Tab 1.1: Experimental results of merging Flan-T5-base (LoRA fine-tuned) models on all eight tasks.

| Method | CoLA | MNLI | MRPC | QNLI | QQP | RTE | SST2 | STSB | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Individual | 69.1 | 82.7 | 85.5 | 90.9 | 84.0 | 84.4 | 92.9 | 87.4 | 84.6 |
| Ties-Merging | 68.3 | 56.3 | 79.4 | 89.8 | 83.7 | 79.4 | 91.6 | 71.2 | 77.5 |
| AdaMerging++ | 69.1 | 60.3 | 78.4 | 90.0 | 83.6 | 79.1 | 91.6 | 74.1 | 78.3 |
| WUDI-Merging (Ours) | 68.6 | 79.0 | 77.7 | 87.2 | 83.1 | 75.8 | 93.2 | 85.0 | 81.2 (+2.9) |

Tab 1.2: Experimental results of merging Qwen-14B (LoRA fine-tuned) models on all four tasks.

| Method | MMLU | TruthfulQA | BBQ | CNN-DailyMail | Avg. |
|---|---|---|---|---|---|
| Individual | 68.35 | 53.34 | 93.53 | 19.46 | 58.67 |
| Task Arithmetic | 67.56 | 52.33 | 78.38 | 20.54 | 54.70 |
| Ties-Merging (w/ DARE) | 69.38 | 52.03 | 81.06 | 15.91 | 54.62 |
| WUDI-Merging (Ours) | 69.17 | 55.71 | 80.56 | 17.33 | 55.69 (+0.99) |

While there is a slight degradation relative to individual models, our method demonstrates SOTA performance on merging LoRA fine-tuned models.


W2: Potential Limitation

We think that merging heterogeneous models may be a potential limitation, as such models might not adhere to the assumptions underlying WUDI-Merging and current merging methods. Addressing this challenge is a key direction for our future research.


Q1: On Using Only the Task Vector of the Linear Layers

In the context of homogeneous model merging, we think that using only the task vector derived from the linear layers is usually enough. Removing the nonlinear task vector from the fine-tuned model can be viewed as applying a small offset to a limited number of parameters. For a C-Lipschitz continuous model, the change in the output is bounded by:

$$\|f_{\theta+\tau_i}(x) - f_{\theta+\tau_i^{\text{linear}}}(x)\| \le C \cdot \|(\theta+\tau_i) - (\theta+\tau_i^{\text{linear}})\| = C \cdot \|\tau_i^{\text{non-linear}}\|$$

Considering that the parameters of the nonlinear layers account for only a small fraction (Qwen-14B ≈ 0.007%, Llama3.1-8B ≈ 0.003%) and have a small offset, the effect of using only the task vector of the linear layers is small. Therefore, we think that using only the task vector derived from the linear layers is usually enough. In all our other experiments, we found that utilizing only the task vector of the linear layers achieves performance comparable to that of the full expert model, which also confirms this point.

However, if the parameter offsets are large, our method may have limitations, which is also a major limitation of the current field of model merging [1]. Current model merging methods need to limit the deviation of fine-tuned expert models from the base model to a relatively small range. We will explore this limitation in our future work.


Q2: Results on Different Optimization Strategies

We evaluate the effects of applying different optimizers and learning rates in our method, as shown in Tab 1.3 and Tab 1.4. Tab 1.3 shows that Adam (85.2%) and SGD (85.1%) achieve similar performance, which suggests that our method is not sensitive to the choice of optimizer. Tab 1.4 demonstrates that a smaller learning rate better ensures the stability of optimization and improves the final performance.

Tab 1.3: Experimental results of applying different optimizers in our method; the average accuracy on ViT-B-32 is reported.

| Optimizer | Adam | SGD |
|---|---|---|
| Acc. | 85.2 | 85.1 |

Tab 1.4: Experimental results of applying different learning rates in our method; the average accuracy on ViT-B-32 is reported.

| Learning rate | 1e-5 | 1e-4 | 1e-3 | 1e-2 | Ada-Merging |
|---|---|---|---|---|---|
| Acc. | 85.2 | 84.3 | 83.9 | 83.5 | 80.9 |

Q3: Typo

We appreciate the reviewers' attention to detail and will correct the identified typographical errors in the final revision.


We thank the reviewer again for their constructive feedback and hope that our detailed responses address the concerns. We look forward to further discussion.

[1] Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Final Decision

This paper received mixed review feedback, i.e., 2 accept and 1 weak reject ratings.

Before the rebuttal, reviewers suggested the strengths of the paper are: 1) The optimization objective is theoretically motivated and can be efficiently approached using Adam within the current framework; 2) Theoretical results are used to motivate the method; 3) The method is non-trivial and novel; 4) The method is data-free and relatively computationally cheap.

Weaknesses are: 1) experimental results are limited; 2) computation times and memory requirements are not provided for baseline merging methods; 3) as mentioned by reviewer 7sbW, the writing needs to be improved, otherwise it is hard to follow.

After the rebuttal, all the reviewers confirmed they had read the authors' response and would update their reviews if needed. In the AC-reviewer discussion phase, reviewer VcLV mentioned that most of their concerns were fixed, but some parts, like the linear claim, were not addressed. Also, reviewer 7sbW indicated their concerns were not addressed.

Given these, the AC decided to give a weak accept rating, but suggested the authors carefully address reviewers' comments in the final version.