Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
Abstract
Reviews and Discussion
This paper proposes a new scenario of lifelong instruction tuning, where Multimodal Large Language Models (MLLMs) continuously learn from new datasets that include both new and redundant data samples. The authors then propose LAMP, an adaptive data selection approach designed for this context. They claim that the proposed method can select beneficial samples to adapt the current model's knowledge to the continuously changing dataset distribution.
Strengths
- The setting of lifelong instruction tuning introduced in the paper is practical in real-world applications.
- The paper is overall well organized and easy to follow.
- The proposed method achieves convincing results in experiments, and comprehensive ablation studies are provided to validate the effectiveness of individual components.
Weaknesses
- The method does not appear very novel to me, as both gradient-based clustering and ensemble scoring functions for data selection have been explored in previous works.
- In evaluation, the paper focuses on tasks with short answers but omits open-ended visual chat, where multimodal large language models (MLLMs) generate long responses. However, visual chat is essential for assessing the comprehensive capabilities of MLLMs.
- Compared to random pruning, which already serves as a strong baseline, the proposed method (including the LITE or efficient version) incurs significantly higher computational and time costs, and no efficiency analysis on the computational cost is provided.
- Clarification Issues. In Table 2, the authors do not clearly explain what "data size at t" refers to.
Questions
- How is "accuracy" measured on LLaVA-Bench?
- In Table 2, why does multi-task learning perform worse than random pruning, which simply uses fewer data (e.g., accuracy 46.1% vs. 47.2%)? Can pruning more data lead to better results?
- Can you provide numerical comparisons of the computation cost of the proposed method against the random pruning baseline? Additionally, how does the computation cost of data selection compare with the training cost at step $t$?
Dear Reviewer eQ3c,
Thank you for reviewing our work, please see our response to the weaknesses and questions below:
Novelty: We politely emphasize that our paper introduces three novel components in the research field.
- Introducing a practical and realistic continual learning scenario for MLLM instruction tuning: Lifelong Instruction Tuning (LiIT) (Lines 52-53 and 72-75), with clarification of four critical challenges.
- A new multi-way data selection approach for LiIT that can aggregate the benefits of various scoring functions based on the task.
- A new scoring function, i.e., the image grounding score, that can distinguish visually grounded data samples from non-grounded samples.
The main novelty of our work lies in our proposed lifelong instruction tuning scenario, where the model is trained on a multi-task instruction tuning dataset at each time step. This is in stark contrast to the conventional continual learning setup explored in most existing works and is a more practical variation of lifelong instruction tuning based on the frequent data releases seen in the research community today. This scenario brings unique challenges that have not been tackled in previous continual learning works, such as overlapping tasks, redundant data, and rare tasks. Our experiments range from frequently used baselines such as multi-task learning, random selection, and scoring-based selection to more sophisticated methods based on high-dimensional representations, yielding insightful results on the nature of lifelong learning in LLMs. Our proposed method is built on systematic insights from these experiments; further, the two-step process of task-based selection and semantic deduplication has not been explored in previous works. Most importantly, we show significant improvements with our method and establish a strong baseline for the scenario of lifelong instruction tuning.
At the same time, we respectfully note that other reviewers (9s1q, bvXZ, VSGx) agree with the novelty and significance of our proposed scenario. For instance, Reviewer 9s1q stated, “...the paper reflects the authors' deep understanding of the field and direction, providing readers with extensive knowledge and unique insights, ..., opening up new avenues for future research.” Additionally, these reviewers recognized that the proposed method appropriately and logically addresses well-structured challenges in lifelong multimodal instruction tuning scenarios, emphasizing the significance and uniqueness of the proposed approach compared to prior methods.
We hope that Reviewer eQ3c recognizes the significance and practicality (often considered another form of ‘novelty’ from a scientific perspective) of our work in formulating and addressing lifelong multimodal instruction tuning for the real world.
Evaluation on Visual Chat: Thank you for the great suggestion; we have included examples of visual chat for our models in the paper (Figures 6, 7, and 8). Since there is no formal evaluation benchmark or quantitative metric for multi-turn dialogue with MLLMs, we used a single representative example for qualitative analysis of visual chat results in the updated draft.
Figure 6 shows the multi-chat skills of the LLaVA model at $t$=0. In Figure 7, we note that the models trained using sequential learning (left) and random pruning (right) have poor retention of the multi-chat skill and reply with few-word and/or inaccurate answers. However, our method (results in Figure 8) retains this skill because it pools the multi-chat training data samples into a distinct cluster during the pseudo-task clustering step and selects sufficient samples from this cluster for training at each time step.
Efficiency Analysis: Thank you for the great suggestion; we have added this to the paper. We present a comparison of the time taken by various data selection methods in our experimental setting (training with 25k samples at each time step) in Table 6 in the Appendix (see revised pdf). Results are reported for 8 A100 GPUs. The total time taken to train the model without any methodical data selection (i.e., random pruning) is approximately 21 hours. Scoring-based selection methods generally require a forward pass of the model to extract the score (e.g., EL2N, entropy), which takes nearly 48 hours on 8 A100 GPUs for all datasets in our experiments. Methods that use feature embeddings (e.g., COINCIDE) take a similar amount of time since they use a forward pass to extract hidden layer outputs from the model as data representations. Our proposed method requires a longer time, i.e., 92 hours, to perform a backward pass over the model and extract gradients, similar to LESS, which is a state-of-the-art method for data selection for targeted instruction tuning. However, with Lite-Lamp, we are able to significantly reduce this time due to systematic compression of the dataset at each time step.
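For concreteness, a minimal sketch of the kind of per-sample gradient featurization described above; the toy model, loss, and projection dimension below are placeholders standing in for the actual MLLM and setup:

```python
import torch
import torch.nn as nn

# Toy stand-in for the tuned model; in the actual setup this would be the MLLM
# (or its adapters) and each sample would be a tokenized image-text pair.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
n_params = sum(p.numel() for p in model.parameters())
proj = torch.randn(n_params, 1024) / 1024 ** 0.5   # random projection matrix

def gradient_feature(x, y):
    # One backward pass per sample: flatten the gradient and project it down.
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    grad = torch.cat([p.grad.flatten() for p in model.parameters()])
    return grad @ proj

feats = torch.stack([gradient_feature(torch.randn(32), torch.tensor(3)) for _ in range(16)])
print(feats.shape)  # torch.Size([16, 1024]) -> inputs to pseudo-task clustering
```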
Computational Complexity of Data Selection: Random pruning is a rather effective method when it comes to data selection, as reported in many prominent works in the data selection literature. For instance, [1] reports that selecting 5% of instruction tuning data via random pruning results in performance that is very close to that with 100% of the data; their proposed method, i.e., gradient-based similarity, results in a few points of improvement over random. [2] show that it is hard to beat random pruning in very high pruning settings for the ImageNet dataset. When the availability of computational resources is low, random pruning is indeed a promising tradeoff of gain vs. compute. Nevertheless, data selection methods, including LAMP, are worth investigating for several reasons:
- Selecting representative and important data samples can accelerate training [3, 4]. Besides, it is important from the perspective of scientific research to understand why one data subset works better than another, selected randomly or otherwise. Developing data selection methods that are guided by concrete and intuitive hypotheses is the best way to understand this science.
- Data selection methods either require importance scores or high-dimensional representations of data samples in order to represent the data landscape correctly, and incur computational costs in this process. Similarly, LAMP uses high-dimensional representations for selecting data correctly and, as we show in Table 2, exceeds the performance of random pruning by >5%, yielding important insights about data selection for lifelong learning. More importantly, LAMP promotes the forward transfer of skills during lifelong learning (109% relative gain), whereas random selection falls short (95.3% relative gain). This suggests that random selection is sub-optimal in practical settings where tasks can be unbalanced, redundant, or rare. Such scenarios need more sophisticated techniques for effective learning.
We adopt several techniques that significantly reduce the computational complexity of our method such as
(i) random projections for reducing the memory footprint of high-dimensional vectors,
(ii) MiniBatch k-Means for clustering of large datasets, and
(iii) systematic compression of datasets at each time step using deduplication (as outlined in Lite-Lamp, Section 4.4).
LAMP can be further optimized using libraries like faiss that quantize the high-dimensional space for fast nearest-neighbor computations. Moreover, methods like LAMP will continue to reap benefits from faster inference algorithms. Thus, we think that LAMP is a computationally efficient approach to practical data selection for lifelong learning.
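As an illustration of points (i) and (ii) above, a simplified sketch using scikit-learn; the feature dimensions, projection size, and cluster count are placeholders rather than the exact experimental values:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection
from sklearn.cluster import MiniBatchKMeans

# Illustrative stand-in for high-dimensional per-sample (gradient) vectors.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((2_000, 16_384)).astype(np.float32)

# (i) Random projection reduces the memory footprint of the vectors.
projected = SparseRandomProjection(n_components=1024, random_state=0).fit_transform(vectors)

# (ii) MiniBatch k-Means scales pseudo-task clustering to large data pools.
kmeans = MiniBatchKMeans(n_clusters=20, batch_size=1024, random_state=0)
pseudo_task_labels = kmeans.fit_predict(projected)
print(np.bincount(pseudo_task_labels))  # samples per pseudo-task cluster
```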
Multi-task vs. Random: Multi-task learning performs worse than random pruning because the tasks are severely unbalanced across the datasets. For instance, there are nearly 600K samples for captioning and VQA, whereas there are only 20k samples for multilingual and referring expression comprehension. At larger data scales, the unbalanced task subsets lead to drops in performance for the rare tasks; however, in low-data regimes (as demonstrated by random pruning), forgetting is less egregious even when the datasets are unbalanced. Especially for instruction tuning, using less but higher-quality data does lead to better results, as also previously shown in [1, 5]. The scoring function-guided selection used in LAMP ensures that many of the high-quality data samples are chosen consistently over time, and results in further improvements beyond random pruning.
[1] Xia, Mengzhou, et al. "Less: Selecting influential data for targeted instruction tuning." arXiv preprint arXiv:2402.04333 (2024).
[2] Zheng, Haizhong, et al. "Coverage-centric coreset selection for high pruning rates." arXiv preprint arXiv:2210.15809 (2022).
[3] Goyal, Sachin, et al. "Scaling Laws for Data Filtering--Data Curation cannot be Compute Agnostic." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[4] Evans, Talfan, et al. "Data curation via joint example selection further accelerates multimodal learning." arXiv preprint arXiv:2406.17711 (2024).
[5] Zhou, Chunting, et al. "Lima: Less is more for alignment." Advances in Neural Information Processing Systems 36 (2024).
Clarifications:
- Data size at $t$: This refers to the size of the selected dataset for training at each time step $t$.
- Accuracy of LLaVA-Bench: This is computed using accuracy from an LLM as judge (GPT-4 in our experiments), as recommended in the original paper.
Thank you for the clarifications and the additional experiments, which address most of my concerns. Besides, the additional visual chat example effectively demonstrates the proposed method's applicability and effectiveness in real-world MLLM scenarios. I do agree that the proposed multimodal lifelong instruction tuning framework is practical and appreciate the improvements achieved by the authors. Therefore, I'd be willing to raise the score to lean towards accept.
Dear Reviewer eQ3c,
Thank you for taking the time to review and discuss our responses and revisions in detail. We are grateful for your thoughtful feedback and discussion.
We truly appreciate your support and the updated score!
Best,
Authors
This paper introduces Lifelong Multimodal Instruction Tuning, which differs from continual Multimodal Instruction Tuning in that the previous datasets are still included in the dataset pool. To effectively select the data at each timestep, the authors propose a framework that adaptively selects data samples based on their importance and data balance. The experiments show that the method is more effective than baselines.
Strengths
- The considered lifelong setting is more practical than the traditional continual learning setting for multimodal LLMs, since currently the most important aspect of large models is to use all data available better.
- The proposed framework considers the key difficulties of data sampling in adapting multimodal LLMs with progressively growing datasets and reasonably tackles the problem with carefully designed pipelines. I find sections 3 and 4 meaningful with rigorous thoughts and experiment analysis.
- The experiments are thorough and clearly show the effectiveness of the proposed framework over the considered baselines.
- The paper is written very well.
Weaknesses
- The proposed method, except for the image grounding score, seems not to be specifically designed for multimodal LLMs and can be potentially applied to broader domains like pure LLMs. However, the paper only considers multimodal LLMs which lowers the impact and importance of the paper.
Questions
The t-SNE visualization has been shown to have high stochasticity across seeds. Have you manually tuned the seed for your method and the other baselines throughout the paper?
Dear Reviewer bvXZ,
Thank you for reviewing our work and appreciating the rigorous thoughts and experimental analysis that went into the paper. Please see our response to the weaknesses below:
LAMP for LLMs: Thank you for the suggestion; we agree that the data selection method designed in our paper can be extended to LLMs as well, which will be exciting future work. During the rebuttal period, we have decided to add experiments using LAMP on language-only LLMs. One thing to note is that LLMs generally undergo multiple post-training phases after instruction tuning, such as alignment and other fine-tuning stages. On the other hand, our method and the proposed lifelong instruction tuning scenario focus on the instruction tuning phase, and it might be important to understand how data selection interacts with these subsequent post-training phases, which is a highly practical and exciting future research direction in lifelong learning of LLMs with data selection.
Our experimental setup is as follows:
Model: LLaMA3-8B
Datasets in the order of training:
- Natural Instructions [~1600 NLP tasks]
- Alpaca CoT [geared towards Chain-of-Thought ability]
- xP3 [Multilingual]
- UltraChat [Multi-turn dialogues]
- Firefly [Chinese only]
Since training and evaluation takes substantial time, we plan to report our results on this experimental setup by this Friday.
t-SNE Seed: We have used the t-SNE visualizations to only understand the spatial distribution of feature vectors from different sources. We repeated the t-SNE visualization experiment across 10 seeds and did not find a significant difference in the quality of clusters across these seeds for any of the feature vector sources; the results reported in our paper are stable across seeds. The figures reported in the paper are generated from the same seed for different datasets for fair comparison.
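For reference, a minimal sketch of this kind of seed-stability check on synthetic features; the silhouette score is just one convenient proxy for cluster quality, not necessarily the analysis used in the paper:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Synthetic feature vectors from two "sources" standing in for different datasets.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (300, 64)), rng.normal(4, 1, (300, 64))])
labels = np.array([0] * 300 + [1] * 300)

# Re-embed under several seeds; layouts rotate/flip between runs, but a real
# cluster structure should keep the sources separated in every run.
for seed in range(10):
    emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=seed).fit_transform(feats)
    print(seed, round(silhouette_score(emb, labels), 3))  # rough proxy for cluster quality
```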
We have results from our runs with LAMP and LLaMA3-8B with the datasets enumerated in our earlier response i.e., Natural Instructions, Alpaca CoT, xP3, UltraChat, and Firefly. We use the set of evaluation benchmarks used in [1] to evaluate our models (MMLU, GSM, BBH, TydiQA, CodexEval, AlpacaEval) and compute the following metrics based on performance across these benchmarks: Average accuracy, Relative gain, and Forgetting rate (see Sec. 5.1 in our paper). The results at each timestep are as follows (numbers indicate average accuracy):
| Methods | NI (t=1) | Alpaca (t=2) | xP3 (t=3) | UltraChat(t=4) | Firefly(t=5) |
|---|---|---|---|---|---|
| Sequential | 35.1 | 43.9 | 32.1 | 41.8 | 36.8 |
| Random | 35.1 | 42.7 | 40.1 | 42.9 | 41.8 |
| Ours | 35.1 | 42.6 | 43.6 | 43.1 |
Based on these results, the relative gain and forgetting rate are as follows:
| Methods | Relative Gain | Forgetting rate | Avg. Accuracy |
|---|---|---|---|
| Sequential | - | 18.1 | 36.8 |
| Random | 91.6 | 5.6 | 41.8 |
| Ours | 97.8 | 2.3 | 43.1 |
The model is trained on 25k samples at each time step, similar to the MLLM experiments in our paper. We implemented the Lite-Lamp version of our method for these experiments. We see up to 18% forgetting rate in the sequential setting, which comes down to 5.6% and 2.3% with random pruning and LAMP pruning. The average accuracy is highest with our method at the final time step. These results suggest that:
(a) lifelong learning from multi-task instruction tuning datasets is a significant problem in the language-only domain as well and
(b) our proposed method can significantly alleviate forgetting of skills over time, as well as preserve the maximum accuracy that can be achieved from each dataset.
[1] How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Dear Reviewer bvXZ,
Thank you for your effort in reviewing our paper. We kindly notify you that the end of the discussion stage is approaching. Could you please read our responses to check if your concerns are clearly addressed? During the rebuttal period, we made every effort to address your concerns faithfully:
- We have additionally demonstrated strong generalizability of the proposed LAMP in the language domain.
- We have addressed the reviewer's question regarding the relationship between t-SNE stochasticity and model performance.
Thank you for your time and effort in reviewing our paper and for your constructive feedback, which has significantly contributed to improving our work. We hope the added clarifications and the revised submission address your concerns and kindly request to further reconsider the rating/scoring. We are happy to provide further details or results if needed.
Warm Regards,
Authors
Thanks for the detailed reply. The authors have adequately addressed my concerns. I thus increase the score. I suggest authors add the additional LLM experiments to the appendix upon publication.
Thanks for your continued engagement and for increasing your score. We are glad we adequately addressed your concerns, and we will definitely add the additional LLM experiments to the appendix upon publication; we believe this result further strengthens the generalizability and effectiveness of our proposed method.
This paper introduces a novel approach, LAMP, for optimizing Lifelong Instruction Tuning (LiIT) of Multimodal Large Language Models (MLLMs). Traditional visual instruction datasets, which are frequently redundant and task-specific, hinder MLLMs' ability to continuously adapt and learn new skills effectively. LAMP addresses this by dynamically selecting and pruning data to maximize training efficiency while minimizing computational demands. It groups data into pseudo-skill clusters, applying adaptive selection to prioritize relevant samples for each skill and using a cluster-wise pruning method to reduce redundancies. This approach mitigates catastrophic forgetting, enhances knowledge transfer, and improves performance on rare tasks, all while leveraging only a fraction of the original dataset. LAMP's adaptive tuning demonstrates significant benefits for sustaining long-term model growth in evolving multimodal learning environments.
Strengths
- The proposed method mitigates catastrophic forgetting and supports forward transfer, enabling the model to retain skills and adapt to new tasks seamlessly.
- LAMP’s cluster-wise pruning manages dataset growth by removing redundancies, keeping training scalable and resource-efficient.
- This paper is well-written and its organization is logical.
Weaknesses
- A primary concern regarding the proposed sequential data structure and the lifelong-based instruction tuning strategy is their practical applicability, as current MLLMs typically rely on integrating diverse tasks together for effective multitask training rather than sequential learning.
- The paper mentions (lines 301-302) that "our proposed multi-way approach can be seamlessly extended with new scoring functions based on users’ needs"; however, the proposed method only employs four scoring functions in practice and lacks ablation experiments for additional scoring functions. I suggest the authors conduct related experiments to demonstrate the proposed method's flexibility and scalability.
- In Tab.2, the effectiveness of the random pruning strategy is notably highlighted. The computational costs and complexities associated with other sophisticated pruning techniques do not appear to correspond proportionately to the performance improvements achieved. This raises questions regarding the efficiency of the proposed pruning methods in comparison to simpler alternatives.
- In Tab.2, although LITE-LAMP utilizes 4 times less training data (25k vs. 100k), its relative improvement dropped significantly when compared to LAMP, i.e., 99.7 vs. 109.7.
Questions
Please focus on Weaknesses, and I encourage the authors to provide further discussion and clarification on these points.
Dear Reviewer VSGx,
Thank you for reviewing our paper and recognizing the merits of our work. Please see our response to the weaknesses below:
Sequential Learning: We wholeheartedly agree that sequential learning is not a practical approach since most LLMs and MLLMs rely on multi-task instruction tuning from a massive dataset. Our work is motivated by this very fact; hence, in contrast to most continual learning works that train the model on a single task at each time step, we train our model on multi-task datasets at each time step. We consider the scenario where multiple multi-task instruction tuning datasets are available for training, with significant overlap between tasks as well as some rare tasks that appear in only one or a few datasets. This scenario is more practical because we see frequent releases of new multi-task instruction tuning datasets in the research community. Our experimental setup and proposed method enable the learning of new tasks and the reinforcement and forward transfer of old tasks without having to retrain the model from scratch. We have emphasized and expanded these points in the revised draft (see the blue text in the Introduction).
Computational Complexity of Data Selection: Random pruning is a rather effective method when it comes to data selection, as reported in many prominent works in the data selection literature. For instance, [4] reports that selecting 5% of instruction tuning data via random pruning results in performance that is very close to that with 100% of the data; their proposed method, i.e., gradient-based similarity, results in a few points of improvement over random. [5] show that it is hard to beat random pruning in very high pruning settings for the ImageNet dataset. When the availability of computational resources is low, random pruning is indeed a promising tradeoff of gain vs. compute. Nevertheless, data selection methods, including LAMP, are worth investigating for several reasons:
- Selecting representative and important data samples can accelerate training [6, 7]. Besides, it is important from the perspective of scientific research to understand why one data subset works better than another, selected randomly or otherwise. Developing data selection methods that are guided by concrete and intuitive hypotheses is the best way to understand this science.
- Data selection methods either require importance scores or high-dimensional representations of data samples in order to represent the data landscape correctly, and incur computational costs in this process. Similarly, LAMP uses high-dimensional representations for selecting data correctly and, as we show in Table 2, exceeds the performance of random pruning by >5%, yielding important insights about data selection for lifelong learning. More importantly, LAMP promotes the forward transfer of skills during lifelong learning (109% relative gain), whereas random selection falls short (95.3% relative gain). This suggests that random selection is sub-optimal in practical settings where tasks can be unbalanced, redundant, or rare. Such scenarios need more sophisticated techniques for effective learning.
- We adopt several techniques that significantly reduce the computational complexity of our method, such as
(i) random projections for reducing memory footprint of high-dimensional vectors,
(ii) MiniBatch k-Means for clustering of large datasets, and
(iii) systematic compression of datasets at each time step using deduplication (as outlined in Lite-Lamp, Section 4.4).
LAMP can be further optimized using libraries like faiss that quantize the high-dimensional space for fast nearest-neighbor computations. Moreover, methods like LAMP will continue to reap benefits from faster inference algorithms. Thus, we think that LAMP is a computationally efficient approach to practical data selection for lifelong learning.
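As an illustration of the faiss-based option mentioned above, a minimal sketch of cosine-similarity nearest-neighbor search for near-duplicate detection; the dimensions, index type, and threshold are illustrative, not the exact deduplication code:

```python
import numpy as np
import faiss

# Illustrative projected features for one pseudo-task cluster.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10_000, 512)).astype(np.float32)
faiss.normalize_L2(feats)            # unit-norm vectors: inner product == cosine similarity

index = faiss.IndexFlatIP(512)       # exact search; IVF/PQ indexes trade exactness for speed
index.add(feats)
sims, nbrs = index.search(feats, 2)  # k=2: first hit is the query itself, second its nearest neighbor

# Flag near-duplicates whose closest other sample is above a similarity threshold.
near_duplicates = np.where(sims[:, 1] > 0.95)[0]
print(len(near_duplicates), "candidate near-duplicates")
```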
LAMP vs. Lite-LAMP: The relative performance drops due to the reduction in budget size. A larger data budget enables more diversity in the data selected for training the model in the next step, whereas a smaller data budget leads to less diversity and smaller improvements. Notably, in spite of the significantly lower data budget in Lite-LAMP, our method mitigates catastrophic forgetting at all time steps and preserves ~100% relative gain.
Additional Scoring Functions: Thank you for the suggestion regarding the ablations of scoring functions used in our experiments. The scoring functions we have chosen, i.e., Image Grounding score, EL2N, Entropy, and Perplexity, are based on promising preliminary results on the individual score-selection methods in our experiments (see corresponding rows in Table 2) as well as existing literature [1, 2]. These score functions have been shown to perform well for tasks such as image classification (EL2N, entropy) and language modeling (perplexity). To conduct ablations on our multi-way setup, we ran the following experiments:
(a) Additional scoring functions: We added GraND [2] and AUM [3] to the set of score functions (i.e., a total of 6 functions) for LAMP.
(b) LAMP without Image-Grounding score: We are running an experiment using only EL2N, Entropy, and Perplexity as the set of scoring functions.
From (a), we did not see any improvements over our best results with LAMP using four scoring functions, because the GraND and AUM functions did not score high in entropy for any of the pseudo-task clusters. We will report the results from (b) as soon as they are available. From these partial results, we can conclude that a score function is useful only when it is highly discriminative for a task cluster. We will add these additional results and discussion to the camera-ready version of the paper.
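For illustration, a minimal sketch of the entropy criterion behind this conclusion (keep the scoring function whose scores are most spread out within a cluster); the normalization and binning choices here are assumptions for the sketch, not the exact implementation:

```python
import numpy as np
from scipy.stats import entropy

def normalized(scores):
    # Normalize each scoring function over the whole pool so entropies are comparable.
    return (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)

def cluster_entropy(norm_scores, cluster_idx, bins=20):
    # Entropy of the histogram of (normalized) scores within one pseudo-task cluster.
    hist, _ = np.histogram(norm_scores[cluster_idx], bins=bins, range=(0.0, 1.0))
    return float(entropy(hist + 1e-8))

def pick_scoring_function(pool_scores, cluster_idx):
    # Keep the function that is most discriminative (highest entropy) for this cluster.
    return max(pool_scores, key=lambda n: cluster_entropy(normalized(pool_scores[n]), cluster_idx))

rng = np.random.default_rng(0)
pool_scores = {                                    # toy scores for a pool of 10k samples
    "el2n": rng.uniform(0.0, 1.0, 10_000),         # varies within the cluster -> discriminative
    "perplexity": rng.uniform(0.0, 50.0, 10_000),
}
cluster_idx = np.arange(2_000)                     # indices of one pseudo-task cluster
pool_scores["perplexity"][cluster_idx] = 12.0      # nearly constant inside the cluster -> uninformative
print(pick_scoring_function(pool_scores, cluster_idx))  # -> "el2n"
```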
[1] Marion, Max, et al. "When less is more: Investigating data pruning for pretraining llms at scale." arXiv preprint arXiv:2309.04564 (2023).
[2] Paul, Mansheej, Surya Ganguli, and Gintare Karolina Dziugaite. "Deep learning on a data diet: Finding important examples early in training." Advances in neural information processing systems 34 (2021): 20596-20607.
[3] Pleiss, Geoff, et al. "Identifying mislabeled data using the area under the margin ranking." Advances in Neural Information Processing Systems 33 (2020): 17044-17056.
[4] Xia, Mengzhou, et al. "Less: Selecting influential data for targeted instruction tuning." arXiv preprint arXiv:2402.04333 (2024).
[5] Zheng, Haizhong, et al. "Coverage-centric coreset selection for high pruning rates." arXiv preprint arXiv:2210.15809 (2022).
[6] Goyal, Sachin, et al. "Scaling Laws for Data Filtering--Data Curation cannot be Compute Agnostic." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[7] Evans, Talfan, et al. "Data curation via joint example selection further accelerates multimodal learning." arXiv preprint arXiv:2406.17711 (2024).
Dear Reviewer VSGx,
We have results for experiment (b) for Additional scoring functions mentioned in our previous response. The results indicate that the multi-way pruning module in LAMP is effective without our Image-Grounding score too.
| Methods | Relative Gain | Forgetting rate | Avg. Accuracy |
|---|---|---|---|
| IG Score only | 92.3 | 5.6 | 45.6 |
| (new) LAMP (without IG Score) | 106.9 | 0.3 | 51.2 |
| LAMP (all 4 scoring functions) | 109.7 | 0.4 | 52.5 |
Evidence shows that the multi-way pruning method is very effective when multiple good scoring functions are candidates. This also suggests that the performance with our method can improve with better scoring functions, making LAMP a versatile method. We will add these ablations to the camera-ready version of the paper.
In summary, we have discussed the practicality of our lifelong instruction tuning setting and demonstrated the efficiency and generalizability of our proposed LAMP method (please see additional results in the updated pdf). Thank you for your time and effort in reviewing our paper and for your constructive feedback, which has significantly contributed to improving our work. We hope the added clarifications and the revised submission address your concerns and kindly request to further reconsider the rating/scoring. We are happy to provide further details or results if needed.
Warm Regards,
Authors
Dear Reviewer VSGx,
We sincerely appreciate your efforts in reviewing our paper and your constructive comments. Since there are less than two days left in the discussion period, could you please read our responses to check if your concerns are clearly addressed? We believe that our responses resolved all your concerns.
We understand that the criteria for rating a paper can sometimes be subjective; however, if you agree that our work does not have remaining major concerns, we would like to kindly suggest your re-evaluation of the initial rating of this submission.
Please let us know if you have any remaining questions, and we will be more than happy to address them.
Best,
Authors
This is an interesting article that addresses the issue of forgetting in multimodal large models within multi-task and continual learning scenarios from the perspective of instruction dataset selection and proposes effective strategies. Traditional training methods often lead to the model forgetting previously learned skills when new tasks are introduced. However, the article introduces the LAMP (Lifelong and Adaptive Multi-way Pruning) method, which uses gradient vectors for pseudo-task clustering and combines an entropy-maximization strategy for sample selection, achieving effective sample filtering and diversity balance. LAMP employs semantic redundancy detection and coverage-based sampling strategies to prune the data pool, control its size, and prevent unrestrained data pool expansion, while ensuring the representativeness and diversity of tasks. Moreover, training budgets are allocated reasonably across task clusters, allowing the model to be sufficiently trained on each task, thereby reducing forgetting and enhancing generalization capabilities. Overall, LAMP performs well in multi-task and continual learning environments, ensuring that the model retains previously learned skills when training on new tasks, even with limited computational resources, thus achieving efficient and balanced continual learning.
Strengths
This article starts from the current situation where the instruction-tuning datasets used in fine-tuning multimodal large models are continuously increasing and points out the redundancy problem among these datasets. It then proposes a solution to address the issue of continual learning, demonstrating significant practical relevance. In the design of the proposed solution, the authors first analyze the problems and limitations of existing methods, then detail how the new solution effectively addresses these issues, showcasing clear research objectives and a logical structure. The narrative of the paper reflects the authors' deep understanding of the field and direction, providing readers with extensive knowledge and unique insights. Moreover, the article verifies the correctness and validity of the proposed solution through extensive experiments, opening up new avenues for future research and demonstrating considerable impact.
Weaknesses
Although the paper is good as a whole, there still exist some minor drawbacks in this work.
- In Figure 2, the meanings of the axes in subfigure C are unclear, and the explanatory content for subfigure A is insufficient, failing to adequately explain the issue that the figure is intended to illustrate.
- The text descriptions in all the images are not very clear and appear somewhat distorted.
- In the paper, clustering operations on datasets based on gradients and subsequent pruning of redundant data use k-means and pair-wise cosine similarity, respectively. This computational approach likely incurs significant time and space costs in such large-scale instruction datasets. The question arises as to whether there are corresponding measures to address these challenges.
- During the data selection process, a pool of function tools is used, and the function that produces higher average entropy is ultimately chosen. In this process, combining multiple functions can be considered to achieve more robust data selection.
Questions
Please refer to the Weaknesses section.
Dear Reviewer 9s1q,
Thank you for recognizing the merits of our work. Please see our response to the weaknesses below:
Figures: Thank you for the suggestions for making the figures more understandable. We have updated the explanations for subfigures A and C in Figure 2 and improved the text font in all of our figures for clarity in the updated version of our paper (see revision).
Computational costs of clustering in LAMP: We note that our clustering approach incurs only marginal additional computational cost and does not require significant resources. Naive k-means clustering has a computational complexity of $O(i \cdot k \cdot n \cdot d)$, where $i$ = number of iterations of the algorithm, $k$ = number of clusters, $n$ = number of samples, and $d$ = dimension of the vectors, implying that the computation cost rises linearly with each of these factors.
- The value of $i$ is constant in our experiments.
- The variables $k$ and $d$ are small in our experiments (see Lines 354-358). The optimal values of $k$ are between 5 and 50, and the value of $d$ is capped at 8192 by virtue of using random projections to compress high-dimensional vectors (Lines 877-881 in the Appendix).
- The value of $n$ can increase with the size of the datasets; however, there exist many efficient methods for calculating pairwise cosine similarities effectively. We use the efficient implementation of MiniBatch-KMeans in scikit-learn to compute clusters for our datasets, which significantly reduces the compute time.
- The maximum time taken for a single k-means clustering run in our LAMP experiments using this implementation is 48 minutes on a 48-core CPU for approximately 1.8 million samples. In addition, the faiss library can also be used for efficient computation of pairwise cosine similarities using quantization.
- Further, our proposed approach Lite-Lamp seeks to reduce the number of samples at each step, effectively reducing the time taken for k-means clustering. The maximum time taken for a single k-means clustering run in the Lite-LAMP experiments is 15 minutes on the 48-core CPU for approximately 660K samples, which makes it as efficient as other data selection methods like LESS [1] and COINCIDE [2].
Since there are many ways to control each of the variables that contribute to the complexity of k-means clustering ($i$, $k$, $n$, and $d$), we think that LAMP is not a prohibitively computationally expensive method.
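As a quick sanity check on the linear dependence on $n$, simple arithmetic over the numbers reported above:

```python
# Back-of-the-envelope check: with i, k, d fixed, clustering time should scale ~linearly with n.
full_pool, lite_pool = 1_800_000, 660_000     # samples clustered in LAMP vs. Lite-LAMP
full_time_min = 48                            # measured clustering time for the full pool
predicted_lite = full_time_min * lite_pool / full_pool
print(f"predicted ~{predicted_lite:.0f} min vs. 15 min measured for Lite-LAMP")
```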
[1] Xia, Mengzhou, et al. "Less: Selecting influential data for targeted instruction tuning." arXiv preprint arXiv:2402.04333 (2024).
[2] Lee, Jaewoo, Boyang Li, and Sung Ju Hwang. "Concept-skill Transferability-based Data Selection for Large Vision-Language Models." arXiv preprint arXiv:2406.10995 (2024).
Combinatorial Prediction of Function Tools for Sample Selection: Each function tool assigns a different value to every sample in our dataset. Since each functional tool has its unique range and distribution, we normalize these values to enable comparison of entropies across these tools. We appreciate the suggestion for combining these distributions to enable further gains for our method. We believe this idea presents an exciting direction for future work, potentially improving the robustness and efficiency of data selection in future implementations, but also introduces additional challenges that need to be addressed:
- determining the top-$k$ or a dynamic number of beneficial function tools for each data pair, and
- controlling the relative influence of predictions.
The second point would be crucial as different tools may have varying degrees of relevancy and reliability for different samples, and an imbalance could affect the robustness of the selection process.
In summary, the authors appreciate the reviewer for raising this constructive discussion and believe that exploring these challenges, as well as developing methods to effectively combine the outputs of multiple functional tools, will be a meaningful direction for future work.
Dear Reviewer 9s1q,
Thank you for your effort in reviewing our paper. We kindly notify you that the end of the discussion stage is approaching. Could you please read our responses to check if your concerns are clearly addressed? During the rebuttal period, we made every effort to address your concerns faithfully:
- We have updated the explanations for subfigures A and C in Figure 2 and improved the text font in all of our figures
- We provide the details regarding the computational costs of clustering in LAMP. We would like to note that the proposed approach is as efficient as other data selection methods like LESS [1] and COINCIDE [2].
- We further provide a discussion regarding the Combinatorial Prediction of Function Tools for Sample Selection, which should be a promising research direction that entails two distinct and meaningful research challenges.
Thank you for your time and effort in reviewing our paper and for your constructive feedback, which has significantly contributed to improving our work. We hope the added clarifications and the revised submission address your concerns and kindly request to further reconsider the rating/scoring. We are happy to provide further details or results if needed.
Warm Regards,
Authors
Dear Reviewer 9s1q,
We sincerely appreciate your efforts in reviewing our paper and your constructive comments. Since there are less than two days left in the discussion period, could you please read our responses to check if your concerns are clearly addressed? We believe that our responses resolved all your concerns.
We understand that the criteria for rating a paper can sometimes be subjective; however, if you agree that our work does not have remaining major concerns, we would like to kindly suggest your re-evaluation of the initial rating of this submission.
Please let us know if you have any remaining questions, and we will be more than happy to address them.
Best,
Authors
Hope that InstructionGPT-4 (https://arxiv.org/abs/2308.12067), which also focuses on multimodal data selection, can be discussed or included in a part of your paper :)
We thank the reviewers for their time and valuable comments. We appreciate that the reviewers have recognized:
- Authors' deep understanding of the field and direction, providing readers with extensive knowledge and unique insights. [9s1q]
- The proposed Lifelong instruction learning setting is more practical in real-world applications. [bvXZ, eQ3c]
- Clear research objectives and a logical structure. [9s1q, VSGx]
- The proposed method appropriately addresses the challenges in the scenario. [bvXZ, VSGx]
- Extensive experiments and valuable analyses. [9s1q, eQ3c, bvXZ]
- The paper is very well written, well organized, and easy to follow. [bvXZ, eQ3c, VSGx]
During the rebuttal period, we made every effort to address all the reviewers' concerns faithfully. We hope that our responses have addressed your comments. We thank you again for reviewing our work.
This work investigates Lifelong Instruction Tuning (LiIT) via dynamic data selection, which can learn new skills while alleviating forgetting. Before the rebuttal, the reviews were mixed, and the concerns were about novelty, generalization to LLMs, efficiency, etc. The discussion addressed most of the concerns. After the rebuttal, all reviewers gave positive ratings due to the practical setting and promising performance. Please incorporate the suggested experiments and comments from reviewers in the revised submission.
Additional Comments on Reviewer Discussion
Before rebuttal, the overall rating was mixed. After discussion, Reviewer bvXZ increased the score to 8 and Reviewer eQ3c changed the score to positive. All reviewers agree to accept the work.
Accept (Poster)