AutoAL: Automated Active Learning with Differentiable Query Strategy Search
To our knowledge, we present the first automatic AL query strategy search method that can be trained in a differentiable way.
Abstract
Reviews & Discussion
This paper addresses the active learning problem. Given the existence of numerous active learning methods for selecting the most informative samples, this work proposes an end-to-end framework that integrates multiple approaches. The framework consists of SearchNet and FitNet:
SearchNet assigns a score to each sample, ranks them, and selects those with the highest loss. FitNet models the data distribution within the unlabeled dataset and guides the training of SearchNet. However, the main contribution requires further clarification. Instead of selecting a single strategy for the dataset, the framework appears to consider all active learning strategies based on the loss function (Equation 6) and Algorithm 1 (Page 5). If all strategies are incorporated, it is not surprising that the proposed method, AutoAL, achieves the best performance in Figure 2. This suggests that the score definition in Equation 6 is well-designed, but contradicts the claim that the framework selects a single strategy for the dataset.
Update after rebuttal
Additionally, it is unclear whether FitNet is truly necessary. For instance, Line 157 states that FitNet is trained on unlabeled data, whereas Figure 1 indicates that FitNet is trained and fine-tuned using labeled pool data. This inconsistency raises questions about its necessity. Moreover, the ablation study primarily focuses on SearchNet, which appears to play a more significant role in performance and seems to dominate the results.
Questions For Authors
- Is there any optimization strategy used to compute the scores quickly, as shown in Table A.1 (Line 565)?
- Is FitNet truly necessary? Why do the ablation studies not evaluate its impact?
- Why is FitNet both trained and fine-tuned twice? What is the influence of the losses Ls and Lf on the final selection?
- The experiments focus on classification problems—can the framework also be applied to regression tasks?
Claims And Evidence
yes
Methods And Evaluation Criteria
This work proposes a new method that incorporates all active learning algorithms for sample selection.
Theoretical Claims
no theory
Experimental Designs Or Analyses
The experimental design is well-structured. It compares the proposed method with various active learning strategies and includes an ablation study on SearchNet and the active learning candidate strategies. Additionally, experiments are conducted on multiple datasets for each evaluation.
Supplementary Material
no supplementary material
Relation To Broader Scientific Literature
It is particularly interesting for researchers focused on active learning.
Essential References Not Discussed
The work in [1] explores the use of active learning strategies in deep learning networks for question answering. It specifically incorporates uncertainty-based active learning strategies into the training process of question answering models.
[1] Uncertainty-Based Active Learning for Reading Comprehension. Jing Wang, Jie Shen, Xiaofei Ma, and Andrew Arnold. Transactions on Machine Learning Research 2022.
Other Strengths And Weaknesses
If an end-to-end framework could effectively select the best strategy for a given dataset or task, it would be highly interesting. This work is a pioneering effort in that direction.
However, further clarification is needed to demonstrate the necessity of FitNet, as SearchNet alone appears to be sufficient.
Additionally, according to the table, the time cost of the proposed method is relatively low, which is surprising.
Other Comments Or Suggestions
no
Thank you for your feedback and for confirming the value of our proposed AutoAL! We agree that AutoAL is an end-to-end framework that can effectively select the best strategy for a given dataset or task. We also thank you for your valuable questions and would like to clarify the following:
Q1: Is there any optimization strategy used to compute the scores fast, as shown in Table A.1 (Line 565)?
A1: This is due to our proposed differentiable bi-level optimization strategy. The time cost of AutoAL-Search consists only of the training time of the two neural networks. We do find that AutoAL may need extra time compared with other baselines; however, the main cost comes from the sample-query process of the candidate AL strategies.
Q2: Is FitNet truly necessary? Why do the ablation studies not evaluate its impact?
A2: Thanks for pointing this out. Because FitNet is the same network as the final classification network and is used in AutoAL to yield the informativeness of each unlabeled sample, we thought it unnecessary to run an ablation on it in our original version. However, we have added this experiment on CIFAR-100 for your reference:
| Labeled samples | AutoAL w/o FitNet | AutoAL |
|---|---|---|
| 4000 | 33.1 ± 0.2 | 33.2 ± 0.1 |
| 6000 | 36.8 ± 0.2 | 39.4 ± 0.1 |
| 8000 | 40.6 ± 0.1 | 44.1 ± 0.1 |
| 10000 | 41.5 ± 0.2 | 47.2 ± 0.0 |
| 12000 | 45.8 ± 0.3 | 50.4 ± 0.1 |
| 14000 | 47.9 ± 0.2 | 52.5 ± 0.0 |
| 16000 | 50.1 ± 0.1 | 54.9 ± 0.2 |
| 18000 | 49.8 ± 0.3 | 56.1 ± 0.1 |
| 20000 | 53.2 ± 0.1 | 57.0 ± 0.1 |
Q3: Why is FitNet both trained and fine-tuned twice? What is the influence of loss Ls and Lf on the final selection?
A3: In our settings, FitNet is first trained on the labeled dataset to yield the informativeness of unlabeled samples, and then co-optimized with SearchNet. The loss Ls guides the update of SearchNet, helping it decide which AL candidate performs best in the current setting, while Lf guides FitNet to better capture the distribution of the unlabeled dataset.
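The two-phase schedule described in A3 can be sketched as a toy training loop. This is an illustrative reconstruction, not the authors' code: `train_autoal`, `fit_step`, and `search_step` are hypothetical names, and the actual losses Ls/Lf are stubbed out as opaque callbacks.

```python
def train_autoal(fit_step, search_step, queue1, queue2, epochs=10):
    """Hedged sketch of the two-phase AutoAL-style schedule.

    fit_step(batch): one FitNet update (loss Lf) on labeled data.
    search_step(batch): one joint update of SearchNet guided by FitNet (loss Ls).
    queue1/queue2: the two queues the labeled pool is split into.
    """
    for _ in range(epochs):      # phase 1: FitNet alone on the first queue
        fit_step(queue1)
    for _ in range(epochs):      # phase 2: co-optimization on the second queue
        fit_step(queue2)
        search_step(queue2)

# usage with stub steps, just to show the call pattern
calls = {"fit": 0, "search": 0}
train_autoal(lambda q: calls.__setitem__("fit", calls["fit"] + 1),
             lambda q: calls.__setitem__("search", calls["search"] + 1),
             "queue1", "queue2", epochs=10)
```

With `epochs=10`, FitNet is updated 20 times (both phases) and SearchNet 10 times (second phase only), matching the "trained then fine-tuned" description.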
Q4: The experiments focus on classification problems—can the framework also be applied to regression tasks?
A4: Yes, this is definitely possible. For example, in object detection tasks, as long as the candidate AL strategies can output the most probable coordinates of the target object, AutoAL can learn from them and decide which coordinates are best. However, in this paper we only demonstrate the superiority of AutoAL on classification tasks; we plan to explore more tasks in future work.
Essential References Not Discussed: Thanks for providing the related work. In our initial version, we mainly considered algorithm-selection works with many candidate AL strategies, rather than proposing a new AL strategy. We believe the work you provided is valuable and plan to add it to our final version.
The paper proposes an algorithm selection strategy for active learning. The proposed method utilizes the existing labeled dataset and trains differentiable policies for data selection. Experiments are conducted on numerous datasets showing the effectiveness of their algorithm.
Questions For Authors
N/A
Claims And Evidence
I did find the claim in this paper to be unsupported by evidence. This is mostly centered around the comparison against existing algorithm selection algorithms. The authors mention "[Compared to Hacohen & Weinshall and Zhang et al.], both the computational cost and the lack of differentiability make the optimization of these works inefficient."
I am very confused by this comment as
- The proposed bilevel optimization approach seems to be much more computationally expensive than both of these existing works. Both existing works use simple statistics to choose algorithms, while this paper uses parameterized neural networks, which are much more computationally costly.
- The authors mention that the lack of differentiability makes existing methods inefficient, but never compare against any of them in experiments. There is also no analysis beyond this single sentence.
Methods And Evaluation Criteria
If the authors are trying to make the argument that their method is better, they should compare against (Hacohen & Weinshall) and (Zhang et al.), and in their own games: (Hacohen & Weinshall) is proposed for different computation-budget settings, and (Zhang et al.) for imbalance. There is currently no such experiment.
Theoretical Claims
N/A
Experimental Designs Or Analyses
See sections above.
Supplementary Material
I did review the computational runtime. There is no comparison against existing work, yet the authors claim their method is more efficient. A theoretical time-complexity analysis would be helpful here.
Relation To Broader Scientific Literature
This paper studies the algorithm selection work for active learning.
Essential References Not Discussed
I think the paper could benefit from discussing related differentiable policy learning work in active learning. The authors should also discuss how their parameterization scheme is similar to or different from these works.
[2] https://arxiv.org/abs/1909.03585
[3] https://arxiv.org/abs/2010.15382
Other Strengths And Weaknesses
N/A
Other Comments Or Suggestions
I would recommend the authors position their paper for general active learning scenarios. The two existing works only study certain scenarios of deep active learning (low/high budget and imbalance). However, I still think it is necessary to compare against these algorithms under their proposed scenarios. If the authors' algorithm indeed performs better than those papers in their proposed scenarios, this would be a very influential work. Even if the algorithm does not perform as well, I think it is still a solid contribution to the community. However, in that case, the authors also need to note the shortcomings of their approach.
Thank you for your valuable feedback. We appreciate your recognition of our contribution to the community in solving the strategy selection problem, and of our method's differentiable bi-level framework.
For your concerns, we have added more experiments; the results are below. Q1: The authors should compare against (Hacohen & Weinshall) and (Zhang et al.), and in their own games. (Hacohen & Weinshall) is proposed for different computation-budget settings, and (Zhang et al.) is proposed for imbalance.
A1: We thank you for your insightful comments. We agree that comparing against these works further demonstrates the effectiveness of our proposed AutoAL. To compare with [1], we conducted experiments on two datasets, CIFAR-10 and CIFAR-100, which are used both in our experiments and in theirs. In their settings, the CIFAR-10 dataset has only two classes and CIFAR-100 has 10 classes, respectively. The results are as follows:
For CIFAR-10:
| Labeled samples | TAILOR | AutoAL |
|---|---|---|
| 2000 | 80.2 ± 0.2 | 80.2 ± 0.1 |
| 3000 | 84.5 ± 0.1 | 85.3 ± 0.2 |
| 4000 | 87.5 ± 0.2 | 89.4 ± 0.1 |
| 5000 | 89.4 ± 0.2 | 91.7 ± 0.1 |
| 6000 | 91.9 ± 0.2 | 92.8 ± 0.2 |
| 7000 | 93.3 ± 0.1 | 93.5 ± 0.1 |
| 8000 | 95.1 ± 0.1 | 94.8 ± 0.2 |
| 9000 | 96.5 ± 0.2 | 96.6 ± 0.1 |
For CIFAR-100:
| Labeled samples | TAILOR | AutoAL |
|---|---|---|
| 4000 | 37.8 ± 0.3 | 38.2 ± 0.2 |
| 6000 | 49.8 ± 0.1 | 50.4 ± 0.1 |
| 8000 | 58.2 ± 0.1 | 62.3 ± 0.2 |
| 10000 | 65.8 ± 0.2 | 68.7 ± 0.1 |
| 12000 | 70.9 ± 0.2 | 72.3 ± 0.2 |
| 14000 | 75.5 ± 0.3 | 78.4 ± 0.2 |
| 16000 | 80.0 ± 0.2 | 80.2 ± 0.1 |
| 18000 | 83.2 ± 0.1 | 83.4 ± 0.3 |
We found that AutoAL outperforms TAILOR in nearly all AL iterations on both CIFAR-10 and CIFAR-100. This result is consistent with our earlier results: for SVHN and the other medical datasets, although they are imbalanced, AutoAL still outperforms the baselines.
For [2], unfortunately their code is not open-sourced, but we found a toolbox from [3] that officially implements ProbCover [4] and TypiClust [3], two baselines used in [2]. From the experimental results in [2], we observed that ProbCover consistently performs best in low-budget settings while BADGE [5] excels in high-budget settings. Therefore, we selected ProbCover, TypiClust, and BADGE as our baseline methods, focusing our tests on CIFAR-10. As shown in our results, AutoAL performs well in medium- and high-budget settings but underperforms in low-budget scenarios. We attribute this to deep neural networks typically requiring larger datasets for proper training—when the budget is low, SearchNet and FitNet cannot be fully optimized. However, traditional AL settings rarely use extremely low budgets, and AutoAL demonstrates its effectiveness when the budget is adequate. We acknowledge that AutoAL is designed for general active learning scenarios, not specifically for low-budget settings.
| Budget (L+A) | 500+500 | 7k+1k | 20k+5k |
|---|---|---|---|
| ProbCover | 50.3 ± 0.4 | 79.8 ± 0.2 | 87.5 ± 0.2 |
| TypiClust | 49.8 ± 0.3 | 80.0 ± 0.2 | 87.2 ± 0.1 |
| Badge | 50.2 ± 0.4 | 79.9 ± 0.1 | 88.1 ± 0.3 |
| AutoAL | 46.8 ± 0.3 | 80.3 ± 0.1 | 89.3 ± 0.1 |
Essential References Not Discussed: Thanks for providing the related works. In our initial version, we mainly considered algorithm-selection works with many candidate AL strategies, rather than proposing a new AL strategy. We believe the works you provided are valuable and plan to add them to our final version.
We hope these results address your concerns and lead you to reconsider our contributions. We would appreciate it if you could consider raising your score.
References: [1] Zhang, Jifan, et al. "Algorithm selection for deep active learning with imbalanced datasets." Advances in Neural Information Processing Systems 36 (2023): 9614-9647.
[2] Hacohen, Guy, and Daphna Weinshall. "How to select which active learning strategy is best suited for your specific problem and budget." Advances in Neural Information Processing Systems 36 (2023): 13395-13407.
[3] Hacohen, Guy, Avihu Dekel, and Daphna Weinshall. "Active learning on a budget: Opposite strategies suit high and low budgets." arXiv preprint arXiv:2202.02794 (2022).
[4] Yehuda, Ofer, et al. "Active learning through a covering lens." Advances in Neural Information Processing Systems 35 (2022): 22354-22367.
[5] Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." arXiv preprint arXiv:1906.03671 (2019).
Thank you for the rebuttal and I have raised my scores.
Thanks for your score and response. Also appreciate your time and efforts on reviewing our work.
This paper addresses the challenge of active learning (AL) by proposing AutoAL, an automated active learning framework. The authors highlight that optimal AL strategies vary across different datasets and problem settings. To address this, AutoAL first extracts scores from multiple acquisition functions and then employs a bi-level optimization approach to identify the most effective acquisition strategy dynamically. The model consists of FitNet and SearchNet, which are trained in a differentiable framework. Specifically, the labeled dataset is partitioned into a pseudo-validation set and a training set, enabling FitNet to fit the training set while SearchNet determines the most informative unlabeled samples for annotation.
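The selection mechanism this summary describes (weights over candidate acquisition scores, then querying the top-ranked unlabeled samples) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: `fuse_scores`, `select_batch`, the uniform logits, and the three example strategies are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(candidate_scores, logits):
    """Softmax-weighted fusion of per-sample scores from several
    candidate AL strategies (the differentiable-selection idea)."""
    w = softmax(logits)
    n = len(candidate_scores[0])
    return [sum(w[k] * candidate_scores[k][i] for k in range(len(w)))
            for i in range(n)]

def select_batch(candidate_scores, logits, batch_size):
    """Rank unlabeled samples by the fused score and take the top-k."""
    fused = fuse_scores(candidate_scores, logits)
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order[:batch_size]

# usage: 3 hypothetical candidate strategies scoring 5 unlabeled samples
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.5],   # e.g. entropy-style scores
    [0.2, 0.8, 0.4, 0.1, 0.6],   # e.g. margin-style scores
    [0.9, 0.1, 0.2, 0.3, 0.4],   # e.g. diversity-style scores
]
batch = select_batch(scores, [0.0, 0.0, 0.0], batch_size=2)  # uniform weights
```

In the actual paper the logits would be produced by SearchNet and updated through the bi-level objective; here they are fixed only to show the data flow.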
Questions For Authors
None
Claims And Evidence
- First Differentiable AL Strategy Selection
- To the best of my knowledge, AutoAL is the first differentiable active learning selection method. The use of bi-level optimization for acquisition function selection has been explored in other fields, such as few-shot learning, but it has not been explicitly applied in AL before.
- The paper provides empirical evidence across multiple datasets showing that AutoAL consistently outperforms prior AL methods.
- Assumption on Sampling Bias in Labeled Data
- AutoAL assumes that dividing the labeled set into two subsets can sufficiently approximate the data distribution of the unlabeled pool (Line 173 - 178).
- However, active learning is inherently biased, as sampling bias exists in the labeled data [Farquhar’21]. The assumption that the sampled labeled data follows the original data distribution does not hold, which could affect the performance of AutoAL. (See discussion under "Methods And Evaluation Criteria.")
- Baseline Implementation Concerns
- The result shows performance gains across most datasets, but there are concerns about whether the baselines were correctly implemented.
- Some prior AL studies indicate that performance may degrade as AL rounds progress, which is observed in Figure 2, particularly for PathMNIST.
[Farquhar’21] Farquhar, S., Gal, Y., & Rainforth, T. (2021). On statistical bias in active learning: How and when to fix it. ICML21
Methods And Evaluation Criteria
- Sampling Bias in Actively Selected Data
- One major issue with AutoAL is that actively sampled data does not follow the same distribution as the unlabeled pool.
- Farquhar et al. (2021) demonstrated that actively sampled data has inherent statistical biases, meaning FitNet and SearchNet do not necessarily generalize well to the full data distribution.
- Moreover, since the labeled dataset is halved for training FitNet and SearchNet, this could further change the optimal AL strategy, leading to unexpected selection behaviors.
- Other Methodological Concerns
- Apart from the issue mentioned above, the design of AutoAL's framework and its evaluation metrics appear reasonable.
[Farquhar’21] Farquhar, S., Gal, Y., & Rainforth, T. (2021). On statistical bias in active learning: How and when to fix it. ICML21
Theoretical Claims
This work does not introduce new theoretical claims.
Experimental Designs Or Analyses
- Potential Issues in Baseline Comparisons
- In Figure 2, several baseline methods show performance degradation in later rounds, which is unexpected.
- This issue is particularly severe in PathMNIST, where adding more labeled data results in worse accuracy.
- Given that the experiments were conducted with three trials and reported variance, such trends contradict intuition.
- There is a concern that some baselines may not have been properly implemented, which could unfairly favor AutoAL.
- Ablation Studies Provide Useful Insights
- Figure 4 (Candidate set size ablation) and Figure 5 (AL strategy score visualization) provide useful explanations regarding how AutoAL selects acquisition strategies dynamically.
- The ablation design is clear and informative.
Supplementary Material
Yes, all supplementary material was reviewed.
Relation To Broader Scientific Literature
- AutoAL aligns with prior studies that show the best AL strategy depends on budget and dataset properties.
- Bi-level optimization has been applied in few-shot learning and hyperparameter tuning, but this is the first work to introduce differentiability into AL strategy search.
Essential References Not Discussed
Some recent active learning methods [Mahmood'21, Kim'23, Yehuda'22] are missing. Considering the problem setting, [Hacohen'22] needs to be included.
[Mahmood’21] Mahmood, R., Fidler, S., & Law, M. T. (2021). Low budget active learning via Wasserstein distance: An integer programming approach. ICLR 2022.
[Yehuda’22] Yehuda, O., Dekel, A., Hacohen, G., & Weinshall, D. (2022). Active learning through a covering lens. NeurIPS 2022.
[Kim’23] Kim, Y. Y., Cho, Y., Jang, J., Na, B., Kim, Y., Song, K., ... & Moon, I. C. (2023, July). SAAL: Sharpness-aware active learning. In ICML 2023.
[Hacohen’22] Hacohen, G., Dekel, A., & Weinshall, D. (2022). Active learning on a budget: Opposite strategies suit high and low budgets. In ICML 2022.
Other Strengths And Weaknesses
- Clarity Issues in Writing and Notation
- The notation is difficult to follow, and the mathematical expressions are not clearly defined.
- For example, the inputs and outputs of SearchNet and FitNet (Lines 165 - 178) are not explicitly defined, making the method difficult to understand.
- Some notations are used before being defined (e.g., Equation (1) loss terms are not introduced until Equations (7) and (8)). Some notation in Equation (5) is not defined.
- Figure 1 is difficult to interpret, as it reuses the model illustration with an "After Training" arrow without clear explanation.
- High Computational Cost
- AutoAL requires training two networks and running R inner-loop updates.
- According to Table 2, AutoAL is up to 7x more expensive than entropy-based sampling.
- The paper does not analyze the algorithmic complexity, making it difficult to assess scalability to large-scale datasets like ImageNet.
Other Comments Or Suggestions
None
Thank you for your feedback. In response to your questions, we offer the following clarifications:
Q1: Sampling Bias in Actively Selected Data
A1: For our initial seed dataset setting, it is i.i.d., ensuring an unbiased starting point. Traditional single-criterion AL strategies, especially those based solely on uncertainty [1], are more susceptible to accumulating such bias over iterations. In contrast, AutoAL integrates multiple AL strategies, each contributing different perspectives (uncertainty, diversity, etc.). This fusion, combining multiple AL candidates while maintaining a differentiable framework between search and fit nets, inherently reduces bias, prevents overfitting to any individual AL sampling strategy, and adapts dynamically as the AL process evolves.
Q2: Baseline implementation Issue
A2: Thanks for the question. We observed this problem too. To clarify, all baseline implementations come from public GitHub repositories, not from our own. Our framework builds upon the open-source deepAL toolbox [2]. The accuracy degradation appears in many works, including Figure 1 in [3,4], Figure 3(b) in [5], Figures 3 and 4 in [6], and deepAL [2]. We attribute this to two factors:
- The amount of data causing overfitting in the classification model.
- Medical datasets often contain redundancy and confusing information. These medical images have many-to-one relationships, as one patient may correspond to several pathology images. Some patients may have multiple conditions (e.g., X-rays showing posterior spinal fixators used for spine repair). These features can influence predictions. Proper diagnosis requires both local and global features, making bias problems more severe in these cases.
Q3: Writing Issues. Q3.1: Notation issues.
A3.1: We apologize for the large number of notations in our method section. We will add a table in the appendix defining the important notations.
We tried to describe the inputs of SearchNet and FitNet in Figure 1. The initial labeled pool is divided into two queues: the first queue is used to train FitNet, and the second queue is used to train both FitNet and SearchNet. Thanks for pointing this out; we will polish the writing in our final version.
Q3.2: Notations used before being defined.
A3.2: Our writing logic was to first present the overall AutoAL framework, and then the detailed parts, such as the loss functions in Equation (1).
Q3.3: Figure 1.
A3.3: AutoAL's FitNet is first trained on the first queue of the labeled dataset. Once FitNet's parameters have adapted, FitNet and SearchNet are jointly trained on the second data queue. We will expand the description of Figure 1 accordingly.
Q4: Runtime Complexity
A4: Please first refer to our rebuttal to Reviewer khE5, A2. Following Section 5.1 of the published work [7], we performed a similar complexity analysis, as described in our Appendix A.1. Let t_u denote the time to update SearchNet and FitNet per batch in each AL round, t_c denote the time for sample querying and classification-model training per batch in each AL round, and t_q denote the AL score-querying time per batch for each candidate AL strategy. When there are N candidate AL algorithms, the per-batch computational cost is O(N·t_q + t_u + t_c) in the worst case; with T total AL rounds and B batches per round (depending on the stopping criterion of AL), the overall complexity is O(T·B·(N·t_q + t_u + t_c)). In the future, we plan to compute the candidate AL scores in parallel, which will improve this bound to O(T·B·(t_q + t_u + t_c)).
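As a back-of-envelope check, the runtime decomposition described in A4 (per-batch network updates, per-batch query/training time, and per-candidate score querying, summed over rounds and batches) can be written as a tiny cost model. The function and parameter names are illustrative, not from the paper.

```python
def autoal_worst_case_time(T, B, N, t_update, t_query_train, t_score,
                           parallel=False):
    """Back-of-envelope worst-case runtime of an AutoAL-style loop.

    T: number of AL rounds; B: batches per round; N: candidate AL strategies.
    t_update: per-batch SearchNet/FitNet update time.
    t_query_train: per-batch sample-querying + classifier-training time.
    t_score: per-batch score-querying time for one candidate strategy.
    With parallel candidate scoring, the factor of N drops out.
    """
    score_cost = t_score if parallel else N * t_score
    per_batch = t_update + t_query_train + score_cost
    return T * B * per_batch

# e.g. 10 rounds, 5 batches per round, 7 candidate strategies
serial = autoal_worst_case_time(10, 5, 7, 1.0, 2.0, 0.5)
fast = autoal_worst_case_time(10, 5, 7, 1.0, 2.0, 0.5, parallel=True)
```

This makes the rebuttal's point concrete: with 7 serial candidates the score-querying term dominates, and parallelizing it removes the dependence on the number of candidates.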
Q5: Essential References Not Discussed
A5: Thanks for finding these. Regarding the low-budget and high-budget settings, we have added new experiments; please refer to our reply to Reviewer bNSN.
We hope you will reconsider the final rating.
Reference: [1] Sharma, Manali, and Mustafa Bilgic. "Evidence-based uncertainty sampling for active learning." Data Mining and Knowledge Discovery 31 (2017): 164-202.
[2] Zhan, Xueying, et al. "A comparative survey of deep active learning." arXiv preprint arXiv:2203.13450 (2022).
[3] Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep bayesian active learning with image data." International conference on machine learning. PMLR, 2017.
[4] Geifman, Yonatan, and Ran El-Yaniv. "Deep active learning over the long tail." arXiv preprint arXiv:1711.00941 (2017).
[5] Kim, Seong Tae, Farrukh Mushtaq, and Nassir Navab. "Confident coreset for active learning in medical image analysis." arXiv preprint arXiv:2004.02200 (2020).
[6] Mishal, Inbal, and Daphna Weinshall. "DCoM: Active Learning for All Learners." arXiv preprint arXiv:2407.01804 (2024).
[7] Zhang, Jifan, et al. "Algorithm selection for deep active learning with imbalanced datasets." Advances in Neural Information Processing Systems 36 (2023): 9614-9647.
The proposed AutoAL is an automatic query strategy search algorithm that utilizes bi-level optimization framework to select optimal AL strategies built upon existing uncertainty and diversity-based approaches.
Questions For Authors
Please answer the concerns in the weaknesses section.
Claims And Evidence
Yes, most of the claims are supported by clear and convincing evidence. The computational overhead is not very convincing considering the performance the proposed method delivers.
Methods And Evaluation Criteria
Yes, several baselines are explored with a range of well-known datasets.
Theoretical Claims
The theoretical claims look fine to me. The differentiable query strategy optimization, the bi-level optimization problem setup look reasonable.
Experimental Designs Or Analyses
Yes.
Supplementary Material
Yes. The complexity analysis and class imbalance information are helpful to understand the contribution.
Relation To Broader Scientific Literature
AutoAL is basically an automatic query strategy search algorithm that may help in selecting the optimal AL strategies for various applications.
Essential References Not Discussed
- Desreumaux, L., & Lemaire, V. (2020). Learning active learning at the crossroads? Evaluation and discussion. arXiv preprint arXiv:2012.09631.
- Fang, M., Li, Y., & Cohn, T. (2017). Learning how to active learn: A deep reinforcement learning approach. arXiv preprint arXiv:1708.02383.
- Makili, L. E., Vega Sánchez, J. A., & Dormido-Canto, S. Active learning using conformal predictors: Application to image classification.
The approaches in these papers are somewhat similar to the proposed method. These papers could have been used as part of the baselines or references.
Other Strengths And Weaknesses
Strengths –
The proposed framework AutoAL designed on top of uncertainty and diversity based AL approaches, utilizes two neural networks optimized concurrently under bi-level optimization framework to select optimal AL strategies to solve the traditional active learning problem. Overall, the performance looks promising and the paper is easy to follow.
Weaknesses -
The proposed AutoAL does not seem novel. It utilizes the same uncertainty- and diversity-based criteria to select batches of samples.
The average runtime of AutoAL is significantly higher than most of the baselines in the experiments, perhaps because of the deep neural networks involved (SearchNet and FitNet co-optimized in a bi-level structure), whereas its accuracy is only 1%–3% better than most of the baselines.
It is not clear how AutoAL could be used to add more AL frameworks to extend its applications to other domains such as image segmentation, object detection etc.
Other Comments Or Suggestions
It would be interesting to see how vision transformer models would work based on this framework.
Thank you for your valuable feedback. We appreciate your recognition of our method development, promising performance, and clear writing.
For your concerns, we make the following comments to clarify our points:
Essential References Not Discussed: We thank you for your efforts in finding these related materials for us. We plan to add them to the related work. In particular, [1] covers many datasets that are worth trying.
We plan to add a sentence to the adaptive-sample-selection section of the related work: [1] frames the algorithm selection task as "learning how to actively learn" and uses a deep Q-learning algorithm to select the best-suited strategy.
Q1: The proposed AutoAL does not seem novel. It utilizes the same uncertainty and diversity-based criterion to select batches of samples.
A1: Our contribution creates a differentiable bridge connecting "search" and "fit" models in Active Learning, outperforming manual and non-differentiable approaches. While AutoAL builds on existing uncertainty and diversity-based methods, it solves two key problems:
- Single-criterion AL methods accumulate bias over iterations. AutoAL incorporates multiple methods to reduce this bias.
- Criteria that work well in the current setting may perform poorly in future rounds. For example, BADGE [3] performs poorly with low budgets but excels as resources increase. AutoAL selects the optimal criterion each round to address this issue.
This integration reduces cumulative bias, with experiments confirming AutoAL outperforms two-stage selection methods like BADGE [3].
Q2: The average run time of the total AutoAL is significantly higher than most of the baselines used as part of the experiments, perhaps because of the involvement of deep neural nets, SearchNet and FitNet co-optimization in a bi-level optimization structure, whereas the accuracy is relatively 1%-3% better than most of the baselines.
A2: Thanks for pointing out the average runtime issue. Please refer to Appendix A.1, where we divided the total runtime into three main parts. The co-optimization runtime is relatively small compared to the time that candidate ALs spend querying scores for images. We believe this cost is acceptable. Our tests used 7 candidate ALs, which isn't always necessary (as shown in Figure 4 for OrganCMNIST). Using fewer candidates would significantly reduce runtime. We plan to integrate AL methods with lower computational complexity to further reduce AutoAL's total runtime.
Q3: It is not clear how AutoAL could be used to add more AL frameworks to extend its applications to other domains such as image segmentation, object detection etc.
A3: Thanks for your feedback. We plan to test AutoAL on various applications, as our method is task-agnostic. For tasks like image segmentation, whenever candidate AL methods [4,5] can score instances, SearchNet and FitNet will train on these scores. Since most published works [2,6] also only focus on image classification datasets, we believe our results demonstrate AutoAL's superiority in most scenarios.
Q4: It would be interesting to see how vision transformer models would work based on this framework.
A4: Thanks for your feedback. We want to clarify that using a basic backbone model is a standard setting adopted in published works [1,4]; however, we plan to add transformer experiments in the future to evaluate AutoAL's performance [6,7].
Reference: [1] Desreumaux, Louis, and Vincent Lemaire. "Learning active learning at the crossroads? evaluation and discussion." arXiv preprint arXiv:2012.09631 (2020).
[2] Hacohen, Guy, and Daphna Weinshall. "How to select which active learning strategy is best suited for your specific problem and budget." Advances in Neural Information Processing Systems 36 (2023): 13395-13407.
[3] Ash, Jordan T., et al. "Deep batch active learning by diverse, uncertain gradient lower bounds." arXiv preprint arXiv:1906.03671 (2019).
[4] Li, Jun, José M. Bioucas-Dias, and Antonio Plaza. "Hyperspectral image segmentation using a new Bayesian approach with active learning." IEEE Transactions on Geoscience and Remote Sensing 49.10 (2011): 3947-3960.
[5] Yang, Lin, et al. "Suggestive annotation: A deep active learning framework for biomedical image segmentation." Medical Image Computing and Computer Assisted Intervention− MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III 20. Springer International Publishing, 2017.
[6] Sener, Ozan, and Silvio Savarese. "Active learning for convolutional neural networks: A core-set approach." arXiv preprint arXiv:1708.00489 (2017).
[7] Yoo, Donggeun, and In So Kweon. "Learning loss for active learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
This paper proposes a new method for active selection that leverages existing AL algorithms as constituent agents. It consists of two neural networks, FitNet and SearchNet, each trained using the pool of data that has already been labeled. SearchNet is fit to select the best among a set of pre-chosen active learning algorithms, while FitNet is used to judge the usefulness of each unlabeled sample and helps guide SearchNet. The two are trained in a bilevel optimization framework, partly enabled by additional machinery that makes the learning objective for both models differentiable. The fully trained models are used to guide data selection.
Questions for the authors
As mentioned above, questions are around the robustness of the approach to other data types, batch sizes, and model architectures.
Claims and evidence
The authors claim that this is an effective approach to data selection, and they provide experiments on image data demonstrating that this seems to be the case.
Methods and evaluation criteria
The criteria make sense for the problem at hand, but it would have been nice to see more variation in model architecture (only ResNets are used), batch size, and data type (currently just images).
Theoretical claims
There are no theoretical claims.
Experimental design and analysis
The experimental design is typical for pool-based active learning, consisting of labeled data, unlabeled data, and a sequential learning algorithm. The agent selects unlabeled points it believes are most productive for learning, they're labeled by an oracle, placed in the labeled pool, and the classifier is updated. The goal is to have the highest performing model possible given a fixed labeling budget.
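The pool-based loop described above can be written as a minimal sketch. The `acquire`, `oracle`, and `fit_model` callables are placeholders standing in for the selection strategy, the human labeler, and classifier training; none of this is the paper's API.

```python
def pool_based_al(unlabeled_pool, oracle, fit_model, acquire,
                  budget, batch_size):
    """Generic pool-based active learning loop.

    Repeatedly: score the pool, have the oracle label the chosen
    batch, move it to the labeled pool, and retrain the model,
    until the labeling budget is spent.

    acquire(model, pool, k) -> list of k indices into `pool`.
    oracle(x)               -> label for sample x.
    fit_model(labeled)      -> model trained on (x, y) pairs.
    """
    labeled = []
    model = None
    while len(labeled) < budget and unlabeled_pool:
        k = min(batch_size, budget - len(labeled), len(unlabeled_pool))
        picked = acquire(model, unlabeled_pool, k)
        # pop in reverse index order so earlier indices stay valid
        for idx in sorted(picked, reverse=True):
            x = unlabeled_pool.pop(idx)
            labeled.append((x, oracle(x)))
        model = fit_model(labeled)   # retrain on the enlarged pool
    return model, labeled
```

A random-acquisition baseline, for example, would just pass `acquire=lambda m, p, k: random.sample(range(len(p)), k)`.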
Supplementary material
The supplement contains only some additional experimental details. I looked through it, but I did not think it required further comment.
Relationship to existing literature
The key contribution is a straightforward and effective way to select unlabeled data by aggregating existing methods. Such an approach can be expensive, as the constituent algorithms can be expensive on their own, but I believe there is a growing need for effective data selection strategies.
Essential references not discussed
Active learning is a very large field that people have been thinking about for a long time, but I consider limiting the related work to the few contemporary papers mentioned to be acceptable. I wouldn't say anything truly essential has been omitted.
Other strengths and weaknesses
The provided results seem compelling, but here are a few notes that I feel would make the paper stronger:
- Experiments are only done on image data and with ResNets. I wonder how contingent the performance of the approach is on the convolutional inductive bias; it would be useful to see some experiments with more naive models, like MLPs. Along these lines, I've seen other papers show results for different acquisition batch sizes, which I feel would also give more clarity to how the approach performs.
- Experiments only show the earliest parts of the learning curves. The results look very promising, but it would be nice to see asymptotic performance.
Other comments or suggestions
This is a minor point, but calling the method AutoAL seems very general, as other techniques, such as ALBL, also do what I'd superficially classify as "automated active learning" in the same way. I'd consider switching to a name more descriptive of this particular technique.
Thank you for your valuable comments and time. We are glad that you recognize our contribution and appreciate our work.
For your questions, we add more experimental results for your reference:

Q1: Experiments are only done on image data and with ResNets. I wonder how contingent the performance of the approach is on the convolutional inductive bias; it would be useful to see some experiments with more naive models, like MLPs. Along these lines, I've seen other papers show results for different acquisition batch sizes, which I feel would also give more clarity to how the approach performs.

Q2: Experiments only show the earliest parts of the learning curves. The results look very promising, but it would be nice to see asymptotic performance.
A1 and A2: To answer your questions, we conducted new experiments, especially on the CIFAR10 dataset, as it is commonly used by related works [1,2]. Specifically, we changed the acquisition batch size from 1000 (the original setting in our paper) to 5000 to show the difference. We then compare our proposed AutoAL with two baselines, KMeansSampling [3] and LPL [4], which were the worst- and best-performing baselines in our original CIFAR10 experiments. The results show that our method still outperforms the baselines. Also, using a basic learning model is a standard setting in published works [1,4], but we plan to add experiments with MLPs in the future.
| Labeled dataset | KMeansSampling | LPL | AutoAL |
|---|---|---|---|
| 10000 | 74.3 ± 0.1 | 72.6 ± 0.2 | 74.2 ± 0.1 |
| 15000 | 77.2 ± 0.1 | 78.7 ± 0.1 | 78.8 ± 0.1 |
| 20000 | 80.4 ± 0.0 | 82.2 ± 0.1 | 82.7 ± 0.0 |
| 25000 | 82.3 ± 0.1 | 85.0 ± 0.2 | 85.3 ± 0.1 |
| 30000 | 84.2 ± 0.1 | 87.8 ± 0.1 | 88.4 ± 0.1 |
| 35000 | 85.6 ± 0.2 | 88.6 ± 0.2 | 89.3 ± 0.1 |
| 40000 | 86.1 ± 0.1 | 89.8 ± 0.1 | 90.6 ± 0.0 |
Q3: This is a weak point, but calling the method AutoAL seems very general, as other techniques, such as ALBL, also do what I think I’d classify superficially as “automated active learning” in the same way. I'd consider switching to something more descriptive of this technique in particular.
A3: Thanks for your suggestion. We plan to change the name from AutoAL to DDAL, representing Differentiable Deep Active Learning, in the final version.
References: [1] Sener, Ozan, and Silvio Savarese. "Active learning for convolutional neural networks: A core-set approach." arXiv preprint arXiv:1708.00489 (2017).
[2] Hacohen, Guy, and Daphna Weinshall. "How to select which active learning strategy is best suited for your specific problem and budget." Advances in Neural Information Processing Systems 36 (2023): 13395-13407.
[3] Ahmed, Mohiuddin, Raihan Seraj, and Syed Mohammed Shamsul Islam. "The k-means algorithm: A comprehensive survey and performance evaluation." Electronics 9.8 (2020): 1295.
[4] Yoo, Donggeun, and In So Kweon. "Learning loss for active learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
The reviewers appreciated the intuitively motivated method for selecting among active learning strategies and the strong experimental results. However, given that the method lacks theory (e.g., for when the labeled set is not representative of the data distribution), that it isn't fully contextualized within similar active-learning-policy-learning work (a.k.a. "learning to active learn"), and that this research area is very well-trodden, I only recommend a weak accept.