PaperHub
Overall score: 5.3/10, Rejected (4 reviewers)
Individual ratings: 5, 5, 5, 6 (min 5, max 6, std 0.4)
Average confidence: 3.8
ICLR 2024

BOWLL: A DECEPTIVELY SIMPLE OPEN WORLD LIFELONG LEARNER

Submitted: 2023-09-18, Updated: 2024-02-11
TL;DR

we introduce the first cohesive baseline for lifelong learning in an open world setting

Abstract

Keywords
Open World Learning, Lifelong Learning, Continual Learning, Active Learning, Benchmark Baseline

Reviews and Discussion

Review
Rating: 5

The manuscript delineates a simple method, termed BOWLL, devised as a baseline for the evaluation of open world lifelong learning - a conjunction of open-set recognition, active learning, and continual learning. BOWLL makes innovative use of the Batch Normalization layer - a commonly used component of neural networks - together with an Out-of-Distribution Detection module, an Active Query module, a Memory Buffer, and Pseudo Data, which endow the proposed method with competitive performance for open world lifelong learning.

Strengths

S1. The paper's significance is underscored by its motivation to facilitate future research in this domain.

S2. The BOWLL model's novelty is encapsulated in its innovative usage of the Batch Normalization layer, along with Out-of-Distribution Detection module, Active Query module, Memory Buffer, and Pseudo Data.

S3. The evaluation is comprehensive, with comparisons to baseline methods and ablation study providing a compelling demonstration of the promising performance of the BOWLL method.

S4. The clarity of the manuscript enhances accessibility for readers, facilitating a straightforward understanding of the proposed approach.

Weaknesses

W1. Although the paper provides a comprehensive explanation of the methodology, further technical insights regarding the implementation and each module within the BOWLL method would be beneficial.

W2. The paper falls short in providing a detailed analysis of the limitations of the proposed BOWLL, a factor which could be significant for future research and practical applications.

W3. The manuscript does not discuss the computational complexity of the BOWLL algorithm, especially of its selection and replacement strategies, which could be a concern for large-scale datasets or practical applications.

W4. Although the paper employs sound methodology and achieves competitive performance, further efforts regarding technical innovation and methodological novelty would be beneficial.

W5. The manuscript could delve deeper into the Continual Train Step, a factor which could be pivotal for understanding the pipeline of open-world lifelong learning.

W6. A more detailed exposition of the datasets used in the evaluation, including their characteristics and potential biases, would enrich the manuscript.

Questions

C1. How does the method balance the data in the memory buffer and pseudo-images?

C2. What are the formulations of $R_{TV}(\cdot)$ and $R_{\ell_2}(\cdot)$, respectively, in Eq. (7)?

C3. What is the meaning of $\beta$ in the evaluation metric LCA?

C4. Hasn't the model already used the discarded data?

C5. What is the relationship between open-world learning and open-world lifelong learning?

Comment

We thank the reviewer for their assessment and appreciate the additional feedback on how to improve our paper. We have uploaded a revised version of our paper that already incorporates the mentioned suggestions and provides clarifications with respect to the remaining concerns and questions. For convenience, the changes and additions are highlighted in blue in the pdf. We have also posted a summary of all changes, spanning all reviewers’ feedback, at the top. In the following, we provide additional short statements and responses to the specific reviewer’s points.

W1 & W4: “further technical insights regarding the implementation and novelty”

We agree that more details were necessary. Whereas we previously only had a technical description of the OoD module (A.2), the newly added appendices A.3 and A.4 now provide the full details for the active query and Deep Inversion. With respect to novelty, we emphasize that there exists no “GDUMB equivalent” as the default in open world learning. BOWLL provides this first baseline with a single, easy-to-use common component by identifying the three-fold use of batch normalization. Both the OoD detector and the active query are novel in themselves.

W2: limitations and prospects of BOWLL

We agree that these are valuable. We now provide a brief summary in the conclusion and point to a new appendix A.10 for an extended account. In summary, we discuss the diagonal Gaussian covariance in batch-norm as an intentional choice to render the simplest possible baseline, discuss potential caveats of Deep Inversion, and highlight how BOWLL can easily be used in other scenarios without supervision.

W3: “computational complexity of BOWLL”

We now provide an intuitive walkthrough of the computational aspects of BOWLL’s components at the end of the new appendix A.4. In summary, the main critical component is Deep Inversion, not the active query step, as DI requires many updates on a synthetic image before converging. However, we also alleviate this concern by discussing the trade-off in light of our findings in figure 3, where BOWLL* (without Deep Inversion) shows only around a 3% accuracy decrease. We agree that it is helpful to expose this to prospective readers in the pdf.

W5: “delve deeper into the Continual Train Step”

In addition to the new appendix sections to provide more details for the individual components, we have revised the main body’s sections 3.3 and 3.4, where we now more precisely describe the use of the memory and the update step. This should improve clarity and remove potential prior ambiguity.

C1: “Balance data in memory buffer and pseudo-images”

We now describe that we only generate as many pseudo-images as in the memory buffer, a 1:1 balance that is technically a prospective hyper-parameter, discussed in appendix A.4 together with computational trade-offs.

C2: “Formulation of equation 7 - priors”

We now state the intuition behind the priors in the main body and provide exhaustive definitions in appendix A.4. In short, the l2 term ensures that the values of the synthetic data remain in a reasonable range, and the total variation (TV) prior ensures that pixels in a local vicinity are related (to form entities in the specific case of images), as motivated by cited prior work.
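
For reference, a common form of these image priors in DeepInversion-style synthesis (our own sketch; the exact notation and weighting in the paper’s Eq. (7) may differ) is

$$R_{\ell_2}(\hat{x}) = \|\hat{x}\|_2^2, \qquad R_{TV}(\hat{x}) = \sum_{i,j} \big( \|\hat{x}_{i+1,j} - \hat{x}_{i,j}\|_2 + \|\hat{x}_{i,j+1} - \hat{x}_{i,j}\|_2 \big),$$

where $\hat{x}$ denotes a synthesized image and $(i, j)$ indexes spatial locations.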

C3: “meaning of beta in LCA”

LCA sums over the mini-batches of data trained on so far. In essence, it assesses how quickly the model learns. For instance, $\beta = 1$ corresponds to one-shot learning. To clarify, we now provide an extended description of the definitions and interpretation of all metrics in appendix A.6.
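
For concreteness, a common definition (following, e.g., the A-GEM formulation; the paper’s exact notation may differ) is

$$\mathrm{LCA}_{\beta} = \frac{1}{\beta + 1} \sum_{b=0}^{\beta} a_b,$$

where $a_b$ is the average accuracy after the model has been trained on $b$ mini-batches, so that small $\beta$ captures how quickly the model learns from only a few updates.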

C4: “Hasn’t the model used those discarded data”

We have revised the text to more precisely point out that data that has been rejected in the OoD step is never queried by the active learner and that data that is not in the memory buffer is not included in optimization steps. As such, all data may have been passed through the model to figure out whether to include it, but only data that makes it into the memory buffer is ultimately trained on. Table 2 shows the number of data points that are being used in optimization.
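
To make this data flow explicit, here is a minimal Python sketch of the sequence described above (all helper names are hypothetical placeholders, not the paper’s actual code):

```python
from typing import Callable, Iterable, List


def open_world_step(batch: Iterable,
                    is_ood: Callable,      # hypothetical batch-norm-based OoD test
                    query: Callable,       # hypothetical active query / labelling step
                    memory: List,          # simple list standing in for the memory buffer
                    train_fn: Callable) -> None:
    """Sketch of the data flow: rejected data is never queried,
    and only memory-buffer contents ever reach the optimizer."""
    accepted = [x for x in batch if not is_ood(x)]   # 1) OoD rejection
    labelled = query(accepted)                       # 2) active query of informative samples
    memory.extend(labelled)                          # 3) memory maintenance (replacement logic omitted)
    train_fn(memory)                                 # 4) optimize only on buffer contents
```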

C5: “Relationship between open world learning and open world lifelong learning”

We realize we have not pointed this out precisely before, but open world learning only specifies that data is included or rejected, then queried/labelled, and that an incremental learning step is conducted. Often this leads to an interpretation in which newly identified data is simply concatenated to existing datasets over time (inspired by active learning). To disambiguate this from a continual learning step (where we do not store old data), we have included the term lifelong. We have included a statement in the related work section, pointing to a new appendix A.1 with prior work’s definitions.

Finally, we thank the reviewer again for the feedback and invite them to confirm that our revised manuscript has adequately included their points.

Comment

We thank the reviewer once more for their constructive feedback and the great suggestions to further improve our paper.

We have taken great care and effort to incorporate all reviewers’ points in the form of both summarized answers and tangible updates to the actual paper pdf.

We believe this new revised paper version includes required clarifications and addresses the reviewer’s prior concerns.

As the discussion phase is about to end, we would appreciate an acknowledgement of our efforts and will be happy to answer any additional questions, if necessary.

Review
Rating: 5

This paper addresses open-world continual learning, an emerging research area, and suggests leveraging Bayesian Network statistics to enhance various phases of open-world learning.

Strengths

  1. Applying BN statistics for OOD detection, active learning, and continual learning is a novel and unified approach.
  2. The significance of the problem is notable.
  3. It surpasses a strong baseline, GDUMB.

Weaknesses

I believe this paper might overstate its contributions for the following reasons:

  1. It seems to focus solely on the class-incremental learning scenario in continual learning, despite claiming to address various types of continual learning settings. How about, for example, Task-incremental learning [1]?

  2. The paper claims that BOWLL can achieve OOD detection, active learning, and continual learning, but I only see a comparison in final and LCA performance in table 2. This falls short of adequately demonstrating the model's superiority in all three objectives.

[1] Ke et al., “Continual learning of a mixed sequence of similar and dissimilar tasks,” NeurIPS 2020.

Questions

See above

Comment

We thank the reviewer for their assessment and appreciate the additional feedback on how to improve our paper. We have uploaded a revised version of our paper that already incorporates the mentioned suggestions and provides clarifications with respect to the remaining concerns and questions. For convenience, the changes and additions are highlighted in blue in the pdf. We have also posted a summary of all changes, spanning all reviewers’ feedback, at the top. In the following, we provide additional short statements and responses to the specific reviewer’s points.

“Overstated contributions: paper focuses only on class-incremental learning in continual learning”

We believe there is a misconception of what the paper is doing and trying to achieve, but we have taken the point as feedback to improve and have extended the writing to avoid this in the future. To first clarify, the paper is not tackling a purely continual learning problem. In fact, the experiments follow a logical sequence (see next point) to step by step lead the reader to open world learning, where the data does not only change over time, but may also include irrelevant or corrupt data. Open world learning is a very challenging, realistic scenario that requires the model to predict whether a data point should be rejected/accepted, how informative it is, and how to continue training. Traditional continual learning only focuses on the last aspect. We are not investigating task-incremental learning because it assumes one knows “which task to predict” during inference through a provided label. This is a step back in the opposite direction of open world learning, where we don’t even assume that the data must be related at all to any of the tasks. As mentioned, we have however taken the feedback to improve the paper and now include: a) a detailed introduction and definition of open world learning in appendix A.1; b) an improved experimental setup section to clarify our experiments (see also below); c) a more detailed description of the data and training details in appendix A.7; d) extended motivation and discussion of the active query’s formulation in appendix A.3 (active query).

“Paper claims can achieve OOD detection, active learning, continual learning but doesn’t adequately demonstrate the model’s superiority”

We believe the paper shows this already, as the experiments are meant to showcase the individual components in a cascade of insights, first on learning speed, then on forgetting, and ultimately in the full open world learning experiment (third experiment). In particular, the last, realistic scenario without curated data shows that BOWLL is the only method to succeed, making it a very reasonable future baseline. However, we have taken the reviewer’s feedback to improve presentation and more rigorously highlight the individual aspects. Primarily we have improved the paper in the following ways:

  • Each experimental subsection has been reworded to be more precise on the experimental take-away and now clearly states which component is supported by the found experimental evidence. In short:
    1. Table 2 focuses on highlighting that BOWLL achieves the same performance in the particular domain incremental setting as GDUMB, but does so with less data and massively faster learning speed (LCA) - highlighting the “active” part.
    2. Figures 2 and 3 support that BOWLL features less catastrophic forgetting through the way it maintains its memory - highlighting the “continual” part.
    3. Figure 4 highlights how BOWLL’s OoD detector is crucial for performing meaningfully in the open scenario, which actually contains non-curated data.
  • Based also on reviewer BmgJ’s feedback, we have included two additional baselines in the second and third experiment: experience replay (which visits all tasks’ data first and then keeps a memory as it proceeds) and Softmax-based OoD detection, to highlight that BOWLL is a very meaningful baseline beyond GDUMB alone. The discussion and plots are updated respectively.

Comment: “Bayesian Network Statistics”

We are unsure if there may be a misunderstanding on our side regarding the reviewer’s remark “suggests leveraging Bayesian Network statistics”, so we would like to clarify that BN in our case refers to batch normalization. While batch normalization may certainly be framed in a Bayesian network context and may even be sampled from, we want to emphasize that BOWLL is a simple, yet highly performant baseline that does not require any framing as, or use of, Bayesian neural networks. Any non-Bayesian neural network that contains batch normalization is convertible into a BOWLL learner, which makes it a meaningful baseline for open world learning.
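
To illustrate that no Bayesian machinery is involved, the following is a minimal PyTorch sketch of how activations could be scored against the stored batch-norm running statistics (our own simplified reading of the idea, not the paper’s exact OoD score):

```python
import torch
import torch.nn as nn


@torch.no_grad()
def bn_deviation_score(model: nn.Module, x: torch.Tensor) -> float:
    """Rough OoD score: compare the incoming batch's per-channel activation
    statistics to every BatchNorm2d layer's stored running mean/variance.
    Higher values indicate stronger deviation from previously seen data."""
    deviations = []

    def hook(module, inputs, _output):
        h = inputs[0]
        batch_mean = h.mean(dim=(0, 2, 3))
        # Normalized distance to the stored (diagonal Gaussian) statistics.
        z = (batch_mean - module.running_mean).abs() / (module.running_var + module.eps).sqrt()
        deviations.append(z.mean())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    model.eval()
    model(x)
    for handle in handles:
        handle.remove()
    return torch.stack(deviations).mean().item()
```

A simple threshold on such a score would then decide whether an incoming batch is accepted or rejected.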

Finally, we thank the reviewer again for the feedback and invite them to confirm that our revised manuscript has adequately included their points. If the reviewer has any remaining questions, we are happy to clarify them further.

Comment

We thank the reviewer once more for their constructive feedback and the great suggestions to further improve our paper.

We have taken great care and effort to incorporate all reviewers’ points in the form of both summarized answers and tangible updates to the actual paper pdf.

We believe this new revised paper version includes required clarifications and addresses the reviewer’s prior concerns.

As the discussion phase is about to end, we would appreciate an acknowledgement of our efforts and will be happy to answer any additional questions, if necessary.

Review
Rating: 5

This paper proposes a baseline method for open-world lifelong learning that relies on the idea of batch normalization. The method relies on three important components: 1) detection of out-of-distribution examples using batch-norm statistics, 2) active querying of remaining examples by using batch-norm statistics, 3) continual training using both example replay and generated pseudo-examples that also rely on information coming from batch-norm statistics. Experiments are run on three benchmark datasets and compared to strategies such as joint learning, finetuning, and GDUMB, using metrics such as backward transfer and accuracy.

Strengths

  • The paper intends to tackle the very important problem of open-world lifelong learning by exploiting a simple yet effective strategy of batch-normalization statistics. These statistics are exploited in several parts of the learning process, including discarding OOD examples, actively selecting most effective examples, and actually learning from these selected examples in a continual learning setting.
  • Experiments show that the proposed baseline is quite competitive, in particular in terms of backward transfer (Table 2)
  • The paper is well-written and easy to follow. The components of the solution are clearly explained, and the diagram in Fig. 1 is very self-explanatory.

Weaknesses

  • The main weakness that I see in the paper is the limitation of the experiments. I would have expected more robust experiments in more varied datasets, and a larger number of datasets and tasks.
  • Similarly, I would have expected more comparisons with other SOTA methods that, although not originally open-world learning, perhaps could be slightly modified for the sake of comparison.

Questions

  • Table 2 shows quite remarkably good performance of the proposed method in the case of backward transfer, which is a very challenging problem in continual learning and difficult to achieve. Could you provide more insights as to why this would be the case?

Comment

We thank the reviewer for their assessment and appreciate the additional feedback on how to improve our paper. We have uploaded a revised version of our paper that already incorporates the mentioned suggestions and provides clarifications with respect to the remaining concerns and questions. For convenience, the changes and additions are highlighted in blue in the pdf. We have also posted a summary of all changes, spanning all reviewers’ feedback, at the top. In the following, we provide additional short statements and responses to the specific reviewer’s points.

“The main weakness that I see in the paper is the limitation of the experiments”

We have provided a more detailed description of our experiments and have clarified their purpose more thoroughly in the main body and appendix. In addition, we agree that there may not exist any SOTA methods designed specifically for open world learning, but that some other methods are interesting for readers to see in this context. To this end, we have run and added two further popular baselines to our experiments: experience replay (ER) and Softmax-based OoD detection (according to the published works of Rolnick et al. and Hendrycks et al.). Whereas GDUMB actively fills a memory buffer at random and thus is the closest in spirit to BOWLL, ER is a more intuitive comparison for experiment 2’s focus on forgetting, as ER first trains on all data before maintaining a random memory buffer over time. In figure 3, we can see that this training on all data does not lead to ER outperforming BOWLL. Naturally, ER fails in the experiment with corrupted and OoD data. Here, Softmax-based OoD detection provides a baseline for the open world learning scenario of the third experiment. We further note that appendix A.8 contains an ablation study, demonstrating the meaningfulness of BOWLL’s components and showing how BOWLL performs when individual parts are stripped away.

Question: could you provide more insights in the case of backward transfer

We agree with the reviewer that observing positive backward transfer is remarkable and interesting. In general, and particularly in the mentioned scenario of a sequence of multiple digit-based datasets (table 2), we attribute the high performance to the fact that BOWLL contains an active learning query that balances data novelty with similarity (rather than, for instance, querying based on high entropy alone or doing so at random as in GDUMB). As such, new data that features some form of similarity to previously seen data is prioritized and thus carries partial information that can retrospectively improve performance on earlier data. In line with the question of reviewer ELi2, we understand that the precise benefit of the exact active query formulation may not have been fully clear in the present manuscript. We have made the wording more precise and now provide an extended discussion of how and why the active query has been conceived in appendix A.3.

Finally, we thank the reviewer again for the feedback and invite them to confirm that our revised manuscript has adequately included their points.

Comment

We thank the reviewer once more for their constructive feedback and the great suggestions to further improve our paper.

We have taken great care and effort to incorporate all reviewers’ points in the form of both summarized answers and tangible updates to the actual paper pdf.

We believe this new revised paper version includes required clarifications and addresses the reviewer’s prior concerns.

As the discussion phase is about to end, we would appreciate an acknowledgement of our efforts and will be happy to answer any additional questions, if necessary.

Review
Rating: 6

In this paper, the authors introduce the first monolithic baseline for open world lifelong learning, which remedies the lack of well-suited baselines for evaluation. Particularly, the simple batch normalization technique is repurposed for 3 subtasks in lifelong learning: open-set recognition, active learning and continual learning. Through extensive empirical evaluation, the resulting approach proves simple yet highly effective at maintaining past knowledge, selectively focusing on informative data, and accelerating future learning. The proposed method also compares favorably to other related baselines.

Strengths

  • A simple and reliable baseline is always valuable, especially for the less-studied open world lifelong learning area. The method seems competitive on the benchmarked datasets.
  • The unified use of the batch norm statistics for the 3 components in lifelong learning is interesting and promising. The ablation in Table 3 of the appendix is nice, indicating the involved components are indispensable.

Weaknesses

  • One main concern of this paper is the missing analysis for some components of the proposed lifelong learner (see questions below).

Questions

  • The image synthesis method based on Deep Inversion seems interesting. All it does is generate class-conditioned pseudo-images using past representations (the running mean and variance from the batch normalization layers). How much cost will such image synthesis incur? How faithful are the generated images? Why not opt for feature synthesis, which seems natural and efficient given the maintained feature mean and variance?
  • For active query, the acquisition function is designed using entropy weighted with sample similarity. How important is such weighting? Is this the best way to strike a good tradeoff between exploration and similarity? Any other formulations for ablation/comparison?

Comment

We thank the reviewer for their assessment and appreciate the additional feedback on how to improve our paper. We have uploaded a revised version of our paper that already incorporates the mentioned suggestions and provides clarifications with respect to the remaining concerns and questions. For convenience, the changes and additions are highlighted in blue in the pdf. We have also posted a summary of all changes, spanning all reviewers’ feedback, at the top. In the following, we provide additional short statements and responses to the specific reviewer’s points.

“How faithful are the generated images of Deep Inversion (a); Why not opt for feature synthesis, which seems natural and efficient given the maintained feature mean and variance (b). ”

a) We remark that the generated images do not necessarily need to be faithful in the sense of the original data, as the intent is only to mitigate catastrophic forgetting, not to train on them from scratch or to reconstruct natural images. However, the included L2 and TV priors (now explained better in the main body and in much more detail in appendix A.4) regulate the image values to stay within a meaningful range and ensure that pixels in a local vicinity retain similarity, in order to ensure that entities are formed rather than just producing any (adversarial) noise that fits the statistics.
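
For readers unfamiliar with Deep Inversion, the following is a condensed PyTorch sketch of such a synthesis objective (batch-norm statistics matching plus the L2 and TV priors; the weights and exact terms are illustrative, not the paper’s settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def deep_inversion_loss(model, x_synth, targets, bn_w=1.0, tv_w=1e-4, l2_w=1e-5):
    """Sketch of a DeepInversion-style objective: classify the synthetic batch
    x_synth (NCHW) as the desired targets while matching the batch-norm running
    statistics and applying the TV / L2 image priors discussed above."""
    bn_losses = []

    def hook(module, inputs, _output):
        h = inputs[0]
        mean = h.mean(dim=(0, 2, 3))
        var = h.var(dim=(0, 2, 3), unbiased=False)
        bn_losses.append(((mean - module.running_mean) ** 2).sum()
                         + ((var - module.running_var) ** 2).sum())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    logits = model(x_synth)
    for handle in handles:
        handle.remove()

    ce = F.cross_entropy(logits, targets)
    tv = (x_synth[:, :, 1:, :] - x_synth[:, :, :-1, :]).abs().mean() \
         + (x_synth[:, :, :, 1:] - x_synth[:, :, :, :-1]).abs().mean()
    l2 = x_synth.pow(2).mean()
    return ce + bn_w * torch.stack(bn_losses).sum() + tv_w * tv + l2_w * l2
```

In the actual procedure, the pseudo-image batch would be initialized from noise, flagged with requires_grad, and optimized against this loss for many steps, which is also where the synthesis cost the reviewer asks about originates.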

b) We agree that feature synthesis is a highly intriguing direction, and have included a short discussion and motivation for our choice in the new appendix A.4. In essence, we believe that a strong baseline needs to be both performant and simple to implement. Using synthesized data points from Deep Inversion matches this requirement, as it allows us to straightforwardly interleave these instances with the real memory buffer, populated by the active learner, during optimization. It is thus trivial to code and employ in any system that already contains batch-norm, i.e., the majority of current NNs. We do, however, agree that feature synthesis is intriguing and now explicitly point to it, referring also to works such as Pellegrini et al. (IROS 2020) that have already demonstrated the efficacy of “latent replay” techniques. (We are happy to include more references here, if the reviewer has explicit pointers.) We are excited for future work to pick up on this direction and showcase how such a more involved approach could beat the current BOWLL baseline, in a similar spirit to how future work may extend the use of the diagonal covariance towards full measures.

“For the active query, … is this the best way to strike a good tradeoff between exploration and similarity?”

Yes indeed, the reviewer’s intuition is correct and the design is about the tradeoff between exploration and similarity. We noticed that we could perhaps highlight this aspect more prominently and are now doing so in the main body, pointing further to a new appendix section A.3. Similar to the extended description of the conception of the OoD detector in A.2, we now describe the conception of the active query in much more detail. Here, we also point out the failure mode of excluding the similarity term. That is, informativeness gauged by the entropy term alone might yield highly novel data that is unrelated to the task or undesired (e.g. very noisy, perturbed, corrupted or fully unrelated data). The memory replacement step follows a similar logic. At the end, we now also note that there could technically be a more elaborate weighting of the exploration and similarity terms, but that we did not introduce respective weights to analyze this option further.
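
As a purely illustrative sketch of such an entropy-weighted-by-similarity trade-off (the names and the specific weighting below are our own and not the paper’s exact acquisition function):

```python
import torch
import torch.nn.functional as F


def acquisition_score(logits: torch.Tensor, bn_deviation: torch.Tensor) -> torch.Tensor:
    """Illustrative per-sample acquisition score: predictive entropy
    (exploration / informativeness) weighted down by the distance to the
    stored batch-norm statistics (similarity to already known data).
    logits: [N, num_classes]; bn_deviation: [N], larger = less similar."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    similarity = torch.exp(-bn_deviation)  # close to 1 for familiar-looking data
    return entropy * similarity            # high for samples that are informative yet related
```

Samples with the highest score would then be queried for labels, and, as noted above, a memory-replacement rule can reuse the same kind of quantity.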

Finally, we thank the reviewer again for the feedback and the already highly positive perception of our work. Nevertheless, we invite them to read our overall additions to the manuscript and to confirm that our revised manuscript has adequately included the above points.

Comment

We sincerely thank all reviewers for their valuable suggestions and pointers on how to improve our paper. As already outlined in the individual responses, we have taken all reviewers’ feedback into account and have uploaded an updated pdf. For convenience, we have highlighted all changes and suggested additions in blue and have concatenated the appendix to the main pdf for now. We will remove the color and separate the pdfs again for the camera ready version. Once more, we express our gratitude to all reviewers for helping us with these further paper improvements.

We briefly summarize all updates in the following:

  • More concise BOWLL overview: Although the reviewers have already complimented the readability of the manuscript, we have reworded parts of sections 3.1 (overview of BOWLL) and section 3.4 (Summary of all BOWLL components) to be even more clear and concise.
  • Additional details on active query design: we now explicitly motivate and mention the trade-off of exploration and similarity in the design of our active query. In addition to clarification in the main body, we now provide an extended description of the equation’s conception in the new appendix section A.3 and motivate its composition. This section now complements the previously existing, similar appendix A.2 for the OoD score.
  • More precise formulation of individual experimental take-aways: We have reworded parts of our experimental discussion to more explicitly highlight the individual experiment’s role in providing empirical evidence for each of the three perspectives (OoD, active, continual) in BOWLL. The texts are now more precise and each subsection now ends on a specific take-away message, as supported by the quantitative findings.
  • Extended description of experimental setup: the start of section 4 now describes dataset sequences and baselines more precisely. In addition, we now provide a better intuition for the employed metrics and have included detailed mathematical definitions in a new appendix section A.6. The existing account of dataset and training details has now also been extended in appendix A.7.
  • Two more experimental baselines: In addition to the comparison with the prior baseline of GDUMB, we have now also added experiments with experience replay and softmax-based OoD detection. We agree that these help showcase how BOWLL excels as a baseline in all three dimensions, and that these common baselines provide the reader with more intuition, despite not being open world learning focused. For instance, we show in figure 3 that experience replay performs better than GDUMB and similarly to BOWLL in terms of forgetting, yet at the cost of data and learning speed (as it first trains on all data before constructing the memory), and expectedly fails in the open world scenarios of figure 4. Figures 3 and 4 have been updated respectively.
  • Auxiliary discussion and definition of open world lifelong learning: In addition to the main body’s summary, we are now pointing to a new appendix section A.1 that provides a more detailed account of what constitutes open world learning and why the term lifelong has been explicitly added to the term in our work.
  • Mathematical specification of Deep Inversion priors: We provide mathematical definitions for the employed data modality priors (total variation and l2-norm) in Deep Inversion in appendix A.4, as derived in the respectively cited prior works.
  • Extra discussion on the rationale behind employing data replay instead of feature rehearsal: We discuss and cite the idea of employing feature rehearsal, rather than pure data rehearsal in appendix A.4 for the Deep Inversion module. In summary, we employ data rehearsal as BOWLL is meant as a simple, yet highly functional, baseline that allows easy interleaving of data during training. However, we now explicitly acknowledge the potential conception of other future variants.
  • New discussion of involved computation: We discuss the involved computation in appendix A.4. As Deep Inversion is the only component with heavy compute involved, the respective discussion particularly delves into the cost of the additional few percentage points gained by Deep Inversion, in relation to the accuracy trade-off already ablated in figure 3 (refer to BOWLL*, which removes Deep Inversion).
  • Inclusion of limitations and prospects: We now briefly summarize limitations and prospects in the conclusion and provide a more detailed discussion of these points in the new appendix section A.10. In summary, they relate to the intentional limitation to the diagonal covariance of conventional batch-norm to retain the baseline’s simple character, the discussed caveats/trade-offs of employing Deep Inversion, and potential applications in scenarios without supervision.

We would appreciate a brief update from the reviewers confirming that the above changes have incorporated their feedback.

AC Meta-Review

(a) Summarize the scientific claims and findings of the paper based on your own reading and characterizations from the reviewers.

  • The authors study the problem of "open-world lifelong learning," defined as combining three broad CL challenges: 1) novel class detection, 2) active querying, 3) knowledge consolidation
  • The main contribution is a new baseline method for the problem referred to as BOWLL. BOWLL combines OOD detection from unlabelled data, active learning, and continual re-training. BOWLL relies on the statistics of batch norm layers to quantify and inform its different modules (e.g., to derive uncertainty for the acquisition function used in active learning).

(b) What are the strengths of the paper?

  • The described setting is rich and enables the evaluation of its different components separately or in combination. This likely provides ideas that others can build upon.
  • BOWLL achieves its different goals while remaining relatively straightforward (its three modules all use statistics of batch norm).

(c) What are the weaknesses of the paper? What might be missing in the submission?

  • The proposed studies are limited, given the scope of the setting and the different components of the baseline
  • Some of the claims are difficult to verify given the very synthetic nature of the evaluation (Split CIFAR-10). For example, it is unclear if the findings about OOD and active learning carry over to more challenging settings.

Why not a higher score

As noted by the authors, the reviewers did not offer much feedback to the authors after their initial review. The authors participated fully in the process and provided detailed replies, justifications, and an improved version of their work. Three of the four reviewers participated in our private discussion and considered the authors' responses. Reviewer ELi2 recognized some of the criticism raised by the other reviewers and ended up lowering their overall rating.

This paper is very much a borderline one. On the one hand, the authors study a complete and challenging problem space for supervised continual learning, one that requires a combination of entirely different learning-based modules. In this context, the authors propose a straightforward baseline approach to enable the community to build upon their insights further. On the other hand, novel settings offer opportunities to study challenges methodically, and such analysis is largely absent from the paper. Further, BOWLL is one possible instantiation of a method for the setting. Still, it also makes several arbitrary decisions (as all methods do, of course) and would benefit from being further explored (e.g., through a larger empirical study across different datasets/environments).

To be clear, I do not argue for additional "sota" methods (I am not sure what that refers to in this context) but rather a more thorough exploration and evaluation of the proposed setting and baseline.

Overall, while the authors provided an improved version of their work, I still find the paper would benefit from more thorough studies. I am sorry that I cannot recommend acceptance at this stage.

Why not a lower score

N/A

Final Decision

Reject