PaperHub
Rating: 5.5 / 10 · Decision: Rejected · 4 reviewers
Individual ratings: 5, 5, 6, 6 (min 5, max 6, std 0.5)
Confidence: 3.8 · Correctness: 3.3 · Contribution: 2.3 · Presentation: 2.3
ICLR 2025

Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models

OpenReview · PDF
Submitted: 2024-09-27 · Updated: 2025-02-05
TL;DR

We propose a safety-centric framework that uses data-dependent constraints to ensure zero-forgetting in the iterative model development process.

Abstract

Keywords

Model Developmental Safety · Continual Learning · Vision-Language Models · Constrained Optimization

Reviews and Discussion

Official Review
Rating: 5

The paper formulates the safe multi-stage development problem using a comprehensive mathematical framework, offering a detailed analysis of its application to CLIP with a theoretically derived, task-dependent head. The authors propose an efficient constrained optimisation algorithm, which is empirically validated through extensive experiments.

Strengths

  • Introducing the concept of model developmental safety is highly valuable, particularly in the context of large language models (LLMs), where the continual development often strains prior safety and alignment constraints. This concept is timely and impactful.
  • The paper provides a robust guarantee for model developmental safety (MDS) of CLIP, underpinned by a detailed convergence analysis.
  • Leveraging theoretical insights, the authors apply LoRA-based, task-dependent heads to effectively reduce the value of $\delta$, with empirical validation provided in Appendix A.5.3.
  • The proposed method demonstrates impressive performance improvements over baselines, notably in terms of the safety ratio, showcasing its effectiveness and robustness.

Weaknesses

Applying the model developmental safety (MDS) framework to vision-language models like CLIP for image classification is an interesting approach; however, it may not fully showcase the safety-critical nature of MDS, since image classification with CLIP carries relatively low safety risk, especially compared with applications involving the safety of large language models (LLMs).

Because of this, it is challenging to distinguish this work from conventional Continual Learning (CL) approaches, despite the explanations in the related work section. To clarify the unique contribution, it could be beneficial to either emphasise scenarios where safety risks in CLIP are more evident or explore a more safety-critical application domain. For instance, focusing on multiple cycles of model development within LLMs, which frequently involve fine-tuning and urgently require safety and alignment guarantees, may better align with MDS objectives and make the safety focus more explicit and practical.

Questions

  • In Eq. (2), the concept of DevSafety seems to be defined as the worst-case performance drop of protected tasks. Could you please elaborate on how this definition differs from similar metrics, such as the forgetting measure commonly used in continual learning? Or is the primary aim of DevSafety indeed to achieve zero forgetting?
  • The continual learning (CL) baselines included seem somewhat dated, with the most recent stemming from 2018 (Castro et al., 2018). It would strengthen the paper’s claims to compare the proposed method with more recent baselines mentioned in the related work.
  • For clarity, it might be helpful to define $s(\mathbf{x}; \mathbf{w})$ at its first mention (L158), rather than waiting until L179, as this could enhance readability and comprehension for readers.
  • Providing a brief discussion of the limitations and potential directions for future work would be valuable, helping readers understand the broader impact and next steps for this research.
  • The literature review on Safe Reinforcement Learning (SafeRL) at Line 134, while informative, may fall slightly outside the main scope of this article. You might consider clarifying its relevance to the paper’s focus, or potentially removing this section to maintain a more concise scope.

Francisco M Castro, Manuel J Marín-Jiménez, Nicolás Guil, Cordelia Schmid, and Karteek Alahari. End-to-end incremental learning. In Proceedings of the European conference on computer vision (ECCV), pp. 233–248, 2018.

Comment

We thank the reviewer for acknowledging the value of our work and providing helpful comments. Below we would like to answer the remaining questions.

RQ1: It’s challenging to distinguish this work from conventional Continual Learning (CL) approaches.

A: Our proposed framework differs from conventional continual learning in two respects. (a) Learning setting: in typical continual learning settings, models are trained to learn new tasks with limited access to previously seen data. In contrast, our framework can be used either to learn new tasks or to improve existing tasks, with a sufficient amount of old data available to ensure zero-forgetting on protected tasks. (b) Goal: the goal of continual learning is good average performance across all learned tasks, whereas our work prioritizes strictly preserving the protected tasks while improving the target tasks. Therefore, our work differs substantially from continual learning.

RQ2: Image classification in CLIP carries relatively low safety risk. It could be beneficial to explore a more safety-critical application domain.

A: Note that experiments in our paper, such as weather detection and scene recognition, are directly related to safety-critical autonomous driving systems. In these scenarios, enhancing the detection of one type of weather at the expense of reduced performance on other types could pose significant safety risks. Moreover, our proposed retention-centric optimization framework is generic; the CLIP model with a classification task is only a demonstration, and the framework can be easily extended to other losses or other models, such as a supervised fine-tuning loss for LLMs or a standard cross-entropy loss for learning a lightweight model. We hope our work can inspire researchers in safety-critical application domains to explore this further.

RQ3: How does the definition of DevSafety differ from similar metrics, such as the forgetting measure commonly used in continual learning?

A: As pointed out by the reviewer, DevSafety is defined as the worst case over all protected tasks, so it measures whether every protected task is strictly preserved. In contrast, the forgetting measure commonly used in continual learning is defined as the average performance drop over protected tasks. Note that an unchanged average does not mean that each individual protected task's performance is preserved; for example, some protected tasks may improve while others degrade. Due to the intrinsic nature of each protected task, such as when each task corresponds to detecting one kind of disease in medical diagnosis, this may lead to potentially unsafe deployment even when the average performance does not drop. So the primary aim of DevSafety is indeed to achieve zero forgetting for safety-critical applications.
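To make the contrast concrete, below is a minimal sketch (our own illustration with hypothetical accuracy numbers, using accuracy as the performance measure) of the worst-case quantity behind DevSafety versus an average forgetting measure:

```python
# Hypothetical accuracies of the old and new models on three protected tasks.
acc_old = {"clear": 0.92, "rainy": 0.88, "snowy": 0.85}
acc_new = {"clear": 0.95, "rainy": 0.90, "snowy": 0.80}  # "snowy" degrades

# Average forgetting (continual-learning style): mean change over protected tasks.
avg_change = sum(acc_new[t] - acc_old[t] for t in acc_old) / len(acc_old)

# Worst-case change over protected tasks, in the spirit of DevSafety.
worst_change = min(acc_new[t] - acc_old[t] for t in acc_old)

print(f"average change:    {avg_change:+.3f}")    # +0.000: average looks preserved
print(f"worst-case change: {worst_change:+.3f}")  # -0.050: one protected task dropped
```

In this example the average is unchanged, yet one protected task has degraded; zero forgetting in the MDS sense requires the worst-case quantity to be non-negative, not just the average.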

RQ4: It would strengthen the paper’s claims to compare the proposed method with more recent baselines mentioned in the related work.

A: Thank you for your suggestion. We are working on adding a recent replay-based contrastive continual learning baseline [1]; however, given the limited time, the experiments are still in progress. We would like to emphasize that all the baselines we compared are continual learning baselines, with FLYP representing direct-replay methods, GEM serving as a typical continual learning method, and WCCL and RM as regularization-based continual learning baselines tailored to our setting. We expect that continual learning cannot ensure model developmental safety, as these methods focus on trading off between protected tasks and target tasks.

[1] Cha, Hyuntak, Jaeho Lee, and Jinwoo Shin. "Co2l: Contrastive continual learning." Proceedings of the IEEE/CVF International conference on computer vision. 2021.

RQ5: Suggestions about the definition of s(x;w), discussion of future directions, literature review on Safe Reinforcement Learning (SafeRL)

A: We thank the reviewer for the helpful suggestions for improving the paper. We have revised the writing accordingly. We would like to mention that the reason for deferring the definition of s(x;w) to line 179 is that the definition there is the special form for CLIP models, so we present the explicit form of s(x;w) after introducing CLIP models to avoid confusion. As suggested, we have also included more discussion of future directions of this work. Please refer to the new version of the paper.

Comment

Thank you to the authors for their efforts in improving the paper. However, some of my concerns, particularly regarding RQ1, RQ2, and RQ4, remain insufficiently addressed. As such, I regret that I am unable to raise my score at this time.

Comment

Q: Concerns regarding RQ1 and RQ4 remain insufficiently addressed.

A: We thank the reviewer for the prompt response. Regarding RQ4, we have finished the experiments with a recent replay-based baseline, namely Co$^2$L [1]. Following their paper, we tune their $\tau$ in {0.05, 0.1}, $\kappa$ in {0.1, 0.2}, $\kappa^*$ in {0.01, 0.1}, and $\lambda$ in {0.1, 1, 10}. The results are presented below. We can see that even this recent SOTA continual learning method still fails to ensure model developmental safety, as indicated by a zero Retention Ratio and a DevSafety measure below zero. This is anticipated, as conventional continual learning focuses on trading off between protected tasks and target tasks without ensuring zero-forgetting. We have included the results in the revision. The experiments further highlight the distinction between conventional continual learning and our approach, demonstrating that existing continual learning methods cannot achieve the model developmental safety explored in this paper, as related to RQ1.

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Tunnel | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.1407 (0.0043) | 0.00% // -0.1252 (0.0061) | 0.00% // -0.0821 (0.0029) | 0.00% // -0.0479 (0.0039) |
| | Target Tunnel | 0.6808 (0.0460) | 0.8936 (0.0626) | 0.8936 (0.0301) | 0.8723 (0.0000) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.1021 (0.0022) | 0.00% // -0.0969 (0.0036) | 0.00% // -0.0955 (0.0057) | 0.00% // -0.0897 (0.0068) |
| | Target Tunnel | 0.9574 (0.0233) | 0.8894 (0.0340) | 0.8808 (0.0170) | 0.8681 (0.0085) |
| Ours | Retention Ratio // DevSafety | 40.00% // -0.0050 (0.0076) | 60.00% // -0.0001 (0.0043) | 100.00% // 0.0105 (0.0053) | 100.00% // 0.0186 (0.0058) |
| | Target Tunnel | 0.9362 (0.0699) | 0.8723 (0.0233) | 0.9106 (0.0159) | 0.8723 (0.0233) |

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Foggy | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.0686 (0.0064) | 0.00% // -0.1217 (0.0383) | 0.00% // -0.1305 (0.0183) | 0.00% // -0.0721 (0.0154) |
| | Target Foggy | 0.7132 (0.0109) | 0.6047 (0.0380) | 0.6357 (0.0110) | 0.6357 (0.0290) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.0418 (0.0062) | 0.00% // -0.0173 (0.0054) | 0.00% // -0.0159 (0.0034) | 20.00% // -0.0124 (0.0091) |
| | Target Foggy | 0.5674 (0.0378) | 0.5023 (0.0186) | 0.4419 (0.0658) | 0.2279 (0.0174) |
| Ours | Retention Ratio // DevSafety | 0.00% // -0.0241 (0.0082) | 60.00% // -0.0009 (0.0044) | 100.00% // 0.0044 (0.0033) | 100.00% // 0.0061 (0.0047) |
| | Target Foggy | 0.5721 (0.0406) | 0.4930 (0.0174) | 0.4326 (0.0186) | 0.4279 (0.0316) |

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Overcast | 0.7361 (0.0000) | 0.7361 (0.0000) | 0.7361 (0.0000) | 0.7361 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.0138 (0.0099) | 0.00% // -0.0072 (0.0032) | 0.00% // -0.0095 (0.0043) | 0.00% // -0.0137 (0.0052) |
| | Target Tunnel | 0.5916 (0.0417) | 0.8369 (0.0049) | 0.8396 (0.0055) | 0.8507 (0.0172) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.2932 (0.0365) | 0.00% // -0.3016 (0.0228) | 0.00% // -0.2444 (0.0120) | 0.00% // -0.2634 (0.0105) |
| | Target Overcast | 0.9787 (0.0050) | 0.9730 (0.0028) | 0.9588 (0.0041) | 0.9647 (0.0023) |
| Ours | Retention Ratio // DevSafety | 0.00% // -0.0655 (0.0249) | 20.00% // -0.0043 (0.0037) | 60.00% // 0.0012 (0.0029) | 100.00% // 0.0046 (0.0016) |
| | Target Overcast | 0.8789 (0.0464) | 0.7827 (0.0225) | 0.7562 (0.0167) | 0.7525 (0.0366) |

[1] Cha, Hyuntak, Jaeho Lee, and Jinwoo Shin. "Co2l: Contrastive continual learning." Proceedings of the IEEE/CVF International conference on computer vision. 2021.

Comment

Dear Reviewer,

Thank you for the valuable time you have dedicated to reviewing our paper. We have included additional experimental results with a recent replay-based baseline to further validate the effectiveness of our method. As the author-reviewer discussion phase is coming to a close, we would greatly appreciate it if you could let us know whether they address your concerns or if further clarification is needed.

Comment

Apologies for the delayed response. I have carefully reviewed your further reply and the additional experiments addressing RQ4 and RQ1. Regarding RQ2, I acknowledge the potential of this work to extend to other domains, as elaborated in the theoretical analysis. However, I feel there remains a gap between a “conceptually extendable and inspiring” work and one that is fully “implemented and validated” within the target application.

Regrettably, I am unable to raise my score at this stage. That said, I firmly believe this work holds significant promise and would merit a much higher score once it has been thoroughly implemented and validated in more safety-critical applications.

Comment

We are glad to hear that you agree our work holds significant promise.

While we believe more experiments can strengthen the paper, we would like to point out that (i) we have already conducted extensive experiments, including 5 baselines, 4 target tasks, and 2 datasets, as well as ablation studies on the proposed algorithm; and (ii) our experiments do include tasks from the safety-critical application of autonomous driving.

Official Review
Rating: 5

The paper proposes model developmental safety to argue for the importance of handling catastrophic forgetting with a constrained optimization framework. The proposed method is evaluated on ensuring the safe development of CLIP models. The experiments cover datasets ranging from self-driving to scene classification.

Strengths

  1. The theoretical analysis of the framework in Section 5 is sound and comprehensive.
  2. The evaluation of ensuring CLIP models' continual development is fair.
  3. The visualization of the learning trajectories is well-presented.

Weaknesses

  1. Although the authors tried to address the term ambiguity in Section 2 (with AI Safety), the use of the terms "safety" / "safety-centric" in this paper is often overstated because it doesn’t engage with the broader ethical and operational safety considerations commonly associated with the term. Even further, in Line 132, the paper writes "safety of safety", which is not rigorously explained. In fact, the paper focuses on designing constraints to preserve task performance, which, while essential, diverges from widely understood safety principles in deep learning models. An alternative term such as "developmental stability", "continual stability", or "capability preservation" could more clearly represent the framework's intentions without overloading the term "safety".

  2. The empirical evaluation is lacking. Note that the Vision Language Model (VLM) is a general category of foundation models, and CLIP is only one example of this category. Other representative variants such as LLaVA and BLIP that can generate languages are not evaluated in the paper. In the abstract, the paper writes that "...we study how to develop a pretrained vision-language model (aka the CLIP model)...", which may mislead future readers since "aka" is wrongly used here.

  3. Considering the paper's motivation to ensure stable continual development without harming protected capabilities, it largely overlaps with the task of knowledge/representation editing [1,2,3,4,5,6] on VLM/LLM. However, few pieces of related literature are discussed in the paper. Authors may consider discussing the main advantages of their proposed framework regarding this existing line of research.

[1] Mass-Editing Memory in a Transformer. arXiv:2210.07229.

[2] EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models. arXiv:2308.07269

[3] KEBench: A Benchmark on Knowledge Editing for Large Vision-Language Models. arXiv:2403.07350.

[4] Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405.

[5] PaCE: Parsimonious Concept Engineering for Large Language Models. arXiv:2406.04331.

[6] Reducing Hallucinations in Vision-Language Models via Latent Space Steering. arXiv.2410.15778.

Questions

Please address my concerns stated in the weakness section. Also, please revise or re-consider all uses of "aka" in the paper (e.g., Line 29, Line 124) as they may lead to unnecessary confusion.

Comment

Thank you for your constructive comments. We've revised our paper to address raised concerns.

RQ1: The use of the terms "safety" / "safety-centric" in this paper is confusing because it doesn’t engage with the broader ethical and operational safety considerations commonly associated with the term.

A: We thank the reviewer for the constructive comments. We agree that some usages of the term "safety" in the paper might be confusing, such as "safety-centric method" and "safety of safety". To address the ambiguity, we have replaced "safety-centric method" with "retention-centric method", "safety of safety" with "retention of safety", and "safety ratio" with "retention ratio", since our work, as summarized by the reviewer, focuses on designing constraints to retain protected task performance. However, we prefer to keep the term "model developmental safety", as our work underscores the importance of strictly preserving existing protected abilities in favor of potentially safe applications and development efficiency in the model development process. Other words like "stability" and "preservation" are widely adopted in the existing continual learning literature, but those works only mitigate forgetting rather than achieve strict preservation or zero-forgetting. We believe the term "model developmental safety (MDS)" will help readers identify the difference between our work and the existing literature on the iterative model development process. Furthermore, to prevent any ambiguity around the term "model developmental safety", we define it at the beginning of the paper and have revised the paper to ensure that every mention of "safety" referring to MDS is preceded by "developmental", to clarify its meaning for readers.

RQ2: CLIP is only one kind of Vision Language Model (VLM), 'aka' is inappropriately used in the paper and other representative variants such as LLaVA and BLIP that can generate languages are not evaluated in the paper.

A: Thanks for pointing out the inappropriate use of "aka". To avoid confusion, we have revised the paper to read "… we study how to develop a pretrained vision-language model, specifically the CLIP model, …". Note that the focus of the paper is to propose a constrained optimization framework that strictly preserves existing protected capabilities while improving target task performance in the iterative model development process. To demonstrate the proposed framework, we apply it to developing a CLIP model that acquires new capabilities or improves existing capabilities in image classification. Our framework has the potential to be applied to other variants of VLMs, such as LLaVA and BLIP, with corresponding adaptations of the objective and constraint design, but as they are not the focus of this paper, we leave them for future exploration.

RQ3: The relationship between this paper and knowledge/representation editing on VLM/LLM.

A: We thank the reviewer for the constructive suggestion. Knowledge/representation editing on VLMs/LLMs is related to our work but has a different focus. Knowledge/representation editing, emerging in the era of large foundation models, aims to efficiently modify the behavior of LLMs with minimal impact on unrelated inputs [2], for example to update stale facts, eliminate unintended biases, or reduce undesired hallucinations. Whereas knowledge editing minimizes the impact on unrelated content, our proposed framework is a general framework that aims to strictly preserve protected abilities in favor of potentially safe applications and development efficiency in the iterative model development process. On the other hand, our proposed framework may be applied to address the challenge faced by knowledge/representation editing, by formulating the modified part as the objective (target task) and regarding the unrelated parts as constraints (protected tasks) to ensure zero-forgetting on the unrelated parts (i.e., complete locality in knowledge/representation editing). We have incorporated the discussion of knowledge/representation editing in the revision.

[2] EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models. arXiv:2308.07269

Official Review
Rating: 6

This paper aims to study the problem of improving model accuracy on new categories while ensuring that accuracy on fixed existing categories does not degrade. It formulates the problem as an inequality-constrained optimization problem and proposes an algorithm to solve it. Overall, I believe the paper's core innovation lies in introducing a new optimization algorithm for solving non-convex constrained problems.

Strengths

1. The paper provides extensive theoretical analysis of the optimization algorithm to ensure its correctness.

2. Experiments are conducted on large-scale datasets to verify the general performance of the method.

3. The appendix effectively supplements details of the methodology and experiments.

Weaknesses

1.The article is based on the premise of safety, proposing the assumption of strictly maintaining the original model performance unchanged. However, in real-world applications, classification tasks do not always require extremely high accuracy, and some fluctuation in accuracy is acceptable in certain scenarios. Given that the classification task discussed in the article is not an extreme case, I believe that the strict maintenance assumption proposed may be overly rigid for the actual tasks accomplished by the CLIP model.

2.The abstract and introduction are somewhat misleading, as catastrophic forgetting encompasses a broad range of phenomena beyond the classification issues discussed in the paper, including the ability to recognize image content. The “protected capabilities of the old model” described in the introduction may cause ambiguity.

3.Evaluating model performance solely using the safety ratio metric is insufficient. The issue with the safety ratio metric is that, if a model update method results in an imperceptible decrease in accuracy on existing categories while significantly increasing accuracy on new categories, such a scenario might be acceptable to a certain extent. However, this situation would be rated poorly with this metric. To differentiate these cases from methods that cause significant performance declines on existing categories, it is necessary to include data on the change in recognition accuracy for existing categories after training.

4.When evaluating the ability to protect classification accuracy across multiple categories, the safety ratio is not provided, and I cannot find any points in Figure 2 that obviously exceed the DevSafety (acc) boundary of 0. This raises doubts as to whether the improvement in dressing room classification accuracy was accompanied by declines in certain other categories, making me skeptical of the authors’ conclusion that old performance remains consistent in multi-task scenarios.

Questions

Please refer to Weaknesses.

Comment

We thank the reviewer for acknowledging the contribution of our work. Below, we would like to answer the questions raised.

RQ1: Classification tasks do not always require extremely high accuracy, the strict maintenance assumption proposed may be overly rigid.

A: We politely disagree with the reviewer that "classification tasks do not always require extremely high accuracy". In autonomous driving, if the weather condition is foggy but the system identifies it as sunny, it could cause a wrong decision and may result in accidents. Similarly, in medical diagnosis, a misclassified patient may face life-threatening risks. Please also note that our constrained optimization framework is generic; it can potentially be extended to other scenarios with different loss functions or models.

RQ2: The “protected capabilities of the old model” described in the introduction may cause ambiguity, as catastrophic forgetting encompasses a broad range of phenomena beyond the classification issues discussed in the paper.

A: The introduction is intended to be general. Indeed, the "protected capabilities of the old model" refers to the general capabilities of models, not just the classification issues. As discussed in Section 3, it may also be the coding ability of LLMs or the detection ability of object detection models. The classification ability of models is just the one we measure in our experiments.

RQ3: A method may result in an imperceptible decrease in accuracy on existing categories while significantly increasing accuracy on new categories, such a scenario might be acceptable to a certain extent. It is necessary to include performance changes for protected tasks after training.

A: Our evaluation is based on the motivation of this paper, i.e., preserving the performance of protected tasks while improving that of a target task. One might argue that this is not needed in some scenarios that can tolerate some performance drop of protected tasks. However, this is not the problem we address in this paper. As we have argued in the paper, preserving the performance of protected tasks is very important in some safety-critical applications (e.g., autonomous driving, medical diagnosis). Nevertheless, we also include the DevSafety (acc) numbers for each method in Appendix A.5 of the revision and present them below for your reference; they directly show the largest decrease over all the protected tasks. We can see that the baselines usually lead to a 3-10 percent decrease when targeting Tunnel and a 1.5-7 percent decrease when targeting Foggy.

RQ4: I cannot find any points in Figure 2 that obviously exceed the DevSafety (acc) boundary of 0.

A: As long as DevSafety is larger than or equal to zero, model developmental safety is achieved. In Figure 2, as long as the points are on the vertical line or to the right of it, model developmental safety is achieved. In Figure 4, as long as the points are above the red horizontal line, model developmental safety is achieved. This is indeed the case for our method, which means we preserve the performance of the other classes while improving the dressing room classification accuracy.

Comment

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base (Ref) | Safety Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Tunnel | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) |
| FLYP | Safety Ratio // DevSafety | 0.00% // -0.0398 (0.0067) | 0.00% // -0.0660 (0.0126) | 0.00% // -0.0647 (0.0123) | 0.00% // -0.0774 (0.0069) |
| | Target Tunnel | 0.9361 (0.0330) | 0.9702 (0.0318) | 0.9915 (0.0170) | 0.9659 (0.0170) |
| WCCL | Safety Ratio // DevSafety | 0.00% // -0.0836 (0.0164) | 0.00% // -0.0756 (0.0090) | 0.00% // -0.0673 (0.0103) | 0.00% // -0.0893 (0.0089) |
| | Target Tunnel | 0.9957 (0.0085) | 0.6000 (0.1002) | 0.6553 (0.0282) | 0.6383 (0.0485) |
| GEM | Safety Ratio // DevSafety | 0.00% // -0.1019 (0.0267) | 0.00% // -0.1034 (0.0153) | 0.00% // -0.1301 (0.0169) | 0.00% // -0.0873 (0.0231) |
| | Target Tunnel | 0.8255 (0.1214) | 0.5915 (0.2020) | 0.6085 (0.0768) | 0.3915 (0.1819) |
| RM | Safety Ratio // DevSafety | 0.00% // -0.1021 (0.0022) | 0.00% // -0.0969 (0.0036) | 0.00% // -0.0955 (0.0057) | 0.00% // -0.0897 (0.0068) |
| | Target Tunnel | 0.9574 (0.0233) | 0.8894 (0.0340) | 0.8808 (0.0170) | 0.8681 (0.0085) |
| Ours | Safety Ratio // DevSafety | 40.00% // -0.0050 (0.0076) | 60.00% // -0.0001 (0.0043) | 100.00% // 0.0105 (0.0053) | 100.00% // 0.0186 (0.0058) |
| | Target Tunnel | 0.9362 (0.0699) | 0.8723 (0.0233) | 0.9106 (0.0159) | 0.8723 (0.0233) |

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base (Ref) | Safety Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Foggy | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) |
| FLYP | Safety Ratio // DevSafety | 0.00% // -0.0590 (0.0140) | 20.00% // -0.0281 (0.0167) | 0.00% // -0.0254 (0.0101) | 0.00% // -0.0201 (0.0105) |
| | Target Foggy | 0.5721 (0.0315) | 0.5209 (0.0581) | 0.5302 (0.0228) | 0.4977 (0.0186) |
| WCCL | Safety Ratio // DevSafety | 0.00% // -0.0504 (0.0123) | 0.00% // -0.0259 (0.0080) | 20.00% // -0.0141 (0.0111) | 0.00% // -0.0132 (0.0076) |
| | Target Foggy | 0.3395 (0.0865) | 0.2186 (0.0186) | 0.2093 (0.0208) | 0.2000 (0.0114) |
| GEM | Safety Ratio // DevSafety | 0.00% // -0.0695 (0.0099) | 0.00% // -0.0339 (0.0053) | 0.00% // -0.0424 (0.0060) | 0.00% // -0.0424 (0.0060) |
| | Target Foggy | 0.3349 (0.0865) | 0.2837 (0.0271) | 0.2558 (0.0000) | 0.2558 (0.0000) |
| RM | Safety Ratio // DevSafety | 0.00% // -0.0418 (0.0062) | 0.00% // -0.0173 (0.0054) | 0.00% // -0.0159 (0.0034) | 20.00% // -0.0124 (0.0091) |
| | Target Foggy | 0.5674 (0.0378) | 0.5023 (0.0186) | 0.4419 (0.0658) | 0.2279 (0.0174) |
| Ours | Safety Ratio // DevSafety | 0.00% // -0.0241 (0.0082) | 60.00% // -0.0009 (0.0044) | 100.00% // 0.0044 (0.0033) | 100.00% // 0.0061 (0.0047) |
| | Target Foggy | 0.5721 (0.0406) | 0.4930 (0.0174) | 0.4326 (0.0186) | 0.4279 (0.0316) |
Comment

Thanks for the authors' response. I have also read the comments from the other reviewers, and I tend to maintain my rating.

Official Review
Rating: 6

This paper focuses on the model deployment cycle for a learning-enabled system. The author proposes a concept called "model developmental safety" (MDS) to measure whether the learning-enabled system can strictly maintain the performance, i.e., zero forgetting, of the old tasks in safety-critical domains. The author proposes an efficient constrained optimization algorithm tailored to fine-tuning the pretrained CLIP model, which takes MDS as a data-dependent constraint and provides a statistical guarantee for achieving MDS. Experiments have been conducted on BDD100k from autonomous driving scenarios and Places365 for scene recognition.

Strengths

(1) The proposed "model developmental safety" (MDS) concept seems interesting and relevant to safety-critical applications, though many concerns remain, which will be elaborated on in the Weakness section.

(2) The proposed constrained optimization algorithm is sound for fine-tuning CLIP by retaining old data to achieve MDS; its effectiveness has also been validated by comparison with other methods.

Weaknesses

(1) The motivation and necessity of the MDS is not sound enough. First, the MDS can be viewed as a stricter version of preventing catastrophic forgetting, i.e., maintaining "zero forgetting" during continual learning. The author claimed in lines 065-069 that zero forgetting is crucial for many safety-critical applications when considering the whole deployment cycle of the learning-enabled system, which is reasonable. However, only strictly preserving the model's original performance is not enough. For instance, strictly maintaining the performance of tasks that are not good enough may not bring more benefits to improving the safety of the existing learning-enabled applications. The reviewer would suggest that the author calibrate their statement.

Moreover, other than the traditional paradigm of continual learning, there also exist other paradigms, like data engines, that consider the whole machine learning cycle [a, b, c] to achieve the safe development of the learning-based system, where [c] provides an automatic self-improved data engine for a safety-critical application, i.e., autonomous driving. Different from the present work, [c] does not need to retain old data to maintain performance; instead, it mines the vast amount of unlabeled data to increase the performance on long-tailed or new tasks while maintaining the performance of the old tasks. Moreover, [c] validates the self-improved data engine on object detection tasks, which are more challenging and safety-critical in autonomous driving than classification. The reviewer would suggest the author include some discussion of other learning paradigms, like automatic data engines, given that they have similar motivations and targeted applications.

(2) The proposed algorithm seems too restricted to the pretrained CLIP model, making it hard to evaluate the applicability of the proposed method for safety-critical real-world applications. Although the proposed method is sound for fine-tuning the pretrained CLIP to achieve MDS, the proposed constrained optimization seems too restricted, making the reviewer wonder whether it has sufficient applicability for other foundation models. The development of the algorithm mainly depends on the CLIP model and the contrastive loss, while the contrastive loss is not the only choice for training foundation models. The author may want to elaborate on how the proposed algorithm can be extended to different kinds of foundation models.

Moreover, there is a practical concern that a foundation model like CLIP may not satisfy the real-time latency requirements of safety-critical applications like autonomous driving, as a real-world intelligent system is also integrated with many different components. The author may want to show that the proposed algorithm also applies to lightweight models other than foundation models to validate the applicability of the proposed method.

(3) In the experiment, the author only considered the classification task in autonomous driving and scene recognition. However, many safety-critical applications that are more challenging and currently underperforming [d] would benefit more from MDS, e.g., 2D and 3D object detection for perception and tasks for motion prediction. The author may want to include more case studies beyond classification to show the generality of the proposed algorithm.

(4) For the comparison methods, the author only compared with GEM, proposed in 2017, while many other replay-based methods [e, f] have been proposed in recent years that achieve state-of-the-art performance in the continual learning literature. The author may want to compare with those methods.

Minor:

(1) The reviewer wonders why DevSafety is measured by 'acc', while in Equation (2), it is defined by measuring the difference of empirical loss between the new and old models.

(2) The author may want to elaborate on 'mild conditions' in lines 273-274.

(3) What is the insight of leveraging the moving average estimators in lines 290-291?

(4) Typo in line 312: 'proected' -> 'protected'

(5) In lines 461-464, how should we interpret Figure 2 that the development safety has been achieved?

Reference:

[a] NEIL: Extracting Visual Knowledge from Web Data. ICCV 2013

[b] Never-ending learning. Communications of the ACM 2018

[c] AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving. CVPR 2024

[d] End-to-end Autonomous Driving: Challenges and Frontiers. TPAMI 2024

[e] A Comprehensive Survey of Continual Learning: Theory, Method and Application. TPAMI 2024

[f] Class-Incremental Learning: A Survey. TPAMI 2024

Questions

Please refer to the Weaknesses section.

Details of Ethics Concerns

N/A.

Comment

We thank the reviewer for dedicating the time to provide a comprehensive review. Below we would like to address raised concerns and questions.

RQ1: Only strictly preserving the model's original performance is not enough. For instance, strictly maintaining the performance of tasks that are not good enough may not bring more benefits to improving the safety of the existing learning-enabled applications.

A: We agree with the reviewer! However, we emphasize that the protected tasks are specified by the user, and hence the user can choose which existing essential abilities should be protected. Thus, if one task's performance is very poor, one may choose not to include it as a protected task. In addition, our framework does not prevent the new model from becoming better than the old model on protected tasks. Indeed, our results show that in many cases the performance on protected tasks is also improved while we improve the performance of a target task. For example, in Figure 1, the performance on the protected class "partly cloudy" is also significantly improved, and that of the other protected classes is slightly improved in Round 1. Similarly, in the right panel of Figure 4 we can see that the new model also improves many protected classes.

RQ2: Other than continual learning, there exist other paradigms, like data engines, that consider the whole machine learning cycle [a, b, c] to achieve safe development of learning-based systems, such as [c]; the reviewer suggests that the authors include some discussion of this paradigm.

A: Thank you for pointing out these works! The automatic data engine paradigm is an important research direction for enhancing existing learning-enabled systems by iteratively providing self-improved data. For example, [c] mines a vast amount of unlabeled data to improve the detection of rare or unseen categories in object detection for autonomous driving systems. We have used a similar approach to mine a vast amount of unlabeled data from the internet to retrieve data related to the target task, as mentioned in line 243 and detailed in Appendix A.2. However, the model updater module of [c] does not consider how to ensure zero-forgetting on protected tasks. We have included the discussion of the automatic data engine paradigm in Appendix A.2.

RQ3: With experiments conducted on CLIP models, can the proposed algorithm be extended to other models?

A: The proposed constrained optimization framework is generic and is not tied to any particular kind of model. Although the algorithm is developed specifically for the contrastive loss as the objective, it can be easily extended to other losses; indeed, the contrastive loss can be replaced by any loss function. For example, for LLMs the objective can be a supervised fine-tuning loss of a target task, and it can also be a standard cross-entropy loss for learning a lightweight model. The key of our framework is how to handle the constraints. We hope our work can inspire researchers in safety-critical application domains to explore this further. A rough sketch of the general template is given below.
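In our notation here (not the paper's exact formulation), the framework keeps the same constrained template and only swaps the objective and constraint losses:

$$
\min_{\mathbf{w}} \; \mathcal{L}_{\mathrm{tgt}}(\mathbf{w}) \quad \text{s.t.} \quad \ell_k(\mathbf{w}) \le \ell_k(\mathbf{w}_{\mathrm{old}}), \quad k = 1, \dots, K,
$$

where $\mathcal{L}_{\mathrm{tgt}}$ may be the contrastive loss used in this paper, a supervised fine-tuning loss for an LLM, or a cross-entropy loss for a lightweight model, $\ell_k$ is the loss of the $k$-th protected task, and $\mathbf{w}_{\mathrm{old}}$ denotes the old model.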

RQ4: Only experiments with classification tasks in autonomous driving and scene recognition are included in the paper. How about other tasks, like 2D and 3D object detections for perception and tasks for motion prediction?

A: Thanks for the reviewer’s constructive comments on improving our paper. Note that the classification task is fundamental, ubiquitous, and one of the most important tasks in learning-based systems, so we take it as the demonstration of our proposed framework. Since the focus of this paper is to introduce and demonstrate the general constrained optimization framework for model developmental safety, we leave 2D and 3D object detection and motion prediction tasks for future exploration.

RQ5: The author may want to compare with other replay-based methods.

A: Thank you for the suggestion! We are working on adding a recent replay-based contrastive continual learning baseline [1]; however, given the limited time, the experiments are still in progress. We would like to emphasize that all the baselines we compared are continual learning baselines, with FLYP representing direct-replay methods, GEM serving as a typical continual learning method, and WCCL and RM as regularization-based continual learning baselines tailored to our setting. We expect that continual learning cannot ensure model developmental safety, as these methods aim to trade off between protected tasks and target tasks.

[1] Cha, Hyuntak, Jaeho Lee, and Jinwoo Shin. "Co2l: Contrastive continual learning." Proceedings of the IEEE/CVF International conference on computer vision. 2021.

Comment

RQ6: Why is DevSafety measured by 'acc', while in Equation (2), it is defined by measuring the difference of empirical loss between the new and old models.

A: The "acc" corresponds to the zero-one loss in Equation (2). We use the difference in accuracy instead of the difference in cross-entropy loss to measure DevSafety because accuracy is what people care about in practice for classification tasks.

RQ7: The author may want to elaborate on 'mild conditions' in lines 273-274.

A: As shown in Proposition 2.1 on page 97 of [2], assuming the functions $F$ and $h_k$ are continuous, the optimal solution of the penalty form converges to the optimal solution of the constrained form when the penalty parameter goes to infinity. Our analysis (Theorem 1) also states the conditions under which our algorithm for solving the penalized form with a large enough $\beta = O(1/\epsilon)$ finds an $\epsilon$-level approximate KKT point.

[2] Bertsekas, Dimitri P. Constrained optimization and Lagrange multiplier methods. Academic press, 2014.
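As a rough illustration of the penalty form referred to above (our notation, with $[\cdot]_+ = \max(\cdot, 0)$; the paper's exact penalty function may differ), the constrained problem $\min_{\mathbf{w}} F(\mathbf{w})$ s.t. $h_k(\mathbf{w}) \le 0$ is relaxed to

$$
\min_{\mathbf{w}} \; F(\mathbf{w}) + \beta \sum_{k=1}^{K} \big[h_k(\mathbf{w})\big]_+ ,
$$

whose solutions approach those of the constrained problem as $\beta \to \infty$ under the continuity conditions above; Theorem 1 makes this quantitative with $\beta = O(1/\epsilon)$.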

RQ8: What is the insight of leveraging the moving average estimators in lines 290-291?

A: This is motivated by [3], whose insight is to utilize historical information so that contrastive learning does not require a very large batch size (e.g., 32,768 for OpenAI CLIP) to achieve a satisfactory result.

[3] Yuan, Zhuoning, et al. "Provable stochastic optimization for global contrastive learning: Small batch does not harm performance." International Conference on Machine Learning. PMLR, 2022.
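For intuition, here is a minimal sketch (our own simplification of the idea in [3], not the exact algorithm or its variable names) of how a per-sample moving-average estimator of the contrastive normalization term carries information across mini-batches, so that small batches suffice:

```python
import numpy as np

def update_contrastive_stats(u, anchor_ids, sims, pos_sims, tau=0.1, gamma=0.9):
    """Update per-anchor moving averages of the contrastive-loss normalization term.

    u          : per-sample moving-average estimates, one entry per training image
    anchor_ids : dataset indices of the anchors in the current mini-batch
    sims       : [batch, batch] similarities between anchors and in-batch candidates
    pos_sims   : [batch] similarity of each anchor to its positive pair
    """
    # Mini-batch estimate of the normalization term for each anchor.
    batch_est = np.exp((sims - pos_sims[:, None]) / tau).mean(axis=1)
    # Blend with the running estimate so information from past batches is retained.
    u[anchor_ids] = gamma * u[anchor_ids] + (1.0 - gamma) * batch_est
    return u
```

The running estimates would then stand in for the within-batch normalization when forming gradients, which is why a very large batch size is not needed.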

RQ9: In lines 461-464, how should we interpret Figure 2 that the development safety has been achieved?

A: Since the x-axis represents DevSafety (acc), developmental safety is achieved when a point is located on or to the right of the vertical dotted line, i.e., DevSafety (acc) $\geq 0$. Similarly, the target is improved if the point is also located above the horizontal dotted line.

Comment

I want to thank the author for the detailed response. Most of my concerns have been addressed, and thus, I am willing to increase my score. Please let me know when the result of RQ5 is ready, and I will consider it to decide whether it is worth further increasing the final score.

I would suggest the author include the reply of RQ4 as a discussion for future work in camera-ready since it is crucial for the community to appreciate the scope of the present paper and inspire future study.

Comment

Dear Reviewer SvDs:

Please find below our new experimental results addressing RQ5, which compare our method with a recent contrastive continual learning method.

Thank you for your time!

Regards,
Authors

Comment

Thank you for your helpful suggestions. We have incorporated the discussion of RQ4 in the revision. Moreover, we have finished the experiments with a recent replay-based baseline, namely Co$^2$L [1]. Following their paper, we tune their $\tau$ in {0.05, 0.1}, $\kappa$ in {0.1, 0.2}, $\kappa^*$ in {0.01, 0.1}, and $\lambda$ in {0.1, 1, 10}. The results are presented below. We can see that even this recent SOTA continual learning method still fails to ensure model developmental safety, with a zero retention ratio across all the tasks. This is anticipated, as conventional continual learning focuses on trading off between protected tasks and target tasks without ensuring zero-forgetting. We have included the results in the revision.

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Tunnel | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) | 0.1064 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.1407 (0.0043) | 0.00% // -0.1252 (0.0061) | 0.00% // -0.0821 (0.0029) | 0.00% // -0.0479 (0.0039) |
| | Target Tunnel | 0.6808 (0.0460) | 0.8936 (0.0626) | 0.8936 (0.0301) | 0.8723 (0.0000) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.1021 (0.0022) | 0.00% // -0.0969 (0.0036) | 0.00% // -0.0955 (0.0057) | 0.00% // -0.0897 (0.0068) |
| | Target Tunnel | 0.9574 (0.0233) | 0.8894 (0.0340) | 0.8808 (0.0170) | 0.8681 (0.0085) |
| Ours | Retention Ratio // DevSafety | 40.00% // -0.0050 (0.0076) | 60.00% // -0.0001 (0.0043) | 100.00% // 0.0105 (0.0053) | 100.00% // 0.0186 (0.0058) |
| | Target Tunnel | 0.9362 (0.0699) | 0.8723 (0.0233) | 0.9106 (0.0159) | 0.8723 (0.0233) |

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Foggy | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) | 0.3953 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.0686 (0.0064) | 0.00% // -0.1217 (0.0383) | 0.00% // -0.1305 (0.0183) | 0.00% // -0.0721 (0.0154) |
| | Target Foggy | 0.7132 (0.0109) | 0.6047 (0.0380) | 0.6357 (0.0110) | 0.6357 (0.0290) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.0418 (0.0062) | 0.00% // -0.0173 (0.0054) | 0.00% // -0.0159 (0.0034) | 20.00% // -0.0124 (0.0091) |
| | Target Foggy | 0.5674 (0.0378) | 0.5023 (0.0186) | 0.4419 (0.0658) | 0.2279 (0.0174) |
| Ours | Retention Ratio // DevSafety | 0.00% // -0.0241 (0.0082) | 60.00% // -0.0009 (0.0044) | 100.00% // 0.0044 (0.0033) | 100.00% // 0.0061 (0.0047) |
| | Target Foggy | 0.5721 (0.0406) | 0.4930 (0.0174) | 0.4326 (0.0186) | 0.4279 (0.0316) |

| Method | Measures | 100 | 1k | 2k | 4k |
| --- | --- | --- | --- | --- | --- |
| Base | Retention Ratio // DevSafety | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) | 100% // 0.00 (0.0000) |
| | Target Overcast | 0.7361 (0.0000) | 0.7361 (0.0000) | 0.7361 (0.0000) | 0.7361 (0.0000) |
| Co$^2$L | Retention Ratio // DevSafety | 0.00% // -0.0138 (0.0099) | 0.00% // -0.0072 (0.0032) | 0.00% // -0.0095 (0.0043) | 0.00% // -0.0137 (0.0052) |
| | Target Tunnel | 0.5916 (0.0417) | 0.8369 (0.0049) | 0.8396 (0.0055) | 0.8507 (0.0172) |
| RM | Retention Ratio // DevSafety | 0.00% // -0.2932 (0.0365) | 0.00% // -0.3016 (0.0228) | 0.00% // -0.2444 (0.0120) | 0.00% // -0.2634 (0.0105) |
| | Target Overcast | 0.9787 (0.0050) | 0.9730 (0.0028) | 0.9588 (0.0041) | 0.9647 (0.0023) |
| Ours | Retention Ratio // DevSafety | 0.00% // -0.0655 (0.0249) | 20.00% // -0.0043 (0.0037) | 60.00% // 0.0012 (0.0029) | 100.00% // 0.0046 (0.0016) |
| | Target Overcast | 0.8789 (0.0464) | 0.7827 (0.0225) | 0.7562 (0.0167) | 0.7525 (0.0366) |

[1] Cha, Hyuntak, Jaeho Lee, and Jinwoo Shin. "Co2l: Contrastive continual learning." Proceedings of the IEEE/CVF International conference on computer vision. 2021.

Comment

Dear Reviewer,

Thank you for your valuable feedback on our paper. As the author-reviewer discussion phase is nearing its end, please feel free to reach out if you have any remaining concerns or suggestions on our paper. We would greatly appreciate it.

AC Meta-Review

This paper introduces Model Developmental Safety (MDS), a safety-centric framework to ensure zero-forgetting of protected capabilities during iterative model development, particularly in safety-critical domains.

After the rebuttal period, this paper received mixed ratings. Reviewers 3RP8 and 1P9i still have remaining concerns, such as the lack of broader validation across diverse safety-critical domains and issues with the paper's presentation. All reviewers gave borderline scores. Given the current weaknesses of the paper, it is rejected from the highly competitive ICLR conference. The authors are encouraged to improve their manuscript according to the reviewers' suggestions and submit it to the next venue.

Final Decision

Reject