PaperHub

Rating: 5.8 / 10 (Poster; 4 reviewers; min 4, max 8, std 1.5)
Individual scores: 8, 4, 5, 6
Confidence: 4.5 | Correctness: 3.0 | Contribution: 2.8 | Presentation: 3.0
NeurIPS 2024

AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations

Submitted: 2024-05-11 · Updated: 2024-11-06

Abstract

Keywords

Deep learning, All-in-one video restoration, Time-varying unknown degradations

Reviews and Discussion

Review
Rating: 8

This paper studies the time-varying unknown degradations in videos and proposes an all-in-one video restoration network to recover corrupted videos. Specifically, the network consists of two modules named PGA and PCE, which are designed to address the pixel-shift issue caused by time-varying degradations and to tackle multiple unknown degradations, respectively. Through their collaboration, the network can effectively handle time-varying unknown degradations.

Strengths

  1. The problem of time-varying unknown degradations studied in this work is practical and challenging. In real-world scenarios, the degradations in videos dynamically change over time, and their types and levels are always unknown.

  2. Compared with classic video restoration methods that deal with one specific degradation, the proposed method could handle time-varying and multiple unknown degradations with one model.

  3. The paper comprehensively discusses existing video restoration methods and all-in-one image restoration methods, as well as their differences from the proposed method.

Weaknesses

  1. As shown in Table 2, although the proposed method could effectively handle time-varying degradations with different variation intervals, the variation intervals of the test sets are fixed. Could the proposed method handle degradations with variable intervals?

  2. The experiments are only conducted on test sets with combined degradations. How about the performance on a single type of degradation with variable levels?

  3. There is a recent method [1] that deals with multiple degradations. What are the differences between this method and the proposed one? Additionally, the authors should include it in the related works.

[1] Yang, et al. Video adverse-weather-component suppression network via weather messenger and adversarial backpropagation. ICCV, 2023.

Questions

Please see the weaknesses.

Limitations

Potential impacts and limitations have been discussed in the supplementary material.

Author Response

Q1: Evaluation on degradations with variable intervals.

As suggested, we conduct new experiments by synthesizing new test sets with variable degradation intervals. The results show that our method effectively handles degradations with variable intervals. Specifically, the test sets are synthesized based on DAVIS-test and the intervals are randomly sampled from [t-v, t+v].

Table 1. Quantitative results on the test sets with variable intervals.

Method  | t=6, v=3 (PSNR / SSIM) | t=12, v=6 (PSNR / SSIM)
RVRT    | 33.8849 / 0.9306       | 34.3231 / 0.9330
AverNet | 34.0313 / 0.9338       | 34.3317 / 0.9344
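
For clarity, below is a minimal sketch of how such a variable-interval test set could be generated, assuming frames are NumPy arrays; `apply_random_degradation` is a hypothetical stand-in for the actual degradation bank, not the paper's implementation.

```python
import random
import numpy as np

def apply_random_degradation(snippet):
    # Hypothetical stand-in for the degradation bank: simply add Gaussian
    # noise with a randomly sampled level to every frame of the snippet.
    sigma = random.uniform(5, 50)
    return [np.clip(f + np.random.randn(*f.shape) * sigma, 0, 255) for f in snippet]

def synthesize_variable_interval_video(frames, t=6, v=3):
    # The degradation changes after a random number of frames drawn from
    # [t - v, t + v], instead of a fixed interval t.
    corrupted, i = [], 0
    while i < len(frames):
        interval = random.randint(t - v, t + v)
        corrupted.extend(apply_random_degradation(frames[i:i + interval]))
        i += interval
    return corrupted
```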

Q2: Performance on a single type of degradation.

As suggested, we synthesize new test sets, each containing only a single type of degradation, and evaluate the models on these sets. The results, as shown in Table 2, demonstrate that AverNet consistently outperforms RVRT across all types of degradation.

Table 2. Quantitative comparisons on single type of degradation.

Method  | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT    | 36.35 / 0.9603      | 35.65 / 0.9545     | 34.49 / 0.9431
AverNet | 36.88 / 0.9641      | 36.65 / 0.9618     | 34.63 / 0.9466

Q3: The differences between the proposed AverNet and ViWS-Net.

The differences between AverNet and ViWS-Net [1] are discussed below and will be included in the related works. First, AverNet aims to handle time-varying unknown degradations, whereas ViWS-Net specifically focuses on a single type of degradation within one video. Additionally, AverNet employs prompt-guided modules to conditionally restore videos, while ViWS-Net explicitly optimizes a weather discriminator to classify the degradations and guide the restoration.

[1] Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation.

Comment

Thank you for providing the additional experiments and comprehensive explanation. The feedback has successfully addressed my previous concerns, and the additional experiments further demonstrate the flexibility and effectiveness of the proposed method. As the first study to address time-varying unknown degradations in videos, this paper presents an effective solution for restoring corrupted videos with a single model. I am confident that this work will make a significant contribution to the field of video restoration and be of great benefit to the community. Accordingly, I have raised the score.

Comment

Dear Reviewer 2PV4,

Thank you for your positive feedback and for raising your score. We appreciate your approval of our revisions and are glad that the additional experiments and discussions have addressed your concerns. Your suggestions throughout the review process have been invaluable.

Review
Rating: 4

The authors propose a prompt learning based framework for all-in-one video restoration with time-varying degradations. Their work employs a prompt-guided alignment module to overcome pixel shifts caused by time-varying degradations. Multiple unknown degradations are learned through a prompt-conditional module.

Strengths

  • The authors have a solid motivation to address time-varying degradations in videos in a unified restoration setting.
  • Their proposed method consistently outperforms prior work in their considered settings.

Weaknesses

  • The data pipeline used in this study appears overly simplistic and disrupts the content dependencies of corruptions like noise, which varies with overexposed or underexposed frames. Instead of merely adding random degradation types to random video frames, it would be more realistic to simulate the severity of these degradations over time (e.g., increasing JPEG compression, blur, or noise), thereby preserving temporal dependencies.
  • Merely increasing the variation intensity by reducing the number of frames per degradation is too simplistic. As mentioned earlier, increasing the severity of degradation is more beneficial.
  • Regarding the practicality of the proposed method, the authors do not provide evaluations on realistic degraded videos, such as VideoLQ or NoisyCity4, to show whether their time-varying degradation model can compete with prior work in this more complex setting.
  • The efficiency comparison in Table 1 appears inaccurate, as the number of parameters for both PromptIR and AIRNet is incorrect. There is a significant discrepancy between the officially reported numbers and those listed in Table 1.

Questions

  • How does the model performance change under more complex degradation pipelines used in works such as BSRGAN or Real-ESRGAN? It is also not clear why these degradation pipelines were not at least considered as a starting point.
  • How does the model perform when adding multiple degradations to the same frame snippets, or when having a collection of different degradations per frame snippet instead of sequentially adding different single degradations to the video snippets?

Limitations

N/a

Author Response

Q1 & Q2: Increasing the severity of degradations over time.

As suggested, we conduct new experiments by synthesizing four test sets with progressively worsening degradations and the results prove the effectiveness of our method. Specifically, in the first test set, different types of degradations are gradually introduced, with their intensities increasing over time. In the other three test sets, only one type of degradation is added, with its intensity increasing over time. Since the models have seen various degradations and their variations during training, we directly apply them to these test sets without retraining. The results are presented in the table below. From the tables, one could observe that our AverNet is effective in handling various degradation changes and outperforms RVRT in the settings where degradations worsen over time.

Table 1. Quantitative results on increasing degradation severity over time. Multiple Degradations denotes that noise, blur, and compression are gradually added and their severity worsens over time; Noise, Blur, and Compression denote a single type of degradation that worsens over time.

DAVIS-test | Multiple Degradations (PSNR / SSIM) | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT       | 33.26 / 0.9201 | 35.81 / 0.9541 | 28.78 / 0.8264 | 28.99 / 0.8613
AverNet    | 33.41 / 0.9238 | 36.38 / 0.9577 | 29.24 / 0.8389 | 29.31 / 0.8804

Set8       | Multiple Degradations (PSNR / SSIM) | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT       | 29.63 / 0.8623 | 33.64 / 0.9457 | 26.99 / 0.7864 | 28.69 / 0.8794
AverNet    | 29.77 / 0.8656 | 33.88 / 0.9482 | 27.19 / 0.7924 | 28.89 / 0.8876
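
As an illustration of what "severity increasing over time" could look like, here is a minimal sketch for the single-degradation (noise) case; the linear schedule and the noise range are assumptions for illustration, not the exact settings used in these experiments.

```python
import numpy as np

def worsening_noise_video(frames, sigma_start=5.0, sigma_end=50.0):
    # Gaussian noise whose level grows linearly from the first to the last
    # frame, so the degradation necessarily worsens over time.
    n = len(frames)
    out = []
    for k, frame in enumerate(frames):
        sigma = sigma_start + (sigma_end - sigma_start) * k / max(n - 1, 1)
        out.append(np.clip(frame + np.random.randn(*frame.shape) * sigma, 0, 255))
    return out
```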

Q3: Evaluations on realistic degraded videos.

As we could not find the NoisyCity4 dataset, we only conduct evaluations on the realistic video dataset VideoLQ [1]. The results in Table 2 show that our AverNet is more effective in dealing with realistic and complex degradations.

Table 2. Quantitative results on realistic video dataset VideoLQ.

Metric     | RVRT   | AverNet
NIQE ↓     | 4.6602 | 4.6234
PI ↓       | 3.6491 | 3.6464
CNNIQA ↑   | 0.5470 | 0.5487
HyperIQA ↑ | 0.4547 | 0.4560
CLIPIQA ↑  | 0.3800 | 0.3899

[1] Investigating Tradeoffs in Real-World Video Super-Resolution.

Q4: Parameters of PromptIR and AirNet.

To comprehensively compare the models, we use the PyTorch model profiling API THOP to calculate the parameters. THOP calculates parameter counts using hooks on modules, which may result in lower values than those officially reported. We recalculate and update the results, i.e., PromptIR has 35.60M parameters and AirNet has 8.93M parameters.
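
For reference, THOP is typically used as shown below; the small convolutional model is only a placeholder, since the actual PromptIR / AirNet definitions are not reproduced here.

```python
import torch
from thop import profile  # pip install thop

# Placeholder model standing in for PromptIR / AirNet.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
)
dummy_input = torch.randn(1, 3, 256, 256)

# THOP registers hooks on each sub-module and accumulates MACs and parameter
# counts, which is why its numbers can differ from officially reported ones.
macs, params = profile(model, inputs=(dummy_input,))
print(f"Params: {params / 1e6:.2f}M, MACs: {macs / 1e9:.2f}G")
```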

Q5: Clarification on our degradation pipeline versus those of BSRGAN and Real-ESRGAN, and performance under their pipelines.

We argue that the degradation pipelines used in BSRGAN [1] and Real-ESRGAN [2] are not necessarily more complex than ours; in fact, our pipeline has a comparable level of complexity. To provide a clearer comparison, we have summarized the key components of our pipeline alongside those of BSRGAN and Real-ESRGAN in Table 3 below. Additionally, our pipeline is designed for all-in-one video restoration that addresses time-varying unknown degradations in videos, whereas their pipelines were developed for blind image super-resolution.

Table 3. Comparison of degradation pipelines.

Pipeline    | Blur                          | Downsampling | Noise                                                  | Compression
BSRGAN      | Gaussian Blur                 | Resize       | Gaussian Noise, Processed Camera Sensor Noise          | JPEG Compression
Real-ESRGAN | Gaussian Blur, 2D sinc filter | Resize       | Gaussian Noise, Poisson Noise, Color Noise, Gray Noise | JPEG Compression
Ours        | Gaussian Blur, Resizing Blur  | -            | Gaussian Noise, Poisson Noise, Speckle Noise           | JPEG Compression, Video Compression

As suggested, we synthesize new test sets based on the pipelines of BSRGAN and Real-ESRGAN to evaluate the performance of our models. The results, presented in Table 4, show that our AverNet achieves comparable or even superior performance on both test sets synthesized through BSRGAN and Real-ESRGAN. Note that the downsampling operation was removed to maintain the frame size.

Table 4. Quantitative comparisons on the test sets synthesized through BSRGAN and Real-ESRGAN.

Method  | BSRGAN (PSNR / SSIM) | Real-ESRGAN (PSNR / SSIM)
RVRT    | 26.31 / 0.7004       | 25.36 / 0.6036
AverNet | 26.34 / 0.6977       | 25.37 / 0.5971

[1] Designing a Practical Degradation Model for Deep Blind Image Super-Resolution.

[2] Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data.

Q6: Clarification on the degradation pipeline.

Actually, our pipeline adds multiple degradations simultaneously to the same frame snippets. Specifically, each type of degradation has a 0.55 probability of being sampled and applied to each snippet. In other words, each frame snippet in our test sets usually involves multiple degradations. Consequently, our experiments indeed evaluate the models on test sets containing multiple degradations per snippet, rather than just a single degradation per snippet.
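
To make the sampling rule concrete, a minimal sketch is given below; the three degradation names are an illustrative subset of the pipeline (more types are used in the paper), and the fallback to at least one degradation is an assumption.

```python
import random

DEGRADATION_TYPES = ["noise", "blur", "compression"]  # illustrative subset only

def sample_snippet_degradations(p=0.55):
    # Each degradation type is independently kept with probability p and then
    # applied to the current snippet, so most snippets end up carrying several
    # degradations at once rather than a single one.
    chosen = [d for d in DEGRADATION_TYPES if random.random() < p]
    return chosen if chosen else [random.choice(DEGRADATION_TYPES)]  # assumed fallback
```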

Comment

I appreciate the authors' responses during the rebuttal process. However, my primary concern about the data generation pipeline remains unresolved. The results in Table 2 and Table 4 are still unconvincing. While I agree that addressing the TUD problem is a crucial next step, the current paper does not sufficiently compare the proposed generation pipeline with prior approaches, nor does it seem to be thoroughly developed. As a result, I must maintain my current score.

Comment

Dear Reviewer Zzy9,

We appreciate your approval of the TUD problem raised in the paper and would like to address your concerns as follows.

As you noted, addressing the TUD problem is a crucial next step in the field of video restoration. To study this problem, we developed the pipeline to simulate data with TUD. The previous pipelines of BSRGAN and Real-ESRGAN are not applicable as their goal is to generate images with mixed degradations for blind image super-resolution. In contrast, our pipeline aims to synthesize videos with time-varying degradations, which is well aligned with our research purpose, i.e., all-in-one video restoration for the TUD problem. Experimental results demonstrate that our method can effectively address the TUD problem compared with the baselines.

We hope these clarifications could address your concerns. Thank you once again for your feedback and for helping us improve our work.

Review
Rating: 5

The paper considers the problem of all-in-one restoration in videos, which is fundamentally different from images due to time-varying notion of degradations affecting the videos. The paper proposes prompt based modules to condition the restoration of frames on.

Strengths

S1. The paper extends the problem of all-in-one restoration from the image domain to the video setting.

S2. A recurrent prompt-based architecture is proposed for the said purpose.

S3. Two datasets are synthesized, due to lack of such datasets, based on seven degradations with varying intensity of degradation over time.

Weaknesses

W1. In longer videos (Set8), the performance of RVRT is very comparable to the proposed AverNet, even though RVRT does not include any prompts (implicit or explicit) to condition the restoration on.

W2. The paper lacks a thorough exploration of the problem. Considering weather-induced degradations as the base (instead of just DAVIS/Set8) and then synthesizing the video datasets would have posed a more challenging problem for evaluating the effectiveness of the prompts in longer videos.

W3. It would be beneficial to include a baseline that considers the problem of all-in-one video restoration (or multiple degradations with one model), since those architectures are designed to condition the restoration procedure on the degradation information [1], [2]. I agree with the reasoning about [2] in line 84 onwards; however, results on [1] and/or [2] would indicate the importance of the proposed formulation (i.e., that the conditioning should take into account/model time-varying degradations).

[1] Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation

[2] Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

Questions

Q1. It is unclear until the Table 5 ablation what t refers to, i.e., the interval in key frames. If t refers to the key frame interval, what does variation intensity mean in the "Evaluation on Different Variation Intensity" paragraph, and how is the interval in key frames used to synthesize the datasets?

Q3. Have the authors considered a non-prompt based setting for ablation experiments? In Table 4, all scenarios have prompts.

Q2. Have the authors considered a controlled setting wherein the degradations necessarily worsen with time, i.e., severe noise, blur, etc. are introduced as time increases? This setting would highlight how well the prompts can adapt to the changing degradations.

Limitations

The limitations and societal impact are discussed.

Author Response

Q1: Effectiveness of prompts on the longer videos of Set8 with complex degradation changes.

Actually, our prompt-based AverNet is more effective in dealing with complex degradation changes in the longer videos of Set8. To highlight how well the prompts can adapt to changing degradations, we synthesize new test sets with increasing degradation severity based on Set8. The results are shown in Table 1, from which one could observe that the prompts endow AverNet with a greater capacity to handle degradation changes in long videos. In detail, we synthesize four test sets where the degradations progressively worsen over time. In the first test set, different types of degradations are gradually introduced, with their intensities increasing over time. In the other three test sets, only a single type of degradation is added, with its intensity increasing over time.

Table 1. Quantitative results on Set8 with increasing degradation severity over time. Multiple Degradations denotes that noise, blur, and compression are gradually added and their severity worsens over time; Noise, Blur, and Compression denote a single type of degradation that worsens over time.

Set8    | Multiple Degradations (PSNR / SSIM) | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT    | 29.63 / 0.8623 | 33.64 / 0.9457 | 26.99 / 0.7864 | 28.69 / 0.8794
AverNet | 29.77 / 0.8656 | 33.88 / 0.9482 | 27.19 / 0.7924 | 28.89 / 0.8876

Q2: Effectiveness on weather-induced degradations.

As suggested, we synthesize a new dataset with time-varying weather degradations based on the video dataset REVIDE [1]. The results in Table 2 demonstrate that our prompt-based AverNet effectively handles weather-induced degradations. Specifically, we introduce three types of weather degradations (i.e., haze, snow, and rain) through our data pipeline and synthesis approaches similar to [2,3]. Due to time limitations, we train the models on this dataset for 200k iterations and use the fast baseline BasicVSR++ for comparison. RVRT is not compared since its training is too time-consuming to finish within the rebuttal phase.

Table 2. Quantitative results on weather-induced degradations.

Method     | PSNR  | SSIM
BasicVSR++ | 37.36 | 0.9704
AverNet    | 39.82 | 0.9740

[1] Learning to Restore Hazy Video: A New Real-World Dataset and A New Method.

[2] Blind Image Decomposition.

[3] Relationship Quantification of Image Degradations.

Q3: Comparisons with all-in-one video restoration methods.

As suggested, we compare our method with ViWSNet [1] and present the results in Table 3. The results show that ViWSNet struggles to handle time-varying degradations and produces unsatisfactory results. We speculate that this is because ViWSNet imposes a strong assumption through its loss function, i.e., that only a single type of degradation exists in a single video.

Note that Diff-TTA [2] is not compared since its code is not available, and we were unable to reproduce it during the rebuttal phase.

Table 3. Quantitative results of ViWSNet on DAVIS-test and Set8.

Method  | DAVIS, t=6 (PSNR / SSIM) | DAVIS, t=12 (PSNR / SSIM) | DAVIS, t=24 (PSNR / SSIM) | Set8, t=6 (PSNR / SSIM) | Set8, t=12 (PSNR / SSIM) | Set8, t=24 (PSNR / SSIM)
ViWSNet | 16.38 / 0.5278 | 16.38 / 0.5273 | 16.38 / 0.5289 | 13.73 / 0.3579 | 13.72 / 0.3605 | 13.70 / 0.3574
AverNet | 34.07 / 0.9333 | 34.09 / 0.9339 | 34.28 / 0.9356 | 31.73 / 0.9219 | 31.47 / 0.9145 | 32.45 / 0.9189

[1] Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation.

[2] Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal.

Q4: Clarifications on the variation intensity t and the keyframe interval T.

The confusion may arise from mixing up the variation intensity 't' in the data pipeline and the keyframe interval 'T' in the PCE module. Specifically, the lowercase 't' controls the interval of degradation changes in the data pipeline; a smaller t corresponds to a higher variation intensity of degradations. For instance, t=6 indicates that the degradation changes every six frames. The uppercase 'T' is the hyperparameter in the PCE module that controls the interval of keyframes. For example, T=12 indicates that the PCE module selects one keyframe every twelve frames.
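
A tiny sketch of the two quantities, assuming frames are indexed from 0; the indexing convention here is an assumption for illustration, not taken from the paper.

```python
def degradation_change_points(num_frames, t=6):
    # Lowercase t: in the data pipeline the degradation is re-sampled every
    # t frames, so a smaller t means a higher variation intensity.
    return list(range(0, num_frames, t))

def keyframe_indices(num_frames, T=12):
    # Uppercase T: in the PCE module one keyframe is selected every T frames.
    return list(range(0, num_frames, T))

# For a 24-frame clip, the degradation changes at frames 0, 6, 12, 18,
# while PCE picks keyframes at frames 0 and 12.
print(degradation_change_points(24, t=6))  # [0, 6, 12, 18]
print(keyframe_indices(24, T=12))          # [0, 12]
```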

Q5: Non-prompt ablation studies.

We carry out the non-prompt ablation study as suggested. As shown in Table 4, the non-prompt baseline shows a significant drop in both the PSNR and SSIM metrics, highlighting the effectiveness of the two prompt-based modules.

Table 4. Ablation studies on the proposed prompt-based modules.

Variant | PGA | PCE | DAVIS-test (PSNR / SSIM) | Set8 (PSNR / SSIM)
(A)     |     |     | 32.43 / 0.8910           | 27.99 / 0.8404
(B)     |     |     | 32.59 / 0.9157           | 30.14 / 0.8958
(C)     |     |     | 32.99 / 0.9156           | 29.80 / 0.8755
(D)     |     |     | 34.09 / 0.9339           | 31.47 / 0.9145

Q6: Experiments on degradations that worsen over time.

As suggested, we carry out experiments under the above setting. The results are presented in Table 5, from which one could observe that our prompt-based AverNet significantly outperforms RVRT, demonstrating that the prompts effectively adapt to the changing degradations.

Table 5. Quantitative results on test sets with increasing degradation severity over time. Multiple Degradations denotes that noise, blur, and compression are gradually added and their severity worsens over time; Noise, Blur, and Compression denote a single type of degradation that worsens over time.

DAVIS-test | Multiple Degradations (PSNR / SSIM) | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT       | 33.26 / 0.9201 | 35.81 / 0.9541 | 28.78 / 0.8264 | 28.99 / 0.8613
AverNet    | 33.41 / 0.9238 | 36.38 / 0.9577 | 29.24 / 0.8389 | 29.31 / 0.8804

Set8       | Multiple Degradations (PSNR / SSIM) | Noise (PSNR / SSIM) | Blur (PSNR / SSIM) | Compression (PSNR / SSIM)
RVRT       | 29.63 / 0.8623 | 33.64 / 0.9457 | 26.99 / 0.7864 | 28.69 / 0.8794
AverNet    | 29.77 / 0.8656 | 33.88 / 0.9482 | 27.19 / 0.7924 | 28.89 / 0.8876

Comment

I thank the authors for the thorough rebuttal. I have gone through the other reviewers' comments, and authors' rebuttal. My comments are addressed, and therefore I am raising my score to borderline accept. I think the work is important, but exploring more complex degradations (such as rain/haze/snow) in more depth in the TUD setting would have been more interesting.

Comment

Dear Reviewer wJZg,

Thank you for your positive feedback and for raising the score. We will include additional results on time-varying unknown degradations as you suggested, and provide a more thorough discussion in the revision. Besides, we will continue to explore weather degradations under the TUD setting in future work. Thank you again for the constructive suggestions and for your approval of this work.

Review
Rating: 6

This paper presents a video restoration method capable of addressing time-varying unknown degradations (TUD) with a single model. The proposed method employs two modules, i.e., the prompt-guided alignment (PGA) module and the prompt-conditioned enhancement (PCE) module in the propagation to leverage the temporal information for restoration. Experiment results on various types of degradations demonstrate the effectiveness of the proposed method.

Strengths

  1. This paper considers a more practical and valuable problem named TUD in video restoration and presents a feasible solution to handle TUD with a single model.
  2. The proposed modules take advantage of prompt learning to handle TUD and explicitly consider the degradations during propagation, which is interesting and innovative.

Weaknesses

  1. While the paper is the first to study time-varying unknown degradations, these degradations are synthesized through a degradation model, which may not accurately reflect real-world degradation distributions.
  2. The intervals t of degradation variations in the test sets are all multiples of 6, which is the interval used during training. It is uncertain whether the proposed method could generalize well to other intervals such as 9.
  3. Previous works [1,2] could adaptively select keyframes based on video changes. In contrast, the PCE module selects keyframes according to a fixed interval T. Why not adopt the adaptive methods?
  4. Some related works [3,4] are not included even though they have significantly guided this area. The authors should discuss them too.

[1] Yule Li, Jianping Shi, Dahua Lin: Low-Latency Video Semantic Segmentation. CVPR 2018: 5997-6005.

[2] Yu-Syuan Xu, Tsu-Jui Fu, Hsuan-Kung Yang, Chun-Yi Lee: Dynamic Video Segmentation Network. CVPR 2018: 6556-6565.

[3] Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang: ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration. CoRR abs/2306.13653 (2023).

[4] Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He: Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration. CoRR abs/2312.02918 (2023).

Questions

Please refer to the weaknesses section.

Limitations

  1. The authors should provide a discussion of the limitations of this paper.
  2. Related works should be discussed more thoroughly.

Author Response

Q1: Evaluations on the realistic video dataset.

We further evaluate the effectiveness of our pipeline and network on the realistic video dataset VideoLQ [1]. The results show that the models trained with our pipeline generalize well to realistic degradations. As shown in Table 1, our network and RVRT are not re-trained on realistic video datasets such as RealVSR [2] but still show strong performance on VideoLQ.

Table 1. Quantitative results on the realistic video dataset VideoLQ.

Metric     | RVRT   | AverNet
NIQE ↓     | 4.6602 | 4.6234
PI ↓       | 3.6491 | 3.6464
CNNIQA ↑   | 0.5470 | 0.5487
HyperIQA ↑ | 0.4547 | 0.4560
CLIPIQA ↑  | 0.3800 | 0.3899

[1] Investigating Tradeoffs in Real-World Video Super-Resolution.

[2] Real-world video super-resolution: A benchmark dataset and a decomposition based learning scheme.

Q2: Other degradation intervals.

As suggested, we conduct experiments on different intervals to show the generalization of AverNet. From Table 2, one could observe that AverNet shows better performance on two additional intervals.

Table 2. Quantitative results on different intervals t=9 and t=15.

Method  | t=9 (PSNR / SSIM) | t=15 (PSNR / SSIM)
RVRT    | 33.92 / 0.9320    | 34.07 / 0.9347
AverNet | 34.01 / 0.9356    | 34.17 / 0.9373

Q3: Adaptive keyframe selection.

Following [1,2], we adopt an adaptive strategy to select the keyframes, which chooses the frames with the largest changes as keyframes. The results are presented in Table 3 and show that the adaptive strategy only brings a slight improvement in SSIM. However, the computational burden of the adaptive strategy is not negligible in practice.

Table 3. Quantitative comparisons between fixed and adaptive keyframe strategy.

Keyframe Strategy | Fixed  | Adaptive
PSNR              | 34.09  | 34.09
SSIM              | 0.9339 | 0.9341

[1] Low-Latency Video Semantic Segmentation.

[2] Dynamic Video Segmentation Network.
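
Below is a minimal sketch of such an adaptive selection, assuming the frames are stacked into a single tensor; the absolute-difference change measure is an assumption and may differ from the criterion used in [1,2] or in this experiment.

```python
import torch

def adaptive_keyframes(frames, num_keyframes):
    # frames: tensor of shape (N, C, H, W).
    # Score each frame by its mean absolute change w.r.t. the previous frame
    # and pick the frames with the largest changes as keyframes.
    diffs = (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))
    scores = torch.cat([diffs.new_tensor([float("inf")]), diffs])  # always keep frame 0
    return torch.topk(scores, num_keyframes).indices.sort().values
```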

Q4: More related works [1,2] should be discussed.

As suggested, we discuss the all-in-one image restoration methods [1,2] in the following and will include them in the related works. ProRes [1] introduces additional visual prompts to incorporate task-specific information and utilizes the prompts to guide the network for all-in-one restoration. MPerceiver [2] proposes a multimodal prompt learning approach that exploits the generative priors of Stable Diffusion to achieve high-fidelity all-in-one image restoration.

[1] ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration.

[2] Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration.

Q5: Discussion on limitations.

The training data for AverNet is generated by our video synthesis approach, which produces videos with time-varying unknown degradations close to real-world scenarios. However, the corruptions in real-world videos are complex and difficult to simulate. Therefore, in real-world applications, our AverNet may need further validation and improvement.

Comment

Dear Reviewer hhCd, I have noticed that you have not yet responded to the authors' rebuttal. I kindly urge you to engage in a discussion with the authors at your earliest convenience to help advance the review process.

Final Decision

The final scores for this work are as follows: strong accept, weak accept, marginal accept, and marginal reject. On a broader scale, the overall evaluation leans towards a positive reception, though there is some divergence in opinions. After the rebuttal phase, the reviewer inclined towards rejection focused primarily on the limitations regarding performance improvement and chose to maintain their rating. However, two reviewers raised their scores, with one upgrading to a strong accept. These two reviewers indicated that the authors' rebuttal had addressed their previous concerns and acknowledged the importance and efficacy of the work. In summary, despite one reviewer's concerns about the contribution to performance improvement, they did not explicitly identify issues at the methodological level, and their score remains borderline. The other three reviewers are inclined to accept the work, with two being particularly decisive in their support. Therefore, I am inclined to accept this work.