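Average rating: 5.2/10

Decision: Rejected (5 reviewers)

Lowest rating: 4, highest: 7, standard deviation: 1.2

Average confidence: 4.4

Correctness: 3.0, Contribution: 2.6, Presentation: 2.6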

NeurIPS 2024

CLODE: Continuous Exposure Learning for Low-Light Image Enhancement using Neural ODEs

Donggoo Jung,Daehyun Kim,Tae Hyun Kim

Submitted: 2024-05-13, Updated: 2024-11-06

TL;DR

Proposing an unsupervised method for low-light image enhancement via Neural ODE.

Keywords

Low-light image enhancement, Image enhancement, Neural ODE, Unsupervised

Reviews and Discussion

Review

Rating: 7, Confidence: 3, Date: 2024-06-16

This paper formulates the higher-order curve estimation problem as a NODE problem, enabling effective and accurate solutions with standard ODE solvers.

Strengths

This paper is well written and well organized.

Weaknesses

Reference formats are not consistent.

Questions

Why choose E=0.6 in Eq.(11)? The setting of hyper-parameters in Eq.(17) should be mentioned. What's the challenge of applying ODE to continuous exposure learning? The background of the proposed method in Figure 4 is quite different from GT, why?

Limitations

As described above.

Author Response

2024-08-06

We are glad to hear that you found the paper to be well-written and structurally organized. We have diligently examined your comments and concerns as a reviewer, and have prepared responses addressing the raised concerns.

W1: Reference format

Thank you for the careful suggestion. We will make the reference format consistent in the final version.

Q1: Hyper-parameters

The exposure level parameter $E$ in Eq.(11) sets the target brightness that the network is trained to produce in the output image.
For the exposure level parameter, we followed the settings of previous curve-adjustment-based methods ([6, 9]). To ensure fairness and eliminate any effects from changes in $E$, we consistently used the same value.
The setting of the hyper-parameters in Eq.(17) is detailed in L459-461 of the Appendix. As stated in Appendix A.1, the weights for each loss function are set to balance the scale of the losses. To reiterate, the weights $w_{col}$, $w_{param}$, $w_{spa}$, $w_{exp}$ and $w_{noise}$ are set to 20, 200, 1, 10 and 1, respectively.
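
To make the weighting concrete, here is a minimal sketch that combines five loss values with the weights quoted above. It assumes Eq.(17) is a plain weighted sum, which the rebuttal implies but does not state; the dictionary keys and the function name `total_loss` are illustrative and not taken from the paper's code.

```python
# Loss weights as quoted in the rebuttal (Appendix A.1 of the paper).
LOSS_WEIGHTS = {
    "col": 20.0,     # color constancy loss
    "param": 200.0,  # curve parameter loss
    "spa": 1.0,      # spatial consistency loss
    "exp": 10.0,     # exposure loss
    "noise": 1.0,    # noise removal loss
}


def total_loss(losses):
    """Weighted sum of the individual loss terms (assumed form of Eq. (17))."""
    missing = set(LOSS_WEIGHTS) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * losses[name] for name in LOSS_WEIGHTS)
```

With these weights, a parameter-loss value contributes 200 times as much as a spatial-loss value of the same magnitude, which is consistent with the stated goal of balancing terms whose raw scales differ.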

Q2: Challenge of applying ODE to continuous exposure learning.

There were three main challenges in applying ODE to continuous exposure learning: inference speed, tolerance setting, and time consumption in training.

[Inference speed]

The first challenge is inference speed, as written in L330-332. Since a Neural ODE requires an iterative process to find the optimal solution, inference can be somewhat slower. To tackle this problem, we provide CLODE-S, a compact version composed of a 2-layer network (R-fig. 1(b)). Although CLODE-S has only 0.0004M parameters and takes 0.005 seconds per image, it shows promising performance in Table 5. Improving the inference speed of CLODE is a future research goal.
One possibility is to apply RectifiedFlow [1*], as mentioned in our response to UK8A's W2. RectifiedFlow [1*] transforms the solution paths of Neural ODEs into straight lines, enabling faster estimation of the Neural ODE system. We can first assume that the optimal solution found by CLODE is the expected optimal solution and then apply CLODE specifically to [1*].
Given the potential to apply such cutting-edge Flow Matching methods, we believe that CLODE is highly promising and can achieve fast inference speeds.

[Tolerance setting]

In addition, setting the tolerance is crucial in a NODE system, because the ODE solver decides whether the current state is optimal, and hence whether to terminate, by checking that the error of the current state lies within the allowable error. The allowable error is defined by the following formula: $\Gamma_t = atol + rtol \times \text{norm}(Err_t)$
Here, $Err_t$ is the current state error, and atol and rtol represent the absolute and relative tolerance, respectively.
If $Err_t > \Gamma_t$, the ODE solver adjusts the step size to reduce the error.
If $Err_t \leq \Gamma_t$, the current image is considered the optimal solution state, and the process terminates.
Since $\Gamma_t$ is calculated from atol and rtol, setting these tolerance values is important. If atol and rtol are too small, training may fail, and if they are too large, effective brightness enhancement may not be achieved. As mentioned in L472 of the Appendix, CLODE empirically sets atol and rtol to $10^{-5}$. For additional details about CLODE, please refer to our global rebuttal G1.
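
The role of atol and rtol can be illustrated with a generic adaptive integrator. The sketch below is not CLODE's dopri5 configuration: it uses a simple Heun/Euler embedded pair on a scalar state, and the function name `integrate_adaptive` and its defaults are assumptions made for illustration. It also scales rtol by the state magnitude, which is the usual convention in adaptive solvers and differs from the formula as written above, and a step that passes the check is accepted rather than ending the integration.

```python
import math


def integrate_adaptive(f, y0, t0, t1, atol=1e-5, rtol=1e-5, h0=0.1, max_steps=10_000):
    """Integrate dy/dt = f(t, y) from t0 to t1 with error-controlled steps.

    Returns (y at t1, number of accepted steps, number of rejected steps).
    """
    t, y, h = t0, y0, h0
    accepted = rejected = 0
    while t < t1:
        if accepted + rejected >= max_steps:
            raise RuntimeError("step budget exhausted before reaching t1")
        last = h >= t1 - t          # trim the final step so we land on t1
        if last:
            h = t1 - t
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1                  # Euler (first order)
        y_high = y + 0.5 * h * (k1 + k2)    # Heun (second order)
        err = abs(y_high - y_low)           # local error estimate
        tol = atol + rtol * max(abs(y), abs(y_high))
        if err <= tol:
            t = t1 if last else t + h
            y = y_high
            accepted += 1
        else:
            rejected += 1
        # Shrink or grow the step based on how far err is from tol.
        scale = 0.9 * math.sqrt(tol / err) if err > 0 else 5.0
        h *= min(5.0, max(0.2, scale))
    return y, accepted, rejected
```

Tightening both tolerances forces smaller steps and therefore more function evaluations, while loosening them lets the solver take fewer, coarser steps, which matches the trade-off the authors describe.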

[Time consumption in training]

Finally, as NODE involves simulation-based training, the model takes longer to train as it grows in size. To overcome this obstacle, we designed the architecture of CLODE to be as compact as possible while maximizing low-light image enhancement performance. For a better understanding of "simulation-based learning", we provide a schematic representation of CLODE (dopri5) in actual image enhancement at the bottom of R-Fig.2.

Q3: Figure 4

We understand Reviewer Qjwa's curiosity about Fig.4. We can explain the reason why the background of Fig.4 is quite different from the ground truth in two aspects.
One reason is the unsupervised methodology of CLODE. Since training is done without ground-truth images, CLODE improves the input images using only the given information from itself. The other reason comes from the lack of information in the input images. In some overexposed images in the SICE dataset, the overexposed regions do not contain the same details as the ground truth images.
In particular, the lack of information in the input image causes the same problem for supervised methods, as shown in Fig.4. CLODE’s enhancement results may not match the ground truth perfectly, but our method still performs better than other unsupervised methods and competes well with supervised methods. Addressing over-exposed images will require a generative model, which we plan to explore in future research.

Thank you very much for taking the time to review our work. If there are any additional questions or points, we would be delighted to address them.

References

[1*] Liu, Xingchao, Chengyue Gong, and Qiang Liu. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow." In ICLR, 2023.

2024-08-08

Thanks for your response. I have no more questions.

2024-08-08

We appreciate Reviewer Qjwa's time and constructive comments and discussions. We are also thankful for the acknowledgment of the effectiveness of our method. In the final version, we will include the discussions.

Review

Rating: 4, Confidence: 5, Date: 2024-07-08

This paper mainly addresses the problem of insufficient data for low-light enhancement. Specifically, it proposes CLODE, which employs Neural Ordinary Differential Equations to learn the continuous dynamics of the latent image for the first time. The experiments demonstrate that CLODE performs better than other unsupervised learning methods.

Strengths

This is the first attempt to formulate the higher-order curve estimation problem as a NODE problem.
CLODE can offer user controllability, enabling users to manually adjust exposure.

Weaknesses

Details of the user-controllable design. Despite the better results with user control, details about the users are missing: for example, the number of volunteers, and whether they were kept from seeing the ground-truth images before adjusting the output image. Also, involving human feedback adds considerable time at the inference stage.
In Sec. 3.3 Inference Process, the relationship between the output image $I_T$ and the noise-free image is unclear. Each iteration includes a noise removal module, yet the output image still contains some noise, contradicting the expectation of a noise-free result from the model.
Experimental Setup: The experimental setting described in [1] seems more suitable for unsupervised methods. Using only a single dataset for training in this study does not adequately reflect the advantages of the proposed method. A specific analysis comparing and justifying the differences in experimental setups is necessary.
Model Iteration Selection in "CLODE+" (Table 2): The manual operation required to select the iteration step raises concerns. How is this value determined to ensure suitable results? This approach appears more suited to image retouching tasks than enhancement.
Concern about fair comparison with previous methods. This paper uses 5 different losses. I wonder whether only some of them are used in previous methods, and whether the proposed method is aligned with previous methods. For example, some Retinex-based methods do not explicitly consider the impact of noise and have no noise removal process. Does CLODE still outperform other methods without noise removal? More ablation experiments are needed for a thorough explanation.
Effectiveness of Noise Removal Module: In the first toy scene in Figure 4, as well as Figures 7 and 8, there is noticeable noise residue and some degree of color distortion, which casts doubt on the effectiveness of CLODE and its noise removal module for low-light enhancement.
More explanation of the superiority of CLODE. Can the authors provide a clearer explanation of the mechanism? For example, in Figure 9 of the supplementary material, do the better results come from more iterations overall, or from more iterations at the early stage, where the estimation is harder?

[1] Learning a Simple Low-light Image Enhancer from Paired Low-light Instances

Questions

In Table 3, can the continuous method be considered to have an early stop mechanism? While discrete methods adjust by a discrete value in every iteration, the continuous method adjusts by a continuous value. Further analysis and additional visual results comparing these methods across iterations are anticipated.

Limitations

None

Author Response

2024-08-06

We would like to thank the reviewer for recognizing our contribution and pointing out constructive concerns. We have made efforts to address the reviewer’s concerns as follows.

Q1, W7: More explanation of CLODE

Due to the character limit, we placed the explanation for this question in the global rebuttal. Please refer to global rebuttal G1, where we have prepared explanations along with additional analysis (R-fig. 2). In brief, CLODE includes an early stop mechanism.

W1, W4: User Control

First, we would like to note that CLODE, even without user control, is already state-of-the-art.
In Tables 1 and 2, CLODE+ was determined based on the average values of images selected by two experts in the field.
To address Reviewer Z6DS’s question, we conducted an extra user study on the LOL dataset with 21 participants who had no prior knowledge of the ground-truth images. The results are in R-Table 6. From the optimal state provided by CLODE, five images were generated for the participants by shifting the time steps by -0.5, -1, +0.5, and +1, respectively.
The results of the user study fall between the values for CLODE and CLODE+. Since CLODE already produces high-quality images through optimal solutions and empirically achieves more appealing images near the optimal state, finding user-preferred images is not challenging. Furthermore, the lower results in the user study are due to exceptionally dark reference images, even though the results from CLODE+ look better. Therefore, there is no significant difference in terms of visual quality (PI).

[Image retouching]

CLODE leverages NODE to offer user controllability as a kind of "free bonus feature", and this can be understood in terms of image retouching, as Reviewer Z6DS comments. However, the ability to perform image retouching through unsupervised learning is a clear advantage that makes CLODE more practical for use in diverse environments. We would also like to stress that our method demonstrates superior performance compared to various methods even without user control.

W2: Noise Problem

Before the explanation, we would like to clarify that $I_T$ contains noise.
In Section 3.2.1, the input to the ODE function is $I_t$ , which is passed through the Noise Removal module to obtain the denoised image $\tilde{I_t}$ as shown in Eq.(7). The denoised $\tilde{I_t}$ is used as input to the Curve Parameter Estimation module and is provided by Eq.(8). The reason for using the denoised $\tilde{I_t}$ as an input of Curve Parameter Estimation module is to obtain a fine-grained curve parameter map $\mathcal{A}_t$ . The final output for the ODE function is expressed as $\mathcal{A}_t \otimes I_t \otimes (1-I_t)$ (Eq.(9)), where we utilize $I_t$ rather than the denoised image $\tilde{I_t}$ . The reason for not using $\tilde{I_t}$ is that it is difficult to preserve the details of the input image when image enhancement is performed by repeatedly denoising the image. Thus, the mentioned $I_T$ is the result of the improvement using the fine-grained curve parameter map, and since no direct denoising is performed in the iteration process, $I_T$ contains some noise. Lastly, to obtain a noise-free image $\tilde{I}_T$ , we apply the Noise Removal module to $I_T$ as described in L196-197.
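
The data flow described in this paragraph can be summarized compactly. In the block below, $\mathrm{NR}$ and $\mathrm{CPE}$ are shorthand introduced here for the Noise Removal and Curve Parameter Estimation modules; they are not the paper's notation, and the sketch only restates Eqs.(7)-(9) as paraphrased above, reading the output of the ODE function as the time derivative of the state.

$$
\tilde{I}_t = \mathrm{NR}(I_t), \qquad
\mathcal{A}_t = \mathrm{CPE}(\tilde{I}_t), \qquad
\frac{dI_t}{dt} = \mathcal{A}_t \otimes I_t \otimes (1 - I_t), \qquad
\tilde{I}_T = \mathrm{NR}(I_T)
$$

The denoised image only informs the curve parameters, while the state that is integrated remains the undenoised $I_t$; this is why $I_T$ still carries noise until the final application of the Noise Removal module.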

W3: Experimental setup

The reason for using a single training dataset for each task is to ensure a fair comparison with previous methods that include both supervised and unsupervised approaches.
Since most supervised methods’ official weight parameters are trained on a single dataset, we adopted the same approach.
We agree that training with diverse datasets is beneficial and provide results from training on all LOL and SICE datasets in R-Table 7. The overall performance is comparable to that achieved with a single training dataset.

W5: Losses

The four losses, excluding $\mathcal{L}_{noise}$ , were used in the same way as in the previous curve-adjustment methods. We acknowledge that some models do not account for noise, and although we mentioned the case without the noise module in Table 4, we kindly present a comparison again in R-Table 3. As the image becomes brighter, noise becomes more amplified, leading to a slightly lower SSIM. However, our method still outperforms other methods.

W6: Noise Module, Color Casts

Our noise removal module consists of very few parameters (model size: 0.085 MB, 22,275 parameters). While this may result in lower performance compared to existing denoising models, it learns effectively during the image enhancement process in CLODE.
For more explanation of the denoiser, we recommend referring to global rebuttal G2. Additionally, compared to low-light enhancement models that include noise removal ([10, 13]), the unsupervised denoising performance is competitive (R-Table 3).
CLODE enhances the image based on the color statistics of the input image in an unsupervised manner, which can lead to color casts. To elaborate, while curve adjustment methods preserve the details of the input image and enhance it toward a natural appearance, the color loss follows the Gray-World hypothesis (L224), leading to these issues. We use the same color constancy loss as the previous curve-adjustment method [6]. Nevertheless, in comparison to existing methods, CLODE exhibits superior performance in terms of naturalness image quality metrics and color-matching histogram loss in R-Table 4.
For further explanation on color casts, please refer to our response to 9pkv’s W1, Q1. We apologize for the lack of detailed explanation due to space limitations but assure you that we will address any additional questions thoroughly during the discussion period.

Review

Rating: 4, Confidence: 5, Date: 2024-07-08

This manuscript introduces CLODE, which learns low-light image enhancement using neural ordinary differential equations (NODE). The key innovation lies in formulating the higher-order curve estimation problem as a NODE problem. Experimental results show that the proposed approach outperforms state-of-the-art unsupervised counterparts across several benchmarks.

Strengths

The paper is easy to follow.
Using neural ordinary differential equations to address the iterative curve-adjustment update process shows better performance.

Weaknesses

The novelty is limited, and the technical contribution is incremental. Apart from formulating the curve estimation as a NODE problem, the paper lacks innovation, which is the main reason why I gave this paper a lower score.
Stronger supervised baselines should be included for reference. Comparing against only a few relatively weak baselines can give a misleading picture of the current gap between supervised and unsupervised methods.
Additionally, the authors should report some perceptual metrics for better comparison.
The writing and the presentation need improvement.

Questions

Please refer to the Weaknesses section.

Limitations

The authors have discussed the limitations of the paper.

Author Response

2024-08-06

Thank you for providing a thoughtful review. For enhancing our paper, we have diligently reviewed the weaknesses and questions raised regarding our paper and have prepared additional experiments and answers.

W1: Novelty

We would like to respectfully defend the novelty of our approach. In contrast to previous curve-adjustment methods that use discrete updates for gradual image enhancement, CLODE addresses the limitations of existing methods by reformulating them as a NODE, which makes it possible to solve for the optimal solution in continuous space.
In addition, as Reviewer UK8A noted, our motivation is strong and solid, and we address existing shortcomings in a straightforward and effective manner.
By reorganizing the curve-adjustment equation as a NODE, the proposed method trains well and handles images with various exposures at inference time, which is a limitation of existing curve-adjustment methods. In addition, we designed a suitable network consisting of a Noise Removal module and a Curve Parameter Estimation module for this purpose (R-fig. (a)), as shown in Section 3.2.1 (ODE function), and showed excellent performance compared with existing unsupervised methods.
Furthermore, the user controllability that exploits the properties of NODE, shown in Fig.3 of the main paper, extends the potential of the model. User controllability is among the most effective and practical aspects of applying NODE, as it allows brightness outcomes to be customized to individual preferences.
In the context of unsupervised low-light image enhancement, we firmly believe that the following constitute meaningful contributions: the first attempt to transport the discrete curve-adjustment problem into continuous space with neural ordinary differential equations, the design of an adequate architecture for the NODE method, and the resulting high visual performance and user controllability. Accordingly, we kindly ask for a reconsideration of the novelty of our work.
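
The contrast between discrete and continuous curve adjustment drawn in W1 can be illustrated on a single pixel intensity. The sketch below assumes the discrete update takes the familiar curve-adjustment form $I_{n+1} = I_n + \alpha I_n (1 - I_n)$ and, as a simplification, holds the curve parameter $\alpha$ constant; in CLODE the parameter map is predicted by a network at every state. The function names are illustrative.

```python
import math


def discrete_curve(i0, alpha, n_iters):
    """Fixed-count discrete update: I_{n+1} = I_n + alpha * I_n * (1 - I_n)."""
    i = i0
    for _ in range(n_iters):
        i = i + alpha * i * (1.0 - i)
    return i


def continuous_curve(i0, alpha, t):
    """Closed-form solution of dI/dt = alpha * I * (1 - I) for constant alpha."""
    growth = math.exp(alpha * t)
    return i0 * growth / (1.0 - i0 + i0 * growth)
```

The discrete version can only stop after a whole number of updates, whereas the continuous state is defined for any real time such as `t = 1.5`, which is what makes both an adaptive stopping point and fractional user adjustments possible.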

W2: References (strong supervised methods)

In the main paper, we already include Retinexformer, a transformer-based state-of-the-art method and the second-best performer in the NTIRE 2024 low-light challenge [1*], as a strong supervised comparison method.
We thank Reviewer 9uXV for raising this concern, and we will add the strong supervised diffusion-based baselines GSAD [2*] and PyDiff [3*]. Even when compared to diffusion-based supervised methods, CLODE demonstrates competitive performance. We will include the performance of these two methods in our final revised version.

| Method | LSRW/LOL (PSNR) | LSRW/LOL (PSNR, mean GT) |
|:---|:---:|:---:|
| GSAD (diffusion-based) | 17.37/23.01 | 19.51/27.60 |
| PyDiff (diffusion-based) | 17.00/20.49 | 20.11/26.99 |
| CLODE | 17.28/19.61 | 19.61/23.16 |

W3: Perceptual metrics

In response to this question, we provided perceptual metrics (NIQE, BRISQUE, PI, Entropy) in Table 2 (SICE dataset) and Table 7 (LOL and LSRW datasets) in the Appendix.
Additionally, following the reviewer’s suggestion, we provide LPIPS (Learned Perceptual Image Patch Similarity) [4*] results, where lower is better. CLODE achieves the best average LPIPS among the compared methods.

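| Dataset | URetinexNet | RetinexFormer | SCI | RUAS | ZeroDCE | NightEnhancement | PairLIE | CLODE |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| LSRW | 0.308 | 0.315 | 0.398 | 0.469 | 0.317 | 0.583 | 0.342 | 0.331 |
| LOL | 0.121 | 0.131 | 0.358 | 0.270 | 0.335 | 0.241 | 0.248 | 0.263 |
| SICE | 0.264 | 0.263 | 0.486 | 0.608 | 0.239 | 0.360 | 0.305 | 0.235 |
| MSEC | 0.393 | 0.362 | 0.396 | 0.668 | 0.329 | 0.462 | 0.431 | 0.223 |
| Average | 0.272 | 0.268 | 0.410 | 0.504 | 0.305 | 0.412 | 0.332 | 0.263 |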

W4: The writing and the presentation need improvement.

Lastly, as Reviewer 9uXV advised, we are willing to improve writing and presentation of the paper based on comments. In the final revision, we will incorporate the reviewer’s suggestions and additional results and improve the quality of the manuscript.

References

[1*] Liu, Xiaoning, et al. "NTIRE 2024 challenge on low light image enhancement: Methods and results." In CVPRW, 2024.

[2*] Hou, Jinhui, et al. "Global structure-aware diffusion process for low-light image enhancement." In NeurIPS, 2023.

[3*] Zhou, Dewei, Zongxin Yang, and Yi Yang. "Pyramid diffusion models for low-light image enhancement." In IJCAI, 2023.

[4*] Zhang, Richard, et al. "The unreasonable effectiveness of deep features as a perceptual metric." In CVPR, 2018.

Review

Rating: 6, Confidence: 4, Date: 2024-07-13

This paper proposes an ODE-based method to tackle low-light image enhancement problems. The paper is motivated by the observation that conventional discrete iterative approaches use fixed update steps, which may not only miss the optimal solution but also fail to guarantee convergence. Hence, the proposed method takes the iterative curve-adjustment approach and reformulates it as solving neural ordinary differential equations. The method is used with unsupervised learning to estimate the higher-order curve parameters and reconstruct image structure details. Comprehensive experiments demonstrate that the proposed method outperforms the baseline methods on the LOL and SICE benchmark datasets.

Strengths

This paper proposes a novel method that integrates neural networks into an ODE optimization framework. The neural network plays the role of an adaptive set of updatable parameters.
Comprehensive experiments show that the proposed method outperforms the baseline methods in the task of low-light image enhancement.
The motivation of this paper is strong and solid. It is inspired by the drawbacks of existing methods, and the proposed method tackles those problems directly.
This paper discusses the limitations of the proposed method.

Weaknesses

Based on the visual comparison in Figure 4, the proposed method tends to produce over-exposed areas for highlight regions.
The processing speed of the proposed method is one of the limitations.

Questions

This paper is a novel low-light image enhancement work with solid equation derivation to support its objective. The authors are expected to address the points raised in the Weaknesses section during the rebuttal period.

Limitations

The limitation is included in the main manuscript. The processing speed (inference speed) of the proposed method is slow compared to dedicated supervised DL methods.

Author Response

2024-08-06

We are glad to hear that you found the paper to be strong and solid. We have diligently examined your comments and concerns as a reviewer, and have prepared responses addressing the raised concerns.

W1: Highlight region

We understand the reviewer’s concern about Fig. 4. As a reminder, CLODE is trained in an unsupervised manner without ground-truth images, and its loss functions aim to bring regions of the image to the desired exposure value (e.g., $E=0.6$) while preserving spatial and color constancy based on the input image statistics.
Although we employ a spatial consistency loss and a color constancy loss, our method relies solely on the input image statistics, so the inferred color may appear grayish in enhanced images due to the Gray-World hypothesis (L224). This is a significant issue that needs to be addressed in unsupervised methods. We spent considerable effort searching for a more appropriate color loss for unsupervised methods, but could not find one better suited to settings that use only a single input image.
However, we would like to emphasize that our model is more robust to over-exposed images compared to other models. The third row in Fig.4 compares the results of each model for a given over-exposed image, and our model is the most robust to the over-exposure situation. The tonal adjustment result is closer to the ground truth than the other methods.
Over-exposed images partially contain pixel values outside the color range, which do not provide sufficient information for image enhancement. As can be seen in Fig.4, this issue is also present in supervised methods. However, we would like to underline that CLODE can provide visually reasonable results even for over-exposed images. Please also have a look at Fig.15 in the Appendix for additional results. Our method shows visually impressive results on various exposure conditions. Additionally, resolving issues with over-exposed images will require the use of a generative model, which we plan to explore in future research.

W2: Processing speed

In Sec. 5 (Limitations), we reported that inference is slower than for existing methods, which is the concern Reviewer UK8A raised. As Reviewer UK8A mentioned, the inference speed of CLODE is comparable to that of RetinexFormer. However, in Table 5 and L332, we also reported CLODE-S, a compact version of the proposed method. CLODE-S consists of 0.0004M parameters and takes 0.005 seconds per inference, and its architectural details are shown in R-Fig.1(b). In addition, we leave a distillation approach for shortening inference time as future work.
Another possibility is to apply RectifiedFlow [1*] in an unsupervised manner, which transforms the solution paths of Neural ODEs into straight lines, facilitating faster estimation of the Neural ODE system. We can initially assume that the optimal solution found by CLODE aligns with the expected optimal solution in [1*], and then apply CLODE specifically to [1*].
Given the potential to implement advanced Flow Matching methods [1*, 2*], we believe CLODE holds great promise and could achieve rapid inference speeds.

References

[1*] Liu, Xingchao, Chengyue Gong, and Qiang Liu. "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow." In ICLR, 2023.

[2*] Tong, A., Malkin, N., Huguet, G., Zhang, Y., Rector-Brooks, J., Fatras, K., ... & Bengio, Y. "Improving and generalizing flow-based generative models with minibatch optimal transport." In Transactions on Machine Learning Research, 2024

Comment: Comment after rebuttal

2024-08-14

Thank you for the authors' feedback. The weakness concern is addressed in the rebuttal.

2024-08-14

Thank you for your response. It was a pleasure to write the rebuttal, and we appreciate your continued support of our work. Once again, thank you for your efforts, and we will incorporate the feedback into the final version.

Review

Rating: 5, Confidence: 5, Date: 2024-07-15

This paper proposes a Neural ODE method for curve-adjustment-based low-light image enhancement that achieves better results than fixed discrete-step methods, which are often sub-optimal. Specifically, the proposed method reformulates curve adjustment from its discrete form into an ODE problem by introducing a continuous state. An ODE solver is adopted to find the optimal step for the enhancement. Additionally, a simple denoiser and a curve parameter estimation module are proposed for noise removal and parameter estimation, respectively. Extensive experiments are conducted to show the effectiveness of the proposed method.

Strengths

Turns the discrete curve-adjustment method into a NODE problem, benefiting from the optimization to search for the optimal step.
User control support during inference is good for the application of the proposed method.
The proposed method seems to have good performance over other competitors.

Weaknesses

The proposed method suffers from color casts, which are obvious in almost all qualitative results, even with a color constraint in the loss functions.
The proposed method introduces its own denoiser and curve parameter estimator within the NODE framework; however, generalizing the approach to existing curve-adjustment-based methods seems to be a more attractive solution.
The denoiser seems to be weak, since so much noise is left in the qualitative results.

Questions

What leads to the color cast, even with color constraints applied? It also seems that as the number of steps increases, the color shift becomes stronger. The color shift is also evident in comparisons with the user-controlled results. An evaluation of color consistency is necessary for this issue.
Is there any potential to generalize such a method to previous methods, for example by turning previous methods into NODEs and finding optimal steps for them?
In Table 3 in the main paper and Figure 9 in the supplementary material, the comparisons between discrete methods and the proposed method show a clear advantage of the proposed method over previous methods. But what are the statistics for a dataset or some test examples, such as the most common optimal steps in extreme cases? They may lead to interesting findings.
Why is the denoiser so weak? Can it be improved by using existing modules?

Limitations

Yes, it is discussed in the paper.

Author Response

2024-08-06

Thank you very much for thoroughly reviewing our paper. We appreciate your feedback. To address the concerns you raised, we are providing several experimental results and our perspectives.

W1, Q1: Color cast

[What leads to the color cast?]

CLODE enhances the image based on the color statistics of the input image in an unsupervised manner, which can lead to color casts. To elaborate, while curve adjustment methods preserve image details and enhance naturalness, a color loss that follows the Gray-World hypothesis (L224) can lead to color cast issues as the exposure level changes.
CLODE uses the same color constancy loss, Eq. (12), as the previous zero-reference method [6]. Nevertheless, compared to existing methods, CLODE with the NODE formulation exhibits superior performance in terms of naturalness image quality metrics and color-matching histogram loss (R-Table 4).
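
To show why a Gray-World prior can produce casts, here is a minimal sketch of a color constancy loss of this type. It assumes the Zero-DCE-style formulation, where the loss sums the squared differences between the mean intensities of each pair of color channels; whether Eq. (12) matches this exactly is an assumption, and the function name and the list-of-tuples image representation are illustrative.

```python
from itertools import combinations


def color_constancy_loss(pixels):
    """Gray-World style loss: squared differences between per-channel means.

    `pixels` is a non-empty list of (r, g, b) tuples with values in [0, 1].
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    # Sum over the channel pairs (R, G), (R, B), (G, B).
    return sum((a - b) ** 2 for a, b in combinations(means, 2))
```

The loss is zero whenever the three channel means agree, regardless of the scene. For a scene whose true average color is not gray, minimizing it pulls the output toward a neutral average, which is one way the grayish casts discussed above can arise.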

[Color-matching histogram loss function]

To address Reviewer 9pkv’s concern about color casts more precisely, we prepared additional comparisons using the color-matching histogram loss from HistoGAN [1*], which was designed to control the colors of an input image by matching its color histogram to that of a target image. We used this loss to measure the degree of color cast between the ground-truth images and the output images.
In R-Table 4, we compare non-reference metrics and color-matching histogram loss with existing unsupervised methods. On the color-matching histogram loss, CLODE+ and CLODE ranked first and second on the SICE dataset, and first and third on the LOL dataset. This indicates that CLODE exhibits fewer color casts than existing unsupervised methods, while still achieving high-quality enhancement results.
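
A histogram-matching measure of this kind can be sketched as a Hellinger distance between two normalized histograms, which is, to our understanding, the form of HistoGAN's histogram loss. The sketch is a simplification: HistoGAN computes the distance on RGB-uv chroma histograms, whereas the code below uses plain one-dimensional intensity histograms, and both function names are illustrative.

```python
import math


def normalized_histogram(values, bins=8):
    """Histogram of values in [0, 1], normalized so the bins sum to 1."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * bins
    for v in values:
        idx = max(0, min(int(v * bins), bins - 1))  # clamp out-of-range values
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def hellinger_distance(h1, h2):
    """Hellinger distance between two normalized histograms, in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(h1, h2))
    return math.sqrt(squared) / math.sqrt(2.0)
```

A distance of 0 means the two histograms coincide and 1 means they do not overlap at all, so a lower value against the ground-truth histogram indicates a weaker color cast.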

W2, Q2: Generalizing the method

As suggested, CLODE can be generalized to existing curve adjustment methods. While it is possible to apply NODE to existing architectures, accurate curve estimation is crucial for high quality. Therefore, we developed a new compact and efficient architecture that works well with NODE and estimates fine curves, as depicted in R-Fig.1 (a).
Regarding the conversion of previous methods to NODE, our experiments with the existing network [6] demonstrated that while applying NODE is feasible, it is not effective. ([6]_large* is a version of the network from [6] with increased parameters for performance improvement.)

| Method | #params (M) | PSNR/SSIM |
|:---|:---:|:---:|
| [6]+NODE | 0.0794 | 17.17/0.571 |
| [6]_large*+NODE | 0.3593 | 19.26/0.637 |
| CLODE | 0.2167 | 19.61/0.718 |

We can also set the maximum allowable number of steps for CLODE, but rather than fixing a single step count, letting the solver choose the optimal steps is more effective for diverse image datasets. (Please refer to R-Fig.2 and global rebuttal G1 for further explanation of CLODE.)
Thus, we developed our own network (CLODE) using an adaptive solver whose optimal steps vary for each image.
For additional explanations regarding the optimal step, please refer to Q3 below.

Q3: Optimal step statistics

We believe this concern is very constructive. As mentioned in L465, the maximum allowable number of steps for the ODE solver is empirically set to 30 for speed. The ODE solver terminates early if it finds the optimal solution within the maximum number of steps.
Furthermore, to provide the statistical analysis suggested by 9pkv, we calculated the average number of ODE solver steps across the SICE, BSD100, DIV2K, and LOL datasets. The SICE dataset comprises five to seven images per sample, with exposure levels ranging from under-exposed to over-exposed. BSD100 and DIV2K were used to provide additional statistics for normal-exposed conditions. The results for each exposure condition are in R-Table 5.
The number of calculation steps increases in the order of under-exposed, over-exposed, and normal-exposed images. This is because improving a normal-exposed image is considered a stiff problem. For normal-exposed images, where minimal improvement is required, the dynamics are stiffer than for other images. Specifically, for stiff ODE problems, the solver is forced to take small steps even in regions where the solution curve is smooth, and these smaller step sizes may require more evaluation steps.
For a better understanding, we plotted R-Fig.3. At inference time, CLODE aims to find the optimal solution by minimizing the loss functions, so in R-Fig.3 the y-axis represents the non-reference loss value, the x-axis represents time, and each point indicates a step.
The dopri5 solver we primarily use is a nonstiff solver. Although a stiff solver (e.g., ode15s) could be employed to reduce the number of inference steps for normal-exposed images, it is not efficient for improving under- or over-exposed images, which are close to nonstiff problems. Thanks to this valuable suggestion, future research may explore ODE solver algorithms that adapt dynamically to input image conditions.
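
The link between stiffness and step count can be illustrated on a toy scalar ODE. The sketch below is not the CLODE dynamics and not dopri5: it assumes a simple embedded Euler/Heun pair (orders 1 and 2) in place of dopri5's 5(4) pair, uses the common `atol + rtol * |y|` acceptance rule, and integrates a hypothetical relaxation problem whose rate `k` controls stiffness. All names and constants are illustrative.

```python
import math

def adaptive_heun_euler(f, y0, t0, t1, atol=1e-6, rtol=1e-3, h0=0.1, max_attempts=200000):
    """Integrate dy/dt = f(t, y) with an embedded Euler(1)/Heun(2) pair.

    Returns (y_final, accepted_steps, rejected_steps).
    """
    t, y, h = t0, y0, h0
    accepted = rejected = 0
    while t < t1 and accepted + rejected < max_attempts:
        h = min(h, t1 - t)                  # do not step past the end time
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1                  # first-order (Euler) estimate
        y_high = y + 0.5 * h * (k1 + k2)    # second-order (Heun) estimate
        err = abs(y_high - y_low)           # local error estimate
        tol = atol + rtol * max(abs(y), abs(y_high))
        if err <= tol:
            t, y = t + h, y_high            # accept the step
            accepted += 1
        else:
            rejected += 1                   # reject and retry with a smaller h
        # Step-size controller for a first-order error estimate, with safety factor 0.9.
        scale = 0.9 * (tol / err) ** 0.5 if err > 0 else 2.0
        h *= min(2.0, max(0.2, scale))
    return y, accepted, rejected

def make_rhs(k):
    # Toy dynamics (an assumption): relax toward the smooth target 2 + cos(t) at rate k.
    return lambda t, y: -k * (y - (2.0 + math.cos(t)))

for k in (1.0, 1000.0):
    y_end, accepted, rejected = adaptive_heun_euler(make_rhs(k), 3.0, 0.0, 5.0)
    print(f"k={k:g}: accepted={accepted}, rejected={rejected}, y(5)={y_end:.4f}")
```

Both runs follow an equally smooth solution, yet the large-`k` case needs far more steps because the explicit method's stability limit, not accuracy, dictates the step size. This is the behavior a stiff solver is meant to avoid.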

W3, Q4: Weakness of denoiser

First of all, as mentioned in Q4, our method can be improved by adopting existing denoising modules. However, considering the inference speed of the model, we designed a compact denoiser. Regarding 9pkv's concern about the effectiveness of the denoiser, we would like to point to Table 4 in the main paper. Comparing cases (2c) and (2d), the Noise Removal module (denoiser) raises PSNR by 0.94 dB and SSIM by 0.141. To address the concern clearly, we prepared experiments covering diverse aspects (including the use of existing denoisers) in the global rebuttal. Please refer to G2 in the global rebuttal.

Reference

[1*] Mahmoud Afifi, Marcus A. Brubaker, and Michael S. Brown. "HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms." In CVPR, 2021.

2024-08-13

Thank you for your feedback. My concerns are well addressed, and I will raise my rating to borderline accept.

2024-08-13

Thank you for your response and constructive questions. We will include the rebuttal content in the final version. Once again, thank you for taking the time and raising our score.

Author Response

2024-08-06

To the reviewers.

First and foremost, we appreciate the reviewers' efforts. We have prepared responses to all comments, along with figures and tables in the attached PDF that show additional experiments to enhance our explanations. Below are brief explanations for each figure and table:

There are 4 figures and 7 tables in the attached PDF.

R-Figure:

1. UK8A W2, Qjwa Q2, 9pkv W2,Q2
- (a): Architectural details of two modules in CLODE.
- (b): Overall architecture of CLODE-S, which consists of a 2-layer network.
2. Z6DS Q1, W7: Inference trajectories of (a1) to (e1) and CLODE in Table 3 of the main paper.
3. 9pkv Q3: Plot of loss values for under-exposed and normal-exposed input images over time steps.
4. 9pkv W3, Q4: Examples of employing existing denoising modules.

R-Table:

1. 9pkv W3, Q4: Results of existing denoising methods.
2. 9pkv W3, Q4: Results of denoising ablations.
3. Z6DS W5: Comparison results without using $L_{noise}$ .
4. 9pkv W1, Q1: Results of color-matching histogram loss.
5. 9pkv Q3: Results of step statistics.
6. Z6DS W1, W4: User study of CLODE+.
7. Z6DS W3: Results of CLODE trained on LOL + SICE.

Additionally, we have prepared global rebuttals for Z6DS Q1, W7 and 9pkv W3, Q4: G1 (Explanation of CLODE) and G2 (Weakness of Denoiser).

G1. Explanation of CLODE

As mentioned in [1*], ODE reformulation provides benefits such as continuous space estimation, memory efficiency, and accurate problem-solving with ODE solvers. We’ve attached R-Fig.2 for additional analysis.
The top of R-Fig.2 shows discrete trajectories of models (a1) to (e1) from Table 3, while the bottom displays CLODE trajectories with Euler and dopri5 solvers. (Top) Discrete methods (a1) to (e1) enhance images but don’t achieve optimal exposure. (Bottom) CLODE (dopri5) provides more realistic image enhancement in continuous space.
CLODE (dopri5) uses an early-stop mechanism. It tracks the error at each state and terminates when the error is within the allowable tolerance. For dopri5, $k$-th-order solutions ($k$=5) are used to calculate the error tolerance ($\Gamma_t$) as follows: $\Gamma_t = \mathrm{atol} + \mathrm{rtol} \times \mathrm{norm}(|O_t^k - O_t^{k-1}|)$ (Eq.(29)),
where the $k$-th-order solution at time $t$ is denoted as $O_t^k$ and the $(k-1)$-th-order solution is denoted as $O_t^{k-1}$.
If $|O_t^k - O_t^{k-1}| > \Gamma_t$, the step size is re-adjusted. If it is within $\Gamma_t$, the solution is deemed optimal, and the process terminates.
ODE solvers are designed to find optimal solutions through iterative steps. R-Fig.2 shows that discrete methods can’t guarantee optimal solutions, which led us to develop the NODE method for continuous ODE problems. Thus, improvements are due more to NODE reformulation than to iteration count. Table 3 shows that NODE outperforms simple discrete repetition. For example, using the Euler method in 30 steps achieves better performance than method (e1).
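
To make the contrast between discrete repetition and ODE solving concrete, the sketch below compares fixed-step Euler integration against a closed-form solution. It assumes the quadratic curve form $dI/dt = a \cdot I(1 - I)$ with a constant scalar parameter $a$, which is a simplification: in CLODE the curve parameter map is predicted by a network at each state, and the values of `a`, `i0`, and `t_end` here are hypothetical. Under this assumption, Euler with unit step size reproduces the discrete update $I_{n+1} = I_n + a I_n (1 - I_n)$.

```python
import math

def curve_rhs(intensity, a):
    """Right-hand side dI/dt = a * I * (1 - I) (assumed quadratic curve form)."""
    return a * intensity * (1.0 - intensity)

def euler_enhance(i0, a, t_end, n_steps):
    """Fixed-step Euler; with t_end / n_steps == 1 this is the discrete curve iteration."""
    h = t_end / n_steps
    i = i0
    for _ in range(n_steps):
        i += h * curve_rhs(i, a)
    return i

def exact_enhance(i0, a, t_end):
    """Closed-form solution of the logistic ODE for constant a."""
    e = math.exp(a * t_end)
    return i0 * e / (1.0 - i0 + i0 * e)

i0, a, t_end = 0.1, 0.5, 8.0  # dark pixel, hypothetical curve parameter and horizon
reference = exact_enhance(i0, a, t_end)
for n_steps in (8, 30, 240):
    approx = euler_enhance(i0, a, t_end, n_steps)
    print(f"steps={n_steps:3d}  I(T)={approx:.5f}  |error|={abs(approx - reference):.5f}")
```

With 8 unit steps the iteration only approximates the continuous trajectory, and the gap shrinks as the step count grows. An adaptive solver chooses that refinement automatically instead of relying on a fixed iteration count.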
As shown in Fig.9 and R-Fig.2, many steps are taken initially because adaptive solvers like dopri5 need an initial step-size estimate based on higher-order solutions to ensure accuracy.
Dopri5 uses higher-order solutions to ensure the accuracy of the optimal solution, as seen in Eq.(29), requiring at least 6 evaluations per state. Thus, it uses short intervals initially to store evaluations. We chose dopri5 as CLODE's default solver for its stability and reliability across platforms like MATLAB.

G2. Weakness of Denoiser

[Effectiveness of denoiser]

Since NODEs are trained by simulation, a heavier denoiser (Noise Removal module) takes more time to train. Therefore, we employ a lightweight 3-layer network (0.085MB) as the denoiser in CLODE.
The Noise Removal module has few parameters but accomplishes an important task. It learns to denoise the image at each step (Eq.(7)), thus learning jointly with the image enhancement process and helping to predict fine-grained curve parameter maps at each step.
We train denoising during the image enhancement process so that the noise module is trained effectively.
To demonstrate this effect, we compared three different scenarios (Pre-denoising, CLODE, Post-denoising). For clarity, "Pre-denoising" trains the denoiser only on the input image $I_0$ , while "Post-denoising" trains the denoiser only on the enhanced image $I_T$ .
CLODE outperforms both alternatives, which implies that it is the best of the three scenarios (R-Table 2).
The reason for these results is that low-light images have low pixel values, which provide insufficient information for denoising, whereas after enhancement the original noise becomes entangled with the image content. Thus, we believe that continuous denoising is crucial for effective low-light correction because noise is amplified with continuous exposure enhancement.

[Using existing methods]

We also observed performance improvements when using existing denoisers to obtain $\tilde{I}_T$ .
For these experiments, we adopted DnCNN [2*] and Restormer [3*] as denoisers. In quantitative terms (R-Table 1), there is a 0.66 gain in SSIM with [3*] and a 0.1 dB gain in PSNR with [2*], and we also achieve visually outstanding results (R-Fig. 4).
Thanks to 9pkv's suggestion, we have demonstrated that using an existing denoiser or increasing the size of the module can also be beneficial.

References

[1*] Chen, Ricky TQ, et al. "Neural ordinary differential equations." In NeurIPS, 2018.

[2*] Zhang, Kai, et al. "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising." In IEEE Transactions on Image Processing, 2017.

[3*] Zamir, Syed Waqas, et al. "Restormer: Efficient transformer for high-resolution image restoration." In CVPR, 2022.

2024-08-13

To Area Chair and reviewers

We are aware of the considerable effort you are putting into the conference.

Initially, we were hesitant to write this message, but with only one day left in the discussion period, we felt compelled to do so.

In preparing the rebuttal, we have invested significant time and effort into addressing the reviewers’ questions.

Despite having prepared the majority of the rebuttals for the four reviewers—especially for Z6DS and 9pkv, who posed numerous questions—we have yet to receive any response.

In addition, if further information or clarification is needed, please do not hesitate to contact the authors.

Thank you for your time and consideration.

Final Decision: Reject

2024-09-25

Thank you for taking the time to submit a rebuttal. After careful consideration and discussion, we regret to inform you that the paper has not been accepted. The paper proposes to formulate the iterative curve-adjustment update process in low-light enhancement as solving a Neural ODE. While the reviewers acknowledge the good results, they still have concerns. For example, the technical novelty is not viewed as substantial (by Reviewer 9uXV). The writing also needs to be improved for clarity. Given these concerns, the ACs have made the decision not to accept this paper this time.