PaperHub
Score: 6.4 / 10
Poster · 4 reviewers (ratings: 4 / 4 / 4 / 4; min 4, max 4, std 0.0)
Confidence: 3.8
Novelty: 3.0 · Quality: 2.8 · Clarity: 2.8 · Significance: 2.5
NeurIPS 2025

Luminance-Aware Statistical Quantization: Unsupervised Hierarchical Learning for Illumination Enhancement

Submitted: 2025-04-21 · Updated: 2025-10-29

Abstract

Keywords
Low-light image enhancement · Luminance-aware statistical quantization · Hierarchical luminance adaptation · Diffusion model

Reviews and Discussion

Official Review
Rating: 4

The paper presents Luminance-Aware Statistical Quantization (LASQ), an unsupervised framework for low-light image enhancement that treats illumination correction as a statistical sampling task rather than a direct pixel mapping. LASQ first shows, through empirical analysis, that natural low-to-normal luminance transitions follow power-law density curves; it then builds hierarchical “luminance adaptation operators” by sampling these curves with an adaptive Markov chain Monte Carlo strategy. These operators guide the forward path of a diffusion model, letting the network learn global-to-local corrections without paired supervision, while an optional adversarial head can exploit normal-light references when they exist.
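As a rough illustration of the statistical-sampling view described above, a single power-law luminance adaptation operator might look like the following sketch; the function name, the `delta` offset, and all constants are our assumptions, not the paper's implementation:

```python
import numpy as np

def luminance_operator(lum, gamma, delta=1e-3):
    """Hypothetical power-law adaptation operator on normalized luminance.
    `delta` is a small stabilizing offset; both names are illustrative."""
    return np.clip((lum + delta) ** gamma, 0.0, 1.0)

low = np.linspace(0.0, 0.2, 5)           # dark luminances in [0, 1]
enhanced = luminance_operator(low, 0.4)  # exponent < 1 lifts shadows
assert np.all(enhanced >= low)           # gamma < 1 brightens dark values
```

In LASQ's framing, many such operators at different granularities would be sampled rather than fixed, which is what the MCMC strategy below is for.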

Strengths and Weaknesses

Strengths:

  1. The method is rooted in a measurable physical regularity, the power-law distribution of luminance, which gives clearer interpretability than black-box intensity mappings.

  2. Hierarchical sampling plus diffusion allows the same network to operate with or without reference images, improving generalization across sensors and scenes.

  3. Quantitative results show state-of-the-art no-reference scores and competitive full-reference scores, backed by qualitative comparisons that highlight reduced over-exposure and noise.

Weaknesses:

  1. The diffusion process uses 1,000 steps, so inference may be slower and costlier than baselines.

  2. Several hyper-parameters (α, η, δ, number of layers) are hand-tuned; stability under different cameras or unseen lighting is not fully explored.

  3. The paper claims “physics-aware” operation, yet provides limited analysis of failure modes under extreme lighting, motion blur, or raw sensor data.

  4. The ablation experiments are insufficient.

Questions

  1. The framework relies on accurate local luminance statistics; heavy sensor noise or clipping might break the power-law assumption. Have the authors considered this issue?

  2. Can the authors provide a more thorough ablation study to verify the impact of the proposed components and the choice of hyperparameters on the results?

  3. The optional adversarial head is said to improve texture fidelity when paired references are available; how much benefit does it add in strictly unpaired training, and does it introduce any instability during optimization?

  4. The paper shows that LASQ slightly lags supervised diffusion models on PSNR for the LOL dataset; have you identified specific error patterns that explain this gap, and how might they be addressed?

  5. Could you provide failure examples involving extreme dynamic range, motion blur, or LED flicker, and discuss whether the current framework can be adapted to handle such cases?

I will adjust my score as appropriate according to the authors' reply.

Limitations

Yes

Final Justification

I have read the authors’ response as well as the concerns raised by most of the reviewers. The explanations provided have given me greater confidence in the effectiveness of the proposed method, and I have adjusted my score accordingly. I also recommend that the authors further improve the figures and descriptions in future versions, and use vector graphics to enhance readability.

Formatting Issues

No

Author Response

We sincerely thank the reviewer for their valuable feedback and for giving us the opportunity to further explain and improve our work. We address each of your comments in detail below, and we hope that our responses resolve any questions or concerns you may have.

Q1: Robustness Beyond the Power-Law Assumption

We would like to clarify that our approach employs a coarse-to-fine MCMC sampling strategy to fit the luminance adaptation space, rather than rigidly enforcing the power-law assumption. As discussed in Appendix A (paragraph 4), we reinterpret the strict power-law model as a relaxed prior and leverage MCMC sampling—integrated with the forward process of a diffusion model—to explore the distribution space.

This design enables LASQ to flexibly capture a wider family of luminance mappings that are only loosely guided by the physics-inspired prior, rather than being constrained to a fixed analytical form. As a result, our method does not critically depend on the power-law assumption itself. The MCMC-based inference operates at a structured level, incorporating spatial context, which allows it to remain robust under spatially inconsistent or extreme lighting conditions—well beyond the capabilities of per-pixel or strictly model-driven approaches.
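To make the relaxed-prior MCMC idea concrete, here is a minimal random-walk Metropolis sketch that samples a single power-law exponent under a loose prior. The prior, the likelihood, and every constant here are illustrative assumptions, not the authors' actual sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prior(gamma):
    # Relaxed prior over the power-law exponent: positive, favoring small
    # values (an illustrative stand-in for the paper's hierarchical prior).
    return -gamma if gamma > 0 else -np.inf

def log_likelihood(gamma, low, target_mean):
    # Score how well the curve low**gamma moves mean luminance to the target.
    return -((low ** gamma).mean() - target_mean) ** 2 / 0.01

def metropolis(low, target_mean, steps=2000, step_size=0.05):
    """Random-walk Metropolis over the exponent of one adaptation operator."""
    gamma = 1.0  # identity mapping as the starting state
    for _ in range(steps):
        prop = gamma + step_size * rng.normal()
        log_accept = (log_prior(prop) + log_likelihood(prop, low, target_mean)
                      - log_prior(gamma) - log_likelihood(gamma, low, target_mean))
        if np.log(rng.uniform()) < log_accept:
            gamma = prop
    return gamma

low = rng.uniform(0.05, 0.3, size=1000)  # synthetic low-light luminances
gamma_hat = metropolis(low, target_mean=0.5)
```

Because the prior only loosely shapes the proposal, the chain can settle on exponents the strict power-law model would not prescribe, which matches the "relaxed prior" reading above.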

Challenging scenarios such as extreme dynamic range and heavy sensor noise, as you mentioned, will be specifically evaluated in the following rebuttal. The corresponding quantitative results further demonstrate the robustness of LASQ under these conditions.

Q2 & W2, W4: Hyperparameter Selection and Ablation

We conducted a hyperparameter sensitivity analysis on the LSRW dataset:

| Param | Value | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
|---|---|---|---|---|
| $\alpha$ | 0.05 / 0.15 / 0.3 / 0.6 | 17.81 / 18.10 / 17.92 / 17.84 | 0.319 / 0.322 / 0.320 / 0.324 | 0.512 / 0.543 / 0.530 / 0.519 |
| $\eta$ | 0.1 / 1.0 / 3.0 / 6.0 | 17.85 / 18.35 / 18.17 / 17.95 | 0.335 / 0.321 / 0.324 / 0.329 | 0.537 / 0.543 / 0.546 / 0.540 |
| $\lambda_d$ | 0.1 / 1.0 / 10 / 20 | 17.82 / 18.04 / 17.85 / 17.87 | 0.324 / 0.315 / 0.318 / 0.323 | 0.545 / 0.553 / 0.549 / 0.531 |
| $\lambda_g$ | 0.001 / 0.005 / 0.01 / 0.1 | 17.76 / 18.22 / 18.16 / 17.88 | 0.312 / 0.310 / 0.309 / 0.311 | 0.536 / 0.548 / 0.547 / 0.540 |
| $N$ | 90 / 100 / 110 / 120 | 16.21 / 18.10 / 18.26 / 18.17 | 0.391 / 0.296 / 0.300 / 0.289 | 0.480 / 0.542 / 0.546 / 0.549 |

We varied key hyperparameters over a range ($\beta_p$ is determined by $\eta$ and $\delta$, where $\delta$ is a variance-stabilization factor typically set to 0.001). Results show only moderate metric changes. For example, varying $\alpha$ from 0.05 to 0.6 alters PSNR by less than 0.3 dB, with minimal perceptual impact. Performance improves with larger $N$ but saturates beyond $N = 100$, which we adopt as the default for a quality-efficiency trade-off. These results show our method is robust and stable across a wide hyperparameter range, indicating strong generalization.

We also conducted more ablation studies using both LSRW and LMIE:

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | NIQE ↓ | PI ↓ |
|---|---|---|---|---|---|
| $k = 0$ | 15.82 | 0.447 | 0.429 | 4.51 | 4.35 |
| $k = 1$ | 16.23 | 0.481 | 0.341 | 4.31 | 4.19 |
| $k = 2$ | 17.38 | 0.517 | 0.392 | 3.88 | 3.48 |
| $k = 3$ | 18.12 | 0.545 | 0.297 | 3.11 | 2.96 |
| $k = 4$ | 18.24 | 0.551 | 0.293 | 3.14 | 3.02 |
| w/o $\mathcal{L}_{\text{g}}$ | 17.93 | 0.460 | 0.334 | 3.21 | 3.13 |
| Default | 18.14 | 0.547 | 0.308 | 3.15 | 3.00 |

We performed ablation studies to evaluate key components of LASQ. The case of $k = 0$, corresponding to enhancement in pixel space, yields the weakest results, validating the effectiveness of our latent-space design. Additionally, removing $\mathcal{L}_{\text{g}}$ leads to clear performance drops, confirming its importance to the final quality.

Q3: Effectiveness of the Adversarial Head in Unpaired Settings

We would like to clarify that the introduced adversarial discriminator is applied to unpaired references. In fact, the LASQ++ variant in our main paper is trained strictly with unpaired references, and the adversarial head still provides clear improvements in texture fidelity under this setting. We will revise the manuscript to better emphasize this point and avoid possible misunderstandings. Importantly, the adversarial discriminator does not introduce any instability during optimization, thanks to the appropriate choice of the weighting parameter $\lambda_{\text{GAN}}$. The detailed performance metrics are provided below.

| Method | LOL (PSNR / SSIM / LPIPS) | LSRW (PSNR / SSIM / LPIPS) |
|---|---|---|
| LASQ | 20.375 / 0.814 / 0.191 | 18.137 / 0.547 / 0.308 |
| LASQ++ | 20.481 / 0.807 / 0.205 | 18.584 / 0.540 / 0.316 |

Q4: Generalization Limits and Adaptive Strategies on LOL Dataset

LASQ has demonstrated strong generalization ability and stability, surpassing some supervised methods on datasets such as LSRW, DICM, NPE, and VV. However, on the LOL dataset, where the training and testing images often come from similar scenes, supervised paired-learning methods can better fit the overall data distribution. Since LASQ does not utilize any reference images during training, its performance on certain metrics may be lower than that of supervised approaches. We analyzed the error maps and found that the primary issues are concentrated in a few images, where LASQ tends to produce less detailed textures and shows a global luminance discrepancy compared to the ground truth.

As a diffusion-based generative model, LASQ may occasionally generate overly smooth results. To address this, we plan to improve the decoder’s sensitivity to fine details by incorporating multi-scale texture refinement modules. Additionally, we will introduce a structure-aware loss function designed to explicitly penalize over-smoothing in regions with high structural complexity.

Furthermore, since LASQ operates without any reference images, it may lack awareness of the specific luminance distribution present in the LOL dataset. To mitigate this, we propose to anchor the MCMC sampling process around the dataset's typical luminance domain, using it as a prior to better align the generated outputs with the illumination characteristics of LOL images.

Q5 & W2, W3: Evaluation Analysis under Challenging Conditions

We first evaluated LASQ on “raw images” captured by three “different camera” models in the ELD dataset. The results show that LASQ achieves consistently strong performance across all camera sources, suggesting that our approach is not sensitive to differences in camera hardware or imaging pipelines. In contrast, other methods exhibit significantly less stability.

| Camera | Canon EOS70D (PSNR / SSIM / LPIPS) | Nikon D850 (PSNR / SSIM / LPIPS) | Sony A7S2 (PSNR / SSIM / LPIPS) |
|---|---|---|---|
| PairLIE | 17.45 / 0.485 / 0.524 | 16.77 / 0.428 / 0.591 | 17.10 / 0.462 / 0.541 |
| WCDM | 17.29 / 0.560 / 0.514 | 16.80 / 0.443 / 0.536 | 17.17 / 0.555 / 0.539 |
| LASQ | 17.75 / 0.602 / 0.411 | 17.83 / 0.598 / 0.417 | 17.90 / 0.611 / 0.420 |

To further examine robustness under challenging conditions, we applied LASQ to three representative datasets: ELD, which contains “raw sensor data” captured under “extreme dynamic lighting”; LOL_blur, which includes low-light images degraded by “motion blur”; and LED, which comprises low-light images affected by “LED flicker”.

| Method | ELD (PSNR / SSIM / LPIPS) | LOL_blur (PSNR / SSIM / LPIPS) | LED (NIQE / PI) |
|---|---|---|---|
| EnlightenGAN | 16.78 / 0.485 / 0.539 | 16.15 / 0.537 / 0.591 | 3.648 / 3.506 |
| KinD++ | 12.48 / 0.310 / 0.888 | 17.88 / 0.526 / 0.523 | 3.645 / 3.323 |
| LightenDiffusion | 13.66 / 0.364 / 0.826 | 18.18 / 0.643 / 0.459 | 3.550 / 3.661 |
| NeRCo | 10.29 / 0.260 / 0.714 | 16.82 / 0.645 / 0.447 | 3.546 / 3.282 |
| PairLIE | 17.45 / 0.485 / 0.524 | 17.35 / 0.616 / 0.457 | 3.848 / 3.907 |
| SCI | 14.53 / 0.149 / 0.568 | 12.74 / 0.430 / 0.637 | 3.961 / 3.268 |
| SCL-LLE | 14.66 / 0.158 / 0.547 | 11.57 / 0.423 / 0.631 | 3.737 / 4.225 |
| URetinex-Net | 16.73 / 0.463 / 0.544 | 17.39 / 0.634 / 0.420 | 3.692 / 3.306 |
| LASQ (Ours) | 17.69 / 0.610 / 0.409 | 19.11 / 0.648 / 0.418 | 3.545 / 3.257 |

Although performance metrics are slightly reduced compared to standard scenes, LASQ still achieves the best results across all three challenging conditions. This highlights its robustness and adaptability to real-world failure modes. In particular, the relatively strong performance under motion blur and LED flicker suggests that the proposed physics-aware design helps mitigate degradation caused by non-ideal sensor inputs and dynamic lighting effects, without relying on explicit temporal modeling or post-processing.

Under extreme conditions—such as low illumination or severe motion blur—our original framework showed limitations in preserving fine textures and accurate brightness. We will explore enhancing the decoder with multi-scale texture refinement and anchoring the MCMC sampling distribution to the ground truth luminance characteristics observed in extreme cases.

W1: Inference Computation

We compare LASQ with early lightweight methods (e.g., EnlightenGAN, KinD++) and recent diffusion models (e.g., WCDM, LightenDiffusion). While diffusion models outperform early methods, they are computationally heavy. LASQ retains their performance but offers much higher efficiency without relying on reference images, making its moderate overhead a practical trade-off.

| Method | Inference Time (ms) | Memory Usage (MB) | PSNR ↑ |
|---|---|---|---|
| EnlightenGAN (2019) | 170.16 | 241.48 | 17.606 |
| KinD++ (2021) | 4279.70 | 372.19 | 17.752 |
| NeRCo (2023) | 354.77 | 2320.87 | 19.738 |
| PairLIE (2023) | 190.70 | 3499.79 | 19.514 |
| WCDM (2023) | 206.66 | 6017.86 | 20.105 |
| LightenDiffusion (2024) | 257.94 | 8049.95 | 20.453 |
| LASQ (ours) | 213.89 | 6496.68 | 20.481 |

All supplementary tables and visualizations will be included in the final version of the paper or appendix.

Comment

Thank you for your reply, your comment addressed most of my concerns. I have adjusted my score accordingly. I also recommend that the authors further improve the figures and descriptions in future versions, and use vector graphics to enhance readability.

Comment

Thank you for your thoughtful feedback. We're glad our response addressed your concerns. We will refine the figures and descriptions in the future version and include the additional experiments in the appendix. We truly appreciate your time and effort in reviewing our work.

Official Review
Rating: 4

The paper proposes a novel, physics-inspired framework for low-light image enhancement (LLIE). The key innovation lies in reformulating LLIE as probabilistic sampling over hierarchical luminance layers that follow power-law distributions. Instead of using fixed mappings or empirical curves, LASQ applies a Markov Chain Monte Carlo (MCMC) strategy to sample luminance adaptation operators at different granularities, ranging from global adjustments to local refinements. Experiments demonstrate that LASQ attains state-of-the-art performance on non-reference datasets and comparable results to reference-based methods on paired datasets.

Strengths and Weaknesses

Strengths:

  1. Theoretical Soundness: The paper presents a well-founded framework based on power-law distributed luminance statistics, supported by empirical observations
  2. Comprehensive Experiments: The paper includes detailed evaluations on both paired and unpaired benchmarks, using a wide array of perceptual and fidelity-based metrics
  3. Well-structured Paper: The paper follows a logical structure (motivation, method, experiments), and includes detailed tables and figures comparing against a large number of baselines.

Weaknesses:

  1. Computational Complexity: While the diffusion process and hierarchical sampling improve performance, they are computationally expensive and may limit real-time applications or deployment on low-power devices.
  2. Hyperparameter Sensitivity: The method introduces several hyperparameters (e.g., the number of MCMC sampling steps) whose selection may impact robustness.

Questions

  1. The hierarchical power-law distributions mentioned in the paper follow a coarse-to-fine design, with features from all levels being fused through weighted averaging within the diffusion model. However, the weights used in this process are fixed and not learnable. It would be worth exploring whether these weights could be made learnable, allowing the model to adaptively assign importance to different hierarchical levels based on the data, potentially improving fusion effectiveness and overall performance.
  2. LASQ employs Hierarchical Luminance Modeling to process low-light images at both local and global levels, followed by the use of a diffusion model framework to fuse images across different layers. However, it remains unclear whether the use of a diffusion model is strictly necessary for this fusion step. The authors should clarify why a diffusion-based approach is chosen over potentially more lightweight alternatives, and whether similar results could be achieved using more efficient models with lower computational cost.
  3. The paper adopts a diffusion model along with a hierarchical strategy, which increases the overall algorithmic complexity. Therefore, a comparison of computational complexity, such as inference time, number of parameters, and memory consumption, between the proposed method and existing approaches should be included.
  4. In the first row of Figure 3, the image generated by the LASQ method appears overly smooth, lacking fine-grained texture. In comparison, the results produced by NeRCo and PairLIE demonstrate better preservation of image details.

Limitations

Please see the weakness and question parts.

Final Justification

I have carefully read the authors’ response. Overall, I am relatively satisfied, especially with the extended explanation and comparisons on model efficiency. I believe my initial positive rating has already well reflected the quality of the paper.

Formatting Issues

no major formatting issues.

Author Response

We sincerely thank you for your valuable feedback and constructive suggestions. We are particularly encouraged by your recognition of LASQ as "a novel, physics-inspired framework". Below, we provide a point-by-point response to your comments.

Q1: Learnable Fusion Weights via HMM-Based Adaptation

Thank you for your insightful suggestion — it aligns well with our own research perspective. We further explore an HMM-based auto-tuning framework that dynamically updates both the network hyperparameters and the fusion weights on a per-batch basis.

We begin by randomly initializing the hierarchical weights $W = \{w_1, w_2, \ldots, w_n\}$ for coarse-to-fine feature fusion. At each timestep $t$, the emission probability is defined as:

$$p(\mathcal{F}^{t}_{H} \mid \Theta^{t}) = p(\mathcal{F}^{t}_{H} \mid \theta^{t}_{\text{diff}}, W, \mathcal{I}_{H}^{t})\, p(\mathcal{I}_{H}^{t} \mid \gamma)\, p(\gamma \mid \theta^{t}_{\text{hyper}})$$

Here, the hidden states $\Theta^t$ include the diffusion model parameters, hyperparameters, and hierarchical fusion weights ($\Theta^t = \{\theta^{t}_{\text{diff}}, \theta^{t}_{\text{hyper}}, W\}$), while $\gamma$ is drawn from a hyperparameter-dependent distribution. The term $p(\mathcal{I}_H^t \mid \gamma)$ models MCMC-based sampling, and $p(\mathcal{F}_H^t \mid \cdot)$ corresponds to the actual diffusion output conditioned on the intermediate latent representation. The state transition follows:

$$p(\Theta^{t+1} \mid \Theta^{t}, \mathcal{F}^{t}_{H}) = p(\Theta^{t+1} \mid \Theta^{t}, \mathcal{L}_{\text{total}}^{t})\, p(\mathcal{L}_{\text{total}}^{t} \mid \mathcal{F}^{t}_{H})$$

This framework enables the model to adaptively assign importance to hierarchical levels based on data distribution, thereby enhancing fusion effectiveness and mitigating risks such as overfitting or suboptimal manual tuning. We will include this auto-tuning strategy as an extension in the revised version to demonstrate its feasibility and performance impact.
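For intuition, the simplest gradient-friendly alternative to the HMM scheme is to softmax-normalize a vector of learnable logits over the hierarchy levels; everything below (names, shapes, initialization) is our illustrative assumption, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(features, logits):
    """Weighted average of coarse-to-fine feature maps with softmax weights.
    In a trainable variant, `logits` would be learnable parameters updated by
    backpropagation; the paper's current fusion instead uses fixed weights."""
    w = np.exp(logits - logits.max())  # numerically stable softmax
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))

features = [rng.normal(size=(4, 4)) for _ in range(3)]  # three hierarchy levels
fused_uniform = fuse(features, np.zeros(3))             # uniform logits
assert np.allclose(fused_uniform, np.mean(features, axis=0))
```

With uniform logits this reduces exactly to the fixed weighted average the reviewer describes, so such a parameterization could only match or improve the current fusion.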

Q2: Necessity and Efficiency of Diffusion-Based Fusion

In LASQ, the luminance adaptation space is explored through a progressive MCMC sampling strategy, embedded with the forward process of a diffusion model. Specifically, the sampling over luminance states at different noise levels tt is aligned with the trajectory of the diffusion process, enabling structured and gradual adaptation of luminance features. This integration allows for effective exploration of the latent luminance space without requiring heavy supervision or fine-tuning, while ensuring stable convergence across diverse lighting conditions.

The diffusion model enables unsupervised traversal across hierarchical luminance layers, and provides a principled mechanism to progressively refine luminance representations from coarse global estimates to fine local details. This aligns well with the hierarchical structure of luminance variations in natural scenes, and allows the model to adaptively balance global consistency and local contrast. This layer-wise sampling and fusion strategy enhances the model’s robustness to complex illumination patterns and supports high generalization under diverse low-light conditions, without any reference images.

We appreciate the suggestion to explore more lightweight alternatives. We will explore adapting LASQ to lightweight models (e.g., CNNs) via a hierarchical luminance-based data augmentation strategy. Specifically, the augmented luminance maps will be allocated to different layers of the CNN, enabling image enhancement through a hierarchical training strategy. This framework maintains the core principle of hierarchical adaptation while significantly reducing computational overhead—all without requiring access to reference images. We consider this a promising direction for future research.

Q3 & W1: Computational Cost

We have now added detailed computational complexity metrics below (NVIDIA A800, LOL dataset). We clarify that the coarse-to-fine MCMC sampling mechanism is only used during training and is embedded into the forward diffusion process. It guides the model to traverse luminance layers in a hierarchical manner, enabling structured learning of light propagation. During inference, our model only performs the denoising step conditioned on the low-light input within the diffusion model, which is significantly more efficient.

We compare LASQ against both early non-diffusion-based methods (e.g., EnlightenGAN, KinD++), and recent diffusion-based approaches (e.g., WCDM, LightenDiffusion). While the early methods are lightweight, their performance lags far behind diffusion-based models across all key metrics. Existing diffusion models, although significantly more effective, tend to suffer from high computational cost due to deep architectures and iterative sampling. LASQ, while maintaining the performance advantages of diffusion models, achieves inference efficiency comparable to non-diffusion-based methods. Considering the substantial performance gain without reliance on reference images, the moderate computational overhead of LASQ is a practical and acceptable trade-off. Therefore, LASQ strikes a favorable balance between performance and computational efficiency, enabling deployment on low-power devices equipped with less than 8GB of GPU memory (e.g., NVIDIA Jetson AGX Orin), without compromising image enhancement quality. In future work, we also plan to explore lightweight variants of our framework for further efficiency.

| Method | FLOPs (G) | Params (M) | Inference Time (ms) | Memory Usage (MB) | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|---|
| EnlightenGAN (2019) | 16.45 | 8.64 | 170.16 | 241.48 | 17.606 | 0.653 | 0.319 |
| KinD++ (2021) | 17.49 | 8.27 | 4279.70 | 372.19 | 17.752 | 0.758 | 0.198 |
| NeRCo (2023) | 184.20 | 23.30 | 354.77 | 2320.87 | 19.738 | 0.740 | 0.239 |
| PairLIE (2023) | 81.84 | 8.34 | 190.70 | 3499.79 | 19.514 | 0.731 | 0.254 |
| WCDM (2023) | 374.47 | 22.92 | 206.66 | 6017.86 | 20.105 | 0.795 | 0.211 |
| LightenDiffusion (2024) | 367.99 | 27.83 | 257.94 | 8049.95 | 20.453 | 0.803 | 0.192 |
| LASQ (ours) | 219.75 | 24.08 | 213.89 | 6496.68 | 20.481 | 0.814 | 0.191 |

Q4: Texture Preservation in Diffusion-Based Enhancement

Thank you for pointing out this issue. As a diffusion-based generative model, LASQ may, in some isolated cases, produce overly smooth results with less detailed texture, particularly under challenging lighting or in structure-less regions. However, on the full LSRW test set, LASQ outperforms both NeRCo and PairLIE in all major evaluation metrics, indicating overall superior perceptual quality. Moreover, such texture-smoothing artifacts occur in less than 10% of the test cases.

To mitigate this issue in future work, we plan to enhance the decoder's detail sensitivity through multi-scale texture refinement modules, and introduce a structure-aware loss function that explicitly penalizes over-smoothing in texture-rich regions.

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| NeRCo | 17.844 | 0.535 | 0.371 |
| PairLIE | 17.602 | 0.501 | 0.323 |
| LASQ | 18.137 | 0.547 | 0.308 |

W2: Hyperparameter Selection and Sensitivity

We conducted a hyperparameter sensitivity analysis on the LSRW dataset:

| Param | Value | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
|---|---|---|---|---|
| $\alpha$ | 0.05 / 0.15 / 0.3 / 0.6 | 17.81 / 18.10 / 17.92 / 17.84 | 0.319 / 0.322 / 0.320 / 0.324 | 0.512 / 0.543 / 0.530 / 0.519 |
| $\eta$ | 0.1 / 1.0 / 3.0 / 6.0 | 17.85 / 18.35 / 18.17 / 17.95 | 0.335 / 0.321 / 0.324 / 0.329 | 0.537 / 0.543 / 0.546 / 0.540 |
| $\lambda_d$ | 0.1 / 1.0 / 10 / 20 | 17.82 / 18.04 / 17.85 / 17.87 | 0.324 / 0.315 / 0.318 / 0.323 | 0.545 / 0.553 / 0.549 / 0.531 |
| $\lambda_g$ | 0.001 / 0.005 / 0.01 / 0.1 | 17.76 / 18.22 / 18.16 / 17.88 | 0.312 / 0.310 / 0.309 / 0.311 | 0.536 / 0.548 / 0.547 / 0.540 |

We varied key hyperparameters over a range ($\beta_p$ is derived from $\eta$). Results show only moderate metric changes. For example, varying $\alpha$ from 0.05 to 0.6 alters PSNR by less than 0.3 dB, with minimal perceptual impact. Other hyperparameters show similar stability, indicating strong robustness to tuning. We also evaluated two structural parameters on both LSRW and LMIE:

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | NIQE ↓ | PI ↓ |
|---|---|---|---|---|---|
| $k = 1$ | 16.23 | 0.481 | 0.341 | 4.31 | 4.19 |
| $k = 2$ | 17.38 | 0.517 | 0.392 | 3.88 | 3.48 |
| $k = 3$ | 18.12 | 0.545 | 0.297 | 3.11 | 2.96 |
| $k = 4$ | 18.24 | 0.551 | 0.293 | 3.14 | 3.02 |
| $N = 90$ | 16.21 | 0.480 | 0.391 | 4.29 | 4.21 |
| $N = 100$ | 18.10 | 0.542 | 0.296 | 3.16 | 3.02 |
| $N = 110$ | 18.26 | 0.546 | 0.300 | 3.13 | 2.99 |
| $N = 120$ | 18.17 | 0.549 | 0.289 | 3.08 | 2.93 |

Performance improves with larger $k$ and $N$, but saturates beyond $k = 3$ and $N = 100$; we adopt these as defaults for a quality-efficiency trade-off. These results show our method is robust and stable across a wide hyperparameter range, indicating strong generalization.

All supplementary tables and visualizations will be included in the final version of the paper or appendix.

Official Review
Rating: 4

This paper introduces a novel framework for low-light image enhancement (LLIE) called Luminance-Aware Statistical Quantization (LASQ). The approach redefines LLIE by addressing the inherent challenges of low-light image enhancement in practical settings, focusing on realistic and continuous luminance transitions rather than pixel-level mappings. It leverages hierarchical power-law distributions to model luminance transitions and proposes a statistical sampling process to emulate these transitions, allowing for better generalization and adaptability across various lighting conditions. The framework employs a diffusion model for unsupervised learning and achieves state-of-the-art performance, especially in scenarios where normal-light references are unavailable. The authors also provide extensive experiments demonstrating the effectiveness of LASQ in both reference-based and reference-free settings.

Strengths and Weaknesses

Strengths:

  1. It introduces a Luminance Variation coordinate system and a power-law adaptation operator, grounding the low-light to normal-light mapping in a physically motivated statistical model rather than a purely data-driven method.

  2. The proposed method allows for unsupervised learning without the need for paired datasets, a significant advantage for real-world applications where paired data may not be available.

  3. It achieves superior NIQE and PI scores on unpaired datasets (e.g., DICM, NPE, VV) compared to other unsupervised approaches, demonstrating the method’s adaptability and robustness across diverse scenes.

Weaknesses:

  1. The paper mentions that power-law transformations are problematic in low-intensity areas, where small intensity variations cause large shifts in the model. How to handle the instability in low-intensity areas?

  2. The multi-scale MCMC sampling requires numerous iterations per image and multiple power-law operators, resulting in substantial computational and memory overhead during inference.

  3. The performance of the method is highly dependent on the choice of hyperparameters, such as the power-law exponents, sampling ratio, and the number of iterations in MCMC. The method would benefit from a clearer set of guidelines or automatic tuning mechanisms to avoid overfitting to specific datasets or scenarios.

Questions

  1. Can you provide more details on the selection and tuning of key hyperparameters (e.g., α and β_P, etc.)? In particular, how sensitive are the models to changes in these parameters, and how can these parameters be optimized for a specific dataset?

  2. Given the multi-level sampling structure, how do you ensure that the MCMC process converges efficiently without excessive iterations?

  3. The paper mentions that power-law transformations are problematic in low-intensity areas, where small intensity variations cause large shifts in the model. How do you handle the instability in low-intensity areas?
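The instability raised in Q3 can be reproduced numerically: for an exponent $\gamma < 1$, the slope $\gamma x^{\gamma - 1}$ of the power-law curve diverges as $x \to 0$, so tiny intensity differences in deep shadows produce large output shifts. The sketch below also shows that a small offset (analogous to the variance-stabilization factor $\delta$ mentioned in the first reviewer's thread) bounds the slope; this is one plausible mitigation, not the authors' confirmed fix:

```python
import numpy as np

def power_law(x, gamma, delta=0.0):
    # `delta` plays the role of a small stabilizing offset (assumed name)
    return (x + delta) ** gamma

# Finite-difference slope of the curve over two near-zero intensities.
x = np.array([1e-4, 2e-4])
raw_gain = np.diff(power_law(x, 0.4)) / np.diff(x)               # unstabilized: huge slope
stab_gain = np.diff(power_law(x, 0.4, delta=1e-3)) / np.diff(x)  # offset bounds the slope
assert raw_gain[0] > stab_gain[0] > 0.0
```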

Limitations

Yes

Final Justification

The authors have largely resolved my concerns, especially through the detailed discussion of experimental parameters, which is more convincing. Since my initial rating was 4, I keep that as the final rating.

Formatting Issues

N/A

Author Response

We sincerely thank the reviewer for identifying several important concerns and for the opportunity to further clarify our method. We address each point in detail below.

Q1 & W3: Hyperparameter Selection and Sensitivity

We conducted a hyperparameter sensitivity analysis on the LSRW dataset:

| Param | Value | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
|---|---|---|---|---|
| $\alpha$ | 0.05 / 0.15 / 0.3 / 0.6 | 17.81 / 18.10 / 17.92 / 17.84 | 0.319 / 0.322 / 0.320 / 0.324 | 0.512 / 0.543 / 0.530 / 0.519 |
| $\eta$ | 0.1 / 1.0 / 3.0 / 6.0 | 17.85 / 18.35 / 18.17 / 17.95 | 0.335 / 0.321 / 0.324 / 0.329 | 0.537 / 0.543 / 0.546 / 0.540 |
| $\lambda_d$ | 0.1 / 1.0 / 10 / 20 | 17.82 / 18.04 / 17.85 / 17.87 | 0.324 / 0.315 / 0.318 / 0.323 | 0.545 / 0.553 / 0.549 / 0.531 |
| $\lambda_g$ | 0.001 / 0.005 / 0.01 / 0.1 | 17.76 / 18.22 / 18.16 / 17.88 | 0.312 / 0.310 / 0.309 / 0.311 | 0.536 / 0.548 / 0.547 / 0.540 |

We varied key hyperparameters over a range ($\beta_p$ is derived from $\eta$). Results show only moderate metric changes. For example, varying $\alpha$ from 0.05 to 0.6 alters PSNR by less than 0.3 dB, with minimal perceptual impact. Other hyperparameters show similar stability, indicating strong robustness to tuning. We also evaluated two structural parameters on both LSRW and LMIE:

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | NIQE ↓ | PI ↓ |
|---|---|---|---|---|---|
| $k = 1$ | 16.23 | 0.481 | 0.341 | 4.31 | 4.19 |
| $k = 2$ | 17.38 | 0.517 | 0.392 | 3.88 | 3.48 |
| $k = 3$ | 18.12 | 0.545 | 0.297 | 3.11 | 2.96 |
| $k = 4$ | 18.24 | 0.551 | 0.293 | 3.14 | 3.02 |
| $N = 90$ | 16.21 | 0.480 | 0.391 | 4.29 | 4.21 |
| $N = 100$ | 18.10 | 0.542 | 0.296 | 3.16 | 3.02 |
| $N = 110$ | 18.26 | 0.546 | 0.300 | 3.13 | 2.99 |
| $N = 120$ | 18.17 | 0.549 | 0.289 | 3.08 | 2.93 |

Performance improves with larger $k$ and $N$, but saturates beyond $k = 3$ and $N = 100$; we adopt these as defaults for a quality-efficiency trade-off. These results show our method is robust and stable across a wide hyperparameter range, indicating strong generalization.

In addition to the datasets reported in the main text (LOL, LSRW, DICM, NPE, VV, LIME, MEF), we tested LASQ on several challenging unseen domains—LOL_blur, LED, ELD, and Light-Effects—using the same fixed hyperparameters. Due to space constraints, detailed results are shown in the responses to the other reviewers. Consistently strong performance in these scenarios further confirms that LASQ generalizes well across diverse domains without retraining or re-tuning.

To improve generalization to new datasets, we add a hyperparameter summary table in the appendix. While tuned values often generalize well, we further explore a hidden Markov model (HMM)-based auto-tuning framework that updates both network parameters and hyperparameters per batch. Specifically, the emission probability is:

$$p(\mathcal{F}^{t}_{H} \mid \Theta^{t}) = p(\mathcal{F}^{t}_{H} \mid \theta^{t}_{\text{diff}}, \mathcal{I}_{H}^{t})\, p(\mathcal{I}_{H}^{t} \mid \gamma)\, p(\gamma \mid \theta^{t}_{\text{hyper}})$$

where the hidden states $\Theta^t = \{\theta^{t}_{\text{diff}}, \theta^{t}_{\text{hyper}}\}$ and $\gamma$ is drawn from a hyperparameter-dependent distribution. The term $p(\mathcal{I}_H^t \mid \gamma)$ models MCMC sampling, and $p(\mathcal{F}_H^t \mid \cdot)$ corresponds to the diffusion process. The state transition follows:

$$p(\Theta^{t+1} \mid \Theta^{t}, \mathcal{F}^{t}_{H}) = p(\Theta^{t+1} \mid \Theta^{t}, \mathcal{L}_{\text{total}}^{t})\, p(\mathcal{L}_{\text{total}}^{t} \mid \mathcal{F}^{t}_{H})$$

This captures Bayesian updates of parameters. For gradient-based hyperparameter updates, we apply the chain rule:

$$\frac{\partial \mathcal{L}_{\text{total}}^t}{\partial \theta^t_{\text{hyper}}} = \mathbb{E}_{\gamma \sim p(\gamma \mid \theta^t_{\text{hyper}})} \left[ \frac{\partial \mathcal{L}_{\text{total}}^t}{\partial \mathcal{F}_H^t} \cdot \frac{\partial \mathcal{F}_H^t}{\partial \gamma} \cdot \frac{\partial \gamma}{\partial \theta^t_{\text{hyper}}} \right]$$

This framework reduces manual tuning and mitigates overfitting by dynamically adapting to data distributions. Preliminary experiments on the LSRW dataset show about 6% performance improvement.
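As a toy sketch of this gradient-based update: we assume a diagonal Gaussian for $p(\gamma\mid\theta_{\text{hyper}})$, use the reparameterization $\gamma=\mu+\sigma\epsilon$ so the chain rule applies directly, and treat the MCMC-plus-diffusion pass as a black-box loss. All names here are illustrative, not part of our released code.

```python
import numpy as np

def hyper_step(mu, log_sigma, loss_and_grad, lr=0.05, n_samples=8, rng=None):
    """One pathwise (reparameterized) update of p(gamma | theta_hyper),
    modeled here as a diagonal Gaussian N(mu, sigma^2); the Gaussian choice
    and every name below are illustrative assumptions.

    loss_and_grad: gamma -> (L_total, dL_total/dgamma); the MCMC sampling
    plus the diffusion forward pass is treated as this black box.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    sigma = np.exp(log_sigma)
    g_mu = np.zeros_like(mu)
    g_ls = np.zeros_like(log_sigma)
    mean_loss = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        gamma = mu + sigma * eps                  # gamma depends on theta_hyper
        loss, dl_dgamma = loss_and_grad(gamma)
        mean_loss += loss / n_samples
        # Chain rule over the expectation:
        # dL/dmu        = dL/dgamma * dgamma/dmu        (dgamma/dmu = 1)
        # dL/dlog_sigma = dL/dgamma * dgamma/dlog_sigma (= sigma * eps)
        g_mu += dl_dgamma / n_samples
        g_ls += dl_dgamma * sigma * eps / n_samples
    return mu - lr * g_mu, log_sigma - lr * g_ls, mean_loss
```

In this sketch the Monte Carlo average over a handful of $\gamma$ draws estimates the expectation, and both the mean and the spread of the hyperparameter distribution adapt per batch.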

Q2 & W2: Clarification on MCMC Sampling Cost and Convergence

We have now added detailed computational complexity metrics below (NVIDIA A800, LOL dataset). We clarify that the coarse-to-fine MCMC sampling mechanism is only used during training and is embedded into the forward diffusion process. It guides the model to traverse luminance layers in a hierarchical manner, enabling structured learning of light propagation. During inference, our model only performs the denoising step conditioned on the low-light input within the diffusion model, which is significantly more efficient.

We compare LASQ against both early non-diffusion-based methods (e.g., EnlightenGAN, KinD++), and recent diffusion-based approaches (e.g., WCDM, LightenDiffusion). While the early methods are lightweight, their performance lags far behind diffusion-based models across all key metrics. Existing diffusion models, although significantly more effective, tend to suffer from high computational cost due to deep architectures and iterative sampling. LASQ, while maintaining the performance advantages of diffusion models, achieves inference efficiency comparable to non-diffusion-based methods. This makes it highly suitable for real-world deployment. Considering the substantial performance gain without reliance on reference images, the moderate computational overhead of LASQ is a practical and acceptable trade-off. In future work, we also plan to explore lightweight variants of our framework for further efficiency.

| Method | FLOPs (G) | Params (M) | Inference Time (ms) | Memory Usage (MB) | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|---|
| EnlightenGAN (2019) | 16.45 | 8.64 | 170.16 | 241.48 | 17.606 | 0.653 | 0.319 |
| KinD++ (2021) | 17.49 | 8.27 | 4279.70 | 372.19 | 17.752 | 0.758 | 0.198 |
| NeRCo (2023) | 184.20 | 23.30 | 354.77 | 2320.87 | 19.738 | 0.740 | 0.239 |
| PairLIE (2023) | 81.84 | 8.34 | 190.70 | 3499.79 | 19.514 | 0.731 | 0.254 |
| WCDM (2023) | 374.47 | 22.92 | 206.66 | 6017.86 | 20.105 | 0.795 | 0.211 |
| LightenDiffusion (2024) | 367.99 | 27.83 | 257.94 | 8049.95 | 20.453 | 0.803 | 0.192 |
| LASQ (ours) | 219.75 | 24.08 | 213.89 | 6496.68 | 20.481 | 0.814 | 0.191 |
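For reproducibility, the inference-time figures can be collected with a minimal harness like the one below. This is a CPU wall-clock sketch; on GPU, `torch.cuda.synchronize()` should bracket each timed call, and peak memory would come from `torch.cuda.max_memory_allocated()`.

```python
import time

def benchmark(fn, *args, warmup=3, runs=10):
    """Average wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):       # warm caches / lazy initialization before timing
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) * 1000.0 / runs
```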

While we reiterate that MCMC sampling is only applied during training, we fully acknowledge the importance of convergence efficiency and training stability. Notably, the sampling process is embedded into the forward diffusion trajectory, inherently generating a large number of diverse samples throughout training. This ensures thorough exploration of the latent space. Furthermore, the structured design of the latent operator space, the coarse-to-fine hierarchical sampling strategy, and the learned Markov regularization collectively facilitate natural and stable convergence without the need for excessive iterations.
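To make "embedded into the forward diffusion trajectory" concrete, here is a deliberately simplified sketch, our toy reading rather than the exact LASQ formulation: a sampled luminance operator (reduced to a scalar gain) acts on the clean image before the standard DDPM-style forward noising.

```python
import numpy as np

def forward_sample(x0, t, gain, alpha_bar, rng):
    """One forward-diffusion draw steered by a sampled luminance operator:
        x_t = sqrt(abar_t) * (gain * x0) + sqrt(1 - abar_t) * eps
    Purely illustrative; LASQ's operators are hierarchical and block-wise,
    not a single scalar gain.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * (gain * x0) + np.sqrt(1.0 - alpha_bar[t]) * eps
```

With a standard linear beta schedule, `alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, 1000))`, early steps stay close to the gain-adjusted image while late steps approach pure noise, so each training trajectory traverses many luminance-perturbed states.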

Q3 & W1: Clarification on "Instability in Low-Intensity Areas"

We would like to clarify that our approach employs a coarse-to-fine MCMC sampling strategy to fit the luminance adaptation space, rather than rigidly enforcing the power-law assumption. As discussed in Appendix A (paragraph 4), we reinterpret the strict power-law model as a relaxed prior and leverage MCMC sampling—integrated with the forward process of a diffusion model—to explore the distribution space.

This design enables LASQ to flexibly capture a wider family of luminance mappings that are only loosely guided by the physics-inspired prior, rather than being constrained to a fixed analytical form. As a result, our method does not critically depend on the power-law assumption itself. The MCMC-based inference operates at a structured level, incorporating spatial context, which allows it to remain robust under spatially inconsistent or extreme lighting conditions—well beyond the capabilities of per-pixel or strictly model-driven approaches.
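As an illustration of sampling under such a relaxed power-law prior, a minimal 1-D Metropolis-Hastings loop might look as follows. This is a toy stand-in for the structured, coarse-to-fine operator space; the names, support, and proposal scale are ours.

```python
import numpy as np

def sample_operators(n, alpha=2.0, x_min=0.05, step=0.1, seed=0):
    """Metropolis-Hastings draws of n luminance gain factors from a relaxed
    power-law density p(x) ~ x^(-alpha) truncated to [x_min, 1]."""
    rng = np.random.default_rng(seed)

    def log_p(x):
        # Unnormalized log-density; -inf outside the support rejects the move.
        return -alpha * np.log(x) if x_min <= x <= 1.0 else -np.inf

    x, samples = 0.5, []
    for _ in range(n * 10):
        prop = x + rng.normal(0.0, step)        # symmetric Gaussian proposal
        if np.log(rng.random()) < log_p(prop) - log_p(x):
            x = prop                            # accept
        samples.append(x)                       # keep current state either way
    return np.asarray(samples[::10])            # thin to reduce autocorrelation
```

Because the prior only enters through `log_p`, it can be relaxed or replaced (e.g., mixed with a spatially conditioned term) without changing the sampler, which is the sense in which the power-law form acts as a guide rather than a constraint.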

To further support this claim, we conduct comparative evaluations on the ELD dataset, which contains extremely low-light scenarios with strong noise and minimal luminance signal. As shown below, LASQ achieves competitive performance compared to prior methods, demonstrating its robustness even under severe low-intensity degradations. Its strong performance in additional challenging conditions—such as motion blur, extreme dynamic range, and LED flicker—is presented in our responses to other reviewers, further highlighting the stability and generalizability of LASQ.

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| EnlightenGAN | 16.78 | 0.485 | 0.539 |
| KinD++ | 12.48 | 0.310 | 0.888 |
| LightenDiffusion | 13.66 | 0.364 | 0.826 |
| NeRCo | 10.29 | 0.260 | 0.714 |
| PairLIE | 17.45 | 0.485 | 0.524 |
| SCI | 14.53 | 0.149 | 0.568 |
| SCL-LLE | 14.66 | 0.158 | 0.547 |
| URetinex-Net | 16.73 | 0.463 | 0.544 |
| LASQ | 17.69 | 0.610 | 0.409 |

All supplementary tables and visualizations will be included in the final version of the paper or appendix.

Comment

The rebuttal basically resolved my concerns; I will update my final rating accordingly.

Comment

Thank you for your kind feedback. We're glad our response addressed your concerns, and your suggestions greatly motivate our future work. We also sincerely appreciate your time and effort in reviewing our work.

Review
4

In this paper, the authors propose LASQ to tackle the challenge of low-light image enhancement without requiring paired data. LASQ achieves this goal by reframing luminance adjustment as a multi-scale, physics-informed statistical process. In practice, it leverages empirical observations of natural image intensity transitions to generate multi-scale luminance operators for coarse-to-fine block-wise refinement (within a diffusion model). The result is a zero-reference enhancement method that achieves state-of-the-art performance on most paired and unpaired benchmarks.

Strengths and Weaknesses

  • Strengths
  1. The coarse-to-fine, block-wise refinement preserves both global consistency and local details at the same time. The multi-scale, power-law-based framework aligns with the physical process of illumination change, which is interesting and more explainable.
  2. The proposed method can work without normal-light references, which is more useful in real-world settings.
  3. LASQ distills the authors' observation of pixel intensity transitions in natural images (expressed via the Luminance Variation Coordinate and power-law curves) into an interesting set of sampling operators. This elegant fusion of physical-statistical modeling with generative diffusion guidance is a novel strategy, or tasteful philosophy, for developing diffusion models for low-light enhancement.

- Weaknesses: please see the questions for details.

  1. The required computational cost is not reported.
  2. The training pipeline's sensitivity to the manually chosen hyperparameters is unknown.

Questions

  • What is the computational cost of the proposed method? It requires block-wise refinement and multi-scale optimization from coarse to fine, which I assume is expensive, yet neither the main paper nor the supplementary material report any runtime or resource-usage measurements.

  • I also wonder how robust the method is when the power-law intensity assumption breaks down—for example, in scenes lit by a strong spotlight or with intense background light. Can it still perform well under such extreme conditions? Since it relies on manually designed statistical priors, it may only model the mapping from typical natural low-light to normal-light conditions and struggle with more complex or atypical cases.

  • The method depends heavily on empirically chosen hyperparameters (e.g., α, β, λ). How sensitive is its performance to these values? Has its stability been evaluated under different hyperparameter settings?

  • The paper claims an improvement in color fidelity. Have you validated this claim using specialized color-difference metrics to provide quantitative evidence?

Limitations

yes

Final Justification

Overall, all of my concerns are well addressed with thorough experiments and explanations. I maintain my original score, as I still lean toward a weak accept.

Formatting Issues

no

Author Response

We sincerely thank you for your valuable feedback and constructive suggestions. We are particularly encouraged by your recognition of LASQ as "elegant fusion", "novel strategy" and "tasteful philosophy". Below, we provide a point-by-point response to your comments.

Q1 & W1: Computational Cost

We have now added detailed computational complexity metrics below (NVIDIA A800, LOL dataset). We clarify that the coarse-to-fine MCMC sampling mechanism is only used during training and is embedded into the forward diffusion process. It guides the model to traverse luminance layers in a hierarchical manner, enabling structured learning of light propagation. During inference, our model only performs the denoising step conditioned on the low-light input within the diffusion model, which is significantly more efficient.

We compare LASQ against both early non-diffusion-based methods (e.g., EnlightenGAN, KinD++), and recent diffusion-based approaches (e.g., WCDM, LightenDiffusion). While the early methods are lightweight, their performance lags far behind diffusion-based models across all key metrics. Existing diffusion models, although significantly more effective, tend to suffer from high computational cost due to deep architectures and iterative sampling. LASQ, while maintaining the performance advantages of diffusion models, achieves inference efficiency comparable to non-diffusion-based methods. This makes it highly suitable for real-world deployment. Considering the substantial performance gain without reliance on reference images, the moderate computational overhead of LASQ is a practical and acceptable trade-off. In future work, we also plan to explore lightweight variants of our framework for further efficiency.

| Method | FLOPs (G) | Params (M) | Inference Time (ms) | Memory Usage (MB) | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|---|
| EnlightenGAN (2019) | 16.45 | 8.64 | 170.16 | 241.48 | 17.606 | 0.653 | 0.319 |
| KinD++ (2021) | 17.49 | 8.27 | 4279.70 | 372.19 | 17.752 | 0.758 | 0.198 |
| NeRCo (2023) | 184.20 | 23.30 | 354.77 | 2320.87 | 19.738 | 0.740 | 0.239 |
| PairLIE (2023) | 81.84 | 8.34 | 190.70 | 3499.79 | 19.514 | 0.731 | 0.254 |
| WCDM (2023) | 374.47 | 22.92 | 206.66 | 6017.86 | 20.105 | 0.795 | 0.211 |
| LightenDiffusion (2024) | 367.99 | 27.83 | 257.94 | 8049.95 | 20.453 | 0.803 | 0.192 |
| LASQ (ours) | 219.75 | 24.08 | 213.89 | 6496.68 | 20.481 | 0.814 | 0.191 |

Q2: Robustness to Extreme Lighting

We would like to clarify that our approach employs a coarse-to-fine MCMC sampling strategy to fit the luminance adaptation space, rather than rigidly enforcing the power-law assumption. As discussed in Appendix A (paragraph 4), we reinterpret the strict power-law model as a relaxed prior and leverage MCMC sampling—integrated with the forward process of a diffusion model—to explore the distribution space.

This design enables LASQ to flexibly capture a wider family of luminance mappings that are only loosely guided by the physics-inspired prior, rather than being constrained to a fixed analytical form. As a result, our method does not critically depend on the power-law assumption itself. The MCMC-based inference operates at a structured level, incorporating spatial context, which allows it to remain robust under spatially inconsistent or extreme lighting conditions—well beyond the capabilities of per-pixel or strictly model-driven approaches.

To validate the robustness and stability of LASQ across diverse lighting conditions, we evaluate it on two additional datasets: DICM, featuring scenes with "intense background light", and Light-Effects, containing "strong spotlights". In the table below, each metric is presented as DICM / Light-Effects. As shown, LASQ consistently achieves the best performance on both metrics across both datasets, demonstrating its strong effectiveness even under conditions that significantly deviate from the power-law assumption.

| Method | NIQE ↓ | PI ↓ |
|---|---|---|
| EnlightenGAN | 2.7583 / 3.5451 | 2.4137 / 3.3061 |
| NeRCo | 2.7690 / 3.5462 | 2.5701 / 3.2365 |
| KinD_plus | 2.8584 / 3.6449 | 2.6532 / 3.3233 |
| LightenDiffusion | 2.6889 / 3.5503 | 2.9789 / 3.6606 |
| PairLIE | 3.2412 / 3.8477 | 3.6262 / 3.9066 |
| SCI | 3.1745 / 3.9607 | 2.5552 / 3.2683 |
| SCL-LLE | 2.6584 / 3.7367 | 2.4543 / 4.2245 |
| URetinex-Net | 3.0365 / 3.6923 | 2.9806 / 3.7060 |
| LASQ | 2.6190 / 3.2479 | 2.3633 / 3.1020 |

In addition to the datasets presented in the main paper and supplementary material (LOL, LSRW, NPE, VV, LIME, MEF), we also tested LASQ on several challenging unseen domains—LOL_blur, LED, and ELD. Due to space constraints, detailed results are provided in the responses to other reviewers. These experiments further demonstrate the strong generalization ability and robustness of LASQ across diverse and atypical low-light scenarios.

Q3 & W2: Hyperparameter Sensitivity

To address concerns about empirical parameter selection, we provide a hyperparameter sensitivity analysis on the LSRW dataset.

| Param | Value | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
|---|---|---|---|---|
| $\alpha$ | 0.05 / 0.15 / 0.3 / 0.6 | 17.81 / 18.10 / 17.92 / 17.84 | 0.319 / 0.322 / 0.320 / 0.324 | 0.512 / 0.543 / 0.530 / 0.519 |
| $\eta$ | 0.1 / 1.0 / 3.0 / 6.0 | 17.85 / 18.35 / 18.17 / 17.95 | 0.335 / 0.321 / 0.324 / 0.329 | 0.537 / 0.543 / 0.546 / 0.540 |
| $\lambda_d$ | 0.1 / 1.0 / 10 / 20 | 17.82 / 18.04 / 17.85 / 17.87 | 0.324 / 0.315 / 0.318 / 0.323 | 0.545 / 0.553 / 0.549 / 0.531 |
| $\lambda_g$ | 0.001 / 0.005 / 0.01 / 0.1 | 17.76 / 18.22 / 18.16 / 17.88 | 0.312 / 0.310 / 0.309 / 0.311 | 0.536 / 0.548 / 0.547 / 0.540 |

As illustrated in the table, we systematically varied key hyperparameters, including $\alpha$, $\eta$, $\lambda_d$, and $\lambda_g$, over a range of values ($\beta_{p}$ is determined by $\eta$). The results show that while performance fluctuates slightly across settings, the overall impact on the metrics remains moderate. For instance, varying $\alpha$ between 0.05 and 0.6 only causes a minor PSNR change (within 0.3 dB) and negligible shifts in perceptual scores. Similarly, the other hyperparameters show a stable trend without sharp degradation. These experiments demonstrate that our method is not overly sensitive to hyperparameter selection and maintains consistently strong performance across a broad range of settings, highlighting its robustness, practical stability, and generalization potential.

Q4: Color Fidelity Evaluation

To validate our claim of improved color reproduction, we performed a comprehensive quantitative evaluation on the LOL dataset using three widely recognized metrics that assess perceptual color accuracy: CIE76 (ΔE*ab), CIEDE2000 (ΔE₀₀), and FSIMc. ΔE*ab and ΔE₀₀ measure perceptual differences in hue, luminance, and chroma, while FSIMc captures perceptual image quality by incorporating color similarity.

| Method | ΔE*ab ↓ | ΔE₀₀ ↓ | FSIMc ↑ |
|---|---|---|---|
| EnlightenGAN | 35.734 | 17.915 | 0.8792 |
| KinD++ | 30.950 | 14.740 | 0.9243 |
| LightenDiffusion | 32.388 | 15.631 | 0.9178 |
| NeRCo | 26.682 | 12.607 | 0.9342 |
| PairLIE | 30.790 | 14.298 | 0.9192 |
| SCI | 92.862 | 58.559 | 0.4585 |
| SCL-LLE | 66.855 | 37.133 | 0.6928 |
| URetinex | 26.080 | 12.551 | 0.9310 |
| LASQ | 19.649 | 9.251 | 0.9567 |

As shown, our proposed LASQ achieves the lowest color difference scores and the highest FSIMc, confirming its superior color fidelity. These results quantitatively support our claim that LASQ delivers more faithful and perceptually accurate color restoration compared to other methods.
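For completeness, CIE76 ΔE*ab is simply the Euclidean distance in CIELAB. A self-contained sketch of the computation (D65 white point assumed; CIEDE2000 requires the full ΔE₀₀ formula and is omitted) is:

```python
import numpy as np

def srgb_to_lab(rgb):
    """rgb: float array in [0, 1], shape (..., 3). D65 white point assumed."""
    # Inverse sRGB companding (gamma expansion)
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ M.T
    xyz /= np.array([0.95047, 1.0, 1.08883])        # normalize by D65 white
    f = np.where(xyz > (6 / 29) ** 3, np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def delta_e76(img1, img2):
    """Mean CIE76 colour difference between two sRGB images in [0, 1]."""
    return np.linalg.norm(srgb_to_lab(img1) - srgb_to_lab(img2), axis=-1).mean()
```

Pure white versus pure black yields ΔE*ab = 100 (the full L* range), which provides a quick sanity check of the conversion.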

All supplementary tables and visualizations will be included in the final version of the paper or appendix.

Comment

Thanks for the authors' detailed explanation and rebuttal. The responses solved all my concerns. I will change my rating accordingly.

Comment

Thank you for your positive response. We're glad our rebuttal addressed your concerns and truly appreciate the time and effort you've spent reviewing our work. We're also grateful for your willingness to adjust the final rating.

Final Decision

The paper proposes LASQ, a zero-reference low-light image enhancement method that reframes luminance adjustment as a multi-scale, physics-informed statistical process within a diffusion framework. It achieves state-of-the-art results on paired and unpaired benchmarks. Reviewers praised the contribution and empirical quality; initial concerns about hyperparameter tuning and efficiency were resolved during rebuttal. I recommend acceptance.