PaperHub
Rating: 6.1/10 · Poster · 4 reviewers
Scores: 2, 5, 2, 4 (min 2, max 5, std dev 1.3)
ICML 2025

High Dynamic Range Novel View Synthesis with Single Exposure

OpenReview · PDF
Submitted: 2025-01-13 · Updated: 2025-07-24

Abstract

Keywords

High Dynamic Range · Novel View Synthesis · Low Dynamic Range · Data Synthesis

Reviews and Discussion

Review (Rating: 2)

The paper introduces Mono-HDR-3D, a framework for High Dynamic Range Novel View Synthesis (HDR-NVS) that operates effectively with only single-exposure Low Dynamic Range (LDR) images during training. The approach addresses limitations of previous multi-exposure methods by proposing a meta-algorithm that includes two dedicated modules: an LDR-to-HDR Color Converter (L2H-CC) and an HDR-to-LDR Color Converter (H2L-CC), forming a closed-loop design. Experimental results on synthetic and real datasets demonstrate significant improvements in HDR novel view synthesis quality compared to previous state-of-the-art methods (HDR-NeRF & HDR-GS).

Update after rebuttal

Many thanks for the rebuttal. Out of concern for the rigor expected from reviewers, and given the noticeable omission of many relevant related works, I maintain my recommendation to reject.

Questions for Authors

See "Other Strengths And Weaknesses" and "Other Comments Or Suggestions"

Claims and Evidence

The evidence presented is convincing and supports the claims made by the authors. The experiments span both synthetic and real datasets, and the comparisons are conducted in a fair and comprehensive manner. Additionally, the ablation studies effectively isolate the contributions of various components within the proposed framework.

However, while it may be relatively straightforward to outperform HDR-GS and HDR-NeRF when tailoring the design to a specific setting, the real challenge lies in demonstrating that the method can also surpass these baselines under their own conditions (i.e., with inconsistent exposure times). Addressing this would provide stronger evidence for the generalizability of the approach.

A major problem is that the authors consider ideal conditions, where multi-view images have the same exposure. However, in real-world scenarios, this is difficult because each camera has different exposure settings (e.g., ISO, exposure time). I believe that single-view reconstruction is more suitable for scenes with a single exposure, rather than multi-view reconstruction. This is my biggest confusion with the paper.

Methods and Evaluation Criteria

The proposed methods make sense for the problem of HDR-NVS with single-exposure LDR images. The architecture of Mono-HDR-3D, with its dedicated color conversion modules and closed-loop design, directly addresses the challenge of learning HDR representations from limited information.

The evaluation criteria (PSNR, SSIM, LPIPS) are standard for novel view synthesis tasks and appropriate for assessing both quantitative quality and perceptual performance; this part is fine by me.
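For concreteness, a minimal sketch of how these three metrics are typically computed; the library choices (scikit-image, lpips) and the `evaluate_view` helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch: standard NVS metrics, assuming float32 RGB arrays in [0, 1].
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")  # perceptual distance (lower is better)

def evaluate_view(pred: np.ndarray, gt: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1]
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_fn(to_t(pred), to_t(gt)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```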

Theoretical Claims

The paper does not present extensive theoretical proofs, focusing instead on the conceptual framework and experimental validation; to me this is not a major problem.

Experimental Design and Analysis

This paper utilizes appropriate benchmark datasets that include both synthetic and real scenes, compares its approach against state-of-the-art methods such as HDR-NeRF and HDR-GS, incorporates both quantitative metrics and qualitative visual comparisons, and conducts ablation studies to validate its design choices.

Supplementary Material

No supplementary material provided.

Relation to Prior Work

Closely related to HDR-NeRF and HDR-GS, but improves on them by requiring only a single exposure time.

However, I believe the innovation is not sufficiently strong. A revision would not allow exploring a more challenging setting, such as single-view in addition to single-exposure. In an ideal single-exposure scenario, often only a single view is captured. The assumptions made in this paper are therefore too restrictive.

Missing Important References

Some LDR novel view synthesis methods may need to be discussed; they handle lightness correction in the LDR domain and could be combined with 2D inverse tone mapping methods to serve as baselines.

For example:

[1] Lighting up NeRF via Unsupervised Decomposition and Enhancement, ICCV 2023
[2] Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption, AAAI 2024
[3] A Bilevel Optimization Approach for Novel View Synthesis, AAAI 2024

Other Strengths and Weaknesses

Would it be possible to include some video comparison results in the supplement? This could make the visual effects more apparent, especially regarding 3D consistency.

Other Comments or Suggestions

It might be more reasonable to add some 2D inverse tone mapping methods combined with basic 3DGS as comparison methods.

Currently, the baselines are limited, and the paper lacks a deeper analysis of how multi-view consistency is ensured.

Author Response

Reviewer ZUxK

Q1: While it may be relatively straightforward to outperform HDR-GS and HDR-NeRF when tailoring the design to a specific setting, the real challenge lies in demonstrating that the method can also surpass these baselines under their own conditions.

Great point! As suggested, we have now evaluated Mono-HDR-GS under the conventional multi-exposure setting. As shown below, our method overall achieves superior performance for both HDR and LDR rendering. We will add this test to the final version.

HDR rendering results on the synthetic datasets.

Method        PSNR (↑)   SSIM (↑)   LPIPS (↓)
HDR-GS        38.31      0.972      0.013
Mono-HDR-GS   38.66      0.976      0.012

LDR rendering results of observed exposure (LDR-OE) on the synthetic datasets.

Method        PSNR (↑)   SSIM (↑)   LPIPS (↓)
HDR-GS        41.10      0.982      0.011
Mono-HDR-GS   40.55      0.983      0.011

LDR rendering results of novel exposure (LDR-NE) on the synthetic datasets.

Method        PSNR (↑)   SSIM (↑)   LPIPS (↓)
HDR-GS        36.33      0.977      0.016
Mono-HDR-GS   36.43      0.979      0.014

Q2: The authors consider ideal conditions, where multi-view images have the same exposure. However, in real-world scenarios, this is difficult because each camera has different exposure settings.

Apologies for the misunderstanding (multi-view vs. multi-camera). Under the proposed single-exposure setting, even a single camera with a single shutter-time setup suffices to acquire multi-view training imagery. This is more convenient than the multi-camera requirement the reviewer mentioned. We will clarify this.

Q3: Some LDR novel view synthesis methods may need to be discussed; they handle lightness correction in the LDR domain and could be combined with 2D inverse tone mapping methods to serve as baselines, i.e., [5-7]. It might be more reasonable to add some 2D inverse tone mapping methods combined with basic 3DGS as comparison methods. [5] Lighting up NeRF via Unsupervised Decomposition and Enhancement, ICCV 2023; [6] Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption, AAAI 2024; [7] A Bilevel Optimization Approach for Novel View Synthesis, AAAI 2024.

Thanks for the suggestions. We will discuss these works, though they are less relevant: they focus either on luminance correction (vs. our color space transformation) or on single-image cases (vs. our 3D scene modeling). They are thus not proper competitors, as they address different problems.

Q4: Currently, the baselines are limited, and the paper lacks a deeper analysis of how multi-view consistency is ensured.

To the best of our knowledge, HDR-NeRF and HDR-GS are the only two state-of-the-art methods for HDR-NVS. We are more than happy to include more if suggested. The core of our model lies in learning the mapping from LDR to HDR, whilst multi-view consistency is ensured by the adopted 3D scene model (e.g., 3DGS or NeRF). As a result, our approach is generic and open to integration with any 3D representation model.

Q5: Would it be possible to include some video comparison results in the supplement? This could make the visual effects more apparent, especially regarding 3D consistency.

Great suggestion! We will add video results.

Review (Rating: 5)

This paper studies the high dynamic range novel view synthesis problem with only single-exposure LDR images given.

The authors propose a generic framework, Mono-HDR-3D, that learns to capture the underlying camera imaging process for bridging LDR and HDR space effectively under the challenging single exposure scenario. Designed as a generic approach, this method can be integrated with different 3D scene models such as NeRF and 3DGS.

Questions for Authors

How did you visualize the HDR results of real scenes? More visualization results should be provided.

Claims and Evidence

Yes

Methods and Evaluation Criteria

Yes

Theoretical Claims

Yes

Experimental Design and Analysis

Yes

Supplementary Material

No supplementary material is submitted by the authors

Relation to Prior Work

This work studies the 3D HDR imaging problem. It is related to 3D reconstruction techniques like NeRF and 3DGS. The most related work is HDR-GS

Missing Important References

The references are adequate.

Other Strengths and Weaknesses

Strengths:

(i) This work studies a novel and more difficult problem, single-exposure high dynamic range novel view synthesis; previous methods all require at least three exposure times to learn the LDR-to-HDR mapping function in 3D space.

(ii) The idea of decomposing the tone-mapping function into the camera imaging process is interesting. Based on this decomposition, the authors design the low-dynamic-to-high-dynamic color converter (L2H-CC) and the high-dynamic-to-low-dynamic color converter (H2L-CC). This is good and insightful.

(iii) The writing is good and clear, especially the mathematical notation in Section 3. The presentation is polished, especially the workflow paradigm of the pipeline in Figure 2.

(iv) The performance is good and solid. As shown in Table 1, the improvements over HDR-NeRF and HDR-GS are 19 dB and 3 dB, respectively. Very impressive.

Weaknesses:

(i) How can one validate that the two MLPs, L2H-CC and H2L-CC, decompose the camera imaging process? There is no supervision in the loss function to ensure this.

(ii) The HDR results in Figure 5(c) look terrible and are totally different from those in the original HDR-GS paper.

(iii) The improvements on real scenes are marginal. Why? The authors do not explain this.

(iv) Code and models are not submitted. The reproducibility cannot be checked.

Other Comments or Suggestions

I suggest the authors re-organize the paper to remove the blank space in Lines 159-164.

Author Response

Reviewer nprx

Q1: How can one validate that the two MLPs L2H-CC and H2L-CC decompose the camera imaging process? There is no supervision in the loss function to ensure this.

Great question! It is exactly the absence of such supervision that makes the problem extremely challenging. It is the architecture of the two converters (Fig. 3 & 4) that imposes the structural prior of camera imaging (Eq. (6)), driving the model to approximate the underlying camera imaging process. This has been validated in the ablation study (see Tab. 3) by comparing against a plain MLP without such structure.
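To make the idea concrete, below is a minimal sketch of a converter that bakes the imaging structure $\ln E = g(Z) - \ln \Delta t$ into the network, rather than using a plain unconstrained MLP. This is not the authors' exact Fig. 3/4 design; the layer sizes, log-domain parameterization, and learned exposure term are all assumptions.

```python
# Hedged sketch: an LDR-to-HDR converter with a camera-imaging structural prior.
import torch
import torch.nn as nn

class L2HConverterSketch(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        # MLP predicting log-irradiance from LDR color; working in the log
        # domain mirrors the exponential-like inverse CRF of real cameras.
        self.inv_crf = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        self.log_dt = nn.Parameter(torch.zeros(1))  # learned log exposure time

    def forward(self, ldr: torch.Tensor) -> torch.Tensor:
        # ldr: (..., 3) in [0, 1]; returns linear HDR radiance of the same shape
        log_irradiance = self.inv_crf(ldr) - self.log_dt  # ln E = g(Z) - ln dt
        return torch.exp(log_irradiance)
```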

Q2: The improvements on real scenes are marginal. Why? The authors do not explain this.

On real scenes without HDR ground truth, improvements are indeed harder to demonstrate. To provide more evidence, we have now quantitatively evaluated two no-reference image quality assessment (NR-IQA) metrics, NIQE [3] and CLIP-IQA [4], which do not require HDR ground truth. We report the HDR results on real-world datasets below:

Method        NIQE (↓)   CLIP-IQA (↑)
HDR-GS        6.40       0.48
Mono-HDR-GS   3.63       0.50

This test further indicates the meaningful superiority of our method over previous alternatives. Please note that Tab. 2 (main paper) reports LDR rendering results, which are not the focus of this work.

[3] Mittal, A., Soundararajan, R., Bovik, A. C. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 2012.
[4] Wang, J., Chan, K. C. K., Loy, C. C. Exploring CLIP for assessing the look and feel of images. AAAI 2023.
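As a usage note, both metrics are available in open-source IQA toolkits. A sketch using the pyiqa (IQA-PyTorch) package follows; the package choice is an assumption, since the rebuttal does not state which implementation was used.

```python
# Hedged sketch: scoring HDR renderings with two NR-IQA metrics via pyiqa.
import torch
import pyiqa  # pip install pyiqa

device = "cuda" if torch.cuda.is_available() else "cpu"
niqe = pyiqa.create_metric("niqe", device=device)        # lower is better
clipiqa = pyiqa.create_metric("clipiqa", device=device)  # higher is better

def score_render(img: torch.Tensor) -> tuple:
    # img: (1, 3, H, W) in [0, 1], e.g. a tone-mapped HDR rendering
    return niqe(img).item(), clipiqa(img).item()
```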

Q3: Code and models are not submitted. The reproducibility cannot be checked.

Our code and models will be released later.

Q4: I suggest the authors re-organize the paper to remove the blank space in lines 159-164.

Thanks, we will.

Q5: The HDR results in Fig. 5(c) look terrible and are totally different from those in the original HDR-GS paper.

Great spot, but please note that our single-exposure setting is more challenging and more practical compared to the conventional multi-exposure setting used in HDR-GS. This contrast illustrates exactly the challenges of our new setting (e.g., the limited luminance information cannot fulfill the Nyquist-Shannon sampling theorem requirements for dynamic range recovery). We used the official HDR-GS code to ensure correctness (with the same code, we can reproduce the multi-exposure results). Please note that this work does not fully solve the single-exposure HDR-NVS problem, but it marks a meaningful step forward and will foster more advanced research in the future.

Reviewer Comment

Thanks for your response. My concerns have been addressed. I raise my score.

Author Comment

Thanks for your detailed review and constructive feedback, which greatly improved our work. We’re truly grateful that our response addressed your concerns and appreciate your updated score of 5. Your expertise has been instrumental in enhancing the quality of our work.

Review (Rating: 2)

This paper proposes a novel method for HDR scene novel view rendering with single-exposure LDR images. The approach involves two key components: an LDR-to-HDR (L2H) converter and an HDR-to-LDR (H2L) converter, both designed based on the camera imaging process. The L2H module first converts LDR images into HDR representations, while the H2L module generates LDR images for supervision using the input images. Experiments on synthetic datasets demonstrate that this method achieves superior rendering quality compared to existing approaches.

Questions for Authors

See Other Strengths And Weaknesses.

Claims and Evidence

The major claims are supported by experiments.

Methods and Evaluation Criteria

The proposed methods make sense for the problem.

Theoretical Claims

N/A

Experimental Design and Analysis

The experimental designs are valid.

Supplementary Material

N/A

Relation to Prior Work

The paper is significant in the field of NVS since it proposes a novel method to create HDR representations with single-exposure LDR images.

Missing Important References

References are adequately discussed.

Other Strengths and Weaknesses

Strengths

The design of the HDR and LDR converters is novel. It's interesting to use only single-exposure LDR images to generate an HDR 3D representation, which can further inspire the fields of view synthesis and inverse rendering. Results on synthetic data are promising.

Weaknesses

  • The proposed L2H-CC converts an LDR model to an HDR model; however, after reading the paper, I am not completely clear how its design prevents the module from learning a trivial solution that maps to an LDR rather than an HDR model.
  • Important technical details on implementation and experiments are missing and could benefit from more explanation:
  1. How does Mono-HDR-GS use the HDR loss? The paper claims to use only LDR images, so there should be no HDR ground truth during optimization.
  2. The ablation study of the closed-loop design lacks implementation details. How does supervision work without H2L-CC? What supervision is used to learn L2H-CC?
  3. It would be better if the authors added an ablation on the losses $L_\text{ldr}$, $L_\text{hdr}$, and $L_\text{h2l}$ for a more comprehensive evaluation.

The paper presents an interesting approach to HDR novel view synthesis using only LDR images, without relying on any data-driven priors. However, my main concerns lie in the lack of clarity in certain technical explanations and the omission of crucial experimental details. Notably, while the paper claims to train solely with LDR images, the implementation in Section 3.3 appears to incorporate HDR image loss, which raises questions about the training setup. A more positive rating will be considered if the authors can thoroughly address these concerns.

Other Comments or Suggestions

N/A

Author Response

Reviewer dY8v

Q1: The proposed L2H-CC converts an LDR model to an HDR model; however, I'm not completely clear how its design prevents the module from learning a trivial solution that maps to an LDR rather than an HDR model.

Let us summarize the key features of our method: First, previous methods such as HDR-GS (Cai et al., 2024) and HDR-NeRF (Huang et al., 2022) are inferior in design for tackling this more challenging single-exposure HDR-NVS problem, since directly learning an HDR scene model from single-exposure multi-view imagery is extremely challenging (see L86-93). With Mono-HDR-3D, we instead first learn an LDR scene model. More importantly, we impose the inherent camera imaging mechanism [1-2] (see the camera imaging formula Eq. (6), Sec. 3.2) to facilitate HDR color estimation, enabling a more robust translation from LDR to HDR by leveraging imaging-physics prior knowledge (L190-216, Sec. 3.2). We will clarify further.

[1] Noise-optimal capture for high dynamic range photography. CVPR 2010: 553-560.
[2] Compressed-SDR to HDR Video Reconstruction. TPAMI, 2023, 46(5): 3679-3691.
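For readers without the paper at hand, Eq. (6) is not reproduced in this thread; the standard LDR image-formation model from the HDR literature (Debevec & Malik, SIGGRAPH 1997), which camera-imaging formulas of this kind typically follow, is:

```latex
% Z: recorded LDR pixel value, E: scene irradiance (HDR),
% \Delta t: exposure time, f: camera response function (CRF).
Z = f(E \cdot \Delta t), \qquad E = \frac{f^{-1}(Z)}{\Delta t}
```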

Q2: How does Mono-HDR-GS use the HDR loss? The paper claims to use only LDR images, so there should be no HDR ground truth during optimization.

Apologies for the misunderstanding. As stated in L255-265, our model supports both cases, with and without HDR ground truth. Given HDR data, the HDR loss $L_\text{hdr}$ is used along with the others, as shown in Eq. (9).

Q3: The ablation study of the closed-loop design lacks implementation details. How does supervision work without H2L-CC? What supervision is used to learn L2H-CC?

In the closed-loop ablation study (Tab. 4), when the H2L-CC module is omitted, the HDR loss $L_\text{hdr}$ directly supervises the L2H-CC module to learn the LDR-to-HDR transformation, and $L_\text{ldr}$ enforces that the input to L2H-CC is a valid LDR model. Additionally, when H2L-CC is present, an extra loss $L_\text{h2l}$ supervises L2H-CC.
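A minimal sketch of how the three losses could compose under this closed-loop design; the weights and the optional-HDR handling are assumptions, not the paper's Eq. (9) verbatim.

```python
# Hedged sketch: closed-loop loss composition with optional HDR ground truth.
import torch.nn.functional as F

def total_loss(ldr_pred, ldr_gt, hdr_pred, hdr_gt, ldr_cycle,
               lam_hdr: float = 1.0, lam_h2l: float = 1.0):
    loss = F.mse_loss(ldr_pred, ldr_gt)            # L_ldr: LDR scene model
    if hdr_gt is not None:                         # HDR ground truth is optional
        loss = loss + lam_hdr * F.mse_loss(hdr_pred, hdr_gt)  # L_hdr
    # L_h2l: H2L-CC maps predicted HDR back to LDR, supervised by the LDR input
    loss = loss + lam_h2l * F.mse_loss(ldr_cycle, ldr_gt)
    return loss
```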

Q4: It would be better if the authors added an ablation on the losses $L_\text{ldr}$, $L_\text{hdr}$, and $L_\text{h2l}$ for a more comprehensive evaluation.

Thanks! As suggested, we have now conducted an exhaustive analysis of loss combinations. The HDR rendering results are reported below:

Index   Loss                                                PSNR (↑)   SSIM (↑)   LPIPS (↓)
1       $L_\text{ldr}$                                      -          -          -
2       $L_\text{hdr}$                                      33.93      0.925      0.050
3       $L_\text{h2l}$                                      11.87      0.504      0.371
4       $L_\text{ldr}$ + $L_\text{hdr}$                     38.19      0.974      0.015
5       $L_\text{ldr}$ + $L_\text{h2l}$                     13.50      0.507      0.359
6       $L_\text{hdr}$ + $L_\text{h2l}$                     33.58      0.934      0.058
7       $L_\text{ldr}$ + $L_\text{hdr}$ + $L_\text{h2l}$    38.57      0.975      0.012

We highlight that:

  • The HDR loss $L_\text{hdr}$ is fundamentally important, as expected;
  • The LDR loss $L_\text{ldr}$ clearly helps by properly supervising the LDR scene model optimization;
  • The closed-loop loss $L_\text{h2l}$ adds further value on top.

Review (Rating: 4)

This paper introduces Mono-HDR-3D, a novel single-exposure HDR-NVS approach that reconstructs 3D HDR scenes in NeRF or 3DGS using only LDR images, eliminating the need for multi-exposure inputs. The method comprises two modules based on the LDR image formation principle: an LDR-to-HDR module that predicts HDR details from LDR images, and an HDR-to-LDR module that allows the model to be trained with LDR images.

Questions for Authors

I question the necessity of the H2L module: what happens if you render the image in HDR and simply convert it to LDR with existing modules or an analytic method, instead of approximating it with an MLP? Is this not possible because no such module supports backpropagation for training? If it is possible, how does the model perform with such a naive HDR-to-LDR conversion method? Please elaborate.

Claims and Evidence

This paper claims that having a single 3D representation for HDR, and approximating the LDR-to-HDR and HDR-to-LDR processes with neural networks, brings performance improvements through its closed-loop design, even when only LDR images are available. In this process, they argue the importance of modeling the architecture after the camera imaging mechanism. These claims are backed up in the experiment section by Tables 3 and 4.

However, I question the authors about the case when only LDR images are available for optimizing the scene, the setting implied in the first paragraph of the H2L-CC part of Section 3.2. In this case, how is the L2H module able to learn the mapping from LDR to HDR? It seems that this module is not generalizable and is optimized per scene with NeRF or 3DGS, so I suppose it would not receive any guidance signal for learning L2H when no HDR images are available. I ask the authors to provide additional elaboration and experimental results for this setting.

Methods and Evaluation Criteria

Please see Claims and Evidence: I believe an additional experiment should have been performed to validate the performance of this method when different ratios of LDR / HDR images are available, including the extreme cases where only LDR or only HDR images are available for 3D scene optimization.

Theoretical Claims

See Claims and Evidence.

Experimental Design and Analysis

The soundness and validity of the experimental design and analysis have been verified. However, I find the qualitative results somewhat lacking for me to be fully convinced of the performance of this method.

Supplementary Material

I have reviewed the supplementary material.

Relation to Prior Work

This paper builds upon and extends several key areas of research in HDR imaging, Novel View Synthesis (NVS), and computational photography by addressing the limitations of existing HDR-NVS methods and introducing a new single-exposure-based approach.

Missing Important References

N/A

Other Strengths and Weaknesses

  • This paper is well-written and easy to follow.
  • The architecture that emulates the real-life LDR-to-HDR and HDR-to-LDR processes is somewhat novel, though the idea of basing the model architecture on real-life physical properties is familiar and well-known in the field of novel view synthesis.

Other Comments or Suggestions

N/A

Author Response

Reviewer yk8h

Q1: I question the necessity of the H2L module: what happens if you render the image in HDR and simply convert it to LDR with existing modules or an analytic method, instead of approximating it with an MLP? Is this not possible because no such module supports backpropagation for training? If it is possible, how does the model perform with such a naive HDR-to-LDR conversion method? Please elaborate.

Great point! Please note that analytic HDR-to-LDR conversion requires the camera response function (CRF), which is typically unknown. To address this, we learn to approximate it instead. We will clarify further.
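To make the contrast concrete, here is a sketch of the naive analytic alternative the reviewer raises, assuming a gamma-style CRF in place of the real, unknown response; note it is differentiable, so backpropagation is not the obstacle, the CRF mismatch is.

```python
# Hedged sketch: fixed analytic HDR-to-LDR conversion with an assumed gamma CRF.
import torch

def analytic_h2l(hdr: torch.Tensor, dt: float = 1.0, gamma: float = 2.2):
    exposure = hdr * dt                            # simulate sensor exposure
    return exposure.clamp(0, 1) ** (1.0 / gamma)   # assumed gamma-style CRF
```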

Q2: I question the authors about the case when only LDR images are available for optimizing the scene, the setting implied in the first paragraph of the H2L-CC part of Section 3.2. In this case, how is the L2H module able to learn the mapping from LDR to HDR? It seems that this module is not generalizable and is optimized per scene with NeRF or 3DGS, so I suppose it would not receive any guidance signal for learning L2H when no HDR images are available. I ask the authors to provide additional elaboration and experimental results for this setting.

Without HDR ground truth, any model, including ours, will be less properly constrained. However, compared with previous methods, our Mono-HDR-3D has an advantage in leveraging the inherent camera imaging mechanism (an imaging-physics prior) along with the closed-loop design (L2H-CC followed by H2L-CC, forming a loop).

As suggested, we now quantitatively evaluate two no-reference image quality assessment (NR-IQA) metrics, NIQE [3] and CLIP-IQA [4], which do not require HDR ground truth. We report the HDR results on real-world datasets below:

Method        NIQE (↓)   CLIP-IQA (↑)
HDR-GS        6.40       0.48
Mono-HDR-GS   3.63       0.50

This test validates the efficacy of our model design.

[3] Mittal, A., Soundararajan, R., Bovik, A. C. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 2012.
[4] Wang, J., Chan, K. C. K., Loy, C. C. Exploring CLIP for assessing the look and feel of images. AAAI 2023.

Q3: I believe an additional experiment should have been performed to validate the performance of this method when different ratios of LDR / HDR images are available, including the extreme cases where only LDR or only HDR images are available for 3D scene optimization.

Great suggestion, and many thanks! We have now conducted the suggested experiment, with LDR / HDR image ratios of 1/1, 2/1, 3/1, 5/1, 0/1, and 1/0, respectively. The results are reported below:

Method        LDR / HDR   PSNR (↑)   SSIM (↑)   LPIPS (↓)
HDR-GS        1/1         35.30      0.965      0.030
Mono-HDR-GS   1/1         38.57      0.975      0.012
HDR-GS        2/1         35.26      0.963      0.033
Mono-HDR-GS   2/1         37.97      0.975      0.013
HDR-GS        3/1         35.16      0.958      0.035
Mono-HDR-GS   3/1         37.53      0.974      0.014
HDR-GS        5/1         34.89      0.961      0.027
Mono-HDR-GS   5/1         35.51      0.963      0.023
HDR-GS        0/1         33.46      0.936      0.075
Mono-HDR-GS   0/1         33.93      0.925      0.050
HDR-GS        1/0         10.51      0.503      0.350
Mono-HDR-GS   1/0         13.50      0.507      0.359

We highlight that:

  • As the amount of HDR data decreases, our model degrades only marginally, suggesting its data efficiency.
  • LDR supervision is useful, providing a better-quality scene model to be converted.
  • HDR supervision is the most critical, as expected.
  • Compared with HDR-GS, our model is superior overall across all cases.

Q4: I find the qualitative results somewhat lacking for me to be fully convinced of the performance of this method.

Reconstructing HDR radiance fields from single-exposure LDR inputs constitutes an ill-posed inverse problem, as the limited luminance information fails to satisfy the Nyquist-Shannon sampling theorem for dynamic range recovery. Therefore, the perceptual quality of synthesized HDR images remains fundamentally constrained by the limited luminance information in single-exposure inputs.

Reviewer Comment

Thanks for your response, and it seems my concerns have been addressed. I raise my score to 4.

Final Decision

This paper received divergent reviews: two weak rejects, one accept, and one strong accept. The rebuttal addressed most reviewers' concerns and the two accepting reviewers upgraded their scores. Although some reviewers still have concerns about comparison with some concurrent works, the AC feels that the issues are addressable in the revision. The AC's recommendation is therefore accept.