PaperHub
评分: 6.5/10（Poster · 4 位审稿人）
各审稿人评分: 6 / 6 / 8 / 6（最低 6，最高 8，标准差 0.9）
置信度: 3.5
正确性: 3.0
贡献度: 3.0
表达: 2.8
ICLR 2025

Learning Gain Map for Inverse Tone Mapping

OpenReview · PDF
提交: 2024-09-21 · 更新: 2025-02-28

摘要

关键词
Computational Photography · Inverse Tone Mapping · Gain Map

评审与讨论

审稿意见
6

The paper introduces GMNet, a dual-branch network that estimates the Gain Map for inverse tone mapping. The paper also builds a synthetic dataset and a real-world dataset from existing HDR resources and mobile devices, respectively.

优点

The idea of estimating the Gain Map instead of HDR values is interesting and can serve as a baseline for future research.

缺点

The methods used for comparison in the paper are outdated, the method is not particularly novel, and a common metric of the HDR domain is missing. There are also technical issues concerning the loss function, data, and models.

问题

Although the idea is interesting, it is not new and is quite similar to estimating a transfer function, which is well known in the image restoration domain. That said, the contribution of the paper is limited.

The literature review and the compared methods are outdated, which makes the paper less convincing. The latest method compared is from 2022 (KUNet), while the most recent one is DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction (based on Google Scholar).

In terms of PSNR in the HDR domain, there is another metric that is more common than PSNR-NGM, namely mu-PSNR. The paper should consider running the evaluation on this metric too.

The treatment of $Q_{max}$ is unclear. $Q_{max}$ should not be in the range [0, 1], is that correct? If so, $I_{GM}$ is not in the range [0, 1] either and cannot be normalized; the reviewer wonders how the loss can be calculated and how the model converged.

Finally, since there is an upsampling function when estimating $I_{GM}$, the reviewer thinks the estimated $I_{GM}$ might not represent the real $I_{GM}$ precisely, due to the interpolation applied during upsampling (either nearest-neighbor or bilinear).

评论

3. Concerns on the metrics of PSNR-NGM and PSNR-mu

First, we need to clarify that PSNR-NGM is NOT a metric for evaluating HDR images. It is only used in the ablation experiments on our network to evaluate the quality of the normalized GM.

Second, we follow the previous work [7] in using PSNR-L in the linear domain, but employ PSNR-PQ instead of PSNR-mu to evaluate visual similarity; both play a similar role in compressing HDR images. The main reason is that our work is closer to ITM tasks, so the PQ metric is more suitable for practical applications.

Third, to further address the reviewer's concerns, we provide the experimental results of PSNR-mu, PSNR-PQ, and PSNR-L on the real-world dataset in the table below, where our method still achieves superior performance.

| Method | PSNR-mu↑ | PSNR-PQ↑ | PSNR-L↑ |
|---|---|---|---|
| KUNet | 33.2771 | 33.7895 | 33.3256 |
| HDRUNet | 37.5750 | 37.3966 | 33.1686 |
| HDCFM | 38.7144 | 38.1894 | 32.7963 |
| FMNet | 41.0164 | 39.7632 | 32.6247 |
| DCDR-UNet | 39.1466 | 38.3252 | 33.5421 |
| EPCE-HDR | 30.3388 | 31.1207 | 32.7616 |
| ITM-LUT | 39.2880 | 37.9021 | 31.6677 |
| Ours | 41.6012 | 40.2088 | 33.9490 |

[7] Chen, Xiangyu, et al. "Hdrunet: Single image hdr reconstruction with denoising and dequantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
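For reference, here is a minimal sketch of how PSNR-mu is commonly computed in the HDR literature (the μ = 5000 constant and the normalization-by-peak step follow the usual convention; they are not taken from the paper):

```python
import numpy as np

MU = 5000.0  # conventional mu-law constant in HDR evaluation

def mu_law(x, mu=MU):
    # Compress linear HDR values in [0, 1] with the mu-law curve.
    return np.log1p(mu * x) / np.log1p(mu)

def psnr(a, b, peak=1.0):
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_mu(pred_hdr, gt_hdr):
    # Normalize both images by the ground-truth peak, then compare in the
    # mu-compressed domain (PSNR-PQ would swap mu_law for the PQ OETF).
    peak = gt_hdr.max()
    return psnr(mu_law(pred_hdr / peak), mu_law(gt_hdr / peak))
```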

4. Explanation of the normalization of $Q_{max}$

The $Q_{max}$ in the real-world dataset is in the range [0, 5], and $I_{GM}$ is also in the range [0, 5]. $I_{GM}$ is divided by the global maximum of 5 for normalization, differing from $I_{NGM}$, which is normalized by the respective maximum of each sample. Therefore, $I_{GM}$ is normalized to [0, 1], enabling uniform and smooth convergence.
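A minimal sketch of the two normalization schemes described above (function names are illustrative):

```python
import numpy as np

Q_MAX_GLOBAL = 5.0  # global upper bound of Q_max on the real-world dataset

def normalize_gm(i_gm: np.ndarray) -> np.ndarray:
    # I_GM in [0, 5] is divided by the dataset-wide maximum, landing in [0, 1].
    return i_gm / Q_MAX_GLOBAL

def normalize_ngm(i_gm: np.ndarray) -> np.ndarray:
    # I_NGM divides each sample by its own per-image maximum instead.
    return i_gm / i_gm.max()
```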

5. Concerns on the upsampling of $I_{GM}$

There might be some misunderstanding here. The resolution of the GM is reduced in most real-world cases, as downsampling is recognized by the standards [1] [2] [3] to save bandwidth. Therefore, our method aims to learn the Ground-Truth GM, not the interpolated GM.

Moreover, we perform ablation experiments on the resolution of the GM in Table 5 of our paper, demonstrating that GMNet achieves superior performance with or without interpolation.

评论

We appreciate the reviewer's valuable comments and suggestions, and we hope our responses could address the concerns.

1. Similarity to the transfer function

We respectfully disagree. The GM is NOT similar to a transfer function, which typically maps the input to the output with a learned function (implemented by neural networks nowadays). Suppose $f_{\phi}(\cdot)$ is a neural network; in the HDR task it is applied as $L_{HDR}=f_{\phi}(L_{SDR})$, where $\phi$ denotes the learned parameters of the network. In contrast, the GM is image-like auxiliary data that records pixel-wise dynamic-range information, is collected independently along with the SDR image, and is applied as $L_{HDR}=L_{GM}\odot L_{SDR}$, differing from a transfer function in essence. The proposed GMNet is inspired by the latest HDR data format [1] [2] [3], and the novelty of exploring this new format has been uniformly recognized by the other three reviewers.

[1] ISO. "Gain map metadata for image conversion." https://www.iso.org/standard/86775.html, 2024.

[2] Adobe. "Gainmap specification." https://helpx.adobe.com/camera-raw/using/gain-map.html, 2024.

[3] Google. "Ultra-hdr image format." https://developer.android.com/media/platform/hdr-image-format, 2024.
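To make the contrast described above concrete, here is a minimal sketch of the two formulations (the tiny network standing in for $f_{\phi}$ is a placeholder, not the architecture of any compared method):

```python
import torch
import torch.nn as nn

# Transfer-function ITM: a learned mapping L_HDR = f_phi(L_SDR).
f_phi = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

def itm_transfer_function(l_sdr: torch.Tensor) -> torch.Tensor:
    return f_phi(l_sdr)

# GM-based ITM: the GM is auxiliary image-like data carried alongside the
# SDR image; HDR is recovered by the pixel-wise product L_HDR = L_GM ⊙ L_SDR.
def itm_gain_map(l_sdr: torch.Tensor, l_gm: torch.Tensor) -> torch.Tensor:
    return l_gm * l_sdr
```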

2. Additional comparison methods, e.g., DCDR-UNet

We keep up with the latest research, but few methods in recent years are open source. For example, in DCDR-UNet [4] (NOT open-sourced yet), mentioned by the reviewer, the latest comparison method is also KUNet, which we have already compared in the paper. Besides, we actively tried to contact the authors for the code during the research and rebuttal periods, but received no response.

To address the reviewer's concerns, we unofficially reproduce DCDR-UNet ourselves and add two more baselines, EPCE-HDR [5] and ITM-LUT [6]. The experimental results on the synthetic dataset are shown below, demonstrating that our method still achieves superior performance over these latest works. Finally, we guarantee that our work will be open-sourced to promote research along this line.

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| KUNet | 40.4879 | 0.9952 | 34.5201 | 0.9619 | 7.1538 | 9.7851 |
| HDRUNet | 41.2579 | 0.9969 | 40.9315 | 0.9946 | 3.5799 | 9.9080 |
| HDCFM | 41.8563 | 0.9967 | 44.9017 | 0.9992 | 2.4773 | 9.9115 |
| FMNet | 40.7699 | 0.9970 | 44.4798 | 0.9996 | 2.3032 | 9.8991 |
| DCDR-UNet | 41.8085 | 0.9971 | 42.0183 | 0.9962 | 3.0501 | 9.9215 |
| EPCE-HDR | 41.5919 | 0.9956 | 31.4700 | 0.9430 | 8.5830 | 9.6694 |
| ITM-LUT | 39.5735 | 0.9950 | 43.7152 | 0.9987 | 2.2865 | 9.7268 |
| Ours | 43.5510 | 0.9977 | 47.6256 | 0.9998 | 1.6262 | 9.9477 |

[4] Kim, Joonsoo, et al. "DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[5] Tang, Jiaqi, et al. "High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation." ECAI 2023. IOS Press, 2023. 2330-2337.

[6] Guo, Cheng, et al. "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display." Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. 2023.

评论

The reviewer values this method. However, although the formulations of the transfer function and the mapping differ, the idea behind them is still similar, in that every pixel in the LDR domain has its own corresponding value in the HDR domain. That is why the reviewer is not convinced about the novelty of this paper.

Furthermore, even though the code for DCDR-UNet is not publicly available, it is worth mentioning it in the literature review.

Finally, the evaluation should be done on both synthetic and real data to avoid cherry-picked results.

评论

Thank you for providing further feedback. We are encouraged that "the reviewer values this method", and we would like to address the remaining concerns below:

1. On the novelty of our paper

We respectfully disagree with this justification.

The reviewer agrees that “the formulations are different between transfer function and mapping”. As the first attempt to introduce GM to the ITM task, we firmly believe the proposed method itself carves out a new direction for this task and could inspire future works along this line. This is also a consensus among other reviewers.

Moreover, it is NOT reasonable to dismiss the novelty of our work because "the idea behind it is still similar (to mapping)", since this vague argument could be applied almost anywhere.

Even within the mapping direction, a number of mapping-based methods have been proposed [1] [2] [3] [4] [5] [6], and their novelty has been widely recognized. It is thus clear that whether a method is based on mapping should NOT be regarded as a measure of novelty.

2. On the discussion of DCDR-UNet

We cited DCDR-UNet in the original manuscript. As suggested, we provided the comparison results in the last revised version. For an additional literature review, we have added a detailed discussion of DCDR-UNet to the Related Work section in the latest version.

3. On the evaluation of real-world dataset

We have already provided the real-world results in the last revised version (see Table 3 in the main paper). For the convenience of review, we again list the experimental results on the real-world dataset below. As can be seen, the advantages of the proposed method hold on most metrics (the only exception is HDR-VDP3, where ours is very close to DCDR-UNet).

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| KUNet | 33.3256 | 0.9907 | 33.7895 | 0.9690 | 9.0578 | 9.6893 |
| HDRUNet | 33.1686 | 0.9911 | 37.3966 | 0.9969 | 5.7358 | 9.8455 |
| HDCFM | 32.7963 | 0.9879 | 38.1894 | 0.9956 | 6.2957 | 9.8035 |
| FMNet | 32.6247 | 0.9921 | 39.7632 | 0.9988 | 4.5544 | 9.8110 |
| DCDR-UNet | 33.5421 | 0.9924 | 38.3252 | 0.9970 | 4.8645 | 9.8781 |
| EPCE-HDR | 32.7616 | 0.9914 | 31.1207 | 0.9466 | 10.4322 | 9.7218 |
| ITM-LUT | 31.6677 | 0.9842 | 37.9021 | 0.9958 | 5.4299 | 9.6831 |
| Ours | 33.9490 | 0.9928 | 40.2088 | 0.9993 | 4.0260 | 9.8757 |

In summary, we believe the major concerns are addressed in the latest manuscript by integrating valuable suggestions from all expert reviewers, and we hope the strength of our work can be better recognized now.

Thanks again for your valuable time and we look forward to your update on whether the remaining concerns have been addressed.

[1] Kong, Lingtong, et al. "SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging." European Conference on Computer Vision. Springer, Cham, 2024.

[2] Tel, Steven, et al. "Alignment-free hdr deghosting with semantics consistent transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

[3] Chung, Haesoo, and Nam Ik Cho. "Lan-hdr: Luminance-based alignment network for high dynamic range video reconstruction." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

[4] Xu, Gangwei, et al. "HDRFlow: Real-Time HDR Video Reconstruction with Large Motions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[5] Shu, Yong, et al. "Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[6] Kim, Joonsoo, et al. "DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

评论

Dear Reviewer 6tzz,

Thanks again for your valuable time. We are honored to see that with the advice from all expert reviewers, our paper has been further improved and better recognized by now.

As today is the last day the authors can upload a revised PDF, please feel free to let us know if you have any further questions. We are glad to provide further details on any aspects of our responses that may require additional clarification or elaboration.

We look forward to your positive feedback if your major concerns have been adequately addressed.

Sincerely,

The authors

评论

The authors have addressed all of my concerns; I'm happy to update my score.

评论

Thank you for raising the score.

We appreciate your thoughtful feedback and the time you took to review our manuscript, and we are grateful for your recognition of the revisions and improvements made to address your concerns. Thank you again for your efforts in helping us enhance the quality of our work!

审稿意见
6

With recent advancements in display technology and the widespread adoption of HDR displays, a new dual-layer image format has emerged that is compatible with existing SDR displays. In response to this, the authors propose a new type of inverse tone mapping algorithm that infers a gain map applicable to adaptive HDR displays. Additionally, the authors construct an additional dataset for training and evaluating this type of network. Through experiments, the authors demonstrate that their method restores HDR images more effectively and reliably compared to other approaches.

优点

  • The authors' proposed inverse tone mapping algorithm, which takes real display environments into account, is both highly practical and innovative.
  • The authors have effectively organized the existing methods in this field through Table 1. I believe this will be very helpful for future researchers studying this area.
  • They have provided a formulation for the proposed method, which makes it easier to understand the approach proposed by the authors.

缺点

  • The authors need to conduct an additional survey on SI-HDR. According to [A], SI-ITM is broadly divided into two branches of research: 1) methods that directly reconstruct HDR images and 2) methods that reconstruct bracketed exposures. In my opinion, a detailed description of the differences between learning-from-LDR-stacks methods and generating a gain map is needed.

  • Furthermore, the authors lack comparisons with learning-from-LDR-stacks methods, not only in terms of performance but also in terms of time efficiency.

    • Like [B], it is possible to create images with relative EV +1/−1 and expand the dynamic range. A discussion should be added on whether GM-ITM is more efficient than this approach.
  • As the authors propose a new type of network, they need to provide a detailed structure of this network.

  • There is a lack of validation on datasets commonly used in inverse tone mapping.

    • HDR-SYNTH + HDR-REAL: Yu-Lun Liu et al. Single-image hdr reconstruction by learning to reverse the camera pipeline. CVPR, 2020
    • HDREye: Hiromi Nemoto et al. Visual attention in ldr and hdr images. 9th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM), 2015.
    • VDS: Siyeong Lee et al. Deep chain HDRI: Reconstructing a high dynamic range image from a single low dynamic range image. IEEE Access, 2018

[A] Lin Wang and Kuk-Jin Yoon, Deep Learning for HDR Imaging: State-of-the-Art and Future Trends, TPAMI, https://arxiv.org/abs/2110.10394

[B] Ning Zhang et al. Revisiting the Stack-Based Inverse Tone Mapping, CVPR, 2023

问题

  • Could the authors explain why the GM can be transmitted at a reduced resolution compared to the original SDR?

  • Could you share experimental results for directly learning Q_max adjusted for the number of pixels?

  • Would it be possible to release the code and data to ensure reproducibility?

  • If I have misunderstood any part or if my concerns are adequately addressed during the review process, I would be more than willing to increase my score.

评论

We appreciate the reviewer's valuable comments and suggestions, and we hope our responses could address the concerns.

1. Additional investigation on stack-based methods

The stack-based HDR reconstruction task mentioned by the reviewer is an important research branch of SI-HDR. We investigate the works [1] [2] [3] [4] [5] [6] and summarize the main differences from our method as follows. The GM-ITM methods are inspired by a novel HDR image format, learning the auxiliary GM data that can upgrade SDR to HDR display. By contrast, the stack-based methods simulate HDRI technology, generating a multi-exposure stack for HDR reconstruction. Besides, compared to the stack-based methods, the GM-ITM methods obtain the final HDR result by multiplying a single SDR image with the corresponding GM, avoiding artifacts that may occur during multi-image fusion (see the sketch after the references below). We will include these discussions in the revised paper.

[1] Wang, Lin, and Kuk-Jin Yoon. "Deep learning for hdr imaging: State-of-the-art and future trends." IEEE transactions on pattern analysis and machine intelligence 44.12 (2021): 8874-8895.

[2] Endo, Yuki, Yoshihiro Kanamori, and Jun Mitani. "Deep reverse tone mapping." ACM Trans. Graph 36.6 (2017): 1-10.

[3] Lee, Siyeong, Gwon Hwan An, and Suk-Ju Kang. "Deep chain hdri: Reconstructing a high dynamic range image from a single low dynamic range image." IEEE Access 6 (2018): 49913-49924.

[4] Lee, Siyeong, Gwon Hwan An, and Suk-Ju Kang. "Deep recursive hdri: Inverse tone mapping using generative adversarial networks." proceedings of the European Conference on Computer Vision (ECCV). 2018.

[5] Kim, Junghee, Siyeong Lee, and Suk-Ju Kang. "End-to-end differentiable learning to HDR image synthesis for multi-exposure images." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 2. 2021.

[6] Zhang, Ning, et al. "Revisiting the stack-based inverse tone mapping." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
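As a rough illustration of this difference, the sketch below contrasts the two pipelines (the EV shifts and fusion weights are toy stand-ins; real stack-based methods learn both the stack generation and the fusion):

```python
import numpy as np

def stack_based_hdr(sdr: np.ndarray, evs=(-2, 0, 2)) -> np.ndarray:
    # Hallucinate a multi-exposure stack from one SDR image, then fuse it
    # (toy well-exposedness weighting; fusion is where ghosting/artifacts arise).
    stack = [np.clip(sdr * 2.0 ** ev, 0.0, None) for ev in evs]
    weights = [np.exp(-((np.clip(s, 0, 1) - 0.5) ** 2) / 0.08) for s in stack]
    return sum(w * s for w, s in zip(weights, stack)) / (sum(weights) + 1e-8)

def gm_based_hdr(sdr: np.ndarray, gm: np.ndarray) -> np.ndarray:
    # A single pixel-wise product; no multi-image fusion step at all.
    return gm * sdr
```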

2. Comparison experiments on efficiency

We actively contacted the authors of [6] for the code but received no response. Therefore, we compare the efficiency of our approach with other available methods [7] [8] [9]; the experimental results are shown below. The runtime is measured on an NVIDIA A100 as the average of 100 trials on the real-world dataset at a resolution of $4096\times3072$.

| Method | Params↓ | Runtime↓ | FLOPs↓ | PSNR-L↑ | PSNR-PQ↑ | $\Delta E_{ITP}$↓ |
|---|---|---|---|---|---|---|
| KUNet | 1.08M | 621.70ms | 7534.50G | 33.3256 | 33.7895 | 9.0578 |
| HDRUNet | 1.58M | 613.89ms | 4161.75G | 33.1686 | 37.3966 | 5.7358 |
| HDCFM | 0.10M | 497.05ms | 127.08G | 32.7963 | 38.1894 | 6.2957 |
| FMNet | 1.24M | 304.13ms | 4147.13G | 32.6247 | 39.7632 | 4.5544 |
| DCDR-UNet | 1.26M | 1085.91ms | 4502.77G | 33.5421 | 38.3252 | 4.8645 |
| EPCE-HDR | 31.02M | 23992.84ms | 326412.14G | 32.7616 | 31.1207 | 10.4322 |
| ITM-LUT | 0.57M | 19.81ms | 59.16G | 31.6677 | 37.9021 | 5.4299 |
| Ours | 1.83M | 75.26ms | 1112.18G | 33.9490 | 40.2088 | 4.0260 |

As can be seen, benefiting from the simple form of the target GM, our method achieves the second-fastest runtime. While the look-up-table-based ITM-LUT is faster, its performance on the image quality metrics is much lower.

[7] Kim, Joonsoo, et al. "DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[8] Tang, Jiaqi, et al. "High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation." ECAI 2023. IOS Press, 2023. 2330-2337.

[9] Guo, Cheng, et al. "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display." Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. 2023.
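For completeness, here is a sketch of one common way to obtain runtime numbers like those above (the warm-up count and input shape are assumptions; the exact measurement protocol is not specified in this response):

```python
import time
import torch

@torch.no_grad()
def average_runtime_ms(model, trials=100, shape=(1, 3, 3072, 4096), device="cuda"):
    # CUDA kernels launch asynchronously, so we synchronize around the timed loop.
    x = torch.rand(shape, device=device)
    model = model.to(device).eval()
    for _ in range(10):  # warm-up passes, excluded from timing
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(trials):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / trials * 1000.0
```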

3. Detailed structure of the proposed network

We will provide more details of the network structure in the revised version of our paper. Specifically, in the local head, we use three convolutional layers with ReLU activations to extract initial features; all convolutional layers have kernel size 3, stride 1, and padding 1, except that the stride of the first convolutional layer is set to 2 to downsample the input. The local tail can be represented as $(Conv \circ ReLU)^2 \circ PS \circ Conv$, where $(\cdot)^n$ denotes a cascade of $n$ modules and all convolutional layers have kernel size 3, stride 1, and padding 1. The remaining implementation details of the other modules can be found in our code, which will be open-sourced.
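A minimal PyTorch sketch of the local head and tail as described (the channel width, in/out channel counts, and the PixelShuffle upscale factor are illustrative assumptions):

```python
import torch.nn as nn

class LocalHead(nn.Module):
    # Three 3x3 Conv+ReLU layers; the first conv uses stride 2 to downsample.
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class LocalTail(nn.Module):
    # (Conv ∘ ReLU)^2 ∘ PS ∘ Conv, read right to left:
    # Conv -> PixelShuffle -> (ReLU -> Conv) twice; all convs 3x3, stride 1, padding 1.
    def __init__(self, ch=64, out_ch=1, upscale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch * upscale ** 2, 3, 1, 1),
            nn.PixelShuffle(upscale),
            nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, 1, 1),
            nn.ReLU(inplace=True), nn.Conv2d(ch, out_ch, 3, 1, 1),
        )

    def forward(self, x):
        return self.body(x)
```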

评论

We select LDR images with an exposure of 0 EV as inputs for both the VDS and HDREye datasets. The experiments with the additional HDR-VDP3 metric on the VDS and HDREye datasets are shown below.

VDS:

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| KUNet | 29.0394 | 0.8435 | 19.9152 | 0.8020 | 44.6197 | 6.7761 |
| FMNet | 29.0543 | 0.8447 | 20.1409 | 0.8282 | 43.7337 | 6.7819 |
| Ours | 28.8087 | 0.8468 | 21.1474 | 0.8356 | 40.8171 | 6.5656 |

HDREye:

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| KUNet | 23.1648 | 0.6641 | 15.5961 | 0.5398 | 64.0899 | 5.5913 |
| FMNet | 23.2416 | 0.6622 | 15.9804 | 0.5929 | 62.1801 | 5.7408 |
| Ours | 23.2446 | 0.6758 | 16.6180 | 0.6023 | 56.7420 | 5.7035 |

As mentioned, these datasets are customized for the SI-HDR task, whose intent differs from the ITM task. The uniformly low quantitative metrics for all methods clearly demonstrate the domain gap between the two tasks, making the comparison results less informative.

评论

My concerns have been fully addressed by the experiments added during the review, so I am upgrading my score. Additionally, I would like to see some quantitative metrics added to explain why the performance degradation is attributed to the domain gap.

评论

Thank you for raising the score.

We are encouraged that our response has effectively addressed your concerns. We sincerely appreciate your thorough review and the valuable time you have taken to help us improve our work.

Regarding your request for quantitative metrics to explain the performance degradation attributed to the domain gap, we appreciate the suggestion and agree that this would provide valuable insights. We are currently working on it and need a bit more time to complete the experiments. We aim to provide these additional results as soon as possible. Thank you again for your thoughtful feedback!

评论

Dear reviewer nA6K,

Thank you for your generous support first. Regarding your request for quantitative metrics to explain the performance degradation attributed to the domain gap, we make the following clarifications.

As stated in [A] [B], the HDR images in SI-HDR datasets should be reproduced with physically correct values using measured data. However, most HDR images in SI-HDR datasets lack such data and store relative irradiance values, while GM-ITM datasets represent absolute display values. For a fair comparison, we choose the following quantitative metrics, which do not rely on absolute values, to measure the gap between the datasets.

| Dataset | $CV_{peak}$ | Saturation | Kurtosis | Skewness |
|---|---|---|---|---|
| SI-HDR (VDS) | 1.4080 | 0.3710 | 81.8410 | 4.4959 |
| SI-HDR (HDREye) | 1.6477 | 0.3275 | 33.0952 | 3.9520 |
| GM-ITM (Synthetic) | 0.4726 | 0.6030 | 11.9692 | 2.1164 |
| GM-ITM (Real-world) | 0.2880 | 0.4756 | 7.6836 | 2.0965 |

(1) Peak variation. The Coefficient of Variation (CV) of the peak value differs significantly between the SI-HDR and GM-ITM datasets. The SI-HDR data records real-world irradiance and its peaks fluctuate widely due to the diverse scenes, while the GM-ITM data is display-referred, with smoother peaks for a better visual experience.

(2) Colorfulness. The SI-HDR data stores irradiance that has not been post-processed, so its saturation is lower than that of GM-ITM data intended for display.

(3) Distribution. To better utilize the display capabilities of the device, the GM-ITM data exhibits a more balanced distribution. In contrast, the SI-HDR data stores real-world irradiance that varies widely, so its kurtosis and skewness are more extreme, creating a significant distribution gap with the GM-ITM data.
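As an illustration, one way such scale-free statistics could be computed is sketched below (the exact definitions behind the table above are not spelled out in this response, so these are illustrative variants):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def dataset_gap_stats(hdr_images):
    # hdr_images: list of HxWx3 linear arrays, one per scene.
    peaks = np.array([img.max() for img in hdr_images])
    cv_peak = peaks.std() / peaks.mean()  # coefficient of variation of peak values

    def saturation(img):  # per-pixel (max - min) / max, averaged over the image
        mx, mn = img.max(axis=-1), img.min(axis=-1)
        return np.mean((mx - mn) / np.maximum(mx, 1e-8))

    sat = np.mean([saturation(img) for img in hdr_images])
    lum = np.concatenate([img.mean(axis=-1).ravel() for img in hdr_images])
    return cv_peak, sat, kurtosis(lum), skew(lum)
```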

We hope your remaining concerns can be adequately addressed now. If possible, we look forward to more positive feedback. Thanks once again!

[A] Nemoto, Hiromi, et al. "Visual attention in LDR and HDR images." 9th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM). 2015.

[B] Akyüz, Ahmet Oǧuz, et al. "Do HDR displays support LDR content? A psychophysical evaluation." ACM Transactions on Graphics (TOG). 2007.

评论

4. Validation experiments on other datasets

Thank you for providing information on additional datasets. We would first like to explain that these datasets are customized for the SI-HDR task, whose intent differs from the ITM task: the ITM task focuses more on dynamic-range restoration and color-gamut conversion than on hallucinating missing details and reconstructing unbounded irradiance. Nevertheless, as suggested, we conduct additional experiments; the results on VDS [3] and HDREye [10] are shown below.

VDS:

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ |
|---|---|---|---|---|---|
| KUNet | 29.0394 | 0.8435 | 19.9152 | 0.8020 | 44.6197 |
| FMNet | 29.0543 | 0.8447 | 20.1409 | 0.8282 | 43.7337 |
| Ours | 28.8087 | 0.8468 | 21.1474 | 0.8356 | 40.8171 |

HDREye:

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ |
|---|---|---|---|---|---|
| KUNet | 23.1648 | 0.6641 | 15.5961 | 0.5398 | 64.0899 |
| FMNet | 23.2416 | 0.6622 | 15.9804 | 0.5929 | 62.1801 |
| Ours | 23.2446 | 0.6758 | 16.6180 | 0.6023 | 56.7420 |

Note that, while the domain gap between the tasks makes the quantitative metrics relatively low for all methods, the qualitative comparisons on diverse data from different sources [3] [10] [11] are meaningful and increase the credibility of our experiments. The qualitative results are shown below.

[Figure A] https://picx.zhimg.com/80/v2-b62509ce89fd10e0a848910fa2465d4e.png

[Figure B] https://picx.zhimg.com/80/v2-e0543ac6ca186abb25a5b05ac38e92a3.png

The experimental results show that our method recovers local and global contrast well and achieves smooth and realistic visual results, verifying its generalizability over a wider range of data. We will add these results to the revised paper or the supplementary document.

[10] Nemoto, Hiromi, et al. "Visual attention in LDR and HDR images." 9th International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM). 2015.

[11] Liu, Yu-Lun, et al. "Single-image HDR reconstruction by learning to reverse the camera pipeline." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

5. Why can the GM be transmitted at a reduced resolution?

Downsampling of the GM is recognized by the standards [12] [13] [14] to save bandwidth. Resolution-sensitive information such as edges and textures is stored in the full-resolution SDR image, while the dynamic-range information recorded by the GM is coupled with the SDR image. Therefore, the downsampling has a small impact on the detail of the final HDR, which is an acceptable trade-off for bandwidth saving.

[12] ISO. "Gain map metadata for image conversion." https://www.iso.org/standard/86775.html, 2024.

[13] Adobe. "Gainmap specification." https://helpx.adobe.com/camera-raw/using/gain-map.html, 2024.

[14] Google. "Ultra-hdr image format." https://developer.android.com/media/platform/hdr-image-format, 2024.
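A minimal sketch of the display-time reconstruction with a reduced-resolution GM described above (bilinear upsampling is one choice; the standards allow different filters, and the factor k is illustrative):

```python
import torch
import torch.nn.functional as F

def hdr_from_lowres_gm(sdr: torch.Tensor, gm_lr: torch.Tensor) -> torch.Tensor:
    # sdr: (1, 3, H, W) full-resolution SDR; gm_lr: (1, 1, H/k, W/k) gain map.
    # Edges and textures live in the full-resolution SDR layer, so upsampling
    # the GM mainly smooths the dynamic-range information it carries.
    gm = F.interpolate(gm_lr, size=sdr.shape[-2:], mode="bilinear", align_corners=False)
    return gm * sdr
```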

6. Experimental results for directly learning $Q_{max}$

The ablation results on the synthetic dataset are shown in the table below, where "-", "O", and "√" in the QM column represent not learning, directly learning, and indirectly learning $Q_{max}$, respectively. Compared to not learning $Q_{max}$, learning it directly yields better linear performance, but the imbalanced supervision between tensor and scalar causes a decline in the perceptual metrics, and all results decay significantly compared to indirectly learning $Q_{max}$.

| Backbone | QM | CW | SA | PSNR-L↑ | PSNR-PQ↑ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| √ | - | - | - | 41.1670 | 45.2765 | 9.9210 |
| √ | O | - | - | 41.3755 | 44.9717 | 9.9128 |
| √ | √ | - | - | 42.3479 | 46.2067 | 9.9431 |
| √ | √ | √ | - | 43.0480 | 47.1238 | 9.9454 |
| √ | √ | - | √ | 42.9085 | 47.0011 | 9.9442 |
| √ | √ | √ | √ | 43.5510 | 47.6256 | 9.9477 |

7. Open source for code and datasets

We commit to making our codes and datasets publicly available upon acceptance of the paper, enabling the researchers to validate our work and apply it to broader scenarios. We appreciate your emphasis on transparency and look forward to contributing to the community in this way.

评论

Could you tell me which exposure values were selected as input images from the multi-exposure stacks in VDS/HDREye? Additionally, could you provide the average HDR-VDP-3 (or HDR-VDP-2) metrics for the generated HDR images?

审稿意见
8

The paper introduces a new task, Gain Map-based Inverse Tone Mapping (GM-ITM), to address the problem of converting standard dynamic range (SDR) images into high dynamic range (HDR) images using a Gain Map (GM). This work aims to improve HDR up-conversion by focusing on GM estimation instead of directly predicting HDR, leveraging a dual-branch network (GMNet) that includes Local Contrast Restoration (LCR) and Global Luminance Estimation (GLE) branches. To support the research, the authors also propose synthetic and real-world datasets to evaluate the method. The experiments demonstrate GMNet’s quantitative and qualitative superiority over existing methods for HDR-related tasks.

优点

Novelty: The paper addresses an emerging area in HDR up-conversion by proposing the GM-ITM task, which is relatively unexplored and offers an innovative approach to inverse tone mapping.

Method: The proposed dual-branch GMNet architecture is well-designed, with LCR and GLE branches targeting local and global image features respectively, which allows for improved GM estimation.

Evaluation: The authors perform extensive quantitative and qualitative evaluations on synthetic and real-world datasets, benchmarking GMNet against well-established HDR methods and showing clear improvements.

Dataset: The creation of both synthetic and real-world datasets for GM-ITM facilitates further research and addresses a gap in data availability, providing a valuable resource for the field.

Ablation Studies: The paper includes thorough ablation studies on key components, such as spatial-aware modulation and GM resolution, to substantiate the effectiveness of GMNet’s design choices.

缺点

Limited Comparison Scope: The paper primarily compares GMNet with existing HDR and SDR-to-HDRTV up-conversion methods, but a comparison with more diverse or advanced inverse tone mapping techniques could strengthen the results.

Complexity and Computation: The dual-branch network adds computational overhead, especially when processing high-resolution images. Discussion on efficiency or real-time applicability is limited, which could impact practical usage.

Dependency on New Data Format: The proposed method relies heavily on the novel GM format, which is not yet widely adopted. This dependency could limit the method's applicability outside specialized devices or formats.

Visual Comparisons in Real-World Scenarios: Although qualitative results demonstrate GMNet’s advantages, additional challenging real-world scenarios could better highlight its robustness. For instance, extreme lighting conditions or complex textures may expose limitations.

问题

A more detailed discussion about the computation complexity would help understand the applicability of the proposed method.

评论

We appreciate the reviewer's valuable comments and suggestions, and we hope our responses could address the concerns.

1. Comparison with more related methods

To further increase the comprehensiveness of the experiments, we add three more baselines for comparison, namely DCDR-UNet [1], EPCE-HDR [2], and ITM-LUT [3]. The experimental results on the synthetic dataset are shown below, demonstrating that our method still achieves superior performance over these latest works.

| Method | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | $\Delta E_{ITP}$↓ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| KUNet | 40.4879 | 0.9952 | 34.5201 | 0.9619 | 7.1538 | 9.7851 |
| HDRUNet | 41.2579 | 0.9969 | 40.9315 | 0.9946 | 3.5799 | 9.9080 |
| HDCFM | 41.8563 | 0.9967 | 44.9017 | 0.9992 | 2.4773 | 9.9115 |
| FMNet | 40.7699 | 0.9970 | 44.4798 | 0.9996 | 2.3032 | 9.8991 |
| DCDR-UNet | 41.8085 | 0.9971 | 42.0183 | 0.9962 | 3.0501 | 9.9215 |
| EPCE-HDR | 41.5919 | 0.9956 | 31.4700 | 0.9430 | 8.5830 | 9.6694 |
| ITM-LUT | 39.5735 | 0.9950 | 43.7152 | 0.9987 | 2.2865 | 9.7268 |
| Ours | 43.5510 | 0.9977 | 47.6256 | 0.9998 | 1.6262 | 9.9477 |

[1] Kim, Joonsoo, et al. "DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[2] Tang, Jiaqi, et al. "High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation." ECAI 2023. IOS Press, 2023. 2330-2337.

[3] Guo, Cheng, et al. "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display." Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. 2023.

2. Complexity and computation

Our dual-branch network incurs no additional burden for high-resolution input, as the input of the GLE branch, $I_{SDR}^{LR}$, is at a fixed $256\times256$ resolution. To evaluate the computational efficiency and real-time processing capability of our method, we conduct comparison experiments and add three more baselines [1] [2] [3]. The runtime is measured on an NVIDIA A100 as the average of 100 trials on the real-world dataset at a resolution of $4096\times3072$.

| Method | Params↓ | Runtime↓ | FLOPs↓ | PSNR-L↑ | PSNR-PQ↑ | $\Delta E_{ITP}$↓ |
|---|---|---|---|---|---|---|
| KUNet | 1.08M | 621.70ms | 7534.50G | 33.3256 | 33.7895 | 9.0578 |
| HDRUNet | 1.58M | 613.89ms | 4161.75G | 33.1686 | 37.3966 | 5.7358 |
| HDCFM | 0.10M | 497.05ms | 127.08G | 32.7963 | 38.1894 | 6.2957 |
| FMNet | 1.24M | 304.13ms | 4147.13G | 32.6247 | 39.7632 | 4.5544 |
| DCDR-UNet | 1.26M | 1085.91ms | 4502.77G | 33.5421 | 38.3252 | 4.8645 |
| EPCE-HDR | 31.02M | 23992.84ms | 326412.14G | 32.7616 | 31.1207 | 10.4322 |
| ITM-LUT | 0.57M | 19.81ms | 59.16G | 31.6677 | 37.9021 | 5.4299 |
| Ours | 1.83M | 75.26ms | 1112.18G | 33.9490 | 40.2088 | 4.0260 |

As can be seen, benefiting from the simple form of the target GM, our method achieves the second-fastest runtime. While the look-up-table-based ITM-LUT is faster, its performance on the image quality metrics is much lower.
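A sketch of the fixed-resolution trick mentioned above (the 256×256 size comes from this response; the interpolation mode is an assumption):

```python
import torch.nn.functional as F

def gle_input(i_sdr):
    # The GLE branch always sees a fixed 256x256 thumbnail of the SDR image,
    # so its cost stays constant regardless of the input resolution.
    return F.interpolate(i_sdr, size=(256, 256), mode="bilinear", align_corners=False)
```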

评论

3. Relevance to the new data format

To clarify, our method does not rely on the new GM format. The proposed GMNet estimates the GM from the input SDR image. The SDR-GM pair can then be directly encapsulated into the new GM format, but it can also be converted to linear HDR through the pipeline in Section 3. After that, we can convert the linear HDR into other mainstream HDR formats, such as PQ-encoded HDR images under the HDR10 standard, so the method is not limited to specialized devices or formats.

4. Visual comparisons in challenging real-world scenarios

Figure 9 and Figure 10 in the Appendix demonstrate the superiority of our method in reconstructing edges and high-contrast night scenes. To enable a more in-depth analysis of challenging scenarios, we perform the following qualitative experiments.

[Figure A] https://picx.zhimg.com/80/v2-f7a1fc6a62f5bf91169047c041d40f49.png

[Figure B] https://picx.zhimg.com/80/v2-9a63b9bf1e886282ae886f41fccfad9e.png

As shown in Figure A, our method achieves superior performance in the sun region at extreme brightness. HDCFM also achieves low errors, but with a grid effect due to its operator properties. Figure B shows the estimation results on leaves with complex textures and eaves with sharp edges; our method demonstrates superior performance on complex textures in real-world scenarios, validating its generalization and robustness.

审稿意见
6

This paper introduces Gain Map-based Inverse Tone Mapping (GM-ITM), focusing on estimating the Gain Map (GM) for SDR images rather than directly converting to HDR. The proposed dual-branch network, GMNet, effectively combines local and global information for accurate GM prediction. Extensive experiments on both synthetic and real-world datasets demonstrate GMNet’s advantages over existing HDR methods, showing improved performance in both quantitative and qualitative metrics. The paper also contributes new datasets for GM-ITM research, supporting future advancements in HDR image processing.

优点

  1. Innovative Task Definition: The paper introduces a novel task, Gain Map-based Inverse Tone Mapping (GM-ITM), which focuses on GM estimation rather than direct HDR prediction, leveraging a unique double-layer HDR format for enhanced up-conversion.

  2. Effective Network Design: The proposed GMNet utilizes a dual-branch structure with Local Contrast Restoration (LCR) and Global Luminance Estimation (GLE) branches, effectively capturing both pixel-level and image-level information for accurate GM prediction.

  3. Comprehensive Dataset Contribution: The authors provide both synthetic and real-world SDR-GM datasets, which are diverse and well-suited for evaluating GM-ITM, fostering further research in this field.

  4. Strong Experimental Validation: Extensive quantitative and qualitative experiments demonstrate GMNet’s superiority over existing HDR-related methods, showcasing its potential in real-world applications.

缺点

  1. Limited Error Analysis: The paper provides little insight into errors GMNet might produce in challenging conditions (e.g., extreme lighting, high contrast), which is crucial for understanding its limitations.

  2. Model Size Trade-Offs: An ablation study on model size versus performance could reveal if a smaller GMNet version maintains accuracy with reduced resources, benefiting applications needing speed-accuracy balance.

问题

  1. Insufficient Dataset Generation Information: The process and diversity of the synthetic and real-world datasets are not fully detailed.
  2. Lack of Real-Time Performance Evaluation: The paper does not address GMNet’s computational efficiency or real-time processing capabilities, which are critical for practical HDR applications, especially on mobile devices.
  3. Scalability of the Model: It’s unclear if GMNet can scale effectively with higher-resolution images, which is essential as HDR media often demands large image resolutions for detail preservation.
评论

We appreciate the reviewer's valuable comments and suggestions, and we hope our responses could address the concerns.

1. Insight into challenging conditions

Figure 10 in the Appendix demonstrates the superiority of our method in reconstructing high-contrast night scenes. To enable a more in-depth analysis of challenging conditions, we conducted the following qualitative experiments.

[Figure A] https://picx.zhimg.com/80/v2-f7a1fc6a62f5bf91169047c041d40f49.png

[Figure B] https://picx.zhimg.com/80/v2-9a63b9bf1e886282ae886f41fccfad9e.png

As shown in Figure A, our method achieves superior performance in the sun region at extreme brightness. HDCFM also achieves low errors, but with a grid effect due to its operator properties. Figure B demonstrates the superior performance of our method in a challenging condition with complex textures.

2. Model size trade-offs

Keeping the network structure unchanged, we adjust the number of hidden layers to control the model size. The results of the ablation experiments on the synthetic dataset are shown below; our implementation in the paper uses 64 hidden layers.

| Hidden Layers | Params↓ | PSNR-L↑ | SSIM-L↑ | PSNR-PQ↑ | SSIM-PQ↑ | HDR-VDP3↑ |
|---|---|---|---|---|---|---|
| 32 | 0.46M | 42.7029 | 0.9972 | 46.7620 | 0.9997 | 9.9319 |
| 48 | 1.03M | 43.3299 | 0.9973 | 47.4524 | 0.9998 | 9.9408 |
| 64 | 1.83M | 43.5510 | 0.9977 | 47.6256 | 0.9998 | 9.9477 |

The experimental results show that reducing the model parameters does not lead to significant performance degradation, especially on SSIM-L, SSIM-PQ, and HDR-VDP3, which measure perceptual quality, verifying that our method can maintain accuracy with reduced resources.

评论

3. More dataset information

In the Appendix of our paper, we present the diversity of the real-world dataset in Figure 7 and Figure 8. To address the reviewer's concerns, we show diverse scenes and thumbnails of the synthetic dataset below.

[Figure C] https://picx.zhimg.com/80/v2-c9660565e0e70b53d24970ceafcdb691.png

[Figure D] https://picx.zhimg.com/80/v2-f0c2500d31685c44d74b83eca67fd82d.png

The SDR-GM pairs of the real-world dataset are directly derived from the mobile device. The SDR images in the synthetic dataset are degraded from HDRTV frames. Specifically, we first roll off the input HDR with an EETF and transfer it to the P3 gamut after linearization, then apply extended Reinhard tone mapping, clip it at 100 nit, and finally store it as uint8 for quantization. After obtaining the SDR images, the detailed formation pipeline of the GM can be found in Section A of the Appendix. For further details, we will release our codes and datasets upon acceptance of the paper.
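A condensed sketch of the tone-mapping tail of this degradation pipeline (the EETF roll-off and the gamut conversion to P3 are omitted, and the white point, gamma, and normalization are illustrative assumptions):

```python
import numpy as np

def extended_reinhard(l, l_white):
    # Extended Reinhard TMO: maps l_white -> 1.0 while compressing highlights.
    return l * (1.0 + l / l_white ** 2) / (1.0 + l)

def hdr_to_sdr(hdr_nits, l_white=10.0, sdr_peak=100.0, gamma=2.2):
    # Normalize so that sdr_peak (100 nit) corresponds to 1.0, tone-map,
    # clip under 100 nit, then gamma-encode and quantize to uint8.
    l = extended_reinhard(hdr_nits / sdr_peak, l_white)
    sdr = np.clip(l, 0.0, 1.0)
    return np.round(sdr ** (1.0 / gamma) * 255.0).astype(np.uint8)
```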

4. Real-Time performance evaluation

To evaluate the computational efficiency and real-time processing capability of our method, we conduct comparison experiments and add three more baselines [1] [2] [3]. The runtime is measured on an NVIDIA A100 as the average of 100 trials on the real-world dataset at a resolution of $4096\times3072$.

| Method | Params↓ | Runtime↓ | FLOPs↓ | PSNR-L↑ | PSNR-PQ↑ | $\Delta E_{ITP}$↓ |
|---|---|---|---|---|---|---|
| KUNet | 1.08M | 621.70ms | 7534.50G | 33.3256 | 33.7895 | 9.0578 |
| HDRUNet | 1.58M | 613.89ms | 4161.75G | 33.1686 | 37.3966 | 5.7358 |
| HDCFM | 0.10M | 497.05ms | 127.08G | 32.7963 | 38.1894 | 6.2957 |
| FMNet | 1.24M | 304.13ms | 4147.13G | 32.6247 | 39.7632 | 4.5544 |
| DCDR-UNet | 1.26M | 1085.91ms | 4502.77G | 33.5421 | 38.3252 | 4.8645 |
| EPCE-HDR | 31.02M | 23992.84ms | 326412.14G | 32.7616 | 31.1207 | 10.4322 |
| ITM-LUT | 0.57M | 19.81ms | 59.16G | 31.6677 | 37.9021 | 5.4299 |
| Ours | 1.83M | 75.26ms | 1112.18G | 33.9490 | 40.2088 | 4.0260 |

As can be seen, benefiting from the simple form of the target GM, our method achieves the second-fastest runtime. While the look-up-table-based ITM-LUT is faster, its performance on the image quality metrics is much lower.

[1] Kim, Joonsoo, et al. "DCDR-UNet: Deformable Convolution Based Detail Restoration via U-shape Network for Single Image HDR Reconstruction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

[2] Tang, Jiaqi, et al. "High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation." ECAI 2023. IOS Press, 2023. 2330-2337.

[3] Guo, Cheng, et al. "Redistributing the Precision and Content in 3D-LUT-based Inverse Tone-mapping for HDR/WCG Display." Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. 2023.

5. Scalability of the model

To clarify, the experiments in our paper are all performed on high-resolution images: the resolution of the synthetic dataset is $3840\times2160$ and that of the real-world dataset is $4096\times3072$. Therefore, GMNet scales effectively to high-resolution images in most cases.

评论

Thank you for your response. You have already answered my question. I correspondingly increased my score.

评论

Thank you for raising the score.

We truly appreciate the time and effort you have dedicated to reviewing our work and providing valuable feedback. Your comments and suggestions have been very insightful and helpful in improving the quality of our paper. Thank you again for your support and encouragement!

评论

We sincerely thank the reviewers for their valuable comments and suggestions, and we hope our responses adequately address your concerns. The revised version of the manuscript has been uploaded. Additionally, we are happy to provide further details on any aspects of our responses that may require additional clarification or elaboration.

Once again, we appreciate the reviewers’ time and insightful feedback and look forward to receiving further input.

AC 元评审

The paper presents a novel GM-ITM task for HDR imaging. It claims that GMNet captures local and global information for accurate GM estimation, with datasets supporting further research. The findings show GMNet's superiority. Strengths include the innovative task, the well-designed GMNet, the valuable datasets, and the extensive experiments. Weaknesses are the lack of error analysis in complex conditions, incomplete model-size trade-offs, insufficient dataset details, and inadequate evaluation of real-time performance and scalability. The reasons for acceptance are the novelty and good performance, but the weaknesses need to be addressed for better scientific rigor and practicality.

审稿人讨论附加意见

The reviewers raised multiple important points. Reviewer yo1C asked for more error analysis under extreme conditions; the authors provided qualitative experiments (Figures A and B in the Appendix), yet a more comprehensive quantitative analysis would better clarify the limitations. Reviewer nA6K sought details on dataset generation; the authors presented information on dataset diversity and the generation process, but a deeper discussion of representativeness could be useful. Reviewer cLvc questioned the computational complexity; the authors added runtime comparison experiments, though further exploration of the impact on practical usage is needed. Reviewer 6tzz had concerns regarding novelty; the authors argued their case and added discussions, but a clearer positioning within the HDR field is required. Overall, the comments are positive.

最终决定

Accept (Poster)