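PaperHub
Rating: 6.0/10 (Poster, 4 reviewers)
Lowest: 5, highest: 7, standard deviation: 1.0
Individual ratings: 5, 7, 7, 5
Confidence: 4.5
Soundness: 2.8, Contribution: 3.0, Presentation: 2.8
NeurIPS 2024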

Rethinking No-reference Image Exposure Assessment from Holism to Pixel: Models, Datasets and Benchmarks

Submitted: 2024-05-07; updated: 2024-11-06
TL;DR

Pixel-level image exposure assessment from three perspectives: model, dataset, and benchmark.

Keywords
Image Exposure Assessment, Image Quality Assessment

Reviews and Discussion

Official Review
Rating: 5

The paper introduces a novel paradigm that extends Image Exposure Assessment (IEA) from an image-level to a pixel-level framework. This paradigm comprises three components: model, dataset, and benchmark. Concerning the model, the study introduces the Pixel-level IEA Network (P-IEANet). This network processes images of varying exposures, separates them into low and high-frequency components via a discrete wavelet transform, assesses brightness with the low-frequency component, and evaluates structure with the high-frequency component, ultimately delivering pixel-level assessment results. Regarding the dataset, the authors have developed a new dataset, IEA40K, which includes 40,000 images featuring diverse exposures and corresponding pixel-level annotations. Finally, the paper presents comprehensive experiments on both holistic and pixel-level assessments, yielding promising results.

Strengths

  1. The paper is the first to propose a pixel-level image exposure assessment paradigm, significantly enhancing precision in the field of image exposure assessment.

  2. The paper introduces an assessment network that employs discrete wavelet transform, an intriguing choice supported by several ablation studies.

  3. The paper proposes a large-scale, multi-exposure dataset with pixel-wise annotations derived from an automatic multi-exposure fusion technique, subsequently refined by human experts.

  4. The paper also demonstrates that the P-IEANet can potentially improve the performance of low-light image enhancement methods.

  5. The paper is well-composed, demonstrating a clear structure, precise language, and a logical flow of ideas.

Weaknesses

The main weakness is that the paper lacks a clear definition of pixel-level image exposure assessment. For other details, please refer to the "Questions" part.

Questions

I find the proposed task interesting, while I have reservations about certain assertions made in the paper, terms that lack clarity, and the absence of adequate justification for the introduction of certain tasks without clear motivation. For further details, see the below list. I've organized my concerns and suggestions according to their significance to assist the authors in prioritizing their rebuttal.

  1. The terminology employed in the paper lacks clarity and needs more detailed explanation. For instance, the term exposure conventionally refers to the exposure time when capturing images with digital cameras, and is typically considered a global attribute of an image. However, the paper introduces the concept of pixel-level exposure without sufficient explanation, which is illogical in the literal sense. Similarly, the term exposure residual is introduced but remains poorly defined, further complicating the understanding of the methodology. The paper probably conflates the concepts of exposure and brightness, which cannot be used interchangeably.

  2. The motivation behind the paper remains ambiguous. It argues that a holistic evaluation of image exposure encounters two primary issues: (1) a dilemma between applicability and practicability, and (2) a narrow inductive bias. However, the paper lacks a further explanation of these problems. Incorporating visual results from current holistic evaluation methods that exhibit these issues could more effectively and intuitively demonstrate the paper's motivation. In the current version, the necessity for a pixel-level image exposure assessment method is not clearly articulated, particularly under which circumstances such a technique would be essential.

  3. Related to the first point, the proposed method targets to predict exposure residual, defined as (reference - input) in RGB space. However, the rationale behind this definition requires further justification. Specifically, it remains unclear why this definition is suitable for use as the ground truth in pixel-wise image exposure assessment. Additionally, it is essential to explore whether any disparity exists between the concepts of (reference - input) in RGB space and the actual pixel-wise score for image exposure.

  4. For evaluation metrics, PSNR and SSIM are two commonly used pixel-wise metrics. However, this paper only adopts SSIM for evaluation. Including PSNR performances would provide a more convincing argument.

  5. While comparing pixel-level performance with other image enhancement methods, the paper derives these methods' exposure residual predictions by directly predicting the residual map. However, image enhancement techniques typically use loss functions designed to smooth the final outputs and align them with human perception, which may not be appropriate for predicting residuals. Although it is acknowledged that the difference between (output-input) and the proposed exposure residual exists, incorporating an additional ablation study that calculates the residual from (output-input) would likely provide a more comprehensive analysis.

  6. Important details, such as the architecture of the proposed Long Range Encoder (LRE) and Short Range Encoder (SRE), are missing, hindering the reproducibility of the proposed framework.

  7. The availability of the proposed dataset to the public is crucial for assessing the contribution of this work.

Limitations

The paper adequately discusses the limitations of moving objects and image size while training. No negative social impact is present in this work.

Author Response

Q1: The terminology lacks clarity.

A1: Thanks for the reviewer's thought-provoking questions.

  1. In the context of evaluating images, the term "exposure" is no longer a global attribute of an image. Even in the context of capturing images, as in the reviewer's example, "exposure" conventionally refers not only to exposure time but also to two other parameters, aperture and ISO (see [1]). Rather than a global attribute of the image, these parameters are more appropriately described as a global attribute of the camera that captured it. However, as classical photographic theory (Adams' theory) [2] states, "The exposure time is the same for all elements, but the image exposure varies with the luminance of each subject element." The coarse global camera exposure therefore fails to match every subject element in an image, potentially leaving some elements under-exposed and others over-exposed. Hence, in the context of evaluating images, "exposure" is no longer a global attribute; as [2] puts it, "Any scene of photographic interest contains elements of different luminance; consequently, the 'exposure' actually is many different exposures." As exposure is fundamental knowledge in IQA, we sincerely apologize for not emphasizing it enough.
  2. The concept of "pixel-level exposure" is logically consistent with the term "exposure" in the context of evaluating images. As claimed by the theory [2], the ideal exposure should be refined to different elements. In this paper, we innovatively quantify subject elements by utilizing pixels as the smallest units of exposure measurement.
  3. The term "exposure residual" refers to the deviation observed in the actual exposure of each pixel compared to its ideal exposure in the context of evaluating images, as stated in our original paper at line 188: "measuring the deviation of each pixel from its ideal exposure." Numerically, values closer to -1 indicate overexposure, while values closer to 1 indicate underexposure, as detailed in Figure 7.
  4. Exposure and brightness have distinct connotations; for instance, exposure can be either holistic or regional (with a pixel as the minimum unit), whereas brightness is solely at the pixel level with a value assigned from 0 to 255.
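
The sign convention in point 3 can be illustrated with a minimal sketch. It assumes single-channel intensities normalized to [0, 1] and the plain (reference - input) difference quoted by the reviewer; the expert adjustment described in the rebuttal is not modelled, and the function name `exposure_residual` is hypothetical rather than taken from the paper.

```python
def exposure_residual(reference, image):
    """Per-pixel residual (reference - input) for images with values in [0, 1].

    Values near -1 mean the input pixel is much brighter than the reference
    (overexposed); values near +1 mean it is much darker (underexposed).
    """
    return [[r - x for r, x in zip(ref_row, img_row)]
            for ref_row, img_row in zip(reference, image)]


# Hypothetical 2x2 example: a uniformly mid-grey "ideal" reference.
reference = [[0.5, 0.5], [0.5, 0.5]]
image = [[1.0, 0.5], [0.0, 0.4]]  # too bright, ideal, too dark, slightly dark
residual = exposure_residual(reference, image)
```

With this toy input the too-bright pixel gets a negative residual and the too-dark pixels get positive ones, matching the convention stated above.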

[1] Image exposure assessment: a benchmark and a deep convolutional neural networks based model, ICME 2018.

[2] Adams, The Negative: Exposure and Development.


Q2: The motivation and necessity remain unclear, requiring the demonstration of visual results.

A2: Thanks for the valuable advice. Figure 2 in our submitted PDF provides a visual example. In fact, the necessity arises from industrial demands. For instance, smartphone manufacturers still evaluate image exposure manually in no-reference scenes; by integrating pixel-level IEA (which is essential) with customized evaluation rules, we have successfully simulated and replaced this manual process for a top-5 smartphone manufacturer. If our paper is accepted, we will partially disclose this case on our website.


Q3: The definition of exposure residual requires further justification.

A3: The insightful questions are greatly appreciated.

  1. In the training phase, the exposure residual is not obtained solely from the difference (reference - input) in RGB space; it also undergoes further verification and adjustment by experts to ensure that the final exposure residual closely aligns with the perceived deviation of each pixel from ideal exposure (refer to Figure 7 and line 225 in the original paper).
  2. The rationale behind the exposure residual and its suitability as ground truth rests on the inherent characteristics and practicality of pixel-wise data annotation for supervision. For experts, distinguishing between the reference and input images is relatively straightforward and far more accurate [1][2], which makes data annotation practical, whereas absolute evaluations are hindered by the absence of clear standardized criteria. As discussed in Q1, ideal IEA should be tailored to the specific characteristics of each subject element, even at the pixel level. Fortunately, the exposure residual provides direct and effective supervisory information for model training.
  3. We are a little confused about "any disparity" in the comments. During prediction, the actual pixel-wise scores for image exposure are the exposure residuals generated by P-IEANet which has been trained using the supervision data (also the exposure residuals but adjusted by experts).

[1] Descriptive Image Quality Assessment in the Wild, ECCV2024.

[2] Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment, CVPRW2021.


Q4: PSNR performances.

A4: Thanks for the insightful suggestion. The PSNR results, unequivocally showing our SOTA performance, can be found in Table 7 of our submitted PDF.
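
For readers unfamiliar with the metric, PSNR between a predicted and a ground-truth residual map can be sketched as below. The thread does not say which data range the authors used; `data_range=2.0` is an assumption based on residuals spanning [-1, 1], and the function name `psnr` is illustrative only.

```python
import math


def psnr(pred, target, data_range=2.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D maps.

    data_range is the span of valid values (assumed 2.0 for residuals
    in [-1, 1]); identical maps give infinite PSNR.
    """
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    mse = sum((p - t) ** 2 for p, t in zip(flat_pred, flat_target)) / len(flat_pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy residual maps: every predicted pixel is off by 0.2.
target_map = [[0.0, 0.0]]
pred_map = [[0.2, -0.2]]
score = psnr(pred_map, target_map)
```

Higher values indicate a closer match; because the result depends on `data_range`, scores are only comparable when the same range is used throughout.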


Q5: An additional ablation study.

A5: Thanks for the insightful suggestion. We have retrained (using the ground truth of inference images for supervision) representative image enhancement methods to directly predict the reference images instead of using the residual. Then the difference between the output and input was calculated as the residual. The SSIM results, which can be found in Table 8 of our submitted PDF, unequivocally show our SOTA performance despite the decrease in performance for all methods.


Q6: The architecture details of LRE & SRE.

A6: The classic LRE and SRE are detailed in Figure 3 of our submitted PDF.


Q7: The availability of the dataset.

A7: Regrettably, the dataset and code in the supplementary materials were not visible due to unknown issues. We apologize for any inconvenience caused and have provided them again. Please refer to [#Wvdf, Q1].

Comment

Having reviewed the rebuttal, I believe it has addressed the majority of my concerns. Consequently, I would like to revise my rating to borderline acceptance.

Comment

Dear Reviewer,

Thank you very much for contributing to NeurIPS2024.

The authors have provided detailed responses to your reviews. Would you please have a look at them?

Thanks again.

AC

Official Review
Rating: 7

This work tackles the challenges in image exposure assessment from three aspects: models, datasets, and benchmarks. Specifically, a P-IEANet model based on DWT is proposed, which can generate pixel-level assessment results in a no-reference manner. An exposure-oriented dataset, IEA40K, is collected to cover various lighting scenarios, devices, and scenes; it is annotated by more than 10 experts with pixel-level labels. A comprehensive benchmark of 19 methods is conducted on the collected IEA40K dataset, where the proposed P-IEANet delivers the best performance.

Strengths

  • Decomposing images into lightness features and structure components using Haar DWT is theoretically reasonable and empirically effective as presented in this work.
  • The dataset construction strategies described in Sec. 4.1 and Sec. 4.2 provide valuable insights to the related community.
  • The proposed model delivers good performance, even outperforming the LMM-based model Q-align.

Weaknesses

  • Holistic level assessment is performed on SPAQ. It should be straightforward to convert the pixel-level annotations to holistic level annotations in the proposed IEA40K dataset because the pixel-level annotations contain more information than the holistic level annotations.
  • Would the performance of IEA models be boosted by jointly training (like the practices used in UNIQUE, LIQE, etc.) the model on the combination of IEA dataset and general-purpose IQA datasets?

Questions

Instead of SSIM and PSNR, I think Eq. (7) can also be employed as a pixel-level performance measure.

Limitations

Yes

Author Response

We sincerely appreciate the reviewer's positive feedback, characterizing our paper as "theoretically reasonable and empirically effective" and noting that it "provides valuable insights to the related community." We have thoroughly addressed the reviewer's inquiries, which we believe will significantly improve the quality of our paper.


Q1: "Holistic level assessment is performed on SPAQ. It should be straightforward to convert the pixel-level annotations to holistic level annotations in the proposed IEA40K dataset because the pixel-level annotations contain more information than the holistic level annotations."

A1: Thanks for the insightful comments. To verify the suggestion and convert annotations between the pixel-level annotations and holistic level annotations, we have calculated the average absolute value at the pixel level and adjusted this value by subtracting it from 1, thus normalizing it within a range of 0 to 1. Subsequently, we have conducted additional experiments on this transformed dataset, IEA40K-h, to verify the methods' performance.

  1. Given the constraint of rebuttal time, we selected representative methods to retrain on the IEA40K-h dataset, as detailed in Table 5 of the PDF under the "global" response. Our approach unequivocally outperformed the others, achieving SOTA results. The higher SRCC and LRCC metrics obtained with IEA40K-h, compared to those on the SPAQ dataset, indicate that IEA40K-h allows models to learn features more effectively.
  2. Table 6 in the PDF illustrates the cross-dataset validation for SPAQ and IEA40K-h on the LRCC metric. Models trained on IEA40K-h, even without fine-tuning on SPAQ, demonstrated performance comparable to those directly trained on SPAQ. In contrast, models trained on SPAQ did not generalize effectively to IEA40K-h, indicating that IEA40K-h provides richer holistic information that enhances model generalization.
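
The pixel-to-holistic conversion described in A1 (one minus the mean absolute residual) can be sketched as follows. This is a minimal reading of that one sentence, assuming residuals in [-1, 1]; the function name `holistic_score` is hypothetical, and any further processing the authors applied when building IEA40K-h is not shown in the thread.

```python
def holistic_score(residual):
    """Collapse a per-pixel residual map into one score in [0, 1].

    1.0 means every pixel sits at its ideal exposure; 0.0 means every
    pixel is maximally over- or underexposed (|residual| == 1).
    """
    magnitudes = [abs(v) for row in residual for v in row]
    return 1.0 - sum(magnitudes) / len(magnitudes)


perfect = holistic_score([[0.0, 0.0], [0.0, 0.0]])
worst = holistic_score([[-1.0, 1.0], [1.0, -1.0]])
mixed = holistic_score([[-0.5, 0.0], [0.5, 0.0]])
```

Note that taking the absolute value discards the direction of the error, so an overexposed and an underexposed image with the same residual magnitude receive the same holistic score.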

The valuable findings will be further elaborated in the final version of the paper (if accepted). We sincerely appreciate the insightful suggestion.


Q2: Would the performance of IEA models be boosted by joint training (like the practices used in UNIQUE, LIQE, etc.) the model on the combination of the IEA dataset and general-purpose IQA datasets?

A2: Thank you for your valuable advice. Currently, no existing dataset combines pixel-level and holistic-level annotations. Following the reviewer's suggestion in Q1, we derived holistic-level annotations (IEA40K-h) from our IEA40K dataset and adopted a LIQE-like method to jointly learn these tasks. Compared with the original results: 1) for the pixel-level task on IEA40K, MAE=0.03 (+0.0) and SSIM=0.76 (+1.3%); 2) for the holistic-level task on IEA40K-h, LRCC=0.91 (+4.5%) and SRCC=0.87 (+4.8%). This joint training approach leverages the detailed information in pixel-level annotations to enhance the model's understanding of holistic tasks. We sincerely appreciate the reviewer's suggestion and will include these findings in the final paper (if accepted).


Q3: Instead of SSIM and PSNR, I think Eq. (7) can also be employed as a pixel-level performance measure.

A3: Yes! Eq. (7) closely resembles the MAE metric detailed in Table 1 of our paper and can serve as a measure of pixel-level performance.


In conclusion, the reviewer's suggestion is highly valued, and we will incorporate detailed results into the final version of the paper (if accepted).

Comment

Thanks for the responses. I have raised my rating to 7.

Official Review
Rating: 7

This paper proposes an innovative no-reference image exposure assessment method, transitioning from traditional holistic image evaluation to fine-grained pixel-level assessment. This approach effectively addresses the shortcomings of existing techniques in terms of accuracy and generalization. Researchers have developed P-IEANet, a pixel-level evaluation network that utilizes Haar discrete wavelet transform to analyze image brightness and structural information, enabling exposure assessment without reference images. Additionally, to support this method, the researchers have constructed the IEA40K dataset, which contains 40,000 images with detailed pixel-level annotations, covering diverse lighting conditions and devices. Using this dataset, they established a comprehensive benchmark including 19 methods, demonstrating that P-IEANet achieves state-of-the-art performance across multiple evaluation metrics. This work not only enhances the accuracy of no-reference IEA tasks but also provides valuable resources and new research directions for the image exposure research community. Future work will focus on optimizing the framework to support multimodal outputs and enhancing exposure perception in AI-generated content.

Strengths

  • Pixel-level Evaluation: The P-IEANet proposed in the article is capable of conducting pixel-level image exposure assessment, which offers a more refined analysis and more accurate results compared to traditional overall image assessment.
  • Innovative Model Architecture: By integrating the Haar Discrete Wavelet Transform with specific feature extraction modules, P-IEANet is able to analyze images from both the brightness and structural perspectives, providing a more comprehensive exposure assessment.
  • Large-scale Dataset: The article has constructed the IEA40K dataset, which is a large-scale, diverse image dataset that provides rich resources for evaluation and training.

Weaknesses

  • The author mentions in the abstract that the code and dataset can be found in the supplementary materials, but there is no relevant section in the supplementary materials.
  • There is no explanation as to why the Haar wavelet was chosen over other wavelets.
  • The aesthetic quality of Figure 4 needs to be improved.

Questions

Please refer to the comments in the weakness part.

Limitations

Not applicable

Author Response

We appreciate the reviewer's positive feedback on our paper, particularly for acknowledging "an innovative no-reference image exposure assessment method." We have addressed the questions raised below.


Q1: The author mentions in the abstract that the code and dataset can be found in the supplementary materials, but there is no relevant section in the supplementary materials.

A1: Regrettably, the dataset and code in the supplementary materials were not visible due to unknown issues. We sincerely apologize for any inconvenience caused, but we did include the code and dataset in the supplementary materials, as evidenced by Figure 1 of the PDF under the "global" response. Following the NeurIPS rebuttal guidelines, we have provided an anonymized link to the AC for the code and dataset in a separate comment. Additionally, if our paper is accepted, we will release all resources publicly.


Q2: There is no explanation as to why the Haar wavelet was chosen over other wavelets.

A2: There are primarily four reasons for our selection:

  1. As outlined in the paper (lines 114-129), the Haar wavelet distinctly aligns its component decomposition with exposure characteristics, a unique attribute not shared by other wavelets.
  2. The Haar wavelet excels in analyzing signals characterized by sudden variations. It is particularly adept at identifying areas in images that are underexposed or overexposed compared to normally exposed regions, which can be treated as signals with abrupt changes.
  3. The transformations of the Haar wavelet, both forward and inverse, are reversible. A proficient approach to estimating exposure residuals involves reconstructing the "ideal exposure image" in the latent space and subsequently assessing the deviation of each pixel in the input image from this ideal exposure. This methodology is essential, as highlighted in line 46 of the original paper: "As a no-reference method, it should effectively simulate reference images in non-preset scenarios, operating similarly to full-reference methods." The Haar wavelet facilitates this process effectively.
  4. Experimental results show that the Haar wavelet surpasses other representative wavelets in performance. Table 1 in the PDF under the "global" response presents a comparative analysis of the Haar wavelet against other notable wavelets (Daubechies and Symlet) on the IEA40K dataset.
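
The decomposition and the reversibility mentioned in points 1 to 3 can be illustrated with a single-level 2D Haar transform. The sketch below is a plain-Python illustration, not the authors' implementation: it assumes an even-sized single-channel image and orthonormal scaling, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical.

```python
def haar_dwt2(image):
    """Single-level 2D Haar DWT of an image with even height and width.

    Returns (low, horiz, vert, diag), each half the input size. `low` holds
    local brightness; the other three hold local intensity differences.
    """
    h, w = len(image), len(image[0])
    low, horiz, vert, diag = ([[0.0] * (w // 2) for _ in range(h // 2)]
                              for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            low[i // 2][j // 2] = (a + b + c + d) / 2    # scaled block sum
            horiz[i // 2][j // 2] = (a - b + c - d) / 2  # left minus right
            vert[i // 2][j // 2] = (a + b - c - d) / 2   # top minus bottom
            diag[i // 2][j // 2] = (a - b - c + d) / 2   # diagonal difference
    return low, horiz, vert, diag


def haar_idwt2(low, horiz, vert, diag):
    """Inverse of haar_dwt2: rebuilds the full-size image from the sub-bands."""
    h, w = len(low), len(low[0])
    image = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = low[i][j], horiz[i][j], vert[i][j], diag[i][j]
            image[2 * i][2 * j] = (s + x + y + z) / 2
            image[2 * i][2 * j + 1] = (s - x + y - z) / 2
            image[2 * i + 1][2 * j] = (s + x - y - z) / 2
            image[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return image


# Toy 2x4 image: a dark flat block next to a bright, slightly textured block.
image = [[0.2, 0.2, 0.9, 1.0],
         [0.2, 0.2, 1.0, 0.9]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

In this toy case the low-frequency band separates the dark block from the bright one, the diagonal band picks up the texture in the bright block, and the inverse transform recovers the input up to floating-point error.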

Q3: The aesthetic quality of Figure 4 needs to be improved.

A3: We sincerely apologize for the aesthetic issues in the figure and greatly appreciate the suggestion. To improve the figure's quality in the final paper (if accepted), we plan to make the following improvements:

  1. Adjusting the color scheme to harmonize the appearance of various modules;
  2. Modifying the layout to balance the content, particularly on the left side of Figure 4, by adjusting the proportions of core modules and reducing the emphasis on non-core modules;
  3. Standardizing the shapes of DWT Kernels to match those of other modules.

In conclusion, the reviewer's suggestion is highly valued, and we will incorporate detailed results into the final version of the paper (if accepted).

Comment

Thank you for the clear and convincing rebuttal. I have another question.

  1. Can your method be applied to pixel-level image quality assessment? If so, please provide some basic ideas or preliminary experimental results.

Comment

Thank you for acknowledging our rebuttal and for posing such an insightful question.

  1. Our methodology, initially designed for IEA, can be effectively adapted to pixel-level IQA. There is currently a lack of open-source datasets specifically tailored for pixel-level IQA; however, we are conducting preliminary research in this area and have developed an exclusive dataset that includes pixel-level annotations across 800 images for IQA tasks. We utilized a Transformer-based architecture to predict residuals in the IQA tasks and achieved an SSIM score of 0.69. After fine-tuning these residuals on the KADID-10k dataset using MLPs, our approach attained an SRCC score of 0.93, which closely matches the top-performing method with an SRCC score of 0.94. With more extensive pixel-level annotations available, our methodology has the potential to further improve its performance.
  2. Nevertheless, IQA tasks encompass a wide range of distortions, and the methodology for no-reference pixel-level assessment necessitates a more sophisticated design to attain superior performance. Theoretically, our methodology may hold the potential to accomplish this ambitious objective.

Comment

Thank you for your reply. I will keep my score and am looking forward to your future work.

Official Review
Rating: 5

This paper proposes a new no-reference image exposure assessment method, Pixel-level IEA Network (P-IEANet), which analyzes and evaluates image exposure from the perspectives of brightness and structure using discrete wavelet transform (Haar DWT). Also, a dataset exclusively tailored for IEA, called IEA40K, is constructed. According to a comprehensive evaluation of methods on the IEA40K dataset, the proposed method achieves SOTA performance and offers advantages for the exposure enhancement community.

Strengths

This paper demonstrates very good originality as it is the first realization of pixel-level image exposure assessment. The authors have designed corresponding methods specifically addressing the characteristics of this problem and achieved satisfying results. Detailed explanations of the motivation and the current state of research are provided. Both the principles and the implementation of the method are clearly presented. The experimental results effectively demonstrate the performance of the proposed method. This paper not only proposes a new IEA method but also contributes a new dataset and benchmark, providing a significant boost to the IEA and exposure-related community.

Weaknesses

Haar DWT is used to decompose an image into components with different frequencies, but the advantages of this method compared to other similar methods are not adequately explained. In the method section of this paper, some operations lack clear motivation or principles. For example, the reason for applying the DWT^{-1} and the choice of l1 norm as the loss function are not well explained. In the experiments section, SSIM and MAE are adopted to measure the structure and lightness similarity between the ground truth and predicted exposure residual. However, as a perceptual IQA metric, SSIM may not be suitable for evaluating the prediction accuracy of exposure residuals. The paper claims that the proposed method has improved adaptability across varying criteria and scenarios, but this is not well demonstrated in the experiments.

Questions

  1. Why use Haar DWT to decompose an image into components with different frequencies, and what are its advantages compared to other similar methods, such as other types of DWT?
  2. Why is the DWT^{-1} step necessary?
  3. What is the reason for using the l1 norm as the loss function?
  4. Why choose MAE to measure the structure and lightness similarity between the ground truth and predicted exposure residual, instead of using MSE?

Limitations

The authors have adequately addressed the limitations.

Author Response

We greatly appreciate the reviewer's positive feedback on our paper, especially for acknowledging that it "not only proposes a new IEA method but also contributes a new dataset and benchmark, providing a significant boost to the IEA and exposure-related community." We hope the following responses will address any remaining concerns.


Q1-1: Why use Haar DWT to decompose an image into components with different frequencies?

A1-1: The primary objective is to minimize interference from signals of various frequency domains during the feature extraction phase. This strategy not only enables the model to analyze different features more accurately, thus boosting performance, but also speeds up model training. Supporting experimental results are detailed in Table 2 of the PDF under the "global" response. The data clearly show that employing the Haar DWT significantly enhances performance and decreases the number of epochs needed for convergence.


Q1-2: What are Haar DWT advantages compared to other similar methods, such as other types of DWT?

A1-2: There are primarily four reasons for our selection:

  1. As outlined in the paper (lines 114-129), the Haar wavelet distinctly aligns its component decomposition with exposure characteristics, a unique attribute not shared by other wavelets.
  2. The Haar wavelet excels in analyzing signals characterized by sudden variations. It is particularly adept at identifying areas in images that are underexposed or overexposed compared to normally exposed regions, which can be treated as signals with abrupt changes.
  3. The transformations of the Haar wavelet, both forward and inverse, are reversible. A proficient approach to estimating exposure residuals involves reconstructing the "ideal exposure image" in the latent space and subsequently assessing the deviation of each pixel in the input image from this ideal exposure. This methodology is essential, as highlighted in line 46 of the original paper: "As a no-reference method, it should effectively simulate reference images in non-preset scenarios, operating similarly to full-reference methods." The Haar wavelet facilitates this process effectively.
  4. Experimental results show that the Haar wavelet surpasses other representative wavelets in performance. Table 1 in the PDF under the "global" response presents a comparative analysis of the Haar wavelet against other notable wavelets (Daubechies and Symlet) on the IEA40K dataset.

Q2: Why is the DWT^{-1} step necessary?

A2: As outlined in the third point of Q1-2, DWT^{-1} is designed to reconstruct the decomposed components of an image back into the "ideal exposure image" within the latent space. While DWT is utilized for enhanced analysis of features, DWT^{-1} helps the model in synthesizing these features back into the ideal exposure image. This synthesis process is crucial for identifying the discrepancies between the ideal and input images, thus aiding in the prediction of exposure residuals. Omitting the DWT^{-1} process would lead to reduced model performance, as evidenced in Table 3 of the PDF.


Q3: What is the reason for using the l1 norm as the loss function?

A3: Training results show that the L1-norm outperforms the alternatives. During training, we evaluated three primary types of loss functions: L1-norm, L2-norm, and Smooth L1. Comparative results are detailed in Table 4 of the PDF. The L1-norm demonstrates superior robustness and faster training speeds, as evidenced by earlier convergence epochs in the IEA task.
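
For reference, the three candidate losses named in A3 can be written out as below. This is a generic sketch over flat lists of values, not the authors' training code; the Smooth L1 form follows the common definition with a threshold `beta` (assumed to be 1.0 here), and the example values are made up.

```python
def l1_loss(pred, target):
    """Mean absolute error: every deviation is penalized linearly."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error: large deviations dominate the total."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def smooth_l1_loss(pred, target, beta=1.0):
    """Quadratic below `beta`, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


# Hypothetical residual predictions: exact, slightly off, and badly off.
pred = [0.0, 0.5, -1.0]
target = [0.0, 0.0, 1.0]
losses = (l1_loss(pred, target), l2_loss(pred, target), smooth_l1_loss(pred, target))
```

On this toy input the single badly wrong pixel accounts for most of the L2 loss, while its share is smaller under L1 and Smooth L1, which is the usual intuition behind the robustness argument.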


Q4: Why choose MAE to measure the structure and lightness similarity between the ground truth and predicted exposure residual, instead of using MSE?

A4: MAE provides a more direct quantification of the deviation of each pixel from its ideal exposure compared to both SSIM and MSE. Additionally, MAE is less sensitive to extreme values than MSE, offering a more robust and consistent measurement of errors. Therefore, using MAE is well-reasoned. In our work, we selected MAE as a complement to SSIM.


In conclusion, the reviewer's suggestion is highly valued, and we will incorporate detailed results into the final version of the paper (if accepted).

Comment

Thanks for the authors' response. I keep my original rating.

Author Response

General Response:

We sincerely thank the reviewers for their efforts in reviewing our work and providing valuable comments. We highly appreciate the comments received, e.g., the positive comments on our contributions (4/4 reviewers), methods' performance (4/4 reviewers), our presentations (3/4 reviewers), and soundness (3/4 reviewers).

We also deeply value the constructive feedback the reviewers have provided. We have taken great care to address the reviewers' questions, conducting additional experiments detailed in the PDF under the 'global' response to further support our claims.

Final Decision

The authors performed exceptionally well during the rebuttal and discussion stages, leading all reviewers to reach a consensus for acceptance. The AC concurs, acknowledging that this is a highly comprehensive study on no-reference IEA.