PaperHub
NeurIPS 2024 · Poster · 3 reviewers
Overall rating: 5.3/10 (min 5, max 6, std. dev. 0.5) · Individual ratings: 6, 5, 5
Confidence: 4.0 · Correctness: 2.7 · Contribution: 2.3 · Presentation: 2.0

Self-Distilled Depth Refinement with Noisy Poisson Fusion

OpenReview · PDF
Submitted: 2024-04-29 · Updated: 2024-11-06
TL;DR

SDDR proposes to model the depth refinement task by noisy Poisson fusion and train the refinement network in a self-distillation paradigm, achieving both high accuracy and efficiency.

Abstract

Keywords
Depth refinement · Noisy Poisson fusion · Self-distilled training

Reviews and Discussion

Official Review (Rating: 6)

This paper introduces a Self-distilled Depth Refinement (SDDR) framework to enhance robustness against noise. The framework primarily includes depth edge representation and edge-based guidance, and the authors design an edge-guided gradient loss and an edge-based fusion loss. Furthermore, experiments on five benchmarks validate the effectiveness of the framework.

Strengths

The paper is well-written, and the method and results are presented clearly. Using the strategy of iterative depth refinement is helpful for this task. The framework achieves excellent performance on widely used deblurring datasets.

Weaknesses

  1. The authors did not specify in the paper which sensor or modality was used for depth estimation.
  2. The authors decouple depth prediction errors into two degradation components: local inconsistency noise and edge deformation noise. And they give two samples in Fig. 2. However, I don't think this makes it clear that the errors fall into either category. The authors should provide accurate sources of error and analyze the significance of these two types of errors among all the errors.
  3. Line 121: In Motivation Elaboration, the authors primarily focused on analyzing the limitations of previous work but did not offer a compelling justification for the motivation behind their current study.
  4. Line 149: The authors claim to propose an edge-guided gradient loss, but this loss has also been used in the following two papers. Please clarify the differences.
    [1] Wang Z, Ye X, Sun B, et al. Depth upsampling based on deep edge-aware learning[J]. Pattern Recognition, 2020, 103: 107274.
    [2] Qiao X, Ge C, Zhang Y, et al. Depth super-resolution from explicit and implicit high-frequency features[J]. Computer Vision and Image Understanding, 2023, 237: 103841.

Questions

The paper proposes a novel and interesting framework and demonstrates a clear advance in performance on this task, so I would like to accept the paper. However, the lack of clear motivation makes the paper somewhat confusing to read, so I recommend borderline. If the authors can address my concerns, I am ready to change my recommendation based on the comments.

Limitations

See the weaknesses.

Author Response

Dear Reviewer 9uAR:

Thanks for your positive feedback and valuable questions. We address all your comments as follows.

Weakness 1: Sensor and Modality

We specify the modality of depth models and datasets.

(1) Models. Similar to previous monocular depth models, SDDR takes an RGB image as input, without other modalities.

(2) Data. Depth prediction and refinement are trained and evaluated on various data, with RGB as input and depth as ground truth. The depth maps are annotated with varied techniques, e.g., CG rendering (TartanAir, IRS), stereo matching (HRWSI, VDW), LiDAR (IBims-1), and Kinect (DIML). These modalities are not used as model input. We combine these data for experiments as in lines 561-588.

Weakness 2: Two Noise Components.

We illustrate the source and significance of local inconsistency noise $\epsilon_{cons}$ and edge deformation noise $\epsilon_{edge}$ with the analysis and experiments below.

(1) We illustrate the source of our noises in 3 aspects.

Our noises are motivated by vital problems of the task. Prior arts broadly recognize that depth blur and inconsistency are two key problems for the task. E.g., as in GBDF, "depth maps are usually blurry with inaccurate details such as planes and object boundaries." PatchFusion discusses that "we discover that BoostingDepth suffers from scale inconsistencies." The two problems could stem from resolution and receptive field. Low-resolution inference loses detail, while high-resolution prediction leads to inconsistent structures due to limited model receptive field. Our noises model the two dominant degradations between predicted and ideal depth.

Our noises broadly capture the failures of prior arts. Previous methods try to solve the above two problems intuitively, e.g., by selecting and merging patches. In contrast, we model refinement with edge deformation and local inconsistency noise. The derivations in Eq.1-4 and Sec.3.1 reveal the general reasons for blur and inconsistency in prior arts. E.g., fusing patches as in Boost produces higher $\epsilon_{cons}$ with inconsistent structures, while filtering as in GBDF fails to suppress $\epsilon_{edge}$, yielding blurred details. Our noises summarize these previous failures and guide our design.

Our noises accurately depict the aforementioned problems. In Fig.F of the rebuttal PDF, we further provide visual results of our noises. Simulated by regional affine transformations, $\epsilon_{cons}$ represents disrupted depth structures. With position-constrained Gaussian distributions (line 538), $\epsilon_{edge}$ accurately depicts missing or blurred edges. Depth errors can be decoupled with our noises; e.g., $\epsilon_{edge}$ is prominent in low-resolution predictions, while $\epsilon_{cons}$ accounts for a larger share at high resolution.
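For concreteness, a minimal NumPy sketch of how such noises could be simulated is given below. It is only an illustrative stand-in under our own assumptions (the hypothetical helpers `add_local_inconsistency_noise` and `add_edge_deformation_noise`, a uniform grid of regions, and gradient-magnitude edge weights); it is not the authors' exact regional affine transformations or position-constrained Gaussian distributions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_local_inconsistency_noise(depth, grid=4, max_scale=0.1, max_shift=0.05, seed=0):
    # Perturb each region with a random affine transform (scale/shift) to mimic
    # locally inconsistent depth structures (a stand-in for epsilon_cons).
    rng = np.random.default_rng(seed)
    noisy = depth.astype(float).copy()
    h, w = depth.shape
    for i in range(grid):
        for j in range(grid):
            ys = slice(i * h // grid, (i + 1) * h // grid)
            xs = slice(j * w // grid, (j + 1) * w // grid)
            scale = 1.0 + rng.uniform(-max_scale, max_scale)
            shift = rng.uniform(-max_shift, max_shift) * float(depth.max())
            noisy[ys, xs] = scale * depth[ys, xs] + shift
    return noisy

def add_edge_deformation_noise(depth, sigma=2.0, edge_thresh=0.05):
    # Blur the depth map only near depth discontinuities, mimicking missing or
    # blurred boundaries (a stand-in for epsilon_edge).
    gy, gx = np.gradient(depth)
    edge_weight = np.clip(np.hypot(gx, gy) / edge_thresh, 0.0, 1.0)  # ~1 near edges
    return (1.0 - edge_weight) * depth + edge_weight * gaussian_filter(depth, sigma=sigma)
```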

(2) We show the significance of our noises by 3 experiments.

In lines 545-552, PSNR between predictions and ideal depth with our noises exceeds 40 dB across all samples of Middlebury2021, quantitatively proving the significance of our noises among all errors.

In Fig.2, adding $\epsilon_{cons}$ and $\epsilon_{edge}$ accurately depicts the discrepancy between depth prediction and ground truth. Combining the noises broadly covers depth prediction errors.

Fig.D of the PDF proves that the lower the noise, the better the depth quality. Edge errors and noise levels exhibit a positive correlation. The low noise levels of SDDR bring fine-grained depth.

Weakness 3 and Question 1: Motivation Elaboration

Thanks for your insightful advice! We further discuss motivations of our designs, which will be added to our revised paper.

(1) Noise Modeling. As in Weakness 2, our noises model two key problems of the task.

(2) Poisson Fusion. In lines 107-109 and 40-46, Poisson fusion integrates the value and gradient domains of two inputs. As in lines 205-207 and 465-475, low-resolution depth tends to be consistent but blurred, while high-resolution depth has accurate edges but inconsistency. Poisson fusion is therefore potentially well-suited for refinement, merging consistent depth with meticulous edges. We implement the Poisson fusion operator as a learnable refinement network, without relying on an external fusion mask or complex parameter settings (line 130). Our method produces consistent and fine-grained depth with strong generalizability.
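As a reference point, a textbook Poisson-fusion-style objective of this kind can be written as below, where $D_{\mathrm{low}}$ denotes the consistent low-resolution depth (value domain), $D_{\mathrm{high}}$ the edge-accurate high-resolution depth (gradient domain), and $\lambda$ a weighting factor; this notation is ours for illustration and is not a reproduction of the paper's exact Eq.1-2 with its noise terms.

$$\min_{D}\ \big\| D - D_{\mathrm{low}} \big\|_2^2 \;+\; \lambda\,\big\| \nabla D - \nabla D_{\mathrm{high}} \big\|_2^2$$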

(3) Self-distillation. In Eq.2, the optimization objective of Poisson fusion serves as the training loss of the refinement network. Low-noise edge labels are needed to guide the model but are unavailable in diverse natural scenes. Given that the refinement network inherently reduces noise and restores details, a self-distillation scheme naturally forms: depth edge representations are generated as labels by coarse-to-fine refinement, and as the depth maps are better refined, the labels become less noisy.
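A schematic of how such a coarse-to-fine self-distillation loop could look is sketched below. The function names (`edge_representation`, `coarse_to_fine_self_distillation`), the callable `refine_step`, and the gradient-magnitude edge proxy are our own illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def edge_representation(depth):
    # Illustrative stand-in for a depth edge representation:
    # here, simply the gradient magnitude of the depth map.
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy)

def coarse_to_fine_self_distillation(image, depth_init, refine_step, num_iters=3):
    # Each refinement pass yields a cleaner depth map; its edge representation
    # then serves as the pseudo-label supervising the gradient/fusion losses.
    depth, pseudo_label = depth_init, None
    for s in range(num_iters):                # e.g., S = 3 in the paper
        depth = refine_step(image, depth, s)  # any callable refinement step
        pseudo_label = edge_representation(depth)
    return depth, pseudo_label
```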

Weakness 4: Comparisons with Previous Loss

Our edge-guided gradient loss differs from prior losses [1, 2] in 4 aspects.

(1) Supervision Paradigm. Previous losses [1, 2] work in a fully-supervised manner with edges from depth ground truth, whereas our loss serves self-distilled learning with depth edge representations. As in lines 33-36 and 91-93, in some natural-scene data, edges from ground truth are unreliable. Our loss establishes accurate edge guidance for self-training.

(2) Classification vs. Regression. Wang et al. [1] adopt a cross-entropy loss with hard edge labels, classifying whether a pixel belongs to an edge area. Our loss guides the model to learn soft edge representations by regression, precisely supervising both the edge area and its intensity.

(3) Global vs. Local. Qiao et al. [2] match gradients globally, while our loss works on high-frequency local regions $P_n$ (line 197). Our model improves details in $P_n$ and preserves consistency in flat areas (see the sketch below).

(4) Scale and Shift Alignment. Qiao et al.[2] ignore depth scale ambiguity, whereas we perform scale and shift alignment to maintain consistency.
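To make points (3) and (4) concrete, here is a minimal NumPy sketch of a gradient loss restricted to high-frequency regions with least-squares scale-and-shift alignment. The function names, the `hf_mask` argument (standing in for $P_n$), and the gradient-magnitude edge proxy are assumptions for illustration; the paper's actual loss may differ in detail.

```python
import numpy as np

def align_scale_shift(pred, ref):
    # Least-squares scale-and-shift alignment, the standard remedy for the
    # scale/shift ambiguity of monocular depth before comparing two depth maps.
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s * pred + t

def grad_mag(depth):
    gy, gx = np.gradient(depth)
    return np.hypot(gx, gy)

def edge_guided_gradient_loss(pred_depth, pseudo_depth, hf_mask):
    # Regress soft edges (gradient magnitudes) against the self-distilled
    # pseudo-label, but only inside high-frequency local regions (hf_mask).
    aligned = align_scale_shift(pred_depth, pseudo_depth)
    diff = np.abs(grad_mag(aligned) - grad_mag(pseudo_depth))
    return diff[hf_mask].mean()
```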

Comment

I appreciate the authors' response to my questions.

There are differences between degradation and noise. Although both image degradation and image noise involve the decline of image quality, they differ in their causes and manifestations. In this paper, I believe that "degradation" is more accurate.

Overall, most of the concerns have been resolved.

Comment

Dear Reviewer 9uAR,

Thank you for your positive feedback. We are glad to hear that our rebuttal solves most of your concerns.

We sincerely appreciate your valuable suggestion regarding the word choice of “degradation” and “noise” in our paper writing. Following your advice, we will adjust the word usage in the revised manuscript to provide more accurate expressions.

Official Review (Rating: 5)

This paper presents a novel framework, SDDR, for enhancing the resolution and detail of depth maps generated by estimation models. By conceptualizing depth refinement within the context of noisy Poisson fusion, the authors have developed a method that effectively tackles the prevalent issues of inefficiency and inconsistency. SDDR incorporates a self-distillation technique that enhances depth edge precision and diminishes noisy disturbances, resulting in a significant boost in both the accuracy and quality of the depth maps.

Strengths

  1. The paper introduces SDDR that addresses the limitations of traditional depth refinement methods by treating the problem as a noisy Poisson fusion task.
  2. SDDR demonstrates strong robustness against local inconsistency and edge deformation noise, which are common in depth prediction tasks, leading to improved accuracy and edge quality.
  3. The use of edge-guided gradient loss and edge-based fusion loss as part of the optimization objective results in more accurate and noise-free depth edge representations.

Weaknesses

  1. The performance of SDDR might be highly dependent on the quality of the initial depth predictions; noisy or low-quality inputs could affect the final output.
  2. The refinement process might inadvertently smooth out important details or edges in the pursuit of noise reduction.
  3. The authors claim that the proposed method has promising performance, especially on edge sharpness; however, it does not seem visually competitive compared with some SOTA monocular depth estimation methods, such as DepthAnythingV2. I understand that there are some differences in the targets between DepthAnything and this paper.

Questions

  1. In 4.2, Coarse-to-fine Edge Refinement, is S=3 enough to obtain the best results? Higher values of S could be explored. Is there any trade-off?
  2. The paper demonstrates its promising performance on edge regions. Edge-specific measurements should be presented to highlight this point.

Limitations

  1. The method is designed to address specific types of noise (local inconsistency and edge deformation) and may not perform as well with other noise characteristics or in the presence of different degradations.
  2. The refined depth maps, despite improvements, might still contain artifacts or inconsistencies, particularly in complex scenes with challenging depth structures. The authors should present some results on those corner cases. If there are no such cases, the authors should point out the reasons and give a deeper analysis for readers.

Author Response

Dear Reviewer DKoq:

Thanks for your valuable feedback. We address all your questions as follows.

Weakness 1: Dependence on Initial Depth

SDDR is robust to the quality of the initial depth, noticeably improving depth edges and details even when faced with low-quality initial depth. This robustness comes from the self-distillation paradigm: GBDF relies on initial depth with filtering as labels, inheriting its errors and noise, whereas SDDR conducts coarse-to-fine refinement to obtain accurate edge representations in self-distillation, acquiring strong robustness to correct and refine.

Fig.7 and lines 274-276 of the paper demonstrate this robustness via the $\delta_1$ curves under increasing noise in the initial depth. Here, we conduct three more experiments for further demonstration.

(1) Simulation Experiment. In Fig.A, we gradually degrade initial depth. Despite increasingly noisy initial depth, SDDR maintains meticulous edges and details, as proved by edge-specific ORD metric and visual results. When initial depth degrades, SDDR only exhibits negligible fluctuations, bringing higher relative improvements.

(2) Real Samples. In Fig.B, GBDF inherits noises and errors from low-quality initial depth, while SDDR robustly corrects the errors, restores the structures, and improves the details.

(3) More Advanced Depth Predictors. SDDR can work with various predictors. Using more advanced predictors, e.g., the Depth-Anything, brings improvements without extra effort. In Fig.C, with better depth of Depth-Anything-V2, SDDR further boosts edges and details.

Weakness 2: Smoothing Out Details in Noise Reduction

In all our experiments, SDDR does not smooth out edges. Across various depth predictors and data, SDDR robustly enhances edges and details, as proved by visualizations and edge metrics (ORD and D3R) in Tab.1-4, 6-9, Fig.1, 3-6, 8, and 14-15 of paper and appendix.

The lower the local inconsistency noise $\epsilon_{cons}$ and edge deformation noise $\epsilon_{edge}$, the better the edges and details. Fig.D proves this point. We represent noise levels by summing the noise intensities of $\epsilon_{cons}$ and $\epsilon_{edge}$, as in lines 537-544 of the appendix. In Fig.D, edge errors and noise levels exhibit a positive correlation; we achieve a lower noise level and better edge quality.

Weakness 3: Comparison with Depth-Anything-V2

SDDR noticeably improves edges and details with Depth-Anything models as depth predictors. As mentioned by Reviewer DKoq, refinement differs from recent Depth-Anything. SDDR is a plug-in module to refine initial depth, with much fewer FLOPs and params than depth predictors. Thus, in Fig.C, we adopt Depth-Anything models as predictors. SDDR also produces finer edges and details. Fig.D quantitatively shows our better edge quality and lower noise level.

Question 1: Iteration Number S

In Tab.A, S=4 and S=5 yield only marginal improvements over S=3. Edges and details are sufficiently refined after S=3, so the edge metrics ORD and D3R saturate for S=4 and 5. More iterations lead to higher time costs. We adopt S=3 as a trade-off between efficiency and performance.

Question 2: Edge-specific Measurements

We follow Boost, Kim et al., and GBDF to adopt the common edge-specific measurements ORD and D3R for depth refinement, as stated in line 241. We report the metrics on 5 benchmarks throughout our tables. The metrics are further depicted in lines 606-613 of appendix.

Limitation 1: Different Noises and Degradations

Our local inconsistency and edge deformation noise are general and effective representations of depth prediction errors in diverse scenarios, rather than being specific to particular cases. Reasons are presented in four aspects.

(1) Domain Knowledge. Prior arts broadly recognize that depth blur and inconsistency are two key problems for the task. E.g., as in GBDF, "depth maps are usually blurry with inaccurate details such as planes and object boundaries." PatchFusion discusses that "we discover that BoostingDepth suffers from scale inconsistencies." However, prior arts only try to solve the problems intuitively, e.g., by selecting and merging patches, leading to unsatisfactory results and limited generalizability.

(2) Theoretical Derivation. In contrast, we model refinement as noisy Poisson fusion with edge deformation and local inconsistency noise. Eq.1-4 and Sec.3.1 of our paper provide derivations of our method, which reveal the general reasons for blur and inconsistency in prior arts. E.g., fusing patches as in Boost produces higher local inconsistency noise, leading to inconsistent structures, while filtering as in GBDF fails to suppress edge deformation noise, yielding blurred details. Noisy Poisson fusion and the two noises capture the failures of prior arts and guide our design.

(3) Experiments. In Fig.2 of the paper, combining $\epsilon_{cons}$ and $\epsilon_{edge}$ accurately depicts depth error. In lines 545-552 of the appendix, PSNR between predictions and ideal depth with our noises exceeds 40 dB across Middlebury2021, showing that $\epsilon_{cons}$ and $\epsilon_{edge}$ generally and accurately represent depth error.

(4) Model Generalization. SDDR is designed under the guidance of noisy Poisson fusion, aiming to suppress the two noises. Our state-of-the-art performance on 5 datasets, spanning synthetic and real-world, indoor and outdoor, dynamic and static scenes, shows strong model generalizability, which also proves that our noises are broadly effective and not limited to particular cases.

Limitation 2: Corner Case

We show corner cases in Fig.B, involving complex details, challenging structures, and light overexposure. None of the compared methods produces completely satisfactory results, e.g., noisy leaves with blurred edges and an inconsistent roof with a black hole. However, SDDR still significantly improves edges and details over LeReS and GBDF, refining the initial depth effectively.

Comment

Thanks for the detailed response. The authors have addressed all of my concerns. Thus, I will increase my rating.

Comment

We are glad that our rebuttal can address all your concerns. Thanks for your positive feedback!

Official Review (Rating: 5)

The paper introduces a novel framework called Self-Distilled Depth Refinement (SDDR) to enhance depth refinement, which aims to infer high-resolution depth maps with fine-grained edges from low-resolution depth estimations. The authors propose modeling depth refinement as a noisy Poisson fusion problem, addressing local inconsistency and edge deformation noises. The SDDR framework consists of depth edge representation and edge-based guidance. Through coarse-to-fine self-distillation, SDDR generates low-noise depth edge representations, which serve as pseudo-labels to guide the refinement process. The method demonstrates significant improvements in accuracy, edge quality, efficiency, and generalizability across five different benchmarks.

Strengths

  1. Innovative Approach: The modeling of depth refinement as a noisy Poisson fusion problem is a novel and insightful approach that effectively addresses common issues in depth refinement.

  2. Robust Framework: The self-distillation technique employed in SDDR enhances robustness against noise, resulting in high-quality depth maps with accurate edges.

  3. Comprehensive Evaluation: The authors conduct extensive experiments across five benchmarks, showcasing the method's superior performance in various scenarios.

  4. Efficiency: SDDR achieves higher efficiency compared to two-stage tile-based methods, reducing computational costs while maintaining or improving accuracy and edge quality.

  5. Generalizability: The framework demonstrates strong generalizability, performing well on both synthetic and real-world datasets.

Weaknesses

  1. Real-World Application: The paper primarily focuses on benchmarks and does not provide extensive discussion on real-world applications and potential limitations in practical scenarios.

  2. Edge Case Handling: The method’s performance in handling extreme edge cases or highly noisy data is not thoroughly explored.

  3. Ablation Studies: More detailed ablation studies are needed to understand the contribution of each component within the SDDR framework.

Questions

  1. In Lines 107-108, the logic connecting the previous works and your motivation is not smooth; it would be better to provide more analysis.

  2. In Figure 2, the author didn't show the edge deformation visualization. What kind of deformation would occur in depth estimation? Most off-the-shelf depth estimators have not produced deformed results in depth edges or other regions.

  3. Do the results in state-of-the-art work like DepthAnything-V2 exhibit the artifacts, consistency, and deformations proposed by the authors?

  4. Can you provide more insights into the performance differences between synthetic and real-world datasets?

  5. The performances compared to other works seem not very significant due to the combination of several contributions in this paper. How can the authors judge whether the gains are from randomness of training, the settings of the models' parameters, or other factors?

  6. What is the training time comparison between SDDR and other state-of-the-art methods?

  7. How sensitive is the SDDR framework to hyperparameter settings?

  8. How does SDDR perform on extreme edge cases with very high noise levels?

  9. Can SDDR be adapted for real-time applications, and what modifications would be necessary?

Limitations

The paper provides extensive quantitative and qualitative results showcasing the strengths of the SDDR framework. However, it lacks a detailed analysis of potential failure cases or scenarios where the method does not perform well, which could provide insights for further improvements.

Author Response

Dear Reviewer Ca4W:

Thanks for your positive feedback and valuable questions. We address all your comments as follows.

Weakness 1 and Limitation 1: Applications and Limitations

(1) Applications. SDDR produces accurate depth with meticulous edges and consistent structure, suitable for various applications such as style transfer, bokeh rendering, and 3D reconstruction. In Fig.E, we achieve better detail and structure than GBDF in these applications.

(2) Limitations. For complex and meticulous structures in Fig.E and Fig.B, all compared methods cannot produce perfect depth, e.g., some blurred leaves in the first row, leading to incomplete structures in generated images. However, for the depth maps and applications of these cases, SDDR still achieves significantly better edges and details than prior arts. This also underscores the necessity of the depth refinement task to refine edges and consistency of depth prediction models.

Weakness 2 and Question 8: Highly Noisy Edges

SDDR achieves strong robustness against noise, noticeably improving depth edges and details even when faced with highly noisy initial depth. This robustness comes from the self-distillation paradigm: GBDF relies on initial depth with filtering as labels, inheriting its errors and noise, whereas SDDR generates accurate edge representations in self-distillation, acquiring robustness to correct and refine.

Fig.7 and lines 274-276 demonstrate this robustness via the $\delta_1$ curves under increasing noise in the initial depth. Here, we conduct two more experiments for further demonstration.

(1) Simulation. In Fig.A, we gradually degrade initial depth. Despite increasingly noisy initial depth, SDDR maintains meticulous edges and details, as proved by ORD metrics and visual results. When initial depth degrades, SDDR only exhibits negligible fluctuations, bringing higher relative improvements.

(2) Real Samples. In Fig.B, GBDF inherits noises and errors from noisy initial depth, while SDDR robustly corrects the errors, restores the structures, and improves the details.

Weakness 3: More Detailed Ablation of Each Component

As per your advice, in Tab.B, we further ablate and prove the contribution of each component within SDDR. Since depth refinement aims to improve depth edges, following previous Boost, Kim et al., and GBDF, edge-specific metrics D3R and ORD are our main focus.

(1) By Components. Starting from depth predictor MiDaS, in each row, we involve one more component and discuss the relative performance compared to the previous row. The self-distillation with depth edge representation and coarse-to-fine refinement reduces the edge error D3R by 8.4% and 5.0%. The edge-guided gradient loss and edge-based fusion loss also reduce D3R by 4.8% and 3.2%.

(2) In Total. Combining all components, SDDR reduces the edge errors D3R and ORD by 23.0% and 8.2% relative to MiDaS. For overall depth quality, SDDR decreases the depth error REL by 6.2%. This is noteworthy and proves our strong efficacy, since edges occupy only a very small proportion of images.

Question 1: Smoothing the Motivation (Line 107)

Thanks for the advice. We will include the discussion below to smooth our motivation for Poisson fusion.

Poisson fusion integrates values and gradients of two inputs. For depth refinement, low-resolution depth is consistent but blurred, while high-resolution depth involves accurate edges but inconsistency. Poisson fusion is potentially well-suited for our task, merging consistent depth with meticulous edges.

Question 2: Edge Deformation Noise

We apologize for the misunderstanding; this noise does not represent deformed objects. Instead, it captures misalignments between predicted and ideal edges, e.g., missing, broken, or blurred depth edges. In Fig.F, we visualize our two noises separately: edge deformation noise accurately depicts missing or blurred edges, while local inconsistency noise represents inconsistent depth structures. We will revise the paper with these illustrations to avoid misunderstanding.

Question 3: Degradation in Depth-Anything-V2

In Fig.C, Depth-Anything models also produce blurred edges or missing structures. SDDR further refines their predictions with better depth edges and details.

Question 4: Synthetic and Real-world Data

Differences between synthetic and real data are in lines 28-36 and 87-98. Similar to insights in prior arts, real-world data contains diverse scenes but suffers from sparse, blurred, or inaccurate depth. Synthetic data is rendered under full control with accurate depth but limited scenes. Thus, MiDaS, DPT, and DepthAnything combine the two types to balance accuracy and generalization. SDDR is not limited by data formats, since it generates accurate edge representation for self-distillation.

Question 5 and Question 7: Significant Performance, Training Randomness, and Parameters

(1) As proved throughout our figures and tables, SDDR significantly improves depth edges and details over prior arts, which is further demonstrated in Weakness 3 by each component.

(2) Our superior performance is not caused by training randomness. In Tab.C, we train SDDR 12 times; the performance is stable, with only minimal fluctuations.

(3) SDDR is not sensitive to parameters. For training and method parameters, in Tab.C, SDDR remains stable with varied learning rates, loss ratios, training epochs, iteration numbers, overlapping ratios, etc. For other issues, e.g., optimizer and initialization, we follow the prior Boost and GBDF.

Question 6: Training Time

SDDR and the prior GBDF have similar training time (12 hours for 3 epochs on an A6000 GPU).

Question 9: Real-time Processing

Pursuing highly fine-grained depth, current refinement methods cannot run in real time (25 fps). However, as shown in Fig.1(b) and Tab.5, SDDR achieves the best efficiency among all compared methods. For real-time applications, pruning and quantization can be used to further improve efficiency.

Comment

Dear Reviewer Ca4W,

We would like to express our sincere gratitude for the time and effort you dedicated to reviewing our manuscript. We truly appreciate that you have positively recognized the strengths of our work, including the “Innovative Approach”, “Robust Framework”, “Comprehensive Evaluation”, “Higher Efficiency”, and “Strong Generalizability”.

Besides, to carefully address all your questions, we have conducted comprehensive experiments and analyses in the rebuttal. The raised questions help us to further improve the comprehensiveness and clarity of our paper. Through the comments of Reviewer DKoq and 9uAR, we are glad to know that our rebuttal solves all their concerns.

We also hope that your valuable questions can be answered and resolved properly. If you have further questions, please use the official comment to propose and discuss. You can also present your rating after evaluating our responses. Thanks again for your meticulous review and suggestions on our paper.

Best Regards,

Authors of Paper 1051

Comment

Thanks for the detailed response. The authors have addressed all of my concerns. Thus, I would like to increase my rating.

Comment

Thanks for your positive feedback on our work!

Author Response

Dear Reviewer DKoq, Ca4W, and 9uAR:

We would like to express our sincere gratitude for your insightful comments and constructive suggestions on our paper. In the rebuttal, we have diligently incorporated comprehensive discussions and experiments to address all the raised queries, comments, and concerns.

Here, we provide a general description of our responses for your convenience.

(1) We separately respond to the three reviewers question by question in the author rebuttal area for each reviewer. Please refer to that area for the answer to your comments.

(2) To address all the questions of the three reviewers, we have included three tables and six figures in the rebuttal. The six figures (marked by Fig.A, B, C, D, E, and F) are contained in our submitted rebuttal PDF. The three tables (marked by Tab.A, B, and C) are presented in this overall rebuttal area, after the general description.

We hope our further discussions, analyses, and experiments can solve all raised concerns and provide better illustrations of the proposed SDDR framework. Thanks again for your meticulous review and valuable insights!

Tab.A, B, and C are listed below.

Table A: Iteration Number S (Reviewer DKoq - Question 1). Based on Tab.4(a) of the paper, we further increase the iteration number S of coarse-to-fine edge refinement to S=4 and S=5. The performance converges after S=3. More iterations lead to higher time costs. We adopt S=3 for the trade-off between efficiency and performance. Refer to Question 1 for detailed analysis.

| Method | D3R ↓ | ORD ↓ | REL ↓ | $\delta_1$ ↑ |
|---|---|---|---|---|
| S=0 | 0.235 | 0.313 | 0.125 | 0.859 |
| S=1 | 0.223 | 0.309 | 0.122 | 0.860 |
| S=2 | 0.219 | 0.307 | 0.120 | 0.860 |
| S=3 | 0.216 | 0.305 | 0.120 | 0.862 |
| S=4 | 0.215 | 0.305 | 0.121 | 0.862 |
| S=5 | 0.215 | 0.304 | 0.121 | 0.860 |

Table B: Detailed Ablation of Each Component (Reviewer Ca4W - Weakness 3). We ablate the contribution of each component within SDDR by zero-shot evaluations on the Multiscopic dataset with MiDaS as the depth predictor. The refinement baseline adopts the common fully-supervised training by depth ground truth without self-distillation. Depth edge representation indicates generating self-distillation pseudo-labels with a single iteration. Coarse-to-fine edge refinement refers to enhancing the pseudo-labels through iterations S=3. In each row, we involve one more component and report the results. Refer to Weakness 3 in our response for detailed discussions.

| Method / Component | D3R ↓ | ORD ↓ | REL ↓ | $\delta_1$ ↑ |
|---|---|---|---|---|
| MiDaS | 0.274 | 0.292 | 0.130 | 0.839 |
| Refinement Baseline | 0.263 | 0.286 | 0.129 | 0.841 |
| + Depth Edge Representation | 0.241 | 0.280 | 0.128 | 0.848 |
| + Coarse-to-fine Edge Refinement | 0.229 | 0.271 | 0.124 | 0.851 |
| + Edge-guided Gradient Loss | 0.218 | 0.268 | 0.124 | 0.851 |
| + Edge-based Fusion Loss | 0.211 | 0.268 | 0.122 | 0.852 |

Table C: Training Randomness and Hyperparameters (Reviewer Ca4W - Question 5 & Question 7). We train SDDR 12 times; the performance is stable, with only minimal fluctuations. Besides, our method is not sensitive to either training or method parameters. We adjust the learning rates, training epochs, loss ratios, the iteration number S and overlapping ratios of coarse-to-fine refinement, along with the percentile $a$ and repeat steps $N_w$ in the edge-based fusion loss. The model performance remains stable, with significant improvements over the depth predictor LeReS. Refer to Question 5 and Question 7 in our response for discussions.

| Type | Parameter Settings | D3R ↓ | ORD ↓ | REL ↓ | $\delta_1$ ↑ |
|---|---|---|---|---|---|
| LeReS | - | 0.326 | 0.359 | 0.123 | 0.847 |
| Training | learning rate = 1e-4, 2e-4, 5e-5 | 0.215±0.001 | 0.305±0.001 | 0.119±0.001 | 0.862±0.001 |
| Training | epochs = 3, 4, 5 | 0.214±0.003 | 0.303±0.002 | 0.121±0.001 | 0.861±0.001 |
| Training | $\lambda_1$ = 1.0, 0.5, 0.1 | 0.215±0.004 | 0.305±0.002 | 0.120±0.001 | 0.861±0.002 |
| Training | $\lambda_2$ = 1.0, 0.5, 0.1 | 0.218±0.003 | 0.306±0.001 | 0.120±0.002 | 0.862±0.002 |
| Method | S = 3, 4, 5 | 0.215±0.002 | 0.305±0.001 | 0.120±0.001 | 0.861±0.001 |
| Method | overlapping ratio = 0.1, 0.2, 0.4 | 0.215±0.001 | 0.306±0.002 | 0.119±0.001 | 0.862±0.001 |
| Method | $N_w$ = 2, 3, 4; $a$ = 1%, 2%, 4% | 0.217±0.002 | 0.306±0.001 | 0.119±0.001 | 0.862±0.001 |

Please download the rebuttal PDF by the button below, for the six figures (Fig.A - F) with more visual results and illustrations.

(Best viewed zoomed in on-screen for details and comparisons.)

Final Decision

The paper introduces a new framework termed as Self-Distilled Depth Refinement (SDDR) to enhance depth refinement, which infers high-resolution depth maps with fine-grained edges from low-resolution depth estimations. The paper models depth refinement as a noisy Poisson fusion problem, addressing local inconsistency and edge deformation noises. Through coarse-to-fine self-distillation, SDDR generates low-noise depth edge representations, which serve as pseudo-labels to guide the refinement process. The method demonstrates improvements in accuracy, edge quality, efficiency, and generalizability across different benchmarks.

The paper was reviewed by three reviewers; the final recommendations were two Borderline Accept and one Weak Accept. The authors made a successful rebuttal addressing the reviewers' concerns, and the ratings improved consistently. Given the consistent recommendations, the paper can be accepted, and the authors are requested to revise the paper accordingly.