PaperHub
Score: 6.1 / 10 · Poster · 4 reviewers
Ratings: 3, 3, 4, 3 (min 3, max 4, std 0.4)
ICML 2025

Balanced Learning for Domain Adaptive Semantic Segmentation

OpenReview · PDF
Submitted: 2025-01-22 · Updated: 2025-07-24

Abstract

Keywords
Semantic segmentation · Unsupervised domain adaptation

Reviews and Discussion

Review (Rating: 3)

The paper proposes Balanced Learning for Domain Adaptation (BLDA) to address class bias in unsupervised domain adaptation (UDA) for semantic segmentation. BLDA analyzes logits distributions to assess prediction bias and introduces an online logits adjustment mechanism to balance the class learning in both source and target domains. Experimental results demonstrate consistent performance improvements when integrating BLDA with various methods on standard UDA benchmarks.

Questions For Authors

What are the experimental results of unsupervised domain adaptation for the Cityscapes → ACDC [1] setting?

[1] ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding. ICCV 2021.

Claims And Evidence

The claims made in the submission are supported by clear and convincing evidence.

Methods And Evaluation Criteria

The proposed method makes sense for the problem, and the experimental results demonstrate its effectiveness.

Theoretical Claims

I have checked the correctness of the proofs for the theoretical claims. The paper states that "the distribution of logits predicted by the network can assess the degree of class bias", which is demonstrated by Fig. 1(c) and Fig. 1(d).

Experimental Design And Analyses

I have checked the experimental results. What are the experimental results of unsupervised domain adaptation for the Cityscapes → ACDC [1] setting?

[1] ACDC: The Adverse Conditions Dataset with Correspondences for Semantic Driving Scene Understanding. ICCV 2021.

Supplementary Material

I have reviewed the supplementary material. It includes the derivation of formulae, evaluation metrics, implementation details of the online logits distribution estimation, the influence of parameter settings, and more experimental results.

Relation To Broader Scientific Literature

BLDA analyzes logits distributions to assess prediction bias and introduces an online logits adjustment mechanism to balance class learning in both source and target domains. It may inspire more researchers to use logits distributions to assess prediction bias.

Essential References Not Discussed

There are no related works essential to understanding the paper's key contributions that are not already cited and discussed.

Other Strengths And Weaknesses

The writing is excellent, the charts are beautiful, and the formulas are exciting! The experiments are slightly lacking; I would like to see more experimental results for unsupervised domain adaptation settings such as Cityscapes → ACDC.

Other Comments Or Suggestions

No

Author Response

We sincerely thank the reviewer for the positive and constructive feedback. We appreciate your recognition of our motivation, theoretical intuition, and clear writing, as well as your kind comments on our visualizations and formula design.


Regarding your suggestion to include more experimental results for the Cityscapes → ACDC setting, we agree that evaluating BLDA under more diverse and challenging UDA scenarios is important; we have already conducted extended experiments on image classification (Appendix G) and video semantic segmentation (Appendix H). We have now conducted additional experiments on the Cityscapes → ACDC benchmark and present the results below.

As shown in the following tables (∗ denotes the reproduced result), integrating BLDA consistently improves performance across multiple baselines, including DACS, DAFormer, and MIC, on both mIoU and mAcc metrics. Importantly, BLDA is designed as a plug-and-play module that is model-agnostic and can be seamlessly integrated into most existing UDA baselines without modifying their core architectures or training objectives. Since class bias is a pervasive issue across UDA settings, our method provides a general mechanism to mitigate this imbalance during training. As such, BLDA is able to consistently improve performance across a wide range of models and target domains, as demonstrated in our experiments on both standard benchmarks and the newly added Cityscapes → ACDC setting.

Cityscapes → ACDC (IoU, %)

| Method | Arch. | Road | Sidewalk | Building | Wall | Fence | Pole | Light | Sign | Veg | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | Motor | Bike | mIoU | std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DACS* | C | 78.5 | 36.3 | 73.4 | 28.7 | 17.0 | 42.2 | 57.1 | 45.6 | 68.3 | 26.7 | 75.0 | 51.3 | 24.5 | 75.6 | 37.8 | 43.1 | 41.3 | 28.8 | 26.8 | 46.0 | 19.3 |
| +BLDA | C | 74.6 | 41.9 | 67.3 | 31.5 | 19.8 | 46.9 | 54.8 | 50.9 | 71.7 | 29.5 | 69.4 | 54.6 | 26.6 | 74.2 | 41.9 | 45.0 | 49.8 | 31.1 | 27.9 | 47.9 | 17.1 |
| DAFormer* | T | 65.5 | 52.8 | 79.1 | 39.8 | 37.2 | 56.4 | 57.6 | 51.3 | 72.2 | 37.3 | 59.9 | 54.7 | 25.0 | 83.6 | 69.6 | 68.0 | 72.9 | 39.6 | 33.2 | 55.5 | 16.3 |
| +BLDA | T | 63.8 | 50.7 | 73.4 | 40.2 | 40.7 | 53.3 | 54.6 | 50.0 | 70.7 | 40.3 | 64.3 | 58.9 | 33.9 | 83.5 | 74.3 | 75.5 | 78.9 | 45.1 | 37.6 | 57.4 | 15.2 |
| MIC* | T | 89.5 | 63.0 | 86.3 | 56.7 | 46.2 | 64.2 | 65.8 | 64.2 | 75.5 | 46.9 | 83.2 | 67.9 | 45.6 | 87.7 | 85.9 | 92.3 | 88.7 | 54.0 | 55.4 | 69.4 | 15.9 |
| +BLDA | T | 88.7 | 66.8 | 87.8 | 61.1 | 48.9 | 63.3 | 65.2 | 67.3 | 73.7 | 50.5 | 82.4 | 68.2 | 49.5 | 86.7 | 80.5 | 91.1 | 91.0 | 54.6 | 61.8 | 70.5 | 14.2 |

Cityscapes → ACDC (Acc, %)

| Method | Arch. | Road | Sidewalk | Building | Wall | Fence | Pole | Light | Sign | Veg | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | Motor | Bike | mAcc | std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DACS* | C | 96.7 | 52.9 | 81.9 | 39.2 | 36.3 | 51.8 | 73.1 | 55.0 | 90.7 | 35.0 | 76.1 | 60.1 | 33.3 | 84.5 | 44.3 | 44.2 | 59.7 | 35.8 | 39.8 | 57.4 | 20.0 |
| +BLDA | C | 96.1 | 66.7 | 82.7 | 40.7 | 36.6 | 58.9 | 66.4 | 67.4 | 84.1 | 41.3 | 70.7 | 68.4 | 36.3 | 84.8 | 48.9 | 46.5 | 59.9 | 47.4 | 41.4 | 60.3 | 17.7 |
| DAFormer* | T | 98.2 | 63.4 | 86.7 | 51.2 | 42.0 | 70.3 | 77.3 | 65.1 | 93.4 | 52.6 | 60.6 | 60.3 | 53.7 | 90.7 | 89.3 | 70.8 | 90.3 | 57.8 | 42.7 | 69.3 | 17.4 |
| +BLDA | T | 98.8 | 61.3 | 85.5 | 55.8 | 45.4 | 73.2 | 82.5 | 72.8 | 92.0 | 51.7 | 65.0 | 69.2 | 51.5 | 90.5 | 89.5 | 80.0 | 92.8 | 60.6 | 51.1 | 72.1 | 16.4 |
| MIC* | T | 99.4 | 69.2 | 92.2 | 76.0 | 57.7 | 73.5 | 89.5 | 80.9 | 94.9 | 62.8 | 89.3 | 82.7 | 57.5 | 97.6 | 91.9 | 96.9 | 96.6 | 60.2 | 72.6 | 81.1 | 14.2 |
| +BLDA | T | 98.8 | 72.3 | 93.8 | 86.1 | 59.7 | 73.6 | 90.8 | 89.9 | 93.2 | 73.9 | 90.8 | 81.4 | 68.3 | 96.3 | 94.8 | 97.0 | 97.6 | 65.3 | 74.1 | 84.1 | 12.1 |

We will include these results in the revised version of the paper.


We hope our response can resolve your concern. Please do not hesitate to let us know if you have further questions.

Review (Rating: 3)

This paper presents Balanced Learning for Domain Adaptation (BLDA), an innovative approach to address class imbalance and distribution shifts in Unsupervised Domain Adaptation (UDA) for semantic segmentation. Specifically, it identifies over-predicted and under-predicted classes through the analysis of predicted logits and employs a post-hoc approach to align logits distributions across different classes using shared anchor distributions. During self-training, BLDA estimates logits distributions online and incorporates correction terms into the loss function to ensure unbiased pseudo-label generation. Extensive experiments on standard UDA benchmarks have demonstrated that BLDA consistently improves performance, particularly for under-predicted classes, when integrated with various existing methods.

Questions For Authors

The major question is about Eq. (4). As $\mathbb{P}(\arg\max_{c' \in [C]} f_\theta(x)[c'] = l \mid y = c)$ represents the probability of predicting class $c$ as $l$ with $c \neq l$, why does a positive bias $\mathrm{Bias}(l)$ indicate over-prediction? Given the condition $y = c$, how should we interpret the summation over $c' \in [C]$? In my understanding, it may be $\mathbb{P}(\arg\max_{l \in [C]} f_\theta(x) = l \mid y = c)$.

Claims And Evidence

Yes

Methods And Evaluation Criteria

Yes

Theoretical Claims

There is some confusion about Eq. (4). As $\mathbb{P}(\arg\max_{c' \in [C]} f_\theta(x)[c'] = l \mid y = c)$ represents the probability of predicting class $c$ as $l$ with $c \neq l$, why does a positive bias $\mathrm{Bias}(l)$ indicate over-prediction? Given the condition $y = c$, how should we interpret the summation over $c' \in [C]$? In my understanding, it may be $\mathbb{P}(\arg\max_{l \in [C]} f_\theta(x) = l \mid y = c)$.

Experimental Design And Analyses

The paper presents extensive experiments on standard UDA benchmarks to demonstrate the effectiveness of the proposed post-hoc method.

Supplementary Material

The supplementary material provides sufficient experimental results, discussions of the equations, implementation details, and comparisons with existing methods.

Relation To Broader Scientific Literature

The class-imbalance problem is widely studied in semantic segmentation, cross-domain semantic segmentation, and other perception fields. This paper proposes a post-hoc method to balance over-predicted and under-predicted classes during domain adaptation, which is straightforward and makes sense. However, the proposed method may incur substantial time and computational cost due to the multiple GMMs.

Essential References Not Discussed

N/A

Other Strengths And Weaknesses

Strengths:

  1. The paper is generally well-written, well-structured, and easy to follow.
  2. The experiments are comprehensive, covering three transfer tasks for segmentation, an additional image classification task (included in the supplementary materials), and extensive qualitative analyses.

Weaknesses:

  1. The paper claims to address the class-imbalance problem with the proposed BLDA. However, the proposed method seems to depend heavily on the baseline models. For example, the performance of "Train" in Table 1 is close to 0 for DACS and DAFormer (C), which shows that the proposed method may still suffer from the class-imbalance problem.
  2. Concerns regarding the fairness of the comparisons. In Tables 1 and 2, the experiments seem to leverage high-quality pseudo-labels for self-training, which may inherently provide an advantage over existing methods, such as DAFormer, CDAC, HRDA, and MIC. This raises questions about whether the improved performance is due to the proposed method's unique contributions or simply a result of the enhanced quality of the pseudo-labels used.
  3. It would be more appropriate to compare performance with existing methods starting from the same source-only model.

Other Comments Or Suggestions

N/A

Author Response

We sincerely thank the reviewer for the valuable feedback and thoughtful comments. We appreciate the recognition of our clear writing, comprehensive experiments, and extensive qualitative analyses. We address each of your concerns point by point.


Q1: Clarification on Eq. (4) and the Definition of Positive Bias

A1: Sorry for any misunderstanding. In Eq. (4), the expression $\arg\max_{c' \in [C]} f_\theta(x)[c']$ refers to the predicted class for input $x$. $\mathbb{P}(\arg\max_{c' \in [C]} f_\theta(x)[c'] = l \mid y = c)$ represents the probability that a sample from class $c$ is predicted as class $l$. There is no constraint that $c \neq l$ in Eq. (4). The summation in this equation is taken over the conditioning variable $c$, i.e., it averages the conditional probabilities $\mathbb{P}(\arg\max_{c' \in [C]} f_\theta(x)[c'] = l \mid y = c)$ across all classes $c \in [C]$.

By summing over all $c$ and taking the average, we obtain the expected probability that a sample from any class (including $l$ itself) is predicted as class $l$. Under an unbiased model, this expectation should be approximately $1/C$. Therefore, a positive bias indicates that class $l$ is over-predicted, as its average prediction probability exceeds the uniform expectation.
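
To make this averaging concrete, here is a minimal sketch (our illustration, not the authors' released code; the function name and tensor shapes are assumed) of estimating the per-class bias from a batch of predictions:

```python
import torch

def prediction_bias(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: [N, C] network outputs; labels: [N] ground-truth classes.
    Returns a [C] tensor; positive entries mark over-predicted classes."""
    C = logits.shape[1]
    preds = logits.argmax(dim=1)      # predicted class per sample
    cond = torch.zeros(C, C)          # cond[c, l] = P(pred = l | y = c)
    for c in range(C):
        mask = labels == c
        if mask.any():
            cond[c] = torch.bincount(preds[mask], minlength=C).float() / mask.sum()
    # Average the conditional rows over all ground-truth classes c; an unbiased
    # model would put 1/C of this mass on every l, so subtract that baseline.
    return cond.mean(dim=0) - 1.0 / C
```

An unbiased model yields values near zero; positive entries flag over-predicted classes, matching the sign convention described above.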


Q2: Dependence on Baseline Models

A2: Our method is designed as a plug-and-play module that can be integrated into any self-training-based UDA framework. Naturally, its performance is influenced by the underlying baseline, especially when estimating the target-domain logits distribution, which relies on the pseudo-labels generated during self-training. Hence, the quality of pseudo-labels affects the accuracy of our distribution estimation and, consequently, the effectiveness of the class-aware adjustment.

However, we emphasize that BLDA consistently reduces class-level prediction variance across all baselines. As reported in the main paper, the standard deviation of per-class IoU and Acc drops significantly when BLDA is applied. This demonstrates that our method consistently yields more balanced predictions with reduced class bias, regardless of the baseline.


Q3: Fairness of Comparisons and Pseudo-Label Quality

A3: We would like to clarify that our method adheres to the standard UDA setting and does not modify the pseudo-label generation process of any baseline. As described in Section 3.2, we employ the self-training framework as implemented in existing works such as DACS, DAFormer, CDAC, HRDA, and MIC. All these methods adopt an online self-training protocol: the model is trained from scratch using labeled source data and unlabeled target data with pseudo-labels generated during training.

BLDA is inserted into this process as a lightweight module that adjusts the logits distributions based on estimated class-wise prediction behavior. The pseudo-label generation and training pipeline of each baseline remains unchanged. Therefore, the comparisons are conducted under fair and consistent settings, where each method follows the same online self-training paradigm (note that all baselines train the model from scratch, without using a source-only pretrained model).


Q4: Computation Overhead

A4: We provide a detailed analysis of the computational overhead in Appendix I, including actual resource usage and training time across various baselines. In summary, the additional cost introduced by our proposed components stems from three main operations. We have implemented them efficiently to minimize overhead:

  1. GMM Implementation: Instead of using off-the-shelf libraries that update Gaussian parameters sequentially, we store the parameters of all $C \times C \times K$ components as tensors in PyTorch and update them in parallel using matrix operations (sketched after this answer).
  2. CDF Computation: We approximate the cumulative distribution function using the Abramowitz-Stegun formula, which allows efficient polynomial evaluation.
  3. Inverse CDF Computation: We use interpolation techniques within the estimated value range, which avoids costly numerical inversion.

All the above operations can be performed efficiently using simple matrix operations on tensors. Moreover, the storage of the Gaussian component parameters and the additional regression head (a $1 \times 1$ conv) introduced by our method are lightweight. Overall, our method demonstrates high efficiency in both training time and GPU memory, as reported in Table 13 of Appendix I, and it introduces no additional overhead during inference.
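
To illustrate point 1, the following is a minimal sketch (our own, under assumed shapes and an assumed EMA momentum `m`; the paper's exact update rule may differ) of holding all $C \times C \times K$ Gaussian parameters as tensors and refreshing them with one batched EM-style step:

```python
import torch

C, K, m = 19, 3, 0.999                              # classes, components, EMA momentum (assumed)
pi  = torch.full((C, C, K), 1.0 / K)                # mixture weights per (gt, logit-index) pair
mu  = torch.linspace(-5.0, 5.0, K).repeat(C, C, 1)  # component means
var = torch.ones(C, C, K)                           # component variances

def em_step(z: torch.Tensor) -> None:
    """One parallel EM-style refresh. z: [C, C, N] logit samples per cell."""
    global pi, mu, var
    zz = z.unsqueeze(-2)                                                    # [C, C, 1, N]
    log_p = (torch.log(pi).unsqueeze(-1)
             - 0.5 * torch.log(2 * torch.pi * var).unsqueeze(-1)
             - 0.5 * (zz - mu.unsqueeze(-1)) ** 2 / var.unsqueeze(-1))      # [C, C, K, N]
    r = torch.softmax(log_p, dim=-2)                                        # responsibilities over K
    Nk = r.sum(-1) + 1e-8                                                   # effective counts [C, C, K]
    mu_new  = (r * zz).sum(-1) / Nk
    var_new = (r * (zz - mu_new.unsqueeze(-1)) ** 2).sum(-1) / Nk + 1e-6
    pi_new  = Nk / Nk.sum(-1, keepdim=True)
    # exponential moving average keeps the estimates stable across mini-batches
    pi  = m * pi  + (1 - m) * pi_new
    mu  = m * mu  + (1 - m) * mu_new
    var = m * var + (1 - m) * var_new
```

Storing the parameters as `[C, C, K]` tensors lets a single batched step replace $C \times C$ independent per-pair updates.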


We hope our response can resolve your concern. Please do not hesitate to let us know if you have further questions.

Review (Rating: 4)

The paper proposes Balanced Learning for Domain Adaptation (BLDA) for addressing class bias in unsupervised domain-adaptive semantic segmentation. The authors identify class imbalance and distribution shifts as major obstacles in UDA and propose techniques to analyze logits distributions to assess prediction bias. The method introduces post-hoc logits adjustment and online logits adjustment to mitigate class bias and improve balanced learning across classes. Additionally, cumulative distribution estimation is used as domain-shared structural knowledge. The paper demonstrates significant performance improvements on standard UDA benchmarks.

Questions For Authors

  1. How does the method scale to larger datasets or real-time applications, given the computational cost of GMM-based logits modeling?

Claims And Evidence

  1. The claim that logits distribution differences correlate with class bias is well supported by the experimental visualizations in Fig. 6.
  2. The effectiveness of online logits adjustment and post-hoc adjustment is supported by ablation studies.
  3. The effectiveness of balanced learning is supported by consistent improvements in mIoU and mAcc metrics across several baselines and benchmarks.

Methods And Evaluation Criteria

  1. The proposed methods, including post-hoc logits adjustment and online logits adjustment, are well-suited to address the stated problem of class bias in UDA.
  2. The use of standard UDA benchmarks ensures fair and meaningful evaluation.

Theoretical Claims

No major theoretical claims were presented beyond the statistical modeling of logits distributions and their alignment.

Experimental Design And Analyses

The experimental designs are sound, with appropriate baselines and thorough ablation studies. The use of mIoU and mAcc metrics is effective for assessing both overall accuracy and balanced performance across classes.

Supplementary Material

The derivations and more experiments in the supplementary material were reviewed.

Relation To Broader Scientific Literature

The idea of analyzing logits distributions relates to class-imbalanced learning and to work on distribution shifts between domains.

Essential References Not Discussed

None

Other Strengths And Weaknesses

Strengths:

  1. The method is versatile and can be integrated with various self-training-based UDA frameworks.
  2. The authors provide comprehensive experiments, including multiple benchmarks and ablation studies.
  3. The paper provides a clear theoretical explanation for logit distribution alignment and its connection to class bias, grounding the proposed method in established statistical principles.

Weakness:

  1. The GMM-based logits modeling may be computationally expensive, especially for larger datasets.
  2. The reliance on pre-defined anchor distributions may not generalize well to all datasets or domain shifts. The paper lacks discussion on alternative approaches, such as learned or adaptive anchors.

Other Comments Or Suggestions

  1. The scalability of the method should be discussed, especially in terms of computational overhead introduced by GMMs.
Author Response

We sincerely thank the reviewer for the positive and constructive feedback. We appreciate your recognition of our method’s versatile design, the thoroughness of our experimental validation, and the clarity of our theoretical explanation. We address each of your concerns point by point.


Q1: Computational Cost and Scalability of GMM-based Modeling

A1: We provide a detailed analysis of the computational overhead in Appendix I, including actual resource usage and training time across various baselines. In summary, the additional cost introduced by our proposed components stems from three main operations. We have implemented them efficiently to minimize overhead:

  1. GMM Implementation: Instead of using off-the-shelf libraries that update Gaussian parameters sequentially, we store the parameters of all $C \times C \times K$ components as tensors in PyTorch and update them in parallel using matrix operations.
  2. CDF Computation: We approximate the cumulative distribution function using the Abramowitz-Stegun formula, which allows efficient polynomial evaluation (sketched after this answer).
  3. Inverse CDF Computation: We use interpolation techniques within the estimated value range, which avoids costly numerical inversion.

All the above operations can be performed efficiently using simple matrix operations on tensors. Moreover, the storage of the Gaussian component parameters and the additional regression head (a $1 \times 1$ conv) introduced by our method are lightweight. Overall, our method demonstrates high efficiency in both training time and GPU memory, as reported in Table 13 of Appendix I, and it introduces no additional overhead during inference.
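
For concreteness, point 2 can be illustrated with a minimal sketch (our own, not the paper's code) of the Abramowitz-Stegun polynomial approximation (formula 26.2.17) to the standard normal CDF; the GMM's CDF is then the mixture-weighted sum of such terms evaluated at $(x - \mu_k)/\sigma_k$:

```python
import torch

def normal_cdf(x: torch.Tensor) -> torch.Tensor:
    """Abramowitz-Stegun 26.2.17: |error| < 7.5e-8 for the standard normal CDF."""
    p = 0.2316419
    b = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    t = 1.0 / (1.0 + p * x.abs())
    poly = sum(bi * t ** (i + 1) for i, bi in enumerate(b))
    pdf = torch.exp(-0.5 * x ** 2) / (2 * torch.pi) ** 0.5
    upper = 1.0 - pdf * poly                          # valid for x >= 0
    return torch.where(x >= 0, upper, 1.0 - upper)    # symmetry for x < 0
```

Because the approximation is a short polynomial in `t`, it evaluates in a handful of fused tensor operations, which is what makes batched CDF queries cheap.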

For larger datasets, i.e., with more classes, the only change is that the GMM module needs to store a proportionally larger number of Gaussian components, which remains tractable in practice.


Q2: Generalization of Pre-defined Anchor Distributions

A2: We provide a detailed discussion of the anchor distribution design and its alternatives in Appendix J. A summary is provided below:

  1. Definition and Role of Anchor Distributions: We use the global positive and negative logits distributions from the source domain to estimate a shared anchor distribution for both the source and target domains. This shared anchor allows the target logits distribution to gradually align with the source, serving two purposes:

    • It provides a reference to balance learning progress across classes within each domain.
    • It acts as a bridge to align the source and target domains in terms of class-wise logits behavior.
  2. Different Selection Criteria for Anchor Distributions: While the anchor distribution is estimated from the source domain, we acknowledge that some discrepancy may exist between this estimate and the true distribution of the target domain. However, our analysis in Appendix B shows that the relative positive/negative bias matters more than the absolute logit values, and this relative structure tends to be preserved across domains, as observed in Fig. 1(c), where the biases of positive and negative logits are coupled. This explains why the impact of distributional mismatch is limited in practice.

    We also conduct ablation studies (Table 14) on different anchor selection strategies, including: Global source distribution (default); Global target distribution (oracle); Two biased classes (building and fence in Fig. 1(d)). The results show that different anchor choices lead to only marginal differences in performance, further demonstrating the robustness of our method to anchor selection and its ability to generalize across domain shifts.

  3. Generalization Across Datasets and Domain Shifts: Our logits adjustment mechanism can be interpreted as a cross-entropy loss with an adaptive margin. As discussed in Appendix B, this margin is implicitly determined by the relative difference between the positive and negative logits distributions. There are two key cases:

    • If both positive and negative logits in the anchor increase or decrease simultaneously, the margin remains stable.
    • If the relative gap between pos/neg distributions in the target differs from that in the anchor, the margin adapts accordingly to guide alignment.

    This behavior enables the method to generalize across domain shifts. As demonstrated in Table 15, the adaptive margin mechanism drives the target logits distribution to progressively align with the anchor distribution over training, confirming the intended behavior of our method.
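
For intuition, the alignment described above can be illustrated with a probability-integral-transform sketch (our simplification: single Gaussians stand in for the paper's GMM estimates, and all names are hypothetical). A logit is mapped to the quantile it occupies under the current distribution and then to the value holding that quantile under the anchor:

```python
import torch

def align_to_anchor(z: torch.Tensor, mu_cur: float, sd_cur: float,
                    mu_anc: float, sd_anc: float) -> torch.Tensor:
    """Map logits z so that their distribution matches the anchor's."""
    normal = torch.distributions.Normal(0.0, 1.0)
    q = normal.cdf((z - mu_cur) / sd_cur)      # quantile of z under the current estimate
    return mu_anc + sd_anc * normal.icdf(q)    # value at the same quantile under the anchor
```

Repeatedly applying this mapping during training drives the target logits distribution toward the anchor, consistent with the behavior reported in Table 15.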


We hope our response can resolve your concern. Please do not hesitate to let us know if you have further questions.

Review (Rating: 3)

The study conducts an in-depth investigation into class bias within UDA scenarios, demonstrating that this bias stems from simultaneous shifts in both the label and data distributions, which complicates the domain adaptation process. To address this challenge, the authors introduce a novel approach that evaluates and reduces class bias through a class-balanced learning method. This approach derives adjustment factors from the distribution of logits, thereby overcoming the constraints inherent in conventional imbalanced class techniques. Notably, the proposed solution is implemented as a versatile plug-and-play module, suitable for broad applications in UDA. It begins by estimating distribution parameters, then applies dynamic logit adjustments to monitor the model’s learning progression, ensuring balanced class performance. Extensive experiments confirm that the method consistently yields significant improvements in performance, highlighting its effectiveness and flexibility.

Update after rebuttal

Thanks for the authors' detailed response. The response has addressed most of my concerns. I hope the authors can revise their paper according to the reviewers' suggestions. I will keep my original rating.

Questions For Authors

Please refer to Other Strengths And Weaknesses.

Claims And Evidence

The experimental claims are supported with thorough empirical results. However, my concern is that the use of logits distributions as a proxy for class imbalance needs further justification. Additionally, the method should be compared against alternative distribution estimation methods beyond Gaussian Mixture Models.

Methods And Evaluation Criteria

The selected metrics, mIoU and mAcc, are well-suited for assessing performance, as they capture both the precision of segment overlap and overall classification accuracy. Additionally, the method itself is thoughtfully designed, with a clear and sound intuition underpinning its architecture and approach.

Theoretical Claims

The theoretical claims about the connection between logits distribution and class biases are well-defined. The authors clearly depict their relationships and provide convincing theoretical justifications.

Experimental Design And Analyses

The experiments systematically demonstrate the improvements introduced by BLDA. The authors conduct extensive ablations on each model and various configurations, which strongly support the design.

Supplementary Material

The supplementary material adequately details methodology and experiments.

Relation To Broader Scientific Literature

The work is well situated in the domain-adaptive semantic segmentation literature, clearly pointing out the limitations of prior class-balancing methods such as re-weighting and re-sampling.

Essential References Not Discussed

There are plenty of domain-adaptive semantic segmentation methods. I know it is not possible for the authors to plug their method into each of them to show its effectiveness, but some typical methods representing different technical solutions should be compared against, for example, the works listed below:

[1] Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018.

[2] Confidence regularized self-training, ICCV 2019.

[3] Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation, NeurIPS 2020.

Other Strengths And Weaknesses

Strengths:

-Clearly written paper.

-Clearly identified problem of class imbalance unique to UDA.

Weaknesses:

-Lack of analysis concerning computational overhead and scalability.

-Further justification is needed for using logits distributions as a proxy for class imbalance.

-Further comparison against alternative distribution estimation methods beyond Gaussian Mixture Models is needed.

Other Comments Or Suggestions

No other comments.

Author Response

We thank the reviewer for the positive and constructive feedback, as well as for acknowledging our contributions, including the clear problem formulation, the clarity of the writing, and the effectiveness of empirical validation. We address each of your concerns point by point.


Q1: Computation Overhead

A1: We provide a detailed analysis of the computational overhead in Appendix I, including actual resource usage and training time across various baselines. In summary, the additional cost introduced by our proposed components stems from three main operations. We have implemented them efficiently to minimize overhead:

  1. GMM Implementation: Instead of using off-the-shelf libraries that update Gaussian parameters sequentially, we store the parameters of all $C \times C \times K$ components as tensors in PyTorch and update them in parallel using matrix operations.
  2. CDF Computation: We approximate the cumulative distribution function using the Abramowitz-Stegun formula, which allows efficient polynomial evaluation.
  3. Inverse CDF Computation: We use interpolation techniques within the estimated value range, which avoids costly numerical inversion (sketched after this answer).

All the above operations can be performed efficiently using simple matrix operations on tensors. Moreover, the storage of the Gaussian component parameters and the additional regression head (a $1 \times 1$ conv) introduced by our method are lightweight. Overall, our method demonstrates high efficiency in both training time and GPU memory, as reported in Table 13 of Appendix I, and it introduces no additional overhead during inference.
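
As an illustration of point 3, here is a minimal sketch (our own; the grid range and resolution are assumed hyperparameters) of inverting a monotone CDF by linear interpolation over a precomputed grid:

```python
import torch

def inverse_cdf(q: torch.Tensor, cdf_fn, lo: float = -10.0, hi: float = 10.0,
                n: int = 2048) -> torch.Tensor:
    """q: quantiles in (0, 1); cdf_fn: any monotone CDF evaluated on a grid."""
    grid = torch.linspace(lo, hi, n)             # assumed value range of the logits
    cdf = cdf_fn(grid)                           # monotone increasing values in [0, 1]
    idx = torch.searchsorted(cdf, q).clamp(1, n - 1)
    c0, c1 = cdf[idx - 1], cdf[idx]
    x0, x1 = grid[idx - 1], grid[idx]
    w = (q - c0) / (c1 - c0 + 1e-12)             # linear interpolation weight
    return x0 + w * (x1 - x0)
```

Because the grid and its CDF values can be precomputed once per update, each query reduces to a `searchsorted` lookup plus one interpolation, with no iterative root finding.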


Q2: Why use logits distributions

A2: In UDA, explicit class distribution priors are often unavailable, especially in the target domain, where distribution shift is severe. Therefore, we propose leveraging the logits distributions as an online proxy to assess a model’s class-wise prediction bias.

We justify this design choice from both theoretical and empirical perspectives:

  1. Theoretical Justification: As shown in Definition 2, the prediction bias $\mathrm{Bias}(l)$ is directly related to the probability of predicting class $l$ across all ground-truth classes, which in turn depends on the relative distributions of logits. We further show in Eq. (5) that, under the assumption of independent logits distributions, this probability can be estimated by comparing the positive and negative class logits. A sufficient condition for unbiased prediction is that these distributions are aligned.

  2. Empirical Evidence: In Figure 1(d), we demonstrate a clear linear correlation between the prediction bias and the differences in logits distributions across classes. This supports our hypothesis that logits distributions serve as an effective proxy to capture class imbalance in the network’s behavior.


Q3: Alternative Distribution Estimation Methods

A3: Thank you for this valuable suggestion. We chose GMMs primarily for their balance between modeling capacity and computational efficiency, which is critical for our online training setup. Specifically:

  • GMMs can approximate a wide range of 1D distributions with a small number of parameters.
  • They allow closed-form CDF and inverse CDF computation (using polynomial approximations), enabling efficient matrix-based implementation for large-scale training.
  • The distributions we model are scalar-valued logits for each class pair $(c, l)$, making GMMs a sufficiently expressive and computationally tractable choice.

Additionally, as shown in Appendix M, GMMs empirically fit the logits distributions well, and as discussed in Appendix E, they converge quickly during training. While more complex estimators (e.g., kernel density estimation or deep density models) could be considered, they often introduce significant overhead and do not offer closed-form CDFs, which are essential for our framework.


Q4: References

A4: Thank you for pointing this out. These references are indeed important and representative works in UDA for semantic segmentation. We will include them in the Related Work section in the revised version and compare with them in our experiments.


We hope our response can resolve your concern. Please do not hesitate to let us know if you have further questions.

Final Decision

All reviewers acknowledged that the paper presents solid novelty and is well-written, well-structured, and easy to follow. Initially, some concerns were raised regarding the limited scope of experiments, particularly the lack of comparisons with alternative distribution estimation methods and the fairness of existing comparisons. However, during the rebuttal phase, the authors addressed these concerns by providing additional experiments. The AC has also reviewed the paper, the reviewer comments, and the rebuttal, and agrees that the motivation is clear, the writing is strong, and the experimental evaluation is thorough. Therefore, the AC recommends acceptance. It is recommended that all additional experiments and discussions from the rebuttal be incorporated into the final version of the paper.