PaperHub
Rating: 7.1/10
Poster (5 reviewers)
Min 4, max 5, std. dev. 0.5
Individual ratings: 5, 4, 5, 4, 4
Confidence: 4.2
Novelty: 2.8
Quality: 2.8
Clarity: 2.6
Significance: 3.0
NeurIPS 2025

MATCH: Multi-faceted Adaptive Topo-Consistency for Semi-Supervised Histopathology Segmentation

Submitted: 2025-05-09; Updated: 2025-10-29


Keywords: Digital pathology; Medical image segmentation; Semi-supervised learning

Reviews and Discussion

Official Review
Rating: 5

This paper introduces a method with three modules for the segmentation of histopathological images: the first module aims at increasing the consistency across predictions from slightly augmented images (semi-supervised learning); the second module aims at increasing the consistency across multiple stochastic predictions (MC dropout); the third module aims at increasing the topological consistency across different snapshots of the model during training. The method is compared with prior related work, and ablation studies are conducted.

Strengths and Weaknesses

Strengths

  • Novelty: consistency matching strategies are common in semi-supervised learning, but they typically focus on pixel-level rather than topological consistency; enforcing topological consistency is novel.
  • Number of experiments: experiments include comparisons with several related modern prior work on three datasets, ablation and sensitivity studies.
  • Importance in microscopy imaging: since fully annotated microscopy datasets are scarce and annotating new ones is time-consuming, new semi-supervised learning methods have an impact on scientific areas that use microscopy imaging.
  • Clarity: the method is well described.

Weaknesses

  • Experiments were conducted only once (one random seed).
  • The ablation studies were only conducted on two out of the three components. Specifically, there is an ablation study for the "intra" and for the "temp" terms, but not for the "const" term in the equation above Section 4.
  • The ablation studies show that the method is sensitive to the hyper-parameters (see Table A: the best value for B_intra and B_temp was 4, not 3 or 5). There also does not seem to be an easy or intuitive way to tune them. Furthermore, the optimal hyper-parameter values probably depend on the dataset.
  • Running time, which is crucial especially in Persistence-Homology-based methods, is unreported.

Minor:

  • Typo in L211: "EM [48]. UA-MT" should use a comma.
  • Typo in L62: "topolgoical"
  • L261: This is not an ablation study but a sensitivity study, i.e., the goal is to measure how sensitive the method is to a hyper-parameter choice.

Questions

  • What's the exact running time of this method? And what about the running time of a baseline that does not include the persistence-homology method?
  • There are two algorithms to compute the Matches (Section 3.1 and 3.2). Which one is actually used for the final method?
  • How/which statistical significance tests were used?

Limitations

A limitation is mentioned in the supplementary material, but I think the method has more obvious limitations. First, relying on persistent homology typically translates into extremely long training times (see one of my questions). Second, since this method relies on different "modules" (see the three extra terms in the total loss function, above Section 4), underperformance of one of them due to dataset particularities could lead to general underperformance. Additionally, the sensitivity analysis on the hyper-parameters shows that the optimal hyper-parameter value depends on the dataset, and there is no easy or intuitive way to tune it.

Final Justification

My final recommendation is the same as my initial recommendation: Acceptance.

After engaging with the authors during the discussion period, two of the weaknesses that I highlighted were addressed. I have also followed the comments from the other reviewers and the authors' answers, and I believe that those were addressed to a good extent as well. I still believe that the method has some limitations: 1) it is too slow (which makes it unusable for 3D images), and 2) it is sensitive to the hyper-parameter values. However, despite these two weaknesses, which the authors acknowledge, I find that this work has sufficient technical novelty to be accepted at NeurIPS.

Paper Formatting Concerns

I have no concerns regarding paper formatting.

Author Rebuttal

We sincerely thank the reviewer for the comprehensive comments and appreciate the recognition of our key contributions: the novelty of extending consistency matching from pixel-level to topological consistency in semi-supervised learning, the thoroughness of our experimental validation across three datasets with extensive baselines and ablation studies, and the practical importance for microscopy imaging, where annotation is time-consuming. We are also grateful for the positive feedback on methodological clarity. We will fix the typos and clarify these statements in the final version.

Q1: Experiments were conducted only once (one random seed); How/which statistical significance tests were used?

A1: Our experiments were conducted with multiple random seeds (as evidenced by the error bars reported in Table 1), and we employed standard deviation calculations across these runs to assess variability. For statistical significance testing, we used unpaired t-tests to compare our method against baseline approaches, with significance levels set at p < 0.05. The error bars in our results represent standard deviations across 55 independent runs with different random initializations, ensuring reproducibility and statistical validity.
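The procedure described above can be sketched as follows. Per-seed scores are summarized as mean ± standard deviation and then compared with an unpaired t-test. This sketch uses only the Python standard library and assumes Student's equal-variance form with a pooled variance estimate. The rebuttal does not say whether Student's or Welch's variant was used, or which tool produced the p-values. The function names are hypothetical.

```python
import math
import statistics


def mean_std(values):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return statistics.mean(values), statistics.stdev(values)


def unpaired_t(a, b):
    """Student's unpaired t statistic with a pooled variance estimate.

    Returns (t, degrees_of_freedom). Turning t into a p-value requires the
    t-distribution CDF (a statistics library or a t-table); this sketch
    stops at the statistic.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pooled variance assumes both methods have the same run-to-run spread.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2
```

With SciPy available, `scipy.stats.ttest_ind(a, b)` returns the same statistic together with a two-sided p-value. Passing `equal_var=False` switches it to Welch's test, which drops the equal-variance assumption.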

Q2: Ablation study on "const" term.

A2: We conducted the ablation study on the “const” term and report the performance below:

Method | Dice_obj ↑ | BE ↓ | BME ↓
w/o const | 0.875 ± 0.008 | 0.285 ± 0.025 | 9.850 ± 0.680
Ours | 0.909 ± 0.005 | 0.188 ± 0.018 | 7.425 ± 0.570

Based on the ablation results and the principles of semi-supervised learning, removing the pixel-wise consistency term during training leads to significant performance degradation across all metrics. Without this foundational constraint, the model relies solely on the limited labeled data and the sparse topological consistency signals, which constrain only specific critical points rather than the complete segmentation boundary.

Q3: The way to tune the hyperparameters is not that easy and intuitive.

A3: The hyperparameter sensitivity observed in our ablation studies (Table 3(a)) reflects careful optimization rather than a limitation. The optimal values ($B_{intra} = B_{temp} = 4$) were systematically determined through grid search on the CRAG dataset and transfer robustly across domains. Importantly, these hyperparameters were tuned exclusively on CRAG and transferred directly to the two additional evaluation datasets without further adjustment, achieving consistent performance improvements.
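The kind of grid search described above can be sketched as a simple loop. This sketch assumes a hypothetical `evaluate(b_intra, b_temp)` callable that trains and validates a model on CRAG with the given values and returns a higher-is-better score such as Dice_obj. It only illustrates the search loop. The rebuttal does not give the search ranges or the selection metric.

```python
import itertools


def grid_search(evaluate, b_intra_values, b_temp_values):
    """Exhaustive search over (B_intra, B_temp) pairs.

    `evaluate` is a hypothetical callable returning a score where higher
    is better. Returns (best_pair, best_score, all_scores).
    """
    scores = {}
    for b_intra, b_temp in itertools.product(b_intra_values, b_temp_values):
        # Each call stands for one full training-plus-validation run.
        scores[(b_intra, b_temp)] = evaluate(b_intra, b_temp)
    best = max(scores, key=scores.get)
    return best, scores[best], scores
```

Every grid point costs a full training run, so a 3 × 3 grid means nine trainings. This is the expense raised as W3 in the reviewer's follow-up comment.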

Q4: Exact running time of this method and a baseline that does not include the PH-based method.

A4: We provide a detailed runtime comparison against a baseline without persistent homology. Our MATCH framework, with dual-level topological consistency, requires 1020.04 ms per training iteration, whereas PMT [8] (a non-PH baseline) requires 582.34 ms per iteration. Our method's total training time is 4h36m, with a peak GPU memory usage of 25.726 GB (batch size 16, UNet backbone). While this computational cost is substantial, it is justified by the significant improvements in topological accuracy, which is essential for reliable histopathology analysis: the clinical cost of topological errors far outweighs the additional computational expense, and our method remains practically feasible on standard single-GPU systems.
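A back-of-the-envelope reading of the two per-iteration timings above follows. This is derived arithmetic, not a figure reported by the authors.

```latex
\frac{1020.04\,\text{ms}}{582.34\,\text{ms}} \approx 1.75,
\qquad
1020.04\,\text{ms} - 582.34\,\text{ms} = 437.70\,\text{ms per iteration}
```

In other words, MATCH needs roughly 75% more time per training iteration than PMT.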

Q5: Which one is actually used for the final method?

A5: We apologize for the confusion. Our final method employs both algorithms hierarchically. MATCH-Pair (Section 3.1) serves as the foundational pairwise matching component, while MATCH-Global (Section 3.2) extends this approach to handle multiple facets: it sequentially applies MATCH-Pair between adjacent prediction pairs and then constructs a global correspondence graph. Specifically, MATCH-Global uses MATCH-Pair's Hungarian algorithm with our spatial-aware similarity metric to establish pairwise correspondences. It then performs breadth-first search on the resulting graph to identify globally consistent topological structures across all MC dropout predictions and temporal snapshots. Therefore, MATCH-Global is the complete matching framework used in our dual-level topological consistency approach, and MATCH-Pair is its core algorithmic building block.
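A minimal sketch of the MATCH-Global step described above follows. Pairwise matches between adjacent predictions become edges of a correspondence graph, and breadth-first search collects the connected structures. The input format is an assumption made for illustration. So is the acceptance rule: a component counts as globally consistent only if it contains exactly one feature from every prediction. The paper's exact criterion may differ. In the real pipeline, the pairwise matches would come from MATCH-Pair.

```python
from collections import defaultdict, deque


def global_correspondences(pairwise_matches, num_predictions):
    """Group matched features into cross-prediction chains via BFS.

    pairwise_matches[(i, j)] is a list of (a, b) pairs: feature a of
    prediction i was matched to feature b of prediction j. Graph nodes
    are (prediction, feature) tuples. Returns the components that contain
    exactly one feature from every prediction (a hypothetical rule).
    """
    graph = defaultdict(set)
    for (i, j), pairs in pairwise_matches.items():
        for a, b in pairs:
            graph[(i, a)].add((j, b))
            graph[(j, b)].add((i, a))

    seen, consistent = set(), []
    for start in list(graph):
        if start in seen:
            continue
        # Standard BFS over one connected component.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        preds = {p for p, _ in component}
        if len(component) == num_predictions and len(preds) == num_predictions:
            consistent.append(sorted(component))
    return consistent
```

Consider three predictions with matches only between neighbours (0, 1) and (1, 2). A structure survives only if its chain of pairwise matches reaches all three predictions. A structure matched between predictions 0 and 1 but lost in prediction 2 is discarded.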

Comment

After reading the authors' answer, two of four main weaknesses that I found can be crossed out, as (W1) the experiments were run multiple times and (W2) the third term of the loss function was also subject to an ablation study.

The other two weaknesses remain. (W3) The method seems to be sensitive to the hyper-parameter values, and the hyper-parameters are not intuitive to optimize, i.e., one needs a very expensive grid search to tune them. (W4) The method is very expensive to run, which is to be expected since it relies on persistent homology. The main issue with W4 is that it is probably unrealistic to use the method with 3D images. On the other hand, it was developed for and evaluated on 2D histopathology images.

I have also read the other reviews and the authors' responses. I will remain active during the rebuttal period, and despite the two aforementioned weaknesses (W3, W4), I still recommend acceptance.

Comment

Thank you very much for your follow-up. We are pleased that we have resolved your concerns W1 and W2. Regarding the remaining points:

Hyper-parameter sensitivity (W3): We have found that performance remains stable within a modest range around the default values and will add these recommended settings, along with brief tuning guidelines, to the final version.

Computational cost (W4): While the current implementation is designed for 2D histopathology images, we fully recognize the need for greater efficiency, especially for potential 3D applications, and will actively explore lighter-weight or approximate persistence modules in future work.

We appreciate your willingness to recommend acceptance in spite of these remaining issues, and we hope the planned clarifications and optimizations will mitigate your concerns. Thank you again for your constructive feedback and for the time you have invested in improving our work.

Official Review
Rating: 4

This paper proposes MATCH, a semi-supervised segmentation framework specifically designed for histopathology images. It effectively mitigates topological errors by imposing adaptive topological consistency constraints. A central and novel aspect of this approach is the ability to discern reliable topological structures from multiple perturbed predictions, eliminating the reliance on fixed persistence thresholds, a requirement in prior methods such as TopoSemiSeg. The framework innovatively introduces a dual-level topological consistency mechanism. This includes intra-topological consistency across Monte Carlo dropout predictions and temporal-topological consistency across training snapshots. Moreover, the method utilizes novel matching algorithms, namely MATCH-Pair and MATCH-Global. These algorithms ingeniously integrate spatial overlap, topological persistence, and spatial proximity to accurately align topological structures across different predictions. Comprehensive experiments conducted on three histopathology datasets, namely CRAG, GlaS, and MoNuSeg, vividly demonstrate the superiority of MATCH. The results show notable improvements in both pixel-wise and topology-wise metrics when compared to existing state-of-the-art semi-supervised methods.

Strengths and Weaknesses

Strengths

  1. The paper effectively overcomes a crucial shortcoming of existing topology-aware semi-supervised approaches by eliminating the need for hand-picked persistence thresholds. The adaptive identification of meaningful topological structures is well-motivated and solid from a technical perspective.

  2. The dual-level consistency framework is well-designed, combining intra-prediction consistency (MC dropout) with temporal consistency across training epochs. The MATCH-Pair and MATCH-Global algorithms effectively integrate multiple criteria (spatial overlap, persistence, proximity) for robust structure matching.

  3. The evaluation is comprehensive, covering three datasets with both pixel-wise and topology-specific metrics. The consistent improvements across different label ratios (10%, 20%) and datasets demonstrate the method's effectiveness.

Weaknesses

  1. Although the method purports to eschew hand-selected thresholds, it incorporates several hyperparameters (e.g., $\tau_{primary}$) whose sensitivity has not been comprehensively analyzed. The ablation studies are limited and do not cover all crucial parameters.
  2. The paper fails to sufficiently discuss the computational cost of the matching algorithms. In particular, MATCH-Global, which necessitates solving multiple Hungarian assignment problems, might impose a substantial constraint on practical implementation.
  3. The elaboration and pre-explanation of the theory are inadequate. Specifically, the significance of "Birth" and "Death" depicted in Figure 1 has not been expounded upon. It is further recommended to specify what exactly "dual-level" refers to in Figure 3.
  4. The paper has multiple grammatical mistakes and ambiguous expressions. For instance, "topolgoical" is misspelled in line 62, and there are issues with inconsistent notation. Also, in certain sections, the mathematical notation could be made more accurate.

Questions

  1. Under what conditions does the matching algorithm fail? Can you provide examples where MATCH-Global produces incorrect correspondences and discuss potential mitigation strategies?
  2. It is stated that a novel integration of topological reasoning into SSL is one of the key contributions (Line 66). Nevertheless, the approach presented in this paper appears to be an enhancement of the TopoSemiSeg [57] approach. More evidence should be provided to demonstrate the innovativeness of the integration of topological reasoning and SSL in this paper.
  3. Given that there are some matching algorithms in this paper, could you please provide a detailed computational complexity analysis along with runtime comparisons against well-known or relevant baseline methods?
  4. Could you elaborate on whether and how the Monte Carlo (MC) dropout rate affects the proposed algorithm?

Limitations

The limitations section needs expansion in the main paper to include computational and methodological constraints.

Final Justification

The authors have effectively resolved my concerns in the rebuttal.

Paper Formatting Concerns

No Concerns.

Author Rebuttal

We sincerely thank the reviewer for recognizing the key contributions of our work, including the adaptive topological consistency that removes the need for fixed thresholds, the well-designed dual-level consistency framework, and the effectiveness of our MATCH-Pair and MATCH-Global algorithms. We also appreciate your acknowledgment of our comprehensive evaluation and the consistent performance gains across datasets. We address your concerns point by point below.

Q1: Ablation study on the sensitivity to $\tau_{primary}$.

A1: We provide an ablation study on the sensitivity to $\tau_{primary}$ below. The results show that our method is robust to the choice of $\tau_{primary}$. Moreover, the low threshold of 0.1 was chosen to be inclusive rather than restrictive. It allows more potential matches to be considered valid, while the Hungarian algorithm determines the optimal assignments based on our comprehensive similarity metric (combining spatial overlap, persistence weights, and proximity). This design aligns with our adaptive approach: rather than using a high threshold to filter matches aggressively, we use a permissive threshold and rely on the matching algorithm to identify the truly meaningful correspondences. This ensures that potentially relevant topological structures are not excluded prematurely, which would contradict our core contribution of avoiding hand-picked filtering thresholds.

$\tau_{primary}$ | Dice_obj ↑ | BE ↓ | BME ↓ | DIU ↓
0.05 | 0.906 ± 0.006 | 0.195 ± 0.019 | 7.850 ± 0.620 | 41.750 ± 1.850
0.1 (current) | 0.909 ± 0.005 | 0.188 ± 0.018 | 7.425 ± 0.570 | 40.250 ± 1.720
0.2 | 0.908 ± 0.005 | 0.191 ± 0.020 | 7.680 ± 0.590 | 41.100 ± 1.780
0.3 | 0.905 ± 0.006 | 0.201 ± 0.021 | 8.150 ± 0.650 | 42.850 ± 1.920
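A minimal, self-contained sketch of a MATCH-Pair-style step with a permissive $\tau_{primary}$ follows. The rebuttal states only two things about the step. First, the similarity combines spatial overlap, persistence, and proximity. Second, pairs below $\tau_{primary}$ are rejected while an optimal assignment is chosen. Everything else here is a hypothetical choice for illustration, not the authors' formula:

- the feature format (a dict with `birth`, `death`, a pixel `mask`, and a `centroid`);
- the weights `w`;
- the persistence-ratio term;
- the proximity kernel exp(-d / sigma).

Exhaustive search over permutations stands in for the Hungarian algorithm. It reaches the same optimum but is only practical for a handful of features. At realistic sizes, `scipy.optimize.linear_sum_assignment(cost, maximize=True)` is the usual choice.

```python
import itertools
import math


def iou(mask_a, mask_b):
    """IoU of two spatial masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


def similarity(fa, fb, w=(0.5, 0.25, 0.25), sigma=10.0):
    """Hypothetical combined score: overlap + persistence agreement + proximity."""
    pers_a, pers_b = fa["death"] - fa["birth"], fb["death"] - fb["birth"]
    hi = max(pers_a, pers_b)
    pers_sim = min(pers_a, pers_b) / hi if hi > 0 else 1.0
    proximity = math.exp(-math.dist(fa["centroid"], fb["centroid"]) / sigma)
    return w[0] * iou(fa["mask"], fb["mask"]) + w[1] * pers_sim + w[2] * proximity


def match_pair(feats_a, feats_b, tau_primary=0.1):
    """One-to-one matching maximizing total similarity of valid pairs.

    Pairs scoring below tau_primary are treated as invalid (they add
    nothing and are dropped); the assignment then picks the best of the rest.
    """
    if len(feats_a) > len(feats_b):  # enumerate along the smaller side
        flipped = match_pair(feats_b, feats_a, tau_primary)
        return sorted((i, j) for j, i in flipped)
    sim = [[similarity(a, b) for b in feats_b] for a in feats_a]
    best_total, best_pairs = -1.0, []
    for cols in itertools.permutations(range(len(feats_b)), len(feats_a)):
        pairs = [(i, j) for i, j in enumerate(cols) if sim[i][j] >= tau_primary]
        total = sum(sim[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs
```

A low `tau_primary` only discards clearly implausible pairs. Choosing among the remaining candidates is left to the assignment step, which mirrors the "inclusive rather than restrictive" argument above.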

Q2: Computational cost of the matching algorithms.

A2: While MATCH-Global requires solving multiple Hungarian assignment problems, our method remains computationally feasible with a training time of 1020.04 ms per iteration compared to 610.80 ms for TopoSemiSeg. The peak GPU memory usage is 25.726 GB during training (batch size 16) and 8.49 GB during inference, enabling deployment on single-GPU systems.

Q3: Inadequate elaboration and pre-explanation of the theory; multiple grammatical mistakes.

A3: We acknowledge these critical concerns and will address them in the final version. Regarding theoretical exposition, we provide detailed explanations of the "Birth" and "Death" critical points in persistent homology in the Supplementary materials. We will move this foundational content to the main paper before introducing these concepts, while also enhancing Figure 3 to explicitly clarify that "dual-level" refers to our intra-topological consistency (across Monte Carlo dropout predictions) and temporal-topological consistency (across training snapshots). We will also carefully correct all grammatical errors and ensure consistent mathematical notation throughout the manuscript.

Q4: The failure cases of the matching algorithm.

A4: Our proposed algorithm may fail in the following cases: First, when dealing with highly fragmented or merged structures due to poor image quality or preprocessing artifacts, the flood-fill algorithm used to generate spatial masks may produce unreliable region boundaries, leading to inaccurate IoU calculations and subsequent mismatches. Second, in severe topological noise where numerous spurious connected components with similar persistence values are generated, the algorithm might struggle to distinguish between meaningful biological structures and artifacts, particularly when spatial proximity alone is insufficient for disambiguation. Additional analysis and visualizations will be provided in the final version.

Q5: More evidence should be provided to demonstrate the innovativeness of integrating topological reasoning and SSL in this paper.

A5: We acknowledge that TopoSemiSeg [57] pioneered the integration of topological reasoning into SSL frameworks. However, our approach introduces several fundamental innovations that distinguish it from existing work. As reviewer ZP1X noted, our method makes novel contributions by addressing a core limitation of TopoSemiSeg: its reliance on fixed, hand-picked persistence thresholds. Our key innovations are:
(1) a data-driven approach that identifies meaningful topological structures through multi-faceted predictions rather than predetermined thresholds;
(2) dual-level topological consistency, which combines intra-prediction consistency (across MC dropout predictions) with temporal consistency across training snapshots;
(3) the MATCH-Pair and MATCH-Global algorithms, which integrate spatial overlap, topological persistence, and spatial proximity for robust structure matching without ground truth.
These contributions represent a paradigm shift from threshold-based topological filtering to adaptive, perturbation-based identification of stable topological features, fundamentally advancing how topological reasoning is integrated into SSL frameworks. The experimental results show that our approach significantly outperforms TopoSemiSeg across all topology-wise metrics while maintaining comparable pixel-wise performance, validating the effectiveness of our novel integration approach.

Q6: Computational complexity analysis along with the runtime comparisons of the matching algorithms.

A6: The computational complexity of our MATCH-Pair algorithm consists of several components:
- Persistence diagram computation requires $O(n^3)$ time using the reduction algorithm, where $n = MN$ is the number of pixels.
- Spatial mask generation via flood fill takes $O(n)$ time.
- Computing the similarity matrix between the $k_1$ and $k_2$ topological features of two persistence diagrams requires $O(k_1 \times k_2)$ operations for the IoU and Euclidean-distance calculations.
- The Hungarian algorithm for optimal assignment has $O(\min(k_1, k_2)^3)$ complexity.
The overall complexity is therefore $O(n^3 + k_1 k_2 + \min(k_1, k_2)^3)$.
In comparison, Wasserstein matching (used in TopoSemiSeg) requires $O(n^3 + k^3)$ for computing the persistence diagrams and the matching. Betti matching has $O(n^3 + n^2 + k^2)$ complexity, arising from super-level filtration embedding and overlapping-feature identification. Since $k \ll n$ in practice, all methods share the dominant $O(n^3)$ term from barcode computation. MATCH-Pair achieves superior matching accuracy with only a marginal additional cost in the matching phase.

Q7: How does the dropout rate of MC-dropout affect the algorithm proposed in this paper?

A7: We add a complementary ablation study on the MC-dropout rate, keeping all other settings unchanged. The experiments are conducted on CRAG with 20% labeled data, and the performance is reported below:

Dropout rate | Dice_obj ↑ | BE ↓ | BME ↓
10% | 0.898 ± 0.006 | 0.210 ± 0.020 | 8.200 ± 0.650
20% (current) | 0.909 ± 0.005 | 0.188 ± 0.018 | 7.425 ± 0.570
30% | 0.910 ± 0.005 | 0.185 ± 0.017 | 7.350 ± 0.560
50% | 0.890 ± 0.007 | 0.220 ± 0.022 | 8.800 ± 0.720

The results reveal an optimal dropout-rate range of 20% to 30% for our framework, within which performance plateaus with minimal differences between rates. Lower dropout rates provide insufficient perturbation diversity for reliable topological matching. In contrast, excessive dropout introduces detrimental noise that degrades both pixel-wise and topology-wise performance. This confirms that moderate stochasticity is essential for effective topological consistency estimation.
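A toy sketch of why the dropout rate controls perturbation diversity follows. The same input passes T times through a hypothetical linear head with inverted dropout kept active, and the spread of the outputs is measured. For an input value v and drop rate p, one inverted-dropout term has variance v^2 * p / (1 - p), so the spread grows with the rate. The head, the sample count, and the function names are illustrative assumptions, not the authors' architecture.

```python
import random
import statistics


def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale survivors by 1/(1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def mc_dropout_spread(features, weights, rate, num_samples=8, seed=0):
    """Run a toy linear head num_samples times with dropout left on.

    Returns (mean, standard deviation) of the stochastic outputs; the
    spread is the perturbation diversity that consistency terms rely on.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_samples):
        dropped = inverted_dropout(features, rate, rng)
        outputs.append(sum(w * x for w, x in zip(weights, dropped)))
    return statistics.mean(outputs), statistics.stdev(outputs)
```

With `rate=0.0` every sample is identical and the spread is zero, which is the extreme form of the "insufficient perturbation diversity" described above.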

Comment

Thank you for the authors' reply. Most of my concerns have been addressed. I still have one question that I would like the authors to answer. As can be seen from the ablation experiment in Table 2a, the performance of the proposed method is greatly affected by the matching algorithm. However, the baseline TopoSemiSeg (which uses Wasserstein matching) performs significantly better than the proposed method does when it uses Wasserstein matching. How can it be demonstrated that the other designs of the proposed method (such as dual-level topological consistency) are effective?

Comment

Dear Reviewer Hseg:

Thank you for your thorough review and the valuable feedback you provided. We have tried to address all the concerns and questions you raised by providing additional experimental results, detailed clarifications, and comprehensive responses to each point you mentioned.

We hope these additions have adequately addressed your concerns. If you have any remaining questions or would like further clarification on any aspect of our work, we will be happy to provide additional details or experiments.

Thank you for your time and consideration.

Best,

Authors of paper #11015

Comment

Dear Reviewer Hseg,

Thank you for raising this point. The comparison in Table 2a separates two independent elements of our framework: the choice of matching algorithm and the use of dual-level topological consistency.

Why does Wasserstein matching not perform well in our setting? When we substitute Wasserstein matching into our pipeline, performance drops because our method relies on repeated correspondences across many pairs of perturbed predictions and training snapshots. Wasserstein distance compresses spatial information into persistence summaries; after successive applications, small localisation errors accumulate and weaken the ensuing consistency losses.

Instead, our MATCH-Pair strategy retains persistence but augments it with IoU and spatial-proximity cues, producing markedly more faithful correspondences. Table 2b confirms that each cue is essential, as removing either leads to a clear degradation across both pixel-wise and topology-wise metrics. To isolate the benefit of dual-level consistency itself, we present the ablation study in Table 3b. The resulting drop in segmentation quality and rise in topological errors demonstrate that spatially accurate matching is only one ingredient, and that enforcing consistency across both perturbations and time adds complementary gains. These observations indicate that the reduced performance seen with Wasserstein inside our pipeline does not undermine the dual-level design; rather, they highlight the necessity of a more precise matching algorithm when consistency is enforced repeatedly.
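The following toy illustration of the argument above makes several simplifying assumptions. Each feature is reduced to a persistence point plus a centroid. The "diagram-only" cost is the L-infinity distance between persistence points, the ground metric commonly used for bottleneck and Wasserstein distances between diagrams. The weight `lam` is hypothetical. This is not the paper's matching procedure. It only demonstrates that a persistence-only cost can tie between spatially distant structures, while a spatial term breaks the tie.

```python
import math


def diagram_cost(fa, fb):
    """L-infinity distance between persistence points (birth, death) only."""
    return max(abs(fa["birth"] - fb["birth"]), abs(fa["death"] - fb["death"]))


def spatial_cost(fa, fb, lam=0.1):
    """Diagram cost plus a hypothetical centroid-distance penalty weighted by lam."""
    return diagram_cost(fa, fb) + lam * math.dist(fa["centroid"], fb["centroid"])


def best_match(query, candidates, cost):
    """Index of the cheapest candidate; ties resolve to the first one listed."""
    return min(range(len(candidates)), key=lambda k: cost(query, candidates[k]))
```

Take two candidates with the same (birth, death) as the query, one far away and one adjacent. The diagram-only cost cannot distinguish them and returns whichever comes first, whereas the spatially aware cost selects the adjacent one. In a single comparison this ambiguity is harmless. Repeated across many prediction pairs and snapshots, however, it is the kind of localisation error the response describes as accumulating.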

Sincerely,

Authors

Comment

Thank you for the authors' response. I do not have any further questions.

Comment

Dear Reviewer Hseg,

We are very glad that we have addressed your concerns. If you have not already done so, we would greatly appreciate it if you could consider re-evaluating our work based on the additional clarifications.

Thanks again for your time and engagement in the discussion phase.

Sincerely,

Authors

Official Review
Rating: 5

The authors introduce MATCH, a semi-supervised framework for medical image segmentation based on a teacher-student network architecture. The student learns from both a supervised loss on labeled data (standard Dice + cross-entropy) and a topological-consistency loss on unlabeled data. In particular, the framework uses dual-level topological consistency via stochastic predictions from Monte Carlo dropout and temporal training snapshots. This process allows it to learn useful representations.

Strengths and Weaknesses

Strengths: The topological consistency is enforced by MATCH-pair, a Hungarian overlap-matching algorithm that integrates spatial overlap, topological persistence, and spatial proximity, which is rather interesting and well-motivated.

The contribution grounded in geometric intuition is supported by good uncertainty maps (Figure 5) and improved segmentation results (Figure 6).

The evaluation on three datasets is also quite thorough, and the proposed model seems to consistently outperform others.

Weaknesses: The results are only marginally better than the best current method in terms of segmentation quality. The Dice score is often just 0.001–0.006 higher than TopoSemiSeg, which is the main weakness in my opinion.

The computational cost of the topological loss in terms of model training and inference is not clearly described. How does the cost-value ratio look, particularly considering the marginal improvement in performance?

The authors do not compare their method with self-supervised approaches like SAM and its variants such as CellViT. Fine-tuning an SSL method on limited labeled data is also considered semi-supervision.

The novelty of the proposed method is not very high, as it is a combination of modules that are based on existing methods.

Questions

What is the computational cost of the topological loss in terms of speed and memory consumption, particularly compared to other competitive methods like TopoSemiSeg, which uses Betti matching?

The main results in Table 1 do not show where your method saturates (there is still quite a difference between the 20% labeled and fully supervised results). How much can the unsupervised loss replace the contribution of supervised loss in training?

How is the generated uncertainty map correlated to real-world segmentation uncertainty, e.g., segmentation annotations by two experts?

Limitations

One limitation I can think of is that choosing the model's hyperparameters is somewhat equivalent to manually setting the thresholds in TopoSemiSeg, which the authors criticise and present as a primary motivation for their method. How stable are those hyperparameters?

Final Justification

The authors provided a strong rebuttal, and I am inclined to raise my score to accept.

Paper Formatting Concerns

I did not find any

Author Rebuttal

We sincerely thank the reviewer for recognizing our proposed MATCH-pair algorithm, highlighting its integration of spatial overlap, topological persistence, and spatial proximity. The reviewer also acknowledged our method's geometric intuition, supported by clear uncertainty visualizations and improved segmentation performance.

Q1: Marginal improvement on pixel-level performance.

A1: We acknowledge the reviewer’s observation that the improvement in Dice scores over TopoSemiSeg is modest; however, as shown in Table 1, our primary objective was not simply to enhance an already-saturated pixel-wise metric but rather to restore topological fidelity without compromising segmentation accuracy. Even minor topological errors involving only a few pixels can significantly impact downstream analyses, thus emphasizing the importance and practical relevance of the improvements achieved by our approach.

Q2: The computational cost of the topological loss regarding speed and memory consumption.

A2: We appreciate the reviewer's question regarding the computational cost of our topological loss. Our proposed method requires 1020.04 ms per training iteration and a GPU memory usage of 25.726 GB, while TopoSemiSeg consumes 610.80 ms per iteration and 15.235 GB GPU memory under comparable experimental settings.
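The relative overhead implied by these figures is shown below. This is derived arithmetic, not a number reported by the authors.

```latex
\frac{1020.04\,\text{ms}}{610.80\,\text{ms}} \approx 1.67,
\qquad
\frac{25.726\,\text{GB}}{15.235\,\text{GB}} \approx 1.69
```

In other words, MATCH takes about 67% more time per training iteration and about 69% more peak GPU memory than TopoSemiSeg, an absolute increase of 409.24 ms and 10.491 GB.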

However, we respectfully clarify that TopoSemiSeg does not use Betti matching. It employs optimal matching based on the Wasserstein distance between persistence diagrams, which does not account for spatial correspondences. In contrast, our dual-level topological consistency explicitly incorporates spatial correspondence. This justifies the additional computational overhead, because it achieves better preservation of topological structures, which is critical for accurate downstream analyses.

Q3: Comparison with self-supervised methods finetuned on limited labeled data.

A3: We used LoRA to fine-tune SAM and MedSAM with 20% labeled data on the CRAG dataset and report the performance below:

Method | Dice_obj ↑ | BE ↓ | BME ↓
LoRA-SAM-CRAG-20% | 0.882 ± 0.006 | 0.440 ± 0.042 | 27.300 ± 2.937
LoRA-MedSAM-CRAG-20% | 0.898 ± 0.005 | 0.268 ± 0.025 | 11.275 ± 1.899
Ours | 0.909 ± 0.005 | 0.188 ± 0.018 | 7.425 ± 0.570

The results show that even with powerful foundation models, like SAM or MedSAM, topological errors can still exist without explicit topological modeling.

Q4: The novelty of the proposed method is not very high.

A4: Our work introduces several key innovations that advance semi-supervised histopathology segmentation beyond existing approaches. While prior topology-aware methods like TopoSemiSeg rely on fixed, hand-picked persistence thresholds that may exclude relevant structures or retain irrelevant ones, we propose the first adaptive topological consistency framework that automatically identifies meaningful structures without human-selected parameters. We introduce dual-level topological consistency by uniquely combining intra-prediction consistency (across Monte Carlo dropout realizations) with temporal consistency (across training snapshots), providing a more comprehensive approach to structural stability than single-level methods. Furthermore, we develop MATCH-Pair and MATCH-Global algorithms that integrate spatial overlap, topological persistence, and spatial proximity for robust feature matching across multiple predictions, addressing a critical limitation where existing methods like Wasserstein matching produce ambiguous correspondences due to the lack of spatial awareness. This multi-faceted adaptive approach represents a significant departure from threshold-dependent methods, enabling more robust identification of biologically meaningful topological structures while naturally providing uncertainty estimation as a byproduct of the consistency mechanism.

Q5: The main results in Table 1 do not show where your method saturates (there is still quite a difference between the 20% labeled and fully supervised results). How much can the unsupervised loss replace the contribution of the supervised loss in training?

A5: The observed performance gap between our method (trained with only 20% labeled data) and the fully supervised baseline reflects constraints inherent to digital pathology, where extensive, high-quality annotations are rarely feasible. Given this practical limitation, our method effectively leverages unlabeled data through dual-level topological consistency losses, demonstrating a substantial capability to extract meaningful and robust representations. Although the results indicate potential for further gains with increased supervision, the significant improvement achieved underscores the efficacy of the unsupervised components in partially substituting for supervised signals and enhancing generalizability, particularly when abundant annotations remain inaccessible.

Q6: How is the generated uncertainty map correlated to real-world segmentation uncertainty, e.g., segmentation annotations by two experts?

A6: While direct validation of uncertainty maps against ground truth segmentation uncertainty from multiple expert annotations would be ideal, such multi-annotator datasets are incredibly challenging to obtain in pathology due to the inherent difficulty and cost of pathological labeling. Consequently, we employ indirect validation methods to assess the correlation between our generated uncertainty maps and real-world segmentation uncertainty. These indirect approaches include computing Pearson correlation coefficients between predicted uncertainty and segmentation performance metrics and demonstrating the utility of uncertainty estimates through downstream applications such as uncertainty-guided active learning or segmentation refinement, where improved task performance is a proxy measure for uncertainty map reliability.
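One way the first indirect check could be set up is sketched below. It assumes that the per-pixel uncertainty is the binary entropy of the mean of T Monte Carlo dropout probability maps, and that the per-image error is 1 - Dice of the thresholded mean prediction. These choices and all function names are illustrative assumptions, not the exact protocol used here.

```python
import numpy as np


def predictive_entropy(samples, eps=1e-8):
    """Pixel-wise binary entropy of the mean over T stochastic predictions.

    samples: array of shape (T, H, W) holding foreground probabilities.
    """
    p = np.clip(samples.mean(axis=0), eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def dice(pred, gt, eps=1e-8):
    """Pixel-wise Dice between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)


def uncertainty_error_correlation(sample_sets, gts):
    """Pearson r between per-image mean uncertainty and per-image error (1 - Dice)."""
    unc = [predictive_entropy(s).mean() for s in sample_sets]
    err = [1 - dice(s.mean(axis=0) > 0.5, g) for s, g in zip(sample_sets, gts)]
    return float(np.corrcoef(unc, err)[0, 1])
```

A strongly positive r would indicate that images the model is less certain about are also the ones it segments worse, which is the property the uncertainty map is meant to expose.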

Q7: Sensitivity to hyperparameter tuning.

A7: In our evaluations, the sensitivity analysis (Tables 3a and 3b) shows that performance degrades gracefully across a wide parameter range, demonstrating robustness. Additional experiments, such as the ablation studies on the dropout rate and $\tau_{primary}$ (Reviewer Hseg, Q1 & A1, Q7 & A7), also show the robustness of our method.

Comment

Dear Reviewer iCZE:

Thank you for taking the time to help improve the quality of our manuscript. We hope our responses have fully addressed your concerns. Please let us know if you have any remaining questions or concerns; we will be happy to provide further details and clarifications.

Thank you for your time.

Best,

Authors of paper #11015

Comment

Thanks to the authors for the nice rebuttal. I think it has addressed most of my concerns. I have one comment regarding the authors' claim: "Even minor topological errors involving only a few pixels can significantly impact downstream analyses, thus emphasising the importance and practical relevance of the improvements achieved by our approach." I agree with the statement, but would like to see more concrete evidence, if possible. For example, in cell counting, mis-segmentation of a few pixels at cell boundaries can lead to cells merging or splitting, resulting in counting errors. Or, if the primary analysis concerns cell/gland shape statistics, the difference in topology is also essential. Can the authors provide such a downstream analysis?

Comment

Dear Reviewer iCZE:

Your suggestion makes total sense, and we performed a cell counting study on the same MoNuSeg test cohort. We used connected component analysis to identify the cells and computed the ground-truth total cell count, the predicted total cell count, and the absolute counting error (mean ± std). The results are shown below:

Method | Total GT Cell Count | Predicted Cell Count | Absolute Counting Error (Mean ± Std) | Dice_obj
--- | --- | --- | --- | ---
PMT [8] | 6024 | 8106 | 148.71 ± 99.41 | 0.778 ± 0.006
TopoSemiSeg [57] | 6024 | 7877 | 132.36 ± 56.09 | 0.793 ± 0.004
MATCH | 6024 | 7511 | 106.21 ± 49.30 | 0.790 ± 0.006

Note that the Total GT Cell Count and Predicted Cell Count are reported for the entire test cohort, while the Absolute Counting Error is reported on a per-image basis (with a total of 14 test images).

We observed that our method yields noticeably smaller counting errors than both baselines (one topology-aware method and one non-topological method). This confirms that, although the pixel-wise segmentation performance is comparable, fixing topological errors that involve only a few pixels leads to more accurate biological readouts. We will add this analysis to the revised version.
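A minimal sketch of this counting protocol is given below. It assumes that each prediction and ground truth is a binary mask, that cells are 4-connected components (SciPy's default structure for `ndimage.label` in 2-D), and that the standard deviation is the population value (`ddof=0`); the helper names are hypothetical. If instance-level ground truth is available, counting its distinct instance ids would be more faithful for touching nuclei than labeling connected components of the binary mask.

```python
import numpy as np
from scipy import ndimage


def count_cells(mask):
    """Number of 4-connected foreground components in a binary mask."""
    _, n = ndimage.label(mask)  # default 2-D structure is 4-connectivity
    return n


def counting_summary(pred_masks, gt_masks):
    """Cohort totals plus per-image absolute counting error (mean, std)."""
    gt_counts = np.array([count_cells(g) for g in gt_masks])
    pred_counts = np.array([count_cells(p) for p in pred_masks])
    abs_err = np.abs(pred_counts - gt_counts)
    return {
        "total_gt": int(gt_counts.sum()),
        "total_pred": int(pred_counts.sum()),
        "mae": float(abs_err.mean()),
        "std": float(abs_err.std()),  # population std (ddof=0)
    }
```

Note that cohort totals can hide compensating errors (one image over-counted, another under-counted), which is why the per-image absolute error is the more informative column.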

Sincerely,

Authors

Comment

Many thanks for the additional experimental results in such a short time. They address my concerns, and I will raise my score.

Comment

Thank you very much for your positive feedback, and we’re glad the additional results addressed your concerns. We appreciate your support and consideration.

Official Review
4

The authors aim to improve histopathology image segmentation when unlabeled data predominate during training. To this end, they propose a Semi-Supervised Learning (SSL) image segmentation framework that uses topological reasoning to make segmentation predictions robust to perturbations. In particular, the authors model perturbations with a Monte Carlo dropout mechanism and by considering predictions from different training epochs. For topological consistency, they formulate the structure correspondence task as a contrastive learning problem that identifies stable features by matching them according to spatial overlap, topological persistence, and spatial proximity, both between pairs of predictions (via MATCH-Pair) and across multiple predictions (via MATCH-Global). The method is validated on binary gland and cell segmentation across 3 different datasets with different percentages of labeled data, and an ablation study supports the analysis of the results.

Strengths and Weaknesses

Strengths:

  • The proposed method appears technically sound. Even though it does not introduce an entirely new method (SSL, the contrastive learning paradigm, persistence diagrams, Monte Carlo dropout perturbations, consistency between different temporal training views, and Hungarian matching logic are already known techniques widely used in the state of the art), their context of use and combination in this work appear original and more efficient than other state-of-the-art methods (e.g., exploiting the structures that persist across perturbations avoids hand-picking a threshold to determine reliable structures, as happens in TopoSemiSeg). Notably, MATCH-Pair and MATCH-Global can be considered a smart glue for defining valid matches between 2 persistence diagrams and across multiple facets, respectively.

Weaknesses:

  1. The authors state that they use a pretrained model to initialize the SSL training. However, it is unclear whether this pretraining was performed using only the limited labeled data available to the student model, or if additional labeled data were used. If the teacher model is pretrained on a larger labeled dataset than what is available during the SSL phase, this would violate the core assumption of semi-supervised learning, namely, that only a small fraction of labeled data is accessible. In such a case, the observed performance gains may stem from the additional supervision during pretraining rather than from the proposed SSL strategy itself.
  2. The authors state that capturing meaningful semantic structures from unlabeled data is essential, particularly in scenarios where objects are densely distributed (l. 3). They also acknowledge that such dense distributions often lead to topological errors (l. 21). This is indeed a critical aspect in histopathology image segmentation. However, despite choosing the MoNuSeg dataset, which contains patches with densely packed nuclei, the authors do not analyze how the model's performance varies with respect to object crowding.
  3. The authors evaluate their method on a combination of binary (CRAG) and inherently multiclass (GlaS and MoNuSeg) datasets, but convert all tasks into binary segmentation. While the decision to reduce the problem to a binary setting is understandable given the technical challenges of extending semi-supervised learning to multiclass segmentation, it nonetheless limits the scope of the evaluation and the relevance of the results to real-world histopathology applications. More critically, the method does not account for instance-level structure, despite the authors’ emphasis on the challenges posed by densely packed cellular objects. Unlike the multiclass case, instance-level information is essential for properly resolving overlapping nuclei, and omitting this consideration weakens the effectiveness of the proposed topological reasoning. Topological consistency alone may struggle to disambiguate closely spaced structures without explicit instance-level constraints.
  4. The variables $d_i$ and $b_i$, first introduced at line 147, are never explicitly defined in the manuscript. While their meaning may be inferred by readers familiar with the persistent homology literature (the birth and death points in the persistence diagrams), this lack of definition may hinder understanding for a broader audience. For clarity and completeness, all variables, especially those central to the method, should be explicitly defined when first introduced.
  5. The authors do not clearly specify the source of the likelihood maps $lh_1$ and $lh_2$ (l. 147), specifically, whether these correspond to the output of a sigmoid activation applied to the final layer of the UNet. Clarifying this detail would improve both the clarity and reproducibility of the method, as the exact definition of these maps is critical for understanding how predictions are calibrated and how consistency is enforced during training.
  6. It is unclear whether the index notation $i, j$ used at line 153 is correct in the given context. Additionally, the indices $i, j, k$ are never explicitly defined in the manuscript, which may cause confusion, especially for readers trying to follow the mathematical formulation in detail. Providing a clear definition of the indexing convention would improve the rigor and readability of the method description.

Minor Weaknesses:

  1. Line 62: typo "topolgoical".
  2. Since Figure 3 is referenced before $L_{intra}$ and $L_{temp}$ are introduced, their names should be restated in the caption.
  3. Line 177: typo, missing "{".
  4. Line 177: fragmented sentence "G identified…".
  5. Line 196: formatting inconsistency with the equation at line 195: $P^{(t)}bt, i$ and $P^{(t)}dt, i$.

Questions

  1. Could the authors clarify whether the teacher model used in the SSL framework was pretrained using only the same limited set of labeled data available to the student? If additional labeled data were used during pretraining, how do the authors justify this in the context of a semi-supervised learning setup, where strict limitations on label availability are assumed?
  2. Given the authors’ emphasis on the challenges posed by densely packed structures, have they evaluated how the model’s performance varies with object density (e.g., comparing sparse vs. crowded regions in MoNuSeg)? Would the authors consider including a stratified analysis or crowding-aware ablation to validate their method's robustness under varying tissue densities?
  3. While simplifying all tasks to binary segmentation may ease implementation, have the authors considered extending or evaluating their method in the multiclass setting, particularly for GlaS and MoNuSeg? More critically, given the importance of separating overlapping nuclei in histopathology, why was instance-level structure not considered, either during training or evaluation? Could the topological matching mechanism benefit from incorporating explicit instance constraints to better resolve ambiguous or overlapping objects?
  4. Could the authors define the variables $d_i$ and $b_i$ more clearly when first introduced (l. 147), especially for readers unfamiliar with persistent homology?
  5. What is the precise definition and origin of the likelihood maps $lh_1$ and $lh_2$? Do these correspond to the sigmoid-activated outputs of the final UNet layer, or are they derived differently?
  6. At line 153, is the use of the index notation $i, j$ consistent with the dimensions and semantics of the features being described? Furthermore, could the authors clarify the role of the indices $i, j, k$, as they are used without explicit definition and may confuse readers interpreting the formulation?

Limitations

The authors have adequately addressed the potential negative societal impact of their work.

Final Justification

After reading the authors’ responses, I find that five of the six main weaknesses I previously identified have been satisfactorily addressed:

(W1) has been verified through the authors' clarification; (W2) is now supported by new experiments; (W4–W6) are resolved by clearer notation and a commitment to update the final manuscript accordingly.

One point of improvement remains: (W3) pertains to the scope of the method, which is currently explicitly developed for binary segmentation. While binary segmentation is indeed a relevant and useful task in computational pathology, expanding to multiclass, instance (or even panoptic) segmentation would enable explicit counting, differentiation and classification between various tissue structures, support quantitative analysis, and facilitate tissue phenotyping, key tasks in digital pathology. The authors have provided appreciated multiclass segmentation results in their response and evidence of implicit instance segmentation effects provided by their strategy, but the model could be further improved by incorporating explicit instance segmentation strategies, which they acknowledge (with the proposal of potential strategies) as future work.

I have also read all other reviews and the authors' responses. Considering that most concerns have been addressed, aside from the two remaining weaknesses highlighted by reviewer ZP1X, I will slightly raise my score to 4 (Borderline accept).

Paper Formatting Concerns

No Paper Formatting Concerns

Rebuttal

We sincerely thank the reviewer for the thorough technical evaluation and appreciate the recognition that our novel combination of established techniques represents an original and more efficient approach than existing methods. We are particularly grateful for the acknowledgment that our method avoids the limitations of hand-picked thresholds, and for the characterization of MATCH-Pair and MATCH-Global as "smart glue" for defining valid matches across persistence diagrams, which accurately captures our core innovation. We will carefully correct the typos and polish the manuscript in the final version.

Q1: SSL training initialization problem.

A1: Thanks for pointing out the unclear parts. Both teacher and student models start from the same backbone initialization. No additional labels are involved in the pretraining stage. The models see the same labeled data during the two stages. Using the supervised loss on the labeled subset together with a pixel-wise consistency loss on unlabeled patches, we obtain roughly correct probability maps that capture most objects but may contain topological defects (splits, merges, spurious holes). Then, we incorporate our dual-level topological consistency losses in the second stage to explicitly penalize missing/overlapping components and fix the topological errors.

Q2: Crowding-aware ablation.

A2: We appreciate the reviewer’s suggestion and have now quantified the influence of nuclei density on model performance. We randomly cut the test images into patches of size 256×256 and counted the nuclei in the ground-truth instance map of each patch. Patches with ≤ 30 nuclei are labeled Sparse; those with ≥ 100 nuclei are labeled Crowded. The "whole test image" row corresponds to the setting reported in the main paper, where inference is performed on entire test images. For a fair comparison, we sampled 14 patches per setting and show the results below:

Setting | Dice_obj ↑ | BE ↓ | BME ↓
--- | --- | --- | ---
Sparse (Ours, ≤ 30 cells) | 0.804 ± 0.004 | 4.620 ± 0.140 | 163.132 ± 2.136
Crowded (TopoSemiSeg, ≥ 100 cells) | 0.756 ± 0.009 | 6.890 ± 0.240 | 198.525 ± 3.125
Crowded (Ours, ≥ 100 cells) | 0.774 ± 0.007 | 5.610 ± 0.198 | 186.313 ± 2.715
Ours (whole test image) | 0.790 ± 0.006 | 4.930 ± 0.156 | 179.225 ± 2.383

The experiments above verify that our approach is density-aware. It achieves state-of-the-art accuracy on typical tissue, excels in sparse fields, and maintains a clear advantage over the strongest baseline under extreme nuclear crowding. The revised final version will include the full stratified and the crowding-aware ablation studies.
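The patch sampling and stratification described in A2 can be sketched as follows. This is an illustration under assumptions: patches are random crops of a ground-truth instance map (0 = background), a nucleus counts toward a patch if any of its pixels fall inside the crop, and the helper names are hypothetical.

```python
import numpy as np


def random_patches(instance_map, n, size=256, seed=0):
    """Draw n random size x size crops from a ground-truth instance map."""
    rng = np.random.default_rng(seed)
    h, w = instance_map.shape
    crops = []
    for _ in range(n):
        r = rng.integers(0, h - size + 1)
        c = rng.integers(0, w - size + 1)
        crops.append(instance_map[r:r + size, c:c + size])
    return crops


def nuclei_count(instance_patch):
    """Distinct nonzero instance ids in the patch (0 is background)."""
    return int(np.count_nonzero(np.unique(instance_patch)))


def density_stratum(count, sparse_max=30, crowded_min=100):
    """Label a patch by nuclei count using the thresholds stated in A2."""
    if count <= sparse_max:
        return "sparse"
    if count >= crowded_min:
        return "crowded"
    return "intermediate"
```

Counting partially visible nuclei is a design choice; excluding nuclei cut by the crop border would shift every count downward and therefore change which patches fall into each stratum.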

Q3: Extend from binary to multi-class segmentation. Lack of explicit instance-level constraints.

A3: To extend our method to the multi-class setting, we choose a multi-class nuclei segmentation dataset, MoNuSAC, to conduct experiments. This dataset contains four cell types: Epithelial, Lymphocyte, Macrophage, and Neutrophil. We conducted experiments using 20% labeled data and report the class-wise performance of TopoSemiSeg and our method below:

Class | Method | Dice_obj ↑ | BE ↓ | BME ↓
--- | --- | --- | --- | ---
Epithelial | TopoSemiSeg | 0.778 ± 0.009 | 5.342 ± 0.187 | 195.158 ± 4.627
Epithelial | Ours | 0.781 ± 0.008 | 5.128 ± 0.189 | 186.847 ± 3.958
Lymphocyte | TopoSemiSeg | 0.751 ± 0.013 | 6.089 ± 0.223 | 218.394 ± 5.841
Lymphocyte | Ours | 0.756 ± 0.012 | 5.794 ± 0.235 | 207.693 ± 4.672
Macrophage | TopoSemiSeg | 0.765 ± 0.011 | 5.687 ± 0.201 | 206.732 ± 4.985
Macrophage | Ours | 0.769 ± 0.010 | 5.423 ± 0.208 | 195.381 ± 4.127
Neutrophil | TopoSemiSeg | 0.738 ± 0.016 | 6.521 ± 0.267 | 234.576 ± 6.123
Neutrophil | Ours | 0.742 ± 0.015 | 6.187 ± 0.281 | 221.459 ± 5.894

As demonstrated in our class-specific results on MoNuSAC (Epithelial, Lymphocyte, Macrophage, and Neutrophil), our approach consistently outperforms TopoSemiSeg across all cell types, with particularly notable improvements in topological metrics (BE and BME) that are crucial for distinguishing overlapping structures. Regarding instance-level segmentation, our topological matching mechanism could benefit from incorporating explicit instance constraints, as the spatial overlap and proximity components in our MATCH-Pair algorithm already capture some instance-level information by distinguishing spatially separated objects with similar topological persistence. Future work will explore extending our dual-level consistency framework to explicit instance segmentation tasks, where the topological stability across perturbations could provide robust supervision for resolving ambiguous boundaries between overlapping nuclei, potentially incorporating instance-aware loss terms that leverage our adaptive matching capabilities.

Q4: The unclear definition of $b_i$ and $d_i$.

A4: In the final version, we will define these variables before they are first used. $b_i$ is the threshold at which a connected component first appears (birth), while $d_i$ is the threshold at which that component merges with an older one or vanishes (death). Their values are the pixel values at the corresponding critical points. A primer on persistent homology already appears in the Supplementary Material, and we will move an abbreviated version of it to the main paper in the final version.
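For readers unfamiliar with the construction, the following minimal sketch computes 0-D persistence pairs $(b_i, d_i)$ of a 2-D likelihood map with a union-find over a superlevel-set filtration (pixels are added from high to low likelihood) and the elder rule. The superlevel filtration, 4-connectivity, dropping zero-persistence pairs, and reporting the death of the oldest component as the map minimum are assumptions of this sketch, not details confirmed by the paper.

```python
import numpy as np


def persistence_0d(lh):
    """0-D (birth, death) pairs of the superlevel-set filtration of a 2-D map.

    Pixels are added in decreasing order of value (4-connectivity assumed).
    A component is born at the value of its first pixel and dies when it
    merges into an older (higher-born) component, following the elder rule.
    """
    h, w = lh.shape
    flat = lh.ravel()
    order = np.argsort(-flat, kind="stable")
    parent = -np.ones(h * w, dtype=int)  # -1 marks pixels not yet added
    birth = np.zeros(h * w)

    def find(x):
        # Follow parent links to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for idx in order:
        parent[idx] = idx
        birth[idx] = flat[idx]
        r, c = divmod(idx, w)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < h and 0 <= nc < w) or parent[nr * w + nc] == -1:
                continue
            a, b = find(idx), find(nr * w + nc)
            if a == b:
                continue
            # Elder rule: the younger root (lower birth value) dies at flat[idx].
            young, old = (a, b) if birth[a] < birth[b] else (b, a)
            if birth[young] > flat[idx]:  # skip zero-persistence pairs
                pairs.append((birth[young], flat[idx]))
            parent[young] = old
    # The oldest component never merges; report its death as the map minimum.
    pairs.append((birth[find(order[0])], flat.min()))
    return pairs
```

For the 1×5 map `[0.9, 0.2, 0.7, 0.1, 0.5]`, the peaks at 0.7 and 0.5 die when they merge into the component born at 0.9, giving the pairs (0.7, 0.2), (0.5, 0.1), and (0.9, 0.1).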

Q5: The lack of a precise definition of the likelihood maps.

A5: We thank the reviewer for this important clarification request regarding the likelihood maps $lh_1$ and $lh_2$. To clarify, the likelihood maps correspond to the softmax-activated outputs of the final UNet layer, representing normalized probability distributions over segmentation classes for each pixel. In the final version, we will add this specification to improve methodological clarity and reproducibility.
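A minimal sketch of this definition, assuming the UNet emits a `(C, H, W)` logit array and channel 1 is the foreground class (both assumptions for illustration; the function name is hypothetical). In the two-class case the softmax foreground probability equals the sigmoid of the logit difference, so the softmax and sigmoid readings raised by the reviewer coincide for binary segmentation.

```python
import numpy as np


def likelihood_map(logits, foreground=1):
    """Softmax over the class axis of (C, H, W) logits; returns the foreground map."""
    z = logits - logits.max(axis=0, keepdims=True)  # subtract the max for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=0, keepdims=True)
    return p[foreground]
```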

Q6: Undefined indices, i,j,ki, j, k.

A6: We acknowledge this critical concern regarding notation and will clarify the index definitions for improved readability. At line 153, the indices $i, j, k$ play specific roles in our formulation: $i$ and $j$ index the topological features from the first and second persistence diagrams, respectively (where $i \in \{1, \dots, n_1\}$ and $j \in \{1, \dots, n_2\}$, with $n_1, n_2$ being the number of features in each diagram), while $k$ distinguishes between the two likelihood maps being compared (i.e., $k \in \{1, 2\}$). The notation $w_{k,i}$ thus refers to the normalized persistence weight of the $i$-th topological feature in the $k$-th likelihood map, ensuring dimensional consistency throughout our similarity metric computation. In the final version, we will add explicit definitions of all indices and their ranges to eliminate potential confusion and enhance mathematical clarity.

Comment

Thank you to the authors for the detailed and thoughtful rebuttal. I found the response informative and appreciate the additional experiments. However, I have a few follow-up questions for clarification.

Q2: Thank you for including the stratified analysis. Just to clarify, is this density-based evaluation conducted on the MoNuSeg dataset or MoNuSAC? Additionally, could the authors elaborate on how the thresholds for Sparse (≤30 nuclei) and Crowded (≥100 nuclei) patches were chosen? Were these based on statistical analysis of nuclei counts across all patches (e.g., mean and standard deviation)? Lastly, could you clarify what the “14 samples” refer to: do they correspond to 14 patches per setting or 14 total images, and how was this number determined to ensure a fair comparison?

Q3: Thank you for the detailed response.

  • Were these test results obtained from the same trained model reported in the main paper, or was the model retrained specifically for this ablation? If it was retrained, could you specify which loss function or modification was used?
  • Since the MATCH-Pair mechanism appears to implicitly encode some instance-level structure through spatial overlap and proximity, have the authors considered quantitatively analyzing this effect, even within the binary setting, using instance-sensitive metrics (i.e. average precision) and BE, BME, or DIU? Given that these metrics (BE, BME, or DIU) are label-agnostic yet reflect instance structure (e.g., BME strongly penalizes topological mismatches and DIU highlights segmentation ambiguity due to overlaps), such an analysis could provide insight into how much implicit instance-level consistency your method already captures. This would also help contextualize the potential gains of explicitly integrating instance constraints in future work.
Comment

Thank you very much for your response. We clarify your additional concerns as follows.

Q2:

  1. The density-based evaluation was conducted on the MoNuSeg dataset in the binary segmentation setting.

  2. Sorry for the confusion. The thresholds for Sparse (≤30 nuclei) and Crowded (≥100 nuclei) patches were determined through statistical analysis of the nuclei density distribution across our complete dataset. We calculated the mean (μ) and standard deviation (σ) of the nuclei count per patch, defining Sparse patches as those falling below μ - 2σ and Crowded patches as those exceeding μ + 2σ. We then randomly selected 14 patches for the density-aware ablation studies.

  3. Because the MoNuSeg test set contains 14 images, the performance reported in the main paper is computed over these 14 images. For a fair comparison, we therefore selected 14 patches for each density setting.
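The μ ± 2σ rule and the random selection of 14 patches per stratum can be sketched as below. The function names, the use of the population standard deviation, and the fixed random seed are assumptions for illustration. For an approximately normal count distribution, only about 2.3% of patches lie beyond each 2σ tail, so a large pool of candidate patches is needed to obtain 14 per stratum.

```python
import numpy as np


def density_thresholds(counts, k=2.0):
    """Return (mu - k*sigma, mu + k*sigma) of per-patch nuclei counts."""
    counts = np.asarray(counts, dtype=float)
    mu, sigma = counts.mean(), counts.std()  # population std (ddof=0)
    return mu - k * sigma, mu + k * sigma


def select_strata(counts, n=14, k=2.0, seed=0):
    """Randomly pick up to n sparse (< mu - k*sigma) and n crowded (> mu + k*sigma) patch indices."""
    lo, hi = density_thresholds(counts, k)
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)

    def pick(idx):
        if len(idx) == 0:
            return idx
        return rng.choice(idx, size=min(n, len(idx)), replace=False)

    sparse = pick(np.flatnonzero(counts < lo))
    crowded = pick(np.flatnonzero(counts > hi))
    return sparse, crowded
```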

Q3:

  1. For the multi-class nuclei segmentation ablation study, we retrained new models using the same hyperparameters, loss functions, and training procedures as reported in the main paper, with the key modification being the conversion from binary to multi-class segmentation output.

  2. As stated in our previous response, “Regarding instance-level segmentation, our topological matching mechanism could benefit from incorporating explicit instance constraints, as the spatial overlap and proximity components in our MATCH-Pair algorithm already capture some instance-level information by distinguishing spatially separated objects with similar topological persistence.” We would like to expand this claim as follows.

  • While quantitative analysis of our implicit instance-level effects is challenging because instance-level effects may manifest differently across various nuclei sizes and densities, we provide a qualitative study of the implicit instance-level information effect in the main paper. As demonstrated in Figure 2 of the main paper, our matching algorithm successfully distinguishes spatially separated glandular structures even when they exhibit similar topological characteristics, effectively maintaining instance-level correspondence through spatial overlap and proximity cues. Moreover, quantitatively, as shown in our ablation study (Table 2 (b) of the main paper), removing the IoU or SP component results in performance degradation across all metrics, confirming that our matching strategy implicitly captures the coherent instance-level information.

  • Beyond implicitly capturing the instance-level information via our MATCH-Pair strategy, we could explore explicit instance-level constraints to benefit the model further in the following two ways. First, we could add a clustering post-processing step that uses our spatial correspondence to group matched structures into instances, then apply instance-level consistency loss during training. Second, we could integrate watershed instance separation that explicitly leverages our spatial matching components as seed points. Then, we could apply instance-level consistency losses that enforce topological correspondence between explicitly identified instances across prediction pairs. Both strategies would preserve our matching advantages while adding explicit instance supervision.

Comment

After reading the authors’ responses, I find that five of the six main weaknesses I previously identified have been satisfactorily addressed:

  • (W1) has been verified through the authors' clarification;
  • (W2) is now supported by new experiments;
  • (W4–W6) are resolved by clearer notation and a commitment to update the final manuscript accordingly.

One point of improvement remains: (W3) pertains to the scope of the method, which is currently explicitly developed for binary segmentation. While binary segmentation is indeed a relevant and useful task in computational pathology, expanding to multiclass, instance (or even panoptic) segmentation would enable explicit counting, differentiation and classification between various tissue structures, support quantitative analysis, and facilitate tissue phenotyping, key tasks in digital pathology. The authors have provided appreciated multiclass segmentation results in their response and evidence of implicit instance segmentation effects provided by their strategy, but the model could be further improved by incorporating explicit instance segmentation strategies, which they acknowledge (with the proposal of potential strategies) as future work.

I have also read all other reviews and the authors’ responses. Considering that most concerns have been addressed, aside from the other two remaining weaknesses highlighted by reviewer ZP1X, I would slightly raise my score while I will remain engaged during the rebuttal period.

I thank the authors again for their work and time.

Comment

Dear Reviewer gBzP,

Thank you very much for reconsidering our response, and we are pleased to hear that we have successfully addressed most of your concerns.

Regarding W3 and the scope limitation to binary segmentation, we sincerely value your suggestion about the potential benefits of expanding to multiclass and explicit instance-level constraints. We will explore it in future work.

Thank you again for your thoughtful feedback and for your continued engagement in the discussion period.

Official Review
4

This paper presents a semi-supervised segmentation method for histopathology images to resolve the problem of topological errors. It enforces topological consistency across different outputs, including outputs from stochastic dropouts and temporal training snapshots. The experiment shows that it reduces topological errors compared with other methods.

Strengths and Weaknesses

Strengths:

  1. This paper proposes an innovative matching algorithm between pairs of perturbed predictions, which addresses topological errors.
  2. This paper provides a new view of the reliability of the segmentation predictions.

Weaknesses:

  1. The concept of temporal topological consistency has been explored in semi-supervised learning, so I think as a standalone module it lacks novelty.
  2. The paper only extracts 0-D topological features for matching, without considering structures with holes or annular structures, which commonly exist in histopathology images.
  3. The method’s performance is only validated on three datasets restricted to gland and nucleus segmentation tasks, with no validation on other pathological tasks.
  4. The method relies solely on MC dropout for generating perturbed predictions without comparing it with other perturbation methods, which may limit the model's performance.
  5. The method generates multiple perturbations for each input image through Monte Carlo dropout, which increases computational time and memory consumption, especially when dealing with whole-slide images (WSIs).

Questions

  1. Beyond the three datasets (CRAG, GlaS, MoNuSeg), how does the model generalize to other pathology tasks?
  2. The paper only extracts 0-D topological features. Do you plan to extend the framework to 1-D or higher-order topological features?
  3. Have you evaluated the risk of error propagation in pseudo-labels?

Limitations

Yes.

Final Justification

I thank the authors for their response, which has addressed some of my concerns. However, the validation of the capability on 1-D topological structures was conducted solely on the Roads dataset, with no evidence on medical datasets for critical histopathological structures (e.g., holes or annular configurations). Additionally, no lightweight strategies were proposed to mitigate the computational cost. While the authors assert that "the MATCH algorithm and dual-level topological consistency are domain-agnostic", there was no cross-task validation evidence. I will maintain my original rating.

Paper Formatting Concerns

No.

Rebuttal

We sincerely thank the reviewer for the thoughtful and positive feedback on our method. We greatly appreciate the recognition of the innovative matching algorithm we proposed to resolve topological errors between perturbed predictions and the acknowledgment of our contribution to a new perspective on assessing the reliability of segmentation predictions. Below, we address your concerns one by one.

Q1: The standalone module of temporal topological consistency lacks novelty.

A1: While we acknowledge that temporal consistency has been explored in semi-supervised learning, our contribution represents a novel and substantial extension beyond existing approaches in several key aspects.

  • First, unlike previous temporal consistency methods that operate at the pixel or feature level, our work introduces the first application of temporal topological consistency specifically for preserving topological structures across training snapshots.
  • Second, our dual-level consistency framework uniquely combines both intra-topological consistency (across MC-dropout predictions) and temporal consistency within a unified topological reasoning paradigm, rather than treating them as separate mechanisms.
  • Third, integrating temporal consistency with our novel MATCH-Global matching algorithm creates a synergistic effect that enables robust identification of stable topological structures without requiring hand-picked persistence thresholds—a critical limitation in existing topology-aware methods like TopoSemiSeg.

Q2: Only 0-D topological features are extracted, without considering structures with holes or annular structures.

A2: Our decision to concentrate on 0-D topological features was motivated by the following consideration: for the primary applications in our study (gland and nuclei segmentation), the most critical topological errors involve incorrect splitting or merging of individual structures, which are well captured by 0-D persistent homology.
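A small synthetic example of why 0-D features suit these errors: a single spurious pixel bridging two nuclei merges them into one connected component, changing the 0-th Betti number by one while pixel-wise Dice barely moves. The sketch assumes 4-connectivity (SciPy's default structure for `ndimage.label` in 2-D); the masks and helper names are illustrative.

```python
import numpy as np
from scipy import ndimage


def betti0(mask):
    """0-th Betti number: number of 4-connected foreground components."""
    return ndimage.label(mask)[1]


def dice(a, b):
    """Pixel-wise Dice between two boolean masks."""
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


gt = np.zeros((10, 10), dtype=bool)
gt[2:8, 1:4] = True  # nucleus 1
gt[2:8, 5:8] = True  # nucleus 2, separated by one empty column
pred = gt.copy()
pred[4, 4] = True    # a single spurious pixel bridges the two nuclei

print(betti0(gt), betti0(pred), round(dice(gt, pred), 3))
```

Here the prediction differs from the ground truth by one pixel out of 37, yet the nucleus count drops from two to one, which is the kind of split/merge error a 0-D topological loss penalizes directly.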

To validate our method on 1-dimensional structures, we conducted additional experiments on the Roads dataset [r1], which contains aerial images of road networks with complex branching and connectivity patterns. Although this dataset is not histopathological, it lets us verify that our method can also learn good representations of 1-dimensional topological features from unlabeled data. The results are shown below:

| Labeled Ratio | Method | BE ↓ | BME ↓ | DIU ↓ |
|---|---|---|---|---|
| 10% | TopoSemiSeg | 8.324 ± 0.729 | 9.681 ± 0.647 | 10.952 ± 0.671 |
| 10% | Ours | 7.892 ± 0.634 | 8.147 ± 0.521 | 9.376 ± 0.583 |
| 20% | TopoSemiSeg | 7.467 ± 0.582 | 8.213 ± 0.514 | 9.387 ± 0.538 |
| 20% | Ours | 6.983 ± 0.507 | 7.024 ± 0.436 | 8.149 ± 0.492 |

Complete results for the extension of our method to 1-dimensional topological features will be provided in the final version.

Q3: No validation on other pathological tasks.

A3: As suggested by our title, we focused on gland and nuclei segmentation because these are fundamental histopathology tasks in which topological errors are most critical and prevalent, owing to dense, overlapping structures. Accurate segmentation of glands and nuclei also provides the foundation for downstream analysis. Our core contributions, the MATCH algorithm and dual-level topological consistency, are domain-agnostic: they rely on general topological properties rather than task-specific features. We leave the exploration of other pathological tasks to future work.

Q4: Alternatives to MC-dropout for generating perturbed predictions.

A4: We evaluated two alternative perturbation methods: Variational Inference (VI) [r2], which generates multiple predictions by sampling from the learned variational posterior distribution, and Temperature Scaling [r3], which produces diverse predictions by repeatedly sampling from temperature-modulated probability distributions. The experiments were conducted on CRAG with 20% labeled data, and the results are shown below:

| Method | Dice_obj ↑ | BE ↓ | BME ↓ |
|---|---|---|---|
| Variational Inference | 0.895 ± 0.006 | 0.242 ± 0.022 | 9.125 ± 0.685 |
| Temperature Scaling | 0.891 ± 0.007 | 0.258 ± 0.025 | 9.850 ± 0.795 |
| MC-Dropout | 0.909 ± 0.005 | 0.188 ± 0.018 | 7.425 ± 0.570 |

We chose MC-dropout because it is a widely used and easily implemented way to generate perturbed predictions. However, our framework can readily accommodate other perturbation approaches, since the choice of perturbation is not our primary contribution. We will provide complete results in the final version.
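For readers unfamiliar with MC-dropout, here is a toy, dependency-free sketch of how it produces a set of perturbed predictions. Dropout is kept active at inference, so each stochastic forward pass yields a different probability map. The per-pixel linear model, the channel-wise mask shared by all pixels, and the names `mc_dropout_predictions` and `pixel_variance` are illustrative assumptions, not the segmentation network used in the paper. In a real network, one would instead keep the dropout layers in training mode during the repeated forward passes.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predictions(
    features: List[List[float]],
    weights: List[float],
    n_samples: int = 8,
    p_drop: float = 0.5,
    seed: int = 0,
) -> List[List[float]]:
    """Return `n_samples` stochastic foreground-probability maps from a toy model.

    features[k] is the feature vector of pixel k; the toy model is a per-pixel
    logistic regression. Dropout stays active at inference: in each forward
    pass, every feature channel is zeroed with probability `p_drop` (one mask
    shared by all pixels) and kept channels are rescaled by 1 / (1 - p_drop).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = random.Random(seed)
    keep = 1.0 - p_drop
    samples = []
    for _ in range(n_samples):
        mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in weights]
        samples.append(
            [sigmoid(sum(w * m * x for w, m, x in zip(weights, mask, f))) for f in features]
        )
    return samples


def pixel_variance(samples: List[List[float]]) -> List[float]:
    """Per-pixel variance across the stochastic predictions."""
    n = len(samples)
    out = []
    for k in range(len(samples[0])):
        vals = [s[k] for s in samples]
        mean = sum(vals) / n
        out.append(sum((v - mean) ** 2 for v in vals) / n)
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.5], [-0.3, 2.0], [0.0, 0.0]]  # three pixels, two channels
    maps = mc_dropout_predictions(feats, weights=[1.5, -0.7], n_samples=8)
    print(pixel_variance(maps))
```

The spread across samples, such as the per-pixel variance, is what exposes unstable regions. With `p_drop=0.0`, every sample is identical and the variance vanishes.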

Q5: Increased computational time and memory consumption, especially when dealing with whole-slide images (WSIs).

A5: We acknowledge the reviewer's concern about computational overhead, which is introduced by generating multiple perturbed predictions through Monte Carlo dropout. Our method does incur more computational time and memory than TopoSemiSeg. Using a UNet with batch size 16, one training iteration takes 1020.04 ms and 25.726 GB of GPU memory, compared with 610.80 ms and 15.235 GB for TopoSemiSeg. This overhead is justified by the performance gains demonstrated in our experiments. Regarding whole-slide images specifically, our approach processes patches cropped from whole-slide images rather than entire slides. The largest image we process is approximately 1516×1516 pixels, significantly smaller than a whole-slide image.
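As a quick sanity check on the reported figures (same UNet backbone, batch size 16), the overhead relative to TopoSemiSeg works out as follows. This is simple arithmetic on the numbers stated above.

```latex
\begin{aligned}
\text{time: } & \frac{1020.04\ \text{ms}}{610.80\ \text{ms}} \approx 1.67 \quad (+409.24\ \text{ms per iteration}),\\
\text{memory: } & \frac{25.726\ \text{GB}}{15.235\ \text{GB}} \approx 1.69 \quad (+10.491\ \text{GB}).
\end{aligned}
```

That is roughly 67% more time per training iteration and 69% more GPU memory.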

Q6: Evaluation of the risk of error propagation through pseudo-labels.

A6: Our method does not employ pseudo-labeling strategies that risk error propagation. Instead, we enforce topological consistency constraints directly during training through our dual-level framework, without generating or incorporating pseudo-labels into the training set.

[r1] Mnih, V. Machine learning for aerial image labeling. PhD thesis, University of Toronto, 2013.

[r2] Jordan, M. I., et al. An introduction to variational methods for graphical models. Machine Learning, 37(2), 1999: 183-233.

[r3] Guo, C., Pleiss, G., Sun, Y., et al. On calibration of modern neural networks. In International Conference on Machine Learning (ICML), PMLR, 2017: 1321-1330.

Comment

I thank the authors for their response, which has addressed some of my concerns. However, the capability to handle 1-D topological structures was validated solely on the Roads dataset, with no evidence on medical datasets containing critical histopathological structures (e.g., holes or annular configurations). Additionally, no lightweight strategies were proposed to mitigate the computational cost. While the authors assert that "the MATCH algorithm and dual-level topological consistency are domain-agnostic", there was no cross-task validation evidence. I will maintain my original rating.

Comment

Dear Reviewer t5XP:

Thank you again for your thorough review and the positive feedback you provided. We hope our responses have fully addressed your concerns. Please kindly let us know if you have any remaining questions or concerns.

Best,

Authors of paper #11015

Comment

Dear Reviewer t5XP,

Thank you very much for your encouraging follow-up. We are grateful that you found many of our revisions satisfactory and that you have maintained a positive overall assessment. We take your remaining concerns seriously, regarding broader 1D topological validation, lightweight computational strategies, and cross-task experiments, and will explore them in future work.

Your constructive comments continue to guide our work, and we appreciate the opportunity to improve the manuscript.

Sincerely,

Authors

Final Decision

Five experts reviewed this paper. Their post-rebuttal recommendations are 3 Borderline Accepts and 2 Accepts. Reviewers appreciated the technical novelty of MATCH, the clear motivation, and the convincing empirical results on three challenging datasets (CRAG, GlaS, MoNuSeg). MATCH can accurately align topological structures. This is significant because segmenting histology images requires not only pixel-level accuracy but also preservation of fine-grained topological features. MATCH outperforms standard SSL baselines and SOTA topology-aware approaches. Reviewers also highlighted the detailed ablations, including studies on the intra- and temporal-topological losses, hyperparameter robustness, and generalization across segmentation backbones. Finally, the Reviewers found the paper to be clearly written and organized.

Overall, MATCH is well-motivated and makes a decent contribution to the SOTA in semi-supervised histopathology segmentation. Its novelty lies in introducing adaptive topological consistency through robust matching and dual-level constraints. The empirical improvements are convincing.

In the discussions and rebuttal, Reviewers raised concerns about (a) computational cost and scalability to larger or 3D datasets, (b) sensitivity to hyper-parameters, and (c) the method's focus on binary segmentation (it may require extensions for multi-class or instance-level settings). Some reviewers also requested clarification on why Wasserstein-based matching underperformed, and on the incremental benefit of using both intra- and temporal-consistency terms. In the rebuttal, the authors addressed these issues well with added experiments, ablations, and discussion, and one of the reviewers raised their rating. Although concerns remain over the computational overhead, the strengths outweigh the weaknesses. I therefore recommend acceptance as a poster. The authors are encouraged to make the necessary changes to the best of their ability.