PaperHub

Overall: 7.3/10 · Poster · 4 reviewers (min 4, max 5, std 0.5)
Ratings: 4, 4, 5, 5 · Confidence: 3.5
Novelty: 2.5 · Quality: 3.0 · Clarity: 3.3 · Significance: 2.8
NeurIPS 2025

KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration

OpenReview · PDF
Submitted: 2025-05-07 · Updated: 2025-10-29

Abstract

In recent years, neural network-guided heuristic search algorithms, such as Monte-Carlo tree search and A$^\*$ search, have achieved significant advancements across diverse practical applications. Due to the challenges stemming from high state-space complexity, sparse training datasets, and incomplete environmental modeling, heuristic estimations manifest uncontrolled inherent biases relative to the actual expected evaluations, thereby compromising the decision-making quality of search algorithms. Sampling exploration enhanced A$^\*$ (SeeA$^\*$) was proposed to improve the efficiency of A$^\*$ search by constructing a dynamic candidate subset through random sampling, from which the expanded node is selected. However, the uniform sampling strategy utilized by SeeA$^\*$ facilitates exploration exclusively through the injection of randomness, completely neglecting the heuristic knowledge relevant to open nodes. Moreover, the theoretical support for cluster sampling remains ambiguous. Despite the existence of potential biases, heuristic estimations still encapsulate valuable information. In this paper, epistemic exploratory A$^\*$ search (KeeA$^\*$) is proposed to integrate heuristic knowledge for calibrating the sampling process. We first theoretically demonstrate that SeeA$^\*$ with cluster sampling outperforms uniform sampling due to its distribution-aware selection with higher variance. Building on this insight, cluster scouting and path-aware sampling are introduced in KeeA$^\*$ to further exploit heuristic knowledge to increase the sampling mean and variance, respectively, thereby generating higher-quality extreme candidates and enhancing overall decision-making performance. Finally, empirical results on retrosynthetic planning and logic synthesis demonstrate the superior performance of KeeA$^\*$ compared to state-of-the-art heuristic search algorithms.
Keywords
A* search · neural-guided heuristic search · retrosynthesis planning

Reviews and Discussion

Review (Rating: 4)

This paper proposes a new search algorithm that incorporates several modifications compared to the SeeA* algorithm. First, the paper provides theoretical evidence that sampling within clusters can increase the likelihood of finding good solutions, since this increases the variance of the reward among the candidate nodes to be expanded. Compared to SeeA*, the paper introduces cluster scouting, where samples within each cluster are evaluated to determine a score for each cluster, which is used to bias candidate node sampling towards the clusters with a higher score. The paper also introduces a method for intra-cluster sampling that biases candidate node selection towards nodes that have been expanded fewer times. This algorithm is then evaluated on two sets of benchmarks involving retrosynthetic planning and logic synthesis.

Strengths and Weaknesses

Quality: The first two claims in the paper, concerning the theoretical foundation for cluster sampling and the rationale for introducing cluster scouting and path-aware sampling, are well supported by the theoretical and empirical evidence. However, I do see two issues with the implementation of cluster scouting. First, and this is a point I may have misunderstood, is that this appears to be a computationally intensive process since it requires evaluating a potentially large number of reward functions to determine the score for each cluster. I expect this could be a limitation of this method compared to SeeA*. Second, I’m not convinced dividing the expectation value of the reward by the standard deviation is a good way of assessing the score. This would attribute a higher score to clusters with very low rewards but much smaller standard deviation than it would to clusters with much higher rewards and a higher (but still potentially quite small) standard deviation. I also have a concern about how path-aware sampling is combined with cluster scouting. Cluster scouting may return high scores to clusters where the reward for deeper nodes is much higher than shallower nodes, but the path-aware sampling would then focus on the shallower nodes which is counter-productive. I feel a more principled sampling approach would include the expansion depth into the score.

The third claim concerns the experimental benchmarking. A strength of the paper is that the experiments are extensive and contain a statistical robustness test for the results. However, a weakness is that the results for KeeA* are very close to those of SeeA*. Though the statistical tests run in the paper show that the difference is statistically significant, these tests do not rule out the possibility that these better results are achieved through overfitting the hyperparameters of KeeA* to the specific problems investigated in the experiments. This concern is reinforced by results in the appendix that show that the performance sometimes seems to vary by more than the difference between the two algorithms for different (but close) hyperparameter choices. Another concern I have about these experiments is that they do not include SeeA* with UCT as a baseline in the benchmarks. The SeeA* paper shows that SeeA* with UCT outperforms KeeA* on the USPTO dataset in both success rate and average runtime. Considering UCT plays a similar role as path-aware sampling, I feel this is a valid baseline that should have been included in this comparison.

Though the paper does mention some limitations, some additional discussion of these limitations would have been welcome. For instance, more information about the variability of KeeA* due to sampling randomness would be welcome.

Clarity: The paper is clear and well-written. Extensive additional information is provided in the appendix. There are a few minor typos that need correcting. A minor detail is that how clusters are made and maintained in the nodes is not explained in the main text apart from Algorithm 1. It would help the reader to include a small discussion of this before section 4 that discusses the motivation for these clusters.

Significance: The paper proposes some interesting directions for improving SeeA*, and I can see some of these being picked up by the community. The observation that sampling from clusters is a theoretically well-founded way of improving performance is useful and does motivate further exploration of ways in which clusters could be better defined and sampled from. The expanded evaluation on multiple retrosynthetic planning datasets compared to SeeA* is also a step forward. However, the very small performance improvement over SeeA* limits the significance of the work.

Originality: The way in which KeeA* investigates and makes use of different clusters, and the evaluation on additional retrosynthetic datasets, are original and interesting. This work is well-referenced and is appropriately compared to prior work. However, some of the contributions in this paper appear somewhat incremental. The observation that cluster sampling can lead to higher variance in the samples is quite intuitive, and there is little additional insight provided by the mathematical proof. Theorem 5.1 is also quite basic and does not add new insight.

Questions

  1. Please provide more information about the additional overhead involved in the reward function estimations that are required to assign scores to clusters. This seems to be a potentially limiting factor.
  2. Please provide a comparison of this work to SeeA* with UCT, and explain why this was not included as a baseline in the benchmarking. This seems to be a very relevant comparison that is currently missing.
  3. Could you please include more information about the variability of KeeA* due to randomness, for example how the performance changes between different seeds on a given problem?
  4. Could the authors please respond to my concerns about their choice of score metric for the clusters, and discuss the interplay with path-aware sampling? I am not currently convinced these are good design choices in KeeA*.
  5. Could the authors please respond to my concerns that the performance improvement compared to SeeA* could be due to overfitting KeeA*’s hyperparameters to these problems, instead of an underlying algorithmic advantage? A satisfactory answer to these questions could improve my “Quality” score.

Limitations

Limitations are only addressed superficially in the conclusion. More information specifically about the variability due to randomness would be appreciated.

Final Justification

The authors provide a convincing response to my questions. My original low quality score stemmed mainly from doubts about the benchmarking against SeeA*: the small difference in performance, combined with the variable performance for different hyperparameter choices and the absence of SeeA* with UCT as a benchmark, all raised doubts about whether the proposed algorithm actually does outperform SeeA*. In their response, the authors put forward convincing evidence that this is indeed the case. As such, I am raising my Quality score to 3. Though I feel the gains are still a bit incremental compared to SeeA*, now that my main objection has been addressed I am increasing my overall rating to 4.

Formatting Concerns

no concerns

Author Response

We sincerely appreciate the reviewers’ insightful comments and constructive suggestions. We will carefully revise the manuscript and refine the relevant details accordingly. Our detailed responses to each of the concerns are presented as follows.

Q1: Additional overhead is involved in assigning scores to clusters.

A1: We agree that calculating cluster rewards increases the computational overhead per node expansion, because an additional presampling step is required to evaluate each cluster, along with computations for cluster rewards and path-aware sampling. However, in practical scenarios, the computational overhead is usually dominated by calls to the policy and value functions as well as the realization of state transitions, so the additional cost of node selection is negligible. In particular, by improving search efficiency, our search algorithm is able to identify solutions with fewer node expansions, which in turn reduces the overall computation time. Taking retrosynthesis as an example, the primary bottleneck lies in the policy network's single-step reaction prediction. On the USPTO benchmark, the average decision time of KeeA* is 38.70 s, which is only slightly higher than SeeA* (Cluster)'s 37.98 s.

Q2: Concerns on the experimental benchmarking, and comparisons of this work to SeeA* with UCT.

A2: In the logic synthesis task, the average ADP reduction rate of SeeA* (Cluster) is 22.5%, while KeeA* achieves 25.2%. In the retrosynthesis task, SeeA* (UCT) achieves an average success rate of 63.31%, below KeeA*'s 63.76%. SeeA* (UCT) is not included as a baseline in this paper due to its inferior performance compared to SeeA* (Cluster). Although SeeA* (UCT) outperforms SeeA* (Cluster) on the USPTO test set, this benchmark comprises only 190 carefully selected molecules, and the guiding heuristics are also trained on the USPTO dataset, raising concerns of potential overfitting. Across 4,909 diverse test cases from seven benchmarks, SeeA* (Cluster) performs better than SeeA* (UCT). Moreover, the actual success rate of KeeA* is 98.9473%, consistent with that of SeeA* (UCT); the apparent discrepancy arises from a rounding error. We appreciate your attention to this detail. Relevant results will be included in the revised paper.

Q3: Variability of KeeA* due to randomness.

A3: To quantify KeeA*'s variability induced by randomness, we executed three runs for each of ten random seeds on the USPTO test set. A t-test at the 0.05 significance level is used to compute the error intervals. The average success rate over all 30 runs is 98.53%, which is comparable to the 98.84% reported in the paper. KeeA* exhibits robust performance across the random seeds: SeeA* (Cluster) not only exhibits a larger variance across multiple runs but also shows greater performance fluctuations under different seeds, highlighting the superior stability of KeeA*.

| Seed | KeeA* (%) | SeeA* (Cluster) (%) |
| --- | --- | --- |
| 0 | 98.07 ± 0.62 | 96.49 ± 1.64 |
| 1 | 97.89 ± 2.14 | 98.07 ± 1.64 |
| 2 | 98.60 ± 0.62 | 96.84 ± 0.00 |
| 3 | 98.60 ± 1.24 | 97.89 ± 1.07 |
| 4 | 98.25 ± 0.62 | 98.60 ± 0.62 |
| 5 | 98.95 ± 1.07 | 97.37 ± 0.00 |
| 6 | 99.30 ± 1.24 | 96.67 ± 2.68 |
| 7 | 98.95 ± 0.00 | 96.67 ± 2.21 |
| 8 | 98.07 ± 0.62 | 96.84 ± 2.83 |
| 9 | 98.60 ± 0.62 | 97.54 ± 0.62 |
| All | 98.53 ± 0.22 | 97.30 ± 0.35 |
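For concreteness, the error-interval computation used in the table above (a two-sided t-interval at the 0.05 significance level over repeated runs) can be sketched as follows. The three success rates and the helper name `t_interval` are illustrative, not the paper's actual per-run data.

```python
import math
from statistics import mean, stdev

def t_interval(runs, t_crit):
    """Half-width of the confidence interval: t * s / sqrt(n)."""
    return t_crit * stdev(runs) / math.sqrt(len(runs))

runs = [98.07, 98.60, 98.95]   # three hypothetical success rates (%)
t_crit_df2 = 4.303             # two-sided t critical value, alpha=0.05, df=2
print(f"{mean(runs):.2f} +/- {t_interval(runs, t_crit_df2):.2f} %")  # prints 98.54 +/- 1.10 %
```

With only three runs (df = 2), the critical value is large, which is why the three-run intervals reported in these responses are noticeably wider than the 30-run ones.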

Q4: Choice of score metric for the clusters, and the interplay with path-aware sampling

A4: We agree that the cluster score metric can be improved, and designing more effective metrics is a worthwhile direction. However, the mean-to-standard-deviation ratio employed in this paper aligns with a commonly used risk-adjusted evaluation metric, similar to the Sharpe ratio, designed to favor solutions with higher returns relative to their variability. Clusters with reduced variance exhibit more concentrated and stable performance, leading to higher reliability in obtaining rewards and improved robustness under uncertainty. Clusters with high mean and low variance are preferred, while those with low mean and high variance are least desirable. In clusters with high mean and high variance, the chance of sampling superior nodes increases, but so does the risk of sampling inferior nodes, reducing performance stability. For clusters with low mean and low variance, the upper extremes of node quality are constrained, yet their stability increases the probability of achieving relatively consistent and satisfactory performance. The latter two scenarios involve a trade-off between potential gains and stability. Therefore, the cluster scoring method employed in this paper is reasonable. In three independent runs on the USPTO test set, using the mean alone yields an average success rate of 98.07 ± 1.64%, whereas MEAN/STD scoring achieves 98.53 ± 1.49%, attaining higher average performance with lower variance and reflecting its greater robustness and overall superiority.
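The trade-off discussed above can be illustrated with a toy comparison of the two scoring rules. The reward samples and function names below are made up purely to show how mean-only and MEAN/STD scoring can rank the same clusters differently.

```python
from statistics import mean, stdev

def score_mean(rewards):
    """Mean-only cluster score."""
    return mean(rewards)

def score_mean_over_std(rewards, eps=1e-8):
    """Sharpe-like score; eps guards against zero variance."""
    return mean(rewards) / (stdev(rewards) + eps)

high_mean_high_var = [0.2, 0.9, 0.4, 1.0]    # promising but unstable cluster
mid_mean_low_var = [0.55, 0.60, 0.58, 0.57]  # slightly lower mean, very stable

# Mean-only prefers the volatile cluster; MEAN/STD prefers the stable one.
assert score_mean(high_mean_high_var) > score_mean(mid_mean_low_var)
assert score_mean_over_std(mid_mean_low_var) > score_mean_over_std(high_mean_high_var)
```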

In the situation where deeper nodes may receive a higher reward $f$ than shallower ones, selecting nodes with lower $f$-values at shallower depths reflects the algorithm's exploratory behavior, rather than greedy expansion based on the maximal estimate $f$. Since $f$ is a biased estimate of the true return $f^*$, the node with the highest $f$ does not necessarily correspond to the optimal node with the highest $f^*$. The depth penalty encourages selecting nodes beyond those with the highest $f$-values, introducing exploration into the search algorithm to escape local optima. Depth-based penalties have also been adopted in LevinTS [1] to encourage exploration. This is a classical exploration-exploitation trade-off in search algorithms, and is not necessarily detrimental to the search algorithm.

[1] Orseau, Laurent, et al. "Single-agent policy tree search with guarantees." Advances in Neural Information Processing Systems 31 (2018).

Q5: Performance improvement compared to SeeA*

A5: KeeA* demonstrates greater robustness than SeeA*, while consistently yielding solutions of significantly higher quality. Both SeeA* and KeeA* are sampling-based search algorithms. As shown in Table A3, KeeA* exhibits greater stability across different random seeds, achieving an average success rate of 98.53 ± 0.22%, outperforming SeeA* (Cluster), which achieves 97.30 ± 0.35%. Furthermore, solution length is a critical indicator of solution quality. Over 30 test runs, KeeA* achieves a significantly shorter average path length of 6.15 ± 0.06 versus 6.78 ± 0.10 for SeeA* (Cluster) on the USPTO benchmark. Similar advantages in solution quality are also observed on other test sets, as shown in Table 2 of the paper. The substantial difference in solution quality indicates that KeeA*'s advantage is not due to hyperparameter settings obtained by chance. In addition, we conducted a grid search over different hyperparameter settings, with three trials per configuration. The results are consistent with those in Appendix G: KeeA* demonstrates robust performance across a wide range of hyperparameter settings and outperforms SeeA* (Cluster)'s 97.30 ± 2.36%, for which the number of runs is also limited to three. Note that the limited 3 runs lead to higher variability than the 30-run setting in the t-test.

| | $\beta=0.1$ | $\beta=0.3$ | $\beta=0.5$ | $\beta=0.7$ | $\beta=0.9$ |
| --- | --- | --- | --- | --- | --- |
| $\alpha=0.1$ | 97.72 ± 2.46 | 98.07 ± 2.68 | 97.54 ± 1.64 | 97.72 ± 1.64 | 97.72 ± 1.24 |
| $\alpha=0.3$ | 97.89 ± 2.83 | 97.72 ± 1.64 | 98.60 ± 2.21 | 98.60 ± 2.68 | 97.54 ± 1.24 |
| $\alpha=0.5$ | 97.54 ± 2.46 | 98.77 ± 2.46 | 98.07 ± 0.62 | 97.37 ± 1.84 | 97.89 ± 0.00 |
| $\alpha=0.7$ | 98.25 ± 0.62 | 98.07 ± 1.64 | 98.07 ± 0.62 | 98.77 ± 0.62 | 97.89 ± 1.07 |
| $\alpha=0.9$ | 64.74 ± 7.70 | 62.81 ± 8.57 | 64.04 ± 6.43 | 60.88 ± 4.32 | 64.74 ± 2.14 |

Q6: Theoretical contributions of this work.

A6: In this work, we theoretically demonstrate for the first time that cluster sampling outperforms uniform sampling in expectation. This is not readily intuitive, since the advantage of cluster sampling does not universally hold. In uniform sampling, drawing $m$ samples from $N$ nodes yields a probability of $m/N$ of including the optimal node. Under cluster sampling, $m/K$ nodes are sampled uniformly from each of the $K$ clusters. If the optimal node resides in a large cluster with more than $N/K$ nodes, the probability of selecting the optimal node under cluster-based sampling will be lower than the $m/N$ of uniform sampling, leading to reduced search efficiency. This paper offers a complementary perspective by first showing that cluster sampling increases the variance of the collected candidates. Leveraging the extreme value theorem, a higher expected maximum is achieved by cluster sampling, providing a theoretical justification for SeeA* (Cluster). Moreover, Theorem 5.1 establishes a theoretical connection between the predicted value $f$ and the ground-truth value $f^*$, which is rarely investigated in neural-guided search algorithms.
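The expected-maximum argument above can be checked with a small Monte Carlo sketch. The cluster sizes, means, and sample budget below are illustrative assumptions: when the high-reward nodes sit in a small cluster, stratified (cluster) sampling of $m/K$ nodes per cluster guarantees draws from that cluster, so its expected maximum exceeds that of uniform sampling.

```python
import random

rng = random.Random(0)
# Four clusters of open nodes; the small fourth cluster has much higher rewards.
sizes, means = [30, 30, 30, 10], [0.0, 0.0, 0.0, 3.0]
m, trials = 8, 2000  # sample budget per expansion, Monte Carlo trials

def expected_max(strategy):
    total = 0.0
    for _ in range(trials):
        clusters = [[rng.gauss(mu, 1.0) for _ in range(n)]
                    for n, mu in zip(sizes, means)]
        if strategy == "uniform":
            # Uniform sampling: m nodes from the pooled open list.
            sample = rng.sample([v for c in clusters for v in c], m)
        else:
            # Cluster (stratified) sampling: m/K nodes from each cluster.
            sample = [v for c in clusters
                      for v in rng.sample(c, m // len(clusters))]
        total += max(sample)
    return total / trials

gap = expected_max("cluster") - expected_max("uniform")
print(gap)  # positive: cluster sampling yields the higher expected maximum here
```

This only illustrates the favorable regime; as the response notes, if the optimal node lies in an oversized cluster the comparison can reverse, which is why the variance-based analysis is needed.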

Comment

Thank you for this response. I feel that you have provided convincing answers to my questions. In particular, I feel this additional evidence does alleviate my concerns about the extent to which KeeA* actually outperforms SeeA*. Though I still feel some of the contributions are a bit incremental, I will revise my rating accordingly.

Comment

Thank you for your thoughtful feedback and positive assessment of our work. We appreciate your acknowledgment of our efforts in addressing your concerns. KeeA* is developed as an extension of the SeeA* framework, while adopting a different perspective on optimizing the search process. SeeA* aims to improve the probability of identifying the optimal solution, which cannot explain the advantage of cluster sampling over uniform sampling, as discussed in Q6. Although A* search guarantees optimality under certain assumptions, heuristics learned by neural networks often violate these assumptions, and therefore the obtained solutions are not guaranteed to be optimal.

KeeA* explores an alternative direction by optimizing the search process to obtain solutions that are closer to the optimal one, which differs from the objective of SeeA*. Guided by this insight, we theoretically establish that cluster sampling produces solutions closer to the optimum than uniform sampling, demonstrating its advantage. Cluster scouting and path-aware sampling are proposed to improve the quality of identified solutions, and experimental results have demonstrated the effectiveness of our approach. In retrosynthetic planning, KeeA* achieves higher solution quality by reducing the average synthesis path length for successfully synthesized molecules from SeeA*'s 4.17 to 3.99. A sign test revealed that KeeA* achieves a statistically significant advantage in solution length compared to SeeA*, with a $p$-value of $1.86\times 10^{-8}$. We hope this explanation helps clarify the contribution of our work and appreciate the opportunity to further elaborate on our perspective.

Review (Rating: 4)

This paper proposes a heuristic search algorithm called KeeA*, which utilizes cluster sampling instead of simple random sampling. Experimental evaluations on extensive real-world datasets demonstrate the effectiveness of the proposed method. Furthermore, the authors provide theoretical background for KeeA* to clarify the reasons for its performance improvements.

Strengths and Weaknesses

Strengths

  1. Clear motivation and extensive related work study supporting the motivation of the proposed method.

  2. Theoretical analysis that clarifies the reasons for the performance improvements by the proposed method.

  3. State-of-the-art performance on real-world datasets.

Weaknesses

  1. Which of the following categories does the proposed cluster sampling belong to: distance-based, density-based, or their combination? Each clustering method has inherent advantages and disadvantages. Please clarify the advantages and disadvantages of the cluster sampling. It will be helpful in justifying the proposed method.

  2. I was not able to fully analyze the reasons for the accuracy improvements by KeeA* in the retrosynthetic planning problems because the problem definition of retrosynthetic planning is not clear. Given the current problem definition, I did not fully understand why state-of-the-art machine learning algorithms for organic retrosynthesis were not compared in the experiments.

  3. K is a crucial hyperparameter in the clustering methods with a fixed K. However, a hyperparameter analysis of K is not provided in the manuscript. If I missed the description of K, please let me know where the description of K is.

Questions

See the Strengths And Weaknesses section.

Limitations

See the Strengths And Weaknesses section.

Final Justification

The author's thorough rebuttal addressed many of my concerns, so I raised my score accordingly. Thank you for the author's comprehensive response.

Formatting Concerns

N/A

Author Response

We sincerely appreciate the reviewer’s insightful comments and constructive suggestions. We provide detailed responses to the concerns raised below, and the relevant revisions and clarifications will also be incorporated into the updated version of the manuscript.

Q1: Please clarify the advantages and disadvantages of cluster sampling.

A1: Following SeeA*, competitive learning, a distance-based clustering algorithm, is employed to reduce computational overhead: nodes are grouped by their distances in the embedding space. A key advantage of this clustering algorithm is that it improves computational efficiency by avoiding re-clustering from scratch at each node generation; newly generated nodes are incrementally assigned to the nearest cluster, and the cluster centers are updated in turn. On the other hand, the clustering quality depends on the initialization of the cluster centers, the quality of the feature vectors extracted by neural networks, and the sequence of node expansions. As this study primarily focuses on tree search algorithms, the development of more advanced clustering methods is an essential avenue for future research.
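The incremental, competitive-learning style update described above can be sketched in a few lines. The function name, learning rate, and 2-D embeddings are illustrative assumptions, not the authors' implementation: each new node embedding is assigned to the nearest center, and only that center is nudged toward it, so no re-clustering from scratch is needed.

```python
import math

def assign_and_update(centers, x, lr=0.1):
    """Assign embedding x to the nearest center, then move that center toward x."""
    dists = [math.dist(c, x) for c in centers]
    k = dists.index(min(dists))  # winning cluster (competitive learning)
    centers[k] = [c + lr * (xi - c) for c, xi in zip(centers[k], x)]
    return k

centers = [[0.0, 0.0], [5.0, 5.0]]
k = assign_and_update(centers, [4.0, 4.8])
# The second center wins and moves toward the new node; the first is untouched.
```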

Q2: Why KeeA* is applied to the retrosynthetic planning problems, and the exclusion of state-of-the-art machine learning algorithms.

A2: This paper focuses on neural network-guided heuristic search algorithms, aiming to enhance search efficiency when the estimated $f$-values deviate from the true optimal cost $f^*$. A* search is the foundation of both SeeA* and KeeA*, and is retained as a baseline search algorithm. KeeA* is designed as a general-purpose problem-solving algorithm, not limited to retrosynthesis planning. SOTA general search algorithms, such as LevinTS and PHS, are also evaluated under the same heuristic function for a fair comparison.

As summarized in Appendix D, retrosynthetic planning aims to identify a synthetic route from available building-block molecules to a target compound. A one-step retrosynthesis model is employed to predict single-step chemical reactions that can generate the target molecule. If the reactants of a reaction are non-building-block molecules, the one-step model is called recursively to find chemical reactions that decompose these molecules, until all molecules are reduced to building blocks. Machine learning algorithms are commonly employed to learn the one-step retrosynthesis model that guides the search process, while a synthesis route composed of multiple chemical reactions typically requires a search algorithm to find a solution. Directly generating the entire pathway using machine learning models remains highly challenging, and current SOTA retrosynthesis algorithms still rely heavily on search algorithms.
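The recursive decomposition described above can be sketched as a toy routine. The `one_step` rule table and molecule names are hypothetical stand-ins for a learned one-step retrosynthesis model; real planners interleave this recursion with the guided search discussed in the paper.

```python
# Building blocks are molecules that are commercially available and need no
# further decomposition; everything else must be decomposed recursively.
BUILDING_BLOCKS = {"A", "B", "C"}

def one_step(molecule):
    """Stand-in for a learned one-step retrosynthesis model."""
    rules = {"T": ["X", "C"], "X": ["A", "B"]}  # hypothetical reactions
    return rules[molecule]

def plan(molecule, route=None):
    """Recursively decompose `molecule` until only building blocks remain."""
    route = [] if route is None else route
    if molecule in BUILDING_BLOCKS:
        return route
    reactants = one_step(molecule)
    route.append((molecule, reactants))  # record the predicted reaction
    for r in reactants:
        plan(r, route)
    return route

# plan("T") yields the reactions decomposing T into building blocks A, B, C.
```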

The following table presents a comparative analysis on the USPTO benchmark against SOTA retrosynthesis algorithms. Notably, KeeA* attains comparable performance despite using less accurate heuristics, benefiting from search-algorithm refinements without the need for expensive neural network training. We also tested EG-MCTS [4] on all seven benchmarks, attaining an average success rate of 62.78%, below KeeA*'s 63.76%. Furthermore, both GRASP [1] and RetroGraph [2] are not open-source, and PDVN [3] did not release its trained guiding heuristic models, so the results for [1, 2, 3] on all seven benchmarks could not be obtained. The latest DESP algorithm [5] requires processing a large set of starting molecules, leading to significant computational overhead and failing to yield feasible solutions within the 10-minute time constraint.

| Algorithm | Succ (%) | Avg length |
| --- | --- | --- |
| GRASP [1] | 98.95 | 6.17 |
| RetroGraph [2] | 99.47 | 6.33 |
| PDVN+Retro* [3] | 98.95 | - |
| EG-MCTS [4] | 96.84 | 5.92 |
| KeeA* | 98.95 | 5.89 |

[1] Yu, Yemin, et al. "Grasp: Navigating retrosynthetic planning with goal-driven policy." Advances in Neural Information Processing Systems 35 (2022): 10257-10268.

[2] Xie, Shufang, et al. "Retrograph: Retrosynthetic planning with graph search." Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022.

[3] Liu, Guoqing, et al. "Retrosynthetic planning with dual value networks." International conference on machine learning. PMLR, 2023.

[4] Hong, Siqi, et al. "Retrosynthetic planning with experience-guided Monte Carlo tree search." Communications Chemistry 6.1 (2023): 120.

[5] Yu, Kevin, et al. "Double-ended synthesis planning with goal-constrained bidirectional search." Advances in Neural Information Processing Systems 37 (2024): 112919-112949.

Q3: Investigations on the hyperparameter $K$.

A3: As indicated in Line 204 and Algorithm 1, $K$ represents the number of clusters, which is also a critical parameter of SeeA* (Cluster) and was comprehensively analyzed in the original study. KeeA* adopts the same $K$ value as SeeA* in the paper. To assess the influence of $K$ on KeeA*, three runs are performed for various $K$, and a t-test at the 0.05 significance level is used to compute the error intervals. Because only three runs are conducted, the error intervals are relatively large. KeeA* consistently outperforms SeeA* (Cluster) on the USPTO test set. An excessively small $K$ induces under-fitting, leading to a model of limited flexibility, while an overly large $K$ risks overfitting to the collected states. Both extremes impair the efficacy of decision-making during search, highlighting the importance of an appropriate cluster number for optimizing the efficiency of SeeA* and KeeA*. We will include the relevant discussion and results in the revised manuscript to address this point.

| $K$ | KeeA* (%) | SeeA* (Cluster) (%) |
| --- | --- | --- |
| 3 | 95.44 ± 1.64 | 94.91 ± 4.32 |
| 4 | 97.54 ± 1.24 | 97.37 ± 2.14 |
| 5 | 98.53 ± 1.49 | 97.30 ± 2.36 |
| 6 | 97.36 ± 3.85 | 96.32 ± 1.07 |
| 7 | 95.26 ± 2.13 | 94.04 ± 2.21 |
| 8 | 94.91 ± 4.32 | 92.98 ± 5.49 |
Review (Rating: 5)

The paper proposes KeeA*, an extension of SeeA* that augments selective‑sampling A* search with two knowledge‑guided mechanisms, cluster scouting and path‑aware sampling. A theoretical argument shows that cluster sampling should raise the expected extreme reward relative to uniform sampling, after which the two new heuristics are introduced to further raise the candidate‑set mean and variance. Experimental results cover retrosynthetic planning and logic‑synthesis benchmarks and include ablations and a McNemar test for significance.

Strengths and Weaknesses

Strengths

  • The topic is timely: many modern search systems rely on neural heuristics whose bias can cripple performance, and methods that inject principled exploration are valuable.

  • The theoretical development, while idealised, cleanly connects distributional variance to extreme‑value rewards and clarifies why cluster sampling helps. KeeA* moves beyond a conceptual observation by designing concrete, lightweight modifications that integrate readily with SeeA*, so the methodological increment is small in implementation but effective in practice.

  • The two applications, retrosynthetic planning in organic chemistry and logic synthesis in VLSI design, are interesting and the performance shows clear evidence the proposed method outperforms existing state-of-the-art methods.

Weaknesses

  • The analysis in Theorem 5.1 still rests on homoscedastic Gaussian error assumptions. In practice, heuristic inaccuracies are heavy‑tailed and state‑dependent; some discussion of robustness would help readers better understand applicability.

  • While the authors use consistent \alpha and \beta, baseline hyper‑parameter parity is not fully demonstrated: different search algorithms may require different choices of \alpha and \beta to achieve the optimal performance. Part of KeeA*’s gain may just come from the fact that the current setup of \alpha and \beta works better compared to the baselines. A small grid search on both SeeA* and KeeA* might dispel this doubt.

Questions

  • Typos such as “an dynamic subset” (abstract) and a few inconsistent variable names between Algorithm 1 and the appendix remain.

  • Figures 3-5 lack confidence intervals; error bars would complement the McNemar test. The derivation in Appendix B occasionally drops expectation brackets, which may confuse readers unfamiliar with the notation.

Limitations

Yes.

Formatting Concerns

None

Author Response

We sincerely thank the reviewer for the valuable comments. We will carefully revise the manuscript to correct typos and clarify variable notations to improve readability. Our responses to the concerns are provided below.

Q1: Theorem 5.1 still rests on homoscedastic Gaussian error assumptions

A1: Due to the presence of prediction errors, for two nodes $n_1$ and $n_2$ we have $f_1=f^*_1+\epsilon_1$ and $f_2=f^*_2+\epsilon_2$. Define $\Delta=f_2-f_1>0$ and $\eta=\epsilon_2-\epsilon_1$. Then

$$P(f^*_2>f^*_1 \mid f_2-f_1=\Delta)=P(f^*_2-f^*_1>0 \mid \Delta)=P(\Delta>\eta \mid \Delta).$$

Since $P(\Delta>\eta \mid \Delta)$ is increasing in $\Delta$, $P(f^*_2>f^*_1 \mid f_2-f_1=\Delta)$ also increases with the gap between $f_1$ and $f_2$. The monotonicity property established in Theorem 5.1 remains valid even when the noise distribution $\epsilon$ is non-Gaussian and node-dependent; however, Equation (11) can no longer be derived in that case.
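As a numeric sanity check of this monotonicity in the Gaussian special case: with i.i.d. errors $\epsilon\sim\mathcal{N}(0,\sigma^2)$, the difference $\eta=\epsilon_2-\epsilon_1$ is $\mathcal{N}(0,2\sigma^2)$, so $P(\Delta>\eta)=\Phi\!\big(\Delta/(\sigma\sqrt{2})\big)$, which is strictly increasing in $\Delta$. The choice $\sigma=1$ below is an illustrative assumption.

```python
import math

def p_true_order(delta, sigma=1.0):
    """P(f2* > f1* | f2 - f1 = delta) under i.i.d. Gaussian noise."""
    z = delta / (sigma * math.sqrt(2.0))
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

probs = [p_true_order(d) for d in (0.0, 0.5, 1.0, 2.0)]
assert all(a < b for a, b in zip(probs, probs[1:]))  # monotone in the gap
```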

Q2: Grid search on both SeeA* and KeeA* and confidence intervals.

A2: KeeA* extends SeeA* by incorporating two additional hyperparameters, $\alpha$ and $\beta$. Other parameters remain consistent with the optimal settings provided in SeeA*. To demonstrate the robustness of KeeA*, an exhaustive grid search was conducted with three independent runs per configuration, and a t-test at the $0.05$ significance level was used to compute the error intervals. Because only three runs were conducted, the deviation is relatively large. Both the mean and the error interval are reported in the table below. KeeA* demonstrates robust performance, consistently outperforming SeeA*'s $97.30 \pm 2.36\%$ across a wide range of hyperparameter settings, with the highest success rate reaching $98.77 \pm 0.62\%$. This observation aligns with the findings in Appendix G, which reports the performance of KeeA* under various $\alpha$ and $\beta$.

To strengthen the statistical validity of Figures 4–6, confidence intervals (t-test, $\alpha = 0.05$) based on empirical standard deviations can be computed. Revisions will be incorporated into the final manuscript. Due to time constraints, we have not completed the grid search and confidence interval calculations for the McNemar test across all seven datasets. We will include the comprehensive results in the final version of the paper.

| | $\beta=0.1$ | $\beta=0.3$ | $\beta=0.5$ | $\beta=0.7$ | $\beta=0.9$ |
| --- | --- | --- | --- | --- | --- |
| $\alpha=0.1$ | $97.72\pm2.46$ | $98.07\pm2.68$ | $97.54\pm1.64$ | $97.72\pm1.64$ | $97.72\pm1.24$ |
| $\alpha=0.3$ | $97.89\pm2.83$ | $97.72\pm1.64$ | $98.60\pm2.21$ | $98.60\pm2.68$ | $97.54\pm1.24$ |
| $\alpha=0.5$ | $97.54\pm2.46$ | $98.77\pm2.46$ | $98.07\pm0.62$ | $97.37\pm1.84$ | $97.89\pm0.00$ |
| $\alpha=0.7$ | $98.25\pm0.62$ | $98.07\pm1.64$ | $98.07\pm0.62$ | $98.77\pm0.62$ | $97.89\pm1.07$ |
| $\alpha=0.9$ | $64.74\pm7.70$ | $62.81\pm8.57$ | $64.04\pm6.43$ | $60.88\pm4.32$ | $64.74\pm2.14$ |
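For reference, error intervals of the form reported above follow from a standard small-sample Student-t confidence interval. The sketch below is illustrative only: the three run values are hypothetical, not results from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# two-sided 95% Student-t critical values for small df (standard table values)
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def t_interval(runs):
    """Mean and half-width of a 95% t-based confidence interval
    for a small sample of independent runs."""
    m = mean(runs)
    # sample std (ddof=1) scaled by the df = n-1 critical value
    half = T_CRIT_95[len(runs) - 1] * stdev(runs) / sqrt(len(runs))
    return m, half

# three hypothetical success rates (%) from independent runs
m, half = t_interval([97.5, 98.4, 99.0])
print(f"{m:.2f} +/- {half:.2f}%")
```

With only three runs the critical value ($t_{0.975,\,2} = 4.303$) is large, which explains the wide intervals in the table above.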
审稿意见
5

This paper introduces KeeA*, an epistemic exploratory A* search method that incorporates heuristic knowledge to refine the sampling process. Through theoretical analysis, they first prove that SeeA* with cluster sampling outperforms uniform sampling by leveraging distribution-aware selection. Building on this, KeeA* introduces two key techniques: 1) cluster scouting, which increases the sampling mean, and 2) path-aware sampling, which enhances the sampling variance. These advancements enable KeeA* to generate higher-quality extreme candidates. Empirical evaluations on retrosynthetic planning and logic synthesis confirm that KeeA* achieves superior performance compared to state-of-the-art heuristic search algorithms such as SeeA*, A*, and MCTS.

优缺点分析

Strength:

  1. How to effectively leverage heuristic knowledge relevant to open nodes is an important question to explore in search algorithms.
  2. Theoretical analysis of cluster sampling and the two proposed techniques is novel and robust.
  3. Experiment design and execution are excellent, with KeeA* demonstrating clear performance improvements over the baselines.

Weaknesses:

  1. The paper lacks a discussion of application scenarios where KeeA* might underperform compared to SeeA*.

问题

Please discuss application scenarios where KeeA* might underperform compared to SeeA*.

局限性

No

最终评判理由

Thank you to the authors for their thorough discussion of the scenarios where SeeA* may outperform KeeA*, offering valuable insights to help users better understand. Overall, I recommend acceptance.

格式问题

No

作者回复

We sincerely thank the reviewer for the insightful comment. The following content will be added to the revised manuscript to provide further clarification.

Q1: Discuss application scenarios where KeeA* might underperform compared to SeeA*.

A1: KeeA* and SeeA* are both search algorithms designed to address the discrepancy between the heuristic estimate $f$ and the true return $f^*$. Compared to SeeA*, KeeA* introduces two key improvements:

(1) KeeA* allocates the number of sampled nodes from each cluster in proportion to the mean of the estimated $f$-values of their nodes, whereas SeeA* samples the same number of nodes from each cluster;

(2) KeeA* increases the probability of sampling nodes at shallower depths for exploration, instead of uniformly selecting candidates from the chosen cluster as in SeeA*.

Improvement (1) exploits the observation that, despite potential bias, the $f$-values estimated by a well-trained heuristic remain informative. Improvement (2) introduces a depth-based penalty to encourage exploration: given the exponential growth of the search tree, the number of nodes increases significantly at deeper levels, so favoring shallower nodes helps escape from over-explored branches. The effectiveness of both improvements depends on the degree of bias between the heuristic estimate $f$ and the true evaluation $f^*$. Therefore, there are two scenarios where KeeA* might underperform SeeA*:

Case (1): If the prediction quality of $f$ is very low, then the clusters with higher $f$ are more likely to have lower true returns $f^*$. In such cases, KeeA* may over-sample from suboptimal clusters, reducing the likelihood of selecting the optimal node and thereby impairing search efficiency.

Case (2): If the prediction quality of $f$ is sufficiently high, greedy expansion based on $f$-values becomes an effective strategy for identifying the node with the highest true return $f^*$. The depth-based penalty used to encourage exploration then introduces unnecessary node expansions, reducing the search efficiency of KeeA*.

Excluding the two aforementioned extreme cases, KeeA* leverages both exploitation of the heuristic function $f$ and exploration encouraged by the depth-based penalty, enhancing search efficiency under biased estimates by effectively balancing exploitation and exploration.
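The two mechanisms described in A1 can be sketched as a toy procedure. This is an illustrative assumption, not the paper's method: the function name, the proportional-allocation rule, and the geometric depth penalty `depth_penalty ** depth` are all hypothetical stand-ins for the paper's exact formulas involving $\alpha$ and $\beta$.

```python
import random

def allocate_and_sample(clusters, budget, depth_penalty=0.5, seed=0):
    """Toy sketch of two ideas (formulas hypothetical):
    (1) split the sampling budget across clusters in proportion to the
        mean estimated f-value of each cluster, and
    (2) within a cluster, weight candidates so shallower nodes are more
        likely to be drawn (depth-based penalty)."""
    rng = random.Random(seed)
    means = [sum(n["f"] for n in c) / len(c) for c in clusters]
    total = sum(means)
    picked = []
    for c, m in zip(clusters, means):
        k = max(1, round(budget * m / total))       # proportional allocation
        weights = [depth_penalty ** n["depth"] for n in c]  # favor shallow nodes
        picked += rng.choices(c, weights=weights, k=min(k, len(c)))  # with replacement
    return picked

# toy clusters of open nodes with heuristic value f and tree depth
clusters = [
    [{"f": 0.9, "depth": 1}, {"f": 0.8, "depth": 3}],
    [{"f": 0.3, "depth": 2}, {"f": 0.2, "depth": 5}],
]
sampled = allocate_and_sample(clusters, budget=6)
print(len(sampled), "candidates drawn")
```

In this toy run the higher-$f$ cluster receives most of the budget, while the depth weights bias the draw toward shallower nodes; this mirrors the failure cases above, since both effects help only when $f$ is informative but imperfect.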

最终决定

This paper introduces KeeA*, an A* search variant that incorporates heuristic knowledge into its sampling process. The authors provide a theoretical analysis arguing that SeeA* with cluster sampling outperforms uniform sampling. The KeeA* method is built on this foundation and proposes two techniques: cluster scouting, intended to increase the sampling mean, and path-aware sampling, intended to enhance the sampling variance. The claim is that these allow the algorithm to generate higher-quality candidate solutions. The paper supports this claim with empirical evaluations on retrosynthetic planning and logic synthesis, reporting superior performance compared to SeeA*, A*, and MCTS.