PaperHub
Overall score: 5.3 / 10 · Decision: Rejected · 4 reviewers
Ratings: 5, 5, 6, 5 (min 5, max 6, std 0.4)
Confidence: 3.5
ICLR 2024

G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks

Submitted: 2023-09-23 · Updated: 2024-02-11

Abstract

Keywords
SAT solving, Graph neural networks

Reviews and Discussion

Official Review
Rating: 5

Despite GNNs showing promise in SAT solving, a standard dataset and comparison benchmark have been lacking. The paper introduces G4SATBench, the first benchmark for evaluating GNNs on SAT solving. G4SATBench, comprising 7 problems with 3 difficulty tiers, assesses several GNN models across various prediction tasks, training objectives, and inference algorithms. Empirical evaluation using G4SATBench indicates that GNNs excel at greedy local search strategies but struggle with backtracking search in latent spaces.

Strengths

  1. The problem the paper is trying to address is important. Indeed, we should have a uniform way of evaluating the GNN-for-SAT approaches that have emerged in recent years.
  2. I appreciate the large number of experiments (consuming 8,000 GPU hours) and the amount of data the authors put into studying GNNs for SAT.
  3. The way the paper compares the training and solving processes of GNNs with both CDCL and local search SAT solvers is interesting to me.

Weaknesses

  1. G4SATBench contains only small-scale formulas with at most 400 variables. However, industrial-level SAT formulas and formulas in SAT competitions can easily have thousands or even millions of variables. To make the GNN-for-SAT approach practically applicable, the benchmarks used in the evaluation should also consider real-world cases, such as large SAT formulas. I would encourage the authors to include such cases in G4SATBench.

  2. The GNN models evaluated in the study include only basic ones: NeuroSAT (the first GNN-for-SAT model), GCN, GGNN, and GIN. It would be interesting to compare the GNN models used by recent GNN-for-SAT approaches and see how they perform in G4SATBench, rather than only considering basic GNN models.

  3. For the experiments comparing with CDCL heuristics, the authors apply contrastive pretraining so that the representations of CNF formulas are close to those of their augmentations with learned clauses. I have some doubts about this experimental setup. The learned clauses are not static and can vary depending on which CDCL solver is used, so studying whether a GNN can infer these learned clauses is not a sound way to study whether a GNN can mimic CDCL. Furthermore, even if a GNN can infer a set of learned clauses, can we say that it has learned a solving strategy akin to CDCL? As we know, CDCL involves not only clause learning but also variable branching, etc.

Questions

  1. Why are only small-scale SAT formulas included in the benchmark?
  2. Why are only basic GNN models evaluated in the paper?
  3. What is the rationale behind the comparison between GNNs and CDCL heuristics? Can you illustrate it in more detail?

Details of Ethics Concerns

N/A

Comment

Thank you for your thorough review of our paper. We value your insights and would like to address your questions as follows:

Why are only small-scale SAT formulas included in the benchmark?

We acknowledge the potential value of including industrial-scale SAT instances in G4SATBench. However, our decision to exclude these instances is based on several considerations:

  • Firstly, industrial SAT instances often exceed 1 million variables and clauses, which poses significant challenges for training GNN models effectively within GPU memory constraints. Furthermore, they are also limited in number (e.g., fewer than 1,000 industrial instances are available in SAT competitions), making it harder to train a good GNN model, as a diverse and ample dataset is critical for effective training.
  • Secondly, we find that small-sized industrial instances in SAT Competitions often exhibit a high degree of similarity in their clauses, particularly within the same domain. This limits the ability of GNN models trained and evaluated on these instances to demonstrate true generalization across varied SAT problems.
  • Thirdly, to partially address this, G4SATBench incorporates pseudo-industrial generators (CA and PS) to emulate some characteristics of industrial instances. This approach allows us to simulate the structural complexity of industrial scenarios while maintaining manageable dataset and instance sizes.
  • Lastly, our core objective is to establish a comprehensive framework for evaluating GNN-based SAT solvers and to investigate how and what (standalone) GNN models can learn in solving SAT. Our results show that GNN models can develop a novel solving heuristic akin to local search but fail to learn or mimic the CDCL heuristic. Consequently, for broader applicability to industrial instances, we suggest that future research could develop neural-guided SAT solvers that integrate the strengths of GNN models with modern CDCL-based approaches. We expect that our insights from G4SATBench will be instrumental in guiding such advancements.

Why are only basic GNN models evaluated in the paper?

We observe that many existing GNN architectures for SAT solving make only minor modifications to the original NeuroSAT [1,2], and the difference in performance between these architectures and NeuroSAT is marginal (experiments in G4SATBench also demonstrate that the performance of different GNN models is consistently similar). While there are indeed some novel GNN architectures for SAT solving [3,4], they are tailored to instances not in CNF and specialized to specific applications (like CircuitSAT) rather than general-purpose use. Due to this specialized nature, we've chosen to keep them outside the current scope of G4SATBench. Although using more advanced GNN models could achieve better performance, we believe our findings on the learning ability and limitations of GNN-based SAT solvers remain valid for these static GNN models, given their shared fundamental message-passing framework.

Comment

What is the rationale behind the comparison between GNNs and CDCL heuristics? Can you illustrate it in more detail?

To test whether GNN models can effectively learn the CDCL heuristic, we first train a GNN model on the original SAT instances in the training set (referred to as model 1) and then evaluate its performance on the (clause-learning) augmented instances in the testing set. Note that CDCL-based (or even backtracking-based) SAT solvers can easily solve these augmented instances in very few search steps (since the learned clauses effectively reduce the search space). Therefore, if GNNs were capable of learning the CDCL heuristic, we would expect them to demonstrate similarly high performance on these augmented instances. However, model 1 exhibits similar or, in many cases, inferior performance on the augmented instances compared to its performance on the original testing instances, suggesting that GNN models are not learning the CDCL heuristic.

Further, we train another GNN model specifically on the augmented SAT instances in the training set (model 2) and evaluate it on the augmented instances in the testing set. Experiments show that the performance of model 2 on the augmented testing set is notably better than that of model 1. We therefore hypothesize that if the GNN models could learn representations of the original instances that are close to those of their augmented counterparts, their performance might improve. To explore this possibility, we use contrastive pretraining strategies [5], trying to make GNN models embed the clause-learning heuristic in the latent space (i.e., learn a representation similar to that of the clause-learning-augmented instance). However, even with this approach, the GNN models faced challenges in effectively adopting the CDCL heuristic. This might be attributed to the inherent limitations of static GNN message passing, which struggles to adapt to the dynamic nature of clause learning in SAT solving, where additional nodes and edges dynamically modify the graph representation of an instance.
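For readers unfamiliar with this kind of pretraining, the sketch below shows one plausible form of the contrastive objective: an NT-Xent-style loss [5] that pulls the embedding of each original formula toward the embedding of its clause-learning-augmented counterpart within a batch. The loss form, the temperature value, and the `encoder` interface are illustrative assumptions, not the exact implementation used in the paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_orig: torch.Tensor, z_aug: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Contrastive loss between graph-level embeddings of original formulas (z_orig)
    and their clause-learning-augmented versions (z_aug). Matching rows are positives;
    every other pair in the batch serves as a negative."""
    z_orig = F.normalize(z_orig, dim=-1)
    z_aug = F.normalize(z_aug, dim=-1)
    logits = z_orig @ z_aug.t() / tau                       # (B, B) cosine similarities
    targets = torch.arange(z_orig.size(0), device=logits.device)
    # Symmetrize over both matching directions (original -> augmented and back).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage: `encoder` is any GNN mapping a formula graph to a vector.
# z_orig = encoder(original_graphs)    # original CNF formulas
# z_aug  = encoder(augmented_graphs)   # same formulas with learned clauses appended
# nt_xent_loss(z_orig, z_aug).backward()
```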

[1] Cameron, Chris, et al. "Predicting propositional satisfiability via end-to-end learning." AAAI 2020.

[2] Ozolins, Emils, et al. "Goal-aware neural SAT solver." IJCNN 2022.

[3] Li, Min, et al. "DeepSAT: An EDA-driven learning framework for SAT." arXiv preprint 2022.

[4] Yan, Zhiyuan, et al. "Addressing Variable Dependency in GNN-based SAT Solving." arXiv preprint 2023.

[5] Duan, Haonan, et al. "Augment with care: Contrastive learning for combinatorial problems." ICML 2022.

Comment

Thank you for your clarification. For future work, I would suggest 1) expanding G4SATBench by adding more real-world SAT formulas from the industry with moderate sizes, 2) expanding the GNN models under study with more recent GNN models, and 3) developing more convincing methods to study GNN for CDCL heuristics. I would like to keep my current score.

Official Review
Rating: 5

This paper studies graph neural networks for the Boolean satisfiability problem (SAT). As claimed by the authors, previous research has paid little attention to datasets and benchmarks for evaluating and comparing existing approaches. In this work, the authors propose G4SATBench, which accounts for those aspects ignored by previous research. The authors claim that their proposed G4SATBench is the first benchmark study that establishes a comprehensive evaluation framework for GNN-based SAT solvers.

Strengths

  1. This paper focuses on benchmarking, which is foundational work for GNN-based SAT solvers.
  2. G4SATBench contains a diverse set of instances covering a sufficient range of domains and difficulty levels.
  3. Publicly available source code is provided.

Weaknesses

I have several comments regarding this submission.

  1. As described in this submission, the G4SATBench benchmark consists of 3 instance families: random, pseudo-industrial, and combinatorial instances. However, SAT is so fascinating because of its great importance in real-world industrial applications. For a top-tier conference submission that focuses on proposing a SAT benchmark, it is necessary to include industrial SAT instances. Hence, the lack of industrial SAT instances degrades the significance of this work.

  2. It seems that the technical merit of this work is a bit thin. This paper describes a SAT benchmark rather than proposing new SAT-solving techniques. The authors also claim that an insight obtained from this paper is that GNN models can effectively learn a simple greedy local search strategy (such as the GSAT algorithm) but have difficulty learning a backtracking search strategy (such as DPLL and CDCL). Since it is well known that CDCL-based SAT solvers greatly outperform local search SAT solvers on application SAT instances, the insight of this work seems to argue that GNN-based SAT solvers are not promising for solving application SAT instances, which also degrades the significance of this work.

Questions

Please see my comments in the Weaknesses section.

Details of Ethics Concerns

I have no ethical concerns about this submission.

Comment

Thank you for the feedback and comments! In the following, we hope to answer the concerns you have mentioned:

The lack of industrial SAT instances degrades the significance of this work.

We acknowledge the potential value of including industrial-scale SAT instances in G4SATBench. However, our decision to exclude these instances is based on several considerations:

  • Firstly, industrial SAT instances often exceed 1 million variables and clauses, which poses significant challenges for training GNN models effectively within GPU memory constraints. Furthermore, they are also limited in number (e.g., fewer than 1,000 industrial instances are available in SAT competitions), making it harder to train a good GNN model, as a diverse and ample dataset is critical for effective training.
  • Secondly, we find that small-sized industrial instances in SAT Competitions often exhibit a high degree of similarity in their clauses, particularly within the same domain. This limits the ability of GNN models trained and evaluated on these instances to demonstrate true generalization across varied SAT problems.
  • Thirdly, to partially address this, G4SATBench incorporates pseudo-industrial generators (CA and PS) to emulate some characteristics of industrial instances. This approach allows us to simulate the structural complexity of industrial scenarios while maintaining manageable dataset and instance sizes.
  • Lastly, our core objective is to establish a comprehensive framework for evaluating GNN-based SAT solvers and to investigate how and what (standalone) GNN models can learn in solving SAT. Our results show that GNN models can develop a novel solving heuristic akin to local search but fail to learn or mimic the CDCL heuristic. Consequently, for broader applicability to industrial instances, we suggest that future research could develop neural-guided SAT solvers that integrate the strengths of GNN models with modern CDCL-based approaches. We expect that our insights from G4SATBench will be instrumental in guiding such advancements.

The insight of this work seems to argue that GNN-based SAT solvers are not promising to solve application SAT instances, which also degrades the significance of this work.

One of our key contributions is to shed light on the capabilities and limitations of (standalone) GNN-based SAT solvers. While existing studies have mainly focused on developing various GNN architectures and methods to enhance solver performance, our work uniquely delves into understanding how GNNs learn to solve SAT problems. We acknowledge that CDCL-based SAT solvers currently outperform local search SAT solvers in most practical SAT applications. Given this, in Appendix E we highlight the potential for future research to build neural-guided SAT solvers by integrating GNN models with modern CDCL-based solvers. For instance, GNN models could be employed as advanced local search heuristics, activated during the restart phases of CDCL-based SAT solvers. Alternatively, leveraging the strong performance of GNN models in predicting unsat-core variables, as evidenced in G4SATBench, we could use these predictions to inform the branching decisions of CDCL-based solvers, thereby enhancing their efficiency. The goal of G4SATBench is not necessarily to develop a state-of-the-art SAT solver but rather to offer valuable insights into the strategies GNN models adopt for SAT solving. We hope these insights can lay the groundwork for future innovations and hybrid approaches in this area.

Comment

Thanks very much for your response! However, as a top-tier conference submission that describes a benchmark on SAT instances, it is desirable to include industrial SAT instances. Also, in the community of SAT solving, CDCL has become the mainstream method for about two decades, so it is interesting to improve the performance of CDCL-based SAT solvers. As a result, I have to maintain my score.

Official Review
Rating: 6

The authors present a comprehensive benchmark for the development of Graph Neural Network (GNN)-based approaches for satisfiability (SAT) solving. They compile all relevant work in the field into a unified benchmark, facilitating direct comparison of various methods. This benchmark is poised to be a pivotal resource, bridging the research efforts in deep learning and SAT solving. The paper also offers an extensive empirical analysis of different GNN architectures, providing insights into their performance across the benchmark's datasets, and highlighting certain limitations in current methodologies.

Strengths

Consolidation of GNN-Based SAT Solving: The paper's primary strength lies in amalgamating all existing GNN-based SAT-solving methods into a single codebase and publication. This centralization is an invaluable contribution to the field and sets a firm foundation for subsequent research.

Innovative Comparison Techniques: The authors introduce inventive strategies for comparing GNN-based SAT solvers with traditional heuristics. These comparisons are crucial for contextualizing the performance of the proposed methods within the broader landscape of SAT-solving techniques.

Weaknesses

Benchmarking Against Traditional Heuristics: Despite the thorough comparison within GNN-based methods, the benchmarking against traditional SAT-solving heuristics remains unclear. For this study to significantly impact SAT-solving, a more explicit comparison using time-to-solve metrics (preferably both arithmetic and geometric means, in accordance with standard practices) with leading industrial solvers or mainstream heuristics is needed.

Gap Analysis: There is a lack of detailed discussion on where GNN-based SAT solvers fall short when compared to state-of-the-art (SOTA) SAT solvers. The authors should elaborate on the deficiencies of GNN-based methods and propose what advancements are necessary to surpass traditional SOTA SAT solvers.

Addressing Gaps: While the paper stands as a considerable contribution to the field, I think it could have been strengthened with novel contributions that propose solutions to the gaps identified in the work, e.g., a proposal to improve generalization performance.

Questions

  1. There appears to be a typographical error in the paragraph following Equation 1. Aggr(a)gation → Aggr(e)gation

  2. In Section 4.3, "Training Objectives," the relationship between S(x), S_min, and S_max could benefit from clarification. Additionally, the use of N(a) within this context is ambiguous.

Comment

Thank you for your positive review and valuable feedback! In the following, please let us address the concerns and questions you've pointed out.

Benchmarking Against Traditional Heuristics

The primary objective of G4SATBench is to establish a comprehensive framework for evaluating GNN-based approaches to SAT solving and to shed light on the learning capabilities and strategies of GNNs in SAT solving. Accordingly, our current framework does not include direct time-to-solve comparisons with modern SAT solvers. To enable such comparisons, we have conducted additional experiments using the local search solver Sparrow and the CDCL solver CaDiCaL on the instances in G4SATBench. Each experiment is repeated three times with different random seeds, and the mean results are reported.

We first set Sparrow to output the same number of solutions (i.e., 32) as NeuroSAT in our G4SATBench experiments. The following tables show the solving accuracy and running time of Sparrow:

Solving accuracy of Sparrow (%):

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 56.03 | 52.09 | 85.48 | 77.25 | 53.68 | 37.15 | 31.61
Medium | 1.64 | 1.85 | 12.68 | 10.72 | 12.99 | 1.27 | 0.53

Running time of Sparrow (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.005 | 0.005 | 0.006 | 0.006 | 0.005 | 0.005 | 0.005
Medium | 0.008 | 0.007 | 0.006 | 0.006 | 0.006 | 0.007 | 0.006

We then allow Sparrow and CaDiCaL to run without constraints to fully solve the satisfiable instances in G4SATBench and report the running time of each solver:

Running time of Sparrow (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.007 | 0.007 | 0.008 | 0.008 | 0.007 | 0.009 | 0.009
Medium | 0.013 | 0.013 | 0.010 | 0.013 | 0.013 | 0.012 | 0.011

Running time of CaDiCaL (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.013 | 0.013 | 0.012 | 0.011 | 0.014 | 0.012 | 0.011
Medium | 0.014 | 0.043 | 0.016 | 0.018 | 0.015 | 0.013 | 0.012

It's important to note that both CaDiCaL and Sparrow are highly optimized SAT solvers with advanced solving heuristics and engineering features. As a result, current standalone GNN-based SAT solvers face challenges competing with them. However, our findings indicate that when constrained to output the same number of solutions, GNN-based heuristics perform more effectively than modern local search solvers like Sparrow. Yet, when such a constraint is removed, as stated in our experiment section, GNN models may struggle to explore the vast solution space as efficiently as Sparrow and CaDiCaL.
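For completeness, wall-clock measurements of this kind can be gathered with a simple harness along the following lines. This is only a sketch under our own assumptions: the solver binary is invoked with a single DIMACS file as its argument (as CaDiCaL accepts; Sparrow may require additional options such as a seed), and the directory layout and binary names are placeholders.

```python
import subprocess
import time
from pathlib import Path
from statistics import mean

def avg_solve_time(solver_cmd: list[str], cnf_dir: str, timeout: float = 60.0) -> float:
    """Average wall-clock seconds a SAT solver spends per DIMACS instance.

    `solver_cmd` is the base command, e.g. ["cadical"]; each .cnf path is appended to it.
    """
    times = []
    for cnf in sorted(Path(cnf_dir).glob("*.cnf")):
        start = time.perf_counter()
        # SAT solvers signal SAT/UNSAT via non-zero exit codes, so no check=True here.
        subprocess.run(solver_cmd + [str(cnf)], capture_output=True, timeout=timeout)
        times.append(time.perf_counter() - start)
    return mean(times)

# Hypothetical usage (paths and binary names are placeholders):
# print(avg_solve_time(["cadical"], "g4satbench/medium/sr/test"))
```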

Gap Analysis & Addressing Gaps

Our experimental results reveal that standalone GNN-based SAT solvers can develop a novel solving heuristic similar to local search. However, they currently fall short in emulating the CDCL heuristic, primarily due to the static nature of their message-passing framework that struggles to encapsulate the dynamic clause learning process inherent in CDCL. This limitation becomes evident when comparing their performance with CDCL-based solvers, particularly on large-scale industrial SAT instances where CDCL solvers excel.

To bridge this gap, as outlined in Appendix E, one avenue to enhance standalone GNN-based SAT solvers is to leverage dynamic GNN architectures to learn the CDCL heuristic within the latent space. Furthermore, our experiments demonstrate that GNN models excel in unsat-core variable prediction, pointing to another promising path toward achieving a state-of-the-art solver. This could be achieved by integrating GNN models with modern CDCL-based SAT solvers, combining their strengths for more effective SAT solving. For instance, GNN models could be employed initially to predict unsat-core variables, guiding CDCL-based SAT solvers to prioritize these variables in their branching strategies. Alternatively, GNN-based SAT solvers could be viewed as novel local search heuristics, complementing CDCL solvers in tackling SAT instances that are challenging for the CDCL heuristic alone.

Comment

There appears to be a typographical error in the paragraph following Equation 1. Aggr(a)gation → Aggr(e)gation

We thank you for pointing out our typographical error. This has been corrected to "Aggregation" in the revision.

In Section 4.3, "Training Objectives," the relationship between S(x), S_min, and S_max could benefit from clarification. Additionally, the use of N(a) within this context is ambiguous.

$S_{\max}$, $S_{\min}$, and $N$ are smoothed approximations of the logical operators OR ($\lor$), AND ($\land$), and NOT ($\neg$), respectively. Given a CNF formula $\phi$ and a predicted assignment $x$, we employ these smoothed functions to replace the standard logical operators in $\phi$ and substitute the variables in $\phi$ with the corresponding values from $x$ to calculate its satisfiability score $S_{\phi}(x)$. We have provided a clearer explanation of this part in the revision.
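To make this concrete, below is a minimal sketch of how such a smoothed satisfiability score can be evaluated on a CNF formula. The specific forms of $S_{\max}$ and $S_{\min}$ (temperature-weighted softmax/softmin averages), the choice $N(a) = 1 - a$, and the temperature value are illustrative assumptions rather than the exact definitions used in G4SATBench.

```python
import torch

def s_max(vals: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Smooth surrogate for OR: a softmax-weighted average of its inputs."""
    w = torch.softmax(vals / tau, dim=-1)
    return (w * vals).sum(dim=-1)

def s_min(vals: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Smooth surrogate for AND: a softmin-weighted average of its inputs."""
    w = torch.softmax(-vals / tau, dim=-1)
    return (w * vals).sum(dim=-1)

def negate(a: torch.Tensor) -> torch.Tensor:
    """Smooth surrogate for NOT."""
    return 1.0 - a

def satisfiability_score(cnf: list[list[int]], x: torch.Tensor) -> torch.Tensor:
    """S_phi(x) for a CNF formula given soft assignments x in [0, 1]^n.

    `cnf` uses DIMACS-style literals: the clause [1, -3] means (v1 OR NOT v3).
    """
    clause_scores = []
    for clause in cnf:
        lits = torch.stack([
            x[abs(l) - 1] if l > 0 else negate(x[abs(l) - 1]) for l in clause
        ])
        clause_scores.append(s_max(lits))       # OR over the literals of a clause
    return s_min(torch.stack(clause_scores))    # AND over all clauses

# Example: phi = (v1 OR NOT v2) AND (v2 OR v3) with a soft assignment x.
x = torch.tensor([0.9, 0.2, 0.7])
print(satisfiability_score([[1, -2], [2, 3]], x))  # close to 1 => likely satisfied
```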

Comment

Thank you for the detailed clarification. I have no further questions.

I believe that incorporating this detailed discussion can significantly enhance the comprehension and impact of your work. Additionally, it might be beneficial to explore the solutions to the gaps you've identified in your research. Such an inclusion could further bolster the technical contribution of your manuscript.

Official Review
Rating: 5

The paper presents G4SATBench, the first benchmark study framework for GNN-based SAT solvers. The experiments provide insights and valuable comparisons among some current GNN-based SAT solvers.

Strengths

Originality: The new framework G4SATBench provides a platform for comprehensive comparisons of existing GNN-based SAT solvers.

Quality: The overall quality is good. The authors performed substantial experiments and comparisons with existing solvers on several testing datasets.

Clarity: The paper is generally well-written, and the overall clarity is good.

Significance: The experiments were substantial and clearly showed the advantage of the framework for conducting experiments on GNN-based SAT solvers. It serves as a platform for insightful comparisons, which is helpful for understanding the limitations and advantages of current GNN-based SAT solvers. However, I would like to see more experiments and comparisons on the dataset published with NeuroSAT and GGNN. I would also like to see the running time of these solvers in Tables 1, 2, and 3.

Weaknesses

G4SATBench integrates the methodologies and evaluation metrics of existing GNN-based SAT solvers, which is very useful for reviewing current GNN-based SAT solvers. However, I could not find a significant contribution or innovation toward the state of the art in SAT solvers.

Questions

Can you provide the running time of these solvers in Tables 1, 2, and 3? Can you provide your definition of how you categorize easy versus medium datasets?

Comment

Thank you for your feedback on our paper! We hope to address your concerns and questions about our paper as follows:

I would like to see more experiments and comparisons on the dataset published with NeuroSAT and GGNN.

The original NeuroSAT paper [1] trained its model on SR(U(10, 40)) instances (i.e., 10 to 40 variables in a SAT instance) and primarily evaluated it on SR(40) instances. While it also explored smaller graph combinatorial problems (around 10 nodes in an instance), these evaluations were somewhat limited in scope and depth (they only provided a rough description of the results). In G4SATBench, we offer a richer set of benchmarks with 7 different problems across 3 difficulty levels. Our benchmarks not only cover the original NeuroSAT datasets but also significantly extend them, enabling a more thorough assessment of GNN-based SAT solvers.

However, I could not find the significant contribution and innovation towards the state-of-the-art SAT solvers.

We hope that G4SATBench could serve as the first comprehensive framework for GNN-based SAT solving. Toward this goal, we provide a unified interface and establish a comprehensive benchmark for evaluation. Beyond the datasets and evaluation, we further explore how and what GNN models can learn to solve SAT and unveil some critical insights previously unexplored in this field:

  • For satisfying assignment prediction, using unsupervised training could yield better solving accuracy than supervised learning, especially in scenarios with multiple solutions.
  • Representation learning (e.g., contrastive learning/learnable node features) has limited influence on enhancing the performance of GNN-based SAT solvers.
  • Standalone GNN-based SAT solvers struggle to learn or mimic the CDCL heuristics in the latent space.
  • Standalone GNN-based SAT solvers could learn a solving heuristic akin to the greedy local search to solve SAT when trained for satisfiability prediction or assignment prediction.

While our current focus is not on developing a state-of-the-art (GNN-based) SAT solver, our work provides valuable insights into how GNNs can approach SAT solving and what heuristics they can learn. As stated in Appendix E, to develop a state-of-the-art SAT solver, we could leverage these insights to combine GNN models with a modern SAT solver. For instance, GNN models could be utilized as local search heuristics to assist CDCL-based SAT solvers. Or they could predict the unsat-core variables to inform the branching decisions of CDCL-based solvers, thereby enhancing their efficiency. We hope our work could pave the way for future developments in this area and inspire further advancements to develop neural-guided SAT solvers.

Can you provide the running time of these solvers in Tables 1, 2, and 3?

A key advantage of employing GNNs in SAT solving is their capability to process multiple instances simultaneously in parallel on a single GPU, greatly enhancing computational efficiency. In our experiments, we set the batch size to 512 for testing and calculate the average running time per instance by dividing the total time by the number of instances. We also find that various GNN architectures exhibit very similar running times. Therefore, for clarity and conciseness, we present here the running times of NeuroSAT for the three tasks in G4SATBench:

Running time for satisfiability prediction (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.002 | 0.002 | 0.003 | 0.002 | 0.003 | 0.003 | 0.002
Medium | 0.007 | 0.004 | 0.010 | 0.012 | 0.010 | 0.007 | 0.008

Running time for satisfying assignment prediction (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.002 | 0.002 | 0.002 | 0.002 | 0.003 | 0.002 | 0.003
Medium | 0.008 | 0.006 | 0.009 | 0.010 | 0.009 | 0.006 | 0.007

Running time for unsat-core variable prediction (seconds per instance)

Difficulty | SR | 3-SAT | CA | PS | K-Clique | K-Domset | K-Vercov
Easy | 0.002 | 0.002 | 0.002 | 0.003 | 0.002 | 0.003 | 0.003
Medium | 0.004 | 0.003 | 0.007 | 0.005 | 0.008 | 0.007 | 0.009

Note that using more advanced GPUs (e.g., NVIDIA A100) and a larger batch size can further reduce the running times of GNN models, thereby improving efficiency.
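As a rough illustration of how such per-instance times can be obtained with batched inference, the sketch below times a GNN on GPU-batched graphs in the style of PyTorch Geometric. The data loader, the model interface, and the synchronization points are assumptions for illustration, not the exact G4SATBench code.

```python
import time
import torch
from torch_geometric.loader import DataLoader

@torch.no_grad()
def gnn_seconds_per_instance(model, dataset, batch_size: int = 512, device: str = "cuda") -> float:
    """Average GNN inference time per instance using batched GPU evaluation."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model = model.to(device).eval()
    total_time, num_instances = 0.0, 0
    for batch in loader:
        batch = batch.to(device)
        torch.cuda.synchronize()                 # exclude previously queued GPU work
        start = time.perf_counter()
        model(batch)                             # e.g., satisfiability prediction
        torch.cuda.synchronize()                 # wait for this batch to finish
        total_time += time.perf_counter() - start
        num_instances += batch.num_graphs
    return total_time / num_instances
```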

Comment

Can you provide your definition of how you categorize easy versus medium datasets?

We categorize the difficulty levels roughly by the numbers of variables and clauses. For example, the easy SR dataset contains SAT instances with 10-40 variables, while the medium SR dataset includes instances with 40-200 variables. Detailed statistics of our datasets are provided in Appendix A.

[1] Selsam, Daniel, et al. "Learning a SAT solver from single-bit supervision." ICLR 2019.

Comment

Thank you authors for addressing the questions. I do not have any further questions.

Comment

We genuinely appreciate all the reviewers for their valuable feedback on our paper. Should there be any more questions or points of clarification needed, we are keen to engage in further discussions!

AC Meta-Review

Summary: The paper proposes a dataset of SAT instances, which is used to benchmark both traditional and GNN-based solvers.

  • The paper is clearly written.
  • Source code is provided and the benchmark will be public, which would help other researchers in the field replicate the results and test new ideas.
  • The dataset is lacking in some important respects (e.g. real-world instances).
  • The analysis of the results is lacking.
  • Some of the more recent neural solvers are missing.

Why not a higher score

All the reviewers agree that there is clear merit in this line of work. However, the benchmarking effort is lacking in some important respects (see weaknesses above).

The authors are encouraged to take into account the detailed reviews when preparing the paper for resubmission to a top-tier ML venue.

Why not a lower score

N/A

Final Decision

Reject