Graph-based Symbolic Regression with Invariance and Constraint Encoding
Graph-based symbolic regression that captures expression equivalences on graph representations and incorporates constrained search using hybrid neural-guided Monte Carlo tree search.
Abstract
Reviews and Discussion
This work tackles the challenging problem of symbolic regression by introducing novel methods to learn more effective representations of mathematical equations and improve reinforcement learning (RL) signal design.
The key contributions of the paper are twofold:
- Expression Graphs (EG) Representation and R-GCN-Based Encoding: The authors propose a graph-based representation of equations—referred to as Expression Graphs (EG)—which captures structural and semantic relationships within mathematical expressions. To encode these graphs into meaningful representations, they employ R-GCNs, enabling the model to better understand and generalize from complex equation structures.
- hnMCTS for Reward Densification in RL: The second contribution introduces hnMCTS. This component is used to densify sparse reward signals in the RL framework, allowing the agent to evaluate and learn from partial equation expansions rather than relying solely on complete solutions.
The paper includes a series of experimental evaluations and ablation studies that support the effectiveness of the proposed approaches.
Strengths and Weaknesses
Strengths:
- [Significance] The paper is well-motivated and tackles two rather significant issues in symbolic regression.
- [Novelty of EG] The proposal of using EG to represent equations is novel.
- [Quality of experiments] The author conducted experiments and relevant ablation studies to verify the various components of the algorithm.
Weaknesses:
- [Clarity] The mathematical formulation is somewhat unclear. For example, one quantity was initially defined as a "prior probability matrix" on line 130 but was redefined as a predicted output of the R-GCN on line 132; its dimensionality is also unclear. As another example, the actions are not defined in the text, so it is unclear how the authors parameterized the policy. There are many other examples of ambiguous notation in the manuscript. The unclear formulation poses challenges to thoroughly understanding the method.
- [Novelty of hnMCTS] The proposed hnMCTS seems to be a simple combination of two well-known algorithms (MCTS and neural-guided MCTS). While the motivation and effectiveness of hnMCTS are clear, the method is lacking in novelty.
- [Limitation of EG] The proposed EG method can only account for a limited set of invariance relations in equations (i.e., commutative equivalences). In practice, there exist other modalities that lead to two equations being equivalent (e.g., distributive laws and other common algebraic equalities). It would be beneficial to analyze the amount of invariance EG can resolve versus the amount of residual invariance.
- [Limitation of sequential generation] In the MDP formulation, the EG is generated sequentially. For the same EG, the generations will be different if the nodes are traversed in different orders. In other words, although EG captures the invariance in equations, the sequential traversals on EG might sacrifice efficiency again (thereby unnecessarily complicating the search space). The authors should clarify and discuss this aspect.
- [Incorporating equation-level constraints] The author mentioned that the new method allows incorporation of equation-level constraints without seeing a completed equation (line 92). However, the exact way to incorporate such constraints is not clearly stated.
Questions
Please refer to weaknesses 3, 4, and 5.
Limitations
yes
Final Justification
The rebuttal partially addresses my concerns. Points 1, 4, and 5 in the Weaknesses need further elaboration.
Formatting Concerns
No concerns of formatting.
We sincerely thank the reviewer for the constructive feedback and insightful questions. Below, we address each point in turn to clarify the framework and resolve any ambiguities.
1) Clarity in Mathematical Formulations
We appreciate the reviewer’s comments on notation clarity and will revise Section 3 for greater precision. Specifically:
- The prior probability matrix (line 130) has one row per node in the EG and one column per operator in the dictionary, i.e., its dimensions are (number of nodes) × (size of the operator dictionary); each entry gives the prior probability of adding that operator at that node. Line 132 clarifies that the prior matrix and the scalar reward estimate are predicted by the R-GCN-based policy (Eq. 1), with the two outputs produced by separate MLPs on the R-GCN embeddings (Eq. 2, Appendix C.3). This is consistent: the initial mention introduces the concept, and the follow-up specifies the neural prediction.
- Actions are defined in the MDP setting (Sec. 3.2): at step $t$, the action $a_t$ is the operator added to the current EG state $s_t$, growing the graph deterministically (e.g., adding a child node/edge). The policy parameterizes the prior distribution over $a_t$ through the prior term in PUCB (Eq. 3). We will add explicit definitions and parameterizations in the revision to eliminate ambiguity; a minimal illustrative sketch follows below.
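To make the role of the prior concrete, here is a minimal, hypothetical sketch of how a per-node, per-operator prior can enter a PUCB/PUCT-style selection rule. The data structure, the helper name `puct_select`, and the constant `c_puct` are our illustrative assumptions, not the paper's implementation.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the action maximizing Q + U, where U scales the R-GCN prior
    by the parent's visit count (a PUCB/PUCT-style exploration bonus)."""
    total_visits = sum(ch["visits"] for ch in children.values())
    best_action, best_score = None, -float("inf")
    for action, ch in children.items():
        # Mean value of this child so far (0 if never visited)
        q = ch["value_sum"] / ch["visits"] if ch["visits"] > 0 else 0.0
        # ch["prior"] is the prior-matrix entry for (expansion node, operator)
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action
```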
2) Novelty of hnMCTS
While hnMCTS builds upon naive and neural-guided MCTS [8, 13, 18], its novelty lies in its adaptive $\epsilon$-policy (Eq. 3), which enables a hybrid integration of exploration and neural guidance. This design allows early random exploration, followed by a smooth transition to neural guidance as the R-GCN stabilizes. Expression-level constraints are directly encoded via the predicted priors, influencing the sampling policy during search. Furthermore, we provide convergence guarantees under this scheme (Appendix C.5, Theorems C.1–C.6).
This is not a simple combination of components: the $\epsilon$-policy decay (Appendix C.6) is specifically introduced to mitigate sparse rewards, a known challenge in symbolic regression. This mechanism improves sample efficiency (Table 1) and enhances the balance between exploration and exploitation (Appendix C.5.2, D.3).
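For illustration only (a sketch under our own assumptions, not the code of the paper), such a hybrid policy can interpolate between uniformly random rollouts and prior-guided PUCB selection with a decaying mixing coefficient; `eps0`, the exponential schedule, and the reuse of `puct_select` from the sketch above are all assumptions.

```python
import math
import random

def hybrid_select(children, step, eps0=1.0, decay=5e-4, c_puct=1.5):
    """Hybrid neural-guided selection: with probability eps (decaying with
    the training step) pick a uniformly random action; otherwise follow the
    prior-guided PUCB rule (see puct_select in the sketch above)."""
    eps = eps0 * math.exp(-decay * step)  # assumed exponential decay schedule
    if random.random() < eps:
        return random.choice(list(children.keys()))
    return puct_select(children, c_puct=c_puct)
```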
3) Limitation of EG Invariance
EG captures a broad class of equivalences beyond commutativity, including associative, distributive, identity, exponential, and trigonometric rules (Table 3, Appendix C.2). These are modeled through a convergent term rewriting system (TRS) that is both terminating and confluent, yielding unique normal forms. EG encodes these canonical forms as DAGs. We provide a detailed proof of this uniqueness property, quantify its impact on search efficiency, and discuss remaining limitations in our response to Reviewer RJ7g (Section 1), which we refer to for completeness.
4) Sequential EG Generation
We thank the reviewer for raising this insightful question. We would like to clarify that the sequential generation scheme remains efficient. While different traversal orders can construct the same EG, they typically pass through distinct intermediate expressions that are not mathematically equivalent. These paths therefore should not be merged and do not produce redundant representations: the final EG is only one of several possible descendants of each such intermediate. The permutation invariance encoded by EG applies only to equivalent expressions, so distinct intermediate expressions are not redundant cases that inflate the state space.
5) Incorporating Expression-Level Constraints
Expression-level constraints are incorporated proactively via R-GCN-predicted priors and scalar validity scores, which leverage structural features (e.g., node/edge features) to estimate physical validity (e.g., favoring $f(r \to 0) = \infty$ near singularities; Eq. 5, Appendix C.3). These priors guide hnMCTS through the prior term in PUCB (Eq. 3), biasing the search toward constraint-satisfying expressions early.
Unlike post-hoc filtering, this inductive bias reduces invalid rollouts and densifies the reward landscape, even though full evaluations are only performed at leaf nodes. We will clarify this mechanism in Sec. 3.1 and provide a concrete example in the final version.
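A hypothetical sketch of this proactive biasing (our own illustration, not the paper's Eq. 5): a predicted validity score can down-weight operator priors before they enter the PUCB rule, so expansions likely to violate an expression-level constraint are rarely rolled out. The function name and the multiplicative combination are assumptions.

```python
import numpy as np

def constrained_priors(priors, validity, floor=1e-6):
    """Re-weight operator priors by a predicted validity score in [0, 1]
    (e.g., the likelihood of eventually satisfying f(r -> 0) = inf),
    then renormalize so the result remains a probability distribution."""
    weighted = priors * np.maximum(validity, floor)
    return weighted / weighted.sum(axis=-1, keepdims=True)

# Usage: priors over 4 operators at one node; the third is deemed likely invalid
priors = np.array([0.4, 0.3, 0.2, 0.1])
validity = np.array([1.0, 1.0, 0.05, 1.0])
print(constrained_priors(priors, validity))  # mass shifts away from operator 3
```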
We thank the reviewer again for the detailed and thoughtful feedback. We believe these clarifications address the raised concerns and will further enhance the presentation in the revised manuscript.
Best regards,
The Authors
Thanks for the response. I have updated the score. I still have some lingering concerns over points 1, 4, and 5.
We sincerely thank Reviewer ytHy for the thoughtful engagement throughout the review process and for the updated score. We understand that certain points—particularly regarding formulation clarity, sequential generation, and constraint incorporation—may still benefit from further explanation. In the camera-ready version, we will strengthen these areas as follows:
- Formulation clarity (Point 1): We will include inline examples demonstrating how the prior matrix is applied within PUCB (Eq. 3), and provide a step-by-step breakdown of the action-sampling procedure, ensuring that the notation on lines 130–132 is fully self-contained and consistent.
- Sequential EG generation (Point 4): We will clarify in the Appendix that invariance is enforced at the expression level via canonicalization of the EG, not at the path level. A concrete EG example will show how different traversal orders pass through semantically different intermediates, which are inherent to exploratory search and do not create redundant equivalents.
- Expression-level constraints (Point 5): In Section 3.2, we will detail how R-GCN messages produce priors via Eq. 5 to estimate validity, how these priors bias hnMCTS through Eq. 3, and we will quantify the resulting sparsity reduction (Fig. 6a).
We thank Reviewer ytHy again for the invaluable feedback, which has guided us in improving the precision and accessibility of our presentation, and we are grateful for the opportunity to refine our work accordingly.
Best regards,
The Authors
The paper claims two improvements to symbolic regression: first a new DAG representation which removes the well-known permutation problem of expression trees. Second, use of a graph neural network to propose Monte Carlo moves. The GNN proposals are used as part of a policy which can also draw conventional random proposals. The GNN takes only encodings of the expression graph, and does not look at the data. Acceptances depend on the reward, ie the fit of the numerically-optimised tree to the data. An RL algorithm receives and propagates rewards to states, where rewards are the fit from model to data. Numerical parameters are fit with standard BFGS for each expression (which also handles constraints), and a form of chunking type memory is used in "trans" to store small useful tree fragments which then get their own special syntax to insert into new expressions. The method beats state of the art methods including AI-Feynman, Operon and DGSR on a mixture of two standard datasets for scientific discovery, Feynman and blackbox, measured by solution rate; and beats them all except AIFeynman (which appears to give a NaN score) measured by R^2; and beats a smaller collection of models on a new dataset.
Strengths and Weaknesses
The main paper does not contain enough detail to reproduce the work, though this is provided in a very long appendix which is effectively the real paper. (I'm not sure what NeurIPS policy is on this; many papers now seem to be basically ignoring page limits via this approach, so they look more like journal papers but split between the two documents. AFAIK appendices are supposed to be for optional, nice-to-have extensions rather than for the main algorithmic content?)
Likewise the lit review in the main paper is very short, but the appendix does contain the full one, where it is good to see the history of SR traced back to Koza's GP in the 1990s (though not Simon's original BACON from the 1970s, which should maybe get a mention?), through to recent relevant DNN-based systems like AI-Feynman.
Convergence proofs are provided in the appendix.
I'm struggling to understand the part of the algorithm for making the neural network expression proposals. I can see the expression graphs being encoded with graph NNs. I can see discussion of an MDP and rewards (i.e. the expression, when numerically optimised, being a good fit to the data, appendix eqn 6), and Q-values. There is GNN message passing in appendix eqn 5. From my reading, the idea appears to be to encode the expression trees via a GNN, then use this encoding as state in a reinforcement learning algorithm which selects actions that modify the graph (via encoding representations). I'm struggling to understand why the proposal method doesn't look at the data -- only the current state of the graph -- and what gets learned by doing so? I can see the learning makes use of the data via the rewards, so it looks like the network is learning heuristics which say "if the current graph has these kinds of features, then try modifying it in these ways".
Questions
The conversion from expression tree to expression graph looks similar to moves made by optimizing compilers, could you comment on any links here? Is there a standard compiler theory for how to do this optimally? How different is the proposed method from such standards?
Does the data play any role in generating the expression graph proposals, or are they proposed based only on the current expression graph? If the latter, why not also encode and look at the data when making proposals?
Is the learning done from scratch each time a new data set is presented for fitting, or it is a foundation style model where such proposals are learned for the general case then applied to each data set?
Why does AIFeynman have a NaN score for R^2? Could this be tweaked in any way to obtain an actual number (e.g. removing a few problematic items?) to make it more comparable with the new method?
Why is the new physics dataset only tested against 4 less well known models in Appendix Table 2, rather than the full set of state of the art models used in the Feynman+blackbox experiment? (maybe there is something about this dataset than means only models with some certain property can be run on it?)
Beating all of the state of the art methods on the standard data sets seems impressive. But it is sometimes possible to do this by developing new algorithms against the test data sets throughout their lives. Even if numerical parameters are not specifically optimised to fit the test set, different choices in the structure of the algorithm itself can be tried out and used to evolve the system to fit the test set. And this is a very complex algorithm which has many such decisions to be made. Can the authors confirm that this was not done during algorithm development, and that all development was done using different data sets from the test sets presented here?
Limitations
yes
Formatting Concerns
none
We appreciate Reviewer bn25's detailed comments. To help address the raised concerns and questions, we provide point-by-point responses below to clarify our method.
1) Links to Compiler Theory for ET-to-EG Conversion
The conversion from Expression Trees (ETs) to EGs shares similarities with compiler optimizations on Abstract Syntax Trees (ASTs). In compilers, expression ASTs are often transformed into Directed Acyclic Graphs (DAGs) to enable common subexpression elimination (CSE) and algebraic simplifications (e.g., flattening associative operations, constant folding). Our EG construction (Appendix C.2) mirrors this: node sharing for common feature nodes (the DAG property), unification of binary operators (e.g., +/− merged into a single node type distinguished by edge features), and rewriting for equivalences (Table 3, such as distributive expansion). Standard compiler techniques include bottom-up DAG construction during parsing (e.g., in GCC/LLVM intermediate representations) and term rewriting for optimizations. However, EGs extend this with R-GCN-compatible features (e.g., operand-relation edge types) tailored for SR invariance. While no exact "standard" exists for SR-specific equivalences, our rewrite rules (the TRS defined in our response to Reviewer RJ7g, Section 1, which yields unique EG representations) align with algebraic rewriting in computer algebra systems (e.g., SymPy). We will add these connections to Appendix B in the revision.
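As a hypothetical illustration of the compiler-style analogy (not the paper's actual EG construction), common-subexpression elimination on an expression tree can be done by hash-consing: each node is keyed by its operator and canonically ordered child identities, so repeated or commutatively permuted subtrees map to a single shared DAG node. All names below are ours.

```python
def tree_to_dag(node, table):
    """Return an integer id for `node` in a shared-node DAG, built by
    hash-consing. `node` is a leaf string like 'x1' or a tuple
    (op, child, ...). Commutative operators get canonically ordered
    children so that x1 + x2 and x2 + x1 map to the same DAG node."""
    if isinstance(node, str):                       # variable / constant leaf
        key = ("leaf", node)
    else:
        op, *children = node
        child_ids = tuple(tree_to_dag(c, table) for c in children)
        if op in ("+", "*"):                        # commutative operators
            child_ids = tuple(sorted(child_ids))
        key = (op, child_ids)
    if key not in table:
        table[key] = len(table)                     # allocate a new shared node
    return table[key]

# Usage: the two commutatively permuted products collapse into one shared node
table = {}
root = tree_to_dag(("+", ("*", "x1", "x2"), ("*", "x2", "x1")), table)
```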
2) Literature Review and Appendix Usage
We thank Reviewer bn25 for the suggestion and will discuss Simon's BACON in the literature review for completeness. We agree that the main paper should be self-contained, and we aimed for that by including core descriptions and results in the main paper and providing optional expansions in the appendix. To improve clarity, we will move the key literature review and reproducibility details (e.g., experimental settings) into the main paper in the camera-ready version, given the extended page limit.
3) Clarification on Neural-Guided Proposals and the Data Role
Thanks for raising questions about the GNN proposal mechanism; we are glad to provide further clarification. The Expression Graph (a DAG) serves as the state of an MDP, which the Relational GCN (R-GCN) encodes to predict a prior probability matrix (operator-selection distribution per node) and a reward estimate (estimated utility; Eq. 1-2). These guide hnMCTS simulations (Algorithm 2), balancing exploration via PUCB (Eq. 3).
The network does not directly encode the data; instead, the data influences proposals indirectly through RL rewards obtained by evaluating each expression's fit to the data, which are backpropagated to update the network (Eq. 4). This indirect data influence (via rewards) allows learning general structural priors (e.g., favoring valid constraints) without overfitting to specific datasets. Encoding the data directly (e.g., via embeddings) could bias the search toward numerical patterns over symbolic ones, complicate generalization, and add computational overhead. Future work could explore carefully designed hybrid data-graph inputs for data-aware GNNs to balance computational cost against enhanced guidance.
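For readers less familiar with this setup, the following is a minimal sketch of what a graph-only policy network of this kind could look like, using PyTorch Geometric's RGCNConv; the layer sizes, pooling choice, and head structure are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv

class EGPolicy(nn.Module):
    """Encode an expression graph with relational graph convolutions, then
    predict per-node operator priors and a scalar value/reward estimate."""
    def __init__(self, in_dim, hidden, n_ops, n_relations):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden, num_relations=n_relations)
        self.conv2 = RGCNConv(hidden, hidden, num_relations=n_relations)
        self.prior_head = nn.Linear(hidden, n_ops)   # per-node operator logits
        self.value_head = nn.Linear(hidden, 1)       # graph-level estimate

    def forward(self, x, edge_index, edge_type):
        h = torch.relu(self.conv1(x, edge_index, edge_type))
        h = torch.relu(self.conv2(h, edge_index, edge_type))
        priors = torch.softmax(self.prior_head(h), dim=-1)   # N x |operators|
        value = self.value_head(h.mean(dim=0))                # mean-pooled scalar
        return priors, value
```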
4) Learning Approach
The model learns from scratch for each dataset, as SR is typically per-problem (exploratory discovery). This avoids foundation-model biases from unrelated data, but pretraining on synthetic expressions (as in Sec. 6) is a promising extension.
5) AIFeynman NaN Score
In SRBench, AIFeynman's NaN in R2 (Fig. 4a) arises from precision artifacts: exact recoveries yield zero error, causing NaN in some metrics (e.g., division by zero in normalized scores). This is a known benchmarking issue, and we report raw SRBench results for fairness.
6) Limited Baselines on Physics Dataset
The materials science dataset (Appendix F) requires physics-specific constraints (e.g., the essential scalar output constraint for valid evaluation; Sec. 5.2), for which many SOTA SR methods (e.g., AIFeynman, Operon) lack immediate support. We compared with fitting baselines: CGCNN (graph NN surrogate), GPs (from [25]), and MCTS (constraint-aware variant). GSR's superiority (lowest MAE, satisfies all constraints; Table 2) highlights its value. We will discuss this rationale in our revision.
7) No Test-Set Overfitting
We confirm that all developments were on separate datasets (e.g., Nguyen for initial tuning, holdouts from SRBench) without access to test sets during algorithm design. Hyperparameters (Appendix C.6) were fixed before final runs, following standard practices to ensure fair comparisons.
Overall, we thank the reviewer again for the detailed comments and hope that we have addressed all the reviewer's concerns and questions.
Best Regards,
The Authors
These replies answer my questions, it would be good to see them incorporated into the paper if accepted.
From comparing to other reviewers: the clarity of the paper seems OK to the most experienced reviewers in this area, and less clear to the reviewers who are further away from the core topic. I'm happy to defer to those reviewers and admit that my low clarity score may be due to this. Though I would recommend updating the proposal mechanism description in the paper as above to be more accessible to readers coming from further away.
Thank you for your thoughtful follow-up and for confirming that our responses addressed your questions. We greatly appreciate your constructive insights on clarity and accessibility, which are valuable in improving the overall presentation of our work.
We agree that refining the description of the proposal mechanism will enhance the paper's accessibility to a broader audience, and we are committed to incorporating these suggestions, including a clearer explanation of the proposal mechanism, into the camera-ready version if the paper is accepted.
Best regards,
The Authors
The authors present Graph-based Symbolic Regression (GSR) to address two limitations in symbolic regression: permutation invariance in expression trees (ET), and sparse rewards due to inadequate constraint incorporation during sampling. GSR uses Expression Graphs processed by Relational Graph Convolutional Networks (R-GCN) and employs hybrid neural-guided Monte Carlo Tree Search (hnMCTS) to integrate constraints directly into the search process. The method is evaluated on standard benchmarks and a materials science application for predicting copper interatomic potentials.
Strengths and Weaknesses
Strengths:
- The paper approaches real limitations in symbolic regression. The expression graph representation appears to have the potential to address permutation invariance issues that affect traditional expression trees, and the R-GCN architecture seems well-suited to leverage edge features for encoding operator relationships. The hybrid hnMCTS approach provides a reasonable way to balance exploration and exploitation while incorporating learned guidance.
- The experimental evaluation is comprehensive, covering multiple standard benchmarks (Feynman, Black-box, Nguyen) + an application problem to molecular dynamics inter-atomic potentials -- However, the paper could benefit from some additional comparisons and discussions (see questions)
- The paper is extensive and well-written and the technical execution appears sound based on the algorithmic details provided in the appendix. The combination of graph neural networks with MCTS represents a reasonable engineering approach that shows consistent improvements across benchmarks.
Weaknesses:
- The novelty claims may be overstated. Mainly, the claimed advantages in constraint handling are unclear. Both operation-level and expression-level constraints appear implementable in tree-based methods using the same mechanisms (action masking and post-hoc evaluation). The distinction between traditional "post-hoc penalties" and the proposed "direct constraint incorporation" seems unclear since expression-level constraints inherently require complete expressions for evaluation.
- The practical application scope seems limited. Training on 32 copper atoms to predict properties of the same system doesn't demonstrate the generalizability typically expected from scientific methods. While the physics constraints are interesting, they appear to be relatively basic requirements (unit consistency, electrostatic repulsion) rather than complex physical principles.
Questions
- For the Feynman and Strogatz dataset, could you provide numeric accuracy (R2) scores, expression complexity, and training time comparisons (similar to the black-box dataset analysis)? This would help assess whether the benefits extend beyond recovery rate. Also, if the authors can add some other strong and recent baselines to the comparison, the paper will be improved. I suggest baselines such as PySR (Cranmer), uDSR (Landajuela et al.), TPSR (Shojaee et al.), and MDLFormer (Yu et al.).
- How does your approach to expression-level constraints differ from post-hoc evaluation in tree-based methods? For constraints like f(r→0) = ∞, it appears complete expressions must still be generated and evaluated - could you clarify how this constitutes "direct constraint incorporation"?
- The method looks to be computationally expensive. What additional complexity does the graph representation add to the method? Could you provide wall-clock time comparisons across different dimensions (scalability)?
Limitations
The authors mention some of the limitations throughout the paper; this is a minor suggestion but they could be more explicit in bringing the limitations in the final discussion/conclusion of the paper.
Final Justification
I still remain slightly positive overall about this work and keep my score.
Formatting Concerns
N/A
We thank the reviewer for their thoughtful comments and valuable suggestions. Below, we address each point in detail to clarify the contributions and scope of our proposed GSR framework.
1) Novelty of Constraint Handling
We appreciate the opportunity to clarify the novelty of our constraint incorporation strategy. The key distinction lies in the proactive constraint encoding via hnMCTS, which integrates expression-level constraints such as $f(r \to 0) = \infty$ through R-GCN-predicted priors (line 130) and scalar validity scores, both computed from global EG features (e.g., node/edge types; Appendix C.3).
These priors guide hnMCTS via the prior term in the PUCB formula (Eq. 3), steering the search toward constraint-satisfying expressions early. This approach contrasts with post-hoc filtering, as it reduces invalid rollouts before evaluation, leading to a ~60% reduction in reward sparsity (Fig. 6a). While tree-based methods and EG-based methods differ primarily in invariance encoding, EGs enable richer structural (node/edge) features that strengthen neural guidance during search. As shown in the ablation study (Sec. 4.2), EG improves constraint learning even under the same hnMCTS setup. We will clarify this distinction and provide a motivating example in Sec. 3.2 of the revised version.
2) Feynman Metrics and Baselines
Thank you for your suggestions regarding benchmarking completeness. For the Feynman dataset (Fig. 4d), we adopt solution rate (exact recovery) as the primary metric, following prior SRBench protocols. For exactly recovered expressions the R^2 is 1; for approximate solutions, the average R^2 is computed from the best-fit expressions. Our discovered expressions have a median complexity of ~20 operations, comparable to DGSR.
As SRBench’s official release lacks comprehensive reporting of , expression complexity, and runtime, we will extend our benchmarking with these additional metrics via re-implementation. We also plan to include recent symbolic regression baselines (e.g., PySR [Cranmer], uDSR [Landajuela et al.], TPSR [Shojaee et al.]) in our evaluation framework. If accepted, we will include these comparisons in the camera-ready version.
3) Application Scope and Generalizability
The 32-atom copper dataset from first-principles molecular dynamics [25] serves as a representative benchmark for materials science, consistent with typical system sizes in datasets like MPtraj (~31 atoms/structure [40]). While training occurs at this scale, the discovered analytical potential (Table 2) generalizes to much larger systems. As shown in Appendix F.2, it enables efficient simulations involving millions of atoms, such as modeling copper crystallization [41], which are beyond the reach of DFT.
The incorporated constraints (unit consistency and electrostatic repulsion) are foundational physics principles rather than trivial requirements: they ensure valid behavior at short interatomic distances and prevent simulation crashes (Fig. 5). These constraints are either absent or improperly handled in current machine-learning interatomic potentials, leading to divergences in force calculations that cause real-world simulations to crash. Without these fundamental constraints (simple as they may look), most, if not all, machine-learning models cannot work properly once the system moves away from the equilibrium position, e.g., under temperature or strain fluctuations during molecular dynamics simulations.
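As a hedged illustration of how such an expression-level constraint might be checked numerically (our own sketch, unrelated to the validity score of Eq. 5), one can probe a candidate pair potential at a vanishing interatomic distance and require that its energy be strongly repulsive or divergent; the threshold and probe distance are arbitrary assumptions.

```python
import numpy as np

def satisfies_short_range_repulsion(potential, r_small=1e-3, threshold=1e3):
    """Heuristically check that a candidate pair potential V(r) diverges
    (strong repulsion) as r -> 0, mimicking electrostatic repulsion."""
    try:
        value = potential(r_small)
    except (ZeroDivisionError, OverflowError):
        return True                     # analytic divergence also counts
    return bool(np.isinf(value) or value > threshold)

# Usage with two hypothetical candidate expressions
lennard_jones = lambda r: 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
harmonic = lambda r: 0.5 * (r - 1.0) ** 2
print(satisfies_short_range_repulsion(lennard_jones))  # True: diverges as r -> 0
print(satisfies_short_range_repulsion(harmonic))       # False: stays finite
```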
4) Computational Expense and Scalability
EGs are DAGs with node sharing, resulting in fewer nodes than equivalent expression trees (Appendix C.2), which helps reduce memory and computational overhead. Moreover, our hnMCTS design does not rely solely on neural-guided search; the UCB component ensures efficiency during the exploration. Together, these design choices contribute to the method’s practical efficiency.
As shown in the ablation study (Sec. 4.2), GSR achieves strong performance while requiring the least training time among competitive baselines. Runtime configurations and sampling throughput are provided in Appendix A.
Regarding scalability, GSR performs robustly across datasets with varying input dimensionality. For example, in the black-box benchmark (Fig. 4c), the average training time remains manageable across increasing dimensions. We will include wall-clock timing and scaling results in Appendix D of the camera-ready version.
| Dimension | 3 | 8 | 13 | 18 | 36 |
|---|---|---|---|---|---|
| Training Time (s) | 12676 | 31483 | 20781 | 12783 | 9172 |
We again thank the reviewer for their detailed and constructive feedback. We will incorporate the clarifications above, expand on limitations, and include additional evaluations and discussion in the final submission.
Best regards,
The Authors
Thank you for the discussion on my concerns and questions. Trusting the authors will include the additional comparisons and discussions in the final version, I keep my positive score.
We sincerely thank Reviewer YFGA for the thoughtful engagement and for maintaining a positive score. We appreciate your trust in our commitment to incorporate the suggested comparisons, metrics, and expanded discussions in the camera-ready version.
Your feedback has been instrumental in improving both the clarity and rigor of our work. We will ensure that the final version includes the benchmarking extensions, clearer constraint integration explanation, and broader discussion on generalizability and scalability, as outlined in our response.
Thank you again for your valuable contributions throughout the review process.
Best regards,
The Authors
Based on the invariance-aware design utilizing graph representation, this paper addresses the long-standing challenges of redundant representations (due to the inability to capture equivalences) and reward sparsity in symbolic regression. The proposed framework is validated across both synthetic benchmarks and real-world scientific datasets, demonstrating superior performance in expression recovery and constraint satisfaction. Furthermore, it provides convergence guarantees for the hybrid neural-guided Monte Carlo Tree Search (hnMCTS) algorithm, as detailed in the appendix.
Strengths and Weaknesses
Strengths
- This paper solves the redundant representation problem in symbolic regression by introducing Expression Graphs (EGs) that inherently capture mathematical equivalences.
- This paper solves the sparse reward challenge by embedding expression-level constraints (e.g., physical laws) directly into the search process via R-GCN-guided MCTS.
- This paper addresses the reproducibility gap through open-sourced code and detailed configurations (Appendix D.1/F.3).
Weaknesses
- Uniqueness gap: Lacks proof for EG's representation uniqueness in complex cases (e.g., nested operators).
- Only material science case validates physical constraints—needs cross-domain tests (e.g., fluid dynamics).
Questions
Is the expression represented by a graph unique? If not, it seems that the limitation has not been completely solved. Can the relative improvement of uniqueness be quantified? Are there any future work directions that can be pointed out for the redundant representation problem that graph representation still cannot completely solve?
Limitations
1. Theoretical Limitation: Unverified Uniqueness of Expression Graphs. While the Expression Graph (EG) representation resolves redundancy for basic operator equivalences (e.g., commutative operations in Fig. 2b), this paper does not establish EG's capacity to guarantee canonical forms for expressions involving complex nested operators.
2. Empirical Limitation: Narrow Validation of Physical Constraints. The physics-informed validation exclusively relies on a single-domain case study (interatomic potential energy in materials science, Sec. 5).
Formatting Concerns
Not found
We thank reviewer RJ7g for the insightful questions on the uniqueness of Expression Graphs (EGs) and application scope. We clarify that, for the equivalences in Table 3, EGs yield unique representations (up to graph isomorphism) for equivalent expressions, modeled as a convergent term rewriting system (TRS). Below, we provide a formal proof addressing nested operators, quantify improvements, and discuss residuals/future work.
1.1) Theoretical Proof of EG Uniqueness
Definition: Let $\mathcal{O}$ be the operator dictionary (Sec. 2.1) and $\mathcal{E}$ the set of expressions over $\mathcal{O}$. Table 3 defines a term rewriting system (TRS) $R$ with rewrite rules for the equivalences. Terms $e_1, e_2$ are equivalent ($e_1 \equiv e_2$) if they rewrite to a common term under $\leftrightarrow_R^{*}$ (the reflexive-transitive-symmetric closure of $\rightarrow_R$). EG is a function $\mathrm{EG}: \mathcal{E} \to \mathcal{G}$, where $\mathcal{G}$ is the set of DAGs with nodes labeled by operators/features (e.g., arity) and edges by relations (e.g., base/exponent for the power operator; Fig. 2a). EG normalizes via bottom-up application of $R$ (Appendix C.2). Uniqueness: $e_1 \equiv e_2 \Rightarrow \mathrm{EG}(e_1) \cong \mathrm{EG}(e_2)$ (graph isomorphism).
Proof: Uniqueness requires $R$ to be convergent (confluent and terminating), ensuring that every expression has a unique normal form.
- Termination: Finite expressions yield finitely many rewrite steps; $R$ is terminating because its rules are non-recursive (no loops recreating left-hand sides).
- Confluence: If $e \rightarrow_R^{*} e_1$ and $e \rightarrow_R^{*} e_2$, then there exists $e_3$ such that $e_1 \rightarrow_R^{*} e_3$ and $e_2 \rightarrow_R^{*} e_3$. $R$ is confluent, as its rules are orthogonal (left-linear, non-overlapping). Arithmetic rules (commutative/associative/distributive/identity) mirror polynomial rewriting to sum-of-products form, which is confluent when commutative operands are treated as multisets [1]. Trigonometric/exponential rules do not overlap with the arithmetic rules, preserving confluence via modularity [2]. By Newman's Lemma, termination and local confluence imply global confluence, so every expression has a unique normal form under $R$.
- EG Encodes Normal Forms Uniquely: EG builds a minimal DAG from the normal form (Appendix C.2): it flattens associative chains into multi-operand nodes with multiset operands, expands distributivity, and simplifies identities and trigonometric forms. Fixed node/edge features ensure uniqueness up to isomorphism; node sharing gives minimality, and multisets give order-independence.
Thus, $e_1 \equiv e_2 \Rightarrow \mathrm{EG}(e_1) \cong \mathrm{EG}(e_2)$. This also holds for nested cases (e.g., the example in Fig. 1) via recursive bottom-up normalization.
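As an independent illustration (not the paper's EG construction), a computer-algebra canonicalizer exhibits the same normal-form behaviour: structurally different but equivalent inputs expand to an identical sum-of-products form, including for nested operators. The expressions below are our own examples.

```python
import sympy as sp

x, y, z = sp.symbols("x y z")

# Commutative/distributive variants of the same expression
e1 = sp.expand((x + y) * z + x * y)
e2 = sp.expand(z * y + y * x + z * x)
print(e1 == e2)   # True: both normalize to x*y + x*z + y*z

# Nested operators normalize recursively as well
n1 = sp.expand(sp.sin(x * (y + 1)) ** 2)
n2 = sp.expand(sp.sin(y * x + x) ** 2)
print(n1 == n2)   # True: the arguments rewrite to the same canonical form
```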
We will further clarify these points with the above analyses in our Appendix.
[1] Buchberger, Bruno. An algorithm for finding a basis for the residue class ring of a zero-dimensional polynomial ideal. PhD thesis, University of Innsbruck, Austria, 1965.
[2] Huet, Gérard. "Confluent reductions: Abstract properties and applications to term rewriting systems." Journal of the ACM (JACM) 27.4 (1980): 797-821.
1.2) Quantification
While full algebraic uniqueness is intractable, EG captures dominant redundancies. In the ablation study (Table 1), MCTS‑EG (invariance only) reduces the average search‐space size by 33.9% relative to naive MCTS, directly measuring merged states due to EG's improved uniqueness.
We also notice that ~95% of merges in the ablations are operation-level (commutative/associative/distributive), as expression-level equivalences require specific operands/constants and thus rarely arise, confirming that Table 3 covers the bulk of the redundancy.
1.3) Residuals and Future Work
EG captures the equivalences in Table 3, but some rarer expression-level ones (e.g., advanced trigonometric identities) remain. Future work includes expanding Table 3 with semantic rules and integrating pre-trained GNNs for equivalence detection or hybrid algebraic rewriters. We will discuss these residual cases in the camera-ready version.
2) Empirical Limitation
We agree that broader validation would strengthen generalizability. The 32-atom copper dataset [25] was chosen as a representative materials science benchmark, aligning with MPtraj (~31 atoms [40]). The analytical potential (Table 2) generalizes to large systems, enabling million-atom simulations (Appendix F.2). Besides, the Feynman benchmark (Fig. 4d) already spans diverse physics domains (e.g., mechanics, electromagnetism, quantum), with GSR recovering more ground-truth expressions than baselines, implicitly validating constraint handling across domains. For future work, we plan cross-domain extensions like fluid dynamics (e.g., Navier-Stokes discovery) or biology (e.g., reaction kinetics), leveraging GSR's flexible constraint encoding. We will add this to the discussion in revisions.
We believe these clarifications address your concerns and reinforce the paper's contributions. Thank you again for your feedback—we will incorporate your suggestions to improve the final version.
Best regards,
The Authors
We sincerely thank Reviewer RJ7g again for the thoughtful review and insightful feedback. As the discussion period nears its conclusion, we would like to briefly follow up on your comment regarding the uniqueness of Expression Graphs (EGs), in case further clarification might be helpful.
EG normalization is modeled using a convergent term rewriting system (TRS) based on the equivalence rules listed in Table 3 (Appendix C.2). The TRS is both terminating and confluent, ensuring that mathematically equivalent expressions yield unique DAG representations up to isomorphism, including nested cases via recursive normalization.
This improved canonicalization is quantified by a 33.9% reduction in search-space size (MCTS-EG vs. MCTS; Table 1). We acknowledge that certain rare or domain-specific equivalences remain outside the current coverage, and we plan to extend the TRS and investigate GNN-based semantic normalization as part of future work (Sec. 6). We will detail this TRS proof and the related clarifications in the revised Appendix C.2.
If you have any further questions or if clarification would be useful, we’d be happy to elaborate. Otherwise, we sincerely appreciate your insights, which have helped us identify areas for improvement in future revisions.
Best regards,
The Authors
This paper proposes a graph neural network–based symbolic regression approach that addresses two key challenges in the current symbolic regression literature: (1) redundant representations and (2) sparse rewards. The method introduces permutation invariance into the graph representation of expressions to eliminate redundancy, and it encodes constraints during the expression sampling process, thereby avoiding unnecessary rollouts of full expressions.
Strengths:
- Novel approach that tackles fundamental issues in symbolic regression.
- Clear methodological contributions: permutation-invariant representations and constrained expression sampling.
- Potential to significantly improve efficiency and robustness of symbolic regression.
Concerns:
- Some reviewers found the algorithmic process not fully clear.
- The lack of proof of uniqueness was noted, though this was addressed in the rebuttal.
- Evaluation is somewhat limited, focusing primarily on material science applications.
Assessment:
The reviewers agree that the work is novel and makes progress on fundamental problems in symbolic regression. The concerns raised were satisfactorily addressed in the rebuttal, which also included useful clarifications.
Recommendation:
I recommend acceptance. I strongly encourage the authors to incorporate the clarifications and additional results presented in the rebuttal into the final version, and to follow through on their commitments to strengthen the paper. Doing so will ensure the work has maximum clarity and impact.