Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers
Transformers can be trained from synthetic data to find Lyapunov functions, a long-standing open problem in mathematics
Abstract
Reviews and Discussion
This paper presents a method using sequence-to-sequence transformers to find Lyapunov functions for dynamical systems, mainly polynomial systems. Even though it illustrates some new advances in LLMs in solving mathematical problems, the reviewer doubts the mechanism and the necessity of using large models such as transformers to learn Lyapunov functions for simple polynomial systems, as well as the fairness of the comparisons made in the paper. This work could be a very interesting and good workshop paper to show the advances of LLMs, but the reviewer doesn't think there are significant academic contributions to the math community.
Strengths
This paper finds a way to explore and demonstrate the capacity of transformers, or LLMs in general, to solve existing mathematical problems, namely finding Lyapunov functions for nonlinear dynamical systems. Also, the backward data generation method is interesting.
Weaknesses
Even though the authors tried to make the story interesting and appealing, it seems that a deeper understanding of Lyapunov stability analysis is needed when improving this paper in the future. For example, Lyapunov functions are useful for providing stability guarantees for equilibrium points of dynamical systems, but there are no such equilibrium points in the three-body problem (line 29). Please see the introduction of Chapter 3, 'Lyapunov Stability', in 'Nonlinear Systems' by H. K. Khalil, one of the references of this work.
Moreover, it seems that the authors didn't fully understand the stability concepts when writing this paper. In line 27, they stated that the goal is "discovering the Lyapunov functions that control the global stability of dynamical systems", while in Def. 3.2 they give the definition of 'stable'. Stable and globally stable are different notions, which should be made precise in an academic paper. Both also differ from asymptotic stability. Meanwhile, 'control' is definitely not the correct wording here.
Last but not least, finding a Lyapunov function (candidate) for a nonlinear system is not a very hard problem now, even for high-dimensional (polynomial) systems; the bottleneck is verifying that the candidate is indeed a valid Lyapunov function satisfying the Lyapunov conditions, which is difficult for non-polynomial or high-dimensional systems. As shown in the paper 'Neural Lyapunov Control', one can easily use a feedforward neural network with a single hidden layer of 6 neurons to learn a Lyapunov function for many 2- or higher-dimensional nonlinear systems, which need not be polynomial. That said, the results shown in this work are not impressive at all, and the Lyapunov function candidates found using transformers still lack correctness guarantees: they need to be verified against the Lyapunov conditions for any new nonlinear system.
Questions
There are quite a few interesting questions that can be asked:
- It's pretty shocking and interesting to see that the authors used SOSTOOLS, which was proposed and developed more than two decades ago. As with the basic Lyapunov analysis discussed above, it seems that the authors were unaware of the latest developments in toolboxes and numerical methods for finding Lyapunov functions for nonlinear systems. Even at this year's HSCC conference, there are two tool papers on finding Lyapunov functions for general nonlinear systems: FOSSIL 2.0 and LyZNet. Both of them can handle non-polynomial cases. The authors did a comparison with FOSSIL, but they claimed that it achieved <5% accuracy. I don't think they gave the toolbox a fair enough try.
- Following on from point #1, this transformer method needs a very long training time and an extremely large amount of data. However, with the two aforementioned toolboxes, for some systems, including the simple polynomial systems shown in Appendix C, valid Lyapunov functions can be found within a few seconds, and they are formally verified with SMT solvers. The same holds for SOSTOOLS: the obtained Lyapunov functions are guaranteed to be valid. In contrast, the performance of the proposed method on some datasets is not satisfactory, as in Table 2 and Table 5.
- Whether in the control and systems community or in the applied mathematics community, people rarely derive Lyapunov functions by hand, just as they rarely add two very large numbers by hand. The comparison with so-called 'human mathematicians' is not reasonable.
- In line 189, 'are diverse enough to span all of'. When is it 'diverse' enough to span the space?
- It's not surprising to see that after adding some data from the forward datasets to the backward ones, the performance is much better. It's very likely that there are some issues with the backward datasets, while the forward datasets are guaranteed to be correct with SOS.
Limitations
The fairness of the comparisons and the correctness of the learned Lyapunov functions, as discussed above. Also, even though the authors claimed that their method works well with non-polynomial systems, the comparisons and examples are mainly for polynomial systems.
Thank you for your appreciation of the paper.
there are no equilibrium points in the three-body problem
We agree; we mention this problem to emphasize the importance of stability in general. That said, there are periodic orbits which, up to a change of variables, are equilibrium points of a non-autonomous system.
Stable and globally stable are different, which should be precise as an academic paper.
“Globally stable” meant “has a global Lyapunov function”. This is indeed different from asymptotic stability (LAS or GAS) or exponential stability, for which we make no claim, but it coincides with GAS when the largest invariant set contained in {∇V · f = 0} is the equilibrium (LaSalle). The Lyapunov functions we generate are proper; we will add this to our definition and update the paper to achieve a level of rigor comparable to what we submit to math journals.
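For readers less familiar with the terminology, here is our paraphrase of the conditions at stake in this discussion (a restatement, not the paper's exact Def. 3.2):

```latex
% Global Lyapunov conditions for x' = f(x) with equilibrium f(0) = 0
% (our paraphrase, not the paper's exact Def. 3.2):
\[
V(0) = 0, \qquad V(x) > 0 \ \ \forall x \neq 0, \qquad
V(x) \to +\infty \ \text{ as } \|x\| \to \infty \ \text{(properness)}, \qquad
\nabla V(x) \cdot f(x) \le 0 \ \ \forall x .
\]
```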
finding a Lyapunov function for a nonlinear system is not a very hard problem
We disagree, at least for non-polynomial systems: finding a global Lyapunov function for some particular examples may be easy, but in general it is a very hard problem and still an active area of research in mathematics (see for instance [Adda & Bichara, IJPAM, 2012] or [Agbo Bidi, Almeida, Coron, 2024]).
As shown in Neural Lyapunov Control, one can easily use a one-hidden feedforward neural network [...] to learn a Lyapunov function.
Neural Lyapunov Control works well on its showcase examples, but often does not converge for the systems we consider. It achieves 2-22% accuracy on our polynomial test sets, and <1% on non-polynomial ones. This may be because it was designed for stabilization problems, and performs worse when there is no control feedback.
the Lyapunov function candidates they found using transformers still lack correctness
All the Lyapunov functions we find for polynomial systems are guaranteed correct using an SOS checker. For non-polynomial systems, following your suggestion, we now use an SMT solver to theoretically ensure correctness (with no change in performance).
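To illustrate the kind of check we mean, here is a minimal sketch of an SMT-style verification on a hand-picked polynomial example, using Z3's Python bindings. This is not the verifier used in the paper; the system, the candidate V, and the helper function are illustrative only, and properness of V is assumed by construction rather than checked.

```python
# Minimal sketch of an SMT-style Lyapunov check (illustrative example,
# not the paper's verifier). Requires the z3-solver package.
from z3 import Reals, Solver, And, Or, unsat

x, y = Reals("x y")
V = x**2 + y**2                      # candidate Lyapunov function
f1, f2 = -x + y**2, -y - x*y         # dynamical system x' = f1, y' = f2
Vdot = 2*x*f1 + 2*y*f2               # grad(V) . f along trajectories

def refuted(negation):
    # A claim holds everywhere iff its negation is unsatisfiable.
    s = Solver()
    s.add(negation)
    return s.check() == unsat

# V positive definite: no (x, y) != (0, 0) with V(x, y) <= 0.
positive_definite = refuted(And(V <= 0, Or(x != 0, y != 0)))
# V non-increasing along trajectories: no point with Vdot > 0.
non_increasing = refuted(Vdot > 0)

print(positive_definite and non_increasing)  # True for this example
```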
the authors called SOSTOOLS which was proposed more than two decades ago
SOSTOOLS is still one of the most widely recognized toolboxes for finding Lyapunov functions. While it is two decades old, the underlying SDP solvers (at the core of the computation) have evolved, and we are using up-to-date SDP solvers.
The authors were unclear about the latest development [...] for finding Lyapunov functions [...] FOSSIL 2.0 and LyZNet. [...] I don't think they give the toolbox a fair enough try.
We did provide a comparison to FOSSIL. Following your comments, we experimented extensively with the very recent FOSSIL 2.0 and LyZNet. We were able to reproduce their results, and we confirm that they have very low accuracy on our polynomial and non-polynomial test sets (see the rebuttal PDF). We will share our code and datasets.
We believe the low accuracy of these models on our test sets is due to the fact that they address a different problem: discovering local or semi-global Lyapunov functions, while we target global Lyapunov functions. Overall, we believe our work is related, but complementary. We will update the discussion accordingly.
this transformer method needs a very long training time and an extremely large amount of data.
Our model needs to be trained once from synthetic data, before it can be used at very little cost for any system, a common paradigm in machine learning. In contrast, the smaller models in FOSSIL and LyZNet must be retrained for every system.
with the two toolboxes [...] Lyapunov functions [...] are formally verified with SMT solvers. The same for SOSTOOLS [...] the performances of the proposed method on some datasets are not satisfactory, such as Table 2 and Table 5.
Our Lyapunov functions are now also formally verified with an SMT solver (this makes little difference in the results, see the authors' rebuttal). On Tables 2 and 5 (out-of-distribution and discovery), our methods perform better than SOSTOOLS and these toolboxes.
people barely solve Lyapunov functions by hand. The comparison with so-called 'human mathematicians' is not reasonable
When classical Lyapunov functions and classical techniques (LMI, BMI, SOS, etc.) don't work, mathematicians (in our community at least) do need to find the form of the Lyapunov functions by hand (see for instance [Adda & Bichara, IJPAM, 2012], [Friedrich et al., SICON, 2023] for ODE systems or [Hayat, Shang, JMPA, 2021] for an example, albeit on a PDE). Our models may help “guessing” candidate Lyapunov functions in these situations.
Since finding Lyapunov functions is often a textbook exercise in control theory courses, comparing our models with master's students in such courses seemed to provide a fair evaluation of the hardness of the problems.
In line 189 [...] when is it 'diverse' enough to span the space?
We mean that they should form a generating family of the corresponding space. Thanks for the comment, we'll clarify.
It's not surprising to see that after adding some data from forward datasets to the backward ones, the performances are much better
What is surprising is the proportion involved: performance on both distributions improves after adding 50 forward samples to 1 million backward ones (0.05%). This is an important result, since OOD generalization is a difficult problem in machine learning.
It's very likely that there are some issues with the backward datasets
The backward datasets are guaranteed correct by design: the steps of the process ensure the conditions from Def. 3.2 hold (see section 5.1 and Appendix A.2).
the comparisons and examples are mainly for polynomial systems
Our framework can handle both polynomial and non-polynomial systems. We provide more comparisons on polynomial systems because SOS methods constitute a strong baseline with a long history. We will add more comparisons for non-polynomial systems.
Thanks for the responses and the newly added experiments. After reviewing them, I remain unconvinced by the comparisons and the supporting arguments presented. Additionally, the references provided in the responses are difficult to locate and follow in their current format.
I personally still do not see significant contributions to the community (at least in my community).
- As I stated in my previous comments, finding Lyapunov function candidates for nonlinear systems is not hard. We can use neural networks or other methods to find one (candidate) easily. The critical issue lies in guaranteeing the correctness of these functions. The proposed method cannot address this problem. As the authors themselves stated, the outputs of the transformer model still require formal verification via SOS or SMT solvers if one needs to ensure their correctness. In addition, for the neural network method in the “Neural Lyapunov Control” paper, I believe that modifying the neural network structure slightly and removing the feedback control component u should yield better accuracy. The claim that the accuracy should be as low as suggested is questionable, especially since FOSSIL is built on this framework, and the case without a control input should be easier to solve for low-dimensional systems.
- I also strongly doubt the claim that it can be used to find a Lyapunov function "for any system". The results are unlikely to generalize to arbitrary nonlinear systems. The systems most likely must fall within the categories represented in the training dataset, given the generalization limitations of neural networks. Moreover, I struggle to envision a scenario where generating a Lyapunov function in a rapid, "plug-and-play" manner would be necessary.
- Furthermore, the authors argued that the proposed method has better performance in finding global Lyapunov functions. This raises the question of why finding a global Lyapunov function is more important (than the other two). In which practical scenarios, such as robotics or power systems, do we need the global Lyapunov function? It would be better to clarify the significance of this contribution further.
- While it is true that exercises in control theory courses or publications often involve finding Lyapunov functions by hand, all of them have a particular structure and are definitely not arbitrary general nonlinear systems.
Given that this paper aims to solve a mathematical problem using machine learning methods, it is crucial that it be written with rigor, as the authors themselves acknowledge. In my opinion, the current version is not ready for publication, and I would like to keep my score unchanged.
Thank you for your response. We believe most of our disagreement stems from different mathematical communities and motivations:
We are focusing on training transformers to find explicit symbolic global Lyapunov functions for globally stable systems. Except for some polynomial systems, this is an open problem in the area of dynamical systems. No ML methods for this exist. In fact, addressing such an open mathematical problem would be a first for symbolic transformers, which so far have only been applied to problems with known solutions (integration, etc.). We are primarily motivated by the (pure) math problem in itself, rather than its engineering applications.
Your focus seems to be in a related field: finding (implicit) local or semi-global Lyapunov functions, with a motivation coming from engineering applications. For this, ML solutions, such as FOSSIL, do exist.
We make no claim as to the relative scientific importance of the two problems. However, these are very different problems, as our experiments with FOSSIL and other neural solvers suggest: these tools, designed for semi-global Lyapunov functions, do not perform well on the global problem. To make this clearer, we propose to add the word "global" to the title of our paper, and to add a clarification about the two problems in the introduction and the related work.
We believe that judging the novelty of our work, targeted on a specific problem of pure mathematics, by comparing it to a related but different mathematical problem with a focus on engineering problems is not appropriate.
Concerning the references, because of the character limit of the rebuttal we couldn't include the full links, only the authors' names, the journal when appropriate, and the year. We add the full references below for convenience:
- Adda, P., & Bichara, D. (2012). Global stability for SIR and SIRS models with differential mortality. International Journal of Pure and Applied Mathematics, 80(3), 425-433.
- Bidi, K. A., Almeida, L., & Coron, J. M. (2023). Global stabilization of sterile insect technique model by feedback laws. arXiv preprint arXiv:2307.00846.
- Friedrich, J., Göttlich, S., & Herty, M. (2023). Lyapunov stabilization for nonlocal traffic flow models. SIAM Journal on Control and Optimization, 61(5), 2849-2875.
- Hayat, A., & Shang, P. (2021). Exponential stability of density-velocity systems with boundary conditions and source term for the H2 norm. Journal de mathématiques pures et appliquées, 153, 187-212.
In this paper, the authors propose a new method of generating synthetic training samples from random solutions, i.e. Lyapunov functions for dynamical systems in the form of ordinary differential equations. They demonstrate that a sequence-to-sequence transformer trained on such data can produce Lyapunov functions better than both existing algorithmic solvers and humans. More specifically, for polynomial systems, the method proposed in the paper can find Lyapunov functions for 10.1% of randomly generated polynomial systems where the state-of-the-art algorithm can only find 0.7%; furthermore, the method can produce a Lyapunov function for 12.7% of randomly generated non-polynomial systems.
Strengths
The Lyapunov function problem is an important one. The performance of the method proposed in the paper is impressive and promising.
Weaknesses
The novelty of the data generating methods, those described in Sec. 5, is not clearly stated and explained. Currently, it reads as if the results are obtained accidentally.
Questions
It is not clear how the forward and backward generation methods differ from those proposed in the past, for example those in Lample & Charton (2019), Prajna et al. (2002), and Prajna et al. (2005).
Limitations
As I have pointed out, the paper does not clearly describe the difference between what it proposes and some of the existing methods, thus making it quite difficult to see the main reasons for the rather remarkable experimental results.
Thank you for your interest and your in-depth reading of the paper.
The novelty of the data generating methods, those described in Sec. 5, is not clearly stated and explained. Currently, it reads as if the results are obtained accidentally
We agree, thank you for raising this. We provide a rationale for the generation method we use in the author rebuttal (section B). We will update the paper accordingly.
It is not clear how the forward and backward generation methods differ from those proposed in the past, for example those in Lample & Charton (2019), Prajna et al. (2002), and Prajna et al. (2005).
The relation with Lample and Charton, and the novelty of our approach, are detailed in section A of the authors' rebuttal and will be made precise in the revised version.
Concerning Prajna et al., 2002, and Prajna et al., 2005, our approach is very different: because Prajna et al. design a deterministic mathematical algorithm, there is no notion of data generation; their algorithm is thought of as a tool for mathematicians and engineers, given an input system. We use the algorithm they designed (and subsequent, more recent versions using the same approach) as a rejector for rejection sampling, and as a verifier of the Lyapunov functions provided by the model in the polynomial case.
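To give a flavor of what such an SOS-based check involves, here is a toy sketch with a hand-picked polynomial example, written with CVXPY. It is not the SOSTOOLS or findlyap interface (whose APIs differ); the system, candidate V, and monomial basis are chosen for illustration only.

```python
# Toy sketch of the SOS idea behind the verification (not the SOSTOOLS or
# findlyap interface): certify that -Vdot is a sum of squares by finding a
# positive semidefinite Gram matrix Q with  -Vdot = z^T Q z.
# Example: V = x^2 + y^2,  f = (-x^3 + y, -x - y^3)  =>  -Vdot = 2x^4 + 2y^4.
import cvxpy as cp

# Monomial basis z = [x^2, x*y, y^2].
Q = cp.Variable((3, 3), PSD=True)
constraints = [
    Q[0, 0] == 2,                # coefficient of x^4
    2 * Q[0, 1] == 0,            # coefficient of x^3 y
    2 * Q[0, 2] + Q[1, 1] == 0,  # coefficient of x^2 y^2
    2 * Q[1, 2] == 0,            # coefficient of x y^3
    Q[2, 2] == 2,                # coefficient of y^4
]
problem = cp.Problem(cp.Minimize(0), constraints)
problem.solve()
print(problem.status)  # "optimal": a PSD Gram matrix exists, so -Vdot is SOS
```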
As I have pointed out, the paper does not clearly describe the difference between what it proposes and some of the existing methods, thus making it quite difficult to see the main reasons for the rather remarkable experimental results.
Thank you for pointing this out, we will clarify this in the revised paper. We believe that the main differences with existing methods are the following:
Differences with Lample and Charton (2019) and Charton, Hayat, Lample (2020)
- These papers address problems with known solutions, for which forward generation of large training sets is possible; we address an open problem still unsolved in mathematics. This raises new difficulties: backward generation and OOD generalization. To our knowledge, this is the first application of transformers trained on synthetic data to solve an open problem end to end.
- We develop a bespoke technique for generating the backward dataset that mitigates the risks discussed by Yehuda et al.
- We introduce a new approach to prevent performance from dropping when the model is tested on distributions different from the training distribution. This relies on injecting a tiny proportion (0.05%) of forward examples into the backward dataset.
Differences with Prajna et al. and other Sum-Of-Squares (SOS) approaches
- On the method side, SOS approaches are deterministic solvers relying on semidefinite programming, very different from our approach; we use these or similar algorithms to test the output of our model.
- Our method can handle non-polynomial systems and polynomial systems that have no sum-of-squares Lyapunov function.
- On average over the different datasets, our models have better performance and shorter evaluation/inference times.
Differences with recent neural methods (“Neural Lyapunov Control”, FOSSIL 2.0, LyZNet):
- Our approach is very different: we use a Transformer and treat the mathematical problem end-to-end symbolically. Neural methods rely on numerical approximation/simulations of the dynamics, provide an implicit solution, and need to be re-trained on every system.
- Our methods have much higher performance on both polynomial and non-polynomial datasets (see section 6.3 and our new results in the authors' rebuttal).
- We provide symbolic expressions for the Lyapunov functions, whereas existing neural methods learn an implicit “black box” Lyapunov function
- Our method learns and provides global Lyapunov functions on the whole space (rather than semi-global Lyapunov functions on a compact subset, which is an easier problem, although still challenging).
I thank the authors for providing detailed responses to all the questions. I will also increase my score to 5.
This study addresses the problem of designing Lyapunov functions for dynamical systems via learning. The existence of a Lyapunov function is a sufficient condition for a dynamical system's stability. However, its design is not established, except for some systems via sum-of-squares polynomials. The Authors propose a novel dataset construction framework on which Transformer models can be trained. Extensive experiments demonstrate that Transformer models successfully learn to design Lyapunov functions on several types of datasets, with moderate generalization ability across datasets. It is also shown that the injection of out-of-distribution samples in the training set significantly boosts the success rate. Though restricted to small systems, this study suggests that the learning approach is a promising direction for addressing hard problems of symbolic mathematics.
Strengths
- This study clearly presents the problem and the approach. Overall, the writing is clean and accessible to diverse readers.
- A novel backward dataset generation algorithm is proposed to train Transformer models for Lyapunov function design.
- The experiments validate the proposed approach from various aspects. Particularly, the generalization across datasets is tested well.
Weaknesses
I appreciate the overall contributions of this study. I'd like to raise two weaknesses, one for the methodology (backward generation) and one for the experiments.
Major comments
Backward generation. While the Authors claim that the backward generation is the key innovation, the explanation of the generation process is very limited in the main text. I understand the page restrictions, but it would be better to elaborate on the method a little more to give intuition to the readers. While the details are presented in Appendix A.2, it mainly provides the procedures, and the rationale is not given in much depth. I was not able to fully understand the procedures either, because of the unclarity of several notations (see Minor comments below). Plus, many mappings and transformations appear, but their intention and necessity are unclear to me. I encourage the authors to discuss the range of the Lyapunov functions and f covered by the proposed generation method and whether this is large enough or not, including, if any, important classes of functions that cannot be covered for now.
As for the experiments, assuming practical use, it is important to know whether the Lyapunov function given by Transformers is really a Lyapunov function of the targeted system, and if not, whether this is because of a simple failure or the non-existence of such a function. Discussion and experiments seem to be missing on this part.
Minor comments
(Markdown math rendering somehow fails in the first item below, so I wrote it in a bit tricky way.)
- Some notations are unclear and inconsistent. For example, vectors and are in bold font, while other vectors such as are not. The subscript notations [line 177] and in Eq. (10) are not defined. Assuming that it refers to the -th entry of vector and is , the first term of the equation in [line 177] appears to be a scalar while the second term is a vector. The Authors may represent a vector by in the first term, but still, this is a row vector while the second term is a column vector.
- Forward/backward methods and Forward/backward generations are both used. I recommend that the Authors keep terminology consistent.
- [line 183] and the is --> and there is
- [line 224] .In -> . In
- [line 489] What does mean (the identity map?)?
- [line 492] " is positive" sounds a bit weird, as it is a function.
- [line 489] Written but is supposed to be a function.
- [Eq. 10] Does it refer to the -th entry of ? Then, it should be unbolded as it is a scalar.
I also encourage the Authors to include more related works that exploit Transformers to address hard math problems.
Shortest Vector Problem (a series of works by the same group)
- "SALSA: Attacking Lattice Cryptography with Transformers," Emily Wenger, Mingjie Chen, François Charton, Kristin Lauter
Gröbner basis computation
- "Learning to Compute Gröbner Bases," Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama
If not restricted to Transformers,
Detection of terminal singularity
- "Machine learning detects terminal singularities," Tom Coates, Alexander M. Kasprzyk, Sara Veneziale
Integer programming
- "Machine Learning for Cutting Planes in Integer Programming: A Survey," Arnaud Deza, Elias B. Khalil
Questions
I'd like the Authors to answer the weaknesses raised above. Plus,
- What do the "pre-defined set of increasing functions" and "pre-defined set of bounded-functions" refer to at [lines 479, 485]?
Limitations
Limitations are adequately presented. For the potential improvements, see Weakness and Questions.
Thank you for your review and comments.
Backward generation. it would be better to elaborate on the method a little more.
We agree. See section B of the author rebuttal for a better presentation of our methods, explaining the motivation of the different steps. We will update the paper in this direction, and add further explanations in the appendix.
I encourage the authors to discuss the range of the Lyapunov function and 𝑓
Thank you for bringing this up. One of the main advantages of the generation method we propose is that it allows us to parametrize the functions V and f, by selecting the component functions used in the construction from different classes. This enables us to generate polynomial or non-polynomial Lyapunov functions, for polynomial or non-polynomial systems. At present, we generate functions defined by the four operations, powers, and usual functions (exp, ln, cos, sin, ...), but this could be extended to any other functions, as long as we can specify their gradient. As a consequence, our framework can generate any function that has a closed formula.
The main limitation is the restriction on V. According to Definition 3.2, any non-negative function with a strict minimum of 0 at the origin, and infinite at infinity, would do, but sampling such a function in the general case is likely an NP-hard problem. Instead, we sample V in a smaller class built from two components: a product of non-negative functions (such as sums of squares, not necessarily polynomial, composed with bounded functions), and either a positive definite homogeneous polynomial (in the polynomial case) or a composition of an even polynomial with a strictly increasing function (currently limited to a pre-defined set, but this could be extended).
Still, this is a larger class than the functions considered in SOS approaches, where V must be of the second form with even polynomials and a sum of squares. It also covers all the examples we have seen in textbooks.
Another limitation is that, for now, we reject functions that are defined on a strict subdomain of the whole space, because we are interested in the global stability problem. Future improvements would include operators with a restricted domain of definition, and a focus on the stability problem on a restricted domain.
Following your suggestion, we’ll add a discussion on the class of functions that can be covered and the limitations in our revised version.
it is important to know whether the Lyapunov function given by Transformers is really a Lyapunov function of the targeted system, and if not, whether this is because of a simple failure or the non-existence of such a function.
Thank you for the comment. All model predictions are passed to a verifier that checks whether they are correct Lyapunov functions for the input system. For polynomial systems, the verifier is a theoretical checker (based on sum-of-squares); for non-polynomial systems, it was a numerical checker. In the new version we added a theoretical checker for the non-polynomial case (based on SMT, see section D of the authors' rebuttal).
The use of an external checker guarantees that all verified model predictions are indeed correct Lyapunov functions. It may happen, however, that correct model predictions are verified as incorrect, because of a failure in the checker.
There are four cases:
- The verifier is too restrictive. Polynomial systems may have non-polynomial Lyapunov functions, but SOS checkers always assume sum-of-square Lyapunov functions.
- The verifier aborts because of timeout or memory overflow, while checking a correct solution.
- The verifier aborts because of timeout or memory overflow, while checking a wrong solution, for a system that is globally stable
- The verifier aborts because the system is not globally stable (so no Lyapunov function exists)
The first case can be identified by changing the verifier, we believe it is rare (polynomial systems with non-polynomial functions are a very special case).
Discriminating between the second and two last cases amounts to telling whether the verifier aborted for “good or bad” reasons. This cannot be done in a systematic way, but we can estimate the frequency of these failures by counting the number of verifier failures on the generated datasets, for which we know the Lyapunov functions. Following your suggestion, we estimated these failure rates and will report them in the updated paper. For non-polynomial systems with known Lyapunov functions, we observe that the SMT solver fails in about 18% of the cases.
Discriminating between the two last cases amounts to determining whether a system is globally stable. This is an open question; in fact, the only way to verify it is to find a Lyapunov function. The only estimate we can have is for special classes of systems that we know are globally stable (polynomial systems that SOS methods can solve, gradient flow systems deriving from a potential). We will report the accuracy on gradient flow systems in the revised version.
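For context, the reason gradient flow systems provide known-stable test cases is the standard fact below (our restatement, not taken from the paper):

```latex
% For a gradient flow  x' = f(x) = -\nabla U(x)  with U proper and
% positive definite, U itself is a global Lyapunov function, since
\[
\nabla U(x) \cdot f(x) \;=\; -\,\|\nabla U(x)\|^{2} \;\le\; 0
\quad \text{for all } x .
\]
```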
What do the "pre-defined set of increasing functions" and "pre-defined set of bounded-functions" refer to at [lines 479, 485]?
Currently, the pre-defined set of increasing functions contains three basic functions, and the pre-defined set of bounded functions is (cos, sin). We didn't initially specify them in the procedure because these sets are generation parameters that can be user-defined (one only has to specify the function and its gradient), but we have now included them in the revised version; thanks for the comment.
Minor comments
Thank you for spotting these. We corrected them, and added the references you suggested.
Thank you for your rebuttal.
Overall, the rebuttal improved my understanding of the work. Based on the discussion, I highly recommend the authors improve the presentations, including the notations, elaboration of the proposed method with its core idea of each step, and more.
The study itself is interesting; the problem, method, and experiments seem reasonable. I increased my score to 6, assuming that the authors will carefully update the manuscript. I'll also be watching the further responses from the other reviewers.
Thank you for your response, and thanks again for your comments, which helped us improve the paper. We will carefully update the manuscript with your suggestions. In particular, we will improve the notations, include a rationale concerning the generation procedure and detail the core idea of each step in the generation, as you suggest.
This paper trains a transformer to predict Lyapunov functions for both polynomial and non-polynomial dynamical systems. The authors generate a dataset combining "forward-generated" problems (solutions discovered by existing solvers on randomly generated problems) with "backward-generated" problems (problems obtained by generating a random solution, then building a problem solved by that solution). The trained transformer achieves great performance. On a test dataset of randomly generated problems, it solves 12% of problems, compared to 1% solved by the state-of-the-art solver. In addition, the model solves 84% of the 75 problems in the FSOSTOOLS dataset, where human experts only solved 9.33%.
Strengths
- The paper is well-written, and clearly described, so that I could understand it even without having previously encountered Lyapunov functions.
- The related work section is good
- The dataset generation is very carefully described, and lots of important design decisions are included. There is even more detail in the Appendix. Despite not being an expert, it seems like the authors really know what they're doing, and have rigorously tested different interesting choices of what to include in the dataset.
- The results are really cool! The authors test out of domain generalization well, by showing performance on a randomly generated dataset. Indeed, some of the models evaluated were only trained using "ground truth" solver solutions from a different randomly generated dataset. But the model gets 10x higher performance than the ground truth solver on this random dataset.
- The big question for papers like this is how well the models generalize outside the training set. Due to the difficult nature of the problem, it's hard to have really good test sets, but the authors are aware of the importance of this, and do lots of OOD generalization tests of their systems, and report positive results.
- Again, the authors are quite thorough with their documentation and evaluation of their system. The paper feels very polished and professional and the results are impressive.
- Progress on this problem could lead to other work using deep learning for mathematics.
Weaknesses
- The algorithmic approach is not new (training a transformer on math problems for which a large dataset can be generated), but this is okay given the careful effort devoted to the dataset creation process, the importance of the problem, and the performance achieved.
- I find all the different dataset variants included to be a bit confusing. The paper would be easier to read with a reduced number of variants presented in the main section.
- I think the discussion section goes a little far in making suggestions about LLM reasoning ability given success on this dataset. The sentences are true, but it's not necessarily true that performance here is equivalent to reasoning. The transformers are probably doing something more like "intuition", given the amount of data they are seeing and the quickness with which they generate an answer.
Questions
- findlyap solves the same number of problems as SOSTOOLS, right?
- Aren't the other forward datasets also datasets that SOSTOOLS can solve, since it's used to generate them? If so, what decisions went into the characteristics of the test set versus those used for training?
- Are the polynomial systems for FSOSTOOLS homogeneous, non-homogeneous, etc.? I'm not an expert on the math at all, but those adjectives were used for the training set. I'm trying to understand, because performance on the test set is the main way of knowing how good an ML model is, but if the test set is too similar to the training set, then the results aren't as meaningful. Of course, you have the results on the "random systems" dataset.
- One idea could be to try some sort of expert iteration with your model: use it as the base solver when generating a forward dataset, and use newly solved problems (since it's better at solving than findlyap on the random systems dataset) to further fine-tune the model. This probably isn't that useful for increasing performance, but it would be a really cool proof of concept to see the performance keep going up on the random systems dataset over some iterations!
- Is there a real-world (or "real math") application of finding Lyapunov functions? For example, maybe there are some applications. What kinds of things do the existing tools get used for, and how might this model be used as a superior version?
Limitations
The limitations are addressed well in the discussion section.
Thank you for your review and comments. Here are a few answers and clarifications.
The algorithmic approach is not new
We agree. See section A of the authors’ response for a clarification of our novel contributions. Summarizing,
- we demonstrate the applicability of backward generation to an open problem (instead of one that we know how to solve, like integration). [Yehuda et al., ICML 2020] explained why this is not straightforward. The adjustments we had to implement (section B of the authors' response) prove their point,
- we show that mixing a tiny number of forward examples into the backward training set helps mitigate the OOD problem.
I find all the different dataset variants included to be a bit confusing
We will change this in the revised version, and limit the main paper to four datasets:
- Two forward sets: Fbarrier and FLyap, for two different problems: barrier functions and Lyapunov functions,
- Two backward sets: BPoly and BNonPoly, for polynomial and non-polynomial systems.
Results on other datasets will be moved to the Appendix.
the discussion section goes a little far in making suggestions about LLM reasoning ability given success on this dataset. (...) The transformers are probably doing something more like "intuition"
We agree, and we will clarify this paragraph. We used the word “reasoning” because it is common in current LLM literature, but we agree that its overuse makes the claim confusing. We changed it to intuition as you suggest.
In particular, we propose the following rewrite of the end of the discussion:
Our results suggest that transformers can indeed be trained to discover solutions to a hard problem of symbolic mathematics that humans solve through reasoning, and that this is enabled by a careful selection of training examples, instead of a change of architecture. We do not claim that the Transformer is reasoning but it may instead solve the problem by a kind of “super-intuition” that stems from a deep understanding of a mathematical problem.
Questions
1- findylap solves the same number of problems as SOSTOOLs, right?
Yes. findlyap is a Python analog of SOSTOOLS, which we needed because our computing cluster cannot run MATLAB. It solves the same problems as SOSTOOLS, up to differences in speed and memory usage, which may cause different timeouts and failures.
2- aren't the other forward datasets also datasets that SOSTOOLS can solve (...)? if so, what decision went into the characteristics of the test set over those used for training?
Yes. Since SOS methods are the main techniques known to find the symbolic global Lyapunov functions we need for our forward training sets, in this paper, the word “forward” is essentially equivalent to “solvable with SOSTOOLS” (for this reason, findlyap accuracies on forward datasets, 100% by design, are not reported in table 4).
The three forward datasets include systems of increasing difficulty for SOSTOOLS. FHom and FNHom both focus on the easier "barrier Lyapunov function discovery" problem. FNHomProper focuses on the harder problem of discovering a Lyapunov function for a general polynomial system. The number of equations, the degree, and the range of coefficients were selected so that findlyap could find a Lyapunov function in a reasonable amount of time (a few minutes per problem).
The forward test sets, being held-out samples of the forward training sets, have the same distribution. We ensure that no overlap exists between train and test sets.
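As an aside, a schematic of the forward generation loop described above might look as follows (placeholder names, not the actual pipeline):

```python
# Schematic of forward dataset generation (placeholder names, not the actual
# pipeline): sample random polynomial systems and keep those the SOS solver
# manages to solve within the time budget.
def generate_forward_pairs(sample_system, sos_solver, n_pairs, timeout=300):
    pairs = []
    while len(pairs) < n_pairs:
        system = sample_system()                        # random polynomial system
        lyapunov = sos_solver(system, timeout=timeout)  # findlyap-style call
        if lyapunov is not None:                        # solver succeeded
            pairs.append((system, lyapunov))
    return pairs
```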
3- are the polynomial systems for FSOSTOOLS homogenous, nonhomogenous, etc?
This was indeed missing, thank you for pointing it out. FSOSTOOLS contains non-homogeneous proper examples that correspond to the Lyapunov function problem with the most generic distribution of polynomials.
4- try to do some sort of expert iteration with your model: use it as the base solver when generating a forward dataset, and use newly solved problems to further fine-tune the model
Thank you very much for this suggestion! We experimented with it during the rebuttal period. Specifically, we created a sample of verified model predictions, for polynomial systems, added it to the original training sample, and continued training the model.
We note that adding about 1,000 correct predictions to the 1 million original training samples improves our performance on the "into the wild" test sets (section 6.4), from 11.7% to 13.5% for Poly3, and from 9.6% to 11.9% for Poly5, a roughly 15% relative increase in accuracy! The performance on other test sets is unaffected. We are still experimenting with this, and will add our results in the updated paper. Thanks again for suggesting this!
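For concreteness, the round of expert iteration we ran can be summarized by the sketch below (placeholder names, not our actual training pipeline):

```python
# Schematic of one expert-iteration round (placeholder names, not the actual
# pipeline): keep only model predictions that pass the external verifier,
# then add them to the training set and continue training.
def expert_iteration_round(model, verifier, sampled_systems, train_set, max_new=1000):
    new_pairs = []
    for system in sampled_systems:
        candidate = model.predict(system)           # proposed Lyapunov function
        if verifier.is_valid(system, candidate):    # SOS / SMT check
            new_pairs.append((system, candidate))
        if len(new_pairs) >= max_new:
            break
    return train_set + new_pairs                    # augmented training data
```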
5- Is there a real-world (or "real math") application of finding Lyapunov functions?
Yes, practical applications of stability analysis exist in fields as diverse as supply chains, the space industry, traffic flow regulation on highways, chemical processes, etc.
A typical engineering application is the stabilization problem. Given a dynamical system that can be acted upon, e.g. hydraulic gates controlling flow on a river or nozzles controlling the trajectory of a satellite, we want to select actions that keep the system stable (i.e. water levels remain constant, the satellite stays in orbit). Mathematically, this amounts to setting a parameter of the dynamical equations so that the system has a Lyapunov function.
In many cases, a global Lyapunov function cannot be found, and one must settle for a local or semi-global solution, which guarantees that the system remains stable as long as it is subjected to small perturbations. This is typical in robotics. However, when a global Lyapunov function can be found, more efficient controls can be designed. This is particularly important in biological systems (see for instance [Agbo Bidi, Almeida, Coron, 2024] for the regulation of mosquito populations), epidemiology (see [Adda & Bichara, IJPAM, 2012]) or traffic flow regulation (see [Hayat, Piccoli, Truong, SIAP, 2023]).
Great response, and thank you for your comments and responses! I will keep my score.
Thank you for your response! And thank you again for your comments, which were very helpful!
We thank the reviewers for their comments and suggestions, which helped improve the paper. Here, we cover questions asked by several reviewers, and present the new experiments we ran for this rebuttal. The results are in the appended PDF file.
A- On novelty, and comparison with Lample and Charton (2019)
Lample and Charton introduced the backward approach as a fast data generation technique for symbolic integration. They noticed that it creates out-of-distribution (OOD) generalization issues: models trained on backward data perform badly on forward-generated test sets. [Yehuda et al., ICML 2020] suggested that backward generation, when applied to hard problems, tends to target easier subproblems. This limitation comes on top of the OOD issues.
Ours is the first attempt to actually use a backward approach to train a model to solve an open problem, for which only a very small number of forward examples are available. We make two novel contributions.
- 1. We introduce complex generation techniques, like the orthogonal decomposition (step 3), to prevent backward generation from collapsing into solving easier subproblems. While these tricks are problem-specific, they show that the limitations described by Yehuda et al. can be mitigated.
- 2. We show that OOD generalization issues can be greatly reduced if a tiny number of forward-generated examples is added to the backward-generated training set (50 examples in a million, 0.05%); see the sketch below. This approach, related to the priming technique (Jelassi et al., 2023) for length generalization, is, to our knowledge, novel.
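A minimal illustration of the mixing step in point 2 (illustrative names; not the actual data pipeline):

```python
# Minimal sketch of "priming" a backward-generated training set with a handful
# of forward-generated examples (illustrative; not the actual data pipeline).
import random

def prime_with_forward(backward_set, forward_set, n_forward=50, seed=0):
    rng = random.Random(seed)
    primed = list(backward_set) + rng.sample(list(forward_set), n_forward)
    rng.shuffle(primed)
    return primed
```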
B- Rationale of backward generation
Our models are trained on pairs of problems and solutions. When a solver exists, we can use it to generate a training set, by computing the solutions of random problems. This is the forward method. This approach fails for open problems, for which solvers do not exist, or are costly and limited to simple cases. The backward method creates a training set by sampling random solutions (Lyapunov functions), and deriving associated problems (dynamical systems).
Lyapunov functions must satisfy three conditions (Def. 3.2). One depends on the system, and two are intrinsic: having a strict minimum at 0, and tending to infinity at infinity.
In Step 1, we sample a function V that satisfies the two intrinsic conditions. To do so, we build V from two components: one belongs to a class with a guaranteed strict minimum at zero (e.g. sums of one-variable functions, positive definite polynomials), and the other belongs to a large class of non-negative functions, equal to 0 at the origin, but with no guarantee of a strict minimum. This class of functions is larger than those considered in usual SOS methods.
In Steps 2 to 4, we address the third condition of Def. 3.2: ∇V(x) · f(x) ≤ 0 for any x.
A naive solution would be f = -∇V, but this would greatly reduce the class of systems we generate, and turn the Lyapunov function discovery problem (find V from f) into an easier integration problem (find V from its gradient). This is the point made by Yehuda et al.
To prevent this, we transform the naive solution in two ways:
- Step 4: we multiply it, along each coordinate axis, by random non-negative functions. Because this does not change the sign of ∇V · f, Def. 3.2 still holds.
- Steps 2 and 3: we add a random function orthogonal to ∇V, built from random vectors orthogonal to ∇V and random functions. Since this term contributes nothing to ∇V · f, Def. 3.2 still holds.
This guarantees that the resulting systems span a very large set, because any f satisfying ∇V · f ≤ 0 can be written as the sum of a function collinear to ∇V and a function orthogonal to ∇V. This mitigates the limitations described by Yehuda et al., by preventing the model from guessing the solution by reversing the generative process.
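To make these steps concrete, here is a toy two-dimensional sketch of the construction, written with SymPy. It is our simplified illustration, not the paper's generator: the sampling classes for V, the non-negative multipliers, and the orthogonal term are deliberately minimal, but the resulting system satisfies ∇V · f ≤ 0 by construction.

```python
# Toy 2-D sketch of the backward generation steps above (our simplified
# illustration, not the paper's generator).
import random
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)

def sample_lyapunov():
    # Step 1 (simplified): a proper, positive definite polynomial V.
    a, b = random.randint(1, 3), random.randint(1, 3)
    return a * x1**2 + b * x2**2 + (x1 + x2)**4

def backward_system(V):
    grad = sp.Matrix([sp.diff(V, x1), sp.diff(V, x2)])
    # Step 4 (simplified): scale -dV/dx_i by random non-negative functions g_i.
    g = [random.randint(1, 2) + x2**2, random.randint(1, 2) + x1**2]
    f = sp.Matrix([-g[0] * grad[0], -g[1] * grad[1]])
    # Steps 2-3 (simplified): add a term orthogonal to grad(V), here
    # h * (-dV/dx2, dV/dx1), which leaves grad(V) . f unchanged.
    h = random.choice([x1, x2, x1 * x2])
    f += h * sp.Matrix([-grad[1], grad[0]])
    return f.applyfunc(sp.expand), grad

V = sample_lyapunov()
f, grad = backward_system(V)
Vdot = sp.simplify(grad.dot(f))
print("V    =", V)
print("f    =", list(f))
print("Vdot =", Vdot)  # equals -g1*(dV/dx1)^2 - g2*(dV/dx2)^2 <= 0
```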
C- Comparison with neural Lyapunov methods
Reviewer S4kD raised concern about recent neural methods already solving this problem. We consider our approach to be complementary to theirs.
First, we focus on a different problem: proving that a system has a global Lyapunov function (valid in the full space) in symbolic form. Neural Methods, on the other hand, find implicit (black box) semi-global Lyapunov functions. While finding semi-global functions, and the corresponding region of attraction, is an important problem in many engineering applications, it is a different problem: we aim at proving stability over the full space.
Second, in our experiments, we note that our methods perform much better than neural Lyapunov models (including the most recent Fossil 2.0 and LyzNet, which we added to our baselines), see the rebuttal PDF. We believe this is due to the fact that neural methods solve a different problem, as underlined above.
Third, the methods are fairly different. We train a model only once, on generated data, and then use it to predict explicit Lyapunov functions. Neural methods need to be retrained on every system considered, and provide implicit “black box” solutions.
D- New experiments
- We added an SMT verifier to theoretically guarantee the output of the models in the non-polynomial case. It confirms the performance of our models.
- We provide an extensive comparison to neural methods such as FOSSIL 2.0 and LyZNet (see pdf)
- We tested a suggestion by reviewer LBjn (thanks again!): fine-tuning a model on its verified predictions. We notice that the addition of 1,000 verified predictions to our training set of 1 million improves performance on the "into the wild" test sets by about 15%, while not affecting the other test sets. Adding more examples seems to be detrimental, as it decreases the performance on other benchmarks.
Experiments are ongoing, we will update the paper with the new results.
The paper proposes to apply transformers for finding Lyapunov functions.
I find that the proposed technique is interesting and intriguing and may find practical applications in finding global Lyapunov functions for given (polynomial) dynamical systems, a task often performed by manual experimentation. The authors designed the whole pipeline, including synthetic dataset creation, and I therefore accept the paper for the NeurIPS conference.
Several crucial points must be incorporated into the final version of the paper:
- address Reviewer S4kD's concerns about clearly stating the problem setup and definitions; moreover, discuss the differences from the existing methods mentioned, FOSSIL and LyZNet,
- include the additional experiments reported in the rebuttal pdf,
- share the created dataset, benchmark, and training/evaluation pipeline in an open-source environment.