DiffPhyCon: A Generative Approach to Control Complex Physical Systems
We introduce a novel method for controlling complex physical systems using generative models, minimizing the learned generative energy function together with the specified control objective.
Abstract
Reviews and Discussion
This paper proposes an algorithm for controlling complex physical systems, particularly in long-term settings. The method is based on diffusion models and energy methods, utilizing data generated by traditional finite difference methods.
Strengths
- The problem studied in this paper is important and well-motivated.
- The presentation is clear and easy to follow.
- Detailed experimental results of the proposed algorithm are presented in both the main text and the appendices.
Weaknesses
- The method proposed in the paper does not demonstrate its strength in challenging tasks. I agree with the claim that the fundamental challenge of simulating complex physical systems is that they are high-dimensional and highly nonlinear. However, the current paper does not address these problems convincingly, for the following two reasons:
  - First, both numerical examples are low-dimensional, so the curse of dimensionality does not have much effect.
  - Second, their dynamics are not very complex. At first glance, the 2D Jellyfish task appears complex, as it involves solving the 2D Navier-Stokes equation, which might yield turbulent solutions. However, I realized that the optimization focuses on the jellyfish itself. I was wondering if there is a chance to obtain a good solution even if the simulation of fluid dynamics is not accurate. If this is the case, the results of this example may not provide strong evidence that the algorithm can handle complex dynamics. Additionally, this paper does not present an error analysis for the fluid dynamics simulation; only statistics of the energy comparison are provided.
- This algorithm depends on data generated by finite difference methods, and hence suffers from the curse of dimensionality.
Questions
- What is the energy term?
- In the 2D Jellyfish task, what is the value of the kinematic viscosity ν?
- Is it possible to make the proposed algorithm independent of classical methods (e.g., the finite difference method used in this paper) for data generation?
Limitations
This paper has addressed its limitations in the main text.
We thank the reviewer for the constructive and detailed comments. We are glad that the reviewer finds our work clear and well-motivated with detailed results. Below, we address the reviewer’s questions one by one.
Comment 1: ...The current paper does not address these problems convincingly for the following two reasons:
-- First, both numerical examples are low-dimensional so that the curse of dimensionality does not affect much.
-- Second, their dynamics are not very complex. ...I was wondering if there is a chance to obtain a good solution even if the simulation of fluid dynamics is not accurate. If this is the case, the results of this example may not provide strong evidence that the algorithm can handle complex dynamics. Additionally, this paper does not present the error analysis for the fluid dynamics simulation, only stats of energy comparison are provided.
Answer: Our tasks align with existing data-driven physical control papers [1-3]. Controlling Burgers' equation is challenging due to its tendency to form shock waves, and our setup is more difficult with partial observation/control combinations. The 2D jellyfish control task includes challenges such as (1) asymmetric wake vortex formation from symmetric structures and motion modes [4]; (2) the nonlinear, complex interaction of vortices with the shell [5].
Still, to make our experiments more convincing, we conducted new 2D incompressible fluid control experiments, following a similar (but more challenging) setup to [3]. Control forces were applied within a 64x64 grid flow field, excluding a semi-enclosed region, to minimize the proportion of smoke failing to exit through the top middle exit (seven exits in total). For an illustration of the settings, please refer to subfigures (a) and (b) of Figure 1 in the PDF file at the end of the General Response. This high-dimensional indirect control problem involves managing 2-dimensional forces at approximately 1,700 grid points per time step, resulting in about 100,000 control variables in total across 32 time steps, making it highly challenging. We generated 20,000 training trajectories (including features of the smoke density field, velocity fields, control force fields, and smoke proportion field) after filtering low-quality trajectories. The average control objective in the training set is 49.9%. We evaluated 50 test samples, and the results are shown in Table 1 below. It shows that our method still has significant advantages, especially in validating the role of prior reweighting. The average relative L2 error of the fluid dynamics simulation of our method is 19.2%. In particular, the curse of dimensionality is effectively addressed by our method. For visualization of the generated control and the smoke density map, please refer to subfigure (c) of Figure 1 in the PDF file at the end of the General Response. These updates will be added to the final version of our manuscript.
Table 1: Performance comparison on the new 2D indirect fluid control task.
| Method | Control Objective |
|---|---|
| BC | 0.3085 |
| BPPO | 0.3066 |
| SAC (pseudo-online) | 0.3212 |
| SAC (offline) | 0.6503 |
| DiffPhyCon-lite | 0.2324 |
| DiffPhyCon (γ=0.96) | 0.2254 |
Comment 2: This algorithm depends on the data generated by finite difference methods, hence suffering from the curse of dimensionality;
Comment 3: Is it possible to make the proposed algorithm independent of classical methods (e.g., the finite difference method used in this paper) for data generation?
Answer: Our method does not depend on data generated by finite difference methods. Our method belongs to the domain of deep learning-based surrogate modeling, and like any other method in this domain, it is decoupled from the data-generating process. The data can come from classical solvers (not limited to finite difference methods) or actual physical observations; e.g., in our paper, the 1D data come from a finite difference method while the 2D data come from the finite volume method [6].
Moreover, as with any other deep learning-based surrogate modeling method, the goal is precisely to deal with complex, high-dimensional processes. It is exactly our generative control method that can better learn high-dimensional dependencies from the given data.
Comment 4: What is the energy term?
Answer: It is the parameterized energy-based model $E_\theta$ of the joint distribution of $(u, w)$, where $w$ is the control sequence, $u$ is the state trajectory, and the generation is conditioned on a set of conditions, such as initial and boundary conditions. $E_\theta$ characterizes the negative log-likelihood of $(u, w)$ up to a constant. Given the conditions, we aim to generate $(u, w)$ that lies on the data manifold with high probability, with a near-optimal control objective $\mathcal{J}$. To achieve this goal, we train a diffusion model whose denoising network $\epsilon_\theta$ approximates the gradient of $E_\theta$. During inference, the sampling starts from Gaussian noise and travels along the learned denoising trajectory by applying $\epsilon_\theta$ iteratively, reaching a final sample with near-minimal energy $E_\theta$, under the guidance of $\mathcal{J}$.
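To make this sampling loop concrete, below is a minimal, hypothetical PyTorch sketch of objective-guided diffusion sampling. The denoiser `eps_theta`, objective `J`, shapes, guidance scale `lam`, and noise schedule are illustrative placeholders rather than the paper's actual models or schedule; only the overall structure mirrors the description above.

```python
import torch

T_phys, state_dim, ctrl_dim = 32, 64, 8   # physical steps; dims of u and w (toy)
N = 100                                   # number of denoising steps (toy)

def eps_theta(z, t):
    # Stand-in for the trained denoising network approximating the gradient
    # of the learned energy E_theta; a real model replaces this.
    return torch.zeros_like(z)

def J(z):
    # Stand-in control objective defined over the whole trajectory.
    return (z ** 2).mean()

alphas = torch.linspace(0.999, 0.98, N)   # toy per-step noise schedule
lam = 1.0                                 # guidance strength (toy)

# The full trajectory (u, w) over all physical time steps is denoised jointly
# as a single tensor z, so the objective guides the entire time horizon at once.
z = torch.randn(T_phys, state_dim + ctrl_dim)   # start from Gaussian noise
for t in reversed(range(N)):
    z = z.detach().requires_grad_(True)
    grad_J = torch.autograd.grad(J(z), z)[0]    # gradient of the objective
    with torch.no_grad():
        eps = eps_theta(z, t) + lam * grad_J    # learned energy + objective guidance
        a = alphas[t]
        z = (z - (1 - a) / (1 - a).sqrt() * eps) / a.sqrt()  # simplified DDPM step
        if t > 0:
            z = z + (1 - a).sqrt() * torch.randn_like(z)
```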
Comment 5: In the 2D Jellyfish task, what is the value of the kinematic viscosity ν?
Answer: In our paper, ν is set to 0, considering an inviscid fluid, similar to other flow field control papers [3]. Our method can also be applied to viscous fluids and does not restrict the magnitude of ν.
[1] Solving PDE-constrained control problems using operator learning, 2022.
[2] Optimal control of PDEs using physics-informed neural networks (PINNs), 2021.
[3] Learning to Control PDEs with Differentiable Physics, 2020.
[4] Stable hovering of a jellyfish-like flying machine, 2014.
[5] Propulsive performance and vortex dynamics of jellyfish-like propulsion with burst-and-coast strategy, 2023.
[6] Conservative Volume-of-Fluid method for free-surface simulations on Cartesian-grids, 2010.
I would like to thank the authors for their rebuttal. When saying the curse of dimensionality, I was referring to the dimension of the state space, which is 2-dimensional, instead of the number of nodes. It would be more convincing to see your method perform well in high-dimensional tasks. I will maintain my score.
Thanks for your feedback. We appreciate your clarification regarding the curse of dimensionality and acknowledge the importance of higher-dimensional tasks. However, our method is a generic control method for physical system control. It is agnostic to specific tasks, model architectures, and data generation methods. Based on our results on the 1D task, the 2D jellyfish control task, and the new 2D smoke control task during rebuttal, the effectiveness of our method is convincingly demonstrated. For 3D tasks, we believe our method is also applicable, provided that efficient solvers, effective model architectures, and sufficient offline data are accessible. Exploration of 3D tasks is left as future work.
Thank you again for your valuable comments. We will continue to refine our work based on your suggestions.
This paper introduces Diffusion Physical Systems Control (DiffPhyCon), where diffusion models are used to generate a near-optimal controller for a system described by a partial differential equation (PDE). In this generative approach, a learned generative energy function and a control objective are minimized. Additionally, a prior reweighting technique is developed to mitigate the effect of the prior distribution of control sequences.
Strengths
The paper is well-written and the method is well-motivated. The performance of the developed generative approach is demonstrated using extensive simulation studies on 1D Burgers equation and 2D Jellyfish movement control. The efficiency of this method is compared with a decent number of baselines. Additional experiments are provided to study the effects of hyperparameters.
Weaknesses
The contribution of the paper is not clear compared to the following recently published result in ICLR:
Wei, L., Hu, P., Feng, R., Du, Y., Zhang, T., Wang, R., Wang, Y., Ma, Z.M. and Wu, T., Generative PDE Control. In ICLR 2024 Workshop on AI4DifferentialEquations In Science.
The method looks identical to the generative PDE control result in the above reference which is published and peer-reviewed. The additional contribution, if any, is too minor for NeurIPS. For this reason, I cannot recommend the paper to be accepted for NeurIPS. However, I do think this paper would have been a contribution to NeurIPS otherwise.
UPDATE: It appears the ICLR workshop is not considered an archival publication. With this consideration, I update my score to 7.
Questions
Besides the big question of novelty in comparison to the "Generative PDE Control" paper from ICLR, I have the following comments for the authors:
- The authors acknowledge the limitation that DiffPhyCon presently operates in an open-loop manner. This limitation by itself is not problematic, but the claims made in the introduction need revision in that case. Due to robustness concerns, a controller is seldom implemented open-loop even for simple systems, let alone complex physical systems. However, open-loop optimal control design can be applied to motion planning or trajectory generation objectives. In the context of this paper, the generative approach could be used to construct a nominal pre-planned open-loop control sequence term which can then be added to a feedback term (e.g. PID). Thus, I suggest to revise the introduction to claim planning as the objective. For more information about motion planning, the authors are referred to Chapter 12 (Motion Planning for PDEs) from the reference:
Krstic, M. and Smyshlyaev, A., 2008. Boundary control of PDEs: A course on backstepping designs. Society for Industrial and Applied Mathematics.
- In line 72 of page 2, the authors claim "...DiffPhyCon facilitates global optimization of long-term dynamics...". It is not clear how the globality (or even non-triviality) of near-optimal control sequence is guaranteed to be achieved. I am not sure why Algorithm 1 would not yield local optima.
Limitations
Yes, the authors have stated the limitations.
We appreciate the reviewer’s valuable feedback and helpful suggestions. We are pleased to hear that the reviewer finds our paper well-written, well-motivated, and extensive in its results. Below, we address the reviewer’s questions one by one.
Comment 1: ... Thus, I suggest to revise the introduction to claim planning as the objective. For more information about motion planning, the authors are referred to Chapter 12 (Motion Planning for PDEs) from the reference: Krstic, M. and Smyshlyaev, A., 2008. Boundary control of PDEs: A course on backstepping designs. Society for Industrial and Applied Mathematics.
Answer: Thanks for this constructive suggestion. Based on your advice and the discussions in the referenced literature, our problem setting indeed aligns more closely with a planning task, where future multi-step actions are generated at once. We will revise the introduction and other relevant sections of the paper to change the keyword "control" to "planning".
Comment 2: In line 72 of page 2, the authors claim "...DiffPhyCon facilitates global optimization of long-term dynamics...". It is not clear how the globality (or even non-triviality) of near-optimal control sequence is guaranteed to be achieved.
Answer: We apologize for not clearly explaining global optimization. By this, we mean treating the state trajectory and control sequence over all physical time steps as a single variable during diffusion/sampling. This approach is chosen because control objectives are typically defined over the entire time span. In particular, when the objective function includes several conflicting terms, our method can relieve the myopic failure modes [1] that may exist in reinforcement learning's iterative decision-making. For instance, in our 2D control task, maximizing the jellyfish's average speed requires sharp angle changes in the early stage, conflicting with minimizing the total energy cost. More details are in Lines 302-307 and Appendix H.7 on myopic failure modes of SAC in our original submission.
Comment 3: I am not sure why Algorithm 1 would not yield local optima.
Answer: We do not guarantee that Algorithm 1 yields local optima. However, our further theoretical study during the rebuttal shows that prior reweighting (Line 6 of Algorithm 1) enhances the probability of obtaining globally near-optimal solutions. Informally speaking, for tasks with suboptimal training control sequences, there exists a hyperparameter γ < 1 such that using prior reweighting with this γ increases the likelihood of obtaining near-optimal control sequences compared to not using this technique (γ = 1). Due to space limitations, for details of the formal statement of the theorem and proof, please refer to the Official Comment on top of this webpage.
[1] Janner et al., Planning with diffusion for flexible behavior synthesis, ICML 2022.
I thank the authors for their response. The response was convincing. I am conditionally increasing my score to 8, on the condition the authors actually revise the final version with the keyword planning instead of control.
Thanks for your feedback and support. We appreciate your willingness to increase the score based on our revisions. We will ensure that the final version of the paper reflects your suggestion by using the keyword "planning" instead of "control."
The paper proposes a variant of diffusion models to control complex dynamical systems. Its contributions are threefold. First, the proposed model can optimize the trajectory and control sequence simultaneously. Second, it proposes a prior reweighting technique to generate control sequences superior to those in the training dataset. Third, it contributes a benchmark dataset, jellyfish movement control, to the complex dynamical control community.
Strengths
Novelty: The novelty lies in two parts. A variant of the diffusion model is proposed to minimize the energy function and control objectives simultaneously. Moreover, it proposes a prior reweighting algorithm to enhance the sampling of good but low-probability trajectories.
Clarity: It is clear in illustrating the methodologies and the experiments.
Quality: It provides comprehensive details on the background of the model and dataset. The experiment results substantiate its claim.
Significance: The proposed method can potentially control complex systems with a generative model with reduced cost and good long-term control sequences.
Weaknesses
Several questions need to be clarified; see details in the questions section.
Questions
- How is the offline dataset generated? Does it include the optimal control trajectory? What if the offline collected training control trajectory is too far from the optimal one? I am trying to understand if your model "finds" the existing trajectory in the dataset or if it "stitches" different pieces of trajectories and generates one that doesn't exist in the dataset.
- For the cases in this paper, using an online control algorithm is not too expensive. Online RL for these cases could also give globally optimal control results. Could you compare the computational cost of your proposed method and an online RL method to demonstrate the benefits against the online RL algorithm?
- For the diffusion-generated trajectory, how did you get Figure 4? Did you input the diffusion-generated trajectory to the numerical solver (Lilypad) to get the physics simulation result? Or did you take the diffusion-generated flow field result as the "physics trajectory"?
- I didn't quite get the intuition behind Figure 2. Why does reducing γ to less than 1 shift the red point from the margin to the center?
Limitations
The limitations are explicitly mentioned in the paper.
We thank the reviewer for the insightful comments. We are glad that the reviewer recognizes the novelty, clarity, results, and significance of our work. Below, we address the reviewer’s questions one by one.
Comment 1. How is the offline dataset generated? Does it include the optimal control trajectory? What if the offline collected training control trajectory is too far from the optimal one? I am trying to understand if your model "finds" the existing trajectory in the dataset or if it "stitches" different pieces of trajectories and generates one that doesn't exist in the dataset.
Answer: As detailed in Appendices C.1 and D of the original submission, for our 1D and 2D data, we input initial conditions and control sequences with random variations into the solvers to generate trajectories. For instance, in the 2D case, control angles are cosine curves with varying amplitude, mean, and duty cycle. These random variations are necessary as expert control sequence data are hard to obtain for nonlinear and high-dimensional physical systems. Consequently, the generated sequences are typically far from optimal.
This issue motivates our prior reweighting technique, which reduces the weight of prior control sequences to increase the probability of sampling near-optimal solutions. In this rebuttal process, our additional theoretical analysis shows that for tasks with suboptimal training set control sequences, using prior reweighting with 𝛾<1 increases the likelihood of obtaining near-optimal control sequences compared to not using it (𝛾=1). Due to space limitations, for details of the formal statement of the theorem and proof, please refer to Official Comment on top of this webpage.
Despite the offline training control sequences not being optimal, diffusion models can combine segments of these trajectories through conditional generation or control target guidance, as illustrated in [1]. In our experiments, a close look at Figure 4 reveals that although the entire curve has not appeared in the training set, its segments resemble parts of a cosine control curve from the training set.
Comment 2: For the cases in this paper, using the online control algorithm is not too expensive. Online RL for these cases could also give global optimal control results. Could you compare the computational cost of your proposed method and an online RL method to demonstrate the benefits against the online RL algorithm?
Answer: First, online training of SAC on our 2D Jellyfish control case is so computationally demanding that it is hardly feasible. However, we conducted a new experiment using SAC on our 1D Burgers' control task. The results are shown below. Our method is about twice as fast as SAC and performs better.
| Method | Training time / hours | Control objective |
|---|---|---|
| DiffPhyCon (ours) | 4.4 | 0.01103 |
| DiffPhyCon-lite (ours) | 1.7 | 0.01139 |
| SAC-online | 10.5 | 0.01567 |
| SAC-offline | 8.0 | 0.03201 |
Second, the cases in our paper are for evaluation purposes. Our method can be applied to more complex tasks, such as the new high-dimensional indirect fluid control task (see Figure 1 in the PDF of the General Response), where online RL training is also infeasible due to high costs. Moreover, online RL is impractical in many settings, like dangerous exploration (e.g., autonomous underwater vehicles) [2], whereas offline learning can avoid these issues. Thus, our method has broader applications compared to online RL.
Comment 3: For the diffusion-generated trajectory, how did you get Figure 4? Did you input the diffusion-generated trajectory to the numerical solver (Lilly-Pad) to get the physics simulation result? Or did you take the diffusion-generated flow field result as the "physics trajectory"?
Answer: Figure 4 shows the control results of three jellyfish, each with different initial conditions. Control sequences obtained from each method are input into the Lilypad solver to simulate the flow field trajectories, based on which we calculated the average movement speed and the control objective.
Therefore, we did not use the flow field trajectories generated directly by the diffusion model as the "physical trajectory". The reason is that the evaluation of these control methods, both our method and the baselines, should be based on trajectories simulated by the same solver, to compare them on an equal footing. Details are provided in Appendix E.4 (Line 853 to Line 864) of our original submission.
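For illustration, here is a minimal sketch of this evaluation protocol; the solver interface and metric functions below are hypothetical placeholders, not Lilypad's actual API.

```python
def evaluate(control_sequence, initial_state, solver_step, speed, objective):
    """Roll out a method's control sequence in the shared solver and score it.

    solver_step, speed, and objective are injected placeholders; every method
    is evaluated through the same solver so the comparison is fair.
    """
    state, states = initial_state, []
    for w_t in control_sequence:
        state = solver_step(state, w_t)   # one physical time step of simulation
        states.append(state)
    return speed(states), objective(states, control_sequence)
```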
Comment 4: I didn't quite get the intuition behind Figure 2. Why does reducing 𝛾 to less than 1 shift the red point from margin to center?
Answer: We apologize for any confusion. In the figure, the red dot at the margin indicates a local minimum of the control objective, while the red dot at the center represents the global minimum. In the joint distribution $p(u, w)$, we assume the global minimum has a lower probability than the local minima, as globally optimal trajectories rarely appear in the training dataset. By setting 𝛾<1 and then normalizing by a constant, the distribution flattens, increasing the probability of sampling at the global minimum. Thus, the figure illustrates this ideal situation: with 𝛾=1, we sample a local minimum by using DiffPhyCon-lite; with 𝛾<1, we sample the global minimum. However, this ideal scenario is only for intuitive illustration. Rigorous theoretical analysis shows that with 𝛾<1, the probability of sampling near the global minimum increases compared to 𝛾=1. Due to space limitations, for details of the formal theoretical analysis, please refer to the Official Comment on top of this webpage.
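As a toy numerical illustration of this flattening effect (the numbers below are invented, not from the paper), raising a distribution to a power γ < 1 and renormalizing shifts relative mass toward rare modes:

```python
import numpy as np

# Five candidate control modes; the global optimum (index 4) is rare in training.
p = np.array([0.60, 0.20, 0.12, 0.06, 0.02])

for gamma in (1.0, 0.5, 0.25):
    p_gamma = p ** gamma
    p_gamma /= p_gamma.sum()              # normalize by the constant Z(gamma)
    print(f"gamma={gamma}: {np.round(p_gamma, 3)}")
# With gamma=1 the global optimum is sampled with probability 0.02; with
# gamma=0.25 the flattened distribution raises this to roughly 0.12.
```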
[1] Janner et al., Planning with diffusion for flexible behavior synthesis, ICML 2022.
[2] Levine et al., Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv:2005.01643, 2020.
Thanks for providing a detailed rebuttal to my questions and general proof. The quality and relevance of the rebuttal resolved my concerns. Together with the high-quality manuscript, which involves rich detail, it is definitely a paper beyond the acceptance level. Therefore, I would raise my score to 8. I would strongly encourage the author to publish the final version of the code to facilitate future research.
Thank you for your positive feedback and for raising your score. We are delighted to hear that our rebuttal addressed your concerns and that you found the manuscript to be of high quality. We appreciate your encouragement to publish the final version of the code. We plan to release the code upon acceptance of the paper, and we will make sure to include clear documentation and instructions for usage.
Once again, thank you for your constructive comments and support.
The method learns the energy function $E_\theta$, which is used in the energy optimization target to generate the control sequences and system trajectory. A denoising network is trained to approximate the gradient of $E_\theta$. The network and optimization framework take the global state of u and w over all time steps. The training and inference follow the process of a diffusion model.
The paper also introduces prior reweighting to enable the discovery of control sequences that diverge significantly from training. Another network to learn the prior distribution of the control sequences is introduced.
The proposed method is tested by two systems (1D Burgers’ equation and 2D jellyfish movement control). The experiments compared different control methods.
Strengths
- The paper develops a generative method to control complex physical systems. The method optimizes system trajectory and control sequences jointly in the entire horizon by diffusion models. A prior reweighting technique is also proposed to improve the model's generalization ability.
- The results on the jellyfish system are strong. The proposed method generates realistic control sequences and trajectories that align with established findings in fluid dynamics.
- The paper generated a dataset for the jellyfish system, which contributes to the benchmark of this area.
Weaknesses
- The experiments are relatively limited: only one 1D example and one 2D example are included.
- I have minor reservations about the novelty of the paper since the primary method relies on the existing diffusion model.
Questions
- Compared with other methods, does the proposed method use less, similar, or more time to train (for the learning-based methods) and inference?
- Is it easy or hard to extend this method to 3D systems? In the 2D example, a 3D U-Net is used. Does this make it hard to apply this method to 3D systems as it requires a much larger 4D network?
- In the prior reweighting method, are 𝜖𝜃 and 𝜖𝜙 trained jointly, or is 𝜖𝜃 trained first and then 𝜖𝜙?
Limitations
The authors discussed the limitations in the paper.
We thank the reviewer for the helpful comments. We are glad that the reviewer appreciates our jellyfish results and dataset contribution. Below, we address the reviewer’s questions one by one.
Comment 1: The experiments are relatively limited - only include one 1D example and one 2D example.
Answer: Our two tasks align with existing data-driven physical control papers [1-3]. For further evaluation, we conducted an additional 2D incompressible fluid control experiment using a setup similar to [3]. Control forces were applied within a 64x64 grid flow field, excluding a semi-enclosed region, to minimize the proportion of smoke failing to exit through the top middle exit. This high-dimensional indirect control problem involves managing 2-dimensional forces at approximately 1,700 grid points per time step, resulting in about 100,000 control variables in total across 32 time steps, making it highly challenging.
We generated 20,000 training trajectories, with average control objective 49.9%. We evaluated 50 test samples. The results (Table 1) show that our method still has significant advantages, especially in validating the role of prior reweighting. For visualization of the results, please refer to Figure 1 in the PDF file in General Response. These updates will be added to the final manuscript.
Table 1: Performance comparison on the new 2D indirect fluid control task.
| Method | Control Objective |
|---|---|
| BC | 0.3085 |
| BPPO | 0.3066 |
| SAC (surrogate-solver) | 0.3212 |
| SAC (offline) | 0.6503 |
| DiffPhyCon-lite | 0.2324 |
| DiffPhyCon (γ=0.96) | 0.2254 |
Comment 2: I have minor reservations about the novelty of the paper since the primary method relies on the existing diffusion model.
Answer: Our method, while relying on diffusion models, introduces the following three key innovations:
- New Application Area: We leveraged diffusion models' advantages, such as ease of generalization and global optimization over time, for physical system control. This establishes a new application area for diffusion models.
- Prior Reweighting Technique: We addressed the challenge of generating control sequences that outperform those in the training set by introducing prior reweighting. This is a novel technique in the diffusion model literature, marking a significant technical contribution.
- Theoretical Analysis: During the rebuttal, we conducted additional theoretical analysis: for tasks where training set control sequences are far from optimal, using a 𝛾<1 in prior reweighting increases the probability of obtaining near-optimal control sequences compared to not using the technique (𝛾=1). This aligns with our intuitive explanation in the original manuscript and will be included in the final manuscript. Due to space limitations, for the formal theoretical analysis, please refer to the Official Comment.
Comment 3: Compared with other methods, does the proposed method use less, similar, or more time to train (for the learning-based methods) and inference?
Answer: The efficiency comparison for inference on the two tasks was detailed in Appendix I of the original submission. Those results are combined with training time comparisons in the following Table 2 and Table 3. Here are the key points:
- Inference Time: Our method is competitive among most methods, except SAC. By adopting the fast sampling method DDIM (DiffPhyCon-DDIM), our method's efficiency significantly improves, nearing the fastest models in the 1D task.
- Training Time: The training time for our method is smaller (1D task) or comparable (2D task) to other learning-based methods.
These details will be included in the final manuscript.
Table 2: Training and inference time comparison on 1D task.
| Method | Training Time / hours | Inference Time / seconds |
|---|---|---|
| DiffPhyCon-lite | 1.7 (1 A100-80G GPU, 8 CPUs) | 21.13 |
| DiffPhyCon | 4.4 (1 A100-80G GPU, 8 CPUs) | 58.97 |
| DiffPhyCon-DDIM (8 sampling steps) | 1.7 (1 A100-80G GPU, 8 CPUs) | 0.53 |
| BPPO | 8.9 (1 V100-32G GPU, 12 CPUs) | 0.82 |
| BC | 8.8 (1 V100-32G GPU, 12 CPUs) | 1.22 |
| SAC | 10.5 (1 A6000-48G GPU, 16 CPUs) | 0.11 |
| SL | 2.6 (1 V100-32G GPU, 12 CPUs) | 74.85 |
Table 3: Training and inference time comparison on 2D task.
| Method | Training Time / hours | Inference Time / seconds |
|---|---|---|
| DiffPhyCon | 62 (2 A100-80G GPUs, 32 CPUs) | 252.2 |
| DiffPhyCon-DDIM (50 sampling steps) | 62 (2 A100-80G GPUs, 32 CPUs) | 12.6 |
| BPPO | 3.0 (1 A100-80G GPU, 16 CPUs) | 1.1 |
| BC | 2.8 (1 A100-80G GPU, 16 CPUs) | 1.0 |
| SL | 52.1 (1 A100-80G GPU, 16 CPUs) | 133.5 |
| SAC | 9.5 (1 A100-80G GPU, 16 CPUs) | 0.2 |
| MPC | 52.1 (1 A100-80G GPU, 16 CPUs) | 1401.7 |
Comment 4. Is it easy or hard to extend this method to 3D systems? In the 2D example, a 3D U-Net is used. Does this make it hard to apply this method to 3D systems as it requires a much larger 4D network?
Answer: Our method is agnostic to the neural network backbone of the diffusion model. For a 3D system, we can switch to an appropriate neural network like a 4D U-net.
Regarding efficiency, controlling 3D complex physical systems is challenging for all current methods. Although autoregressive models require only a 3D neural network, they evaluate iteratively at each physical time step [1,4]. Our method improves temporal efficiency by generating the full trajectory simultaneously.
Comment 5. In the prior reweighting method, are 𝜖𝜃 and 𝜖𝜙 trained jointly, or is 𝜖𝜃 trained first and then 𝜖𝜙?
Answer: These two models can be trained simultaneously as they are independent.
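For illustration, here is a minimal, hypothetical PyTorch sketch of this independent training; the tiny MLP denoiser, toy noise schedule, and random stand-in batch are placeholders rather than the paper's architecture or data.

```python
import torch
from torch import nn, optim

class TinyDenoiser(nn.Module):
    # Toy stand-in for a denoising network; a U-Net replaces this in practice.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, x, t):
        t_feat = t.float().unsqueeze(-1) / 1000.0   # crude timestep embedding
        return self.net(torch.cat([x, t_feat], dim=-1))

dim_u, dim_w = 16, 4
eps_theta = TinyDenoiser(dim_u + dim_w)   # joint model over (u, w)
eps_phi = TinyDenoiser(dim_w)             # prior model over w alone
opt_theta = optim.Adam(eps_theta.parameters(), lr=1e-4)
opt_phi = optim.Adam(eps_phi.parameters(), lr=1e-4)

def ddpm_loss(model, x0):
    # Standard denoising loss: predict the noise added at a random timestep.
    t = torch.randint(0, 1000, (x0.shape[0],))
    a_bar = torch.cos(t.float() / 1000 * torch.pi / 2).unsqueeze(-1) ** 2
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return ((model(x_t, t) - noise) ** 2).mean()

# The two losses share no parameters, so the models can be optimized
# simultaneously (even on separate machines) from the same offline dataset.
u, w = torch.randn(32, dim_u), torch.randn(32, dim_w)   # stand-in batch
loss_theta = ddpm_loss(eps_theta, torch.cat([u, w], dim=-1))
loss_phi = ddpm_loss(eps_phi, w)
opt_theta.zero_grad(); loss_theta.backward(); opt_theta.step()
opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()
```

At sampling time, one composition consistent with reweighting the prior as $p_\gamma(u, w) \propto p(u, w)\, p(w)^{\gamma - 1}$ would add $(\gamma - 1)\,\epsilon_\phi$ to the $w$-components of $\epsilon_\theta$'s output; this is a sketch of the idea rather than the paper's exact update rule.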
[1] Solving PDE-constrained control problems using operator learning, 2022.
[2] Optimal control of PDEs using physics-informed neural networks (PINNs), 2021.
[3] Learning to Control PDEs with Differentiable Physics, 2020.
[4] Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control, 2018.
I thank the authors for their detailed response. I initially had the same concern as Mp9v that using only 2D examples might not be sufficiently convincing. If applying to 3D grids, this method requires a 4D network which could be much harder to train. However, I agree with the authors' response that the existing 2D examples are complex enough, and the proposed method could be applicable to 3D tasks. Developing a more efficient network architecture for 4D data might be beyond the scope of this paper. The authors' response regarding the new application area and the prior reweighting technique also alleviated my concerns on novelty. Therefore, I raised my score to 6.
Several reviewers (8Er5, DFgi, n6Yf) are concerned with the theoretical properties of our method. Here, we present the formal theoretical analysis of the prior reweighting technique.
Consider a pair $(u, w)$ sampled using the prior reweighting technique with hyperparameter $\gamma$: $(u, w) \sim p_\gamma(u, w \mid Y=1)$, where $p_\gamma(u, w) \propto p(u \mid w)\, p^{\gamma}(w)$. Denote $\mathcal{J}^*$ as the global minimum of the control objective $\mathcal{J}$. Define $Q(\varepsilon)$ as the "$\varepsilon$-optimal" solution set of $\mathcal{J}$ such that $Q(\varepsilon) = \{(u, w) : \mathcal{J}(u, w) \le \mathcal{J}^* + \varepsilon\}$, whose complement set is $Q(\varepsilon)^c$, and denote $\mathbb{I}_{Q(\varepsilon)}$ as its indicator function, i.e., $\mathbb{I}_{Q(\varepsilon)}(u, w) = 1$ if $(u, w) \in Q(\varepsilon)$; otherwise 0. Define $Y$ to be the random variable of "whether to use $\mathcal{J}$ as a guidance for sampling", namely, the guidance enters through the likelihood $p(Y=1 \mid u, w)$.
Consider $E(\gamma) = \mathbb{E}_{(u,w) \sim p_\gamma(\cdot \mid Y=1)}\big[\mathbb{I}_{Q(\varepsilon)}(u, w)\big]$, which indicates the expectation of getting an $\varepsilon$-optimal solution by using the prior reweighting technique with $\gamma$ under the guidance of $\mathcal{J}$.
Define
$$
D = \mathbb{E}_{u,w}\big[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w) \ln p(w)\big]\, \mathbb{E}_{u,w}\big[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w)\big] - \mathbb{E}_{u,w}\big[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w)\big]\, \mathbb{E}_{u,w}\big[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \ln p(w)\big],
$$
where all expectations are taken under the data distribution $p(u, w)$ (i.e., $\gamma = 1$). Then, we have the following theorem:
Theorem: Assume $E(\gamma)$ is a smooth function of $\gamma$. Then:
- (i): If $D < 0$, there exists $\gamma_0 < 1$, s.t., $E(\gamma_0) > E(1)$;
- (ii): If $D > 0$, there exists $\gamma_0 > 1$, s.t., $E(\gamma_0) > E(1)$.
Proof
$$
\begin{aligned}
E(\gamma) &= \int \mathbb{I}_{Q(\varepsilon)}(u,w)\, p_{\gamma}(u,w \mid Y=1)\, \mathrm{d}(u,w) \\
&= \int \mathbb{I}_{Q(\varepsilon)}(u,w)\, \frac{p(Y=1 \mid u,w)\, p_{\gamma}(u,w)}{p(Y=1)}\, \mathrm{d}(u,w) \\
&= \frac{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w)]}{\mathbb{E}_{u,w}[p(Y=1 \mid u,w)]} \\
&= \frac{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w)]}{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w)] + \mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}(u,w)\, p(Y=1 \mid u,w)]} \\
&= \frac{1}{1 + \dfrac{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}(u,w)\, p(Y=1 \mid u,w)]}{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w)]}},
\end{aligned}
$$
where the expectations are taken with respect to $p_\gamma(u,w)$.
Define
$$
\begin{aligned}
G(\gamma) &= \frac{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w)]}{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}(u,w)\, p(Y=1 \mid u,w)]} = \frac{\mathbb{E}_{w}\big[\mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w) \mid w]\big]}{\mathbb{E}_{w}\big[\mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}(u,w)\, p(Y=1 \mid u,w) \mid w]\big]} \\
&= \frac{\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)}(u,w)\, p(Y=1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w}{\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}(u,w)\, p(Y=1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w}.
\end{aligned}
$$
Then $E(\gamma) = \frac{G(\gamma)}{1 + G(\gamma)}$. Since $E'(\gamma) = \frac{G'(\gamma)}{(1 + G(\gamma))^2}$, $E$ and $G$ have the same monotonicity.
Differentiating with respect to $\gamma$ (using $\frac{\mathrm{d}}{\mathrm{d}\gamma} p^{\gamma}(w) = p^{\gamma}(w) \ln p(w)$) gives
$$
\begin{aligned}
G'(\gamma) &= \frac{\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w) \ln p(w)\, \mathrm{d}w \cdot \int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w}{\Big(\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w\Big)^2} \\
&\quad - \frac{\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w \cdot \int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w) \ln p(w)\, \mathrm{d}w}{\Big(\int \mathbb{E}_{u}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \mid w]\, p^{\gamma}(w)\, \mathrm{d}w\Big)^2}.
\end{aligned}
$$
In particular, evaluated at $\gamma = 1$, where the expectations are under the data distribution $p(u, w)$,
$$
G'(1) = \frac{\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w) \ln p(w)]\, \mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w)] - \mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)}\, p(Y{=}1 \mid u,w)]\, \mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w) \ln p(w)]}{\big(\mathbb{E}_{u,w}[\mathbb{I}_{Q(\varepsilon)^c}\, p(Y{=}1 \mid u,w)]\big)^2}.
$$
By definition, $G'(1)$ is a positive multiple of $D$ (the denominator above is positive), which implies our conclusion:
- If $D < 0$, then $G'(1) < 0$ and thus $G$ (hence $E$) decreases around $\gamma = 1$. Hence, there exists $\gamma_0 < 1$, s.t., $E(\gamma_0) > E(1)$; thus (i) holds;
- otherwise, for a similar reason, (ii) holds.
Remark: Here $D$ can be interpreted as some kind of difference between the "entropies" in $Q(\varepsilon)$ and $Q(\varepsilon)^c$. When $D < 0$, it means that $Q(\varepsilon)$ has higher "entropy" (lower average log-probability $\ln p(w)$ under the prior), implying that the training trajectories are far from optimal. As a result, we may need to flatten the distribution of training trajectories, which corresponds to using the prior reweighting technique with $\gamma < 1$. Since this is the most common case, we usually set $\gamma < 1$.
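To illustrate the theorem numerically, the following toy 1D experiment (all densities, weights, and constants are invented for illustration, not taken from the paper) estimates $E(\gamma)$ by self-normalized importance sampling for a prior concentrated on a suboptimal mode; the estimate rises as $\gamma$ decreases below 1, as case (i) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def J(w):                        # toy control objective, minimized at w = 2
    return (w - 2.0) ** 2

def log_p(w):                    # unnormalized prior: 0.98*N(0, 0.5^2) + 0.02*N(2, 0.5^2)
    return np.logaddexp(np.log(0.98) - 2.0 * w**2,
                        np.log(0.02) - 2.0 * (w - 2.0) ** 2)

def E_hat(gamma, eps=0.1, lam=2.0, n=400_000):
    # Self-normalized importance sampling estimate of E(gamma), the probability
    # of J(w) <= eps under p_gamma(w | Y=1) ∝ p(w)^gamma * exp(-lam * J(w)).
    w = rng.normal(1.0, 3.0, n)                   # broad proposal q(w)
    log_q = -0.5 * ((w - 1.0) / 3.0) ** 2         # proposal density up to a constant
    log_wt = gamma * log_p(w) - log_q - lam * J(w)
    wt = np.exp(log_wt - log_wt.max())
    return np.sum(wt * (J(w) <= eps)) / np.sum(wt)

for gamma in (1.0, 0.7, 0.4):
    print(f"gamma={gamma}: E(gamma) ~= {E_hat(gamma):.3f}")
# The estimated probability of an eps-optimal control rises as gamma drops
# below 1, matching case (i): the prior here favors a suboptimal mode (D < 0).
```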
General Response
We thank the reviewers for their thorough and constructive comments, as well as the AC's work in organizing the review of our paper. We are glad that the reviewers think that our paper is well-written (DFgi, n6Yf, Mp9v) and well-motivated (n6Yf, Mp9v). Reviewers 8Er5 and DFgi recognized the novelty of our paper: a generative method to control complex physical systems, and a prior reweighting technique. Reviewers appreciate the strong results on the jellyfish systems (8Er5) and the comprehensive details of experiments (DFgi, n6Yf, Mp9v). Reviewer 8Er5 also recognizes the importance of our contributed dataset for the jellyfish system.
Based on the reviewers' valuable feedback, we have conducted several additional experiments and performed additional theoretical analysis. Below, we address the issues pointed out by the reviewers and resolve possible misunderstandings:
- We present an additional theoretical analysis of the prior reweighting technique to illustrate its effectiveness. The theoretical conclusion is: for control problems where control sequences in the training set are far from optimal, there exists a γ < 1 such that using the prior reweighting technique with this γ shows improvement compared to not using it (γ = 1). For details of the formal statement of the theorem and proof, please refer to the Official Comment.
- We introduce a new 2D incompressible fluid control task to further demonstrate the effectiveness of our method, in response to Reviewers 8Er5 and Mp9v. This is a high-dimensional indirect control task, and thus very challenging. Our method still shows significant improvement compared to the baselines. For details, please refer to the responses to Reviewers 8Er5 and Mp9v. For visualization of the generated control fields and the generated smoke density field, please refer to Figure 1 in the attached PDF file.
- We further clarify our contributions, in response to Reviewer 8Er5's minor reservations about the novelty of the paper. Our contributions on top of the diffusion models include three aspects: expanding the application of diffusion models, proposing a novel prior reweighting technique, and providing theoretical analysis of the technique. For details, please refer to the responses to Reviewer 8Er5.
- We add online RL results of the 1D control task. The results show that our method has higher training efficiency and outperforms online SAC. For details, please refer to the responses to Reviewer DFgi.
- We add a comparison of the training efficiency of DiffPhyCon and the learning-based baselines. For details, please refer to the responses to Reviewer 8Er5.
- We agree to revise the Introduction and other related sections, as suggested by Reviewer n6Yf. We will change the keyword "control" to "planning".
The above updates will be presented in our final manuscript. We now individually address the concerns of reviewers. Please see the responses below each review.
The paper presents a new use of diffusion models for open-loop control (planning) in physical system models. It observes that one can adjust the weighting in the energy function to de-emphasize the prior probability in the data in favor of the control objective. This allows the result to outperform the demonstrations. Some issues remain unaddressed, such as the case of a noisy system and/or an inaccurate dynamics model, which would require actual closed-loop control. Nonetheless, the idea is interesting and the empirical results show a nice improvement over other methods.