PaperHub
6.3
/10
Spotlight4 位审稿人
最低6最高7标准差0.4
6
6
6
7
2.5
置信度
正确性2.8
贡献度3.0
表达2.8
NeurIPS 2024

Optimization Algorithm Design via Electric Circuits

OpenReviewPDF
提交: 2024-05-13更新: 2025-01-15
TL;DR

We design optimization algorithms using electric RLC circuits.

摘要

关键词
Convex optimizationDistributed optimizationDecentralized optimizationADMMAlternating direction method of multipliersPG-EXTRAPerformance estimation problemContinuous-time analysisfirst-order optimizationproximal methods

评审与讨论

审稿意见
6

This paper addresses the design of new optimization algorithms (centralized and distributed) which lend themselves better to theoretical analysis than existing optimization algorithms, which are tailored more towards establishing fast worst-case convergence guarantees. The novel part is that the authors borrow analogies from electric circuits, specifically RLC circuits (which have components consisting of resistors, inductors, and capacitors).

优点

Noticing that iterations of a (small iteration, or "continuous-time") optimization algorithm tend to mimic the behavior of RLC circuits with nonlinear resistor components is interesting. The authors also provide a methodology for converting convergent continuous-time dynamics into discrete-time.

RLC circuits have a rich history and body of literature and these findings can thus be extended to the realm of optimization algorithm design with provable convergence, which, outside of it just being interesting to apply RLC concepts, is also quite useful for convergence analysis of optimization algorithms.

缺点

The authors should specify in the abstract and introduction that the optimization algorithm design in question seems to only work for convex optimization problems, which doesn't seem to be clear until the official definition of (1). This is a significant drawback considering there are significantly more convergence guarantees for convex problems in general versus nonconvex/mixed-integer etc.

The provable convergence is definitely of value, but is the speed/rate of convergence an improvement upon standard optimization algorithms? In practice, even if convergence isn't provable, for many practical problems most of the consideration would be towards optimization speed (and convergence would be achieved in practice but maybe not in theory).

问题

Do the equations for voltage across the inductor and current through the capacitor (page 3, line 93) have initial conditions as well? (e.g. initial charge across the inductor or current through the capacitor). If so, what are the analagous values for this in the optimization formation?

I don't quite understand how the subdifferential operator enforces all of the considered V-I relationships. For example, if the x variables are analogous to voltage and the y variables are analogous to currents, wouldn't there also be relationships where x \in df(y)?

Why is the 'equilibrium state' also indicative of the voltage across/current through a resistor being zero? Is this definition different than steady state?

In Section 3.1, a negative resistor is used. Do all of the circuit laws (Ohm's, KCL, KVL) also hold for negative resistance?

局限性

There does not seem to be any potential negative societal impacts.

作者回复

Thank you for your positive review and thoughtful questions. We are glad that the reviewer found our framework to be "quite useful for convergence analysis of optimization algorithms".

W1.

We thank the reviewer for this precise point. In the revised paper, we will clarify (in the abstract and introduction) that our framework currently focuses on developing convergent algorithms for convex problems. Extending the framework to cover more general nonconvex problems, including mixed-integer optimization problems, is a long-term goal for this project.

W2.

We agree with the reviewer that obtaining fast/efficient algorithms (rather than merely convergent algorithms) is the ultimate goal of optimization algorithm design. Since this present work represents the very first steps in this framework, we focus on laying the basic foundations and introducing the computer-assisted methodology. Utilizing and extending this machinery to design fast/efficient algorithms is the very next step of our planned agenda.

Q1.

Yes, they do. The initial conditions, stated in line 128, require that vL0v_L^0 and iC0i_C^0 should be compatible with the other voltage and current values of the resistors and f\partial f. Here, 'compatible' means that they should satisfy KCL and KVL.

In terms of optimization formulation, this means the initial values should be compatible with the optimization algorithm. However, vLv_L and iCi_C are often eliminated from the optimization algorithm, so the issue doesn't arise in most implements of the algorithm.

Let us clarify this with an example. Consider the gradient flow in section E.2 of the appendix. For the sake of simplicity, assume ff is differentiable and DC=ID_C=I. If we write out KCL, KVL, V-I relations, we have

$ y &= \nabla f(x) \\\\ i_C &= y \\\\ v_C &= -x \\\\ \frac{d}{dt} v_C &= i_C. $

The initial condition of iCi_C means that the value should be compatible with the first 3 lines. That is, for the initial value x0x^0, the value iC0i_C^0 should satisfy iC0=f(x0)i_C^0=\nabla f(x^0). Now let's think of the discrete counterpart. The last line corresponds to the update rule of the algorithm vCk+1=vCk+hiCk v_C^{k+1} = v_C^{k} + h i_C^k where h>0h>0 is step size. Here, we obtain the algorithm by eliminating vC,iC,yv_C, i_C, y, by solving the system of equations from the first 3 lines. Substituting vCk=xkv_C^k = -x^k, iCk=yk=f(xk)i_C^k = y^k = \nabla f(x^k) we get xk+1=xkhf(xk). x^{k+1} = x^{k} - h \nabla f(x^k) . Thus, iCi_C has been eliminated, and the constraint iC0=f(x0)i_C^0=\nabla f(x^0) does not explicitly manifest in the discrete-time algorithm.

Q2.

The subdifferential operator enforces the V-I relationship specified by f\partial f. The specific case mentioned by the reviewer can be enforced by considering the conjugate function ff^*. Recall that if ff is closed convex proper, then Fenchel's identity tells us (f)1=f(\partial f)^{-1} = \partial f^*, thus xf(y)    y(f)1(x)    yf(x).x \in \partial f(y) \iff y \in (\partial f)^{-1}(x) \iff y \in \partial f^*(x).

Q3.

We believe the reviewer's notion of 'steady state' is the same as our 'equilibrium state'. We define the 'equilibrium state' as the condition where all the circuit states are constant. This happens when vL=0v_L=0 and iC=0i_C=0, as discussed line 968 in the appendix. In equilibrium for admissible dynamic interconnect, the current through the resistor must be zero, as otherwise the resistors will dissipate energy, and a circuit with strictly decreasing energy cannot be in equilibrium.

Q4.

Yes, all circuit laws also hold for negative resistance. The only unusual aspect of negative resistors is that they generate, rather than dissipate, energy. However, this is not a problem since, in the equivalent circuit shown on the right side of the figure in Section 3.1, the negative resistor cancels out with the positive resistor and energy dissipation ddtE(t)0\frac{d}{dt} \mathcal{E}(t)\le0 is still guaranteed. (So, the negative resistor is merely a conceptual tool.) We will further clarify this point in the revised paper.

评论

I appreciate the authors' responses to my questions and comments and the examples provided to illustrate that the circuit laws still apply in the cases I mentioned. I do agree that this provides a good step towards speeding up optimization convergence, and is in general, an interesting finding that should be shared with the community. The paper should definitely be published somewhere, I just think for NeurIPS it's a bit borderline in terms of impact. Thus, I keep my 6/Weak Accept score.

审稿意见
6

This paper presents a methodology of designing an electric circuit whose continuous-time dynamics converge to the solution to its corresponding optimization problem (Theorem 2.2). Furthermore, the paper presents the discretization scheme of the continuous-time dynamics, generating convergent optimization algorithms.

优点

This paper presents a unified framework of implementing various optimization problems into an electric circuit with static and dynamic interconnections. The paper also presents the scheme of generating new optimization algorithms by discretizing the analog circuit equation.

缺点

  1. The paper claims two novelties, one of which is interpreting the optimization problem as an RLC circuit. However, the reviewer disagrees with this novelty. The idea of implementing an optimization problem in an RLC analog circuit is classical and has been studied for a long time. More surveys and comparisons with other works are necessary.

Chua, Leon, and Gui-Nian Lin. "Nonlinear programming without computation." IEEE Transactions on Circuits and Systems 31.2 (1984): 182-188. Wilson, G. "Quadratic programming analogs." IEEE transactions on circuits and systems 33.9 (1986): 907-911. Vichik, Sergey, and Francesco Borrelli. "Solving linear and quadratic programs with an analog circuit." Computers & Chemical Engineering 70 (2014): 160-171.

  1. Theorem 2.2, the main theorem of this paper, seems to be a special version of the Willems stability condition. Although the authors refer to related works like [75], they should make more surveys and clarify the connection between the theorem presented in this paper and the original works on dissipativity theory by J. C. Willems.

Willems, Jan C. "Dissipative dynamical systems part I: General theory." Archive for rational mechanics and analysis 45.5 (1972): 321-351.

问题

  1. The authors use the RLC circuit as an intermediate product to derive the optimization solver. As an alternative use, it might be possible to use the real-world RLC circuit directly as an optimization solver. Do the authors have any comments or perspectives on such a development?

  2. As stated in Lemma 4.1, the convergence of the optimization algorithm is shown by performing a proper discretization that ensures dissipativity. Would it be possible to provide specific examples of such "proper" discretization in the main body of the paper?

局限性

This paper focuses on the theoretical analysis and interpretation of optimization problems, and it does not present its negative social impacts.

作者回复

Thank you for the positive and constructive feedback. We are glad that the reviewer found our "unified framework of implementing various optimization problems into an electric circuit" as a strength of the paper.

W1.

Thank you for bringing these references to our attention. We will include them and other related work we find from reading these references in the revised paper. Indeed, the reviewer is correct in that the relation between the optimization problem and V-I relations has been studied for many decades. We discuss this in more detail in section A of the appendix, lines 670–679.

However, our intention was to claim novelty in using this correspondence to design optimization algorithms (as opposed to physical circuits), and, in our view, our approach is made complete with the PEP-based automated discretization. However, we now see that this distinction was not made sufficiently clear, so in the revised manuscript, we will rewrite the claims of novelty to make clear that the connection between optimization algorithms and electric circuits has been studied previously.

(Also, we doubly thank the reviewer for sharing the Vichik–Borrelli 2014 reference. We just found that this paper also uses the notion of negative resistors. Since we also use negative resistors, we are happy to find a prior reference to this notion. We will include this discussion in the revised paper as well.)

W2.

Thank you for directing us to this reference. In the revised paper, we will adequately reference prior work on dissipative dynamical systems.

From our reading of the reference you provide and the work [1], it seems like our analysis is indeed quite related to the Willems stability condition, but there were some minor differences. While the setups are related to ours, the fact that vC,iLv_C, i_L and iC,vLi_C, v_L may not converge in our setup makes the prior results by Willems not immediately applicable. In our setup, we allow cases where vC,iLv_C, i_L oscillate (for example, a circuit with a disconnected LCL-C loop).

Nevertheless, it seems that the energy functions and the analyses presented by Willems are closely related to our result, so we will adequately discuss this connection in the revised paper.

[1] Willems, J. C. The generation of Lyapunov functions for input-output stable systems. SIAM Journal on Control, 9(1):105–134, 1971.

Q1.

We thank the reviewer for this suggestion. This is indeed an interesting direction with some challenges one should overcome. One crucial challenge would be implementing the nonlinear resistor f\partial f (for any convex f ⁣:RmRf\colon \mathbf{R}^m \to \mathbf{R} \cup \\{ \infty \\}) in a real-world circuit. In [1], the authors explicitly model the constraints and objective functions of an LP and QP using analog circuits, but doing so for general convex problems will likely be more challenging. Another issue is that given the extreme efficiency of modern digital circuits (CPUs and GPUs), which implement standard optimization algorithms, making (analog) RLC circuits have some sort of advantage would likely be a significant challenge. Nevertheless, we do think this is an interesting direction worth pursuing.

[1] Vichik, S., and Borrelli, F. Solving linear and quadratic programs with an analog circuit. Computers & Chemical Engineering, 70:160–171, 2014.

Q2.

Yes, we can. The Example on line 232 is an example of a "proper" discretization. Also, the algorithm DADMM+C (fully presented in line 1404 of the appendix) in the Experiments section is another example. We also provide a proof that this discretization is indeed proper, as Lemma I.1 in line 1423. We will include this discussion in the main body of the revised paper.

However, we point out that our framework provides an automatic tool to find proper discretizations. Specifically, our PEP-based framework circuit finds an appropriate stepsize that makes the discretization proper.

评论

Thank you for your response. The reviewer's concerns have been addressed and appropriately reflected in the revised manuscript.

审稿意见
6

This paper proposes the use of RLC circuits to design optimization algorithms. The authors prove that circuit dynamics in continuous time converge to the solution of the optimization problem. By specifically designing the RLC components, this approach recovers many existing algorithms. The authors also introduce a PEP-based method to discretize continuous-time circuit dynamics and prove its convergence. Experiments demonstrate the effectiveness of their methods.

优点

  1. The authors use RLC circuits to design new optimization algorithms, offering a novel and interesting perspective on algorithm design.
  2. The authors provide the convergence of circuit dynamics in continuous time and the convergence of the algorithm in discrete time.
  3. The authors provide an automatic discretization package, and experimental results show that the proposed method achieves fast convergence.

缺点

  1. Compared to classical optimization algorithms, discretization requires solving the performance estimation problem, which needs extra computation.
  2. The convergence rate of the RLC-based algorithm is not discussed in the paper.

问题

Significant issues

  1. Could you please provide more applications and examples for Problem (1) in Line 25 ?
  2. In Line 104, could you please explain in detail about f\partial f electric device and why f\partial f enforces yf(x)y\in\partial f(x)?
  3. In Line 139, the authors use the energy definition in (6) instead of the total energy of the circuit, i.e., the sum of the energy of all components in the circuit. Could you please futher explain the reason for this choice?
  4. Can the convergence rate of the proposed algorithm be analyzed?
  5. When solving Problem (9) in Line 217, is it necessary to know vv^\star and ii^\star in advance?
  6. When designing an algorithm, how should the parameters of the RLC components be chosen? For example, in the gradient descent method, how should DCD_{\mathcal{C}} be selected?

Minor issues

  1. In Line 79, should nodes 1,...,m1,...,m be 1,...,τ11,...,\tau-1?
  2. In Line 1222, should it be ykf(xk)y^k \in \partial f(x^k)?

局限性

N.A.

评论

Q4.

Yes, we can extract an O(1/k)O(1/k) convergence rate from our proven inequalities. We state and prove a more generalized statement here. The assumption is the same as Lemma G.1 (which covers the assumption of Lemma 4.1), and we restate it here for convenience.

Theorem. Assume f ⁣:RmRf\colon \mathbf{R}^m \to \mathbf{R} \cup \\{ \infty \\} is a strictly convex function and the dynamic interconnect is admissible. Let a discrete-time optimization algorithm generate a sequence (vk,ik,xk,yk)_k=0\\{(v^k, i^k, x^k, y^k)\\}^{\infty}\_{k=0}. Suppose there exists η>0\eta > 0 such that for all k=0,1,k=0, 1, \ldots the energy descent

E_k+1+ηxkx,ykyE_k0\mathcal{E}\_{k+1} + \eta \langle x^k - x^\star, y^k - y^\star \rangle - \mathcal{E}\_k \leq 0

holds. Then, for the Lagrangian defined as L(x,z,y)=f(x)yT(xEz)L(x, z, y) = f(x) - y^T(x - E^\intercal z), we have

min_k0,1,,K(L(xk,z,y)f(x))1(K+1)ηE_0=O(1K)\min\_{k \in \\{0, 1, \dots, K\\}} \left( L(x^k, z^\star, y^\star) - f(x^\star) \right) \leq \frac{1}{(K+1)\eta} \mathcal{E}\_0 = O\left(\frac{1}{K}\right)

for K=0,1,K=0,1,\dots.

Proof outline. Since we are considering the same assumption as in Lemma G.1, all arguments used in its proof are applicable. From line 1223 we have

0_k=0Kηxkx,ykyE_0,0 \leq \sum\_{k=0}^K \eta \langle x^k - x^\star, y^k - y^\star \rangle \leq \mathcal{E}\_0,

and from line 1229 we have

xkx,ykyL(xk,z,y)f(x)0. \langle x^k-x^\star, y^k-y^\star\rangle \geq L(x^k, z^\star, y^\star) - f(x^\star) \geq 0.

Combining two inequalities and dividing both sides by K+1K+1, we get the desired conclusion. \blacksquare

All algorithms obtained by our framework, including DADMM+C, achieve this convergence rate. As mentioned in an earlier part of this response, doing a more refined analysis of convergence rates (establishing, say, linear rates of convergence in an automated fashion) is a follow-up direction of work that we are pursuing.

Q5.

Explicit knowledge of vv^\star and ii^\star is not needed. We can find the numerical values of (α,β,h)(\alpha,\beta,h) without explicit knowledge of the optimal values.

For example, consider finding dissipative discretization for gradient descent. Let ff be LL-smooth. The iterates are given by xk+1=xkhf(xk)x^{k+1} = x^k- h\nabla f(x^k). The energy is E_k=12xkx2_2\mathcal{E}\_k = \frac{1}{2}\\|x^k-x^\star\\|^2\_2, where xx^\star is such that y=f(x)=0y^\star = \nabla f(x^\star) = 0. Then we want to find step size h>0h>0 and η>0\eta>0 such that E_k+1+ηxkx,ykyE_k0.\mathcal{E}\_{k+1}+\eta\langle x^k - x^\star, y^k - y^\star\rangle-\mathcal{E}\_k \leq 0. Using the inequality arises from LL-smoothness of ff,

xkx,yky1Lyk2_2,\langle x^k - x^\star, y^k - y^\star\rangle \geq \frac{1}{L} \\|y^k \\|^2\_2,

we can check that the discretization is dissipative for all η<12L\eta < \frac{1}{2L} and h[1L±1L12ηL]h \in [\frac{1}{L} \pm \frac{1}{L}\sqrt{1-2\eta L}]. Thus, we can find proper η\eta and hh without explicitly specifying xx^\star, by just representing it as a point satisfying the optimality conditions. Further details on this line of reasoning are available in the PEP papers [1, 2].

[1] Y. Drori and M. Teboulle. Performance of first-order methods for smooth convex minimization: A novel approach. Mathematical Programming, 145(1):451–482, 2014.

[2] A. B. Taylor, J. M. Hendrickx, and F. Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1):307–345, 2017.

Q6.

One approach is to let the solver find the RLC values along with the discretization. (We recently implemented this functionality in ciropt.)

Another approach is to try a few variations for the values for resistance, inductance and capacitance, and see for which of them the solver finds a discretization.

Questions: Minor issues

Q1.

We clarify that this is not a mistake. For the circuit in Figure 2, for example, there are τ=8\tau=8 nodes, but only m=5m=5 of the nodes (node 1,2,51, 2, \dots 5) are connected to the terminals. Likewise, there are nodes that are connected to neither the terminal nor the ground. The potential of those nodes is denoted by ee.

Q2.

Thank you for pointing out to us this typo; we will correct it.

We appreciate the reviewer’s thoughtful questions. We hope we have adequately addressed the reviewer's concern regarding the extra computation related to the performance estimation problem and the possibility of obtaining convergence rates from our approach. If so, we kindly ask the reviewer to consider raising the score.

作者回复

Thank you for the constructive comments. We are pleased that you found our framework to be "a novel and interesting perspective on algorithm design." Additionally, we're glad that you recognized our automatic discretization package as a strength of our work.

W1.

To clarify, the role of the Performance Estimation Problem (PEP) in our framework automates the calculations and analysis for obtaining discretizations. So, it does not produce extra computation (it does not make the algorithm more expensive), but rather, it eliminates the need for humans to carry out the convergence proof of the discretized algorithm.

As mentioned in line 628 of the prior works section, continuous-time-based optimization design is a powerful approach for designing new algorithms, but a shortcoming of this prior work has been that there was no principled way of performing discretization. Our PEP-based automated discretization overcomes this shortcoming. We believe that our PEP-based automated discretization is not a weakness but rather a substantive contribution that completes the methodology of continuous-time optimization algorithm design.

W2.

We agree with the reviewer that obtaining fast/efficient algorithms (rather than merely convergent algorithms) is the ultimate goal of optimization algorithm design. Since this present work represents the very first steps in this framework, we focus on laying the basic foundations and introducing the computer-assisted methodology. Utilizing and extending this machinery to design fast/efficient algorithms is the very next step of our planned agenda.

However, we would like to point out that it is possible to extract an O(1/k)O(1/k) convergence rate from the analytic convergence proof Lemma I.1 in line 1423 of the appendix, and a similar argument can be made more generally. We further elaborate on this point in our response to the reviewer's Question 4.

Questions: Significant issues

Q1.

This form includes many problems that arise in distributed and decentralized optimization. For example, Sections E and F of the appendix discuss how we recover many classical centralized and decentralized optimization methods for their respective problems. For more details on the setup for each of these classical methods, see, e.g., Chapters 2, 3, and 11 of [1], and paper [2]. These optimization problems and their solution methods have a wide range of applications, for example, wireless sensory networks [3], federated learning [4], power grids, and many more [5].

We thank the reviewer for raising this point. We agree that it is important to at least briefly discuss the motivating applications of Problem (1), so we will include this discussion in the updated version of the paper.

[1] Ryu, E. K., and Yin, W. Large-scale Convex Optimization: Algorithms & Analyses via Monotone Operators. Cambridge University Press, 2022.

[2] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.

[3] Mota, J. F., Xavier, J. M., Aguiar, P. M., and Püschel, M. D-ADMM: A communication-efficient distributed algorithm for separable optimization. IEEE Transactions on Signal Processing, 61(10):2718–2723, 2013.

[4] Zhou, S., and Li, G. Y. Federated learning via inexact ADMM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(8):9699–9708, 2023.

[5] Yang, T., Yi, X., Wu, J., Yuan, Y., Wu, D., Meng, Z., ..., and Johansson, K. H. A survey of distributed optimization. Annual Reviews in Control, 47:278–305, 2019.

Q2.

To start with the simplest example, consider the case m=1m=1 and f(x)=12Rx2f(x) = \frac{1}{2R} x^2 with some R>0R>0. Since ff is differentiable f=f\partial f=\nabla f, and the V-I relation becomes y=f(x)=1Rxy=\nabla f(x)=\frac{1}{R}x, i.e., x=Ryx=Ry. Recalling xx is potential, and yy is current, the electric device f\partial f becomes the usual linear resistor with resistance RR.

As another example, an ideal diode is a device that blocks current flow if the voltage is less than 00 (reverse biased) and allows current flow with no voltage drop otherwise (forward biased). Such an ideal diode can be modeled with a convex function

f(x)={0if x0otherwise.f(x) = \begin{cases} 0 & \text{if } x \leq 0 \\\\ \infty & \text{otherwise}. \end{cases}

So the V-I relation given by f\partial f has a vertical line at x=0x=0, i.e.,

f(x)={0if x<0[0,]if x=0if x>0.\partial f(x) = \begin{cases} 0 & \text{if } x < 0 \\\\ [0, \infty] & \text{if } x = 0 \\\\ \emptyset & \text{if } x > 0. \end{cases}

Thus the electric device f\partial f becomes the ideal diode, as no current flows through the device when x<0x<0, and current flows with zero resistance (infinite slope) otherwise. It does not result in x>0x>0 because there is no voltage drop in forward bias.

Using set-valued functions to describe V-I curves of circuit elements is a standard approach in circuit theory. So please think of f\partial f as something like a non-linear resistor generalizing resistors and diodes.

Q3.

We clarify that the energy in (6) is indeed the total energy in the circuit since resistors and f\partial f do not store energy.

One slightly non-standard aspect is that the energy in the capacitors and inductors is computed after translating by v_Cv\_C^\star and i_Li\_L^\star. This is necessary because f\partial f is a non-linear resistor that is incrementally passive (instead of being simply passive). Therefore, shifting by the equilibrium values v_Cv\_C^\star and i_Li\_L^\star leads to a dissipative energy. (If f\partial f is simply passive, then the circuit's equilibrium is at x=y=0x=y=0, but the incrementally passive f\partial f allows the circuit to relax to the (non-zero) solution to the optimization problem.)

评论

I thank the authors for addressing all my concerns in detail. The additional theorem on the convergence rate of the discrete algorithm looks solid to me. The theoretical proof for the faster convergence rate of the discrete algorithm is an interesting direction. I am willing to raise the rating to 6.

审稿意见
7

This paper presents a novel framework for designing optimization algorithms with electric RLC circuits. It contains two stages:

  1. design an appropriate circuit whose equilibrium is the solution to the optimization problem.
  2. discretize the continuous-time dynamics of the circuit to form a discrete-time algorithm.

Theoretical guarantees are given on the convergence of the continuous-time dynamics.

优点

  1. The idea of the paper is novel. Though prior works have explored the relationship between ODE and optimization algorithms, using RLC circuits to explain existing algorithms and design new ones seems a novel framework.

  2. The authors demonstrate the proposed framework covers a lot of existing algorithms and can be extended to new ones with convergence guarantees, which is practically useful.

  3. The paper is comprehensive and overall well-written.

缺点

  1. Some parts lack clarity. See Questions.

  2. The theoretical results in this paper focus on the strongly-convex settings. It is unclear how can this be extended to convex or nonconvex settings which are more common in real-world problems.

  3. It would be better if there was more discussion on the benefit of using this approach to design algorithms compared to the standard approaches.

问题

  1. How to determine whether a circuit is admissible from construction instead of checking the equation on lines 98-99?

  2. In line 125, under what "appropriate" conditions?

  3. Although modifications on the RLC circuits of existing algorithms can lead to new algorithms, how to ensure these modifications of the RLC circuits still can reach equilibrium? Is there a general recipe? I would like to see more discussion on this.

局限性

There are no separate sections in the main paper. However, the authors state the assumptions for their results.

作者回复

We are happy to hear that the reviewer found our proposed framework "novel" and "practically useful".

W1.

This is further addressed below.

W2.

The assumption of strong convexity is made to prove the well-posedness of the circuit ODE. We clarify that this assumption is not necessary for showing convergence of the discretized methods (Lemma 4.1). In an extension to non-convex setups, we expect the technical well-posedness of the circuit ODE to be somewhat challenging (especially for non-differentiable non-convex functions), but the convergence analysis should not be difficult, at least not more so than how non-convex optimization is usually difficult.

W3.

Existing optimization methods are designed either with the worst-case convergence analysis or to have fast empirical performance on some problems. This means that either the methods are too pessimistic and slow in practice or they do not have convergence guarantees. We present a new framework that allows to quickly design and explore provably convergent algorithms that are well suited to a problem at hand.

Q1.

As line 97 states, a dynamic interconnect is admissible if, at equilibrium, it reduces to the static interconnect. In the language of circuit theory, this can be stated as follows. First, at equilibrium, the dynamic components relax, i.e., capacitors become disconnects (iC=0i_C=0), and inductors become wires (vL=0v_L=0). Second, the circuit relations hold, which is expressed with KCL, KVL, and resistor relations. Third, at terminals 1,,m1, \ldots, m the V-I relation of the relaxed circuit is the same as that of the static interconnect, i.e., xR(ET)x \in \mathcal{R}(E^T) and yN(E)y \in \mathcal{N}(E). The admissibility condition (lines 98-99) is a mathematical formalization of this.

Q2.

The conditions are concretely stated in the statement of Theorem 2.1 on line 128.

Q3.

In continuous-time, the intuition comes from the observation that the convergence of electric circuits to equilibrium hinges on incremental passivity, i.e., no component can generate energy, and the fact that resistors dissipate energy. Since convexity of ff leads to incremental passivity of f\nabla f (so ff also does not generate energy), the circuit will relax to equilibrium.

When we discretize the dynamics, however, there is a possibility that the discretized process is no longer dissipative. Therefore, we obtain a formal convergence guarantee of the discretized dynamics through the use of the computer-assisted proof framework called the performance estimation problem (PEP). Using the PEP, we certify that the discretized process is also dissipative and, therefore, will reach equilibrium. This is a very general recipe.

评论

Thanks for the response. It addressed all my concerns. I would like to keep my current rating.

作者回复

Common Response

We thank the reviewers for their thoughtful comments and suggestions. We are pleased that the reviewers generally find our framework novel and valuable. We address the reviewers' specific questions in the individual responses.

最终决定

This paper presents a novel idea of designing optimization algorithms with electric RLC circuits. The first step is to design an appropriate circuit whose equilibrium is the solution to the optimization problem, and the second step is to discretize the continuous-time dynamics of the circuit to form a discrete-time algorithm. Overall, this is an interesting paper and has publication merits. The strengths outweigh the weaknesses.