PaperHub
Rating: 4.9/10
Poster · 4 reviewers (min 2, max 3, std. dev. 0.4)
Individual scores: 2, 3, 3, 3
ICML 2025

AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation

Submitted: 2025-01-24 · Updated: 2025-07-24
TL;DR

We introduce AUTOCIRCUIT-RL, an AI system that uses reinforcement learning and large language models to automatically generate efficient and valid analog circuit designs from constraints.


Keywords
Analog circuit, circuit topology, reinforcement learning, instruction tuning, RL refinement

Reviews and Discussion

Review 1 (Rating: 2)

This paper proposes a reinforcement learning (RL)-based framework, AutoCircuit-RL, for automated analog circuit topology generation. It consists of two main stages: instruction fine-tuning and reinforcement learning optimization. During the instruction fine-tuning stage, supervised learning techniques are used to fine-tune a large language model (LLM). The RL optimization stage further refines the topology generation process using RL with AI feedback (RLAIF).

Questions for Authors

N/A

Claims and Evidence

  1. The topology search is restricted to ~10 components, which is impractical for real-world designs.
  2. Supported components lack diversity (only NMOS, PMOS, inductors, capacitors; missing resistors, diodes, NPN/PNP transistors).
  3. Fixed device parameters oversimplify circuit generation.

Methods and Evaluation Criteria

  1. Unclear necessity of reward models:
    • No comparison with Ngspice's runtime efficiency (fast for small circuits), raising doubts about the need for three separate reward models.
  2. Handcrafted reward parameters:
    • Reward design relies on manual parameter tuning, limiting scalability.

Theoretical Claims

N/A

Experimental Design and Analysis

  1. Scalability issues:
    • Trained on 100,000 samples for 4–5 component circuits, but success rates drop significantly for 6–10 component designs.
    • Massive training data likely required for larger-scale circuits.
  2. Lack of modern baselines:
    • No comparison with recent methods, such as [1] and [2].

References:

[1] Lai, Y., Lee, S., Chen, G., Poddar, S., Hu, M., Pan, D. Z., & Luo, P. (2024). AnalogCoder: Analog circuit design via training-free code generation. AAAI 2025.

[2] Chen, Z., Huang, J., Liu, Y., Yang, F., Shang, L., Zhou, D., & Zeng, X. (2024, June). Artisan: Automated operational amplifier design via domain-specific large language model. In Proceedings of the 61st ACM/IEEE Design Automation Conference (pp. 1–6).

[3] Zhang et al. (2024). AnalogXpert: Automating analog topology synthesis by incorporating circuit design expertise into large language models. arXiv preprint.

Supplementary Material

Yes, Parts A and B.

Relation to Prior Work

  1. LLM-based analog topology generation: The framework generates analog circuit topologies containing up to 10 components and introduces a customized RL fine-tuning method to optimize circuit quality and performance.
  2. Granular evaluation during RL feedback: The analog topology evaluation process is decomposed into detailed steps, enabling generated circuits to simultaneously satisfy design requirements and incorporate performance metrics.

Essential References Not Discussed

[1] Shen et al., “Atelier: An Automated Analog Circuit Design Framework via Multiple Large Language Model-based Agents,” TechRxiv, 2024.

[2] Zhang et al., “AnalogXpert: Automating Analog Topology Synthesis by Incorporating Circuit Design Expertise into Large Language Models,” arXiv, 2024.

[3] Chen, Z., Huang, J., Liu, Y., Yang, F., Shang, L., Zhou, D., & Zeng, X. (2024, June). Artisan: Automated operational amplifier design via domain-specific large language model. In Proceedings of the 61st ACM/IEEE Design Automation Conference (pp. 1-6).

Other Strengths and Weaknesses

There are two identical entries in the reference list:

Fan, S., Cao, N., Zhang, S., Li, J., Guo, X., and Zhang, X. From specification to topology: Automatic power converter design via reinforcement learning. In IEEE/ACM International Conference On Computer Aided Design (ICCAD), pp. 1–9, 2021a.

Fan, S., Cao, N., Zhang, S., Li, J., Guo, X., and Zhang, X. From specification to topology: Automatic power converter design via reinforcement learning. In IEEE/ACM International Conference On Computer Aided Design (ICCAD), pp. 1–9, 2021a.

Other Comments or Suggestions

It would be better to show several generated circuits.

Author Response

We appreciate the reviewer’s detailed feedback and address the concerns as follows:

Scalability Concerns:

Our experiments primarily focus on 4–5 component circuits, with a few-shot evaluation for 6–10 component designs using only 1,000 training samples. Although success rates drop for larger circuits due to increased complexity, this decline is an inherent challenge of scaling circuit design—not a fundamental limitation of our method. Notably, even for larger circuits, our model achieves over 60% success in generating valid designs. Moreover, our experiments show that employing a multi-sample generation strategy (quantified by our SuccessRate@m metric) substantially improves performance. For instance, using m=3 or m=5 yields average generation times of 5.1 and 8.5 seconds, respectively, while markedly increasing the probability of finding a valid design. This approach effectively mitigates performance drops without necessitating massive additional training data. We are actively exploring enhanced RL sampling techniques and efficient data augmentation strategies to further improve scalability.
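For intuition, here is a minimal sketch of how multi-sample generation lifts the chance of finding a valid design, assuming each sample is valid independently with per-sample probability p (an idealization we introduce for illustration; the paper's SuccessRate@m is measured empirically, not computed this way):

```python
# Back-of-the-envelope: probability that at least one of m generations is valid,
# under an (assumed) independence model with per-sample validity probability p.
def success_rate_at_m(p: float, m: int) -> float:
    return 1.0 - (1.0 - p) ** m

# e.g., with the ~60% per-sample validity rate cited above:
for m in (1, 3, 5):
    print(m, round(success_rate_at_m(0.60, m), 3))  # 1 -> 0.6, 3 -> 0.936, 5 -> 0.99
```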

Comparison with Other Baselines:

We appreciate the reviewer’s suggestion to compare our approach with recent methods such as AnalogCoder and Artisan. While both methods provide valuable contributions to LLM-based circuit synthesis, our work differs fundamentally in methodology and scope:

AnalogCoder relies on training-free prompt engineering to generate SPICE netlists, but it lacks an adaptive optimization mechanism, making it less suitable for synthesizing novel or complex topologies. Moreover, it does not cover power converters, which require specialized design constraints such as efficiency and output voltage.

Artisan is specialized for operational amplifier design using domain-specific LLMs, whereas our approach focuses on power converters, though our techniques could generalize across different analog topologies.

Our method (AutoCircuit-RL) introduces a two-phase process:

a. An instruction-tuned LLM generates diverse circuit topologies.

b. A reinforcement learning (RL) refinement phase optimizes the topologies for multiple objectives (validity, efficiency, and output voltage) through reward-driven feedback.

This novel dual-phase strategy allows us to tackle multi-objective design challenges in power converters more effectively. Additionally, we plan to add quantitative comparisons with more baselines to demonstrate that our approach achieves superior design quality and efficiency. We will include these updated results in the final version if the paper is accepted.
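To make the reward-driven feedback above concrete, here is one way such a multi-objective signal could be scalarized (the weights, function signature, and penalty values below are our illustrative assumptions, not the paper's actual reward models):

```python
# Hypothetical composite reward for RL refinement: validity gates the reward,
# while efficiency and output-voltage error trade off via illustrative weights.
def composite_reward(is_valid: bool, efficiency: float,
                     v_out: float, v_target: float,
                     w_eff: float = 0.5, w_volt: float = 0.5) -> float:
    if not is_valid:  # invalid topologies receive a flat penalty
        return -1.0
    voltage_error = abs(v_out - v_target) / max(abs(v_target), 1e-9)
    return w_eff * efficiency - w_volt * voltage_error

# e.g., a valid design at 85% efficiency, 4.8 V out vs. a 5.0 V target:
print(composite_reward(True, 0.85, 4.8, 5.0))  # ~= 0.5*0.85 - 0.5*0.04 = 0.405
```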

Reference Clarifications:

Thank you for pointing this out. We acknowledge the issue of duplicate references and will correct them in the final manuscript. We will also include additional relevant references (e.g., Atelier, AnalogXpert) to better situate our work within the broader literature.

Reviewer Comment

Thank you for the authors' response.

  1. However, I believe that topologies with 4–10 components do not pose a practical challenge in analog design. Given the limited number of components, designers can consult textbooks to arrive at a design faster. Even accounting for different sizing parameter selections, optimization methods based on Bayesian Optimization (BO) can already meet the requirements; large language models (LLMs) do not offer a special advantage in this regard.
  2. I maintain that the analog circuit design problem in this work has been overly simplified. As the authors acknowledged, performance drops significantly for circuits with 6–10 components, which directly indicates that the model's scalability is difficult to improve.
  3. In this version, comparisons with newer methods are still missing.
Review 2 (Rating: 3)

This work proposes a framework for automating analog circuit topology synthesis using RL. It has two phases. In the instruction tuning phase, an LLM learns to generate initial circuit topologies, ensuring feasibility under basic constraints. In the RL phase, the LLM iteratively optimizes topologies using reward models, enhancing validity, efficiency, and output voltage to meet the requirements.

Questions for Authors

Is it possible to compare the proposed approach with AnalogCoder [1] and LaMAGIC [2]?

[1] https://arxiv.org/abs/2405.14918
[2] https://arxiv.org/abs/2407.18269

Claims and Evidence

The experiments are solid and comprehensive, and they support the claims very well.

Methods and Evaluation Criteria

Only a synthesized dataset is considered for evaluation. While I acknowledge that it is difficult to collect a large-scale analog topology dataset for training, it is feasible to evaluate the proposed method on a real analog topology dataset. That would demonstrate that the data synthesis pipeline is a reasonable one and that the model trained on it generalizes well.

Theoretical Claims

There is no theoretical claim in this paper.

Experimental Design and Analysis

It seems that the proposed approach only generates the devices and the connections between them. How are the device sizes set? If the device sizes are not set, how could we obtain the metrics mentioned in Section 4.2?

Supplementary Material

I checked the implementation details.

Relation to Prior Work

The definition of the reward function in this work may have broader impact.

Essential References Not Discussed

No

Other Strengths and Weaknesses

No additional comments.

Other Comments or Suggestions

No additional comments.

Author Response

Thank you for your thoughtful review.

Device Sizing and Evaluation Metrics:

Our approach primarily focuses on generating circuit topologies, which includes selecting the devices and their interconnections. The device parameters (e.g., capacitor, inductor, and MOSFET sizes) are set based on standard design guidelines commonly used in power electronics (e.g., 10μF for capacitors, 10μH for inductors, and fixed MOSFET parameters). This decision allows us to isolate and optimize the topology generation task while maintaining consistency in evaluations. We plan to focus on optimizing device parameters in the future. The metrics reported in Section 4.2—including circuit validity, efficiency, and output voltage—are computed using these standardized component values. While our method does not dynamically generate device sizes, the reported metrics remain valid and meaningful as they reflect the performance of different topologies under fixed design constraints.
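For concreteness, the fixed parameter choices described above could be captured in a small evaluation-side configuration like the following (a sketch with hypothetical field names; only the 10 μF and 10 μH values come from the response):

```python
# Illustrative fixed device parameters applied uniformly during evaluation,
# so that metric differences are attributable to topology alone.
FIXED_DEVICE_PARAMS = {
    "capacitor_F": 10e-6,  # 10 uF, per the standard guideline cited above
    "inductor_H": 10e-6,   # 10 uH
    "mosfet": "fixed",     # MOSFET sizing held constant (details not given here)
}
```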

Comparison with AnalogCoder and LaMAGIC:

While both AnalogCoder and LaMAGIC use LLMs for circuit synthesis, our approach differs in its methodology and optimization strategy:

AutoCircuit-RL (Ours):

We employ a two-phase process: (i) an LLM first generates initial circuit topologies through instruction tuning, and (ii) a reinforcement learning (RL) refinement phase iteratively optimizes these topologies using reward models that account for circuit validity, efficiency, and output voltage. This allows us to explore a much wider design space and generate more robust and scalable topologies, even with limited training data. Our empirical results show a 12–14% increase in circuit validity and a 14–16% improvement in efficiency, demonstrating the effectiveness of our RL-based refinement.

AnalogCoder:

This approach uses domain-specific prompt engineering to generate PySpice code for standard analog circuits. However, it relies on a predefined synthesis library, which limits its ability to explore novel or complex topologies. Moreover, it does not cover power converters, which require specialized design constraints such as efficiency and output voltage.

LaMAGIC:

Similar to other LLM-based methods, LaMAGIC generates circuits using supervised fine-tuning with pre-trained models but does not incorporate reinforcement learning. As a result, it lacks an iterative refinement process that can optimize designs beyond their initial generation. Our reward-driven optimization provides a key advantage in enhancing both circuit validity and efficiency.

Review 3 (Rating: 3)

The authors proposed a novel framework, AUTOCIRCUIT-RL (AC-RL), for automating analog circuit topology synthesis using reinforcement learning (RL). The architecture operates in two main phases: instruction fine-tuning and RL refinement. Through extensive experimental validation, the authors demonstrated that the proposed framework outperforms previous research efforts, achieving significant improvements in (a) valid circuit generation, (b) efficiency in designing complex circuits, and (c) strong generalization capability with limited data.

Questions for Authors

  1. The primary concern is that if standard benchmarking datasets (e.g., modern IC netlists) are not used, the generalizability of the proposed approach may be limited. Further investigation is required to quantify key metrics such as correctness, efficiency, and scalability for complex circuit topology generation.
  2. Additionally, a runtime complexity analysis is necessary to evaluate trade-offs between the framework's computational efficiency and the quality of the generated circuits, and to benchmark against previous research works.

Claims and Evidence

Yes, the authors have made a significant effort to validate their claims through an in-depth experimental strategy, supported by satisfactory results and analysis demonstrated in this manuscript. They have also provided a thorough explanation for each metric and strategy adopted to substantiate the effectiveness of the proposed framework compared to previous state-of-the-art (SOTA) research works/models/architectures.

Methods and Evaluation Criteria

Yes, the proposed method, benchmark dataset, and evaluation criteria/metrics are explicitly defined and are satisfactory for demonstrating the application. The experimental design is well-structured and methodologically sound, covering a broad spectrum of LLM-based generative methods for circuit synthesis. The comparative analysis with GraphVAE strengthens its validity.

Theoretical Claims

Yes, the theoretical claims regarding the framework's design, implementation, performance evaluation, and overall generalizability and scalability have been substantiated through experimental validation and proof of correctness.

Experimental Design and Analysis

Yes. The soundness/validity of the experimental design has been validated against: (a) comparative benchmarking: the study compares multiple LLM-based approaches against a non-LLM baseline (GraphVAE), ensuring a comprehensive evaluation of generative performance; (b) evaluation of zero-shot vs. fine-tuned models: the research systematically investigates how LLMs perform without fine-tuning (zero-shot and ICL) versus with fine-tuning (prompt tuning and vanilla fine-tuning), providing a layered understanding of model capabilities; (c) use of soft prompt tuning: by integrating soft prompt tuning, the study tests an efficient fine-tuning strategy that preserves the base model while learning task-specific adaptations.

The primary concern is that if standard benchmarking datasets (e.g., modern IC netlists) are not used, the generalizability of the proposed approach may be limited. Further investigation is required to quantify key metrics such as correctness, efficiency, and scalability for complex circuit topology generation. Additionally, a runtime complexity analysis is necessary to evaluate trade-offs between the framework's computational efficiency and the quality of the generated circuits, and to benchmark against previous research works.

Supplementary Material

N/A

Relation to Prior Work

The key contributions of this research manuscript are potentially significant to the broader research community. The proposed framework not only serves the purpose of analog circuit design using LLM+RL but could also be pivotal in any domain involving constrained topology, such as agile software development, digital circuit synthesis, 3.5D/chiplet design, and more. A more in-depth analysis is presented in the "Impact Statement," which has been validated to demonstrate its potential significance.

Essential References Not Discussed

N/A

Other Strengths and Weaknesses

Mentioned and discussed in the "Methods and Evaluation Criteria" and "Experimental Design and Analysis" sections above.

Other Comments or Suggestions

N/A

Ethics Review Concerns

No ethics review concerns flagged.

Author Response

Thank you for your valuable feedback.

Benchmarking Datasets:

To the best of our knowledge, there is currently no standard benchmarking dataset in the domain of analog circuit topology generation; recent publications in this area have each spent considerable effort curating their own initial datasets. Our focus is on power converter synthesis—a domain with unique design constraints such as efficiency, output voltage, and component pool. Domain experts in power converter design helped us examine the evaluation setup and the validity of the dataset. Our dataset is specifically curated to represent these constraints and to ensure that our evaluation reflects real-world challenges in analog circuit design. This specialized dataset is critical for assessing generative performance in our targeted application area.

Evaluation of Key Metrics:

Our study comprehensively evaluates correctness, efficiency, and scalability by comparing multiple LLM-based approaches with a non-LLM baseline (GraphVAE) and by contrasting zero-shot, in-context learning, and fine-tuning methods. These layered evaluations clearly demonstrate that our framework not only generates more valid circuits but also achieves significant improvements in efficiency and scalability.

Runtime Complexity Analysis:

Our method generates a new design in approximately 1.7 seconds using 2 NVIDIA V100 GPUs, a significant improvement over traditional search-based methods, which typically require hundreds of seconds due to the large number of simulation queries. This efficiency underscores the practical advantages of our RL-based refinement approach. We acknowledge that further runtime complexity analysis against standardized datasets could provide additional insights, and we plan to include such analyses in the final manuscript upon acceptance.

Review 4 (Rating: 3)

The paper introduces AUTOCIRCUIT-RL, an RL-based LLM framework for automating analog circuit synthesis. The framework operates in two phases: instruction tuning and RL refinement. The authors claim that AUTOCIRCUIT-RL outperforms existing baselines, generating ~12% more valid circuits, improving efficiency by ~14%, and reducing duplicate generation rates by ~38%.

Questions for Authors

  1. What is the computational cost of the proposed method?
  2. Can the authors provide a performance comparison against AnalogCoder?
  3. Can the authors provide some failure cases generated by the method?
  4. What are the detailed prompts used in the proposed method?
  5. Can the authors provide the learning curve of RL tuning?
  6. Can the authors provide their code implementation?

Claims and Evidence

The claims are supported by empirical evidence.

Methods and Evaluation Criteria

The evaluation is generally well-suited. However, the paper could benefit from more diverse benchmarks, or open-source benchmarks.

Theoretical Claims

There is no theoretical claim.

Experimental Design and Analysis

Experiments are generally sound. The learning curve of the RL method is missing.

Supplementary Material

Yes, the details of the baseline methods.

Relation to Prior Work

This paper is about LLM-based circuit synthesis. Some relevant literature includes AnalogCoder, CircuitSynth, LaMAGIC.

Essential References Not Discussed

Can the authors discuss the difference from paper [1], which leverages LLMs for generating SPICE netlists?

Bhandari, Jitendra, et al. "Auto-SPICE: Leveraging LLMs for Dataset Creation via Automated SPICE Netlist Extraction from Analog Circuit Diagrams." arXiv preprint arXiv:2411.14299 (2024).

Other Strengths and Weaknesses

Weakness:

  1. The dataset is not open-source, so it would be difficult to assess the performance fairly.
  2. More details regarding the dataset should be provided. For example, what does the dataset look like? What counts as unique designs versus unique netlists? What are some valid/invalid samples?

Other Comments or Suggestions

  1. I suggest the author provide more qualitative examples and qualitative evaluation to demonstrate their methods and insights.

Author Response

We appreciate the reviewer’s constructive feedback and address each concern below.

Comparison with Auto-SPICE [1]:

Our work fundamentally differs from Auto-SPICE, which focuses on generating SPICE netlists via domain-specific prompt engineering for dataset creation. It aims to generate large-scale SPICE netlist datasets from analog textbooks by annotating netlist schematics. In contrast, our approach—AutoCircuit-RL—goes beyond merely creating a dataset. It employs a two-phase process: first, an instruction-tuned LLM generates initial circuit topologies; then, a reinforcement learning (RL) refinement phase iteratively optimizes these topologies for multiple objectives (validity, efficiency, and expected output voltage). This reward-driven optimization enables our method to produce more robust designs and scale to more complex circuits.

Our work differs from Auto-SPICE in several key aspects. Auto-SPICE focuses on automating the creation of large-scale SPICE netlist datasets (e.g., Masala-CHAI) by extracting netlists from schematic images and their captions, and then uses these datasets to fine-tune LLMs for SPICE netlist generation. In contrast, our approach—AutoCircuit-RL—is centered on the methodology of generating circuit topologies that adhere to specific design constraints using a synthetic dataset. We first leverage instruction tuning to generate initial topologies based on various user prompts, and then apply reinforcement learning to iteratively refine these designs with respect to objectives such as circuit validity, efficiency, and output voltage.

While our methodology could potentially benefit from integrating datasets like those produced by Auto-SPICE, doing so addresses a different problem: dataset creation versus constrained topology synthesis. Our work tackles the challenge of generating circuit topologies under diverse, constraint-specific scenarios—a distinct and complementary problem to that addressed by Auto-SPICE.

Dataset Details:

We plan to release the code and dataset upon acceptance to support reproducibility. We have provided details of dataset collection in Section 2. Each data sample will include a circuit netlist representing the topology, along with its corresponding duty cycle, output voltage, and efficiency obtained from the simulator. In the paper, "unique netlists" refer to differences in connectivity and node indexing, while "unique designs" abstract away superficial ordering variations. We will provide a representative sample subset for illustration; a hypothetical record of this form is sketched below.
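Based on the fields enumerated above, a single training sample might look roughly like the following (field names and values are our illustration, not the released schema):

```python
# Hypothetical dataset record: a netlist string plus simulator-derived labels.
sample = {
    "netlist": "L1 IN SW 10u\nC1 OUT GND 10u\nM1 SW GATE GND GND NMOS\n...",
    "duty_cycle": 0.45,      # control input used in simulation
    "output_voltage": 5.02,  # volts, from the simulator
    "efficiency": 0.87,      # fraction, from the simulator
    "valid": True,           # whether the topology simulates correctly
}
```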

Suggestions:

Qualitative Examples and Evaluation:

Thank you for the suggestion. We have included qualitative examples in the Appendix that showcase generated circuit topologies and highlight successful cases. These examples illustrate how our model adapts to design constraints and generates diverse solutions. We plan to expand these qualitative evaluations further in the final version.

Questions:

Computation Cost:

Our method generates a circuit design in approximately 1.7 seconds using 2 NVIDIA V100 GPUs—substantially faster than traditional search-based methods, which often require hundreds of seconds due to extensive SPICE simulations. This efficiency is a direct result of our two-phase, reward-driven RL refinement process.

Comparison with Other Baselines:

We appreciate the suggestion to compare our approach with recent methods such as AnalogCoder and Artisan. While our primary comparisons have been with GraphVAE and other LLM-based methods, additional analyses show that our RL refinement significantly improves circuit validity and efficiency compared to these approaches. AnalogCoder relies on training-free code generation, which limits its ability to explore novel or complex topologies, whereas our method's iterative refinement better addresses the multi-objective challenges of power converter design. Moreover, AnalogCoder generates amplifier circuits while this work generates power converter topologies, so a direct comparison is not straightforward.

Failure Cases and Detailed Prompts:

We recognize that including failure cases and error analysis will provide a clearer understanding of our approach and highlight potential areas for future improvement. While Table 3 currently presents examples of successful generation scenarios, we will expand it to include failure cases as well. Additionally, Table 3 in the Appendix provides detailed prompts for different constraint-related scenarios, offering insights into the model’s behavior under various conditions. These analyses help diagnose existing limitations and inform future refinements. We will incorporate further discussion on these aspects in the final manuscript upon acceptance.

Learning Curve of RL Tuning:

The convergence curve of RL tuning will be added in the supplementary section upon acceptance.

Code Implementation:

We will open-source the code if this manuscript is accepted.

Final Decision

The paper presents AutoCircuit-RL, a reinforcement learning (RL)-based framework for automated analog circuit topology generation.

Most reviewers agree that the problem is well-motivated, the proposed approach is novel, and the empirical evaluations are both solid and comprehensive. The authors made great efforts and addressed most of the reviewers' concerns during the rebuttal phase.

Overall, the paper is valuable for this community. Therefore, I recommend accepting the paper.