PaperHub
Rating: 4.9/10 · Poster · 4 reviewers
Scores: 2, 3, 3, 3 (min 2, max 3, std 0.4)
ICML 2025

Chip Placement with Diffusion Models

Submitted: 2025-01-23 · Updated: 2025-07-24
TL;DR

We train diffusion models using synthetic data to perform macro placement, a key step in chip design.

Abstract

Keywords

macro placement, diffusion, machine learning, chip design, synthetic data

Reviews and Discussion

Review (Rating: 2)

This paper aims to address the challenges faced by RL-based placement methods, namely 1) poor scalability to larger circuits, and 2) the inability to revise earlier decisions, since the trajectory cannot be reversed in RL-based methods. A method to synthesize placement data and a diffusion-based method for the placement task are proposed.

update after rebuttal

After carefully reviewing the rebuttals and comments, I would like to maintain my current score.

Questions For Authors

In Table 5, the magnitude of the mixed-size placement HPWL is 1e6, while in ChipFormer the magnitude of the mixed-size placement HPWL is 1e7 (shown in Table 8 of [1]). Could the authors explain the inconsistency or check the correctness of the presented data?

[1] Lai, Yao, et al. "Chipformer: Transferable chip placement via offline decision transformer." International Conference on Machine Learning. PMLR, 2023.
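For context on the metric under discussion: HPWL (half-perimeter wirelength) sums, over all nets, the width plus height of the bounding box of the net's pins, so discrepancies like the one above are usually about the reported exponent rather than the metric itself. A minimal sketch follows; the array layout is an illustrative assumption, not either paper's code.

```python
import numpy as np

def hpwl(pin_xy, nets):
    """Half-perimeter wirelength: for each net, the width plus height
    of the bounding box enclosing its pins, summed over all nets.

    pin_xy: (P, 2) array of absolute pin coordinates.
    nets: list of index arrays, one per net, selecting rows of pin_xy.
    """
    total = 0.0
    for net in nets:
        pts = pin_xy[net]
        lo = pts.min(axis=0)
        hi = pts.max(axis=0)
        total += (hi - lo).sum()  # (x_max - x_min) + (y_max - y_min)
    return total

# Two nets over four pins on a unit canvas.
pins = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0], [0.5, 0.5]])
print(hpwl(pins, [np.array([0, 1]), np.array([2, 3])]))  # 2.0 + 0.5 = 2.5
```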

Claims And Evidence

Yes.

Methods And Evaluation Criteria

Yes.

Theoretical Claims

Yes. There are no issues about the correctness of any proofs for theoretical claims.

Experimental Designs Or Analyses

Yes. There are no issues about the soundness/validity of any experimental designs or analyses.

Supplementary Material

Yes. The source code part.

Relation To Broader Scientific Literature

The idea of constructing synthesized data to support model training is valuable.

Essential References Not Discussed

A fast and strong RL+MCTS method [1] for macro and mixed-size placement is neither discussed nor compared against.

[1] Geng, Zijie, et al. "Reinforcement learning within tree search for fast macro placement." Forty-first International Conference on Machine Learning. 2024.

Other Strengths And Weaknesses

Strengths: A new approach to constructing synthesized data to alleviate the scarcity of data in the EDA field.

Weaknesses

  1. Optimization of other objectives is not considered, such as timing metrics (WNS, TNS) and final wirelength.
  2. Constraints such as the non-overlap constraint cannot be guaranteed by the diffusion method, which requires a post-processing legalization step to eliminate the overlap. In contrast, overlap can be avoided in RL-based methods like MaskPlace and ChipFormer by filtering out invalid positions from the actions that would lead to overlap, e.g., through the position mask proposed in MaskPlace.
  3. The motivation for using a diffusion-based method is not convincing enough. The authors only discuss the weaknesses of RL-based methods, while analytical methods such as DREAMPlace are not discussed. The advantage of diffusion and the motivation should be analyzed further.
  4. Performance of the proposed approach is very similar to that of DREAMPlace, only decreasing from 23.6 to 22.7, while other costs, including congestion, timing metrics (WNS & TNS), runtime (speed), and the resource cost of training the diffusion model, are not compared.

Other Comments Or Suggestions

  1. Compare the inference speed of diffusion-based method and analytical method.
  2. Since other objectives, such as post-routing wirelength, timing metrics, and power, are hard to represent in a differentiable form, it would be more attractive to inject these objectives into the optimization of diffusion-based methods, rather than concentrating solely on optimizing HPWL.
  3. For other suggestions, please refer to the weakness part.
Author Response

We thank the reviewer for the insightful feedback. Our response is as follows:

EfficientPlace: Although EfficientPlace uses tree search to address the shortcomings of RL, it still requires significant training on every new circuit to perform well. We present additional experiments on the IBM benchmark below, showing that our method significantly outperforms EfficientPlace in both HPWL and congestion, while requiring a fraction of the time.

Table 1 Congestion and HPWL on IBM

| | Wiremask-BBO | Chipformer | MaskPlace | EfficientPlace | Ours |
|---|---|---|---|---|---|
| Average Congestion (RUDY) | 323.6 | 335.9 | 345.02 | 366.7 | 195.5 |
| Average HPWL (10^6) | 7.432 | 7.931 | 8.723 | 8.316 | 2.495 |
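RUDY (Rectangular Uniform wire DensitY), the congestion metric reported above, estimates routing demand by spreading each net's wirelength uniformly over its bounding box and accumulating the result on a grid; the reported number is typically the peak or average of that map. A rough sketch of the idea, where the grid size and data layout are assumptions rather than the paper's setup:

```python
import numpy as np

def rudy_map(net_bboxes, grid=64):
    """RUDY congestion estimate: each net contributes a uniform wire
    density of (w + h) / (w * h) over its bounding box on a unit canvas.

    net_bboxes: iterable of (x_min, y_min, x_max, y_max) tuples.
    Returns a (grid, grid) density map.
    """
    density = np.zeros((grid, grid))
    cell = 1.0 / grid
    for x0, y0, x1, y1 in net_bboxes:
        w, h = max(x1 - x0, cell), max(y1 - y0, cell)  # avoid zero area
        i0 = int(x0 * grid)
        i1 = max(int(np.ceil(x1 * grid)), i0 + 1)  # cover >= one column
        j0 = int(y0 * grid)
        j1 = max(int(np.ceil(y1 * grid)), j0 + 1)  # cover >= one row
        density[j0:j1, i0:i1] += (w + h) / (w * h)
    return density

# Two non-overlapping nets; the first has the higher wire density.
dmap = rudy_map([(0.1, 0.1, 0.6, 0.4), (0.5, 0.5, 0.9, 0.9)])
print(dmap.max())  # peak congestion estimate
```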

DreamPlace: We highlight that our method outperforms DreamPlace, a very strong baseline, while other macro placement methods fall far short. This could be because DreamPlace relies on gradient descent, which performs a local search, whereas the diffusion model can perform a global search based on the training data. We agree that this additional performance comes at a cost, with our method having a longer runtime as shown in Table 2 below, and requiring several days of training.

Nevertheless, we emphasize that our work explores and develops a novel approach - training diffusion models - for the placement problem, and to our knowledge is the first to apply diffusion to this domain. Developing a method that is competitive with, even marginally outperforming, DreamPlace is still a significant improvement over prior macro placement approaches, and demonstrates that this approach has strong merits. We believe showing this is one of the contributions of our work.

Table 2 Runtimes in minutes

| | Wiremask-BBO | ChipFormer | Dreamplace | Ours |
|---|---|---|---|---|
| IBM Average | 227.2 | 124.8 | 0.475 | 4.39 |
| ISPD Average | 886.5 | 1048.5 | 3.000 | 20.89 |

Non-overlap: While it is true that our method does not enforce hard constraints to prevent overlap, we find that in practice this is not an issue. Legality guidance, combined with gradient-based legalization, is effective in ensuring almost no overlaps in our macro placements.
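The kind of gradient-based legalization mentioned here can be illustrated as follows; this is a hypothetical reconstruction (gradient descent on a pairwise overlap-area penalty), not the authors' actual legalizer.

```python
import numpy as np

def legalize(pos, sizes, steps=500, lr=0.01):
    """Reduce pairwise macro overlap by gradient descent on overlap area.

    pos: (N, 2) array of box centers; sizes: (N, 2) widths/heights.
    For an intersecting pair, overlap area = ov_x * ov_y; the per-axis
    gradient below pushes the two boxes apart along both axes.
    """
    pos = pos.astype(float).copy()
    half = sizes / 2.0
    for _ in range(steps):
        grad = np.zeros_like(pos)
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                delta = pos[i] - pos[j]
                ov = (half[i] + half[j]) - np.abs(delta)  # per-axis overlap
                if (ov > 0).all():                        # boxes intersect
                    direction = np.sign(delta) + (delta == 0)  # break ties
                    grad[i] += direction * ov[::-1]  # d(ov_x*ov_y) terms
                    grad[j] -= direction * ov[::-1]
        if not grad.any():
            return pos                                    # overlap-free
        pos += lr * grad
    return pos

# Two 0.1 x 0.1 macros whose centers are only 0.02 apart get separated.
placed = legalize(np.array([[0.50, 0.5], [0.52, 0.5]]),
                  np.array([[0.10, 0.1], [0.10, 0.1]]))
```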

Optimization of other objectives: We presented results on congestion in Table 1 above, which show that our method achieves significantly lower congestion than the baselines. This result is consistent with findings from prior work [1], which finds congestion to be correlated with HPWL.

We agree that downstream metrics such as PPA are important targets for optimization. However, optimizing for PPA is difficult, with many similar works focusing on simple proxy objectives like HPWL. Moreover, the commonly used benchmarks such as ISPD2005 and IBM do not support timing analysis. Because the goal of our work is to explore and develop a novel approach - training a diffusion model - to macro placement, we have therefore chosen to focus our contributions on developing the techniques necessary for such an approach, such as synthetic data generation, rather than simultaneously tackling PPA optimization. Therefore, while PPA optimization is an important end-goal, we leave it as an area for future work.

Answers to Questions:

  1. We believe the magnitude in the ChipFormer paper should be 1e5. Table 15 of the MaskPlace paper [2] has the same numbers as those in Table 8 of the ChipFormer paper, but with a scale of 1e5, which matches ours.

We hope we have been able to address your concerns.

[1] Shi et al. "Macro Placement by Wire-Mask-Guided Black-Box Optimization." NeurIPS, 2023.

[2] Lai et al. "MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning." NeurIPS, 2022.

Review (Rating: 3)

The authors propose a new diffusion-based method to address chip placement. Compared to existing RL approaches, pre-trained diffusion models can obtain placement results on new circuits within minutes, which is much more efficient. After global placement, users can fix the positions of macros and optimize the cells using another cell placer such as DREAMPlace.

update after rebuttal

Based on the reply, I would like to increase the score to 3. I hope the authors could add these new experiments and illustrations to the revised manuscript for a more transparent presentation.

Questions For Authors

  • Why did the authors choose IBM benchmarks as the test dataset? I note that (all) the baselines the authors compared used ISPD05/15 as the datasets.
  • The experimental results of HPWL on ibm01-04 are quite different from those displayed in ChipFormer. Is this because the results of ChipFormer are macro-only and the results in this work are mixed-size?
  • What is the placement x? Is it the positions of all macros?

Claims And Evidence

The authors claim that RL methods are slow and lack flexibility; however, they do not show the diffusion model's detailed time overhead.

Methods And Evaluation Criteria

Yes, the proposed method intuitively makes sense, but the procedure is not that clear in Sec. 4.3. I suggest the authors provide a detailed algorithm box.

Theoretical Claims

No theoretical claim is provided.

Experimental Designs Or Analyses

The datasets used in the experiments are not in line with those in existing work. For example, this paper tests the approach on ibm01-18, but the baselines test on the ISPD05/15 benchmarks.

Supplementary Material

Yes, the supplementary material contains the code.

Relation To Broader Scientific Literature

This paper is an approach for chip placement, which is in the field of physical design in electronic design automation.

Essential References Not Discussed

Yes, I think the references are sufficient.

Other Strengths And Weaknesses

Other Strengths:

  • The authors address chip placement under the perspective of diffusion models, which can obtain chip placement results on new circuits within minutes.
  • The motivation is clear, as existing RL methods take a long time to complete the placement.

Other weaknesses:

  • Non-learning methods, especially DREAMPlace, are not included in the related work section.
  • DREAMPlace has many versions with significantly different performance, so it is important for the authors to mention the version used in the experiments. I note that the authors mentioned DREAMPlace 4.1 but cited the 2019 paper.
  • The datasets used in the experiments are not in line with those in existing work. The authors could give a suitable explanation to address this weakness.

Other Comments Or Suggestions

  • The authors can display placement visualizations of different methods on the same circuit.
  • As the mixed-size performance significantly depends on the cell placer (e.g., DREAMPlace), the authors could display more comparison results in macro-only settings.
Author Response

We thank the reviewer for their insightful feedback. Our response is as follows:

Choice of dataset: We choose the IBM dataset for several reasons. First, it contains more circuits - 18, compared to 8 for ISPD2005. Second, the IBM dataset allows for easier comparison with other macro placement methods. The ISPD2005 benchmark contains circuits with a large number of macros (up to 23k), causing prior works to omit these circuits or pick macros to place according to various criteria. The IBM dataset avoids this issue, and allows for consistent evaluation of macro placement methods on all 18 circuits. Nevertheless, we have evaluated our method on the ISPD2005 dataset, using the macro-only setting for easier comparison, with the results shown in Table 1 below. Our method achieves significantly improved performance over the baselines.

Macro-only Evaluations: We agree with this suggestion, and present our results in the tables below.

Table 1 HPWL on ISPD2005 benchmark

| | Wiremask-BBO | Chipformer | MaskPlace | Ours |
|---|---|---|---|---|
| Average HPWL (10^6) | 19.534 | 13.689 | 14.925 | 4.393 |

Table 2 HPWL on IBM benchmark

| | Wiremask-BBO | Chipformer | MaskPlace | EfficientPlace | Ours |
|---|---|---|---|---|---|
| Average HPWL (10^6) | 7.432 | 7.931 | 8.723 | 8.316 | 2.495 |

Time overhead: We present the runtimes for our method and baselines in the table below. Our method is significantly faster than other macro placement methods, taking on average 4 and 20 minutes on the IBM and ISPD benchmarks respectively, compared to RL or BBO methods that take 10 times longer. We note however that we are slower than Dreamplace, and further optimization of our code and diffusion sampling is an interesting area of future work.

Table 3 Runtimes in minutes

| | Wiremask-BBO | ChipFormer | Dreamplace | Ours |
|---|---|---|---|---|
| IBM Average | 227.2 | 124.8 | 0.475 | 4.39 |
| ISPD Average | 886.5 | 1048.5 | 3.000 | 20.89 |

To clarify, we used Dreamplace 4.1, and will correct the citation to reflect this.

Answers to questions:

  1. Our motivation for choosing IBM is detailed above. We have also performed additional experiments on the ISPD05 benchmark, with results shown in Table 1 above.

  2. Yes, Table 2 in the ChipFormer paper reports macro-only HPWL, whereas Table 5 in our paper reports mixed-size HPWL.

  3. x is a (V x 2) array, where V is the number of objects in the netlist, containing the 2D positions of all objects, which includes macros and standard cell clusters.
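As a concrete illustration of this array (the values and object labels below are made up):

```python
import numpy as np

# V = 4 netlist objects, e.g. two macros followed by two standard-cell
# clusters. x holds one 2D position per object on a unit canvas.
x = np.array([[0.20, 0.30],   # macro 0
              [0.70, 0.10],   # macro 1
              [0.45, 0.80],   # cell cluster 0
              [0.55, 0.60]])  # cell cluster 1
print(x.shape)  # (4, 2)
```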

We hope we have been able to address your concerns.

Reviewer Comment

Thanks for the authors' response. Though some of my concerns have been addressed, I still have some questions regarding the performance of baselines.

First, the HPWL performances of MaskPlace and ChiPFormer on mixed-size placement in this paper are different from those in their original papers. For example, according to MaskPlace, the HPWL on ibm01 is 24.18×10^5. In ChiPFormer, the HPWL on ibm01 is 16.70×10^7 (it might be a typo in their paper if I understand correctly; it should be 16.70×10^5). However, in your paper, these two values are 3.33×10^6 and 3.35×10^6. (The same issues also occur on other benchmarks: ibm02, ibm03...) What is the reason for this discrepancy?

Second, for the newly-added macro-only experiments performed on the ISPD2005 benchmark, such circumstances may also exist. Additionally, I think it is not proper to only show the average HPWL or time for the ISPD2005 benchmark as the circuits differ significantly in their scale. The authors could detail the performance of each circuit.

Author Comment

We thank the reviewer for your thoughtful response, and hope the following addresses your remaining concerns.

Mixed-size HPWL on IBM: The differences in HPWL from the original papers can be accounted for by differences in evaluation setups. The most significant is whether the macros are fixed during standard cell placement. For ease of comparison, and to make clearer the impact of the initial macro placements, we fixed the macro positions when placing standard cells with DreamPlace, while many prior works allowed them to move. Another difference is the DreamPlace version (we use 4.1).

Macro-only HPWL on ISPD: The differences in HPWL from the original papers are because some baselines select and place only a subset of the macros (the selection criteria differ between baselines), while we place all macros for all baselines to ensure a fair comparison. This is especially significant for bigblue2 and bigblue4, which have large numbers of macros, but it can also apply to other circuits. MaskPlace, for instance, places only 128 macros [1] for adaptec1.

Reporting performance for each circuit: We present the per-circuit figures in the tables below, with HPWL in Tables 1 & 2, runtimes in Tables 3 & 4, and congestion (requested by other reviewers and included for reference) in Tables 5 & 6. We note some minor differences (Tables 1, 3, 5) with the earlier reported averages on the ISPD benchmark. We erroneously copied the data previously, and deeply apologize for our mistake. The tables below contain the corrected figures. Nevertheless, we emphasize that our conclusions remain unaffected: our method produces significantly better results on both HPWL and congestion on the IBM and ISPD benchmarks, while running faster than prior methods (except DreamPlace).

We hope we have been able to address your concerns.

[1] See line 51 and 63 of PPO2.py in the MaskPlace public github repository.

Table 1 HPWL (×10^5) on ISPD.

| | MaskPlace | Wiremask-BBO | Chipformer | Ours |
|---|---|---|---|---|
| adaptec1 | 8.57 | 5.81 | 6.75 | 9.19 |
| adaptec2 | 77.7 | 54.5 | 63.8 | 31.0 |
| adaptec3 | 108 | 59.2 | 73.2 | 54.4 |
| adaptec4 | 91.9 | 62.7 | 85.8 | 54.5 |
| bigblue1 | 3.11 | 2.12 | 3.05 | 2.64 |
| bigblue2 | Timeout | 186 | 85.8 | 38.8 |
| bigblue3 | 84.0 | 66.2 | 79.2 | 35.9 |
| bigblue4 | Timeout | 798 | 548 | 141 |
| Average | - | 154 | 116 | 45.9 |

Table 2 HPWL (×10^5) on IBM.

| | MaskPlace | Wiremask-BBO | Chipformer | EfficientPlace | Ours |
|---|---|---|---|---|---|
| ibm01 | 4.30 | 2.78 | 3.88 | 3.66 | 1.16 |
| ibm02 | 5.54 | 4.19 | 5.05 | 4.42 | 2.68 |
| ibm03 | 3.31 | 3.30 | 3.74 | 3.87 | 1.07 |
| ibm04 | 6.91 | 5.43 | 5.96 | 6.10 | 2.40 |
| ibm06 | 0.93 | 0.85 | 0.87 | 0.84 | 0.32 |
| ibm07 | 2.67 | 2.66 | 2.36 | 3.42 | 0.78 |
| ibm08 | 20.6 | 19.2 | 19.9 | 19.3 | 9.32 |
| ibm09 | 2.45 | 1.76 | 1.77 | 2.57 | 0.44 |
| ibm10 | 23.8 | 18.2 | 18.2 | 20.6 | 5.28 |
| ibm11 | 4.15 | 3.75 | 3.25 | 4.70 | 0.78 |
| ibm12 | 14.9 | 11.8 | 13.0 | 12.1 | 2.85 |
| ibm13 | 4.58 | 4.41 | 4.02 | 5.37 | 1.05 |
| ibm14 | 8.43 | 9.80 | 7.44 | 11.7 | 2.42 |
| ibm15 | 4.68 | 7.77 | 2.67 | 5.98 | 1.06 |
| ibm16 | 18.3 | 14.8 | 15.5 | 15.2 | 6.11 |
| ibm17 | 16.8 | 12.2 | 13.7 | 17.9 | 3.20 |
| ibm18 | 5.98 | 3.44 | 4.19 | 3.64 | 1.52 |
| Average | 8.72 | 7.43 | 7.33 | 8.32 | 2.49 |

Table 3 Runtime (minutes) on ISPD.

| | MaskPlace | Wiremask-BBO | Chipformer | Dreamplace | Ours |
|---|---|---|---|---|---|
| adaptec1 | 139 | 211 | 223 | 1.07 | 4.78 |
| adaptec2 | 195 | 209 | 234 | 1.34 | 4.53 |
| adaptec3 | 224 | 207 | 284 | 2.06 | 4.73 |
| adaptec4 | 718.2 | 212 | 467 | 2.43 | 4.94 |
| bigblue1 | 274 | 204 | 256 | 1.30 | 4.83 |
| bigblue2 | - | 1396 | 5220 | 3.80 | 122 |
| bigblue3 | 648 | 233 | 494 | 3.14 | 5.13 |
| bigblue4 | - | 596 | 1210 | 8.86 | 18.9 |
| Average | - | 408.5 | 1049 | 3.00 | 21.2 |

Table 4 Runtime (minutes) on IBM.

| | MaskPlace | Wiremask-BBO | Chipformer | EfficientPlace | Dreamplace | Ours |
|---|---|---|---|---|---|---|
| ibm01 | 154 | 209 | 98 | 54 | 0.308 | 1.85 |
| ibm02 | 165 | 204 | 87 | 61 | 0.411 | 2.25 |
| ibm03 | 123 | 217 | 75 | 61 | 0.393 | 2.17 |
| ibm04 | 63 | 208 | 82 | 61 | 0.401 | 2.21 |
| ibm06 | 34 | 224 | 80 | 29 | 0.229 | 2.38 |
| ibm07 | 58 | 223 | 80 | 63 | 0.261 | 2.87 |
| ibm08 | 75 | 207 | 105 | 79 | 0.260 | 3.38 |
| ibm09 | 50 | 221 | 71 | 46 | 0.257 | 3.08 |
| ibm10 | 516 | 228 | 236 | 268 | 0.455 | 4.50 |
| ibm11 | 79 | 224 | 106 | 80 | 0.303 | 3.40 |
| ibm12 | 390 | 253 | 196 | 206 | 0.469 | 5.24 |
| ibm13 | 93 | 225 | 127 | 95 | 0.613 | 4.19 |
| ibm14 | 393 | 266 | 187 | 216 | 0.760 | 5.95 |
| ibm15 | 83 | 254 | 113 | 86 | 0.923 | 6.81 |
| ibm16 | 107 | 217 | 137 | 130 | 0.784 | 7.88 |
| ibm17 | 489 | 266 | 250 | 358 | 0.839 | 10.33 |
| ibm18 | 60 | 216 | 93 | 58 | 0.742 | 8.06 |
| Average | 172 | 227 | 124 | 114 | 0.475 | 4.39 |

Table 5 Congestion on ISPD

| | MaskPlace | Wiremask-BBO | Chipformer | Ours |
|---|---|---|---|---|
| adaptec1 | 312 | 139 | 140 | 149 |
| adaptec2 | 1068 | 1084 | 1180 | 668 |
| adaptec3 | 990 | 672 | 677 | 579 |
| adaptec4 | 945 | 793 | 779 | 584 |
| bigblue1 | 98.5 | 25.1 | 19.0 | 23.4 |
| bigblue2 | - | 1924 | 500 | 523 |
| bigblue3 | 969.8 | 955 | 956 | 391 |
| bigblue4 | - | 6290 | 2436 | 1451 |
| Average | - | 1485 | 836 | 546 |

Table 6 Congestion on IBM

| | MaskPlace | Wiremask-BBO | Chipformer | EfficientPlace | Ours |
|---|---|---|---|---|---|
| ibm01 | 289 | 253 | 266 | 316 | 160 |
| ibm02 | 228 | 243 | 205 | 257 | 178 |
| ibm03 | 176 | 173 | 173 | 214 | 117 |
| ibm04 | 449 | 483 | 490 | 480 | 260 |
| ibm06 | 79.2 | 77.1 | 76.9 | 76.7 | 42.8 |
| ibm07 | 154 | 164 | 160 | 177 | 83.0 |
| ibm08 | 1232 | 1198 | 1261 | 1288 | 776 |
| ibm09 | 127 | 119 | 111 | 153 | 49.1 |
| ibm10 | 480 | 463 | 466 | 538 | 362 |
| ibm11 | 180 | 183 | 172 | 240 | 69.2 |
| ibm12 | 392 | 212 | 357 | 360 | 190 |
| ibm13 | 163 | 202 | 177 | 209 | 86.0 |
| ibm14 | 378 | 375 | 378 | 418 | 232 |
| ibm15 | 162 | 173 | 173 | 227 | 69.2 |
| ibm16 | 574 | 497 | 528 | 534 | 334 |
| ibm17 | 531 | 464 | 488 | 483 | 204 |
| ibm18 | 271 | 221 | 229 | 266 | 111 |
| Average | 345 | 324 | 336 | 367 | 196 |
Review (Rating: 3)

The authors proposed a diffusion model-based chip placement strategy. They also developed a novel data generation algorithm and a synthetic dataset, training the model to enable zero-shot transfer to real circuits. Additionally, they introduced a neural network model that demonstrates strong performance and scalability.

Questions For Authors

I would like to hear the authors' thoughts on conducting an additional experiment using other private or public datasets to establish the generalization and scalability of the approach, such as modern IC design netlists.

Claims And Evidence

The major claims by the authors:

  1. Synthetic data generation: The approach generates a plausible netlist ensuring that the given placement is near-optimal while enabling data generation without relying on commercial tools or higher-level design specifications. Tables 1, 2, and 3 provide evidential support for this claim.

  2. Dataset design: An extensive empirical study was conducted to investigate the generalization properties of models trained on synthetic data, identifying several factors—such as the scale parameter—that contribute to poor generalization. These insights were utilized to design synthetic datasets that enable effective zero-shot transfer to real circuits. Once again, Tables 1, 2, 3, and 7 provide evidential support for this claim.

  3. Model architecture: The authors proposed a novel neural network architecture incorporating interleaved graph convolutions and attention layers, resulting in a model that is both computationally efficient and highly expressive. Tables 5 and 6 provide support for this claim.

Methods And Evaluation Criteria

The proposed method is thoroughly evaluated using a well-designed experimental setup and relevant metrics. The authors provide an in-depth analysis, effectively demonstrating proof-of-concept to support their claims. Additionally, the combination of proposed strategies enables the generation of placements for unseen netlists in a zero-shot manner, achieving competitive performance with state-of-the-art (SOTA) methods on the IBM benchmark dataset ICCAD04.

Theoretical Claims

Yes, the theoretical claims regarding the quality of synthetic data generation, dataset design, scalability, and generalization impact have been quantified through experimental validation.

Experimental Designs Or Analyses

Yes. The soundness/validity of the experimental designs has been validated against the same three claims listed above (synthetic data generation, dataset design, and model architecture), with evidential support from Tables 1, 2, 3, 5, 6, and 7.

Supplementary Material

Yes. Code/scripts and validation are presented against the claims in the research manuscript.

Relation To Broader Scientific Literature

The research topic and the presented idea, despite certain limitations, are interesting and hold potential significance for the broader research community. This is particularly true from two perspectives: synthetic data generation and dataset design. Additionally, the combination of the proposed strategies enables the generation of placements for unseen netlists in a zero-shot manner, achieving competitive performance with state-of-the-art (SOTA) methods on the IBM benchmark dataset ICCAD04.

Essential References Not Discussed

NA

Other Strengths And Weaknesses

Mentioned and discussed in the previous sections "Methods And Evaluation Criteria" and "Experimental Designs Or Analyses".

Other Comments Or Suggestions

The authors should benchmark the proposed approach on other private or public datasets to establish its generalization and scalability, such as any modern IC design netlists.

Ethical Review Issues

No ethical review concerns noticed.

Author Response

We thank the reviewer for their helpful feedback. Our response is as follows:

Additional benchmarks: We have included experiments on the ISPD2005 benchmark, which we show in the table below. To facilitate comparison with baselines, we follow the suggestion of reviewer m7a3 and present HPWL and congestion in the macro-only setting. These results show that our method significantly outperforms baselines on this benchmark as well.

Table 1 Congestion and HPWL of macro placements on ISPD2005

| | Wiremask-BBO | Chipformer | MaskPlace | Ours |
|---|---|---|---|---|
| Average Congestion (RUDY) | 1837 | 988.7 | 1291 | 539.4 |
| Average HPWL (10^6) | 19.534 | 13.689 | 14.925 | 4.393 |

We hope we have been able to address your concerns.

Review (Rating: 3)

This paper applies diffusion models to macro placement. The motivation is that existing RL-based methods for macro placement are slow and lack flexibility. To provide more data for training, this paper generates synthetic data by randomly placing objects, sampling pins, and creating edges based on a distance-dependent probability. The model architecture combines GNN and attention layers, with MLP blocks and sinusoidal encodings. Guided sampling is used to optimize placement quality. Experiments on synthetic data and the ICCAD04 benchmark show that the model can achieve competitive results in terms of legality and HPWL, and it performs well in mixed-size placement.
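The data-generation recipe summarized above (random object placement, pin sampling, distance-dependent edge probabilities) can be sketched roughly as follows. The specific distributions, the exponential form of the edge probability, and all parameter values are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def synth_example(n_objects=50, edge_scale=0.1, seed=0):
    """Generate one synthetic placement/netlist pair.

    Objects get random sizes and positions on a unit canvas; each pair
    is wired with probability exp(-distance / edge_scale), so nearby
    objects are far more likely to be connected, making the sampled
    placement plausibly near-optimal for the induced netlist.
    """
    rng = np.random.default_rng(seed)
    sizes = rng.uniform(0.02, 0.15, size=(n_objects, 2))
    pos = rng.uniform(0.0, 1.0, size=(n_objects, 2))
    edges = []
    for i in range(n_objects):
        for j in range(i + 1, n_objects):
            dist = np.linalg.norm(pos[i] - pos[j])
            if rng.random() < np.exp(-dist / edge_scale):
                # Sample one pin offset inside each object for this edge.
                pin_i = pos[i] + rng.uniform(-0.5, 0.5, 2) * sizes[i]
                pin_j = pos[j] + rng.uniform(-0.5, 0.5, 2) * sizes[j]
                edges.append((i, j, pin_i, pin_j))
    return pos, sizes, edges

pos, sizes, edges = synth_example()
```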

Questions For Authors

  1. Where were the other RL methods in Table 5 trained?
  2. How are overlaps handled? Is the legalization method provided in DREAMPlace used?
  3. Fig. 6: Significantly increasing the scale parameter causes legality constraint violations. Would using a diverse range of scales (e.g., randomly sampling from a large range) during training lead to better results?

论据与证据

"WireMask-BBO must be started from scratch for each new circuit." This claim is incorrect: the original WireMask-BBO paper shows that it can fine-tune existing placements. I think the drawbacks of WireMask-BBO are instead its generalization ability and search efficiency, compared to learning-based RL approaches.

Methods And Evaluation Criteria

There is no PPA evaluation. Although wirelength is important, many articles have found that its impact on the final result is limited. Recently, some open-source platforms, such as OpenROAD and [1], have provided PPA evaluation.

I believe adding PPA assessment could significantly enhance the quality of this paper.

[1] Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms. arxiv, 2024.

Theoretical Claims

No theory part.

Experimental Designs Or Analyses

  1. I believe the validity of synthetic data requires more discussion. If synthetic data is a contribution, some experiments should be included to demonstrate that the previous method (e.g., Flora) is ineffective.
  2. Following point 1, I believe an important contribution is the study of the role of synthetic placement datasets. If the dataset proposed in this paper truly "covers the important features" as stated in line 311, it should also improve reinforcement learning methods; however, the authors have not compared this aspect.
  3. Clustering standard cells: How does it compare to placing only several macros and then using DMP to place standard cells? How does it compare to other RL methods with the same clustering approaches?

Supplementary Material

Yes. The code's structure seems elegant.

Relation To Broader Scientific Literature

Chip placement is a vital task in EDA. Previous chip placement methods rely on RL and suffer from several limitations. This paper is the first to propose using diffusion models for chip placement, and the approach performs well.

Essential References Not Discussed

There are many recent papers on reinforcement learning for chip placement in AI conferences [1-3], which I believe should be at least discussed.

[1] Reinforcement Learning within Tree Search for Fast Macro Placement. ICML'24.

[2] Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer. NeurIPS'24.

[3] LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement. ICLR'25.

Other Strengths And Weaknesses

Strengths:

  1. Completely using synthetic data has demonstrated good generalization ability.
  2. The authors studied the impact of synthetic data on generalization and conducted extensive analyses, including the number of edges and vertices, etc.

Weaknesses:

  1. The writing should be improved. Besides, adding some discussions of the background and recent related works would also be very beneficial.

Other Comments Or Suggestions

  1. More references should be added; for example, the first paragraph of the introduction has no references at all. It is necessary to include more articles on EDA background so that people in the machine learning field can understand the context of the problem.
  2. The third contribution - Model Architecture, cannot be considered a contribution, as it does not seem novel to me since many papers apply similar architectures. It would be better to list the application of diffusion for chip placement as a contribution here.
  3. Please use "DMP" rather than "DP" in Table 5 to represent DREAMPlace.
Author Response

Thank you for your insightful feedback and suggestions. We hope the following can address your concerns.

PPA evaluation: We agree that PPA evaluation and optimization is important. As a step towards analyzing and optimizing downstream objectives, we also evaluated congestion of our macro placements, with the results in Table 1 below showing that our method significantly outperforms the baselines not just in HPWL, but in congestion.

However, optimizing for PPA is difficult, with many similar works focusing on simple proxy objectives like HPWL. Moreover, the commonly used benchmarks such as ISPD2005 and IBM do not support timing analysis. Because the goal of our work is to explore and develop a novel approach - training a diffusion model - to macro placement, we have therefore chosen to focus our contributions on developing the techniques necessary for such an approach, such as synthetic data generation, rather than simultaneously tackling PPA optimization. We believe that despite its shortcomings, HPWL is a reasonable optimization objective, particularly as a first step when exploring a new approach.

Therefore, while PPA optimization is an important end-goal, we leave it as a direction for future work.

Table 1 Congestion on IBM macro placements.

| | Wiremask-BBO | Chipformer | MaskPlace | EfficientPlace | Ours |
|---|---|---|---|---|---|
| Average Congestion (RUDY) | 323.6 | 335.9 | 345.02 | 366.7 | 195.5 |

Validity of synthetic data: We trained different-sized models on a dataset generated using Flora’s algorithm and found that models trained on their dataset show much poorer legality than ours when evaluated on the clustered IBM circuits. This indicates that the Flora dataset generalizes poorly, in contrast to ours. The results below are after 1M training steps.

Table 2 Performance of Large and Medium models trained on different datasets.

| | Large+Flora | Medium+Flora | Large+v1 (Ours) | Medium+v1 (Ours) |
|---|---|---|---|---|
| Legality | 0.283 | 0.349 | 0.806 | 0.784 |
| HPWL (10^7) | 3.058 | 3.306 | 3.330 | 3.527 |

Training RL with synthetic data: This is a good point, and we believe this to be an interesting experiment to perform in the future.

Clustering standard cells: We performed mixed-size placement on the IBM benchmark using clustered standard cells (our approach), and the suggested approach (also commonly used in literature) of first placing macros only. The results below show that using clustered standard cells performs better, likely because the macro positions can be informed by connectivity and space needed for standard cells.

Table 3 Mixed-size placement performance with and without standard cell clusters.

| | Clustering | Placing Macro-only |
|---|---|---|
| Average HPWL (10^6) | 22.7 | 27.9 |

Recent papers: We have conducted additional experiments comparing with EfficientPlace[1] in the macro-only setting, which we show below. Although EfficientPlace uses tree search to address the shortcomings of RL, our method still produces higher-quality samples in HPWL, while requiring a fraction of the sampling time.

Table 4 Comparison of various methods, including EfficientPlace, on the IBM benchmark.

| | Wiremask-BBO | Chipformer | MaskPlace | EfficientPlace | Ours |
|---|---|---|---|---|---|
| Average HPWL (10^6) | 7.432 | 7.931 | 8.723 | 8.316 | 2.495 |

Answers to questions:

  1. We used the official implementations and trained on the test (i.e., IBM benchmark) circuits.
  2. We post-processed the macro placements with our own gradient-based legalizer. Combined with legality guidance, this method is effective in ensuring almost no overlaps.
  3. As mentioned in section 5.1.4, we do use a diverse range of scales to generate our dataset, sampling the scale from a log-uniform distribution, with a range of (0.05, 1.6) and (0.025, 0.8) for the v1 and v2 datasets respectively.
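Log-uniform sampling over a range (lo, hi), as mentioned in answer 3, means sampling uniformly in log-space and exponentiating, so every octave of the range is equally represented; for example:

```python
import numpy as np

def sample_scale(lo, hi, rng):
    """Draw one scale log-uniformly from [lo, hi]."""
    return float(np.exp(rng.uniform(np.log(lo), np.log(hi))))

rng = np.random.default_rng(0)
# The (0.05, 1.6) range matches the v1 dataset described in answer 3.
scales = [sample_scale(0.05, 1.6, rng) for _ in range(1000)]
```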

We thank the reviewer for the helpful comments on the writing and clarity of our paper, and will be sure to make the necessary changes. We hope we have been able to address your concerns.

[1] Reinforcement Learning within Tree Search for Fast Macro Placement. ICML'24.

Final Decision

Summary: The paper uses the standard DDPM diffusion pipeline with a denoiser whose architecture interleaves attention and graph neural network layers, with many other modifications, to adapt to the problem of placing macros (large components, as I understand it) on 2D chip layouts. Many prior works use RL algorithms that sequentially commit to the placement of components, with RL rewards reinforcing the policy. The authors point out that this problem calls for simultaneous placement, and that diffusion could therefore play a role.

The authors come up with a simple method that randomly generates rectangular components and places them randomly, obeying some overlap constraints, with pin placements satisfying some wire-length constraints. The authors show that by training a DDPM model with a novel GNN-based denoiser architecture, the method achieves impressive performance on many downstream metrics on real-world benchmarks.

Discussion/Rebuttal points

  1. From the discussion, it is acknowledged by reviewers and authors that this method is very competitive with DREAMPlace, a state-of-the-art method that uses a lot of domain-specific information to optimize placements. Given that the model is trained only on synthetic data and guided at test time by differentiable proxies, I see value in it being competitive with a good custom-prior alternative.

  2. Authors acknowledge that the benchmarks they have chosen are not compatible with other metrics like timing analysis but congestion based metrics are used to evaluate in the rebuttal. Different synthetic datasets were tried and their dataset generation method seems to outperform.

Overall: The main weakness is the lack of algorithmic novelty: it is the same standard DDPM pipeline. However, the denoiser architecture is novel and tailored to this domain, and that is the main punchline. I don't consider the lack of full downstream-metric evaluation to be a potential block against publication. For the potential impact of diffusion on an important domain, the novel architectural choices, and the demonstration that purely pre-training on synthetic data suffices, I recommend acceptance.

Note to authors: Please include all the experimental results from rebuttal (including on new metrics and ablations on other synthetic dataset) to the paper.