PaperHub
NeurIPS 2024 · Poster · 4 reviewers
Average rating: 6.0/10 (min 5, max 7, std. dev. 0.7)
Individual ratings: 7, 6, 6, 5
Confidence: 4.0 · Correctness: 2.3 · Contribution: 2.5 · Presentation: 2.8

NeuralSteiner: Learning Steiner Tree for Overflow-avoiding Global Routing in Chip Design

OpenReview · PDF
Submitted: 2024-05-16 · Updated: 2024-11-06
TL;DR

We propose NeuralSteiner, a learning-based method to optimize overflow and wirelength simultaneously for global routing problem in chip design.

Abstract

Keywords
Global routing, chip design, neural network, Steiner tree, deep learning, congestion

Reviews and Discussion

Official Review
Rating: 7

This paper proposes NeuralSteiner, a two-phase global routing scheme to optimize both wirelength and overflow. It also demonstrates the capability to generalize to unseen large-scale circuits. Experiments on public benchmarks show NeuralSteiner reduces overflow by 99.6% with a wirelength increase of 1.8%.

Strengths

  1. NeuralSteiner proposes a scheme to realize partially parallel routing by grouping nets whose bounding boxes do not overlap.
  2. NeuralSteiner is able to optimize overflow in global routing. It utilizes the SOTA global router CUGR to perform the global routing stage and construct expert datasets. These routing results avoid congestion, helping the candidate prediction model learn the positions of Steiner points and corner points.
  3. Besides, it also proposes the NAG representation and an RST construction algorithm to reduce overflow in global routing.
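The grouping idea in point 1 above can be sketched as follows. This is a hypothetical illustration, not NeuralSteiner's actual implementation: the `boxes_overlap` helper and the greedy first-fit packing are assumptions made for exposition.

```python
# Hypothetical sketch: pack nets into groups whose bounding boxes are
# pairwise disjoint, so each group can be routed in parallel.

def boxes_overlap(a, b):
    """a, b: (xmin, ymin, xmax, ymax). True if the boxes intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def group_disjoint_nets(bboxes):
    """Greedy first-fit: put each net into the first group where its
    bounding box conflicts with no member; otherwise open a new group."""
    groups = []  # each group is a list of (net_id, bbox)
    for net_id, box in enumerate(bboxes):
        for group in groups:
            if all(not boxes_overlap(box, other) for _, other in group):
                group.append((net_id, box))
                break
        else:
            groups.append([(net_id, box)])
    return groups

# Nets 0 and 2 are disjoint and share a group; net 1 conflicts with net 0.
nets = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
print(group_disjoint_nets(nets))
```

Nets within a group have no shared routing region, so routing them concurrently cannot create resource conflicts between them.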

Weaknesses

  1. Typos in line 282: "enven arger than 2000".
  2. NeuralSteiner is not an end-to-end approach, still relying on a post-processing/greedy algorithm to construct the final routing results. But this greedy algorithm is simple and efficient, and utilizing deep learning for RST construction is left for future work.

Questions

  1. Where is the edge weight in NAG used?
  2. How to calculate the "distance" mentioned in line 222 to select the two nearest connected components?
  3. In line 154, the authors mention they use CUGR to perform global routing to construct the congestion-avoid datasets. And they also propose some algorithms such as NAG, overflow-avoided RST construction algorithm. Between the datasets and algorithms, which one is the key to the overflow reduction of NeuralSteiner?

Limitations

RST construction still relies on a heuristic post-processing algorithm, but this could be left for future work.

Author Response

Thank you for your detailed review and valuable feedback. Please note our global comment with additional experimental results. Below we will address specific questions.

W1: Thanks again for your meticulous review, and we will correct typos in the revised version.

W2: NeuralSteiner is not an end-to-end approach, still relying on the post-processing/greedy algorithm to construct the final routing results.

We understand the reviewer's concerns about NeuralSteiner not being an end-to-end method. The complexity of obstacle-avoiding Steiner tree generation in global routing makes it very challenging for the neural network to simultaneously optimize wirelength and congestion, and ensure connectivity in an end-to-end manner.

Methods like PRNet [1] and [3] have attempted to merge these tasks using a single network but faced significant connectivity issues. Hubrouter [2] addresses these two tasks with different networks, solving the problems of RST connectivity and wirelength optimization.

Our NeuralSteiner method achieves optimization of wirelength and congestion simultaneously through a two-stage setup. However, since the existing candidate points still need to be processed as discrete values for subsequent RST construction, differentiable learning is difficult to apply. In the future, we will explore the efficient generation of overflow-avoiding RSTs using an end-to-end neural network.

Q1: Where is the edge weight in NAG used?

In Section 3.4, Eq.8 defines the weights used for edges in the NAG.

These weights consider both the wirelength and the congestion the edge passes through, with parameters $w_d$ and $w_o$ reflecting the trade-off between the two. We used $w_d = 1$ and $w_o = 2$ in all experiments. However, we realize that the usage of the symbol $O(x,y)$ causes some confusion, as it has the same meaning as $o_{ij}$ in Eq. 6 of Section 3.3, representing the value of the overflow map at that location. We will clarify this ambiguity in future revisions.
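As an illustration of the trade-off just described, here is a minimal sketch that assumes, purely for exposition, that an edge's weight sums $w_d$ times its length and $w_o$ times the overflow $O(x,y)$ of the grid cells it crosses; the exact form of Eq. 8 is in the paper and may differ.

```python
# Illustrative reading of the NAG edge weight: w_d weighs wirelength,
# w_o weighs accumulated overflow along an axis-aligned grid edge.
# This mirrors the stated trade-off (w_d = 1, w_o = 2), not Eq. 8 itself.

def edge_weight(p, q, overflow, w_d=1.0, w_o=2.0):
    """p, q: (x, y) endpoints of a horizontal or vertical edge.
    overflow: dict mapping (x, y) -> overflow value O(x, y)."""
    (x1, y1), (x2, y2) = p, q
    assert x1 == x2 or y1 == y2, "NAG edges are axis-aligned"
    if x1 == x2:  # vertical edge: enumerate crossed cells
        cells = [(x1, y) for y in range(min(y1, y2), max(y1, y2) + 1)]
    else:         # horizontal edge
        cells = [(x, y1) for x in range(min(x1, x2), max(x1, x2) + 1)]
    length = abs(x2 - x1) + abs(y2 - y1)
    congestion = sum(overflow.get(c, 0.0) for c in cells)
    return w_d * length + w_o * congestion

# Length 2 plus one congested cell of overflow 3: 1*2 + 2*3 = 8
print(edge_weight((0, 0), (2, 0), {(1, 0): 3.0}))
```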

Q2: How to calculate the "distance" mentioned in line 222 to select the two nearest connected components?

When calculating the distance between two connected components, we compute the shortest path length on the NAG connecting any two points belonging to the two different components (where the path length is the sum of the weights of all edges on that path). The minimum over all such shortest path lengths is taken as the distance between the two connected components. Please refer to Q3 in the global responses for a detailed analysis of the RST construction algorithm and the acceleration methods.
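The distance described above can be computed with a multi-source Dijkstra: seed the search with every node of one component and stop at the first node of the other component that is popped. A minimal sketch, with an assumed adjacency-list graph format (not the paper's code):

```python
import heapq

# Multi-source Dijkstra over a weighted graph {node: [(neighbor, w), ...]}.
# The first node of comp_b popped from the heap carries the minimum
# shortest-path distance from any node of comp_a.

def component_distance(graph, comp_a, comp_b):
    """comp_a, comp_b: sets of nodes. Returns the min shortest-path
    length between the two components, or inf if disconnected."""
    dist = {node: 0.0 for node in comp_a}
    heap = [(0.0, node) for node in comp_a]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if u in comp_b:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

g = {"a": [("m", 2.0)], "m": [("a", 2.0), ("b", 1.0)], "b": [("m", 1.0)]}
print(component_distance(g, {"a"}, {"b"}))  # 3.0
```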

Q3: Between the datasets and algorithms, which one is the key to the overflow reduction of NeuralSteiner?

Thank you for raising this question. We believe it is of significant importance to the advancement of this field. In Section 3.3, we use the expert router CUGR to solve the routing results of nets under congestion and construct an expert routing dataset. The neural network learns from this expert dataset, enabling its output to form candidate points for potential low-congestion RSTs, and provides a solution space for subsequent NAG and the overflow-avoiding RST construction algorithm.

The NAG and the corresponding RST construction algorithm use a greedy algorithm within this solution space to find the final routing result that ensures connectivity while optimizing wirelength and congestion. As shown in the ablation experiments in Tables 3 and 4 for adaptec05, even though the inclusion of the NAG and the RST construction algorithm significantly reduces overflow without using the candidate points generated by the neural network, the addition of the neural network trained on the expert dataset further avoids 95% of the remaining challenging congestion.
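The greedy construction described here — repeatedly merging the two nearest connected components along their connecting shortest path until one component remains — can be sketched as follows. The graph format and helper names are illustrative assumptions, not the paper's implementation, and this sketch omits the paper's acceleration techniques.

```python
import heapq

# Sketch of a greedy component-merging RST builder on a weighted graph
# {node: [(neighbor, w), ...]}; assumes the graph is connected.

def shortest_path(graph, sources, targets):
    """Multi-source Dijkstra; returns (cost, path) from any source
    node to the nearest target node."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []

def greedy_rst(graph, pins):
    """Start from singleton components (the pins); repeatedly merge the
    closest pair of components along their shortest connecting path."""
    components = [{p} for p in pins]
    tree_nodes = set(pins)
    while len(components) > 1:
        (cost, path), i, j = min(
            ((shortest_path(graph, a, b), i, j)
             for i, a in enumerate(components)
             for j, b in enumerate(components) if i < j),
            key=lambda t: t[0][0])
        tree_nodes |= set(path)
        merged = components[i] | components[j] | set(path)
        components = [c for k, c in enumerate(components) if k not in (i, j)]
        components.append(merged)
    return tree_nodes

g = {"p1": [("s", 1.0)], "p2": [("s", 1.0)], "s": [("p1", 1.0), ("p2", 1.0)]}
print(sorted(greedy_rst(g, ["p1", "p2"])))  # a Steiner node "s" joins both pins
```

Because every merge follows an actual path in the graph, the resulting node set is always connected, which matches the connectivity guarantee claimed for the NAG-based construction.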

L: RST construction still relies on a heuristic post-processing algorithm, but this could be left for future work.

As mentioned in reply to Q1 in the global responses and our response to W2, the inclusion of current heuristic post-processing is due to the multi-task nature of overflow-avoiding global routing. We hope to explore more efficient end-to-end approaches in the future to replace the existing heuristic post-processing schemes.

Please kindly let us know if you have any follow-up questions or areas needing further clarification. Your insights are valuable to us, and we stand ready to provide any additional information that could be helpful.

Reference:

[1] The policy-gradient placement and generative routing neural networks for chip design, NeurIPS 2022.

[2] HubRouter: Learning Global Routing via Hub Generation and Pin-hub Connection, NeurIPS 2023.

[3] Late breaking results: A neural network that routes ICs, DAC 2020.

Comment

Thank you for your detailed response.

Q1 & Q2: Distance Calculation

I now have a clear understanding of the distance calculation in line 222. I recommend that the authors provide a clearer explanation of the distance calculation in the final version: 1) we calculate the distance on the weighted graph NAG based on the XXX algorithm, and 2) the edge weight is determined according to Eq. 8.

Q3: Effectiveness of NAG-based RST Construction

In Table 4's ablation study, HubRouter + NAG also demonstrates effective overflow reduction and comparable wirelength. I would like to inquire about the dataset on which HR-GAN (w/o Mask) + NAG was trained. Is this dataset aware of the overflow reduction?

Comparison of CUGR + NeuralSteiner and Original CUGR in Rebuttal PDF

CUGR + NeuralSteiner showcases a reduction of 5% and 20% in short and space respectively, highlighting NeuralSteiner's efficacy in overflow reduction. It would be beneficial to include this comparison in the main paper.

Comment

Thank you for your valuable comments and kind words about our work.

Q1 & Q2: Edge weight and distance calculation

In the final version of the paper, we will ensure a clearer explanation of the edge weight and distance is provided:

  1. We will include more detailed code regarding the distance calculation process in Algorithm 4 of Appendix B.5 in the supplementary material and provide a thorough explanation in the main paper.

  2. We will explicitly state that the edge weight is determined according to Equation 8.

Q3: Effectiveness of NAG-based RST Construction

Thank you for pointing out the effectiveness of HubRouter + NAG in Table 4's ablation study. Regarding your inquiry about the dataset on which HR-GAN (w/o Mask) + NAG was trained: yes, this dataset is the same as that we use in the training of NeuralSteiner, which is aware of overflow reduction and generated by expert routing tool CUGR. We will clarify this in the revised paper.

Comparison of CUGR + NeuralSteiner and original CUGR

Thank you for the suggestion. We will incorporate this comparison of CUGR + NeuralSteiner with the original CUGR, as showcased in the Rebuttal PDF, into the main paper to underscore the effectiveness of our approach.

Thank you once again for your insightful comments and suggestions. We will address each point in the final version of our paper to improve clarity and comprehensiveness.

Comment

Thank you for your detailed response. I have no further questions. I anticipate the release of the source code.

Comment

Thank you again for reviewing our manuscript and for your interest in our work. As mentioned, we will make the source code publicly available upon acceptance of the paper. We appreciate your anticipation of its release.

Official Review
Rating: 6

The paper presents NeuralSteiner, a method to improve chip-design routing by minimizing overflow. Using a neural network to predict Steiner points and a graph-based selection algorithm, the method ensures connectivity and reduces congestion. NeuralSteiner outperforms existing methods, achieving up to a 99.8% reduction in overflow with only a 1.8% increase in wirelength, and scales effectively to large nets without needing modifications for new designs.

Strengths

  1. This paper is well-written and the figures are very easy to understand.
  2. The experiments are very thorough, comparing the routing results of different routers and verifying the effectiveness of the method in optimizing overflow.
  3. The Overflow-avoiding RST Construction proposed by the author is novel and effective. Meanwhile, NeuralSteiner is the first learning-based approach capable of optimizing both wirelength and overflow.

Weaknesses

  1. Typo: line 282: “enven arger”->”even larger”
  2. It is recommended that the paper should be extended to a full 9 pages.
  3. According to the results, this method still faces challenges when optimizing the wire length and running time.

Questions

  1. In Table 2, do you compare the Boxrouter’s routing results?
  2. In Table S2, why is the correctness rate 100%? Is it due to the small problem size?

Limitations

Yes

Author Response

Thank you for your detailed review and thoughtful comments. Please note our global responses with additional experimental results. Below we will address specific questions.

W1: Thanks again for your meticulous review, and we will correct typos in the revised version.

W2: The paper should be extended to a full 9 pages.

Thank you for your comments and for affirming our method. We will expand the article based on the constructive comments in subsequent revisions.

W3: Challenges when optimizing the wire length and running time.

We understand the reviewer's concerns about NeuralSteiner's optimization of wirelength and its scalability to large-scale nets.

First, wirelength is also one of our important metrics. In the construction of NAG, wirelength is considered within the weight, which allows us to achieve minimal wirelength loss while significantly optimizing congestion.

We analyze the algorithmic complexity of the overflow-avoiding RST construction algorithm in Q3 of the global responses and use acceleration methods to further reduce it to $O(N_{pin}^3 \log N_{pin})$.

Additionally, according to our latest statistics in Table 3 of the rebuttal PDF, NeuralSteiner does not add an excessive number of extra candidate points to construct the NAG, meaning that the number of nodes remains at the scale of $O(N_{pin})$. Therefore, the efficiency of this method will be very high for the vast majority of nets, which have few pins. For nets with more pins, heuristic methods such as subtree division can be used to reduce the computation time.

Q1: In Table 2, do you compare the Boxrouter's routing results?

We understand the reviewer's concerns regarding the comparison of NeuralSteiner with traditional routers like Boxrouter. Boxrouter is not included in the ISPD07 results in Table 2 for the following reasons:

  1. Unlike ISPD98, the ISPD07 global routing competition considers not only the consumption of routing resources but also the consumption of resources by cross-layer vias (structures to connect segments in different layers, consuming routing resources). Boxrouter's reported wirelength results also include part of the via, making it numerically incomparable with neural network-based algorithm results.

  2. NeuralSteiner inherits the abstraction of the routing environment used in Hubrouter to consider the impact of routing segments on resources. However, this abstraction does not effectively model the effect of vias, making via optimization more challenging, which is a direction for future exploration and improvement.

Q2: Correctness rate is 100% in Table S2. Is it due to the small problem size?

Thank you very much for your meticulous reading. The correctness of the net reflects the proportion of successfully connected nets in the test set, considering only connectivity. This metric comes from Hubrouter. Our method achieves 100% net correctness because our NAG-based RST construction algorithm ensures net connectivity, irrespective of the net size. In Table S2, we mainly aim to demonstrate that our method ensures connectivity like Hubrouter. Other methods, such as PRNet, fail to ensure connectivity for smaller nets and are therefore not included in Table S2 for comparison.

Please kindly let us know if you have any follow-up questions or areas needing further clarification. Your insights are valuable to us, and we stand ready to provide any additional information that could be helpful.

Comment

Thank you very much for your detailed response. I hope the authors can release the code soon as promised.

Comment

Thank you again for reviewing our manuscript and for your interest in our work. As mentioned, we will make the source code publicly available upon acceptance of the paper. We appreciate your anticipation of its release.

Official Review
Rating: 6

This paper tackles the challenging task of global routing in VLSI design, with the aim of addressing the overflow limitation inherent in current learning-based methodologies. The authors introduce NeuralSteiner, a novel approach that builds upon a previous method known as HubRouter. Like HubRouter, NeuralSteiner divides the global routing process into two stages: the first predicts the locations of probable candidate points, and the second connects these points and pins with a focus on minimizing overflow. In the first stage, NeuralSteiner introduces a new concept – the Candidate point, and generates an expert routing dataset using CUGR and ISPD07. This dataset is then used to train a neural network for the point prediction task. In the second stage, NeuralSteiner employs a straightforward and heuristic construction method that avoids overflow while ensuring connectivity. The paper is well-structured, with clear motivations and easy-to-follow content. NeuralSteiner effectively addresses the significant overflow issue in learning-based global routing, making it applicable. Experimental results show that NeuralSteiner significantly reduces overflow, while maintaining comparable wirelength.

Strengths

  • NeuralSteiner is empirically proven to be effective in reducing overflow in the global routing task. Equipped with the CNN structure, NeuralSteiner is capable of recognizing the candidate points, and the RST construction further reduces overflow significantly.
  • A parallel technique is proposed to route the non-conflicting nets simultaneously, which can relieve the time-consuming plight.
  • The experiments show a significant improvement in overflow. In particular, for the case of newblue01, the wirelength and overflow both surpass most traditional global routing approaches.

Weaknesses

  • The paper lacks an analysis of time complexity in the construction stage, leaving uncertainty about the scalability of the second stage.
  • In the last paragraph of Section 3, a simple post-processing algorithm is proposed to shorten the wirelength, but it lacks detail and also misses the analysis of time complexity.
  • I note that the performance of wirelength and overflow in Table 2 is pretty good for the case ‘newblue3’. This could be potentially confusing as ‘newblue03’ is part of the training dataset.
  • Typo: Line 282, enven arger.
  • Suggestion: figure 2 could be moved to the next page to better correspond with Section 3.

Questions

  • Given that the training set is within HPWL <= 128 or the scale of 64x64, how are NeuralSteiner and HubRouter applied to ISPD07 cases with much larger scales?
  • Besides the case newblue03 which is already included in the training dataset, I note that the performance of wirelength and overflow in Table 2 is also pretty good for the case ‘newblue1’, outperforming most traditional approaches. However, the overflow for the case ‘ibm01’ is 2033 in Table 3, which is much higher than traditional methods. What are the inherent reasons for this? For instance, the overflow for ‘ibm01’ and ‘newblue1’ are 0 and 400 respectively, while the corresponding results for NeuralSteiner are 2033 and 5.
  • Based on the above question, I am curious about what the WL and overflow are for newblue01 and newblue03 when using CUGR, the approach used to generate the expert routing dataset.
  • Is the RST construction conducted by Python or C++?
  • Will the code be made available upon acceptance of the paper?

Limitations

Yes.

Author Response

Thank you for your detailed review and valuable feedback. Please note our global comment with additional experimental results. Below we will address specific questions.

W1&W2: Analysis of time complexity in the construction stage.

We understand the reviewer's concerns regarding NeuralSteiner's optimization of wirelength and algorithmic complexity. In Q3 of the global responses, we analyze the time complexity of the RST construction algorithm. The time complexity for constructing an overflow-avoiding RST is $O(N_{pin}^4 \log N_{pin})$, and we further reduce it to $O(N_{pin}^3 \log N_{pin})$ using limited-component-node acceleration.

Additionally, according to Table 3 in the rebuttal PDF, our NeuralSteiner method does not add an excessive number of extra candidate points to the nets. This means that the number of nodes in the NAG remains small for the majority of nets, which have few pins. Therefore, this method will be very efficient for the vast majority of nets, and for nets with more pins, heuristic methods such as subtree division can be used to reduce the computation time.

W3: The performance of wirelength and overflow in Table 2 is pretty good for the case 'newblue3'. This could be potentially confusing as 'newblue03' is part of the training dataset.

Thank you for pointing this out. We also realize that the use of names in ISPD07 datasets has caused some confusion. In fact, our datasets for testing and training do not overlap.

Our experimental setup follows the work of Hubrouter, which combines the ISPD07 and ISPD08 datasets into a single dataset and re-divides the training and test sets. Specifically, the 'newblue3_3d' (with 6 routing layers) nets from ISPD08 are routed using expert routers and included in the training set, while 'newblue3_2d' (with 2 routing layers) nets from ISPD07 are in the test set. The 'newblue3' presented in Table 2 is 'newblue3_2d' from ISPD07. We realize this is not clearly stated, and we appreciate the reviewer pointing it out. We will clarify this in the revised version by correcting it to 'newblue3_2d' from the ISPD07 dataset.

W4: Thanks again for your meticulous review, and we will correct typos in the revised version.

W5: Figure 2 could be moved to the next page to better correspond with Section 3.

Thank you very much for your suggestion. We will move Figure 2 to the relevant section in the revised version to improve readability.

Q1: How are NeuralSteiner and HubRouter applied to ISPD07 cases with much larger scales?

Both our NeuralSteiner method and the networks used in Hubrouter employ a CNN-based backbone without including structures like fully connected layers that require fixed input sizes. Therefore, the trained network weights can accept inputs of any spatial shape. However, there still exist generalization issues. To mitigate the generalization problems brought by large-scale nets (such as non-locality relations between different pins), we introduce the RCCA mechanism in Section 3.3 (lines 166-175) and App. B.1.

Q2: Different performance on ibm01 and newblue01.

Thank you very much for your detailed reading and questions. We have clarified the source of the newblue3_2d test set in response to W3.

Regarding the comparison between NeuralSteiner and Boxrouter on ibm01 and newblue01_2d, it is evident from Table 6 in App. D.1 of Hubrouter and Table S1 in App. A of our paper that ibm01 has fewer routing resources and more pins on average, while being much smaller than newblue01_2d. This poses a challenge for the NeuralSteiner method. However, Boxrouter's Robust Negotiation-Based A* Search allows nets to bypass resource-constrained areas over a larger range, which is reflected in Boxrouter's longer wirelength (62659).

In the case of newblue01_2d, due to the ISPD07 competition setup, Boxrouter's wire length and congestion are also influenced by vias (a cross-layer connection structure), making direct comparison with NeuralSteiner impractical, which is the reason why we do not include it in Table 2 for comparison.

Q3: Results of CUGR.

The ISPD07 data format does not match the input format required by CUGR. We input the net information obtained through FLUTE + edge shifting and the corresponding resource distribution into CUGR, invoking its rip-up and reroute algorithm to construct congestion-free nets. The wirelengths for newblue01_2d and newblue03_2d are 2,462,361 and 7,655,784, respectively, with congestion values of 0 and 29,683.

Q4: Is the RST construction conducted by Python or C++?

Our overflow-avoided RST construction algorithm is primarily implemented in C++.

Q5: Will the code be made available upon acceptance of the paper?

Thank you very much for your interest in our method. Yes, our code will be open-sourced.

Please kindly let us know if you have any follow-up questions or areas needing further clarification. Your insights are valuable to us, and we stand ready to provide any additional information that could be helpful.

Comment

Thanks for your response. I have read the rebuttal and other reviews. Currently I have no further question and keep the current positive rating. Wish you good luck.

Comment

Thank you again for reviewing our manuscript. Your constructive comments and insights are greatly helpful in improving our work.

Official Review
Rating: 5

This paper presents a learning-driven approach for overflow avoiding Steiner tree construction. The authors propose a two-stage framework that initially predicts the locations of potential Steiner points, followed by a post-processing algorithm that constructs the Steiner tree based on these predicted points. The effectiveness of the proposed method is then evaluated through a series of experiments.

Strengths

  1. The paper is well organized.
  2. Detailed background introduction is included.

Weaknesses

  1. The idea lacks novelty.
  2. Many important experimental settings are not disclosed.
  3. The experimental results are not convincing.
  4. The writing can be difficult to follow and occasionally confusing.

Questions

  1. The methodology presented in this work appears to align closely with prior works, such as Hubrouter, and employs trivial techniques. Could you please elaborate on the unique technical contributions of your study?

  2. The primary aim of this paper is to optimize overflow during the construction of the Steiner tree, a process that occurs prior to routing. However, there tends to be a significant difference between pre-routing and post-routing overflow. Additionally, most routing tools will also take overflow reduction into account. Hence, optimizing pre-routing overflow may not yield significant benefits. Could you please shed some light on this?

  3. In Section 3.4, could you clarify how the congestion value O(x,y) in Eq. (8) is calculated? From which stage are the O(x,y) values extracted? How are points that do not satisfy the two conditions handled? How are the values of the weights w_d and w_o in Eq. (8) determined?

  4. The paper lacks clarity regarding the evaluation metrics used. Could you specify the stage from which the overflow values are extracted, for instance, is it immediately after RST construction or after global placement? Could you also share the tool used for global routing during testing and whether it considers congestion?

  5. It would be beneficial if the authors could provide a comprehensive description of the test flow, including the source of the inputs, how the proposed framework integrates with the routing tool, and the stage from which the evaluation metrics are extracted.

  6. The results in Table 3 seem weird. The data suggests that the overflow loss reduces both overflow and wirelength, which is unusual as these two metrics typically involve a trade-off. If the overflow loss is designed to reduce overflow, it should result in an increase in wirelength. Could you provide some clarification on this?

  7. I'm interested to know the number of Steiner points inserted by the proposed method. This is crucial as simple methods can also reduce overflow by adding more Steiner points to circumvent congested regions. However, an increase in Steiner points could significantly limit the optimization space of the subsequent routing process. Could you please include the number of Steiner points in the results tables?

  8. To enhance reader comprehension, could you provide an example of an actual solution?

Limitations

Listed in questions.

Author Response

Thank you for your detailed review and thoughtful comments. Please note our global comment with additional experimental results. Below we will address specific questions.

W1/Q1: Novelty and contribution of the method

Thank you for your valuable comments. Here, we further elaborate on the novelty and contribution of our method in comparison to the previous HubRouter approach, highlighting the differences and innovations in motivation, workflow, and neural network architecture:

  1. The motivation of NeuralSteiner is to construct overflow-avoiding RSTs based on the deep learning method, which is crucial for practical global routing applications, rather than merely ensuring connectivity and optimizing wirelength.

  2. We utilize a ResNet-based backbone network to learn the encoding of the overflow and pin map (lines 160-164). Additionally, we introduce the RCCA mechanism to address long-range association issues in large-scale nets (lines 165-175) and the cost loss to encourage overflow-avoiding candidate points generation.

  3. Compared with other state-of-the-art learning-based methods, NeuralSteiner achieves a significant reduction in congestion with only a minimal sacrifice in the wirelength quality of RSTs, fulfilling the motivation of our method.

  4. Further experiments on ispd18/19 indicate that NeuralSteiner, when combined with traditional routing tools, can effectively reduce the number of violations after detailed routing.

W2/Q4/Q5: Experimental settings, evaluation metrics and test flow.

We realize the importance of clarifying these issues. Please refer to Q2 in the global responses for a detailed discussion of the different experimental setups, the metric extraction method, and the different test flows integrated with the abstract resource model or routing tools.

W3: The convincingness of experimental results.

Since our environment resource modeling and dataset division are the same as those in previous works, we can make a fair comparison between the NeuralSteiner and other neural network-based RST generation methods on the same routing test set.

Tables 1, 2, and Figure 3 in the main paper compare the RST generation results of different methods on ISPD98 and ISPD07. NeuralSteiner achieves a significant reduction in congestion while maintaining wirelength quality, and it also demonstrates higher solving efficiency. Tables 3 and 4 explore the roles of our different design modules from various perspectives.

Experimental results of integrating our method with the routing tool CUGR for global routing are shown in Table 2 in the rebuttal PDF, which indicates the potential of the NeuralSteiner method to reduce overflow when combined with routing tools.

W4: Thanks again for your meticulous review and we will correct the confusing parts of the writing in the revised version.

Q2: The necessity of optimizing pre-routing overflow.

As you mentioned, the relationship between pre-routing congestion and post-routing congestion is quite complex. However, we still believe that optimizing pre-routing congestion is of significant importance:

  1. Many global routers use rip-up and reroute to eliminate the overflow caused by initially generated RSTs without considering congestion. Introducing congestion mitigation (such as NeuralSteiner) at the RST generation stage can provide a resource-friendly initial solution for subsequent stages.

  2. Table 2 in the rebuttal PDF shows that by integrating NeuralSteiner into CUGR, we achieve a 4.4% and 19.1% reduction in shorts and spaces, respectively, in the detailed routing results, with minimal losses in wirelength and vias. This demonstrates that our pre-routing overflow mitigation method is beneficial for reducing overflow in the post-routing results.

  3. The latest routing tool, DGR [1], introduces concurrent optimization techniques in the RST generation field, optimizing wirelength, vias, and congestion simultaneously. This also leads to a significant reduction in DRVs in the detailed routing results, further supporting our opinion.

Q3: Questions about Eq.8, overflow extraction, and edge conditions.

Please refer to Q4 in the global responses for a detailed clarification of $O(x,y)$ and the parameters $w_d$ and $w_o$.

As shown in Figure 1(a)-(d) on page 2, the overflow map of the net with black pins is extracted in real time from the environmental model.

There will be no edges between points that do not meet both conditions. If this results in a disconnected NAG, we will add turning points and edges between the disconnected components according to the 2 conditions.

Q6: The role of cost loss.

Thank you very much for pointing out this informative detail. In our loss setting, we combine focal, dice, and cost loss. The first two primarily learn candidate points from expert routing data. The inclusion of cost loss introduces candidate points not present in the expert data, thereby altering the topology of the final RST generated.

Additionally, our post-processing method constructs RSTs using a greedy algorithm, which does not guarantee the shortest wirelength. Including cost loss provides a larger solution space for RST construction, enabling optimization of both wirelength and congestion.

Q7: Include the number of Steiner points in the results tables.

Thank you very much for your important suggestion. In Table 3 of the rebuttal PDF, we have summarized the predicted candidate points for ISPD07. The average number of candidate points added by NeuralSteiner is not significantly larger than the average number of pins, which means that for the vast majority of nets, the number of nodes in the NAG remains small.

Q8: Could you provide an example of an actual solution?

Thank you for your suggestion. We have added a comparison of the actual solutions between NeuralSteiner and Hubrouter in the rebuttal PDF.

Reference:

[1] DGR: Differentiable Global Route, DAC 2024

Comment

I appreciate the authors' effort in answering my questions and addressing my concerns. Therefore, I would like to change the score from 4 to 5.

Comment

Thank you very much for your positive feedback and for taking the time to review our responses. We greatly appreciate your willingness to reconsider the score and are glad that our explanations addressed your concerns. Your constructive comments and insights are greatly helpful in improving our work.

Author Response

Dear Area Chairs and Reviewers,

We would like to thank all reviewers for providing constructive feedback that helped us improve the paper. We are encouraged that the reviewers acknowledge the novelty (4X1w, 1MAW) and effectiveness (4X1w, 1eER, 1MAW) of our approach and the thoroughness of our experiments (4X1w, 1eER), and are interested in the comparison and integration with traditional routers (JcjD, 1eER). We particularly appreciate the recognition of our significant improvements in overflow over state-of-the-art learning-based methods (4X1w, 1eER, 1MAW), the affirmation of our method's generalization capabilities (1MAW), and the acknowledgment of our method's contributions to the field (4X1w, 1eER).

Beyond the positive feedback, some concerns from the reviewers are common, so we give the following global responses:

Please see the attached one-page PDF for a summary of the added experimental results.

Q1: Why not adopt an end-to-end model?

We understand the reviewers' concerns regarding NeuralSteiner not being an end-to-end approach. This is determined by the complexity of the global routing problem, which requires learning-based methods to 1) learn a representation of routing resources, and 2) generate routing results that ensure connectivity of all pins while minimizing wirelength and overflow. NeuralSteiner achieves the first goal by learning the expert candidate Steiner points and corner points. However, processing these candidate points as discrete values for subsequent RST construction makes differentiable learning challenging.

Our NeuralSteiner method significantly outperforms other end-to-end methods such as PRNet [1] and two-stage methods like HubRouter [2] in terms of total time cost and overflow. In the future, we will explore using continuous probabilistic candidate point maps and investigate end-to-end learning with neural networks for generating overflow-avoiding RSTs.

Q2: Experimental setup and comparison with routing tools.

First, our environment modeling of routing resources is consistent with HubRouter [2]. Metal layers are projected onto a 2-dimensional grid graph with one horizontal and one vertical layer. NeuralSteiner interacts with this resource model, extracting the overflow map in real time and updating the resources of the corresponding edges in the model based on the routing results (Figure 2).

Second, NeuralSteiner extracts the metrics by calculating the total wirelength and overflow in the environmental model after routing all nets. This metric extraction method is also the same as that in previous work [2].

Third, we do not integrate our method with routing tools in the main experiments of the paper because those experiments mainly evaluate the above metrics of the RSTs generated by different learning-based methods.

Finally, the comparative experimental results of integrating our method with the routing tool CUGR for global routing are shown in Table 2 in the rebuttal PDF. In these experiments, the RSTs generated by NeuralSteiner replace the RSTs generated by FLUTE + edge shifting in CUGR, and subsequent detailed routing is performed using DRCU. The results on larger ISPD18/19 benchmarks confirm the potential of the NeuralSteiner method to reduce overflow when applied to routing tools.
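The grid resource model and metric extraction described in the first two points above can be sketched as follows. This is a minimal illustration, assuming a uniform per-edge capacity; the class and method names are hypothetical, not the paper's actual implementation.

```python
import numpy as np

class GridEnv:
    """Minimal 2D routing-resource model: metal layers projected onto
    one horizontal and one vertical edge layer (HubRouter-style)."""

    def __init__(self, h, w, cap=10):
        # demand counters for each grid edge; cap is an assumed
        # uniform capacity shared by all edges
        self.h_demand = np.zeros((h, w - 1), dtype=int)  # horizontal edges
        self.v_demand = np.zeros((h - 1, w), dtype=int)  # vertical edges
        self.cap = cap

    def route_h(self, r, c):
        # occupy the horizontal edge between (r, c) and (r, c + 1)
        self.h_demand[r, c] += 1

    def route_v(self, r, c):
        # occupy the vertical edge between (r, c) and (r + 1, c)
        self.v_demand[r, c] += 1

    def overflow_map(self):
        # per-edge overflow = max(0, demand - capacity)
        oh = np.maximum(self.h_demand - self.cap, 0)
        ov = np.maximum(self.v_demand - self.cap, 0)
        return oh, ov

    def metrics(self):
        # total wirelength = occupied edge count; total overflow summed
        oh, ov = self.overflow_map()
        wl = int(self.h_demand.sum() + self.v_demand.sum())
        return wl, int(oh.sum() + ov.sum())
```

After all nets are routed through such a model, `metrics()` yields exactly the two quantities reported in the experiments: total wirelength and total overflow.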

Q3: Time complexity and scalability of the RST construction algorithm.

The distance between two connected components is the length (i.e., the sum of edge weights) of the shortest path between the closest pair of points, one from each component. We use Dijkstra's algorithm to compute the shortest path between every pair of points. Since the NAG is a sparse graph whose numbers of edges and nodes are both $O(N_{pin})$ (see Table 3 in the rebuttal PDF), Dijkstra's algorithm with a binary heap runs in $O(N_{pin} \log N_{pin})$ time, and computing the shortest paths from all points in a connected component takes $O(N_{pin}^2 \log N_{pin})$ in total. In each iteration, we connect the two components with the shortest distance and update the distance matrix; the procedure ends when a single connected component contains all pins. So, in the worst case, the total time complexity of the RST construction algorithm is $O(N_{pin}^4 \log N_{pin})$.

Table 3 in the rebuttal PDF shows that the pin count of most nets in the dataset is no more than 10, indicating that the algorithm is efficient in practice.

Additionally, the algorithm can be accelerated by simple heuristic rules. When calculating the distance from one connected component to another, we compute and filter out a fixed number of node pairs with the shortest Euclidean distance, performing Dijkstra's algorithm only on these limited node pairs. This reduces the complexity of the RST construction algorithm to $O(N_{pin}^3 \log N_{pin})$. As shown in Table 1 in the rebuttal PDF, the accelerated algorithm significantly improves solving efficiency while achieving similar wirelength and overflow compared to the original version.
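The component-merging procedure described above can be sketched on a generic weighted graph as follows. This is a simplified illustration under assumptions: it omits the Euclidean-filtering speedup, does not absorb intermediate path nodes into the merged component (so path sharing is not modeled), and returns only the total tree cost; the function names are hypothetical.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, weight), ...]}; returns shortest distances
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def greedy_rst_cost(adj, pins):
    # each pin starts as its own component; repeatedly merge the two
    # components joined by the cheapest shortest path until one remains
    comps = [{p} for p in pins]
    total = 0.0
    while len(comps) > 1:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                for s in comps[i]:
                    dist = dijkstra(adj, s)
                    d = min(dist.get(t, float("inf")) for t in comps[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
        d, i, j = best
        total += d
        comps[i] |= comps[j]  # merge the closest pair of components
        comps.pop(j)
    return total
```

On an edge-weighted NAG, repeatedly picking the cheapest component-to-component shortest path is what allows the tree to route around edges whose weights are inflated by overflow.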

Q4: Parameters used in NAG's edge weight.

We realize that the use of $O(x,y)$ in Eq. 8 may be misleading. It is identical in meaning to $o_{ij}$ in Eq. 6 in Section 3.3, representing the value at the corresponding position on the overflow map. Eq. 8 defines the weights of edges in the NAG. These weights account for both the wirelength and the congestion that an edge traverses, with parameters $w_d$ and $w_o$ reflecting the trade-off between them. In all experiments, we use $w_d = 1$ and $w_o = 2$. We will state this explicitly in Section 3.4 in the final version.
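The edge-weight trade-off can be sketched as follows, with $w_d = 1$ and $w_o = 2$ as stated above. This is an illustrative reading of Eq. 8, not its exact form: summing the overflow values of the cells the edge crosses, with endpoint-inclusive indexing, is an assumption.

```python
def nag_edge_weight(p, q, overflow_map, w_d=1.0, w_o=2.0):
    """Weight of an axis-aligned NAG edge between grid points p and q:
    w_d * edge length + w_o * overflow crossed (illustrative sketch)."""
    (x1, y1), (x2, y2) = p, q
    assert x1 == x2 or y1 == y2, "NAG edges are axis-aligned"
    if y1 == y2:  # horizontal edge along row y1
        lo, hi = sorted((x1, x2))
        overflow = sum(overflow_map[y1][x] for x in range(lo, hi + 1))
    else:         # vertical edge along column x1
        lo, hi = sorted((y1, y2))
        overflow = sum(overflow_map[y][x1] for y in range(lo, hi + 1))
    return w_d * (hi - lo) + w_o * overflow
```

With $w_o > w_d$, a detour of a few extra grid cells becomes cheaper than crossing even one overflowed cell, which is the mechanism steering RSTs away from congestion.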

Below, we provide detailed answers to the individual reviews. We are glad to give further responses to support an informed evaluation.

Reference:

[1] The policy-gradient placement and generative routing neural networks for chip design, NeurIPS 2022.

[2] HubRouter: Learning Global Routing via Hub Generation and Pin-hub Connection, NeurIPS 2023.

Final Decision

The paper has been accepted due to its significant contributions to the field of global routing in VLSI design. The proposed method, NeuralSteiner, offers innovative techniques for addressing overflow problems and optimizing wirelength, which are well demonstrated through rigorous experiments and detailed comparisons with existing methods. The responses to the reviewer questions have clarified the concerns effectively, reinforcing the paper's value and novelty. The decision to accept is supported by the reviewers' appreciation of the method's generalization capabilities and its potential impact on practical applications in chip design. The provision of the source code upon acceptance promises further validation and replication of the results, enhancing the paper's reproducibility and transparency.