Invited: Acceleration of Physical Design: Machine Learning-based Routability Optimization

Design rule violations (DRVs) are one of the most significant challenges in designing integrated circuits. To successfully manufacture a chip, it is crucial to create a DRV-clean layout. However, as technology nodes shrink and the cell density of designs increases, design rules have become increasingly difficult to meet, making routing more complex. In addition, the conventional design flow suffers from determining design parameters and tool options early while evaluating routability only at the end of the flow. Furthermore, due to complex design rules, even global routers are not accurate enough, so routability can only be assessed after actual routing, leading to significantly extended design turn-around times. In this paper, we introduce a framework that leverages machine learning techniques to overcome the limitations of the conventional design flow. We also present the challenges that arise during the construction of the framework, along with related research, and discuss issues that remain unresolved.


INTRODUCTION
With the advancement of technology nodes, feature sizes keep downscaling while fundamental physical and circuit limitations such as lithographic patterning, reliability, and crosstalk remain [1]. As the feature size shrinks, cell density increases and design rules become more complex [2], which has led to a significant increase in design complexity [3]. Moreover, this has resulted in miscorrelation between routing congestion and design rule violations (DRVs), leading to an exponential increase in routing execution time and, ultimately, routing failures. As a result, in terms of design efficiency, product benefits have decreased and the design cost at advanced nodes has begun to rise rapidly, making design cost the most significant threat to the semiconductor roadmap [4].
At the initial design stages, designers need to configure various design parameters (e.g., layout utilization, the number of metal layers in the back-end-of-line (BEOL) stack, aspect ratio, and clock period) and tool options. To achieve chip tape-out, designers must obtain a solution without DRVs. When many DRVs cannot be fixed, designers go back to the initial design stages to modify design parameters and tool options, causing additional design iterations [5]. However, the time complexity of routing is extremely high, and the earlier design stages influence routability significantly [6]. Furthermore, since the quality of a layout can be evaluated only at the end of the place-and-route (P&R) process, determining the best combination of design parameters is a time-consuming task [7]. Therefore, to reduce the overall chip design turnaround time (TAT) and design cost, a method that accurately predicts and optimizes routability at the early stages of the design flow is critically needed [8] (Fig. 1).
Routability is influenced by numerous factors such as design parameters [5], tool options [9], the circuit itself [10], and front-end-of-line (FEOL) technology and design enablement options [11]. Therefore, modeling routability can be defined as establishing a functional relationship between routability and these factors. However, it is challenging to assess the effects of the various routability factors due to their intricate interconnections, and no precise model is yet available to capture these relationships accurately. Moreover, the noise within P&R tools [12,13] further complicates analytical modeling approaches.
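In schematic form (our own notation, not taken from any of the cited references), the modeling task can be written as

```latex
\hat{y} = f(\mathbf{x}_{\mathrm{param}},\ \mathbf{x}_{\mathrm{tool}},\ \mathbf{x}_{\mathrm{circuit}},\ \mathbf{x}_{\mathrm{tech}}) + \varepsilon
```

where \hat{y} is a routability metric such as the number of DRVs, the \mathbf{x} vectors collect the factor groups listed above, and \varepsilon absorbs the non-deterministic noise of P&R tools. ML approaches approximate f from data rather than deriving it analytically.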
Numerous machine learning (ML) models used for early prediction [14], black-box optimization [7,9], and path-finding [15] in the physical design flow have demonstrated considerable potential compared with conventional design flows. Routability research, as an early prediction technique, is also adopting ML due to the high complexity of the modeling problem. ML models can identify the connections between high-dimensional input data and output data through complex networks. ML can learn trends and patterns from large amounts of data that humans might overlook, which makes it better suited for constructing a complex black-box function than analytical modeling approaches. Due to these advantages, ML models are being applied in routability research. However, a few challenges remain in establishing a design flow using ML. In this paper, we describe the challenges that arise during the construction of the design flow using ML, along with related works that address these challenges. Fig. 2 illustrates the outline of the paper. We present the data generation method in Section 2, ML-based routability prediction methods in Section 3, and routability optimization based on ML prediction in Section 4. Each of these studies contributes to the construction of the framework in [16], and in Section 5 we discuss further challenges raised in [16].

DATA GENERATION
ML involves learning high-dimensional models from data. If diverse data related to the prediction target are not available, ML models cannot guarantee accurate predictions for unseen data. However, obtaining real-world data can be expensive and poses security risks. Therefore, to overcome the shortcomings of real-world data, synthetic data have been widely used in ML research [17]. Synthetic data created by computer programs have the advantage of simulating complex situations, such as corner cases that are hard to acquire from real-world data, to train ML models. Additionally, employing a variety of synthetic data can enhance the performance of ML algorithms and mitigate problems such as data bias [18].
In the EDA field, the demand for large datasets for ML research has been rising. However, the scarcity of diverse circuit data poses a significant obstacle to the advancement of ML research. Data augmentation through modifications of placement settings or design parameters [5] can aid in training a model to recognize patterns in the design quality of circuits. However, such datasets cannot guarantee accurate predictions for unseen circuits [19]. In the P&R flow, the coverage of the data varies depending on the diversity of the benchmarks used for training [20]. However, constructing big data from real-world intellectual property (IP) cores is often impractical due to patents and copyrights.
To overcome the lack of circuit data, generating artificial circuits can be a solution [3], and several circuit generation methods have been proposed for various purposes. Darnauer and Dai [21] propose a method to generate random benchmark circuits based on Rent's rule for studying routability, but it does not address issues related to delay, fanout, or sequential correctness. Hutton et al. [22,23] generate combinational netlists with given properties such as size, delay, physical shape, edge-length distribution, and fanout distribution. Kahng and Kang [24] provide benchmarks to evaluate the sub-optimality of a gate-sizing algorithm using inputs such as fanin, fanout distribution, target clock, and datapath depth. However, these studies focus on generating data for specific applications, which can result in unrealistic topologies that are not typically seen in real circuits.
Artificial circuits can be generated in two ways: 1) circuit augmentation and 2) circuit exploration (Fig. 3). Data augmentation involves modifying the original data to create new data, allowing ML models to tolerate variations in input [25] and increasing the amount of data [26]. Circuit augmentation can fill the missing points of the circuit space by extracting the topological parameters of real-world circuits and embedding them into the circuit space; it is conducted by perturbing the embedded real-world circuit points and randomly sampling new points. Circuit exploration is the process of examining and understanding data using statistics [27] and can be defined as a method for sampling artificial circuits when no reference circuits exist.

Figure 3: The sampling methods for synthetic circuits [20]
The artificial netlist generator (ANG) [20], which employs both circuit augmentation and exploration, takes topological parameters such as the number of instances, net degree, and average size of the net bounding box as inputs, enabling data exploration by sampling within bounded circuit parameters. ANG can generate circuit data using both methods and demonstrates improved coverage of input features related to circuit characteristics compared to using only real-world circuits. Additionally, using t-distributed stochastic neighbor embedding (t-SNE) [28], it shows broader coverage of the embedding space than previous work [24] with the same number of samples (Fig. 4). When the artificial circuits are used to train a routability prediction model, the Pearson correlation coefficient [29] on log(num_drvs) increases by 0.143 and the F1 score for routing-failure classification improves by 54.5%, compared to using only real-world datasets [20].
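The two sampling modes can be illustrated with a minimal Python sketch; the parameter names, bounds, and noise model below are our own assumptions for illustration, not ANG's actual implementation.

```python
# Hedged sketch of the two sampling modes in Fig. 3 (not ANG's actual code).
# Parameter names and ranges are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Bounded ranges of topological parameters, as in circuit exploration.
PARAM_BOUNDS = {
    "num_instances":  (10_000, 500_000),
    "avg_net_degree": (2.0, 5.0),
    "avg_net_bbox":   (1.0, 20.0),   # in placement-site units (assumed)
}

def explore_sample():
    """Circuit exploration: sample uniformly inside the bounded circuit space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_BOUNDS.items()}

def augment_sample(real_params, rel_noise=0.1):
    """Circuit augmentation: perturb parameters extracted from a real circuit."""
    return {k: v * (1.0 + rng.normal(0.0, rel_noise)) for k, v in real_params.items()}

# Each sampled parameter vector would then drive the netlist generator
# to emit an artificial netlist with matching statistics.
spec = explore_sample()
```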
Figure 4: t-SNE of data distributions for test (green), reference [24] (red), and artificial (blue) designs [20]

ROUTABILITY PREDICTION
Recent studies on routability prediction have focused on utilizing ML techniques to improve prediction performance compared with analytical congestion analysis models [30,31]. Routability can be predicted through two main quantities: (1) the total number of DRVs, and (2) the spatial distribution of DRVs.
The total number of DRVs serves as an indicator of how feasible routing is for a given placement solution. It helps identify cases where routing may fail, allowing quicker feedback and reducing overall design TAT by terminating routing early. One study introduces a support vector machine (SVM)-based binary classification model to predict whether the number of DRVs exceeds a specific threshold after the post-route stage [5]. Another study proposes an ML framework capable of identifying both routing and timing failure cases based on circuit- and layout-related features [32]. In yet another study, a convolutional neural network (CNN) model is employed to predict the number of DRVs [33].
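A minimal sketch of such an SVM-based routing-failure classifier is shown below, using scikit-learn; the feature set, threshold, and data are placeholders, not those of [5].

```python
# Hedged sketch of an SVM routing-failure classifier in the spirit of [5];
# features, labels, and the DRV threshold are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: per-design features available before routing (e.g., utilization,
# pin density, congestion estimates); y: 1 if the post-route DRV count
# exceeded a chosen threshold, else 0.
X = np.random.rand(200, 6)                    # placeholder training features
y = (np.random.rand(200) > 0.8).astype(int)   # placeholder failure labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
clf.fit(X, y)

# At the pre-route stage, flag likely routing failures for early feedback.
risk = clf.predict(X[:5])
```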
In addition to predicting routing failure, applying routability optimization algorithms requires predicting the locations of DRV hotspots, and several methods have been proposed for this purpose. Tabrizi et al. [34] propose a diverse feature set that can significantly contribute to DRVs and a deep neural network (DNN) model to predict short violations. To consider not only short violations but also various types of DRVs, Liang et al. propose the J-Net architecture [35], which accounts for pin accessibility by taking both high-resolution pin-shape patterns and low-resolution layout features, as well as the LGC-Net architecture [36] to cope with non-determinism in DRV prediction [12,13]. To overcome the increased computation time caused by pin images, Baek et al. [37] combine a graph neural network (GNN) with a U-Net architecture to capture the impact of both pin accessibility and routing congestion. These DRV hotspot predictors divide the entire layout into tiles and predict the presence of DRVs in each tile.
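A minimal tile-wise hotspot predictor might look like the following PyTorch sketch; it is our own illustration of the tile-grid formulation, not any of the cited architectures (J-Net, LGC-Net, or the GNN/U-Net hybrid).

```python
# Minimal tile-wise hotspot predictor sketch (our illustration only).
# Input: per-tile feature maps (e.g., pin density, congestion estimates);
# output: a hotspot probability for each tile of the layout grid.
import torch
import torch.nn as nn

class TileHotspotNet(nn.Module):
    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),   # one logit per tile
        )

    def forward(self, x):           # x: (batch, in_ch, H, W) tile grid
        return torch.sigmoid(self.net(x))

model = TileHotspotNet()
layout = torch.rand(1, 4, 64, 64)   # placeholder feature maps
hotspot_prob = model(layout)        # (1, 1, 64, 64) per-tile probabilities
```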
While violations tend to concentrate in regions with poor routability, prior works use binary labeling that categorizes tiles as either hotspots or non-hotspots, without considering the actual number of DRVs within each GCell. Such binary classification is limited because it cannot provide a continuous assessment of DRVs; to optimize routability more precisely, both hotspot detection and diagnosis are crucial. Another challenge is data imbalance: real-world data do not follow a uniform distribution but are skewed toward specific observed values. Since most ML models are trained on balanced datasets, data imbalance can bias training toward the majority class. In physical design, hotspots are far fewer than non-hotspots [35], because hotspots occur only when the router fails to make corrections.
Figure 5: Overall framework of HCR model [38]

To address this issue, methods are needed that predict routability as a continuous value and improve prediction accuracy for minority samples. Therefore, Kim et al. propose the deep hierarchical classification and regression (HCR) model [38] (Fig. 5). They define routability prediction as a tile-wise regression problem: the HCR model first predicts DRV hotspots and then the number of DRVs within those hotspots. This approach not only reduces the regression error caused by data imbalance but also provides more refined information about DRV hotspots. Furthermore, by employing a Bayesian optimization algorithm to explore modeling parameters for the classification model, the performance deviation of the model across data configurations and hyperparameters (Fig. 6) can be minimized. As a result, the model achieves an R2 score of 0.71 in regression and a 94% improvement in the F1 score for hotspot prediction compared with previous work [33].
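The classify-then-regress hierarchy can be illustrated with a small sketch; the HCR model itself is a deep network with Bayesian-optimized hyperparameters, whereas the random-forest stages and synthetic data below are placeholder assumptions that only demonstrate the two-stage idea.

```python
# Hedged sketch of the hierarchical classify-then-regress idea in HCR [38];
# the actual model is a deep network, so this only shows the two stages.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

X = np.random.rand(5000, 8)            # placeholder per-tile features
n_drv = np.random.poisson(0.2, 5000)   # placeholder per-tile DRV counts
is_hotspot = (n_drv > 0).astype(int)

# Stage 1: classify tiles as hotspot / non-hotspot.
clf = RandomForestClassifier(class_weight="balanced").fit(X, is_hotspot)

# Stage 2: regress the DRV count, trained only on hotspot tiles so the
# regressor is not dominated by the zero-DRV majority class.
hot = is_hotspot == 1
reg = RandomForestRegressor().fit(X[hot], n_drv[hot])

# Inference: predicted count is zero unless stage 1 flags a hotspot.
pred = np.where(clf.predict(X) == 1, reg.predict(X), 0.0)
```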
Figure 6: Training method for HCR model [38]

ROUTABILITY OPTIMIZATION
ML is effective at handling complex, high-dimensional data. Its application has significantly enhanced prediction accuracy, allowing routability issues to be identified at specific locations without performing actual routing. As a result, research on routability optimization in the pre-route stage based on ML prediction has been actively conducted. Chan et al. [39] are the first to optimize routability using an ML-based prediction model, applying white-space redistribution to predicted DRV hotspots. Additionally, placement blockage insertion has been developed to address predicted short violations based on pin patterns [40]. Furthermore, Yu et al. [41] tune global routing cost parameters to guide a router based on predicted post-route congestion.
Although the use of ML for optimization has improved routing quality, its applicability remains limited. While white-space redistribution can help alleviate local congestion, imprudent cell movements can introduce additional DRVs in neighboring regions [40]. Moreover, [40] and [41] primarily focus on predicting M2 short violations and post-route congestion, respectively, so they may not be effective against various types of DRVs. To effectively address the various causes of DRVs, we need solutions that not only interpret the predicted information but also guide the optimization methods. However, prediction alone provides no guidance on how to prevent DRV hotspots. The lack of transparency of ML models limits the full utilization of their predictive capabilities, and this black-box nature hinders their integration into real-world industrial design flows. While commercial P&R tools offer a range of options to enhance routability, model predictions do not offer insight into which tool options to use, making informed decisions difficult.
Figure 7: The gap between prediction and optimization [16]

To apply prediction models in the actual design flow, a solution is needed to interpret the predicted information and use it as a basis for optimizing the design. Explainable artificial intelligence (XAI) has been proposed to overcome the black-box nature of ML models and interpret their predictions. Using XAI enhances the interpretability of a model, allowing us to understand how it arrived at its predictions. Through this approach, we can track the contribution of input features to routability and gain insight into how to improve it. Thus, XAI can bridge the gap between the prediction task and the optimization task (Fig. 7).
To mitigate DRVs based on the reasoning behind predicted DRV hotspots (Fig. 8), Park et al. [16] propose a framework that leverages XAI. First, the prediction model is trained on a substantial amount of data generated by ANG and predicts DRV hotspots at the trial-route stage. Then DeepSHAP [42], a recent XAI technique, calculates the contribution of each feature, revealing the most important features and their contributions to the predicted DRV hotspots. Afterward, based on these interpretable predictions, an optimization method is selected and integrated into a commercial P&R tool. As a result, the framework reduces the number of DRVs by 42% on average compared with previous work [39].
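A minimal sketch of such feature attribution with the `shap` package is shown below; the toy model, features, and background set are placeholders rather than the framework of [16].

```python
# Hedged sketch of feature attribution with DeepSHAP [42] via the `shap`
# package; the toy model and data are placeholders, not the flow of [16].
import torch
import torch.nn as nn
import shap

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

background = torch.rand(10, 8)   # reference inputs (see CHALLENGES AHEAD)
tiles = torch.rand(4, 8)         # per-tile features to explain

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(tiles)
# Each entry gives the contribution of one feature to one tile's prediction;
# large positive values point to features driving a predicted hotspot,
# which then guides the choice of optimization method in the P&R tool.
```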

CHALLENGES AHEAD
Among XAI techniques, DeepSHAP [42] is a method that efficiently calculates Shapley values [43], representing the contribution of each input feature to the prediction result, by adopting the DeepLIFT algorithm [44]. DeepSHAP calculates the contribution of each feature by taking the difference between the prediction result when the feature is absent and the average prediction result, using a reference input.
However, DeepSHAP has a drawback: the reference input significantly influences its explanation results. In image classification, where XAI techniques are actively applied [45,46], plain black images are commonly used as reference inputs [47]. For circuit data, however, there is no established method for selecting an appropriate reference input. In [16], to mitigate the dependency on the reference input, 10 random reference inputs are selected from the training dataset and the contribution scores are averaged; this choice can itself affect optimization performance. Therefore, further research is needed on choosing appropriate reference inputs that yield clearer and more effective explanations of model predictions.
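The averaging workaround, as we read it from [16], can be sketched as follows; the toy model and data are placeholders, and the single-reference loop is our own illustration.

```python
# Hedged sketch of reference averaging as described in [16] (as we read it):
# attribute against several random references and average the scores.
import numpy as np
import torch
import torch.nn as nn
import shap

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
train_set = torch.rand(100, 8)   # placeholder training inputs
x = torch.rand(1, 8)             # the tile features to explain

rng = np.random.default_rng(0)
idx = rng.choice(len(train_set), size=10, replace=False)

# One single-reference explanation per sampled reference, then average.
scores = []
for i in idx:
    ref = train_set[int(i):int(i) + 1]
    sv = shap.DeepExplainer(model, ref).shap_values(x)
    scores.append(np.asarray(sv))
avg_contrib = np.mean(scores, axis=0)   # reference-averaged contributions
```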
In addition, the interpretation of the model's predictions can also help exploit the various routability optimization options provided by P&R tools [48,49]. P&R tools offer various tool parameters, and configuring them typically relies on the designer's domain knowledge. By using XAI to identify which features are problematic, one can determine the necessary tool parameters, thus streamlining design-space exploration. Nevertheless, the effect of combinations of tool parameters is difficult to predict intuitively, and experiments are time-consuming and lack robustness. Moreover, since the tools' internal workings remain a black box, understanding the causality of optimization results is challenging.
Lastly, routability prediction using ML requires a significant amount of training data, which is time-consuming to generate and requires numerous P&R tool licenses. Moreover, the data are suitable only for the corresponding technology node and fit others poorly, because design rules and circuit characteristics vary across nodes. Consequently, when the technology node changes, prediction performance is not guaranteed. Applying an existing ML framework to a new node requires generating data and retraining models, and this process can become a roadblock for an ML-integrated framework. To improve the efficiency of training prediction models, leveraging prior knowledge can be a solution. Therefore, research is needed on methodologies for transferring models trained on an existing technology node to a new target node, followed by additional validation to ensure that existing optimization methodologies work robustly even when the technology changes.
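One plausible starting point is standard fine-tuning, sketched below under our own assumptions (the layer split, hyperparameters, and checkpoint path are illustrative): freeze the feature extractor trained on the old node and retrain only the prediction head on a small dataset from the new node.

```python
# Hedged sketch of cross-node transfer by fine-tuning; architecture and
# hyperparameters are illustrative assumptions, not a published recipe.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 1)
model = nn.Sequential(backbone, head)
# model.load_state_dict(torch.load("old_node_model.pt"))  # hypothetical checkpoint

for p in backbone.parameters():      # keep features learned on the old node
    p.requires_grad = False

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X_new = torch.rand(256, 8)           # small dataset from the new tech node
y_new = torch.rand(256, 1)           # e.g., per-tile DRV counts (placeholder)

for _ in range(100):                 # brief fine-tuning of the head only
    opt.zero_grad()
    loss = loss_fn(model(X_new), y_new)
    loss.backward()
    opt.step()
```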
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant (2022M3H4A1A04096496) funded by the Ministry of Science and ICT, Korea.

Figure 1: Routability prediction to reduce design turnaround time

Figure 2: Outline of this paper

Figure 8: The overall framework of routability optimization