Gear-Ratio-Aware Standard Cell Layout Framework for DTCO Exploration

With advances in lithography technology, the minimum metal pitch (MP) becomes smaller than the contacted poly pitch (CPP). This difference has long prompted the need to seek an optimal ratio between CPP and MP. Automated cell synthesis with conditional design rules offers a valuable lever to speed up the technology exploration process and to identify the best "gear ratio" (GR) for Design-Technology Co-optimization (DTCO) exploration. Existing approaches for cell layout generation frameworks have primarily supported uniform grids with limited gear ratio options. In this work, we present SMTCell, a new exploratory framework for cell layout generation that allows flexible gear ratio options using a graph-based data structure. We employ distance-based objective functions and conditional design rule parameters to adapt to varying pitch values. This approach enables us to investigate and discover optimal layouts under diverse technology node settings. An acceleration feature drastically trims the solution space, resulting in a speed increase of up to 19X without sacrificing the quality of the original solutions. With SMTCell, cell synthesis automation can be configured to accommodate a wide range of design choices. We conduct an empirical study to assess the impact of gear ratio at block-level synthesis, place and route (SP&R) outcomes, with the ultimate goal of identifying the most effective technology and standard cell configurations in terms of design power, performance and area (PPA) metrics.


Introduction
The gear ratio (GR) [4] between contacted poly pitch (CPP, the distance between a poly gate and its adjacent poly gate) and M1 pitch (the distance between adjacent tracks on the lowest vertical routing layer) plays a pivotal role in overcoming scaling constraints, as it enables fine-grain balancing of density, pin access and routability considerations. To support Design-Technology Co-optimization (DTCO) [13] and exploration of potential metal pitch and offset [6] [18] configurations, we develop an improved Satisfiability Modulo Theories-based cell layout synthesis solver, called SMTCell, that accommodates various GR configurations. (Offset is the distance by which all vertical routing tracks move to the right from the left edge of the cell.) SMTCell leverages a graph-based methodology to handle rich configurations of pitch and design rules while returning optimal cell layouts.
Motivations for such a tool are seen as follows. Gear ratio has a profound impact on the allocation of routing resources across various layers. In this work, we assume a layer stack of placement (vertical), metal 0 (M0, horizontal), metal 1 (M1, vertical) and metal 2 (M2, horizontal). "Placement" in our work denotes a layer that contains gate, source and drain. We assume that M1 and M2 are used for input/output (IO) connections. In the following, we refer to these four layers (placement through M2) respectively as $L_1$ through $L_4$.
Figure 1 shows three … Shapes on M0 may be stretched to satisfy the Minimum Area Rule (MAR) [17]. To ensure proper interconnectivity, sufficient metal length must be provided to facilitate connections between upper and lower vias. The positioning of these vias is influenced by the varying pitch value on M1. Notably, in larger cells where routing resources tend to be scarce, efficient routing across different layers becomes challenging. This challenge is further exacerbated when factoring in the Via-separation Rule (VR) [17] and other layout constraints.
Furthermore, imposition of a more stringent M1 pitch can introduce more complexity to layout design. Pin openings on adjacent tracks become unroutable from M2, as highlighted in orange in Figure 2. A layout automation tool must be precise enough to take advantage of such dense routing tracks while being aware of potential spacing violations caused by VR and other design rules.
Previous methods [9] [14] demonstrate the potential benefit of using SMT-based design rule constraints to simultaneously place and route when generating cell layouts. SP&R [9] uses a grid graph representation of a potential cell layout, then encodes graph elements into boolean variables. Cheng et al. [2] [3] leverage this approach on newer device architectures (Vertical-FET [11] and Complementary-FET [15]), and demonstrate scaling boost at block-level. Recent enhancements of the PROBE framework [5] automate custom Process Design Kit (PDK) generation and incorporate PPA and IR drop prediction for DTCO.
However, previous formulations are limited to only handling 1:1 and 3:2 GR. Overcoming this limitation requires a new and effective way to represent the solution space with a graph structure. Such a graph structure needs to encode the varying lengths of edges (i.e., metal segments). Specifically, to enable all (spacing-related) design rule constraints to be satisfied, preference should be given to shorter metal segments.
In this work, we incorporate a new graph structure with properly scaled edges to achieve an SMT-based cell layout synthesis. We present SMTCell, a comprehensive framework endowed with an array of design options. Introduction of the underlying graph structure enables incorporation of dense routing tracks into cell layout, and exploration of arbitrary gear ratios. Furthermore, we introduce pre-partitioning, a solution space trimming technique, to shorten runtime.
Our main contributions are as follows:
• We propose a new graph structure, called the relative layered grid graph, which supports all gear ratios by representing densely placed routing tracks on each layer. Our formulation based on this graph enables distance-based objective functions and design rule constraints.
• We introduce a new cell partitioning scheme that "hints" a more plausible SMT solution space, thereby speeding up runtime by up to 19×.
• We use our framework to explore different GR options and the impact of GR on block-level Power, Performance and Area (PPA).
SMTCell is available in permissive open source, in our GitHub repository [19]. In the following, Section 2 introduces some basic concepts and a graph structure to explore gear ratio. Section 3 introduces a solution space reduction method with cell partitioning and cell width hint. Finally, Section 4 uses SMTCell to conduct assessments with both cell-level and block-level metrics.

Formulation
Recall that gear ratio (GR) is the ratio between (contacted) Poly and M1 pitches. GR is typically given in integers or half-integers. For example, designers may use 3:2 or 2:1 GR.
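As a small worked example, the pitch pairs used later in the experiments (45:45, 45:30, 45:27 and 45:25, in nm) can be reduced to their lowest-terms gear ratios with a greatest common divisor. The helper name `gear_ratio` is hypothetical and is not part of SMTCell; this is only a sketch of the arithmetic.

```python
from math import gcd


def gear_ratio(cpp_nm: int, m1_pitch_nm: int) -> tuple:
    """Reduce a (CPP, M1 pitch) pair to its lowest-terms gear ratio.

    Hypothetical helper for illustration; not an SMTCell function.
    """
    if cpp_nm <= 0 or m1_pitch_nm <= 0:
        raise ValueError("pitches must be positive integers")
    g = gcd(cpp_nm, m1_pitch_nm)
    return cpp_nm // g, m1_pitch_nm // g


# Pitch pairs (in nm) that appear in the experiments of this paper.
for cpp, m1 in [(45, 45), (45, 30), (45, 27), (45, 25)]:
    a, b = gear_ratio(cpp, m1)
    print(f"{cpp}:{m1} -> {a}:{b} GR")
```

The reduced ratios are 1:1, 3:2, 5:3 and 9:5, matching the labels used in the experimental sections.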
Given any GR configuration, our tool dynamically formulates SMT constraints based on a graph structure. In terms of SMT formulation, we incorporate largely the same set of constraints as SP&R [9]. The novelty of SMTCell lies in its more fine-grained graph structure that permits capture of all the gear ratio details, which are essential for the SMT formulation. Our framework treats pitches as variables to construct a relative layered grid graph representing a potential cell layout.
In our graph structure, we typically consider a 4-layer design: placement layer $L_1$ (vertical), M0 layer $L_2$ (horizontal), M1 layer $L_3$ (vertical), and M2 layer $L_4$ (horizontal). Since our SMT formulation is based on constraints involving the vertices of the graph, it is crucial to consider the varying distances between vertices to accurately formulate various design rules [17].

Relative Layered Grid Graph Construction
Before performing graph construction, we need to compute the total cell width and the set of columns and rows on each layer. Acquiring this information beforehand eases the process of graph construction. The total cell width $W_{total}$ is determined by $\max(W_P, W_N)$, where $W_P$ and $W_N$ are the respective sums of PMOS and NMOS device widths (in nm). Let $i$ be the index of the layers. With $W_{total}$ and user-given variables such as a number of horizontal tracks $T \in \mathbb{Z}^+$, a set of pitch values $P = \{p_i \in \mathbb{Z}^+ \mid i \in \{1, ..., 4\}\}$ and a set of offsets from the left edge of the cell $\Delta = \{\delta_i \in \mathbb{Z}^+ \mid i \in \{1, 3\}\}$ (in practice, $\delta_1$ is always set to 0 and is omitted in the calculation), we can compute a set of column sets $\mathcal{C} = \{C_i \mid i \in \{1, ..., 4\}, C_i \subset \mathbb{Z}^+\}$ and a set of row sets $\mathcal{R} = \{R_i \mid i \in \{1, ..., 4\}, R_i \subset \mathbb{Z}^+\}$ for each layer $L_i$.
Our targeted cell layout is uni-directional, meaning that each layer has tracks running in a direction orthogonal to that of its neighbor layer(s), with $p_i$ as the pitch between adjacent tracks. Therefore, a horizontal layer shares columns with its adjacent vertical layer(s); a vertical layer shares rows with its adjacent horizontal layer(s). Since gear ratio is only defined between vertical layers, horizontal layers must have the same pitch (i.e., $p_2 = p_4$) and thus all layers must share the same set of rows (i.e., $R_1 = R_2 = R_3 = R_4$).
Creating the column sets $C_i$ must allow for $p_1 \neq p_3$, and must consider offsets $\Delta$. Figure 3 illustrates the scenario where the $p_1$ and $p_3$ values result in an irregular pattern of columns on $L_2$. The column set $C_2$ must contain all columns from both $C_1$ and $C_3$ to represent the densely placed routing tracks, with overlapped columns being merged to avoid redundancy. We achieve this using Algorithm 1, which iteratively creates columns for vertical layers first (Lines 1-7) and then merges them for horizontal layers without repetition (Lines 8-9), ensuring uniqueness for each column in a horizontal layer.
Algorithm 1: Column Set Creation
After creating $\mathcal{C}$ and $\mathcal{R}$, we construct a relative layered grid graph $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are a set of disjoint vertex sets and a set of disjoint edge sets, respectively. Each vertex $v$ is a triplet $(i; r; c)$ that contains layer $i$, row $r$, and column $c$. Each edge $e$ is a pair of vertices whose locations are adjacent (i.e., nearest neighbors) along the direction of the layer. To generate $\mathcal{V}$ and $\mathcal{E}$, we use Algorithm 2: for horizontal layers, edges are created along each row (Lines 8-10); for vertical layers, edges are created along each column (Lines 11-13). These edges facilitate intralayer routing. Edges between layers, called vias, are created if and only if the two vertices (endpoints) are on adjacent layers and have the same $r$ and $c$ (Lines 14-17).
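The construction described above can be sketched in a few lines of Python. This is not Algorithm 1 or Algorithm 2 as implemented in SMTCell, only a minimal illustration of the idea. The function names, the inclusive range up to the cell width, and the choice to let the top horizontal layer reuse the columns of its only vertical neighbor are all assumptions made here.

```python
def column_sets(width, pitch, offset):
    """Return {layer: sorted column coordinates} for layers 1..4.

    Assumptions (not from the paper): columns run from the offset up to and
    including `width`, and layer 4 reuses the columns of layer 3, its only
    adjacent vertical layer.
    """
    cols = {}
    for layer in (1, 3):  # vertical layers first
        start = offset.get(layer, 0)
        cols[layer] = list(range(start, width + 1, pitch[layer]))
    # Horizontal layer 2 sees the columns of both vertical neighbors,
    # merged without repetition.
    cols[2] = sorted(set(cols[1]) | set(cols[3]))
    cols[4] = list(cols[3])
    return cols


def build_graph(cols, rows, horizontal=(2, 4)):
    """Build vertices (layer, row, col), intralayer edges, and vias."""
    vertices = {(l, r, c) for l in cols for r in rows for c in cols[l]}
    edges = []
    for l in sorted(cols):
        if l in horizontal:  # edges run along each row
            for r in rows:
                edges += [((l, r, a), (l, r, b))
                          for a, b in zip(cols[l], cols[l][1:])]
        else:  # edges run along each column
            for c in cols[l]:
                edges += [((l, a, c), (l, b, c))
                          for a, b in zip(rows, rows[1:])]
    # A via exists only between adjacent layers at the same row and column.
    vias = [((l, r, c), (l + 1, r, c)) for (l, r, c) in sorted(vertices)
            if (l + 1, r, c) in vertices]
    return vertices, edges, vias


pitch = {1: 45, 2: 24, 3: 30, 4: 24}  # 45:30 (3:2) GR, illustrative values
cols = column_sets(width=90, pitch=pitch, offset={3: 0})
rows = [k * pitch[2] for k in range(3)]  # three horizontal tracks
vertices, edges, vias = build_graph(cols, rows)
print(cols[2])
```

For this 3:2 example the merged column set on $L_2$ is [0, 30, 45, 60, 90], which shows the irregular spacing (30, 15, 15, 30) that a uniform grid cannot represent.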
We use a standard multicommodity flow model for routing [7]. As shown in Figure 4, each multi-pin net needs to be routed from the internal pins (multiple sources) on $L_1$ to the frontside IO pins (multiple sinks) on $L_3$ and $L_4$. These net routings can be represented by flows (edges) through access points at each layer (vertices). SMTCell simultaneously performs placement and routing to ensure the optimal solution. Under our formulation, the SMT solver, Z3 [12], minimizes a set of objective functions regarding cell metrics while satisfying the design rule constraints.

Fine-grain Design Rule Checking
SMTCell incorporates conditional design rules [17], such as End-Of-Line spacing (EOL), Step Height Rule (SHR), Parallel Run Length (PRL), Minimum Area Rule (MAR), etc. The non-uniformity in a relative layered grid graph requires that the above design rule parameters be defined in terms of Manhattan distance. Prior methods [7] [9] [14] define these design rule parameters in terms of "number of vertices", which is valid only in a uniform grid where all edge lengths are equal.
We input design rule parameters as distances and monitor the total distance traveled along each track until the desired distance is attained. Figure 5 illustrates for a horizontal metal layer how imposing EOL at $v_{2;24;48}$ prohibits the use of vertices along three tracks: the current track (vertices $v_{2;24;20,...,35}$), the upper track (vertices $v_{2;48;34,...,48}$), and the lower track (vertices $v_{2;0;34,...,48}$). Similarly, when applying other design rules, we examine all potential metal placements, and navigate around their neighboring vertices to prohibit access within the given distance.
Figure 5. End-Of-Line Rule (EOL) checking on $L_2$ with distance input. Let $m_{2;24;48}$ denote the metal usage at layer 2, row 24 and column 48. The EOL constraint induces blockages on the current track (as $m_{2;24;20,...,35}$), the upper track (as $m_{2;48;34,...,48}$), and the lower track (as $m_{2;0;34,...,48}$), enforcing the required separation distance.
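To make the "travel until the distance is attained" idea concrete, the following sketch walks along a single track from a metal line end and collects the vertices that fall inside a spacing distance. It covers only the same-track case, not the upper and lower tracks. The function name is hypothetical, and the convention that a vertex exactly at the rule distance is allowed is an assumption made here.

```python
def blocked_within(cols, end_col, distance, direction=1):
    """Columns on one track that lie closer than `distance` to a line end.

    cols      : sorted column coordinates of the track (possibly non-uniform)
    end_col   : column where the metal segment ends
    direction : +1 to walk rightward, -1 to walk leftward
    Assumption: a vertex exactly `distance` away does not violate the rule.
    """
    step = 1 if direction > 0 else -1
    j = cols.index(end_col) + step
    travelled, blocked = 0, []
    while 0 <= j < len(cols):
        # Accumulate the real edge length, not the number of vertices.
        travelled += abs(cols[j] - cols[j - step])
        if travelled >= distance:
            break
        blocked.append(cols[j])
        j += step
    return blocked


track = [0, 30, 45, 60, 90]  # irregular columns from a 45:30 GR example
print(blocked_within(track, end_col=30, distance=31))
print(blocked_within(track, end_col=30, distance=31, direction=-1))
```

With a rule of 31 nm, two vertices are blocked to the right of column 30 but only one to the left, even though both directions use the same rule. A vertex-count parameter cannot express this asymmetry.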

Distance-based Objective Functions
To ensure the optimality of our solution, we perform a lexicographic distance-based multiple-objective optimization (Equations 1-3), where $\ell(\cdot)$ is a function that returns the edge length; $m_e$ is an indicator for metal usage; $u_e$ is an indicator for via usage; and $w_4$ and $w_v$ are user-determined multipliers to further penalize top track usage and via usage, respectively. Equation 1 and Equation 3 minimize the resources taken by each cell, while Equation 2 ensures that the IO pins are accessible by the router. Specifically, Equation 2 minimizes the total wirelength used on the frontside (on $L_4$) to prevent potential routing blockages. Equation 3 minimizes the total wirelength used. In previous work [9], the cost of each wire is determined by the number of vertices it occupies. In our distance-based objective function, we scale the cost of each wire by summing up the distances between each pair of consecutive occupied vertices, making the solver aware of the shortest path.
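The difference between the two wire costs can be illustrated with two candidate segments on the same track. This is a minimal sketch with hypothetical function names; it omits the multipliers and the lexicographic ordering, and compares only a vertex-count cost with a distance cost.

```python
def vertex_count_cost(path):
    """Cost used on uniform grids: number of occupied vertices."""
    return len(path)


def distance_cost(path):
    """Sum of Manhattan distances between consecutive occupied vertices.

    Each vertex is (layer, row, col); the path is assumed to stay on one layer.
    """
    return sum(abs(r2 - r1) + abs(c2 - c1)
               for (_, r1, c1), (_, r2, c2) in zip(path, path[1:]))


# Two segments on layer 2, row 24, over irregular columns [0, 30, 45, 60, 90].
path_a = [(2, 24, 0), (2, 24, 30), (2, 24, 45)]
path_b = [(2, 24, 30), (2, 24, 45), (2, 24, 60)]

print(vertex_count_cost(path_a), vertex_count_cost(path_b))
print(distance_cost(path_a), distance_cost(path_b))
```

Both segments occupy three vertices, so a vertex-count cost cannot distinguish them, while the distance cost (45 versus 30) identifies the second as shorter.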
Overall, our SMTCell framework is formulated around a relative layered grid graph to efficiently generate vertices and edges on each layer and incorporate distance-based objective formulations and design rule constraints.

Cell Pre-Partitioning
Cell partitioning is used in [8] [9] for special cells to reduce the search space for transistor device placement. For example, in D flip-flops, the Clock (Clk), Data-in (Din), Data-out (Dout), and Leader/Follower latches must be arranged in a specific order, such as Din-Leader-Follower-Dout or Dout-Follower-Leader-Din, to optimize the setup time and delay of the flip-flop. Cell partitioning groups transistor devices and encodes their relative positions, thus significantly reducing the search space by limiting the possible placement options. Cell partitioning based on datapath (DP) is defined only for D flip-flops and is manually coded by designers [8] [9]. By using similar concepts to heuristically partition devices, we can significantly reduce the runtime when dealing with larger cells in general.
Our heuristic comes from the following three observations. First, cell layout solutions under different gear ratio and design rule parameter settings do not shift the order of devices in most cases. Hence, instead of partitioning by function groups, we can encode the exact relative position from a previous solution (each group contains only one device). Second, as we incorporate denser routing tracks, more vertices are created based on Algorithm 2. The number of created vertices is proportional to the number of variables in the SMT formulation [1], as shown in Table 1. Hence, solving such SMT problems requires longer runtimes. Finally, the cell width in terms of CPP under different cell settings does not drastically increase or decrease, as shown in Table 2 with different cell configurations. For instance, as discussed in Section 2.1, $\max(W_P, W_N)$ gives a lower bound for the cell width. Then, an upper bound on the cell width can be hinted to the solver by using a previous solution with some relaxation by a few (e.g., 2) CPP.
Motivated by these observations, we introduce a two-stage approach in SMTCell called Pre-Partitioning:
• First, we solve a relatively simpler problem by utilizing a uniform grid in 1:1 GR with reduced complexity.
• Second, we use the device relative position and cell width value obtained from the first stage to reduce the search space for any given GR. With the relative placement order, the SMT solver only optimizes for routing, which saves runtime.
As our "hint" originates from a solution based on a uniform grid graph, the quality of the initial solution directly influences the quality of the downstream solutions. For instance, given that each layer has its own pitch, design rule constraints prohibit different numbers of vertices on each layer. We are not able to capture such precision in a uniform grid graph. We can enhance the uniform grid graph by assigning different design rule parameters for each layer, rather than applying them universally. These design rule parameters are converted based on the pitch value of each layer from the more difficult GR input. These modifications allow for more accurate design rule considerations and yield a more informative initial solution.
Figure 6 depicts the complete pre-partitioning flow. The partitioned input file, highlighted in red, is passed on to SMTCell with GR enablement to obtain a comprehensive cell layout solution.

Experiments
Our experiments seek to assess the impact of gear ratio at both cell-level and block-level. We hypothesize that having more M1 routing resources benefits block-level routing, while vertical alignment between Poly and M1 tracks is preferred at cell-level.
We first investigate the impact of different GR settings, using the 45:30 (3:2) GR as a baseline with various cell designs and track settings. We choose 22 cells with higher drive strength out of the 40 cells available in SMTCell. Second, we focus on the execution time of layout synthesis for larger cells, where we evaluate the speedup achieved through pre-partitioning. Finally, we present block-level P&R studies to assess the effect of various GR configurations.

Effect of Different Gear Ratio Settings
We present a sensitivity study on gear ratios in Table 2. In this study, we examine the effects of different GR settings while scaling design rules based on the varying pitch values. For instance, by shrinking the pitch value of M1, we also shrink design rule parameters on M2 and M0 as a reasonable design choice (cell designs cannot take advantage of dense routing tracks if design rules are much larger than pitch values). We use 45:30 (3:2) GR as a baseline. Additionally, we examine 45:27 (5:3) GR (less vertical alignment, more routing resources) and 45:45 (1:1) GR (more vertical alignment, fewer routing resources). We assume that vertically aligning M1 and Poly layers is always preferred. This is because our formulation allows vias to be stacked when direct connections within the same net are possible across more than two layers. If the given gear ratio does not allow such a vertical alignment, all connections between Poly and M1 need to be rerouted through M0, with two separate via connections and a longer M0 metal segment in between. This scenario is less desirable, as the M0 segment is extended due to MAR and VR, potentially exhausting the limited routing resources.
Given that the design rule parameters are scaled with the pitch values, cell-level metrics do not follow an obvious pattern with different GR. Overall, 45:27 (5:3) GR increases cell width, wirelength, and via count. This effect implies that some extra detours take place on M0, as described previously.
In particular, cells with larger drive strength (AOI22_X2, NAND4_X2, NOR4_X2 and OAI22_X2) require more CPP in the layout solution.
On the other hand, with 45:45 (1:1) GR, the three largest cells in our library (DFFHQN_X1, LHQ_X1, and MUX2_X1) require extra routing resources from M2, leading to blockages in block-level routing. We also observe that more blockages are created when tracks are reduced (i.e., 2F4RT). The limited routing resource on M1 "overflows" the track usage to M2, causing these extra blockages. One notable trend is that Z3 runtime and M1 pitch display an inverse relation: runtime grows as the M1 pitch tightens. This implies the necessity of an acceleration scheme for faster design turnaround time.

Solution Quality with Pre-Partitioning
To evaluate the performance improvement from pre-partitioning on larger cells, we use three cells with the longest runtimes from our experiments: D Flip-Flop, AOI22_X2 and LHQ_X1. We choose 45:25 (9:5) GR as it produces even denser tracks than 45:27 (5:3) GR. By default, D Flip-Flop uses DP partitioning information, while AOI22_X2 and LHQ_X1 do not have any partitioning information. To evaluate the performance impact of pre-partitioning, we perform the following tasks:
• First, we generate these three cells without using pre-partitioning.
• Second, we generate these three cells again, this time with pre-partitioning. The additional runtime of the pre-partitioning flow is added to the total runtime.
• Finally, if the total runtime exceeds that of the non-partitioned cell layout solution, a "timeout" is recorded.
From Table 3, it is evident that our acceleration approach significantly improves the runtime in most cases (see footnote 15). Furthermore, solutions with pre-partitioning have similar quality to those obtained without pre-partitioning. Notably, using the new acceleration technique in SMTCell, Z3 is able to converge to a reasonable solution up to 19 times faster on LHQ_X1 under 45:25 (9:5) GR.
To assess whether pre-partitioning provides a meaningful hint, we perform the following steps to compare the relative positions of transistors and the total cell width:
• First, we solve for 1:1 GR and 9:5 GR standard-cell layouts of larger cells, both without pre-partitioning information.
• Second, we extract the relative positions of PMOS and NMOS, which are encoded into two separate strings.
• Third, we compute the edit distance [10] between 1:1 GR and 9:5 GR to check if they are similar enough for the former to provide informative hints.
• Finally, we also examine the differences in cell width between these solutions to see if the 9:5 GR solution is within the provided guiding range for CPP.
Here, the edit distance is the minimum number of single-character edits required to transform one string into the other. An edit distance of 0 indicates a perfect hint as it implies a matching relative position of transistors between the two GR solutions.
Table 4 demonstrates that between 1:1 GR and 9:5 GR cell layouts, the relative placement orders differ by at most one pair-swap (edit distance = 2). Additionally, the cell width differences remain within the specified range. This further validates the effectiveness of our pre-partitioning flow.
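The comparison above relies on the standard single-character edit distance (insertions, deletions and substitutions). The sketch below shows why one swap of two neighboring devices yields an edit distance of 2. The device-order strings are hypothetical placeholders, with each letter standing for one transistor; they are not taken from Table 4.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical PMOS orders: identical, and with one neighboring pair swapped.
order_1to1 = "ABCD"
order_9to5_same = "ABCD"
order_9to5_swap = "ABDC"

print(edit_distance(order_1to1, order_9to5_same))
print(edit_distance(order_1to1, order_9to5_swap))
```

An identical order gives 0 (a perfect hint), and a single pair-swap gives 2, because two positions must each be substituted.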

Block-level Effect
Since the cell-level improvement does not directly contribute to block-level designs due to limited inter-cell routing resources and other factors, we conduct further investigation using our custom cell library and the JPEG Encoder block-level design. We use the same 40 cells described above.
Footnote 15 (Table 3): LHQ_X1 under 45:30 GR failed to converge within the runtime of its 45:45 GR counterpart. We recognize this as an anomaly that deserves further investigation.
A standard-cell layout contains local Poly and M1 grids. A standard-cell row contains global Poly and M1 grids. For any standard cell to be legally placed on a standard-cell row, the following requirements must be met:
• Its local Poly and M1 grids must align with the global Poly and M1 grids respectively. (Every red diamond must align with a red dashed line and every blue diamond must align with a blue dashed line in Figure 7.)
• Its left-end and right-end boundaries must align with the global Poly grid. (The left-most and the right-most red diamonds must each align with a red dashed line in Figure 7.)
When using a 1:1 GR, all cells can be placed on any column since alignment is guaranteed with a uniform grid. However, with a 3:2 GR, misalignments can occur between the local and global M1 grids; such misalignments induce off-grid violations. Figure 7 demonstrates an example under the 3:2 GR setting. We classify a cell as an even cell (pattern) or an odd cell (pattern) based on the number of Poly that it occupies (Figure 7(a)). An odd cell is asymmetrical in terms of the local M1 grid. By mirroring the layout, an odd cell has two kinds of alignment patterns. Figure 7(b) illustrates that an odd cell can be placed on any column, making it legally placeable at all columns. An even cell has a symmetrical pattern in terms of the local M1 grid. We cannot create additional alignment patterns by mirroring the layout. Figure 7(b) illustrates that A3 and A4 cannot be legally placed due to misalignments on M1 grids. These misalignments induce off-grid violations. Therefore, an even cell can only be placed at every other column, making it legally placeable at only 50% of the columns (Figure 7(b)). We refer to this calculation as cell legalization% (cell legal%).
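The even/odd argument can be checked with simple modular arithmetic. The sketch below is a simplified model and not the commercial-tool calculation: it assumes the global Poly and M1 grids both start at coordinate 0, that a cell's left edge sits on a multiple of CPP, and that the local M1 grid starts at a given offset from the left edge (or from the right edge when mirrored). The function name and parameters are hypothetical.

```python
from math import gcd


def cell_legal_pct(width_cpp, cpp, m1_pitch, offsets=(0,), allow_mirror=True):
    """Percentage of columns where a cell's local M1 grid can align.

    width_cpp : cell width in number of CPP
    offsets   : local M1 offsets (nm) of the available layout copies
    Simplified model: a placement is legal if any copy, possibly mirrored,
    puts its local M1 grid onto the global M1 grid.
    """
    period = (cpp * m1_pitch // gcd(cpp, m1_pitch)) // cpp  # columns per repeat
    legal = 0
    for k in range(period):
        left = k * cpp
        right = left + width_cpp * cpp
        ok = False
        for off in offsets:
            if (left + off) % m1_pitch == 0:
                ok = True  # original orientation aligns
            if allow_mirror and (right - off) % m1_pitch == 0:
                ok = True  # mirrored orientation aligns
        legal += ok
    return 100.0 * legal / period


print(cell_legal_pct(3, 45, 30))  # odd cell under 3:2 GR
print(cell_legal_pct(2, 45, 30))  # even cell under 3:2 GR
```

Under this model an odd cell is legal at every column and an even cell at every other column, consistent with the 100% and 50% figures above. Passing `offsets=(0, 15)` for an even cell models an additional layout copy with a shifted local M1 grid and raises the result to 100%.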
We further demonstrate that cell legal% directly impacts PPA in block-level design. In Figure 8, we demonstrate two different scenarios when placing different cells adjacent to an odd cell. Due to the legalization issue posed by even cells, the even cell must be shifted rightward by 1 CPP in Figure 8(a). In contrast, a second odd cell can be mirrored and placed next to (i.e., abutting) the odd cell in Figure 8(b). The extra CPP caused by legalization stretches the routing wire on M2 and increases the timing and power consumption.
Table 5 presents a commercial tool report for cell legal% using our cells in block-level design. For each GR, we sort cells from the highest to the lowest usage count. For the 1:1 GR setting, all cells are legal and can be placed anywhere, representing a straightforward scenario. However, as GR increases, such as under the 3:2 and 5:3 GR settings, achieving a high cell legal% becomes increasingly challenging. For the 3:2 GR setting, two kinds of cell patterns can be generated, and half of the most frequently used cells have only 50% cell legal%. For the 5:3 GR setting, the challenge intensifies, as three kinds of cell patterns may be generated. Indeed, under the 5:3 GR setting, on average only 46% of the columns can be used to legally place cells. With this in mind:
• Since cell legal% directly affects cell placeability and hence routability, we hypothesize that lower cell legal% will degrade timing, power, and area.
• Further, the 1:1 GR setting should yield the best performance, and the 3:2 GR setting should outperform the 5:3 GR setting despite the latter having more routing resources on M1.
Figure 9 presents our PPA results comparing the different GR configurations with the JPEG Encoder block-level design. Lower data points indicate better PPA, with the 1:1 GR setting achieving the best result at all clock periods, followed by the 3:2 GR setting, while the 5:3 GR setting ranks as the worst overall. These data align with our hypothesized impacts of cell legal%.
The differences in PPA metrics become particularly prominent when the effective clock period is as tight as 0.2. The corresponding area values are 2372 µm², 2461 µm², and 2623 µm² for 1:1, 3:2, and 5:3 GR settings respectively (Figure 9(a)). Table 6 shows that with a fixed clock period of 300, increased GR worsens the overall capacitance and wirelength. In conjunction with the increased total cell counts shown in Table 5, this results in higher power consumption for the block. Total power consumption values are 58970, 62720, and 64900 for 1:1, 3:2 and 5:3 GR settings respectively (Figure 9(b)). In the above PPA results for the JPEG Encoder designs (Figure 9), the 1:1 GR setting outperforms others, emphasizing the crucial role of cell legal% in influencing overall performance metrics, including area and power consumption. Consequently, to effectively demonstrate the advantages of a finer GR, enhancing cell legal% becomes imperative.
Table 6. As gear ratio increases, capacitance increases due to the denser tracks and longer wires. These factors impact the overall power, performance, and area.
These findings motivate us to continue investigating the impact of variant GR values, as we can generate multiple copies of the same functional cells with different offsets, as shown in Figure 10. By offsetting the local M1 grid from the left edge of the cell boundary for even cell layouts, we create an additional alignment pattern for even cells in Figure 10(a). As shown in Figure 10(b), the additional copy of an even cell layout can be used by A3 and A4, thus increasing the cell legal% by offering more flexibility in placement and routing. This flexibility not only enhances the efficient utilization of the available layout space but also contributes to reduced wirelength and improved routing efficiency. Consequently, PPA metrics benefit from reduced power consumption, improved signal propagation, and more compact layouts, i.e., better overall block-level design outcomes.

Conclusion
We have tackled the challenge in standard-cell library creation with gear ratio by leveraging and enhancing Satisfiability Modulo Theories (SMT) formulations. Our approach enables automation of the standard-cell layout synthesis process, accommodating varying gear ratio settings. Our framework handles the fine-grained layout design constraints and interactions that emerge in difficult gear ratio settings, which present substantial challenges for human designers. We have also conducted comprehensive studies at both cell-level and block-level. These studies identify the best-performing cells and analyze footprint shrinkage, and moreover confirm that a mere reduction in M1 pitch does not lead to proportional improvements in block-level scaling. This realization underscores the need for careful tuning of design rule parameters to fully unlock their potential for optimized cell designs. Our ongoing research focuses on determining the ideal vertical routing pitch by empirical studies, rather than relying solely on exhaustive searches. As shown in our experiments, GR introduces a trade-off between more routing resources and potential timing and power burdens. Future cost-benefit studies with more fully-elaborated cell libraries and performance models can uncover the underlying impacts of GR tuning. IR-drop comparison according to various PDN schemes is additionally required for holistic block-level evaluation. These additions to SMTCell can potentially lead to more effective cell layout generation processes and enhance the accuracy of existing DTCO flows for pathfinding [4] [5].

Figure 2. AOI22_X1 layout: pin unroutable due to via rule violation (left) and improved pin accessibility (right). Unroutability is caused by the previous design rule formulation failing to consider the varying pitches induced by GR.

Figure 3. Top-down view of the irregular pattern of columns on $L_2$ induced by different pitch values on $L_1$ and $L_3$.

Figure 4. An example of routing from both source/drain and gate locations in the placement grid to the frontside IO through $L_2$, $L_3$ and $L_4$. At each routing layer, interconnections are made following the layer orientation. Between each pair of layers, vias are constructed at access points (colored squares) to make connections.

Figure 6. Pre-partitioning approximates the design rule in a 1:1 GR setting and solves for a simpler solution. Then, we extract the necessary information and encode it into a new partitioned input file. This new input guides Z3 to reduce the solver runtime.

Figure 7. Even and odd cell patterns based on the number of Poly they occupy under 3:2 GR. (a) Local Poly and M1 grids (shown in diamonds) must align with the global Poly and M1 grids (shown in dashed lines) to be legally placed. Even cells create one symmetrical pattern. Odd cells create two patterns by mirroring the layout due to their asymmetrical local M1 grids. (b) Cell legalization% is computed by the placement opportunity for each pattern. A3 and A4 cause off-grid violations due to misalignments on M1 grids. B2 and B3 use a mirrored layout to align M1 grids.

Figure 8. Poor cell legalization% can cause longer routing wires, which increase the timing and power consumption. (a) The even cell cannot abut the odd cell as this causes an off-grid violation. Shifting the even cell by 1 CPP can potentially lead to a longer routing wire on M2. (b) A mirrored odd cell can be placed abutting the odd cell, and has a shorter routing wire on M2.

Figure 10. Extending from Figure 7, we introduce an offset from the left edge of the local M1 grid for even cells. (a) For any even cell layouts under the same function, we solve for two different copies of the cell: one without M1 offset (shown in dark blue diamonds) and one with M1 offset (shown in light blue diamonds). (b) A3 and A4 take advantage of the local M1 offset copy to align with the global M1 grid. By introducing the offset, even cells become legally placeable at every column.

Table 1. Complexity for DFFHQN_X1 on 2 Fins and 4 Tracks. Denser tracks and vertices increase the problem complexity. Consequently, solving time is increased.

Table 3. Cell metrics and Z3 runtimes for 45:30/45:25 GR in 2F4T and 3F5T settings. (Exact = MOSFET positions are encoded to be exactly the same as the "hint". DP = Datapath-aware cell partitioning. Bold = the better result between no partitioning and pre-partitioning.)

Table 4. Edit distance of PMOS/NMOS relative positions and cell width difference.

Table 5. Comparison of cell legal% in block-level design for different GR. Cell usage is presented with its proportion of the total cell count. Cell legal% is obtained from Cadence Innovus (higher % indicates more placement opportunities).