QuCT: A Framework for Analyzing Quantum Circuit by Extracting Contextual and Topological Features

In the current Noisy Intermediate-Scale Quantum era, quantum circuit analysis is an essential technique for designing high-performance quantum programs. Current analysis methods either have limited accuracy or incur high computational complexity to obtain precise results. To mitigate this tradeoff, we propose QuCT, a unified framework for extracting, analyzing, and optimizing quantum circuits. The main innovation of QuCT is to vectorize each gate, with each vector element quantitatively describing the degree of interaction with neighboring gates. Extending from the vectorization model, we propose two representative downstream models for fidelity prediction and unitary decomposition. The fidelity prediction model performs a linear transformation on all gate vectors and aggregates the results to estimate the overall circuit fidelity. By identifying critical weights in the transformation matrix, we propose two optimizations to improve the circuit fidelity. In the unitary decomposition model, we significantly reduce the search space by bridging the gap between unitary and circuit via gate vectors. Experiments show that QuCT improves the accuracy of fidelity prediction by 4.2× on 5-qubit and 18-qubit quantum devices and achieves a 2.5× fidelity improvement compared to existing quantum compilers [19], [55]. In unitary decomposition, QuCT achieves a 46.3× speedup for 5-qubit unitaries and a speedup in the hundreds for 8-qubit unitaries, compared to the state-of-the-art method [87].

CCS CONCEPTS
• Hardware → Quantum technologies.


INTRODUCTION
Quantum computing has developed rapidly over the last few decades, offering polynomial or even exponential speedup in several areas, such as chemistry simulation [9], database search [29], and combinatorial optimization [21]. The quantum circuit is a widely-used quantum programming model that describes computation using quantum gates. In the current Noisy Intermediate-Scale Quantum (NISQ) era [63], there is strong motivation to develop circuit analysis and optimization techniques that improve the efficiency of quantum circuit design. For example, estimating fidelity is critical in minimizing noise overhead [3,42,65,66], thereby improving the probability of a circuit producing the correct result [12,49,57,79,84]. Additionally, for applications represented as unitary matrices (unitaries), it is necessary to decompose them into circuits with executable basic gates, such as single-qubit and two-qubit gates [6,26,34,72,80].
However, the analysis and optimization of quantum circuits still rely on classical computers, which face a tradeoff between accuracy and computational burden. For example, cross-entropy benchmarking (XEB) [3] and randomized benchmarking (RB) [42] are two widely-used fidelity models. They model gate fidelities as single values and efficiently estimate the circuit fidelity via a polynomial function, but they exhibit low accuracy. For example, estimating the fidelity of a 5-qubit Grover algorithm [29] on the IBM Oslo quantum processor with the RB-based method may lead to a difference of about 15% between the real and predicted fidelity. An alternative approach is to simulate the master equation with a density matrix. This is accurate but has high computational complexity. For example, the number of qubits is limited to 34 when simulating noisy circuits on an A100 GPU [33]. Techniques for unitary decomposition also suffer from this dilemma. Mathematical approaches like Column-by-Column Decomposition (CCD) [34] and Quantum Shannon Decomposition (QSD) [72] can decompose a 5-qubit unitary in a few seconds, but generate thousands of gates, leading to poor performance and even failure when deployed to quantum devices. On the other hand, aiming to minimize the number of gates, the recent decomposition approach QFAST [87] adopts a search-based method to approximate the target unitary. However, this advantage comes at the expense of increased time complexity. For example, it takes around 60 hours for QFAST to decompose a 5-qubit unitary.

Figure 1: In the upstream model, each gate is transformed into a vector that captures its neighboring circuit features. The downstream models take these vectors as input for various analysis tasks, such as fidelity prediction and unitary decomposition.
These limitations fundamentally originate from the absence of an analysis method that extracts and preserves circuit features. The inaccuracy of the XEB- and RB-based approaches mainly comes from their inability to model errors caused by gate interactions, such as crosstalk [84] and pulse distortion [69]. CCD [34] and QSD [72] inherently follow matrix decomposition theories without considering the features of the circuit structure. On the other hand, because circuit features are not preserved in a formal representation, the accurate approaches [33,87] have to repeatedly go through the circuit, requiring a large amount of time. For example, to accurately predict the fidelity, a circuit needs to be simulated multiple times [33]. Similarly, to obtain a decomposition solution with fewer gates, QFAST [87] has to revisit a large number of candidate circuits and calculate their matrix distance to the target unitary. Ideally, an extraction should thoroughly cover the contextual and topological features of the circuit, providing a unified model to enable rigorous analysis and optimization tasks.
In this work, we propose QuCT, a unified framework that comprises an upstream model and multiple downstream models. Figure 1 shows the overview of QuCT. The upstream model is characterized by its ability to vectorize each quantum gate while taking the circuit features into consideration. We first formally define the concept of path, which describes the relation of a gate with its neighboring gates in terms of types (e.g., CX, RZ) and execution orders (e.g., parallelism, dependency). Each gate is vectorized by enumerating the paths starting from it. In the gate vector, each element represents a path, and its value indicates the degree of correlation between the path and the starting gate. The primary advantage of vectorization is that it transforms the unstructured circuit into a set of one-dimensional vectors, which significantly reduces the arithmetic complexity while retaining the contextual and topological features. The vectorization model is trained offline and only needs to be performed once for a target circuit.
The vectorization serves as a new representation of the quantum circuit, enabling various analysis and optimization tasks. In this work, we introduce two downstream models to perform fast and accurate fidelity prediction and unitary decomposition, respectively. In the fidelity prediction model, the circuit fidelity is estimated by applying a linear transformation to the gate vectors, where the transformation matrix is trained using a fidelity dataset obtained from the real execution results of quantum circuits. Moreover, the prediction provides guidance for mitigating circuit error in gate scheduling and hardware calibration. In the unitary decomposition model, the gate vector helps to bridge the gap between the unitary and the circuit. Unlike existing methods that exhaustively enumerate all possible gate combinations as search candidates, our model effectively reduces the search space by identifying gate vectors that may be involved in the resulting circuit of the target unitary. Benefiting from the expressive representation, we can easily obtain the decomposition solution by reconstructing the circuit based on the paths recorded in the gate vector.
By extending downstream models, our framework can also be applied to other analysis tasks, such as gate cancellation and bug detection. The main contributions of this paper are summarized as follows:
• We propose QuCT, a unified framework for quantum circuit analysis, which decouples analysis tasks into an upstream vectorization model and multiple downstream models, providing accurate analytical results with low computational costs.
• We propose an accurate model for fidelity prediction, which is extended from the upstream model. Benefiting from our vectorization representation, our model naturally supports the analysis of the errors caused by gate interactions and offers optimization techniques to mitigate these errors.
• We propose a unitary decomposition model that achieves remarkable speedup compared to the state-of-the-art method [87]. Our approach prunes the search space efficiently by capturing the circuit similarity between different unitary matrices.

BACKGROUND

Quantum Circuit
The quantum circuit (circuit) is a widely-used quantum programming model. It consists of a sequential arrangement of quantum gates (gates) that operate on a set of qubits. Each quantum gate comprises an operation and the qubits it operates on. Gates that manipulate one qubit are referred to as single-qubit gates (e.g., RX, RY, RZ, H, and U gates), and gates that manipulate two qubits are referred to as two-qubit gates (e.g., CX and CZ gates). Not all gates (e.g., 3-qubit unitary gates) can be directly implemented on the target quantum device. Before deployment, they must be transformed into basic gates that only include specified types of gates determined by the hardware.
Definition 1. We define a layer as the basic unit of the circuit timeline. In each layer, a qubit can be operated on by at most one gate, and gates within the same layer are executed in parallel.
Each gate is mathematically represented as a unitary matrix according to its operation and parameters. The overall unitary of the circuit is calculated by applying matrix multiplication and the tensor product (⊗) to the unitaries of gates. Figure 2 provides an example of a circuit with four layers and its unitary, where $U_{CX}$, $U_{RX}$, and $U_{RZ}$ are the unitaries of CX, RX, and RZ gates, and $I$ represents the 2×2 identity matrix.
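As a minimal sketch of how layer unitaries combine, the following Python snippet builds each layer with a tensor product and multiplies the layers together. The 2-qubit circuit, the rotation angles, the helper names, and the qubit ordering (first qubit as the leftmost tensor factor, CX controlled by the first qubit) are assumptions chosen for illustration; this is not the circuit of Figure 2.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
# CX with the first qubit as control (an ordering assumption of this sketch).
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=complex)

def rx(theta):
    """Unitary of an RX(theta) rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def rz(theta):
    """Unitary of an RZ(theta) rotation."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=complex)

def circuit_unitary(layer_unitaries):
    """Multiply layer unitaries; each later layer is applied on the left."""
    total = np.eye(layer_unitaries[0].shape[0], dtype=complex)
    for layer in layer_unitaries:
        total = layer @ total
    return total

# Hypothetical 2-qubit, 3-layer circuit.
layers = [
    np.kron(rx(np.pi / 2), I2),  # layer 1: RX on the first qubit, second qubit idle
    CX,                          # layer 2: CX across both qubits
    np.kron(I2, rz(np.pi / 4)),  # layer 3: RZ on the second qubit
]
U = circuit_unitary(layers)
```

Idle qubits contribute an identity factor to the tensor product, which is why the identity matrix appears in the layer expressions.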

Quantum Circuit Analysis
In this paper, we introduce two representative tasks for quantum circuit analysis.
Fidelity prediction aims to estimate the probability of obtaining correct results from a circuit under noise. In the case of superconducting quantum computers, errors can be categorized into two types [25]: a) gate errors resulting from decoherence and imperfect implementation; and b) measurement errors occurring when qubit information is read into classical hardware. Recent works [3,42,77] predict the overall circuit fidelity in a polynomial form as follows:
$$F_{circuit} = \prod_{q_i} f_{q_i}^{n_{q_i}} \prod_{q_1, q_2} f_{q_1,q_2}^{n_{q_1,q_2}} \prod_{q_i} m_{q_i} \tag{2}$$
where $n_{q_i}$ and $n_{q_1,q_2}$ denote the number of single-qubit gates on qubit $q_i$ and the number of two-qubit gates between qubits $q_1$ and $q_2$. $f_{q_i}$, $f_{q_1,q_2}$, and $m_{q_i}$ represent the single-qubit gate, the two-qubit gate, and the measurement fidelities, respectively. We define gate error as $(1 - f_{q_i})$ or $(1 - f_{q_1,q_2})$. In addition to the aforementioned types of errors, errors also arise from unexpected interactions between gates, e.g., crosstalk [57] and pulse distortion [69].
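The polynomial prediction can be sketched in a few lines of Python, assuming it takes the usual product form: each gate fidelity raised to its gate count, multiplied by the measurement fidelities. The function name and all fidelity values and gate counts below are hypothetical and are not measurements from any device.

```python
def polynomial_fidelity(n_single, n_two, f_single, f_two, f_meas):
    """Product-form estimate: gate fidelities raised to gate counts, times readout."""
    fidelity = 1.0
    for qubit, count in n_single.items():
        fidelity *= f_single[qubit] ** count
    for pair, count in n_two.items():
        fidelity *= f_two[pair] ** count
    for m in f_meas.values():
        fidelity *= m
    return fidelity

# Hypothetical 2-qubit example.
n_single = {0: 10, 1: 8}            # single-qubit gate counts per qubit
n_two = {(0, 1): 4}                 # two-qubit gate counts per qubit pair
f_single = {0: 0.999, 1: 0.998}     # single-qubit gate fidelities
f_two = {(0, 1): 0.99}              # two-qubit gate fidelities
f_meas = {0: 0.98, 1: 0.97}         # measurement fidelities
estimate = polynomial_fidelity(n_single, n_two, f_single, f_two, f_meas)
```

Because every gate of a given kind contributes the same factor regardless of its neighbors, this form cannot express interaction effects such as crosstalk.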
Unitary decomposition takes a unitary as input and decomposes it into matrices of basic gates, resulting in an equivalent circuit. Early methods, such as CSD [34], QSD [72], and CCD [34], decompose a unitary into a sequence of smaller unitaries following mathematical equations (e.g., the cosine-sine decomposition [78]). In contrast, recent methods [6,15,67,87] apply search-based methods. To approximate the target unitary, they iteratively search for gates and insert them at the end of the circuit. QFAST [87] is the state-of-the-art search-based approach that aims to minimize the number of gates after decomposition. In this paper, we extend QFAST as one of our downstream models.

Motivational Examples
Many fidelity optimization frameworks [3,20,42,57,69] use Equation 2 to predict fidelity, but they fail to capture the noise resulting from gate interactions, such as crosstalk [84] and pulse distortion [69]. The accuracy of the prediction largely determines the performance of the optimization. For example, UREQA [61] prioritizes prediction accuracy. By considering the noise variance in different types of operations, it achieves a fidelity improvement of around 10% in qubit mapping compared to [75]. However, it ignores gate interactions related to the circuit structure. Additional experiments and optimizations are required to identify and mitigate specific types of interactions [3,20,42,57,69]. As their results cannot be integrated into an overall prediction function, the error types they focus on have to be optimized separately, and each optimization may decrease one type of error while increasing another. Figure 3 (a) displays the real fidelities for running the Grover [29] and the BV [5] algorithms on the 5-qubit IBM Oslo quantum device. The fidelities predicted by Equation 2 (with model parameters derived from RB) differ from the real fidelities by 15.54% and 10.40%.
Unitary decomposition plays an important role in circuit optimization [11] and algorithm design [37]. However, early decomposition methods, including QSD [72], CSD [80], and CCD [34], introduce many redundant two-qubit gates between qubits that actually show no entanglement. Figure 3 (b) presents an example. CCD, the default decomposition method of Qiskit [2], requires more than 9,000 gates to decompose a 6-qubit unitary. Alternatively, search-based methods may achieve more than 3× gate reduction by searching for and inserting the gates that bring the circuit closer to the target unitary. However, the search process is very time-consuming. For example, QFAST [87] takes an average of 60.92 hours to decompose a 5-qubit unitary and over a week to decompose a 6-qubit unitary.
The low accuracy and high time complexity of these methods fundamentally come from the lack of a representation that converts circuit features into mathematical forms. The inaccuracy of prior prediction models [3,42,61] mainly results from the incomplete extraction of topological information (the connection and dependency between quantum gates), which makes it impossible to model complex gate interactions. On the other hand, prior unitary decomposition models [15,67,87] rely entirely on arithmetic approaches without leveraging the contextual information (the parameter space of quantum gates), incurring a large amount of invalid exploration during the search.
This paper aims to develop an intermediate representation that preserves both contextual and topological features while remaining formulation-friendly. Considering that a quantum circuit is naturally a sparsely-connected graph, we find that vectorization is an effective approach to extract features of such graphs [36,47]. Our key insight is to leverage random walk [47] to vectorize the circuit such that gate interactions are captured. Specifically, each element of the vector is assigned a value that gives the degree to which the gates in a region affect each other. The vector representation enables accurate and fast modeling for various analysis tasks. As shown in Figure 3, QuCT achieves a 1.7× and 2.7× reduction in inaccuracy for these two algorithms by modeling gate interactions on the IBM Oslo quantum processor. In addition, by bridging the gap between the unitary and circuit structure using gate vectors, QuCT achieves remarkable speedup compared to QFAST while requiring fewer gates.

UPSTREAM MODEL: GATE VECTORIZATION
Before introducing our vectorization model, we define path as follows.
Definition 2. A path is a chained relation between multiple gates. An $n$-step path is formulated as follows:
$$g_1 \xrightarrow{r_1} g_2 \xrightarrow{r_2} \cdots \xrightarrow{r_n} g_{n+1} \tag{3}$$
where $r_i$ denotes the relation between two gates, which is categorized into three types: former, next, and parallel. These terms indicate that gate $g_{i+1}$ is in the former, the next, and the same layer of gate $g_i$, respectively. Take Path 3 in Stage 1 of Figure 4 as an example:
$$CX_{q_3,q_2} \xrightarrow{parallel} RZ_{q_1} \xrightarrow{next} RX_{q_2}$$
is a path starting from the gate $CX_{q_3,q_2}$. It describes a sub-circuit where the gate $RZ_{q_1}$ is in the same layer as $CX_{q_3,q_2}$, and the gate $RX_{q_2}$ is in the next layer of $RZ_{q_1}$.

Vectorization
To generate paths for each gate, we apply random walk [47] in the circuit. Random walk is a popular algorithm in the graph domain for exploring the neighboring information of nodes. When extending a path, the next gate is randomly selected from the gates that share a relation with the former gate. For the gate that requires vectorization, we collect multiple paths starting from it within a specific number of steps. Figure 4 shows six paths of the $CX_{q_3,q_2}$ gate, generated by two walks. These two walks generate six paths starting from the same target gate but in different directions. The total number of paths for a given gate is determined by the number of steps ($n_{step}$) per walk and the number of walks ($n_{walk}$), which are configurable parameters. The number of walks determines the number of neighboring sub-circuits sampled as circuit features. The number of steps determines the maximum number of gates in these sub-circuits. Accordingly, the upper bound on the number of paths for one gate can be estimated as $(n_{step} \, n_{walk} + 1)$, where 1 is the 0-step path.
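The random walk can be sketched as follows. This is a simplified illustration and not the paper's implementation: the circuit is assumed to be a list of layers, each gate is a hypothetical `(type, qubits)` tuple, a path is stored as a flat tuple `(gate, relation, gate, ...)`, and the walk picks uniformly among all gates in the former, same, or next layer without checking device connectivity.

```python
import random

def neighbors(layers, layer_idx, gate):
    """Collect (relation, layer index, gate) candidates around a gate."""
    found = []
    for relation, idx in (("former", layer_idx - 1),
                          ("parallel", layer_idx),
                          ("next", layer_idx + 1)):
        if 0 <= idx < len(layers):
            for other in layers[idx]:
                if relation == "parallel" and other == gate:
                    continue  # a gate is not parallel to itself
                found.append((relation, idx, other))
    return found

def random_walk_paths(layers, start_layer, start_gate, n_steps, n_walks, rng):
    """Return the set of paths (every prefix of every walk) starting at start_gate."""
    paths = {(start_gate,)}  # the 0-step path
    for _ in range(n_walks):
        path, idx, gate = (start_gate,), start_layer, start_gate
        for _ in range(n_steps):
            candidates = neighbors(layers, idx, gate)
            if not candidates:
                break
            relation, idx, gate = rng.choice(candidates)
            path = path + (relation, gate)
            paths.add(path)
    return paths

# Hypothetical 3-layer circuit.
layers = [
    [("CX", (3, 2)), ("RZ", (1,))],
    [("RX", (2,)), ("RY", (1,))],
    [("CX", (1, 2))],
]
start = ("CX", (3, 2))
paths = random_walk_paths(layers, 0, start, n_steps=2, n_walks=2,
                          rng=random.Random(0))
```

Each walk contributes at most one new path per step, so the set never holds more than the number of steps times the number of walks, plus one for the 0-step path.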
The quantum gate is then vectorized by comparing its paths to a static path table. To be specific, the paths in this table are generated offline by enumerating all possible parameters of Equation 3. For each gate, the dimension of the vector equals the size of the path table. If an $n$-step path in the path table matches a path generated in the random walk, the corresponding element value is set to $\gamma^n$, where $\gamma \in (0, 1]$ is a decay parameter. For example, in Figure 4, the generated 2-step Path 3 matches the 4th path in the path table. Thus, the 4th element of the gate vector is set to $\gamma^2$. Shorter paths are assigned larger values, which follows the intuition that the analysis should pay more attention to adjacent gates since they are more likely to exhibit a stronger interaction.
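A minimal sketch of the table lookup, assuming paths are stored as flat tuples `(gate, relation, gate, ...)` and that a matched path with a given number of steps receives the decay parameter raised to that number. The four-entry path table, the gate tuples, and the decay value 0.5 are hypothetical and do not reproduce the table of Figure 4.

```python
def path_steps(path):
    """Number of steps in a flat (gate, relation, gate, ...) path tuple."""
    return (len(path) - 1) // 2

def vectorize(paths, path_table, decay):
    """Set element i to decay**steps if table path i was generated, else 0."""
    generated = set(paths)
    return [decay ** path_steps(p) if p in generated else 0.0
            for p in path_table]

# Hypothetical gates and path table.
CX32, RZ1, RX2 = ("CX", (3, 2)), ("RZ", (1,)), ("RX", (2,))
path_table = [
    (CX32,),
    (CX32, "parallel", RZ1),
    (CX32, "next", RX2),
    (CX32, "parallel", RZ1, "next", RX2),
]
walk_paths = {
    (CX32,),
    (CX32, "parallel", RZ1),
    (CX32, "parallel", RZ1, "next", RX2),
}
vector = vectorize(walk_paths, path_table, decay=0.5)
# vector == [1.0, 0.5, 0.0, 0.25]
```

The 0-step path always maps to 1, and the unmatched third table entry stays at 0, so the vector is sparse whenever the table is much larger than the sampled paths.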

The Size of Path Table
According to Equation 3, the size of the path table depends on the number of steps and the settings of relation $r_i$ and gate $g_i$. Relation $r_i$ has only three types, while gate $g_i$ includes the operated qubits and the gate type. The gate type is hardware-dependent and consists of the basic gates supported by the target hardware. For example, the gate set of the Google Sycamore hardware is the one reported in [3], while the gate set of the IBM Manila device includes ID, RZ, SX, X, and CX gates. In this paper, our gate set includes RX, RY, RZ, and CZ gates, derived from our self-developed superconducting quantum hardware.
Since each path aims to capture the interaction between the starting gate and its neighboring gates, we require that, for each path in the table, the qubits of gates $g_i$ are physically connected to the qubits of the starting gate under the device topology. For example, as shown in Figure 5 (a), only qubits $q_{10}$, $q_{15}$, and $q_{13}$ are connected with qubit $q_{12}$. Thus, starting from the gate $RZ_{q_{12}}$, paths can only involve gates that operate on $q_{10}$, $q_{15}$, and $q_{13}$, e.g., $RZ_{q_{12}} \rightarrow CX_{q_{13},q_{14}}$. The table size is, therefore, mainly determined by the complexity of the device topology, not the number of qubits. We also remove redundant paths that lead to an equivalent circuit. For example, a path such as $RX_{q_3} \rightarrow RZ_{q_1}$ is dropped when another path in the table already describes the same circuit features. Figure 5 (b) shows the number of paths for each gate under the brick-like topology, where there are around 22 and 450 paths for the 128-qubit IBM Washington processor with $n_{step} = 1$ and $n_{step} = 2$, respectively.

Expressivity
Various representations have been proposed to extract graph features, such as graph kernel [44], graph sampling [83], spectrum analysis [28], and random walk [47]. Among them, random walk is an effective approach for sparsely-connected graphs, e.g., recommendation systems [36] and knowledge inference [47]. Sparsity is also a prominent feature of quantum circuits. However, unlike prior random walk-based methods, which capture the similarity between different nodes (e.g., finding similar preferences of two customers) by comparing the paths of each node, our vectorization is expressive enough to preserve the surrounding information of the target gate for circuit reconstruction. Specifically, the contextual features are recorded in the parameter $g_i$ of Equation 3, including gate types and operated qubits. The topological features are preserved as the relations in the path. By looking up the nonzero elements of the gate vector in the path table, we can identify the paths related to the target gate and partially reconstruct the circuit. The target gate is at the head of the path. Layers of neighboring gates can be retraced according to their relations with their previous gates. Figure 4 presents an example of reconstruction. According to the gate vector and the path table, the 1st, 2nd, 4th, and 5th paths in the table are identified. The first path contains only the starting gate. Based on the relations in these paths, we can redraw the sub-circuit within the given number of steps. The ability to reconstruct circuits benefits downstream models that involve circuit generation from the gate vector, such as unitary decomposition.
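The reconstruction step can be sketched as follows, assuming paths are stored as flat tuples `(gate, relation, gate, ...)` and that layers are expressed as offsets relative to the target gate (former is one layer earlier, parallel is the same layer, next is one layer later). The path table, gate tuples, and vector values are hypothetical and do not reproduce Figure 4.

```python
LAYER_SHIFT = {"former": -1, "parallel": 0, "next": 1}

def reconstruct(vector, path_table):
    """Map layer offsets (relative to the target gate) to the gates found there."""
    layers = {}
    for value, path in zip(vector, path_table):
        if value == 0:
            continue  # this table path was not observed for the target gate
        offset = 0
        layers.setdefault(offset, set()).add(path[0])  # target gate heads the path
        for relation, gate in zip(path[1::2], path[2::2]):
            offset += LAYER_SHIFT[relation]
            layers.setdefault(offset, set()).add(gate)
    return dict(sorted(layers.items()))

# Hypothetical gates, path table, and gate vector.
CX32, RZ1, RX2 = ("CX", (3, 2)), ("RZ", (1,)), ("RX", (2,))
path_table = [
    (CX32,),
    (CX32, "parallel", RZ1),
    (CX32, "next", RX2),
    (CX32, "parallel", RZ1, "next", RX2),
]
vector = [1.0, 0.5, 0.0, 0.25]
sub_circuit = reconstruct(vector, path_table)
# sub_circuit == {0: {CX32, RZ1}, 1: {RX2}}
```

Only gates reachable through sampled paths appear, so the result is a partial view of the circuit around the target gate rather than the full circuit.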

DOWNSTREAM MODEL 1: CIRCUIT FIDELITY PREDICTION AND OPTIMIZATION
This section presents the methodology for modeling and optimizing circuit fidelity using our vectorization technique.

Fidelity Prediction
QuCT revises the prediction in Equation 2 by formulating the error $E(g_i)$ of each gate as the dot-product between its vector $v_i$ and a weight vector $W$:
$$E(g_i) = W^{\top} v_i, \qquad F_{circuit} = \prod_{g_i} \left(1 - E(g_i)\right) \prod_{q_i} m_{q_i} \tag{4}$$
The weight vector $W$ is trained by the stochastic gradient descent algorithm [39] based on a fidelity dataset. This dataset is hardware-dependent. It consists of the ground-truth circuit fidelities obtained by executing a set of randomized circuits on the target quantum device. The ground-truth fidelity of each circuit is labeled via the Hellinger fidelity function [51] as follows:
$$F = \left( \sum_{x} \sqrt{P_{meas}(x) \, P_{ideal}(x)} \right)^2$$
where $P_{meas}$ and $P_{ideal}$ are the measured and ideal (noise-free) distributions, respectively.

Figure 6 (annotations): apply different grouping schemes; paths still walk across different sub-circuits; sub-circuits are trained separately by Equation 4.
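The labeling step can be illustrated with a short Python sketch, assuming the common definition of Hellinger fidelity as the squared sum of the square roots of the products of the two distributions. The function name and the example counts are hypothetical.

```python
from math import sqrt

def hellinger_fidelity(measured_counts, ideal_probs):
    """Squared Bhattacharyya coefficient between measured and ideal distributions."""
    shots = sum(measured_counts.values())
    outcomes = set(measured_counts) | set(ideal_probs)
    overlap = sum(
        sqrt(measured_counts.get(x, 0) / shots * ideal_probs.get(x, 0.0))
        for x in outcomes
    )
    return overlap ** 2

# Hypothetical Bell-state experiment with some leakage into wrong outcomes.
ideal = {"00": 0.5, "11": 0.5}
counts = {"00": 480, "11": 470, "01": 30, "10": 20}
label = hellinger_fidelity(counts, ideal)
```

Outcomes that appear only in the measured counts contribute nothing to the overlap, so any probability mass on wrong bitstrings lowers the label.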
Here, we briefly introduce the reason for choosing the dot-product as the equation to predict the gate fidelity. According to [40], the error of a gate is mathematically equal to the sum of the traces of the Kraus operators, where each Kraus operator formulates the evolution caused by a source of noise. This suggests that the dot-product is consistent with mathematical intuition. In other words, each weight evaluates the effect of a path, which may correspond to the trace of a source of noise. Specifically, the weight element for the 0-step path models noise from the gate itself, while weights for other paths represent noise from the interaction among gates. A weight element of 0 suggests that the path is not related to a source of error. We also tried other methods, such as machine learning and deep learning. They show limited improvement yet are accompanied by a disproportionately high computational complexity.
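A minimal sketch of the prediction, assuming each gate error is the dot-product of its vector with the weight vector and the circuit fidelity is the product of one minus each gate error, multiplied by the measurement fidelities. The function names, weights, vectors, and measurement fidelities below are hypothetical; training the weights is not shown.

```python
def gate_error(vector, weights):
    """Dot-product of a gate vector with the weight vector."""
    return sum(v * w for v, w in zip(vector, weights))

def predict_fidelity(gate_vectors, weights, meas_fidelities):
    """Multiply (1 - error) over gates, then the measurement fidelities."""
    fidelity = 1.0
    for vector in gate_vectors:
        fidelity *= 1.0 - gate_error(vector, weights)
    for m in meas_fidelities:
        fidelity *= m
    return fidelity

# Hypothetical weights: element 0 is the gate's own noise (0-step path),
# elements 1 and 2 are interaction terms for two longer paths.
weights = [0.001, 0.004, 0.0]
gate_vectors = [
    [1.0, 1.0, 0.0],  # gate whose neighborhood matches the noisy path
    [1.0, 0.0, 1.0],  # gate whose neighborhood matches a harmless path
]
prediction = predict_fidelity(gate_vectors, weights, meas_fidelities=[0.98, 0.99])
```

With a zero weight on a path, that path leaves the prediction unchanged, which is what makes individual weights readable as noise sources.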
In the fidelity dataset construction, we allow more randomness when generating circuits to ensure the generality of the model. Specifically, given the hardware constraints of basic gates and topology, we generate circuits with different numbers of gates and proportions of two-qubit gates by randomly inserting gates. Compared to the circuits in RB [42] (only Clifford gates) and XEB [3] (only repeated blocks), our fidelity dataset can cover more complex interactions between gates. In QuCT, our dataset comprises 2000 circuits for each device with 5 qubits or 18 qubits, with the circuit depth ranging from 5 to 160. Generating the dataset takes around 1.7 hours per device (including 20.0 minutes of QPU access time).
We also allow generating separable circuits for processors with hundreds of qubits in the fidelity dataset construction. Circuits executed on these processors usually have a large number of gates, making the final fidelity vanish to zero. To address this, the separable circuits used in the fidelity dataset restrict the entangled qubits to sub-circuits with a small number of qubits. For example, in Figure 6, the circuit can be partitioned into two independent sub-circuits that execute simultaneously on the target hardware. Each sub-circuit has a relatively small number of gates, leading to a higher fidelity. We label the fidelity of each sub-circuit and use it for training. Note that the paths of gate vector $v_i$ still walk across different sub-circuits, thereby capturing the interactions of the entire circuit. To improve generality and accuracy, we apply breadth-first search to generate different grouping schemes. Since qubits are sparsely connected in real-world quantum hardware, the fidelity dataset is sufficient to cover all grouping schemes. For example, there are 236 grouping schemes under the 128-qubit IBM Washington device topology, while the fidelity dataset contains thousands of circuits.

Fidelity Optimization
As QuCT provides fine-grained gate fidelity prediction and interpretable weight, it allows various optimization techniques to improve the circuit fidelity.
Compilation-level optimization. A typical compilation flow includes routing and scheduling. The routing pass transforms the circuit to satisfy the processor topology. Specifically, it inserts SWAP gates to change the qubit mapping, ensuring that all two-qubit gates can be implemented by the couplers of the processor. By precisely predicting the fidelity, QuCT can be integrated with existing compilers [12,61,87] to find the routing solution with the best fidelity. For instance, the recent SATMAP [55] compiler uses a MAX-SAT solver to find the routing solution that minimizes the number of gates, which leads to different output circuits due to heuristic search. By extending SATMAP with our prediction model, we can guide the compilation to select the output circuit with maximum fidelity (abbreviated as QuCT_route_opt). Scheduling means adjusting the layer of gates to improve the fidelity while respecting the execution dependencies. For example, in Figure 7 (a), the RY gate operating on $q_1$ can be moved to any of the following three layers while the functionality of the circuit remains the same. By analyzing the circuit fidelity under each scheduling choice, QuCT helps to find the gate allocation with the highest predicted fidelity (abbreviated as QuCT_sched_opt).
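The scheduling choice can be sketched with a toy predictor. This is an illustrative simplification and not the QuCT predictor: gate vectors are replaced by a hypothetical rule in which a gate's error is its own base error plus a penalty for each gate type running in the same layer. All gate tuples, error values, and function names are assumptions.

```python
def predicted_fidelity(layers, base_error, parallel_weight):
    """Toy predictor: gate error = own error + penalties from parallel gates."""
    fidelity = 1.0
    for layer in layers:
        for gate in layer:
            error = base_error[gate[0]]
            for other in layer:
                if other != gate:
                    error += parallel_weight.get((gate[0], other[0]), 0.0)
            fidelity *= 1.0 - error
    return fidelity

def best_schedule(fixed_layers, movable_gate, allowed_layers,
                  base_error, parallel_weight):
    """Try the movable gate in each allowed layer; keep the best prediction."""
    best_layer, best_fidelity = None, -1.0
    for idx in allowed_layers:
        candidate = [list(layer) for layer in fixed_layers]
        candidate[idx].append(movable_gate)
        fidelity = predicted_fidelity(candidate, base_error, parallel_weight)
        if fidelity > best_fidelity:
            best_layer, best_fidelity = idx, fidelity
    return best_layer, best_fidelity

# Hypothetical circuit: an RY gate on qubit 1 may run in any of three layers.
base_error = {"CX": 0.01, "RX": 0.001, "RY": 0.001, "RZ": 0.001}
parallel_weight = {("CX", "RY"): 0.004}  # RY next to CX raises the CX error
fixed_layers = [[("CX", (2, 3))], [("RZ", (2,))], [("RX", (3,))]]
best_layer, best_fidelity = best_schedule(
    fixed_layers, ("RY", (1,)), [0, 1, 2], base_error, parallel_weight)
```

In this toy setting the search avoids layer 0, where the RY gate would run in parallel with the CX gate and trigger the interaction penalty.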
Calibration-level optimization. Calibration tries to locate the error in the circuit and tune the amplitudes and phases of pulses to improve the fidelity [38,41]. By setting the decay parameter to 1 during vectorization, QuCT provides an interpretable weight (Equation 4). As there is a one-to-one correspondence between weights and paths, critical sources of noise can be efficiently located by identifying the paths with large weights. For example, calibrating crosstalk is a necessary step to mitigate error at the pulse level [45], which requires $O(n^2)$ executions on quantum hardware [57]. Using QuCT, we can easily find such crosstalk by identifying large weights of 1-step paths. Figure 7 (b) provides an example. In the table, according to the value of each weight element, the RX gate on $q_1$ increases the error of its parallel CX gate on $q_2$ and $q_3$ by 0.4%. As a 0.4% fidelity reduction is significant compared to the average gate error (∼0.1%), this suggests high crosstalk that requires more attention.
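Locating crosstalk from the trained weights can be sketched as a filter over the path table, assuming paths are stored as flat tuples `(gate, relation, gate, ...)` and the decay parameter is 1 so that each weight reads directly as an error contribution. The path table, weights, and threshold are hypothetical.

```python
def crosstalk_candidates(weights, path_table, threshold):
    """Return 1-step parallel paths whose weight reaches the threshold, largest first."""
    hits = []
    for weight, path in zip(weights, path_table):
        is_one_step_parallel = len(path) == 3 and path[1] == "parallel"
        if is_one_step_parallel and weight >= threshold:
            hits.append((path, weight))
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical path table for a CX gate on qubits 2 and 3.
CX23, RX1, RZ2 = ("CX", (2, 3)), ("RX", (1,)), ("RZ", (2,))
path_table = [
    (CX23,),
    (CX23, "parallel", RX1),
    (CX23, "next", RZ2),
    (CX23, "parallel", RX1, "next", RZ2),
]
weights = [0.001, 0.004, 0.0002, 0.0]
suspects = crosstalk_candidates(weights, path_table, threshold=0.002)
# suspects == [((CX23, "parallel", RX1), 0.004)]
```

The filter reads weights that are already trained, so it needs no additional executions on the hardware.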

DOWNSTREAM MODEL 2: UNITARY DECOMPOSITION
The second application of our vectorization model is unitary decomposition. We integrate our vectorization method with QFAST [87], which is a state-of-the-art method for achieving the minimum number of decomposed gates. It adopts an A* recursive algorithm, which iteratively approximates the target unitary by inserting unitary gates until the matrix distance is within the threshold. As shown in the left part of Figure 8, a typical QFAST iteration consists of four steps:
(a) For the current circuit (e.g., a 4-qubit circuit into which two unitary gates have been inserted), enumerate as candidates all possible small unitary gates (e.g., 2-qubit and 3-qubit unitary gates) that have fewer qubits than the current circuit.
(b) Insert these candidate gates at the end of the current circuit, search their parameters, and calculate the updated matrix distance to the target unitary.
(c) Select the updated circuit with the minimum distance and check whether the distance is less than the threshold. If not, return to step (a).
(d) If the distance is within the threshold, check whether all unitary gates are in the basic gate set. If not, decompose them following steps (a) to (c).
QFAST suffers from a long decomposition time due to exhaustive searching among various candidates. On the one hand, each unitary gate has numerous parameters to search before calculating the distance. The number of parameters increases exponentially with the number of qubits; it takes more than 30 minutes to search among 10 candidate gates for an 8-qubit circuit in one iteration. On the other hand, the search process also requires decomposing all unitary gates into basic gates, which dramatically increases the number of iterations.
The right part of Figure 8 illustrates how QuCT accelerates the decomposition process. Instead of exhaustively enumerating a large number of candidate gates, our approach considers a gate vector as a search candidate. The gate vector serves as an intermediate representation between the unitary and the circuit. By identifying the vectors that may be involved in the circuit of the target unitary, we can prune the candidate space. More importantly, we can easily reconstruct the circuit with basic gates according to the paths recorded in the gate vector, eliminating the additional overhead of decomposing into basic gates.

Unitary-to-Vector Model
Since each gate vector implies the features of a sub-circuit (see Section 3.3), the purpose of the U2V model is to find the candidate vectors that tend to be part of the resulting circuit of the target unitary. The U2V model serves as the bridge between unitaries and gate vectors, where the sub-circuits reconstructed from these candidate vectors replace the search space of QFAST. To build such a model, we obtain a U2V dataset composed of <$U$, {$v$}> pairs (a unitary and its set of gate vectors), derived from a set of random circuits generated with the same scheme mentioned in Section 4.1. To obtain high-quality decomposition results, these circuits are optimized using the Qiskit transpiler [2] to minimize the number of gates. We then run our upstream vectorization model to obtain the vectors and calculate the unitaries of these circuits. Note that the circuit is not necessarily optimal because the U2V model aims to capture the potential gate vectors of the target unitary rather than the entire circuit. In other words, by combining these vectors, we might obtain an alternative circuit with a more compact representation. Based on this dataset, we train a random forest model [7] with $T$ decision trees. Given a unitary as input, each tree predicts the gate vector that shows the maximum probability of appearing in this unitary.
As the data size and the unitary input size can be very large, we develop two pruning strategies to reduce the search space and accelerate the overall decomposition process. First, considering that the decomposition is an iterative insertion process starting from the beginning of the circuit, we only choose the vectors of the gates in the first layer when generating unitary-vector pairs. Second, to accelerate the inference of the U2V model, we apply eigen-decomposition [23] to project the original unitary onto a low-dimensional matrix, where the eigen matrix is determined by selecting the most significant eigenvectors of the unitaries in the U2V dataset.
Our vector-based approach is different from the template-based approach, which selects candidates from a limited-size template library. The vector-based approach searches for the circuit by learning the mapping from unitary to path features via the U2V model, resulting in a high-quality solution. In contrast, the template-based approach has a smaller design space due to its coarse-grained construction of circuits, which leads to more gates and a longer search time.

QuCT Decomposition Flow
For the current circuit and its unitary $U_c$ in each iteration, the objective is to find the rest of the circuit. To approximate the target unitary $U_t$, the input unitary $U_{u2v\_in}$ of the U2V model in this iteration equals the unitary of the rest of the circuit, which should satisfy the following equation:

$$U_{u2v\_in} \, U_c \approx U_t.$$

Note that $U_{u2v\_in}$ is on the left side of $U_c$. Thus, the input of the U2V model can be calculated as follows:

$$U_{u2v\_in} = U_t \, U_c^{-1} = U_t \, U_c^{\dagger}.$$

The U2V model then outputs gate vectors that may be involved in the circuit. We reconstruct the circuits from these vectors, which serve as the search candidates. Mathematically, the candidate space of QFAST includes unitaries with all combinations of qubits, where the size of the space is $\sum_{k=2}^{n_q-1} \binom{n_q}{k}$ for $n_q$ qubits. For example, when $n_q = 8$, the candidate space of QFAST is 246. The candidate space of our method equals the number of trees ($n_t$) in the model. Empirically, $n_t = 2 n_q$ is adequate to find the appropriate candidate, which leads to a 15.4× space reduction when $n_q = 8$.
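The two quantities in this paragraph can be checked with a small worked example. It assumes that the remaining unitary multiplies the current unitary from the left, and that the QFAST candidate space counts qubit subsets of size 2 up to one less than the number of qubits; the latter is an assumption that reproduces the quoted 246 for 8 qubits. The function names are hypothetical.

```python
import math
import numpy as np


def u2v_input(u_target, u_current):
    """Unitary of the remaining circuit: U_rest = U_target @ U_current^dagger."""
    return u_target @ u_current.conj().T


def qfast_candidate_space(n_qubits):
    """Number of qubit subsets of size 2 .. n_qubits - 1 (assumed formula)."""
    return sum(math.comb(n_qubits, k) for k in range(2, n_qubits))


# Single-qubit check: if the circuit so far is H and the target is X,
# the remaining unitary must satisfy U_rest @ H = X.
h = np.array([[1, 1], [1, -1]]) / math.sqrt(2)
x = np.array([[0, 1], [1, 0]])
u_rest = u2v_input(x, h)

n_q = 8
qfast_size = qfast_candidate_space(n_q)   # 246
quct_size = 2 * n_q                       # 16 trees, one candidate per tree
reduction = qfast_size / quct_size        # about 15.4
```

With 8 qubits the ratio 246 / 16 matches the 15.4× reduction stated in the text.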

EVALUATION

Methodology
Quantum hardware. We use two superconducting quantum devices to evaluate our fidelity prediction model: (a) a custom device with 5 Xmon qubits in a chain topology; (b) a custom device with 18 Xmon qubits arranged in a 6×3 grid topology. Both devices use RX, RY, RZ, and CZ gates as basis gates, with gate times of 30 ns and 60 ns for single-qubit and two-qubit gates, respectively. The single-qubit and two-qubit gate fidelities of each device are benchmarked by isolated RB [42]. For simultaneous RB [24], the single-qubit and two-qubit fidelities of both devices are above 99% and 98%, respectively.

Table 3: 10 benchmarks used in the experiments.

Quantum simulator. To demonstrate the scalability of QuCT, we design 7 simulators. We perform simulations on the Qiskit Aer QASM simulator (version 0.39.0) using 50-, 100-, 150-, 200-, 250-, 300-, and 350-qubit circuits. To increase efficiency, the simulation is based on the grouping scheme in Section 4.1, which avoids entanglement across the groups. The error of the gate itself is modeled as bit flip, phase flip, and depolarization. The error from the interaction between gates is modeled by applying an RX operator with a random angle in [−π/20, π/20] to a 1-step path. In other words, if a 1-step path is injected with noise, an RX operator is added to both gates of the path. Under these settings, the fidelity of these 7 simulators is 99.88%–99.97% for single-qubit gates and 99.21%–99.68% for two-qubit gates, benchmarked by isolated RB.
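The interaction noise described for the simulators can be sketched as follows. The sketch assumes an angle bound of π/20 and a uniform distribution over the interval; the distribution is not stated in the text, and `sample_path_noise` is a hypothetical helper.

```python
import math
import random
import numpy as np


def rx(theta):
    """Matrix of the single-qubit RX rotation by angle theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def sample_path_noise(rng, max_angle=math.pi / 20):
    """Draw one RX operator with a uniformly random angle in [-max_angle, max_angle]."""
    theta = rng.uniform(-max_angle, max_angle)
    return theta, rx(theta)


rng = random.Random(7)
theta, noise = sample_path_noise(rng)
```

The resulting matrix is a small coherent over-rotation that would be appended to both gates of a noisy 1-step path.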
QuCT model. Table 2 shows the detailed configuration of QuCT models. The parameters of the upstream model include the number of qubits and the number of steps of random walks. The decay parameter (see Figure 4) is set to 0.4, a choice that is evaluated in the following sections. The upstream model has 8 configurations; the first five are used for fidelity prediction and optimization, and the other three are used for unitary decomposition. Config-0 to config-3 are evaluated on real-world hardware. Config-4 is the configuration for the 350-qubit simulator. For simplicity, we do not list the configurations for the other six simulators. We write a Python program to implement random walks in our upstream model. In our fidelity prediction model, we adopt the Adam optimizer [39] to implement stochastic gradient descent. Before training the prediction model, we set the learning rate to 0.01, the batch size to 100, the split ratio to 0.8, and the maximum number of epochs to 100. The training stops when reaching the maximum number of epochs. We use Scikit-learn [62] to implement the random forest in our unitary decomposition model. The unitary is reduced to the matrices with the top-10 eigenvalues when generating the U2V model.

Dataset. The fidelity dataset is a hardware-dependent dataset, which is built by collecting the real results on the target quantum device. The circuits are generated by the method introduced in Section 4.1, where the maximum size of the groups is 5. We randomly divide the dataset into training and testing sets. For each device, both the training and testing datasets contain results of 2000 circuits. The circuit depth ranges from 5 to 100. Each circuit is sampled 2000 times on the target device. For the config-3 model, we also evaluate it using the benchmarks from Table 3. In the unitary decomposition, we set the number of steps to 4 so that the gate vector can be reconstructed to a larger sub-circuit.
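The training settings quoted above (learning rate 0.01, batch size 100, split ratio 0.8, 100 epochs, Adam) can be exercised on a toy problem. The sketch below fits a plain linear regression as a stand-in; it is not the fidelity prediction model itself, and the synthetic data, the Adam constants other than the learning rate, and the function names are assumptions.

```python
import numpy as np

LEARNING_RATE, BATCH_SIZE, SPLIT_RATIO, MAX_EPOCHS = 0.01, 100, 0.8, 100


def train_linear_model(features, targets, seed=0):
    """Fit targets ~ features @ w with mini-batch Adam; return (w, train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    cut = int(SPLIT_RATIO * len(features))
    train_idx, test_idx = order[:cut], order[cut:]

    w = np.zeros(features.shape[1])
    m, v = np.zeros_like(w), np.zeros_like(w)
    beta1, beta2, eps, step = 0.9, 0.999, 1e-8, 0
    for _ in range(MAX_EPOCHS):                  # stop only at the epoch limit
        shuffled = rng.permutation(train_idx)
        for start in range(0, len(shuffled), BATCH_SIZE):
            batch = shuffled[start:start + BATCH_SIZE]
            residual = features[batch] @ w - targets[batch]
            grad = 2 * features[batch].T @ residual / len(batch)
            step += 1
            m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
            v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
            m_hat = m / (1 - beta1 ** step)              # bias correction
            v_hat = v / (1 - beta2 ** step)
            w -= LEARNING_RATE * m_hat / (np.sqrt(v_hat) + eps)
    return w, train_idx, test_idx


def mse(features, targets, w):
    """Mean squared error of the linear model on the given samples."""
    return float(np.mean((features @ w - targets) ** 2))


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))
true_w = rng.uniform(0.0, 0.5, size=6)
y = X @ true_w
w, train_idx, test_idx = train_linear_model(X, y)
```

In practice an automatic-differentiation framework would replace the hand-written gradient; the explicit update is shown only to make the role of each hyperparameter visible.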

Fidelity Prediction
Figure 9 shows the fidelity predicted by QuCT, the RB-based model [42], and the XEB-based model [3] on the 18-qubit device. The x-axis and y-axis represent the real fidelity and the predicted fidelity, respectively. The color indicates the different duration times of circuits. The prediction inaccuracy is defined as Δfidelity = |real fidelity − predicted fidelity|.
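The inaccuracy metric, together with the average and standard deviation reported alongside it, can be computed as in the following sketch. The fidelity values are made up for illustration, and the use of the population standard deviation is an assumption, since the text does not say which variant is reported.

```python
from statistics import mean, pstdev


def prediction_inaccuracy(real, predicted):
    """Per-circuit |real - predicted| plus its average and standard deviation."""
    deltas = [abs(r - p) for r, p in zip(real, predicted)]
    return deltas, mean(deltas), pstdev(deltas)


# Made-up fidelities for three circuits (illustrative values only).
real = [0.90, 0.75, 0.60]
predicted = [0.95, 0.70, 0.66]
deltas, avg, std = prediction_inaccuracy(real, predicted)
```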
Evaluation on 18-qubit device using randomized circuits. As shown in Figure 9, a point above the diagonal line indicates an overestimation of the fidelity, and a point below indicates an underestimation. Overall, QuCT with config-2 achieves the lowest prediction inaccuracy (5.68%), leading to 2.8×, 4.2×, and 1.8× reductions compared to the RB-based model (16.25%) [42], the XEB-based model (24.37%) [3], and QUEST (10.34%) [82], respectively. Though QUEST tries to model the gate interaction using a graph neural network, it is a coarse-grained approach that is accurate when estimating the fidelity of each individual gate. It also uses a neural network with numerous parameters, which takes a long time to converge during training.
The prediction of QuCT is also more stable, reducing the standard deviation from 10.12% to 4.69%. This improved stability comes from our vectorization, which effectively models the interactions between gates, whereas RB and XEB only consider the noise from individual gates and hence tend to overestimate the fidelity. Config-0 has the highest inaccuracy among all downstream models because it sets the number of steps to 0, which means treating each gate as an individual unit; it is nevertheless more accurate than RB and XEB because it is operation-aware. From config-1 to config-2, there is little accuracy improvement (0.12%). This implies that the interaction between gates mainly happens within two steps (among three gates). Thus, we can speculate that 2 steps are sufficient to extract circuit features. This also matches the theory and empirical observations in many works [32,69,71], as current hardware implementations use sparse signal lines between qubits to enable the interactions. The signal transmitted in these lines decreases exponentially in both the temporal dimension (the duration of the circuit) and the spatial dimension (the length of the signal lines), making the noise local.
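To see why a gate-independent estimate cannot react to gate interactions, consider a simplified baseline in the spirit of the RB-based model that multiplies the benchmarked fidelity of every gate. The product form and the function name `rb_style_fidelity` are assumptions of this sketch and do not reproduce the exact model of [42].

```python
def rb_style_fidelity(gate_counts, gate_fidelity):
    """Multiply the benchmarked fidelity of every gate, ignoring its context."""
    fidelity = 1.0
    for gate_type, count in gate_counts.items():
        fidelity *= gate_fidelity[gate_type] ** count
    return fidelity


# Illustrative numbers: fidelities at the lower bounds quoted for simultaneous RB.
gate_fidelity = {"single": 0.99, "two": 0.98}
estimate = rb_style_fidelity({"single": 40, "two": 20}, gate_fidelity)
```

Two circuits with the same gate counts always receive the same estimate here, regardless of how the gates are arranged, so any extra error caused by neighboring gates is invisible to this kind of model.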
In Figure 9, we can observe that the inaccuracy of RB and XEB increases with the duration of the circuit. To further investigate this trend, Figure 10 (a) illustrates the predicted fidelity under different duration times, which matches the aforementioned observation. Within a 2 μs duration, the maximum inaccuracy of XEB is 27.34%. In contrast, that of QuCT with config-2 is 5.55%, a reduction of 21.79 percentage points.
In Figure 10 (b), we evaluate the different choices of the decay parameter used during vectorization. For both 5-qubit and 18-qubit devices, the best prediction accuracy is achieved when the decay reaches 0.4. A larger decay makes the model pay more attention to longer paths, while these paths result in little noise.
Figure 10 (c) presents the prediction accuracy under device drift across 5 days, where the device is calibrated every two days. The ground-truth fidelity is collected every day and compared with different prediction models. QuCT with config-2 outperforms all other models, achieving a 6.30%–17.78% inaccuracy reduction. To further improve the accuracy, we propose to fine-tune the downstream model using the updated fidelity of 100 and 200 circuits (requiring less than 5 minutes), which shows 8.27% and 20.12% inaccuracy reduction, respectively.
Figure 10 (d) explores the prediction accuracy with different numbers of grouping schemes.For our 18-qubit device with grid topology, there are 59 grouping schemes in total.When applying 10 grouping schemes, the inaccuracy converges to 6.14%.With only one grouping scheme, the inaccuracy is increased to 9.13%.
Evaluation on 5-qubit device. Implementing the benchmarks in Table 3 on the 18-qubit device requires a large number of gates, which could lead to near-zero fidelity. Thus, we deploy them on the 5-qubit device with config-3. The results are shown in Figure 11. On average, QuCT reduces the prediction inaccuracy from 27.52% to 7.73% over 10 benchmarks. The prediction from the RB-based method is more inaccurate on these benchmarks than on the 18-qubit device with randomized circuits. One reason for this phenomenon is that these benchmarks exhibit a higher proportion of two-qubit gates, resulting in more than 30% inaccuracy. For example, RB is 39.47% and 43.70% inaccurate on the QGAN and QSVM benchmarks, as they have 36.85% and 37.32% two-qubit gates, respectively. Another reason is that these benchmarks are computationally expensive, which reduces the prediction accuracy as analyzed before. For example, QFT (7.3 μs) and QEC (6.7 μs) are the top-2 circuits with the longest duration, making the inaccuracy reach nearly 40%. In contrast, QuCT shows less than 10% inaccuracy in 7 out of 10 benchmarks. The relatively large inaccuracy of QuCT occurs in the QGAN and QFT benchmarks because their qubit measurement is associated with a uniform probability distribution, leading to insensitivity to noise when calculating fidelity.
Evaluation on 50-qubit to 350-qubit simulators using randomized circuits. To demonstrate the scalability, we evaluate QuCT on multiple simulators with different numbers of qubits, as shown in Figure 12 (a). Compared to the RB-based method, QuCT shows a 4.3× inaccuracy reduction with a much lower standard deviation (13.72% of that of RB). The prediction on the 350-qubit simulator (config-4) is more accurate than the prediction on the real-world quantum device (config-1 in Figure 9), although both configurations set the number of steps to 1. This may be because the real-world device is affected by additional complex noise, which may stem from interactions with the environment and defects of the classical hardware and is hard to model by QuCT.
We also test the robustness of QuCT with different numbers of injected noises, as shown in Figure 12 (b). The inaccuracy of the RB-based method increases linearly with the number of injected noises. However, QuCT shows only a small drop in prediction accuracy for circuits with more noise. When the number of injected noises increases to 1K, the RB-based method fails to predict the fidelity (53.14% inaccuracy). By contrast, the inaccuracy of QuCT is only 7.61%, a 7.0× reduction. The noise simulated by injecting random RX gates into 1-step paths represents the gate interactions. The strength of QuCT, therefore, lies in its ability to model this complex noise.

Fidelity Optimization
Compilation-level optimization. We integrate our fidelity prediction model with the SATMAP compiler [55] to optimize the fidelity during circuit routing (the routing optimization of QuCT). For gate scheduling, we compare the scheduling optimization of QuCT to the technique proposed by Ding et al. [19], which applies a crosstalk-aware scheduling scheme. The comparison baseline is the default routing and scheduling strategy of Qiskit with optimization level 3. To quantitatively evaluate different optimizations, we define the error reduction as:

$$\text{Error reduction} = \frac{\text{Error of (Qiskit\_route + Qiskit\_sched)}}{\text{Error after optimization}}$$

Figure 13 (a) presents the comparison of different routing and scheduling schemes for fidelity optimization. Compared to the Qiskit baseline, QuCT shows an average error reduction of 5.0× (see case 6). We also compare to the combination of SATMAP [55] and Ding et al. [19] (see case 3 and case 6), where QuCT outperforms on the 10 benchmarks and achieves a 2.5× improvement in error reduction.
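The error-reduction ratio can be sketched as below, under the assumption that the error of a circuit is one minus its fidelity; the text does not spell out the error definition, and the fidelity values are illustrative only.

```python
def error_reduction(baseline_fidelity, optimized_fidelity):
    """Ratio of baseline error to optimized error, with error = 1 - fidelity."""
    baseline_error = 1.0 - baseline_fidelity
    optimized_error = 1.0 - optimized_fidelity
    if optimized_error <= 0.0:
        raise ValueError("optimized circuit has no measurable error")
    return baseline_error / optimized_error


# Illustrative values: a baseline fidelity of 0.50 and an optimized fidelity of 0.90.
ratio = error_reduction(0.50, 0.90)
```

A ratio above 1 means the optimization lowered the error relative to the Qiskit baseline, while a ratio below 1 (such as the 0.8× reported for one scheduling technique on BV) means it made the circuit worse.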
The routing optimization of QuCT improves error reduction from 1.2× to 1.4× compared to SATMAP [55] (case 1 and case 4). The reductions are marginal because the optimization space of the 5-qubit circuits is relatively small, where the default routing scheme of Qiskit [48] also works well. SATMAP aims to minimize the number of gates. Although a small number of gates is usually associated with a high fidelity, SATMAP still lacks a model to quantitatively analyze the fidelity.
As for the scheduling-level optimization, QuCT shows a 1.7× improvement compared to [19] (see case 2 and case 5). The technique in [19] mainly aims to mitigate the noise from crosstalk. In contrast, instead of modeling a certain type of noise, QuCT optimizes the circuit in a global view by finding the allocation of each gate with maximum fidelity. The technique in [19] provides higher error reduction in the benchmarks that have a large number of two-qubit gates, such as QGAN, QSVM, and QEC, since they involve higher crosstalk noise. For other benchmarks, however, it even fails to outperform Qiskit. For example, [19] has a negative effect (0.8× reduction) on the BV benchmark, whereas QuCT still provides a 2.4× reduction (case 5).
Calibration-level optimization. As mentioned in Section 4.2, we can locate the critical paths that involve noise by identifying the weights with large values in Equation 4, and improve the fidelity by calibrating these noisy paths. We evaluate the effectiveness of this technique with config-4, where 1,750 of the 99,176 paths in total are injected with noise. In Figure 13 (b), paths are sorted according to their corresponding weight element. QuCT finds 93.0% of the noise-injected paths (1,627 paths) within the top 4.7% of paths in the path table. By calibrating these detected paths, QuCT leads to a longer qubit coherence time with a 29.16% fidelity improvement, as shown in Figure 13 (c).
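The detection rate reported here is a recall over the highest-weight paths. A minimal sketch of that computation follows, using a toy path table with made-up weights; the function name `noisy_path_recall` and the dictionary layout are assumptions.

```python
def noisy_path_recall(weights, noisy_paths, top_fraction):
    """Fraction of known noisy paths found among the highest-weight paths."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    top_k = max(1, round(len(ranked) * top_fraction))
    flagged = set(ranked[:top_k])
    return len(flagged & noisy_paths) / len(noisy_paths)


# Toy path table: ids map to learned weights; paths 0-2 carry injected noise.
weights = {0: 0.91, 1: 0.85, 2: 0.10, 3: 0.40, 4: 0.05,
           5: 0.02, 6: 0.01, 7: 0.03, 8: 0.04, 9: 0.06}
noisy = {0, 1, 2}
recall = noisy_path_recall(weights, noisy, top_fraction=0.3)

# The figure quoted in the text: 1627 of 1750 injected paths were found.
reported_recall = 1627 / 1750
```

In the toy table the top three paths by weight are 0, 1, and 3, so two of the three noisy paths are recovered; the reported experiment corresponds to a recall of roughly 0.93.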

Unitary Decomposition
Config-5, config-6, and config-7 are models for 4-qubit, 5-qubit, and 8-qubit unitary decomposition, respectively. Both training and decomposition are performed on a server with two AMD EPYC 64-core CPUs. The test data for each model are 110 unitaries: 100 randomly generated unitaries and 10 unitaries of the benchmarks in Table 3. The threshold is set to 0.01, which means the decomposition is completed when the distance between the target unitary and the current unitary of the circuit is within 0.01. To make a fair comparison, all programs can only use a single thread for decomposition. Table 4 summarizes the decomposition results, including the number of gates, the depth of the circuit, and the time of the decomposition process. For approaches that take more than three weeks, we terminate the decomposition process and give a rough estimate of the required time.
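The stopping criterion can be sketched with a concrete distance function. The text does not name the metric, so the sketch assumes a phase-insensitive Hilbert-Schmidt-style distance, which is a common choice in synthesis tools; `hs_distance` and `is_decomposed` are hypothetical names.

```python
import numpy as np


def hs_distance(u_target, u_current):
    """Phase-insensitive distance: 0 for equal unitaries, up to 1 otherwise."""
    dim = u_target.shape[0]
    overlap = abs(np.trace(u_target.conj().T @ u_current)) / dim
    return float(np.sqrt(max(0.0, 1.0 - overlap ** 2)))


def is_decomposed(u_target, u_current, threshold=0.01):
    """Stop criterion: the current circuit is close enough to the target."""
    return hs_distance(u_target, u_current) <= threshold


cz = np.diag([1, 1, 1, -1]).astype(complex)
same_up_to_phase = np.exp(0.3j) * cz       # differs only by a global phase
identity = np.eye(4, dtype=complex)
```

With this metric a circuit that matches the target up to a global phase counts as a completed decomposition, while the identity is far from CZ and would keep the search running.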
Evaluation of time cost. For 4-qubit and 5-qubit random unitaries, QuCT achieves 4.6× and 46.3× speedup, respectively. For benchmark unitaries, the speedup is 2.2× and 5.8×, which is slightly lower since the benchmark unitaries are less complicated than random unitaries. Compared to Squander [67], which aims to minimize the number of CNOT gates, QuCT achieves 8.1× and 55.6× speedup for decomposing 4-qubit and 5-qubit random unitaries. Similar to QFAST, the long decomposition time of Squander mainly comes from the fact that it applies sequential optimization of gate parameters and inserts fewer gates in each iteration of the search. When the number of qubits goes to 8, QFAST and Squander may require several months or years to find the decomposition solution. QuCT successfully decomposes all 8-qubit random unitaries and benchmark unitaries in 144.4 hours and 26.1 hours, respectively. Our speedup increases significantly with the number of qubits because the search space of QFAST increases exponentially with the number of qubits. Our approach can effectively prune the search space by identifying suitable gate vectors as candidates.
Evaluation of the number of gates. Compared to mathematical methods like CCD [34] and QSD [72], which are faster than QuCT, the decomposition solution of QuCT requires fewer gates, resulting in higher reliability of circuits. For example, for 8-qubit benchmark unitaries, QuCT reduces the average number of gates from 8.9 × 10^4 to 3,392.2 compared to QSD. Compared to QFAST, QuCT also shows some improvement in the quality of the resulting circuit, e.g., 1.3× gate reduction on 5-qubit random unitaries. This improvement is mainly attributed to the reconstruction of the candidate vector. According to Figure 8, the paths of the gate vector already contain information on how to use basis gates to construct the circuit. However, QFAST has to apply a recursive approach to decompose unitary gates into basis gates, which may fall into a local optimum. We also observe that Squander [67] outperforms QFAST in decomposing 5-qubit random unitaries, but it still requires 1.2× as many gates as QuCT. This is because Squander only updates part of the gate parameters in each iteration, which may lead to a local optimum.
Decomposition time breakdown.As mentioned in Section 5, the speedup of QuCT is achieved by the following optimization processes.
• Opt1: the total number of search iterations is reduced for two reasons. First, our U2V model helps to identify the gate vectors that share similarities with the target unitary. Second, each gate vector can directly construct the circuit by basis gates.
• Opt2: in each iteration, the number of candidates is reduced thanks to the U2V model.

We observe that increasing the dataset size can reduce the decomposition time as a larger dataset helps to capture more similarity between unitaries and vectors. When the dataset size reaches 2×10^4, the reduction of the decomposition time is not obvious. This may be because this data size is sufficient to extract the features of 5-qubit circuits. The training time mainly consists of the time to build the dataset and the random forest model. Both processes have linear time complexity with respect to the dataset size. Thus, the overall training time is linear in the dataset size. Note that the decomposition time is significantly greater than the training time (less than 250 seconds for the 5-qubit model).
Further acceleration using multi-threading. As different candidates can be searched in parallel, we can leverage multi-threading to further accelerate the decomposition. Based on our test, the sweet spots for 5-qubit and 8-qubit unitary decomposition are 10 and 16 candidate vectors, respectively. Figure 14 (c) presents the decomposition time when both QuCT and QFAST [87] apply multi-threading. As a result, for 5-qubit random unitaries, QuCT with multi-threading reduces the decomposition time from 9.2 hours to 2.9 hours and shows 20.9× speedup compared to QFAST. For 8-qubit random unitaries, QuCT reduces the decomposition time from 144.4 hours to 8.3 hours. Although QFAST can also benefit from multi-threading, it would require more than one month to decompose an 8-qubit unitary.
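The parallel candidate search can be sketched with Python's standard `concurrent.futures` module. The per-candidate evaluation is a placeholder (`toy_distance`), standing in for the parameter search that would be run on each reconstructed sub-circuit; both it and `search_candidates_in_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def search_candidates_in_parallel(candidates, evaluate, max_workers):
    """Evaluate every candidate concurrently and return the best (lowest) one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        distances = list(pool.map(evaluate, candidates))
    best = min(range(len(candidates)), key=distances.__getitem__)
    return candidates[best], distances[best]


def toy_distance(candidate):
    """Placeholder for the per-candidate parameter search (hypothetical)."""
    return abs(candidate - 0.42)


candidates = [i / 10 for i in range(10)]          # 10 candidates, as for 5 qubits
best, distance = search_candidates_in_parallel(candidates, toy_distance, max_workers=10)
```

In CPython, a purely CPU-bound evaluation would typically need a process pool or native code that releases the global interpreter lock to see a real speedup; the thread pool is used here only to keep the sketch short.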

RELATED WORK
Fidelity modeling and optimization. Errors of NISQ devices can be modeled by mathematical equations [25,30] or profiled by various experiments [3,10,42,64–66]. They consider the error of an operation or a qubit as a single value. Additional experiments are required to characterize certain types of errors caused by gate interactions, such as crosstalk [57] and pulse distortion [69], or to benchmark the error of a certain circuit [20]. These experimental results are difficult to integrate into a unified prediction model. Graph neural networks [70,82] have been developed to model circuit fidelity but exhibit reduced generalization and interpretation power due to their high complexity.
Unitary decomposition. Unitary decomposition can be conducted via mathematical decomposition equations [34,46,72,80], where [16] and [17] achieve the optimal number of gates for arbitrary single-qubit and two-qubit unitaries, respectively. However, their number of gates is extremely large when decomposing larger unitaries [6,46,81]. Search-based methods are more realistic for four- to five-qubit unitaries [15,67,87]. With regard to the decomposition time and the number of gates, QSEARCH [15] performs best for three-qubit unitaries. QFAST [87] improves on QSEARCH for 4- and 5-qubit unitaries. To limit the search space, [6] uses repeat-until-success circuits for approximation. There are also methods to handle specific unitary types [14,43,53,56], such as Clifford unitaries [26,76] and sparse unitaries [52]. Quartz [86] and Queso [85] are techniques that focus on decomposing parametric unitaries to enable automatic gate transformation. To enable this decomposition, we can construct the path table with parametric gates, which will be the future work of QuCT.
Feature extraction on a graph. Extraction of contextual and topological features is important in the analysis of natural language [18], graphs [73], and programs [60]. These methods extract features as frequent sub-structures and represent them as vectors. Random walk is applied in graph analysis [22], where paths are used as the input of neural networks. Thus, we think that it is reasonable to apply this technique to the analysis of quantum circuits. Some studies leverage pattern matching in analyzing quantum circuits to find sub-circuits for cancellation [35,54], but they suffer from exponential time complexity.

CONCLUSION
We propose a unified framework for analyzing quantum circuits, which first utilizes contextual and topological information to improve the accuracy and efficiency of the analysis. Our upstream model converts the gates of a circuit into vectors that account for their neighboring gates and dependencies. Our downstream models take gate vectors as input and analyze circuits for specific tasks. We verify our framework with two representative analysis tasks. Our circuit fidelity prediction model shows a 4.2× accuracy improvement, and our unitary decomposition model achieves a 46.3× speedup compared to prior unitary decomposition methods.

Figure 1: Key components of the QuCT framework. In the upstream model, each gate is transformed into a vector that captures its neighboring circuit features. The downstream models take these vectors as input for various analysis tasks, such as fidelity prediction and unitary decomposition.

Figure 2: Example of a quantum circuit with four layers and its equivalent unitary.
(a) Real and predicted fidelities of different methods.

Numbers of resulting gates and time cost of unitary decomposition.

Figure 4: Two-stage process to vectorize a gate in the circuit and the reconstruction process from the gate vector.

Figure 5: Minimizing the size of the path table. (a) The adjacent qubits of qubit 12 include qubits 10, 13, and 15; (b) Paths are generated under the IBM brick-like topology.

Figure 6: Workflow to use separable circuits to train the weight vector.

Figure 8: Workflow of the unitary decomposition.

Figure 9: Comparison between the real and predicted circuit fidelities using QuCT (config-0 to config-2), the RB-based model [42], the XEB-based model [3], and QUEST [82] on the 18-qubit device. Avg. means the average prediction inaccuracy. Std. means the standard deviation.
(c) Evaluation of device drift. (d) Inaccuracy under different numbers of grouping schemes.

Figure 10: Detailed analysis of fidelity prediction.

Figure 12: Prediction inaccuracies on the simulator with more qubits.The results of (b) are obtained from the 100-qubit simulator.

Figure 13: Evaluation of different fidelity optimization techniques. The results of (a) are obtained from the 5-qubit device. The results of (b) and (c) are obtained from the 350-qubit simulator.

Figure 14 (a) shows the speedup breakdown for the decomposition on 5-qubit random unitaries. The three optimizations contribute 5.2×, 2.5×, and 3.6× speedup compared to QFAST, respectively. QFAST spends 54.5 hours on the decomposition of the QEC unitary, while QuCT takes 728.3 seconds by benefiting from Opt3.

Evaluation of different sizes of U2V dataset. Figure 14 (b) shows the training time and decomposition time with different sizes of the U2V dataset.

Table 1: Two quantum devices involved in the experiment.

• Opt3: for each candidate, QuCT requires less time to search gate parameters since the reconstructed circuits are composed of basis gates with fewer parameters. For example, a two-qubit unitary gate used in QFAST includes 16 parameters, while a CZ gate used in QuCT has no parameter to search.