Synthetiq: Fast and Versatile Quantum Circuit Synthesis

To implement quantum algorithms on quantum computers it is crucial to decompose their operators into the limited gate set supported by those computers. Unfortunately, existing works automating this essential task are generally slow and only applicable to narrow use cases. We present Synthetiq, a method to synthesize quantum circuits implementing a given specification over arbitrary finite gate sets, which is faster and more versatile than existing works. Synthetiq utilizes Simulated Annealing instantiated with a novel, domain-specific energy function that allows developers to leverage partial specifications for better efficiency. Synthetiq further couples this synthesis method with a custom simplification pass, to ensure efficiency of the found circuits. We experimentally demonstrate that Synthetiq can generate better implementations than were previously known for multiple relevant quantum operators including RCCCX, CCT, CCiSWAP, C√SWAP, and C√iSWAP. Our extensive evaluation also demonstrates that Synthetiq frequently outperforms a wide variety of more specialized tools in their own domains, including (i) the well-studied task of synthesizing fully specified operators in the Clifford+T gate set, (ii) ε-approximate synthesis of multi-qubit operators in the same gate set, and (iii) synthesis tasks with custom gate sets. On all those tasks, Synthetiq is typically one to two orders of magnitude faster than the previous state of the art and can tackle problems that were previously out of the reach of any synthesis tool.


INTRODUCTION
Quantum computing promises to gain a significant advantage over classical computing by leveraging the principles of quantum mechanics [Arute et al. 2019; de Wolf 2017; Shor 1997]. However, for such an advantage to be realized in practice, quantum algorithms must be implemented and executed on a quantum computer. This requires bridging the gap between the high-level constructs used in the description of those quantum algorithms, and the limited set of operations that can be executed on a quantum computer.
[…] specified operators in the Clifford+T gate set [Gheorghiu et al. 2022a; Mosca and Mukhopadhyay 2021] by a factor of one to two orders of magnitude, often producing more efficient circuits. Second, Synthetiq's performance is on par with the state-of-the-art approach for ε-approximate synthesis of fully specified multi-qubit operators in the Clifford+T gate set [Gheorghiu et al. 2022b]. Finally, Synthetiq stands out as the first tool to successfully synthesize relative phase operators, an important case of incomplete specification. Those operators are in particular crucial for the efficient implementation of operators with multiple controls [Maslov 2016], and are used by Qiskit as one of the standard decompositions of the MCX operator [Qiskit 2023].
Main Contributions. To summarize, our main contributions are:
• Synthetiq, a fast and versatile synthesis algorithm for quantum operators over finite gate sets based on Simulated Annealing (§3-§4),
• a natural framework for partial specifications addressing common synthesis tasks (§5),
• an implementation and thorough evaluation of Synthetiq, showing that it outperforms more specialized tools and can tackle synthesis problems that were previously out of reach (§6).
In the following, we present the necessary background (§2), illustrate Synthetiq on an example (§3), formally describe Synthetiq (§4) and how it handles partial specifications (§5), evaluate Synthetiq (§6) and discuss related work (§7).

BACKGROUND
We now present the necessary background on quantum computation and Simulated Annealing.
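Qubit. A qubit is the quantum counterpart of a classical bit. The state $\varphi_A$ of a qubit $A$ is a linear combination of the basis states $|0\rangle_A$ and $|1\rangle_A$, which we can write as $\varphi_A = \gamma_0 |0\rangle_A + \gamma_1 |1\rangle_A$ with $\gamma_0, \gamma_1 \in \mathbb{C}$. In the following, we often omit the subscript indicating the qubit name when it is not relevant. The state of qubit $A$ can equivalently be described by a state vector $\vec{\varphi} = \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix}$. To describe a system with multiple qubits, we use the tensor product ⊗. For instance, we can write the state of a system with two qubits $A$ and $B$ as $\varphi_{AB} = \sum_{i,j \in \{0,1\}} \gamma_{ij}\, |i\rangle_A \otimes |j\rangle_B$ with $\gamma_{ij} \in \mathbb{C}$. We often abbreviate $|i\rangle_A \otimes |j\rangle_B$ to $|ij\rangle_{AB}$. We say the state vector $\vec{\varphi}$ describes $\varphi$ in the computational basis {|00⟩, |10⟩, |01⟩, |11⟩}.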
Quantum Gates and Circuits. Quantum compiling aims to produce circuits to be run on a quantum computer. Quantum circuits consist of a fixed number of qubits, and gates to be applied to some of those qubits. For instance, the X gate acts on one qubit and flips its value, or more formally $X|b\rangle = |b \oplus 1\rangle$. More generally, X maps the state $\gamma_0 |0\rangle + \gamma_1 |1\rangle$ to $\gamma_0 |1\rangle + \gamma_1 |0\rangle$. Using the state vector representation, this operation can be described by the following matrix in $\mathbb{C}^{2^1 \times 2^1}$, which we refer to as $\llbracket X \rrbracket$: $\llbracket X \rrbracket = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Applying gate X to a qubit described by the state vector $\vec{\varphi}$ yields the new state vector $\llbracket X \rrbracket \vec{\varphi}$. Some quantum gates act on multiple qubits at the same time. For instance, the controlled X gate CX maps $|a\rangle|b\rangle$ to $|a\rangle X^a |b\rangle = |a\rangle |b \oplus a\rangle$; the second qubit is flipped iff the first one is 1. Again, the effect of CX can be described as a matrix. We finally introduce the identity gate I. It is the no-op of quantum gates, and its matrix representation when applied to $n$ qubits is the identity matrix in $\mathbb{C}^{2^n \times 2^n}$.
Quantum Circuit Semantics. We can think of a quantum circuit as a list of gates and qubits to which each gate is applied. For instance, we can consider the circuit $C$ on two qubits which applies the X gate to its first qubit, followed by the CX gate on both of its qubits. The effect of this circuit on two qubits can again be described by a matrix, which is simply the product of the matrices of each of its gates: $\llbracket C \rrbracket = \llbracket CX \rrbracket \cdot (\llbracket X \rrbracket \otimes \llbracket I \rrbracket)$. Note how we used the tensor product of $\llbracket X \rrbracket$ with the identity I to extend the semantics of this one-qubit gate to two qubits. In slight abuse of notation, we will typically write $C$ instead of $\llbracket C \rrbracket$ throughout this work.
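As a small numerical illustration of these semantics, the following NumPy sketch multiplies the gate matrices of the example circuit. It follows the formula ⟦CX⟧ · (⟦X⟧ ⊗ ⟦I⟧) literally and uses the CX matrix listed with the Clifford+T gate set; mapping ⊗ to `np.kron` in this argument order is an assumption about the qubit ordering convention, and the helper name `circuit_matrix` is illustrative only.

```python
import numpy as np

# Gate matrices: X and I are standard, CX is the matrix given for the Clifford+T gate set.
X = np.array([[0, 1], [1, 0]], dtype=complex)
I = np.eye(2, dtype=complex)
CX = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0],
               [0, 1, 0, 0]], dtype=complex)

def circuit_matrix(gate_matrices):
    """Multiply full-size gate matrices so that the first gate in the list acts first."""
    result = np.eye(gate_matrices[0].shape[0], dtype=complex)
    for gate in gate_matrices:
        result = gate @ result  # later gates multiply from the left
    return result

# X extended to two qubits (assumed Kronecker order), followed by CX.
C = circuit_matrix([np.kron(X, I), CX])
```

Because each factor is unitary, the product `C` is unitary as well, which is the property the next paragraph relies on.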
Quantum Operators. In the above, we described how the effect of a quantum gate or circuit on $n$ qubits can be described by a matrix in $\mathbb{C}^{2^n \times 2^n}$. It is worth noting that all matrices representing the action of gates or circuits are unitary. We say a quantum operator is an operation on $n$ qubits described by a unitary matrix $U$. We then say that a circuit $C$ implements this operator if $\llbracket C \rrbracket = U$.
Clifford+T Gate Set. To implement a general quantum operator on a given quantum computer, we must decompose it into gates from the gate set G supported by this computer. Such gate sets G are usually universal, meaning that every quantum operator can be decomposed into gates from G, with arbitrary precision $\epsilon > 0$. More formally, for any unitary matrix $U \in \mathbb{C}^{2^n \times 2^n}$ and $\epsilon > 0$, there exists some circuit $C$ on $n$ qubits using only gates from G such that $d(U, C) \leq \epsilon$, for some distance metric $d$. Fault-tolerant quantum computers will likely rely on the so-called Clifford+T gate set [Terhal 2015], which consists of the following gates: […], and $\mathrm{CX} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$.
The Clifford+T gate set is known to be universal [Nielsen and Chuang 2002]. Further, Giles and Selinger [2013] showed that based on the coefficients of an operator $U$ we can decide whether it can be decomposed exactly in the Clifford+T gate set, meaning there is a circuit $C$ such that $d(U, C) = 0$. Based on the determinant of $U$, we can decide if this decomposition requires an ancilla, that is, if $U$ acts on $n$ qubits, its decomposition will be a circuit acting on $n + 1$ qubits, where the last extra qubit is assumed to be in state |0⟩ initially, and must be returned to this state at the end.
Simulated Annealing. Simulated Annealing (SA) makes it possible to efficiently find a state $s$ which approximately minimizes a function $E(s)$, often referred to as an energy. Starting from some initial state, each SA step picks a randomly sampled neighbor state $s' \sim \mathcal{N}(s)$ of the current state $s$, and selects $s'$ as the current state with some probability $P(s, s', T)$. Here, $T \in \mathbb{R}_{>0}$ denotes a progressively decreasing temperature. In this work, we set the acceptance probability using the common approach $P(s, s', T) = \min\left(1, \exp\left(-\frac{E(s') - E(s)}{T}\right)\right)$. Thus a better $s'$ (meaning $E(s') < E(s)$) ensures $P(s, s', T) = 1$ and therefore is always accepted. In contrast, a worse $s'$ has $P(s, s', T) < 1$ and thus can be rejected or accepted, where acceptance is particularly likely initially, at high temperatures $T$.
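The acceptance rule described above can be sketched in a few lines of Python. This is a generic illustration of Simulated Annealing with the standard min(1, exp(-(E(s') - E(s)) / T)) rule, not Synthetiq's implementation; the function names, the `neighbor` callback, and the explicit list of temperatures are assumptions made for the example.

```python
import math
import random

def acceptance_probability(e_current, e_candidate, temperature):
    """min(1, exp(-(E(s') - E(s)) / T)) for a positive temperature T."""
    if e_candidate <= e_current:
        return 1.0  # candidates that are not worse are always accepted
    return math.exp(-(e_candidate - e_current) / temperature)

def anneal(initial, energy, neighbor, temperatures, rng=None):
    """Run one SA step per temperature and return the best state seen."""
    rng = rng or random.Random(0)
    state = best = initial
    for temperature in temperatures:
        candidate = neighbor(state, rng)
        p = acceptance_probability(energy(state), energy(candidate), temperature)
        if rng.random() < p:
            state = candidate
        if energy(state) < energy(best):
            best = state
    return best
```

For instance, `anneal(20, lambda x: (x - 3) ** 2, lambda x, r: x + r.choice([-1, 1]), [0.95 ** k for k in range(500)])` performs a random walk over the integers that is biased toward the minimum at 3.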

OVERVIEW
We now explain the approach of Synthetiq by synthesizing a circuit over the Clifford+T gate set for an example operator. First, we introduce this operator and translate it to a partial specification (§3.1). We then show how Synthetiq finds an implementation for it (§3.2) and how composite gates can be used to speed up this search (§3.3). We describe Synthetiq in more detail in later sections.

Creating a Partial Specification
Controlled-T. The Controlled-T (CT) operator acts on two qubits, and applies the T gate to the second qubit if and only if the first qubit is 1. More formally, for $a, b \in \{0, 1\}$: $\mathrm{CT}\, |a\rangle |b\rangle = |a\rangle\, T^a |b\rangle$. In matrix notation, CT corresponds to the operator shown below: $\mathrm{CT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{i\pi/4} \end{pmatrix}$. We note that CT can be represented exactly by a Clifford+T circuit [Giles and Selinger 2013, Theorem 1], but only if its circuit can make use of an ancilla [Giles and Selinger 2013, Corollary 2]. We now show how to encode the CT specification, taking into account this ancilla.
Here we denote all unspecified elements with "?".When comparing such an underspecified matrix to the operator induced by a given circuit, Synthetiq only takes the specified elements into account.
More precisely, we say a circuit satisfies the specification if the matrix of the circuit operator matches all specified coefficients of the underspecified matrix. Although our example matrix consists of only fully known or fully unknown columns (i.e., isometries), we note that Synthetiq can also handle partially specified columns.

Running Synthetiq
We now describe how Synthetiq builds a quantum circuit from a gate set and a partial specification, following Fig. 1.
Sampling an Initial Circuit. For a given input, Synthetiq executes multiple separate runs of SA. Each run starts by sampling a random circuit $C$. To this end, Synthetiq first samples a circuit size within the circuit size bounds ℓ min and ℓ max, and then a gate from the gate set (augmented with the identity gate) for each position in the circuit. Then, Synthetiq runs SA starting from $C$.
Simulated Annealing. Each SA step randomly replaces one gate in the current circuit $C$, yielding the circuit $C'$. Synthetiq decides whether to keep $C$ or replace it by $C'$ using a custom energy function. Specifically, if $C'$ is strictly better, we always replace $C$, otherwise we do so probabilistically.
Energy Function. We describe our energy function in detail in §4.4. In a nutshell, it measures how close the operator implemented by the circuit is to the given specification. The key insight of Synthetiq is the extension of an energy function used in, e.g., [Chou et al. 2022; Khatri et al. 2019; Meister et al. 2023] to work for partial specifications. Such an extension is non-trivial, as it must gracefully decrease for circuits that "almost" satisfy a specification in order to guide SA efficiently. Further, it must be scaled to work for all possible underspecified operator sizes without requiring tuning of Synthetiq to each underspecified case.
Found Circuit. At every step of the SA algorithm, we check whether the current circuit satisfies the specification. If so, we run a simplification pass on it. This simplification is a simple and fast algorithm that heuristically optimizes both the total gate cost and the depth of user-specified gates. Here, the total gate cost is the sum of the implementation cost for each gate, which is specified by the user. If the simplified circuit is the best one found so far, we record it. Finally, we use this best circuit to update the circuit size bounds for the next initial circuit. If no circuit is found in a reasonable number of steps, we abort SA and start a new run from a fresh random circuit.
Parallelization. As Synthetiq executes many short runs of SA, we can easily parallelize it. This leads to an almost linear speedup in the number of cores, greatly increasing Synthetiq's speed and allowing it to synthesize larger circuits than previous work, as demonstrated in §6.

Speeding up Synthetiq with Composite Gates
The versatility of Synthetiq extends beyond the Clifford+T gate set, as it is designed to work with any finite gate set. This includes the ability to integrate composite gates, that is, operators whose decomposition in the gate set is already known. This stratified synthesis [Heule et al. 2016] significantly broadens the size of operators that Synthetiq can decompose as it can dramatically boost the speed of the search process. This is because a single circuit mutation can introduce a complex operator that would otherwise require a large number of precise mutations. For instance, the inclusion of the RCCX operator as a composite gate enabled us to find an optimal implementation of the CT operator within seconds, while finding one without the composite gate took 12 hours.
However, it is important to note that while the direct insertion of a complex gate can speed up the search process, it may not always yield the most efficient circuits. There could be a simpler, more efficient implementation that could only be discovered after a more thorough run of Synthetiq.

SYNTHETIQ
We now describe our method in more detail. For more details on hyperparameter optimization and values, we refer to Tab. 3 in §6.1.
End-to-End Procedure. Alg. 1 describes our main algorithm, Synthetiq. It takes as input a gate set (§4.1) and a (partial) specification (§4.2). Lin. 3-16 then execute multiple separate runs of SA. The starting point of each run is a fresh random circuit (Lin. 5) made of ℓ randomly chosen gates. We select the number of gates ℓ uniformly at random within the current circuit size bounds ℓ min and ℓ max (Lin. 4). Lin. 6-16 then run $n_{\mathrm{steps}}$ SA steps from this initial circuit.
At each SA step, Lin. 7 creates a new candidate circuit $C'$ by randomly changing one gate in the current circuit $C$ (§4.3) and scores the two circuits using the energy function (Lin. 8). This energy function captures how close the circuit is to the given specification (see §4.4). Lin. 9 then accepts the new circuit $C'$ with a probability depending on the energy of both $C$ and $C'$ (see §4.5).
If the current circuit does not satisfy the specification, Synthetiq proceeds to the next SA step (Lin. 11). Otherwise, Lin. 12 simplifies it (§4.6). If it is the best circuit found so far (Lin. 13), Lin. 14 records it. Finally, Lin. 15 uses the best circuit to update the circuit size bounds ℓ min and ℓ max (§4.3).

Gate Sets
Synthetiq searches for a circuit implementing the given specification using the gates in the input gate set G. This can be any finite set of gates, for instance the Clifford+T gate set or a user-supplied custom gate set. As discussed in §3.3, we can also add composite gates to G to speed up Synthetiq.

Expressing Partial Specifications
A partial specification $S = (U, M)$ consists of two matrices $U \in \mathbb{C}^{2^n \times 2^n}$ and $M \in \{0, 1\}^{2^n \times 2^n}$. $U$ is the operation we aim to implement and $M$ is a boolean mask specifying which elements of $U$ should be matched (marked with 1) and which can be ignored (marked with 0). Note that elements of $U$ corresponding to a 0 in $M$ can be omitted in $S$; we typically write them as "?". We say a unitary matrix $V \in \mathbb{C}^{2^n \times 2^n}$ matches the specification $S = (U, M)$ if and only if $M \bullet U = \exp(i\theta)\,(M \bullet V)$ for some $\theta \in \mathbb{R}$, where "$\bullet$" denotes element-wise multiplication and $\theta$ is a global phase difference. Of course, there may be many such matrices $V$. We will show in §5 how this natural framework for underspecification can be used to specify various useful applications in quantum computing.
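The matching condition can be checked numerically. The sketch below writes U for the target matrix, M for the 0/1 mask, and V for the candidate unitary, stores unspecified entries of U as 0 (they are masked out anyway), and recovers the global phase from the overlap of the masked matrices. The function name `matches` and the tolerance are assumptions for illustration.

```python
import numpy as np

def matches(U, M, V, tol=1e-9):
    """Return True if M*U == exp(i*theta) * (M*V) element-wise for some real theta."""
    mu, mv = M * U, M * V
    overlap = np.vdot(mv, mu)  # sum of conj(mv) * mu over all entries
    # If mu = exp(i*theta) * mv, the overlap equals exp(i*theta) * ||mv||^2.
    phase = overlap / abs(overlap) if abs(overlap) > tol else 1.0
    return bool(np.allclose(mu, phase * mv, atol=tol))

# Fully specified Controlled-T operator.
CT = np.diag([1, 1, 1, np.exp(1j * np.pi / 4)])
full_mask = np.ones((4, 4))

# Specification that only constrains the first column.
first_column_mask = np.zeros((4, 4))
first_column_mask[:, 0] = 1
```

With these definitions, `matches(CT, full_mask, np.exp(0.3j) * CT)` holds because the two matrices differ only by a global phase, while `matches(CT, full_mask, np.eye(4))` does not; under `first_column_mask`, the identity does match because only the first column is compared.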

Building Circuits
We now explain how Synthetiq builds and modifies circuits.
Randomly Mutating Gates. To mutate a given circuit $C$ into a new candidate $C'$, Synthetiq picks a gate position uniformly at random, and replaces it with a randomly selected gate.
To this end, Synthetiq first decides whether or not to replace the selected gate by a placeholder "identity" gate, with probability $p_{\mathrm{Id}} \in [0, 1]$. This replacement step is analogous to deleting a gate, while replacing an identity gate by another is analogous to inserting a gate. Hence, this approach allows us to work with fixed-size circuits, while keeping the flexibility of gate insertion and deletion.
If the identity gate was not selected, Synthetiq chooses which gate to insert. It picks a gate in G uniformly at random and then selects the qubits it acts on. Further, we multiply the probability of sampling composite gates by $p_{\mathrm{comp}} \in [0, 1]$ in order to avoid inserting these more expensive gates into the circuit too frequently.
Building the Initial Circuit. To build the initial circuit, we select a circuit size ℓ, and generate a circuit by randomly selecting ℓ gates as described above.
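A minimal sketch of this mutation step is shown below. The representation of gates as `(name, qubits)` tuples, the gate set as a list of `(name, arity)` pairs, and the way the composite-gate factor is turned into sampling weights are all assumptions made for the example; only the overall procedure (pick a position, maybe insert the identity, otherwise sample a gate and its qubits) comes from the description above.

```python
import random

IDENTITY = ("I", ())  # placeholder gate acting on no qubit

def sample_gate(gate_set, composite, n_qubits, p_id, p_comp, rng):
    """Sample the identity with probability p_id, otherwise a gate from gate_set."""
    if rng.random() < p_id:
        return IDENTITY
    # Composite gates get their sampling weight multiplied by p_comp.
    weights = [p_comp if name in composite else 1.0 for name, _ in gate_set]
    name, arity = rng.choices(gate_set, weights=weights, k=1)[0]
    qubits = tuple(rng.sample(range(n_qubits), arity))  # distinct target qubits
    return (name, qubits)

def mutate(circuit, gate_set, composite, n_qubits, p_id, p_comp, rng):
    """Return a copy of circuit in which one uniformly chosen position is resampled."""
    position = rng.randrange(len(circuit))
    mutated = list(circuit)
    mutated[position] = sample_gate(gate_set, composite, n_qubits, p_id, p_comp, rng)
    return mutated
```

Because the identity occupies a regular position in the list, the circuit length never changes, which mirrors the fixed-size representation described above.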
We have found empirically that Synthetiq performs best when ℓ is around 3 times the length of the optimal circuit for the given specification, which we denote here by $\ell^*$. As $\ell^*$ is not known when running Synthetiq, we use an adaptive scheme to pick ℓ. To this end, we define minimal and maximal sizes ℓ min and ℓ max, and sample ℓ uniformly between the two for each new initial circuit. If a circuit is found that implements the specification in an SA step, we use the length of the current optimal circuit ℓ best to move ℓ min closer to $3\ell^*$, and analogously for ℓ max: […] We now explain how Eq. (4) moves ℓ min closer to $3\ell^*$; the intuition behind Eq. (5) is analogous.
Suppose for now that we set $c_{\min} = 3$, then Fig. 2 illustrates three possible situations. In all three situations, $3\ell^* \leq 3\ell_{\mathrm{best}}$, as $\ell^*$ is the theoretical best circuit size, while ℓ best is the best size found so far. Fig. 2a shows the typical case in the first steps of SA: as we pick ℓ min to be small, it is typically smaller than $3\ell^*$. Then, the second case in Eq. (4) increases ℓ min slightly, where the gray part accounts for rounding and increments smaller than 1. Fig. 2b shows a case where ℓ min was increased to surpass $3\ell^*$, but still lies below 3ℓ best. In this case, the second case in Eq. (4) further increases ℓ min, which moves us further from $3\ell^*$, but eventually Synthetiq will find better circuits, thus decreasing 3ℓ best. Finally, Fig. 2c shows a case where ℓ min is larger than 3ℓ best. Then, we know for sure that 3ℓ best is closer to the optimal value $3\ell^*$, so we directly update ℓ min to 3ℓ best.
We note that since our estimate ℓ best of $\ell^*$ becomes better with every circuit found (no matter whether the current circuit improved ℓ best or not), we apply this update rule every time a circuit is found. When Synthetiq is run on multiple threads, ℓ min and ℓ max are synchronized across threads.

Evaluating Circuits
To evaluate a circuit $C$ with respect to a specification $S$, we need to define an energy function $d(S, C)$ that measures the distance between $C$ and $S$.
Various works have used measures inspired by fidelity to compare the matrices of quantum operators [Chou et al. 2022; Khatri et al. 2019; Meister et al. 2023]. Such measures are typically of the shape $d(U, V) = 1 - \frac{|\mathrm{Tr}(U^\dagger V)|}{2^n}$ and have the important property that if $U$ and $V$ differ by a global phase, i.e., if $V = e^{i\theta} U$ for some $\theta \in \mathbb{R}$, then $d(U, V) = 0$.
Intuitively, we want to generalize $d(U, V)$ to account for partial specifications by replacing $U$ and $V$ by $M \bullet U$ and $M \bullet V$, respectively. Unfortunately, the resulting energy $\tilde{d}$ is useless if only 0s are specified in $U$, which is relevant, e.g., when specifying relative phase operators (§5). In such a case, since $M \bullet U = 0$, the suggested energy $\tilde{d}$ is constant regardless of the current circuit: […] To address this problem, we first rewrite $d(U, V)$ to (derived in App. A.1): […] After this rewrite, we generalize $d$ to account for partial specifications by replacing $U$ and $V$: […] Here, we define 0/0 := 1 to account for the fact that Tr(( […], which still provides valuable information; at the entries where $U$ is 0 (and $M$ is 1), $V$ might not be and the higher its values, the higher $\bar{d}$ is.
Further, we also adapt the normalization factor of $2^n$ in $\bar{d}$. This is crucial, as an incorrect normalization would make the magnitude of $d(S, C)$ sensitive to the number of specified elements $\|M\|_F^2$. To this end, we note that $2^n$ is equal to $\|\mathbf{1}\|_F$, where $\mathbf{1}$ is the matrix of dimension $2^n \times 2^n$ with all ones, that is, the boolean mask for a complete specification. Therefore, we replace the normalization of $2^n$ by $\|M\|_F$. Taking the square root of all squared norms and dropping the factor of $\frac{1}{2}$ for simplicity then yields: […]
Measuring Distance Between Circuit and Specification. To speed up the search for circuits, we always evaluate not only the current circuit $C$ against the specification $S = (U, M)$, but also all the circuits that can be built from $C$ by permuting its qubits following some permutation $\pi$. Note that this is equivalent to evaluating $C$ against any permutation $S_\pi = (U_\pi, M_\pi)$ of the specification. Here $U_\pi$ (resp. $M_\pi$) is defined as $P_\pi U P_\pi^{-1}$ (resp. $P_\pi M P_\pi^{-1}$) where $P_\pi$ is the change-of-basis matrix from the original qubit order to their permutation. This gives finally: […] where $\Pi_n$ is the set of all permutations of {1, ..., n}. This is inspired by the equality metric for classical programs presented in [Heule et al. 2016]. There, if a program gives the correct result in the wrong register, the penalty is much smaller than if the result is not present at all.
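To make the discussion concrete, here is a hedged sketch of a masked, global-phase-invariant distance in the spirit of this section. It is not the paper's exact formula: it aligns the global phase using the overlap of the masked matrices (with 0/0 treated as 1), and the normalization by the square root of the Frobenius norm of M is an assumption, chosen so that for a complete specification the value equals √2 times sqrt(1 - |Tr(U†V)| / 2^n). The qubit permutations are left out, and the function name is illustrative.

```python
import numpy as np

def masked_distance(U, M, V):
    """Phase-aligned Frobenius distance between M*U and M*V (illustrative normalization)."""
    mu, mv = M * U, M * V
    overlap = np.vdot(mv, mu)  # Tr((M*V)^dagger (M*U))
    # Best global phase for aligning mv with mu; 0/0 is treated as 1.
    phase = overlap / abs(overlap) if abs(overlap) > 1e-12 else 1.0
    return np.linalg.norm(mu - phase * mv) / np.sqrt(np.linalg.norm(M))
```

Note that when all specified entries of U are 0, the overlap vanishes but the distance still grows with the magnitude of the candidate's entries at the specified positions, which is the behavior motivated above.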
Efficiently Computing the Circuit Matrix. To compute $d$, we need the matrix of the operator implemented by $C$. To compute it efficiently, we maintain a binary tree over the matrices of the list of gates in $C$. Hence, as each mutation only modifies one gate, we can update the complete matrix in only $O(\log(\mathrm{len}(C)))$ matrix multiplications. This comes at the cost of an extra memory requirement, but this is not a limiting factor in practice.
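One possible realization of such a tree is a segment tree whose leaves hold the full-size gate matrices and whose inner nodes hold the product of their children. The class below is a sketch under that assumption; the class name and layout are illustrative and not taken from the Synthetiq code base.

```python
import numpy as np

class ProductTree:
    """Binary tree over gate matrices; the root holds the matrix of the whole circuit."""

    def __init__(self, matrices):
        dim = matrices[0].shape[0]
        self.size = 1
        while self.size < len(matrices):
            self.size *= 2
        # Unused leaves are padded with identity matrices.
        self.nodes = [np.eye(dim, dtype=complex) for _ in range(2 * self.size)]
        for i, matrix in enumerate(matrices):
            self.nodes[self.size + i] = matrix
        for i in range(self.size - 1, 0, -1):
            self._recompute(i)

    def _recompute(self, i):
        # The right child covers later gates, so it multiplies from the left.
        self.nodes[i] = self.nodes[2 * i + 1] @ self.nodes[2 * i]

    def update(self, index, matrix):
        """Replace one gate matrix and refresh only its ancestors."""
        i = self.size + index
        self.nodes[i] = matrix
        i //= 2
        while i >= 1:
            self._recompute(i)
            i //= 2

    def total(self):
        return self.nodes[1]
```

An `update` touches one node per tree level, so a mutation costs a number of matrix multiplications logarithmic in the circuit length, at the price of storing one matrix per node.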
Approximate Synthesis. Synthetiq can be readily adapted for approximate circuit synthesis. We simply treat a circuit as discovered once the condition $d(S, C) \leq \sqrt{2}\,\epsilon$ is met (see Lin. 11 in Alg. 1). In the context of a complete specification, this corresponds exactly to the global phase invariant distance employed in previous studies, such as Gheorghiu et al. [2022b].

Updating the Current Circuit
As mentioned in §2, SA accepts a new circuit $C'$ with a certain probability depending on a temperature function $T$. Modifications leading to a better circuit ($d(S, C') < d(S, C)$) are therefore always accepted, whereas modifications leading to a worse circuit are only accepted occasionally. The temperature function $T(n_{\mathrm{accept}})$ governs the acceptance rate of such worse modifications: increasing it means that worse modifications are more likely to be accepted. We define $T$ as: […] where $n_{\mathrm{accept}}$ is the total number of accepted modifications since the start of the SA run, ℓ is the number of gates in the circuit currently under consideration, and $T^{(0)}$ and $n_{\mathrm{norm}}$ are hyperparameters. Intuitively, after many modifications were accepted, acceptance of a worse circuit becomes less likely, allowing SA to focus on a local optimum.

Simplifying a Circuit
SA allows Synthetiq to discover many new circuits implementing the given specification. However, in many cases, we are specifically looking for efficient circuits implementing this specification. The circuits found by SA can often be trivially simplified, for instance by replacing two consecutive gates that cancel out by the no-op identity gate I. We therefore developed a fast simplification pass to remove such inefficiencies from the found circuits. We first discuss two ways of measuring the efficiency of a circuit and then describe our simplification pass.
Cost of a Circuit. The first way of measuring efficiency is by looking at the number of gates the circuit is made of. Typically, if each gate has a cost of execution on the quantum computer (be it in time or loss of precision), the cost of a circuit is simply the sum of the cost of each of its gates. For some applications (see §6.3), we assume the cost of all gates is the same and equal to 1. In contrast, for fault-tolerant quantum computing, the T gate is much more expensive to implement than any of the other gates in the Clifford+T gate set. To compute the cost of a circuit in this gate set, we use the following gate costs, roughly reflecting gate complexity on hardware: […] In all gate sets, the identity gate has cost 0, as it does not apply any operation to the qubits.
Algorithm 2. Simplification applied as post-processing. $C_i$ denotes the i-th gate in $C$, $C[i, k]$ is the subset of $C$ consisting of its i-th to k-th gates, and $C_{i \leftrightarrow k}$ is $C$ after swapping the i-th and k-th gates.
Depth of a Circuit. The second way of defining a circuit cost takes parallelism in its execution into account. If a circuit applies one gate on its first qubit and another on its second qubit, those two gates can often be executed at the same time. Therefore, the cost of the execution is only the cost of one gate, and not the sum of the two. The depth of a circuit reflects this cost. It is the length of the execution of the circuit, assuming all operations that can be parallelized are parallelized. Further, in cases where some gates take much longer to execute than others, we may use as cost for the circuit its depth when only considering those expensive gates. This is typically the case for the Clifford+T gate set, where we measure T-depth.
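Both efficiency measures can be computed with a short helper. The sketch below assumes a circuit is a list of `(name, qubits)` tuples and that gate costs are supplied by the user as a dictionary; the concrete cost values in the usage note are hypothetical placeholders, not the costs used in the paper.

```python
def circuit_cost(gates, costs):
    """Sum of user-supplied gate costs; the identity gate contributes 0."""
    return sum(0 if name == "I" else costs[name] for name, _ in gates)

def circuit_depth(gates, counted=None):
    """Depth when gates on disjoint qubits run in parallel.

    If counted is given, only gates whose name is in counted add to the depth
    (for example counted={"T"} yields the T-depth), but all gates still
    propagate dependencies between the qubits they touch.
    """
    layer = {}  # qubit -> depth reached so far on that qubit
    depth = 0
    for name, qubits in gates:
        if name == "I":
            continue
        start = max((layer.get(q, 0) for q in qubits), default=0)
        end = start + (1 if counted is None or name in counted else 0)
        for q in qubits:
            layer[q] = end
        depth = max(depth, end)
    return depth
```

For the circuit `[("H", (0,)), ("T", (1,)), ("CX", (0, 1)), ("T", (0,)), ("T", (1,))]`, the depth is 3 and the T-depth is 2, since the last two T gates act on different qubits and can run in the same layer; with hypothetical costs `{"H": 1, "T": 5, "CX": 1}` the total gate cost is 17.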
Optimizing Found Circuits. We show our simplification pass in Alg. 2. It consists of two main parts. The first aims at minimizing the gate cost of the circuit $C$. More specifically, it replaces sequences of gates in the circuit with gates from G that have the same semantics if this gate has a lower cost than the complete sequence (Lin. 4-7). Note that we only consider sequences of up to 12 gates since higher values did not result in more efficient circuits. The second part aims at minimizing the depth of the circuit (Lin. 13-19). Here, it swaps gates that commute if doing so would reduce the depth of the circuit. Finally, to create more opportunities for both optimizations, Alg. 2 also swaps any gates that commute, in both parts (Lin. 8-10 and Lin. 17). To ensure we don't endlessly swap gates back and forth, we only do so according to a custom total order on gates ≺. We give more details about this order in App. A.2.
A Custom Pass. We note that this simplification pass is specifically tuned to our SA algorithm. It is both fast and focused on simple optimizations that are easily missed by SA and can be applied for any finite gate set. Further, we found in practice that for a given circuit, optimizing for gate count and depth were not at odds. We therefore always optimize for both. This simplification pass is an essential part of the algorithm and is not meant to be used as a standalone procedure. Indeed, when applied to circuits found by other synthesis tools, it most often does not find any simplifications.

LEVERAGING PARTIAL SPECIFICATIONS
We now show how to leverage our partial specification framework to express common tasks when implementing quantum algorithms.
Classification. Recall that in a partial specification $S = (U, M)$, the boolean matrix $M$ specifies which elements of matrix $U$ should be matched. When each column of $M$ is either all ones or all zeros (i.e., each column is fully specified or not at all), the partial specification is an isometry. Otherwise, we refer to the specification as element-wise.
Tasks. Tab. 1 summarizes the tasks discussed in §5.1-§5.5, and whether they can be expressed as a full specification, an isometry, or require element-wise specification in the general case. We note that multiple tasks can be combined. For instance, allowing an isometry to use an ancilla yields a new, element-wise specification.

State Preparation
The task of state preparation asks to implement an operator that brings qubits from some initial state (typically |0...0⟩) to some target state $\psi$. This operator is only specified for the input |0...0⟩ and can be written as $U = \begin{pmatrix} \vec{\psi} & ? \end{pmatrix}$, where $\vec{\psi}$ denotes the vector representation of the target state. State preparation applications include quantum chemistry [Cao et al. 2019], quantum machine learning [Araujo et al. 2021], and solving systems of linear equations [Harrow et al. 2009]. For example, the specification to prepare the GHZ state for two qubits is $\begin{pmatrix} \frac{1}{\sqrt{2}} & ? & ? & ? \\ 0 & ? & ? & ? \\ 0 & ? & ? & ? \\ \frac{1}{\sqrt{2}} & ? & ? & ? \end{pmatrix}$.

Relative Phase Operators
We say an operator $U'$ is a relative phase operator for operator $U$ if for any input state in the computational basis $|b\rangle$ there exists some phase $\theta_b$ such that $U' |b\rangle = e^{i\theta_b}\, U |b\rangle$. Such relative phase operators often have a shorter circuit implementation than their non-relative original. Therefore, it can be interesting to replace $U$ with its relative counterpart when it is used in a bigger computation, if this replacement does not change the overall computation. Common use cases for relative phase operators include their use in more efficient implementations of their non-relative counterpart [Maslov 2016], replacing the CCX gate by a relative RCCX gate when it is later uncomputed [Paradis et al. 2021], or when the non-relative counterpart is used in a circuit that is measured directly after the application of the operator.
When $U$ can be described classically, i.e., when it maps every computational basis state to another basis state, we know that its matrix representation consists of 0s and 1s. In this case, we can simply replace each "1" with a question mark. As any operator built by Synthetiq is unitary, any circuit it produces matching the specification will have values of norm 1 in place of the question marks.
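For a classical operator, building the relative phase specification amounts to constructing the mask. The sketch below encodes "replace each 1 with a question mark" as a mask that is 1 exactly where the 0/1 matrix is 0; the function name is illustrative and the CX matrix is only used as an example of a classical operator.

```python
import numpy as np

def relative_phase_mask(U):
    """Mask for the relative phase version of a 0/1 matrix: only the zeros are specified."""
    return np.isclose(U, 0).astype(int)

CX = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0],
               [0, 1, 0, 0]], dtype=complex)
M = relative_phase_mask(CX)

# A candidate that applies an arbitrary phase per basis state has zeros wherever CX does.
phases = np.exp(1j * np.array([0.0, 0.4, 1.3, 2.1]))
candidate = CX @ np.diag(phases)
```

Here `M * candidate` is the zero matrix, just like `M * CX`, so the candidate satisfies the specification even though its nonzero entries carry different phases; the identity matrix, in contrast, has nonzero entries at masked positions and is rejected.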

Operators with Ancillae
As discussed in §3, an ancilla is an extra qubit used to help implement an operator on the other qubits. We assume the ancilla is initially in state |0⟩ and must be returned to the same state. In some cases, such an ancilla is necessary to implement the given operator using the chosen gate set [Giles and Selinger 2013]. In other cases, ancillae are not necessary but may allow for a shorter circuit implementation, e.g., CCX has lower T-depth when implemented with one ancilla [Amy et al. 2013]. To represent an operator $U$ with ancillae, we observe that any state $\varphi \otimes |0\rangle$ maps to $U(\varphi) \otimes |0\rangle$, while the result on any state $\varphi \otimes |1\rangle$ is unspecified. The resulting specification is thus: $\begin{pmatrix} U & ? \\ 0 & ? \end{pmatrix}$, where 0 is the null matrix. More generally, for an already partial specification $S = (U, M)$, adding an ancilla changes the specification to $S' = \left( \begin{pmatrix} U & ? \\ 0 & ? \end{pmatrix}, \begin{pmatrix} M & 0 \\ \mathbf{1} & 0 \end{pmatrix} \right)$, where $\mathbf{1}$ is the all-ones matrix of appropriate size. For example, adding an ancilla to an isometry gives an element-wise underspecification. An example of ancilla underspecification for the CT operator can be found in §3.
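The ancilla construction can be written as a small block-matrix helper. This sketch assumes that basis states with the ancilla in |0⟩ occupy the first half of the index range (so the specified part is the left block column) and stores unspecified entries as 0 in the target matrix, since the mask hides them anyway; the function name is illustrative.

```python
import numpy as np

def add_ancilla(U, M):
    """Extend a specification (U, M) by one ancilla that starts and ends in |0>."""
    zeros = np.zeros_like(U)
    # Left block column: ancilla starts in |0>, so U must be applied and the
    # ancilla must not flip (lower-left block is specified to be 0).
    # Right block column: ancilla starts in |1>, so everything is unspecified.
    U_ext = np.block([[U, zeros], [zeros, zeros]])
    M_ext = np.block([[M, np.zeros_like(M)], [np.ones_like(M), np.zeros_like(M)]])
    return U_ext, M_ext

CT = np.diag([1, 1, 1, np.exp(1j * np.pi / 4)])
U_ext, M_ext = add_ancilla(CT, np.ones((4, 4)))
```

For the fully specified CT operator, this yields an 8×8 specification in which the four left columns are fully specified and the four right columns are entirely unspecified.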

Oracles
). This corresponds to the following incomplete specification: $\begin{pmatrix} ? & ? & 1 & 0 \\ ? & ? & 0 & -1 \\ ? & ? & -1 & 0 \\ ? & ? & 0 & 1 \end{pmatrix}$.

Dirty Bits
Dirty qubits are similar to ancillae, with the difference that they can initially be in any state and must return to that same state after the computation. Therefore, they allow for less underspecification, but have the advantage of not requiring any preparation for the extra qubit. Dirty qubits are for example used in Low et al. [2018] to do state preparation, allowing them to achieve significantly shorter circuits.
Given a specification $S = (U, M)$, we can allow for an extra dirty qubit by using the specification $S' = \left( \begin{pmatrix} U & 0 \\ 0 & U \end{pmatrix}, \begin{pmatrix} M & \mathbf{1} \\ \mathbf{1} & M \end{pmatrix} \right)$, where $\mathbf{1}$ is the all-ones matrix of appropriate size. The null matrices 0 in $S'$ ensure that the dirty qubits remain in the same state before and after the computation, since a state flip of the dirty qubits would require a non-zero element at any of the positions of the null matrices.

EXPERIMENTAL EVALUATION
We now experimentally evaluate Synthetiq. We first explain our process for optimizing the hyperparameters occurring in the SA algorithm (§6.1). We then demonstrate that due to its versatility and speed, Synthetiq can push the limits of circuit synthesis, synthesizing previously unknown decompositions of relevant quantum operators (§6.2). We finally evaluate the versatility of Synthetiq by running it in different modes (§6.3-§6.6) and comparing the results to synthesis tools specialized for each of these modes. Overall, our main findings are:
§6.2 Better operator decompositions. We show that Synthetiq finds better implementations than the currently best known ones for RCCCX, CCT, CCiSWAP, C√SWAP, and C√iSWAP.
§6.3 Custom gates. We show that Synthetiq can efficiently synthesize circuits with user-supplied custom gate sets, and outperforms the state-of-the-art [Kang and Oh 2023] in 50% of the cases (including 27% of cases where Kang and Oh [2023] fails to return any decomposition) while being equally optimal in all other cases.
§6.4 Clifford+T gate set. We show that when synthesizing completely specified operators over the Clifford+T gate set, Synthetiq outperforms the specialized state-of-the-art [Gheorghiu et al. 2022a; Mosca and Mukhopadhyay 2021]. Synthetiq is able to find circuits for more operators and those circuits are often more efficient and typically found one to two orders of magnitude faster.
§6.5 Approximate synthesis for the Clifford+T gate set. We show that for approximate synthesis on the Clifford+T gate set, Synthetiq is 6 times faster than the state-of-the-art approach specialized to this task [Gheorghiu et al. 2022b] for complex multi-qubit operators and, while slower, can find circuits that are on par with Gheorghiu et al. [2022b] for simpler single-qubit operators.
§6.6 Relative phase gates. Finally, we show how using Synthetiq to synthesize small components of a bigger circuit allows for more efficient implementations. Specifically, by synthesizing a relative phase carry operator, we can reduce the T-count of the Cirq Adder [Cirq 2023] by more than 3x.
Implementation. We implemented Synthetiq using C++17 with the Eigen matrix library [Guennebaud et al. 2010] and OpenMP [OpenMP Architecture Review Board 2021] for parallelization. All experiments are conducted on a Linux machine with 500 GB RAM and two AMD EPYC 7601 2200MHz processors, with a total of 64 cores. In the practical implementation of Synthetiq, we do not specify iter (see Alg. 1). Instead, we report the runtime averaged over 100 runs, where each run finishes as soon as a circuit with the desired property (e.g., T-count optimal) is found. For particularly time-consuming tasks, we instead average over as many runs as fit within a set time limit (12 hours per task). Unless specified otherwise, we run Synthetiq on all 64 cores. Finally, for Tab. 4, we do not average over multiple runs and instead report the total time-out, as well as the best circuit found within this time-out.

Results Format and Correctness. The energy function of Synthetiq naturally checks correctness of the synthesized circuits, as we only consider a circuit to be found if its distance to the specification is 0. Further, Synthetiq explicitly produces the found circuit in the standard OpenQASM 2.0 language [Cross et al. 2017], and it can therefore easily be imported into other frameworks such as Qiskit [Abraham et al. 2019]. Note that this is in contrast to other tools, which often only output a resource count [Gheorghiu et al. 2022a,b; Mosca and Mukhopadhyay 2021]. Finally, all synthesized implementations from this section are made available with our implementation.
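To illustrate the output format mentioned above, the following minimal sketch serializes a gate list to OpenQASM 2.0 text. The function name `to_openqasm2` and the (name, qubits) gate-list representation are assumptions made for this example; they are not taken from the Synthetiq implementation.

```python
def to_openqasm2(num_qubits, gates):
    """Serialize a gate list to OpenQASM 2.0 text.

    `gates` is a list of (name, qubits) pairs, e.g. ("cx", (0, 1)).
    Gate names are assumed to be defined in qelib1.inc (h, s, t, tdg, cx, ...).
    """
    lines = [
        "OPENQASM 2.0;",
        'include "qelib1.inc";',
        f"qreg q[{num_qubits}];",
    ]
    for name, qubits in gates:
        operands = ",".join(f"q[{q}]" for q in qubits)
        lines.append(f"{name} {operands};")
    return "\n".join(lines) + "\n"


# A hypothetical 2-qubit circuit, used only to show the format.
example = to_openqasm2(2, [("h", (0,)), ("t", (0,)), ("cx", (0, 1)), ("tdg", (1,))])
print(example)
```

Text of this form can typically be loaded into Qiskit with `QuantumCircuit.from_qasm_str`, which is what makes the explicit circuit output convenient compared to a bare resource count.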

Hyperparameter Optimization
We describe how we validated Synthetiq's design choices and fine-tuned its hyperparameters, using a randomly generated benchmark of operators.
Generating Random Operators. To optimize Synthetiq's hyperparameters without overfitting to a specific domain, we built a set of random operators covering the many use cases of Synthetiq. The benchmark consists of 90 operators acting on 2, 3, or 4 qubits, whose shortest decomposition contains 10 gates in the Clifford+T gate set. A third of these operators have full specifications, another third are isometries, and the last third have element-wise specifications. The performance metric for circuits is T-count for the remainder of this section.
Ablation Study. To evaluate our design choices, we ran Synthetiq on the benchmark described above (i) without rewriting the energy function (instead using Eq. (6)), (ii) without using qubit permutations to speed up the search (outlined in §4.4), and (iii) without the simplification pass (outlined in §4.6). The results, shown in Tab. 2, demonstrate the significant impact of each of these choices on runtime. In particular, the simplification pass is essential, increasing Synthetiq's speed by orders of magnitude. Note that the speedup is less than one in only one case, namely not applying cost rewriting for full specifications. As the two energy functions (Eq. (6) and Eq. (8)) are equivalent for fully specified operators, the slowdown is solely due to the slightly higher computational complexity of the rewritten energy function.
Optimizing Hyperparameters. We optimize the hyperparameters for Synthetiq, recalled in Tab. 3. We first optimize the number of starting gates ℓ on our random benchmark, where all operators have shortest decomposition length 10. ℓ = 30 was the optimal value. Optimizing ℓ on a few other random operators of different lengths, we confirmed that the optimal ℓ was consistently around three times the decomposition length. As we aim to synthesize decompositions of 10 to 40 gates, we use this factor 3 and set ℓ min,init to 30 and ℓ max,init to 120.
Since min and max only start playing a large role for bigger operators, we could not optimize them efficiently on our benchmark. We chose to set min to 2.5 and max to 3.5, to achieve a higher variety in the initial number of starting gates while staying close to the optimal value 3.
Subsequently, we conduct a grid search for every parameter but , scanning over a range of plausible values for each parameter and optimizing the average time taken to solve the random operators introduced above. For the hyperparameters Id and comp we follow a slightly adjusted grid search procedure to ensure the found values perform well for larger operators too. First, as small operators do not require composite gates, optimizing comp directly is impossible. Instead, we add the RCCX gate as a composite gate and set comp to the highest value that does not slow down the synthesis by more than a factor of 2. This ensures that the inclusion of an unneeded composite gate does not slow down the synthesis process too much, while ensuring composite gates are still likely to be used for operators that do require this additional gate. In the case of Id , we observe that its optimal value is heavily influenced by the ratio of the optimal circuit size to the number of initial gates, ℓ. Indeed, as ℓ increases, the proportion of identity gates in the optimal circuit also increases, which in turn raises the optimal value of Id . Therefore, when optimizing Id , we set ℓ to 30 for all operators, which is the optimal value of ℓ for operators with 10 gates.
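The two selection rules described above can be sketched as follows. This is a minimal illustration, not the tuning code used for the evaluation: the joint grid over all parameters, the parameter names `alpha` and `beta`, and the toy cost function are assumptions, and `average_time` stands in for a measurement of the average solve time on the random benchmark.

```python
from itertools import product


def grid_search(grid, average_time):
    """Return the parameter assignment with the smallest average solve time.

    `grid` maps a parameter name to its candidate values; `average_time`
    is a caller-supplied function taking a dict of parameter values.
    """
    names = sorted(grid)
    best, best_time = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        elapsed = average_time(params)
        if elapsed < best_time:
            best, best_time = params, elapsed
    return best, best_time


def largest_within_slowdown(times_by_value, baseline_time, max_slowdown=2.0):
    """Pick the highest candidate whose time is at most max_slowdown * baseline."""
    allowed = [v for v, t in times_by_value.items() if t <= max_slowdown * baseline_time]
    return max(allowed) if allowed else None


# Toy stand-in for the measured average time (hypothetical parameter names).
grid = {"alpha": [1, 2, 3], "beta": [0.1, 0.5]}
best, best_time = grid_search(
    grid, lambda p: (p["alpha"] - 2) ** 2 + abs(p["beta"] - 0.5)
)

# Made-up timings: candidate value -> average time with an unneeded composite gate.
chosen = largest_within_slowdown({0.0: 1.0, 0.1: 1.5, 0.2: 1.9, 0.4: 2.6}, baseline_time=1.0)
```

The second helper mirrors the factor-of-2 rule: among the candidate values, it keeps the largest one whose measured time stays within twice the time observed without the composite gate.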
Lastly, we optimize , the moving average factor used in updating circuit size bounds. As is largely dependent on the size of the found circuits and is not significantly influenced by the specifications, we use one larger operator, the 4-qubit adder operator, to optimize this parameter.

Better Operator Decompositions
Using Synthetiq, we were able to provide new and more efficient decompositions of multiple relevant operators, shown in Tab. 4.
Operators. We briefly describe each of the operators in Tab. 4. The first is RCCCX, that is, a relative controlled X with three controls. For any $a, b, c, d$ in {0, 1}, it maps $|abc\rangle|d\rangle$ to $|abc\rangle\, e^{i\theta} X^{abc} |d\rangle$, flipping the last qubit with unspecified phase $\theta \in \mathbb{R}$ if and only if all three controls are 1. This gate is extremely useful to decompose controlled X operators with more than three controls, as described in Maslov [2016]. Hence, finding a better implementation of RCCCX directly gives a better implementation of the controlled X with four controls, when using only Clifford+T gates. The next operator is CCT, that is, the T gate with two controls, mapping

Baselines. No existing circuit synthesis tool could synthesize the operators in Tab. 4. First, Kang and Oh [2023] is the only tool that can express the partially specified synthesis problems of RCCCX, CCT, and C√SWAP. It is however too slow to obtain results within any reasonable timeframe, as it times out after 1 day on the much simpler circuit CCX. The remaining operators (CCiSWAP and C√iSWAP) are beyond the capabilities of all existing tools due to their size: the fastest synthesis tool for T-depth, Gheorghiu et al. [2022a], again fails to find any result in 2 days for CCiSWAP, and yields incorrect results for C√iSWAP. We hence had to manually combine existing operator decompositions and generic decomposition techniques for each of the operators in Tab. 4. We describe this manual effort in App. A.3.
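The relative-phase specification of RCCCX described above can be made concrete with a small check. The sketch below is illustrative only: it assumes the three controls are the high-order bits of the basis-state index and the target is the low-order bit, and the helper names are hypothetical. It tests whether a candidate matrix agrees with CCCX up to an arbitrary unit phase on each basis state.

```python
import cmath


def cccx_matrix():
    """16x16 matrix of CCCX: flip the last qubit iff all three controls are 1."""
    n = 16
    m = [[0j] * n for _ in range(n)]
    for x in range(n):
        # Assumed convention: controls are the 3 high bits, target is the low bit.
        y = x ^ 1 if (x >> 1) == 0b111 else x
        m[y][x] = 1 + 0j
    return m


def is_relative_phase_version(v, u, tol=1e-9):
    """True if every column of v equals the same column of u times a unit phase."""
    n = len(u)
    for j in range(n):
        phase = None
        for i in range(n):
            if abs(u[i][j]) > tol:
                phase = v[i][j] / u[i][j]
                break
        if phase is None or abs(abs(phase) - 1) > tol:
            return False
        if any(abs(v[i][j] - phase * u[i][j]) > tol for i in range(n)):
            return False
    return True


u = cccx_matrix()
# A candidate with a different (arbitrary) phase on each input basis state.
v = [[cmath.exp(1j * 0.3 * j) * u[i][j] for j in range(16)] for i in range(16)]
identity = [[1 + 0j if i == j else 0j for j in range(16)] for i in range(16)]

print(is_relative_phase_version(v, u))         # phases differ, action matches
print(is_relative_phase_version(identity, u))  # does not flip the target
```

Leaving the per-state phases unconstrained is what makes the specification incomplete: many more circuits satisfy it than satisfy the exact CCCX matrix.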
Results. Qubits are often the scarcest resource in quantum computers. Reducing the number of ancillae, and hence the number of qubits used by a quantum operator, is crucial. For three of the operators shown in Tab. 4, Synthetiq was able to find a decomposition using fewer ancillae than the previous state of the art. Only for the CCiSWAP operator does this come at the cost of a slightly higher T-depth. For the two operators where state-of-the-art decompositions already used the minimum number of ancillae, Synthetiq was able to significantly reduce the T-depth of the operators: from 8 to 5 for RCCCX, and from 29 to 8 for C√SWAP. Further, note that all those results were obtained in only a few hours. Finally, now that those decompositions are known, they can easily be reused by any quantum compiler.
Exploiting Versatility. To generate the decompositions in Tab. 4, we heavily relied on the versatility of Synthetiq. First, incomplete specification was necessary for all operators requiring an ancilla and for RCCCX. Further, we used composite gates to speed up synthesis and hence boost the chances of success. More precisely, we added to the Clifford+T gate set the RCCX gate for all operators but CCiSWAP, where we instead added CCX. This allows the synthesis to directly leverage those complex gates, and hence speeds up the search. To pick which composite gate to add to the gate set, we consistently used the following procedure. If after running for one minute Synthetiq could not find any circuit satisfying the specification, we added RCCX to the gate set. If no circuit was found after running one more minute with RCCX, we replaced RCCX with CCX in the gate set. Note that these intermediate runs take at most 2 minutes, which is negligible compared to the total runtime for each operator.
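The procedure for picking the composite gate can be written as a short escalation routine. This is a sketch under stated assumptions: `try_synthesize(gate_set, timeout_seconds)` is a hypothetical callback standing in for a time-limited Synthetiq run that returns a circuit or `None`, and gate sets are represented as plain lists of gate names.

```python
def pick_gate_set(try_synthesize, base_gates, probe_seconds=60):
    """Choose the gate set via the one-minute probes described in the text.

    1. Probe with the base gate set; keep it if a circuit is found.
    2. Otherwise add RCCX and probe again.
    3. Otherwise replace RCCX with CCX.
    """
    base = list(base_gates)
    if try_synthesize(base, probe_seconds) is not None:
        return base
    with_rccx = base + ["RCCX"]
    if try_synthesize(with_rccx, probe_seconds) is not None:
        return with_rccx
    return base + ["CCX"]


# Fake callbacks standing in for real synthesis runs (for illustration only).
clifford_t = ["H", "S", "T", "CX"]
needs_rccx = lambda gates, timeout: "circuit" if "RCCX" in gates else None
never_finds = lambda gates, timeout: None

print(pick_gate_set(needs_rccx, clifford_t))   # base set plus RCCX
print(pick_gate_set(never_finds, clifford_t))  # base set plus CCX
```

The two probes bound the overhead of this selection at two minutes, matching the remark that the intermediate runs are negligible compared to the total runtime.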

Mode: Custom Gates
As mentioned above, Synthetiq can synthesize circuits using any finite custom gate set. The most recent work on quantum circuit synthesis also allowing for custom finite gate sets is Kang and Oh [2023]. We evaluate the applicability of both tools on the benchmark described below.

Benchmark. Each element in the benchmark consists of a circuit specification and a custom gate set. The first part of the benchmark (three_superpose to bit_measure in Fig. 3) is the evaluation benchmark from Kang and Oh [2023]. However, these synthesis problems are not entirely realistic. They assume the gates required to build a circuit for the specification are known ahead of time, and supply exactly those gates in the gate set. This results in small gate sets (three or fewer gates for 11 of the 17 problems), and hence easier synthesis. We therefore complete the benchmark with real-world problems taken directly from Quantum Computing Stack Exchange. Gate sets and specifications are taken directly from the questions, resulting in bigger gate sets, where some gates are not used in the optimal decomposition.
Results. We show the results in Fig. 3 and Tab. 5. Fig. 3 shows the results when optimizing for gate count. We see that Synthetiq outperforms Kang and Oh [2023] in 50% of cases, and matches it on the rest. Further, Kang and Oh [2023] is not able to find any decomposition for one of their problems as well as the more complex problems we added to the benchmark, even with a one-day time-out. This shows that Kang and Oh [2023] is not scalable to those new complex problems, whereas Synthetiq still easily handles those in less than 4 seconds.
In Tab. 5, we focus on the questions taken from Stack Exchange, for the more realistic objective of minimizing the use of the most expensive gate in the gate set. We compare Synthetiq results to the expert answer for each question; we do not compare to Kang and Oh [2023] as they could not find any of the circuits within a day. We find that Synthetiq outperforms the expert answer in two out of five cases, and matches it in the remaining three cases. Further, all results were found within a few seconds, confirming the usefulness of Synthetiq for quantum programmers.

Table 6. Synthesis of common fully specified operators using Clifford+T. We denote Mosca and Mukhopadhyay [2021] as Mosca and Gheorghiu et al. [2022a] as Gheorghiu. Speedup is the ratio of the time taken by the other tool to the time taken by Synthetiq. Times were measured on 64 cores for Synthetiq and Gheorghiu, and on a single core for Mosca.

Mode: Clifford+T Gate Set
We now compare Synthetiq to the state of the art for the well-studied problem of synthesizing fully specified quantum operators over the Clifford+T gate set. When optimizing T-count, the current state of the art is Mosca and Mukhopadhyay [2021], while for T-depth it is Gheorghiu et al. [2022a]; we provide a broader overview of existing tools in §7.
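For readers less familiar with the two metrics, the sketch below computes them for a circuit given as an ordered gate list. It assumes the usual definitions: T-count is the number of T and T† gates, and T-depth is the largest number of T or T† gates on any dependency path of the circuit as written (no gates are commuted past each other). The gate-list representation and the lowercase gate names are assumptions made for the example.

```python
T_GATES = {"t", "tdg"}


def t_count(gates):
    """Number of T and T-dagger gates in the circuit."""
    return sum(1 for name, _ in gates if name in T_GATES)


def t_depth(gates):
    """T-depth of the circuit in the given gate order.

    For each qubit we track the number of T layers on the longest dependency
    chain ending at that qubit; a multi-qubit gate synchronizes its qubits.
    """
    depth = {}
    for name, qubits in gates:
        d = max(depth.get(q, 0) for q in qubits)
        if name in T_GATES:
            d += 1
        for q in qubits:
            depth[q] = d
    return max(depth.values(), default=0)


# A hypothetical circuit: the two initial T gates act in parallel.
circuit = [
    ("t", (0,)),
    ("t", (1,)),
    ("cx", (0, 1)),
    ("tdg", (1,)),
    ("t", (0,)),
]
print(t_count(circuit), t_depth(circuit))
```

In this example four T-type gates fit into two T layers, which is why the two metrics are optimized separately in the comparison.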
Overall, we find that Synthetiq is generally faster than both tools, and finds strictly better or equally good implementations compared to either of them.
Benchmarks. Tab. 6 shows the comparison of Synthetiq to both works on a benchmark of common quantum operators, which is based on the original benchmark of Mosca and Mukhopadhyay [2021]. CCX, Adder, U1, and U2 are taken directly from their benchmark, where U1 is defined as CCX( , , ); CCX( , , ) and U2 as CCX( , , ); CCX( , , ); CCX( , , ). We exclude the other 3-qubit operators from the original benchmark as they are affine equivalents of CCX, and add the CCH operator to the benchmark instead. Based on U1, we additionally introduce U1 var., which we define as CCX( , , ); CCX( , , ), allowing us to evaluate the sensitivity of all tools to simple changes of specifications.
We additionally show in Fig. 4 the comparison of Synthetiq with both tools on a benchmark of 3-qubit permutations, following Gheorghiu et al. [2022a]. We built this benchmark by clustering all 40320 permutations on 3 qubits by Clifford equivalence, and picking one representative for each of the resulting 30 equivalence classes.
Results for Common Operators. We find that Synthetiq consistently finds the best implementation for each operator in the benchmark, outperforming Mosca and Mukhopadhyay [2021] in 33% of the cases and Gheorghiu et al. [2022a] in 66% of the cases. Further, Synthetiq finds these implementations faster than both tools in every example but one (CCH for T-depth). More importantly, Synthetiq does not time out (> 2 days compute) on any of the examples whereas Mosca and Mukhopadhyay [2021] and Gheorghiu et al. [2022a] do, showing that Synthetiq can handle more difficult problems than what could previously be done. We also note that Synthetiq is the first to automatically synthesize a T-depth 2 circuit for the Adder operator.
Results for Permutations. The results are shown in Fig. 4. For the largest eight operators, we added RCCX to the gate set for Synthetiq, following the procedure described in §6.2. Note that neither Mosca and Mukhopadhyay [2021] nor Gheorghiu et al. [2022a] allow for composite gates, and hence cannot be extended when used for complex operators. We find that Synthetiq significantly outperforms both tools. Synthetiq finds a better T-count than Mosca and Mukhopadhyay [2021] in 43% of cases, including 27% where their tool times out. Further, Synthetiq is one order of magnitude faster on problems where Mosca and Mukhopadhyay [2021] does not time out.
For T-depth, Synthetiq finds more efficient circuits than Gheorghiu et al. [2022a] in 93% of cases, including 50% where Gheorghiu et al. [2022a] fails to find any.Excluding the cases where Gheorghiu et al. [2022a] times out, Synthetiq is two orders of magnitude faster than Gheorghiu et al. [2022a].

Mode: Approximate Circuit Synthesis
We compare Synthetiq with Gheorghiu et al. [2022b], the state-of-the-art method for approximate synthesis of multi-qubit operators in the Clifford+T gate set. Approximate synthesis is important as many operators cannot be implemented exactly with Clifford+T gates, but all can be approximated up to an arbitrary distance (see for instance [Nielsen and Chuang 2002, Chap. 4.5.3]).
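As an illustration of what "approximated up to a distance" can mean, the sketch below computes one commonly used distance that ignores global phase, d(U, V) = sqrt(1 - |Tr(U†V)| / N). Choosing this particular distance is an assumption made for the example; the distance actually used in the evaluation is the one defined by the energy function earlier in the paper.

```python
import cmath
import math


def global_phase_distance(u, v):
    """sqrt(1 - |Tr(U^dagger V)| / N) for two N x N matrices given as nested lists."""
    n = len(u)
    # Tr(U^dagger V) equals the sum over all entries of conj(U[i][j]) * V[i][j].
    trace = sum(u[i][j].conjugate() * v[i][j] for i in range(n) for j in range(n))
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - abs(trace) / n))


t_gate = [[1 + 0j, 0j], [0j, cmath.exp(1j * math.pi / 4)]]
# Rz(pi/4): the same operator as T up to the global phase exp(-i*pi/8).
rz = [[cmath.exp(-1j * math.pi / 8), 0j], [0j, cmath.exp(1j * math.pi / 8)]]
identity = [[1 + 0j, 0j], [0j, 1 + 0j]]

print(global_phase_distance(t_gate, rz))        # numerically close to 0
print(global_phase_distance(t_gate, identity))  # clearly nonzero
```

An approximate synthesis task then asks for a circuit whose matrix lies within the chosen threshold of the target under such a distance.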
Tab. 7 shows the results of this comparison on the benchmark from Gheorghiu et al. [2022b]. We report results for all operators present in their evaluation, except for trivial operators with T-count less than 2 or for operators where neither tool reported any result.

Table 7. Results for approximate synthesis compared to [Gheorghiu et al. 2022b]. We run Synthetiq and [Gheorghiu et al. 2022b] for an hour on 1 or 2 qubit tasks and for two hours on the 3 qubit task and report the best found circuit. Synthetiq is run on 64 cores and [Gheorghiu et al. 2022b] is run on one. We tried running the code of [Gheorghiu et al. 2022b] on 64 cores, but this resulted in runtimes about 50 times slower.

For operators on one qubit, Synthetiq finds more efficient circuits than Gheorghiu et al. [2022b] for two out of six operators, despite their claim of optimality. Synthetiq is only outperformed once, when it fails to find any circuit. It is however several orders of magnitude slower than Gheorghiu et al. [2022b]. For operators on two qubits, Synthetiq is five times faster than Gheorghiu et al. [2022b] and finds circuits as efficient as Gheorghiu et al. [2022b] does. Further, Gheorghiu et al. [2022b] could not synthesize the three-qubit operator, whereas Synthetiq succeeds.

Mode: Relative Phase Operators
We now showcase the use of Synthetiq for relative phase operators, and their use in bigger circuits. We do so using the Adder implementation from Cirq [Cirq 2023]. This implementation relies on the interleaving of three small operators: sum, carry, and uncarry (which uncomputes carry). To build an adder for two operands of qubits with ancilla qubits, this implementation uses of each of the three operators. As we explained in §5.2, relative phase operators can be used to replace their non-relative counterpart in a circuit when this counterpart is later uncomputed. We can hence replace the carry operator by a relative implementation, and uncarry by the inverse of the relative operator, without changing the semantics of the resulting Adder circuit.
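The cancellation of relative phases under uncomputation can be checked numerically on a small example. The sketch below is not the Cirq adder: it uses CCX as the "compute" step, an S gate on the target qubit as the operation in between, and arbitrary made-up phases. It also assumes the simplest sufficient condition for the argument, namely that the middle operation is diagonal in the computational basis.

```python
import cmath


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0j for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ccx():
    """8x8 CCX: flip the low bit iff the two high bits are both 1."""
    m = [[0j] * 8 for _ in range(8)]
    for x in range(8):
        y = x ^ 1 if (x >> 1) == 0b11 else x
        m[y][x] = 1 + 0j
    return m


compute = ccx()
# Relative-phase version: same basis-state action, arbitrary phase per output state.
phases = [cmath.exp(1j * 0.37 * k * k) for k in range(8)]
relative = matmul(diag(phases), compute)
# Middle operation: S gate on the target qubit (diagonal in the computational basis).
middle = diag([1j if x & 1 else 1 + 0j for x in range(8)])

exact = matmul(dagger(compute), matmul(middle, compute))
with_relative = matmul(dagger(relative), matmul(middle, relative))

print(close(exact, with_relative))  # the relative phases cancel
print(close(relative, compute))     # although the two compute steps differ
```

The phases cancel because the diagonal phase matrix commutes with the diagonal middle operation and is then undone by its own inverse in the uncompute step.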
We used Synthetiq to synthesize such relative operators. This yielded two circuits: one optimized for T-count and T-depth, and one optimized for CX-count and CX-depth, each synthesized in less than 1h. Using those relative operators, we built the complete Adder circuit for different numbers of qubits. Note that all Adder circuits, no matter the number of qubits, use the same carry and uncarry operators. We therefore only synthesized two operators, and were able to use them for all adder operators.
We show the resulting circuit performance in Fig. 5. Using the relative operators allowed for significantly more efficient circuits, with a reduction in T-count by a factor of 3.5, in T-depth by 2.3, in CX-count by 2, and in CX-depth by 1.5. This demonstrates the usefulness of relative phase operators, and the need for a synthesis tool that can easily synthesize such operators for any specification.

RELATED WORK
We now discuss works related to Synthetiq.
Clifford+T Synthesis. Meet-in-the-middle (MITM) algorithms have been extensively explored for synthesizing circuits with finite gate sets. The original implementation by Amy et al. [2013] ensures gate-depth or T-depth optimality but is much slower than other methods, taking over four days to find a T-depth optimal CCX circuit. It supports ancillae by treating specifications that allow for ancillae as isometries, but does not discuss extending it to element-wise specifications. A later iteration of MITM [Gosset et al. 2013] focuses on optimizing T-count rather than T-depth, but sacrifices the use of ancillae. Matteo and Mosca [2016] improved upon Gosset et al. [2013] by introducing a parallel framework, thereby reducing runtime. Overall, those three MITM algorithms are extremely slow. For instance, the most efficient among them, Matteo and Mosca [2016], requires approximately 30 seconds to execute on 4096 cores for the smallest operator we considered, which is the CCX gate. Due to these excessive runtimes, they were not incorporated in the evaluation conducted in §6.4.
The more recent Mosca and Mukhopadhyay [2021] and Gheorghiu et al. [2022a] further refined the original MITM algorithm, optimizing for T-count and T-depth, respectively. However, they lost the original algorithm's optimality guarantees and cannot deal with ancillae. It is worth noting that the former cannot be parallelized, and the latter gains only marginal benefits from parallelization. In §6.4, we demonstrate that Synthetiq outperforms both of those works, in terms of runtime and efficiency of the generated circuits.
Another work [Giles and Selinger 2013] suggests an algorithm to synthesize any circuit that can be exactly synthesized over Clifford+T. However, this work does not target efficient decompositions, instead often producing expensive ones. Niemann et al. [2020] implement and evaluate an improved version of this approach. Unfortunately, we were unable to compare their results to Synthetiq because we could not run their implementation and their publication does not report results on the circuits we consider here.

Synthesis on Other Gate Sets. We now discuss methods capable of handling gate sets other than Clifford+T and compare their capabilities with Synthetiq. Kang and Oh [2023] recently proposed a new circuit synthesis method focusing on finite gate sets and provide a framework for specifying isometries in any basis. Synthetiq, on the other hand, naturally handles partial specifications like relative phase operators that cannot be specified as isometries in any basis. More importantly, we demonstrated in §6.3 that Synthetiq significantly outperforms Kang and Oh [2023] in all tasks, both in terms of speed and efficiency of the generated circuits, even when restricted to a single core. Allowing Synthetiq to use multiple cores would only increase the performance gap further. Chou et al. [2022] suggested an evolutionary algorithm that incrementally modifies a circuit to meet a specification. Even though it exploits known aspects of an optimal CCX gate decomposition, its reported runtime on CCX is orders of magnitude higher than Synthetiq, namely 600s. Unfortunately, its implementation is not available, so we were unable to compare it to Synthetiq. Its publication does not address parallelization, partial specifications, or ancillae. Further, it assumes incorrect definitions of gate depth and T-depth.

Approximate Clifford+T Synthesis. Since not all unitaries can be implemented exactly in the Clifford+T gate set, some works have focused on implementing circuits up to a distance ε, where the distance can be measured using distances that allow for a global phase difference (e.g., [Gheorghiu et al. 2022b; Kliuchnikov et al. 2016]) or not (e.g., [Ross and Selinger 2016; Selinger 2014]). Most of these works [Kliuchnikov et al. 2016; Ross and Selinger 2016; Selinger 2014] focus on single-qubit operators; only Gheorghiu et al. [2022b] considers multi-qubit operators. While the latter claims its algorithm produces optimal circuits, our experiments demonstrate its implementation is not optimal for all tasks. As shown in §6.5, Synthetiq performs similarly to Gheorghiu et al. [2022b] on the subdomain of ε-approximate Clifford+T synthesis.
Synthesis in Other Settings. In contrast to the finite gate sets assumed by Synthetiq, various works have studied synthesis using parametrized gate sets such as CX+Rot [Davis et al. 2019; Khatri et al. 2019; Meister et al. 2023; Smith et al. 2023; Younis et al. 2021]. However, synthesis over CX+Rot relies on the optimization of the parameters in the rotational gates, which is not possible for finite gate sets such as Clifford+T. State preparation synthesis for the CX+Rot gate set has also been studied extensively, see e.g., [Araujo et al. 2021; Iten et al. 2016; Plesch and Brukner 2011].
A plethora of works synthesize circuits for specific use cases. Various works decompose classical oracles into quantum circuits [Amy et al. 2017; Biswal et al. 2018; Green et al. 2013; Parent et al. 2015, 2017; Rand et al. 2019], or help with this task [Bhattacharjee et al. 2019; Paradis et al. 2021]. In contrast to these specialized algorithms, Synthetiq synthesizes general circuits.

CONCLUSION
We presented Synthetiq, a novel method and tool to synthesize quantum circuits over finite gate sets. Synthetiq is based on Simulated Annealing (SA) and allows us to solve a wide range of synthesis tasks, from relative phase operators over Clifford+T to operators with ancillae over custom gates.
Our evaluation shows that Synthetiq (i) is able to synthesize more efficient implementations of relevant quantum operators, (ii) frequently outperforms more specialized synthesis tools, such as those for completely specified operators in the Clifford+T gate set, and (iii) can use relative phase operators to build more efficient implementations of large multi-qubit operators.
We believe there are many more applications of Synthetiq worth exploring, such as topology-aware synthesis [Davis et al. 2020] or incomplete specifications of operators in different bases.

DATA-AVAILABILITY STATEMENT
The implementation of Synthetiq and all evaluation results are available on GitHub and Zenodo [Paradis et al. 2024].

A APPENDIX

A.1 Energy Function Derivation
Here, we demonstrate our rewrite of the typical energy function used by e.g. [Chou et al. 2022; Khatri et al. 2019; Meister et al. 2023]. The lemma follows by multiplying Eq. (10) with $\frac{1}{2^{n+1}}$. We first note that for any matrices $A, B \in \mathbb{C}^{2^n \times 2^n}$, the following well-known properties hold:

A.2 Simplification Order
We define the order ≺ that is used in Alg. 2 to determine if two gates should be swapped in a circuit .The goal of ≺ is to ensure that as many gates as possible can be swapped and to define some strict order on the set of gates.
We therefore first define the functions comm , act and .comm ( , , ) is the number of consecutive gates that commute with starting from and going down until 1 .Mathematically, this means comm ( , , ) = max − 0 ⩽ ⩽ ∧ commutes with .
act ( ) is the number of qubits on which gate acts and ( ) is the indices of the qubits on which acts. We then define ≺ alphabet as the alphabetical order on the name of the gates (e.g. H, S, CX, ...) and ≺ as the standard order on R . Alg. 3 shows the definition of ≺. As shown, the order first prioritizes a difference in the number of commuting gates, then the number of acting qubits, then the alphabetical order of the gates, and finally the order of the qubits on which the gates act.
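The prioritization just described is a lexicographic comparison, which can be sketched with tuple ordering. This is only an illustration of the idea: Alg. 3 is not reproduced here, so the ascending direction at each level, the tuple layout, and the helper names are all assumptions.

```python
def order_key(comm_count, name, qubits):
    """Lexicographic key: commuting count, number of qubits, gate name, qubit indices."""
    return (comm_count, len(qubits), name, tuple(qubits))


def precedes(g1, g2):
    """True if gate g1 comes strictly before g2 under the assumed order.

    Each gate is a (comm_count, name, qubits) triple, where comm_count is the
    number of consecutive commuting gates computed for that gate.
    """
    return order_key(*g1) < order_key(*g2)


h_on_0 = (1, "H", (0,))
s_on_0 = (1, "S", (0,))
h_on_1 = (1, "H", (1,))
cx_01 = (2, "CX", (0, 1))

print(precedes(h_on_0, cx_01))   # decided by the commuting count
print(precedes(h_on_0, s_on_0))  # decided by the gate name
print(precedes(h_on_0, h_on_1))  # decided by the qubit indices
```

Because the key is a tuple of totally ordered components, the resulting relation is a strict order: no gate precedes itself, and any two distinct keys are comparable.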

A.3 Baseline for Operators Decomposition
We describe our best effort construction of the baselines for Tab. 4. [Maslov 2016] gives the best decomposition of RCCCX we could find, with a T-depth of 8.
We were unable to find any explicit decomposition of CCT in published work. We therefore used the construction from [Maslov 2016] and the minimal implementation of the CT gate from [Amy et al. 2013], decomposing CCT( , , ) as RCCX( , , ); CT( , ); RCCX( , , ) where is an ancilla qubit. Altogether, this yields an implementation with T-depth 9 and 2 ancilla qubits, as CT requires its own extra ancilla.
For CCiSWAP, we used the same construction as above, with CiSWAP decomposed as in [Crooks 2023] instead of CT.This yields an implementation with 1 ancilla.
For C√SWAP (resp. C√iSWAP), we could not find a better existing implementation than controlling every gate in the best known decomposition of √SWAP (resp. √iSWAP). This yields for each of those two gates a decomposition with an ancilla and a T-depth higher than 25.

Fig. 3. Comparison between Synthetiq and [Kang and Oh 2023] on the benchmark used by [Kang and Oh 2023]. A time-out of 1 day was set for both tools, but Synthetiq always reached a solution within 1 hour. N/A means that the tool returned an empty solution set.
Fig. 4. Comparison between Synthetiq and previous works on 3-qubit permutation synthesis. Each bar represents one of the 30 evaluated permutations. Speedups are only included for cases where both tools return a result. Time-out set at 1h per example for all tools.

Table 1. Translating common tasks to incomplete specifications.

Table 2. Ablation study of Synthetiq. We report the average speedup of Synthetiq compared to Synthetiq with a specific component removed.

Table 3. Hyperparameter values used in Synthetiq. is the number of qubits (fixed by the operator specification) and ℓ the number of starting gates in a run (randomly sampled in [ℓ min , ℓ max ] for each run).
and are 1, and to | ⟩ | ⟩ otherwise.C

Table 4. New operator decompositions found with Synthetiq, using 64 cores. For each operator we gave Synthetiq the composite gate RCCX, except for CCiSWAP where we used CCX. Previous Best is the result of a best-effort search either found in previous work or using a standard decomposition as discussed in §6.2.

Table 5. Operator synthesis for several questions on Stack Exchange. Expensive gate count is the count of the most expensive gate (highlighted in bold). Time for Synthetiq is on 64 cores. [Kang and Oh 2023] was left out as it timed out (1 day) or returned an empty solution set for every problem.
on 64 cores, but this resulted in runtimes about 50 times slower.