Quantum Time-Space Tradeoffs for Matrix Problems

We consider the time and space required for quantum computers to solve a wide variety of problems involving matrices, many of which have only been analyzed classically in prior work. Our main results show that for a range of linear algebra problems -- including matrix-vector product, matrix inversion, matrix multiplication and powering -- existing classical time-space tradeoffs, several of which are tight for every space bound, also apply to quantum algorithms. For example, for almost all matrices $A$, including the discrete Fourier transform (DFT) matrix, we prove that quantum circuits with at most $T$ input queries and $S$ qubits of memory require $T=\Omega(n^2/S)$ to compute matrix-vector product $Ax$ for $x \in \{0,1\}^n$. We similarly prove that matrix multiplication for $n\times n$ binary matrices requires $T=\Omega(n^3 / \sqrt{S})$. Because many of our lower bounds match deterministic algorithms with the same time and space complexity, we show that quantum computers cannot provide any asymptotic advantage for these problems with any space bound. We obtain matching lower bounds for the stronger notion of quantum cumulative memory complexity -- the sum of the space per layer of a circuit. We also consider Boolean (i.e. AND-OR) matrix multiplication and matrix-vector products, improving the previous quantum time-space tradeoff lower bounds for $n\times n$ Boolean matrix multiplication to $T=\Omega(n^{2.5}/S^{1/4})$ from $T=\Omega(n^{2.5}/S^{1/2})$. Our improved lower bound for Boolean matrix multiplication is based on a new coloring argument that extracts more from the strong direct product theorem used in prior work. Our tight lower bounds for linear algebra problems require adding a new bucketing method to the recording-query technique of Zhandry that lets us apply classical arguments to upper bound the success probability of quantum circuits.


Introduction
Matrix computations are among the most fundamental computational problems and are critically important in areas such as numerical and scientific computing, optimization, and machine learning. If quantum computers can be shown to have a significant advantage over classical computations for these types of problems then it would open up a wide range of applications for such devices.
Prior work has shown that non-standard versions of matrix problems may indeed admit exponential or large polynomial quantum advantage: For any efficiently implementable operator $M$, the HHL algorithm of Harrow, Hassidim, and Lloyd [HHL09] (with the improvements of [CKS15]) can efficiently $\epsilon$-approximate the value of $x^\dagger M x$ for the solution $x$ of a well-conditioned linear system. However, it is worth noting that this algorithm requires the input to be presented in an unconventional format.
Many extensions of the HHL algorithm have also been proposed that can be elegantly described in the quantum singular value transform (qSVT) framework first described in [LC19] and popularized by [GSLW19]. Despite initial hope of exponential speed-up, a series of papers by Tang and co-authors, and others (e.g. [Tan18, CGL+20a, CGL+20b, GST22, BT23, CCH+22]) has shown that, by providing classical algorithms a comparable input format to the HHL algorithm, these quantum algorithms can be replaced by classical ones with only a polynomial blowup in the running time, although this polynomial is not always small. This body of work leaves open the question: What is the conventional quantum complexity of standard classical problems like explicitly computing linear-system solutions, multiplying or inverting matrices, computing matrix-vector products, and computing the low rank approximation of a matrix?
By the polynomial method, we know that computing a single inner product (or parity) of $n$-bit vectors requires $\Omega(n)$ quantum queries [BBC+01] but linear algebra computations generally involve $\Omega(n)$ or $\Omega(n^2)$ such computations. Sherstov [She12], generalizing results of Klauck, Špalek, and de Wolf [KŠdW07] for the OR function, gave a strong direct product lower bound for quantum query complexity proved using the polynomial method, which proves strong lower bounds for inner products involving many disjoint input vectors. However, the matrix problems in linear algebra are very far from direct product problems: The vectors involved are highly correlated with each other, so this prior work does not shed light on the key question of whether quantum algorithms provide any advantage for general linear algebra.
In this paper, we resolve these questions for quantum computation of a wide array of linear algebra problems. We prove lower bounds for quantum computation that are asymptotically the same as the best classical lower bounds. Since many of the problems also have deterministic algorithms whose resource usage matches the lower bounds, our results show that there is provably no asymptotic quantum advantage at all in solving these linear algebra problems!
As with the study of classical computation involving super-linear time lower bounds, we consider quantum algorithms in which we limit the number of qubits of memory and hence produce quantum time-space tradeoffs. That is, for each fixed bound on the amount of memory allowed, we derive asymptotically the same time lower bound for the quantum algorithm as one would get for the time lower bound on classical algorithms with the same number of classical bits. In many ways, quantum memory is an even more important resource than classical memory since it is a measure of the maximum number of qubits that maintain coherence at any time during the algorithm's execution. For this reason the first general-purpose fault-tolerant quantum computers will likely have very limited memory and only be able to execute low depth quantum circuits. As such, it is crucial to consider both the time and space complexity for quantum algorithms.
We prove our lower bounds for quantum computation in a query model where algorithms are able to perform arbitrary input-independent unitary transformations on their state between quantum queries to their input. This is a sufficiently general model that our lower bounds also apply to any reasonable model of quantum computation, including quantum circuits where the (classical) input is stored in quantum-readable read-only memory (QROM).
The keys to proving our time-space tradeoffs are new results proving much stronger lower bounds than strong direct product theorems for matrix-vector products and matrix multiplication. While our bounds have the same form as strong direct product theorems (the success probability decays exponentially with the number of outputs), they also apply with almost completely overlapping sets of inputs, in contrast to the disjoint inputs that are necessary to apply direct product theorems.
While there is a large body of work proving strong classical time-space tradeoffs (e.g. [Tom78, BFK+79, Yes84, BC82, Abr90, Abr91, Bea91, MNT93]) and a large body of work analyzing unrestricted quantum query algorithms versus their classical randomized counterparts (e.g. [DJ92, BV97, Sim97, BBC+01, Amb02, ŠS05, Špa08, She11]), there are just a few previous papers that analyze the quantum memory required to make use of these quantum queries. Klauck, Špalek, and de Wolf [KŠdW07] extended the classical method of Borodin and Cook [BC82] for proving time-space trade-offs to quantum circuits using a new strong direct product theorem for quantum query algorithms computing the OR function. They showed that algorithms making $T$ quantum queries and using $S$ qubits of quantum memory require $T = \Theta(n^{1.5}/S^{1/2})$ to sort lists of length $n$, and require $T = \Omega(n^{2.5}/S^{1/2})$ to compute $n \times n$ Boolean matrix product. Ambainis, Špalek, and de Wolf [AŠdW09] extended this direct product approach to 2-sided error algorithms computing $k$-threshold functions, which allowed them to produce similar trade-off lower bounds for systems of linear inequalities/equalities (though these have the drawback, unlike the other results, that the hard function for space $S$ depends on the space bound). This approach, based on an extension of the adversary method using eigenspace analysis, was very difficult to apply.
As a result, further study of quantum time-space tradeoff lower bounds languished until it was enabled by an idea of Zhandry [Zha19] who, motivated by understanding quantum algorithms interacting with random function oracles, developed an approach to understanding quantum query algorithms using a compressed oracle and Fourier analysis. This views computations in a recording query basis that allows one to keep track of a quantum query algorithm as a superposition of basis states that have a natural classical query interpretation. It has been applied to finding multi-way collisions [LZ19] and to inverting a random permutation [Ros21]. This greatly simplifies the analysis of quantum query algorithms and can be applied to many lower bound methods that use randomly chosen inputs rather than being limited to cryptographic applications. Extending Zhandry's approach, Hamoudi and Magniez [HM21] applied an even cleaner expression of the method, using phase oracles with the recording query basis rather than Fourier analysis, and extended it using biased random inputs to derive query lower bounds in a regime of exponentially small success probability. They used this to obtain time-space tradeoff lower bounds, proving that any quantum algorithm that finds $K$ disjoint collisions in an input of length $N$ with $T$ quantum queries and $S$ qubits of memory must have $T = \Omega(K N^{1/3}/S^{1/3})$. They also re-derived the earlier sorting lower bound using this method.
Our linear algebra lower bounds and methods
Time-space trade-off lower bounds for linear algebraic problems were among the first to be studied for classical computation [Yes84] after the first bounds for sorting. The strongest classical results are due to Abrahamson [Abr91] who developed a powerful general method based on matrix rigidity. This yields state-of-the-art lower bounds for computation of Fourier transforms, convolution, matrix-vector products, matrix multiplication, matrix inversion, matrix powering, and linear system solving. The lack of any analogous results for quantum computation has been a substantial gap in our understanding.
Our results show that all of the linear algebraic time-space tradeoff lower bounds shown by Abrahamson [Abr91] also apply to quantum computation even when the quantum circuit can adaptively decide when to produce output based on the observed input. Since many of these classical lower bounds are tight, our results directly imply that there are no hybrid classical-quantum algorithms with a polynomial advantage for these problems, unlike the query bounds for search and collision finding in [HLS22]. Using the generic results in [BK23], we also prove asymptotically equivalent lower bounds on the stronger notion of quantum cumulative memory complexity for these problems. We include a table of our time-space tradeoff lower bounds in Table 1.
As discussed already, we need a much stronger lower bound method than any derivable from strong direct product theorems. We do this by adding new ideas to the compressed oracle/recording query approach of Zhandry [Zha19] as extended and applied by Hamoudi and Magniez [HM21]. Thus far, the compressed oracle method has used a two-step pattern: First, identify a notion of unusual progress of a quantum algorithm towards a solution (i.e., the partial information so far is more determinative of the answer than one might expect) and show that the total amplitude of states where this occurs is small. Second, show that the total amplitude of the quantum states where many outputs are produced without unusual progress can be bounded; this latter part has used ideas with classical analogues that can be applied by breaking the algorithm's final state into mutually orthogonal components, each with small amplitude on the correct answers.
However, in our case with linear algebra problems, there is no form of unusual progress and also no clear way to break up the problem into mutually orthogonal basis states. Thus, neither part of the pattern seems to work. Instead, we can use the recording query framework to characterize how much a quantum circuit can know about its input. We use the triangle inequality to bucket amplitude from the algorithm's state into a small number of non-orthogonal components (or buckets) that share some set of inputs that they know nothing about. We can then apply a classical argument showing that each component must have small amplitude on the correct answers. By finding a way to divide the state into a small number of buckets that each have small amplitude on correct answers, we can obtain tight lower bounds. The properties required of this division become more subtle as we move to the problem of matrix multiplication, where in order to get small amplitude, we need to contend with a partition featuring significantly more parts.

Problem | Lower bound | Source
 | $T = \Omega(n^{2.5}/S^{1/4})$ | Theorem 5.5
Boolean Matrix Squaring $f(A) = A \bullet A$ | $T = \Omega(n^{2.5}/S^{1/4})$ | Corollary 5.17

Table 1: Our quantum time-space tradeoff lower bounds. Other than Boolean matrix multiplication, where [KŠdW07] shows a quantum advantage for the problem, all of these lower bounds match the tightest known classical lower bound. For the linear algebra problems, we assume that input elements come from some finite subset $D$ of a field and let $d = |D|$.

Improved bounds for Boolean matrix operations
Here we improve the previous lower bound for quantum algorithms computing Boolean matrix multiplication given in [KŠdW07] from $T = \Omega(n^{2.5}/S^{1/2})$ to $T = \Omega(n^{2.5}/S^{1/4})$. We do this using a more sophisticated embedding of the $k$-fold direct product of OR functions into an arbitrary subset of $k$ outputs of Boolean matrix multiplication.
The embedding hinges on the number of colors needed for a certain kind of coloring of subsets E of the n × n grid.The exponents of n and S in our lower bound are optimal for the general quantum circuit model to which it applies.
Our lower bounds also lead to improving the classical lower bound tradeoff of $T = \Omega(n^3/S)$ for circuits shown in [KŠdW07] to $T = \Omega(n^3/S^{1/2})$. (In these bounds, $T$ is circuit depth and $S$ is circuit width.) Just as with our quantum lower bound, this has optimal exponents for $n$ and $S$, achieving the goal of Klauck, Špalek, and de Wolf [KŠdW07] who suggested that $T^2 S = \Omega(n^6)$ was a likely tight tradeoff for classical computation of Boolean matrix multiplication. It is strictly larger almost everywhere than a classical lower bound of $T = \Omega(n^3/S)$ for $S \le n^{0.5}$ and $T = \Omega(n^{3.5}/S)$ for $S \ge n$ for Boolean matrix multiplication on branching programs (a more general model than circuits) due to Abrahamson [Abr90] that is tight almost surely for input matrices whose entries are 1 with probability $1/\sqrt{n}$ independently. Finally, we make a small adjustment to convert the Boolean matrix-vector lower bounds and lower bounds for systems of inequalities given in [KŠdW07] and [AŠdW09], respectively, so that the problems that are shown hard for space $S$ do not depend on $S$.

Preliminaries
We start with some basic facts and definitions. We define the binary entropy function $H_2(\alpha) = -\alpha \log_2 \alpha - (1-\alpha) \log_2 (1-\alpha)$.
Proposition 2.1 (Shannon). The number of subsets of $[k]$ of size at most $\alpha k$ is at most $2^{H_2(\alpha) k}$.
Definition 2.2. An $m \times n$ matrix is $(g, h, c)$-rigid iff every $k \times w$ submatrix where $k \le g$ and $w \ge n - h$ has rank at least $ck$. We call $(g, h, 1)$-rigid matrices $(g, h)$-rigid.
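As a quick sanity check of Proposition 2.1, the following sketch compares the exact number of small subsets with the entropy bound. It assumes the standard definition of $H_2$ and restricts attention to $\alpha \le 1/2$, the regime in which this counting bound is standard; the function names are illustrative and not from the paper.

```python
from math import comb, log2


def binary_entropy(alpha: float) -> float:
    """Binary entropy H_2(alpha) in bits, with H_2(0) = H_2(1) = 0."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)


def count_small_subsets(k: int, alpha: float) -> int:
    """Exact number of subsets of [k] of size at most alpha * k."""
    return sum(comb(k, size) for size in range(int(alpha * k) + 1))


# Compare the exact count with 2^{H_2(alpha) k} for a few parameters
# (alpha <= 1/2 only; this range is an assumption of the sketch).
for k in (10, 50, 200):
    for alpha in (0.1, 0.25, 0.5):
        assert count_small_subsets(k, alpha) <= 2 ** (binary_entropy(alpha) * k)
```

The bound is used later only through the number of index sets of size at most $\alpha k'$, so the check above covers the way it is applied.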
Matrix rigidity is a robust notion of rank and is an important property for proving time-space and cumulative complexity lower bounds for linear algebra.Fortunately, Yesha gives an explicit example of such a matrix and Abrahamson proved that there are many rigid square matrices.

Quantum circuit models
Throughout this paper, we consider quantum circuits that seek to compute target functions $f : D^n \to R^m$. Let $d = |D|$ and assume the existence of a bijective map $\nu : D \to \{0, \ldots, d-1\}$ that gives us an ordering on the elements of $D$.
Unitary quantum circuits
A $T$-query quantum circuit is specified using unitaries $U_0, \ldots, U_T$ that are independent of the input to the problem. These unitaries define a sequence of quantum states $|\psi^X_1\rangle_C, \ldots, |\psi^X_T\rangle_C$ that the algorithm enters during its execution on input $X$. We think of each state $|\psi^X_t\rangle_C$ as a linear combination of basis vectors $|i, p, w\rangle$ where $i \in [\lceil \log_2 n \rceil]$, $p \in [d]$, and $w \in \{0,1\}^*$. With this decomposition we can define a query operator for input $X = x_1, \ldots, x_n$ as a unitary $O_X$ that performs the following operation:
$$O_X\, |i, p, w\rangle = \omega_d^{\,p\,\nu(x_i)}\, |i, p, w\rangle,$$
where $\omega_d$ is a $d$-th root of unity. Thus we can think of basis state $|i, p, w\rangle$ as being composed of the index, phase, and work registers respectively. The state $|\psi^X_t\rangle_C$ of the circuit after $t$ queries to the input $X$ is given by:
$$|\psi^X_t\rangle_C = U_t\, O_X\, U_{t-1} \cdots O_X\, U_0\, |0\rangle_C.$$
The output of the quantum circuit on input $X$ is determined by taking $|\psi^X_T\rangle_C$ and measuring the work register in the standard basis and then applying some input-independent post-processing function $q$ to interpret the result as output $\tau \in R^J$ where $J \subseteq [m]$.
Oracle State
Similar to [Amb02, Zha19, HM21], instead of hard-coding the input $X$ into oracle $O_X$, we define a general oracle operator $O$ that interacts with input registers that start in state $|\psi_0\rangle_O$. Given a distribution $\mathcal{D}$ over $D^n$, we can make $|\psi_0\rangle_O = \sum_{X \in D^n} \sqrt{\Pr_{X' \sim \mathcal{D}}[X = X']}\; |X\rangle$ to represent an input sampled from $\mathcal{D}$. We define our oracle operator $O$ as follows:
$$O\, |i, p, w\rangle |X\rangle = \omega_d^{\,p\,\nu(x_i)}\, |i, p, w\rangle |X\rangle.$$
We can extend the unitaries $U_0, \ldots, U_T$ from our definition of unitary quantum circuits to act as the identity on the input registers. After doing so, the joint state of the input and quantum circuit at the end of the computation is given by:
$$|\psi_T\rangle = U_T\, O\, U_{T-1} \cdots O\, U_0 \left(|0\rangle_C \otimes |\psi_0\rangle_O\right).$$
Again, the work register of $|\psi_T\rangle$ is measured and a post-processing function $q$ is used to determine a partial assignment $\tau$ of outputs. The correctness of these outputs is then determined by measuring the input registers in the standard basis to obtain the input $X$ and evaluating whether $\tau$ is consistent with $f(X)$, which we denote by writing $\tau \,\|\, f(X)$. In general we can define the projector $\Pi_k$ such that:
$$\Pi_k = \sum_{\substack{i,p,w \\ |q(w)| \ge k}} |i,p,w\rangle\langle i,p,w| \otimes \sum_{\substack{X \in D^n \\ q(w) \,\|\, f(X)}} |X\rangle\langle X|.$$
The probability that the circuit produces a correct partial assignment of at least $k$ outputs is given by $\|\Pi_k |\psi_T\rangle\|^2$. For a given partial assignment $q(w)$ to some outputs, we can define $\Pi_{q(w)}$ to be the projection onto the values of $|X\rangle$ where $q(w) \,\|\, f(X)$. More specifically we have that:
$$\Pi_{q(w)} = \sum_{\substack{X \in D^n \\ q(w) \,\|\, f(X)}} |X\rangle\langle X|. \qquad (1)$$
By construction, when $q$ always produces a partial assignment of at least $k$ elements we have:
$$\Pi_k = \sum_{i,p,w} |i,p,w\rangle\langle i,p,w| \otimes \Pi_{q(w)}.$$
Space Bounded Quantum Computation
Without loss of generality, we think of quantum circuits as starting in the all $|0\rangle$ state and cycling between applying input queries $O$, arbitrary input-independent computation $U_t$, and intermediate measurements as in Figure 1. Adopting the notation of [BK23], we will consider the set of consecutive $O$, $U_t$, and measurement gates as layer $L_t$. The space of layer $L_t$ is the number of qubits that are passed from layer $L_t$ to $L_{t+1}$ and is denoted $S_t$. We define the space of a circuit as the maximum space of any layer, the time as the total number of layers, and the cumulative memory as the sum over all the $S_t$. Intermediate measurements enable circuits to produce parts of their output early and discard unnecessary ancillary qubits. Some prior quantum time-space tradeoff lower bounds required the quantum circuit to declare which outputs are produced at each layer (e.g. sorting, Boolean matrix multiplication, and systems of linear inequalities [KŠdW07, AŠdW09]); however the recent collision-finding bounds in [HM21, HLS22] extend the output model for quantum circuits to include indicator qubits specifying which (if any) outputs are being produced at each layer. This allows them to prove lower bounds against quantum algorithms that dynamically decide when they want to produce outputs based on their observed inputs. While our Boolean matrix bounds build on those in [KŠdW07] and thus require a fixed time for each output bit, our linear algebra bounds work with this dynamic output model.
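The three resource measures defined for space-bounded circuits (space, time, and cumulative memory) can be illustrated with a small sketch. The list-of-integers representation of a circuit is a hypothetical simplification introduced only for this example.

```python
def circuit_resources(layer_space):
    """Return (time, space, cumulative memory) for a layered circuit.

    layer_space[t] is S_t, the number of qubits passed from layer t to the
    next layer (an illustrative encoding, not the paper's formalism).
    """
    time = len(layer_space)        # total number of layers
    space = max(layer_space)       # maximum space of any layer
    cumulative = sum(layer_space)  # sum of the S_t over all layers
    return time, space, cumulative


time, space, cumulative = circuit_resources([3, 5, 2, 5])
# Cumulative memory never exceeds time * space, which is why a lower bound
# on cumulative memory is the stronger statement.
assert cumulative <= time * space
```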
The time-space tradeoffs we prove in this paper will follow the Borodin-Cook method, and thus rely on dividing a quantum circuit into blocks that each are unlikely to produce many correct outputs. We use the unitary quantum circuits model to prove that these blocks cannot produce many outputs and then apply the results to our space bounded model using the deferred measurement principle. After the first block, a quantum circuit will have some input-dependent state that can help it produce more outputs. Fortunately, a result by Aaronson lets us bound how much this initial state can amplify the success probability.

Proposition 2.5 ([Aar05]). Let $C$ be a quantum circuit, $\rho$ be an $S$-qubit (possibly mixed) state, and $\pi_{\mathrm{mix}}$ be the $S$-qubit maximally mixed state. If $C$ starting in initial state $\rho$ produces some output $z$ with probability $p$, then $C$ starting in state $\pi_{\mathrm{mix}}$ will produce $z$ with probability at least $p/2^{2S}$.
We will implicitly use this proposition to limit the power of the initial quantum state in the following way: Let $p$ be an upper bound on the success probability of a quantum circuit without any qubits of input-dependent initial state. Assume that there existed a quantum circuit with $S$ qubits of input-dependent advice that could succeed with probability $q$. Then by Proposition 2.5 there is a quantum circuit without input-dependent initial state that succeeds with probability at least $q/2^{2S}$. Since $p$ upper bounds the success probability of every such circuit, we know that $q/2^{2S} \le p$ and thus $q \le p\, 2^{2S}$. Therefore any quantum circuit with $S$ qubits of initial state can succeed with probability at most $p\, 2^{2S}$.

The recording query technique and quantum lower bounds
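Here we review the methods developed in [Zha19, HM21] that allow us to analyze what a quantum circuit learns about its input by making quantum queries. We will assume that the input state starts in the equal superposition state over all inputs, although [Zha19, HM21] generalize this method to other input distributions. We can exchange the general query operator $O$ with a recording query operator $R$ that we define as follows:
Definition 2.6 (adapted from [HM21]). Let $S_1$ be the unitary operator that maps
$$|\bot\rangle \mapsto \frac{1}{\sqrt{d}} \sum_{y \in D} |y\rangle, \qquad \frac{1}{\sqrt{d}} \sum_{y \in D} |y\rangle \mapsto |\bot\rangle, \qquad \frac{1}{\sqrt{d}} \sum_{y \in D} \omega_d^{\,p\,\nu(y)} |y\rangle \mapsto \frac{1}{\sqrt{d}} \sum_{y \in D} \omega_d^{\,p\,\nu(y)} |y\rangle \ \text{ for } p \in \{1, \ldots, d-1\}.$$
Let $S = (I)_{i,p,w} \otimes (S_1^{\otimes n})_{x_1, \ldots, x_n}$ and $O$ be the standard oracle operator that maps the basis state $|i, p, w, x_1, \ldots, x_n\rangle \mapsto \omega_d^{\,p\,\nu(x_i)} |i, p, w, x_1, \ldots, x_n\rangle$. Then the recording query oracle operator $R$ is defined as $SOS$.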
$S_1$ introduces $\bot$ as a new value for the input registers. Intuitively, the $\bot$ symbol indicates that the algorithm does not know anything about that register of the oracle. Hence by adding and correctly manipulating the $\bot$ symbols in the oracle's registers, it is able to record what the algorithm knows about the input. Since $S^2 = I$, we can exactly characterize how the states of quantum circuits with oracles $O$ and $R$ relate to one another.
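Proposition 2.7 (Theorem 3.3 in [HM21]). Let $C$ be a quantum circuit that for each $j \le t$ applies unitary $U_j$ after the $j$-th query. Let $S$ be the unitary operation and $R$ be the recording query oracle from Definition 2.6. Let
$$|\psi_t\rangle = U_t\, O\, U_{t-1} \cdots U_1\, O\, U_0 \Big(|0\rangle_C \otimes \frac{1}{\sqrt{d^n}} \sum_{X \in D^n} |X\rangle\Big) \quad \text{and} \quad |\phi_t\rangle = U_t\, R\, U_{t-1} \cdots U_1\, R\, U_0 \big(|0\rangle_C \otimes |\bot\rangle^{\otimes n}\big)$$
be the states of $C$ with oracle $O$ or $R$ respectively. Then $|\psi_t\rangle = S |\phi_t\rangle$.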
In other words, it is impossible to distinguish the final state $|\psi_T\rangle$ of a circuit with standard oracle $O$ from the output with recording oracle $R$ if we apply $S$ to the registers of $R$ after the final query. Thus we can conclude that the success probability of a quantum circuit with $T$ queries is given by $\|\Pi_{\mathrm{succ}} |\psi_T\rangle\|^2 = \|\Pi_{\mathrm{succ}}\, S |\phi_T\rangle\|^2$. Note that while $|\phi_T\rangle$ may have inputs in the $\bot$ state, Proposition 2.7 tells us that $S |\phi_T\rangle$ will never have an input in the $\bot$ state. This means that when considering recording query oracles, it is safe to keep our current definitions of $\Pi_{\mathrm{succ}}$ and $\Pi_{q(w)}$ which will always project out any basis state where an input is assigned to $\bot$. We will leverage the following property of $|\phi_T\rangle$ to bound the success probability of quantum circuits with at most $T$ queries.
Proposition 2.8 (Fact 3.2 in [HM21]). The state $|\phi_t\rangle$ from Proposition 2.7 is a linear combination of basis states $|i, p, w, x_1, \ldots, x_n\rangle$ where at most $t$ of the $x_i$ are different from $\bot$.
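For the bounds in [HM21] it is essential to bound how the state of $|\phi\rangle_O$ can change after each query. For our use of the recording query technique, this detailed analysis is not necessary. Nevertheless, we state the following proposition here for completeness.
Proposition 2.9 (Lemma 4.2 in [HM21] fixed). Let $d = |D|$. If the recording query operator $R$ is applied to a basis state $|i, p, w, x_1, \ldots, x_n\rangle$ where $p \neq 0$ then the register $|x_i\rangle_X$ is mapped to
$$\begin{cases} \displaystyle\sum_{y \in D} \frac{\omega_d^{\,p\,\nu(y)}}{\sqrt{d}}\, |y\rangle & \text{if } x_i = \bot, \\[2ex] \displaystyle\frac{\omega_d^{\,p\,\nu(x_i)}}{\sqrt{d}}\, |\bot\rangle + \frac{1 + \omega_d^{\,p\,\nu(x_i)} (d-2)}{d}\, |x_i\rangle + \sum_{y \in D,\, y \neq x_i} \frac{1 - \omega_d^{\,p\,\nu(y)} - \omega_d^{\,p\,\nu(x_i)}}{d}\, |y\rangle & \text{otherwise.} \end{cases} \qquad (2)$$
If $p = 0$ then none of the registers is changed.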
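The action of the recording query operator on a single input register can be explored numerically. The sketch below is a minimal illustration under stated assumptions: it takes $S_1$ to be the operator that swaps $|\bot\rangle$ with the uniform superposition over $D$ and fixes the orthogonal complement (the form used in [HM21]-style recording oracles), identifies $D$ with $\{0, \ldots, d-1\}$ so that $\nu$ is the identity, and lets the phase query act trivially on $|\bot\rangle$. All function names are illustrative.

```python
import cmath
from math import sqrt


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def recording_ops(d, p):
    """Return (S1, O_p, R_p) on span{|0>, ..., |d-1>, |bot>}; index d is |bot>."""
    n = d + 1
    plus = [1 / sqrt(d)] * d + [0.0]  # uniform superposition over D
    bot = [0.0] * d + [1.0]           # the "nothing known" symbol
    # Assumed S1: swap |bot> and |plus>, identity on the orthogonal complement.
    S1 = [[(1.0 if r == c else 0.0)
           - bot[r] * bot[c] - plus[r] * plus[c]
           + bot[r] * plus[c] + plus[r] * bot[c]
           for c in range(n)] for r in range(n)]
    omega = cmath.exp(2j * cmath.pi / d)
    # Phase query for phase register value p; assumed to fix |bot>.
    O_p = [[(omega ** (p * r) if r < d else 1.0) if r == c else 0.0
            for c in range(n)] for r in range(n)]
    return S1, O_p, matmul(S1, matmul(O_p, S1))


def close(A, B, tol=1e-9):
    """Entrywise comparison of two matrices."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


d, p = 3, 1
S1, O_p, R_p = recording_ops(d, p)
omega = cmath.exp(2j * cmath.pi / d)

# S1 is an involution, so S^2 = I on the full register.
assert close(matmul(S1, S1), identity(d + 1))

# A query on an unknown register (|bot>) records a value: the image has no
# |bot> component and amplitudes omega^{p y} / sqrt(d).
image_of_bot = [R_p[r][d] for r in range(d + 1)]
expected = [omega ** (p * y) / sqrt(d) for y in range(d)] + [0.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(image_of_bot, expected))
```

Under these assumptions a query with $p = 0$ leaves the register unchanged, and each query touches only the register selected by the index, which is the mechanism behind the at-most-$t$-non-$\bot$ property of Proposition 2.8.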

Quantum matrix vector products
In this section, we consider the task of computing, for a fixed matrix $A \in F^{m \times n}$, the function $f(x) = Ax$ for inputs $x \in D^n$ using a quantum circuit. We note that this is a fundamentally harder task than is considered in many quantum machine learning papers (for example [HHL09]) as we require the circuit to output a classical vector $y \in F^m$ rather than either a quantum state encoding the entries of $y$ in the amplitudes or an estimate of $y^\dagger M y$. Also unlike many prior quantum time-space tradeoffs, including sorting [KŠdW07, HM21, BK23] and Boolean matrix multiplication [KŠdW07] (and our Theorem 5.5), our matrix-vector product and matrix multiplication lower bounds apply to circuits that can adaptively decide when to produce each output based on the observed inputs. Time-space lower bounds against such quantum circuits were first described in [HM21] for the multiple disjoint collisions problem, although they were not able to show such a result for sorting.
Similar to [HM21] we are able to lower bound these circuits by identifying a single hard distribution over the inputs that applies to any set of outputs.

Success probability of small depth quantum circuits
We prove the following key lemma, which lets us bound the number of correct outputs produced by a shallow quantum circuit.
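Lemma 3.1. Let $A$ be any $(k, h, c)$-rigid $m \times n$ matrix over a finite field $F$ and let $f : D^n \to F^m$ for $D \subseteq F$ with $d = |D|$ be defined by $f(x) = Ax$. Then for $\alpha > 0$ and for input $x$ sampled uniformly from $D^n$ and any quantum circuit $C$ with at most $\alpha h$ queries to $x$, the probability that $C$ produces $k$ correct output values of $f(x)$ is at most $\lceil h/(ck) \rceil \left(2^{H_2(\alpha)}/d^{\,1-\alpha}\right)^{ck}$.
Note: For $\alpha \le 0.1717$ we have $1 - \alpha - H_2(\alpha) > 1/6$ and hence the bound is at most $\lceil h/(ck) \rceil\, |D|^{-ck/6}$ for $d \ge 2$.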
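The constant in the note can be checked numerically. The sketch below evaluates $1 - \alpha - H_2(\alpha)$ at $\alpha = 0.1717$ and computes the simplified bound $\lceil h/(ck) \rceil\, d^{-ck/6}$; the function names and the sample parameters are illustrative choices, not values from the paper.

```python
from math import ceil, log2


def binary_entropy(alpha: float) -> float:
    """Binary entropy H_2(alpha) in bits."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * log2(alpha) - (1 - alpha) * log2(1 - alpha)


def exponent_margin(alpha: float) -> float:
    """The quantity 1 - alpha - H_2(alpha) that controls the decay rate."""
    return 1 - alpha - binary_entropy(alpha)


def simplified_bound(h: int, k: int, c: float, d: int) -> float:
    """The simplified bound ceil(h / (c k)) * d^(-c k / 6) from the note."""
    return ceil(h / (c * k)) * d ** (-(c * k) / 6)


# The margin at alpha = 0.1717 is just above 1/6, and the margin shrinks as
# alpha grows, so a noticeably larger alpha (e.g. 0.18) no longer works.
assert exponent_margin(0.1717) > 1 / 6
assert exponent_margin(0.18) < 1 / 6

# Illustrative parameters: the bound decays exponentially in c * k.
assert simplified_bound(h=100, k=60, c=1, d=2) < simplified_bound(h=100, k=30, c=1, d=2)
```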
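Proof. Let $d = |D|$. For simplicity we will assume that $q(w)$, the output as a function of the measured value of the work register, always produces $k$ outputs. Let $A$ be a $(k, h, c)$-rigid matrix. By Proposition 2.8, after $t \le \alpha h$ queries in the recording query oracle model, we can write the state as:
$$|\phi_t\rangle = \sum_{i,p,w}\ \sum_{\substack{I \subseteq [n] \\ |I| \le t}}\ \sum_{y \in D^I} \alpha_{i,p,w,I,y}\; |i,p,w\rangle\, |y\rangle_I\, |\bot\rangle_{[n] \setminus I}$$
for some $\alpha_{i,p,w,I,y}$ with $\sum_{i,p,w,I,y} |\alpha_{i,p,w,I,y}|^2 = 1$. Thus by Proposition 2.7, the final state of the algorithm in the non-recording query oracle setting is given by:
$$|\psi_t\rangle = S |\phi_t\rangle = \sum_{i,p,w} \beta_{i,p,w}\, |i,p,w\rangle \otimes S_1^{\otimes n} \Big[ \sum_{\substack{I \subseteq [n],\, |I| \le t \\ y \in D^I}} \beta^{i,p,w}_{I,y}\, |y\rangle_I\, |\bot\rangle_{[n] \setminus I} \Big],$$
where $\beta_{i,p,w} = \big(\sum_{I,y} |\alpha_{i,p,w,I,y}|^2\big)^{1/2}$ and $\beta^{i,p,w}_{I,y} = \alpha_{i,p,w,I,y}/\beta_{i,p,w}$, so that $\sum_{I,y} |\beta^{i,p,w}_{I,y}|^2 = 1$ for each $(i,p,w)$. The probability of producing $k$ correct outputs is therefore
$$\|\Pi_k |\psi_t\rangle\|^2 = \Big\| \sum_{i,p,w} \beta_{i,p,w}\, |i,p,w\rangle \otimes \Pi_{q(w)}\, S_1^{\otimes n} \Big[ \sum_{I,y} \beta^{i,p,w}_{I,y}\, |y\rangle_I\, |\bot\rangle_{[n] \setminus I} \Big] \Big\|^2,$$
where $\Pi_{q(w)}$ is defined as in Equation (1) and is the projection of $\Pi_k$ onto fixed values of $q(w)$. Since the basis states $|i,p,w\rangle$ are orthogonal and $\sum_{i,p,w} |\beta_{i,p,w}|^2 = 1$, we have
$$\|\Pi_k |\psi_t\rangle\|^2 \le \max_{i,p,w} \Big\| \Pi_{q(w)}\, S_1^{\otimes n} \sum_{I,y} \beta^{i,p,w}_{I,y}\, |y\rangle_I\, |\bot\rangle_{[n] \setminus I} \Big\|^2. \qquad (4)$$
We now fix $i, p, w$ and let $A_{q(w)}$ be the submatrix of $A$ restricted to the rows defined by the set $U$ of the $k$ output values associated with $q(w)$. We can describe $\Pi_{q(w)}$ as a projection onto basis states $|x_1, \ldots, x_n\rangle$ such that:
$$A_{q(w)}\, (x_1, \ldots, x_n)^{\top} = q(w).$$
Since the basis states $|y\rangle_I |\bot\rangle_{[n] \setminus I}$ for distinct $I$ are orthogonal in the recording query basis, they remain orthogonal in the standard basis after the $S$ operator is applied. However, the subsequent application of the $\Pi_{q(w)}$ projector makes these vectors no longer orthogonal.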
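To handle this, we bucket the sets $I \subseteq [n]$ with $|I| \le t$ into a small number of buckets, $B_1, B_2, \ldots$, so that for each bucket $B_\ell$ we can bound:
$$\Big\| \Pi_{q(w)}\, S_1^{\otimes n} \sum_{I \in B_\ell,\ y \in D^I} \beta^{i,p,w}_{I,y}\, |y\rangle_I\, |\bot\rangle_{[n] \setminus I} \Big\|.$$
In particular, our key observation is that if a bucket of recording query basis states completely misses querying a fixed set of input variables that could completely scramble the value of a set of $r$ output values, then one cannot do better than randomly guess those output values. More precisely, we show that the contribution to success from that bucket of basis states has amplitude at most $1/\sqrt{d^r}$.
Lemma 3.2. Let $U \subseteq [m]$ be a set of output indices and $V \subseteq [n]$ be a set of input indices with $|V| = |U| = r$ such that the submatrix $A_{U,V}$ is full rank. Fix $q \in F^U$ and define $\Pi_q$ to be the projection map onto the span of the set of basis states $|x_1, \ldots, x_n\rangle$ with $x_1 \ldots x_n \in D^n$ such that $A_U x = q$. Then for any collection $B$ of sets $I \subseteq [n] \setminus V$ and any quantum state $\sum_{I \in B,\, y \in D^I} \gamma_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I}$ we have
$$\Big\| \Pi_q\, S_1^{\otimes n} \sum_{I \in B,\, y \in D^I} \gamma_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} \Big\|^2 \le \frac{1}{d^r} \Big\| \sum_{I \in B,\, y \in D^I} \gamma_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} \Big\|^2.$$
Proof. By definition each $I \in B$ satisfies $I \cap V = \emptyset$, so every register in $V$ is in state $|\bot\rangle$ and is mapped by $S_1$ to the uniform superposition. Hence
$$S_1^{\otimes n} \sum_{I \in B,\, y \in D^I} \gamma_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} = \Big( \sum_{z \in (D \cup \{\bot\})^{[n] \setminus V}} \delta_z\, |z\rangle_{[n] \setminus V} \Big) \otimes \frac{1}{\sqrt{d^r}} \sum_{v \in D^V} |v\rangle_V$$
for some amplitudes $\delta_z$ with $\sum_z |\delta_z|^2 = \big\| \sum_{I \in B,\, y \in D^I} \gamma_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} \big\|^2$, since $S_1$ is unitary. For each value of $z \in D^{[n] \setminus V}$, since the sub-matrix $A_{U,V}$ is invertible, there is a unique value $v_z = A_{U,V}^{-1} \big( q - A_{U,[n] \setminus V}\, z \big)$ such that the input given by $z$ and $v_z$ satisfies $A_U x = q$. Therefore $\Pi_q$ retains at most one of the $d^r$ values of $v$ for each such $z$, and removes every $z$ containing $\bot$, so the squared norm after applying $\Pi_q$ is at most $\frac{1}{d^r} \sum_z |\delta_z|^2$, as claimed.
Next we decompose the set of all $I$ with $|I| \le t$ into buckets so that we can apply the above.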
Lemma 3.3. Let $A$ be a $(k, h, c)$-rigid matrix and let $k' = \lceil ck \rceil$. Then for every subset $U$ of $k$ rows of $A$, there is a collection of disjoint $k'$-subsets of columns from $[n]$, $V_1, \ldots, V_\ell$ for $\ell = \lceil h/k' \rceil \le \lceil h/(ck) \rceil$ and corresponding sets of rows $U_1, \ldots, U_\ell \subseteq U$ such that for each $j \in [\ell]$, the $k' \times k'$ submatrix $A_{U_j,V_j}$ is full rank. (In particular the union, $W$, of the sets $V_j$ has size at least $h$.) If $c = 1$ then all $U_j = U$.
Proof. Fix $U \subseteq [m]$ with $|U| = k$. The following procedure constructs such a collection, one set at a time. We maintain a subset $W$ of columns that is the union of the $V_j$ constructed so far. Suppose that $|W| < h$. Then, by the $(k, h, c)$-rigidity of $A$, the submatrix $A_{U,[n] \setminus W}$ has rank at least $k'$. Hence there is a $k' \times k'$ submatrix $A_{U_j,V_j}$ of $A_{U,[n] \setminus W}$ that has full rank $k'$. We now add $V_j$ to the collection of $k'$-sets of columns, record its corresponding row set $U_j$, and set $W \leftarrow W \cup V_j$. This produces exactly $\lceil h/k' \rceil$ subsets.
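The greedy procedure in this proof can be sketched in code. The example below is an illustration under stated assumptions: it works over a prime field GF($p$) with matrices given as lists of integer rows, uses Gaussian elimination to pick independent columns and then independent rows, and raises an error if the matrix is not rigid enough for the requested parameters. The helper names are hypothetical.

```python
def pivot_columns(rows, p, limit):
    """Indices of up to `limit` linearly independent columns over GF(p)."""
    M = [[x % p for x in row] for row in rows]
    pivots, r = [], 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        if r == len(M) or len(pivots) == limit:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pr = next((i for i in range(r, len(M)) if M[i][c]), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        inv = pow(M[r][c], -1, p)  # modular inverse (Python 3.8+)
        M[r] = [(x * inv) % p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return pivots


def disjoint_full_rank_blocks(A, U, k_prime, h, p):
    """Greedily build pairs (U_j, V_j) with A[U_j, V_j] full rank over GF(p).

    Column sets V_j are disjoint and are added until their union W has
    size at least h, mirroring the procedure in the proof.
    """
    n = len(A[0])
    W, blocks = set(), []
    while len(W) < h:
        free = [c for c in range(n) if c not in W]
        sub = [[A[u][c] for c in free] for u in U]
        cols = pivot_columns(sub, p, k_prime)
        if len(cols) < k_prime:
            raise ValueError("rank too small: matrix not rigid enough here")
        V = [free[c] for c in cols]
        # Independent rows of A[U, V] are independent columns of its transpose.
        sub_t = [[A[u][v] for u in U] for v in V]
        rows = [U[i] for i in pivot_columns(sub_t, p, k_prime)]
        blocks.append((rows, V))
        W.update(V)
    return blocks


# Toy example over GF(2): the 4 x 4 identity with U = all rows and k' = 2.
A = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
blocks = disjoint_full_rank_blocks(A, U=[0, 1, 2, 3], k_prime=2, h=4, p=2)
assert len(blocks) == 2  # ceil(h / k') column sets
```

The toy matrix only exercises the bookkeeping; a genuinely rigid matrix would allow the same row set to be reused against any remaining columns.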
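Fix the collection of sets $V_1, \ldots, V_\ell$ given by Lemma 3.3. Let $k'' = \lfloor \alpha k' \rfloor$ and, for each $j \in [\ell]$ and $\lambda \in \binom{[k']}{k''}$, let $V_j^\lambda$ be the set of the $k' - k''$ elements of $V_j$ whose indices within $V_j$ are not in $\lambda$. These sets have two useful properties: first, any subset of $[n]$ with size at most $\alpha h$ must miss some $V_j^\lambda$, and second, if the entries of $x$ corresponding to some $V_j^\lambda$ are uniformly random, then for any set of $k$ indices in $Ax$, at least $c(1 - \alpha)k$ of these values are also uniformly random.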
Lemma 3.4. For $t \le \alpha h$ and every $I \subseteq [n]$ with $|I| \le t$, there is some $j \le \lceil h/k' \rceil$ and $\lambda \in \binom{[k']}{k''}$ such that $I \subseteq [n] \setminus V_j^\lambda$.
Proof. Fix such a set $I$ with $|I| \le t$. Since $t \le \alpha h$, $\big|\bigcup_{j \in [\ell]} V_j\big| \ge h$, and the sets $V_j$ are disjoint, by averaging there is some set $V_j$ that has at most an $\alpha$ fraction of its elements in $I$. Hence $V_j$ has at most $k'' \le \alpha k'$ elements of $I$. Choose a set $\lambda \in \binom{[k']}{k''}$ that contains the indices within $V_j$ of all of the elements of $V_j \cap I$. Then by construction $I \cap V_j^\lambda = \emptyset$.
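The averaging argument in this proof translates directly into a bucket-assignment routine. The sketch below is illustrative: it assumes the column sets $V_j$ are given as equal-length lists of distinct column indices, takes $k'' = \lfloor \alpha k' \rfloor$, and pads $\lambda$ with arbitrary unused positions so that it has size exactly $k''$. The function name and return format are hypothetical.

```python
def assign_bucket(I, blocks_V, alpha):
    """Return (j, lam, V_lam) with I disjoint from V_lam.

    blocks_V: disjoint column lists V_j, each of size k'.
    lam: a k''-subset of positions within V_j covering V_j ∩ I.
    V_lam: the columns of V_j at positions outside lam.
    """
    I = set(I)
    k_prime = len(blocks_V[0])
    k2 = int(alpha * k_prime)  # k'' = floor(alpha * k')
    # By averaging, the least-hit block contains at most an alpha fraction of I.
    j = min(range(len(blocks_V)), key=lambda idx: len(I & set(blocks_V[idx])))
    hit = [pos for pos, col in enumerate(blocks_V[j]) if col in I]
    if len(hit) > k2:
        raise ValueError("I is too large for the averaging argument to apply")
    # Pad lam with unused positions so that it has exactly k'' elements.
    pad = [pos for pos in range(k_prime) if pos not in hit][: k2 - len(hit)]
    lam = tuple(sorted(hit + pad))
    V_lam = [col for pos, col in enumerate(blocks_V[j]) if pos not in lam]
    return j, lam, V_lam


# Two blocks of size k' = 4 covering h = 8 columns, alpha = 1/2, |I| <= 4.
j, lam, V_lam = assign_bucket({0, 1, 2, 4}, [[0, 1, 2, 3], [4, 5, 6, 7]], 0.5)
assert not set(V_lam) & {0, 1, 2, 4}  # the bucket's columns were never queried
```

Each pair $(j, \lambda)$ names one bucket, so the number of buckets is at most $\lceil h/k' \rceil \binom{k'}{k''}$.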
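By applying Lemma 3.4 we can associate each $I \subseteq [n]$ with $|I| \le t$ with a pair $(j, \lambda)$ such that $I \subseteq [n] \setminus V_j^\lambda$ and define bucket $B_j^\lambda$ to consist of all such sets $I$ associated with pair $(j, \lambda)$. Further, define a set $U_j^\lambda \subseteq U_j$ with $|U_j^\lambda| = k' - k''$ such that $A_{U_j^\lambda, V_j^\lambda}$ is full rank. Such a subset of rows must exist since $A_{U_j, V_j^\lambda}$ is a full rank matrix. Then let $q(w)|_{U_j^\lambda}$ be the portion of the assignment $q(w)$ on the rows of $U_j^\lambda$. We are now ready to provide an upper bound on the success probability from Equation (4). Writing $|\phi_j^\lambda\rangle = \sum_{I \in B_j^\lambda,\, y \in D^I} \beta^{i,p,w}_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I}$ for the part of the state in bucket $B_j^\lambda$, the triangle inequality and the Cauchy-Schwarz inequality over the at most $\lceil h/k' \rceil \binom{k'}{k''}$ buckets give
$$\Big\| \Pi_{q(w)}\, S_1^{\otimes n} \sum_{I,y} \beta^{i,p,w}_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} \Big\|^2 \le \Big( \sum_{j,\lambda} \big\| \Pi_{q(w)}\, S_1^{\otimes n} |\phi_j^\lambda\rangle \big\| \Big)^2 \le \lceil h/k' \rceil \binom{k'}{k''} \sum_{j,\lambda} \big\| \Pi_{q(w)}\, S_1^{\otimes n} |\phi_j^\lambda\rangle \big\|^2. \qquad (5)$$
Applying Lemma 3.2 with $r = k' - k''$, $U = U_j^\lambda$, $V = V_j^\lambda$, $q = q(w)|_{U_j^\lambda}$, and $B = B_j^\lambda$, we have that
$$\big\| \Pi_{q(w)}\, S_1^{\otimes n} |\phi_j^\lambda\rangle \big\|^2 \le \big\| \Pi_{q(w)|_{U_j^\lambda}}\, S_1^{\otimes n} |\phi_j^\lambda\rangle \big\|^2 \le \frac{1}{d^{\,k' - k''}}\, \big\| |\phi_j^\lambda\rangle \big\|^2,$$
and hence, using Equation (5), the fact that $\sum_{j,\lambda} \| |\phi_j^\lambda\rangle \|^2 = 1$, Proposition 2.1, and $k'' \le \alpha k'$, we obtain that
$$\Big\| \Pi_{q(w)}\, S_1^{\otimes n} \sum_{I,y} \beta^{i,p,w}_{I,y}\, |y\rangle_I |\bot\rangle_{[n] \setminus I} \Big\|^2 \le \lceil h/k' \rceil \binom{k'}{k''}\, d^{-(k' - k'')} \le \lceil h/k' \rceil \left( \frac{2^{H_2(\alpha)}}{d^{\,1-\alpha}} \right)^{k'}.$$
Without loss of generality in our desired bound we can assume that $2^{H_2(\alpha)}/d^{\,(1-\alpha)} < 1$. Therefore the bound still applies when we replace $k'$ by the potentially smaller $ck$, which is what we needed to show.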

Matrix-vector product time-space tradeoffs and related lower bounds
Theorem 3.5. Let $m$ be $n^{O(1)}$. Let $A$ be an $m \times n$ matrix over a field $F$ that is $(g(m), h(n), c)$-rigid for $c \in (0, 1/2]$. Then any quantum circuit using time $T$ and space $S$ that computes a function $f : D^n \to F^m$ for $D \subseteq F$ with $d = |D|$ given by $f(x) = Ax$ with success probability larger than $2^{-S}$ requires that $T$ is $\Omega(g(m)\, h(n) \log(d)/S)$; more precisely, $T$ must be $\Omega(\min\{g(m)\, n \log d,\ m\, h(n) \log d\}/S)$.
Proof. First observe that since $S \ge \log_2 n$ and $T \ge n$ we know that $T \cdot S$ is $\Omega(n \log n)$ which is $\Omega(g(m)\, n \log |D|)$ if $g(m) < (12/c) \log_d n$. Therefore we can assume without loss of generality that $g(m) \ge (12/c) \log_d n$.
Let $C$ be a quantum circuit with $T$ queries and space $S$, write $h = h(n)$, $g = g(m)$, and let $\alpha = 0.1717$. We partition $C$ into $\lceil T/(\alpha h) \rceil$ sub-circuits that each have at most $\alpha h$ queries. By combining Proposition 2.5 and Lemma 3.1, we know that each sub-circuit can produce $k \le g$ correct outputs with probability at most $2^{2S} \lceil h/(ck) \rceil\, d^{-ck/6} \le h\, 2^{2S} d^{-ck/6}$. Now suppose that $h\, 2^{2S} d^{-cg/6} > 2^{-S}/T$. Then $T\, 2^{3S} > d^{\,cg/6}/h \ge d^{\,cg/6}/n \ge d^{\,cg/12}$ by the assumption on $g$. Since $S \ge \log_2 n$ and $T$ is at most polynomial in $n$ (or the bound applies already), $T\, 2^{3S}$ is at most $2^{c' S}$ for some constant $c' > 0$. This implies that $S$ is $\Omega(g(m) \log d)$ and since $T \ge n$, we get that $T \cdot S$ is $\Omega(g(m)\, n \log |D|)$ as claimed.
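Otherwise set $k \le g$ to be the smallest integer such that $h\, 2^{2S} d^{-ck/6} \le 2^{-S}/T$. Then the probability that a sub-circuit produces $k$ correct outputs is at most $2^{-S}/T$. This gives $k = \lceil 6(3S + \log_2(hT))/(c \log_2 d)\rceil$, so $k - 1 \le c^* S/\log_2 d$ for some constant $c^* > 0$, since $S \ge \log_2 n$ and both $h$ and $T$ are at most polynomial in $n$. Taking a union bound over the sub-circuits, the probability that any of them produces $k$ correct outputs is at most $2^{-S}$. Since $f$ has $m$ outputs, this means that $\lceil T/(\alpha h)\rceil (k - 1) \ge m$. Plugging in our upper bound on $k$ and using $\lceil T/(\alpha h)\rceil \le 2T/(\alpha h)$, we have that $2c^* TS/\log_2 d \ge \alpha m h$ and hence $T \cdot S$ is $\Omega(mh \log d)$ which is $\Omega(m\, h(n) \log |D|)$ as claimed.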
Following the same arguments as for classical computation [Abr91], we obtain a collection of time-space lower bounds for problems that are closely related to matrix-vector products. Our proofs are identical to their classical counterparts proven in [Abr91, Sections 5-6] and are duplicated here for completeness.
Corollary 3.6. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit that computes the discrete Fourier transform (DFT) of vectors in $D^n$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ to be $\Omega(n^2 \log(d)/S)$.
Proof. Applying Theorem 3.5 with the rigidity of the DFT from Proposition 2.3 directly gives us the lower bound.
Corollary 3.8. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Computing the convolution of two vectors in $D^n$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ to be $\Omega(n^2 \log(d)/S)$.
Proof. For simplicity assume that $n$ is even. Let $U$ be the matrix with blocks $A$, $B$, $C$ and $D$, each an $n/2 \times n/2$ submatrix, such that $Uv$ is the convolution between vectors $u$ and $v$. Observe that $U$ is a Toeplitz matrix and by picking $u$ to be a uniform vector over $D$, Proposition 3.7 tells us that for sufficiently large $n$, there is a constant $\gamma \in (0, 1/2)$ such that both $A$ and $B$ are $(\gamma n, \gamma n/2)$-rigid with probability at least $1/2$. This lets us restrict our input to such choices for $u$ and observe that the matrix $U' = [A\ B]$ is $(\gamma n, \gamma n/2)$-rigid, so Theorem 3.5 gives us that computing $U'$ requires $T$ that is $\Omega(n^2 \log(d)/S)$. Since $U'$ is a subfunction of $U$, convolution also requires $T$ that is $\Omega(n^2 \log(d)/S)$.
Corollary 3.9. A quantum circuit that multiplies two $n$ bit binary numbers in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ to be $\Omega(n^2/(S \log^2 n))$.
Proof. Let $u, v$ be arbitrary vectors over $\mathbb{F}_2$. Define the binary number $u'$ by writing each entry of $u$ in its own block of $\lceil \log_2 n\rceil$ bits, padded with zeros, and similarly define $v'$. Then observe that the product $u' \cdot v'$ contains all entries of the convolution between $u$ and $v$ encoded in blocks of $\lceil \log_2 n\rceil$ bits each. By Corollary 3.8 this requires $T$ to be $\Omega(n^2/(S \log^2 n))$.
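The packing step behind this reduction can be illustrated with a short, purely classical Python sketch. The function names are hypothetical, the convolution is taken over the integers rather than over $\mathbb{F}_2$, and the block width `len(u).bit_length()` is an assumption chosen here so that no block can overflow into its neighbor.

```python
def pack(bits, width):
    """Place bits[i] at bit position i * width of a single integer."""
    return sum(b << (i * width) for i, b in enumerate(bits))


def convolution_via_product(u, v):
    """Read the integer convolution of two 0/1 vectors off one integer product."""
    n = len(u)
    width = n.bit_length()  # assumption: wide enough for any entry value <= n
    product = pack(u, width) * pack(v, width)
    mask = (1 << width) - 1
    return [(product >> (t * width)) & mask for t in range(2 * n - 1)]


def convolution_direct(u, v):
    """Reference convolution computed entry by entry."""
    n = len(u)
    out = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            out[i + j] += u[i] * v[j]
    return out


u = [1, 0, 1, 1]
v = [1, 1, 0, 1]
print(convolution_via_product(u, v) == convolution_direct(u, v))
```

Each convolution entry is at most $n$, so it fits in its block without carries; the entry over $\mathbb{F}_2$ is the low-order bit of the corresponding block.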
Where ⊗ is the standard tensor (Kronecker) product.
Corollary 3.12. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit that computes the product $ABC$ on inputs $A, B, C \in D^{n \times n}$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ that is $\Omega(n^4 \log(d)/S)$.
Proof. We use Proposition 3.10 to view this as a matrix-vector product problem where $B$ is the input and $Y$ is the output. By Proposition 2.4 there is a constant $\gamma \in (0, 1/2)$ such that both $A$ and $C$ are $(\gamma n, \gamma n)$-rigid with constant probability, so we can assume such without increasing the expected cost by more than a constant factor. Then Proposition 3.11 gives us that $A \otimes C$ is $(\gamma^2 n^2, \gamma^2 n^2, \gamma^2)$-rigid and we can apply Theorem 3.5 to get that $T$ must be $\Omega(n^4 \log(d)/S)$ as desired.
Corollary 3.13. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit that computes $A^3$ on inputs in $D^{n \times n}$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ that is $\Omega(n^4 \log(d)/S)$.
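Proof. Let $A, B, C \in D^{n \times n}$. Then construct the $4n \times 4n$ matrix:
\[ M = \begin{pmatrix} 0 & A & 0 & 0 \\ 0 & 0 & B & 0 \\ 0 & 0 & 0 & C \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]
Observe that the top right $n \times n$ sub-matrix of $M^3$ is equal to the product $ABC$. Thus we get a reduction to matrix-matrix-matrix product and can apply Corollary 3.12 to get our lower bound.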
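The cubing reduction can be checked on a small instance. The following Python sketch assumes that $M$ is the $4n \times 4n$ block matrix with $A$, $B$, $C$ on the block superdiagonal and zeros elsewhere; the helper names are hypothetical and the entries are small integers chosen only for illustration.

```python
def matmul(X, Y):
    """Plain integer matrix product of two list-of-lists matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def block_superdiagonal(blocks):
    """Place blocks[b] in block position (b, b+1) of an otherwise zero matrix."""
    n = len(blocks[0])
    size = (len(blocks) + 1) * n
    M = [[0] * size for _ in range(size)]
    for b, block in enumerate(blocks):
        for i in range(n):
            for j in range(n):
                M[b * n + i][(b + 1) * n + j] = block[i][j]
    return M


def top_right(M, n):
    """Return the top right n x n block of M."""
    return [row[-n:] for row in M[:n]]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
M = block_superdiagonal([A, B, C])
M_cubed = matmul(matmul(M, M), M)
ABC = matmul(matmul(A, B), C)
print(top_right(M_cubed, 2) == ABC)
```

Calling `block_superdiagonal([A, B])` instead gives a $3n \times 3n$ upper triangular matrix whose square has $AB$ as its top right block.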
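Corollary 3.14. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit that computes $A^{-1}$ on inputs in $D^{n \times n}$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ that is $\Omega(n^4 \log(d)/S)$.
Proof. Let $A, B, C \in D^{n \times n}$. Then construct the $4n \times 4n$ matrix:
\[ M = \begin{pmatrix} I & -A & 0 & 0 \\ 0 & I & -B & 0 \\ 0 & 0 & I & -C \\ 0 & 0 & 0 & I \end{pmatrix}, \]
where $I$ is the $n \times n$ identity submatrix. Then observe that $M^{-1}$ has the product $ABC$ as its top right $n \times n$ submatrix. We can again use Theorem 3.5 to get our lower bound.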
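A small numerical check of the inversion reduction follows. It is a sketch under an explicit assumption: $M$ is taken to be $I - N$, where $N$ has $A$, $B$, $C$ on the block superdiagonal, so that $N^4 = 0$ and $M^{-1} = I + N + N^2 + N^3$. The helper names are hypothetical.

```python
def matmul(X, Y):
    """Plain integer matrix product of two list-of-lists matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def identity(size):
    """Return the size x size identity matrix."""
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]


def inversion_instance(A, B, C):
    """Build M = I - N with -A, -B, -C on the block superdiagonal (assumed form)."""
    n = len(A)
    M = identity(4 * n)
    for b, block in enumerate((A, B, C)):
        for i in range(n):
            for j in range(n):
                M[b * n + i][(b + 1) * n + j] = -block[i][j]
    return M


def inverse_of_unit_upper(M, depth):
    """Invert M = I - N, assuming N**depth == 0, as I + N + ... + N**(depth-1)."""
    size = len(M)
    eye = identity(size)
    N = [[eye[i][j] - M[i][j] for j in range(size)] for i in range(size)]
    total, power = eye, eye
    for _ in range(depth - 1):
        power = matmul(power, N)
        total = [[total[i][j] + power[i][j] for j in range(size)]
                 for i in range(size)]
    return total


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
M = inversion_instance(A, B, C)
M_inv = inverse_of_unit_upper(M, 4)
ABC = matmul(matmul(A, B), C)
print([row[-2:] for row in M_inv[:2]] == ABC)
```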
Corollary 3.15. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit that solves any $n \times n$ system of linear equations over $D$ in time $T$ and space $S$ with probability at least $2^{-S}$ requires $T$ that is $\Omega(n^3 \log(d)/S)$.
Proof. It is possible to invert a matrix by solving $n$ systems of $n$ linear equations. By a reduction, Corollary 3.14 gives us that solving these systems requires $T$ that is $\Omega(n^4 \log(d)/S)$. Thus at least one of these systems must require $T$ that is $\Omega(n^3 \log(d)/S)$ to solve.
In [BK23] the authors showed that the kinds of quantum time-space product lower bounds we proved in this section can be extended to asymptotically equivalent lower bounds on the stronger notion of cumulative memory complexity. We restate a simplified version of their main theorem for quantum and classical circuits here.

Proposition 3.16 ([BK23]). Let $f : D^n \to R^m$ be a function such that there exist a constant $C$, functions $m'(n)$, $h(k, n)$, $K(n)$, and a distribution $\mu$ over $D^n$ such that when $x \sim \mu$, for any $k \le m'(n)$, any quantum (or classical) circuit with at most $h(k, n)$ queries to $x$ produces $k$ correct outputs of $f(x)$ with probability at most $C \cdot K(n)^{-k}$. Then for any constant $c > 0$, any quantum (or classical) circuit that computes $f$ with $T$ queries and error $\epsilon \le 1 - 1/(2T^c)$ must have cumulative memory that is:
Using the above result, we can extend the quantum time-space product lower bound for matrix-vector products to a matching quantum cumulative memory lower bound.
Theorem 3.17. Let $\gamma > 0$ and $c \in (0, 1/2]$ be fixed. If $A$ is a $(\gamma n, \gamma n, c)$-rigid $n \times n$ matrix over a field $\mathbb{F}$ then any quantum circuit using time $T$ and space $S$ that computes the function $f : D^n \to \mathbb{F}^n$ for $D \subseteq \mathbb{F}$ with $d = |D|$ given by $f(x) = Ax$ with success probability larger than $1/T$ requires cumulative memory that is $\Omega(n^2 \log d)$.
Proof. By Lemma 3.1 we can apply Proposition 3.16 with $\mu$ the uniform distribution and the remaining parameters as given by that lemma. This gives us that any quantum circuit computing $f$ with $T$ queries and error at most $1 - 1/(2T)$ requires cumulative memory $\Omega(n^2 \log d)$ as desired.
Directly applying this in place of Theorem 3.5 gives us matching cumulative memory (CM) lower bounds for Corollary 3.6 through Corollary 3.15.
Corollary 3.18. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ such that $d = |D|$. Any quantum circuit with inputs over $D$ that computes the DFT or vector convolution requires CM that is $\Omega(n^2 \log d)$. Any quantum circuit that computes the product of three matrices, matrix cubing, or matrix inversion requires CM that is $\Omega(n^4 \log d)$. Any quantum circuit that solves $n \times n$ systems of linear equations requires CM that is $\Omega(n^3 \log d)$. Additionally any quantum circuit that multiplies two $n$ bit binary numbers requires CM that is $\Omega(n^2/\log^2 n)$.

Quantum matrix multiplication
While many of the applications so far, including the matrix triple product lower bound discussed in the previous section, are derived from the matrix-vector product lower bound, our matrix multiplication lower bound requires a separate argument using ideas from the classical lower bound for the problem in [Abr91]. Implementing this requires a much more subtle way of applying our bucketing method for states that allows us to concentrate on just a subset of the buckets containing most of the total amplitude and ignore the others. As in Section 3, our lower bounds in this section apply to a more general model of quantum circuits that can decide which outputs they want to produce in a given layer based on the inputs that they have queried.

The success probability of small depth quantum circuits
Lemma 4.1. Let $\gamma \in (0, 1/2)$ and $f : D^{n^2} \times D^{n^2} \to \mathbb{F}^{n^2}$ for $D \subseteq \mathbb{F}$ with $|D| = d$ be defined by $f(A, B) = AB$. Then for any constant $\beta > 0$ and quantum circuit $\mathcal{C}$ with at most $h = \beta\gamma n\sqrt{k/2}$ queries to input matrices $A, B$ sampled uniformly from $D^{n^2}$, the probability that $A$ and $B$ are $(\gamma n, \gamma n)$-rigid and $\mathcal{C}$ produces $k$ correct output values of $f(A, B)$ is at most $16 \min(k, n)$ Note that for $\beta \le 0.0429$ we have $1 - 4\beta - H_2(4\beta) > 1/6$ so the bound is at most $16 \min(k, n)$
Proof. Let $C = AB$, let $\Pi_{\mathrm{rigid}(A)}$ ($\Pi_{\mathrm{rigid}(B)}$) be the projection onto inputs where $A$ ($B$) is a $(\gamma n, \gamma n)$-rigid matrix, and define $\Pi_{\mathrm{rigid}} = \Pi_{\mathrm{rigid}(A)} \Pi_{\mathrm{rigid}(B)}$. Assume that $q(w)$, the output as a function of the measured value of the work register, produces exactly $k$ outputs; we ignore anything it produces after the first $k$. We will use $[A]$ to denote the set of indices of elements in $A$ and likewise for $[B]$ and $[C]$. By Proposition 2.8, after $t \le h$ queries in the recording query basis, our state can be written as: for some $\alpha_{i,p,w,E,F,x,y}$ with $\sum_{i,p,w,E,F,x,y} |\alpha_{i,p,w,E,F,x,y}|^2 = 1$.
We first apply an analogous series of observations and decompositions to those that allowed us to derive (4) from (3) in the case of matrix-vector product. By Proposition 2.7, we note that the final state of the algorithm in the standard oracle setting is given by: Because $\mathcal{S}$ behaves as the identity on $|\psi\rangle_C$ and each distinct choice of $|i, p, w\rangle$ gives an orthogonal basis state, this equals: for some $\beta_{i,p,w}$ and $\beta^{i,p,w}_{E,F,x,y}$ such that $\sum_{i,p,w} |\beta_{i,p,w}|^2 = 1$ and $\sum_{E,F,x,y} |\beta^{i,p,w}_{E,F,x,y}|^2 = 1$ for each $i, p, w$.
Now the probability over the choices of the input matrices and the result of the quantum algorithm making $t$ queries that the matrices $A$ and $B$ are both $(\gamma n, \gamma n)$-rigid and the algorithm produces $k$ correct output values from $C = AB$ is at most: For the rest of the proof we fix an $i, p, w$ to achieve the maximum value in Equation (7) and prove an upper bound on the resulting probability. This fixes the output values $q(w)$; we write $G \subseteq [C]$ with $|G| = k$ for the set of indices of the outputs given by $q(w)$. To keep notation simpler in the remainder of the proof we observe that Equation (7) is upper bounded by the maximum of over all $\beta_{E,F,x,y}$ with $\sum_{E,F,x,y} |\beta_{E,F,x,y}|^2 = 1$, all sets $G \subseteq [C]$ with $|G| = k$ and all assignments $q(G)$ to $G$.
We will split the sum in Equation (8) over the different sets $E$ and $F$ of queried input indices depending on how they relate to the set of output indices given by $G$. Let $r(G)$ be the set of rows containing elements of $G$ and $c(G)$ be the set of columns containing elements of $G$.
We define a light row of $E$ to be an element of $r(G)$ that contains at most $\beta\gamma n$ elements of $E$ and define a light column of $F$ to be an element of $c(G)$ that contains at most $\beta\gamma n$ elements of $F$. Write $L(E)$ for the set of light rows of $E$ and $L'(F)$ for the set of light columns of $F$. Since $E$ and $F$ each contain at most $h = \beta\gamma n\sqrt{k/2}$ elements, fewer than $\sqrt{k/2}$ rows of $r(G)$ are not light and fewer than $\sqrt{k/2}$ columns of $c(G)$ are not light. Hence $|\{(i', j') \in G : i' \notin L(E),\ j' \notin L'(F)\}| \le k/2$ so at least $k/2$ elements of $G$ are in light rows of $E$ or in light columns of $F$. Therefore for every pair $(E, F)$ at least one of the sets of outputs in light rows of $E$ or in light columns of $F$ contains at least $k/4$ elements of $G$. Let $\mathcal{E}$ be the set of $E$ for which the former holds and $\mathcal{F}$ the set of $F$ for which the latter holds, so that every pair $(E, F)$ has $E \in \mathcal{E}$ or $F \in \mathcal{F}$.
The analyses of the two cases are completely symmetric up to matrix transposition. It will be convenient to focus on the case $F \in \mathcal{F}$ that there are many outputs of $G$ in light columns and compute an upper bound on The case that $E \in \mathcal{E}$ has exactly the same upper bound as Equation (9) by applying the argument to the transposed product $B^T A^T$ and corresponding transposed sets $F^T$, $E^T$, and $G^T$. Hence, the quantity in Equation (8) is at most 4 times that of Equation (9).
To upper bound Equation (9), we first remove the projection operator $\Pi_{\mathrm{rigid}(B)}$ from $\Pi_{q(G)} \Pi_{\mathrm{rigid}} = \Pi_{q(G)} \Pi_{\mathrm{rigid}(A)} \Pi_{\mathrm{rigid}(B)}$ to get $\Pi_{q(G)} \Pi_{\mathrm{rigid}(A)}$. We then rewrite this combined projection operator as where $\Pi_A$ is the projection onto the specific matrix $A$ and for each $A$, $\Pi^A_{q(G)}$ is the projection onto the choices for matrix $B$ such that $C = AB$ agrees with $q(w)$. We therefore obtain that Equation (9) is at most for some $\beta_A$ and $\beta^A_{F,y}$ such that gives orthogonal states so Equation (10) equals We fix a $(\gamma n, \gamma n)$-rigid matrix $A$ that maximizes (11).
We now partition the set $\mathcal{F}$ based on the set $L'(F)$ which contains all but precisely $\sqrt{k/2}$ columns in $c(G)$. Therefore we can rewrite (11) as Since the different choices of $F$, and hence different choices of $H$, correspond to orthogonal basis states, we can upper bound (12) by We fix the set $H$ achieving the maximum value in Equation (13) which fixes the value of $L'(F) = c(G) \setminus H$. This fixes the set $G^c_{L'(F)}$ of elements in $G$ that are in light columns of $F$ (equivalently, not in $H$) which, since $F \in \mathcal{F}$, contains at least $k/4$ elements of $G$.
Let $G'$ be a fixed subset of $k/4$ of the elements of $G^c_{L'(F)}$. By construction we have $c(G') \subseteq L'(F)$. By only requiring that the outputs in $G'$ are correct and using the fact that $|c(G)| \le \min(k, n)$, we therefore can upper bound $\|\Pi_k \Pi_{\mathrm{rigid}} \mathcal{S} |\phi_t\rangle\|^2$ by the maximum value of
For each $j \in c(G')$, let $k_j$ be the number of elements of $G'$ in column $j$. Our overall strategy is to consider the $j \in c(G')$ one by one, and show that the total amplitude on states where these $k_j$ outputs are correct conditioned on the success for previous values of $j$ is of the form $d^{-\delta k_j}$ for some fixed constant $\delta > 0$. These are $k_j$ outputs of the matrix-vector product $A y_j$ where $y_j$ is the $j$-th column of $B$, and the fact that $c(G') \subseteq L'(F)$ implies that $F$ has made at most $\beta\gamma n$ queries to $y_j$. This is very similar to the situation with the matrix-vector problem from Lemma 3.1. In analogy with Lemma 3.1, we define $U_j$ to be the set of $k_j$ rows containing outputs of $G'$ in column $j$.
Using the ideas of Lemma 3.1 we could bucket the possible quantum states into one bucket for each tuple $(V^j_i)_{j \in c(G')}$ using Lemmas 3.2 and 3.3 and bound each bucket separately. However, unlike Lemma 3.1, the value of many of the $k_j$ can be very small, as low as 1, in which case the upper bounds using Lemmas 3.2 and 3.3 would yield a probability bound larger than 1.
Instead, we need a stronger argument to show that, except for an exponentially small amount in $k$, all of the amplitude can be allocated to a very small number of buckets. The following lemma gives the inductive step that allows us to define those buckets. Rather than thinking about each column $j \in c(G')$ as separate matrix-vector problems, it works by considering all of the answers in $G'$ at once.
Proof. We first recall the definitions in our discussion preceding the lemma statement. For each $j \in c(G')$, define $U_j$ to be the set of row indices of $G'$ in column $j$ and let $k_j = |U_j|$; let $\ell_j = \gamma n/k_j$ and $V^j_1, \ldots, V^j_{\ell_j}$ be the collection of disjoint subsets of $[n]$ of size $k_j$ given by Lemma 3.3 such that each $k_j \times k_j$ sub-matrix $A_{U_j V^j_i}$ has full rank.
For each $F \in \mathcal{F}'$ and $j \in c(G')$, define $F_j$ to be the set of row indices of elements of $F$ in column $j$. Since $\sum_{F,y} |\delta_{F,y}|^2 = 1$, $m^j_i$ can be viewed as the expected size of the overlap between the recorded queries in the $j$-th column of the matrix $B$ and each $V^j_i$. Since for each $j$, the sets $V^j_i$ are disjoint and $|F_j| \le \beta\gamma n$ we have $\sum_{i \in [\ell_j]} m^j_i \le \beta\gamma n$. Therefore, for each $j$, we have some index , the expected total overlap between the recorded queries in the columns of $G$ and the chosen sets
We split our analysis for $\mathcal{F}'$ into two parts due to sets $F$ in $\mathcal{F}''$ and $\mathcal{F}' \setminus \mathcal{F}''$, respectively. We begin with
We now consider $\mathcal{F}' \setminus \mathcal{F}''$. By definition, for $F \in \mathcal{F}' \setminus \mathcal{F}''$, we have we define a bucket $B_{V'}$ that contains sets $F$ that must miss the elements of $V'$ and assign each $F \in \mathcal{F}' \setminus \mathcal{F}''$ to a unique bucket in an arbitrary fixed way. There are at most $2^{H_2(\alpha)k/4}$ such buckets. Then where we first used the triangle inequality followed by Jensen's inequality. Now, applying the $\mathcal{S}_1^{\otimes n^2}$ operator in (16) will convert the $|\bot\rangle_{V'}$ to a uniform superposition of all $|y'\rangle_{V'}$ for all $y' \in D^{V'}$ and convert We now consider the application of For each $j$, the outputs in $U_j \times \{j\} \subset [C]$ can be expressed as the matrix-vector product , there is precisely one value of $y^j_{V'_j}$ that will yield the output values $q(U_j \times \{j\})$. Therefore, putting the properties for the columns of $c(G')$ together, there is precisely one value $y' \in D^{V'}$ that will yield the output values $q(G'_0)$. Therefore, (17) is at most where the last equality follows since the buckets $B_{V'}$ partition $\mathcal{F}' \setminus \mathcal{F}''$.
We now combine the contributions from $\mathcal{F}''$ and $\mathcal{F}' \setminus \mathcal{F}''$. Applying Jensen's inequality together with the bounds in (15) and (18) we obtain that
Proof. Let $M$ be the maximum value of over all choices of $\mathcal{F}'$ and $\delta_{F,y}$ with the required properties. This corollary follows from Lemma 4.2 with $C' = 4$ by observing that the term multiplied by $2/C'$ is also upper bounded by $M$ and hence
Finally, plugging the bound from Corollary 4.3 into (14), we obtain that the probability that $A$ and $B$ are both $(\gamma n, \gamma n)$-rigid and $\mathcal{C}$ produces $k$ correct output values for as desired.

Matrix multiplication time-space tradeoff lower bounds
Here we consider the matrix multiplication problem $f(A, B) = AB$ where both $A$ and $B$ are considered input. If we could fix a choice of $A$, we would be able to make our proof somewhat simpler. However, as Abrahamson pointed out in [Abr91], there is a classical algorithm that can compute the function $f(B) = AB$ for any fixed matrix $A$ in $O(n^2)$ time and $O(n \log d)$ space. Thus our lower bound requires both $A$ and $B$ to be inputs to the function.
Theorem 4.4. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ with $d = |D|$. Then any quantum circuit $\mathcal{C}$ that uses time $T$ and space $S$ and computes the function $f : D^{2n^2} \to \mathbb{F}^{n^2}$ given by $f(A, B) = AB$ with success probability larger than $1/T$ must have $T$ that is $\Omega(n^3 \sqrt{\log d/S})$.
Proof. Let $\gamma \in (0, 1/2)$ be the constant given by Proposition 2.4. By that proposition, the probability that either of two matrices $A$ and $B$ chosen uniformly randomly from $D^{n^2}$ is not $(\gamma n, \gamma n)$-rigid is at most $2d^{-1}(2/3)^{\gamma n}$. Let $\mathcal{C}$ be a quantum circuit with $T$ queries and space $S$. Let $\beta = 0.0429$, $d = |D|$, and set $k = \lceil 48(5S + 5)/\log_2 d\rceil$. We partition $\mathcal{C}$ into $T/(\beta\gamma n\sqrt{k/2})$ sub-circuits that each have at most $\beta\gamma n\sqrt{k/2}$ queries. Without loss of generality there are at most $n^2$ such sub-circuits. By combining Proposition 2.5 with Lemma 4.1, we know that for a uniformly random input, the probability that $A$ and $B$ are $(\gamma n, \gamma n)$-rigid matrices and a fixed sub-circuit can produce $k$ outputs is at most $16k$ . Therefore the probability that $A$ and $B$ are $(\gamma n, \gamma n)$-rigid matrices and one of the sub-circuits produces $k$ correct outputs is at most $16k$ Combining this with the probability that one of $A$ or $B$ is not $(\gamma n, \gamma n)$-rigid, the probability that there is a sub-circuit that produces $k$ correct outputs is at most $16k$ Since we can assume without loss of generality that $T \le n^3$, for sufficiently large $n$, $2d^{-1}(2/3)^{2\gamma n} \le 1/(2T)$ and $k\sqrt{k/2} \le 2^{k/48} \le d^{k/48}$. Plugging in our value of $k$ and the fact that $S \ge \log_2 n$ without loss of generality gives a probability of at most $16k$ Since $\mathcal{C}$ must be correct with probability larger than $1/T$, this implies that Plugging in our value of $k$ gives us that $T$ is $\Omega(n^3 \sqrt{\log d/(S + \log T)})$.
Since $S \ge \log_2 n$ and our bound trivially holds when $T$ is $\omega(n^3 \log d)$ there is a constant $c > 0$ such that $cS \ge \log_2 T$. This implies that $T$ is $\Omega(n^3 \sqrt{\log d/S})$ as desired.
Corollary 4.5. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ with $d = |D|$. If $\mathcal{C}$ is a quantum circuit that computes the function $f : D^{n^2} \to \mathbb{F}^{n^2}$ where $f(A) = A^2$ on all upper triangular inputs in time $T$ and space $S$ with success probability at least $1/T$, then $T$ must be $\Omega(n^3 \sqrt{\log d/S})$.
Proof. Let $A, B \in D^{n^2}$ and construct the $3n \times 3n$ matrix:
\[ M = \begin{pmatrix} 0 & A & 0 \\ 0 & 0 & B \\ 0 & 0 & 0 \end{pmatrix}. \]
Since the top right $n \times n$ sub-matrix of $M^2$ is equal to the product $AB$, we get a reduction from matrix multiplication and can apply Theorem 4.4 to derive the lower bound.
Using Proposition 3.16 we can also bound the cumulative memory complexity for these problems.
Corollary 4.6. Let $\mathbb{F}$ be a field and $D \subseteq \mathbb{F}$ with $d = |D|$. If $\mathcal{C}$ is a quantum circuit that computes the function $f : D^{2n^2} \to \mathbb{F}^{n^2}$ given by $f(A, B) = AB$ or the function $g : D^{n^2} \to \mathbb{F}^{n^2}$ given by $g(A) = A^2$, then $\mathcal{C}$ must have cumulative memory complexity $\Omega(n^6 \log(d)/T)$.
Proof. For $f$, we apply Proposition 3.16 with the parameters given by Lemma 4.1. This gives us that the cumulative memory complexity is $\Omega(n^6 \log(d)/T)$. Using the same reduction as in Corollary 4.5, this same lower bound applies to computing $g$.

Quantum Tradeoffs for Boolean Matrix Operations
In this section we focus on Boolean matrix operations, which use the $(\vee, \wedge)$ inner product of vectors rather than the usual $(+, \times)$ inner product. We denote this Boolean inner product of vectors $u$ and $v$ by $u \bullet v$ and extend this notation to Boolean matrix-vector product and Boolean matrix multiplication. For $u, v \in \{0, 1\}^n$, $u \bullet v = 1$ if and only if the subsets of $[n]$ encoded by $u$ and $v$ intersect, so the problems of computing Boolean matrix multiplication and Boolean matrix-vector product can be seen as computing many correlated copies of the set disjointness problem.
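As a concrete reference point, the following classical Python sketch spells out the Boolean inner product and Boolean matrix product and their relationship to set intersection. The function names are hypothetical and the code says nothing about query complexity; it only fixes the semantics of the operations.

```python
def boolean_inner(u, v):
    """OR over coordinates of the AND of the two entries."""
    return int(any(a and b for a, b in zip(u, v)))


def support(u):
    """The subset of indices encoded by a 0/1 vector."""
    return {i for i, bit in enumerate(u) if bit}


def boolean_matmul(A, B):
    """Entry (i, j) is the Boolean inner product of row i of A and column j of B."""
    columns = [[row[j] for row in B] for j in range(len(B[0]))]
    return [[boolean_inner(row, col) for col in columns] for row in A]


u = [1, 0, 1, 0]
v = [0, 0, 1, 1]
print(boolean_inner(u, v), bool(support(u) & support(v)))
```

Each output entry of `boolean_matmul` is therefore the negation of one set disjointness instance on $n$ bits.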

Tradeoffs for Boolean matrix multiplication
Unlike what we have shown for algebraic problems, quantum algorithms for Boolean matrix multiplication have better time-space tradeoff properties than their classical counterparts.
Proposition 5.1. For any $c > 0$, there are quantum circuits computing $n \times n$ Boolean matrix multiplication $A \bullet B$ with error at most $n^{-c}$ using space $O(\log n)$ and a number of queries $T$ that is $O(n^{2.5} \log n)$.
Proof. Fix $c > 0$. Each of the $n^2$ entries in the product is a disjointness function of length $n$ that can be computed with error at most $n^{-c-2}$ and space $O(\log n)$ using Grover's algorithm in time $O(\sqrt{n} \log n)$ for error at most $n^{-c}$ overall.
This is in contrast to the following result of Abrahamson which shows that classical algorithms as fast as this quantum algorithm require space $\Omega(n^{0.5})$ rather than $O(\log n)$.

Proposition 5.2 ([Abr90]). There is a probability distribution on input matrices and constants $0 < c_1 < c_2$ under which the best classical algorithms (branching programs) for Boolean matrix multiplication $A \bullet B$ using time $T$ and space $S$ require $T \cdot S$ that is $\Theta(n^{3.5})$ for $T \le c_1 n^{2.5}$ and $\Theta(n^3)$ for $T \ge c_2 n^{2.5}$.
For quantum circuits, Klauck, Špalek, and de Wolf [KŠdW07] proved the following time-space tradeoff lower bound which proves that the quantum algorithm in Proposition 5.1 is nearly optimal when the space S is O(log n).
Proposition 5.3 (Theorem 25 in [KŠdW07]). Any bounded error quantum circuit that computes the $n \times n$ Boolean matrix multiplication $A \bullet B$ with $T$ queries and space $S$ requires $T^2 S$ to be $\Omega(n^5)$, or equivalently that $T$ is $\Omega(n^{2.5}/S^{0.5})$.
A key difference between the methods used in Abrahamson's bounds and those in this proof is that for quantum (and classical) circuits, unlike the case for branching programs, it is reasonable to assume that the set of output values produced in each part of the computation is fixed independent of the input. Such an assumption was essential for the time-space lower bounds in [KŠdW07, AŠdW09], although the bound for multiple disjoint collision pairs in [HM21] and our results in Sections 3 and 4 apply to quantum circuits without such a restriction on output production. Fixing the output values produced in each part of the computation allows one to go beyond using a single hard distribution on inputs, and instead choose hard distributions for each part of the computation depending on the target outputs. To give a sense of how this works we sketch the lower bound method of [KŠdW07] for Boolean matrix multiplication, which relies on a strong direct product lemma for the function $\mathrm{OR}^k_n$:
Proposition 5.4 (Strong Direct Product Theorem for $\mathrm{OR}^k_n$ [KŠdW07]). There are positive constants $\varepsilon$ and $\gamma$ such that the following hold:
(a) Any randomized algorithm making at most $\varepsilon k n$ queries has success probability at most $2^{-\gamma k}$ in computing $\mathrm{OR}^k_n$.
(b) Any quantum algorithm making at most $\varepsilon k \sqrt{n}$ queries has success probability at most $2^{-\gamma k}$ in computing $\mathrm{OR}^k_n$.
Proof sketch for Proposition 5.3. For any integer $k \le n/2$, the function $\mathrm{OR}^k_{\lfloor n/k\rfloor}$ can be embedded in any set $E \subseteq [n] \times [n]$ of $k$ outputs of the $n \times n$ Boolean matrix product $A \bullet B$ as follows: Begin by dividing $[n]$ into $k$ blocks $b_1, \ldots, b_k$ each of size $\lfloor n/k\rfloor$ (together with at most $k - 1$ other elements) and associate each $(i, j) \in E$ with a unique index $\ell = \ell(i, j) \in [k]$. For each $(i, j) \in E$, for $\ell = \ell(i, j)$ set every entry in $A_{i, b_\ell}$ to 1 and set the vector of inputs in $B_{b_\ell, j}$ to the $\ell$-th block of the input to $\mathrm{OR}^k_{\lfloor n/k\rfloor}$. Set all other bits in $A$ and $B$ to 0. It is easy to see that the $k$ outputs indexed by $E$ will be the outputs for $k$ disjoint OR functions on $\lfloor n/k\rfloor$ bits.
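The embedding just described can be written out as a small classical Python sketch. The function names are hypothetical, indices are 0-based, and the association $\ell(i, j)$ is taken to be the position of $(i, j)$ in the list `E`, which is one arbitrary choice among many.

```python
def boolean_matmul(A, B):
    """Boolean (OR of ANDs) product of two 0/1 matrices."""
    n = len(B)
    return [[int(any(A[i][t] and B[t][j] for t in range(n)))
             for j in range(len(B[0]))] for i in range(len(A))]


def embed_or_instances(n, E, or_inputs):
    """Build A and B so that output (i, j) = E[l] is the OR of or_inputs[l]."""
    k = len(E)
    size = n // k  # block length floor(n / k)
    A = [[0] * n for _ in range(n)]
    B = [[0] * n for _ in range(n)]
    for l, (i, j) in enumerate(E):
        for offset in range(size):
            t = l * size + offset      # t ranges over block b_l
            A[i][t] = 1                # row i of A is all ones on block b_l
            B[t][j] = or_inputs[l][offset]
    return A, B


E = [(0, 0), (0, 1), (2, 1)]
or_inputs = [[0, 0], [1, 0], [0, 1]]
A, B = embed_or_instances(6, E, or_inputs)
product = boolean_matmul(A, B)
print([product[i][j] for (i, j) in E])
```

Because each block index is used by exactly one pair in `E`, the only block on which row $i$ of $A$ and column $j$ of $B$ can both be nonzero is the block assigned to $(i, j)$, which is why the $k$ outputs are independent OR instances.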
Without loss of generality one can assume that the space bound $S$ is at most $\alpha n$ for some small constant $\alpha > 0$ since the number of queries must be $\Omega(n^2)$ in the worst case. Choose $k = cS$ for some suitably large constant $c$ that depends on the constant $\gamma$ in Proposition 5.4. Begin by slicing the circuit into layers of $\varepsilon\sqrt{kn}$ queries each. There are $\Theta(T/\sqrt{kn})$ such layers. By Proposition 5.4 and the embedding, any circuit of depth $\varepsilon\sqrt{kn} = \varepsilon k\sqrt{n/k}$ queries can produce $k$ correct outputs with probability only $2^{-\gamma k}$ for some $\gamma > 0$. This is the same depth as each of the layers but each layer also gets an $S$ qubit input-dependent state to begin. By Proposition 2.5, the probability that the resulting layer can produce $k$ correct outputs is at most $2^{2S} 2^{-\gamma k}$ which is at most $2^{-S}$ if the constant $c$ used in defining $k$ is sufficiently large.

Our improved lower bound
Theorem 5.5. Any quantum circuit computing $n \times n$ Boolean matrix multiplication $A \bullet B$ with $T$ queries and space $S$ and success probability more than $2^{-S}$ must have $T$ that is $\Omega(n^{2.5}/S^{1/4})$.
Though the form of our lower bound may seem somewhat unusual, both the exponent of $n$ and that of $S$ are optimal: The algorithm of Proposition 5.1 shows that the exponent of $n$ is optimal since there is only a gap of $O(\log^{5/4} n)$ for space $\Theta(\log n)$. In our quantum query model, at the other end of the scale, an algorithm with space $3n^2$ can query and completely remember both matrices in $2n^2$ time, after which a single global unitary transformation will produce the $n^2$ bits of output needed in the remaining qubits of working memory; hence the exponent of $1/4$ on $S$ cannot be reduced.
Theorem 5.5 follows from the following key lemma which improves on the corresponding bound in [KŠdW07] by a factor of $\Theta(k^{1/4})$.
Lemma 5.6. There are constants $\varepsilon, \gamma > 0$ such that the following holds. Let $k < n^2/100$ be an integer. For any quantum circuit $\mathcal{C}$ with at most $\varepsilon k^{3/4} n^{1/2}$ queries to $x$, the probability that $\mathcal{C}$ produces $k$ correct output values of $n \times n$ Boolean matrix multiplication $A \bullet B$ is at most $2^{-\gamma k}$.
We first see how this lemma suffices for the theorem:
Proof of Theorem 5.5 assuming Lemma 5.6. Since there are $n^2$ outputs, it seems that $T \ge n^2$ queries are required, but that isn't quite obvious. Nonetheless, we can, for example, derive a $T = \Omega(n^2)$ lower bound by applying Lemma 5.6 with $k = n^2/101$ which shows that a circuit with at most some $\beta n^2$ queries can only achieve exponentially small success probability for producing a small fraction of the output. Therefore without loss of generality we can assume that $\sqrt{S} < \alpha n$ for some arbitrarily small constant $\alpha > 0$. Let $\varepsilon$ and $\gamma$ be the constants from Lemma 5.6. Let $c = 3/\gamma$ and define $k = cS$. Therefore for $\alpha \le 1/(10\sqrt{c})$ we obtain that $5\sqrt{k} = 5\sqrt{cS} < n/2$. By Lemma 5.6, since $k < n^2/100$, any quantum query algorithm with at most $\varepsilon k^{3/4} n^{1/2}$ queries has success probability at most $2^{-\gamma k} = 2^{-3S}$ of producing $k$ correct outputs.
We prove the contrapositive of the theorem statement: Suppose that $T \le \varepsilon n^{2.5}/(cS)^{1/4} = \varepsilon n^{2.5}/k^{1/4}$. When we divide $\mathcal{C}$ into layers with $\varepsilon k^{3/4} n^{1/2}$ quantum queries each, there are at most $n^2/k$ layers. Since there are a total of $n^2$ outputs, there must be some layer $i$ during which at least $k$ outputs are produced. Let $E$ be the set of the first $k$ outputs produced in layer $i$. By the argument above since the space is at most $S$, by Proposition 2.5 the probability that these $k$ outputs are correct given the $S$ qubits of input-dependent initial state at the beginning of layer $i$ is at most $2^{2S}$ times larger than that of a circuit without them and the same number of queries, which is at most $2^{2S} \cdot 2^{-3S} = 2^{-S}$ which is what we needed to show.
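The layer count used in this argument is a one-line calculation, written out here as a check under the stated assumption $T \le \varepsilon n^{2.5}/k^{1/4}$:

\[
\frac{T}{\varepsilon k^{3/4} n^{1/2}} \;\le\; \frac{\varepsilon n^{2.5}/k^{1/4}}{\varepsilon k^{3/4} n^{1/2}} \;=\; \frac{n^{2.5 - 0.5}}{k^{1/4 + 3/4}} \;=\; \frac{n^2}{k}.
\]

With $n^2$ outputs spread over at most $n^2/k$ layers, the pigeonhole principle gives a layer that produces at least $n^2/(n^2/k) = k$ outputs, which is the layer $i$ used above.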
The main idea behind the proof of this key lemma is an improved method for embedding the direct product of OR functions into outputs of the Boolean matrix multiplication problem. This is based on the following definition of an $L$-coloring of subsets $E$ of $[n] \times [n]$:
• within each color class either all rows are distinct or all columns are distinct, and
• for each color $\ell$ there is a rectangle given by sets $R_\ell \subseteq [n]$ of rows and $C_\ell \subseteq [n]$ of columns such that the set of points of color $\ell$ is precisely $E \cap (R_\ell \times C_\ell)$.
Figure 2: Comparison of our lower bounds for Boolean matrix multiplication with those of prior work for both quantum and classical computation. The shaded region comes from the fact that the time must always be $\Omega(n^2)$. The endpoints mark choices of parameters where the upper and lower bounds match.
We can extend the above to get a matching lower bound on the classical cumulative memory complexity.
Corollary 5.16. Any classical circuit (or other sequential model in which each output value is produced at a fixed time step) computing $n \times n$ Boolean matrix multiplication with $T$ queries and space $S$ with success probability more than $1/(2T)$ must have cumulative memory that is $\Omega(n^6/T)$.
Using the same proof idea as in Corollary 4.5, the bounds in Theorems 5.5 and 5.15 immediately imply lower bounds for Boolean matrix squaring.
Corollary 5.17. Any quantum circuit computing $n \times n$ Boolean matrix squaring on all inputs with $T$ queries, space $S$, and success probability more than $2^{-S}$ must have $T$ that is $\Omega(n^{2.5}/S^{1/4})$. Any such classical circuit must have $T$ that is $\Omega(n^3/S^{1/2})$. Quantum and classical circuits for Boolean matrix squaring with success probability larger than $1/(2T)$ must have cumulative memories $\Omega(n^{10}/T^3)$ or $\Omega(n^6/T)$ respectively.

Boolean matrix-vector product
Though [Abr90] does not contain an explicit theorem statement on time-space tradeoffs for Boolean matrix-vector products that is the analog of the linear algebra bound in [Abr91] or our Theorem 3.5, [Abr90] contains the claim that analogous results do indeed hold for this problem using the same ideas. (The resulting lower bound would be smaller by a factor of $n$.) For quantum circuits, Klauck, Špalek, and de Wolf [KŠdW07] prove the following results for computing Boolean matrix-vector products. (They prove a similar result for the case of classical circuits also, though that does not apply to branching programs, which can vary the output timing depending on the input values.)
Proposition 5.18 (Theorem 23 in [KŠdW07]). For every $S$ in $o(n/\log n)$, there is an $n \times n$ Boolean matrix $A^{(S)}$ such that every bounded-error quantum circuit with space at most $S$ that computes Boolean matrix-vector product $A^{(S)} \bullet x$ in $T$ queries requires that $T$ is $\Omega(\sqrt{n^3/S}) = \Omega(n^{1.5}/S^{0.5})$.
This result is weaker than a standard time-space tradeoff since the function involved is not independent of the circuits that might compute it. In particular, [KŠdW07] does not find a single function that is hard for all space bounds, as the matrix $A^{(S)}$ that they use changes depending on the value of $S$.
For $S = o(n/\log n)$, the matrix $A^{(S)}$ is produced via the probabilistic method using the following distribution: Choose $k$ to be a sufficiently large constant multiple of $S$. This distribution chooses matrices $A \in \{0, 1\}^{n \times n}$ by selecting a uniformly random subset of $n/(2k)$ positions in each row to set to 1, with the remainder of the entries in each row being 0. They show that with positive probability over the choice of $A$, for all sets $I \subseteq [n]$ of size $k$, at least $k/2$ of the rows of $A_I$ contain at least $n/(6k)$ 1's that are unique in their column of $A_I$; that is, those columns are 0 in all of the $k - 1$ other rows of $A_I$. $A^{(S)}$ is then some fixed matrix for which this property is true.
More precisely, when we fix a row $j \in I$ and the $n/(2k)$ columns where it is 1, the expected number of the $(k - 1)n/(2k) < n/2$ 1's among the rows in $I \setminus \{j\}$ that land in those $n/(2k)$ columns is less than $n/(4k)$. By a Hoeffding bound, the number of those 1's is at most $n/(3k)$ except with probability exponentially small in $n/k$, which is $n^{-\omega(1)}$ since $k = O(S) = o(n/\log n)$. Hence, except with probability $n^{-\omega(1)}$, a row $j \in I$ is good for $I$ in that at least $n/(2k) - n/(3k) = n/(6k)$ of the 1's in row $j$ are unique in their respective columns in $A_I$. For a fixed $I$, the probability that there is no $J \subseteq I$ of size $k/2$ all of whose rows are good for $I$ is less than the probability that there are $k/2$ rows of $I$ that are not good for $I$. This happens with probability at most $n^{-\omega(k)}$ since there are at most $\binom{k}{k/2}$ such subsets of rows of size $k/2$, each of which is not good for $I$ with probability $n^{-\omega(k)}$ (and the probabilities are negatively associated). Since there are only $\binom{n}{k}$ choices of $I$, the total probability that $A$ does not have the desired properties is only $n^{-\omega(k)}$.
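The sampling distribution and the notion of a good row can be made concrete with a short Python sketch. The function names are hypothetical, indices are 0-based, and the code only samples a matrix and counts unique 1's for a given row set $I$; it does not search for a matrix satisfying the property for every $I$.

```python
import random


def sample_matrix(n, k, rng):
    """Each row gets a uniformly random set of n // (2k) positions set to 1."""
    ones_per_row = n // (2 * k)
    rows = []
    for _ in range(n):
        chosen = set(rng.sample(range(n), ones_per_row))
        rows.append([1 if j in chosen else 0 for j in range(n)])
    return rows


def unique_ones(A, I, j):
    """Count the 1's of row j lying in columns that are 0 in all other rows of I."""
    return sum(1 for c, bit in enumerate(A[j])
               if bit and all(A[i][c] == 0 for i in I if i != j))


def good_rows(A, I, k):
    """Rows of I with at least n / (6k) unique 1's within the submatrix A_I."""
    threshold = len(A) / (6 * k)
    return [j for j in I if unique_ones(A, I, j) >= threshold]


rng = random.Random(0)
A = sample_matrix(60, 3, rng)
I = [0, 1, 2]
print(len(good_rows(A, I, 3)))
```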
The proof of Proposition 5.18 follows from the usual time-space lower bound methodology and the following lemma:
Lemma 5.19. There is an $\alpha > 0$ such that for every quantum circuit $C$ that makes at most $\alpha\sqrt{kn}$ queries to $x \in \{0,1\}^n$, the probability that $C$ produces at least $k$ correct outputs of $A^{(S)} \cdot x$ is at most $2^{-\Omega(k)}$.
Proof. Let $I \subseteq [n]$ be the set of indices of the first $k$ outputs of $A^{(S)} \cdot x$ produced by $C$. Let $J \subseteq I$ be the set of $k/2$ rows that are good for $I$, as guaranteed by the properties of $A^{(S)}$. We show that the probability that $C$ produces all outputs even for the rows in $J$ is exponentially small in $k$: For each row $j \in J$ there is a set $C_j$ of $n/(6k)$ columns of $A^{(S)}_I$ where the unique 1 is in row $j$. Consider the restriction to input vectors $x \in \{0,1\}^n$ that are 0 outside of $\bigcup_{j\in J} C_j$. Then the outputs for $j \in J$ are a direct product of $k/2$ OR functions of size $n/(6k)$ on the bits of $\bigcup_{j\in J} C_j$. By a strong direct product theorem for OR (Theorem 14 of [KŠdW07]), for $\varepsilon$ a sufficiently small constant, any circuit of height at most $\varepsilon (k/2)\sqrt{n/(6k)} = \varepsilon\sqrt{kn/24}$ is correct with probability at most $2^{-\gamma k}$ for some constant $\gamma > 0$.
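As a quick check of the arithmetic in the last step of the proof, the query budget for $k/2$ OR instances of size $n/(6k)$ can be simplified as follows. This assumes the strong direct product theorem is applied with a budget of $\varepsilon$ times (number of instances) times (square root of the instance size); the choice of $\alpha$ on the last line is one admissible constant under that reading, not necessarily the one the authors intend.

```latex
\[
\varepsilon \cdot \frac{k}{2} \cdot \sqrt{\frac{n}{6k}}
  = \varepsilon \sqrt{\frac{k^2}{4} \cdot \frac{n}{6k}}
  = \varepsilon \sqrt{\frac{kn}{24}}
  = \frac{\varepsilon}{\sqrt{24}} \sqrt{kn},
\qquad\text{so one may take } \alpha = \frac{\varepsilon}{\sqrt{24}} .
\]
```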
On the algorithmic side, we have the following:
Proposition 5.20. For every $c > 0$ and every Boolean matrix $A \in \{0,1\}^{m\times n}$ there is a quantum circuit using space $O(\log n)$ and time $O(mn^{1/2}\log m)$ that computes Boolean matrix-vector product $A \cdot x$ with error at most $m^{-c}$. More precisely, the algorithm runs in time $O(|A|_{1/2}\log m)$ where [...]
Proof. For each row in turn, run Grover's algorithm to compute the OR of the bits indexed by the 1's of $A_i$, the $i$-th row of $A$, with probability of error at most $m^{-c-1}$ per row, for a total error of at most $m^{-c}$.
We note that for the fixed matrix $A^{(S)}$, each row has $\Theta(n/S)$ 1's, so $|A^{(S)}|_{1/2} = \Theta(n^{3/2}/S^{1/2})$. This is an odd situation in that the matrix $A^{(S)}$, designed to require large time for space-$S$ algorithms, can be solved in nearly the same time bound by space-$O(\log n)$ algorithms.
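The cost accounting in this argument can be illustrated with a short classical sketch. It assumes that $|A|_{1/2}$ denotes the sum, over the rows of $A$, of the square root of the number of 1's in the row, which matches a per-row Grover cost and the $\Theta(n^{3/2}/S^{1/2})$ figure quoted for $A^{(S)}$; that reading is an assumption here, and `half_norm` and `total_error` are hypothetical helper names.

```python
import math

def half_norm(A):
    """Sum over rows of sqrt(number of 1's in the row).

    Assumed meaning of |A|_{1/2}; A is a list of 0/1 rows.
    """
    return sum(math.sqrt(sum(row)) for row in A)

def total_error(m, c):
    """Union bound over m rows, each failing with probability at most m**-(c+1)."""
    return m * m ** (-(c + 1))
```

With $n = 16$ rows of 4 ones each (playing the role of $n/S$ with $S = 4$), `half_norm` returns $16 \cdot \sqrt{4} = 32 = n^{3/2}/S^{1/2}$, and `total_error(m, c)` equals $m^{-c}$, as in the proof.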
On the other hand, consider the following space-$S$ algorithm that works for all inputs $x$ with Hamming weight $|x|_1 \le S/\log n$: Run Grover's algorithm $O(S)$ times to find and record the locations of all $O(S/\log n)$ 1's in the input string $x$. This takes $O(\sqrt{Sn/\log n})$ queries. Then compute the $m$ entries of $A \cdot x$, one after another, which doesn't require any additional queries. Note that this is always more efficient than $m\sqrt{n/S}$ queries.
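The second phase of this algorithm is purely classical once the positions of the 1's in $x$ have been recorded. A minimal sketch, assuming each row of $A$ is given as a set of column indices and the recorded support of $x$ as a set; the function names are hypothetical, and `query_estimates` drops all constant factors and reads the two query counts as $\sqrt{Sn/\log n}$ and $m\sqrt{n/S}$.

```python
import math

def matvec_from_support(row_supports, support_x):
    """Boolean A.x from the recorded support of x, with no further queries to x.

    Entry i is 1 iff row i has a 1 in some position where x is 1.
    """
    return [int(not cols.isdisjoint(support_x)) for cols in row_supports]

def query_estimates(m, n, S):
    """Constant-free estimates (sqrt(S*n/log2(n)), m*sqrt(n/S)) of the two query counts."""
    return math.sqrt(S * n / math.log2(n)), m * math.sqrt(n / S)
```

For instance, with rows `[{0, 1}, {2}, set()]` and recorded support `{1}`, only the first output bit is 1.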

Systems of linear inequalities
The same space-dependent matrix $A^{(S)}$ in Proposition 5.18 was also used in [AŠdW09] for systems of inequalities.
Proposition 5.21 (Theorem 11 in [AŠdW09]). Let $\vec{t}$ be the length-$n$ all-$t$ vector. For every $S$ in $\min(O(n/t), o(n/\log n))$ there exists an $n \times n$ Boolean matrix $A^{(S)}$ such that every bounded-error quantum circuit with space at most $S$ that decides the system $Ax \ge \vec{t}$ of $n$ inequalities requires that $T$ is $\Omega(\sqrt{tn^3/S})$.
Similar to [KŠdW07], this matrix is used so that any quantum circuit that computes $Ax \ge \vec{t}$ can be broken down into slices that solve independent instances of the $t$-threshold function.

Our results
Using Proposition 5.18, we can obtain a time-space tradeoff lower bound for quantum computation of Boolean matrix-vector product that is only slightly weaker in terms of the matrix dimensions but, unlike the previous bound, defines a fixed computational problem whose definition is independent of the space bound allowed.
Theorem 5.22. There is a fixed $m \times n$ Boolean matrix $A$ with $m \le n\log_2 n$ such that for every $S$ that is $o(n/\log n)$ every bounded-error quantum circuit with space at most $S$ that computes Boolean matrix-vector product $A \cdot x$ in $T$ queries requires that $T$ is $\Omega(\sqrt{n^3/S})$.

Figure 1: The general structure of a quantum circuit with T queries.
2 and then we can use Cauchy-Schwarz to bound the success probability as a sum of the $\mu_\ell$.

Proposition 3.10 ([Abr91]). Let $A, B, C \in D^{n\times n}$ and Y (and Y) be the vectors in $D^{n^2}$ formed by stacking the transposes of the rows of $B$ (and $Y$) into a column vector. If $D$ is a commutative ring, then the following conditions are equivalent: [...]
[...] $\sum_{i,p,w,I,y} \alpha_{i,p,w,I,y}\,|i,p,w\rangle\,|y\rangle_I\,|\perp\rangle_{[n]\setminus I}$. Since $S$ behaves as the identity on $|\psi\rangle_C$ and the $|i,p,w\rangle$ are orthogonal basis states, we can rewrite this as: [...] $|y\rangle_I\,|\perp\rangle_{[n]\setminus I}$ for some $\beta_{i,p,w}$ and $\beta^{i,p,w}_{I,y}$ such that $\alpha_{i,p,w,I,y} = \beta_{i,p,w}\,\beta^{i,p,w}_{I,y}$, $\sum_{i,p,w} |\beta_{i,p,w}|^2 = 1$ and, for each $i, p, w$, $\sum_{I,y} |\beta^{i,p,w}_{I,y}|^2 = 1$. With this decomposition, the success probability is given by: [...]
[...] $\}$ has size $\ge k/4$. Let $\mathcal{E}$ be the set of all $E \subseteq [A]$ with $|E| \le t$ such that $G$ has many outputs in light rows, $|G^r_{L(E)}| \ge k/4$, and $\mathcal{F}$ be the set of all $F \subseteq [B]$ with $|F| \le t$ such that $G$ has many outputs in light columns, $|G^c_{L'(F)}| \ge k/4$. We separately bound the contribution to Equation (8) from pairs