Small-Space Spectral Sparsification via Bounded-Independence Sampling

We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph G on n vertices described by a binary string of length N, an integer k ≤ log n, and an error parameter ε > 0, our algorithm runs in space \(\widetilde{O}(k\log (N\cdot w_{\mathrm{max}}/w_{\mathrm{min}}))\), where \(w_{\mathrm{max}}\) and \(w_{\mathrm{min}}\) are the maximum and minimum edge weights in G, and produces a weighted graph H with \(\widetilde{O}(n^{1+2/k}/\varepsilon^2)\) edges that spectrally approximates G, in the sense of Spielman and Teng, up to an error of ε. Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective-resistance-based edge sampling algorithm and uses results from recent work on space-bounded Laplacian solvers. In particular, we demonstrate an inherent trade-off (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by k above, and the resulting sparsity that can be achieved.


INTRODUCTION
The graph sparsification problem is the following: given a weighted, undirected graph G, compute a graph H that has very few edges but is a close approximation to G for some definition of approximation. In general, graph sparsifiers are useful for developing more efficient graph-theoretic approximation algorithms. Algorithms whose complexity depends on the number of edges in the graph will be more efficient when run on the sparser graph H, and if H approximates G in an appropriate way, the result on H may be a good approximation to the desired result on G. In this work, we present an algorithm that can be implemented deterministically in small space and achieves sparsification in the spectral sense of Spielman and Teng [54]. See Section 1.2 for a more formal statement of our main result.

Background
Motivated by network design and motion planning, Chew [11] studied graph spanners, which are sparse versions of graphs that approximately preserve the shortest distance between each pair of vertices. Benczúr and Karger [7] defined cut sparsifiers, whose notion of approximation is that every cut of H has size within a (1 ± ε) factor of the size of the corresponding cut in G. They showed that every graph G on n vertices has a cut sparsifier H with O(n · log n/ε²) edges and gave a randomized algorithm for computing such cut sparsifiers. Their algorithm runs in nearly linear time (i.e., Õ(m), where m is the number of edges in G and the Õ(·) notation hides polylogarithmic factors) and they used it to give a faster algorithm for approximating minimum s-t cuts.
Spielman and Teng introduced spectral sparsifiers, which define approximation between the graph and its sparsifier in terms of the quadratic forms of their Laplacians [54]. The Laplacian of an undirected graph is the matrix L = D − A, where A is the adjacency matrix of the graph and D is the diagonal matrix of vertex degrees (i.e., D_ii equals the weighted degree of vertex i). H is said to be an ε-spectral approximation of G if for all vectors v ∈ ℝⁿ, we have that (1 − ε)·v^⊤Lv ≤ v^⊤L̃v ≤ (1 + ε)·v^⊤Lv, where L̃ and L are the Laplacians of H and G, respectively. Spectral sparsifiers generalize cut sparsifiers, which can be seen by observing that when v ∈ {0,1}ⁿ, v is the characteristic vector of some set of vertices S ⊆ [n] and v^⊤Lv equals the sum of the weights of the edges cut by S.
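To make the cut-sparsifier connection concrete, here is a minimal numerical check (ours, not from the paper; any small weighted graph works):

```python
import numpy as np

# The quadratic form of the Laplacian at a 0/1 characteristic vector
# recovers the weight of the corresponding cut.
A = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)     # weighted adjacency matrix
D = np.diag(A.sum(axis=1))                 # diagonal matrix of weighted degrees
L = D - A                                  # graph Laplacian

v = np.array([1.0, 0.0, 0.0])              # characteristic vector of S = {0}
cut_weight = A[0, 1] + A[0, 2]             # edges leaving S have weights 2 and 1
assert np.isclose(v @ L @ v, cut_weight)   # v^T L v = w(S, V \ S) = 3
```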
Spielman and Teng showed that all graphs have spectral sparsifiers with O(n · log^{O(1)} n/ε²) edges and gave a nearly linear time randomized algorithm for computing them with high constant probability. Their spectral sparsifiers were a key ingredient that they used to develop the first nearly linear time algorithm for solving Laplacian systems. These fast Laplacian solvers spawned a flurry of improvements and simplifications [17, 28-30, 32, 33, 36, 46] as well as extensions to directed graphs [14-16] and to the space-bounded setting [1, 19, 43]. Spectral sparsification and the nearly linear time Laplacian solvers that use them have been critical primitives that have enabled the development of faster algorithms for a wide variety of problems, including max flow [12, 18, 26, 35, 39], random generation of spanning trees [27, 42, 51], and other problems in computer science [31, 45].
Spielman and Srivastava [53] gave a spectral sparsification algorithm that both simplified and improved upon the algorithm of Spielman and Teng. They show that randomly sampling edges independently with probabilities proportional to their effective resistances produces a good spectral sparsifier with high probability. Viewing a graph as an electrical network, the effective resistance of an edge (a, b) is the potential difference induced between a and b when a unit of current is injected at a and extracted at b (or vice versa). More formally, the effective resistance of an edge (a, b) in a graph with Laplacian L is R_ab = (e_a − e_b)^⊤ L⁺ (e_a − e_b), (1), where e_i denotes the i-th standard basis vector and L⁺ denotes the Moore-Penrose pseudoinverse of L. Spielman and Srivastava proved the following theorem. Theorem 1.1 (spectral sparsification via effective resistance sampling [52, 53]). Let G = (V, E, w) be a weighted graph on n vertices and for each edge (a, b) ∈ E with weight w_ab, define p_ab = min{1, 4·(log n)·w_ab·R_ab/ε²}, where R_ab is the effective resistance of (a, b) as defined in Equation (1). Construct a sparsifier H by sampling edges from G independently such that each edge (a, b) in G is added to H with probability p_ab. For edges that get added to H, reweight them with weight w_ab/p_ab. Let L and L̃ be the Laplacians of G and H, respectively. Then, with high probability, (1) H has O(n·(log n)/ε²) edges and (2) L̃ ε-spectrally approximates L.
Furthermore, this procedure can be implemented to run in time Õ((m/ε²)·log(w_max/w_min)), where m is the number of edges in G and w_max, w_min are the maximum and minimum edge weights of G, respectively.
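For concreteness, the following is a brief sketch of the sampling procedure of Theorem 1.1 in numpy. All names are ours, and the exact pseudoinverse computation stands in for the more efficient resistance estimators discussed later:

```python
import numpy as np

# A sketch of Spielman-Srivastava sampling (Theorem 1.1): keep each edge
# (a, b) independently with probability p_ab = min(1, 4 log(n) w_ab R_ab / eps^2),
# reweighting surviving edges by w_ab / p_ab.
def sparsify_independent(n, edges, eps, rng=np.random.default_rng(0)):
    # edges: list of (a, b, w_ab); build the Laplacian L = D - A.
    L = np.zeros((n, n))
    for a, b, w in edges:
        L[a, a] += w; L[b, b] += w
        L[a, b] -= w; L[b, a] -= w
    Lpinv = np.linalg.pinv(L)                  # Moore-Penrose pseudoinverse L^+
    H = []
    for a, b, w in edges:
        e = np.zeros(n); e[a], e[b] = 1.0, -1.0
        R = e @ Lpinv @ e                      # effective resistance R_ab
        p = min(1.0, 4 * np.log(n) * w * R / eps**2)
        if rng.random() < p:                   # one independent coin per edge
            H.append((a, b, w / p))            # reweight the surviving edge
    return H
```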
The sparsity achieved by the Spielman and Srivastava sparsifiers was improved by Batson et al. [6], who gave a deterministic algorithm for computing ε-spectral sparsifiers with O(n/ε²) edges, which is asymptotically optimal. However, their algorithm is less efficient, running in time O(m·n³/ε²). Work on these optimal sparsifiers continued with another slightly faster deterministic algorithm [55], followed by an O(n^{2+ε})-time randomized algorithm [2], and culminating in the randomized algorithms of Lee and Sun, who achieved almost-linear time [37] and, finally, nearly-linear time [38].

Our Main Result
In this work, we study the deterministic space complexity of computing spectral sparsifiers. We use the standard model of space-bounded computation. We refer the reader to Section 5 (and to [5, Section 4] and [23, Section 5]) for an overview of the model. Our main result is a deterministic, nearly-logarithmic space algorithm for computing mild spectral sparsifiers, that is, graphs with O(n^{1+α}/ε²) edges for any constant α > 0. Theorem 1.2 (see also Theorem 5.1). Let G be a connected, weighted, undirected graph on n vertices, k ∈ ℕ an independence parameter, and ε > 0 an error parameter. There is a deterministic algorithm that on input G, k, and ε outputs a weighted graph H that is an ε-spectral sparsifier of G and has O(n^{1+2/k}·(log n)/ε²) edges. The algorithm runs in space O(k·log(N·w) + log(N·w)·log log(N·w)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and N is the length of the input.
The closest analogue to spectral sparsifiers in the space-bounded derandomization literature is the derandomized square of Rozenman and Vadhan [49], a graph operation that produces a sparse approximation to the square of a graph. The derandomized square was introduced to give an alternative proof of Reingold's celebrated result that Undirected S-T Connectivity can be solved in deterministic logspace [48]. Murtagh et al. [43] showed that the derandomized square actually produces a spectral sparsifier of the square of a graph. This was a key observation they used to develop a deterministic, nearly logarithmic space algorithm for solving Laplacian systems. Later, the sparsification benefits of the derandomized square were also used in nearly logarithmic space algorithms for deterministically approximating random walk probabilities and for solving Laplacian systems in Eulerian directed graphs [1, 44].
For a d-regular graph G on n vertices, its square G² has degree d², and the derandomized square computes an ε-spectral approximation to G² with degree O(d/ε²). On the other hand, applying our sparsification to G² results in an ε-spectral approximation with, on average, O(n^α/ε²) edges adjacent to each vertex for any constant α, which is independent of d and much sparser when d = ω(n^α). Also, our algorithm can sparsify any undirected graph, not just squares. Our algorithm does not replace the derandomized square, however, because the derandomized square can be iterated very space-efficiently, a property that is used in all of its applications thus far. Nevertheless, given the success of spectral sparsification and Laplacian solvers in the nearly-linear time context and the fruit borne of porting these techniques to the logspace setting, we are hopeful that our spectral sparsifiers will have further applications in derandomization of space-bounded computation.

Techniques
Our deterministic space-efficient algorithm is modeled after the effective resistance-based sampling algorithm of Spielman and Srivastava (Theorem 1.1). Although the Spielman and Srivastava procedure is randomized and does not achieve optimal sparsity, the known algorithms that do ([2, 6, 37, 38, 55]) are more involved and often sequential in nature. Thus, they do not seem as amenable to small-space implementations.
To derandomize the Spielman-Srivastava algorithm, we follow the standard approach of first reducing the number of random bits used to logarithmic and then enumerating over all random choices of the resulting algorithm. Following [3, 41], a natural way to reduce the number of random bits used is to do the edge sampling only k-wise independently for some k ≪ |E| rather than sampling every edge independently from all other edges.
Let k be our bounded-independence parameter. Namely, we are only guaranteed that every subset of k edges is chosen independently (with the right marginals). However, there may be correlations between the choices in tuples of size k + 1. It is well known that such sampling can be performed using fewer random bits. By [53], we know that k = |E| will, with high probability, produce an ε-spectral sparsifier with O(n·log n/ε²) edges in expectation. What about much smaller values of k? In Section 3, we prove the following: Theorem 1.3 (informal; see Theorem 3.1). Let G be a connected weighted undirected graph on n vertices with Laplacian L, k ∈ ℕ an independence parameter, and ε > 0 an error parameter. Let H be the graph that is the output of Spielman and Srivastava's sampling-based sparsification algorithm (Theorem 1.1), when the edge sampling is done in a k-wise independent manner, and let L̃ be the Laplacian of H. Then, with high constant probability, L̃ ε-approximates L and H has O(n^{1+2/k}·(log n)/ε²) edges in expectation. The first thing to observe is that k = log n gives the same result as in [53]. More importantly, the above shows that the result interpolates: Even for a constant k, Theorem 1.3 gives a mild sparsification that sparsifies dense graphs to O(n^{1+α}) expected edges, where α > 0 is an arbitrarily small constant.
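For intuition about how few random bits such sampling needs, here is a sketch of the classical polynomial-based construction of a k-wise independent sample space (in the spirit of Lemma 2.4 below; the field size and indexing here are illustrative choices of ours, not the paper's parameters):

```python
import numpy as np

# k-wise independent edge sampling via a random degree-(k-1) polynomial over
# a prime field Z_p: evaluations of such a polynomial at distinct points are
# k-wise independent and uniform over Z_p.
def kwise_sample(m, k, marginals, p=2_147_483_647, rng=np.random.default_rng(0)):
    coeffs = rng.integers(0, p, size=k)        # the O(k log p) truly random bits
    bits = []
    for i in range(m):
        y = 0
        for c in coeffs:                       # Horner evaluation at point i + 1
            y = (y * (i + 1) + int(c)) % p
        # Threshold the k-wise independent uniform value to get marginal ~= p_i
        # (up to truncation, matching the t-bit truncation in Lemma 2.4).
        bits.append(1 if y < marginals[i] * p else 0)
    return bits
```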
We prove Theorem 1.3 by extending the arguments in [52, 53]. For every edge (a, b) ∈ E, we define a random matrix X_ab that corresponds to the choice made by the sparsification algorithm, in such a way that X = Σ_{(a,b)∈E} X_ab relates to the resulting Laplacian L̃. Let Π be the orthogonal projection onto the image of L. Following [52, 53], we show that L̃ ε-spectrally approximates L (equivalently, that H is an ε-spectral sparsifier for G) with high probability if X − Π has bounded moments. Deriving a tail bound that relies on the first k moments alone, we can proceed with the analysis as if the X_ab s were truly independent. More specifically, we bound Tr(E_X[(X − Π)^k]) using a matrix concentration result due to Chen et al. [10]. We note that matrix tail bounds that rely on the first moments, combined with k-wise independence, were used in previous works, e.g., in [25]. For the complete details, as well as how our argument differs from [52, 53], see Section 3.
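In our notation, the tail bound that drives the analysis is the following Markov-type chain for even k (this is the content of Lemmas 3.4 through 3.6 in Section 3, rendered here as one display):

\[
\Pr\big[\|X - \Pi\| \ge \varepsilon\big] \;\le\; \Pr\big[\operatorname{Tr}\big((X-\Pi)^{k}\big) \ge \varepsilon^{k}\big] \;\le\; \frac{\operatorname{Tr}\big(\mathbb{E}\big[(X-\Pi)^{k}\big]\big)}{\varepsilon^{k}}.
\]

Since Tr(E[(X − Π)^k]) is a polynomial of degree at most k in the edge-sampling indicators, it depends only on their first k joint moments, which k-wise independent sampling matches exactly.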
Getting a Deterministic Algorithm. Theorem 1.3 readily gives a simple, randomness-efficient algorithm, as k-wise independent sampling of edges only requires O(k·log(N·w)) random bits [3, 24] (see Lemma 2.4). However, more work is needed to obtain a space-efficient deterministic algorithm. First, we need to be able to compute the marginal sampling probabilities, which depend on the effective resistances R_ab. Fortunately, the recent work of Murtagh et al. [43] allows us to approximate the effective resistances using only O(log(N·w)·log log(N·w)) space, and we show that the k-wise independent sampling procedure can tolerate the approximation.
Next, to obtain a deterministic algorithm, we can enumerate over all possible random choices of the algorithm in space O(k·log(N·w)) and compute a candidate sparsifier H for each. We are guaranteed that at least one (indeed, most) of the resulting graphs H is a good sparsifier for G, but how can we identify which one? To do this, it suffices for us, given Laplacians L and L̃, to distinguish the case that L̃ is an ε-spectral approximation of L from the case that L̃ is not a 2ε-spectral approximation of L. We reduce that problem to that of approximating the spectral radius of the matrix M = ((L̃ − L)·L⁺/ε)², where L⁺ is the pseudoinverse of L, which can be approximated in nearly logarithmic space by [43]. In fact, it will be sufficient to check whether the trace of a logarithmically high power of M is below a certain threshold to deduce that the spectral radius of M does not exceed 1. In Section 5.2, we show that the latter case implies that L̃ indeed ε-approximates L.
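The following toy version (ours) illustrates the trace-based test; it uses exact pseudoinverses, which the space-bounded algorithm of Section 5.2 replaces with the approximations of [43]:

```python
import numpy as np

# Decide whether the spectral radius of M = ((Ltilde - L) L^+ / eps)^2 exceeds 1
# by comparing the trace of a high power of M against Tr(M).
def looks_like_eps_approx(L, Ltilde, eps):
    M = np.linalg.matrix_power((Ltilde - L) @ np.linalg.pinv(L) / eps, 2)
    T = np.trace(M)
    # t ~ log T / log(1 + alpha) with an illustrative slack alpha = 0.125.
    t = max(1, int(np.ceil(np.log(max(T, 2.0)) / np.log(1.125))))
    # If Tr(M^t) stays below Tr(M), no eigenvalue of M exceeds 1, certifying
    # the spectral approximation (up to the slack alpha).
    return np.trace(np.linalg.matrix_power(M, t)) <= T
```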
The deterministic, nearly logarithmic space Laplacian solver of [43] only worked for multigraphs, i.e., graphs with integer edge weights. To get our result for arbitrary weighted graphs, we extend the work of [43] and give a deterministic, nearly logarithmic space Laplacian solver for arbitrary undirected weighted graphs. Combining this extension with the k-wise independent analysis of the edge sampling algorithm (Theorem 1.3) and the verification procedure described above lets us prove our main result, Theorem 1.2.

Lower Bounds for Bounded-Independence Sampling
Having established an upper bound on the amount of independence required for the edge-sampling procedure (Theorem 1.3), a natural goal would be to come up with a corresponding lower bound. Theorem 1.3 tells us that in order to sparsify to O(n^{1+α}) expected edges, we can use k-wise independent sampling for k = 2/α. Can a substantially smaller choice of k perform just as well? In Section 4, we show that our upper bound of k = 2/α is tight up to a small constant factor. Theorem 1.4 (informal; see Theorem 4.1). For every small enough α > 0, there exist infinitely many connected graphs G = (V = [n], E) with all effective resistances equal that are d-regular with d = Ω(n^α), and a distribution D ∼ {0,1}^{|E|} that is k-wise independent for k = 4/(3α) with marginals 1/2 that would produce a disconnected graph with high probability.
Our family of "bad graphs" will be dense graphs having large girth. Namely, given a girth g and an integer d ≥ 3, we consider graphs G = (V = [n], E) satisfying d > n^{γ/g} for some constant 0 < γ < 2 [34]. Getting an infinite family of graphs with γ approaching 2 (and specifically attaining the Moore bound), even non-explicitly, has been the subject of extensive study (see [21] and references therein). See also Section 4.1 for a further discussion. Given a sparsification parameter α > 0, we set k ≈ γ/α and take a graph G on n vertices with girth g = k + 1 and degree d > n^{γ/g} + 1.
Our construction of the distribution D is inspired by Alon and Nussboim [4]: choose a partition of the vertices V = V₀ ⊔ V₁ uniformly at random, and for every edge e = (u, v) ∈ E, include it in the sample if and only if either u, v ∈ V₀ or u, v ∈ V₁. Clearly, sampling edges according to D results in a disconnected graph almost surely. However, we show that D is indeed k-wise independent, relying on the fact that the girth of G is k + 1.
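In code, the distribution D is as simple as it sounds (a sketch of ours):

```python
import numpy as np

# The lower-bound distribution D: pick a uniformly random bipartition of the
# vertices and keep an edge iff both endpoints land on the same side. The
# sampled subgraph has no edges crossing the partition, so it is disconnected
# whenever both sides are nonempty.
def sample_D(n, edges, rng=np.random.default_rng(0)):
    side = rng.integers(0, 2, size=n)              # random partition of [n]
    return [(a, b) for a, b in edges if side[a] == side[b]]
```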
To obtain Theorem 1.4, we use the family of graphs given by Lazebnik et al. [34], who obtained γ = 4/3. Indeed, any improvement in γ would bring our upper bound of k ≈ 2/α and lower bound of k ≈ γ/α closer together.

Open Problems
An interesting open problem is to achieve improved sparsity, e.g., O(n·(log n)/ε²), matching [53]. Our algorithm would require space Ω(log² n) to achieve this sparsity, due to setting k = Ω(log n). We remark that previous work implies that this can be done in randomized logarithmic space. Indeed, Doron et al. [19] gave a randomized algorithm for solving Laplacian systems in logarithmic space (without log log(·) factors), which implies that one can approximate effective resistances and, hence, implement the Spielman-Srivastava edge sampling with full independence in randomized logspace. It is also an interesting question as to whether there is a nearly logspace algorithm (even randomized) that produces spectral sparsifiers of optimal sparsity (i.e., O(n/ε²) edges).
Finally, there has been recent progress on sparsifying Eulerian digraphs in the nearly-linear time literature [13-16]. Given the recent advance of a nearly-logarithmic space solver for Eulerian Laplacian systems [1], an interesting question is sparsifying Eulerian graphs in small space.

PRELIMINARIES
We will work with undirected weighted graphs G = (V, E, w), where w is a vector of length |E| and each edge (a, b) ∈ E is associated with a positive weight w_ab > 0. At times we refer to undirected multigraphs, which are weighted graphs where all of the weights are integers. The adjacency matrix of G is a symmetric, real-valued matrix A in which A_ij = w_ij if (i, j) ∈ E and A_ij = 0 otherwise.
For any matrix A, its spectral norm ‖A‖ is max_{‖x‖₂=1} ‖Ax‖₂, which is also the largest singular value of A. For any square matrix A, its spectral radius, denoted ρ(A), is the largest absolute value of its eigenvalues. When A is real and symmetric, the spectral norm equals the spectral radius. The spectral norm is sub-multiplicative, i.e., ‖AB‖ ≤ ‖A‖·‖B‖. We denote by A^⊤ the transpose of A. We denote by 1 the all-ones vector, by 0 the all-zeros vector, and by e_a the vector with 1 in the a-th coordinate and 0 elsewhere, where e_a's dimension will be understood from context (i.e., e_a is the a-th standard basis vector).
The trace of a matrix A, denoted Tr(A), is Σᵢ A_ii, which also equals the sum of its eigenvalues. The trace is invariant under cyclic permutations, i.e., Tr(AB) = Tr(BA). The expectation of a random matrix is the matrix of the coordinate-wise expectations. More formally, if A is a random matrix, then (E[A])_ij = E[A_ij]. The trace and the expectation are both linear functions of a matrix and they commute. That is, for all random matrices A, we have that Tr(E[A]) = E[Tr(A)] (see, e.g., [47]).
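Spelled out entrywise (ours, for completeness), this commutation is just linearity:

\[
\operatorname{Tr}(\mathbb{E}[A]) = \sum_{i=1}^{n} (\mathbb{E}[A])_{ii} = \sum_{i=1}^{n} \mathbb{E}[A_{ii}] = \mathbb{E}\Big[\sum_{i=1}^{n} A_{ii}\Big] = \mathbb{E}[\operatorname{Tr}(A)].
\]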

PSD Matrices and Spectral Approximation
A symmetric matrix A ∈ ℝ^{n×n} is positive semi-definite (PSD), denoted A ⪰ 0, if for every x ∈ ℝⁿ it holds that x^⊤Ax ≥ 0 or, equivalently, if all its eigenvalues are non-negative. We write A ⪯ B whenever B − A ⪰ 0. Definition 2.1 (spectral approximation). Given symmetric PSD matrices A, B ∈ ℝ^{n×n}, we say that A ε-spectrally approximates B, denoted A ≈_ε B, if (1 − ε)·B ⪯ A ⪯ (1 + ε)·B. When A and B share an eigenvector basis v₁, ..., vₙ, Definition 2.1 is equivalent to requiring that (1 − ε)·μᵢ ≤ λᵢ ≤ (1 + ε)·μᵢ, where λ₁, ..., λₙ are the eigenvalues of A corresponding to v₁, ..., vₙ and μ₁, ..., μₙ are the eigenvalues of B corresponding to v₁, ..., vₙ.

The Moore-Penrose Pseudoinverse
Let A be any linear operator. The Moore-Penrose pseudoinverse of A, denoted A⁺, is the unique matrix that satisfies the following: AA⁺A = A, A⁺AA⁺ = A⁺, and both AA⁺ and A⁺A are Hermitian.
If A = UΣV^⊤ is the singular value decomposition (SVD) of A, the pseudoinverse is given by A⁺ = VΣ⁺U^⊤, where Σ⁺ is the matrix obtained by taking the reciprocal of each nonzero diagonal element of Σ and leaving the zeros intact. When A is a symmetric PSD matrix, the SVD coincides with the eigen-decomposition. Thus, if λ₁, ..., λₙ are the eigenvalues of A, then A⁺ shares the same eigenvector basis and has eigenvalues λ₁⁺, ..., λₙ⁺, where λᵢ⁺ = 1/λᵢ if λᵢ ≠ 0 and λᵢ⁺ = 0 otherwise. Also, note that if A is real, then A⁺ is real-valued as well.
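As a quick sanity check (ours), the SVD recipe indeed reproduces the Moore-Penrose pseudoinverse of a Laplacian:

```python
import numpy as np

# Verify A^+ = V Sigma^+ U^T on a symmetric PSD matrix with a nontrivial kernel.
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]], dtype=float)      # Laplacian of the path on 3 vertices
U, s, Vt = np.linalg.svd(L)
s_plus = np.array([1/x if x > 1e-12 else 0.0 for x in s])  # invert nonzero singular values
L_plus = Vt.T @ np.diag(s_plus) @ U.T
assert np.allclose(L_plus, np.linalg.pinv(L))   # matches numpy's Moore-Penrose pinv
```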
A square root of a matrix A is any matrix X that satisfies X² = A. When A is symmetric and PSD with eigen-decomposition A = UΣU^⊤, it has a unique symmetric PSD square root, which we write as A^{1/2} = U√ΣU^⊤, where √Σ is obtained by taking the square root of each diagonal element of Σ. We denote by A^{+/2} the matrix (A⁺)^{1/2} = (A^{1/2})⁺.

The Graph Laplacian and Effective Resistance
Given a graph G on n vertices with an adjacency matrix A and degree matrix D (i.e., D is a diagonal matrix with D_ii = Σ_j A_ij), the Laplacian of G is L = D − A. For every undirected weighted graph G = (V, E, w), its Laplacian L is symmetric and PSD, with smallest eigenvalue 0. The zero eigenvalue has multiplicity one if and only if G is connected. In this case, ker(L) = span({1}).
It is often helpful to associate G with an electric circuit, where an edge (a, b) ∈ E corresponds to a resistor of resistance 1/w_ab. For each pair of vertices a and b, the effective resistance between them, denoted by R_ab, is the energy of the electrical flow that sends one unit of current from a to b. The effective resistance can be calculated using the pseudoinverse of the Laplacian: R_ab = (e_a − e_b)^⊤ L⁺ (e_a − e_b). See [9] for more information on Laplacians and viewing graphs as electrical networks. A useful fact about effective resistances is Foster's Theorem: Theorem 2.2 ([22]). For every undirected connected weighted graph G = (V, E, w) on n vertices, it holds that Σ_{(a,b)∈E} w_ab·R_ab = n − 1. For every edge (a, b) ∈ E, define the edge Laplacian of (a, b) to be L_ab = (e_a − e_b)(e_a − e_b)^⊤ = (e_b − e_a)(e_b − e_a)^⊤. Claim 2.3. Let G = (V, E, w) be an undirected weighted graph on n vertices with Laplacian L. Fix (a, b) ∈ E and recall that L_ab = (e_a − e_b)(e_a − e_b)^⊤. Then, ‖L^{+/2} L_ab L^{+/2}‖ = Tr(L^{+/2} L_ab L^{+/2}) = R_ab.
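Foster's Theorem is easy to verify numerically; the following snippet (ours, not from the paper) checks it on a small weighted complete graph:

```python
import numpy as np

# Foster: summing w_ab * R_ab over the edges of a connected graph gives n - 1.
n = 5
rng = np.random.default_rng(1)
edges = [(a, b, float(rng.integers(1, 5))) for a in range(n) for b in range(a + 1, n)]
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w
Lp = np.linalg.pinv(L)
# R_ab = (e_a - e_b)^T L^+ (e_a - e_b) expanded entrywise:
total = sum(w * (Lp[a, a] + Lp[b, b] - 2 * Lp[a, b]) for a, b, w in edges)
assert np.isclose(total, n - 1)
```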

Bounded-Independence Sampling
Given a probability vector p ∈ [0, 1]^m, let Bernoulli(p) denote the distribution X over {0,1}^m where the bits are independent and, for each i ∈ [m], Pr[X_i = 1] = p_i. A distribution X over {0,1}^m is k-wise independent with marginals p if, for every set S ⊆ [m] of size at most k, the restriction of X to the coordinates in S is identical to the corresponding restriction of Bernoulli(p). We refer to such an X as a k-wise independent sample space with marginals p.
Throughout, when we say sampling edges in a k-wise independent manner, we refer to the process of picking an element x ∈ {0,1}^m from a k-wise independent sample space uniformly at random and taking those edges e ∈ E for which x_e = 1.
For p ∈ [0, 1]^m and a positive integer t, we define p_t to be the vector obtained by truncating every element of p after t bits. Thus, for each i ∈ [m], (p_t)_i = 2^{−t}·⌊2^t·p_i⌋. The following lemma states that we can construct small k-wise independent sample spaces with any specified marginals. Lemma 2.4 (following [3, 24]). For every m, k, t ∈ ℕ and p ∈ [0, 1]^m, there exists an explicit k-wise independent distribution X ∼ {0,1}^m with marginals p_t that can be sampled with r = O(k·max{t, log m}) truly random bits. Furthermore, given ρ ∈ {0,1}^r, the element x ∈ Supp(X) corresponding to the random bits ρ can be computed in O(k·max{t, log m}) space.

Auxiliary Claims
We will need the following claims, whose proofs we will defer to Appendix A.

Claim 2.5. Let A, B, C be n × n symmetric PSD matrices and suppose that B ⪯ C. Then, ‖A + B‖ ≤ ‖A + C‖.
Claim 2.7. Let G = (V, E, w) be an undirected connected weighted graph on n vertices with Laplacian L. Let J be the n × n matrix with 1/n in every entry and define Π = I − J (i.e., Π is the projection onto span(1)^⊥ = Im(L)). Then, we have that L^{+/2}·L·L^{+/2} = L·L⁺ = L⁺·L = Π. Claim 2.8. Let A, B, C be symmetric n × n matrices and suppose that A and B are PSD. Then, the following hold: (1) if A ≈_ε B, then CAC ≈_ε CBC; (2) if ker(C) ⊆ ker(A) = ker(B), then A ≈_ε B if and only if CAC ≈_ε CBC. The proof of the following claim can be found in [54]. Claim 2.9 ([54]). Let G be an undirected, weighted graph on n vertices with Laplacian L and minimum weight w_min. Then, the smallest nonzero eigenvalue of L is at least min{8·w_min/n², w_min/n}.

SPARSIFICATION VIA BOUNDED-INDEPENDENCE SAMPLING
In Section 1, we briefly introduced the Spielman-Srivastava sparsification algorithm [53] based on (truly) independent edge sampling, with probabilities proportional to the effective resistances of the edges. In this section, we explore the trade-off between the amount of independence used in the edge sampling process and the resulting sparsity that can be achieved.
In particular, we analyze the algorithm Sparsify (see Figure 1). The algorithm gets as input an undirected, weighted, dense graph G = (V, E, w) on n vertices, approximate effective resistances R̃_ab for each edge (a, b) ∈ E, a bounded-independence parameter k ≤ log n, a desired approximation error ε > 0, and a parameter δ > 0 governing the success probability, and outputs a sparser graph H whose Laplacian ε-spectrally approximates the Laplacian of G with probability at least 1 − 2δ.
First, we will analyze Sparsify for the case in which the effective resistances are given exactly, i.e., R̃_ab = R_ab for all (a, b) ∈ E. Then, in Section 3.2, we will analyze the more general case in which we are given approximations to the effective resistances. This latter case is useful algorithmically because more efficient algorithms are known for estimating effective resistances rather than for computing them exactly, both in the time-bounded and space-bounded settings [43, 53].

Sparsification With Exact Effective Resistances
In this section, we prove the following theorem about Sparsify. Theorem 3.1 (spectral sparsification via bounded independence). Let G = (V, E, w) be an undirected connected weighted graph on n vertices with Laplacian L and effective resistances R = {R_ab}_{(a,b)∈E}. Let 0 < ε < 1, 0 < δ < 1/2, and let k ≤ log n be an even integer. Let H be the output of Sparsify(G, R, k, ε, δ) and let L̃ be its Laplacian. Then, with probability at least 1 − 2δ, we have that (1) L̃ ≈_ε L and (2) H has at most s·(n − 1)/δ edges, where s is as set in Sparsify; for constant δ, this is O(n^{1+2/k}·(log n)/ε²) edges. Spielman and Srivastava showed that by using truly independent sampling (i.e., k = |E|) in Sparsify, one can compute an ε-spectral sparsification of G with O(n·log n/ε²) edges, with high constant probability [53]. One immediate consequence of Theorem 3.1 is that (log n)-wise independent sampling suffices to match the sparsity that truly independent sampling achieves. Another consequence of Theorem 3.1 is that for any constant 0 < α < 1 and any constant γ < α/2, for k ≈ 2/(α − 2γ), k-wise independent sampling achieves a spectral sparsifier with error ε = n^{−γ} and O(n^{1+α}) expected edges, with high constant probability.
The proof of Theorem 3.1 is modeled after Spielman and Srivastava's argument [53]. One difference is that the sparsification algorithm in [53] fixes the number of edges to be sampled in advance rather than having the number of edges be a random variable. They then prove spectral approximation by reducing the problem to a question about concentration of random matrices, which they resolve with a matrix Chernoff bound due to Rudelson and Vershynin [50]. We follow a variant of this argument for the case in which the number of edges in the sparsifier is random and use a matrix concentration bound of Chen et al. [10]. This variant, for truly independent sampling, has appeared before in [52]. Our argument deviates in the proof of Lemma 3.7 to address the fact that we only use k-wise independent sampling.
We start by showing that the sparsity guarantee in Theorem 3.1 indeed holds. Since the inclusion or exclusion of each edge in the sparsifier is a Bernoulli random variable, we can write the expected number of edges in it as Σ_{(a,b)∈E} p_ab ≤ Σ_{(a,b)∈E} s·w_ab·R_ab = s·(n − 1), where the last equality follows from Theorem 2.2 (Foster's Theorem). By Markov's inequality, we can conclude the following. Claim 3.2. With probability at least 1 − δ, the graph H has at most s·(n − 1)/δ edges.
We prove item (1) of Theorem 3.1 by the following sequence of lemmas. Throughout, we let G = (V, E, w), L, L̃, ε, and k be as in Theorem 3.1 and s, R_ab, and p_ab as in Sparsify. Let Π = I − J be the orthogonal projection onto Im(L), as in Claim 2.7. For each (a, b) ∈ E, we define the random matrix X_ab = (w_ab/p_ab)·L^{+/2} L_ab L^{+/2} if the edge (a, b) is sampled into H and X_ab = 0 otherwise, and the X_ab s are k-wise independent. That is, writing Y_ab = w_ab·L^{+/2} L_ab L^{+/2}, we have E[X_ab] = Y_ab, and we let X = Σ_{(a,b)∈E} X_ab = L^{+/2} L̃ L^{+/2}. Lemma 3.3. L̃ ≈_ε L if and only if X ≈_ε Π. Proof. By definition, L̃ ≈_ε L if and only if (1 − ε)·L ⪯ L̃ ⪯ (1 + ε)·L. Multiplying on both sides by L^{+/2} and applying Claims 2.7 and 2.8, we get that this is equivalent to (1 − ε)·Π ⪯ L^{+/2} L̃ L^{+/2} ⪯ (1 + ε)·Π. Note that Σ_{(a,b)∈E} Y_ab = L^{+/2} L L^{+/2} = Π. Thus, we have that L̃ ≈_ε L if and only if X ≈_ε Π. Lemma 3.4. Fix all the random choices in Sparsify and assume that k is even. Then, if ‖X − Π‖ ≤ ε, we have that X ≈_ε Π. Proof. First, we observe that X and Π share a common eigenbasis. 1 is in the kernel of both Π = I − J and X. Let v₂, ..., vₙ be orthogonal eigenvectors of X in span({1})^⊥ with eigenvalues λ₂, ..., λₙ, respectively. These are all also eigenvectors of Π, since span({1})^⊥ is an eigenspace of Π of eigenvalue 1. Assume that ‖X − Π‖ ≤ ε. Then, |λᵢ − 1| ≤ ε for every 2 ≤ i ≤ n. Since this holds for all eigenvalues λ₂, ..., λₙ of X and the corresponding eigenvalues of Π are 1, we conclude that X ≈_ε Π. Lemma 3.5. Fix all the random choices in Sparsify and assume that k is even. Then, if Tr((X − Π)^k) ≤ ε^k, we have that ‖X − Π‖ ≤ ε. Proof. Since k is even, (X − Π)^k has non-negative eigenvalues. If Tr((X − Π)^k) ≤ ε^k, then the sum of the eigenvalues of (X − Π)^k is at most ε^k. Hence, the largest eigenvalue of (X − Π)^k is at most ε^k and, as k is even, every eigenvalue of X − Π has absolute value at most ε; that is, ‖X − Π‖ ≤ ε. Lemma 3.6. Pr[Tr((X − Π)^k) > ε^k] ≤ Tr(E[(X − Π)^k])/ε^k. Proof. By Markov's inequality, Pr[Tr((X − Π)^k) > ε^k] ≤ E[Tr((X − Π)^k)]/ε^k. Noting that the trace and the expectation commute completes the proof. Lemma 3.7. Let r = max{k, 2·log n} and suppose s > e·r. Then, Tr(E[(X − Π)^k]) ≤ n·2^k·(e·r/s)^{k/2}.
To prove Lemma 3.7, we will use the following theorem of Chen, Gittens, and Tropp.
Theorem 3.8 ([10]). Let W₁, ..., W_m be independent, random, symmetric n × n matrices. Fix k ≥ 2 and let r = max{k, 2·log n}. Then, (E[‖Σᵢ Wᵢ‖^k])^{1/k} ≤ √(e·r)·‖Σᵢ E[Wᵢ²]‖^{1/2} + e·r·(E[maxᵢ ‖Wᵢ‖^k])^{1/k}. Proof of Lemma 3.7. Define Z_ab = X_ab − w_ab·L^{+/2} L_ab L^{+/2} and let Z = Σ_{(a,b)∈E} Z_ab = X − Π. Our goal is to bound Tr(E[Z^k]). To do this, we define Z̃_ab for each (a, b) ∈ E to be identically distributed to the Z_ab s, except that the Z̃_ab random variables are truly independent instead of only k-wise. More specifically, if we let X̃_ab be defined the same way as X_ab but this time we sample the edges in Step 4 of Sparsify truly independently (with marginals p_ab), then Z̃_ab = X̃_ab − w_ab·L^{+/2} L_ab L^{+/2}, and we let Z̃ = Σ_{(a,b)∈E} Z̃_ab. The key point to note is that both E[Z̃^k] and E[Z^k] can each be written as a sum of products of at most k random variables. As the Z_ab s are k-wise independent, we have the following. Claim 3.9. E[Z^k] = E[Z̃^k].

Towards bounding Tr(E[Z̃^k]), we use the fact that for all symmetric n × n matrices M, we have that Tr(M) ≤ n·‖M‖. Thus, Tr(E[Z̃^k]) = E[Tr(Z̃^k)] ≤ n·E[‖Z̃^k‖] ≤ n·E[‖Z̃‖^k], where the latter inequality is by the sub-multiplicativity of the spectral norm. Since Z̃ = Σ_{(a,b)∈E} Z̃_ab, we can bound the right-hand side by applying Theorem 3.8 to the Z̃_ab s. To bound the two terms on the right-hand side, we make use of the following two claims. Claim 3.10. For every (a, b) ∈ E and every matrix in the support of Z̃_ab, it holds that ‖Z̃_ab‖ ≤ 1/s.
Proof. Observe that if p_ab = 1, then Z̃_ab = 0. If p_ab < 1, then p_ab = s·w_ab·R_ab, and Z̃_ab equals either (w_ab/p_ab − w_ab)·L^{+/2} L_ab L^{+/2} (when the edge is sampled) or −w_ab·L^{+/2} L_ab L^{+/2} (when it is not). By Claim 2.3, the norms of these two matrices are w_ab·R_ab·(1/p_ab − 1) < 1/s and w_ab·R_ab = p_ab/s < 1/s, respectively. Claim 3.11. For every (a, b) ∈ E, E[Z̃_ab²] ⪯ (1/s)·w_ab·L^{+/2} L_ab L^{+/2}. Proof. As E[Z̃_ab] = 0, we can write E[Z̃_ab²] = E[X̃_ab²] − (E[X̃_ab])² ⪯ E[X̃_ab²]. Note that if p_ab = 1, then the above expectation is 0. If p_ab < 1, then E[X̃_ab²] = (w_ab²/p_ab)·(L^{+/2} L_ab L^{+/2})² = (w_ab·R_ab/p_ab)·w_ab·L^{+/2} L_ab L^{+/2} = (1/s)·w_ab·L^{+/2} L_ab L^{+/2}, using Claim 2.3 and the fact that L^{+/2} L_ab L^{+/2} is a rank-one PSD matrix. Now we can bound the first term on the right-hand side of Theorem 3.8. Together, Claims 3.11 and 2.5 give ‖Σ_{(a,b)∈E} E[Z̃_ab²]‖ ≤ (1/s)·‖Σ_{(a,b)∈E} w_ab·L^{+/2} L_ab L^{+/2}‖ = (1/s)·‖Π‖ = 1/s. To bound the second term of Theorem 3.8, we apply Claim 3.10 to get (E[max_{(a,b)∈E} ‖Z̃_ab‖^k])^{1/k} ≤ 1/s. Set r = max{k, 2·log n} = 2·log n. Combining the bounds on the two terms and applying Theorem 3.8 gives (E[‖Z̃‖^k])^{1/k} ≤ √(e·r/s) + e·r/s ≤ 2·√(e·r/s) when s > e·r. Raising both sides to the k-th power and using the sub-multiplicativity of the spectral norm, we get E[‖Z̃‖^k] ≤ 2^k·(e·r/s)^{k/2}.
For all symmetric n × n matrices M, we have that Tr(M) ≤ n·‖M‖. Thus, by the monotonicity of expectation, we get that Tr(E[Z̃^k]) = E[Tr(Z̃^k)] ≤ n·E[‖Z̃‖^k] ≤ n·2^k·(e·r/s)^{k/2}. By Claim 3.9, Tr(E[Z^k]) = Tr(E[Z̃^k]), and, thus, Tr(E[(X − Π)^k]) ≤ n·2^k·(e·r/s)^{k/2}, which proves Lemma 3.7. Now we can prove the main theorem of this section. Proof of Theorem 3.1. From Lemma 3.3, Lemma 3.4, Lemma 3.5, and Lemma 3.6, we have that L̃ ≈_ε L except with probability at most Tr(E[(X − Π)^k])/ε^k. By Lemma 3.7, we have that Tr(E[(X − Π)^k])/ε^k ≤ n·(4·e·r/(s·ε²))^{k/2}.
The above is upper bounded by δ whenever s ≥ (4·e·r/ε²)·(n/δ)^{2/k}, which is how we set s in Sparsify. Combining this with Claim 3.2, the theorem follows by the union bound.

Sparsification With Approximate Effective Resistances
Spielman and Srivastava showed that the original version of spectral sparsification through effective resistance sampling (with fully independent sampling and fixing the number of edges in advance) is robust to small changes in the sampling probabilities. In this section, we show that the same is true of Sparsify. As said, this is useful because more efficient algorithms are known for estimating effective resistances than for computing them exactly. We will also use this fact in our space-bounded algorithm for sparsification in Section 5.
The lemma below says that if we only have small multiplicative approximations to the effective resistances, then the guarantees of Theorem 3.1 still hold with a small loss in the sparsity. Lemma 3.12. Let G = (V, E, w) be an undirected connected weighted graph on n vertices with Laplacian L. Let 0 < ε < 1, 0 < δ < 1/2, and let k ≤ log n be an even integer. For each (a, b) ∈ E, let R̃_ab be such that (1 − α)·R_ab ≤ R̃_ab ≤ (1 + α)·R_ab, where R_ab is the effective resistance of (a, b) and 0 ≤ α < 1. Let H be the output of Sparsify(G, R̃, k, ε, δ) and let L̃ be its Laplacian. Then, with probability at least 1 − 2δ, we have that L̃ ≈_ε L, and the number of edges in H exceeds the guarantee of Theorem 3.1 by at most a (1 + α)/(1 − α) factor. Proof. Using R̃_ab/(1 − α) in Sparsify, our sampling probabilities become p̃_ab = min{1, s·w_ab·R̃_ab/(1 − α)} ≥ p_ab. This means that the expected sparsity of the resulting graph is at most ((1 + α)/(1 − α))·s·(n − 1). Note that by feeding Sparsify R̃_ab/(1 − α) rather than R̃_ab, we guarantee that the approximate effective resistance is an upper bound on the true effective resistance and, hence, the approximate sampling probability is an upper bound on the true sampling probability. In particular, this implies that if p_ab = 1, then p̃_ab = 1. Note that in Lemma 3.3 through Lemma 3.7, the expectations of X_ab, Z_ab, and Z̃_ab do not depend on the sampling probabilities. The sampling probabilities come up when we bound the two terms of the concentration bound in Theorem 3.8. However, because of our guarantee that p̃_ab ≥ p_ab, we get the same results. The calculation we used for the first term, given in Claim 3.11, now yields E[Z̃_ab²] ⪯ (w_ab·R_ab/p̃_ab)·w_ab·L^{+/2} L_ab L^{+/2}. Again, when p̃_ab = 1, the above is 0; otherwise, we have that w_ab·R_ab/p̃_ab ≤ 1/s, which is exactly the bound in Claim 3.11. Similarly, when adapting Claim 3.10 to the switch to p̃_ab, we incur no loss. We have that ‖Z̃_ab‖ ≤ w_ab·R_ab/p̃_ab ≤ 1/s, which matches the original bound. In fact, this lemma holds with a slightly weaker assumption. Note that we used the fact that (1 − α)·R_ab ≤ R̃_ab, but for the upper bound on the approximate effective resistances, we only need the weaker inequality R̃_ab ≤ (1 + α)·R_ab for the argument above to go through.
Note that we could equivalently define Sparsify to take approximate sampling probabilities as input (i.e., (1 − α)·p_ab ≤ p̃_ab ≤ (1 + α)·p_ab) rather than α-approximate effective resistances, and the same lemma applies.

LOWER BOUNDS FOR BOUNDED-INDEPENDENCE SAMPLING
In this section, we give a lower bound for sampling-based bounded-independence sparsification. Our lower bound will hold even for unweighted, simple, regular graphs in which all the effective resistances are the same. Thus, for this section, assume that G = (V = [n], E) is such a graph. In Section 3, we measure sparsity in terms of the number of edges in the graph. We use this measure rather than average degree because, in weighted graphs, the degree of a vertex v typically refers to the sum of the weights of the edges incident to v, whereas in sparsification algorithms we are trying to minimize the number of edges incident to v regardless of their weight. In this section, we will sometimes refer to average degree rather than number of edges. When we refer to the average degree of a weighted graph, we mean the average number of edges incident to each vertex. For simple, unweighted graphs, these quantities are the same.
Fix some α > 0. Theorem 3.1 tells us that if we want to sparsify G to within error ε and expected degree s = O(n^α·(log n)/ε²), we can do so by sampling each edge with probability p = s·(n − 1)/|E| in a k-wise independent manner, where k = 2/α (rounded to an even integer). We now prove that k ≥ 4/(3α) is essential for such a sampling procedure, at least for constant α. Theorem 4.1 (lower bound for spectral sparsification via bounded independence). Fix c > 0. For every α ∈ (0, 4/15], there exist infinitely many n's for which the following holds. There exists a connected graph G = (V, E), where V = [n], whose effective resistances are all equal, and a distribution D ∼ {0,1}^{|E|} that is k-wise independent for k = 4/(3α) with marginals 1/2 that would fail to sparsify G to within any error ε > 0 and expected degree s = c·(log n)·n^{α₀}, where α₀ ≥ (1 − 2α)·α.
More specifically, sampling a subgraph of G according to D would result in a disconnected graph with probability at least 1 − 2/2ⁿ.
We note that a disconnected graph fails to be a good spectral sparsifier of a connected graph, which is implicit in Theorem 4.1. Formally: Claim 4.2. Let G and G̃ be undirected graphs on n vertices with Laplacians L and L̃, respectively. If G is connected and G̃ is disconnected, then L̃ is not an ε-spectral approximation of L for any ε > 0.
We give a proof of Claim 4.2 in Appendix A.

Moore-like Graphs With a Given Girth
Toward proving Theorem 4.1, we will need, for every bounded-independence parameter k, an infinite family of graphs satisfying certain properties. Recall that the girth of a graph G is the length of the shortest cycle in G. We will need an infinite family of girth-g graphs having large degree. Formally: Definition 4.3. Given γ > 0 and g: ℕ → ℕ, an infinite family of graphs {Gᵢ = ([nᵢ], Eᵢ)}_{i∈ℕ} is (g, γ)-Moorish if each Gᵢ is dᵢ-regular with girth g(nᵢ) and dᵢ > nᵢ^{γ/g(nᵢ)}. The problem of finding such families of graphs, or even proving their existence in some regime of parameters, has been widely studied in extremal graph theory. A simple counting argument ([20], see also [9]) shows that (g, γ)-Moorish families of graphs can only exist when γ ≤ 2: Lemma 4.4 (the Moore bound, see, e.g., [9]). Every d-regular graph of girth g on n vertices satisfies n ≥ (d − 1)^{⌊(g−1)/2⌋}. Still, no families with γ approaching 2 for arbitrary girths are known. The Ramanujan graphs of Lubotzky et al. [40] were shown to attain γ ≥ 4/3 by Biggs and Boshier [8]. Lazebnik et al. [34] slightly improved upon [40] in the lower-order terms. However, more importantly for us, the family they construct consists of edge-transitive graphs. Theorem 4.5 ([34]). For every prime power d and even integer g ≥ 6, there exists a d-regular explicit simple, edge-transitive graph with n ≤ 2d^{g−(g−3)/4−4} vertices and girth g. In particular, for every prime power d, there exists a (g, γ = 4/3)-Moorish family of edge-transitive graphs, where Im(g) = {6, 8, ...}.
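For intuition about why Lemma 4.4 forces γ ≤ 2: in a d-regular graph of girth g, the ball of radius r = ⌊(g − 1)/2⌋ around any vertex is a tree, so counting its vertices gives (a standard sketch, in our notation)

\[
n \;\ge\; 1 + d\sum_{i=0}^{r-1}(d-1)^{i} \;\ge\; (d-1)^{r}, \qquad r = \Big\lfloor \frac{g-1}{2} \Big\rfloor,
\]

and rearranging yields d − 1 ≤ n^{1/r} ≤ n^{2/(g−2)}, so the exponent γ/g in Definition 4.3 cannot exceed 2/g up to lower-order terms.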
Intuitively, in an edge-transitive graph, the local environment of every edge (i.e., the vertices and edges adjacent to it) looks the same. More formally, an edge-transitive graph is one in which any two edges are equivalent under some element of its automorphism group. As the computation of the effective resistance is not affected by an automorphism, we can conclude the following claim. Claim 4.6. Let G = (V, E) be an unweighted edge-transitive graph. Then, for every two edges e = (a, b) and e′ = (a′, b′) in E, it holds that R_ab = R_{a′b′}.

The Lower Bound Proof
We next prove our main result for this section, showing that Moorish edge-transitive graphs cannot be sparsified via bounded-independence edge sampling when k is too small. Our proof can be seen as an extension of an argument by Alon and Nussboim [4], who studied the bounded-independence relaxation of the usual Erdős-Rényi random graph model, for which it is only required that the distribution of any subset of k edges is independent. They provide upper and lower bounds on the minimal k required to maintain properties that are satisfied by a truly random graph. In particular, they show that there exists a pairwise independent distribution D over edges with marginals 1/2 such that a random graph sampled from D is disconnected almost surely.
As a warm-up, we extend the argument in [4] and show that 3-wise independence also does not suffice, even for the special case of sparsifying the complete graph. Lemma 4.7. Let G = (V = [n], E) be the complete graph. There exists a distribution D ∼ {0,1}^{|E|} that is 3-wise independent with marginals 1/4 such that sampling a subgraph of G according to D would result in a disconnected graph with probability at least 1 − 2/2ⁿ. Proof. We first set some notation. Let G(A, p) be the usual Erdős-Rényi model, in which each edge between two vertices in A is included in the graph with probability p. Let B(A) be the natural distribution over complete bipartite graphs: Choose a partition A = A₁ ⊔ A₂ uniformly at random and include all edges between A₁ and A₂.
We construct D ∼ {0,1}^{|E|} as follows. Choose a partition [n] = V₀ ⊔ V₁ uniformly at random. On V₀, draw a graph from G(V₀, 1/2) and on V₁, draw a graph from B(V₁). Clearly, sampling a graph according to D would result in a disconnected graph unless V₀ = ∅ or V₁ = ∅, which occurs with probability at most 2/2ⁿ, so what is left to show is that D is 3-wise independent with marginals 1/4. This is equivalent to saying that for every set T ⊆ E of t ≤ 3 edges, Pr[∀e ∈ T, D(e) = 1] = 4^{−t}. Let us first consider the case |T| = 1, i.e., T = {e} for a single edge e ∈ E. Note that D(e) = 1 only if both endpoints of e appear in the same side of the partition V₀ ⊔ V₁, which occurs with probability 1/2, and, given that this occurs, e appears in G(V₀, 1/2) or B(V₁) with probability 1/2. Thus, Pr[D(e) = 1] = 1/4, as desired.
Next, fix a set T ⊆ E of t ∈ {2, 3} edges and note that we can assume without loss of generality that these edges form either a path or a triangle (for t = 3), as disjoint paths will occur independently. If T forms a path, then, similarly, Pr[∀e ∈ T, D(e) = 1] = 2·2^{−(t+1)}·2^{−t} = 4^{−t}, which is what we want. If T forms a triangle, then Pr[∀e ∈ T, D(e) = 1] = 2^{−3}·2^{−3} + 0 = 4^{−3}, using the fact that a bipartite graph is triangle-free, concluding the proof.
The above lemma shows that one cannot sparsify the complete graph via (k = 3)-wise independent edge sampling. For a general k, we indeed need to resort to Moore-like graphs.
We now give a k-wise independent distribution with marginals 1/2 that fails to yield a good spectral sparsifier for G; namely, it will be disconnected with high probability.
To do so, construct D ∼ {0,1}^{|E|} as follows. Choose a partition [n] = V₀ ⊔ V₁ uniformly at random. Each random partition gives rise to a sample D ∼ D in which, for e = (u, v) ∈ E, D(e) = 1 (i.e., the edge e is chosen to survive) if and only if either u, v ∈ V₀ or u, v ∈ V₁. Claim 4.8. The distribution D is k-wise independent with marginals 1/2. Proof. As in the proof of Lemma 4.7, it suffices to show that for every set T ⊆ E of t ≤ k edges of G, we have that Pr[∀e ∈ T, D(e) = 1] = 2^{−t}. First, similar to Lemma 4.7, note that we can assume without loss of generality that T is a connected component, since whenever T₁ and T₂ are over disjoint sets of vertices, Pr[∀e ∈ T₁ ∪ T₂, D(e) = 1] = Pr[∀e ∈ T₁, D(e) = 1]·Pr[∀e ∈ T₂, D(e) = 1]. As the girth of G is larger than t, it must be the case that T is a tree.
In such a case, where T contains no cycles, Pr[∀e ∈ T, D(e) = 1] is equal to the probability that all t + 1 vertices in T belong to the same partition, which is 2·2^{−(t+1)} = 2^{−t}. By the way that D was constructed, it is clear that sampling a subgraph G̃ of G according to D would result in a disconnected graph unless V₀ = ∅ or V₁ = ∅; hence, the sampled graph is disconnected with probability at least 1 − 2/2ⁿ, meaning that G̃ almost surely does not ε-approximate G for any ε.
We again stress that by the work in Section 3, we know that any k-wise independent distribution over the edges of G with marginals s·(n − 1)/|E| for k = 2/α would produce an ε-spectral sparsifier with expected degree O(s) with high constant probability.
The above also implies that any improvement upon Moorish families of edge-transitive graphs would improve our lower bound. Assuming the existence of a (g, γ = 2)-Moorish family of edge-transitive graphs, we are able to show that the result of Section 3 is essentially tight.

SPECTRAL SPARSIFIERS IN DETERMINISTIC SMALL SPACE
In this section, we show that Sparsify can be derandomized space-efficiently. Theorem 5.1 (deterministic small-space sparsification). Let G be an undirected, connected, weighted graph on n vertices with Laplacian L. There is a deterministic algorithm that, when given G, an even integer k, and 0 < ε < 1, outputs a weighted graph H with Laplacian L̃ satisfying (1) L̃ ≈_ε L and (2) H has O(n^{1+2/k}·(log n)/ε²) edges. The algorithm runs in space O(k·log(N·w) + log(N·w)·log log(N·w)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and N is the bitlength of the input.
We use the standard model of space-bounded computation. The machine has a read-only input tape, a constant number of read/write work tapes, and a write-only output tape. We say the machine runs in space s if, throughout the computation, it only uses s total tape cells on the work tapes. The machine may write outputs to the output tape that are larger than s (in fact, as large as 2^{O(s)}), but the output tape is write-only. We use the following fact about the composition of space-bounded algorithms (see [5, Chapter 4] or [23, Chapter 5]). Lemma 5.2. Let f₁ and f₂ be functions that can be computed in space s₁(n), s₂(n) ≥ log n, respectively, and suppose f₁ has output of length ℓ₁(n) on inputs of size n. Then, f₂ ∘ f₁ can be computed in space O(s₁(n) + s₂(ℓ₁(n))). The natural way to derandomize Sparsify would be to iterate over all elements of the corresponding k-wise independent sample space. More formally, given {p_ab}_{(a,b)∈E}, let I_ab be the indicator random variable that is 1 if and only if edge (a, b) is chosen. If the I_ab s are k-wise independent so that Pr[I_ab = 1] = p_ab (or some good approximation of p_ab), we are guaranteed to succeed with non-zero probability. Hence, at least one assignment to the I_ab s taken from the k-wise independent sample space is guaranteed to work. From Section 2.4, we know that the sample space is small enough that we can afford to enumerate over all of its elements. Toward proving Theorem 5.1, there are still three issues to consider: (1) Approximating the effective resistances R_ab for every (a, b) ∈ E space-efficiently. Fortunately, we can do this with high accuracy using the result of Murtagh et al. [43] for approximating the pseudoinverse of a Laplacian, which we state shortly. (2) Verifying that a given set of random choices in Sparsify provides a sparse and accurate approximation to the input graph. The sparsity requirement is easy to check. To check that L̃ ≈_ε L, we devise a verification algorithm that uses the algorithm of [43]. The details are given in Lemma 5.7. (3) The Laplacian solver of [43] only works for multigraphs (graphs with integer edge weights), and we want an algorithm that works for general weighted graphs. To fix this, we extend the work of [43] by giving a simple reduction from the weighted case to the multigraph case. The details can be found in Appendix B.
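Schematically, the resulting deterministic algorithm is the following enumerate-and-verify loop (a sketch of ours; build_candidate and verifies are hypothetical stand-ins for Sparsify with fixed coins and the test of Lemma 5.7, respectively):

```python
# Enumerate all seeds of the k-wise independent sample space, build each
# candidate sparsifier, and keep the first one the verifier accepts.
def derandomized_sparsify(G, r_bits, build_candidate, verifies):
    for seed in range(2 ** r_bits):          # the loop counter is only r_bits long
        H = build_candidate(G, seed)         # k-wise independent choices from seed
        if verifies(G, H):                   # deterministic spectral-proximity check
            return H                         # most seeds succeed, so one must verify
    raise RuntimeError("no candidate verified")
```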

Algorithm for Approximating Effective Resistances
A key ingredient in our deterministic sparsification algorithm is a deterministic nearly logarithmic space algorithm for approximating the pseudoinverse of an undirected Laplacian.
Theorem 5.3 ([43]). Given an undirected, connected multigraph G with Laplacian L = D − A and ε > 0, there is a deterministic algorithm that computes a symmetric PSD matrix L̂ such that L̂ ≈_ε L⁺ and uses space O(log N·log log(N/ε)), where N is the bitlength of the input (as a list of edges). Note that the space complexity above assumes that the multigraph is given as a list of edges. If we instead think of parallel edges as integer edge weights, then N should be replaced by N·w_max, where w_max is the maximum edge weight in G, since an edge of weight w gets repeated w times in the edge-list representation. To work with general weighted graphs, we extend the result of [43]. Lemma 5.4 (small-space Laplacian solver for weighted graphs). Given an undirected connected weighted graph G = (V, E, w) with Laplacian L = D − A and 0 < ε < 1, there exists a deterministic algorithm that computes a symmetric PSD matrix L̂ such that L̂ ≈_ε L⁺ and uses space O(log(N·w)·log log((N·w)/ε)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and N is the bitlength of the input.
A proof of Lemma 5.4 can be found in Appendix B. Lemma 5.4 immediately gives an algorithm for computing strong multiplicative approximations to effective resistances. Lemma 5.5. Let G = (V, E, w) be an undirected, connected, weighted graph and let R_ab be the effective resistance of (a, b) ∈ E. There is an algorithm that computes a real number R̃_ab such that (1 − ε)·R_ab ≤ R̃_ab ≤ (1 + ε)·R_ab and uses space O(log(N·w)·log log((N·w)/ε)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and N is the bitlength of the input. Proof. Let L be the Laplacian of G. By the definition of effective resistance, we have that R_ab = (e_a − e_b)^⊤ L⁺ (e_a − e_b). From Lemma 5.4, we can compute a matrix L̂ such that L̂ ≈_ε L⁺. By the definition of spectral approximation, this implies that (1 − ε)·R_ab ≤ (e_a − e_b)^⊤ L̂ (e_a − e_b) ≤ (1 + ε)·R_ab. Setting R̃_ab = (e_a − e_b)^⊤ L̂ (e_a − e_b) and noting that the vector-matrix multiplication only adds logarithmic space overhead completes the proof.

Testing for Spectral Proximity
In this section, we give our deterministic, small-space procedure for verifying that two Laplacians spectrally approximate one another. We will need the following claim about the space complexity of iterated matrix multiplication. Claim 5.6. Given an n × n matrix M of bitlength N and a positive integer t, the matrix power M^t can be computed in space O(log N·log t).
The proof of Claim 5.6 uses the natural divide-and-conquer algorithm and the fact that two matrices can be multiplied in logarithmic space.A detailed proof can be found in [43].
Using Lemma 5.4 and Claim 5.6, we prove the following lemma. The high-level idea is that testing whether two matrices L and L̃ spectrally approximate each other can be reduced to approximating the spectral radius of the particular matrix M = ((L̃ − L)·L⁺/ε)². In fact, it will be sufficient to check whether the trace of a sufficiently high power of M is below a certain threshold to deduce whether the spectral radius of M does not exceed 1. For intuition, replace the matrices with scalars m, ℓ, and ℓ̃, where m = ((ℓ̃ − ℓ)/(ε·ℓ))². Then, m ≤ 1 implies that √m ≤ 1, which implies that |ℓ̃ − ℓ| ≤ ε·ℓ, the kind of relative closeness we want between the matrices L and L̃ when aiming for spectral approximation. Lemma 5.7. There exists a deterministic algorithm that, given undirected, connected, weighted graphs G and G̃ with Laplacians L, L̃, and ε, α > 0, outputs YES or NO such that if L̃ ≈_ε L, then it outputs YES, and if L̃ is not an ε·√(1 + α)-spectral approximation of L, then it outputs NO. The algorithm runs in space O(log(N·w)·log log((N·w)/(α·ε)) + log(N·w)·log(1/α)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and G̃ and N is the bitlength of the input.
Set T = Tr(M) and t = ⌈log T/log(1 + α)⌉. The following claim shows that if we can compute Tr(M^t) exactly, then we can check the two cases in Lemma 5.7. However, we won't be able to compute Tr(M^t) exactly because that would require computing L⁺ exactly. This will be addressed later. Claim 5.8. If L̃ ≈_ε L, then Tr(M^t) ≤ T, and if L̃ is not an ε·√(1 + α)-spectral approximation of L, then Tr(M^t) > T.
Proof. Let Π = I − J be the orthogonal projection onto Im(L) = Im(L̃) (note that both L and L̃ are the Laplacians of connected graphs). Using Claims 2.7 and 2.8, we know that L̃ ≈_ε L if and only if for all v ∈ ℝⁿ we have that −ε·v^⊤Πv ≤ v^⊤L^{+/2}(L̃ − L)L^{+/2}v ≤ ε·v^⊤Πv. Note that Π and L^{+/2}(L̃ − L)L^{+/2} have the same kernel, namely span({1}), and being perpendicular to 1 is preserved under both operators. Thus, the above holds if and only if it holds on all vectors v ⊥ 1. For such vectors, we have that Πv = v; hence, v^⊤Πv = ‖v‖². Thus, we have that L̃ ≈_ε L if and only if for all vectors v ⊥ 1 we have that |v^⊤L^{+/2}(L̃ − L)L^{+/2}v| ≤ ε·‖v‖² or, equivalently, ρ(L^{+/2}(L̃ − L)L^{+/2}) ≤ ε. Note that L^{+/2}(L̃ − L)L^{+/2} is symmetric; thus, it has real eigenvalues and is similar to the matrix (L̃ − L)L⁺ on the space orthogonal to the kernel. Therefore, we can rewrite the above condition as ρ((L̃ − L)·L⁺/ε) ≤ 1. Furthermore, we have that M = ((L̃ − L)·L⁺/ε)² has real, non-negative eigenvalues.
Consider the contrapositives of the implications stated in the claim, namely: If Tr(M^t) > T, then L̃ is not an ε-spectral approximation of L, and if Tr(M^t) ≤ T, then L̃ ≈_{ε·√(1+α)} L. Now, note that if Tr(M^t) > T = Tr(M), then ρ(M) > 1, because the only way that the trace of a matrix with real non-negative eigenvalues can increase under powering is if at least one of its eigenvalues exceeds 1. Thus, on the one hand, if Tr(M^t) > T = Tr(M), then L̃ is not an ε-spectral approximation of L. On the other hand, if Tr(M^t) ≤ T, then ρ(M)^t = ρ(M^t) ≤ Tr(M^t) ≤ T, so ρ(M) ≤ T^{1/t} ≤ 1 + α by our choice of t; hence, ρ((L̃ − L)·L⁺) ≤ ε·√(1 + α) and L̃ ≈_{ε·√(1+α)} L. Recall that we cannot compute M exactly in small space because we do not know how to compute L⁺ exactly in small space. Let M̃ = ((L̃ − L)·L̂/ε)², where L̂ ≈_γ L⁺ for a sufficiently small γ (a suitable constant multiple of α). We have chosen γ so that the error of approximating L⁺ is absorbed in the slack that α provides. Now, we will show that M̃ is sufficient for our purposes. For this, we will need the following claim, whose proof is deferred to Appendix A.
Claim 5.9. Let C be a real, symmetric matrix and let A and B be real, symmetric, PSD matrices such that A ⪯ B. Suppose that ker(A) = ker(B) = ker(C). Then, ρ(CA) ≤ ρ(CB).
Next, we show that M̃ is sufficient for our purposes. Claim 5.10. We have that ρ(M̃) ≤ (1 + γ)²·ρ(M) and that ρ(M̃) ≥ (1 − γ)²·ρ(M). Thus, from the arguments above, we can distinguish the case that L̃ ≈_ε L from the case that L̃ is not an ε·√(1 + α)-spectral approximation of L by computing Tr(M̃^t) for t = ⌈log Tr(M̃)/log(1 + α)⌉ and comparing the result to Tr(M̃). Now, we prove Claim 5.10.
Proof. By assumption, we have that L̂ ≈_γ L⁺; hence, (1 − γ)·L⁺ ⪯ L̂ ⪯ (1 + γ)·L⁺. Let C = (L̃ − L)/ε and note that C is symmetric. By Claim 5.9, we have that ρ(C·L̂) ≤ ρ(C·(1 + γ)·L⁺) = (1 + γ)·ρ(C·L⁺). Noting that for all matrices A with real eigenvalues we have that ρ(A²) = ρ(A)², and that (C·L̂)² = M̃, we get ρ(M̃) ≤ (1 + γ)²·ρ(M); the lower bound follows in the same way, as desired. Thus, our distinguishing algorithm goes as follows: Approximate L̂ ≈_γ L⁺ using the Laplacian solver algorithm given in Lemma 5.4, compute Tr(M̃^t), and answer according to whether it is greater than or less than Tr(M̃). Claims 5.8 and 5.10 establish its correctness. We are left with establishing the space complexity. Now, we argue that we can (loosely) bound Tr(M̃) by poly(N·w, 1/α, 1/ε). If d_max is the maximum weighted degree of the graphs corresponding to L and L̃, then we have that the spectral norms of L and L̃ are at most 2·d_max = O(N·w_max). Claim 2.9 says that the smallest non-zero eigenvalue of L is lower bounded by w_min/n². Note that ‖L⁺‖ equals the reciprocal of the smallest non-zero eigenvalue of L; hence, we have that ‖L⁺‖ ≤ n²/w_min and, therefore, ‖L̂‖ ≤ (1 + γ)·n²/w_min = poly(N/w_min, 1/α). It follows that Tr(M̃) = poly(N·w, 1/α, 1/ε). Plugging this into the space complexity gives us the bound stated in the following claim. Claim 5.11. The distinguishing algorithm uses space O(log(N·w)·log log((N·w)/(α·ε)) + log(N·w)·log(1/α)), where w = w_max/w_min is the ratio of the maximum and minimum edge weights in G and G̃ and N is the bitlength of the input. Proof. By Lemma 5.4, we can compute L̂ in space S = O(log(N·w)·log log((N·w)/γ)) = O(log(N·w)·log log((N·w)/α)), and up to constant factors, this is also the space required to compute M̃. Note that the bitlength of M̃ is N′ = O(N + log(1/ε)) = Õ(N). Claim 5.6 and composition of space-bounded algorithms (Lemma 5.2) say that we can compute M̃^t using S′ = O(S + log N′·log t) = O(S + log N′·(log log Tr(M̃) + log(1/log(1 + α)))) space. Finally, computing Tr(M̃^t) requires summing n entries of M̃^t; thus, the required additional space can be bounded by O(log n + log(N′·t)) = O(S′).

Completing the Proof of Theorem 5.1
We can now prove Theorem 5.1. As noted above, the algorithm proceeds by first approximating the sampling probabilities and then sparsifying G, where the surviving edges are chosen from a small k-wise independent sample space whose marginals are set properly. Each potential sparsifier is checked using the algorithm given in Section 5.2.
Proof of Theorem 5.1. Set δ = 1/4 and ε′ = 4ε/5, with a precision parameter α soon to be determined. These parameters are chosen in accordance with the parameters required for Sparsify to succeed with probability 1/2 and approximation error ε′ (see Lemma 3.12). Set α′ = α/(4 + α). We compute approximate effective resistances R̃_ab for each edge (a, b) in G using Lemma 5.5, so that (1 − α′)·R_ab ≤ R̃_ab ≤ (1 + α′)·R_ab. This takes O(log(N·w)·log log((N·w)/α)) space. Then, we compute approximate sampling probabilities as follows: we truncate the required (approximate) sampling probabilities to ⌈log(1/α)⌉ bits of precision. In particular, denoting the precise sampling probabilities by p_ab = min{1, s·w_ab·R_ab}, the choice of α′ guarantees that the resistance approximation contributes an error of at most α/2. Furthermore, we have an additional error of α/2 due to the truncation; thus, |p̃_ab − p_ab| ≤ α/2 + α/2 ≤ α.
We want to set α so that p̃_ab is a multiplicative approximation to p_ab for all (a, b) ∈ E, which requires α to be smaller than min_{(a,b)∈E} p_ab. Claim 5.12. Let d_max be the maximum weighted degree over all vertices in G. Then, for all (a, b) ∈ E, p_ab ≥ 1/d_max. Proof. Since s > 1 and w_ab ≥ 1 (all edge weights are positive integers), we have that p_ab ≥ R_ab. Let λ_min(C) denote the minimal non-zero eigenvalue of a matrix C. To lower bound R_ab, we use the variational characterization of eigenvalues and the definition of effective resistance to write R_ab = (e_a − e_b)^⊤ L⁺ (e_a − e_b) ≥ λ_min(L⁺)·‖e_a − e_b‖² = 2·λ_min(L⁺) ≥ 2/(2·d_max) = 1/d_max. Note that we can indeed consider the minimal non-zero eigenvalue of L⁺ because e_a − e_b is perpendicular to the one-dimensional kernel of L (the all-ones vector).
In light of the above, we can set α so that 1/α = 2·d_max = O(N·w) and get a 1/2-multiplicative approximation to the sampling probabilities. Now, consider the k-wise independent sample space D ⊆ {0,1}^{|E|} guaranteed to us by Lemma 2.4, substituting t = ⌈log(1/α)⌉. By Lemma 2.4, each element of D can be sampled using O(k·max{t, log |E|}) = O(k·log(N·w)) space. For each element of D, we construct the corresponding sparse graph. Note that the space used to cycle through the elements can be reused. Lemma 3.12 tells us that at least a 1 − 2δ = 1/2 fraction of the Laplacians of the resulting graphs ε′-approximate the Laplacian of G and have O(n^{1+2/k}·(log n)/ε²) edges. For each of these graphs, we run the verification algorithm with accuracy parameter 9/16, which is guaranteed to find a graph with the above sparsity whose Laplacian approximates the Laplacian of G with error ε′·√(1 + 9/16) = (4ε/5)·(5/4) = ε. Again, the space used for the verification process can be reused. Adding up the space complexities gives us a total of O(k·log(N·w) + log(N·w)·log log((N·w)/ε)). Note that the final result is vacuous when ε ≤ 1/n. Thus, we can without loss of generality assume that ε ≥ 1/n. This gives a total space complexity of O(k·log(N·w) + log(N·w)·log log(N·w)).

APPENDIX A: PROOFS OF AUXILIARY CLAIMS

Proof of Claim 2.8. For item (1), fix x ∈ ℝⁿ and let y = Cx. Observing that y^⊤Ay = x^⊤CACx and y^⊤By = x^⊤CBCx and noting that x was arbitrary completes the proof of item (1).
For item (2), assume that ker(C) ⊆ ker(A) = ker(B); we will show that CAC ≈_ε CBC implies A ≈_ε B. The other direction follows from item (1). Fix x ∈ ℝⁿ. We want to show that (1 − ε)·x^⊤Bx ≤ x^⊤Ax ≤ (1 + ε)·x^⊤Bx. (2) If x ∈ ker(A) = ker(B), then the above is trivially true. Thus, without loss of generality, we can take x ∈ ker(A)^⊥ = Im(A) = Im(B). Let y = C⁺x. By assumption, we have that (1 − ε)·y^⊤CBCy ≤ y^⊤CACy ≤ (1 + ε)·y^⊤CBCy. Let Π = CC⁺. We can rewrite the above as (1 − ε)·(Πx)^⊤B(Πx) ≤ (Πx)^⊤A(Πx) ≤ (1 + ε)·(Πx)^⊤B(Πx). Note that by the definition of the Moore-Penrose pseudoinverse, Π is the projection onto ker(C)^⊥ ⊇ Im(A) = Im(B). Since we assumed without loss of generality that x ∈ Im(A) = Im(B), it follows that Πx = x. Substituting this into the above establishes Equation (2) and completes the proof.
Claim 4.2 restated. Let G and G̃ be undirected graphs on n vertices with Laplacians L and L̃, respectively. If G is connected and G̃ is disconnected, then L̃ is not an ε-spectral approximation of L for any ε > 0.
Proof. Since G is a connected, undirected graph, we have that ker(L) = span({1}). We will show that there is a vector v ∈ ker(L̃) such that v ∉ ker(L). This will complete the proof because it implies that v^⊤L̃v = 0 but v^⊤Lv ≠ 0 and, hence, the quadratic forms of these Laplacians cannot multiplicatively approximate each other.
Let C be a connected component of G̃ and let V \ C be the remaining vertices. By assumption, V \ C ≠ ∅. Let L̃(C) be the Laplacian of G̃ if all of the edges in V \ C were deleted, and define L̃(V \ C) analogously. Note that L̃ = L̃(C) + L̃(V \ C).
Define v so that vᵢ = 0 for all i ∈ V \ C and vᵢ = 1 for all i ∈ C. Then, we have that v^⊤L̃v = v^⊤L̃(C)v + v^⊤L̃(V \ C)v = 0, but v ∉ ker(L).
Claim 5.9 restated. Let C be a real, symmetric matrix and let A and B be real, symmetric, PSD matrices such that A ⪯ B. Suppose that ker(A) = ker(B) = ker(C). Then, ρ(CA) ≤ ρ(CB).
Claim A.1. Let M ∈ ℝ^{n×n} be a (possibly asymmetric) matrix and let A, B ∈ ℝ^{n×n} be symmetric PSD matrices such that A ⪯ B. Then, ρ(MAM^⊤) ≤ ρ(MBM^⊤).

APPENDIX B: A SMALL-SPACE LAPLACIAN SOLVER FOR WEIGHTED GRAPHS

Here, we prove Lemma 5.4 by extending the Laplacian solver of [43] to arbitrary edge weights. The algorithm of [43] first computes a constant spectral approximation to the pseudoinverse of a multigraph Laplacian in space O(log(N·w_max)·log log(N·w_max)). Then, they boost this to an ε-spectral approximation using an iterative method known as Richardson iteration. This latter step uses an additional O(log(N·w_max)·log log((N·w_max)/ε)) space, giving the final space complexity. The first step is more delicate and is where the authors required integer edge weights, while Richardson iteration is agnostic to the edge weights and boosts any c-spectral approximation to the pseudoinverse to an ε-spectral approximation in space O(log(N·w_max)·log log((N·w_max)/ε)), as long as c < 1/2. Now, given a weighted Laplacian L, we will construct a multigraph having Laplacian L′ such that a scalar multiple of (L′)⁺ is a constant spectral approximation to L⁺. Applying the first step of [43] to L′ will allow us to compute a constant spectral approximation to (L′)⁺, which, in turn, will be a constant spectral approximation to L⁺ by transitivity (Item (2) of Claim B.1). With this, we can boost to an ε-spectral approximation using Richardson iterations.
Set z = min{1, w_min}, where w_min is the minimum edge weight in G. Set δ = 1/6, γ = δ/(1 − δ) = 1/5, and t = ⌈log(2n³/(δ·z))⌉. We construct a multigraph G′ on n vertices as follows. For each edge (a, b) ∈ E(G), add an edge between a and b in G′ with weight ⌈2^t·w_ab⌉. Note that these new weights are all positive integers. Let L′ = D′ − A′ be the Laplacian of G′ and note that for all a, b ∈ [n], we have that |2^{−t}·A′_ab − A_ab| ≤ 2^{−t}. Since for all i, D_ii = Σ_j A_ij and 2^{−t}·D′_ii = Σ_j 2^{−t}·A′_ij, letting E = 2^{−t}·L′ − L, we can conclude that the sum of absolute values of the entries of each column of E is bounded by 2n·2^{−t} = δ·z/n². We will show that 2^{−t}·L′ ≈_δ L, which, from Claim B.1, will imply that 2^t·(L′)⁺ = (2^{−t}·L′)⁺ ≈_γ L⁺. To see that 2^{−t}·L′ ≈_δ L, fix v ∈ ℝⁿ. Since L and L′ are both Laplacians of undirected graphs with the same connectivity status, they share a kernel. Thus, for all vectors in the kernel, L and 2^{−t}·L′ have equal quadratic forms and, without loss of generality, we can assume that v is in the orthogonal complement of the kernel.
For any symmetric matrix M, we have that |v^⊤Mv| ≤ ρ(M)·v^⊤v, and ρ(M) is at most the maximum over columns of the sum of absolute values of the entries. Setting M = E and recalling our bound on the column sums of E, we get that |v^⊤Ev| ≤ (δ·z/n²)·v^⊤v. From Claim 2.9, we have that v^⊤Lv ≥ v^⊤v·z/n². Combining this with the above gives |v^⊤(2^{−t}·L′ − L)v| ≤ δ·v^⊤Lv, from which it follows that 2^{−t}·L′ ≈_δ L. Thus, 2^t·(L′)⁺ ≈_γ L⁺.
Using the first step in [43], we compute L̂, which is a γ-spectral approximation to 2^t·(L′)⁺. Note that the maximum edge weight in G′ is upper-bounded by w_max·2^t = poly(n·w_max/w_min). Thus, the invocation of [43] uses space O(log(N·w_max/w_min)·log log(N·w_max/w_min)). From Item (2) of Claim B.1, we have that L̂ ≈_{2γ+γ²} L⁺. Since 2γ + γ² < 1/2, we can apply Richardson iterations to L̂ to compute an ε-spectral approximation to L⁺ in total space O(log(N·w_max/w_min)·log log(N·w_max/(ε·w_min))), as desired.