diGRASS: Directed Graph Spectral Sparsification via Spectrum-Preserving Symmetrization

Recent spectral graph sparsification research aims to construct ultra-sparse subgraphs for preserving the original graph spectral (structural) properties, such as the first few Laplacian eigenvalues and eigenvectors, which has led to the development of a variety of nearly-linear time numerical and graph algorithms. However, there is very limited progress for spectral sparsification of directed graphs. In this work, we prove the existence of nearly-linear-sized spectral sparsifiers for directed graphs under certain conditions. Furthermore, we introduce a practically-efficient spectral algorithm (diGRASS) for sparsifying real-world, large-scale directed graphs leveraging spectral matrix perturbation analysis. The proposed method has been evaluated using a variety of directed graphs obtained from real-world applications, showing promising results for solving directed graph Laplacians, spectral partitioning of directed graphs, and approximately computing (personalized) PageRank vectors.


INTRODUCTION
Graph-based analysis is an essential technique that has been widely adopted in many electronic design automation (EDA) problems, such as the tasks for logic synthesis and verification, layout optimization, static timing analysis (STA), network partitioning/decomposition, circuit modeling and simulation, and so on. In recent years, several research problems for simplifying large graphs leveraging spectral graph theory have been extensively studied by mathematics and theoretical computer science (TCS) researchers [5,12,13,23,26,35,40]. Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that can well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian. The related results can potentially lead to the development of a variety of nearly-linear time numerical and graph algorithms for solving large sparse matrices and partial differential equations (PDEs), graph-based semi-supervised learning (SSL), computing the stationary distributions of Markov chains and personalized PageRank vectors, spectral graph partitioning and data clustering, max flow and multi-commodity flow of undirected graphs, nearly-linear time circuit simulation and verification algorithms, and so on [10,12,13,18,22,24,40,41,43].
However, there is not a unified approach that allows for truly-scalable spectral sparsification of directed graphs. For example, the state-of-the-art sampling-based methods for spectral sparsification are only applicable to undirected graphs [24,38,41]; the latest theoretical breakthrough in spectral sparsification of directed graphs [11] can only handle strongly-connected directed graphs, which inevitably limits its applications when confronting real-world graphs, since many directed graphs may not be strongly connected, such as the graphs used in chip design automation (e.g., timing analysis) tasks as well as the graphs used in machine learning and data mining tasks.
Consequently, there is still a pressing need for the development of highly-robust (theoretically-rigorous) and truly-scalable (nearly-linear complexity) algorithms for reducing real-world large-scale directed graphs while preserving key graph spectral (structural) properties. In summary, we make the following contributions: (1) We prove the existence of nearly-linear-sized spectral sparsifiers for directed graphs whose symmetrized undirected graphs only contain non-negative edge weights and introduce a practically-efficient yet unified spectral sparsification approach (diGRASS) that allows simplifying real-world, large-scale (un)directed graphs with guaranteed preservation of the original graph spectra. (2) We show that leveraging a scalable spectral matrix perturbation analysis for constructing ultra-sparse subgraphs allows us to well preserve the key eigenvalues and eigenvectors of the original directed graph Laplacians. (3) Our approach is applicable to a much broader range of directed graphs when compared with the state-of-the-art methods that may only be applicable to specific types of graphs, such as undirected or strongly-connected directed graphs. (4) Through extensive experiments on real-world directed graphs, diGRASS has been leveraged for computing PageRank vectors, spectral partitioning of directed graphs, and solving directed graph Laplacian matrices.
The spectrally-sparsified directed graphs constructed by diGRASS will potentially lead to the development of much faster numerical and graph-related algorithms. For example, spectrally-sparsified social (data) networks allow for more efficient modeling and analysis of large social (data) networks; spectrally-sparsified neural networks allow for more scalable model training and processing in emerging machine learning tasks; spectrally-sparsified web-graphs allow for much faster computations of personalized PageRank vectors; spectrally-sparsified integrated circuit networks will lead to more efficient partitioning, modeling, simulation, optimization, and verification of large chip designs, and so on.
The rest of this article is organized as follows. Section 2 describes recent works related to spectral algorithms for directed graphs and the key idea of the proposed method. Section 3 introduces the background of (un)directed graphs and spectral graph sparsification. Section 4 introduces a novel theoretical framework for unified spectral sparsification of directed graphs. Section 5 introduces a practically-efficient algorithm for spectral directed graph sparsification. Section 6 describes several important applications of the proposed diGRASS algorithm. Section 7 demonstrates comprehensive experimental results of diGRASS on a variety of real-world, large-scale directed graphs, which is followed by the conclusion of this work in Section 8.

RELATED WORKS
This section first provides a brief overview of undirected graph sparsification. Then we introduce the existing directed graph symmetrization methods that convert directed graphs into undirected ones such that existing sparsification algorithms for undirected graphs can be directly utilized. Finally, prior theoretical directed graph sparsification algorithms are briefly introduced.

Graph Sparsification
Graph sparsification aims at finding a subgraph (sparsifier) that has the same set of vertices but much fewer edges than the original graph. Several types of graph sparsifiers have been proposed for undirected graphs. Graph spanners [2-4, 34] aim to preserve the pair-wise shortest-path distances between the original graph and the sparsifier. Cut sparsifiers [7, 21] target preserving the values of graph cuts. Spectral sparsification methods preserve the graph spectral (structural) properties, such as distances between vertices, effective resistances, cuts in the graph, as well as the stationary distributions of Markov chains [12, 13, 40]. Therefore, spectral graph sparsification is a much stronger notion than cut sparsification, and more spectral-related sparsification methods have been proposed in recent years, such as spectral preservation of the pseudoinverse of the graph Laplacian [29] and linear-sized sparsifiers [28].

Directed Graph Symmetrization
When dealing with directed graphs, it is natural to convert them into undirected ones so that existing undirected graph algorithms can be subsequently leveraged. The related transformation procedures are called symmetrization methods. We will review three existing graph symmetrization methods: the A + A^T symmetrization, the bibliometric symmetrization, and the random-walk symmetrization.
-The A + A^T symmetrization simply ignores the edges' directions, which is the simplest and most efficient way for directed graph symmetrization. However, edge directions may play an important role in directed graphs. As shown in Figure 1, edges (8, 1) and (4, 5) seem to have equal importance in the symmetrized undirected graph A + A^T. However, in the original directed graph, edge (8, 1) is much more important than edge (4, 5), since removing edge (8, 1) will lead to the loss of more connections in the directed graph. For example, removing edge (4, 5) will only affect the walks from node 4 to other nodes as well as the walks from other nodes to node 5. However, if we remove edge (8, 1) in the directed graph, it will affect the walks from node 8 to any other nodes and the walks to node 1; there will also be no access from nodes 5, 6, 7, and 8 to nodes 1, 2, 3, and 4.
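The reachability argument above can be checked numerically. The sketch below uses a hypothetical 8-node graph standing in for Figure 1 (the actual figure is not reproduced here): a directed cycle 1-8 plus one shortcut edge (3, 6), so that removing (4, 5) leaves most pairs reachable while removing (8, 1) destroys all paths from the second half back to the first.

```python
from collections import deque

def reachable_pairs(adj):
    """Count ordered pairs (u, v), u != v, with a directed path u -> v."""
    count = 0
    for s in adj:
        seen, q = {s}, deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        count += len(seen) - 1
    return count

# hypothetical stand-in for the Figure 1 graph: 8-cycle plus shortcut (3, 6)
edges = [(1,2),(2,3),(3,4),(4,5),(5,6),(6,7),(7,8),(8,1),(3,6)]

def build(edge_list):
    adj = {u: [] for u in range(1, 9)}
    for a, b in edge_list:
        adj[a].append(b)
    return adj

full = reachable_pairs(build(edges))
drop_45 = reachable_pairs(build([e for e in edges if e != (4, 5)]))
drop_81 = reachable_pairs(build([e for e in edges if e != (8, 1)]))
print(full, drop_45, drop_81)  # -> 56 43 28
```

Removing (8, 1) halves the number of reachable pairs, while removing (4, 5) only loses the pairs involving node 4's reach and node 5's in-neighbors, even though both edges look identical after A + A^T symmetrization.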
-The bibliometric symmetrization [37] adopts AA^T + A^T A as the adjacency matrix after symmetrization to take both in-going and out-going edges into consideration. However, it cannot scale to large graphs, since it creates much denser undirected graphs after symmetrization. Also, disconnected graphs can be produced by the AA^T + A^T A symmetrization, as shown in Figure 1.
-The random-walk symmetrization [11] is based on random walks and allows normalized cuts to be preserved after symmetrization. This symmetrization approach has also been used in recent work for defining the Laplacian matrix of directed graphs. When defining the Laplacian matrix, we can apply the random walk for aperiodic graphs, or the lazy random walk scheme for periodic graphs. In [11], Cheeger's inequality has been extended to directed graphs and plays a significant role in the spectral analysis of directed graphs: it connects the Cheeger constant (conductance) with the spectral properties (eigenvalues of the graph Laplacian) of a graph, and it provides a bound for the smallest eigenvalue of the directed graph Laplacian. However, the related theoretical results can only be applied to strongly-connected, aperiodic directed graphs, which are rare in real-world applications.

Directed Graph Sparsification Algorithms
Refs. [13] and [12] expanded the scope to handle not only strongly-connected graphs but also Eulerian graphs. However, there are obvious limitations to this approach. For example, an arbitrary directed graph needs to be converted into an Eulerian graph via an Eulerian scaling procedure by introducing additional edges, changing the directions of edges, or reweighting edges, which may jeopardize the original graph's spectral properties [13]. In addition, the Eulerian scaling is very time-consuming for large-scale graphs. Lastly, even though the complexity of these algorithms is nearly linear, they are still not fast in practice for applications such as solving asymmetric linear systems, computing the stationary distribution of a Markov chain, or computing expected commute times in a directed graph.
In [9], the authors design cut sparsifiers and sketches for directed graphs. The cut is a property shared by undirected and directed graphs. The construction of cut sparsifiers for directed graphs depends on the cut balance, which is the ratio between the incoming and outgoing edges across any given cut.

BACKGROUND

Definitions and Preliminaries
Undirected graph. Consider a weighted, undirected graph G = (V, E, ω) with n = |V| and m = |E|, where V denotes the set of vertices, E denotes the set of edges, and ω denotes a weight function that assigns a positive weight to each edge. The adjacency matrix of graph G can be defined as follows:

A_G(p, q) = ω(p, q) if (p, q) ∈ E, and A_G(p, q) = 0 otherwise. (1)

The Laplacian matrix can be computed by

L_G = D_G − A_G, (2)

where D_G is a diagonal matrix with elements D_G(p, p) = Σ_t ω(p, t). For any real vector x ∈ R^n, the Laplacian quadratic form of graph G is defined as

x^T L_G x = Σ_{(p,q)∈E} ω(p, q) (x(p) − x(q))^2.
Recall that the undirected graph Laplacian is defined in Equation (2). Alternatively, the undirected graph Laplacian can also be written as L_G = B^T W B [38], where W is the m × m diagonal matrix with W(p, p) being the weight of the pth edge, and B is the m × n edge-vertex incidence matrix with B(p, v) = 1 if v is the pth edge's head, B(p, v) = −1 if v is the pth edge's tail, and B(p, v) = 0 otherwise.

Directed graph. Consider a directed graph G = (V, E_G, w_G) with V denoting the set of vertices, E_G the set of directed edges, and w_G the associated edge weights. Let n = |V| and m = |E_G| be the sizes of the node and edge sets. In the following, we denote by D_G the diagonal matrix with D_G(i, i) equal to the (weighted) outdegree of node i, and by A_G the adjacency matrix of G:

A_G(i, j) = w_G(i, j) if (i, j) ∈ E_G, and A_G(i, j) = 0 otherwise. (5)

Then the directed Laplacian matrix can be constructed as follows [13, 14]:

L_G = D_G − A_G^T. (6)

The directed graph Laplacian matrix can also be constructed as L_G = B^T W C, where W and B are defined as above, while C is an edge-vertex incidence (injection) matrix defined as follows: C(p, v) = 1 if v is the pth edge's head, and C(p, v) = 0 otherwise.
For better illustration, we have summarized the most frequently used symbols of this article in Table 1. It can be shown that any directed (undirected) graph Laplacian constructed using Equation (6) will satisfy the following properties: (I) each column sum (and, for undirected graphs, each row sum) is equal to zero; (II) all off-diagonal elements are non-positive; (III) the Laplacian matrix is asymmetric (symmetric) and indefinite (positive semidefinite).
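The three properties can be verified numerically on a toy directed graph. The sketch below assumes the convention L_G = D_G − A_G^T with D_G holding weighted out-degrees, as one common choice consistent with [13, 14]; the five-edge example graph is hypothetical.

```python
import numpy as np

# toy directed graph; each tuple is (tail, head, weight)
n = 4
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.0), (2, 3, 0.5), (3, 0, 1.0)]:
    A[i, j] = w

D = np.diag(A.sum(axis=1))   # weighted out-degrees on the diagonal
L = D - A.T                  # assumed directed Laplacian convention

print(np.allclose(L.sum(axis=0), 0))          # (I)  every column sums to zero
print(np.all(L - np.diag(np.diag(L)) <= 0))   # (II) off-diagonals non-positive
print(np.allclose(L, L.T))                    # (III) asymmetric -> prints False
```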

Spectral Graph Sparsification
The spectral sparsifier was first introduced by Spielman and Teng [40]. Given an undirected graph with n vertices and m edges, a nearly-linear time algorithm was introduced in [38] for building (1 ± ϵ) spectral sparsifiers with O(n log n/ϵ^2) edges. S is said to be a (1 ± ϵ) spectral sparsifier of G if the following inequality holds for any x ∈ R^n:

(1 − ϵ) x^T L_S x ≤ x^T L_G x ≤ (1 + ϵ) x^T L_S x, (7)

where L_G and L_S denote the symmetric diagonally dominant (SDD) Laplacian matrices of graphs G and S, respectively. The key to the analysis of a spectral sparsifier S is to observe the following inequality [6]:

(1/σ) x^T L_S x ≤ x^T L_G x ≤ σ x^T L_S x, (8)

where x ∈ R^n. The relative condition number then satisfies κ(L_G, L_S) ≤ σ^2, implying that a smaller relative condition number (or σ^2) corresponds to a higher (better) spectral similarity between the two graphs.
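The relative condition number can be computed directly for small graphs from the generalized eigenvalues of the pencil (L_G, L_S). The sketch below uses hypothetical toy graphs (a 5-node cycle with chords, sparsified to a spanning path) and a dense pseudoinverse for illustration.

```python
import numpy as np

def laplacian(n, edges):
    """Undirected Laplacian from (i, j, w) edge tuples."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

n = 5
G_edges = [(0,1,1),(1,2,1),(2,3,1),(3,4,1),(4,0,1),(0,2,1),(1,3,1)]
S_edges = [(0,1,1),(1,2,1),(2,3,1),(3,4,1)]     # spanning-tree sparsifier
LG, LS = laplacian(n, G_edges), laplacian(n, S_edges)

# nonzero eigenvalues of L_S^+ L_G give the relative condition number
eigs = np.sort(np.linalg.eigvals(np.linalg.pinv(LS) @ LG).real)
nonzero = eigs[eigs > 1e-9]
kappa = nonzero.max() / nonzero.min()
print(kappa >= 1.0)   # a smaller kappa means better spectral similarity
```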

A THEORETICAL FRAMEWORK FOR UNIFIED SPECTRAL SPARSIFICATION
In this section, we provide a novel method to convert a directed graph into an undirected one via the proposed L_G L_G^T symmetrization. We also introduce the spectral properties of the new symmetrization scheme, as well as the proof of the existence of nearly-linear-sized spectral sparsifiers for directed graphs under certain conditions.

Our Contribution: The L_G L_G^T Symmetrization Scheme [44]
For directed graphs, the subgraph S can be considered spectrally similar to the original graph G if the condition number, i.e., the ratio between the largest and smallest singular values of L_S^+ L_G, is close to 1, where L_S^+ denotes the Moore-Penrose pseudoinverse of L_S. Spectral sparsification of directed graphs is thus equivalent to finding an ultra-sparse subgraph S such that the condition number of L_{S_u}^+ L_{G_u} remains small, which is justified by the following theorem.

Theorem 2. For any (un)directed graph G, L_{G_u} = L_G L_G^T is the Laplacian matrix of an undirected graph G_u, where L_{G_u} is positive semi-definite (PSD) and will have the all-one vector in its null space.
Proof. The row sum of the Laplacian matrix L_{G_u} equals zero, which can be proved as follows:

L_{G_u} 1 = L_G L_G^T 1 = L_G (L_G^T 1) = 0, (10)

which indicates that the all-one vector lies in the null space of L_{G_u}. Meanwhile, the column sum of L_{G_u} equals zero, which can be proved in the same way. Also, it is straightforward to prove that L_{G_u} is PSD, since x^T L_{G_u} x = ||L_G^T x||^2 ≥ 0 for any x ∈ R^n.

It can also be shown that G_u will contain negative edge weights under certain conditions. Moreover, the edges will be coupled together and result in a denser graph underlying L_{G_u} if a node has more than one outgoing edge. As an example shown in Figure 2, when edge e2 is added into the initial graph G that includes a single edge e1, an extra edge (shown as a red dashed line) coupling with e1 will be created in the resultant undirected graph G_u; similarly, when an edge e3 is further added, two extra edges coupling with e1 and e2 will be created in G_u. When the last edge e4 is added, G_u forms a clique.
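The null-space, PSD, and edge-coupling claims can be checked on a small hypothetical graph, again assuming the convention L_G = D_G − A_G^T:

```python
import numpy as np

# node 0 has two outgoing edges (to 1 and 2), so coupling is expected
n = 4
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (3, 0, 1.0)]:
    A[i, j] = w
L = np.diag(A.sum(axis=1)) - A.T
Lu = L @ L.T                                  # L_G L_G^T symmetrization

ones = np.ones(n)
print(np.allclose(Lu @ ones, 0))              # all-one vector in the null space
print(np.allclose(Lu, Lu.T))                  # symmetric
print(np.all(np.linalg.eigvalsh(Lu) > -1e-9)) # positive semidefinite
# the two edges leaving node 0 couple nodes 1 and 2 in G_u:
print(Lu[1, 2] != 0)
```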

Why not L_G^T L_G Symmetrization?
The L_G L_G^T symmetrization is a novel spectrum-preserving Laplacian symmetrization procedure for converting directed graphs into undirected ones. In contrast, L_G^T L_G does not serve this purpose, since it does not correspond to the Laplacian of an undirected graph: because the row sums of L_G are not zero, the product of L_G^T L_G with the all-one vector will not vanish, as shown below:

(L_G^T L_G) 1 = L_G^T (L_G 1) ≠ 0.
Although L_G^T L_G is a PSD matrix, the all-one vector is not in its null space, and thus existing methods for spectral sparsification of undirected graphs [5,43] cannot be exploited for sparsifying directed graphs.
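The contrast between the two products is easy to demonstrate on the same hypothetical toy graph used above:

```python
import numpy as np

n = 4
A = np.zeros((n, n))
for (i, j, w) in [(0, 1, 1.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (3, 0, 1.0)]:
    A[i, j] = w
L = np.diag(A.sum(axis=1)) - A.T              # assumed convention L_G = D_G - A_G^T

ok_fwd = np.allclose((L @ L.T) @ np.ones(n), 0)
ok_rev = np.allclose((L.T @ L) @ np.ones(n), 0)
print(ok_fwd)   # True:  L_G L_G^T annihilates the all-one vector
print(ok_rev)   # False: L_G^T L_G does not (row sums of L_G are nonzero)
```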

Existence of Nearly-linear-sized Spectral Sparsifier
In this section, we prove the existence of nearly-linear-sized spectral sparsifiers for directed graphs under the condition that their corresponding undirected graphs (obtained through the proposed Laplacian symmetrization scheme) only contain non-negative edge weights.
Lemma 3. Let ϵ > 0, and let u_1, u_2, ..., u_m denote a set of vectors in R^n that allow expressing the identity decomposition as [5]

Σ_{1≤i≤m} u_i u_i^T = id_{R^n}, (13)

where id_{R^n} ∈ R^{n×n} denotes the identity matrix. Then there exists a series of non-negative coefficients t_i, of which at most O(n/ϵ^2) are nonzero, such that

(1 − ϵ) id_{R^n} ⪯ Σ_{1≤i≤m} t_i u_i u_i^T ⪯ (1 + ϵ) id_{R^n}. (14)

Theorem 4. For an undirected graph G_u obtained via the proposed Laplacian symmetrization, whose Laplacian can be expressed as a combination of PSD matrices, there exists a subgraph S_u such that the corresponding undirected graph Laplacian L_{S_u} = L_S L_S^T satisfies the following condition for any x ∈ R^{|V|} [27]:

(1 − ϵ) x^T L_{G_u} x ≤ x^T L_{S_u} x ≤ (1 + ϵ) x^T L_{G_u} x. (15)

Proof. Lemma 3 proves the existence of the sparsifier for an undirected graph with non-negative edge weights. Given a directed graph G with m_G edges, it can be shown that the Laplacian of the symmetrized undirected graph G_u can be expressed as a combination of m_G PSD matrices rather than m_{G_u} PSD matrices. The key of our approach is to construct a set of vectors u_1, ..., u_{m_G} in R^{|V|} such that the u_i admit an identity decomposition as in Equation (13). Since L_{G_u} and its Moore-Penrose inverse (pseudoinverse) L_{G_u}^+ can be written as

L_{G_u} = Σ_i λ_i v_i v_i^T, L_{G_u}^+ = Σ_{λ_i ≠ 0} (1/λ_i) v_i v_i^T, (16)

where the vectors v_i and scalars λ_i are the eigenvectors and eigenvalues of L_{G_u}, respectively, it can be shown that

(L_{G_u}^+)^{1/2} L_{G_u} (L_{G_u}^+)^{1/2} = id_{L_{G_u}}, (17)

where id_{L_{G_u}} is the identity on im(L_{G_u}) = ker(L_{G_u})^⊥. In the following, we show how to construct the vectors u_i for i = 1, ..., m_G. The undirected Laplacian after symmetrization can be written as

L_{G_u} = L_G L_G^T = B^T W_o B, with W_o = W C C^T W, (18)

where B, C, and W are defined as in Section 3. Consequently, the n × m_G matrix U with u_i for i = 1, ..., m_G as its column vectors can be constructed as

U = (L_{G_u}^+)^{1/2} B^T W_o^{1/2}. (19)

It can be shown that U will satisfy the following equation:

U U^T = (L_{G_u}^+)^{1/2} L_{G_u} (L_{G_u}^+)^{1/2} = id_{L_{G_u}}. (20)

According to Lemma 3, we can always construct a diagonal matrix T ∈ R^{m_G × m_G} with t_i as its ith diagonal element, such that there will be at most O(n/ϵ^2) positive diagonal elements in T, which allows constructing L_{S_u} as

L_{S_u} = B^T W_o^{1/2} T W_o^{1/2} B, (21)

which corresponds to the directed subgraph S achieving the (1 + ϵ)-spectral approximation of G required by Equation (15). Since matrix W_o is a symmetric positive semidefinite matrix and T is a non-negative diagonal matrix, L_{S_u} can be further written as

L_{S_u} = (B^T W_o^{1/2} T^{1/2}) (B^T W_o^{1/2} T^{1/2})^T = L_S L_S^T. (22)

Therefore, the Laplacian of the directed graph S can be expressed as L_S = B^T W_o^{1/2} T^{1/2}. Also, the following inequality holds for any x ∈ R^{|V|}:

(1 − ϵ) x^T L_{G_u} x ≤ x^T L_{S_u} x ≤ (1 + ϵ) x^T L_{G_u} x. (23)

Since Σ_i t_i u_i u_i^T = U T U^T, Lemma 3 gives

(1 − ϵ) id_{L_{G_u}} ⪯ U T U^T ⪯ (1 + ϵ) id_{L_{G_u}}, (24)

while Equation (23) can be proved through the following steps based on the Courant-Fischer theorem: noting that B^T W_o^{1/2} = L_{G_u}^{1/2} U on im(L_{G_u}), we have x^T L_{S_u} x = (L_{G_u}^{1/2} x)^T U T U^T (L_{G_u}^{1/2} x), and applying the bounds in Equation (24) to the vector L_{G_u}^{1/2} x yields

(1 − ϵ) x^T L_{G_u} x ≤ x^T L_{S_u} x ≤ (1 + ϵ) x^T L_{G_u} x. (25)

Equation (25) demonstrates that S_u is a (1 + ϵ)-spectral sparsifier of graph G_u under the stated conditions.
Theorem 4 proves that there exists an undirected graph S_u and establishes the connection between S_u and the original directed graph G. The next key step is to show that L_{S_u} can be factorized into the product of a directed graph Laplacian L_S and its transpose. Also, Theorem 4 does not immediately imply a sparse structure in L_S. To prove the existence of a nearly-linear-sized spectral sparsifier for a directed graph, we further assume that the undirected graph G_u obtained through the proposed Laplacian symmetrization only contains edges with non-negative weights. Then the following Lemma 5 can be exploited to prove the existence of a nearly-linear-sized L_S based on Equation (22):

Lemma 5 (SparseCholesky Algorithm [25]). Given an n × n undirected Laplacian matrix L_u with O(m) non-positive off-diagonal elements (non-negative edge weights), the SparseCholesky algorithm [25] runs in expected time O(m log^3 n) and computes a permutation Π, a lower triangular matrix L with O(m log^3 n) nonzero entries, and a diagonal matrix D such that, with probability 1 − 1/poly(n), Z = Π L D L^T Π^T is a constant-factor spectral approximation of L_u, where Z has a sparse Cholesky factorization.
Reference [25] provides a nearly-linear time algorithm for constructing L using the sparsified Cholesky factorization method, which is a process of constructing the clique structure of the Schur complement. More importantly, [25] demonstrates that L is a lower triangular matrix, which always corresponds to a feed-forward directed graph. As a result, we can conclude that, given the Laplacian matrix of an undirected graph with non-negative edge weights, it can always be factorized into the product of a nearly-linear-sized directed Laplacian matrix and its transpose through the LDL^T decomposition (sparsified Cholesky factorization).
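The factorization idea can be sketched with a dense LDL^T decomposition standing in for the nearly-linear SparseCholesky algorithm of [25]; the 4-node cycle Laplacian below is a toy example. The triangular factor times the square root of D plays the role of the directed (feed-forward) Laplacian factor.

```python
import numpy as np
from scipy.linalg import ldl

# undirected Laplacian of a 4-node cycle (non-negative edge weights)
Lu = np.array([[ 2., -1.,  0., -1.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [-1.,  0., -1.,  2.]])

low, d, perm = ldl(Lu, lower=True)        # dense stand-in for SparseCholesky
print(np.allclose(low @ d @ low.T, Lu))   # exact reconstruction L D L^T

# "directed" triangular factor: Lu = C C^T with C (permuted) lower triangular
C = low @ np.sqrt(np.maximum(d, 0))       # clamp tiny negative pivots
print(np.allclose(C @ C.T, Lu))
```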
Combining Theorem 4 and Lemma 5 allows us to prove the following main theorem.

Theorem 6. For any given directed graph G = (V, E_G, w_G), when its undirected graph G_u = (V, E_{G_u}, w_{G_u}) obtained via the proposed Laplacian symmetrization only contains non-negative edge weights, there exists a (1 + ϵ)-spectral sparsifier S = (V, E_S, w_S) with O(n log^3 n/ϵ^2) edges.
However, the above theorem has its own limitations: for example, unlike the works of [11,13,14], it does not guarantee preservation of cuts in the sparsifier.

DIGRASS: A PRACTICALLY-EFFICIENT ALGORITHM FOR SPECTRAL SPARSIFICATION OF DIRECTED GRAPHS
To apply our theoretical results to real-world directed graphs, the following concerns should be addressed in advance:
-The undirected graph Laplacian L_G L_G^T may become too dense to compute and thus may impose high cost during spectral sparsification. As introduced in Section 4.1, the L_G L_G^T symmetrization scheme will create extra edges if a node has more than one outgoing edge in L_G, as shown in Figure 2. It may be possible to directly perform the symmetrization on L_G if L_G is relatively sparse. However, for general cases where L_G may be very dense, the generated L_{G_u} will be much denser due to the edge coupling effect, which will inevitably impose high computational and memory cost on the subsequent spectral sparsification procedure. To achieve a general algorithmic framework for handling directed graphs with various densities, it is necessary to address this issue in the framework design.
-It can be quite challenging to convert the sparsified undirected graph back to its corresponding directed sparsifier L_S, even when L_{S_u} is available.

There is no guarantee that L_S is in one-to-one correspondence with L_{S_u}. For example, multiple directed Laplacians L_S may correspond to the same symmetrized undirected graph Laplacian L_{S_u}, so converting L_{S_u} back to L_S can be challenging even when L_{S_u} is available. The coupling edges generated during symmetrization make the situation even harder.
To address the above concerns for unified spectral graph sparsification, we propose a practically-efficient framework with the following desired features: (1) our approach does not require explicitly computing L_G L_G^T but only matrix-vector multiplications; (2) our approach can effectively identify the most spectrally-critical edges for dramatically decreasing the relative condition number; (3) although our approach requires computing L_S L_S^T, the density of the L_{S_u} matrix can be effectively controlled by carefully pruning spectrally-similar edges through the proposed edge similarity checking scheme.

Initial Sparsifier Construction
Motivated by recent research on low-stretch spanning trees [1,17] and spectral perturbation analysis [18,43] for nearly-linear-time spectral sparsification of undirected graphs, we propose a practically-efficient algorithm for sparsifying general directed graphs by first constructing the initial subgraph sparsifiers of directed graphs through the following steps:
-Step 1: Compute D^{-1}(A_G + A_G^T) as a new adjacency matrix, where D denotes the diagonal matrix with each element equal to the row (column) sum of (A_G + A_G^T). Recent research shows that such split transformations can effectively reduce graph irregularity while preserving critical graph connectivity, the distances between node pairs, the minimal edge weight in a path, as well as outdegrees and indegrees when using push-based and pull-based vertex-centric programming [33].
-Step 2: Construct a maximum spanning tree (MST) based on D^{-1}(A_G + A_G^T), which allows effectively controlling the number of outgoing edges for each node so that the resultant undirected graph after symmetrization will not be too dense.
-Step 3: Recover the direction of each edge in the MST, and make sure each node of the sparsifier has at least one outgoing edge if it has more than one in the original graph, for achieving stronger connectivity in the initial directed sparsifier.
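The three steps above can be sketched in a few lines of pure Python; this is a simplified stand-in (Kruskal on the symmetrized normalized weights, without Step 3's extra outgoing-edge augmentation), and the example edge list is hypothetical.

```python
import numpy as np

def initial_sparsifier(n, directed_edges):
    """Symmetrize + normalize, build a maximum spanning tree, restore directions."""
    A = np.zeros((n, n))
    for i, j, w in directed_edges:
        A[i, j] = w
    sym = A + A.T
    d = sym.sum(axis=1)
    norm = sym / d[:, None]                       # Step 1: D^-1 (A_G + A_G^T)
    # Step 2: Kruskal maximum spanning tree on the normalized weights
    cand = sorted(((norm[i, j] + norm[j, i], i, j)
                   for i in range(n) for j in range(i + 1, n) if sym[i, j] > 0),
                  reverse=True)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in cand:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    # Step 3 (partial): recover the original directions of the tree edges
    out = []
    for i, j in tree:
        if A[i, j] > 0: out.append((i, j))
        if A[j, i] > 0: out.append((j, i))
    return out

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 0.5), (2, 3, 1.0), (3, 1, 0.2)]
result = initial_sparsifier(4, edges)
print(result)   # -> [(1, 2), (2, 3), (0, 1)]
```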

Spectral Sensitivity of Off-subgraph Edges
As aforementioned, when the condition number of L_{S_u}^+ L_{G_u} is small, the condition number of L_S^+ L_G will also be small, indicating that graph S is a good spectral sparsifier of graph G. To this end, we exploit the following spectral perturbation analysis framework for computing the spectral sensitivity of each off-subgraph edge. For the generalized eigenvalue problem

L_{G_u} v_i = λ_i L_{S_u} v_i, (27)

let matrix V = [v_1, ..., v_n]. Then v_i and λ_i can be constructed to satisfy the following orthogonality requirement:

v_i^T L_{S_u} v_j = 1 if i = j, and v_i^T L_{S_u} v_j = 0 otherwise. (28)

Consider the following first-order generalized eigenvalue perturbation problem:

L_{G_u} (v_i + δv_i) = (λ_i + δλ_i)(L_{S_u} + δL_{S_u})(v_i + δv_i), (29)

where a small perturbation δL_{S_u} of L_{S_u} is introduced, leading to the perturbed generalized eigenvalues and eigenvectors λ_i + δλ_i and v_i + δv_i. By only keeping the first-order terms, Equation (29) becomes

L_{G_u} δv_i = λ_i δL_{S_u} v_i + λ_i L_{S_u} δv_i + δλ_i L_{S_u} v_i. (30)

Let δv_i = Σ_j ψ_{i,j} v_j; then Equation (30) can be expressed as

Σ_j ψ_{i,j} λ_j L_{S_u} v_j = λ_i δL_{S_u} v_i + λ_i Σ_j ψ_{i,j} L_{S_u} v_j + δλ_i L_{S_u} v_i. (31)

Based on the orthogonality properties in Equation (28), multiplying v_i^T to both sides of Equation (31) results in

ψ_{i,i} λ_i = λ_i v_i^T δL_{S_u} v_i + λ_i ψ_{i,i} + δλ_i, (32)

which further leads to

δλ_i = −λ_i v_i^T δL_{S_u} v_i. (33)

The task of spectral sparsification of general (un)directed graphs then requires recovering as few as possible off-subgraph edges into the initial directed subgraph S such that the largest eigenvalues, or the condition number of L_{S_u}^+ L_{G_u}, can be dramatically reduced. Expanding δL_{S_u} with only the first-order terms gives

δL_{S_u} = δL_S L_S^T + L_S δL_S^T, (34)

where δL_S = w_G(p, q) e_{p,q} e_p^T for (p, q) ∈ E_G \ E_S, e_p ∈ R^n denotes the vector with only the pth element being 1 and others being 0, and e_{p,q} = e_p − e_q. The spectral sensitivity for each off-subgraph edge (p, q) can be expressed as

ψ_{p,q} = λ_i v_i^T (δL_S L_S^T + L_S δL_S^T) v_i. (35)

It is obvious that Equation (35) can be leveraged to rank the spectral importance of each off-subgraph edge. Consequently, spectral sparsification of general graphs can be achieved by only recovering a few dissimilar off-subgraph edges with large spectral sensitivity values. In this work, the following method based on t-step power iterations is proposed for efficient computation of dominant generalized eigenvectors:

h_t = (L_{S_u}^+ L_{G_u})^t h_0, (36)

where h_0 denotes a random vector. When the number of power iterations is small (e.g., t ≤ 3), h_t will be a linear combination of the first few dominant generalized eigenvectors corresponding to the largest few eigenvalues. Then the spectral sensitivity for the off-subgraph edge (p, q) can be approximately computed by

ψ_{p,q}(h_t) = h_t^T (δL_S L_S^T + L_S δL_S^T) h_t. (37)

The computation of h_t through power iterations requires solving the linear system of equations L_{S_u} x = b for t times. Note that only L_{S_u} needs to be explicitly computed for the generalized power iterations. The Lean Algebraic Multigrid (LAMG) solver [30] is leveraged for computing h_t; it can handle undirected graphs with negative edge weights and has an empirical O(|E_{S_u}|) complexity for solving the Laplacian matrices L_{S_u}.
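The t-step power iteration and the resulting edge sensitivity can be sketched as follows. A dense pseudoinverse stands in for the LAMG solve, the graphs are hypothetical toy examples, and L_G = D_G − A_G^T is assumed as before.

```python
import numpy as np

def laplacian_dir(n, edges):
    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] = w
    return np.diag(A.sum(axis=1)) - A.T

n = 4
LG = laplacian_dir(n, [(0,1,1.0),(1,2,1.0),(2,3,1.0),(3,0,1.0),(0,2,1.0)])
LS = laplacian_dir(n, [(0,1,1.0),(1,2,1.0),(2,3,1.0),(3,0,1.0)])  # spanning subgraph
LGu, LSu = LG @ LG.T, LS @ LS.T

rng = np.random.default_rng(0)
h = rng.standard_normal(n)
h -= h.mean()                          # project off the all-one null space
LSu_pinv = np.linalg.pinv(LSu)         # replace with an LAMG solve at scale
for _ in range(3):                     # t = 3 power iterations
    h = LSu_pinv @ (LGu @ h)
    h /= np.linalg.norm(h)

def sensitivity(h, LS, p, q, w):
    """Approximate spectral sensitivity h^T (dLS LS^T + LS dLS^T) h."""
    e_p = np.zeros(len(h)); e_p[p] = 1.0
    e_pq = e_p.copy(); e_pq[q] = -1.0
    dLS = w * np.outer(e_pq, e_p)      # perturbation for off-subgraph edge (p, q)
    return h @ (dLS @ LS.T + LS @ dLS.T) @ h

psi = sensitivity(h, LS, 0, 2, 1.0)    # sensitivity of off-subgraph edge (0, 2)
print(np.isfinite(psi))
```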

Lean Algebraic Multigrid (LAMG)
The setup phase of LAMG contains two main steps [30], as shown in Figure 3. First, a nodal elimination procedure is performed to eliminate disconnected and low-degree nodes. Next, a node aggregation procedure is applied for aggregating strongly connected nodes according to the following affinity metric c_uv for nodes u and v:

c_uv = |(X(u,:), X(v,:))|^2 / ((X(u,:), X(u,:)) (X(v,:), X(v,:))),

where X = (x^{(1)}, ..., x^{(K)}) is computed by applying a few Gauss-Seidel (GS) relaxations with K initial random vectors to the linear system L_{S_u} x = 0. Let x̃ represent the approximation of the true solution x after applying several GS relaxations to L_{S_u} x = 0. Due to the smoothing property of GS relaxation, the remaining error can be expressed as e_s = x̃ − x, which will only contain the smooth components of the initial error, while the highly oscillatory modes will be effectively damped out [8]. Nodes u and v are considered strongly connected to each other if X(u,:) and X(v,:) are highly correlated for all the K test vectors (corresponding to a larger c_uv value), and thus should be aggregated to form a coarse-level node.
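The affinity metric can be sketched with a dense toy implementation (a 5-node path-graph Laplacian is used as a hypothetical stand-in for L_{S_u}):

```python
import numpy as np

def gauss_seidel(L, x, sweeps=4):
    """A few GS relaxation sweeps on L x = 0, damping oscillatory error modes."""
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            x[i] -= (L[i] @ x) / L[i, i]
    return x

def affinity_matrix(L, K=4, seed=1):
    """c_uv = |<X(u,:), X(v,:)>|^2 / (<X(u,:),X(u,:)> <X(v,:),X(v,:)>)."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    X = np.column_stack([gauss_seidel(L, rng.standard_normal(n)) for _ in range(K)])
    dots = X @ X.T
    return dots**2 / np.outer(np.diag(dots), np.diag(dots))

# path-graph Laplacian on 5 nodes (toy stand-in for L_{S_u})
L = np.diag([1., 2., 2., 2., 1.])
for i in range(4):
    L[i, i + 1] = L[i + 1, i] = -1.0
C = affinity_matrix(L)
print(np.allclose(np.diag(C), 1.0))   # self-affinity is 1
print(np.all(C <= 1.0 + 1e-9))        # Cauchy-Schwarz bounds c_uv by 1
```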
Once the multilevel hierarchical representations of the original graph (Laplacians) have been created, algebraic multigrid (AMG) solvers can be built and subsequently leveraged to solve large Laplacian matrices efficiently.

Edge Spectral Similarities
The proposed spectral sparsification algorithm will first sort all off-subgraph edges according to their spectral sensitivities in descending order (p_1, q_1), (p_2, q_2), ..., and then select the top few off-subgraph edges to be recovered into the initial subgraph. To avoid recovering redundant edges into the subgraph, it is indispensable to check edge similarities: only the edges that are not similar to each other will be added to the initial sparsifier. To this end, we exploit the following spectral embedding scheme for distinguishing off-subgraph edges, leveraging the approximate dominant generalized eigenvectors h_t computed by Equation (36):

ψ_{p,q}(h_t) = Σ_{q_k} w_G(p, q) w_S(p, q_k) h_t^T (e_{p,q} e_{p,q_k}^T + e_{p,q_k} e_{p,q}^T) h_t,

where the (p, q_k) are the directed edges sharing the same head with (p, q) but different tails. Then the proposed scheme for checking the spectral similarity of two off-subgraph edges includes the following steps:
Step 1: Perform t-step power iterations with r = O(log n) initial random vectors h_0^{(1)}, ..., h_0^{(r)} to compute r approximate dominant generalized eigenvectors h_t^{(1)}, ..., h_t^{(r)};
Step 2: For each edge (p, q), compute an r-dimensional spectral embedding vector s_{p,q} ∈ R^r with s_{p,q}(r) = ψ_{p,q}(h_t^{(r)});
Step 3: Check the similarity of two off-subgraph edges (p_i, q_i) and (p_j, q_j) with

SpectralSim(i, j) = 1 − ||s_{p_i,q_i} − s_{p_j,q_j}|| / max(||s_{p_i,q_i}||, ||s_{p_j,q_j}||).
If SpectralSim(i, j) < ϱ for a given threshold ϱ, edge (p i , q i ) is considered spectrally dissimilar to (p j , q j ).Given a list of candidate off-subgraph edges, Algorithm 2 is proposed for edge similarity checking.
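The similarity test itself is a one-liner on the embedding vectors; the embedding values below are hypothetical placeholders for illustration:

```python
import numpy as np

def spectral_sim(s_i, s_j):
    """SpectralSim(i, j) = 1 - ||s_i - s_j|| / max(||s_i||, ||s_j||)."""
    return 1.0 - np.linalg.norm(s_i - s_j) / max(np.linalg.norm(s_i),
                                                 np.linalg.norm(s_j))

# hypothetical 2-dimensional spectral embeddings of three off-subgraph edges
s1, s2, s3 = np.array([1.0, 0.9]), np.array([1.02, 0.88]), np.array([0.1, -0.7])
rho = 0.9
similar = spectral_sim(s1, s2) >= rho     # near-duplicate edge: redundant
dissimilar = spectral_sim(s1, s3) < rho   # dissimilar edge: keep as candidate
print(similar, dissimilar)                # -> True True
```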

Algorithm Flow and Complexity of diGRASS
Algorithm 1 shows the overall flow for directed graph sparsification, where L_G is the Laplacian matrix of the original graph, L_S is the Laplacian matrix of the initial spanning tree, d_out is the user-defined outgoing degree limit for nodes, and λ_limit is the desired maximum generalized eigenvalue. Algorithm 2 (Edge_Similarities_Checking( Ẽ_list, L_G, L_S, d_out, ϱ)) returns the edges that survive the similarity check among the off-subgraph edges: it forms the final edge list E_list that only includes spectrally-dissimilar off-subgraph edges by calculating the spectral similarity score SpectralSim(i, j) between each candidate edge (p_i, q_i) and every edge (p_j, q_j) in Ẽ_list, accepting (p_i, q_i) only if SpectralSim(i, j) < ϱ and d_{p_j}, d_{q_j} < d_out for all (p_j, q_j) ∈ Ẽ_list. Here Ẽ_list denotes the set of off-subgraph edges, E_list the set of edges that will be added into the sparsifier, and d_p the outgoing degree of node p. The overall complexity of diGRASS is nearly linear in the number of edges of the original graph.

APPLICATIONS OF DIRECTED GRAPH SPARSIFICATION

Directed Laplacian Solver
Recent research has focused on developing more efficient algorithms for solving undirected Laplacians [22, 24]. In this work, we focus on solving the asymmetric Laplacian matrices that correspond to directed graphs, i.e., linear systems L_G x = b. When the right-hand-side (RHS) vector b lies in the left singular vector space of L_G, solving L_G x = b is equivalent to the following procedure: letting y = (L_G^T)^+ x, we first solve L_{G_u} y = b with L_{G_u} = L_G L_G^T; once y is obtained, the solution is recovered as x = L_G^T y. Since L_{G_u} is a much denser matrix, it should not be explicitly formed when solving L_{G_u} y = b. To this end, iterative methods such as the preconditioned conjugate gradient (PCG) method can be leveraged for solving L_{G_u} y = b with L_{S_u} as the preconditioner. Note that only L_{S_u} will be explicitly computed during the PCG iterations.
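The matrix-free treatment of the symmetrized operator can be sketched with SciPy as follows (a toy 3-node example assuming the L_G L_G^T symmetrization; plain CG is used here, whereas diGRASS would supply L_{S_u} as the preconditioner, and the small diagonal shift mirrors the 1e-6 regularization used in our experiments):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

# Toy directed cycle 0 -> 1 -> 2 -> 0; L_G = D_out - A (rows sum to zero).
A = sp.csr_matrix(np.array([[0., 1., 0.],
                            [0., 0., 1.],
                            [1., 0., 0.]]))
D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
L_G = (D - A).tocsr()
eps = 1e-6  # small diagonal regularization

# Matrix-free operator for L_Gu = L_G L_G^T + eps*I -- never formed explicitly;
# each matvec only needs two sparse products with L_G and L_G^T.
n = L_G.shape[0]
L_Gu = LinearOperator((n, n), matvec=lambda v: L_G @ (L_G.T @ v) + eps * v)

b = np.array([1., -0.5, -0.5])  # RHS orthogonal to the all-ones vector
y, info = cg(L_Gu, b)           # plain CG; a sparsifier L_Su would precondition this
x = L_G.T @ y                   # recover the solution of L_G x = b
```

Because each CG iteration touches L_G only through matrix-vector products, the dense product L_G L_G^T never has to be stored.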
The directed graph sparsifier can also be directly leveraged as a preconditioner for solving L_G x = b using existing iterative methods, such as the generalized minimal residual (GMRES) method [36]. GMRES is a widely-adopted Krylov-subspace iterative method for solving asymmetric matrices. Given an initial solution vector x_0, GMRES gradually improves the solution x_m at the m-th iteration by minimizing the residual ||b − L_G x_m|| over x_m ∈ x_0 + K_m(L_G, r_0), where r_0 = b − L_G x_0, and K_m(L_G, r_0) = span{r_0, L_G r_0, L_G^2 r_0, ..., L_G^{m−1} r_0} denotes the Krylov subspace.
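A sparsifier-preconditioned GMRES solve can be sketched with SciPy as follows (a toy 4-node example; for illustration we factor the regularized L_G itself with ILU, whereas diGRASS would factor the sparsifier's Laplacian L_S, and the 1e-4 diagonal shift mirrors the regularization used in our experiments):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

# Asymmetric Laplacian of a toy directed graph, regularized to be nonsingular.
A = sp.csr_matrix(np.array([[0., 2., 0., 0.],
                            [0., 0., 1., 1.],
                            [1., 0., 0., 1.],
                            [0., 1., 0., 0.]]))
D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
L_G = (D - A).tocsc() + 1e-4 * sp.eye(4, format="csc")

# ILU factorization used as a preconditioner; here we factor L_G itself
# purely for illustration -- diGRASS would factor the sparsifier instead.
ilu = spilu(L_G)
M = LinearOperator(L_G.shape, ilu.solve)

b = np.array([1., 0., 0., -1.])
x, info = gmres(L_G, b, M=M)  # preconditioned GMRES solve of L_G x = b
```

A sparsifier-based ILU preconditioner keeps the factor sparse while still clustering the preconditioned spectrum, which is what drives the fast GMRES convergence reported later.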

(Personalized) PageRank Vectors
The idea of PageRank is to measure the importance of each web page. The PageRank algorithm aims at finding the most popular web pages, while the personalized PageRank algorithm aims at finding the pages that a given user will most likely visit. Mathematically, the PageRank vector π satisfies π = c A^T D_G^{-1} π + (1 − c) v_0, where 0 < c < 1 is the damping constant, and the vector v_0 with non-negative coordinates, satisfying 1^T v_0 = 1, is the personalization vector. The original, non-personalized PageRank corresponds to v_0 = (1/n) 1. Note that D_G^{-1} cannot be defined if there exist nodes with no outgoing edges; to deal with this situation, a self-loop with a small edge weight can be added to each node.
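The fixed-point equation above can be solved by simple power iteration, sketched below (a dense NumPy toy; the self-loop weight and the iteration count are illustrative choices of ours):

```python
import numpy as np

def pagerank(A, c=0.85, v0=None, iters=200):
    """Power iteration for pi = c * A^T D^{-1} pi + (1 - c) * v0."""
    n = A.shape[0]
    A = A + 1e-6 * np.eye(n)              # small self-loops so every out-degree > 0
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix D^{-1} A
    if v0 is None:
        v0 = np.full(n, 1.0 / n)          # uniform v0 => classic (non-personalized) PageRank
    pi = v0.copy()
    for _ in range(iters):
        pi = c * P.T @ pi + (1.0 - c) * v0
    return pi
```

On a directed 3-cycle the stationary PageRank vector is uniform by symmetry; a one-hot personalization vector v_0 instead biases the iteration toward pages reachable from that node.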

Directed Graph Partitioning
It has been shown that partitioning and clustering of directed graphs can play very significant roles in a variety of applications related to machine learning [31], data mining, and circuit synthesis and optimization [32]. However, the efficiency of existing methods for partitioning directed graphs strongly depends on the complexity of the underlying graphs [31]. For an undirected graph, the eigenvectors corresponding to the first few smallest eigenvalues of its Laplacian can be utilized for spectral partitioning [39]. For a directed graph G, on the other hand, the eigenvectors corresponding to the first few distinct smallest eigenvalues of the Laplacian L_{G_u} are required for directed graph partitioning. The eigenvalues of the symmetrization of the directed graph in Figure 5 have a few multiplicities, which are shown in Figure 4; the partitioning of the directed graph in Figure 5 therefore depends on the eigenvectors corresponding to eigenvalues μ_1, μ_2, μ_4, and μ_8. As shown in Figure 5, the spectral partitioning results can be quite different between directed and undirected graphs with the same set of nodes and edges.
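A minimal two-way spectral partitioning sketch on a symmetrized toy graph (we use the simple A + A^T symmetrization here purely for illustration; the node layout and cluster sizes are our own construction):

```python
import numpy as np

# Two directed triangles {0,1,2} and {3,4,5} joined by a single edge 2 -> 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    A[i, j] = 1.0

Au = A + A.T                        # simple symmetrization for illustration
L = np.diag(Au.sum(axis=1)) - Au    # undirected graph Laplacian of G_u
w, V = np.linalg.eigh(L)            # eigenvalues in ascending order
fiedler = V[:, 1]                   # eigenvector of the 2nd smallest eigenvalue
part = fiedler > 0                  # sign pattern gives the two-way partition
```

The sign pattern of the Fiedler vector separates the two triangles, cutting only the single bridge edge.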

EXPERIMENTAL RESULTS
The proposed algorithm for spectral sparsification of directed graphs has been implemented in MATLAB and C++. Extensive experiments have been conducted to evaluate the proposed method with various types of directed graphs obtained from public-domain datasets [15]. To ensure that every node in the graph has at least one outgoing edge, we delete the nodes with no outgoing edges.

Dataset Description
The datasets are from the SuiteSparse Matrix Collection [16]. If a node has only incoming edges or is isolated from the rest of the nodes, it is removed from the graph. The statistics of the datasets are summarized in Table 2. The detailed description of each graph is as follows:
-gre_115, gre_185, and gre_1107 are from the Harwell-Boeing collection, and describe the simulation of computer systems.
-hor is from the Harwell-Boeing collection and describes a flow network.
-harvard500 is a web connectivity matrix from Cleve Moler.
-cell1 is a GSM cell traffic matrix from Salvatore Lucifora, Telecom Italia Mobile.
-big and pesa are structurally symmetric matrices.
-wordnet3 is a directed multi-relational network.
-email-Eu-core is a relatively dense social network generated from the e-mail data of a research institute.
-wiki-Vote is a relatively dense social network from the Wikipedia vote dataset.
-cit-HepTh is a high-energy physics theory citation network from arxiv.

Spectral Edge Sensitivities
Figure 6 shows the spectral sensitivities of all the off-subgraph edges (e1 to e19, shown in blue) in both the directed and undirected graphs, calculated using MATLAB's eigs function and the proposed method based on Equation (37) with the LAMG solver, respectively. The spectral sensitivities of all the off-subgraph edges (e1 to e19) with respect to the dominant eigenvalues (λ_max or λ_1) are plotted for both graphs. We observe that the spectral sensitivities for directed and undirected graphs are drastically different from each other, because the spectral sensitivities of off-subgraph edges in the directed graph depend on the edge directions. It is also observed that the approximate spectral sensitivities calculated by the proposed t-step power iterations with the LAMG solver match the true solution very well for both directed and undirected graphs.
Fig. 7. Runtime scalability for "gre_1107" (left), "big" (middle), "gre_115" (right).

Directed Graph Sparsification
Table 3 shows comprehensive results on directed graph spectral sparsification for a variety of real-world directed graphs using the proposed method, where |V_G| (|E_G|) denotes the number of nodes (edges) of the original directed graph G, and |E_{S_0}| and |E_S| denote the numbers of edges in the initial subgraph S_0 and the final spectral sparsifier S, respectively. Notice that we directly apply MATLAB's eigs function if the graph is relatively small (|E_{S_0}| < 1E4); otherwise, we apply the LAMG solver for better efficiency when calculating the generalized eigenvector h_t. Note that a small diagonal entry with value of 1e−6 is added to all symmetrized undirected graphs during the calculation. We report the total runtime of the eigensolver using either the LAMG solver or the eigs function. λ_{max,S_0}/λ_{max} denotes the reduction rate of the largest generalized eigenvalue of L_{S_u}^+ L_{G_u} from the initial sparsifier to the final sparsifier.
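For small graphs, the metric λ_max(L_{G_u}, L_{S_u}) can be computed directly as the largest eigenvalue of the matrix pencil, sketched below (a toy example of our own: a 4-cycle G whose sparsifier S drops one edge, with the same 1e-6 regularization; the exact value is 1 + R_eff = 4, since the effective resistance across the removed unit-weight edge through the remaining path is 3):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian(n, edges):
    """Unweighted undirected Laplacian built from an edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    return L

L_Gu = laplacian(4, [(0, 1), (1, 2), (2, 3), (0, 3)])  # 4-cycle
L_Su = laplacian(4, [(0, 1), (1, 2), (2, 3)])          # path sparsifier
eps = 1e-6 * np.eye(4)                                  # regularize the singular pencil

# Generalized eigenvalues of (L_Gu, L_Su); the largest one measures how well
# S_u spectrally approximates G_u (values closer to 1 are better).
lam = eigh(L_Gu + eps, L_Su + eps, eigvals_only=True)
lam_max = lam[-1]
```

Reducing this largest generalized eigenvalue is exactly what each diGRASS edge-recovery iteration targets.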
Ablation study. Since the proposed method iteratively adds edges to form the sparsifier, we examine the runtime and the generalized eigenvalue reduction with respect to the number of added edges. Figure 7 shows the runtime scalability with respect to the number of off-subgraph edges (|E_added|) added to the final sparsifier for graphs "gre_1107" (left), "big" (middle), and "gre_115" (right); the runtime scales linearly with the number of added edges for all three graphs. Figure 8 shows how λ_max(L_{G_u}, L_{S_u}) changes when including different numbers of edges in the sparsifier. We observe that λ_max can be efficiently reduced when adding more edges, especially at the early stage of sparsifier construction. This also demonstrates that the most spectrally-critical edges can be efficiently identified and included at the early stage, compared with the edges that are less critical.
Fig. 8. Eigenvalue change with respect to the number of added edges for "gre_115" (left) and "gre_185" (right).
Table 4. Comparison between GRASS [19] and diGRASS (this work) on the test cases.

Comparison with Prior Method
Since there are no other existing directed graph sparsification methods available for comparison, we compare the proposed method with the existing undirected graph sparsification tool GRASS [18, 19, 43].
To this end, we first convert the directed graphs into undirected ones (G_u) using the A + A^T symmetrization. Undirected graph sparsifiers S_u are then computed by GRASS. Finally, the directed graph sparsifiers are constructed by recovering the edge directions in the undirected sparsifier S_u. Note that a larger diagonal entry with value of 1e−4 is added to all symmetrized undirected graphs during the calculation. The experimental results are shown in Table 4, where λ_max represents the largest generalized eigenvalue between the original graph and its final sparsifier. With similar numbers of edges kept in the sparsifiers, we observe that the proposed spectral sparsification method consistently produces much better spectral sparsifiers than GRASS. Note that for graphs "harvard500" and "wordnet3", GRASS cannot include more edges into the sparsifiers S_u, implying that the final λ_max(L_{G_u}, L_{S_u}) cannot be further reduced; our method, on the other hand, is able to further reduce the condition number, achieving a much better spectral approximation level.

Directed Laplacian Solvers
Figure 9 shows the relative residual (res = ||L_G x − b|| / ||b||) and runtime plots when spectral sparsifiers are applied as preconditioners for solving the Laplacians of directed graphs "hor", "gre_115", and "gre_185", respectively. As observed, the performance of the PCG solver is substantially improved by leveraging sparsifier-based preconditioners. Note that for graph "hor" the plain PCG solver without any preconditioner cannot converge to the desired accuracy within the maximum number of iterations (500). Figure 10 shows the relative residual and runtime plots when the preconditioners obtained via incomplete LU (ILU) factorization of the original directed graphs and their spectral sparsifiers are applied for "wordnet3", "harvard500", and "big", respectively. "ILU(•)" and "LU(•)" indicate that ILU and LU decompositions have been leveraged to construct the preconditioners, respectively, and "nnz" denotes the number of nonzeros in the preconditioner. MATLAB's built-in functions gmres, ilu, and lu with default settings have been applied in our experiments. Note that the GMRES iterations with preconditioners show much faster convergence for all test cases. It is also observed that, for each test case, the preconditioner computed using the directed sparsifier always has the lowest number of nonzeros (nnz).
Fig. 9. PCG convergence (first row) and runtime (second row) results for graphs "hor", "gre_115", and "gre_185", respectively.
Fig. 10. GMRES convergence (first row) and runtime (second row) results for graphs "wordnet3", "harvard" and "big", respectively.
Figure 11 plots the correlation of the (personalized) PageRank results computed using the original graphs (x-axis) and the sparsifiers (y-axis) for graphs "ibm32" (left), "mathworks100" (middle), and "gre_1107" (right), respectively. Note that a few steps of GS smoothing have been applied to remove the high-frequency errors and obtain the smoothed (personalized) PageRank vectors when using the sparsified graphs. We observe that the (personalized) PageRank vectors obtained from the sparsifiers can well approximate the results computed with the original graphs.

Directed Graph Partitioning
Table 5 shows the detailed partitioning results on different graphs. Since there is no established scheme for spectral partitioning of directed graphs, we choose to perform spectral partitioning on the symmetrized undirected graphs G_u and S_u, where two-way spectral partitioning is applied by utilizing the Fiedler vector of the Laplacian matrix. np is the number of nodes assigned to different partitions when comparing the partitioning results on G_u and S_u; a smaller np indicates more similar partitioning results between the two graphs, and thus a better spectral similarity between the original graph and the sparsifier. np/|V_G| can be considered the percentage of mismatched nodes over the whole node set. cut is the cut value between two partitions, which is equivalent to the number of edges connecting the two partitions. θ is the ratio cut [42] value, which for a partition (V_i, V_j) can be computed as θ = cut(V_i, V_j) (1/|V_i| + 1/|V_j|). Figures 12, 13, 14, and 15 show the partitioning results on the symmetrized graph G_u and its symmetrized sparsifier S_u for the "ibm32", "pesa", "gre_1107", and "big" graphs. As observed, very similar partitioning results are obtained, indicating well-preserved spectral properties within the spectrally-sparsified directed graph.
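The ratio cut value θ can be computed directly from the symmetrized adjacency matrix, sketched below (a minimal NumPy helper; the toy 6-node two-cluster graph is our own construction, with cut = 1 and |V_i| = |V_j| = 3, so θ = 2/3):

```python
import numpy as np

def ratio_cut(Au, part):
    """theta = cut(Vi, Vj) * (1/|Vi| + 1/|Vj|) for a two-way partition.

    Au: symmetric adjacency matrix; part: boolean membership mask for Vi."""
    part = np.asarray(part, dtype=bool)
    cut = Au[np.ix_(part, ~part)].sum()  # total weight of edges crossing the cut
    return cut * (1.0 / part.sum() + 1.0 / (~part).sum())

# Two triangles joined by one bridge edge 2 - 3.
Au = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    Au[i, j] = Au[j, i] = 1.0
theta = ratio_cut(Au, [True, True, True, False, False, False])
```

Evaluating this on both G_u and S_u with the same partition gives the θ columns compared in Table 5.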

CONCLUSIONS
This article proves the existence of nearly-linear-sized spectral sparsifiers for directed graphs under the condition that their corresponding undirected graphs (obtained through the proposed Laplacian symmetrization scheme) contain only non-negative edge weights, and proposes a practically-efficient yet unified spectral graph sparsification framework. Such a novel spectral sparsification approach allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graphs' spectral properties. By exploiting a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear-sized (directed) subgraphs, the method well preserves the key eigenvalues and eigenvectors of the original (directed) graph Laplacians. The proposed method has been validated using various kinds of directed graphs obtained from public-domain sparse matrix collections, showing promising spectral sparsification results for general directed graphs.

Fig. 1. Converting a directed graph G in (a) into undirected ones using A + A^T, AA^T + A^T A, and the proposed L_G L_G^T, as shown in (b)-(d), respectively.

(a) Generate an initial subgraph S from the original directed graph in O(m log n) or O(m + n log n) time; (b) compute the approximate dominant eigenvector h_t and the spectral sensitivity of each off-subgraph edge in O(m) time; (c) recover a small number of spectrally-dissimilar off-subgraph edges into the latest subgraph S according to their spectral sensitivities and similarities in O(m) time; (d) repeat steps (b) and (c) until the desired condition number or spectral similarity is achieved.

Fig. 4. Eigenvalues of L_{G_u} for the directed graph in Figure 5.

Fig. 5. Spectral partitioning of the directed (left) and undirected (right) graphs. The nodes within the same cluster are assigned the same color.

Fig. 6. The spectral sensitivity scores of off-subgraph edges (e1 to e19, in blue) for the undirected (left) and directed (right) graphs.

Fig. 11. Correlation of the (personalized) PageRank results (c = 0.85) computed using the original graphs and their sparsifiers.

Fig. 12. The partitioning results between G_u (left) and its sparsifier S_u (right) for the "ibm32" graph.

Fig. 13. The partitioning results between G_u (left) and its sparsifier S_u (right) for the "pesa" graph.

Fig. 14. The partitioning results between G_u (left) and its sparsifier S_u (right) for the "gre_1107" graph.

Fig. 15. The partitioning results between G_u (left) and its sparsifier S_u (right) for the "big" graph.

Table 1. Summary of Symbols Used in This Article. E_S: edge set of the sparsifier; E_{S_u}: edge set of the symmetrization's sparsifier; m_S = |E_S|: number of edges in E_S; m_{S_u} = |E_{S_u}|: number of edges in E_{S_u}.

ALGORITHM 1 (fragment): 6: Update S̃ = S + E_list and compute the latest dominant generalized eigenvector h̃_t and its eigenvalue λ̃_max based on L_G and L_S̃; 7: if λ̃_max < λ_max then 8: update S = S̃, h_t = h̃_t, λ_max = λ̃_max; 9: end if; 10: end while; 11: Return L_S.
ALGORITHM 2: Edge_Similarities_Checking. Input: Ẽ_list, L_G, L_S, d_out, ϱ. Output: E_list. 1: Perform t-step power iterations with r = O(log n) initial random vectors h_0^{(1)}, ..., h_0^{(r)}.

Table 2. Statistics of Datasets.

Table 3. Results of Directed Graph Spectral Sparsification.

Table 5. Spectral Partitioning Results.