Efficient Approximation Algorithms for Spanning Centrality

Given a graph $\mathcal{G}$, the spanning centrality (SC) of an edge $e$ measures the importance of $e$ for $\mathcal{G}$ to be connected. In practice, SC has seen extensive applications in computational biology, electrical networks, and combinatorial optimization. However, it is highly challenging to compute the SC of all edges (AESC) on large graphs. Existing techniques fail to deal with such graphs, as they either suffer from expensive matrix operations or require sampling numerous long random walks. To circumvent these issues, this paper proposes TGT and its enhanced version TGT+, two algorithms for AESC computation that offer rigorous theoretical approximation guarantees. In particular, TGT remedies the deficiencies of previous solutions by conducting deterministic graph traversals with carefully-crafted truncated lengths. TGT+ further advances TGT in terms of both empirical efficiency and asymptotic performance while retaining result quality, based on the combination of TGT with random walks and several additional heuristic optimizations. We experimentally evaluate TGT+ against recent competitors for AESC using a variety of real datasets. The experimental outcomes authenticate that TGT+ outperforms the state of the art, often by over one order of magnitude of speedup, without degrading the accuracy.


INTRODUCTION
Edge centrality is a graph-theoretic notion measuring the importance of each edge in a graph, and it plays a vital role in analyzing social, sensor, and transportation networks [5,11,32,37]. As pinpointed by Mavroforakis et al. [29], compared to the classic edge betweenness [9] based on shortest paths, spanning centrality (SC) [40] is a more suitable centrality for edges, as it accommodates information from longer paths. In particular, given a connected undirected graph $\mathcal{G}$, the SC $s(e)$ of an edge $e$ is defined as the fraction of spanning trees of $\mathcal{G}$ (tree-structured subgraphs of $\mathcal{G}$ that include all of its nodes) that contain $e$. In simpler terms, the SC $s(e)$ measures how crucial the edge $e$ is for $\mathcal{G}$ to remain connected, and hence it can be used to identify vulnerable edges in $\mathcal{G}$. This definition renders SC useful in infrastructure networks, such as electrical grids, that require maintaining connectivity, i.e., stability and robustness against failures [3,12]. In addition, SC finds extensive applications in both practical and theoretical fields, including phylogenetics [40], graph sparsification [39], electric circuit analysis [10,36], and combinatorial optimization [4,18], to name a few.
Despite its usefulness, the problem of computing the SC values of all edges (AESC) in $\mathcal{G}$ remains challenging. To explain, let $n$ and $m$ be the numbers of nodes and edges in $\mathcal{G}$, respectively. The graph can have up to $n^{n-2}$ spanning trees in the worst case, so exact AESC computation by enumerating all spanning trees is infeasible. The best-known algorithm [40] for exact AESC computation is based on Kirchhoff's matrix-tree theorem [13,42], but it requires $O(m^{3/2})$ time and is thus prohibitive for massive graphs. To cope with this challenge, a series of approximation algorithms [14,29,34,39] for AESC have been developed in recent years. Given an absolute error threshold $\epsilon$, existing solutions focus on calculating an estimated SC $\hat{s}(e_{i,j})$ for each edge $e_{i,j}$, with at most $\epsilon$ absolute error in it. Although these methods allow us to trade result accuracy for execution time, they are still rather computationally expensive when $\mathcal{G}$ is sizable and $\epsilon$ is small. Spielman and Srivastava [39] propose to approximate AESC via its equivalent matrix-based definition, leading to $\tilde{O}(m/\epsilon^2)$ time in total. In the follow-up work [29], Mavroforakis et al. develop a fast implementation by incorporating a suite of heuristic optimizations that considerably elevate its empirical efficiency without compromising its asymptotic performance. However, both methods become impractical when the matrices are high-dimensional and dense (i.e., $n$ and $m$ are large). To sidestep the shortcomings of matrix computation, Hayashi et al. [14] and Peng et al. [34] capitalize on the idea of using random walks for fast SC estimation, but these random walk-based techniques still take $\tilde{O}(m/\epsilon^2)$ time. Motivated by the deficiencies of existing solutions, this paper presents two approximation algorithms for AESC: TGT and TGT+. At their heart lie our improved bounds for random walk truncation, which are obtained through a rigorous theoretical analysis and a novel exploitation of the eigenvalues and eigenvectors pertaining to $\mathcal{G}$.
Notably, compared to Peng et al.'s bound [34], our bound can achieve orders of magnitude of reduction in random walk length. Based thereon, TGT (Truncated Graph Traversal) conducts a graph traversal, i.e., a deterministic version of random walks, from each node to probe the nodes within the truncated length. In doing so, TGT outperforms the state of the art whenever the number of random walks those methods need exceeds the graph size. To overcome the limitations of TGT on large graphs with high degrees, we further devise TGT+, whose idea is to derive rough estimates of AESC via the graph traversals in TGT and to refine the results using merely a handful of random walks. By including a greedy trade-off strategy and additional optimizations, we orchestrate the entire TGT+ algorithm for enhanced practical efficiency. On the theoretical side, TGT+ propels approximate AESC computation by improving the asymptotic performance over the $\tilde{O}(m/\epsilon^2)$ bounds of prior solutions. Our extensive experiments on multiple benchmark graph datasets exhibit that TGT+ is often more than one order of magnitude faster than the state-of-the-art solutions while offering uncompromised or even superior result quality. Notably, on the Twitch dataset with 6.8 million edges, TGT+ achieves $10^{-5}$ empirical error on average within 17 minutes for AESC, using a single CPU core, whereas the best competitor takes over 10 hours.
To summarize, we make the following contributions in this work:
• We derive an improved lower bound for the truncated random walk length and propose a first-cut solution TGT, which estimates AESC using graph traversal operations. (Section 3)
• We develop an optimized solution TGT+, which integrates random walk sampling into TGT in an adaptive manner and improves over TGT in terms of practical efficiency. (Section 4)
• We compare our proposed solutions with 3 competitors on 5 real datasets and demonstrate the superiority of TGT+. (Section 5)

PRELIMINARIES
This section sets the stage for our study by introducing basic notations, the formal definition of $\epsilon$-approximate AESC computation, and the main competitors for AESC approximation.

Notations
Let $\mathcal{G} = (V, E)$ be an undirected graph, where $V$ is a set of $n$ nodes and $E$ is a set of $m$ edges. For each edge $e_{i,j} \in E$, we say $v_i$ and $v_j$ are neighbors of each other, and we use $N(v_i)$ to denote the set of neighbors of $v_i$, where the degree is $d(v_i) = |N(v_i)|$. Throughout this paper, we use a boldface lower-case (resp. upper-case) letter $\vec{x}$ (resp. $\mathbf{M}$) to represent a vector (resp. matrix), with its $i$-th element (resp. the element at the $i$-th row and $j$-th column) denoted as $\vec{x}[i]$ (resp. $\mathbf{M}[i,j]$). Given $\mathcal{G}$, we denote by $\mathbf{A}$ the adjacency matrix of $\mathcal{G}$, where $\mathbf{A}[i,j] = 1$ if $e_{i,j} \in E$ and $\mathbf{A}[i,j] = 0$ otherwise. In addition, we let $\mathbf{D}$ be the degree diagonal matrix of $\mathcal{G}$, whose diagonal entry is $\mathbf{D}[i,i] = d(v_i)$ for each node $v_i \in V$. Let $\mathbf{P} = \mathbf{D}^{-1}\mathbf{A}$ be the random walk matrix (i.e., transition matrix) of $\mathcal{G}$, in which $\mathbf{P}[i,j] = \frac{1}{d(v_i)}$ if $e_{i,j} \in E$ and $\mathbf{P}[i,j] = 0$ otherwise. Correspondingly, we denote $p_\ell(v_i, v_j) = \mathbf{P}^\ell[i,j]$, which can be interpreted as the probability that a random walk from node $v_i$ visits node $v_j$ at the $\ell$-th hop, reflecting the proximity of nodes $v_i, v_j$. We refer to $p_\ell(v_i, v_j)$ as the $\ell$-hop TP (transition probability) of $v_j$ w.r.t. $v_i$. In this paper, we assume $\mathcal{G}$ is connected and not bipartite. According to [31], the random walks over $\mathcal{G}$ are then ergodic, i.e., $\lim_{\ell \to \infty} p_\ell(v_i, v_j) = \frac{d(v_j)}{2m}$.
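To make these notations concrete, the following sketch (plain NumPy on a toy graph of our own; the paper's implementation is in C++ and is not reproduced here) builds the transition matrix $\mathbf{P} = \mathbf{D}^{-1}\mathbf{A}$, computes $\ell$-hop TPs as $\mathbf{P}^\ell[i,j]$, and checks the ergodic limit $d(v_j)/(2m)$:

```python
import numpy as np

# Small connected, non-bipartite graph: a triangle with a pendant node.
# Edges: (0,1), (1,2), (0,2), (2,3)
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)              # degrees d(v_i)
m = int(A.sum() / 2)           # number of edges
P = A / d[:, None]             # random walk matrix P = D^{-1} A

def ell_hop_tp(ell):
    """p_ell(v_i, v_j) = P^ell[i, j] for all pairs at once."""
    return np.linalg.matrix_power(P, ell)

# Ergodicity: every row of P^ell approaches d(v_j) / (2m) as ell grows.
stationary = d / (2 * m)
print(np.abs(ell_hop_tp(100) - stationary).max())
```

Here the connectivity and non-bipartiteness assumptions matter: the triangle supplies an odd cycle, without which the powers of $\mathbf{P}$ would oscillate instead of converging.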

Problem Definition
Definition 2.1 (Spanning Centrality [40]). Given an undirected and connected graph $\mathcal{G}$, the SC $s(e_{i,j}) \in (0, 1]$ of an edge $e_{i,j}$ is defined as the fraction of spanning trees of $\mathcal{G}$ that contain $e_{i,j}$.

Definition 2.1 presents the formal definition of SC. Recall that a spanning tree of graph $\mathcal{G}$ is a subgraph that is a tree and spans all nodes of $\mathcal{G}$. Intuitively, a high SC $s(e_{i,j})$ quantifies how crucial edge $e_{i,j}$ is for $\mathcal{G}$ to ensure connectedness: since such an edge appears in most spanning trees, most of them fall apart once $e_{i,j}$ is removed from $\mathcal{G}$. In the extreme case where $s(e_{i,j}) = 1$, $\mathcal{G}$ becomes disconnected when $e_{i,j}$ is excluded. To our knowledge, the state-of-the-art algorithm [40] for computing the exact AESC entails $O(m^{3/2})$ time, which is prohibitive for large graphs. Following previous works [14,34], we focus on $\epsilon$-approximate all-edge SC (AESC) computation, defined as follows. Particularly, we say an estimated SC $\hat{s}(e_{i,j})$ is $\epsilon$-approximate if it satisfies Eq. (1).

Definition 2.2 ($\epsilon$-Approximate AESC). Given an undirected and connected graph $\mathcal{G} = (V, E)$ and an absolute error threshold $\epsilon \in (0, 1)$, the $\epsilon$-approximate AESC computation returns an estimated $\hat{s}(e_{i,j})$ for every edge $e_{i,j} \in E$ such that
$$|\hat{s}(e_{i,j}) - s(e_{i,j})| \le \epsilon. \quad (1)$$
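Definition 2.1 can be illustrated by brute force on a toy graph (this enumeration is purely didactic and is of course infeasible at scale, which is the point of this paper): enumerate all spanning trees and count those containing each edge. On a triangle with a pendant edge, the pendant edge is a bridge and gets SC 1, while each triangle edge appears in 2 of the 3 spanning trees.

```python
import itertools

# Brute-force illustration of Definition 2.1 (not the paper's algorithm):
# enumerate all spanning trees and count those containing each edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle plus a pendant edge
n = 4

def is_spanning_tree(tree):
    if len(tree) != n - 1:
        return False
    parent = list(range(n))                # union-find connectivity check
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in tree:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                   # adding (u, v) would close a cycle
        parent[ru] = rv
    return True

trees = [t for t in itertools.combinations(edges, n - 1) if is_spanning_tree(t)]
sc = {e: sum(e in t for t in trees) / len(trees) for e in edges}
print(sc)
```

The output matches the intuition below Definition 2.1: removing the SC-1 edge (2,3) disconnects the graph, whereas each triangle edge has SC 2/3.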
Other related works on SC will be reviewed later in Section 6.
Fast-Tree. Mavroforakis et al. [29] develop a fast implementation of [39] on the basis of the equivalence between SC and effective resistance (ER) [6] when the node pairs are edges. More specifically, as per [39], the ERs of all edges are the diagonal elements of the matrix $\mathbf{R} = \mathbf{B}\mathbf{L}^{\dagger}\mathbf{B}^{\top}$, where $\mathbf{B}$ and $\mathbf{L}^{\dagger}$ are the incidence matrix and the pseudoinverse of the Laplacian matrix of $\mathcal{G}$, respectively. Fast-Tree first employs random projections [1] to reduce the high matrix dimensions and then deploys an SDD solver to solve the linear systems in the low-dimensional space, resulting in a time complexity near-linear in $m$. However, its practical efficiency is less than satisfactory on large graphs, as revealed by the experiments in [14].
ST-Edge. Based on Definition 2.1, Hayashi et al. [14] first sample a sufficient number of random spanning trees by Wilson's algorithm [48], and record the fraction of trees in which edge $e_{i,j}$ appears as the estimate $\hat{s}(e_{i,j})$. As proved, the expected time to draw a spanning tree rooted at a random node $v_i$ is governed by the commute time $\kappa(v_i, v_j)$ between nodes $v_i$ and $v_j$ [31].

MonteCarlo and MonteCarlo-C. Peng et al. [34] truncate the random walks underlying SC at a length $\tau$ that depends on $\lambda$, the second largest eigenvalue of matrix $\mathbf{P}$ in absolute value. The major distinction between MonteCarlo and MonteCarlo-C lies in the approach to computing the $\ell$-hop TP values. Specifically, MonteCarlo simply conducts random walks of length $\ell$ ($1 \le \ell \le \tau$) to approximate the $\ell$-hop TP values before aggregating them into the estimated SC. According to the Chernoff-Hoeffding bound, this yields an $\epsilon$-approximate SC $\hat{s}(e_{i,j})$ with success probability at least $1 - \delta$. By contrast, MonteCarlo-C regards the $\ell$-hop TP $p_\ell(v_i, v_j)$ ($1 \le \ell \le \tau$) as the collision probability of two random walks of length $\ell/2$ from $v_i$ and $v_j$, respectively, and then samples $40000\, c_\ell / \epsilon^2$ length-$(\ell/2)$ random walks from the respective nodes. The parameter $c_\ell$ is a constant depending on the graph structure, which is hard to compute in practice. Notice that both algorithms are originally designed for computing the ER of any node pair in $\mathcal{G}$; they overlook the unique properties of edges and thus are not optimized for AESC computation. Moreover, they require an exorbitant number of random walks due to the large $\tau$ (up to thousands when $\epsilon$ is small), significantly exacerbating the efficiency issues.
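The MonteCarlo-style primitive can be sketched as follows (an assumed, simplified form for illustration; the paper's constants and aggregation into SC are omitted): estimate the $\ell$-hop TP $p_\ell(v_i, v_j)$ as the fraction of length-$\ell$ random walks from $v_i$ that end at $v_j$, and compare against the exact value $\mathbf{P}^\ell[i,j]$.

```python
import random
import numpy as np

random.seed(7)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # triangle + pendant

def walk_end(start, length):
    """Endpoint of one simple random walk of the given length."""
    v = start
    for _ in range(length):
        v = random.choice(adj[v])
    return v

def mc_tp(i, j, ell, n_walks=20000):
    """Monte Carlo estimate of p_ell(v_i, v_j)."""
    return sum(walk_end(i, ell) == j for _ in range(n_walks)) / n_walks

# Exact value for comparison: p_3(v_0, v_2) = P^3[0, 2].
A = np.zeros((4, 4))
for u, nbrs in adj.items():
    A[u, nbrs] = 1.0
P = A / A.sum(axis=1, keepdims=True)
exact = np.linalg.matrix_power(P, 3)[0, 2]
print(abs(mc_tp(0, 2, 3) - exact))
```

The sampling error shrinks only as $1/\sqrt{n_{\text{walks}}}$, which is why the number of walks blows up when $\epsilon$ is small — the deficiency TGT targets.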

THE TGT ALGORITHM
In this section, we propose TGT, an iterative deterministic graph traversal approach to AESC processing based on the idea of computing the truncated SC (Eq. (2)) as in MonteCarlo. Particularly, TGT improves over MonteCarlo in two aspects. First and foremost, TGT offers significantly superior edge-wise lower bounds for truncated lengths by leveraging the well-celebrated theory of Markov chains [47] (Section 3.1). Further, TGT develops a deterministic graph traversal method to remedy the efficiency issue caused by the substantial number of random walks needed in MonteCarlo (Section 3.2).
By Lemma 3.1, the $\ell$-hop TP $p_\ell(v_i, v_j)$ can be computed from the eigenvectors and eigenvalues of the matrix $\mathbf{D}^{\frac{1}{2}}\mathbf{P}\mathbf{D}^{-\frac{1}{2}}$, and hence the difference between the truncated SC $s_\tau(e_{i,j})$ and $s(e_{i,j})$ can be quantified. This suggests that we can utilize these eigenvectors and eigenvalues to determine a truncated length $\tau_{i,j}$ for each edge $e_{i,j}$ such that the truncation error is bounded. Additionally, when $\ell = 1$ and $e_{i,j} \in E$, the $1$-hop TP admits the closed form in Eq. (4). Given the above observations, we can establish an improved lower bound for the truncated length $\tau_{i,j}$ of each edge $e_{i,j}$, as shown in Theorem 3.2.
For ease of exposition, we defer all proofs to Appendix A.
Compared to Peng et al.'s $\tau$ in Eq. (3), our truncated length $\tau_{i,j}$ of edge $e_{i,j}$ in Theorem 3.2 depends on the degrees of nodes $v_i, v_j$ and on the $k$ largest (typically $k = 128$) eigenvalues in absolute value, together with their corresponding eigenvectors, of $\mathbf{D}^{\frac{1}{2}}\mathbf{P}\mathbf{D}^{-\frac{1}{2}}$, enabling up to orders of magnitude of improvement in practice, as reported in Figure 1. Note that the eigenvalues and eigenvectors can be efficiently computed in the preprocessing stage (see Figure 4).
Algorithm 1 presents the pseudo-code of CalTau, which realizes the computation of $\tau_{i,j}$ on the basis of Theorem 3.2. Given the graph $\mathcal{G}$, the $k$ eigenvalues $\{\lambda_1, \ldots, \lambda_k\}$, the eigenvectors $\{\vec{f}_1, \ldots, \vec{f}_k\}$, and the parameters $v_i, v_j, \epsilon$ as inputs, CalTau initializes $\tau_{i,j}$ by Eq. (6) with $\frac{\epsilon}{2}$ and $\Upsilon = \Delta_\gamma = 0$ at Line 1, followed by setting $\gamma = 1$ and calculating $\Upsilon$ according to Eq. (7) at Line 2. After that, CalTau increases $\gamma$ iteratively to search for the optimal $\tau_{i,j}$, i.e., the value closest to but not exceeding Eq. (6), ensuring the validity of Theorem 3.2 (Lines 3-6). More precisely, in each iteration, CalTau calculates a candidate truncated length $\tau'$ using Eq. (6), wherein $\Delta_\gamma$ is obtained by Eq. (8) with the current $\gamma$. Next, if $\gamma \le \tau'$, we update $\tau_{i,j}$ to $\tau'$ and increase $\gamma$ by 2 (Line 5). CalTau repeats the above procedure until the condition at Line 5 no longer holds, and returns $\tau_{i,j}$ at Line 7.

Complete Algorithm and Analysis
In light of Theorem 3.2, the problem of AESC computation in Definition 2.2 reduces to computing the approximate SC $\hat{s}(e_{i,j}) = s_\tau(e_{i,j})$ as per Eq. (2) for each edge $e_{i,j} \in E$. Unlike prior methods, TGT conducts a deterministic graph traversal from each node in an iterative manner and aggregates the resulting $\ell$-hop TPs $p_\ell(v_i, v_j)$. Each traversal iteration essentially performs a sparse matrix-vector multiplication that pushes the probability mass of every visited node to each of its neighbors. After the completion of all graph traversal operations, Algorithm 2 computes $s_\tau(e_{i,j})$ for each edge $e_{i,j}$ (Line 12) and returns these values as the answers. The following theorem states the correctness and the worst-case time complexity of TGT. Notwithstanding its unsatisfying worst-case time complexity, by virtue of our improved lower bounds for truncated lengths in Section 3.1, the actual number of graph traversal operations from each node in Algorithm 2 (Lines 7-9) is far smaller than the worst case when $\epsilon$ is non-diminutive, strengthening the superiority of TGT over MonteCarlo in empirical efficiency.
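The traversal primitive can be sketched as follows (a simplified illustration of the per-node propagation; the paper's Algorithm 2 additionally applies the edge-wise truncated lengths and aggregates the TPs into SC): each hop pushes every visited node's mass evenly to its neighbors, so after $\ell$ hops the frontier holds exactly $p_\ell(v_i, \cdot)$ with no sampling variance.

```python
import numpy as np

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # triangle + pendant

def traverse(src, ell):
    """Deterministic traversal: after ell hops, frontier[v] = p_ell(src, v)."""
    frontier = {src: 1.0}
    for _ in range(ell):
        nxt = {}
        for u, mass in frontier.items():
            share = mass / len(adj[u])      # split mass evenly over neighbors
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        frontier = nxt                      # one sparse matrix-vector product
    return frontier

# Sanity check against the dense matrix power P^ell.
A = np.zeros((4, 4))
for u, nbrs in adj.items():
    A[u, nbrs] = 1.0
P = A / A.sum(axis=1, keepdims=True)
exact = np.linalg.matrix_power(P, 4)[0]
print(traverse(0, 4))
```

Note how the per-hop cost is the total degree of the current frontier; on high-degree graphs this frontier quickly covers nearly all nodes, which is precisely the blow-up that motivates TGT+ in Section 4.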

THE TGT+ ALGORITHM
Although TGT advances MonteCarlo in practical performance, we observe in our experiments that its cost is intolerable for massive graphs with high degrees. The reason is that, as $\ell$ increases, the number of nonzero $\ell$-hop TP values on such graphs grows explosively and quickly approaches $n$ (Lines 7-9 in Algorithm 2), causing a computational cost quadratic in the graph size. The severity of this efficiency issue is accentuated in high-precision AESC computation, i.e., when $\epsilon$ is small. To alleviate this issue, we propose TGT+, an algorithm that significantly improves TGT in terms of both practical efficiency and asymptotic performance. The rest of this section proceeds as follows: Section 4.1 delineates the basic idea of TGT+, followed by several optimization techniques in Section 4.2. Finally, Section 4.3 describes the complete algorithm and its analysis.

High-level Idea
Considering the sheer number of non-zero $\ell$-hop TP values in TGT when $\ell$ is large, we propose to calculate the TP values within $\tau$ (a small number) hops using TGT and to harness random walks for the estimation of the $\ell$-hop TPs with $\ell > \tau$. The rationale is that the number of nodes in the vicinity of a given node $v_i$ is often limited, and hence can be efficiently covered by a graph traversal from $v_i$. On the contrary, the far-reaching nodes from $v_i$ can be multitudinous (up to millions in large graphs), where random walks suit the demand better by focusing on probing the important nodes (i.e., those with high TP values) in lieu of all of them. To fulfill this idea, we first derive a truncated length $\tau_{i,j}$ such that $|s(e_{i,j}) - s_\tau(e_{i,j})| \le \frac{\epsilon}{2}$ for each edge $e_{i,j} \in E$. The remaining problem is then to compute an estimated SC $\hat{s}(e_{i,j})$ of each edge $e_{i,j}$ ensuring $|\hat{s}(e_{i,j}) - s_\tau(e_{i,j})| \le \frac{\epsilon}{2}$ using graph traversals and random walks. To facilitate the seamless integration of random walks into TGT, we leverage the following crucial property of $g(v_i, v_j)$, a constituent part of the SC as defined in Eq. (9).

Lemma 4.1. For any integers $\lambda$ and $\tau$ with $0 \le \tau \le \lambda$, the quantity $g_{\tau \to \lambda}(v_i, v_j)$ admits the decomposition in Eq. (10).

More concretely, given a cherry-picked length $\tau$ ($1 \le \tau \le \tau_{i,j}$), Lemma 4.1 implies that we can estimate $g_{\tau \to \lambda}(v_i, v_j)$ by simulating random walks of lengths from 1 to $\tau_{i,j} - \tau$ after obtaining $p_\tau(v_i, v_j)$ and $p_\tau(\cdot, v_j)$ with TGT. Mathematically, if we conduct two length-$(\tau_{i,j} - \tau)$ random walks $W_i$ and $W_j$ from nodes $v_i, v_j$, respectively, we can define a random variable $X$ over their visited nodes. By definition, the expectation $\mathrm{E}[X]$ of $X$ is exactly $g_{\tau \to \lambda}(v_i, v_j)$ in Eq. (10), indicating that $X$ is an unbiased estimator of $g_{\tau \to \lambda}(v_i, v_j)$.
Suppose that the range of $X$ is bounded by $b$, as in Eq. (11). It is then straightforward to apply Hoeffding's inequality in Lemma 4.2 to derive the total number of random walks needed for an accurate estimation of $g_{\tau \to \lambda}(v_i, v_j)$.
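The exact constant in the paper's Eq. (13) is not reproduced here, but the textbook Hoeffding form conveys the shape of the bound: to estimate the mean of a variable bounded in $[0, b]$ within additive error $\epsilon'$ with probability at least $1 - p_f$, it suffices to draw $n_r \ge \frac{b^2}{2\epsilon'^2}\ln\frac{2}{p_f}$ samples (function and argument names below are ours).

```python
import math

def hoeffding_samples(b, eps, p_f):
    """Sufficient sample count by Hoeffding's inequality for a [0, b]-bounded
    variable, additive error eps, and failure probability p_f."""
    return math.ceil(b * b / (2 * eps * eps) * math.log(2 / p_f))

# A smaller range bound b shrinks the sample count quadratically -- the
# motivation for the refinement of b in Section 4.2.2.
print(hoeffding_samples(1.0, 0.01, 1e-3))
print(hoeffding_samples(0.1, 0.01, 1e-3))
```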

In the subsequent section, we elucidate the determination of $\tau$ and $b$ so as to strike a good balance between graph traversal and random walks for optimized performance, and meanwhile reduce the number of samples required.

Optimizations
4.2.1 Adaptive determination of $\tau$. Since the length of the random walks for estimating $g_{\tau \to \lambda}(v_i, v_j)$ $\forall v_j \in N(v_i)$ is $\tau_{i,j} - \tau$, the computational overhead incurred by the random walks from a given node $v_i$ and its neighbors is bounded by a quantity that increases as $\tau$ decreases. Conversely, the graph traversal operations in TGT reduce considerably when $\tau$ is lowered, as explained at the beginning of Section 4.1. In short, the length $\tau$ controls the trade-off between the deterministic graph traversal and the random walks for each node $v_i \in V$. Since it is hard to accurately quantify the graph traversal cost as a function of $\tau$ due to the complex graph structure, we make use of an adaptive strategy to determine $\tau$. More precisely, in the $\ell$-th iteration of the deterministic graph traversal (Lines 6-9 in Algorithm 2) originating from $v_i$, we set $\tau = \ell$ and switch from the graph traversal to random walk simulations if the inequality in Eq. (14) holds, where the l.h.s. and r.h.s. represent the respective costs of computing the $(\ell+1)$-hop TP values in the next iteration. The rationale of Eq. (14) is that we choose random walks over the graph traversal when the cost of the latter would outstrip that of the former.
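The adaptive switch can be sketched with an assumed cost model (the paper's exact Eq. (14) is more refined; the names and the toy costs below are ours): during the traversal, once expanding the current frontier for one more hop would cost more than finishing the remaining hops with random walks, fix $\tau$ at the current hop and switch.

```python
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # triangle + pendant

def pick_tau(src, tau_ij, n_walks):
    """Traverse until the next hop would cost more than the remaining walks."""
    frontier = {src: 1.0}
    for ell in range(tau_ij):
        traversal_cost = sum(len(adj[u]) for u in frontier)  # next-hop pushes
        walk_cost = n_walks * (tau_ij - ell)                 # remaining walk steps
        if traversal_cost > walk_cost:
            return ell, frontier          # switch to random walks at hop ell
        nxt = {}
        for u, mass in frontier.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + mass / len(adj[u])
        frontier = nxt
    return tau_ij, frontier               # traversal alone was cheap enough

print(pick_tau(0, 10, n_walks=1))
```

The returned frontier holds the exact $\tau$-hop TPs, which the random walk phase then extends to the full truncated length.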

4.2.2 Effective refinement of $b$. By the definition of $b$ in Eq. (11), one may simply set $b$ as in Eq. (15), where $p_\tau(v_j, v_i)$ $\forall v_j \in V$ is known from TGT. Unfortunately, the empirical values of the r.h.s. of Eq. (15) are usually non-negligible on real graphs, resulting in a considerable number of random samples according to Eq. (13). Using Lemma 4.3, a refined $b$ is at hand in Eq. (17). It is worth mentioning that the first two terms in Eq. (17) can be efficiently computed without sorting all the $n$ nodes, since the actual number of non-zero entries in $p_\tau(\cdot, v_i)$ is limited due to our fine-tuned $\tau$, as remarked earlier. Therefore, the critical challenge in deriving the improved $b$ in Eq. (17) arises from the computation of $\chi$ in Eq. (16), which incurs a high cost if we search for the optimal edge $e_{i,j}$ ensuring Eq. (16) over $E$ in a brute-force fashion. To tackle this problem, we propose the subroutine CalChi in Algorithm 3, which computes $\chi$ in a cost-effective manner without jeopardizing correctness. More specifically, instead of inspecting all the $m$ edges in $\mathcal{G}$, CalChi first identifies a small set of $\beta$ candidate edges and searches among them only (Line 17).

In the end, TGT+ computes $\hat{s}(e_{i,j}) = \hat{g}(v_i, v_j) + \hat{g}(v_j, v_i)$ for each edge $e_{i,j} \in E$ and outputs these values as the SC estimations. The following theorem states the correctness and complexity of TGT+.

Theorem 4.4. For any $\epsilon, \delta \in (0, 1)$, Algorithm 4 returns approximate SCs $\hat{s}(e_{i,j})$ $\forall e_{i,j} \in E$ with probability at least $1 - \delta$.

The rationale of TGT+'s correctness has been explained in Section 4.1. The time complexity comprises (i) the graph traversal in Lines 2-11, (ii) the random walks in Lines 12-17, and (iii) accessing each neighbor of each node in Line 18, with a total time of $O(m)$. With the adaptive switch condition in Eq. (14), TGT+ ensures that the cost of the first part does not exceed that of the second part, so the cost of both is governed by the number of random walk samples as given by Eq.
(13), with $\tau_{i,j}$ set as in Line 1 of Algorithm 1. Hence, the time complexity of TGT+ reduces to the formula in Theorem 4.4. Table 2 compares the expected time of the randomized algorithms for $\epsilon$-approximate AESC computation. Notably, TGT+ eliminates a term from its bound, and the bound can be simplified further via the Kantorovich inequality on scale-free graphs with $m/n = O(\log n)$, manifesting the superiority of TGT+ over existing solutions.

EXPERIMENTS
In this section, we introduce the experimental settings, then evaluate our truncation bound and the performance of the proposed TGT+. Finally, we analyze the sensitivity of the constants $k$ and $\beta$ in TGT+. All experiments are conducted on a Linux machine with an Intel Xeon(R) Gold 6240 @ 2.60GHz CPU and 377GB RAM in single-thread mode; none of the experiments comes close to exhausting the memory. Due to space limitations, we refer interested readers to Appendix A for the scalability test.

Experimental Setups
Datasets and groundtruths. We include 5 real undirected graphs of different types and scales, whose statistics are shown in Table 3. All datasets are collected from SNAP [21] and have been used in previous works [14,29,34]. For each graph, we generate the groundtruth AESC by first computing $\mathbf{P}^\ell$ with $0 \le \ell \le 1000$ in parallel and then assembling the powers into SC values by Eq. (2).

Table 3 (excerpt): Dataset statistics.
Dataset         $n$         $m$
HepPh [20]      34,401      420,784
Slashdot [22]   77,360      469,180
Twitch [35]     168,114     6,797,557
Orkut [50]      3,072,441   117,185,082

Methods and parameters. We compare TGT and TGT+ with three recent algorithms for AESC: ST-Edge [14], MonteCarlo [34], and MonteCarlo-C [34], as introduced in Section 2.3. We exclude Fast-Tree [29] from the competitors since it mainly offers relative approximation guarantees and is empirically shown to be significantly inferior to ST-Edge in [14]. Among them, MonteCarlo and MonteCarlo-C are adapted for AESC computation, and the detailed modification is explained in Section 5.2. For the randomized algorithms TGT+, ST-Edge, MonteCarlo, and MonteCarlo-C, we follow [14] and set the failure probability $\delta = 1/n$. Regarding MonteCarlo-C, we adopt the heuristic settings of $c_\ell$ suggested in [34], since the exact values are unknown.
For the proposed TGT and TGT+, we set the constants $\beta = 10$ and $k = 128$, unless otherwise specified. For a fair comparison, all tested algorithms are implemented in C++ and compiled by g++ 7.5 with the -O3 optimization. For reproducibility, the source code is available at: https://github.com/jeremyzhangsq/AESC.

Empirical Study of $\tau_{i,j}$ and $\tau$
In the first set of experiments, we evaluate the performance of the proposed truncated length from Section 3.1; the results are reported in Figure 1. We take MonteCarlo as an example to demonstrate its superiority. It is worth noting that MonteCarlo and MonteCarlo-C are designed for computing the SC of a single node pair. Although our $\tau_{i,j}$ can remarkably cut down the number of random walks for an edge, redundant random walks remain if they are invoked for all edges individually. Hence, we further adapt MonteCarlo and MonteCarlo-C for efficient AESC computation by following the idea in TGT and TGT+ of iterating over each node. To summarize, for each node $v_i$, the adapted MonteCarlo and MonteCarlo-C first compute the largest $\tau_i$ among $v_i$'s local neighborhood, as in Line 2 of Algorithm 2, and then compute the number of samplings based on $\tau_i$. In the end, these extensions generate the corresponding random walks from $v_i$ and estimate $s_\tau(e_{i,j})$ for each $v_j \in N(v_i)$.

Performance Evaluation
In the second set of experiments, we evaluate each approach in terms of efficiency and accuracy. For efficiency, we report the average running time (measured as wall-clock time) after all input data are loaded into memory. For accuracy, we measure the actual average absolute error of the estimated all-edge SC returned by each algorithm on each dataset. We run each algorithm with $\epsilon$ varied in {0.05, 0.02, 0.01, 0.005} and report the average evaluation score over 3 trials. A method is excluded if it fails to report a result within 120 hours.
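The accuracy metric, spelled out in code (a trivial sketch; the edge keys and toy values are ours): the average absolute error is the mean of $|\hat{s}(e_{i,j}) - s(e_{i,j})|$ over all edges.

```python
def avg_abs_error(est, truth):
    """Mean absolute error between estimated and groundtruth SC over all edges."""
    assert est.keys() == truth.keys()
    return sum(abs(est[e] - truth[e]) for e in truth) / len(truth)

# Groundtruth SC of the triangle-plus-pendant toy graph vs. a fictitious estimate.
truth = {(0, 1): 2 / 3, (1, 2): 2 / 3, (0, 2): 2 / 3, (2, 3): 1.0}
est = {(0, 1): 0.66, (1, 2): 0.67, (0, 2): 0.66, (2, 3): 0.99}
print(avg_abs_error(est, truth))
```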

Running time.
We first compare TGT+ with TGT and the other competitors in terms of efficiency. Figure 2 reports each solution's running time for solving AESC under various $\epsilon$ settings. Benefiting from the truncation bound and the seamless combination of TGT and random walk sampling, the proposed TGT+ outperforms all competitors on all tested graphs and $\epsilon$ settings. Most notably, TGT+ improves upon the best competitor ST-Edge by at least one order of magnitude on Facebook and Twitch. We find that the improvement achieved by TGT+ becomes more remarkable as $\epsilon$ decreases. For example, TGT+ is 10.8× (resp. 23.5×) faster than ST-Edge on HepPh (resp. Slashdot) when $\epsilon = 0.005$. In addition, on the Orkut graph with 117 million edges, TGT+ is the only algorithm that finishes under all $\epsilon$ settings, demonstrating its scalability.
To evaluate the effect of the combination in TGT+, we next compare TGT+, TGT, MonteCarlo, and MonteCarlo-C, all of which employ the edge-wise $\tau_{i,j}$ for the sake of fairness. As shown in Figure 2, MonteCarlo and MonteCarlo-C fail to return results within the allowed time limit in most cases. In particular, MonteCarlo can only terminate within 120 hours on Facebook, and on Twitch only when $\epsilon = 0.05$. The running time of MonteCarlo-C is even worse: it is only feasible on Facebook with $\epsilon = 0.02, 0.05$. In contrast, TGT speeds up MonteCarlo and MonteCarlo-C by at least 2 orders of magnitude, demonstrating the superiority of the graph traversal in Section 3.2. However, TGT is still rather costly in comparison to TGT+. Specifically, TGT is only comparable to TGT+ on Facebook and is inferior to TGT+ on the remaining graphs. For instance, TGT costs at least 1 and 2 orders of magnitude more time than TGT+ on Slashdot and Twitch, respectively, demonstrating the effectiveness of integrating deterministic traversal with randomized simulations in TGT+. To explain, the truncated lengths $\tau_{i,j}$ on the remaining graphs (e.g., HepPh and Slashdot) are longer than those on Facebook, substantially increasing the overhead incurred by the graph traversal.

Accuracy.
We next report the tradeoffs between average absolute error (in -axis) and running time (in -axis) in Figure 3.
The results are sorted in ascending order of $\epsilon$, and an error-time curve closer to the lower-left corner indicates better performance. As shown, the overall observation is that TGT+ outperforms all competitors by achieving lower errors with less running time on all graphs. In particular, TGT+ achieves an average absolute error of 1.37E-05 in 532 seconds on Twitch, while the closest solution TGT achieves an average absolute error of 1.56E-05 using over 20,000 seconds (≈ 5.6 hours). Regarding TGT, we observe that, under the same $\epsilon$ setting, its actual absolute error is slightly smaller than that of TGT+. This is expected, since TGT leverages the largest $\tau_i = \max_{v_j \in N(v_i)} \tau_{i,j}$ as the maximal iteration count for $v_i$; in other words, the SC value for edge $e_{i,j}$ is computed with a longer truncation than necessary whenever $\tau_{i,j} < \tau_i$. Furthermore, we notice that the absolute error of MonteCarlo-C is an order of magnitude larger than that of the closest competitor ST-Edge on Facebook. This is because the heuristic settings [34] for the input parameters $c_\ell$ do not guarantee that the returned values are $\epsilon$-approximate.

Preprocessing time. All of the compared randomized methods require certain eigenvalues and eigenvectors in the preprocessing stage. In particular, MonteCarlo and MonteCarlo-C require the second largest eigenvalue, while TGT and TGT+ need the $k = 128$ largest eigenvalues and eigenvectors. Fortunately, by virtue of well-established techniques [33] and tools [19] for large-scale eigendecomposition, we can quickly obtain the desired eigenvalues and eigenvectors. Figure 4 reports the preprocessing time for TGT+ and vanilla MonteCarlo. As expected, the preprocessing time of TGT+ is comparable to that of MonteCarlo. In addition, compared to the running time for AESC displayed in Figure 2, the preprocessing costs are insignificant. For instance, the preprocessing time of TGT+ is about 45 minutes for the Orkut (OK) graph with 117 million edges, whereas the running time is at least 7 hours. Notice that this preprocessing step only needs to be conducted once per graph.

Parameter Analysis
In the last set of experiments, we study the effects of TGT+'s constants: (i) $k$, the number of largest eigenvalues and eigenvectors of $\mathbf{D}^{\frac{1}{2}}\mathbf{P}\mathbf{D}^{-\frac{1}{2}}$ used in Algorithm 1; and (ii) $\beta$, the number of candidates in Algorithm 3. In the sequel, we set $\epsilon = 0.05$ unless otherwise specified.

5.4.1 Varying $k$. Figure 5(a) reports the running time of TGT+ when setting $\beta = 10$ and varying $k \in \{2, 8, 32, 128\}$ on HepPh (HP), Slashdot (SD), and Twitch (TW). As expected, TGT+ costs less running time as more eigenvalues and eigenvectors are exploited. The improvement from $k$ is most remarkable on HepPh, where the running time with $k = 128$ is about 126× faster than with $k = 2$. Besides, the running time of TGT+ achieves about 8× and 17× improvements when varying $k$ from 2 to 128 on SD and TW, respectively.

5.4.2 Varying $\beta$. Figure 5(b) reports the running time of TGT+ when fixing $k = 128$ and picking $\beta \in \{0, 10, 10^2, 10^3, 10^4\}$ for the computation of $\chi$ on HP, SD, and TW. We observe that the running time of TGT+ first decreases and then increases as more candidates are considered. To explain, when $\beta$ is too small, the upper bound $b$ for $|X|$ is too loose, causing more random walks to be generated; when $\beta$ is too large, Algorithm 3 incurs more computational overhead. For example, TGT+ with $\beta = 0$ costs about 2× more time than with $\beta = 10$ on HP and SD. Meanwhile, TGT+ with $\beta = 10{,}000$ costs over 3× more time than with $\beta = 10$ on TW.

ADDITIONAL RELATED WORK
In the sequel, we review existing studies germane to our work.
Spanning centrality. Apart from the methods discussed in Section 2.3, there exist several other techniques for estimating SC (i.e., effective resistance (ER)). Fouss et al. [8] propose to calculate the exact ER values for all pairs of nodes by first computing the Moore-Penrose pseudoinverse $\mathbf{L}^{\dagger}$ of the Laplacian matrix $\mathbf{L} = \mathbf{D} - \mathbf{A}$, and then taking $\mathbf{L}^{\dagger}[i,i] + \mathbf{L}^{\dagger}[j,j] - 2\mathbf{L}^{\dagger}[i,j]$ as the ER of any node pair $v_i, v_j \in V$. Teixeira et al. [40] and Mavroforakis et al. [29] utilize random projections and symmetric diagonally dominant solvers to approximate the SC of all edges. After that, Jambulapati and Sidford [17] aim to compute sketches of $\mathbf{L}$ and its pseudoinverse $\mathbf{L}^{\dagger}$, and propose an algorithm for estimating the ER values of all possible node pairs in $\tilde{O}(n^2/\epsilon)$ time. Besides MonteCarlo and MonteCarlo-C, Peng et al. [34] also propose two solutions by leveraging the connection between ER and the commute time [31]. These works all focus on the $\epsilon$-multiplicative approximation and are beyond the scope of this paper.
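The Fouss et al. route is compact enough to demonstrate directly (an illustrative NumPy sketch on our toy graph, not their implementation): the ER of a pair $(v_i, v_j)$ is $\mathbf{L}^{\dagger}[i,i] + \mathbf{L}^{\dagger}[j,j] - 2\mathbf{L}^{\dagger}[i,j]$, and for an edge this ER equals its spanning centrality.

```python
import numpy as np

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle plus a pendant edge
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A             # Laplacian L = D - A
Lp = np.linalg.pinv(L)                     # Moore-Penrose pseudoinverse

def eff_res(i, j):
    """Effective resistance of the pair (v_i, v_j)."""
    return Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]

print({e: round(eff_res(*e), 6) for e in edges})
```

On this graph the triangle edges get ER 2/3 (one unit resistor in parallel with a two-resistor path) and the bridge gets ER 1, matching the spanning-tree fractions of Definition 2.1. The dense pseudoinverse, of course, is exactly what makes this route impractical at scale.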
Personalized PageRank. Another line of related work is personalized PageRank (PPR). In past decades, the efficient computation of PPR has been extensively studied in a plethora of works [2,7,16,23-28,38,43-46,49]. Among them, some recent approaches [16,23,24,27,28,38,43-46] also leverage the idea of combining a deterministic graph traversal [2,7,26] with random walk simulations. At first glance, it seems that we could simply adapt and extend these techniques for computing $\epsilon$-approximate all-edge SC. However, SC is much more sophisticated than PPR, because the two are defined over inherently different types of random walks. More concretely, PPR relies on the random walk with restart (RWR) [41], which stops at each visited node with a certain probability during the walk. In contrast, SC relies on simple random walks of various fixed lengths (from 1 to ∞), meaning that the walks underlying SC do not terminate as early as RWR does. Motivated by this, a linchpin of this work is a personalized truncation of the maximum random walk length. Correspondingly, the combination of graph traversal and random walks becomes more challenging.

CONCLUSION
In this paper, we propose two approximation algorithms for AESC computation. Our contributions consist of (i) enhanced lower bounds for truncating random walks, (ii) an algorithmic framework integrating deterministic graph traversal with random walk sampling, and (iii) several carefully-designed optimization techniques for increasing efficiency. Our experiments on five real datasets demonstrate that our proposed algorithm significantly outperforms existing solutions in terms of practical efficiency without compromising theoretical or empirical accuracy. In the future, we plan to study AESC computation with relative error guarantees as well as under multithreading environments.

A.1 Proofs
Regarding the time complexity, notice that, by Eq. (14), we ensure that the cost of the deterministic part does not exceed that of the random walk sampling, so the overall time complexity of TGT+ follows. As per Eq. (15), we obtain the stated bound on the number of samples, which completes the proof.

A.2 Scalability Test
Besides the evaluation in Section 5, we also test the scalability of TGT+ on synthetic graphs of varying sizes generated by the Erdős-Rényi random graph model. Specifically, we fix the number of nodes at $10^4$ (resp. the number of edges at $10^6$) and vary the number of edges over $\{0.2, 0.5, 1, 2, 5\} \times 10^6$ (resp. the number of nodes over $\{2, 5, 10, 20, 50\} \times 10^3$). The results are included in Table 4 and Table 5, and show that the running time grows linearly with the number of nodes and edges, confirming the time complexity of TGT+ and demonstrating its scalability.

Figure 1(b) reports the average number of random walks, where the major overhead of MonteCarlo stems from, for estimating each SC. Akin to the observation from Figure 1(a), MonteCarlo with our truncated lengths $\tau_{i,j}$ requires up to 3 orders of magnitude fewer random walks than with Peng et al.'s $\tau$.

Figure 2: Running time of each algorithm by varying $\epsilon$.

Figure 3: Tradeoffs between running time and absolute error.
Table 1 lists the notations that are frequently used in this paper.

Table 1: Frequently used notations.
$\mathcal{G} = (V, E)$   An undirected graph $\mathcal{G}$ with node set $V$ and edge set $E$.
$n$, $m$                 The numbers of nodes and edges in $\mathcal{G}$, respectively.
$\tau_{i,j}$             The truncated length for edge $e_{i,j}$, defined by Eq. (5).
$\epsilon$, $\delta$     The absolute error threshold and the failure probability.
$k$, $\beta$             The numbers of eigenvectors and candidate nodes, respectively.

Table 4: The running time of TGT+ by varying the number of edges.