Effective and Efficient PageRank-based Positioning for Graph Visualization

Graph visualization is a vital component in many real-world applications (e.g., social network analysis, web mining, and bioinformatics) that enables users to unearth crucial insights from complex data. Lying at the core of graph visualization is the node distance measure, which determines how the nodes are placed on the screen. A favorable node distance measure should be informative in reflecting the full structural information between nodes and effective in optimizing visual aesthetics. However, existing node distance measures yield sub-par visualization quality as they fall short of these requirements. Moreover, most existing measures are computationally inefficient, incurring a long response time when visualizing large graphs. To overcome such deficiencies, we propose a new node distance measure, PDist, geared towards graph visualization by exploiting a well-known node proximity measure, personalized PageRank. Moreover, we propose an efficient algorithm Tau-Push for estimating PDist under both single- and multi-level visualization settings. With several carefully-designed techniques, Tau-Push offers non-trivial theoretical guarantees for estimation accuracy and computation complexity. Extensive experiments show that our proposal significantly outperforms 13 state-of-the-art graph visualization solutions on 12 real-world graphs in terms of both efficiency and effectiveness (including aesthetic criteria and user feedback). In particular, our proposal can interactively produce satisfactory visualizations within one second for billion-edge graphs.

A central problem in graph visualization is calculating an effective layout, i.e., the coordinates of the nodes on the screen, which seeks to place closely-related nodes close together and unrelated nodes far apart based on a node distance measure computed from the graph. In most existing solutions, classic node distance measures, including the shortest distance [15,27,28,41,53,68,74] and the direct linkage [1,5,23,25,38,52,57], are widely used. However, such distance measures overlook the high-order structural information of the graph and fail to optimize some critical visual features, leading to sub-par visualizations (e.g., node overlapping and edge distortion). For example, the classic stress method [28] determines node positions from shortest distances, which cannot discriminate among the many node pairs sharing the same hop count, resulting in severe node overlapping.
Another long-standing challenge is visualizing large graphs. When handling large graphs, most existing solutions [1,5,35,52,53,63,68] adopt the multi-level scheme to avoid the poor readability and prohibitive overhead caused by visualizing all nodes in a single-level fashion. Specifically, the multi-level strategy organizes the nodes of the input graph G into a tree H, such that (i) each leaf in H is a node in G, and (ii) each non-leaf node in H, referred to as a supernode, has only a small number of children. The user can navigate through H and visualize any set S of nodes or supernodes that have the same parent. Even with the multi-level scheme, however, existing solutions still struggle to cope with large graphs, as they entail significant overheads to compute the node distances. The reason is that these methods (e.g., [52,53]) require calculating all pairwise distances (e.g., shortest distances) for the leaf nodes in the subtrees of two supernodes V_i and V_j before determining the distance between V_i and V_j; otherwise, the visualization of supernodes would be uninformative in reflecting the underlying graph structure, resulting in poor visualization quality.
To address the aforementioned challenges in effectiveness and efficiency, we propose a new graph-theoretic node distance measure dedicated to enhancing visualization quality, referred to as PDist. PDist takes inspiration from personalized PageRank (PPR) [59], a proximity measure quantifying the connectivity from a source node to a target node via random walks in a graph. Considering the requirements of graph visualization, PDist incorporates degree and symmetry information into PPR, and ameliorates PPR with transformation and truncation. Through such optimizations, PDist circumvents the visual issues (e.g., node overlapping and edge distortion) that may be caused by PPR, while accurately preserving the graph structure (e.g., degree and high-order proximity information). Notably, our analysis shows that PDist offers non-trivial visualization quality guarantees in terms of two widely-used aesthetic criteria.
Unfortunately, computing PDist for supernodes is challenging as it involves a multitude of leaf nodes underlying the supernodes. There is a plethora of approaches for PPR computation in the literature [47,59,80,81], but they focus on single-source or top-k PPR queries rather than the arduous all-pair queries required by PDist. As a result, these approaches are inefficient when adopted for PDist computation due to redundant graph traversal operations and random walk simulations. To this end, we propose Tau-Push, an efficient solution for computing approximate PDist. Compared to existing PPR computation methods, Tau-Push achieves superior time complexity and empirical efficiency, while retaining strong accuracy guarantees. Under the hood, Tau-Push adopts a filter-refinement paradigm accommodating three carefully-designed techniques. First, Tau-Push computes a rough estimation of each PDist by grouped forward graph traversal. Next, Tau-Push identifies a set of failed target nodes T by leveraging the global PageRank of the graph. Finally, Tau-Push refines the PDist estimation of such nodes by performing a handful of graph traversal operations backward from T. Note that PDist and its computation algorithm Tau-Push are not limited to single- and multi-level graph visualizations and could underpin other visualization scenarios as well.

Table 1: Frequently used notations. G = (V, E): a graph G with node set V and edge set E.

We draw each node v_i with a circle by placing the center of the circle on the coordinate of the node, and draw each directed edge using a straight arrowhead line. For undirected edges, we omit the arrowhead to avoid a cluttered display. Thus, ∥X[i] − X[j]∥ is the on-screen distance between nodes v_i and v_j, and the length of edge (v_i, v_j) ∈ E is ∥X[i] − X[j]∥.
Given an input graph G, graph visualization typically consists of two phases: (i) distance matrix computation and (ii) position matrix embedding. In the first phase, a specific node distance matrix D ∈ R^{n×n} is computed, in which D[i, j] reflects the graph-theoretic distance between nodes v_i and v_j. For example, [25,38,57] directly employ the adjacency matrix as D, and [27,28,41] use the all-pair shortest distances as D. In the second phase, the distance matrix D is converted to a position matrix X, such that the on-screen distance ∥X[i] − X[j]∥ of node pair (v_i, v_j) is close to D[i, j] for all node pairs (v_i, v_j) ∈ V × V. In the literature, there exist several optimization techniques that transform the distance matrix into the position matrix, e.g., gradient descent [27], simulated annealing [19], and stress majorization [28].

Multi-level Mechanism
Directly visualizing all nodes in a large graph usually results in a giant hairball with little discernible structural information, due to the sheer numbers of nodes and edges in the layout [29]. Thus, most existing methods [1,5,63,68] and software tools [6,20,66] visualize large graphs in an interactive and multi-level manner, so as to cut down the number of nodes in each drawing. As surveyed in [33,76], multi-level methods consist of two phases: (i) preprocessing and (ii) interactive visualization. In the preprocessing phase, a supergraph hierarchy is constructed such that the nodes of the graph G are organized into a tree H, where (i) each leaf is a node in G, and (ii) each non-leaf node, referred to as a supernode, contains at most b children. For convenience, we say that each leaf node is at level-0, and that each supernode V_i is at level-(ℓ+1) if its children are at level-ℓ (ℓ ≥ 0). In the interactive visualization phase, users can select any supernode S at level-(ℓ+1) and ask for a visualization of the children of S, where the corresponding position matrix X ∈ R^{b×2} is derived following the visualization procedure described in the preceding section. For example, if S consists of leaf children, then multi-level methods visualize the subgraph of G induced by the nodes in S. On the other hand, if S consists of supernode children, then they visualize a high-level graph where (i) each node represents a supernode V_i in S and (ii) each edge connects supernode V_i to supernode V_j if G contains an edge from a leaf node in the subtree of V_i to a leaf node in the subtree of V_j. Notice that the size constraint b is conducive to curtailing visual clutter [22] and can be configured by users according to their needs. Throughout this paper, we refer to visualizing the entire graph in one single layout (resp. multiple levels of layouts) as single-level (resp. multi-level) visualization.
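To make the hierarchy concrete, the tree H described above can be sketched with a minimal data structure; the class and field names below are illustrative assumptions, not taken from the paper:

```python
class Supernode:
    """One node of the supergraph hierarchy H: leaves are graph nodes
    (level 0) and each internal supernode holds a small number of
    children, sitting one level above them."""

    def __init__(self, children=None, leaf_id=None):
        self.children = children or []
        self.leaf_id = leaf_id  # set only for level-0 leaves

    @property
    def level(self):
        # a leaf is at level 0; a supernode is one level above its children
        return 0 if not self.children else 1 + self.children[0].level

    def leaves(self):
        """All underlying graph nodes in this supernode's subtree."""
        if not self.children:
            return [self.leaf_id]
        return [x for c in self.children for x in c.leaves()]
```

Selecting a supernode S for visualization then amounts to drawing `S.children`, while distance computations between two children reach down to their `leaves()` sets.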
Although the supergraph hierarchy can be constructed offline and the time complexity of position matrix embedding is reduced to O(b³) [28] for the children of S, existing multi-level methods still incur high costs for the distance matrix computation of the children in S, especially on graphs with many nodes and edges. A keen reader may propose to directly compute the distances of the children in S by treating S as a stand-alone graph. However, doing so overlooks the underlying structures of S, and thus impairs visualization quality, as pinpointed in [52,53]. Instead, a canonical approach adopted by existing methods [52,53,70] is to measure the distance between two supernodes V_i and V_j based on the average distance between every node pair v_x and v_y, where v_x (resp. v_y) is a leaf node in V_i (resp. V_j). In other words, this approach requires computing b^ℓ × b^ℓ pairs of distances for two level-ℓ supernodes, which poses a challenge in terms of computational efficiency.

Visualization Quality Assessment
Intuitively, a high-quality graph visualization should not only reflect the topological information of the input graph but also have good readability [8,29]. In other words, the position matrix X in the Euclidean space should accurately reflect the structure of G, in the sense that well-connected nodes (resp. poorly-connected nodes) should be placed close to (resp. far apart from) each other. As for good readability, the layout should avoid negative visual artifacts such as nodes overlapping with each other or edges with drastically different lengths. In relation to this, there exist several aesthetic criteria in the literature that quantify the readability of graph layouts. In this paper, we adopt the two most commonly-used aesthetic metrics [10,42,58,61,62,72], i.e., node distribution (ND) and uniform length coefficient variance (ULCV), defined as follows.
Definition 2.1 (ND). For a position matrix X, the node distribution is ND(X) = Σ_{i<j} 1 / ∥X[i] − X[j]∥².

Definition 2.2 (ULCV). For a position matrix X, let σ (resp. μ) be the standard deviation (resp. mean) of the edge lengths. The uniform length coefficient variance is ULCV(X) = σ/μ.
ND is the summation of the reciprocals of the squared distances between node pairs in the graph layout. A large ND score indicates the existence of visual clutter in the layout. In particular, overlapping nodes (occupying the same positions) lead to an infinite ND score. ULCV measures the skewness of edge lengths. A large ULCV indicates that some edges are considerably longer or shorter than others, in which case the layout tends to look distorted.
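Both metrics are straightforward to compute from a layout; the sketch below (function names are our own) follows Definitions 2.1-2.2 and the descriptions above:

```python
import numpy as np

def node_distribution(X):
    """ND: sum of reciprocals of squared pairwise on-screen distances.
    Larger values indicate visual clutter; overlapping nodes (distance
    zero) yield an infinite score."""
    n = len(X)
    nd = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = float(np.sum((X[i] - X[j]) ** 2))
            nd += float("inf") if d2 == 0.0 else 1.0 / d2
    return nd

def ulcv(X, edges):
    """ULCV: coefficient of variation (std / mean) of edge lengths.
    Larger values indicate a more distorted-looking layout."""
    lengths = np.array([np.linalg.norm(X[i] - X[j]) for (i, j) in edges])
    return lengths.std() / lengths.mean()
```

For instance, a layout where all edges have identical length has ULCV equal to 0, and spreading nodes farther apart lowers ND.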

PPR-BASED NODE DISTANCE
This section presents our node distance measure PDist for graph visualization. We first formally define PDist for the case of single-level visualization and then extend it to multi-level visualization.

Single-level PDist
Definition. PDist is formulated based on personalized PageRank (PPR), a node proximity measure defined as follows. Given a directed graph G = (V, E), two nodes v_i, v_j ∈ V, and a restart probability α, the PPR π(v_i, v_j) from v_i to v_j is defined as the probability that a random walk with restart (RWR) [73] originating from v_i would end at v_j. Specifically, an RWR starts from v_i, and at each step, it either (i) terminates at the current node with probability α, or (ii) with the remaining 1 − α probability, navigates to a random out-neighbor of the current node. Intuitively, a large PPR π(v_i, v_j) indicates that numerous paths exist from v_i to v_j; in other words, v_i is well connected to v_j. Based on PPR, we define PDist as follows.
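The RWR semantics above translate directly into a Monte Carlo estimator: simulate many walks and report the fraction that end at the target. The sketch below is illustrative only (it is not an algorithm from the paper):

```python
import random

def ppr_monte_carlo(out_neighbors, s, t, alpha=0.2, walks=20000, seed=1):
    """Estimate pi(s, t): each walk terminates at the current node with
    probability alpha, otherwise moves to a uniformly random
    out-neighbor; the estimate is the fraction of walks ending at t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(walks):
        u = s
        while rng.random() >= alpha:
            nbrs = out_neighbors[u]
            if not nbrs:  # dangling node: the walk ends here
                break
            u = rng.choice(nbrs)
        hits += (u == t)
    return hits / walks
```

On a two-node cycle {0 → 1, 1 → 0} with α = 0.2, the exact value π(0, 0) = α / (1 − (1 − α)²) = 5/9 ≈ 0.556, which the estimator approaches as the number of walks grows.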
The intuition behind PDist is that if node pair (v_i, v_j) has a high PPR value π(v_i, v_j), then its PDist Δ[i, j] should be small. As such, PDist ensures that well-connected nodes are placed closely in a visualization. Moreover, PDist also accounts for the out-degrees of v_i and v_j in its definition to overcome an inherent limitation of PPR for visualization. To explain, we consider the example in Fig. 1, which shows a graph with nodes v_0–v_9, as well as the PPR and PDist values for node pairs (v_0, v_8), (v_2, v_0), and (v_6, v_9). Observe that π(v_2, v_0) = 0.11 < π(v_6, v_9) = 0.44, even though node pairs (v_2, v_0) and (v_6, v_9) are both directly connected via an edge. This indicates that for adjacent node pairs in a graph, their PPR values can vary considerably. As a consequence, directly transforming PPR values into node distances would bring about a large variance in edge lengths in the visualization. To alleviate this issue, in our formulation of PDist, we scale each π(v_i, v_j) by multiplying it with the out-degree of the source node v_i. It has been shown in previous work [86] that such scaling contributes to an accurate measure of the strength of connections between nodes. In Fig. 1, the PDist of (v_2, v_0) is relatively close to that of (v_6, v_9), which is more consistent with the fact that v_2 and v_6 are direct neighbors of v_0 and v_9, respectively. Moreover, the PDist of (v_0, v_8) is large, reflecting the fact that v_0 is far away from v_8 in the input graph.
To make the node distance symmetric, we formulate PDist based on π°(v_i, v_j) + π°(v_j, v_i), since π°(v_i, v_j) ≠ π°(v_j, v_i) in general, where π° denotes the degree-scaled PPR (DPPR). Furthermore, PDist takes the inverse of the natural logarithm of π°(v_i, v_j) + π°(v_j, v_i) and imposes an empirical truncation to narrow down the distance variance, thereby ensuring that the distances between nodes lie in a reasonable range that can be well fitted into a canvas. In particular, the lower and upper bounds of PDist are set to 2 and 2 log n for precluding node overlapping and blank space issues, respectively. The rationale for choosing 2 log n as the upper bound is that the average PPR value between a node pair in a graph with n nodes is 1/n [67,81], and PDist should focus on reflecting the proximity information of node pairs with above-average PPR values (i.e., relatively high relevance).
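Putting the pieces together (degree scaling, symmetrization, logarithmic inversion, truncation to [2, 2 log n]), one plausible reading of the transform can be sketched as follows; the exact clamped form is an assumption, and the precise formula in Definition 3.1 may differ:

```python
import math

def pdist(ppr_ij, ppr_ji, deg_i, deg_j, n):
    """Hypothetical sketch of the PDist transform described in the
    text: symmetrize the degree-scaled PPR (DPPR) values, apply an
    inverse-log transform so that small DPPR means large distance,
    then truncate to [2, 2*log(n)] to avoid node overlap (lower
    bound) and blank space (upper bound)."""
    dppr = deg_i * ppr_ij + deg_j * ppr_ji  # symmetric DPPR
    raw = math.log(1.0 / dppr)              # small DPPR -> large distance
    return min(max(raw, 2.0), 2.0 * math.log(n))
```

Under this reading, strongly connected pairs hit the lower bound of 2, while pairs with DPPR far below the 1/n average saturate at 2 log n.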
Bounds for aesthetic criteria. The following two theorems establish worst-case upper bounds in terms of both ND and ULCV (see Definitions 2.1-2.2) when utilizing PDist for visualization. (We refer interested readers to Appendix A for all proofs.)
Note that the upper bound on the restart probability α in Theorem 3.3 (which is approximately 0.243) is not restrictive, as α is usually set to 0.15 or 0.2 [49,78,79,83].
A case study. To verify the effectiveness of PDist, we use PDist and two classic node distance measures (i.e., the shortest distance and the SimRank-based distance) to visualize the graph FbEgo via the same position matrix embedding algorithm [28]. Note that the SimRank-based distance is obtained by plugging SimRank [39] into Eq. (1). Fig. 2 displays the visualization results of the three distance measures. It can be observed that PDist yields a high-quality visualization result, which organizes the graph into a well-connected cluster and three cliques. In comparison, the widely-used shortest distance measure [15,27,28,53] suffers from severe node overlapping and edge distortion issues. Regarding the visualization derived from SimRank-based distances, the edges of the cliques are distorted, as SimRank assigns high similarity scores to node pairs within the large connected component but similarity scores of 0 to the 2-cliques. Besides the qualitative results, we also report ND and ULCV scores in Fig. 2; PDist considerably outperforms the shortest distance and the SimRank-based distance in terms of both metrics, validating the superior aesthetic performance of PDist as proved in Theorems 3.2 and 3.3.

Multi-level PDist
Definition. Given a user-specified level-(ℓ+1) supernode S, multi-level visualization requires calculating the level-ℓ PDist between the level-ℓ children supernodes of S. Recall from Definition 3.1 that a linchpin of single-level PDist is the DPPR between two nodes. Thus, we extend the DPPR in Definition 3.1 to level-ℓ DPPR (Definition 3.4), and level-ℓ PDist can be derived accordingly by plugging Eq. (2) into Eq. (1).

Definition 3.4 (Level-ℓ DPPR). For two level-ℓ supernodes V_i and V_j, denote the set of leaf nodes in V_i as L(V_i). The level-ℓ DPPR π°(V_i, V_j) of V_j w.r.t. V_i is defined as the average of the DPPR values from the leaf nodes in L(V_i) to those in L(V_j).

Intuitively, the level-ℓ DPPR measures the connectivity from supernode V_i to V_j by taking the average of the DPPR values from the leaf nodes in V_i to those in V_j. The idea of measuring the proximity between two supernodes by summarizing the structure of the underlying leaf nodes is also adopted in [52,70,75]. In particular, [52,70] also consider the average of all pairwise distances of the leaf nodes.
Compared with our level-ℓ DPPR, a simple and straightforward alternative is to treat the level-ℓ children of supernode S as a weighted graph and compute the DPPR on it (called W-DPPR). More precisely, the graph is constructed by treating each supernode V_i ∈ S as a node and merging the edges between supernodes V_i and V_j in S into a weighted edge. Although W-DPPR can be computed very efficiently, as it requires only O(b²) cost, it ignores the micro-structures of each supernode and the paths containing nodes outside S, resulting in sub-par visualization quality. To exemplify, we consider Fig. 3, which shows a graph with nodes v_0–v_5 and a level-2 supernode S with level-1 supernodes V_0, V_1, V_2, as well as the level-ℓ DPPR (ℓ-DPPR in short) and W-DPPR values for supernode pairs (V_1, V_0), (V_2, V_0), and (V_2, V_1). Intuitively, for the source supernode V_2, V_1 has better connectivity to it than V_0, as the children of V_2 and V_0 share one common neighbor v_2, whereas the children of V_2 and V_1 have two common neighbors v_3 and v_4. From the table in Fig. 3, we observe that the W-DPPR values of (V_2, V_0) and (V_2, V_1) are the same, which is counter-intuitive. In contrast, level-ℓ DPPR addresses this issue and accurately captures the structure of the original graph.
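The averaging in Definition 3.4 can be sketched in a few lines, assuming the leaf-level DPPR values are available in a dictionary keyed by leaf pairs (an illustrative interface, not the paper's):

```python
def level_dppr(dppr, leaves_i, leaves_j):
    """Level-l DPPR between supernodes V_i and V_j: the average of the
    leaf-level DPPR values from every leaf of V_i to every leaf of V_j.
    `dppr` maps a (source_leaf, target_leaf) pair to its DPPR value."""
    total = sum(dppr[(u, v)] for u in leaves_i for v in leaves_j)
    return total / (len(leaves_i) * len(leaves_j))
```

This makes the efficiency challenge explicit: a naive evaluation touches |L(V_i)| × |L(V_j)| leaf pairs, which is exactly the cost Tau-Push is designed to avoid.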

Efficiency Challenges
As per Eq. (1) and Eq. (2), the level-ℓ PDist between V_i and V_j requires computing the exact PPR values for all pairs of leaf nodes in L(V_i) × L(V_j), where L(V_i) (resp. L(V_j)) signifies the set of leaf nodes of V_i (resp. V_j). As pointed out in prior work [86], exact PPR computation is prohibitive as it entails summing up an infinite series. An option is to compute a near-exact result by performing power iterations (PI) [59] until the absolute error of PPR is less than 10^{-9} (the precision of float). That said, we still need to invoke PI for each of the O(b^{ℓ+1}) leaf nodes in the specified level-(ℓ+1) supernode S, where each invocation has O(m) time complexity [59]. As a result, the near-exact computation of the level-ℓ PDist matrix incurs a high cost of O(m · b^{ℓ+1}), which may reach O(mn) in the worst case (i.e., for the highest-level supergraph).
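For reference, the power-iteration baseline discussed above can be sketched as follows, for plain single-source PPR (the DPPR variant follows by scaling with the source's out-degree); each iteration costs O(m):

```python
import numpy as np

def ppr_power_iteration(out_neighbors, s, alpha=0.2, tol=1e-9):
    """Near-exact single-source PPR via power iteration:
    pi <- alpha * e_s + (1 - alpha) * pi * P, repeated until the
    largest per-entry update drops below tol (e.g., 1e-9, the
    absolute-error target mentioned in the text)."""
    n = len(out_neighbors)
    pi = np.zeros(n)
    pi[s] = 1.0
    for _ in range(100000):
        new = np.zeros(n)
        new[s] = alpha
        for u in range(n):
            if out_neighbors[u]:
                share = (1.0 - alpha) * pi[u] / len(out_neighbors[u])
                for v in out_neighbors[u]:
                    new[v] += share
        if np.abs(new - pi).max() < tol:
            return new
        pi = new
    return pi
```

Since the update is a contraction with factor 1 − α, the iteration count to reach tolerance tol is O(log(1/tol)), but running it once per leaf node is what makes the all-pair setting expensive.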
To alleviate the above efficiency issue, we resort to computing approximate level-ℓ PDist. Towards this end, we define the concept of (ϵ, δ)-approximate level-ℓ DPPR.

Definition 3.5 ((ϵ, δ)-approximate level-ℓ DPPR). Let ϵ and δ be two constants that bound the deviation of an estimated level-ℓ DPPR from its exact value, for any two supernodes V_i, V_j ∈ S.

The following Lemma 3.6 shows that we can convert an (ϵ, δ)-approximate level-ℓ DPPR into an approximate level-ℓ PDist. Accordingly, we focus on computing (ϵ, δ)-approximate level-ℓ DPPR in the rest of the paper.
To compute (ϵ, δ)-approximate level-ℓ DPPR values w.r.t. a source supernode V_i, a straightforward approach is to utilize existing single-source PPR (SSPPR) approximation methods [4,24,34,67,81,83]. Specifically, the state-of-the-art methods [47,51,67,80,81,83] are built upon Forward-Push [4]. Given a source leaf node v_i, Forward-Push maintains an estimated DPPR π̂°(v_i, v_j) and a residue value r(v_i, v_j) for each node v_j, where r(v_i, v_i) is set to d(v_i), and all estimates π̂°(v_i, v_j) and other residues r(v_i, v_j) are set to 0 initially. Then, Forward-Push starts a deterministic graph traversal from source node v_i, and at each step converts an α portion of the residue of the current node v_j (i.e., r(v_i, v_j)) into its estimated DPPR value π̂°(v_i, v_j) and distributes the remaining 1 − α portion to v_j's out-neighbors evenly. For convenience, we refer to the operation of distributing a fraction of node v_j's residue to one of its out-neighbors as a push operation.
In particular, the following invariant [4] holds during the course of push operations:

π°(v_i, v_j) = π̂°(v_i, v_j) + Σ_{v_k ∈ V} r(v_i, v_k) · π(v_k, v_j),    (3)

where π̂°(v_i, v_j) is an estimation of π°(v_i, v_j) and the second term can be regarded as the estimation error. Intuitively, the exact DPPR is obtained when the residue of each node is eventually depleted, i.e., r(v_i, v_k) = 0 for all v_k ∈ V. In practice, Forward-Push computes approximate DPPR by conducting pushes until every residue value is less than a given threshold r_max^f, referred to as the forward residue threshold. When r_max^f is small, Forward-Push incurs significant computational overhead, as a large number of push operations are required to deplete the residues. To address this issue, most of the state-of-the-art methods [47,51,67,80,81] follow the two-phase paradigm proposed in FORA [81]. In particular, they first invoke Forward-Push [4] with early termination conditions to derive rough approximations of the DPPR values, and then refine the results by exploiting random walk sampling [24] to estimate the error term in Eq. (3). To illustrate, consider the r.h.s. graph in Fig. 4. Instead of always using pushes, FORA-based methods would terminate at nodes v_4–v_7 and simulate random walks from v_4–v_7 as per their residues to probe the far-reaching nodes.
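The push loop described above can be sketched compactly for plain PPR (the text's variant pushes degree-scaled DPPR values, which differs only in the initialization of the source residue); the threshold convention `res > r_max * d(u)` is a common choice and an assumption here:

```python
def forward_push(out_neighbors, s, alpha=0.2, r_max=1e-4):
    """Forward-Push sketch: while some node's residue exceeds
    r_max * d(u), convert an alpha fraction into its estimate and
    spread the remaining (1 - alpha) evenly over its out-neighbors.
    Returns (estimates, residues); by the invariant of Eq. (3), the
    leftover residues bound the remaining estimation error."""
    n = len(out_neighbors)
    est = [0.0] * n
    res = [0.0] * n
    res[s] = 1.0
    queue = [s]
    while queue:
        u = queue.pop()
        d = len(out_neighbors[u])
        if d == 0 or res[u] <= r_max * d:
            continue  # residue small enough (or dangling): skip
        est[u] += alpha * res[u]
        share = (1.0 - alpha) * res[u] / d
        res[u] = 0.0
        for v in out_neighbors[u]:
            res[v] += share
            queue.append(v)
    return est, res
```

Shrinking r_max tightens the guarantee but, as noted above, inflates the number of push operations, which is exactly what FORA's random-walk refinement phase sidesteps.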
Unfortunately, FORA-based solutions still entail considerable computational overheads if applied directly to DPPR approximation, for two reasons. First, although SSPPR approximation solutions can improve the efficiency of single-source DPPR computation, we need O(b^{ℓ+1}) (up to n in the worst case) single-source DPPR queries for the O(b^{ℓ+1}) leaf nodes in the specified level-(ℓ+1) supernode S, which is costly when ℓ is large. Second, most push operations or random walk samples in such methods are futile in the context of DPPR computation. This is because these methods aim to obtain accurate DPPR estimates of all nodes w.r.t. the source, yet the majority of the nodes visited by such operations are not in the supernode S that we are interested in visualizing. Notably, at level-1, roughly n − b of the nodes are located outside S, where b is a small integer (e.g., 25) and n can be up to millions. To illustrate, consider the example in Fig. 5. For simplicity, we assume that supernode S is at level-1 and contains 12 leaf nodes v_0–v_11. As shown in the l.h.s. graph in Fig. 5, Forward-Push iteratively performs push operations until the estimated DPPR values of all nodes, including the 88 nodes v_12–v_99 outside S, satisfy the desired approximation.

TAU-PUSH ALGORITHM

In this section, we present Tau-Push, our efficient algorithm for estimating level-ℓ PDist via computing (ϵ, δ)-approximate level-ℓ DPPR. At a high level, Tau-Push overcomes the limitations of FORA-based methods through (i) a filter-refinement paradigm, which eliminates redundant push operations and random walks without affecting the accuracy guarantee, and (ii) a grouped push strategy, which reduces the number of leaf-level invocations from O(b^{ℓ+1}) to O(b). In what follows, we first illustrate the main idea of Tau-Push in Section 4.1, followed by the algorithmic details of two subroutines of Tau-Push in Section 4.2. We further provide theoretical results on Tau-Push in Section 4.3.

Main Idea and Algorithm
Tau-Push estimates level-ℓ DPPR values by pushing in a bidirectional fashion. In particular, Tau-Push first performs a small number of forward push operations from the source V_i to derive an accurate DPPR for each target node in the vicinity of V_i. Next, it conducts a handful of pushes in the reverse direction along in-neighbors, referred to as backward push operations, from the remaining target nodes in S (i.e., the nodes far from V_i), to obtain their approximate DPPR values. To explain the backward push, we consider the r.h.s. graph in Fig. 5. Given α = 0.1 and node v_10 with initial residue r(v_10, v_10) = 1, we convert a 0.1 fraction of r(v_10, v_10) into its estimated DPPR and propagate the remaining 0.9 to its in-neighbors v_5 and v_6, which receive 0.9/d(v_5) = 0.45 and 0.9/d(v_6) = 0.9, respectively. By combining forward and backward pushes, we avoid the issue of "pushing too deeply" mentioned in Section 3.3, and hence prune excessive pushes/samples for nodes outside S, e.g., nodes v_12–v_99 in Fig. 5. In other words, Tau-Push filters out the majority of nodes whose approximate DPPR values obtained by Forward-Push are already sufficiently accurate, and only refines the DPPR estimation for each remaining node using backward pushes.
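The worked example above corresponds to a single backward push step; the function below (an illustrative sketch, with `d` denoting out-degree) reproduces exactly those numbers for α = 0.1 and in-neighbors v_5, v_6 with out-degrees 2 and 1:

```python
def backward_push_step(in_neighbors, out_degree, res, est, v, alpha=0.1):
    """One backward (reverse) push at node v: an alpha fraction of v's
    residue becomes estimate, and each in-neighbor u receives
    (1 - alpha) * res[v] / d(u), where d(u) is u's out-degree."""
    est[v] = est.get(v, 0.0) + alpha * res[v]
    for u in in_neighbors[v]:
        res[u] = res.get(u, 0.0) + (1.0 - alpha) * res[v] / out_degree[u]
    res[v] = 0.0
```

Because the propagation follows incoming edges, the traversal stays anchored at the target nodes inside S instead of fanning out across the whole graph.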
The linchpin to enabling the aforementioned filter-refinement optimization is the identification of the nodes that require backward pushes. In relation to this, we introduce the notion of degree-normalized PageRank (DPR): for each supernode V_i, its DPR aggregates the degree-normalized global PageRank of the leaf nodes in V_i. Notice that the supernode S considered above is a level-1 supernode with b leaf nodes. For any level-(ℓ+1) supernode S containing O(b^{ℓ+1}) leaf nodes, we would need to invoke Forward-Push O(b^{ℓ+1}) times for the computation of all-pair level-ℓ DPPR values in S, which is rather time-consuming when ℓ is large. To alleviate this issue, we design Group Forward-Push (GFP), an improved and optimized Forward-Push equipped with a grouped push strategy: it starts push operations from all leaf nodes in the level-ℓ supernode simultaneously instead of separately. Analogously, we also propose Group Backward-Push (GBP) for accelerating backward push operations in level-ℓ supernodes.
Algorithm. Algorithm 1 shows the pseudo-code of Tau-Push, which consists of two main phases: GFP and GBP. Specifically, given a supernode S and constant ϵ, Tau-Push starts by setting the DPR threshold τ and the forward residue threshold r_max^f at Lines 1-2. The setting of τ strikes a good balance between the forward and backward phases, leading to an optimized worst-case time complexity (as analyzed later in Section 4.3). Next, for each supernode V_i ∈ S, Tau-Push invokes GFP with residue threshold r_max^f from V_i to derive π̂°(V_i, V_j), a rough estimation of the DPPR for each supernode V_j ∈ S (Lines 3-4). Subsequently, Tau-Push calculates the residue threshold r_max^b used in GBP as per Eq. (6) (Line 5), whose denominator signifies the maximum average degree of the leaf nodes over the supernodes V_j of S. Tau-Push then identifies all supernodes V_j satisfying DPR > τ and utilizes GBP with residue threshold r_max^b to refine the approximate DPPR value of every supernode pair (V_i, V_j) in S (Lines 6-7). Eventually, for each supernode pair V_i, V_j ∈ S, the level-ℓ approximate DPPR is converted to the desired level-ℓ approximate PDist (Lines 8-9).

GFP and GBP
In the following, we elaborate on the algorithmic details of GFP and GBP used in Tau-Push.
GFP. Algorithm 2 shows the pseudo-code of GFP. Akin to Forward-Push, GFP maintains two lists of variables in the course of pushes from source supernode V_i: (i) the estimated DPPR π̂°(V_i, V_j), ∀V_j ∈ S, and (ii) the residue r(V_i, v_t), ∀v_t ∈ V. Initially, all variables are set to 0, except that r(V_i, v_t) is set to 1/|L(V_i)| for each node v_t in the leaf node set L(V_i) of V_i (Lines 1-2). After that, GFP starts an iterative process to traverse G from the nodes with non-zero residues, i.e., the nodes in L(V_i). In particular, in each iteration, it inspects the residues to identify a node v_t whose residue is greater than d(v_t)·r_max^f. If such a node v_t exists, GFP first adds an α fraction of r(V_i, v_t) to π̂°(V_i, V_j), where V_j is the supernode containing v_t (Lines 4-5). Then, it evenly distributes the remaining (1 − α) fraction of the residue to the out-neighbors of v_t (Lines 6-7). Afterwards, GFP resets the residue r(V_i, v_t) to 0 (Line 8) and proceeds to the next iteration. Lemma 4.1 indicates the correctness of Algorithm 2.

Lemma 4.1. Given a source supernode V_i ∈ S and a DPR threshold τ, by setting r_max^f as in Eq. (5), GFP returns an (ϵ, δ)-approximate level-ℓ DPPR π̂°(V_i, V_j) for every V_j ∈ S with DPR at most τ.
GBP. GBP can be regarded as the backward counterpart of GFP. As shown in Algorithm 3, GBP initializes the residue r(v_t, V_j) of each leaf node v_t in target supernode V_j to 1/|L(V_j)| (Line 2). Distinct from Forward-Push, GBP conducts the graph traversal from the target supernode V_j, following the incoming edges of each node. To be more precise, GBP evenly pushes a (1 − α) fraction of the residue r(v_t, V_j) to the in-neighbors of the current node v_t (Lines 6-7), rather than to the out-neighbors as in GFP. In addition, two other minor differences are: (i) the residue threshold r_max^b is defined as in Eq. (6) (Line 3), and (ii) in each iteration, GBP increases the estimated DPPR π̂°(V_i, V_j) by an α fraction of the residue, where V_i is the supernode containing the current node v_t (Lines 4-5). Lemma 4.2 proves the correctness of GBP.

Lemma 4.2. Given a target supernode V_j ∈ S and the threshold r_max^b in Eq. (6), GBP returns the (ϵ, δ)-approximate level-ℓ DPPR π̂°(V_i, V_j) for each source supernode V_i ∈ S with V_i ≠ V_j.
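The grouped strategy can be illustrated with a stripped-down GFP sketch: residues of all source leaves are initialized to 1/|L(V_i)| and pushed in a single traversal, with estimates accumulated per target supernode rather than per node. The thresholding and normalization details of Algorithm 2 are simplified assumptions here:

```python
def group_forward_push(out_nbrs, node2super, source_leaves,
                       alpha=0.2, r_max=1e-5):
    """Simplified GFP sketch: one push process for a whole source
    supernode. `node2super` maps each leaf node to its supernode id;
    estimates are kept at supernode granularity."""
    res = {u: 1.0 / len(source_leaves) for u in source_leaves}
    est = {}  # target supernode -> accumulated estimate
    frontier = list(source_leaves)
    while frontier:
        u = frontier.pop()
        r = res.get(u, 0.0)
        d = len(out_nbrs[u])
        if d == 0 or r <= r_max * d:
            continue  # residue below threshold (or dangling): skip
        sup = node2super[u]
        est[sup] = est.get(sup, 0.0) + alpha * r
        res[u] = 0.0
        for v in out_nbrs[u]:
            res[v] = res.get(v, 0.0) + (1.0 - alpha) * r / d
            frontier.append(v)
    return est, res
```

Compared with running one Forward-Push per leaf, the grouped variant shares a single frontier across all leaves of the source supernode, which is the source of the O(b^{ℓ+1}) → O(b) reduction in invocations.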

Theoretical Results
Correctness. Combining Lemmata 4.1 and 4.2 leads to Theorem 4.3, which establishes the correctness of Tau-Push (Algorithm 1).

Theorem 4.3. For any user-selected supernode S and threshold τ, by setting r_max^f and r_max^b as in Algorithm 1, Algorithm 1 returns (ϵ, δ)-approximate level-ℓ DPPR π̂°(V_i, V_j) for all V_i, V_j ∈ S with V_i ≠ V_j.
Time complexity. Tau-Push achieves an improved worst-case time complexity: directly applying a FORA-based method is slower than Tau-Push by a square-root multiplicative factor that grows with the graph size and ϵ. The expected complexity of Tau-Push for a randomly selected supernode S is even lower.

Tau-Push vs. FORA. We compare Tau-Push with FORA in terms of time complexity and index space for a randomly selected supernode S. As summarized in Table 2, Tau-Push improves the time complexity, where b is the number of supernodes/nodes to be visualized and is usually small, as discussed in Section 2.2. For example, Tau-Push is about four orders of magnitude faster than FORA on the Youtube graph with 1 million nodes and 3 million edges. Besides that, we observe that the index space of FORA is usually 3-5× larger than that of Tau-Push in our experiments.

PPRviz
This section presents PPRviz, our framework for static graph visualization. Fig. 7 illustrates the procedure of PPRviz, which consists of two phases: (i) preprocessing and (ii) interactive visualization. In the preprocessing phase, it first constructs a supergraph hierarchy for the given graph G, and then builds the index for the proposed Tau-Push. It is worth noting that the preprocessing phase is conducted only once for all queries on the given G. In the interactive visualization phase, PPRviz contains two stages, namely PDist matrix computation and position matrix embedding. As the index schema of Tau-Push has been explained in Section 4.3, we focus on illustrating the remaining stages as follows.
• Supergraph hierarchy construction: PPRviz first constructs a supergraph hierarchy of the input graph in the preprocessing step [1,5,68]. Take Fig. 7(a) as an example. The bottom layer is the input graph, whose non-overlapping node partitions are organized as level-1 supernodes in the middle layer. Similarly, the clustering procedure is repeated to generate the top layer. We employ the Louvain algorithm [13] to construct the supergraph hierarchy. Specifically, Louvain first treats all level-ℓ supernodes and their relationships as a new graph and assigns each supernode to a stand-alone partition. After that, it builds the partitions of level-ℓ supernodes (i.e., the supernodes at level-(ℓ+1)) by iteratively moving each supernode to the neighboring partition with the maximum modularity improvement, where modularity [55] measures the density of links inside partitions as compared to links between partitions. Following the size requirement in Section 2.2, we adapt Louvain under the constraint that each supernode S (resp. the coarsest supergraph) should have at most b children (resp. supernodes), and call this adaptation Louvain+.
Note that the cluster-size limit can be configured by users according to their needs.

• Position matrix embedding: this step solves the following optimization problem [28]:

min_X Σ_{i<j} ( ||X[i] − X[j]|| − D[i,j] )² / D[i,j]²,   (7)

where X[i] ∈ R² represents the two-dimensional coordinate of the i-th child node on the screen. Intuitively, it aims to ensure that the Euclidean distance ||X[i] − X[j]|| derived from the position matrix X is close to the PDist D[i,j]. Towards this end, PPRviz employs the standard method for solving Eq. (7), namely the stress majorization technique [28].
The time complexity of this method is cubic in the number of displayed nodes [28], which is insignificant as this number is usually kept small to avoid visual clutter [37].
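As a concrete illustration of the embedding step, the following sketch minimizes the standard stress objective Σ_{i<j} (||X[i] − X[j]|| − D[i,j])² / D[i,j]² by plain gradient descent; the paper uses stress majorization [28], so this simpler optimizer is only an assumed stand-in with the same objective.

```python
import numpy as np

def stress_layout(D, iters=500, lr=0.05, seed=0):
    """Gradient-descent sketch of the stress objective
    sum_{i<j} (||X[i]-X[j]|| - D[i,j])^2 / D[i,j]^2."""
    n = D.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, 2))
    # standard stress weights w_ij = 1 / D_ij^2 (0 on the diagonal)
    W = np.where(D > 0, 1.0 / np.maximum(D, 1e-9) ** 2, 0.0)
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]      # pairwise difference vectors
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)               # avoid division by zero
        coef = W * (dist - D) / dist              # gradient factor per pair
        X -= lr * (coef[:, :, None] * diff).sum(axis=1)
    return X
```

For a 3-node target matrix with all pairwise distances equal to 1, the layout converges to a near-equilateral triangle.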
Applications and future work. The proposed framework PPRviz is not limited to multi-level visualization and is applicable to visualizing static homogeneous graphs in other scenarios. For example, PPRviz can visualize the entire graph, or the subgraph returned by a graph query, in a single-level fashion [11]: PPRviz skips the supergraph hierarchy construction stage and treats all nodes as a single cluster in the visualization phase. Furthermore, PPRviz can be employed for incremental exploration [33], where users move a focal area over the entire graph to explore the subgraph they are interested in. Notice that PPRviz can easily cope with small dynamic graphs (including subgraphs, motifs, etc.) returned by database queries, because the proposed Tau-Push algorithm is highly efficient and responds in real time when a new visualization request arrives.
Regarding large dynamic or attributed graphs, even though existing solutions [82,88] are applicable to the supergraph hierarchy construction and position embedding stages of PPRviz, the proposed PDist and its estimation algorithm Tau-Push do not extend to these scenarios trivially. In particular, under the dynamic setting, it remains unclear how to efficiently estimate PDist with a rigorous theoretical accuracy guarantee. In addition, PPRviz could be extended to visualize attributed or knowledge graphs by treating node/edge attributes as additional nodes [85]; however, such a method requires a new distance measure that considers both topology and attributes. These challenges motivate us to design new approaches in future work.

RELATED WORK
Graph visualization. Conventional single-level graph visualization methods have been extensively studied [29,33,36,76] and can be classified into two main categories: (i) force-directed methods, e.g., FR [25], LinLog [57], ForceAtlas [38], and others [17,23,38,52]; and (ii) stress methods, e.g., CMDS [28], PMDS [15], and others [27,41,53,74]. Force-directed methods model a graph as a force system in which adjacent nodes attract each other and all nodes repulse each other; the position matrix is derived by minimizing the composite forces in the entire system. Stress methods utilize the shortest distance as the node distance to guide node placement: they embed the shortest-distance matrix into a position matrix via optimization techniques, e.g., gradient descent [27] and stress majorization [28]. To boost efficiency, many optimizations have been incorporated, e.g., grid-based partitioning [25,52], quad-trees [38,57], and dimensionality reduction [15]. Moreover, as surveyed by [30], graph embedding methods, e.g., GFactor [3], SDNE [77], LapEig [9], LLE [64], and Node2vec [31], can also be applied to visualization by treating a two-dimensional embedding matrix as the position matrix [30,60,71]. Multi-level methods [1,5,35,52,53,63,68] mainly focus on designing graph clustering algorithms and directly employ single-level methods to lay out each cluster. For example, OpenOrd [52] clusters nodes based on their Euclidean distances in the graph layout and uses FR for visualization; KDraw [53] applies label propagation for clustering and uses [27] for visualization; GrouseFlocks [5] groups nodes based on certain graph structures. Compared with our PDist distance matrix, the distance measures employed in existing methods fail to preserve topological information comprehensively and hence dampen visualization quality. Specifically, force-directed methods only consider direct links in the graph, while stress methods consider the shortest path from the source node to the target node but ignore other intricate paths. A related work [17] uses the force-directed method for visualization after determining the edge weights by PPR; however, this method is still inherently force-directed and hence suffers from the corresponding limitations.
PPR computation. The efficient computation of PPR has been extensively studied [4,24,34,40,47,49,50,67,69,78-81,83,86,87]. Among these works, BEAR [69] and BePI [40] improve the power iteration method [59] and achieve high efficiency by indexing several large matrices; however, the index space limits their applicability to large graphs. BiPPR [49] and HubPPR [79] combine random walks [24] with Backward-Push [50] and are subsequently improved by FORA [81]. However, FORA and its improved variants [34,47,80,83] suffer from numerous push operations and the ineffective-sample problem illustrated in Section 3.2. This is because our level-ℓ PDist computation targets the aggregated PPR between two clusters, which is essentially different from PPR computation for a single source node. Another line of work computes PPR using the idea of particle filtering [26], which performs a deterministic graph traversal from a set of source nodes. Nevertheless, it is non-trivial to determine the initial particle distribution for level-ℓ DPPR, and this approach offers no quality guarantees. In contrast, Tau-Push significantly outperforms existing visualization and PPR solutions in efficiency while providing quality guarantees.
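For reference, the power-iteration baseline [59] that BEAR and BePI accelerate can be sketched in a few lines; the adjacency-list layout and the dangling-node convention below are our own assumptions.

```python
import numpy as np

def ppr_power_iteration(adj, s, alpha=0.2, iters=100):
    """Personalized PageRank by power iteration:
    pi <- alpha * e_s + (1 - alpha) * (transition)^T pi.
    `adj` is a list of out-neighbor lists; `alpha` is the restart
    probability. Dangling mass restarts at s (a common convention)."""
    n = len(adj)
    e_s = np.zeros(n)
    e_s[s] = 1.0
    pi = e_s.copy()
    for _ in range(iters):
        nxt = np.zeros(n)
        for u in range(n):
            if adj[u]:
                share = pi[u] / len(adj[u])   # spread mass over out-neighbors
                for v in adj[u]:
                    nxt[v] += share
            else:
                nxt[s] += pi[u]               # dangling node: back to source
        pi = alpha * e_s + (1 - alpha) * nxt
    return pi
```

On a 2-node cycle with alpha = 0.2, the stationary values are 5/9 for the source and 4/9 for the other node.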

EXPERIMENT EVALUATION
We introduce the experiment settings as follows. We compare PPRviz against 13 state-of-the-art visualization solutions, including the multi-level methods OpenOrd [52] and KDraw [53]. We follow the parameter settings of all competitors as recommended in their respective papers. For PPRviz, we set the maximum number of nodes in a cluster to 25, as suggested in [37]. For a fair comparison, we follow [5,63] and modify OpenOrd and KDraw such that only the partial view of the clusters in the zoom-in path is visualized. Since OpenOrd does not allow a cluster-size constraint and KDraw employs a complicated method to determine cluster sizes, we cap the maximum number of supernodes in the coarsest supergraph (instead of all levels) for these two methods. We observe that PPRviz usually shows more nodes than OpenOrd and KDraw in a visualization, so the efficiency of PPRviz is not due to processing fewer nodes. Additionally, we compare the proposed Tau-Push against 4 PPR approximation solutions: PI [59], FORA [81], FORA+ [80], and ResAcc [47]. For all methods, we set the success probability to 1 − 1/n and the error threshold to 1/(10n) by default. For the FORA-based competitors FORA, FORA+, and ResAcc, we initialize the residue of each source node to its DPR value and follow the other parameter settings suggested in the original papers, under which the correctness of PDist estimation is still satisfied. The experiment source code and the implementation of PPRviz are available at https://github.com/jeremyzhangsq/PPRviz-reproducibility/.
Datasets and performance metrics. We use the 12 real-world graphs in Table 3 for the experiments. We generate visualizations in a single-level fashion on the 6 smaller graphs and use ND and ULCV to evaluate the visualization quality of PPRviz and the competitors. For a fair comparison, we follow NetworkX [32] and normalize each layout to the same scale. The 6 larger graphs are used to evaluate visualization efficiency, on which we report the response time and the total preprocessing time. For the single-level methods, the response time is the time to visualize the entire graph. For PPRviz and the multi-level methods, the response time is the average visualization time for the children of each supernode over 100 random zoom-in paths. Each path starts with the supergraph at the highest level (corresponding to the entire graph) and randomly selects a supernode at each level until reaching level-0 (i.e., the original graph), simulating interactive exploration. The preprocessing time is the time taken before visualization. We terminate a method if its response time (resp. preprocessing time) exceeds 1000 seconds (resp. 12 hours). All experiments are conducted on a Linux machine with an Intel Xeon(R) Gold 6240 @ 2.60GHz CPU and 80GB RAM in single-thread mode.
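The random zoom-in protocol described above can be sketched as follows; the `children` dictionary layout and the `top` label are illustrative assumptions.

```python
import random

def random_zoom_in_path(children, top, rng=None):
    """Sketch of the multi-level evaluation protocol: `children`
    maps each supernode to its child supernodes (level-0 nodes have
    no entry); starting from the coarsest supernode `top`, a random
    child is selected at each level until a level-0 node is reached."""
    rng = rng or random.Random(0)
    path = [top]
    node = top
    while node in children:              # descend until a leaf node
        node = rng.choice(children[node])
        path.append(node)
    return path
```

For a two-level hierarchy, a path visits the top supernode, one level-1 supernode, and one level-0 node.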
Note that the memory utilized by PPRviz is comparable to that required to store the input graph.

Visualization Quality
We evaluate the visualization quality of PPRviz and the competitors quantitatively via two popular aesthetic metrics, and qualitatively via a user study and a case study on 6 smaller graphs.

Aesthetic Metrics.
We report the ND and ULCV scores of all approaches in Table 4 and Table 5, respectively. Since OpenOrd applies FR to visualize each supergraph, we treat them as one method and report their results in one column. The results of KDraw are omitted as it fails to process graphs with multiple connected components. We use "-" and ∞ to indicate undefined and infinite scores, respectively. Table 4 shows that PPRviz consistently outperforms the competitors in ND on all graphs except TwEgo and FbEgo, where PPRviz is comparable to the best method FR. Note that FR achieves the smallest ND scores on TwEgo and FbEgo because it places the nodes of the largest connected component far apart from each other, which causes the edge distortion issue. In particular, the ND scores of ForceAtlas, FR, and LinLog are 5, 2, and 3 orders of magnitude larger than PPRviz on SciNet, respectively. Regarding stress methods, we find that CMDS and PMDS yield infinite ND scores, indicating severe node overlapping. This is because they use the shortest distance between two nodes as the node distance, which takes only a few discrete values and thus fails to distinguish different node pairs. Furthermore, PMDS computes the position of a non-pivot node as a weighted combination of its connected pivot nodes; thus, degree-one non-pivot nodes connected to the same pivot share the same position. The graph embedding methods, especially SDNE and LapEig, have worse ND scores than PPRviz and the other competitors, as embedding-based methods are specially designed for machine learning tasks without considering visualization quality. From Table 5, we can see that PPRviz always performs the best in ULCV.

To answer the first question of the user study, we generate 6 groups of visualizations, each obtained on one of the 6 small graphs using PPRviz and 4 representative competitors (FR, CMDS, Node2vec, and SimRank) that achieve relatively good performance in Section 7.2.1. In each group, the 5 visualizations generated by different methods are randomly shuffled before being presented to the participants. Following the assessment paradigm in graph drawing [21,45,84], the participants are asked to rank the visualizations in terms of readability and cluster structure, which are identified as two paramount tasks by [8,29] and described as follows.
• Task 1 (T1): rank the 5 visualizations in each group from the highest (1) to the lowest (5) readability, where high readability means that the graph elements are well displayed.
• Task 2 (T2): rank the 5 visualizations in each group from the best (1) to the worst (5) structure layout, where a good layout means that clusters and strongly connected nodes can be clearly observed.
To answer the second question, we compare the visualizations generated by employing Tau-Push in PPRviz with those generated by employing the near-exact solution PI. Since PI does not scale to large graphs, we consider two small graphs: FilmTrust and SciNet. For both methods, we vary the cluster-size limit in {15, 20, 25}. For each supergraph, we generate 2 visualizations, one with Tau-Push and one with PI, and organize them as a group. The visualizations in each group are displayed to the participants without revealing the methods, and the participants are asked to perform the following task.
• Task 3 (T3): for the 2 visualizations in a group, select the one with superior visualization quality.
Note that for each task, we collected 180 instances (30 participants × 6 groups). Fig. 8(a) (resp. Fig. 8(b)) reports the ranking frequency of the visualization generated by each approach in T1 (resp. T2). In particular, the rankings of PPRviz concentrate on top-1 and top-2 for both tasks. This result demonstrates PPRviz's sound quality in terms of readability and cluster inspection, consistent with the ND and ULCV results reported in Section 7.2.1. Regarding T3, Table 6 shows that the participants cannot tell the quality difference between the visualizations generated by Tau-Push and PI in most cases, and the numbers of times the two methods are selected as the best are comparable, demonstrating that the approximate level-ℓ PDist computation in PPRviz does not affect its visualization quality.

Case Study.
Lastly, we conduct a case study to compare the visualizations of PPRviz and the competitors. In Fig. 9, we report the results of PPRviz and all competitors on TwEgo. Besides that, we compare the results of PPRviz and the best competitor FR on two larger graphs, Wiki-ii and FilmTrust, shown in Fig. 10 and Fig. 11, respectively. We find that the visual comparison is consistent with the metric results in Section 7.2.1. More concretely, PPRviz yields a high-quality visualization that clearly organizes the graph into a well-connected cluster and several cliques. In contrast, the competitors suffer from issues such as node overlapping and edge distortion. Furthermore, as shown in Fig. 9(d) and Fig. 11(b), the structures of small clusters produced by the best competitor FR are hard to recognize. This is because nodes in small clusters are more likely to be affected by the repulsive force from the large cluster, making them huddle together.

Visualization Efficiency
Response time. Fig. 12 shows the response time on the 6 larger graphs: Amazon, Youtube, Orkut, DBLP, It-2004, and Twitter. We only report the results of PPRviz, OpenOrd, KDraw, PMDS, and Node2vec, and omit the remaining competitors as they fail to terminate within 1000 seconds on all datasets. PPRviz consistently outperforms all competitors in response time; specifically, it outputs the visualization within 1 second on all datasets. For example, PPRviz takes 0.63 seconds on the Twitter graph with 3 billion edges, while all competing methods take more than 1000 seconds. Regarding the competitors, PMDS only computes the positions of several pivot nodes and determines the positions of the other nodes by linear combinations of the pivots; even so, it takes 45 seconds to visualize the entire Amazon graph. In addition, Node2vec incurs costly random walk simulations, rendering it rather inefficient on large graphs: it returns the visualization for Amazon only after 845 seconds. The two multi-level methods, KDraw and OpenOrd, are comparable to PPRviz on Amazon, the smallest of the 6 graphs, but turn out to be markedly inferior as the graph size increases. The reason is that KDraw must compute the forces between the leaf nodes inside a supernode, which is rather costly for high-level supergraphs, and OpenOrd takes many iterations to determine the layout in its five-stage design.
Preprocessing time. Fig. 13 plots the preprocessing time of the multi-level methods, i.e., PPRviz, OpenOrd, and KDraw, as the single-level methods have no preprocessing phase. All three methods conduct hierarchical clustering on the input graph; PPRviz additionally computes the DPR vector and some single-target PDist scores. The results show that the preprocessing time of PPRviz is two to three orders of magnitude shorter than OpenOrd's and one order of magnitude shorter than KDraw's. Moreover, neither OpenOrd nor KDraw can finish preprocessing the largest Twitter graph within 12 hours, whereas PPRviz takes only 33 minutes. This is because OpenOrd computes the layout of the entire graph first and then conducts hierarchical clustering on the two-dimensional layout, and KDraw's clustering algorithm [54] is more expensive than the Louvain+ used in PPRviz.
Vary cluster size. PPRviz remains efficient even with a cluster-size limit of 100, above which the intra-structure of a supernode becomes dense and overburdens human perception [37].

Tau-Push Performance
Comparing with other solutions. To validate the efficiency of the Tau-Push algorithm for PDist computation, we replace Tau-Push with the 4 alternative solutions mentioned in Section 7.1 and compare PPRviz against these variants in terms of response time, preprocessing time, and index size on the 4 largest graphs: Youtube, Orkut, It-2004, and Twitter. For the response time, we use "-" to indicate that an approach fails to terminate within 1000 seconds. Table 8 shows that all PPRviz variants incur more than 1000 seconds on all tested graphs, because they need to compute PDist from all leaf nodes of the top-level supergraph.

Ablation study. In this set of experiments, we verify the effectiveness of the three techniques in Tau-Push, i.e., the grouped push strategy, the DPR-guided termination trick, and the filter-refinement optimization delineated in Section 4.1. First, to demonstrate the effectiveness of the grouped push strategy, we replace Forward-Push in FORA with GFP and call this variant GFRA. As shown in Table 8, the response time of GFRA is at least four orders of magnitude shorter than FORA's, thanks to the grouped push strategy of our Tau-Push.
Note that if we set the per-source push threshold in Eq. (5) to its maximum over all source supernodes in S, all values returned by GFP are (ε, δ)-approximate level-ℓ DPPR. We call this variant GFP(τ) and compare GFRA against it. We find in Table 8 that GFP(τ) improves over GFRA on the Orkut, It-2004, and Twitter graphs, which have massive edge sets, with 3×, 2.3×, and 4× speedups, respectively. Furthermore, as shown in Table 10, GFP(τ) incurs much less space overhead for index storage. The reason is that, on such graphs, GFRA requires a multitude

A APPENDIX

A.1 Algorithmic Details
Louvain+. To construct the supergraph hierarchy, one solution is to directly apply a multilevel community detection algorithm such as Louvain [13], which merges well-connected nodes based on modularity optimization [55]. However, Louvain has two defects for visualization: (i) the number of communities (supernodes) at the highest level is too large, since at some point no further merge increases the modularity, which causes visual clutter; and (ii) the community sizes are imbalanced. Specifically, low-level supernodes tend to contain many children, which leads to visual clutter, while high-level supernodes usually contain only a few children and thus provide very limited structural information about the graph. To fix these issues, we extend Louvain to Louvain+. We ignore edge directions in the raw graph and take the undirected graph as the input for community detection. To generate a level-(ℓ+1) supergraph, the clustering strategy is to either (i) directly merge supernode S into its neighboring supernode T if T is the only neighbor, or (ii) merge S into the neighboring supernode T with the largest modularity gain ΔQ(S,T), provided the size of T after this merge respects the cluster-size limit. The modularity gain ΔQ(S,T) after merging level-(ℓ+1) supernodes S and T is defined as in [13,55].

Following the time complexity analysis of vanilla FORA, we can bound the cost for GFRA to approximate the all-pair level-ℓ DPPR in S. For the setting of τ, we balance the complexity of the push phase from all source nodes V_i ∈ S with that of the random walk phase.

Plugging Eq. (11) into the above inequality gives ND(X) ≤ 0.215·γ + 0.01752. Based on the range of PDist, the edge lengths are bounded, so the relative-error part of the (ε, δ)-approximate PDist holds. Akin to the above analysis, the setting of τ can be derived.

Proof of Lemma 4.1. We first need a crucial property of Algorithm 2, stated in Lemma A.2.
Lemma A.2. By initializing the residue values as in Line 2, Algorithm 2 satisfies the stated invariant. Based on Lemma A.2, for any supernodes V_i, V_j ∈ S, the approximation error is bounded accordingly.

As GFP stops when each residue value drops below the per-source threshold, the error bound becomes (ε, δ)-approximate. Based on Eq. (4), for a target V_j ∈ S, the correctness of GFP is guaranteed by the lower bound and the residue value, which is first initialized from the source and then distributed to its neighbors. According to the proof in [4], the stated equation holds for each node visited in the graph traversal. Summing over all leaf nodes of V_i and applying Eq. (2), the DPPR between supernodes V_i and V_j satisfies the claimed bound.
Proof of Lemma 4.2. Similar to the invariant property of conventional Backward-Push [50], GBP has the following property.

Proposition 1. GBP satisfies the stated invariant. Based on Eq. (13), and since GBP stops when every residue is at most the backward threshold, the error term on the right-hand side is upper-bounded accordingly, and for a source supernode V_i the approximation quality is met. For any sampled node, let a Bernoulli variable take value 1 if the random walk starting from it stops at any leaf node of L(V_j). Recalling that each node is sampled with probability proportional to its residue and that the estimate is refined by the sampled walks, which is exactly the second term of Lemma A.2, the result is an unbiased estimator of the level-ℓ DPPR. Finally, applying the Chernoff bound [16], we can derive the claimed probability bounds for both large and small DPPR values.

A.4 Complexity Analysis and Constant Setting
For the following analysis of the time complexity and the setting of τ, we focus on the level-0 children of a level-1 supernode S, where the worst-case complexities of GFP and GBP are attained. Take GFP as an example: as the level increases, each supernode contains more leaf nodes, and its average DPR tends to 1/n (i.e., the average DPR of the entire graph). Thus, the largest DPR values, and hence the worst time complexity, occur at level-0.
Time for the worst case. The threshold τ determines how quickly GFP terminates and how many GBP invocations are performed. A small τ yields a low cost for GFP and a high cost for GBP, and vice versa. Therefore, the appropriate setting of τ should balance the workloads of GFP and GBP in Tau-Push. First, note that the complexity of GFP for a given leaf source in S depends on the source's out-degree [4], so the worst case occurs when the source out-degree is the largest. Assume that there is only one largest out-degree in the scale-free network. Following the proof by [18], the largest out-degree is

. Since the degree and the DPR follow the same power law [48], the probability that a randomly selected node has an out-degree exceeding the maximum is bounded, and, following the proof by [18], the corresponding term is O(1) for scale-free networks. Therefore, the total time complexity is the sum of the two push phases. For the single-source approximation in GFP, we set δ = O(1/k) [67], as nodes in the same supernode have good connectivity (i.e., large DPPR values) and we focus on the top-O(k) DPPR values for a source. However, this setting is not suitable for the single-target approximation in GBP, because the source nodes and their degrees are unknown for a given target node. Hence, we set δ = O(τ/k) for GBP, as justified empirically.

A.5 More Experiments

More metric results. In this part, we evaluate the performance of PPRviz and the competitors in terms of another aesthetic metric: angular resolution (AR). We omit the formal definition of AR and refer interested readers to [72]. Roughly speaking, AR measures the angles between adjacent edges, where a smaller angle leads to a larger AR value. Intuitively, larger angles between adjacent edges are friendlier to user perception; therefore, a lower AR score indicates better quality. We report the AR scores of all methods in

Figure 4 :
Figure 4: An example of Forward-Push.
We first store the n DPR values as the index; they can be efficiently precomputed in the same way as the global PageRank by initializing the i-th entry of the starting vector proportionally to the degree d(v_i). Notice that GBP is only conducted from targets V_t whose DPR exceeds the threshold τ and is independent of the query supernode S; hence, the index space for GBP is proportional to the number of such targets times the number of source supernodes in S. Overall, the index space of Tau-Push comprises the n DPR values plus the precomputed GBP results.

Figure 8 :
Figure 8: Results of T1 and T2, frequency of selected ranking.

Figure 9 :
Figure 9: Visualization results and aesthetic metrics for the TwEgo graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.

Figure 10 :
Figure 10: Visualization of PPRviz and FR on Wiki-ii.

Figure 12 :
Figure 12: Response time of selected methods (★ indicates failure to finish preprocessing within 12 hours).

n^{1/(b−1)}, where b ∈ [2,3] is the exponent of the degree distribution and b = 2 on the Twitter graph. Hence, the worst-case time complexity of Lines 1-2 of Algorithm 1 follows. Next, for a leaf target, the worst-case time complexity of GBP is given by [78] and can be simplified using Eq. (4); the cost of GBP can then be rewritten accordingly. Since there are at most O(1/τ) target nodes with average DPR larger than τ, the time complexity of Lines 3-4 of Algorithm 1 is at most 1/τ times the per-target cost of GBP.

τ setting. Based on the aforementioned analysis, the worst-case time complexity of Tau-Push is the sum of the GFP and GBP costs

Recall that τ is set to 1/√n. Hence, given a random query supernode S, Tau-Push conducts only GFP if E[π] ≤ 1/√n. Note that the cost of GFP from a supernode V_i ∈ S is proportional to its degree sum [4], with the thresholds set as in Line 1 of Algorithm 2 and Lemma 4.1. Therefore, by plugging E[π] into this cost, we can derive the complexity of Tau-Push for a randomly selected supernode S. For ease of presentation, we simplify the result by noting that the average degree of scale-free networks is O(log n), that k is the number of supernodes in S, and that δ = O(1/k) [67]. Hence, the complexity is massaged into O(k³ · (log n)²/ε²).

Figure 15 :
Figure 15: Visualization results for the FbEgo graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.

Figure 16 :
Figure 16: Visualization results for the Wiki-ii graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.

Figure 17 :
Figure 17: Visualization results for the Physician graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.

Figure 18 :
Figure 18: Visualization results for the FilmTrust graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.

Figure 19 :
Figure 19: Visualization results for the SciNet graph: force-directed methods are marked with ♦; stress methods are marked with ▲; graph embedding methods are marked with ★.
Figure 1: PPR and PDist in a toy graph.

Definition 3.1 (PDist). Let D ∈ R^{n×n} be the PDist matrix for all node pairs in a graph G, and let D[i,j] be the PDist between nodes v_i and v_j, defined via the personalized PageRank between them.

Given the PDist matrix D of a graph G, suppose that ||X[i] − X[j]|| = D[i,j] for all v_i, v_j ∈ V; then ND(X) ≤ 0.215·γ + 0.01752, where γ is the Euler constant. Recall that the DPPR of v_j w.r.t. v_i is the degree-normalized PPR, and that L(V_t) signifies the set of leaf nodes in V_t. In particular, when V_t is a leaf node v_t (i.e., L(V_t) = {v_t}), its DPR is the summation of the DPPR values pertinent to v_t, which indicates the global importance of v_t in the entire graph.

For example, the right-hand-side graph in Fig. 5 shows that, given τ = 0.05, nodes v_1-v_8 and v_11 can be filtered out as their DPR values are less than τ; hence, we only need to conduct backward pushes from v_9 and v_10. We observe that the DPR values of real-world graphs often follow a power-law distribution, where only a few nodes have large values and the rest have nominal values. For instance, as shown in Fig. 6, the DPR values of over 99.98% of the nodes of the well-known Youtube graph are less than 0.001, while the largest DPR is about 0.005. Hence, by checking DPR values against a properly chosen τ, we can exclude the great majority of nodes in S from further refinement, thereby reducing the computational cost. It is worth mentioning that the DPR of each leaf node can be efficiently calculated in the preprocessing stage and used as input to Tau-Push. Building on our theoretical analysis, if the residue threshold used in Forward-Push is set properly, the approximate DPPR values of nodes whose DPR is less than the given threshold τ are still guaranteed to be (ε, δ)-approximate (see Lemma 4.1).
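A hedged sketch of this filter step: compute a PageRank-style vector started from the degree distribution (our reading of the paper's DPR precomputation), then keep only the nodes whose score exceeds τ for the refinement phase; the function name, restart probability, and data layout are assumptions.

```python
import numpy as np

def dpr_filter(adj, tau, alpha=0.2, iters=50):
    """Compute a PageRank-style importance vector whose starting
    distribution is proportional to node degrees, then keep only
    nodes whose score exceeds tau. `adj` is a list of neighbor
    lists for an undirected graph; alpha is an assumed restart
    probability."""
    n = len(adj)
    deg = np.array([len(nbrs) for nbrs in adj], dtype=float)
    start = deg / deg.sum()                   # degree-proportional restart
    pi = start.copy()
    for _ in range(iters):
        nxt = np.zeros(n)
        for u in range(n):
            if deg[u]:
                share = pi[u] / deg[u]        # spread mass over neighbors
                for v in adj[u]:
                    nxt[v] += share
        pi = alpha * start + (1 - alpha) * nxt
    kept = [u for u in range(n) if pi[u] > tau]
    return kept, pi
```

On a 10-node star, only the hub survives a threshold of 0.3, so the refinement (backward pushes) would run from the hub alone.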

Table 2 :
Index space and time complexity of methods for approximate level-ℓ PDist computation.

Table 4 :
ND of PPRviz and the baselines, the best in bold and the second best in italic, ∞ indicates infinity.

Table 5 :
ULCV of PPRviz and the baselines, the best in bold and the second best in italic, "-" indicates undefined.

Table 6 :
Results of T3, frequency of being selected.

User Study. In the second set of experiments, we conduct a user study to evaluate the visualization quality of PPRviz. This study aims to answer two questions: (i) does PPRviz output better visualization results than the competitors, and (ii) does the approximate level-ℓ PDist computation in PPRviz affect visualization quality from the perspective of human observers? We recruited 30 participants (6 female and 24 male), among whom 28 are aged 20 to 30 and 2 are aged 30 to 40.
For instance, on SciNet, PPRviz is 14× better than LinLog. The superior performance of PPRviz is attributed to our carefully designed PDist.

7.2.2 Vary Cluster Size. Table 7 reports the preprocessing time and response time of PPRviz when varying the cluster-size limit on the largest test graph, Twitter. We exclude KDraw and OpenOrd from this experiment because configuring the cluster size is difficult for them, as discussed in Section 7.1. The preprocessing time of PPRviz drops as the limit increases, because Louvain+ organizes more nodes into each supernode, resulting in fewer supernodes in each level-ℓ supergraph and fewer levels in the supergraph hierarchy. For interactive visualization, the response time of PPRviz increases with the limit, mainly because more pairwise PDist values are computed when each visualization contains more nodes. Furthermore, the response time of PPRviz is only 2.10 seconds

Table 7 :
Time of PPRviz on Twitter by varying the cluster-size limit (in seconds).

Table 9 ,
all methods achieve comparable preprocessing time, since the supergraph hierarchy construction dominates the preprocessing cost. The two PPRviz variants using PI and ResAcc have slightly shorter preprocessing time as they only need to construct the supergraph hierarchies; for the same reason, they require much less space to store the indices, as shown in Table 10. In contrast, FORA (resp. Tau-Push) precomputes random walk samples (resp. DPR values and GBP results) as indices. Compared with FORA, our Tau-Push costs less index space because it eliminates the need to store random walks by precomputing DPR values to guide the termination of push operations. Specifically, besides the supernode partitions, PPRviz only requires storing the n DPR values and the precomputed GBP results.

Table 8 :
Response time of PPRviz variants (in seconds).

Proof of Lemma 3.6. We call the approximate PDist mentioned in Lemma 3.6 (ε, δ)-approximate PDist. We first focus on the relative-error part of Lemma 3.6. To ensure that the estimated PDist deviates from D[i,j] by at most ε · D[i,j], a corresponding approximation guarantee on the underlying DPPR terms must hold. Hence, correctness is guaranteed for all source supernodes V_i ∈ S\V_t by setting each per-source threshold proportionally to the maximum over the remaining sources.

Proof of Theorem A.1. Recall that the input values of the sampling phase satisfy the invariant of Lemma A.2.

Table 11 :
AR of PPRviz and the competitors. A smaller value indicates better visualization quality; the best is in bold and the second best in italic, and "-" indicates undefined.

Empirically, the average DPR over a random level-1 supernode S is comparable to τ. With these configurations, by setting τ = 1/√n, the worst-case time complexity of Tau-Push in Algorithm 1 is minimized; employing GFP alone would incur a strictly larger worst-case complexity.

Time for a random supernode S. In scale-free networks, both the DPR and DPPR values follow the power law [48,49,67]; hence, the i-th largest DPR value is proportional to 1/(i·log n) [67]. As there are n/k supernodes at level-1 and E[π] is upper-bounded by the average of the n/k largest DPR values, we have

As illustrated in Table 11, PPRviz achieves the best or second-best AR scores on all test graphs except Physician.

More visualization results. In Figures 15 to 19, we report the visualization results of PPRviz and the other competitors on the FbEgo, Wiki-ii, Physician, FilmTrust, and SciNet graphs. Specifically, we enlarge the results of PPRviz and a promising competitor, ForceAtlas, for ease of comparison.