PaGE-Link: Path-based Graph Neural Network Explanation for Heterogeneous Link Prediction

Transparency and accountability have become major concerns for black-box machine learning (ML) models. Proper explanations for model behavior increase model transparency and help researchers develop more accountable models. Graph neural networks (GNNs) have recently shown superior performance to traditional methods on many graph ML problems, and explaining them has attracted increasing interest. However, GNN explanation for link prediction (LP) is lacking in the literature. LP is an essential GNN task that corresponds to Web applications like recommendation and sponsored search. Given that existing GNN explanation methods only address node/graph-level tasks, we propose Path-based GNN Explanation for heterogeneous Link prediction (PaGE-Link), which generates explanations with connection interpretability, enjoys model scalability, and handles graph heterogeneity. Qualitatively, PaGE-Link generates explanations as paths connecting a node pair, which naturally capture the connections between the two nodes and transfer easily to human-interpretable explanations. Quantitatively, explanations generated by PaGE-Link improve AUC for recommendation on citation and user-item graphs by 9-35% and are chosen as better by 78.79% of responses in a human evaluation.


INTRODUCTION
Transparency and accountability are significant concerns when researchers advance black-box machine learning (ML) models [19,35]. Good explanations of model behavior improve model transparency. For end users, explanations build trust in the predictions and increase engagement and satisfaction [1,10]. For researchers and developers, explanations enable them to understand the decision-making process and create accountable ML models. Graph Neural Networks (GNNs) [43,55] have recently achieved state-of-the-art performance on many graph ML tasks and attracted increased interest in studying their explainability [25,45,47,52]. However, to our knowledge, GNN explanation for link prediction (LP) is missing in the literature. LP is an essential task for many vital Web applications like recommendation [26,42,49] and sponsored search [9,20]. GNNs are widely used to solve LP problems [50,56], and generating good GNN explanations for LP will benefit these applications, e.g., by increasing user satisfaction with recommended items.
Existing GNN explanation methods have addressed node/graph-level tasks on homogeneous graphs. Given a data instance, most methods generate an explanation by learning a mask to select an edge-induced subgraph [25,45] or by searching over the space of subgraphs [48]. However, explaining GNNs for LP is a new and more challenging task, and existing node/graph-level explanation methods do not generalize well to it, for three reasons. 1) Connection interpretability: LP involves a pair of a source node and a target node rather than a single node or graph. Desired interpretable explanations for a predicted link should reveal connections between the node pair. Existing methods generate subgraphs with no format constraints, so they are likely to output subgraphs disconnected from the source, the target, or both. Without revealing connections between the source and the target, such subgraph explanations are hard for humans to interpret and investigate. 2) Scalability: For LP, the number of edges involved in GNN computation almost doubles, growing roughly from m to 2m compared to a node-level prediction, because neighbors of both the source and the target are involved. Since most existing methods consider all (edge-induced) subgraphs, the increased edges scale the number of subgraph candidates by a factor of O(2^m), which makes finding the optimal subgraph explanation much harder. 3) Heterogeneity: Practical LP is often on heterogeneous graphs with rich node and edge types, e.g., a graph for recommendation can have user->buys->item edges and item->has->attribute edges, but existing methods only work for homogeneous graphs.

[Figure 1: Given a GNN model and a predicted link (u1, i1) (dashed red) on a heterogeneous graph of users u, items i, and attributes a (left), PaGE-Link generates two path explanations (green arrows), interpreted on the right: "user1 bought item2, and item2 shares attribute1 with item1"; "user1 and user2 both bought item3, and user2 bought item1".]
In light of the importance and challenges of GNN explanation for LP, we formulate it as a post hoc, instance-level explanation problem and generate explanations in the form of important paths connecting the source node and the target node. Paths have played substantial roles in graph ML and are the core of many non-GNN LP methods [15,16,21,36]. Paths as explanations solve both the connection-interpretability and scalability challenges. Firstly, paths connecting two nodes naturally explain the connections between them. Figure 1 shows an example on a graph for recommendation. Given a GNN and a predicted link between user u1 and item i1, human-interpretable explanations may be based on the user's preference for attributes (e.g., user u1 bought item i2, which shares attribute a1 with item i1) or on collaborative filtering (e.g., user u1 has a similar preference to user u2 because they both bought item i3, and user u2 bought item i1, so user u1 would likely enjoy item i1). Both explanations boil down to paths. Secondly, paths have a considerably smaller search space than general subgraphs. As we will see in Proposition 4.1, compared to the expected number of edge-induced subgraphs, the expected number of paths grows strictly slower and becomes negligible. Therefore, path explanations exclude many less-meaningful subgraph candidates, making explanation generation much more straightforward and accurate.
To this end, we propose Path-based GNN Explanation for heterogeneous Link prediction (PaGE-Link), which achieves a better explanation AUC and scales linearly in the number of edges (see Figure 2). We first perform k-core pruning [2] to help find paths and improve scalability. Then we do heterogeneous path-enforcing mask learning to determine important paths, which handles heterogeneity and enforces the explanation edges to form paths connecting the source to the target. In summary, the contributions of our method are:
• Connection interpretability: PaGE-Link produces more interpretable explanations in path form and quantitatively improves explanation AUC over baselines.
• Scalability: PaGE-Link reduces the explanation search space by orders of magnitude, from subgraph finding to path finding, and scales linearly in the number of graph edges.
• Heterogeneity: PaGE-Link works on heterogeneous graphs and leverages edge-type information to generate better explanations.

RELATED WORK
We review relevant research on (a) GNNs, (b) GNN explanation, (c) recommendation explanation, and (d) paths for LP. We summarize the properties of PaGE-Link vs. representative methods in Table 1.
Graph neural networks. GNNs take graph structure and node/edge features as input and output node representations by transforming and aggregating the features of nodes' (multi-hop) neighbors. The node representations can be used for LP and have achieved great results in LP applications [7,26,42,49-51,54]. We review GNN-based LP models in Section 3.
GNN explanation. GNN explanation has been studied for node and graph classification, where the explanation is defined as an important subgraph. Existing methods differ mainly in their definitions of importance and their subgraph selection methods. GNNExplainer [45] selects edge-induced subgraphs by learning fully parameterized masks on graph edges and node features, maximizing the mutual information (MI) between the masked graph and the prediction made with the original graph. PGExplainer [25] adopts the same MI importance but instead trains a mask predictor to generate a discrete mask. Other popular importance measures are game-theoretic values. SubgraphX [48] uses the Shapley value [34] and performs Monte Carlo Tree Search (MCTS) over subgraphs. GStarX [52] uses a structure-aware HN value [8] to measure node importance and generates the important-node-induced subgraph. There are further studies from other perspectives that are less related to this work, e.g., surrogate models [12,39], counterfactual explanations [24], and causality [22,23]; [46] provides a good review. While these methods produce subgraphs as explanations, what makes a good explanation is a complex topic, especially how to meet "stakeholders' desiderata" [18]. Our work differs from all of the above: we focus on the new task of explaining heterogeneous LP, and we generate paths instead of unrestricted subgraphs as explanations. The interpretability of paths makes our method especially advantageous when stakeholders have less ML background.
Recommendation explanation. This line of work explains why a recommendation is made [53]. J-RECS [28] generates recommendation explanations on product graphs using a justification score that balances item relevance and diversity. PRINCE [6] produces end-user explanations as a set of minimal actions performed by the user on graphs with users, items, reviews, and categories; the set of actions is selected using counterfactual evidence. Typically, recommendations on graphs can be
formalized as an LP task.However, the recommendation explanation problem differs from explaining GNNs for LP because the recommendation data may not be graphs, and the models to be explained are primarily not GNN-based [40].
GNNs have their unique message passing procedure, and GNN-based LP corresponds to more general applications beyond recommendation, e.g., drug repurposing [13] and knowledge graph completion [3,27]. Thus, recommendation explanation is related to, but not directly comparable with, GNN explanation.
Paths. Paths are important in graph ML, and many LP methods are path-based, such as graph distance [21], the Katz index [16], SimRank [15], and PathSim [36]. Paths have also been used to capture the relationship between a pair of nodes. For example, "connection subgraphs" [5] find paths between the source and the target based on electricity analogs. In general, although black-box GNNs recently outperform path-based methods in LP accuracy, we embrace paths for their interpretability in LP explanation.

NOTATIONS AND PRELIMINARY
In this section, we define necessary notations, summarize them in Table 2, and review the GNN-based LP models.
Definition 3.1. A heterogeneous graph is defined as a directed graph G = (V, E) associated with a node type mapping function φ : V → A and an edge type mapping function τ : E → R. Each node v ∈ V belongs to one node type φ(v) ∈ A, and each edge e ∈ E belongs to one edge type τ(e) ∈ R.
Let Φ(·, ·) denote a trained GNN-based model for predicting the missing links in G, where Y = Φ(G, (s, t)) denotes the prediction for the link between a source node s and a target node t. The model Φ learns a conditional distribution P_Φ(Y | G, (s, t)) of the binary random variable Y. Commonly used GNN-based LP models [50,54,56] involve two steps. The first step is to generate node representations (h_s, h_t) of (s, t) with an L-hop GNN encoder. The second step is to apply a prediction head on (h_s, h_t) to get the prediction of Y; an example prediction head is the inner product.
To explain Φ(G, (s, t)) with an L-layer GNN encoder, we restrict attention to the computation graph G_c = (V_c, E_c). G_c is the L-hop ego-graph of the predicted pair (s, t), i.e., the subgraph with node set V_c = {v ∈ V | d(v, s) ≤ L or d(v, t) ≤ L}, where d(·, ·) denotes the shortest-path distance. It is called a computation graph because the L-layer GNN only collects messages from the L-hop neighbors of s and t to compute h_s and h_t. The LP result is thus fully determined by G_c, i.e., Φ(G, (s, t)) ≡ Φ(G_c, (s, t)). Figure 3b shows a 2-hop ego-graph of u1 and i1, in which nodes more than 2 hops away from both u1 and i1 are excluded.
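As a concrete illustration, the computation graph's node set can be built by a truncated BFS from each endpoint. The sketch below is a minimal, assumed implementation; the function name `l_hop_ego_graph` and the dict-of-lists graph encoding are ours, not from the paper:

```python
from collections import deque

def l_hop_ego_graph(adj, s, t, L):
    """Node set of the computation graph G_c: the union of the L-hop
    neighborhoods of s and t. adj: dict node -> iterable of neighbors."""
    def within(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            if dist[u] == L:
                continue                  # do not expand past L hops
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return set(dist)
    return within(s) | within(t)
```

On a 6-node path graph with s and t at the two ends, a 1-hop ego-graph keeps only the endpoints and their direct neighbors, while a 2-hop ego-graph covers the whole path.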

PROPOSED PROBLEM FORMULATION: LINK-PREDICTION EXPLANATION
In this work, we address a post hoc, instance-level GNN explanation problem. Post hoc means the model Φ(·, ·) has already been trained: to generate explanations, we do not change its architecture or parameters. Instance-level means we generate an explanation for the prediction of each instance (s, t). Specifically, the explanation method answers the question of why a missing link is predicted by Φ(·, ·). In a practical web recommendation system, this question can be "why is this item recommended to this user by the model?".
An explanation for a GNN prediction should be some substructure in G_c, and it should also be concise, i.e., limited by a size budget B. An explanation with a large size is often neither informative nor interpretable; in the extreme case, G_c would be a non-informative explanation of itself. Also, a fair comparison between different explanations should consume the same budget. In the following, we define the budget B as the maximum number of edges included in the explanation.
We list three desirable properties for a GNN explanation method for heterogeneous LP: capturing the connection between the source node and the target node, scaling to large graphs, and handling graph heterogeneity. A path-based method inherently possesses all three properties. Paths capture the connection between a pair of nodes and can be transferred to human-interpretable explanations. Besides, the search space of paths with a fixed source node and target node is greatly reduced compared to edge-induced subgraphs. Given the ego-graph G_c of s and t, the number of paths between s and t and the number of edge-induced subgraphs in G_c both depend on the structure of G_c. However, they can be estimated using random graph approximations. The next proposition on random graphs shows that the expected number of paths grows strictly slower than the expected number of edge-induced subgraphs as the random graph grows; the expected number of paths becomes insignificant for large graphs.

[Figure 3: (a) A GNN-predicted link (u1, i1) (dashed red) that needs explanation; (b) ego-graph extraction and k-core pruning; (c) path-enforcing mask learning.]

Proposition 4.1. Let G(n, d) be a random graph with n nodes and density d, i.e., there are m = d·n(n-1)/2 edges chosen uniformly at random from all node pairs. Let η_{n,d} be the expected number of paths between any pair of nodes, and let ε_{n,d} be the expected number of edge-induced subgraphs. Then η_{n,d} = o(ε_{n,d}), i.e., lim_{n→∞} η_{n,d}/ε_{n,d} = 0.
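The gap in Proposition 4.1 is easy to observe empirically: even on a small random graph, the number of simple s-t paths is dwarfed by the 2^m edge-induced subgraphs. The following toy check (our own construction, not the paper's proof) counts both on an Erdős-Rényi-style graph:

```python
import random
from itertools import combinations

def count_simple_paths(adj, s, t):
    """DFS enumeration of simple s-t paths (feasible only on small graphs)."""
    def dfs(u, visited):
        if u == t:
            return 1
        return sum(dfs(v, visited | {v}) for v in adj[u] if v not in visited)
    return dfs(s, {s})

random.seed(0)
n, d = 10, 0.3
edges = [e for e in combinations(range(n), 2) if random.random() < d]
adj = {u: set() for u in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

m = len(edges)
num_subgraphs = 2 ** m                       # every edge subset is a candidate
num_paths = count_simple_paths(adj, 0, n - 1)
```

With roughly a dozen edges there are already thousands of edge subsets, while the count of simple paths between a fixed pair stays in the tens at most.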

Proof. In Appendix A. ■
Paths are also a natural choice for LP explanations on heterogeneous graphs. On homogeneous graphs, features are important for prediction and explanation: an s-t link may be predicted because of the feature similarity of node s and node t. However, the heterogeneous graphs we focus on, as defined in Definition 3.1, often do not store feature information but explicitly model it using new node and edge types. For example, in the heterogeneous graph in Figure 3a, instead of making it a user-item graph and assigning each item node a two-dimensional feature with attributes a1 and a2, the attribute nodes are explicitly created and connected to the item nodes. Then an explanation like "i1 and i2 share node feature a1" on a homogeneous graph is transferred to "i1 and i2 are connected through the attribute node a1" on a heterogeneous graph. Given the advantages of paths over general subgraphs in connection interpretability and scalability, and their capability to capture feature similarity on heterogeneous graphs, we use paths to explain GNNs for heterogeneous LP. Our design principle is that a good explanation should be concise and informative, so we define the explanation to contain only short paths without high-degree nodes. Long paths are less desirable since they could correspond to unnecessarily complicated connections, making the explanation neither concise nor convincing. For example, in Figure 3c, the long path is not ideal since it takes four hops to go from item i3 to item i1, making it less persuasive to interpret as "item1 and item3 are similar, so item1 should be recommended". Paths containing high-degree nodes are also less desirable because high-degree nodes are often generic, and a path going through them is not as informative. In the same figure, all paths containing node a2 are less desirable because a2 has a high degree and connects to all the items in the graph. A real example of a generic attribute is the attribute "grocery" connecting to both
"vanilla ice cream" and "vanilla cookie".When "vanilla ice cream" is recommended to a person who bought "vanilla cookie", explaining this recommendation with a path going through "grocery" is not very informative since "grocery" connects many items.In contrast, a good informative path explanation should go through the attribute "vanilla", which only connects to vanilla-flavored items and has a much lower degree.
We formalize GNN explanation for heterogeneous LP as follows.
Problem 4.2. Generating path-based explanations for a predicted link between nodes s and t:
• Given a trained GNN-based LP model Φ(·, ·), a heterogeneous computation graph G_c of s and t, and a budget B on the maximum number of edges in the explanation,
• Find an explanation P = {p | p is an s-t path with length at most l_max whose nodes each have degree less than d_max}, where the total number of edges in P is at most B,
• By optimizing each p ∈ P to be influential to the prediction, concise, and informative.

PROPOSED METHOD: PAGE-LINK
This section details PaGE-Link. PaGE-Link has two modules: (i) a k-core pruning module to eliminate spurious neighbors and improve speed, and (ii) a heterogeneous path-enforcing mask learning module to identify important paths. An illustration is given in Figure 3.

The k-core Pruning
The k-core pruning module of PaGE-Link reduces the complexity of G_c. The k-core of a graph is defined as the unique maximal subgraph with minimum node degree k [2]. We use the superscript (k) to denote the k-core, i.e., G_c^(k) = (V_c^(k), E_c^(k)) for the k-core of G_c. k-core pruning is a recursive algorithm that removes nodes v ∈ V whose degrees d_v < k, until the remaining subgraph only has nodes with d_v ≥ k, which gives the k-core. The difference in nodes between a (k+1)-core and a k-core is called the k-shell. The nodes in the orange box of Figure 3b are an example of a 2-core pruned from the 2-hop ego-graph: two degree-one nodes are pruned in the first iteration, and a third node is recursively pruned because it becomes degree one after its neighbor is pruned. All three nodes belong to the 1-shell. We perform k-core pruning to help path finding because the pruned k-shell nodes are unlikely to be part of meaningful paths when k is small. For example, the 1-shell nodes are either leaf nodes or will become leaf nodes during the recursive pruning, and a leaf node can never be part of a path unless s or t is one of these 1-shell nodes. The k-core pruning module in PaGE-Link modifies standard k-core pruning by adding the condition of never pruning s and t.
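A minimal sketch of the modified pruning, assuming a dict-of-sets adjacency; the function name `k_core_protect` is ours, and the `protected` set plays the role of never pruning s and t:

```python
def k_core_protect(adj, k, protected):
    """Recursively remove nodes of degree < k, never pruning protected
    nodes (here the source and target). adj: dict node -> set of neighbors.
    Returns a pruned copy; the input graph is left untouched."""
    adj = {u: set(vs) for u, vs in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if u not in protected and len(adj[u]) < k:
                for v in adj[u]:
                    if v in adj:          # neighbor may be gone already
                        adj[v].discard(u)
                del adj[u]
                changed = True
    return adj
```

For example, on a triangle with a pendant leaf, 2-core pruning removes the leaf and keeps the triangle intact.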
The following theorem (Theorem 5.1, stated in detail in Appendix B) shows that for a random graph G(n, d), taking the k-core reduces the expected number of nodes by a factor of c_V(n, d, k) and the expected number of edges by a factor of c_E(n, d, k). Both factors are functions of n, d, and k. We defer the exact expressions of these two factors to Appendix B, since they are only implicitly defined based on the Poisson distribution. Numerically, for a random G(n, d) with average node degree d(n - 1) = 7, its 5-core has c_V(n, d, 5) and c_E(n, d, 5) both ≈ 0.69.
Proof. Please refer to Appendix B and [29]. ■
The k-core pruning helps reduce graph complexity and accelerates path finding. One concern is whether it prunes too much and disconnects s and t. We found that such a situation is very unlikely in practice. To be specific, we focus on explaining positively predicted links, e.g., why an item is recommended to a user by the model. Negative predictions, e.g., why an arbitrary item is not recommended to a user, are less useful in practice and thus not in the scope of our explanation. Pairs (s, t) are usually connected by many paths in a practical G [41], and positive link predictions are rarely made between disconnected or weakly connected (s, t). Empirically, we observe that there are usually too many paths connecting a positively predicted (s, t) rather than none, even in the k-core. Therefore, an optional step to enhance pruning is to remove nodes with very high degrees. As we discussed in Section 4, high-degree nodes are often generic and less informative; removing them complements k-core pruning to further reduce complexity and improve path quality.

Heterogeneous Path-Enforcing Mask Learning
The second module of PaGE-Link learns heterogeneous masks to find important path-forming edges. We perform mask learning to select edges from the k-core-pruned computation graph. For notational simplicity in this section, we use G = (V, E) to denote the graph for mask learning, and G_c^(k) is the actual graph in the complete version of our algorithm. The idea is to learn a mask over all edges of all edge types to select the important edges. Let M = {M_r}_{r ∈ R} be learnable masks for all edge types, with M_r ∈ R^{|E_r|} corresponding to type r. We denote applying M_r to its corresponding edge type by E_r ⊙ σ(M_r), where σ is the sigmoid function and ⊙ is the element-wise product. Similarly, we overload ⊙ to indicate applying the set of masks to all types of edges, i.e., E ⊙ σ(M) = ∪_{r ∈ R} {E_r ⊙ σ(M_r)}. We call the graph with edge set E ⊙ σ(M) a masked graph. Applying a mask to graph edges changes the edge weights, which makes GNNs pass more information between nodes connected by highly weighted edges and less elsewhere. The general idea of mask learning is to learn an M that produces high weights for important edges and low weights for others. To learn an M that better fits LP explanation, we measure edge importance from two perspectives: important edges should be influential for the model prediction and should form meaningful paths. Below, we introduce two loss terms, L_pred and L_path, that realize these two measurements.
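A tiny illustration of per-edge-type masks, using NumPy as a stand-in for a learnable framework; the type names and edge counts are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# One learnable mask vector M_r per edge type r (toy sizes |E_r|).
edge_type_sizes = {"buys": 4, "has": 3}
masks = {r: rng.normal(size=n) for r, n in edge_type_sizes.items()}

# "Applying" M_r reweights every edge of type r by sigma(M_r[e]),
# so every soft edge weight lands strictly inside (0, 1).
edge_weights = {r: sigmoid(M) for r, M in masks.items()}
```

In a real implementation these mask vectors would be trainable parameters updated by gradient descent on the losses below.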
L_pred learns to select edges influential for the model prediction. The idea is perturbation-based explanation: parts of the input are considered important if perturbing them changes the model prediction significantly. In the graph sense, if removing an edge e significantly influences the prediction, then e is a critical counterfactual edge that should be part of the explanation. This idea can be formalized as maximizing the mutual information between the masked graph and the original graph prediction Y, which is equivalent to minimizing the prediction loss

L_pred(M) = -log P_Φ(Y = 1 | G = (V, E ⊙ σ(M)), (s, t)).

L_pred(M) has a straightforward meaning: the masked subgraph should provide as much information for predicting the missing link (s, t) as the whole graph. Since the original prediction is a constant, L_pred(M) can also be interpreted as the performance drop after the mask is applied to the graph; a well-learned mask should give a minimal performance drop. Regularization terms on the mask entropy and mask norm are often included in L_pred(M) to encourage the mask to be discrete and sparse.
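A toy numerical sketch of this objective (written `pred_loss` here), with a dot product standing in for the GNN forward pass on the masked graph; the 0.1 regularization weight and all names are our assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pred_loss(mask_logits, edge_scores, reg=0.1):
    """Sketch of L_pred with entropy and sparsity regularizers.
    mask_logits: raw mask M over edges (one edge type, for brevity).
    edge_scores: hypothetical per-edge contributions to the link logit,
    standing in for the GNN forward pass on the masked graph."""
    w = sigmoid(mask_logits)              # soft edge weights sigma(M)
    link_logit = float(np.dot(w, edge_scores))
    nll = -np.log(sigmoid(link_logit))    # -log P(Y=1 | masked graph)
    ent = -np.mean(w * np.log(w + 1e-12) + (1 - w) * np.log(1 - w + 1e-12))
    sparsity = np.mean(w)                 # mask norm term
    return nll + reg * ent + reg * sparsity
```

Keeping the mask open on edges that support the prediction yields a smaller loss than masking everything out, which is exactly the gradient signal that drives mask learning.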
L_path is the loss term for M to learn to select path-forming edges. The idea is to first identify a set of candidate edges, denoted by E_path (specified below), that can form concise and informative paths, and then optimize L_path(M) to enforce the mask weights for e ∈ E_path to increase and the mask weights for e ∉ E_path to decrease. We consider a weighted average of these two forces balanced by hyperparameters α and β. The key question for computing L_path(M) is to find a good E_path containing edges of concise and informative paths. As discussed in Section 4, paths with these two desired properties should be short and avoid high-degree generic nodes. We thus define a score function of a path p reflecting these two properties. In this score function, M gives the probability of each edge e to be included in the explanation, i.e., P(e) = σ(M_{τ(e)}[e]). To get the importance of a path, we first use a mean-field approximation of the joint probability by multiplying the P(e) together, and we normalize each edge probability.
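One plausible way to turn such edge probabilities into concrete paths (an assumption on our part, since the exact score above is truncated) is a shortest-path search with edge cost -log P(e) plus a degree penalty, so that the "shortest" path maximizes the product of mask weights while avoiding generic high-degree nodes:

```python
import heapq
import math

def best_path(adj, weight, degree, s, t, alpha=1.0):
    """Dijkstra with edge cost -log(weight) + alpha * log(degree of the
    entered node). A sketch under our assumptions, not PaGE-Link's
    exact score. weight: dict (u, v) -> mask weight in (0, 1]."""
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            break
        if d > dist.get(u, math.inf):
            continue                      # stale queue entry
        for v in adj[u]:
            cost = -math.log(weight[(u, v)]) + alpha * math.log(max(degree[v], 1))
            if d + cost < dist.get(v, math.inf):
                dist[v] = d + cost
                prev[v] = u
                heapq.heappush(pq, (d + cost, v))
    path, u = [t], t
    while u != s:                         # walk predecessors back to s
        u = prev[u]
        path.append(u)
    return path[::-1]
```

With equal mask weights, the search prefers the route through the low-degree intermediate node, matching the informativeness principle of Section 4.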

EXPERIMENTS
In this section, we conduct empirical studies to evaluate explanations generated by PaGE-Link. Evaluation is a general challenge when studying model explainability, since standard datasets do not have ground-truth explanations. Many works [25,45] use synthetic benchmarks, but no benchmarks are available for evaluating GNN explanations for heterogeneous LP. Therefore, we generate an augmented graph and a synthetic graph to evaluate explanations. They allow us to generate ground-truth explanation patterns and evaluate explainers quantitatively.

Datasets
The augmented graph. AugCitation is constructed by augmenting the AMiner citation network [37]. The graph schema is shown in Figure 4a. The original AMiner graph contains four node types, author, paper, reference (ref), and field of study (fos), and edge types "cites", "writes", and "in". We construct AugCitation by augmenting the original graph with new (author, paper) edges typed "likes" and define a paper recommendation task on AugCitation for predicting the "likes" edges. A new edge (s, t) is augmented if there is at least one concise and informative path p between its endpoints. In our augmentation process, we require the paths p to have length shorter than a hyperparameter l_max and the degrees of nodes on p (excluding s and t) to be bounded by a hyperparameter d_max. We highlight these two hyperparameters because of the conciseness and informativeness principles discussed in Section 4. The augmented edge (s, t) is used for prediction. The ground-truth explanation is the set of paths satisfying the two hyperparameter requirements; if there are many qualified paths, we only keep the top paths with the smallest degree sums. We train a GNN-based LP model to predict these new "likes" edges and evaluate explainers by comparing their output explanations against these path patterns as ground truth.
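The augmentation rule above can be sketched as a bounded-depth DFS; the function name and the `l_max`/`d_max` parameter names are ours, mirroring the two hyperparameters:

```python
def qualifying_paths(adj, degree, s, t, l_max, d_max):
    """Enumerate s-t paths of length <= l_max whose intermediate nodes
    all have degree <= d_max. adj: dict node -> list of neighbors."""
    out = []
    def dfs(u, path):
        if u == t:
            out.append(list(path))
            return
        if len(path) - 1 >= l_max:        # edge budget exhausted
            return
        for v in adj[u]:
            if v not in path and (v == t or degree[v] <= d_max):
                path.append(v)
                dfs(v, path)
                path.pop()
    dfs(s, [s])
    return out
```

On a triangle, both the direct edge and the two-hop route between a node pair qualify when l_max = 2, so two ground-truth paths would be recorded.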
The synthetic graph. UserItemAttr is generated to mimic graphs with users, items, and attributes for recommendation. Figure 4b shows the graph schema and illustrates the generation process. We include three node types, "user", "item", and item attributes ("attr"), in the synthetic graph, and we build the different types of edges step by step. First, the "has" edges are created by randomly connecting items to attrs, and the "hidden prefers" edges are created by randomly connecting users to attrs. These edges represent items having attributes and user preferences for these attributes.

[Figure 4: (a) Schema of AugCitation; the "likes" edges (dashed red) are augmented for prediction. (b) Schema of UserItemAttr (left) and its generation process (right): three types of base edges are generated first, i.e., "has" (black), "hidden prefers" (dashed gray), and "buys" (blue); the solid "has" and "buys" edges are then used to generate "likes" edges (dashed red) for prediction and the ground-truth explanation patterns (green arrows).]

Next, we randomly sample a set of items for each user, and we connect a (user, item) pair by a "buys" edge if the user "hidden prefers" any attr the item "has". The "hidden prefers" edges correspond to an intermediate step for generating the observable "buys" edges. We remove the "hidden prefers" edges after the "buys" edges are generated, since "hidden prefers" information cannot be observed in reality. An example of the rationale behind this generation step is that items have certain attributes, like the item "ice cream" with the attribute "vanilla". Given that a user likes the attribute "vanilla" as hidden information, we observe that the user buys "vanilla ice cream". The next step generates more "buys" edges between randomly picked (user, item) pairs if a similar user (one sharing many item neighbors) buys the item. The idea resembles collaborative filtering: similar users tend to buy similar items. The final step generates edges for prediction and their corresponding ground-truth explanations, following the same augmentation process described above for AugCitation. For UserItemAttr, we use "has" and "buys" as base edges to construct the ground truth, and we create "likes" edges between users and items for prediction.
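A compressed sketch of this generation process; all sizes and the seed are hypothetical, and the collaborative-filtering step is omitted for brevity:

```python
import random
random.seed(7)

# Hypothetical sizes for a toy UserItemAttr instance.
users = [f"u{i}" for i in range(5)]
items = [f"i{i}" for i in range(8)]
attrs = [f"a{i}" for i in range(3)]

# Step 1: random "has" and latent "hidden prefers" edges.
has = {i: {random.choice(attrs)} for i in items}
prefers = {u: {random.choice(attrs)} for u in users}

# Step 2: a user buys a sampled item if they hidden-prefer one of its attrs.
buys = set()
for u in users:
    for i in random.sample(items, 4):
        if prefers[u] & has[i]:
            buys.add((u, i))

# Step 3: drop "hidden prefers" -- only "has" and "buys" stay observable.
del prefers
```

The observable graph thus consists only of "has" and "buys" edges, from which "likes" edges and ground-truth path patterns would then be derived.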

Experiment Settings
The GNN-based LP model. As described in Section 3, the LP model involves a GNN encoder and a prediction head. We use RGCN [32] as the encoder to learn node representations on heterogeneous graphs and the inner product as the prediction head. We train the model using the cross-entropy loss. On each dataset, our prediction task covers one edge type r. We randomly split the observed edges of type r into train : validation : test = 7 : 1 : 2 as positive samples and draw negative samples from the unobserved edges of type r. Edges of other types are used for GNN message passing but not for prediction.
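The split and negative sampling can be sketched as follows; the graph sizes and the edge set are hypothetical:

```python
import random
random.seed(0)

n_users, n_items = 20, 30

# Hypothetical observed edges of the prediction type r (e.g., "likes").
pos = sorted({(random.randrange(n_users), random.randrange(n_items))
              for _ in range(100)})
random.shuffle(pos)

# train : validation : test = 7 : 1 : 2 over the positive edges.
n = len(pos)
train = pos[:int(0.7 * n)]
val = pos[int(0.7 * n):int(0.8 * n)]
test = pos[int(0.8 * n):]

# Negative samples are drawn from unobserved pairs of the same type.
pos_set, negs = set(pos), []
while len(negs) < len(test):
    cand = (random.randrange(n_users), random.randrange(n_items))
    if cand not in pos_set:
        negs.append(cand)
```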
Explainer baselines. Existing GNN explanation methods cannot be directly applied to heterogeneous LP. Thus, we extend the popular GNNExplainer [45] and PGExplainer [25] as our baselines. We re-implement a heterogeneous version of their mask matrix and mask predictor, similar to the heterogeneous mask learning module in PaGE-Link. For these baselines, we perform mask learning using their original objectives, and we generate edge-induced subgraph explanations from their learned masks. We refer to these two adapted explainers as GNNExp-Link and PGExp-Link below. We do not compare to other search-based explainers like SubgraphX [48] because of their high computational complexity (see Section 5.4). They work well on small graphs, as in their original papers, but are hard to scale to the large and dense graphs we consider for LP.

Evaluation Results
Quantitative evaluation. Both the ground truth and the final explanation output of PaGE-Link are sets of paths. In contrast, the baseline explainers generate edge masks M. For a fair comparison, we take the intermediate result learned by PaGE-Link, which is also a mask M, and we follow [25] in comparing explainers by their masks. Specifically, for each computation graph, edges in the ground-truth paths are treated as positive, and other edges are treated as negative.
Then weights in M are treated as the prediction scores of edges and are evaluated with the ROC-AUC metric.A high ROC-AUC score reflects that edges in ground truth are precisely captured by the mask.The results are shown in Table 4, where PaGE-Link outperforms both baseline explainers.
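Concretely, treating mask weights as scores, ROC-AUC can be computed with the rank-based definition; the sketch below is self-contained and avoids external dependencies:

```python
def roc_auc(scores, labels):
    """Rank-based ROC-AUC: the probability that a randomly chosen positive
    (ground-truth) edge receives a higher mask weight than a randomly
    chosen negative edge; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A mask that scores every ground-truth edge above every other edge achieves an AUC of 1.0, while a constant mask achieves 0.5.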
For scalability, we showed in Section 5.4 that PaGE-Link scales as O(|E_c^(k)|). Here we evaluate its scalability empirically by generating ten synthetic graphs with sizes ranging from 20 to 5,500 edges in G_c. The results are shown in Figure 2b, which suggests the computation time scales linearly in the number of edges.
Qualitative evaluation.A critical advantage of PaGE-Link is that it generates path explanations, which can capture the connections between node pairs and enjoy better interpretability.In contrast, the top important edges found by baseline methods are often disconnected from the source, the target, or both, which makes their explanations hard for humans to interpret and investigate.We conduct case studies to visualize explanations generated by PaGE-Link on the paper recommendation task on AugCitation.
Figure 5 shows a case in which the model recommends to the source author "Vipin Kumar" the target paper titled "Fast and exact network trajectory similarity computation: a case-study on bicycle corridor planning". The top path explanation generated by PaGE-Link goes through the coauthor "Shashi Shekhar": Vipin Kumar and Shashi Shekhar coauthored the paper "Correlation analysis of spatial time series datasets: a filter-and-refine approach", and Shashi Shekhar wrote the recommended paper. Given the same budget of three edges, the explanations generated by the baselines are less interpretable. Figure 6 shows another example with the source author "Huan Liu" and the recommended target paper titled "Using association rules to solve the cold-start problem in recommender systems". PaGE-Link generates paths going through the common fos of the recommended paper and three other papers written by Huan Liu: 22646, 25160, and 35294. We show the PaGE-Link explanation with the top three paths in green, as well as other unselected fos shared by papers 22646, 25160, and 35294 and the target paper. Note that the explanation paths all have length three, even though there are many paths with length five or longer, e.g., (328, 22646, 4, 25260, 4134, 5670). Also, the explanation paths go through the fos "Redundancy (engineering)" and "User profile" instead of generic fos like "Artificial intelligence" and "Computer science". This case demonstrates that the explanation paths selected by PaGE-Link are more concise and informative.

HUMAN EVALUATION
The ultimate goal of model explanation is to improve model transparency and help human decision-making. Human evaluation is thus the best way to assess the effectiveness of an explainer and has been a standard evaluation approach in previous works [6,30,33]. We conduct a human evaluation by randomly picking 100 predicted links from the test set of AugCitation and generating explanations for each link using GNNExp-Link, PGExp-Link, and PaGE-Link. We design a survey with single-choice questions. In each question, we show respondents the predicted link and the three explanations with both the graph structure and the node/edge type information, similar to Figure 5 but with method names excluded. The survey is sent to graduate students, postdocs, engineers, research scientists, and professors, including people with and without background knowledge of GNNs. We ask respondents to "please select the best explanation of 'why the model predicts this author will like the recommended paper?'". At least three answers from different people are collected for each question. In total, 340 evaluations are collected, and 78.79% of them selected the explanations by PaGE-Link as the best.
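The reported preference rate is simply the fraction of all collected evaluations that chose a given explainer. A minimal sketch of this aggregation (the method names are from the paper; the toy response counts below are made up):

```python
# Sketch of the survey aggregation: the reported number is the fraction
# of all collected evaluations that picked a given explainer. Method
# names are from the paper; the toy response counts are made up.
from collections import Counter

def preference_rates(responses):
    """Map each method to the percentage of responses that chose it."""
    counts = Counter(responses)
    total = len(responses)
    return {method: round(100 * n / total, 2)
            for method, n in counts.items()}

responses = (["PaGE-Link"] * 20 + ["GNNExp-Link"] * 7 + ["PGExp-Link"] * 6)
print(preference_rates(responses))
```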

CONCLUSION
In this work, we study model transparency and accountability on graphs. We investigate a new task: GNN explanation for heterogeneous LP. We identify three challenges for the task and propose a new path-based method, PaGE-Link, that produces explanations with interpretable connections, is scalable, and handles graph heterogeneity. PaGE-Link explanations quantitatively improve ROC-AUC by 9-35% over baselines and are chosen by 78.79% of responses as qualitatively better in human evaluation.
Step (8) to (9) holds because exp is continuous. ■

B DETAILED THEOREM 5.1
We now state a more detailed version of Theorem 5.1. This theorem gives the exact formulas of V(·,·,·) and E(·,·,·), which are built upon a Poisson random variable. The argument is adapted from [14,29]; readers can refer to [14,29] for the proof.

C COMPLEXITY OF SUBGRAPHX
Search-based methods often have time complexity exponential in the number of nodes or edges, so a budget is enforced instead of searching over subgraphs of all sizes. For example, SubgraphX finds all connected subgraphs with at most k nodes, which has complexity Θ(|V| D^(2k−2)) for a graph with maximum degree D. This complexity can be shown using the following two lemmas; Lemma C.2 is proved in [11] via an encoding procedure.
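To see concretely why such a budget is necessary, one can enumerate connected node sets by repeatedly growing a set from each start node, as in the following sketch (a generic enumeration, not SubgraphX's actual implementation); the number of sets explodes as the size budget k grows:

```python
# Generic sketch (not SubgraphX's implementation): enumerate every
# connected node subset of size <= k by growing a set from each start
# node, only ever adding a neighbor of the current set. The number of
# such subsets grows exponentially with k, which is why search-based
# explainers must cap the subgraph size.

def connected_subsets(adj, k):
    """Return all connected node subsets of size <= k.

    adj: dict mapping each node to the set of its neighbors.
    """
    found = set()

    def grow(current, frontier):
        found.add(frozenset(current))
        if len(current) == k:
            return
        for v in sorted(frontier):
            grow(current | {v}, (frontier | adj[v]) - current - {v})

    for v in adj:
        grow({v}, set(adj[v]))
    return found

# Small test case: a 3x3 grid graph.
adj = {(r, c): set() for r in range(3) for c in range(3)}
for r in range(3):
    for c in range(3):
        for dr, dc in ((1, 0), (0, 1)):
            if r + dr < 3 and c + dc < 3:
                adj[(r, c)].add((r + dr, c + dc))
                adj[(r + dr, c + dc)].add((r, c))

subsets = connected_subsets(adj, 3)
print(len(subsets))  # 43: nine single nodes, twelve edges, 22 three-node paths
```

Even on this 9-node grid, raising k quickly multiplies the count, matching the exponential dependence on the size budget.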

D DATASET DETAILS
Table 5 shows the hyperparameters for constructing the datasets in Section 6, including the augmentation of the AMiner citation graph and the generation of the synthetic graph.

Figure 2: (a) PaGE-Link outperforms GNNExplainer and PGExplainer in terms of explanation AUC on the citation graph and the user-item graph. (b) The running time of PaGE-Link scales linearly in the number of graph edges.

Figure 3: PaGE-Link on a graph with user nodes, item nodes, and two attribute node types. (Best viewed in color.)

Figure 4: The proposed augmented graph AugCitation and the synthetic graph UserItemAttr.

Figure 6: Top three paths (green arrows) selected by PaGE-Link for explaining the predicted link (328, 5670) (dashed red). The selected paths are short and do not go through a generic field of study like "Computer Science".

Lemma C.1. For a graph G with n vertices, the number of connected subgraphs of G having k nodes is bounded below by the number of trees in G having k nodes.
Proof. Each connected subgraph has a spanning tree. ■
Lemma C.2. For a graph G with node set V, the number of trees in G having k tree nodes is Θ(|V| D^(2k−2)).

Table 2: Notation table (e.g., E_r denotes the edges with type r ∈ R).