Bridged-GNN: Knowledge Bridge Learning for Effective Knowledge Transfer

The data-hungry problem, characterized by insufficient and low-quality data, poses obstacles for deep learning models. Transfer learning has been a feasible way to transfer knowledge from high-quality external data of source domains to the limited data of target domains, following a domain-level knowledge transfer paradigm that learns a shared posterior distribution. However, such methods are usually built on strong assumptions, e.g., a domain-invariant posterior distribution, which are often unsatisfied in practice and may introduce noise, resulting in poor generalization ability on target domains. Inspired by Graph Neural Networks (GNNs) that aggregate information from neighboring nodes, we redefine the paradigm as learning a knowledge-enhanced posterior distribution for target domains, namely Knowledge Bridge Learning (KBL). KBL first learns the scope of knowledge transfer by constructing a Bridged-Graph that connects knowledgeable samples to each target sample, and then performs sample-wise knowledge transfer via GNNs. KBL is free from strong assumptions and is robust to noise in the source data. Guided by KBL, we propose the Bridged-GNN, which includes an Adaptive Knowledge Retrieval module to build the Bridged-Graph and a Graph Knowledge Transfer module. Comprehensive experiments on both un-relational and relational data-hungry scenarios demonstrate the significant improvements of Bridged-GNN compared with SOTA methods.

Transfer learning (TL) [20,43,58,59,61,63] attempts to transfer knowledge from source domain data to target domain data and has gained significant success in various scenarios, e.g., image recognition and text mining [9,31,50,69,72]. Specifically, TL methods transfer information at the domain level, i.e., they jointly train models on data from both the source and target domain while diminishing the domain difference. However, these methods have two shortcomings, shown in Fig. 1 (a): (1) they are usually built on strong assumptions [43,50], e.g., assuming that data from different domains have the same conditional distribution (detailed analysis in Sec. 3.1). These assumptions, however, are usually unsatisfied in real scenarios, leading to limited choices of available source domain data, or even no satisfactory source domain data at all [59]; (2) domain-level information contains both useful and noisy information, and transferring source information indiscriminately may restrict the modeling capacity. Due to these two limitations, the paradigm of domain-level knowledge transfer may lead to poor generalization ability on the target domain.
Inspired by the idea of Graph Neural Networks (GNNs) that learn node representations by transferring information from neighbors [7,14,38,39,48,54], we redefine the paradigm of knowledge transfer and propose the framework of Knowledge Bridge Learning (KBL). In Fig. 1 (b), we regard both the source domain and target domain as an open-domain knowledge base that consists of diverse samples. Each sample serves as a query, similar to information retrieval models that retrieve relevant items for a query from a collection of open-domain documents. We extract valuable knowledge (samples in the open domain) for each query (a target sample) and bridge them. Such bridges form the scope of knowledge transfer, namely the Bridged-Graph, which specifically identifies which samples in the open-domain data contain knowledge beneficial for the samples in the target domain. We then transfer knowledge under the guidance of the learned scope (Bridged-Graph) with GNNs. Overall, KBL consists of two steps as illustrated in Fig. 1 (c): learning the scope of knowledge transfer (i.e., the Bridged-Graph) first, and then transferring useful knowledge according to the learned scope.
The paradigm of KBL has the following unique advantages: (1) KBL breaks the limitation of transfer learning that assumes a domain-invariant posterior distribution between the source and target domains. KBL aims at learning a knowledge-enhanced posterior distribution for the target domain, i.e., learning $P_T(y \mid x, \mathcal{K}(x))$ where $\mathcal{K}(x)$ is the knowledge of each sample, rather than jointly learning a shared posterior distribution $P_{S+T}(y \mid x)$ of the source and target domains. (2) KBL can filter the noise of source domains via fine-grained knowledge transfer scoped by the Bridged-Graph.
Under the paradigm of Knowledge Bridge Learning, we propose a novel Bridged-Graph Neural Network (Bridged-GNN) model. Bridged-GNN consists of two main components: the Adaptive Knowledge Retrieval (AKR) module and the Graph Knowledge Transfer (GKT) module. The AKR module aims to retrieve beneficial samples that contain useful knowledge for a given sample from both the source and target domains. We then view these retrieved beneficial samples, which may come from the source domain (inter-domain) or the target domain (intra-domain), as knowledge for each benefited target sample, and connect such directed "beneficial-benefited" edges to construct the Bridged-Graph, a data structure that defines the scope of knowledge transfer. According to whether there exist intra-domain and inter-domain relations between samples of the original data, we divide knowledge transfer into three scenarios (see Fig. 2): (a) un-relational data, namely $UD$; (b) relational data with intra-domain relations and without inter-domain relations, namely $RD_{intra}$; (c) relational data with both intra-domain and inter-domain relations, namely $RD_{intra\&inter}$. We conduct comprehensive experiments on the three scenarios with four real-world datasets (Company, Twitter, Facebook100, Office31) and a synthetic dataset. The results consistently show that Bridged-GNN gains significant improvements in all three scenarios.
The main contributions of this paper are summarized as follows:
• We redefine the paradigm of knowledge transfer as Knowledge Bridge Learning (KBL), which conducts sample-wise knowledge transfer within a learned scope.
• We propose a novel Bridged-GNN model under the paradigm of KBL to transfer knowledge effectively in three different scenarios.
• Comprehensive experiments on real-world and synthetic datasets demonstrate the effectiveness of our method.

PRELIMINARY
In this section, we introduce GNNs and our problem definitions.

Graph Neural Networks

Graph Neural Networks (GNNs) aim at learning representations for nodes on a graph. Given a graph $\mathcal{G}(V, E, X, Y)$ as input, GNNs learn node representations by iteratively aggregating information from neighbors. Current mainstream GNNs update node representations with the following function:

$$h_v^{(l)} = \text{Combine}\left(h_v^{(l-1)},\ \text{AGG}\left(\{h_u^{(l-1)} : u \in \mathcal{N}(v)\}\right)\right), \tag{1}$$

where $h_v^{(l)}$ is the node representation vector at the $l$-th layer, AGG denotes the neighborhood aggregation function, and Combine denotes the combination function that updates the node representation with the aggregated neighborhood feature and the central node feature.
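As a concrete illustration of Eq. 1, the following minimal sketch uses mean aggregation as AGG and a convex combination as Combine. These particular choices are illustrative stand-ins (not the formulation of any specific GNN discussed in this paper), and all names are hypothetical.

```python
# Minimal sketch of the GNN update in Eq. 1:
#   h_v^(l) = Combine(h_v^(l-1), AGG({h_u^(l-1) : u in N(v)})).
# Mean aggregation and a convex combination stand in for learned AGG/Combine.

def aggregate(neighbor_vecs):
    """AGG: element-wise mean over the neighbors' representations."""
    n = len(neighbor_vecs)
    dim = len(neighbor_vecs[0])
    return [sum(v[i] for v in neighbor_vecs) / n for i in range(dim)]

def combine(h_self, h_neigh, alpha=0.5):
    """Combine: convex combination of central node and aggregated neighborhood."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(h_self, h_neigh)]

def gnn_layer(h, adj):
    """One message-passing layer. h: {node: vector}, adj: {node: [neighbors]}."""
    out = {}
    for v, neigh in adj.items():
        if neigh:
            out[v] = combine(h[v], aggregate([h[u] for u in neigh]))
        else:
            out[v] = list(h[v])  # an isolated node keeps its own representation
    return out

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1, 2], 1: [0], 2: [0]}
h1 = gnn_layer(h, adj)
```

Stacking several such layers lets each node absorb information from multi-hop neighborhoods, which is exactly the mechanism KBL reuses for sample-wise knowledge transfer.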

Problem Definitions
We introduce the scenarios of Knowledge Transfer and the definitions of Knowledge Bridge Learning in this section.

Scenarios of Knowledge Transfer.
Knowledge transfer can be leveraged in three scenarios, as shown in Fig. 2: (a) un-relational data ($UD$) represents the scenario where both the source domain and target domain are non-graph data, denoted by $(X_S, Y_S)$ and $(X_T, Y_T)$; (b) data with intra-domain relations and without inter-domain relations ($RD_{intra}$) represents the scenario where both the source domain and target domain are graph data and there are no edges between the two graphs, denoted by $\mathcal{G}_S(X_S, A_S, Y_S)$ and $\mathcal{G}_T(X_T, A_T, Y_T)$; (c) data with intra-domain and inter-domain relations ($RD_{intra\&inter}$) represents the scenario where both intra-domain and inter-domain edges exist in the original graph, denoted by $\mathcal{G}(\{X_S, X_T\}, \{Y_S, Y_T\}, A)$. Existing transfer learning methods usually focus on un-relational scenarios, e.g., images.

Knowledge Bridge Learning (KBL).
As shown in Fig. 1 (c), KBL is a new paradigm of knowledge transfer, which regards both the source domain and target domain as an open-domain knowledge base and takes each sample as a query. KBL first learns a scope of knowledge transfer by querying the open-domain knowledge base and then transfers knowledge within the learned scope. In this paper, we propose to use the Bridged-Graph to represent the scope of knowledge transfer: Definition 2.1 (Bridged-Graph). A Bridged-Graph is represented as $\mathcal{G}_{bridged} = (V_S, V_T, X_S, X_T, Y_S, Y_T, E)$, which defines the scope of knowledge transfer with beneficial-benefited relations between samples, where $V_S$ ($V_T$) is the sample/node set of the source (target) domain, $X_S$ ($X_T$) is the feature matrix of the source (target) domain, $Y_S$ ($Y_T$) is the labels of the source (target) domain, and $E = \{e_{ij} = (v_i, v_j) \mid v_i, v_j \in V_S \cup V_T\}$ is the beneficial-benefited edge set, where an edge $e_{ij}$ indicates that $v_i$ contains useful knowledge for $v_j$ (i.e., $v_i$ is a beneficial node for $v_j$).
Note that edges on the Bridged-Graph include both intra-domain edges and inter-domain edges, namely "Knowledge Bridges". We then define Knowledge Bridge Learning (KBL) as follows: Definition 2.2 (Knowledge Bridge Learning). Knowledge Bridge Learning (KBL) is a paradigm of knowledge transfer. KBL first learns a Bridged-Graph from data to define the valid scope of knowledge transfer and then transfers knowledge on the Bridged-Graph from beneficial nodes to benefited nodes.
In all three scenarios shown in Fig. 2, KBL learns a Bridged-Graph to scope the knowledge transfer; the difference among the three scenarios is that we reuse those original edges in relational data that are suitable for knowledge transfer. For the un-relational data scenario ($UD$), we learn the Bridged-Graph from scratch with the originally isolated samples. For the $RD_{intra}$ scenario, we learn the Bridged-Graph by adding new knowledge bridges and reusing original intra-domain edges. For the $RD_{intra\&inter}$ scenario, we further reuse the original inter-domain edges.

METHODOLOGY
In this section, we introduce the motivation and architecture of our Bridged Graph Neural Network (Bridged-GNN) under the guidance of Knowledge Bridge Learning.

Motivations of Knowledge Bridge Learning

The domain shift problem usually refers to the difference in the joint distribution (i.e., $P(X, Y)$) between the source domain and the target domain. However, directly modeling the joint distribution is difficult with few target labels, and some methods [45,46] use pseudo labels to estimate a pseudo joint distribution, which is sensitive to noise and results in estimation bias. As one of the most representative transfer learning methods, Domain Adaptation (DA) simplifies the problem by making assumptions based on the Bayes formula: $P(X, Y) = P(Y \mid X) P(X) = P(X \mid Y) P(Y)$. DA methods usually assume that the source domain and target domain share an invariant conditional distribution ($P(Y \mid X)$ or $P(X \mid Y)$) while having different marginal distributions ($P(X)$ or $P(Y)$) [43,50,59]. The mainstream DA framework, covering both the unsupervised and semi-supervised settings, jointly trains a shared model on the source and target domains while aligning the marginal distribution discrepancy, thereby assuming a shared posterior distribution of the source and target domains.
However, we observe that this assumption is hard to satisfy in real-world data, and the conditional distributions of the source domain and target domain may differ significantly. As shown in Fig. 3, we visualize the features of the five datasets used in this paper with the t-SNE algorithm. In each scatter plot, we distinguish the samples of different classes and different domains with four different colors. For the Facebook100 dataset, which has multiple classes, we only select the samples of the first two classes to present the results more clearly. Fig. 3 (a)∼(d) come from four real-world datasets, and we find significant domain differences in both the conditional distribution and the marginal distribution of these datasets. This evidence indicates that we cannot ignore the conditional distribution differences in real data, and it motivates us to design a new paradigm.
As shown in Fig. 1 (c), learning a shared posterior distribution ($P_{S+T}(y \mid x)$) on the source and target domains is sensitive to conditional distribution shift and noisy data. Different from previous transfer learning, our Knowledge Bridge Learning enriches the sample information by injecting knowledge from other samples, changing the paradigm from learning a shared posterior distribution to learning a knowledge-enhanced posterior distribution for the target domain, i.e., $P_T(y \mid x, \mathcal{K}(x))$, where $\mathcal{K}(x)$ is the external knowledge of a sample (its source nodes on the Bridged-Graph).

Effectiveness of Knowledge Transfer on Bridged-Graph.
Knowledge Bridge Learning first learns a scope, the Bridged-Graph, and transfers knowledge via this graph. Such a graph determines the upper bound of the performance improvement gained by knowledge transfer. Therefore, we first validate the effectiveness of KBL with synthetic Bridged-Graphs. Specifically, we generate Bridged-Graphs on two real-world datasets (Twitter [64] and Office31 [50]) by randomly adding edges between samples while controlling the ratio of homophilous neighbors (neighbors of the same class). For each synthetic Bridged-Graph, we randomly add 8 neighbors as source nodes (4 nodes in $V_S$ and 4 nodes in $V_T$) for each target domain sample $v \in V_T$, and only add 4 neighbors from $V_S$ as source nodes for each source domain sample $v \in V_S$. Then we use GraphSAGE [24] as the Graph Knowledge Transfer module of KBL to transfer knowledge from source nodes to target nodes and record the test accuracy of node classification. As shown in Fig. 4, as the homophilous ratios of intra-domain edges and inter-domain edges both tend to 1.0, the test accuracy increases accordingly.
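The synthetic edge construction described above can be sketched as follows. This is a hedged illustration under stated assumptions (two classes, a fixed homophilous ratio of 1.0); the function and variable names are hypothetical, not the paper's code.

```python
import random

# Sketch of the synthetic Bridged-Graph construction: for each target node we add
# K neighbors and control the fraction that share its class (the homophilous ratio).

def add_neighbors(node, labels, candidates, k, homo_ratio, rng):
    """Pick k neighbors for `node`, ~homo_ratio of them from the same class."""
    same = [c for c in candidates if labels[c] == labels[node] and c != node]
    diff = [c for c in candidates if labels[c] != labels[node]]
    n_same = round(k * homo_ratio)
    picked = rng.sample(same, min(n_same, len(same)))
    picked += rng.sample(diff, min(k - len(picked), len(diff)))
    return [(c, node) for c in picked]  # directed beneficial -> benefited edges

rng = random.Random(0)
labels = {i: i % 2 for i in range(20)}  # two classes
source = list(range(10))                # V_S
target = list(range(10, 20))            # V_T
edges = []
for v in target:  # 4 source-domain + 4 target-domain neighbors per target node
    edges += add_neighbors(v, labels, source, 4, 1.0, rng)
    edges += add_neighbors(v, labels, target, 4, 1.0, rng)
```

Sweeping `homo_ratio` from 0 to 1 reproduces the controlled-homophily setting evaluated in Fig. 4.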

Overview of Bridged-GNN
Guided by the paradigm of Knowledge Bridge Learning in Sec. 2.2.2, we propose Bridged-GNN and show its overview in Fig. 5. Bridged-GNN is implemented in a two-stage manner and is composed of two main components: the Adaptive Knowledge Retrieval (AKR) module and the Graph Knowledge Transfer (GKT) module. Specifically, Bridged-GNN first learns a Bridged-Graph to determine the scope of knowledge transfer, and then uses a GNN model as a plug-in to transfer knowledge on the Bridged-Graph. The Bridged-GNN model can be applied to all three scenarios in Fig. 2.

Adaptive Knowledge Retrieval (AKR)
In the first step, we need to learn the Bridged-Graph that determines the scope of knowledge transfer. We mainly focus on classification tasks in this paper, where the useful knowledge required by a specified sample is mainly contained in homophilous samples of the same class (validated in Sec. 3.1.1) [1-3, 21, 47, 68]. Considering that the candidate beneficial samples come from different domains, we design an Adaptive Knowledge Retrieval (AKR) module. Given an arbitrary sample as a query, AKR retrieves the top-K beneficial samples from all candidates as intra-domain/inter-domain knowledge for the query sample.

Architecture of AKR.
The AKR module first learns the similarities between samples for knowledge retrieval, where a pair may come from the same domain (intra-domain) or different domains (inter-domain), and then retrieves beneficial samples for each sample (query) with the learned similarities to construct the Bridged-Graph. As shown in Fig. 6, we design a twin-flow architecture for the AKR module. AKR is composed of a source encoder, a target encoder, and a discriminator trained with an adversarial strategy. First, a source encoder $\mathcal{F}_S(\cdot)$ and a target encoder $\mathcal{F}_T(\cdot)$ encode the input data from the source domain and target domain respectively, i.e., $Z_S = \mathcal{F}_S(X_S)$ and $Z_T = \mathcal{F}_T(X_T)$, where $X_S \in \mathbb{R}^{N_S \times d}$ and $X_T \in \mathbb{R}^{N_T \times d}$ represent the sample features of the source domain and target domain, and $Z_S$ and $Z_T$ denote the hidden representations of the source domain and target domain. For graph data, the input also includes the adjacency matrices $A_S \in \mathbb{R}^{N_S \times N_S}$ and $A_T \in \mathbb{R}^{N_T \times N_T}$. According to the type of input data (e.g., text, graph, image), we can choose an appropriate backbone network (e.g., DNN, GNN, CNN) as the encoder to better fit the data.
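The encoding and cross-domain alignment performed by this twin-flow architecture can be illustrated with a deliberately simplified sketch. Here the learned Domain Divergence Learner is replaced by the gap between pooled domain representations, so this only shows the data flow, not the trained module; all names are hypothetical.

```python
# Simplified stand-in for the AKR data flow: pool each domain's representations,
# take their difference as a crude "domain difference variable", and shift every
# target representation into the source feature space. The real module learns
# this transformation end-to-end together with a discriminator.

def pool(zs):
    """Pooling over a list of representation vectors (mean, for illustration)."""
    dim = len(zs[0])
    return [sum(z[i] for z in zs) / len(zs) for i in range(dim)]

def transform_target(z_source, z_target):
    """Shift every target representation by the pooled domain difference."""
    p_s, p_t = pool(z_source), pool(z_target)
    delta = [a - b for a, b in zip(p_s, p_t)]  # crude domain difference variable
    return [[z[i] + delta[i] for i in range(len(z))] for z in z_target]

Z_S = [[2.0, 2.0], [4.0, 4.0]]
Z_T = [[0.0, 0.0], [2.0, 2.0]]
Z_T_tilde = transform_target(Z_S, Z_T)  # pooled means of the two domains now coincide
```

After this shift the two domains share a feature space at the level of pooled statistics, which is the precondition for the pair-wise similarity learning that follows.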
Then we transform the representations of the target domain samples ($Z_T$) into the same feature space as the source domain. We achieve this with a Domain Divergence Learner module that learns a domain difference variable (i.e., the domain difference between the source and target domains), which considers both the personalized information of each target domain sample and the overall difference between the pooled source and target domain representations, where Pooling is the SUM pooling function; this yields the transformed target domain representations, denoted as $\tilde{Z}_T$. To prevent the target domain representations encoded by the target encoder from forgetting the original target domain information, we add a target domain decoder to form an auto-encoder, optimized with a reconstruction loss $\mathcal{L}_{rec}$. We then use an adversarial loss to place intra-domain and inter-domain sample pairs in a common semantic space: a discriminator $\mathcal{D}$ is trained to distinguish samples from the source domain and target domain, and the adversarial loss $\mathcal{L}_{adv}$ is calculated iteratively. With the obtained source domain representations $Z_S$ and the transformed target domain representations $\tilde{Z}_T$, which lie in the same feature space, we then train a cosine similarity learner

module to learn the pair-wise similarity of arbitrary sample pairs: $S_{ij} = \Gamma_{\text{Cosine}}(z_i, z_j)$, where $S_{ij}$ represents the similarity between nodes $v_i$ and $v_j$ ($v_i$ and $v_j$ may come from the same domain or different domains) and $\Gamma_{\text{Cosine}}(\cdot, \cdot)$ represents the cosine similarity function. Finally, we get a similarity matrix $S \in \mathbb{R}^{N \times N}$, where $N = N_S + N_T$. Then, given any query sample $v_i \in V_S \cup V_T$, we can retrieve the top-K (K is a hyperparameter) most similar samples from $V_S \cup V_T$ as the knowledge of $v_i$. As shown in Fig. 7, considering the insufficiency and low quality of target domain data, we retrieve knowledge for each source domain sample from the source domain only, while retrieving knowledge for each target domain sample from both the source and target domains.
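The retrieval step, including the asymmetric candidate sets of Fig. 7 (source samples retrieve only from the source domain; target samples from both domains), can be sketched in a few lines. The representations and identifiers below are made up for illustration.

```python
import math

# Sketch of knowledge retrieval: cosine similarities, then top-K candidates per
# query. Source samples retrieve intra-domain only; target samples retrieve from
# both domains, following the restriction shown in Fig. 7.

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, reps, candidates, k):
    """Return the k most similar candidate ids for `query` (excluding itself)."""
    scored = [(cosine(reps[query], reps[c]), c) for c in candidates if c != query]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

reps = {"s1": [1.0, 0.0], "s2": [0.9, 0.1], "t1": [0.0, 1.0], "t2": [0.1, 1.0]}
source, target = ["s1", "s2"], ["t1", "t2"]
knowledge = {}
for v in source:
    knowledge[v] = retrieve(v, reps, source, k=1)            # intra-source only
for v in target:
    knowledge[v] = retrieve(v, reps, source + target, k=1)   # both domains
```

In practice the similarities come from the trained AKR module rather than raw cosine on fixed vectors, but the top-K selection logic is the same.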
We optimize the AKR module with a binary cross-entropy (BCE) loss: $\mathcal{L}_{sim} = -\frac{1}{N_{pair}} \sum_{\langle v_i, v_j \rangle} \left[ y^{pair}_{ij} \log \sigma(S_{ij}) + (1 - y^{pair}_{ij}) \log\left(1 - \sigma(S_{ij})\right) \right]$, where $N_{pair}$ represents the number of sample pairs $\langle v_i, v_j \rangle$ used for training, $y^{pair}_{ij} = 1$ indicates that the pair of samples $\langle v_i, v_j \rangle$ belongs to the same category (otherwise $y^{pair}_{ij} = 0$), and $\sigma(\cdot)$ represents the sigmoid function. Overall, we leverage $\mathcal{L}_{adv}$ to optimize the discriminator while fixing the other parameters of AKR; then $\mathcal{L}_{sim}$ is leveraged together with $\mathcal{L}_{rec}$ and $\mathcal{L}_{adv}$ (i.e., $\mathcal{L}_{sim} + \mathcal{L}_{rec} + \mathcal{L}_{adv}$) to optimize AKR except for the discriminator. These two steps are performed iteratively. In principle, our method can be applied to any machine learning task (e.g., classification and regression).
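The pair-wise BCE objective on learned similarities is standard; the following minimal sketch computes it directly, with the similarity squashed by a sigmoid before the log-loss. Function names are illustrative.

```python
import math

# Pair-wise binary cross-entropy on learned similarities:
# y_pair = 1 for same-class pairs, 0 otherwise.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_bce(similarities, pair_labels):
    """BCE averaged over N_pair sample pairs."""
    total = 0.0
    for s, y in zip(similarities, pair_labels):
        p = sigmoid(s)  # probability that the pair shares a class
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(similarities)

loss = pair_bce([2.0, -2.0], [1, 0])  # confident correct pairs -> small loss
```

A high similarity on a positive pair and a low similarity on a negative pair both drive the loss toward zero, which is what pushes same-class samples together in the shared feature space.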

Scalable optimization strategy with balanced sampling.
The number of possible sample pairs is the square of the universal sample size ($N^2$), which is infeasible to train on in full batch for a large $N$. Besides, the sample pairs of datasets with multiple classes are extremely unbalanced, i.e., negative pairs with different labels far outnumber positive pairs with the same label.
To tackle the scalability and imbalance problems, we design a scalable optimization strategy with Balanced Pair-wise Sampling (BPS). Specifically, we use the BPS algorithm shown in Algo. 1 to sample $N_{pair}$ intra-domain sample pairs in the source domain and the target domain respectively (setting $V_1 = V_2 = V_S$, $Y_1 = Y_2 = Y_S$ and $V_1 = V_2 = V_T$, $Y_1 = Y_2 = Y_T$), and we sample $N_{pair}$ inter-domain (source→target) sample pairs (setting $V_1 = V_S$, $V_2 = V_T$, $Y_1 = Y_S$, $Y_2 = Y_T$). We thus get a total of $3N_{pair}$ sample pairs in each iteration. We set Max_Class_Num to 10 for all datasets. The BPS algorithm guarantees the same number of positive and negative pairs in each mini-batch, and $N_{pair}$ can be set arbitrarily according to the data size.
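The balancing idea behind BPS can be sketched as below. This is a hedged simplification of Algo. 1 (it enumerates candidate pairs and ignores the Max_Class_Num grouping, which matters for many-class datasets); names are hypothetical.

```python
import random

# Illustrative sketch of Balanced Pair-wise Sampling: draw n_pair (i, j, y_pair)
# triples from two node sets so that positives (same label) and negatives
# (different labels) are equally represented in each mini-batch.

def bps(nodes1, labels1, nodes2, labels2, n_pair, rng):
    """Return n_pair triples with n_pair//2 positives and n_pair//2 negatives."""
    pos = [(i, j) for i in nodes1 for j in nodes2
           if i != j and labels1[i] == labels2[j]]
    neg = [(i, j) for i in nodes1 for j in nodes2
           if labels1[i] != labels2[j]]
    half = n_pair // 2
    pairs = [(i, j, 1) for i, j in rng.sample(pos, half)]
    pairs += [(i, j, 0) for i, j in rng.sample(neg, half)]
    rng.shuffle(pairs)
    return pairs

rng = random.Random(0)
labels = {i: i % 2 for i in range(8)}
batch = bps(list(range(8)), labels, list(range(8)), labels, n_pair=8, rng=rng)
```

Calling `bps` three times per iteration (source-source, target-target, source-target) yields the $3N_{pair}$ balanced pairs described above without ever materializing all $N^2$ candidates in the real, class-grouped algorithm.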

Graph Knowledge Transfer (GKT)
Leveraging the knowledge retrieved in the previous step, we can construct the Bridged-Graph defined in Sec. 2.2.2. Then we use a GNN model to transfer knowledge on the Bridged-Graph.

Conversion from Retrieved Knowledge to Bridged-Graph.
We further convert the retrieved knowledge (see Eq. 8 and Fig. 7) into a Bridged-Graph. For all three scenarios of knowledge transfer shown in Fig. 2, we view each sample as a node and add edges from the top-K retrieved beneficial nodes to each target node, i.e., a KNN-graph, where K is a hyperparameter set by grid search over {4, 8, 16, 20} in our experiments. Besides, we also reuse the original edges that are suitable for knowledge transfer in the two relational data scenarios ($RD_{intra}$, $RD_{intra\&inter}$). Specifically, we remove original edges with similarities lower than a threshold $\epsilon$ (we set $\epsilon$ to the 25% quantile of the similarity matrix in our experiments; it can be adjusted according to the actual situation), and then reuse the remaining edges in the Bridged-Graph.
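The conversion step combines two edge sources, and can be sketched as follows. This is a minimal illustration with hypothetical names, assuming similarities are already available for the original edges.

```python
# Sketch of converting retrieved knowledge into a Bridged-Graph: directed KNN
# edges from each node's top-K beneficial nodes, plus reuse of original edges
# whose similarity clears the threshold epsilon (the 25% quantile in the paper).

def build_bridged_graph(knowledge, original_edges, sim, epsilon):
    """knowledge: {node: [top-K beneficial nodes]}; sim: {(u, v): similarity}."""
    edges = set()
    for v, beneficial in knowledge.items():
        for u in beneficial:
            edges.add((u, v))  # directed beneficial -> benefited edge
    for u, v in original_edges:  # reuse only reliable original edges
        if sim.get((u, v), 0.0) >= epsilon:
            edges.add((u, v))
    return edges

knowledge = {"t1": ["s1", "t2"]}
original = [("s1", "s2"), ("t2", "t1")]
sim = {("s1", "s2"): 0.9, ("t2", "t1"): 0.1}
g = build_bridged_graph(knowledge, original, sim, epsilon=0.5)
```

Note how the low-similarity original edge ("t2", "t1") is filtered out by the threshold but survives anyway because retrieval independently selected "t2" as beneficial for "t1".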

GNN for Knowledge Transfer on Bridged-Graph.
With the learned Bridged-Graph that defines the scope of knowledge transfer, we then use a Graph Neural Network model to transfer knowledge from beneficial samples to benefited samples. At this step, any message-passing-based GNN can be used as a plug-in for the GKT module. We use GNNs as our GKT module because the neighborhood aggregation framework (see Eq. 1) of GNNs matches our motivation (see Sec. 3.1.1) of learning a knowledge-enhanced posterior distribution for the target domain: the AGG function of the GNN aggregates the knowledge retrieved by AKR, while the Combine function combines the original sample feature with the aggregated knowledge. Considering that multi-domain nodes exist on the Bridged-Graph, we mainly use KTGNN [6], which accounts for node-level domain shift, as the GKT module in Bridged-GNN. We also evaluate the performance of other mainstream GNNs (e.g., GCN, GAT, GCNII) (see Table 3 and Table 4).

EXPERIMENTS
In this section, we compare Bridged-GNN with other state-of-the-art methods on the three knowledge transfer scenarios in Fig. 2.

Datasets
We conduct experiments on five datasets, including four real-world datasets and a synthetic dataset. The basic information is shown in Table 1, and a brief introduction of the datasets is as follows: Twitter [6,64]: Twitter is a social network dataset that describes the social relations among politicians (source domain) and civilians (target domain) on the Twitter platform, and the goal is to predict the political tendency of civilians. By removing the original edges, we also construct the Twitter$_{UD}$ dataset for the un-relational data scenario.
Office31 [50]: The Office31 dataset is a mainstream image benchmark for transfer learning. There are three different domains (Amazon, Webcam, and DSLR) in this dataset. We select Webcam and DSLR, which are data-hungry, as the target domains, and use Amazon, which is rich in data, as the source domain, yielding Office31 (A→W) and Office31 (A→D).
FB [56]: FB (Facebook100) datasets are social networks of 100 different universities in the United States, and the goal is to predict node identity flags. We view the social network of each university as an independent domain and select two of them to form a cross-network dataset.

Table 2: Experiments of classification on target domain samples in the scenario of un-relational data ($UD$ in Fig. 2 (a)). The meaning of the model with subscripts can be found in Sec. 4.2.

Company [6]: The Company dataset is a company investment network with 10,641 real-world companies in China. We regard listed companies as the source domain and unlisted companies as the target domain, and the goal is to predict the risk status of unlisted companies. We use this dataset in the $RD_{intra\&inter}$ scenario.
Sync-$UD$/Sync-$RD_{intra}$/Sync-$RD_{intra\&inter}$: We construct three synthetic datasets for the three scenarios of knowledge transfer by randomly sampling points of the source and target domains from two distinct multivariate Gaussian distributions, visualized in Fig. 3 (e). The samples of the source and target domains are designed to have distinct conditional and marginal distributions to validate the motivations described in Sec. 3.1.
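A construction in this spirit can be sketched as follows. This is an illustrative simplification (diagonal, isotropic covariance; specific means chosen arbitrarily), not the exact generation procedure of the Sync datasets.

```python
import random

# Sketch of the synthetic data idea: source and target samples are drawn from
# distinct Gaussians, so both the marginal distribution P(X) and the
# class-conditional distribution P(X|Y) differ across domains by design.

def sample_domain(n, class_means, noise, rng):
    """Draw n (x, y) points; the class-conditional mean differs per class."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [m + rng.gauss(0.0, noise) for m in class_means[y]]
        data.append((x, y))
    return data

rng = random.Random(0)
source = sample_domain(100, {0: [0.0, 0.0], 1: [2.0, 2.0]}, 0.3, rng)
# Target class means are shifted, so P_T(X|Y) != P_S(X|Y) as well as the marginals.
target = sample_domain(100, {0: [4.0, 0.0], 1: [6.0, 2.0]}, 0.3, rng)
```

Because the class-conditional means are shifted between domains, any method assuming an invariant conditional distribution is misspecified on such data, which is exactly the failure mode the Sync datasets are meant to probe.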

Experimental Settings
Data Settings: Under the semi-supervised classification setting, we use all source domain samples and only a few target domain samples as the training set for all datasets. For the Twitter and Company datasets, we use the same data split as Bi et al. [6]; for the other datasets, we randomly select 20% of the target domain samples in each class for training, and the remaining samples are divided equally into the validation set and test set.
Model Settings: For the $RD_{intra}$ and $RD_{intra\&inter}$ scenarios, we use DNN and other GNNs, including GCN [36], GAT [57], GCNII [11], OODGAT [51], and KTGNN [6], as baselines to encode graph data. The hyperparameter settings of these models all follow the original papers. We remove the original feature completion module of KTGNN because the missing-feature problem is not considered in this paper. For un-relational data scenarios, we also use several transfer learning methods (i.e., DANN [16], DAN, CDAN [44], MME [50], APE [35], S$^3$D [69]), where the S$^3$D model adopts a sample-wise distillation method for semi-supervised domain adaptation and achieves state-of-the-art performance. For the transfer learning methods and the encoder of AKR in Bridged-GNN, we use ResNet34 as the backbone network for the Office31 dataset and a 3-layer MLP as the backbone network for the other datasets with vector features. For all transfer learning methods, we use the hyperparameters recommended in the original papers and train them all in the semi-supervised learning setting. Besides, we also compare models with different training strategies: (1) T: train on the target domain only.
(2) S+T: train on the source and target domain concurrently.
(3) S→T: pretrain on the source domain and finetune on the target domain. We use subscripts to denote a model trained under a specific strategy (e.g., DNN$_T$, DNN$_{S+T}$, DNN$_{S→T}$) and Bridged-GNN with a specific GNN as the GKT module, e.g., Bridged-GNN$_{KTGNN}$.
Evaluation Metric: For the binary classification datasets (Twitter, Company, and the synthetic datasets), we use binary F1-score and AUC as evaluation metrics. For the multi-class datasets (FB, Office31), we use macro F1-score and micro F1-score as evaluation metrics. For all datasets, we evaluate the models by their performance in classifying test samples from the target domain.

Main Experiments
Analysis of experiments on the three main scenarios is as follows.

Results of Knowledge Transfer in $UD$ scenarios.
As shown in Table 2, we conduct experiments on four un-relational datasets, including three real-world datasets (Twitter$_{UD}$, Office31 (A→D), Office31 (A→W)) and the synthetic dataset Sync-$UD$. Our Bridged-GNN$_{KTGNN}$ gains significant improvements in classification performance on all datasets, e.g., a 4.46% improvement in F1-macro on the Twitter$_{UD}$ dataset. Compared with the other baselines and state-of-the-art transfer learning methods, our method achieves the best performance in the scenario of knowledge transfer on un-relational data ($UD$).

Results of Knowledge Transfer in $RD_{intra}$ scenarios.
As shown in Table 3, we conduct experiments on three cross-network datasets, including two real-world datasets from Facebook social networks (FB (Hamilton→Caltech), FB (Howard→Simmons)) and the synthetic dataset Sync-$RD_{intra}$. Considering that most graph neural networks are not designed for cross-network graph representation learning, we compare our Bridged-GNN with two other learning frameworks (T and S+T, see Sec. 4.2) to validate the performance gain brought by Bridged-GNN, i.e., our idea of "bridging cross-network samples". The results show that all variants of Bridged-GNN combined with a specific GNN model (e.g., GCN, GAT, GCNII, KTGNN) gain significant improvements in classification performance on all datasets. By bridging originally independent graphs, our method achieves the best performance in the scenario of intra-domain relational data ($RD_{intra}$).

Table 3: Experiments of node classification on target domain samples in the scenarios of relational data with only intra-domain relations ($RD_{intra}$ in Fig. 2 (b)). The meaning of the model with subscripts can be found in Sec. 4.2.

Results of Knowledge Transfer in $RD_{intra\&inter}$ scenarios.
As shown in Table 4, we conduct experiments on two real-world graph datasets (Twitter$_{Graph}$, Company) and the synthetic dataset Sync-$RD_{intra\&inter}$. Compared with other mainstream GNNs, our Bridged-GNN$_{KTGNN}$ gains significant improvements in classification on all graph datasets. By building a Bridged-Graph based on the original graph structure, our method achieves the best performance in the scenario of relational data with both intra-domain and inter-domain relations ($RD_{intra\&inter}$).

Ablation Study
We dive into the mechanisms of Bridged-GNN by ablation studies.

Effects of Adaptive Knowledge Retrieval (AKR) module.
We conduct ablation studies on the AKR module of Bridged-GNN by comparing it with other traditional methods of learning sample-pair similarity [10,13,60] (see the note of Table 5 for their definitions). Pair-wise DNN Classifier means we directly train a DNN-based binary classifier to classify whether a pair of samples belongs to the same class. As shown in Table 5, our AKR module always gains the best performance on both pair-wise classification and node classification on the Bridged-Graph.

Effects of Intra-domain/Inter-domain Connections.
As shown in Fig. 8, we also validate the effects of intra-domain/inter-domain connections when constructing Bridged-Graphs with the retrieved knowledge (top-K beneficial samples). The results show that the inter-domain connections establish the bridge for knowledge transfer from the source domain to the target domain, while the intra-domain connections further enlarge the scope and effects of knowledge transfer.

Hyperparameter Sensitivity Analysis
We analyze the effects of the important hyperparameter K, which controls the edge density of the Bridged-Graph. K trades off the quantity and quality of knowledge transfer: a larger K transfers a larger quantity of knowledge, but of lower quality. Fig. 9 shows the final classification results of Bridged-GNN with different values of K (X-axis in Fig. 9). Replacing the AKR module in Bridged-GNN with other similarity learners (see Sec. 4.4.1), we observe that the full Bridged-GNN model always achieves the best performance, and its results are more robust as K increases.

RELATED WORKS
Transfer learning is the mainstream framework for implementing knowledge transfer to alleviate the data-hungry problem. Domain Adaptation (DA), the representative transfer learning branch, best matches the topic of knowledge transfer in this paper and can be broadly divided into methods with shallow and deep architectures. Shallow DA methods [19,30] mainly utilize feature-based or instance-reweighting strategies to minimize the inter-domain distribution distance measured by some metric (e.g., MMD [18], CORAL [53]). Deep DA methods [8, 25-27, 32, 34, 52, 70] learn shared or separate encoders for the source and target domains while eliminating the domain gap. However, these methods usually learn a shared posterior distribution between domains and require a relatively small inter-domain divergence, and thus have severe limitations. Traditional DA methods usually refer to unsupervised DA (UDA), while recent studies [15,50,59,69] show that UDA has significant defects in substantial domain-shift scenarios and propose to gain better performance via semi-supervised DA. Graph Neural Networks (GNNs) have shown powerful performance in modeling graph data. Existing mainstream GNNs [4,12,17,23,40,41,65-67] follow the message-passing framework, where node representations are updated based on features aggregated from neighbors. However, most existing GNNs are built on the IID hypothesis, i.e., that all samples belong to the same data distribution. This hypothesis is hard to satisfy in many real scenarios, making model performance degrade significantly. Recently, some GNNs for OOD scenarios [28,29,33,49,71,73] have been designed, and studies [6,22,42,62] show that GNNs can accomplish inter-domain knowledge transfer on graphs well. However, they strictly rely on high-quality graph structures and cannot be applied to non-graph ($UD$) or cross-graph ($RD_{intra}$) scenarios.

CONCLUSION
In this paper, we redefine the paradigm of knowledge transfer as Knowledge Bridge Learning (KBL) to solve the data-hungry problem. Compared with the existing transfer learning paradigm and its strong assumptions, KBL learns a knowledge-enhanced posterior distribution for the target domain by first defining the scope of knowledge transfer and then transferring knowledge with GNNs. Correspondingly, we propose a novel Bridged-GNN model under the paradigm of KBL to conduct sample-wise knowledge transfer in both un-relational and relational data-hungry scenarios. Comprehensive experiments demonstrate that Bridged-GNN under the KBL paradigm outperforms existing methods by a large margin.

Figure 2 :
Figure 2: Knowledge Bridge Learning in three scenarios.

Figure 3 :
Figure 3: T-SNE visualization of the domain shift in real-world datasets and the synthetic dataset. To better show the difference in feature distribution conditioned on different classes, we only use samples from the first two classes to draw the scatter plots.

Figure 4 :
Figure 4: Results on Bridged-Graphs constructed via randomly adding edges and controlling the ratio of homophilous neighbors, including homophilous ratio of intra-domain edges (Y-axis) and inter-domain edges (X-axis).

Figure 6:
Figure 6: Architecture of the Adaptive Knowledge Retrieval (AKR) module. The blue (orange) arrows denote the data flow of the source (target) domain.

Figure 9 :
Figure 9: Analysis on hyperparameter K, which denotes the number of retrieved knowledgeable nodes for each target node and controls the density of the Bridged-Graph.
$V = \{v_i\}_{i=1}^{N}$ is the node set and $E = \{e_{ij} = (v_i, v_j) \mid v_i \text{ and } v_j \text{ are connected}\}$ is the edge set. $N = |V|$ is the number of nodes, $X \in \mathbb{R}^{N \times d}$ is the feature matrix, and $Y \in \mathbb{R}^{N \times 1}$ is the labels of all nodes.

Table 1 :
Basic information of the datasets used in this paper. $N_S$ and $N_T$ denote the number of samples in the source domain and target domain, and $d$ is the feature dimension.

Table 5:
In Table 5, Raw Feature Similarity denotes calculating the pair-wise similarity with the raw features. Pointwise DNN Classifier means we directly train a classifier to classify each sample and then calculate the pair-wise similarity with the sample logits (output of the last layer).