Incomplete Graph Learning via Attribute-Structure Decoupled Variational Auto-Encoder

Graph Neural Networks (GNNs) conventionally operate under the assumption that node attributes are entirely observable. Their performance notably deteriorates when confronted with incomplete graphs due to the inherent message-passing mechanisms. Current solutions either employ classic imputation techniques or adapt GNNs to tolerate missing attributes. However, their ability to generalize is impeded, especially when dealing with high rates of missing attributes. To address this, we harness the representations of the two essential views on graphs, attributes and structures, into a common shared latent space, ensuring robust tolerance even at high missing rates. Our proposed neural model, named ASD-VAE, parameterizes such a space via a coupled-and-decoupled learning procedure, reminiscent of brain cognitive processes and multimodal fusion. Initially, ASD-VAE separately encodes attributes and structures, generating representations for each view. A shared latent space is then learned by maximizing the likelihood of the joint distribution of the different view representations through coupling. Then, the shared latent space is decoupled into separate views, and the reconstruction loss of each view is calculated. Finally, the missing values of attributes are imputed from this learned latent space. In this way, the model offers enhanced resilience against the skewed and biased distributions typified by missing information and subsequently benefits downstream graph machine-learning tasks. Extensive experiments conducted on four typical real-world incomplete graph datasets demonstrate the superior performance of ASD-VAE against the state-of-the-art.


INTRODUCTION
Graph learning [3,7,12,16,29,40,47,52,55] has recently attracted a surge of interest in both academic and industrial communities, as graphs can naturally model real-world data in applications from the information, societal, chemical, and biological domains, among others. The Graph Neural Network (GNN), a type of neural network architecture operating on graphs, has achieved superior performance in various areas such as recommendation [31,39] and molecular prediction [6], among others [17,50]. The message-passing mechanism [10,24,33] of GNNs enables collaborative modeling of the two essential components of graphs, i.e., attributive and topological information, and distinguishes them from other graph embedding approaches that focus only on graph topology [9,38].
GNNs generally assume the attributive information is fully observed. However, this assumption may hardly hold in the real world, since graph attributes are possibly incomplete or missing due to either subjective or objective factors [13]. For example, 96.2% of data on reconstruction costs, 88.1% on insured damages, and 41.5% on total estimated damages are missing in the Emergency Events Database [21]. In the finance field, fraudsters may intentionally omit transactions, accounts, or other significant evidence to evade regulation [35]. In social networks, as many as 89% of users are reluctant to expose private information, e.g., gender, age, and location, due to data privacy issues [1,15]. Recall that GNNs simultaneously encode information from both attributes and structures; these missing attributes could deteriorate their performance. Figure 1 demonstrates the performance of GCN on different incomplete graphs. An apparent attenuation of accuracy is observed when the missing rate surpasses 50%, and the accuracy even drops below 40% when the missing rate reaches around 90%.
Hence, there is great demand for buoying the robustness of GNNs when addressing incomplete graphs with missing attributive information. To this end, an intuitive way is to exploit classic imputation methods on the missing node attributes [2,8,20,45], for instance, using MF [32] or KNN [2] to compute the average of the observed attributes and replace the missing values, or applying deep generative models such as VAE [23] and GAIN [20] to reconstruct missing attribute values. After imputation, the entire attributes, including the original and the estimated ones, are fed for training the downstream GNNs. However, none of these approaches are aware of the underlying graph structure, and an inaccurate imputation may even introduce noise to the subsequent learning process of GNNs. Recently, adapting GNNs to tolerate missing attributes has been explored. To name some, PaGNN [5] directly ignores the missing attribute values when performing message passing and aggregation. [41] uses randomly assigned values as node attributes. There are also GNNs that infer attributes from graph topology. For example, [5] generates artificial attributes according to nodes' positional and structural information. SAT [4] reconstructs missing attributes by aligning the distributions of node attributes and graph structures.
While current methodologies exhibit notable achievements, they often struggle to generalize in scenarios with substantial missing attributes (cf. Section 5 for details). One possible reason is that a large share of missing attributes yields a biased observed attribute distribution. Aligning the structural distribution encoded by topology information to such a skewed and biased attribute distribution might render the existing approaches ineffective, even infeasible, for the downstream tasks.
In this study, we move beyond the direct alignment of specific view distributions. Instead, inspired by the observation that a graph can be decomposed into attributes and structures [4], we posit that the attributes and structures can be sampled from a common shared latent space, and the missing values in the graph attributes can then be estimated with the aid of this space. The idea is enlightened by the capacity of the human brain to reconstruct views from a three-dimensional representation built merely from incomplete views [51]. This latent space is parameterized collaboratively by the two views on the graph. To illustrate, consider the analogy of a damaged cylinder. If one were to observe this cylinder from both its front and side views, a comprehensive mental image could be sculpted. Leveraging this overall impression, the damaged part can be restored by referencing the undamaged view, which resonates with our methodological approach. This restoration process is iterative, and we repeat it to infer the missing part. Figure 2 exhibits the difference between our idea and the existing ones. What distinguishes our method from conventional ones is the shared latent space Z, or the overall impression. This space inherently offers more resilience when confronted with views plagued by missing information. However, defining this shared space remains intractably challenging, since graph attributes typically inhabit the Euclidean space, while graph topology predominantly stems from the non-Euclidean domain.
To tackle this issue, we put forth a neural model named ASD-VAE (Attribute-Structure Decoupled Variational Auto-Encoder). Drawing inspiration from multimodal fusion concepts, ASD-VAE is meticulously crafted to impute missing attributes with the fusion of graph topology information and to underpin downstream machine-learning tasks. First, ASD-VAE encodes the attributes and structures by different encoders individually and derives representations of each view (observe). For the sake of obtaining Euclidean structures, we find geometric anchors within the graph to obtain the "distance coordinates" of graph nodes. Then, the common latent space is learned by maximizing the likelihood of the joint distribution of the different view representations (sculpt). Meanwhile, the latent space is decoupled into separate views (restore), and the reconstruction loss of each view is calculated. After the learning process, we estimate the missing attributes repeatedly under the guidance of the common space (repeat) and feed them into the downstream GNN model along with the observed ones. An improved message-passing schema of GNNs is also devised in this paper to adapt label prediction to incomplete graph data.
Extensive experiments on four real-world datasets demonstrate the superiority of the proposed ASD-VAE. It outperforms the competitors in label prediction and attribute estimation tasks on incomplete graphs. Meanwhile, the visualization shows that our model can yield high-quality estimated attributes for incomplete graphs.

RELATED WORK
The related research in the literature is summarized as follows.
Attribute imputation. Imputation skills are widely applied in data completion, such as zero or mean imputation [8], multivariate imputation [45], k-nearest neighbor (KNN) [2], and singular value decomposition (SVD) based matrix completion [34]. Deep learning models are also used to estimate missing values, such as the Denoising Auto-Encoder (DAE) [48] and GAIN [20]. However, when dealing with the problem of node attribute imputation for graphs, none of them consider the topology information [11,13,36].
GNN for incomplete graph. An intuitive way to extend GNNs to incomplete graphs is to directly ignore the missing values during the message-passing process. For example, PaGNN [14] employs a partial message-passing scheme that only propagates observed attributes. Another way is to perform missing-attribute estimation before or during GNN training. To name some, GCNMF [11] adopts a Gaussian mixture model to represent missing data and integrates missing-attribute estimation with label prediction in the same framework. In heterogeneous graphs, attributes of certain types of nodes are inaccessible; HGNN-AC [19] performs weighted aggregation of attributes via the attention mechanism and completes attributes for these no-attribute nodes. SAT [4] is a state-of-the-art model for node attribute estimation, which encodes attributes and structures separately and employs distribution-matching techniques to minimize the difference between them. SVGA [53] introduces a Gaussian Markov random field based on the graph structure to model the prior representation of missing attributes. ITR [43] consists of initialization and refinement: it initializes the missing attributes from the structure information and then adaptively refines the imputed variables by aligning the two distributions. PCFI [44] develops a new imputation framework consisting of channel-wise inter-node diffusion and node-wise inter-channel propagation. However, a high missing rate of node attributes may distort the observed attribute distribution, and directly aligning a biased representation may hamper the performance.
Graph auto-encoder. Our work is also related to Graph Auto-Encoders (GAEs), which are deep neural networks encoding nodes or graphs into a latent feature space and reconstructing graph information from the encoded representations. If a GAE is utilized for graph attribute imputation, a simple way is to adopt a GNN as the encoder and align the distribution of the missing part with the learned latent space [13,25,27,42]. Recall that GNNs cannot learn high-quality representations when tackling incomplete graphs; GAEs thus may not work well for our task when the encoded representations are unreliable (cf. Section 5 for details).

PRELIMINARIES

Problem Formulation
Definition 3.1 (Incomplete graph). A graph G = (V, E, Y, A, X, M), where V is the set of nodes and E ⊆ V × V is the edge set. Y is the set of labels for the nodes in V. A ∈ R^{N×N} denotes the adjacency matrix, where A_ij > 0 if (v_i, v_j) ∈ E and A_ij = 0 if (v_i, v_j) ∉ E. X ∈ R^{N×F} is the attribute matrix, and F is the number of attribute dimensions. M is a mask, where M_ij = 0 if node v_i's j-th attribute X_ij is unknown, else M_ij = 1. If there exists M_ij = 0, we call G an incomplete graph or attribute-missing graph.

Definition 3.2 (Label prediction for incomplete graph). Given an incomplete graph G = (V, E, Y, A, X, M) whose node set is divided into a train set and a test set, i.e., V = V_train ∪ V_test, the label y_i of node v_i can be observed only if v_i ∈ V_train. The goal of label prediction is to predict the labels of the nodes in V_test.

Revisiting Graph Neural Networks
Generally, GNNs utilize a message-passing schema in which the representation of node v_i is iteratively updated by aggregating information from its neighbors. Here we use the Graph Convolutional Network (GCN) [24] to illustrate the message-passing procedure:

H^{(l+1)} = σ(Â H^{(l)} W^{(l)}),   (1)

where Â is the aggregation matrix. Particularly, Â = D̃^{−1/2} Ã D̃^{−1/2}, where Ã = A + I is the adjacency matrix with self-loops added and I is the identity matrix. D̃ is the degree matrix of Ã, H^{(l)} is the node representation matrix in the l-th layer, W^{(l)} is the learnable weight matrix in the l-th layer, and σ(·) denotes an activation function.
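As a concrete illustration of Eq. (1), the following minimal NumPy sketch builds the normalized aggregation matrix and applies one GCN layer. The toy graph, feature sizes, and tanh activation are illustrative choices, not values from the paper.

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN layer: H' = sigma(A_hat @ H @ W), where A_hat is the
    symmetrically normalized adjacency matrix with self-loops."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                      # add self-loops: A + I
    d = A_tilde.sum(axis=1)                      # degrees of A_tilde
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # D^{-1/2} (A+I) D^{-1/2}
    return activation(A_hat @ H @ W)

# toy graph: 3 nodes on a path, 2-dim input features, 2-dim output
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.randn(3, 2)
W = np.random.randn(2, 2)
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```

Stacking two such layers gives the two-layer GCN used throughout the paper's experiments.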

METHODOLOGY
In this section, we detail the proposed ASD-VAE. First, we present the overall framework of ASD-VAE. Then, we derive the ASD-VAE formulation using variational inference methods. In Section 4.3, we present the details of the model components. Next, we introduce the optimization and learning process. In addition, we propose a component named Katz-GCN, which is adapted for learning on incomplete graphs.

The ASD-VAE Framework
Figure 3 shows the overall framework of ASD-VAE, which is basically a VAE-based architecture and consists of an attribute encoder E_x, a structure encoder E_a, a sub-common encoder E_c, an attribute decoder D_x, and a structure decoder D_a.
We assume the representation of graph nodes comes from two perspectives: attributes X and structures A. The view encoders E_x and E_a generate sub-latent representations of attributes and structures, denoted by Z*_x and Z*_a, respectively. We couple the two marginal representations Z*_x and Z*_a to obtain a common sub-latent representation, denoted by Z*. The encoder E_c encodes Z* into the common latent space Z. Next, the decoders D_x and D_a reconstruct attributes and structures from Z_x and Z_a, which are decoupled from the common latent space Z. After modeling the attributes and structure through this coupled-and-decoupled process, a GNN classifier is trained for downstream tasks.

Decoupled Variational Inference
In this part, we introduce the theoretical framework of variational inference designed for our coupled-and-decoupled learning process.

Derivation of Evidence Lower Bound

We assume that the representations of graph nodes come from the attribute and structure views and can be represented in a common shared latent space Z. Given the attributes X and structure A of graph G, we learn the latent space by maximizing the likelihood of the joint distribution of attributes and structure [23], and estimate the missing attributes from the latent space. In particular, for node v_i, denote x_i and a_i as its attribute and structure vectors, and z_i as the shared common latent vector. We consider the joint log-likelihood as a sum over data points, i.e., log p(X, A) = Σ_i log p(x_i, a_i). The Evidence Lower Bound (ELBO) [23] L(θ, φ; X, A) can then be written as:

L(θ, φ; X, A) = Σ_i ( E_{q(z_i | x_i, a_i)}[log p(x_i, a_i | z_i)] − KL(q(z_i | x_i, a_i) ∥ p(z_i)) ).

Drawing from principles of multimodal fusion [4,37], we emphasize the intrinsic uniqueness of the information in x_i and a_i. Our aim is to extract and capitalize on the non-redundant insight each view provides; the prior is therefore mathematically formed by the product of the two views [54], p(z_i) = p(z_{x_i}) p(z_{a_i}), where p(z_{x_i}) and p(z_{a_i}) both follow standard Gaussian distributions. Following SAT [4], we also suppose the latent vectors from different views are conditionally independent, q(z_i | x_i, a_i) = q(z_{x_i} | x_i) q(z_{a_i} | a_i), where q(z_{x_i} | x_i) and q(z_{a_i} | a_i) are parameterized by φ_x and φ_a.
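Since the priors over the view latent vectors are standard Gaussians, the KL terms in the variational objective admit the standard closed form used in VAEs. The sketch below computes that closed form in NumPy; the function name and shapes are illustrative, not from the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu, log_sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) per sample, summed
    over latent dimensions:
        0.5 * sum(mu^2 + sigma^2 - 2*log_sigma - 1).
    """
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(mu**2 + sigma2 - 2.0 * log_sigma - 1.0, axis=-1)

# when the posterior matches the standard Gaussian prior, the KL is zero
print(gaussian_kl(np.zeros((1, 4)), np.zeros((1, 4))))  # [0.]
```

Because the prior factorizes over the two views, the KL regularizer splits into one such term per view.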
This is the essence of our model. By minimizing the KL divergence, the common latent vector z_i is decoupled to derive each specific view z_{x_i} and z_{a_i}, and the complementary information from the other view is cross-referenced to help optimize or restore missing attributes.
which constrains z_{x_i} and z_{a_i} to be consistent with their corresponding prior distributions in the view latent space.

Model Instantiation
In this subsection, we present the details of ASD-VAE's components and instantiate the distributions in Eq. (6).

4.3.1 Pre-processing. Recall that we face a graph G with incomplete attributes. We first use the column-wise mean to fill in the missing values. Then, we employ a log2 code [18] as the input B ∈ R^{N×log N} to the subsequent structure encoder. Compared with SAT [4], which uses a one-hot code in R^{N×N}, the log2 code reduces the size of the matrix exponentially while encompassing richer structure information.
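The column-wise mean fill described above can be sketched as follows; this is a minimal NumPy version with illustrative variable names, where M is the observation mask from Definition 3.1.

```python
import numpy as np

def mean_fill(X, M):
    """Fill missing entries (M == 0) with the column-wise mean of the
    observed entries (M == 1), as in the pre-processing step."""
    X = X.copy()
    for j in range(X.shape[1]):
        obs = M[:, j] == 1
        col_mean = X[obs, j].mean() if obs.any() else 0.0
        X[~obs, j] = col_mean
    return X

X = np.array([[1., 4.],
              [3., 0.],
              [5., 2.]])
M = np.array([[1, 1],
              [1, 0],
              [0, 1]])  # two entries are marked missing
# column 0: mean of {1, 3} fills row 2; column 1: mean of {4, 2} fills row 1
print(mean_fill(X, M))
```

The filled matrix serves as X^{(0)}, the starting point of the adaptive update described later.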
Next, we use D_x and D_a to decode z_{x_i} and z_{a_i} and reconstruct the attribute and structure information. We use x'_i and a'_i to denote the reconstructed marginal representations from z_{x_i} and z_{a_i}; thus, the reconstruction process can be written as p(x'_i | z_{x_i}) and p(a'_i | z_{a_i}). Decoders D_x and D_a model these two distributions, parameterized with θ_x and θ_a, and are two-layer MLPs in ASD-VAE. Note that we need to reconstruct A by updating the normalized adjacency matrix, also denoted as Â in Eq. (1). For the whole graph, the reconstruction process yields the matrices X' and Â', which represent the reconstructed attribute and structure information.

(Repeat) Adaptive Update. To optimize the two views, we repeatedly update the attribute and aggregation matrices through the above process until convergence [30], where η and μ are the equilibrium coefficients, t is the training epoch, and X^{(0)} is the pre-filled attribute matrix. In practice, we use a threshold to shrink redundant computation for Â^{(t+1)}. The attributes are completed and the structure is refined gradually from the common latent space by leveraging the other view.
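The exact update equations are given in the paper; the sketch below only illustrates one plausible form as an assumption: a convex combination of the current matrices and their reconstructions, controlled by the equilibrium coefficients η and μ and a shrinkage threshold.

```python
import numpy as np

def adaptive_update(X_t, X_rec, A_t, A_rec, eta=0.2, mu=0.2, threshold=1e-3):
    """Hypothetical (Repeat) step: interpolate between the current
    matrices and the reconstructions, then zero out small entries of
    the aggregation matrix to shrink redundant computation."""
    X_next = eta * X_t + (1.0 - eta) * X_rec            # refine attributes
    A_next = mu * A_t + (1.0 - mu) * A_rec              # refine aggregation
    A_next = np.where(A_next < threshold, 0.0, A_next)  # shrink small entries
    return X_next, A_next

X_t, X_rec = np.ones((2, 2)), np.zeros((2, 2))
A_t = A_rec = np.eye(2)
X_next, A_next = adaptive_update(X_t, X_rec, A_t, A_rec)
print(X_next[0, 0])  # 0.2
```

Iterating this step until the change falls below a tolerance matches the "until convergence" behavior described in the text.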

Prediction. Based on the reconstructed attributes and structure, we utilize a two-layer GNN for downstream tasks and predict the labels Ŷ at epoch t.

Optimization & Model Training
4.4.1 Loss Functions. The loss functions of ASD-VAE can be separated into the attribute completion loss and the prediction loss.
Attribute completion loss. The attribute completion loss consists of the following parts. The first is the regularization loss designed for the coupled-and-decoupled learning process; according to Eq. (6), we rewrite the ELBO as the regularization loss. The second is the attribute and structure reconstruction loss of ASD-VAE at epoch t, where γ > 0 is the penalty coefficient for missing attributes. The attribute completion loss combines these terms over the parameter set Θ of all encoders, decoders, and latent variables.

Prediction loss. We use the estimated attributes and adjacency matrix to predict the labels Ŷ, with the Cross-Entropy Loss as the prediction loss: here |C| is the number of classes, y_ic is 1 if node v_i belongs to class c and 0 otherwise, ŷ_ic is the predicted probability that node v_i belongs to class c, and ω is the parameter of the GNN predictor.
Total loss. The final loss combines the attribute completion loss and the prediction loss, and we optimize them simultaneously.
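The cross-entropy prediction loss above can be written as a short NumPy sketch; the helper name is ours.

```python
import numpy as np

def prediction_loss(Y_onehot, Y_prob, eps=1e-12):
    """Cross-entropy prediction loss averaged over nodes:
    -(1/N) * sum_i sum_c y_ic * log(yhat_ic)."""
    return -np.mean(np.sum(Y_onehot * np.log(Y_prob + eps), axis=1))

# two nodes, two classes, maximally uncertain predictions -> loss = ln(2)
Y = np.array([[1, 0],
              [0, 1]])
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(prediction_loss(Y, P))  # ~0.6931
```

In the full objective this term is added to the attribute completion loss and both are minimized jointly.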

Katz-GCN
Recall that we are facing an incomplete graph. A general GNN may fail to obtain enough information for modeling due to data sparsity, especially when dealing with graphs with high-rate missing attributes. Therefore, we design an advanced GCN named Katz-GCN, which is used in E_a and the GNN predictor of ASD-VAE. The aggregation matrix of Katz-GCN in Eq. (1) is defined as the Katz series Â_Katz = Σ_{k≥1} β^k Â^k, where β ∈ (0, 1) is an attenuation coefficient that controls the information strength from different hops. In practice, we use a threshold function that resets values below the threshold θ_Katz to 0 to shrink the computation cost. Generally, there are two perspectives on the analysis of Katz-GCN:
• An infinite aggregator. Katz-GCN is essentially an infinite aggregator that aggregates neighborhood information from different hops with different weights. It enlarges the receptive field while tremendously reducing the model parameters.
• A polynomial low-pass filter. An effective polynomial filter ĝ_Katz(·) should satisfy ĝ_Katz(λ) ≥ 0, ∀λ ∈ (−1, 1] to avoid unstable training, where λ is an eigenvalue of graph G. When β ∈ (0, 1), Â_Katz acts as a low-pass filter that can smooth the high-frequency noise caused by missing attributes; otherwise, it behaves as a high-pass filter.
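Assuming the Katz-series form of the aggregator described above (the exact definition is in the paper's equation), the infinite sum has a closed form whenever β times the spectral radius of Â is below 1, which the sketch below exploits.

```python
import numpy as np

def katz_aggregation(A_hat, beta=0.5, theta=1e-4):
    """Infinite-hop Katz aggregator in closed form:
        A_Katz = sum_{k>=1} beta^k * A_hat^k = (I - beta*A_hat)^{-1} - I,
    valid when beta * spectral_radius(A_hat) < 1. Entries below theta
    are zeroed to shrink the computation cost, as described in the text."""
    N = A_hat.shape[0]
    A_katz = np.linalg.inv(np.eye(N) - beta * A_hat) - np.eye(N)
    A_katz[A_katz < theta] = 0.0
    return A_katz

# toy 2-node graph with a single edge
A_hat = np.array([[0., 1.],
                  [1., 0.]])
print(katz_aggregation(A_hat, beta=0.5))  # [[1/3, 2/3], [2/3, 1/3]]
```

The closed form avoids materializing powers of Â, which matches the claim that the aggregator enlarges the receptive field without extra parameters.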

EXPERIMENTS
In this section, we investigate the effectiveness of the proposed ASD-VAE model in the node classification task on four benchmark datasets. We strive to answer the following research questions.
• RQ1: Does ASD-VAE outperform the state-of-the-art methods for attribute-incomplete graph node classification?
• RQ2: How do the key components benefit the prediction, and do structures help reconstruct attributes?
• RQ3: Do the generated attributes benefit the learning of high-quality embeddings?
• Cora and Citeseer: Citation graphs with papers as nodes and citation links as edges. The attribute vector of each node indicates the bag-of-words, and the label represents the topic of the paper.
• AmaPhoto and AmaComp: Product co-purchase graphs where the nodes are Amazon products and the edges link the products that users frequently co-purchase. Node attributes are bag-of-words representations encoded from the product reviews, and labels are the categories of the products.
We test the proposed ASD-VAE on these graphs with the train/valid/test settings summarized in Table 1. For each, we construct two types of incomplete attributes following previous studies [11,14]:
• Uniform Missing. For each node, we randomly select and remove r (short for missing rate) percent of attributes from the attribute matrix X. This simulates the situation where part of a node's attributes is missing or unavailable.
• Structurally Missing. For the graph, we randomly select r percent of nodes and remove their attributes from X. This simulates the situation where all the attributes of these nodes are missing or unavailable.
Generally, structurally missing means a node's attributes are fully unobserved, producing a larger deviation from the original distributions.
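The two missing patterns can be simulated with simple mask generators; this is a hypothetical sketch whose function names and random seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_missing_mask(shape, r):
    """Uniform missing: each attribute entry is removed independently
    with probability r (M[i, j] = 0 marks a missing entry)."""
    return (rng.random(shape) >= r).astype(int)

def structural_missing_mask(shape, r):
    """Structurally missing: all attributes of a random r-fraction of
    nodes are removed."""
    n_nodes = shape[0]
    drop = rng.choice(n_nodes, size=int(r * n_nodes), replace=False)
    M = np.ones(shape, dtype=int)
    M[drop, :] = 0
    return M

M = structural_missing_mask((10, 4), r=0.3)
print((M.sum(axis=1) == 0).sum())  # 3 nodes fully masked
```

Multiplying X element-wise by such a mask yields the incomplete attribute matrix used as model input.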
Baselines. We compare against classic imputation methods and auto-encoders such as VAE [23], GAE [42], VGAE [25], and GINN [13]. GAIN [20] is an imputation method using adversarial networks. To evaluate them, we first fill in the missing values and then use the same GNN predictor as ASD-VAE for classification. We also compare against methods that adapt GNNs to tolerate missing attributes, such as GCNMF [11], PaGNN [14], SAT [4], ITR [43], SVGA [53], and PCFI [44]. We modify HGNN-AC [19] for the homogeneous datasets and employ its attention-based attribute completion (denoted by GNN-AC). To illustrate the importance of attributes, we also compare with a GNN using complete attributes and Label Propagation (LP) [56]. LP predicts the node class by propagating the known labels in graphs, without processing attributes.

Experimental Settings.
The parameters of ASD-VAE are optimized with the Adam optimizer [22]; the learning rate is set to 0.001 and the weight decay to 0.001. In ASD-VAE, the hidden dimension d_h = 2000, the loss weights λ_1 = 1, λ_2 = 200, λ_3 = 7, the penalty coefficient γ = 100, and the equilibrium coefficients η = μ = 0.2. Moreover, we use Katz-GCN as the graph encoder E_a and the label predictor. We set β to a very small value close to zero when r ≤ 0.7, and β = 0.667 for Cora and Citeseer, β = 0.5 for AmaPhoto and AmaComp when r > 0.7. The parameters of the baselines are also optimized using Adam with L2 regularization and dropout. For PaGNN and GCNMF, the dropout rate is set to 0.5 and the learning rate is 0.001. In GCNMF, the number of Gaussian components is 5. For GNN-AC, the number of independent attention heads is set to 8. SAT, ITR, and SVGA consist of two stages, namely attribute completion and node classification, and we set the same data split for both tasks. SAT and ITR can only be used for structurally missing because these models are based on fully-observed nodes. The GNNs in ASD-VAE and the baselines are two-layered. We set the hidden units to 16 for Cora and Citeseer and 64 for AmaPhoto and AmaComp. Similar to previous work [11,14], we repeat the experiments with 5 different random seeds and report the average. We employ an early-stopping strategy with patience equal to 500 to avoid over-fitting. Experimental results are statistically significant with p < 0.05.

Implementation. ASD-VAE is implemented in PyTorch 1.9.0 with Python 3.8. All experiments are conducted on an Ubuntu 18.04.5 LTS server with 1 NVIDIA Tesla V100 GPU and 440 GB RAM. Label propagation is implemented based on DGL [49]. MF, GAE, VGAE, and PaGNN are implemented by ourselves. The other baselines are implemented using author-provided source code.

Node Classification Results (RQ1)
To answer RQ1, we conduct experiments on the node classification task under the two attribute-missing scenarios and report the prediction accuracy in Table 2. We gradually increase r from 0.1 to 0.9 with a step of 0.4. Several observations can be derived from the results.
First, our model ASD-VAE exceeds all the baselines in most cases. For example, the improvements range from 1%∼23% in the uniform setting and 1%∼42% in the structurally missing setting on Cora and Citeseer, respectively. Similar performance is achieved on the AmaPhoto and AmaComp datasets. Meanwhile, ASD-VAE is more robust to missing attributes than the baselines. When r ≤ 50%, ASD-VAE even outperforms the GNN with complete attributes, suggesting our model not only completes missing values but also refines the graph structure.
Second, we observe that the classic imputation methods are inferior to the GNN-based models, indicating that estimating unknown values merely from observed attributes is unreliable and may even introduce noise. For the GNN-based approaches, recall that GCNMF, SAT, and SVGA reconstruct missing attributes from the distribution of learned structure embeddings. Though they surpass imputation-based methods, they yield results similar to PaGNN, which directly ignores missing attributes. We speculate the possible reason is that the distribution of partially observed attributes is biased, resulting in an ineffective distribution alignment. GNN-AC and ITR estimate attributes by aggregating the embeddings of nodes with observed attributes, which may boost the overall embedding quality. However, as r increases, all baselines deteriorate significantly, since more unavailable data makes the distribution more unreliable. PCFI introduces the concept of channel-wise confidence to impute incomplete graphs with a high r. Despite the superior performance of PCFI at high r settings, it is still inferior to ASD-VAE.
Third, the gap in performance among the methods on the AmaPhoto and AmaComp datasets is smaller than on the other two. We conjecture the reason may be that AmaPhoto and AmaComp are much denser. As a result, the message-passing schema of GNNs enables the graph nodes to receive massive amounts of information from their neighbors, mitigating the impact of missing attributes. The high performance of label propagation on the AmaPhoto and AmaComp datasets also supports our speculation, considering the method only takes topological structure into its modeling.

Attribute Completion Results (RQ1)
We conduct experiments on the attribute completion task and report Recall@k and NDCG@k following [4,43,53] to evaluate the quality of the restored attributes. The results are demonstrated in Table 3. From Table 3, ASD-VAE outperforms all compared baselines in terms of four metrics except Recall@50 on AmaPhoto. Taking the results of Recall@10 and NDCG@10 as an example, compared with the strongest competitor, ASD-VAE exceeds it by 34%/39%, 43%/58%, 16%/24%, and 53%/32% on the four datasets, respectively. Though such a task might not be the ultimate goal of the downstream learning tasks, an accurate estimation of missing attributes can further facilitate the learning of node representations, recalling that GNNs usually adopt collaborative learning over both attributes and structure. Hence, this comparison also underpins the answer to RQ1.

Ablation Study (RQ2)
To answer RQ2, we perform ablation studies to verify the effectiveness of the critical components of ASD-VAE. Specifically, we verify the performance of three variants of ASD-VAE: (1) ASD-VAE without coupling and decoupling (denoted by −CD), (2) ASD-VAE without adaptive update (denoted by −ADU), and (3) ASD-VAE with a general GCN instead of Katz-GCN (denoted by −Katz). The results are illustrated in Figure 4. From the figure, we observe that every component makes a positive contribution to the model performance: removing any of them deteriorates the performance compared with the full model. −CD exhibits the most significant degradation in most cases, as it removes the core component of ASD-VAE. The results demonstrate that coupling the attribute and structure spaces to learn the common space Z, and disentangling the latent space of each view from Z, is beneficial for attribute completion. −ADU shows that the adaptive update of the adjacency matrix benefits node classification, as it adjusts the graph topology to accommodate the attribute reconstruction. −Katz suggests that Katz-GCN facilitates missing-attribute estimation and downstream tasks, especially at high r. The Katz-GCN component enlarges the receptive field of neighborhood aggregation, mitigating the information loss due to high data deficiency.

Visualization (RQ3)
To answer RQ3, we visualize the learned node embeddings in 2D space by t-SNE [46] on Cora under the structurally missing setting. Good representational ability implies that a method can learn representative embeddings in which similar nodes map to nearby points; nodes of the same class are therefore expected to be grouped by t-SNE. In Figure 5, we find that even under a high r, ASD-VAE, as well as the recent competitors SVGA and PCFI, performs more stably than the other methods. Their visualizations are more distinguishable, underpinning the effectiveness of our method.

CONCLUSION
In this paper, we proposed ASD-VAE, a neural model for attribute-missing graph imputation, which parameterizes the structure and attribute representations into a common shared latent space via a coupled-and-decoupled learning process. ASD-VAE learns attribute and structure dependencies in a coupled way and then maximizes the likelihood of the joint distribution. Meanwhile, the shared space is decoupled into separate views, and missing values are imputed from this learned latent space. Moreover, we presented an advanced GCN named Katz-GCN to enlarge the receptive field and remove high-frequency noise under a high missing rate. The comprehensive experiments conducted on eight missing-graph learning tasks (with four datasets and two missing types) verify the effectiveness of our method.

ACKNOWLEDGMENTS
The research work is supported by National Key R&D Plan No. 2022YFC3303302, the National Natural Science Foundation of China under Grant No. 61976204, and the CAAI-Huawei MindSpore Open Fund. Xiang Ao is also supported by the Project of Youth Innovation Promotion Association CAS and the Beijing Nova Program.

Figure 1 :
Figure 1: Performance of GCN on the node classification task under different missing rates on four incomplete graphs.

Figure 2 :
Figure 2: Difference between the existing methods and ASD-VAE. The original graph can be distinguished into several views; most existing methods focus on aligning different view representations (denoted by Z_1 and Z_2), while ours assumes view representations are derived by decoupling from a common latent space (denoted by Z).

Figure 3 :
Figure 3: The overall framework of ASD-VAE. ASD-VAE first encodes the pre-processed attributes and structures into the sub-latent space. After coupling, the common latent space Z is obtained by E_c. Next, ASD-VAE decouples the marginal latent spaces Z_x and Z_a from Z. Then, ASD-VAE reconstructs the attributes and structures by the decoders D_x and D_a. Finally, a GNN predicts labels based on the reconstructed graph.

Figure 4 :Figure 5 :
Figure 4: An ablation study of ASD-VAE on Cora and Citeseer compared with its three variants (details in Section 5.4). ASD-VAE yields the best test accuracy.
For node v_i, we denote z_{x_i} as the attribute latent vector of v_i and z_{a_i} as the structure latent vector of v_i.
p(x_i, a_i | z_i) and q(z_i | x_i, a_i) are the conditional distributions parameterized by θ and φ. For clarity, we omit the parameters θ and φ in the equations below. The first term is the reconstruction term, in which z_i is used to reconstruct x_i and a_i. The second term is a KL divergence that regularizes q(z_i | x_i, a_i) to match the prior distribution p(z_i). Since the KL term is non-negative, we convert the optimization objective into maximizing the ELBO. 4.2.2 ELBO Decomposition.
4.3.2 (Observe) Structure & Attribute Encoder. Prior to introducing the attribute and structure encoders, we first present the sub-latent space, as it is difficult to directly calculate q(z_i | x_i, a_i). The symbol * is used to differentiate the sub-latent space from the latent space. The encoders E_x and E_a aim to model the marginal sub-latent distributions, namely q(z*_x | x) and q(z*_a | a), with parameters φ*_x and φ*_a. z*_{x_i} and z*_{a_i} are the sub-latent embeddings of x_i and a_i. In ASD-VAE, we adopt a two-layer Multi-Layer Perceptron (MLP) as the encoder E_x, whose input is the pre-filled attribute matrix. E_a is a two-layer GNN (e.g., GCN or the Katz-GCN we propose in Section 4.5), whose inputs are the log2 code of the nodes and the graph adjacency matrix. After encoding, we obtain the sub-latent vectors of node v_i, i.e., z*_{x_i} and z*_{a_i}. 4.3.3 (Sculpt) Coupling. Under our assumption, we obtain the common sub-latent representation z*_i by coupling the marginal sub-latent representations, q(z*_i) = q(z*_{x_i}) q(z*_{a_i}), and the coupling process can be achieved by Hadamard products as Z* = Z*_x ⊙ Z*_a = E_x(X) ⊙ E_a(B, Â). Next, we need to convert the sub-latent space into the latent space. Here we deem q(z_i) = q(z*_i) q(z_i | z*_i). An encoder E_c models the distribution q(z_i | z*_i) and maps z*_i to z_i with parameters φ_c; E_c is a two-layer MLP: Z = E_c(Z*). 4.3.4 (Restore) Decoupling & Decoder. After getting the common latent vector z_i, we decouple it to obtain the latent representation of each view, i.e., z_{x_i} and z_{a_i}, which differs from previous studies [4, 28]. To be specific, we obtain the means and standard deviations of z_{x_i} and z_{a_i} through two MLPs, which model the distributions q(z_{x_i} | z_i) and q(z_{a_i} | z_i) and are parameterized by θ_{z_x} and θ_{z_a}. The means and standard deviations are denoted as μ_x, σ_x, μ_a, σ_a. Through reparameterization techniques, we can then obtain z_{x_i} and z_{a_i}.
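The (Sculpt) coupling and (Restore) reparameterization steps just described can be sketched as follows; the toy shapes and the mu/log-sigma inputs are illustrative stand-ins for the outputs of the encoders and the decoupling MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)

def couple(z_x_sub, z_a_sub):
    """(Sculpt) couple the two sub-latent views: Z* = Z*_x ⊙ Z*_a."""
    return z_x_sub * z_a_sub  # element-wise Hadamard product

def reparameterize(mu, log_sigma):
    """(Restore) draw a view latent vector: z = mu + sigma * eps,
    eps ~ N(0, I), keeping the sampling step differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

# toy sub-latent representations for 4 nodes with 8 latent dimensions
z_star = couple(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
z_view = reparameterize(z_star, np.full_like(z_star, -2.0))
print(z_view.shape)  # (4, 8)
```

In the full model, Z* is further passed through E_c before the decoupling MLPs produce the per-view means and standard deviations.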

Table 1 :
The statistics and train/valid/test split of datasets.

Table 2 :
Performance comparison of node classification on four benchmarks.

Table 3 :
Performance comparison of attribute completion under structurally missing with r = 0.70.