Generative Models for Complex Logical Reasoning over Knowledge Graphs

Answering complex logical queries over knowledge graphs (KGs) is a fundamental yet challenging task. Recently, query representation has become a mainstream approach to complex logical reasoning, pulling a query and its target answers closer together in the embedding space. However, two limitations remain. First, prior methods model each query as a fixed vector and ignore the uncertainty of relations in KGs; in fact, different relations may carry different semantic distributions. Second, traditional representation frameworks fail to capture the joint distribution of queries and answers, which generative models can learn and thereby produce more coherent answers. To alleviate these limitations, we propose a novel generative model, named DiffCLR, which exploits the diffusion model for complex logical reasoning to approximate query distributions. Specifically, we first devise a query transformation that converts logical queries into input sequences by dynamically constructing contextual subgraphs. We then feed them into the diffusion model to execute a multi-step generative process, and further design a structure-enhanced self-attention to incorporate the structural features embodied in KGs. Experimental results on two benchmark datasets show that our model outperforms state-of-the-art methods, particularly on multi-hop chain queries, where it achieves significant improvements.


CCS CONCEPTS
• Computing methodologies → Knowledge representation and reasoning.

INTRODUCTION
Reasoning is the process of analyzing, synthesizing, and making predictions from existing knowledge facts [11, 25]. Knowledge graphs (KGs) organize world knowledge with relational edges between entities and serve as a foundation for reasoning [1, 24, 40]. In recent years, reasoning over KGs has attracted much attention in many domains [15, 16, 44, 46], such as recommender systems, question answering, and drug discovery, since it can be used to infer new knowledge or answer queries based on observed facts.
One interesting task we focus on is complex logical reasoning over knowledge graphs [1, 10, 12, 27, 49], i.e., answering Existential Positive First-Order (EPFO) queries, which involve the logical operations of existential quantification (∃), conjunction (∧), and disjunction (∨). For example, the question "Which universities did Canadian Turing Award winners graduate from?" can be expressed as an EPFO query, as shown in Figure 1. However, complex logical reasoning is challenging because most real-world KGs are large-scale and incomplete, which can make EPFO queries unanswerable by direct graph traversal.
Recently, two research directions have been explored to address these challenges. The first focuses on applying or adapting knowledge graph embeddings that have been successful in knowledge graph completion [5, 19, 35]. Most such methods [1, 2, 27, 49] map EPFO queries into low-dimensional continuous spaces by iteratively executing neural logical operators over the query dependency graph. The other direction resorts to pre-trained transformer architectures to deeply explore richer semantic patterns. These approaches [4, 18, 21, 45, 48] are mainly devoted to incorporating structural knowledge into transformers, and promising technological advances have been achieved. Despite these successes, existing methods for reasoning over incomplete KGs still have several limitations. First, they assign a fixed vector to each query via neural architectures, ignoring the uncertainty of relations in KGs. In fact, different relations may carry different semantic distributions. For instance, when inferring the query (Geoffrey Hinton, Citizen, ?), the semantic distribution of the relation Nationality is obviously closer to the query than that of Graduate. Accordingly, a fixed embedding has limited capability to capture the uncertainty of an EPFO query with multiple relational semantics.
Second, most approaches encode EPFO queries with discriminative representation frameworks that merely retrieve answers without interpreting the entire query. In contrast, we argue that generative models can learn the joint distribution of queries and answers, which has great potential to model complex interactions and produce coherent answers. Recently, generative models have shown impressive results in various fields, such as image generation, video prediction, and language comprehension. However, little attention has been paid to exploring the generative process (cf. Figure 1 (b)) to capture the underlying compound distributions of complex queries on KGs with generative models.
To address these issues, we innovatively regard the complex logical reasoning task as a new generative question answering problem that approximates the joint distribution of queries and answers. Inspired by the diffusion model [20], a promising new paradigm for generative modeling, we incorporate the diffusion model into KG reasoning. The main motivations are: (1) it provides an elegant way to inject uncertainty into the model through Gaussian distributions; (2) it utilizes a multi-step generative process to improve distribution representation; and (3) it keeps the intermediate process of the model controllable and explainable.
In this paper, we present a Diffusion model for Complex Logical Reasoning, named DiffCLR. Specifically, DiffCLR first applies a query transformation that converts EPFO queries into input sequences by dynamically sampling contextual subgraphs, where relations are treated as ordinary nodes (like entities) to alleviate the heterogeneity of KGs. Then, we introduce a diffusion model to perform a multi-step generative process that better learns the joint distribution of queries and answers under multi-grained control. In the forward diffusion phase, we corrupt the target query embedding into a Gaussian distribution via noise injection. In the reverse diffusion phase, we adopt a transformer and further design a structure-enhanced self-attention to effectively incorporate the crucial structural knowledge encapsulated within the KG. Finally, we conduct extensive experiments on two benchmark datasets to demonstrate the effectiveness of our model.
In summary, our contributions are as follows:
• We propose DiffCLR, a model for complex logical reasoning over KGs that takes advantage of the diffusion model in uncertainty injection and distribution generation.

RELATED WORK

Complex Logical Reasoning
Complex logical reasoning (CLR) is the intricate task of answering first-order logic queries over KGs. Traditionally, symbolic approaches, such as logic programming [7] or Markov logic networks [30], traverse the KG to directly search for the answers to complex queries. But many real-world KGs are incomplete and large-scale, which limits the usage of these methods. Nowadays, the main research on query embeddings can be divided into two lines [1, 21, 28, 50]. One is to represent complex logical queries with various geometric shapes and deep neural architectures. Guu et al. [10] first propose compositional training to predict answers for path queries. GQE [12] answers conjunctive queries by learning a geometric conjunction operator in the embedding space. Q2B [27] represents EPFO queries as hyper-rectangular boxes, where the points inside a box are considered answers to the query. ConE [49] introduces cone embeddings for queries and designs a series of geometric logical operators. GNN-QE [50] converts the complex query into an expression over fuzzy sets, which is then executed with GNN-based relation projection and fuzzy logical operations. In general, these works represent queries with geometric logical operators, but they struggle to generalize to more complex queries because the pure embedding architecture limits their expressiveness.
The other line adapts advanced transformer architectures to capture rich semantics and intrinsic patterns [21, 29, 41]. StAR [39] adopts a Siamese neural network to learn both contextualized and structured knowledge for graph completion tasks. BiQE [18] embeds conjunctive queries with the transformer framework based on bi-directional sequence encoders. Relphormer [4] proposes a new variant of the transformer solely for knowledge graph completion. kgTransformer [21] introduces triple transformation and mixture-of-experts sparse activation with masked pre-training and fine-tuning strategies for complex logical reasoning. Nevertheless, these methods generally fine-tune autoencoder language models for reasoning tasks. In contrast, we explore generative models to handle complex logical queries over KGs. In addition, a few methods, such as KG2E [13] and TransG [38], do consider uncertainty, but they focus only on link prediction rather than multi-hop reasoning.

Diffusion Models for Generation
Diffusion models [14, 32] are a class of generative models that have achieved remarkable progress in various applications, such as image [31], audio [17], and text [9] generation. The diffusion model was first introduced by Sohl-Dickstein et al. [32]. Since then, numerous methods have been developed to improve diffusion models, either by advancing theoretical arguments [34] or by enhancing the empirical performance of models [22, 33].
More specifically, for text-to-image generation, DALL-E [26] proposes a two-stage model to align embedding vectors from two sources, and GLIDE [23] explores the diffusion model with classifier-free guidance. For text generation, Diffusion-LM [20] adds an embedding step and a rounding step to bridge the gap between continuous and discrete spaces. DiffuSeq [9] utilizes a partial noising strategy, imposing noise only on the target sequence rather than the entire sequence. DiffuEMP [3] designs a diffusion model with explicit control signals to guide empathetic responses. In addition, for recommender systems, DiffRec [42] proposes a new generative recommender that models user interactions in a denoising manner. To the best of our knowledge, we are the first to introduce diffusion models to knowledge graph reasoning.

PRELIMINARY
We briefly introduce the background of EPFO logical queries on knowledge graphs and diffusion models.

EPFO Logical Queries
A knowledge graph G = (E, R) is a heterogeneous graph integrating real-world facts, where e ∈ E represents an entity and r ∈ R is a binary function r : E × E → {True, False} indicating whether the relation r holds between a pair of entities.
Given first-order logical queries with existential (∃) and conjunctive (∧) operations, a conjunctive query q is defined as follows:

q[V?] = V? . ∃V1, . . . , Vk : e1 ∧ e2 ∧ · · · ∧ en,   ei = r(va, V) or r(V, V′),

where V? denotes the target variable of the query, V1, . . . , Vk refer to a series of existentially quantified bound variables (V, V′ ∈ {V?, V1, . . . , Vk}), and va ∈ E represents a non-variable anchor entity. The goal of EPFO query answering is to find the target entity set A ⊆ E such that a ∈ A iff q[a] is true. In addition, an EPFO query with disjunctions (∨) can be transformed into Disjunctive Normal Form [27].
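To make the notation concrete, here is a minimal Python sketch of how a conjunctive EPFO query can be represented as a set of relational atoms over a target variable; the Atom/ConjunctiveQuery classes and the relation names are illustrative stand-ins, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    """One relational atom r(head, tail); names starting with '?' are variables."""
    relation: str
    head: str
    tail: str

@dataclass
class ConjunctiveQuery:
    """A conjunctive EPFO query: a target variable plus a conjunction of atoms."""
    target: str
    atoms: list = field(default_factory=list)

# "Which universities did Canadian Turing Award winners graduate from?"
query = ConjunctiveQuery(
    target="?U",
    atoms=[
        Atom("Win", "?P", "TuringAward"),   # ?P won the Turing Award
        Atom("Citizen", "?P", "Canada"),    # ?P is a Canadian citizen
        Atom("Graduate", "?P", "?U"),       # ?P graduated from university ?U
    ],
)
```

A disjunctive query in Disjunctive Normal Form is then simply a list of such conjunctive queries.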

Diffusion Models
The diffusion model [32] is a probabilistic generative model that estimates the distribution of the target data in order to generate desired samples. Typically, it consists of a forward process and a reverse process. Concretely, given a sample X_0 ∼ q(X), a Markov chain of latent variables X_1, . . . , X_T is produced by the forward process q(X_t | X_{t−1}), which gradually adds Gaussian noise at each time step t ∈ {1, 2, . . . , T}. At the last step, X_0 has been corrupted into an isotropic Gaussian X_T ∼ N(0, I). Once the forward process is completed, the reverse process iteratively regenerates the original data X_0 by training the diffusion model to predict the learnable transition p_θ(X_{t−1} | X_t). By offering fine-grained control over the generative process, the diffusion model can discern implicit dependencies and intrinsic patterns within data distributions.
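As a reference for the notation above, the following sketch shows the closed-form forward corruption X_t = √ᾱ_t X_0 + √(1 − ᾱ_t) ε with ᾱ_t = ∏_{s≤t}(1 − β_s); a linear β schedule is assumed purely for illustration (our experiments use an exponential one), and the reverse direction is what the model is trained to learn.

```python
import torch

betas = torch.linspace(1e-4, 0.02, 512)          # illustrative linear schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal level

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Closed form of q(X_t | X_0): scale the clean input and add Gaussian noise."""
    eps = torch.randn_like(x0)
    a = alpha_bars[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps

x0 = torch.randn(8, 64, 128)    # stand-in for a batch of sequence embeddings
x_t = q_sample(x0, t=300)       # by t = 511, X_t is close to N(0, I)
```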
Taking inspiration from diffusion models, we present an innovative model, DiffCLR, which regards complex logical reasoning as a novel generative question answering task. It is essential to stress that our model differs from traditional complex logical reasoning. On the one hand, we account for the uncertainty of relations by corrupting the query embedding into a Gaussian distribution via noise injection; on the other hand, we are the first to introduce generative models of the joint distribution of queries and answers, which are trained to interpret the entire query, not just to answer it.

METHODOLOGY
In this section, we introduce DiffCLR, a diffusion-based generative model for answering EPFO queries over KGs. The framework of DiffCLR is illustrated in Figure 2. Concretely, our model consists of two parts: (1) a query transformation that converts EPFO queries into input sequences, and (2) a diffusion model that approximates the query distribution in a generative process. Finally, we adopt a joint training strategy to optimize the model. In the following, we explain the technical details of our model.

Query Transformation
Given an EPFO query, the first step is to transform it into a sequence that satisfies the input constraint of the diffusion model, which uses a transformer architecture to process sequences. However, since knowledge graphs are heterogeneous and structured, it is challenging to encode query information while preserving the structural knowledge contained in KGs.
In recent years, most methods [4, 39, 45] focus on single-triple concatenation for knowledge graph completion, which can hardly be transferred to multi-hop reasoning over KGs, because EPFO queries are more intricate, involving multiple anchor entities and logical operations. To handle EPFO queries, a little prior work [18] represents the query as a dependency graph and decomposes it into several reasoning paths. However, two challenges remain: such decomposition discards the contextual structure around the query, and it is difficult to fuse heterogeneous knowledge into query encoding, which is crucial for complex logical reasoning.
To address these issues, we propose a query transformation strategy that converts EPFO queries into input sequences by dynamically constructing contextual subgraphs. Specifically, we first extract the local subgraph LG = {E_L, R_L} for the query q. For each anchor entity v_i in q, we obtain the node set of its k-hop neighborhood N_k(v_i), where i ∈ {1, 2, . . . , m′} and m′ is the number of anchor entities. We then take the intersection nodes ∩_{i=1}^{m′} N_k(v_i) as the nodes of LG. Finally, we utilize these nodes and their connections in the KG to extract the local subgraph LG, and prune nodes that are isolated or at a distance greater than k from any of the anchor entities.
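The extraction step can be sketched with networkx as follows; the graph layout (a MultiDiGraph with a `relation` edge attribute) and the function name are our assumptions for illustration.

```python
import networkx as nx

def extract_local_subgraph(kg: nx.MultiDiGraph, anchors, k: int = 2):
    """Local subgraph LG: intersect the k-hop neighborhoods of all anchors,
    keep the induced edges, and drop isolated non-anchor nodes; nodes farther
    than k from some anchor are already excluded by the intersection."""
    und = kg.to_undirected()
    balls = [set(nx.ego_graph(und, a, radius=k).nodes) for a in anchors]
    nodes = set.intersection(*balls)
    lg = kg.subgraph(nodes).copy()
    lg.remove_nodes_from([n for n in list(lg.nodes)
                          if lg.degree(n) == 0 and n not in anchors])
    return lg

kg = nx.MultiDiGraph()
kg.add_edge("GeoffreyHinton", "Canada", relation="Citizen")
kg.add_edge("GeoffreyHinton", "UEdinburgh", relation="Graduate")
lg = extract_local_subgraph(kg, anchors=["GeoffreyHinton"], k=2)
```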
Next, for each directed triple (h, r, t) in LG with r(h, t) = True, we create a relation-node v_r that connects both the entity h and the entity t. By treating each relation edge in LG as a relation-node, we construct the contextual subgraph CG = {E_C, R_C}, where E_C = E_L ∪ R_L and the unattributed edge set R_C = {(h, v_r), (v_r, t) | r(h, t) = True}, enabling a better fusion of the heterogeneous information carried by entities and relations in the knowledge graph.
Finally, based on the contextual graph CG, we flatten it to obtain the query sequence S = {s_1, s_2, . . . , s_n}, where n denotes the length of the sequence, and we append an answer node at the end of the sequence. The query transformation submodule is depicted in Figure 2 (a). Moreover, a node-type encoding scheme is introduced to distinguish node types, where 0 indicates an entity node and 1 a relation node. Furthermore, the adjacency matrix of CG is stored to retain the structural information of the KG, which will be utilized in the diffusion model.
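A minimal sketch of the two steps above, under the same assumed graph layout as before; the token format of the relation-nodes and the `[ANS]` answer token are hypothetical details.

```python
import networkx as nx

def build_contextual_graph(lg: nx.MultiDiGraph) -> nx.Graph:
    """CG: each relation edge (h, r, t) becomes a relation-node v_r wired to
    h and t, so entities and relations live in one homogeneous graph."""
    cg = nx.Graph()
    for h, t, data in lg.edges(data=True):
        v_r = f"{data['relation']}#{h}#{t}"   # one relation-node per edge
        cg.add_node(h, ntype=0)               # 0 = entity node
        cg.add_node(t, ntype=0)
        cg.add_node(v_r, ntype=1)             # 1 = relation node
        cg.add_edge(h, v_r)
        cg.add_edge(v_r, t)
    return cg

def flatten(cg: nx.Graph, answer_token: str = "[ANS]"):
    """Flatten CG into the token sequence S and node-type ids; the adjacency
    matrix is kept to bias the self-attention later (the appended answer
    node is treated as an extra position)."""
    seq = list(cg.nodes) + [answer_token]
    types = [cg.nodes[n]["ntype"] for n in cg.nodes] + [0]
    adj = nx.to_numpy_array(cg)
    return seq, types, adj
```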

Diffusion Model for Reasoning
To enhance query representations, instead of modeling them as fixed vectors, we represent them as distributions, where different distributions imply different semantics, and use the diffusion model to learn the compound distribution of complex queries. Concretely, given the contextualized query sequence S, the forward process first obtains its embedding and converts the embedding into a Gaussian distribution by injecting noise.
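The partial noising used in the forward phase (detailed below, following DiffuSeq [9]) can be sketched as follows: noise is imposed only on the positions flagged by a target mask, e.g. the appended answer node, while the query context stays clean. The mask layout shown here is an illustrative assumption.

```python
import torch

alpha_bars = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 512), dim=0)

def partial_noising(x0, target_mask, t):
    """Noise only positions with target_mask == 1; leave the rest of the
    sequence (the query condition) un-noised."""
    eps = torch.randn_like(x0)
    a = alpha_bars[t]
    noised = a.sqrt() * x0 + (1 - a).sqrt() * eps
    mask = target_mask.unsqueeze(-1).float()   # (batch, seq_len, 1)
    return mask * noised + (1 - mask) * x0

x0 = torch.randn(4, 64, 128)                   # EMB(S) for a batch
target_mask = torch.zeros(4, 64)
target_mask[:, -1] = 1                         # only the answer node is noised
x_t = partial_noising(x0, target_mask, t=100)
```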
In the reverse phase, we iteratively denoise a Gaussian sample to restore the target query representation. In particular, we design a structure-enhanced self-attention in the Transformer [8] to integrate the essential structural information of the KG.

Forward Process. In the forward process, given the query sequence S, we first use an embedding function [20] EMB(·) to map the discrete sequence S into a continuous representation X_0. Then, the model adds noise to X_0 step by step:

q(X_t | X_{t−1}) = N(X_t; √(1 − β_t) X_{t−1}, β_t I),

where X_1, . . . , X_T form a Markov chain with X_T ∼ N(0, I), and β_t ∈ (0, 1) is a noise schedule that arranges the noise scale injected at each step. Note that conventional diffusion models corrupt the entire X_0. However, taking the heterogeneity of the knowledge graph into account, we explore how different nodes (entities or relations) contribute to the generation of the query distribution. Therefore, we adopt the partial noising strategy [9] to impose noise only on selected parts of X_t.

Reverse Process. Once the forward process is completed, the reverse process gradually reconstructs the target query representation X_0 from pure Gaussian noise X_T by iteratively sampling from a learnable transition:

p_θ(X_{t−1} | X_t) = N(X_{t−1}; μ_θ(X_t, t), σ_θ(X_t, t)),

where μ_θ(·) and σ_θ(·) are the predicted mean and standard deviation, derived from the forward process via Bayes' rule and implemented by a transformer model f_θ. We also add a rounding step p_θ(S | X_0) = ∏_{i=1}^{n} p_θ(s_i | x_i), where p_θ(s_i | x_i) is a softmax distribution, to map the final generated representations back to discrete tokens. Figure 2 (b) shows the inference process, which follows the typical continuous diffusion process to recover query representations.

Structure-enhanced Self-attention. By the nature of the conventional diffusion model, each input token can attend to all other tokens in the sequence through full self-attention, acting like a fully connected graph neural network. However, we need to encode the weighted structural features of KGs, where entity-relation pair information is an essential signal for complex logical reasoning. To this end, we design a structure-enhanced self-attention that incorporates structured knowledge facts for better query understanding; Figure 3 illustrates the architecture. More concretely, after the query transformation procedure, we obtain the contextualized query sequence, while the local structure of the contextual subgraph CG is preserved in its adjacency matrix A. We therefore integrate the adjacency matrix into the self-attention layer of the Transformer:

h^{l+1} = softmax((h^l W_Q)(h^l W_K)^T / √d + A) h^l W_V,

where h^l is the hidden state of the l-th transformer layer, W_Q, W_K, and W_V are learnable parameters, and d is the dimension of the key vectors, used for scaling. In this way, the attention bias captures the structural features and models the relationships between entities and relations within contextual subgraphs.
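A single-head PyTorch sketch of the attention equation above; multi-head projection, masking of padded positions, and how the 0/1 adjacency is rescaled into the bias A are simplifications of ours.

```python
import torch
import torch.nn as nn

class StructureEnhancedSelfAttention(nn.Module):
    """softmax(Q K^T / sqrt(d) + A) V, with A derived from the adjacency
    matrix of the contextual graph CG."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim); adj: (batch, seq_len, seq_len)
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        scores = scores + adj           # additive structural bias on the logits
        return torch.matmul(torch.softmax(scores, dim=-1), v)

attn = StructureEnhancedSelfAttention(dim=128)
h = torch.randn(2, 16, 128)
adj = torch.randint(0, 2, (2, 16, 16)).float()
out = attn(h, adj)                      # (2, 16, 128)
```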

Learning
During training, following the original diffusion model, we compute the variational lower bound to minimize the difference between the generated target distribution and the real data. The loss on the diffusion model, L_dif, is defined as:

L_dif = Σ_{t=1}^{T} E_q ||X_0 − f_θ(X_t, t)||² + R(X_0),

where f_θ(X_t, t) denotes the output of the diffusion model at step t, and R(·) is a mathematically equivalent regularization term that regularizes the embedding learning. We then utilize the generated representation to train knowledge graph reasoning by minimizing the cross-entropy loss:

L_ce = − Σ_{a ∈ A_q} log p(a | q),

where A_q is the set of answers to the complex query q and p(a | q) represents the probability that entity a is the correct answer. Finally, the joint training objective of DiffCLR combines the diffusion loss and the reasoning loss:

L = L_dif + L_ce.

Overall, the process of DiffCLR is summarized in Algorithm 1, which mainly comprises the query transformation and the diffusion model with its forward and reverse processes.
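A compact sketch of the joint objective; the MSE form of L_dif, the weighting lam, and the function signatures are our reconstruction rather than the exact training code.

```python
import torch.nn.functional as F

def joint_loss(f_theta, x0, x_t, t, answer_logits, answer_ids, lam=1.0):
    """L = L_dif + lam * L_ce: reconstruct X_0 from X_t, and score the
    candidate answer entities with cross entropy."""
    x0_hat = f_theta(x_t, t)            # f_theta(X_t, t)
    l_dif = F.mse_loss(x0_hat, x0)      # diffusion reconstruction term
    l_ce = F.cross_entropy(answer_logits, answer_ids)
    return l_dif + lam * l_ce
```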

Algorithm 1 The overall process of our DiffCLR model
Input: the knowledge graph G, the EPFO queries Q, and the model f_θ initialized with θ.
Output: Answer sets for the EPFO queries to be predicted.
1: for each q ∈ Q do
2:    Extract the local subgraph LG from G by taking the intersection {N_k(v_i)}_{i=1}^{m′} as nodes and their relations as edges;
3:    Construct the contextual graph CG based on LG, and flatten it to the input sequence S = {s_i}_{i=1}^{n};
4: repeat
5:    Sample a batch of EPFO queries Q_b ⊂ Q_train;
   ...
8:    Calculate L via Eq. (9), and take a gradient descent step on ∇_θ L to optimize θ;
9:    Obtain the generated representations, and predict the probabilities of the target answers;
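For completeness, here is a minimal sketch of the inference-time reverse process referred to in Algorithm 1, written as a deterministic DDIM-style loop over the model's X_0 prediction; a stochastic DDPM update would add fresh noise at each step.

```python
import torch

@torch.no_grad()
def reverse_sample(f_theta, shape, alpha_bars):
    """Start from X_T ~ N(0, I) and iteratively denoise back to X_0."""
    T = len(alpha_bars)
    x = torch.randn(shape)                     # X_T
    for t in reversed(range(1, T)):
        x0_hat = f_theta(x, t)                 # predicted clean X_0
        eps = (x - alpha_bars[t].sqrt() * x0_hat) / (1 - alpha_bars[t]).sqrt()
        x = (alpha_bars[t - 1].sqrt() * x0_hat
             + (1 - alpha_bars[t - 1]).sqrt() * eps)
    return x0_hat
```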

EXPERIMENTS
In this section, we conduct experiments to verify the superiority of our proposed model and to answer the following research questions: (RQ1) How does DiffCLR perform compared to state-of-the-art CLR methods? (RQ2) What are the effects of the different modules of DiffCLR on performance? (RQ3) How effectively does DiffCLR learn the semantic distribution of relations? (RQ4) How does DiffCLR perform in low-resource settings? (RQ5) How does DiffCLR make answer predictions interpretable?

Experiment Setup
Datasets. In the experiments, we use two widely applied benchmarks, FB15k-237 [36] and NELL995 [43]. The statistics of these datasets can be found in [28]. For a fair comparison, we use the same query structures generated by BetaE [28], covering nine reasoning patterns that include both in-domain and out-of-domain queries. For training and validation, queries consist of five conjunctive structures (1p/2p/3p/2i/3i). The model is tested on these query structures plus four additional structures (ip/pi/2u/up) that are never seen during training, to evaluate its generalizability.

Baselines. We compare DiffCLR against both geometric-embedded and transformer-based methods. The geometric-embedded methods comprise GQE [12], Q2B [27], BetaE [28], ConE [49], FuzzQE [6], and MLPmix [1]. The transformer-based methods mainly include kgTransformer [21]. For all compared methods, the results of GQE and Q2B are taken from [28], the results of BetaE, FuzzQE, ConE, and MLPmix are taken from their original papers, and we retrain kgTransformer with its official code because different datasets are used in the experiments.

Evaluation Protocol. Following the evaluation protocol in [28], the answers to each query are divided into two sets: easy answers and hard answers. The easy answers are entities that can be reached directly by traversing the graph, while the hard answers can only be discovered by predicting links that are missing from the KG. Therefore, to measure the reasoning ability of our model, we rank each hard answer and report the Mean Reciprocal Rank (MRR) and HITS at N (H@N) metrics.

Implementation Details. Our work is based on the representative Transformer architecture. First, we sample 2-hop neighborhoods around the anchor entities to create contextual subgraphs in the query transformation. For the diffusion settings, following Gong et al. [9], we use an exponential noise schedule and set 512 diffusion steps to balance efficiency and effectiveness (see Section 5.3 for more discussion). For each query sequence, the maximum input length is set to 64; the dimensions of the word, type, and time embeddings are all 128, and these parameters are randomly initialized with Xavier normalization. During training, we set the batch size to 128 and apply the Adam optimizer with a learning rate of 1e-4 and a dropout rate of 0.1. Note that we initially sample arbitrary masked subgraphs from the original KGs to pre-train the model. Our model is implemented in PyTorch and trained on an Nvidia Tesla V100 GPU. Our source code will be available at https://github.com/liuyudiy/DiffCLR.
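For reference, the filtered ranking metrics reduce to the following computation; the helper name is ours.

```python
def mrr_and_hits(ranks, ns=(1, 3, 10)):
    """ranks: 1-based positions of each hard answer after filtering out
    easy answers and other correct entities."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: sum(r <= n for r in ranks) / len(ranks) for n in ns}
    return mrr, hits

mrr, hits = mrr_and_hits([1, 4, 2, 15])
print(f"MRR={mrr:.3f}, H@10={hits[10]:.3f}")
```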

Overall Comparison (RQ1)
Table 1 reports the MRR results of different models for answering EPFO queries on NELL995 and FB15k-237. Overall, DiffCLR significantly outperforms the compared models, which demonstrates its effectiveness. Compared to the previous geometric method MLPmix, DiffCLR achieves average relative MRR improvements of 3.28% on NELL995 and 5.68% on FB15k-237. Against the transformer-based kgTransformer, our model also obtains average improvements of 2.17% and 1.26%. In particular, the results for the chain queries (1p, 2p, and 3p) suggest that DiffCLR remains highly effective even as the chain length increases: it obtains relative improvements of 6.99%, 2.75%, and 4.73% over kgTransformer on NELL995, respectively. We attribute this gain to the merits of diffusion models with their multi-step reasoning process. In addition, compared to geometric-embedded approaches, transformer-based methods make fuller use of deep learning models and achieve better performance.

Ablation Study (RQ2)
To investigate the impact of different components and hyperparameter settings on performance, we design the following four ablation experiments.

Effect of Structural Self-Attention. To illustrate the necessity of incorporating structural knowledge when encoding the EPFO query, we carry out multiple experiments on structure-enhanced self-attention. From the results in Figure 4 (a), we see a large performance drop when structural features are removed from structural self-attention, which demonstrates that structural information plays an important role in knowledge graph reasoning. Furthermore, structural self-attention proves superior to basic self-attention. We believe the semantic textual similarity captured by basic self-attention is inadequate for modeling graph structures and entity relationships, which are well preserved by our structure-enhanced self-attention.

Effect of Diffusion Model. Unlike previous methods that use discriminative frameworks to encode EPFO queries, DiffCLR takes full advantage of the generative diffusion model. To evaluate its influence, we remove the diffusion process, so that the model degenerates to a plain transformer. As shown in Figure 4 (b), the model without the diffusion process performs notably worse on both standard datasets, which indicates the effectiveness of adopting the diffusion model to generate the joint distribution of EPFO queries.

Effect of Subgraph Scale. In the query transformation, we construct a contextual subgraph to encode the EPFO query. To assess the impact of the subgraph scale, we conduct experiments with contextual graphs sampled at different hops. The results on NELL995 and FB15k-237 are displayed in Figure 4 (c). We observe that the ground-truth answers predicted by DiffCLR improve with the number of hops. But when the hop count exceeds 3, the MRR scores on both datasets decline; we speculate that larger subgraphs introduce irrelevant entities and dilute critical information. Note that when the hop count equals 0, we directly transform the original query into an input sequence. We finally set the hop count to 2 when constructing the contextual subgraph.

Effect of Diffusion Step. In general, the number of diffusion steps is the most critical factor determining the efficiency of DiffCLR. We therefore investigate the influence of diffusion steps on model performance and inference time on NELL995. The results are shown in Figure 4 (d), where the left and right vertical axes represent the MRR metric and the time consumption, respectively. As the number of diffusion steps increases, the inference time per sample grows exponentially, while the performance does not exhibit the same upward trend. Balancing query representation quality against inference time, we argue that a moderate number of diffusion steps (e.g., 2^9 = 512) is sufficient for knowledge graph reasoning.

Analysis of Relational Distribution (RQ3)
Since we emphasize that a relation may carry multiple relational semantics, relations are represented as distributions in our paper. To analyze the semantic distributions of the relations learned by DiffCLR, we randomly select several query relations and visualize their representations with the t-SNE algorithm [37]. The result is illustrated in Figure 5. We observe that: 1) relations with the same or similar topics tend to cluster together, such as the person-related relations (motherofperson and parentofperson) and the sport-related relations (teamplayagainsteam and playsforteam), which means that our model can generate relation representations with specific semantics; and 2) DiffCLR can also explore multi-dimensional relation semantics; for example, worksfor lies closer to the person-related relations than to the sport-related and location-related relations, because the head entities of triples involving worksfor are usually people in KGs.
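The visualization itself follows the usual recipe, sketched below with random vectors standing in for the learned relation embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rel_names = ["motherofperson", "parentofperson",
             "teamplayagainsteam", "playsforteam", "worksfor"]
rel_embs = np.random.randn(len(rel_names), 128)   # stand-in for learned vectors

coords = TSNE(n_components=2, perplexity=2, init="pca",
              random_state=0).fit_transform(rel_embs)
plt.scatter(coords[:, 0], coords[:, 1])
for name, (x, y) in zip(rel_names, coords):
    plt.annotate(name, (x, y))
plt.show()
```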

Performance on Low-Resource Settings (RQ4)
One of the main challenges for KGs is the scarcity of resources in the real world, so we assess our model in low-resource situations with the following two experiments.

Performance on Knowledge Memorization. To explore whether DiffCLR can remember the knowledge it has seen, we assume that knowledge memorization correlates with training frequency and divide the relations into high-frequency and low-frequency groups. The results are shown in Table 3. We observe that: 1) high-frequency relations outperform the full dataset significantly, with an 11.9% relative MRR improvement for in-domain reasoning; 2) the results for low-frequency relations are much worse than those for high-frequency ones; and 3) relation frequency has a greater influence on out-of-domain queries than on in-domain queries; for instance, the MRR increase grows to 23.4% for high-frequency relations on out-of-domain queries. These findings confirm that knowledge memorization is highly correlated with relation frequency.
Performance on Cross-Domain Transfer. To evaluate whether DiffCLR transfers effectively between different but related domains, we also conduct experiments on four extra query reasoning patterns. As shown in Table 3, for the out-of-domain queries (ip, pi, 2u, and up), promising results are achieved, demonstrating DiffCLR's generalizability. However, our model does not achieve results as competitive as those of the other models in Table 1. We speculate there are two reasons. First, our sequence reasoning does not extend well to more complex query structures. Second, unseen logical operations make it difficult for our model to estimate the data distributions. We leave improving out-of-domain performance to future research.

Case Study for Model Interpretability (RQ5)
A valuable property of our model is its interpretability when computing answers for a given query. Here, we illustrate how DiffCLR answers EPFO queries; three queries with distinct structures are presented in Table 2. In case 1, consider a chain query of the form V? . ∃V : r1(h, V) ∧ r2(V, V?), where Richard Clark is the anchor node; we find that the intermediate entity Merck is extracted into the contextual subgraph, which provides useful clues for answering the query. In case 2, take the query "Which athletes join the Chicago Cubs and the home stadium is Wrigley Field?", which has two head anchors but no intermediate assignment; the extracted neighbors are integrated into the diffusion model to obtain correct answers, suggesting the necessity of iteratively performing the diffusion process for reasoning. In case 3, the conjunctive form is V? . ∃V : r1(h, V) ∧ r2(V, V?) ∧ r3(h′, V?); although the intermediate entities are correctly assigned, the third prediction is wrong, which may be attributed to the unseen logical operation and the more challenging query structure.

CONCLUSION
In this work, we present a new generative model, named DiffCLR, to answer multi-hop Existential Positive First-Order (EPFO) logical queries over KGs. Our model first performs a query transformation and then takes advantage of the diffusion model for uncertainty injection and distribution generation. Extensive experimental results demonstrate significant performance improvements. Several limitations of DiffCLR call for future study. First, as discussed in Section 5.5, we plan to further improve model performance on out-of-domain queries. Second, we aim to handle arbitrary first-order logical queries, including queries with negation. Third, we will explore combining KG structure and textual information in knowledge reasoning.

Figure 1: Motivation illustration of complex logical reasoning. (a) The query distribution can be seen as a compound distribution of three relations, each with a distinct semantic distribution. (b) The generative process produces the compound distributions of the query in multiple iterative steps.

Figure 2: The overall framework of our DiffCLR model. (a) Query transformation describes the construction of the contextual graph to convert the given query into an input sequence. (b) The diffusion model illustrates the forward process and the reverse process, and the right part shows details of the transformer with structure-enhanced self-attention.

Figure 3: Illustration of structure-enhanced self-attention in the Transformer. The left part shows an example of a contextual graph, and the right part illustrates the basic self-attention.

Figure 4: Comparative experiments of our model. (a) Structure-enhanced self-attention with structural features and text semantics removed, respectively. (b) DiffCLR with the diffusion model and the contextual graph detached, respectively. (c) Contextual graphs constructed with different hops. (d) The effect of the diffusion step on model effectiveness and efficiency on NELL995.

Figure 5: Visualization of relational distributions with t-SNE.

Table 1: Test MRR results on answering EPFO queries (bold denotes the best results; underline denotes the second best results).

Table 2: Case study. Three forms of EPFO queries are presented to illustrate the interpretability of our model.
ACKNOWLEDGMENTS
This work is supported by the National Key Research and Development Program of China (No. 2022YFB3102200) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02030400).