Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models

Predicting missing links between entities in a knowledge graph is a fundamental task to deal with the incompleteness of data on the Web. Knowledge graph embeddings map nodes into a vector space to predict new links, scoring them according to geometric criteria. Relations in the graph may follow patterns that can be learned, e.g., some relations might be symmetric and others might be hierarchical. However, the learning capability of different embedding models varies for each pattern and, so far, no single model can learn all patterns equally well. In this paper, we combine the query representations from several models in a unified one to incorporate patterns that are independently captured by each model. Our combination uses attention to select the most suitable model to answer each query. The models are also mapped onto a non-Euclidean manifold, the Poincaré ball, to capture structural patterns, such as hierarchies, besides relational patterns, such as symmetry. We prove that our combination provides a higher expressiveness and inference power than each model on its own. As a result, the combined model can learn relational and structural patterns. We conduct extensive experimental analysis with various link prediction benchmarks showing that the combined model outperforms individual models, including state-of-the-art approaches.

Knowledge graphs are typically stored using the W3C standard RDF (Resource Description Framework) [4], which models graphs as sets of triples (h, r, t) where h, r, and t represent resources that are described on the Web. The link prediction community refers to them as head entity, relation, and tail entity, respectively. Each triple corresponds to a known fact involving entities h and t and relation r. For example, the fact that Berlin is the capital of Germany is modeled as the triple (Berlin, capitalOf, Germany).
A relevant problem for knowledge graphs, called link prediction, is predicting unknown facts (links) based on known facts, and knowledge graph embedding (KGE) is a prominent approach for it. To predict links, KGEs map entities h and t and relation r into elements e_h, e_t, and r in a low-dimensional vector space, and score the plausibility of a link (h, r, t) using a score function on e_h, r, and e_t. Most KGE models [5, 22, 33, 35, 46] score a link (h, r, t) by splitting it into the query q = (h, r, ?) and the corresponding candidate answer t. The query is embedded to an element in the same space as the candidate answers with a transformation function q = f_r(e_h) that depends on the relation r and is applied to e_h. The score of a link is then a measure of the similarity or proximity between q and e_t.
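As a minimal illustration of this query-and-score view, the following sketch implements the TransE-style transformation f_r(e_h) = e_h + r and a negative-distance score with toy NumPy vectors (the embeddings are made up for illustration, not taken from any trained model):

```python
import numpy as np

def transe_query(e_h: np.ndarray, r: np.ndarray) -> np.ndarray:
    """TransE transformation: the query is the translated head, q = e_h + r."""
    return e_h + r

def score(q: np.ndarray, e_t: np.ndarray) -> float:
    """Plausibility as negative Euclidean distance between query and candidate."""
    return -float(np.linalg.norm(q - e_t))

# Toy embeddings: the tail sits exactly where the translation points.
e_h = np.array([0.2, 0.5])
r = np.array([0.3, -0.1])
e_t = np.array([0.5, 0.4])

q = transe_query(e_h, r)
print(score(q, e_t))  # 0.0: a perfect match attains the maximal score
```

Any other candidate tail lies at positive distance from q and therefore scores strictly lower.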
KGE models can learn logical and other patterns (example in Figure 1) to predict links. For instance, the facts that co-author is a symmetric relation and part-of is hierarchical can be learned from the data. However, the capability of different KGE models to learn and express patterns for predicting missing links varies widely and, so far, no single model does it equally well for each pattern. Logical patterns exhibit the form B → H, where the body B is a conjunction of several atoms and the head H is a single atom [23].
Structural patterns refer to the arrangements of elements in a graph. A relation forms a hierarchical pattern when its corresponding graph is close to tree-like [6], e.g., (eagle, type-of, bird). For example, RotatE defines transformations as rotations f_r^RotatE(e_h) = r ∘ e_h in complex space (∘ is the element-wise complex product). In this way, RotatE can enforce both r ∘ e_h = e_t and r ∘ e_t = e_h if r² = 1 and, thus, it is able to model symmetric relations. In Table 1, we present a summary of the query representations of some state-of-the-art baselines. We indicate whether a KGE can or cannot model a specific pattern. If it can model a pattern, we further include the number of constraints it has to satisfy to express this pattern. For instance, antisymmetry for RotatE requires the two constraints r ≠ −1 and r ≠ 1 to be expressed. Further explanation of Table 1 can be found in Appendix A.4.
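The symmetry condition r² = 1 can be checked numerically. Below is a small sketch with toy complex embeddings (illustrative values only): a relation vector whose entries are ±1 (rotation angles 0 or π) rotates the head onto the tail and the tail back onto the head.

```python
import numpy as np

# Relation embedding with unit-modulus entries and r**2 == 1 (angles 0 or pi),
# the condition under which RotatE models a symmetric relation.
r = np.array([np.exp(1j * np.pi), np.exp(1j * 0.0)])  # entries -1 and 1

e_h = np.array([0.3 + 0.4j, -0.2 + 0.1j])  # toy head embedding
e_t = r * e_h                              # rotate head onto tail

# Because r**2 == 1, rotating the tail lands back on the head:
print(np.allclose(r * e_t, e_h))  # True -> (t, rel, h) is also entailed
```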
Beyond the KGEs surveyed in Table 1, further works have defined query representations successfully dealing with different subsets of patterns, such as 5*E [22], AttE/H [6], TransH [41], or ProjE [29]. However, there is neither a single transformation function that can model all patterns nor a single approach that can take advantage of all the different transformation functions.
In this paper, we tackle this problem and propose a general framework to integrate different transformation functions from several KGE models, M, in a low-dimensional geometric space such that heterogeneous relational and structural patterns are well represented. In particular, we employ spherical geometry to unify different existing representations of KGE queries (h, r, ?). In our framework, representations of KGE queries, f_r^m(e_h) with m ∈ M, define the centers of hyperspheres, and candidate answers lie inside or outside of the hyperspheres, whose radii are derived during training. Plausible answers mostly lie inside the convex hull formed by the centers of the hyperspheres. Based on this representation, we learn how to pay attention to the most suitable representations of a KGE query. Thereby, attention is acquired to adhere to applicable patterns (see Figure 1 and Figure 2).
Furthermore, we project our model onto a non-Euclidean manifold, the Poincaré ball, to facilitate the preservation of structural patterns.
In summary, our key contributions are as follows:
• We propose a spherical geometric framework for combining several existing KGE models. To our knowledge, this is the first approach to integrate KGE models taking advantage of the different underlying geometric transformations.
• We utilize an attention mechanism to focus on query representations depending on the characteristics of the underlying relation in the query. Therefore, our method can support various relational patterns. Furthermore, structural patterns are captured by projecting the model onto the Poincaré ball.
• We present various theoretical analyses to show that our model subsumes various existing models.

RELATED WORK
We review the related work in three parts: the baseline models we use for combination, approaches that combine different KGE models, and models that combine geometric spaces.

KGE Model Baselines
Various models [11, 17, 38] have been proposed for KGE in the last few years. Each KGE defines a score function f(h, r, t) which takes the embedding vectors of a triple (h, r, t) and scores the triple. In our work we integrate and compare the following baselines:
• TransE [5] computes the score of a triple as the distance between the tail and the translated head. Thanks to the translation-based transformation, this KGE is particularly suited for modeling inverse and composition patterns.
• RotatE [33] uses a relation-specific rotation r, with r ∘ e_h = e_t, to map each element of the head to the corresponding tail. RotatE can infer symmetric patterns if the angle formed by the head and tail is either 0 or π. Besides, rotations are also effective in capturing antisymmetry, composition, and inversion.
• DistMult [45] represents each relation as a diagonal matrix.
Its score function captures pairwise interactions between the same dimensions of the head and tail embeddings. Thus, DistMult treats symmetric relations well, but it also assigns high scores to the inverse links of non-symmetric and antisymmetric relations.
• ComplEx [35] extends DistMult in the complex space to effectively capture symmetric and antisymmetric patterns.
• AttH [6] combines relation-specific rotations and reflections using hyperbolic attention and applies a hyperbolic translation. Rotation can capture antisymmetric and symmetric patterns, reflection can naturally represent symmetric relations, while the hyperbolic translation can capture hierarchy. We also compare our models against AttE [6], a variant of AttH with curvature set to zero.
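To make the contrast between DistMult and ComplEx concrete, here is a small sketch of both score functions with made-up embeddings; it shows that DistMult's score is invariant under swapping head and tail, while ComplEx's generally is not:

```python
import numpy as np

def distmult(e_h, r, e_t):
    # Diagonal bilinear score: sum of element-wise products.
    return float(np.sum(e_h * r * e_t))

def complex_score(e_h, r, e_t):
    # ComplEx score: Re(<e_h, r, conj(e_t)>).
    return float(np.real(np.sum(e_h * r * np.conj(e_t))))

e_h = np.array([0.3, -0.5]); r = np.array([0.7, 0.2]); e_t = np.array([0.1, 0.9])
# DistMult cannot distinguish (h, r, t) from (t, r, h):
print(distmult(e_h, r, e_t) == distmult(e_t, r, e_h))  # True

ch = np.array([0.3 + 0.1j]); cr = np.array([0.5 + 0.5j]); ct = np.array([0.2 - 0.4j])
# ComplEx can: swapping head and tail changes the score in general.
print(complex_score(ch, cr, ct), complex_score(ct, cr, ch))
```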

KGEs Combination
Combinations between KGEs of the same kind. The authors of [44] showed that, under some conditions, the ensemble generated from the combination of multiple runs of low-dimensional embedding models of the same kind outperforms the corresponding individual high-dimensional embedding model. Unlike our approach, such an ensemble model is still able to express only a subset of the existing logical patterns.
Combination between different KGE models. Prior work [20] proposed to combine different knowledge graph embeddings through score concatenation to improve performance in link prediction. [40] proposed a relation-level ensemble, where the combination of individual models is performed separately for each relation. A recent work [39] proposed to combine the scores of different embedding models using a weighted sum. Such methods combine scores either per model or per relation, while we provide a query attention mechanism for the combination.
A different approach has been proposed in MulDE [37], where link prediction is improved by correcting the prediction of a "student" embedding through the use of several pre-trained embeddings that act as "teachers".The student embedding can be considered to constitute an ensemble model.However, this ensemble cannot steer decisions towards the strengths of individual models but can only decide randomly or based on majority guidance by teachers.
Further ensemble approaches between KGEs and machine learning models can be found in Appendix A.3.

Combination Of Spaces
A different line of research aims at improving link prediction performance by combining different geometric spaces. [12] improves link prediction by combining Hyperbolic, Spherical, and Euclidean space. Similarly, [42] embedded knowledge graphs into an Ultrahyperbolic manifold, which generalizes Hyperbolic and Spherical manifolds. In contrast, we combine queries rather than geometric spaces.

PROPOSED APPROACH
In this section, we present our geometric query integration model using Euclidean and Hyperbolic geometries, and introduce our approach in the following four parts: a) entity, relation, and query representation; b) spherical query embedding; c) Riemannian attention-based query combination; d) expressivity analysis. We split each triple (h, r, t) into two parts, namely the tail query q = (h, r, ?) and the candidate answer t, and represent their embeddings by q and e_t, respectively.
In our model, we aim at combining the queries from several existing KGE models that are specified in Table 1. We denote the query representation set by Q = {q_m | q_m = f_r^m(e_h), m ∈ M}, where M is a set of several existing KGE models such as TransE, RotatE, ComplEx, DistMult, etc., and the function f_r^m(e_h) is a relation-specific transformation from a head embedding to a query representation for model m. Note that we assume that the query representations of the different models lie in the same space. In this paper, we stay in Euclidean space for query combination. In this regard, we can combine models lying directly in Euclidean space (e.g., TransE and DistMult) and models that can be rewritten to lie in Euclidean space (e.g., models in Complex or Hypercomplex spaces such as ComplEx, RotatE, and QuatE, by assuming C^d ≅ R^2d, where C^d and R^d are the d-dimensional Complex and Euclidean spaces). We then project such query vectors onto a hyperbolic manifold to handle hierarchical patterns.
b) Spherical Query Embedding. In this part, we first propose a spherical query embedding that represents each query as a sphere whose center is the vector embedding of the query. This sphere defines the answer space of the query. Second, we propose an approach to combine the query representations of several existing embedding models in one spherical query representation to enhance the modeling of heterogeneous patterns. In "radius and ranking", we will show that the spherical representation is connected to the ranking metric Hits@k. In particular, the top k candidate answers for a query q are embedded in a sphere whose center is a combination of the vector embeddings q_m of query q. To practically enforce this, the radius in our spherical query embedding needs to be set. Therefore, in "radius and loss", we will show that a loss function can enforce the improvement of Hits@k by placing the top k candidate answers of a query inside the sphere.
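A minimal sketch of this unification step, with toy embeddings: a TransE query already lives in R^2, while a RotatE query in C^1 is rewritten into R^2 by stacking real and imaginary parts (the C^d ≅ R^2d identification mentioned above).

```python
import numpy as np

def to_real(z: np.ndarray) -> np.ndarray:
    """View a d-dimensional complex vector as a 2d-dimensional real vector."""
    return np.concatenate([z.real, z.imag])

# Toy embeddings (illustrative only, not from trained models).
e_h = np.array([0.3, -0.1]); r_trans = np.array([0.2, 0.4])
e_h_c = np.array([0.3 - 0.1j]); r_rot = np.array([np.exp(1j * 0.5)])

queries = {
    "TransE": e_h + r_trans,          # translation, already in R^2
    "RotatE": to_real(r_rot * e_h_c), # rotation in C^1, viewed as R^2
}
for name, q in queries.items():
    print(name, q.shape)  # both queries now live in the same R^2
```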
Here, we formalize the combination of n spherical KGEs. Let q_1, q_2, ..., q_n ∈ Q be the n vector query embeddings of a query q = (h, r, ?) from n distinct KGE models, and let a = e_t be the embedding of the candidate answer. We represent each query as a hypersphere with a pair S_i = (q_i, r_i), q_i ∈ Q, where q_i ∈ R^d is the center of the i-th sphere, associated with the i-th model, and r_i is its radius. Using the distance d(q_i, a) between a center and a candidate, we define the answer space A_i and the non-answer space N_i as decision boundaries in the embedding space for each query: A_i = {a | d(q_i, a) ≤ r_i} and N_i = {a | d(q_i, a) > r_i}. In this case, all embeddings of answers a are supposed to lie on or inside a sphere with radius r_i and center q_i, i.e., a ∈ A_i, and the ones which are not answers lie outside of the sphere [24, 47]. We combine the spherical query embeddings of several existing KGE models into one spherical query embedding as follows:
Combination. Given the vector embeddings q_1, q_2, ..., q_n ∈ Q, we can set a radius r_i for each q_i such that the answer space A_i covers the top k candidate answers.
Therefore, the combined spherical query embedding is the spherical embedding S_c = (q_c, r_c), where q_c = (1/n) Σ_i q_i and r_c = (1/n) Σ_i r_i. This leads to the following top k candidate answer space of the combined spherical query embedding: A_c = {a | d(q_c, a) ≤ r_c}. Figure 2 (top right) shows the query representations and candidate answer spaces of TransE, RotatE, RefE, and DistMult, together with the combined query (without attention to a particular model). The combined query mainly lies inside the convex hull of all the models within the answer space. We later show that most answers lie within the convex hull covered by the combined query. Therefore, the combined model takes advantage of all models. Before theoretically justifying this, we bridge the radius r in the spherical query embedding and ranking metrics, as well as the practical way of modeling the radius using the loss function, in the following parts.
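The combination can be sketched as follows, with randomly generated toy candidates: the combined center is the average of two model-specific queries, and the radius is chosen so that the sphere covers exactly the top k nearest candidates (this radius rule is a simplified stand-in for the radii derived during training):

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.normal(size=(50, 2))                 # embeddings of candidate tails
q1, q2 = np.array([0.5, 0.0]), np.array([-0.5, 0.0])  # queries from two toy models

def radius_for_top_k(q, answers, k):
    """Smallest radius whose sphere around q covers the k nearest candidates."""
    d = np.linalg.norm(answers - q, axis=1)
    return np.sort(d)[k - 1]

k = 5
q_c = 0.5 * (q1 + q2)                        # combined center (unweighted average)
r_c = radius_for_top_k(q_c, candidates, k)   # answer space A_c = sphere (q_c, r_c)
inside = np.linalg.norm(candidates - q_c, axis=1) <= r_c
print(inside.sum())  # exactly k candidates fall in the combined answer space
```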
Radius and Ranking. Most KGE models are evaluated based on ranking metrics such as Hits@k [5]. Here we explain the connection between the ranking metrics and the radius in our spherical query embedding. Because the overall ranking is computed by taking the average over the ranks of all test triples, we explain the connection between ranking and our model by considering an individual test triple. During testing, for each given positive test triple (h, r, t), the tail t is replaced one by one with all entities c ∈ E. We denote by T_c = (h, r, c) the corrupted triple generated by replacing t with c. Therefore, T = {T_c | c ∈ E − {t}} is the set of all corrupted triples generated from the correct triple (h, r, t). After computing the score of each triple in T and sorting them by score in descending order, we select the top k high-scoring samples and collect them in a new set T_k. The spherical query embedding S_c = (q_c, r_c) associated with a query q = (h, r, ?) defines a spherical answer space A_c that contains the vector embeddings e_c of the top k entities c ∈ T_k. That is, T_k contains the top k candidates for a query q, and A_c in Equation 5 is the candidate answer embedding space. We want the vectors of answers in T_k to lie inside A_c, and to be as close as possible to the query center, to improve ranking results. To enforce this, we define a loss function to optimize the embeddings, as explained below.
Radius and Loss Function. In this part, we show that existing loss functions implicitly enforce a particular (implicit) radius around the vector query embedding q_c. Let us focus on the widely used loss function [6]: L = Σ_{c∈E} log(1 + exp(−y_c · s(h, r, c))), with score s(h, r, c) = −d(q_c, e_c) + b_h + b_c, where y_c = 1 if c = t and y_c = −1 otherwise, and b_h and b_c are trainable entity biases.
Theoretical Analysis. Equation 5 indicates that if the query is represented by (q_c, r_c), then the score given by the combined model to a plausible answer is lower than the average of the scores given by the individual models, and higher than the lowest individual model score because, without loss of generality, we have min(d(q_1, a), d(q_2, a)) ≤ d(q_c, a) ≤ max(d(q_1, a), d(q_2, a)). (7) This equation shows that, for a particular k, the combined model gets a better score than the worst model, but a lower score than the best one. However, by increasing k, the combined model covers the answers provided by both models, because most of the answers lie in the convex hull between the queries (as proved later), and the combined model with arbitrarily large k covers the answers represented by each model. Therefore, the combined model improves Hits@k with a sufficiently large k. Later in this section, we present the attention-based model, which enables us to improve Hits@k for small k.
The following proposition states that the best embedding for an answer to a query lies in the convex hull of the query embeddings given by two models. This implies that, if two models are trained jointly with the combined model, the answers of each query lie between the centers of the two spheres associated with the two embeddings of the query. This facilitates the answer space of the combined spherical query embedding covering the answer embedding from each individual model. This can be generalized to an arbitrary number of models.
Proposition 3.1. Let q_1 and q_2 be two query embeddings for a query q. Then, the following two statements are equivalent for every vector a in the vector space: (1) d(q_1, a) + d(a, q_2) = d(q_1, q_2); (2) a lies in the convex hull of the vectors q_1 and q_2.
Weighted Combined Query Embedding. A consequence of Proposition 3.1 is that the combined query embedding can improve the performance when k is sufficiently large (e.g., Hits@20). However, for a low k (e.g., Hits@1) the performance is degraded, because one model gets a better ranking and the combined model with an averaged query does not cover it. In addition, among several models, it is possible that some models return wrong answers, which might also influence the combined model. Therefore, allowing the combined spherical query embedding S_c to slide towards q_1 or q_2 is beneficial. Hence, without loss of generality, we combine two query embeddings as the convex combination of the inequalities: α_1 d(q_1, a) ≤ α_1 r_1 and α_2 d(q_2, a) ≤ α_2 r_2, with α_1 + α_2 = 1 and α_1, α_2 ≥ 0. By computing this convex combination, we have d(q_α, a) ≤ α_1 r_1 + α_2 r_2. Therefore, the combined spherical query embedding is S_α = (q_α, r_α), where q_α = α_1 q_1 + α_2 q_2 and r_α = α_1 r_1 + α_2 r_2. This combination generalizes to n models: q_α = Σ_i α_i q_i and r_α = Σ_i α_i r_i, with Σ_i α_i = 1.
Attention Calculation. Given a combined spherical query embedding S_α = (q_α, r_α) with weights α_i, we compute the α_i via an attention mechanism [6]: α_i = exp(s(q_i)) / Σ_j exp(s(q_j)), where s(q) = w · q is a function with a trainable parameter w. We call this version of our model Spherical Embedding with Attention (SEA).
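A sketch of the SEA combination with toy inputs is below; s(q) = w · q and the softmax over models follow the formulas above, while the example vectors and the parameter w are made up for illustration:

```python
import numpy as np

def attention_combine(queries, radii, w):
    """Softmax attention over model-specific queries (the SEA combination).

    `w` plays the role of the trainable attention parameter; s(q_i) = w . q_i.
    """
    scores = np.array([w @ q for q in queries])
    alpha = np.exp(scores - scores.max())  # shift for numerical stability
    alpha /= alpha.sum()
    q_att = sum(a * q for a, q in zip(alpha, queries))
    r_att = float(sum(a * r for a, r in zip(alpha, radii)))
    return q_att, r_att, alpha

queries = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two toy model queries
q_att, r_att, alpha = attention_combine(queries, [0.5, 1.0], w=np.array([2.0, 0.0]))
print(alpha)  # attention concentrates on the first model (higher w . q)
```

Note that q_att is a convex combination of the individual queries, so it always lies in their convex hull, in line with Proposition 3.1.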
Riemannian Query Combination. We next extend our attention-based query combination to Riemannian manifolds to model both relational patterns (via the various transformations used in the different models) and structural patterns such as hierarchy via the manifold (e.g., the Poincaré ball). Similarly to [6], we perform attention on the tangent space. We consider all models in Euclidean space and combine their query embeddings. The resulting query embedding on the tangent space is then projected to the manifold via the exponential map: q^H = exp_0^c(q_α). We compute the score as s(q, t) = −d^c(q^H ⊕ r^H, e_t), where q^H, r^H, and e_t are points on a manifold M, exp_0^c(·) is the exponential map from the origin, and ⊕ is Möbius addition. For the Poincaré ball, the manifold, exponential map, and Möbius addition are defined as follows [2, 6]: M^c = {x ∈ R^d | c ||x||² < 1}, exp_0^c(v) = tanh(√c ||v||) v / (√c ||v||), and x ⊕ y = ((1 + 2c⟨x, y⟩ + c ||y||²) x + (1 − c ||x||²) y) / (1 + 2c⟨x, y⟩ + c² ||x||² ||y||²), where c is the curvature, exp_0^c is the exponential map from the tangent space at the origin to the manifold, d^c is the distance function with curvature c, and v is the point on the tangent space to be mapped to the manifold via the exponential map. We call the hyperbolic version of our model Spherical Embedding with Poincaré Attention (SEPA).
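The Poincaré-ball operations used above can be sketched in NumPy as follows (curvature parameter c fixed to 1 for illustration; a production implementation would also need numerical clamping near the ball boundary):

```python
import numpy as np

def exp0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature c."""
    sqrt_c, norm = np.sqrt(c), np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c=1.0):
    """Mobius addition on the Poincare ball."""
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

v = np.array([0.3, 0.4])       # a tangent-space (Euclidean) query vector
p = exp0(v)                    # projected onto the manifold
print(np.linalg.norm(p) < 1.0)                     # True: point stays in the ball
print(np.allclose(mobius_add(np.zeros(2), p), p))  # True: 0 is the identity
```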
d) Expressivity Analysis. In this section, we analyze our models in terms of expressive power as well as the subsumption of other models. Our model is a generalization of various existing Euclidean and non-Euclidean KGE models. We say that a model E_1 subsumes a model E_2 if, for every given KG G and scoring by model E_2, there exists a scoring by model E_1 such that the score of every triple τ ∈ G by E_1 approximates the score of τ by E_2 [19, 22]. It is important to notice that a model can infer a pattern inherently or infer a pattern under a certain condition (see Table 1). Our model aims to take advantage of the inference power of multiple models on heterogeneous patterns with a minimum of conditions by providing an attention mechanism per relation type forming different patterns. Note that our model is not influenced by models incapable of handling particular patterns, because the attention can be learned to be zero for those models. Overall, our combined model can inherit the capabilities mentioned in Table 1 and ignore the incapabilities of the other models, which is shown in Theorem 3.2 and Theorem 3.3. Hence, if our model is executed on a dataset containing only a single pattern, we do not expect it to outperform the combined models, but rather to achieve performance competitive with the best model.
Proof of propositions can be found in Appendix A.1.

EXPERIMENTS
In this section, we conduct extensive evaluations to show the effectiveness of our proposed approach. To do so, we first introduce the utilized datasets, followed by the baselines selected for combination and comparison. We then present the experimental setup and hyper-parameter settings. The results and analysis are presented in three parts: comparison with the individual baselines, comparison with other combination models, and comparison with models in the Ultrahyperbolic space. Finally, we provide several analyses to show the role of attention in learning and inference over various patterns for different kinds of relations and models.

Baseline
In this section, we aim to show experimentally that the geometric combination of several existing KGE models improves their performance. To this end, we select a subset of KGEs in Euclidean, Complex, and Hyperbolic space with different capabilities, to show that we can combine a wide range of models. In particular, we select a subset of TransE, DistMult, ComplEx, RotatE, AttE (only reflection), and AttH (hyperbolic projection operator) and compare our combined models against these baselines. We also compare our models with two additional state-of-the-art KGEs, TuckER [3] in high dimensions and MuRP [2] in low dimensions, to show that our models can outperform models that were not combined. Furthermore, we compare our model with a recent top model for combining several KGEs, namely MulDE [37], because it uses a similar set of KGEs for the combination, similar dimensions, and some of the benchmarks we used. Additionally, we will show that our model achieves performance comparable to UltraE [42], a model on the Ultrahyperbolic space.

Experimental Setup
Evaluation Metrics. We use the popular ranking metrics [38], namely Mean Reciprocal Rank (MRR) and Hits@k, k = 1, 3, 10. Given a set of test triples T = {(h, r, t)}, for each test triple τ = (h, r, t) we compute its rank as follows: we first corrupt the head entity by replacing it with all possible entities in the KG, say h' ∈ E, and generate a set of candidate corrupted triples for τ, i.e., C_τ = {τ' = (h', r, t)}. We filter C_τ by removing all generated candidates that already appear in the train, validation, or test sets, together with removing the cycle. After computing the scores of the candidate triples and sorting them, we find the rank of the test candidate τ and call it the left rank. The same procedure is performed for the right rank by corrupting the tail entity. The average of the left and right ranks is taken as the final rank of the test triple. We then compute the average reciprocal rank over all test triples and report it as MRR. Hits@k is computed as the percentage of test triples ranked at most k.
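The ranking procedure can be sketched as follows; the scores are toy values, and for brevity the sketch computes a single (unaveraged) rank rather than the mean of left and right ranks:

```python
import numpy as np

def rank_of(true_score, corrupted_scores):
    """Rank of the true triple among its (filtered) corruptions; higher score = better."""
    return int(1 + np.sum(np.array(corrupted_scores) > true_score))

def mrr_hits(ranks, k=10):
    """Mean Reciprocal Rank and Hits@k over a list of per-triple ranks."""
    ranks = np.array(ranks, dtype=float)
    return float(np.mean(1.0 / ranks)), float(np.mean(ranks <= k))

# Toy scores: the true triple scores 0.9; one corruption scores higher.
r = rank_of(0.9, [0.2, 0.95, 0.1, 0.5])   # rank 2
mrr, h_at_3 = mrr_hits([r, 1, 3], k=3)
print(r, mrr, h_at_3)
```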

Link Prediction Results And Analysis
The results of comparing SEA and SEPA to the combined models on FB15k-237, WN18RR, and NELL-995-h100 are shown in Table 3 (d = 32) and Table 4 (d = 500). As expected, while the hyperbolic version of our combined model (SEPA) outperforms all baselines in the low-dimensional setting, the Euclidean one (SEA) is the best model in high-dimensional space. Comparing SEPA and SEA in low-dimensional space, we can see that the performance improvements on WN18RR and NELL-995-h100 are much larger than on FB15k-237. This is due to the presence of a significant amount of hierarchical relations in WordNet and NELL compared to Freebase. We still observe that SEPA outperforms SEA on the FB15k-237 dataset. The main reason is that SEPA combines hyperbolic manifolds with the various transformations used in the queries of different models, so it is capable of capturing a mixture of structural and logical patterns in low dimensions (e.g., compositional patterns in Freebase). Even though we did not combine AttE and AttH directly, but only used their reflection and hyperbolic projection, respectively, we were still able to outperform them. Similarly, SEPA outperforms MuRP in low dimensions, and SEA outperforms TuckER in high dimensions in all metrics apart from H@1 on FB15k-237. More details are available in Appendix A.6.
Our combination model increases the expressiveness of the individual models (Proposition 3.2), having the best performance gain in low-dimensional space. Besides, our model takes advantage of the inference power of the base models with fewer constraints (Table 1) by utilizing the attention mechanism. On the other hand, in high-dimensional space, Euclidean models are proven to be fully expressive [40]. Hence, even though SEA outperforms all baselines, the performance gain is not as significant as in low dimensions.

Further analyses
We additionally carry out a series of further analyses to evaluate the performance of our attention-based combination function. First, we want to show that our model is able to increase the precision of predictions for both symmetric and antisymmetric relations. Table 2 shows the H@1 results on WN18RR, in the low-dimensional setting of SEPA, compared to the individual combined KGEs. Further results on H@10 can be found in Appendix A.5. For example, if we look at the symmetric relation derivationally related form, we can see that the H@1 of TransE is very low compared to those of ComplEx and DistMult, and yet our model was able to improve this metric. Similarly, when we look at an antisymmetric relation (e.g., member of domain usage) we have the opposite situation, with high performance for TransE and lower ones for ComplEx and DistMult. The intuition is that the attention-based combination can effectively give more importance to the best models for the specific kind of relation involved in the query. This intuition is reinforced in Figure 3, which shows the (averaged) attention values among the individual models for the above-mentioned relations. It shows that the attention function can effectively select the correct proportions among the models for the two different kinds of relations.
Besides, the importance of the attention function is highlighted by our ablation study, which consists of turning off the attention in our best models, SEPA at dimension 32 and SEA at dimension 500. We obtain two new versions of the models, namely SEP and SE. Tables 3 and 4 show that SEPA and SEA outperform SEP and SE.

Comparison With Ensemble Models
We further compare our models to MulDE [37], which uses 64- to 512-dimensional embeddings for the teachers and 8- to 64-dimensional ones for the junior embedding. We selected the best version of their model, MulDE_1, having 32 and 64 as junior and teacher dimensions, respectively, and, for a fair comparison, MulDE_2, having dimension 64 for both junior and teacher embeddings. We brought our models to their setting, testing SEPA at dimension 32 (SEPA_1) and 64 (SEPA_2) on both WN18RR and FB15k-237. Table 5 shows that, apart from the value of H@10 on WN18RR, our models substantially outperform these baselines, with relative improvements of up to 3.46% in MRR and H@1. Besides, we notice that when increasing the number of dimensions, the performance of MulDE on H@1 in WN18RR decreases, while MRR and H@10 slightly increase. On the other hand, our models substantially improve their performance when increasing the number of dimensions.

Comparison With Models On Ultrahyperbolic Space
Additionally, we compared our models against the best versions of UltraE [42]. Even though we did not utilize the Ultrahyperbolic space, a sophisticated manifold containing several sub-manifolds, our models achieve results competitive with the state of the art in the Ultrahyperbolic space (Table 6). In particular, SEPA achieves competitive results in low dimensions, and SEA in high dimensions. One may consider using our idea to integrate approaches such as [12, 42] with other baselines. However, due to the multiple geometric spaces involved, such an integration would require a substantial revision of the combinations of transformations and, hence, is left for future work.

CONCLUSION
In this paper, we propose a new approach that facilitates the combination of the query representations from a wide range of popular knowledge graph embedding models designed in different spaces, such as Euclidean, Hyperbolic, and Complex. We presented a spherical approach together with attention over queries to capture heterogeneous logical and structural patterns. We presented a theoretical analysis to justify these characteristics in expressing and inferring patterns, and provided an experimental analysis on various benchmark datasets with different rates of patterns, showing that our models uniformly perform well in link prediction tasks on datasets with diverse characteristics in terms of patterns. Our ablation studies, the relation analysis on WN18RR, and the analysis of the learned attention values show that our models mainly take advantage of the best-performing models in link prediction tasks. By doing so, we achieve state-of-the-art results in Euclidean and Hyperbolic spaces.
In future work, we will combine various manifolds besides combining the queries in knowledge graph embedding. Additionally, the proposed approach could be applied to other tasks. For example, it could be possible to use an attention mechanism to combine multi-hop queries computed using different complex query answering methods [27, 28].

A.3 Related Work: combination between machine learning models including KGE
We review the related work corresponding to the general approaches in machine learning, including embedding that combines different models.
DuEL [18] exploits embedding models for classifying facts as either true or false, rather than ranking them. Starting from a tail query (h, r, ?), it uses an embedding model to obtain the top k list of predicted answers and feeds different classifiers (e.g., LSTM, CNN) to label each answer as true or false. Finally, the predictions are ensembled using different techniques. A similar approach [26] also proposed an ensemble-based framework for fact-checking. Starting from a triple (h, r, t), it runs three different methods: (1) a text-based approach, (2) a KGE model, and (3) a path-based approach. It concatenates the outputs and lets a neural network compute a final veracity score. In contrast, in our work we propose to combine the query representations of different KGE models.

Figure 2 :
Figure 2: The overall architecture of our proposed model with spherical geometry. We combine the query representations of TransE, RotatE, AttE (with reflection), and DistMult (per-dimension scaling). The left part shows query integration with attention to the TransE model. The right part represents query combination without attention.

Figure 3 :
Figure 3: Comparison between the importance given by each model to a symmetric (in green) and antisymmetric (in blue) relation.

Table 1 :
Specification of the query representations of baseline and state-of-the-art KGE models and their respective pattern modeling and inference abilities. AttE/H include both rotation (RotatE) and reflection (RefH), hence they are not mentioned in the table to avoid repetition. ∘ is the element-wise complex product together with relation normalization.
Here y_c = 1 if c = t and y_c = −1 otherwise, and b_h and b_t are trainable entity biases. Minimization of this loss function leads to maximizing the function −d(q_c, a) + b_h + b_t. This can be approximately represented as −d(q_c, a) + b_h + b_t ≥ γ, where γ is a large number. Therefore, we have d(q_c, a) ≤ b_h + b_t − γ, which forms boundaries for classification as well as ranking. In the next part, we theoretically show that q_c lies within the convex hull of the set of vectors {q_1, ..., q_n}. Thus, the combined model takes advantage of each model in ranking.

Table 5 :
Comparison between our proposed models and the ensemble models proposed in MulDE [37]. Values marked '-' were not reported in the original paper.

Table 3 :
Link prediction evaluation on datasets for d=32. Best score and best baseline are in bold and underlined, respectively.

Table 4 :
Link prediction evaluation on datasets for d=500.

Table 6 :
Comparison between our proposed models and the best versions of UltraE [42] on WN18RR. Best score in bold and second best underlined.