A Comparative Analysis of Symbolic and Deep Learning Based RDFS Materialization

The advancement of both symbolic reasoning and deep learning techniques has opened new fronts for knowledge graph (KG) processing. Knowledge graphs commonly store data that adheres to the Resource Description Framework (RDF) model, the cornerstone of the Semantic Web vision. As reasoning is one of the core functionalities of KGs, RDF Schema, a set of semantics that enables reasoning over RDF graphs, is often a first-class citizen in these graphs. Within that context, to reason is to infer new data from existing data together with a set of rules, such as those of RDF Schema. This process, however, is computationally expensive, to the point that it cannot happen while the KG is being queried, as that would add too much latency to the response. An alternative is to materialise the data before query time, that is, to compute all inferences and store them alongside the original data. Our contribution lies in a comparative study of two approaches to materialization: a deterministic, Datalog-based approach, and a connectionist one using graph neural networks. This paper provides an analysis of how both of these work, what their tradeoffs are, and how they perform with respect to each other. We focus on a point that has not been discussed so far in the literature: the practicality of each approach, both in implementation and at runtime. We also conducted empirical experiments; while deterministic reasoning was, as expected, much faster, inference with graph neural networks did not consume much more memory, and showcased promising attributes that are not available in the former approach.


INTRODUCTION
The Semantic Web [11] is a vision of the next generation of the internet, in which data is not only accessible by machines, but understandable too, potentially enabling more intelligent and user-friendly services. While its proposals and technologies have not reached mainstream adoption, it remains relevant. We posit that it will become increasingly so, as it is a promising medium for improving the performance and reliability of autonomous agents, a topic that has seen a stark increase in interest since the advent of LLMs [9], as it could shift the problem from learning over unstructured data to reasoning over graph-structured data.
Knowledge graphs [13] (KGs) are a recent use-case that has propelled semantic technologies back into the spotlight. These graphs consist of nodes and edges that represent entities and their relationships, and are fit for scenarios that require highly contextualized information. The manner in which contextualization is acquired is through reasoning. KGs are often physically stored, be it in memory or on disk, in their unmaterialized form. A major difference between a database and a KG is that the latter might also contain inference rules over the data. These rules imply the notion that there is implicit data, inferences, that follow from explicit data, also referred to as ground facts. Let G be a KG and Π be a set of rules. The materialization of G with Π, written G_Π, is the union of G with all of its implicit consequences: G_Π = G ∪ Π(G).
In order for there to be reasoning, there has to be a formal infrastructure to ensure that it is sound, well-defined, and correct. Semantic technologies are a collection of data models and formal models that are tailored entirely to large-scale reasoning on the web. The most fundamental component is the Resource Description Framework (RDF) model [3]. Data is shaped as a multigraph with labeled nodes, called resources, represented as triples in the form of statements composed of a subject s, the source node, a predicate p, the edge label, and an object o, the target node. RDF graphs are most commonly stored as in Figure 1. This representation is reminiscent of the N-Triples [2] format, and most easily conveys the nature of RDF. In the first triple, FullProfessor7 is the subject, headOf is the predicate, and Department0 is the object. There are two main kinds of predicates. The first is rdf:type. The RDF model attaches semantics to certain nodes and edges, with rdf:type indicating that s ∈ o, i.e., that the subject is an instance of the class denoted by the object. The second is exemplified by the predicate of the first triple: (FullProfessor7, Department0) ∈ headOf; in this case, headOf is referred to as an rdf:Property.
Materialising that figure's graph would be to complete it, such that no more information could be derived. The following rule, (?p, rdf:type, rdf:Property) ← (?s, ?p, ?o), reads as: for every triple (?s, ?p, ?o), add a new (?p, rdf:type, rdf:Property) triple to the graph. This dictates the RDF semantics for what a property is; hence, with respect to Figure 1, two inferences directly follow: (headOf, rdf:type, rdf:Property) and (teacherOf, rdf:type, rdf:Property). These inferences, alongside the original data, are the materialisation of that rule with respect to that data. Due to reasoning in itself being costly, materialisation is paramount to the practical usage of knowledge graphs: if a graph is incomplete, reasoning would need to be done at query time, which could pose unacceptable latency.
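To make the rule application concrete, the following is a minimal Python sketch (a toy illustration, not any of the implementations evaluated later) that applies the rdf:Property rule to the triples of Figure 1:

```python
# Toy graph hand-copied from Figure 1; a real reasoner would parse
# these from an N-Triples file.
graph = {
    ("FullProfessor7", "headOf", "Department0"),
    ("FullProfessor7", "rdf:type", "FullProfessor"),
    ("FullProfessor7", "teacherOf", "Course10"),
}

# Rule: (?p, rdf:type, rdf:Property) <- (?s, ?p, ?o).
# Note the rule also fires on rdf:type itself, since rdf:type appears
# in predicate position.
inferred = {(p, "rdf:type", "rdf:Property") for (_, p, _) in graph}
materialised = graph | inferred

assert ("headOf", "rdf:type", "rdf:Property") in materialised
assert ("teacherOf", "rdf:type", "rdf:Property") in materialised
```

One pass of this single rule suffices here; the full RDFS program requires iterating all rules to a fixpoint, as discussed in the next section.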
The solution is then to first materialise the KG, and only then to query it. To date, the most performant way to materialize is through the usage of Datalog [5], a logic programming language that is able to model many of the expressive logics from the Semantic Web. The object of this article is RDF Schema [4] (RDFS), a restriction of RDF that adds enough semantics to ensure that any data modelled with it supports non-trivial reasoning.
While there has been much investigation into efficiently materialising RDFS graphs with deterministic methods, research on materialisation with machine learning [8,17] has been limited to figuring out how to do so either accurately, or in the presence of false or noisy data, with no regard for scalability. There are multiple Word2Vec [10]-inspired embeddings for RDF KGs [6,22] that allow for mapping the problem to edge prediction with Graph Neural Networks (GNNs). The practicality of doing so relative to the deterministic approach has, however, remained unresearched. This gap in the literature signals an opportunity for an examination of the two approaches. This work aims to bridge this gap by evaluating the performance of each step of a GNN-based method, in contrast to Datalog, for KG materialization. Through a comparative analysis, we seek to uncover the potential and limitations of both methodologies and explore their applicability in data-intensive scenarios.

PROBABILISTIC AND DETERMINISTIC APPROACHES
We start by precisely defining the RDF data model, in order to follow up with the definition of both deterministic and probabilistic reasoning over it. There are multiple kinds of RDF nodes. An IRI node, abbreviated as <FullProfessor7>, is a global identifier with respect to an IRI (Internationalized Resource Identifier), a generalization of the URI (Uniform Resource Identifier) that extends it by using the Universal Character Set [25] in place of ASCII. Blank nodes, written _:something, indicate the existence of something, but of no particular thing, as IRIs do.
A literal node represents a value, which can either be untyped, or typed with the allowed RDF datatypes. We will not discuss these, and will ignore the rules that follow from them, as they add significant complexity that is heavily use-case dependent and not as general. Let I be the set of IRI nodes, B the set of blank nodes, and L the set of literals. If s ∈ I ∪ B, p ∈ I, and o ∈ I ∪ B ∪ L, then ⟨s, p, o⟩ is an RDF triple. An RDF graph is defined as a finite set of RDF triples.

Deterministic Reasoning
The rules that were introduced are Datalog rules. These rules have the form h ← b_1, ..., b_n, with each body atom b_i representing an assumption that (t_1, ..., t_k) ∈ R_i holds; if each b_i holds, then h, the head atom, holds. The b_i are referred to as atoms, and those whose terms are all constants are said to be ground facts.
A KG only ever contains ground facts. All rules shown so far are Datalog rules. The essence of Datalog-based RDFS materialization lies in the execution of program 1 to infer new RDF triples, through an iterative process that repeatedly infers and adds new triples to the dataset until no further inferences can be made, eventually reaching a fixpoint. Algorithm 1 describes the naive evaluation of a Datalog program: starting from the input I, repeatedly apply the immediate-consequence operator T, setting I := T(I), until I = T(I). It is a direct translation of Datalog's fixpoint semantics. Every application of T yields the union of all inferences that directly stem from the current value of I; the point at which applying T leaves I unchanged is the fixpoint, which is guaranteed to exist so long as I is finite. It is possible to derive a lower bound on the number of iterations of a rule until it reaches its fixpoint. Taking the subclass transitivity rule, due to the relationship between conjunctive queries and relational algebra [1], its equivalent expression is π_(1, rdfs:subClassOf, 4)(T ⋈_(2=1) T). Hence, if there are classes c_1, ..., c_n such that (c_i, c_(i+1)) ∈ rdfs:subClassOf for all i < n, at least n − 1 joins are needed to fully materialize this chain of subclass relationships. Each join represents the materialization of the transitive property across adjacent pairs of classes, leading to the inference of direct subclass relationships between non-adjacent classes.
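The fixpoint computation for the subclass transitivity rule can be sketched in a few lines of Python (a toy illustration over a bare binary relation, not our actual reasoner, which is written in Rust):

```python
def materialise_subclass(edges):
    """Naive fixpoint: repeatedly join the whole relation with itself
    until no new (sub, super) pairs are derived."""
    I = set(edges)
    while True:
        # self-join on the middle class: (a,b) and (b,d) yield (a,d)
        inferred = {(a, d) for (a, b) in I for (c, d) in I if b == c}
        new = I | inferred
        if new == I:
            return I
        I = new

# a chain c1 subClassOf c2 subClassOf c3 subClassOf c4
chain = {("c1", "c2"), ("c2", "c3"), ("c3", "c4")}
closed = materialise_subclass(chain)

assert ("c1", "c4") in closed   # non-adjacent subclass inferred
assert len(closed) == 6         # all 3 + 2 + 1 pairs of the chain
```

For the four-class chain, the closure contains all six ordered pairs, matching the n − 1 join bound discussed above.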
In more complex RDF graphs with multiple intersecting chains of subclass relationships, the number of joins needed can increase significantly. The total number of joins depends not only on the structure and complexity of the RDF graph, but also on the density of the subclass relationships. An approximation of the maximum number of joins needed can be expressed as a function of the depth of the deepest chain of subclass relationships and the breadth, capturing the intersecting relationships among different transitive chains. Let d be the depth and b the breadth; an estimate of the total number of joins |J| needed for complete materialization can be expressed as |J| ≈ (d − 1) · b.
In practice, naive evaluation is not used. Semi-naive evaluation [5] is the de facto standard method, the difference being that only the most recently inferred data can progress the computation. This does not, however, decrease the number of iterations, as all chains can still only progress one hop at a time. This limitation is not present with probabilistic reasoning, as it has been shown that sufficiently deep neural networks can learn multiple hops at once [8].
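A semi-naive variant of the subclass-transitivity computation can be sketched as follows (again a toy illustration over a single binary rdfs:subClassOf relation); only the delta, the facts new in the previous round, participates in each join:

```python
def seminaive_subclass(edges):
    """Semi-naive evaluation: join only the most recently inferred
    facts (the delta) against the full relation, avoiding the
    re-derivation of already-known pairs."""
    total = set(edges)
    delta = set(edges)
    while delta:
        # the new fact may appear on either side of the join
        inferred = (
            {(a, d) for (a, b) in delta for (c, d) in total if b == c}
            | {(a, d) for (a, b) in total for (c, d) in delta if b == c}
        )
        delta = inferred - total
        total |= delta
    return total

chain = {("c1", "c2"), ("c2", "c3"), ("c3", "c4")}
closed = seminaive_subclass(chain)

assert ("c1", "c4") in closed
assert len(closed) == 6   # same closure as naive evaluation
```

The result is identical to naive evaluation; the saving is in the work per iteration, not in the number of iterations, which matches the one-hop-at-a-time limitation noted above.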

GNN-based reasoning
Graph Neural Networks (GNNs) have risen as a promising class of machine learning models for graph-structured data [24]. In particular, they have shown respectable performance in applications such as edge prediction, known as link prediction (LP) tasks, often surpassing traditional techniques and other machine learning models [14,21,27,28]. We do, however, note that this applies only to inference, which, as we demonstrate in our results, is but a small part of the whole process. Deterministic approaches, on the other hand, do no training whatsoever, and do not require embeddings.
LP is an important problem in graph analytics and network science that focuses on determining the probability of connections between nodes in a graph. In RDF graphs, this process involves predicting the missing entity in a relational triple, seeking ⟨s, p, ?⟩ (object prediction) or ⟨s, ?, o⟩ (predicate prediction). In this context, the known entity in the prediction is referred to as the source entity, while the entity to be predicted is termed the target entity. RDFS materialization can at first sight easily be cast as an LP problem. There are two prediction targets: classes and properties. All rules in program 1 either infer new rdf:type triples, or properties. Property assertions are all those that do not have rdf:type as the predicate. There is a third dimension, however. An RDF graph with RDFS semantics is not exactly a multigraph, as it has two sub-components: the ontology, known as the TBox, and the individual assertions, the ABox.
TBox rules can be materialised independently of the ABox, but the converse does not hold, as individual assertions state facts in terms of the terminology outlined in the ontology. This makes probabilistic reasoning significantly harder; hence, most of the mentioned deep learning approaches do not attempt to learn rules 3 and 4 of program 1, and pre-materialise the TBox with a deterministic reasoner before any training.
All mentioned articles follow the same general structure for encoding RDF graphs. We choose to focus on the standpoint of Makni et al. [16], as it is the only work with extensive information about the process as a whole, alongside significant amounts of publicly available code [15].

RDF Graph Matrix Representation.
The first step is to represent an RDF graph in a manner amenable to deep learning. This is done by layering the graph as adjacency matrices, as depicted in Figure 2. Each graph layer corresponds to a specific aspect or property, which is then encoded in a 3D adjacency matrix. Layers can be visualized as stacked 2D matrices, with each cell representing the relationship between a pair of nodes in the graph.
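Under toy assumptions (three entities, two predicate layers, and index assignments of our own choosing, not the paper's vocabulary), the layered encoding can be sketched in plain Python as a predicate-indexed stack of adjacency matrices:

```python
# Hypothetical vocabularies; the real pipeline derives these from the
# dataset rather than from a hand-written list.
entities = ["FullProfessor7", "Department0", "Course10"]
predicates = ["headOf", "teacherOf"]
e_idx = {e: i for i, e in enumerate(entities)}
p_idx = {p: i for i, p in enumerate(predicates)}

triples = [("FullProfessor7", "headOf", "Department0"),
           ("FullProfessor7", "teacherOf", "Course10")]

# tensor[k][i][j] == 1 iff (entity i, predicate k, entity j) is a triple
n = len(entities)
tensor = [[[0] * n for _ in range(n)] for _ in predicates]
for s, p, o in triples:
    tensor[p_idx[p]][e_idx[s]][e_idx[o]] = 1

assert tensor[p_idx["headOf"]][e_idx["FullProfessor7"]][e_idx["Department0"]] == 1
```

Each 2D slice of the tensor is the adjacency matrix of one predicate, and stacking them gives the 3D representation described above.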

Graph Words.
Once the RDF graph has been layered and encoded as a matrix, graph words and sequences can be formed. Words are formed by extracting substructures from the adjacency matrices, which can then be arranged into sequences representing the input graph and its entailments. These sequences serve as the input and output of the neural machine translation process, and state the relationships and entailments within the RDF graph.
A graph word is a unique combination (i, j, k) that maps to a triple. A graph sequence is a series of graph words that represents a set of interconnected triples in the RDF graph. It is a collection of (i, j, k) positions that define a particular subgraph or a specific pattern within the RDF graph. Each transitive hop can be modelled as a graph sentence, and inference in general is modelled as sequence-to-sequence learning.
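Continuing the toy encoding from before (hand-chosen entity and predicate indices, purely illustrative), a graph word can be read as the (layer, row, column) position of a nonzero cell in the layered adjacency tensor, and a graph sequence as an ordered list of such positions:

```python
entities = ["FullProfessor7", "Department0", "Course10"]
predicates = ["headOf", "teacherOf"]
e_idx = {e: i for i, e in enumerate(entities)}
p_idx = {p: i for i, p in enumerate(predicates)}

def to_word(triple):
    """Map a triple to its (layer, row, col) tensor position."""
    s, p, o = triple
    return (p_idx[p], e_idx[s], e_idx[o])

def to_triple(word):
    """Invert the mapping: recover the triple from its position."""
    k, i, j = word
    return (entities[i], predicates[k], entities[j])

seq = [to_word(t) for t in [("FullProfessor7", "headOf", "Department0"),
                            ("FullProfessor7", "teacherOf", "Course10")]]

assert seq == [(0, 0, 1), (1, 0, 2)]
assert to_triple(seq[0]) == ("FullProfessor7", "headOf", "Department0")
```

Because the mapping is invertible, a model that emits graph words is effectively emitting triples, which is what makes the sequence-to-sequence framing possible.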

Inference.
RDFS inference can be visualized as neural machine translation of RDF graph word sequences. The model takes the sequences of graph words as input and generates new sequences that represent the inferred relationships and entailments within the RDF graph. This translation process can be depicted as a series of transformations applied to the graph word sequences, ultimately leading to materialisation. This architecture can be likened to that of sequence-to-sequence models commonly used in machine translation. In this scenario, the "language of origin" consists of graph sequences that capture the known RDF triples, while the "target language" is the set of entailments and new triples we aim to infer.
We give a general outline of the meta-architecture most commonly followed by all referenced articles:
• Encoder: The encoder component ingests sequences of graph words, representing known triples, and processes them through layers of GNNs. Each layer captures a different level of relational complexity or hierarchy in the data. The final layer outputs a context-rich embedding or representation of the input graph sequence.
• Intermediate Processing: Between the encoder and decoder, there might be additional processing layers. Techniques like attention mechanisms [26] can also be applied to weigh the importance of different parts of the input graph.
• Decoder: The decoder begins with the rich embedding from the encoder and starts generating the output sequence. In this context, it generates a sequence of graph words that represent the inferred triples or entailments. As it produces each graph word, it takes feedback from the previous steps to ensure the generated sequence's coherency and correctness.
The training phase involves feeding the model known RDF triples and their corresponding entailments, then adjusting its weights to minimize the difference between its predictions and the actual entailments. For inference, the trained model is fed a new set of triples, and it produces the set of inferred triples that follow.

EMPIRICAL EVALUATION
In order to empirically evaluate the practicality of probabilistic reasoning, we focus not on measuring accuracy, as it has been shown to be sufficient on most standard datasets [17], but on performance instead, as this is the biggest impediment to the wide adoption of reasoning in general. If it holds that the deterministic method is much faster, this would signal that more research into scalability is warranted.
To evaluate the Datalog approach, we implement a Datalog reasoner from scratch that uses semi-naive evaluation and directly executes the RhoDF program 1 as it is written. We note that this is not an efficient program to run, as it contains one single relation, and almost all rules are mutually recursive with each other. On the other hand, there are no deep class or property hierarchies, so it will not require an excessive number of iterations.
For the GNN-based implementation, we attempted to reproduce the work laid out by Makni et al. [16]. There were issues in doing so, primarily due to insufficient documentation and disparities between the paper and its corresponding GitHub repository [15]. In spite of this, we did manage to obtain results similar to those of the publication, thereby vouching for the validity of its approach.
The core of the sequence-to-sequence architecture is as follows, with as much context-specific description as is reasonable:
(1) Input Embedding: Output shape (18, 3200).
(2) Dense Transformation: Output shape (18, 256).
(3) Bidirectional GRU Encoder: Output shape (None, 256). Bidirectionality is important, as it could help uncover multiple associated reflexive relationships. If there is a graph word saying that a professor teaches a course, and the inference task is to predict the subject, then without a backwards pass, the information that courses are often taught by professors might be missed.
(4) Repeat: Output shape (18, 256). Only eighteen triples are inferred at a time.
(5) GRU Decoder: Output shape (None, 18, 128).
(6) Time Distributed Dense Softmax: Output shape (18, 490), where 490 represents all possible inference graph words.
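The final softmax layer is decoded back into graph words by taking the argmax per output step. The following sketch illustrates that decoding step only; the scores here are random stand-ins, not the output of a trained model:

```python
import random

# Shapes taken from the architecture listed above: 18 output steps,
# each a distribution over the 490 possible inference graph words.
SEQ_LEN, VOCAB_OUT = 18, 490
random.seed(0)
softmax_out = [[random.random() for _ in range(VOCAB_OUT)]
               for _ in range(SEQ_LEN)]

# argmax per step yields one predicted graph-word id per output slot
predicted_words = [max(range(VOCAB_OUT), key=row.__getitem__)
                   for row in softmax_out]

assert len(predicted_words) == SEQ_LEN
assert all(0 <= w < VOCAB_OUT for w in predicted_words)
```

Each predicted id is then mapped back to an (i, j, k) graph word, and from there to an inferred triple, closing the loop from tensor encoding to materialised output.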

Experiments
The dataset of choice is LUBM [12], an arbitrarily scalable Semantic Web dataset that contains triples covering the full expressiveness of RDFS, hence encompassing its minimal fraction, RhoDF. The dataset is synthetic, and can be generated in multiple sizes, from the smallest at 100,000 triples up to billions. We stick to LUBM1, the smallest, as most of the referenced articles reported hours-long training times for its larger variants. LUBM is the de facto dataset for evaluating RDFS reasoning, used in all referenced articles [7,16,20]; hence our choice provides an immediate point of reference for evaluating the results of this paper.
All experiments were run on a bare-metal machine with Ubuntu 22.04 using kernel version 5.15, on an AMD Ryzen 7 2700X 8-core processor with 32 gigabytes of RAM. The GPU used was an NVIDIA GeForce GTX 1070. No other processes were running during evaluation, and all measurements were re-run enough times to obtain high-confidence means.
The central objective was to contrast the performance of the Datalog and GNN-based methods. We first discuss Datalog, as it is the established baseline method. Table 1 illustrates the relationship between dataset size, inference time, and memory usage across different LUBM datasets using the Datalog-based method. Within the given physical limits, it was possible to compute the materialisation of up to LUBM50 in a reasonable amount of time, barely taking over a minute. Highly specialized state-of-the-art Datalog engines such as [20] can be one to two orders of magnitude more memory efficient than our reasoner. Our reasoner was implemented in the Rust programming language, and is publicly available [23].
Table 2 displays the time taken by each step of the GNN-based approach, alongside its peak memory usage. The approach taken is far from optimal, both algorithmically and in its implementation. With these steps relegated to a lower-level language, or through the usage of some heavily optimised library, we believe that they could take a much smaller amount of time; hence we do not claim this to be a deterrent to usage. Training, as is often the case, takes a significant amount of time and outpaces inference. Given that this would only need to happen once, we believe that this too does not deter the approach from being used in practice.
Inference consumes only up to three times as much memory as Datalog, and its output is quite different. While with Datalog the input to materialisation is the whole graph, with the GNN-based approach the inputs are sub-graphs, of which there are close to 20,000 in the original dataset. The accuracy of the model is very high, just short of 99% for each tested sub-graph. This validates the results found in [16]. Other adjacent approaches, such as memory networks [7], might generalize to different datasets through transfer learning, but accuracy never gets high enough to be nigh-deterministic, varying from model to model; nor do they output the actual materialisation, as they only evaluate whether a fact could be part of the materialisation.
The time taken for inference is how long it takes for one sub-graph to be processed. Using the CPU, this takes from around 1 second for the simplest graphs up to 30 for the densest ones. This puts a lower bound on the materialisation time of 20,000 seconds. In spite of GPUs and FPGAs being able to provide double-digit speedups in training and inference, we argue that this is still far too slow to be utilized in data-intensive scenarios.

CONCLUSION AND RELATED WORK
This paper set out to bridge the gap in the current literature between deterministic and probabilistic methods of RDFS reasoning. The traditional method, which uses Datalog for this process, and the more recent GNN-based approach, which employs machine learning to derive implicit knowledge, were both empirically evaluated and compared with each other in terms of performance. Both methods were also introduced and analyzed, from their foundational concepts to their execution.
Our findings suggest that while GNN-based methods seem promising, particularly due to their ability to perform multi-hop inference and to utilize the GPU and the highly developed high-performance computing infrastructure around it, their applicability remains constrained by significant performance bottlenecks. On the other hand, the deterministic method utilizing Datalog was shown to be much more efficient, showing linear scalability in the LUBM benchmark alongside much smaller memory usage. This study underscores the strength of deterministic reasoning for RDFS materialization, particularly in situations where performance and scalability are paramount. However, the potential of GNNs should not be understated. With further advancements in technology and architectures, GNNs might bridge the current performance gap.
GNN-based materialisation of KGs is a problem that falls within the sphere of Neurosymbolic AI, the sub-field that bridges symbolic and connectionist approaches, most commonly by utilising graph representations of data and reasoning with machine learning models. A thorough overview can be found in [18]. As previously mentioned, this problem has been largely unexplored, with only a handful of attempts and not much success. We highlight the most successful ones. The most performant and accurate models have been those that train and evaluate on the same KG [17], most commonly through sequence-to-sequence learning, as has been shown in this article. The issue with this method is that it cannot be transferred to other KGs: if one trains on LUBM1, then that model cannot deduce any data from LUBM2. There have been only two works [7,8] that have succeeded in transferring deduction knowledge between KGs, albeit with very small amounts of data. For future research, we believe that the most promising way forward is to adapt the work of [7] to a model with limited transferability, i.e., instead of attempting to transfer-learn from one RDFS graph to a completely different one, transferring to one that shares the same TBox, and perhaps the same ABox but with more data, might yield more promising results.

<FullProfessor7> <headOf> <Department0> .
<FullProfessor7> <rdf#type> <FullProfessor> .
<FullProfessor7> <teacherOf> <Course10> .

Figure 1: RDF Graph in an abbreviated N-Triples-like format