Modelling an RDF Knowledge Graph with Transitivity and Symmetry for Bus Route Path Finding

A key property of Linked Data is the representation and publication of data as an interconnected labelled graph, where resources linked to each other form a network of meaningful information. Path finding can be seen as searching for important relationships between resources, for example by looking for chains of intermediate nodes. In this paper, we tackle this problem in the context of a public transport navigation system, where we aim to find candidate bus route paths between two given bus stations. We model a novel lightweight bus network as Resource Description Framework (RDF) triples of directed bus lines and walking paths between connected stations. We demonstrate that a lightweight bus network can be achieved by exploiting the sub-property construct of RDF Schema (RDFS) and the transitivity and symmetry provided by the Web Ontology Language (OWL). We also perform a scalability test of our approach using a real-world bus network in Bangkok, Thailand. Various patterns of SPARQL Protocol and RDF Query Language (SPARQL) query statements are validated, showing the usefulness of the RDF model. The next step of this work is to incorporate bus schedules and travel time analysis in order to select suitable candidates for users through an application.


INTRODUCTION
In the past decade, graphs have become the de facto data model in several popular application domains, such as bioinformatics [3,11], academic search [21], social relationship recognition [20], and transportation networks [5,6,12,16,18]. One of the most used variants of the graph model is the labeled graph, in which vertices and edges are associated with names. In the context of semantic web technologies, this graph representation is called linked data and is expressed using the Resource Description Framework (RDF), where each entity is named by a Uniform Resource Identifier (URI) that uniquely identifies it. Prominent examples of RDF knowledge graphs (KGs) are YAGO, DBpedia, and Bio2RDF.
This study is a preliminary development of a public transport navigation system in Thailand. The final goal of our development is to handle real-time data for all transportation modes, including buses, ferries, trains, and walking, as well as Thai minibuses and vans, which are not currently covered by well-known applications such as Google Maps directions [7]. The first phase of this project works with the public bus service; other modes will be added to the system afterwards. The key component is a bus route planner that searches for several best plans with respect to time and sends them to an application programming interface for display in a mobile application. The bus route network, which consists of the connections between adjacent bus stations, is a big graph and creates a heavy processing load for the bus route planner module. Thus, our work introduces a module named "Finding Candidates of Bus Route Plans" that searches the bus network for several initial bus route plans, i.e., potential paths between given origin and destination stations. After that, the bus route planner calculates the time plan of each candidate using real-time data and statistical travel times derived from bus trajectory data.
Many transportation ontologies [10,13,14,19] have been introduced. Most of them are not designed for bus transportation and do not support its path finding. Nevertheless, the study [18] introduces an ontology focused on public transport by bus, as commonly used in cities globally. This ontology is designed to be compatible with Transmodel, a UML specification aimed at standardizing transport system data across Europe. Another study [12] introduces an ontology for public transport in the bus domain designed for data interoperability and transportation. That ontology follows the Linked Open Terms methodology and is divided into three modules: (1) agencies and lines, (2) routes and stops, and (3) timetables. Both systems are capable of finding bus paths; however, they do not consider the sequence of paths, and they may be hard to read because they cover more entities than necessary for our use case.
The main contributions of this work are as follows:
• We model a lightweight RDF KG for representing a bus network. To reduce the model's complexity, our KG is formalized based on the transitivity and symmetry of relations.
• We show an application of semantic reasoning and SPARQL from the semantic web for finding possible candidate bus routes.
• We demonstrate our application with real-world bus line data in Bangkok, Thailand.
The structure of this paper is organized into four sections. The background and some studies are reviewed in Section 1; the method, including the modelling of our RDF knowledge graph, semantic reasoning, and SPARQL queries, is explained in Section 2; results and discussions are described in Section 3; and the conclusion and future work are drawn in Section 4.

METHODOLOGY
As mentioned in the introduction, this paper focuses on the module for finding candidates of bus route plans in the public transportation navigation system in Thailand. The conceptual flow of this module is broken down into two parts, (1) constructing a knowledge graph and (2) finding candidates, as depicted in Fig. 1. First, the knowledge graph construction part creates an inferred knowledge graph from the ontology of bus lines and an instance graph of the bus network. The reasoning process mainly focuses on the transitive closure of the knowledge graph. Second, once the knowledge graph is fully constructed, the candidate finding part searches for bus route paths between the input origin and destination stations with an increasing number of hops, and it returns the candidates as output.
To describe this module, the logical construct, the knowledge graph of a bus network, knowledge graph reasoning, and candidate querying are presented hereafter.

Logical Construct
This subsection provides logical notations and expressions of the semantic web [8,9] that are used in this paper.

Triple.
A bus line between adjacent bus stations is formulated as a triple (Eq. 1), where the subject ($s$) represents the previous bus station, the property ($p$) represents a bus line, and the object ($o$) represents the next bus station.

$$\langle s, p, o \rangle \tag{1}$$

Examples of the property are illustrated in the following.
• line:A_g (abbreviated as $A_{g}$) denotes a transportation line named "A" with the direction "go".
• line:A_b (abbreviated as $A_{b}$) denotes a transportation line named "A" with the direction "back".
• line:A_gt (abbreviated as $A_{gt}$) denotes a transitive transportation line named "A" with the direction "go".
• line:A_bt (abbreviated as $A_{bt}$) denotes a transitive transportation line named "A" with the direction "back".
Knowledge Graph.
The graph of a bus network is a set of triples presenting the chains of bus lines. This graph is considered as an A-Box, as shown in Fig. 2(a). Each edge of this example is expressed as a triple in the form of Eq. 1.

Logical Reasoning.
The reasoning mechanism in this paper is based on semantic web technology [9]. Since we model the bus transportation network as an RDF knowledge graph, some rules of RDF Schema (RDFS) and Web Ontology Language (OWL) are discussed. In addition, the source code explaining our concept is available at https://github.com/Rathachai/semantic-bus-planning.
First, the schema of each bus line in the T-Box, as depicted in Fig. 2(b), is described as follows:
• $A_{g}$ is a property.
• $A_{gt}$ is a transitive property.
• $A_{g}$ is a sub-property of $A_{gt}$.
Second, the seventh RDFS entailment rule (RDFS-7), defined as the sub-property entailment, is considered as follows. From Eq. 3, $A_{g}$ is the relation between $s_1$ and $s_2$. From the T-Box, $A_{g}$ is a sub-property of $A_{gt}$; hence $A_{gt}$ also becomes a relation between $s_1$ and $s_2$ (Eq. 4).

$$\langle s_1, A_{g}, s_2 \rangle \tag{3}$$
$$\langle s_1, A_{g}, s_2 \rangle \land \langle A_{g}, \text{rdfs:subPropertyOf}, A_{gt} \rangle \Rightarrow \langle s_1, A_{gt}, s_2 \rangle \tag{4}$$

Next, the transitive entailment rule of OWL allows chaining triples when they use the same transitive property. Eq. 5 shows that if the pairs $s_1$ and $s_2$, and $s_2$ and $s_3$, are connected by the same transitive property $A_{gt}$, then $s_1$ and $s_3$ are connected by the same property.

$$\langle s_1, A_{gt}, s_2 \rangle \land \langle s_2, A_{gt}, s_3 \rangle \Rightarrow \langle s_1, A_{gt}, s_3 \rangle \tag{5}$$

Last, the symmetric entailment rule of OWL is additionally explained for representing the walk activity. In Eq. 6, the term $\mathit{walk}$ denotes a symmetric property for the activity "walk". Thus, if we can walk from $s_1$ to $s_2$, it is entailed that we can walk from $s_2$ to $s_1$. This scenario is used when two stations are close to each other and people can walk between them to change bus lines.

$$\langle s_1, \mathit{walk}, s_2 \rangle \Rightarrow \langle s_2, \mathit{walk}, s_1 \rangle \tag{6}$$
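The three entailment rules above can be sketched in plain Python as a fixpoint computation over a set of triples. This is only an illustrative sketch under simplifying assumptions, not the reasoner used in this work: the function name `infer`, the string constants standing in for URIs, and the toy T-Box and A-Box are all hypothetical.

```python
# Illustrative vocabulary constants (plain strings standing in for URIs).
SUBPROP = "rdfs:subPropertyOf"
TYPE = "rdf:type"
TRANSITIVE = "owl:TransitiveProperty"
SYMMETRIC = "owl:SymmetricProperty"


def infer(triples):
    """Return the closure of `triples` under RDFS-7, transitivity, and symmetry."""
    graph = set(triples)
    while True:
        new = set()
        transitive = {s for s, p, o in graph if p == TYPE and o == TRANSITIVE}
        symmetric = {s for s, p, o in graph if p == TYPE and o == SYMMETRIC}
        sub_of = {(s, o) for s, p, o in graph if p == SUBPROP}
        for s, p, o in graph:
            # RDFS-7: <s p o> and <p subPropertyOf q> entail <s q o>.
            for child, parent in sub_of:
                if p == child:
                    new.add((s, parent, o))
            # Symmetry: <s p o> entails <o p s>.
            if p in symmetric:
                new.add((o, p, s))
            # Transitivity: <s p o> and <o p o2> entail <s p o2>.
            if p in transitive:
                for s2, p2, o2 in graph:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= graph:
            return graph  # fixpoint reached: nothing new was derived
        graph |= new


# Toy T-Box and A-Box (hypothetical, for illustration only).
tbox = {
    ("A_gt", TYPE, TRANSITIVE),
    ("A_g", SUBPROP, "A_gt"),
    ("walk", TYPE, SYMMETRIC),
}
abox = {("s1", "A_g", "s2"), ("s2", "A_g", "s3"), ("s3", "walk", "s4")}
inferred = infer(tbox | abox)
```

In this toy graph, the sub-property rule first lifts each `A_g` edge to `A_gt`, and the transitive rule then connects `s1` to `s3` directly, which is the effect that the candidate queries rely on.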
After the T-Box in Fig. 2(b) and the A-Box in Fig. 2(a) are processed by the entailment rules, the result is the inferred knowledge graph, which is mostly connected by transitive properties, as shown in Fig. 2(c). This graph is the output of the knowledge graph construction part in Fig. 1 and is used for finding candidates of bus route paths. Table 1 shows examples of logical queries and their results on the knowledge graph in Fig. 2(c). First, the query Q1 looks for a bus line between $s_1$ and $s_2$ with 1 hop, and it returns the transitive property of the line connecting them. Second, the query Q2 finds no answer, as no single line connects $s_1$ and $s_6$. Last, to solve the problem of Q2, the query Q3 allows 2 hops and returns 2 bus lines and 1 interchange station. Thus, querying candidates of bus route plans begins with 1 hop and increases the number of hops until results are found. In addition, one or more extra hops are recommended in order to find more potential candidates, which may have a shorter travel time.
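The strategy of starting with 1 hop and increasing the number of hops can be sketched as follows. This is a minimal illustration in plain Python rather than the SPARQL-based implementation; the function names `find_candidates` and `search_with_increasing_hops`, the `extra` parameter, and the toy edge set are assumptions and do not reproduce the graph of Fig. 2(c).

```python
def find_candidates(edges, origin, destination, hops):
    """Return all paths with exactly `hops` lines from origin to destination.

    `edges` is a set of (station, transitive_line, station) triples from an
    inferred graph; a path is a list [station, line, station, ...].
    """
    paths = [[origin]]
    for _ in range(hops):
        extended = []
        for path in paths:
            last_line = path[-2] if len(path) > 1 else None
            for s, line, o in edges:
                # Extend from the current station; skip repeating the same
                # line and revisiting a station already on the path.
                if s == path[-1] and line != last_line and o not in path[::2]:
                    extended.append(path + [line, o])
        paths = extended
    return [p for p in paths if p[-1] == destination]


def search_with_increasing_hops(edges, origin, destination, max_hops, extra=0):
    """Increase the hop count until results are found, then try `extra` more."""
    results, found_at = [], None
    for hops in range(1, max_hops + 1):
        candidates = find_candidates(edges, origin, destination, hops)
        if candidates and found_at is None:
            found_at = hops
        results.extend(candidates)
        if found_at is not None and hops >= found_at + extra:
            break
    return results


# Toy inferred graph (an assumption for illustration only).
edges = {
    ("s1", "A_gt", "s2"), ("s1", "A_gt", "s3"), ("s2", "A_gt", "s3"),
    ("s3", "B_gt", "s5"), ("s3", "B_gt", "s6"), ("s5", "B_gt", "s6"),
}
one_hop = find_candidates(edges, "s1", "s6", 1)
plans = search_with_increasing_hops(edges, "s1", "s6", max_hops=3)
```

Setting `extra` to 1 or more corresponds to the recommendation of searching a few additional hops after the first results appear.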

Knowledge Graph for a Bus Network
This subsection materializes our logical construct into an RDF KG, in terms of a T-Box and an A-Box, for presenting the bus route network in Fig. 3. Our KG contains 17 stations from $s_1$ to $s_{17}$; 4 bus lines, each with both directions "go" and "back"; and the walk mode between $s_{15}$ and $s_{16}$.

T-Box.
The T-Box presents the description of common terms and the application ontology of bus lines as RDF expressions in Turtle format [2]. There are three parts: namespaces, Transline terms, and the application ontology. First, this project uses the domain "http://transline.org/" for custom Uniform Resource Identifiers (URIs) under three prefixes. The prefix "tc:" is used for common terms, the prefix "sta:" is for stations, and the prefix "line:" is for transportation lines such as buses, trains, etc.
Second, the Transline terms contain the common terms for this project. The term "tc:PlanItem" is used as a marker for the bus line properties that need to be included in candidates of bus route plans. Next, the term "tc:walk" represents the walk mode, and it is a symmetric property.
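As a rough illustration of how the schema of one bus line might be assembled from these terms, the sketch below generates T-Box triples as plain Python tuples. The helper name `line_schema` is hypothetical, and the choice to mark only the transitive properties as "tc:PlanItem" is an assumption made for this example rather than a statement of the actual ontology.

```python
def line_schema(name):
    """Build T-Box triples for one bus line with directions "go" and "back"."""
    triples = set()
    for direction in ("g", "b"):
        base = f"line:{name}_{direction}"
        transitive = f"{base}t"
        triples |= {
            (base, "rdf:type", "rdf:Property"),
            (transitive, "rdf:type", "owl:TransitiveProperty"),
            (base, "rdfs:subPropertyOf", transitive),
            # Assumption: the transitive property is the one marked as a plan item.
            (transitive, "rdf:type", "tc:PlanItem"),
        }
    return triples


# Common term from the Transline vocabulary, plus the schema of a line "A".
common_terms = {("tc:walk", "rdf:type", "owl:SymmetricProperty")}
tbox = common_terms | line_schema("A")
```

With such a helper, adding a new bus line would only require one call per line, which is in keeping with the goal of a graph that is easy to maintain.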

Knowledge Graph Reasoning
To perform knowledge graph reasoning, we employ the reasoning mechanisms of RDFS and OWL [9], using the Python libraries RDFLib and OWLRL.
Let g be an RDF graph created with RDFLib that contains both the T-Box and the A-Box. The deductive closure of RDFS and OWL is computed with OWLRL by the Python statement below. "DeductiveClosure" expands the original RDF graph with inferred triples. The arguments "owlrl.OWLRL_Extension" and "rdfs_closure = True" enable the reasoner with OWL and RDFS entailment, respectively.
    import owlrl  # g is an rdflib.Graph holding the T-Box and the A-Box
    owlrl.DeductiveClosure(owlrl.OWLRL_Extension, rdfs_closure=True).expand(g)

Candidates Finding using SPARQL
To find candidate bus paths from the inferred knowledge graph, we issue SPARQL Protocol and RDF Query Language (SPARQL) statements [9]. As demonstrated by the logical queries in the previous subsection, we aim to query the inferred knowledge graph of the network in Fig. 3. We show various patterns of SPARQL query statements for potential scenarios in the following.

Scenario 1:
From $s_{01}$ to $s_{06}$. This scenario translates the one-hop logical query $s_{01} \xrightarrow{?p} s_{06}$ into a SPARQL statement. This query aims to find a transportation line ?p from sta:s01 to sta:s06. Only lines marked as plan items should be selected, so that other noisy predicates are ignored; therefore, ?p must be a tc:PlanItem as defined in the T-Box ontology.

Scenario 2:
From $s_{01}$ to $s_{08}$. This scenario is first queried with 1 hop, as in the previous scenario, but no result is returned. This means that commuting from $s_{01}$ to $s_{08}$ is not possible with a single transportation line. Thus, querying with 2 hops is considered.

FILTER ((?p1 != ?p2)) }
This query finds two transportation lines ?p1 and ?p2, and an interchange station ?x1. In addition, to avoid noisy results, chains that stay on the same line are filtered out by the condition FILTER ((?p1 != ?p2)). The results of all variables are ?p1=line:A_gt, ?x1=sta:s03, and ?p2=line:B_bt. This means that, from the origin station s01, the passenger takes bus line A to the station s03 and changes to bus line B to reach the destination station s08.
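To make the two-hop pattern concrete without a SPARQL engine, the sketch below mimics it over a set of Python tuples. It is an analogue for illustration only: the function name `two_hop` and the toy graph are hypothetical, and the station layout is an assumption rather than the network of Fig. 3.

```python
def two_hop(graph, origin, destination, marker="tc:PlanItem"):
    """Return (p1, x1, p2) bindings for origin -p1-> x1 -p2-> destination."""
    plan_items = {s for s, p, o in graph if p == "rdf:type" and o == marker}
    results = []
    for s1, p1, x1 in graph:
        if s1 != origin or p1 not in plan_items:
            continue
        for s2, p2, o2 in graph:
            # Mirror FILTER (?p1 != ?p2): drop chains that stay on one line.
            if s2 == x1 and o2 == destination and p2 in plan_items and p1 != p2:
                results.append((p1, x1, p2))
    return sorted(results)


# Toy inferred graph; the station layout is an assumption for illustration.
graph = {
    ("line:A_gt", "rdf:type", "tc:PlanItem"),
    ("line:B_bt", "rdf:type", "tc:PlanItem"),
    ("sta:s01", "line:A_g", "sta:s02"),
    ("sta:s01", "line:A_gt", "sta:s02"),
    ("sta:s01", "line:A_gt", "sta:s03"),
    ("sta:s02", "line:A_gt", "sta:s03"),
    ("sta:s03", "line:B_bt", "sta:s08"),
}
bindings = two_hop(graph, "sta:s01", "sta:s08")
```

The non-transitive edge `line:A_g` is ignored because it is not typed as a plan item, which is the role the marker plays in the queries.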

RESULTS
This section demonstrates the implementation of our approach using a sample group of bus lines in Bangkok, Thailand. Bus lines 8, 23, 36, 73, 95, and 145 are selected and visualized in Fig. 4. Our knowledge graph uses a resource of the pattern sta:busnode_id for a bus station, where id is a station id, such as sta:busnode_5188; and line:bus_num_dir for a bus line, where num is a line number and dir is a direction, such as line:bus_145_g. The raw graph contains 659 bus stations, 154 of which are interchange stations. There are 41 triples in the T-Box, and the A-Box includes 855 triples for bus transits together with 230 triples for walk transits between two stations whose Euclidean distance is not more than 100 meters. Our experiment is conducted against use cases that are pairs of origin and destination stations on the real map.
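The generation of walk triples from a distance threshold can be sketched as below. The sketch assumes that station coordinates are already projected to planar metres and emits one triple per pair, leaving the reverse direction to the symmetric entailment; the function name `walk_triples` and the coordinates are hypothetical and are not taken from the Bangkok data.

```python
from itertools import combinations
from math import hypot


def walk_triples(stations, max_distance=100.0):
    """Return one tc:walk triple per pair of stations within `max_distance` metres.

    `stations` maps a station id to planar (x, y) coordinates in metres.
    """
    triples = []
    for (a, (ax, ay)), (b, (bx, by)) in combinations(sorted(stations.items()), 2):
        if hypot(ax - bx, ay - by) <= max_distance:
            triples.append((a, "tc:walk", b))
    return triples


# Hypothetical coordinates, not taken from the Bangkok data set.
stations = {
    "sta:busnode_1": (0.0, 0.0),
    "sta:busnode_2": (30.0, 40.0),   # 50 m from busnode_1
    "sta:busnode_3": (500.0, 0.0),   # far from both
}
walks = walk_triples(stations)
```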

Results against Use Cases
To demonstrate the ability to find candidates using the inferred knowledge graph and queries, four use cases with different origin and destination stations and different hop numbers are tested and reported in this subsection. Each result is written in the format "(sta:busnode_001) → line:busline_8_gt → (sta:busnode_002)", where stations are written in round brackets and arrows show the direction from left to right. In addition, a star (*) symbol marks a result that is also the path returned by Dijkstra's algorithm [4], which provides the shortest path.
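For reference, a compact version of Dijkstra's algorithm of the kind that could produce the starred baseline is sketched below. The edge costs used for the baseline are not stated here, so the unit-like costs, the function name `dijkstra_path`, and the small network are assumptions made purely for illustration.

```python
import heapq


def dijkstra_path(adjacency, origin, destination):
    """Return the minimum-cost station sequence, or None if unreachable.

    `adjacency` maps a station to a list of (neighbour, cost) pairs.
    """
    queue = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        cost, station, path = heapq.heappop(queue)
        if station == destination:
            return path
        if station in settled:
            continue
        settled.add(station)
        for neighbour, weight in adjacency.get(station, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical raw network with a cost per adjacent-station link.
adjacency = {
    "s1": [("s2", 1.0)],
    "s2": [("s3", 1.0), ("s4", 1.0)],
    "s3": [("s5", 1.0)],
    "s4": [("s5", 3.0)],
}
shortest = dijkstra_path(adjacency, "s1", "s5")
```

A candidate whose stations lie along the returned sequence would then be the one to mark with the star.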

Discussion
To further clarify the advantages of our approach, this subsection discusses our experimental results, research position, challenges, and next steps. First, the aim of our work is to model the bus route network as an RDF graph and to use RDFS and OWL entailment to infer the knowledge graph, especially via the transitive closure. The results of our experiments against four use cases demonstrate that the schema, the knowledge graph structure, the reasoning processes, and the query patterns are able to solve the problem of finding candidates of bus paths between given origin and destination stations. For example, in the 3rd use case, one of the resulting 3-hop paths from the station 5188 to the station 4784 is presented in Fig. 5.
Second, every use case includes a result from Dijkstra's algorithm [4]; thus, it can be verified that the set of returned candidates contains a shortest path. Compared to the related studies, the work in [12] used a bus line as a URI resource, and one URI resource was the subject of many triples of begin stations and end stations, so the raw data in the A-Box were visualized as a hairball-like graph. Moreover, the sequence number was an attribute of a station, so this concept was suitable for private intercity buses whose stations were not shared with other bus lines. In addition, the study in [18] created many URI resources of bus route subsections for a bus line, so the visualization of the raw graph became easy for humans to read. In that case, developers who maintained the graph had to create a new resource for every pair of stations, which appeared to create more terms than necessary. In our approach, we aim to create a lightweight graph that is both machine-readable and human-readable. Thus, data in the A-Box should be simple and easy to maintain by developers who are not experts in the semantic web. Modelling a directed transit line as an RDF triple is straightforward in terms of bus network representation, and it enables the reasoning mechanism to be applied to complex inference and querying tasks. In addition, creating only the necessary number of URI resources reduces complexity for those who maintain the graph data. In our work, the developers only have to create new bus stations and bus lines when new ones are added.
Third, a challenge of our approach is to deal with bus lines that overlap continuously along a long route covering many stations. For example, a bus line b1 covers stations from s1 to s6, and a bus line b2 covers stations from s2 to s7. If a passenger travels from s1 to s7, two hops with the bus lines b1 and b2 are needed. The candidates in this case differ only in the interchange station, which can be any station from s2 to s6: (1) (s1) → b1 → (s2) → b2 → (s7), (2) (s1) → b1 → (s3) → b2 → (s7), and so on. Thus, the module that selects proper candidates has to recognize this case and apply a strategy to select one of them. Our recommendation is to group all candidates by their sequence of transportation lines and to select one candidate from each sequence. In addition, approaches based on graph centrality [1,15] can be considered for identifying important nodes, which may be fewer than the full set of stations, in order to reduce the computation time of the path finding process.
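The recommended grouping by sequence of transportation lines can be sketched as follows, using the b1 and b2 example above. The function name `one_per_line_sequence` is hypothetical, and keeping the first candidate of each group is an arbitrary choice made for this sketch; a real selector might instead pick the interchange station with the best travel time.

```python
def one_per_line_sequence(candidates):
    """Keep the first candidate for each distinct sequence of lines.

    A candidate is a list [station, line, station, line, ..., station].
    """
    selected = {}
    for path in candidates:
        lines = tuple(path[1::2])  # lines sit at the odd positions
        selected.setdefault(lines, path)
    return list(selected.values())


# Candidates from s1 to s7 that differ only in the interchange station.
candidates = [["s1", "b1", x, "b2", "s7"] for x in ("s2", "s3", "s4", "s5", "s6")]
chosen = one_per_line_sequence(candidates)
```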
Last, since this paper focuses on generating candidates of bus paths, the further steps are to work with the whole bus network and other transportation modes, and to provide several best paths to users. Because there is no open data on the public transportation network in Thailand, the former task requires considerable human effort for collecting and cleaning data. We hope that, once it is done, it becomes another contribution to the smart mobility domain. The latter task requires temporal data on travel times between connected stations in order to compare the travel time of each candidate. In this case, bus schedules, historical bus trajectories, and real-time GPS data of buses must be analyzed.

CONCLUSION
This paper proposes an approach to modelling a bus route network and querying for candidates of bus route plans using semantic web technology. Since this work is one module of the bus navigation project, it aims to find all possible candidates of bus paths for given origin and destination stations and a given number of hops. To materialize this module, we first introduce a schema and an instance of our Resource Description Framework (RDF) knowledge graph for a bus network, in which nodes are bus stations and edges are transportation modes such as bus lines and walks. Each bus line with a direction is a directed relation defined as a sub-property of a transitive property. Second, using the reasoning mechanisms of RDF Schema (RDFS) and Web Ontology Language (OWL), all nodes linked by the same bus line are connected in an inferred knowledge graph, which is then ready for querying. Next, we suggest several patterns of queries based on the SPARQL Protocol and RDF Query Language (SPARQL). These patterns use one or more hops to find transit lines and interchange stations between origin and destination stations. Last, we create a knowledge graph from 6 bus routes of the real bus network data in Bangkok, Thailand. Using RDFS and OWL reasoning and SPARQL, candidates of bus route paths are generated from the given conditions.
In the future, we are going to analyze the travel time, waiting time, and bus schedule from the global positioning system (GPS) data of buses. We also plan to attach these data to each queried candidate when recommending the top potential route plans to users. Another idea is to combine reasoning about conceptual schemata with rules in order to support non-deductive reasoning patterns over the modelled RDF bus network, as done in [17].

Figure 1: Conceptual Flow of the Module of Finding Candidates of Bus Route Plans

Figure 5: Example result of a 3-hop Bus Path

Table 1: Example Logical Queries of a Knowledge Graph