An RDF Benchmark for Enriched Maritime Data

This paper provides an RDF benchmark for evaluating triple stores on maritime data queries. The benchmark comes with an integrated data set of real-world data from diverse sources, which has been transformed to RDF triples, and a set of 15 SPARQL queries of varying complexity. Unlike previously available benchmarks, which have focused on spatial RDF data, our work targets RDF data about trajectories of moving vessels; the focus is therefore on spatio-temporal RDF data. We present the process used to generate the integrated data set, delve into the details of the benchmark queries, and provide evaluation results using a selected RDF triple store as a showcase. Furthermore, we make all data, queries and results publicly available to stimulate further research in the field.


INTRODUCTION
The availability of enriched mobility data sets in the form of semantic trajectories is critical for developing novel applications that rely on advanced data analytics. Indeed, mobility analytics [4,11] of moving objects (i.e., humans, animals, vehicles, vessels, aircraft) based on heterogeneous data sets that go far beyond plain positional information (i.e., trajectories) is a challenging research topic.
Despite this evident need for enriched mobility data sets, there is a notable lack of such real-life data. To address this limitation, in this work, we present a benchmark for integrated data for the maritime domain. The benchmark includes vessel trajectories from public data sources that have been enriched with static vessel information, weather conditions, as well as contextual information related to ports and protected areas. To represent this diverse set of data sources, we use a common OWL/RDFS schema (an ontology) and we transform all data into RDF triples. As such, the benchmark consists of enriched data, represented in the form of RDF triples, and a set of SPARQL queries of varying difficulty that can be used for evaluation. Moreover, we provide indicative evaluation results of the proposed benchmark over an open-source RDF triple store, namely Blazegraph (https://blazegraph.com/), which was the basis for Amazon Neptune (https://aws.amazon.com/neptune/).
Summarizing, in this work, we make the following contributions:
• We provide a benchmark for enriched maritime data that originate from diverse data sources and have been transformed into a common representation format using RDF.
• We define a set of meaningful SPARQL queries, including queries with spatial and spatio-temporal constraints, that cover various real-world usage scenarios.
• We make the enriched RDF data, the SPARQL queries, as well as our evaluation results publicly available to stimulate further research in the field.
The remainder of this paper is structured as follows: Section 2 provides an overview of related work. Section 3 describes the raw data sets that were integrated in RDF. Section 4 briefly presents the common representation schema which is used to represent enriched trajectories of vessels. Section 5 provides a brief overview of the queries, while Section 6 describes in detail the SPARQL queries that are used in the proposed benchmark. Section 7 presents the results of our evaluation. Finally, Section 8 concludes the paper.

RELATED WORK
RDF benchmarks are a valuable resource for the principled evaluation and comparison of RDF triple stores. Many benchmarks exist already and are widely used, including BSBM [3], LUBM [8], SP2Bench [16], WatDiv [1], and [7] for streaming RDF data. However, criticism about the representativeness of RDF benchmarks with respect to real RDF data sets does exist [5], which gives rise to the need for benchmarks based on real-life data sets. For example, Lam et al. [10] evaluate different RDF triple stores using data from Wikidata.
Despite the existence of various benchmarks for RDF queries, there is very limited work on benchmarks for real-life, integrated data that have both a spatial and a temporal dimension. This is particularly the case for mobility data, as in the maritime domain [2]. In very few cases, portions of integrated data are made available [12], but either they are relatively small or they are not accompanied by a well-defined set of queries. Another limitation is that less focus has been put on spatio-temporal data, whereas existing benchmarks are mainly geospatial [9]. The challenges are manifold: large, real-world data sets are required, along with comprehensive and domain-specific queries, including spatio-temporal constraints, as maritime RDF data has a strong spatio-temporal nature.
Table 1: Summary table of the data sets used in the benchmark, the number of records in each data set and the generated triples, as well as the transformation time and time required for loading the triples to the store.
To address this shortcoming, we develop a new benchmark for big integrated maritime data. This benchmark consists of integrated data in RDF as well as a set of domain-specific queries that correspond to real-world requirements for querying integrated maritime data. The integrated data set consists of more than 300 million RDF triples and spans a temporal period of one month (January 2021). The main data sources that were integrated are AIS messages, weather data, vessel characteristics, ports, and areas of interest in the maritime domain.

DATA SOURCES, DATA COLLECTION AND TRANSFORMATION
We consider a set of diverse data sources for the preparation of this benchmark. At a high level, the set of data sources can be classified into (a) static and (b) dynamic data sources. The static data sources contain archival data sets with information about the maritime domain: vessel characteristics, ports and protected areas.
The dynamic data sources correspond to time-evolving (streaming) data and mainly include the positions of moving vessels as well as the respective weather conditions. Table 1 provides a summary of the data sources that were considered during the data collection and transformation to RDF. For each data source, we report the number of records, the number of generated triples, the time required for converting the data from its raw format to RDF triples, and the time required for loading the RDF triples into the selected triple store (Blazegraph).
The data transformation to RDF has been performed using RDF-Gen [14,15], which supports RDF data generation from different data sources, both archival and streaming, with salient features such as state-of-the-art performance, scalability and flexibility.
More specifically, the data sets used in this benchmark include the following.
Seaports. This data set has been compiled into a 10-column CSV file from diverse online web sources. The first two columns report a short and a full name respectively. The third column reports the United Nations Code for Trade and Transport Locations (UN/LOCODE), which is a geographic coding scheme developed and maintained by the United Nations Economic Commission for Europe. The fourth, fifth and sixth columns provide the spatial information, i.e., longitude, latitude and Well Known Text representation of the position of each seaport. The remaining columns report features and facilities of the seaport such as lifts and cranes, railway terminal, etc. A brief description and a URL providing more details are also reported. This data set contains 6,036 records, which have been transformed to RDF triples in 3.2 seconds, generating 96,576 triples (of which 49,848 are unique triples not available in other data sets, e.g., in vessel characteristics).
Vessel Characteristics. The vessel characteristics data set is also retrieved from an online web source, compiled by the Norwegian Coastal Administration's SafeSeaNet solution from messages transmitted by passing ships (static AIS messages). The data set contains information about vessel dimensions, vessel type, port of departure and port of arrival. Sensitive information has been removed before making the data set publicly available, and data are organized as one file per year based on the time of arrival of the message. We use only the file that corresponds to year 2021, reporting 158,186 records, which have been transformed to RDF triples in 163 seconds, generating 5,243,559 triples (of which 5,171,127 are unique triples not available in other data sets, e.g., partial description of seaports).
Protected Areas. While there can be several types of regions of interest in the maritime domain (e.g., fishing areas, endangered species habitat areas, exclusive economic zones, etc.), we employ Natura2000 regions located around Denmark and the Scandinavian Peninsula. Natura2000 is a network of core breeding and resting sites for endangered species, as well as rare natural habitat types which are protected in their own right. The aim of the network is to ensure the survival of threatened species and habitats, listed under both the Birds Directive and the Habitats Directive. Since Natura2000 is not a system of strict nature reserves from which all human activities would be excluded, we expect that moving vessels reported in the surveillance data sets will not cross these regions. Figure 1 (left) illustrates the selected Natura2000 regions. We have selected 1,882 Natura2000 regions, which are transformed into RDF triples in 1.1 seconds, generating 22,584 triples.
AIS data - NCA. Regarding the dynamic data, we use a surveillance data set which reports positions of vessels, as provided by the Norwegian Coastal Administration's SafeSeaNet solution, for the area illustrated in Figure 1 (middle). Each reported position in this data set is annotated with a timestamp. The data set can therefore be used to reconstruct the trajectories of the vessels, and to derive movement patterns that can be further used for event recognition or movement prediction. The data source provides the reported positions in a separate file per day. We process the records of a single month, namely January 2021. During the data transformation to RDF triples, we enrich the reported positions with the nearby weather conditions as provided by the Copernicus Knowledge Base (ERA5). We use 5,863,343 reported positions, which were transformed into RDF triples in 34,179 seconds, generating 246,260,364 triples (of which 164,793,359 are unique triples not repeated in the generated file, e.g., description of weather conditions).
AIS data - DMA. The NCA surveillance data source provides positions of vessels only within the Norwegian EEZ. We therefore extend the surveillance data set with data from http://web.ais.dk/aisdata/ . The zipped file for 202101 found on DMA.dk is 15GB and it has been reduced by: a) removing repeated information (type of fixing device, type of ship, etc.) in each record, b) removing AIS messages transmitted by non-vessel devices (e.g., base stations), c) ignoring vessels whose IMO is provided neither in DMK.dk nor in NMA. We use 1,767,560 records for 202101 from this data set, which have been converted into 74,237,478 triples in 10,258 seconds.
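The record counts, triple counts and transformation times quoted above for the transformed data sets can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the benchmark tooling; the dictionary layout and the function name `summarize` are illustrative assumptions, and only the numbers are taken from the text.

```python
# Figures quoted in the text: (records, generated triples, transformation seconds).
# The data structure and names below are illustrative assumptions.
DATASETS = {
    "Seaports": (6_036, 96_576, 3.2),
    "Vessel characteristics": (158_186, 5_243_559, 163.0),
    "Protected areas": (1_882, 22_584, 1.1),
    "AIS data (NCA)": (5_863_343, 246_260_364, 34_179.0),
    "AIS data (DMA)": (1_767_560, 74_237_478, 10_258.0),
}


def summarize(datasets):
    """Return triples per record and triples per second for each data set."""
    rows = {}
    for name, (records, triples, seconds) in datasets.items():
        rows[name] = {
            "triples_per_record": triples / records,
            "triples_per_second": triples / seconds,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(DATASETS).items():
        print(f"{name}: {row['triples_per_record']:.2f} triples/record, "
              f"{row['triples_per_second']:.0f} triples/s")
```

Such a check makes the relative cost of the sources visible: the static sources yield a fixed, small number of triples per record, while each enriched AIS position expands to a much larger number of triples.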
Weather data. The weather conditions data set is another dynamic data set. However, it is not transformed into RDF triples as a separate source, since we are interested only in weather conditions at reported positions of vessels (i.e., the weather conditions of areas without any vessel activity are of no interest and can be omitted). Figure 1 (right) illustrates the reference positions of the weather conditions reported (for January 2021), forming a grid of granularity 0.5×0.5 degrees. We enrich vessel positions with the corresponding values of weather variables based on the nearest reference position to the vessel position. The data set is provided as a binary GRIB2 file.
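The nearest-reference-position lookup described above can be sketched as follows. This is a minimal illustration in Python, not the RDF-Gen implementation: it assumes that the reference positions lie on integer multiples of 0.5 degrees and that "nearest" is measured independently per coordinate in degrees. Neither assumption is stated in the text, and the function names are hypothetical.

```python
GRID_STEP_DEG = 0.5  # grid granularity reported in the text


def nearest_grid_coordinate(value, step=GRID_STEP_DEG):
    """Snap one coordinate to the closest multiple of `step`.

    Assumption: grid nodes sit on integer multiples of `step`.
    Python's round() resolves exact ties to the even multiple.
    """
    return round(value / step) * step


def nearest_reference_position(lon, lat, step=GRID_STEP_DEG):
    """Return the (lon, lat) of the assumed nearest weather reference position."""
    return (nearest_grid_coordinate(lon, step), nearest_grid_coordinate(lat, step))


if __name__ == "__main__":
    # Position used for Oslo elsewhere in the paper.
    print(nearest_reference_position(10.7461, 59.9127))
```

At high latitudes a per-coordinate rounding in degrees is not identical to a great-circle nearest neighbour, so a production implementation may prefer a proper spatial index.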

THE VESSELAI ONTOLOGY
The VesselAI ontology [13] provides a generic conceptual framework for the representation of semantic information related to the trajectory, the moving object and the contextual information in the maritime domain. We explain the basic notions of trajectory and moving object, as well as event and contextual information. In the VesselAI ontology, we extend and refine the representation of semantic trajectories that was provided in the datAcron ontology [17]. Also, we provide the description of events that can be either triggered by moving objects or that can affect moving objects, as well as a refined description of contextual data in the maritime domain.
Figure 2 illustrates the modules comprising the VesselAI ontology and their links. The core of the ontology is the module describing Trajectories of moving objects. Each trajectory is always associated with exactly one moving object, which is described in the Vessel Characteristics module. Trajectories are composed of trajectory parts, which can be related to Contextual Information such as regions or points of interest, via topological or proximity relations. Also, trajectory parts can be associated with simple (e.g., turn, stop, etc.) or complex events (e.g., fishing, loitering, etc.), as well as weather conditions via either qualitative or quantitative descriptions. Descriptions of events and weather conditions are provided in the Events/Weather Conditions module. Spatial representation of entities in the Contextual Information and Trajectories modules is supported by GeoSPARQL. The VesselAI ontology currently comprises 453 classes, 198 object properties and 54 data properties.

Trajectories of Moving Objects
This module provides the description of trajectories of moving objects and the constituents of a trajectory according to the reported surveillance data. Descriptions of the properties that relate trajectories to moving objects and to regions of interest described in the contextual information module (such as departure and destination seaports) are also provided. A trajectory is composed of a sequence of trajectory parts, which can be distinguished into the RawPosition (i.e., a reported position of the moving object), the Node (i.e., a reported position annotated with contextual data), and the Segment (i.e., a part that includes at least two raw positions or nodes). For example, each reported position of a vessel in a surveillance data set can be represented by a RawPosition. Similarly, the reported positions of a vessel that is anchored in a seaport can be represented by a single Node resource (i.e., an aggregate of the RawPosition resources), where the temporal constituent specifies the time interval during which the vessel is anchored. A sequence of RawPosition or Node resources can also be represented by a Segment to report a sequence of trajectory parts associated with some event, weather conditions or regions of interest. This conceptualization enables the representation of compressed trajectories [6], using any variation of the available trajectory parts. The various levels of abstraction satisfy a wide range of analytics tasks. The trajectories and their parts are mainly described via their spatial and temporal constituents, the spatial constituent being the geometry of the trajectory, trajectory part, or semantic node. Finally, a route is modeled as a trajectory of a sequence of raw positions (waypoints). This module is populated with positions and trajectories reported in the surveillance NCA and DMA data sets.

Vessels Characteristics
The vessel characteristics module provides an extensive taxonomy of vessel types (256 classes), their characteristics including physical dimensions (length, width, height, depth), properties related to the vessel type such as draft, identification codes (name, callsign, International Maritime Organization (IMO) code, Maritime Mobile Service Identity (MMSI) code, etc.), and other associations such as the depiction of the vessel and the MMSI transmitter. Descriptions in this module take into consideration domain-specific requirements, for example that, in practice, MMSI transmitters may be transferred from one vessel to another. This conceptualization requires that the association between a vessel and its MMSI transmitter is temporally defined. Finally, the vessels characteristics module is only connected to the trajectories of moving objects module. This module is populated with entities reported in the vessel characteristics data set, which has been compiled from static AIS messages.

Contextual Information
This module describes the regions of interest that are related to the maritime domain, such as seaports, fishing regions, protected areas, lighthouses, coastlines, etc. Each region of interest is always related to exactly one geometry (of type point, polygon, linestring or multipolygon), which provides the spatial representation of the region in the physical world. Entities described in this module are often associated with trajectories or trajectory parts (in the module of trajectories of moving objects), via the origin and destination properties, as well as any proximity and topological relations detected between trajectory parts and regions of interest. This module is populated with entities reported in the seaports and protected areas data sets.

Events and Weather Conditions
Finally, the Events and Weather Conditions module describes the spatio-temporal entities that affect or are triggered by trajectory parts of moving objects. Specifically, a weather condition is considered to affect a specific region for a specific time interval. These regions are associated with any trajectory parts (of moving objects) that cross them. Events, on the other hand, can be triggered by trajectory parts of a single trajectory (e.g., simple events such as turn, stop, etc.) or by trajectory parts of two or more trajectories of different moving objects (e.g., collision, illegal activities, etc.). This module imports the World Meteorological Organization (WMO) conceptualizations and properties (https://codes.wmo.int/) to enable the description of weather conditions using both qualitative and quantitative descriptions. The qualitative description employs predefined presets on specific weather variables and value ranges. For example, "Light rain" indicates that the precipitation rate is less than 2.5 mm per hour, "Moderate rain" indicates that the precipitation rate is between 2.5 mm and 7.6 mm per hour, etc. The quantitative description of a weather condition is based on measurements of specific weather variables at sea level. This module is populated with weather conditions of spatio-temporal positions reported in the surveillance data sets. Events, however, can be asserted by third-party components analyzing the data, such as a trajectory synopses generator or complex event recognition.
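The mapping from a quantitative precipitation rate to the qualitative presets mentioned above can be sketched as follows. This is a minimal Python illustration that covers only the two presets named in the text. The handling of zero precipitation, the inclusive bounds of the "Moderate rain" range and the function name are assumptions; rates above 7.6 mm per hour return `None` because the text does not name the corresponding presets.

```python
def qualitative_rain(rate_mm_per_hour):
    """Map a precipitation rate (mm/hour) to one of the presets named in the text.

    Returns None when no preset quoted in the text applies.
    """
    if rate_mm_per_hour < 0:
        raise ValueError("precipitation rate cannot be negative")
    if rate_mm_per_hour == 0:
        return None  # assumption: no rain, so no rain preset applies
    if rate_mm_per_hour < 2.5:
        return "Light rain"
    if rate_mm_per_hour <= 7.6:  # assumption: both bounds are inclusive
        return "Moderate rain"
    return None  # heavier presets are not named in the text


if __name__ == "__main__":
    for rate in (0.0, 1.0, 2.5, 7.6, 10.0):
        print(rate, qualitative_rain(rate))
```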

OVERVIEW OF THE QUERIES
We separate the SPARQL 1.1 queries of this benchmark into three categories with respect to the complexity of the query and the functionalities required for the query to be answered. Specifically, the first category (easy queries, 01. to 04.) involves a small and simple basic graph pattern (BGP), and can be answered by any available triple store. The second category (intermediate queries, 05. to 09.) involves more complex graph patterns; however, most available triple stores are expected to be able to answer such queries. Finally, the third category (hard queries, 10. to 15.) involves complex graph patterns as well as functionality that is not available in every triple store, such as topological and temporal functions.
The SPARQL 1.1 queries discussed and used in the benchmark are the following:
01. "get the number of vessels of each type X": This is a simple query, based on a basic graph pattern.
02. "get trajectories of vessel V": This is another simple query, requesting all the trajectories of a specific vessel known in the data.
03. "get trajectories of vessels of type X": This query is an extension of 02., where we request the trajectories of vessels of a specific type.
04. "get vessels of type X that visited port Y": This query selects the vessels of a specific type that have been at a specific port.

08. "get vessels of type X reported their position near POINT(A B) during [t0,t1] (spatio-temporal query)": This query depends on the computation of spatial distance between the positions of vessels of a specific type and a fixed point. The computation of spatial distance is not supported by many triple stores.
09. "get the positions of vessels of type X and the weather conditions at their reported positions at time t": This query retrieves the positions of all vessels for a given time instant, and the weather conditions at those positions.
10. "get moving vessels of type X that are under specific weather conditions E, near to a seaport that is also used for passengers": This query employs a complex graph pattern and filters the vessels matching the pattern using ranges of values of the weather attributes.
11. "get vessels of type X within any protected area (spatial query)": This query employs a simple graph pattern; however, it depends on the topological function "within", which is not supported by all available triple stores.
12. "get vessels of type X within any protected area during [t0,t1] (spatio-temporal query)": A slightly different version of query 11., which requests the vessels of the given type that are reported to be within a protected area within a given time interval.
13. "find for a specific vessel X and time t the nearest port that features ship repair facilities": This query combines data from both the surveillance and seaports data sets, and depends on proximity computations which are not supported by every available triple store. In addition, the query applies a string pattern on the textual description provided for each port, to filter the nearest ports.
14. "find the nearest port for all vessels of type X under a range of values of weather attributes, at any time": This query depends on spatial distance computations, employs a complex graph pattern and evaluates all the data stored in the triple store. Therefore, it is supported only by some triple stores and it is expected to stress them even more when the number of triples is increased.
15. "find all pairs of positions of vessels in the data set, that are as far as K meters and 10 minutes temporal distance": This query computes the spatial distance between all the reported positions of different vessels in the data set, and filters the results both spatially and temporally.
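To make the cost of query 15. concrete, the following Python sketch evaluates the same condition outside a triple store by brute force over all pairs of positions. It is an illustration only, under several assumptions: the query is read as "at most K meters apart and at most 10 minutes apart", spatial distance is approximated with the haversine formula on a spherical Earth (the paper does not state which distance formula its custom function uses), and the tuple layout, vessel identifiers and function names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def close_pairs(positions, k_meters, max_gap=timedelta(minutes=10)):
    """Return pairs of positions of different vessels within k_meters and max_gap.

    `positions` is a list of (vessel_id, timestamp, lon, lat) tuples.
    Every pair is compared, so the cost grows quadratically with the input.
    """
    pairs = []
    for a, b in combinations(positions, 2):
        if a[0] == b[0]:
            continue  # same vessel: not a pair of different vessels
        if abs(a[1] - b[1]) > max_gap:
            continue  # cheap temporal test before the distance computation
        if haversine_m(a[2], a[3], b[2], b[3]) <= k_meters:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    sample = [
        ("v1", datetime(2021, 1, 1, 9, 0), 10.7461, 59.9127),
        ("v2", datetime(2021, 1, 1, 9, 5), 10.7461, 59.9127),
        ("v3", datetime(2021, 1, 1, 10, 0), 10.7461, 59.9127),
    ]
    print(len(close_pairs(sample, k_meters=100)))
```

The quadratic number of candidate pairs is what makes this query the most demanding one in the benchmark; sorting by time or using a spatial index would prune most comparisons, but such optimizations depend on what each triple store offers.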

QUERIES AND RESULTS
The queries provided in natural language in the previous section are constructed as SPARQL
The above query retrieves all the instances that are known to have an installed MMSI transmitter (thus they are vessels), and then it retrieves their type. An alternative approach can exploit the Vessel taxonomy, which however requires traversing the rdfs:subClassOf path of the inferred model (not all triple stores are capable of deriving RDFS inferences). Figure 3 illustrates the query pattern graphically.
Q02: The following query retrieves the trajectories of each vessel V:
SELECT DISTINCT ?vessel ?tr WHERE { ?vessel :hasTrajectory ?tr }
Figure 4: The graph pattern of the query that retrieves the types of vessels.
An interesting extension of the above query retrieves the number of trajectories reported in the data for each vessel, as follows (the query provides 4,288 results, sorted by the number of trajectories in descending order):
SELECT DISTINCT (count(?tr) as ?cntr) ?vessel WHERE { ?vessel :hasTrajectory ?tr . } GROUP BY ?vessel ORDER BY DESC(?cntr)
This query has a similar structure to the one used for query 01.; however, it is expected to show a larger processing time, since the number of trajectories in the data is expected to be much larger than the number of vessel types.
Q03: The following query requests the number of trajectories of each vessel type in the data set:
SELECT DISTINCT ?vesselType (count(?tr) as ?cntr) WHERE { [] a ?vesselType ; :hasTrajectory ?tr . FILTER(?vesselType != :Vessel) } GROUP BY ?vesselType ORDER BY DESC(?cntr)
A considerably harder variation of the above query would delve into the trajectory details and retrieve the reported positions of each vessel type (i.e., aiming to detect any obvious activity patterns or regions of interest per vessel type in the data set):
SELECT ?type ?wkt WHERE { ?v a ?type ; :hasTrajectory/:hasPart/:hasGeometry/:hasWKT ?wkt }
The above query pattern is illustrated in Figure 4.
Figure 5 illustrates the trajectories of the top-10 most active vessel types, where blue indicates positions of cargo vessels, yellow indicates positions of chemical cargo vessels, red indicates dangerous cargo vessels, and green indicates passenger vessels. We observe that: a) the routes of passenger vessels are clearly formed on the map, b) there is no obvious distinction of the cargo type (general, chemical or dangerous) from the pattern of the movement (i.e., there are no special routes for each cargo type), c) passenger vessels seem to cross paths with dangerous cargo vessels, which increases the possibility of a maritime incident.
Q04: This query requests the association between vessel types and seaports. The vessels of each type and the seaports each vessel has visited can be retrieved by the following query (reports the name of the seaport):
SELECT DISTINCT ?vesselType ?name WHERE { { [] a ?vesselType ; :hasTrajectory ?tr .
This query features a nested query, which retrieves for each port its name and the URI of its geometry. The geometry of the departure seaport is related to the starting semantic node of each trajectory in the data set.
Q06: Query 06. is an extension of query 05., which retrieves the type of vessels that have visited a sequence of ports (including indirect successors):
SELECT DISTINCT ?v ?n1 ?n2 ?t1 ?t2 WHERE { ?v a :BulkCarrier ; :hasTrajectory ?tr1 ; :hasTrajectory ?tr2 . ?st1 :hasGeometry ?g1 ; :hasTemporalFeature ?tf1 . ?tr1 :hasStart ?st1 . ?tr2 :hasStart ?st2 . ?st2 :hasGeometry ?g2 ; :hasTemporalFeature ?tf2 . ?tf1 :hasTimeStart ?t1 . ?tf2 :hasTimeStart ?t2 . FILTER((?tr1 != ?tr2) && (?t1 < ?t2)) { SELECT DISTINCT ?n1 ?g1 WHERE { ?p1 a :Port ; :hasName ?n1 ; :hasGeometry ?g1 . }
The above query is also illustrated in Figure 6.
Notice that retrieving the immediately consecutive visited ports is trivial, as they are defined as departure and destination ports of a single trajectory. Also, the above query is very difficult to process for most of the known triple stores. The reason is that the part of the graph pattern that retrieves two trajectories of the same vessel "repeats" the query for each trajectory of the vessel. Therefore, the more trajectories a vessel has, the harder the query is for the triple store.
Q07: Queries that employ regular expressions are often challenging. Query 07. requests the vessels, together with their type, that have visited a seaport reported to be used also for passengers, during a specific time period:
SELECT DISTINCT ?vName ?vType ?name ?time WHERE { [] a ?vType ; :hasName ?vName ; :hasTrajectory ?tr . ?tr :hasStart ?st . ?st :hasGeometry ?g ; :hasTemporalFeature ?tf . ?tf :hasTimeStart ?time . ?port a :Port ; :hasPortFeatures ?features ; :hasGeometry ?g ; :hasName ?name . FILTER(("2021-01-01 00:00:00" < ?time) && (?time < "2021-01-05 00:00:00") && contains(lcase(str(?name)), "norway") && contains(lcase(str(?features)), "passenger")) } ORDER BY ?time
There are 51 out of 540 such seaports in Norway. The above query returns 110 cases of dangerous vessels that have visited some seaport that is also being used for passengers during the time period specified.
Q08: SPARQL queries that are based on spatial distance from a fixed point of interest or from other moving objects are also challenging and not supported by every available triple store. We have implemented topological and proximity functions to enable the evaluation of such queries on the selected triple store. Query 08. requests the vessels of a specific type that report their position near some fixed position (e.g., Oslo) during a given time interval:
SELECT DISTINCT ?vType ?g WHERE { ?v a ?vType ; :hasTrajectory ?tr . ?tr :hasPart ?node . ?node :hasGeometry ?geom ; :hasTemporalFeature ?tf . ?geom ogc:asWKT ?g . ?tf :hasTimeStart ?t . FILTER((?t > "2021-01-01 09:04:39") && (?t < "2021-01-01 10:04:39") && (s:distance(?g, "POINT(10.7461 59.9127)") < 50)) . }
This query depends on a custom function (not part of SPARQL 1.1) for the computation of spatial distance between the centroids of two geometries (results are given in Km). Also, we assume that "POINT(10.7461 59.9127)" represents the position of Oslo.
Q09: This query requests the weather conditions close to the reported position of vessels of a specific type (e.g., fishing vessel). We also filter the timestamp using a time interval, to enable the selection of vessels that have reported their position within the specified time interval.
SELECT DISTINCT ?v ?p ?o WHERE { ?v a :FishingVessel ; :hasTrajectory ?tr . ?node :hasWeatherCondition ?c ; :hasTemporalFeature ?tf ; :hasGeometry ?geom . ?geom ogc:asWKT ?g . ?tf :hasTimeStart ?t . ?tr :hasPart ?node . ?c ?p ?o . FILTER((?t > "2021-01-01 09:04:39") && (?t < "2021-01-01 10:04:39")) . }
Q10: This query requests all the moving vessels (i.e., speed over ground is not "0.0") of a specific type (e.g., fishing vessels) that are under specific weather conditions (e.g., dew point is less than 272 K and mean wave period is less than 4 seconds), near to a seaport (i.e., spatial distance is less than 150Km) that is also used for passengers (i.e., the description of features of the seaport contains the text "passengers"). The SPARQL 1.
Please note that a more generic query such as "retrieve the type of vessels within any protected region" is considerably more expensive, since it would require the evaluation of the within function on every known vessel position and every known protected area. The generic query remains hard even if the search space is narrowed down by selecting a specific vessel type (the evaluated data set contains 1,882 protected areas and 642,955 reported positions of fishing vessels). We therefore request the positions of a specific vessel type (i.e., Fishing Vessel) that are within a specific protected area (Gule Rev). Other protected areas that can be evaluated are "Skagens Gren og Skagerak" and "Bratten".
Q12: This query extends the previously presented query 11. with a temporal filter, which restricts the results to a temporal interval (e.g., from "2021-01-10 00:00:00" to "2021-01-11 00:00:00"). The SPARQL :hasTemporalFeature/:hasTimeStart ?t . ?tr :hasPart ?node . ?geom ogc:asWKT ?g . FILTER((?t > "2021-01-15 11:00:00") && (?t < "2021-01-15 16:00:00")) . } } FILTER(ex:within(?g, ?pg)) . }
Q13: Query 13. combines the computation of spatial distance between positions of a specific vessel (e.g., :ves259490000_2021) and the position of seaports, with text matching "ship repair" in their descriptions of features. Since the query retrieves the nearest port, the results are sorted in ascending order with respect to the computed method). Table 3 reports the evaluation results, where the first column reports the query number, the second column reports the number of results and the third column reports the processing time, computed as the difference between the time that the first result arrived and the time that the query was submitted.
For the demonstration of the evaluation platform, we have set up a Blazegraph 2.1.6 triple store on a virtual machine providing an 8-core CPU (at 2.1GHz) with 8GB of memory (the maximum heap size allowed for the JVM was set to 6GB). The triple store has been loaded with approximately 340 million triples, as described in Section 3.
We observe that the processing times of the first five queries (easy queries) range between 0.5 and 1.3 seconds, those of queries Q06-Q10 (intermediate queries) vary between 10 and 86 seconds, and those of the remaining queries (hard queries) go up to 4,427 seconds (excluding Q15, which was not completely processed). In the above results, Q15 throws an out-of-memory exception and fails to provide any results.
The set of hard queries is based on spatial and proximity functions, which are not available in SPARQL 1.1 and depend on custom functions implemented in each triple store. Although we consider such queries as hard, we observe that in the case of Q13 the reported processing time is significantly smaller than any other reported in this category. The reason is that there is a small number of candidate triples that match the triple patterns in this query, as they are filtered by the URI of the vessel (:ves259490000_2021 :hasTrajectory ?v1 .). This wide divergence of the recorded processing time is explained by the small number of results for this query.

CONCLUDING REMARKS
This paper provides a benchmark for querying spatio-temporal RDF data using real-life data sources from the maritime domain. Apart from the integrated RDF data under a common schema, the benchmark provides a set of 15 SPARQL queries of varying complexity. The queries are grouped based on their expected difficulty level into three groups: plain graph patterns, complex graph patterns, and complex graph patterns combined with spatial/temporal functions. In addition, we present evaluation results using a specific open-source RDF triple store, namely Blazegraph. We make all the material publicly available: the integrated data set in the form of RDF triples (https://zenodo.org/record/7102043), the queries, and the evaluation results. In our future work, we intend to expand the evaluation of the benchmark to other RDF triple stores. We believe that others will find our benchmark of interest, as a valuable resource for querying integrated spatio-temporal RDF data.

Figure 1: (left): The selected Natura2000 regions (shaded polygons) located around Denmark and the Scandinavian Peninsula, (middle): The spatial filter applied (orange rectangle) for the selection of surveillance data, (right): Reference points for weather conditions.

Figure 2: The connections between modules of the VesselAI ontology.

Figure 3: The graph pattern of the query that retrieves the types of vessels.

Figure 5: The trajectories of the top-10 most active vessel types in the data set. Green indicates passenger vessels, blue indicates general cargo, yellow indicates chemical cargo, and red indicates dangerous cargo.

Figure 6: The graph pattern of query 06.

Table 2: Queries in the benchmark.
It is an intermediate query, as it extends the property path in the graph pattern of the previous queries. 05. "get all ports visited by vessel V sorted by time": This query provides an overview of the sequence of ports visited by a specific vessel. Please notice that no spatial functionality of the triple store is required in this query, since the departure and
1.1 queries in the next paragraphs. For brevity, we assume for each query the following list of prefixes:

Table 3: Evaluation results on Blazegraph triple store.