Cost-based Load Balancing of RDF Reasoning in Fog-Computing Environments

Fog computing has attracted growing attention as a distributed computing platform for real-world problems, and semantic web technologies, including RDF (resource description framework), have been used as key enablers for semantic data processing. RDF reasoning allows us to perform high-level reasoning about real-world entities, but RDF reasoning in a distributed environment remains challenging because of the changing nature of the real world, which leads to unbalanced loads among fog nodes and cloud servers. Existing distributed reasoning methods in fog computing cannot cope with such dynamism. This work proposes a dynamic load-balancing method for RDF reasoning in fog environments. Specifically, we devise a cost model of RDF reasoning that considers both processing and network loads, thereby enabling load balancing between fog nodes and the cloud. We conduct a set of experiments to demonstrate the performance of the method.


INTRODUCTION
The Internet of Things (IoT) has been widely accepted as a platform for recent information systems, including smart transportation, smart agriculture, etc. [4,10], where many sensors are connected by wireless networks, making it possible to capture real-world information in real time. Most such systems exploit cloud servers as their backends to collect, aggregate, and analyze the data collected by the edge sensors and to host the business logic. This approach exhibits a scalability issue at the cloud server, where all network traffic is concentrated, because the number of edge sensors reaches hundreds to thousands in large-scale systems [5]. Consequently, the processing and network performance at the cloud server becomes the bottleneck.
To alleviate this problem, fog computers have been introduced [3]. Fog computers are located between the cloud server and the edge devices and perform intermediate aggregation and analysis, reducing the network traffic and computing load at the cloud servers.
In the meantime, recent advances in computer and network performance have allowed edge devices to perform more complicated processes, such as prediction over raw sensor data using machine learning models. Besides, IoT systems must accommodate heterogeneous devices, requiring the integration of heterogeneous data generated by heterogeneous sensors. To enable high-level data representation and interoperability, data semantization [8] using RDF [12] is increasingly used. RDF makes it easier to use existing knowledge bases (KBs) as external information sources for different purposes, such as reasoning, model training, etc.
There have been several works on distributed RDF reasoning in fog computing environments. Earlier works focused on distributed RDF reasoning without considering load balancing, except for a few. [10] proposed the distribution of RDF reasoning between cloud and edge devices. [7] proposed an architecture for the adaptive allocation of RDF reasoning rules in fog environments to minimize latency. Later, our previous work [6] proposed a dynamic load-balancing method that considers the CPU load on fog nodes.
However, none of the existing methods considers the dynamic nature of real networks, where the processing loads on computing nodes and the network load change dynamically according to the status of the real world. Consider a system monitoring moving objects, e.g., citizens or cars. The locations of such objects change over time: they tend to concentrate in city centers during the daytime and move to the suburbs at night. Likewise, network traffic peaks during the daytime, while less traffic is observed at night. Moreover, the processing performance of fog nodes is limited compared to cloud servers. Therefore, we have to consider both the processing load and the network load to simultaneously maximize the throughput and minimize the latency.
To address this problem, we propose a cost-based dynamic load-balancing method for distributed RDF reasoning in fog computing environments. More precisely, we target simple rule-based RDF reasoning and devise a method for dynamically assigning the rules between fog nodes and a cloud server. The proposed cost model estimates both the processing cost of reasoning and the network cost of communication. We also propose a heuristic method to solve the resulting integer linear programming problem. We conduct a set of experiments, using a real system built on Raspberry Pi as well as a simulation, to show that the proposed method can dynamically allocate RDF reasoning rules among fog nodes and cloud computers by considering the load on the fog nodes and the network conditions.
The rest of this paper is organized as follows: Section 2 explains several preliminary concepts and terms necessary for the subsequent discussion. Next, Section 3 reviews the existing related work, followed by the proposed method in Section 4. Section 5 reports the experimental evaluation of this work, and Section 6 concludes this paper and mentions our future work.

PRELIMINARIES

Fog and edge computing
Fog computing is a distributed processing architecture proposed by Cisco Systems, where fog nodes are located in each local area network (LAN) to perform local aggregation and transmit the results to the nodes at higher levels. Figure 1 illustrates the conceptual view of fog computing. The aim is to perform intermediate analysis over the raw data submitted by massive numbers of sensors, reducing the volume of data and the computing load of the higher-level servers. Besides, fog computing can contribute to shorter latency if fog nodes can respond directly to edge computers, because fog nodes are closer to edge computers in the network. However, a fog node's computational capability, i.e., its memory and processors, is usually less powerful than that of the servers at higher levels.
In general, intermediate aggregation can be performed by multiple layers of fog nodes, but we limit our scope to single-level fog nodes. Note also that, in some literature, the term edge computing is used with almost the same meaning as fog computing, e.g., [9]. Although the literature draws no clear distinction between fog and edge computers, in this work we assume that fog nodes are distinct from edge computers and are located between cloud servers and edge computers.

RDF and RDF reasoning
RDF (resource description framework) [12] is a W3C standard for data representation and exchange. A resource is identified by an IRI (internationalized resource identifier). The properties of and relations between resources are described by triples, each comprising a subject, predicate, and object, where the subject, predicate, and object are IRIs; alternatively, a subject can be a blank node, and an object can be a blank node or a literal.
Having represented base facts in RDF, reasoning allows us to infer additional (unknown or implicit) facts from the given base facts using axioms and/or rules. In general, there are two approaches to reasoning: ontology-based and rule-based reasoning. The former exploits ontologies in which basic concepts and the relations among them are defined; representative examples include RDFS [14] and OWL [11]. In the latter approach, we assume that predefined domain-specific rules are given, and we can infer new facts by evaluating those rules. A simple example is that a car is detected as dangerous when its speed is 20 km/h or more above the limit, yielding a new triple as the result of reasoning.
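As an illustration only (the triple vocabulary and threshold below are hypothetical, not taken from the paper's rule set), such a rule can be sketched over subject-predicate-object triples:

```python
# Hypothetical sketch of rule-based reasoning over RDF-style triples:
# the rule's condition is matched against the base facts, and a new
# triple is added as the inferred fact.

def infer_dangerous(triples, margin=20):
    """Derive (car, 'status', 'dangerous') when speed >= limit + margin."""
    speed = {s: o for (s, p, o) in triples if p == "speed"}
    limit = {s: o for (s, p, o) in triples if p == "speedLimit"}
    inferred = []
    for car in speed:
        if car in limit and speed[car] >= limit[car] + margin:
            inferred.append((car, "status", "dangerous"))
    return inferred

facts = [("car1", "speed", 85), ("car1", "speedLimit", 60),
         ("car2", "speed", 55), ("car2", "speedLimit", 60)]
print(infer_dangerous(facts))  # only car1 exceeds its limit by 20 km/h or more
```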
This work targets rule-based reasoning because it is powerful enough to cope with a wide range of real-world applications. There are different ways to describe reasoning rules, but in this work, we assume that rules are defined in SPARQL [13], which allows us to describe complicated reasoning rules using different operations, e.g., aggregation. Specifically, the condition of a rule is described by WHERE and FILTER clauses, while the resulting triples are described by the CONSTRUCT clause.
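For example, the speeding rule mentioned earlier might be written in SPARQL roughly as follows (the vocabulary is hypothetical; this is a sketch, not one of the paper's actual rules):

```sparql
# Hypothetical rule: flag a car as dangerous when its speed exceeds
# the speed limit by 20 km/h or more (vocabulary is illustrative).
PREFIX ex: <http://example.org/traffic#>
CONSTRUCT { ?car ex:status ex:Dangerous . }
WHERE {
  ?car ex:speed ?speed ;
       ex:speedLimit ?limit .
  FILTER (?speed >= ?limit + 20)
}
```

The CONSTRUCT clause describes the triples produced when the WHERE/FILTER condition matches.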

RELATED WORK

Distributed reasoning in fog environments
With the recent advances in the performance of edge and fog devices, several works have attempted to distribute the RDF reasoning process to edge or fog nodes, thereby improving the latency at the edge.
Su et al. [10] analyzed the performance improvement obtained by distributing RDF reasoning between cloud servers and edge devices. They employed the traffic system of a smart city as the use case and evaluated the performance using 16 reasoning rules for inferring high-level traffic events, e.g., right turns, traffic congestion, etc., from taxi trajectories containing geographic location, speed, direction, etc. Besides, they compared the time for transmission and reasoning with four types of RDF formats. They found that distributing RDF reasoning rules to edge computers improves the overall performance, but they only considered the static allocation of reasoning rules.
Seydoux et al. [7] tried to address the drawback of Su et al. by introducing the dynamic distribution of RDF reasoning rules between cloud servers and fog nodes and proposed an architecture for it. They targeted fog environments where fog nodes form multiple layers rather than a single layer and proposed performing RDF reasoning at the fog nodes that are closest to the edge and have the capability to process the rules, improving response time, processing speed, and scalability. However, they did not consider the differences in load among fog nodes, making it hard to cope with dynamic load changes. Since the processing capability of fog nodes is, in general, far more limited than that of cloud servers, performance degrades if fog nodes are overloaded.
Kokubo et al. [6] proposed a dynamic load-balancing method for fog environments. More precisely, they monitored the CPU load on the fog nodes and redistributed RDF reasoning rules from a fog node to the cloud server if the node was detected as overloaded, achieving better performance even when loads change dynamically. Its drawbacks are that 1) it does not consider the network load, and 2) it requires a predefined threshold for detecting overloaded nodes in terms of CPU load, which is, in general, difficult to find.

IoT task scheduling
Azizi et al. [2] proposed a couple of semi-greedy algorithms for task scheduling in heterogeneous fog environments. IoT task scheduling generally aims to minimize response time and/or energy consumption. Their work pursues yet another goal: optimizing energy consumption while managing task deadlines. If no task schedule can meet the required deadline, they choose the node that minimizes the total violation time, which distinguishes their work from existing ones.
Yadav et al. [15] addressed the problem of task scheduling in vehicular fog computing, where computing resources on vehicles are exploited as fog nodes. Specifically, they formulated the task scheduling problem as a cost-optimization problem in which latency and power consumption are minimized and proposed a heuristic approach called ECOS, consisting of the following three steps: 1) detection of overloaded nodes, 2) determination of tasks to be offloaded, and 3) determination of nodes to offload to. The evaluation is based on a simulation, and they showed that the proposed method could successfully reduce power consumption and latency while maintaining the requirements.
The above works targeted horizontal task distribution among fog nodes, while our work considers vertical task distribution between fog and cloud. This makes sense because cloud servers are much more powerful than fog nodes and can therefore absorb more load.

PROPOSED METHOD
In this section, we propose a novel cost-based dynamic load-balancing method for fog environments that considers both processing and network loads. First, we describe the proposed cost model for estimating the cost of processing RDF reasoning rules. Then, we formulate the problem as a cost-minimization problem based on the cost function and present a dedicated heuristic algorithm for the problem.

Target scenario
In this work, we adopt reasoning about city status in a smart city, e.g., traffic, air pollution, etc., as our target scenario. Fog nodes installed in traffic signals and streetlights serve as computing resources that collect data and perform reasoning to infer high-level information from the low-level data captured by sensor nodes and smartphones at the edge. Specifically, to evaluate the system, we target reasoning about pollution levels with respect to different materials, allowing the detection of polluted areas and notification of the citizens. The details can be found in Section 5.1.
The system is expected to provide low latency in some applications, e.g., traffic safety. Besides, the computational capability of fog nodes is usually far more limited than that of cloud servers, and the load on fog nodes often varies significantly over time. For these reasons, dynamic load balancing is required.

Architecture
Figure 2 illustrates an overview of the proposed system. The system comprises IoT nodes, fog nodes, and cloud servers. The communication between IoT and fog nodes is implemented using UDP sockets, while MQTT is used between fog nodes and cloud servers. MQTT is a lightweight protocol that supports bidirectional multi-point communication and has been employed in many IoT systems. One of its features is that it supports several QoS (quality of service) levels; we used QoS 2, which ensures data is transmitted exactly once, to avoid losing the control data used for identifying fog nodes and measuring execution time.
The role and configuration of each layer are as follows. An IoT node collects data, performs the initial processing in the data collector, and converts the data into RDF format with the RDF annotator; it then outputs the resulting RDF data. A fog node aggregates RDF data from IoT nodes and performs reasoning using a semantic reasoner implemented with Apache Jena. It also has a knowledge base used by the semantic reasoner and a reasoning distributor that outsources reasoning rules to balance the load. The cloud server collects RDF data from fog nodes and performs additional reasoning with its RDF reasoner and knowledge base. It also has a reasoning distributor to balance the load and an MQTT broker, implemented with mosquitto, that mediates the publish/subscribe communication in MQTT.

System model
This section describes the system model used to derive our cost model. Specifically, we model the time for reasoning and communication, which allows us to model the notification time for end users.

Reasoning time.
We formulate the total time for reasoning at a fog node f and a cloud server c as follows:

T_n = Σ_r x_{n,r} · t_{n,r},  n ∈ {f, c},

where t_{n,r} represents the time required to process rule r at node n, and x_{n,r} is a binary variable (0 or 1) indicating whether or not node n processes rule r, i.e., x_{n,r} = 1 means node n processes rule r, while x_{n,r} = 0 means the opposite. In this work, we consider load balancing between fog nodes and cloud servers; hence, n = f (or n = c) means processing at a fog node (or a cloud server, resp.). In general, fog nodes are less powerful than cloud servers and prone to be affected by changes in load, motivating dynamic load balancing between fog nodes and cloud servers. We monitor the load level at a fog node by its CPU usage rate and model the performance degradation caused by the processing load as the reasoning time of each rule, t_{f,r}, incremented by the delay due to the load:

t_{f,r} = t0_{f,r} + α · l_f,

where t0_{f,r} denotes the reasoning time of rule r when the CPU load is 0%; α denotes the constant effect of the CPU load on the reasoning time; and l_f ∈ [0, 1.0] denotes the CPU load. These parameters can be determined for each node if the fog nodes differ in performance, while the same set of parameters can be applied if the nodes are homogeneous. Note that, in this work, we do not consider the load level on cloud servers because they are much more performant than fog nodes.
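As a minimal sketch of this model (the additive load term follows the description above; all parameter values are illustrative assumptions):

```python
def rule_time_at_fog(t0, alpha, cpu_load):
    """Reasoning time of one rule at a fog node: base time at 0% CPU load
    plus a load-proportional delay (cpu_load in [0.0, 1.0])."""
    return t0 + alpha * cpu_load

def total_reasoning_time(times, assignment):
    """Sum the per-rule times of the rules assigned to this node
    (assignment[r] plays the role of the 0/1 variable x_{n,r})."""
    return sum(t for t, x in zip(times, assignment) if x == 1)

# Three rules at a fog node under 50% CPU load (illustrative numbers).
times = [rule_time_at_fog(t0, alpha=0.4, cpu_load=0.5) for t0 in [0.1, 0.2, 0.3]]
total = total_reasoning_time(times, [1, 0, 1])  # rules 1 and 3 on the fog node
print(round(total, 3))  # prints 0.8
```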

Communication time.
We model communication time as the sum of the propagation delay and the transmission delay. The propagation delay is caused by the physical distance between the terminals and can be long if the cloud server is located in a distant country. Meanwhile, the transmission delay is determined by the size of the data being transmitted, i.e., large data and/or a narrow network bandwidth result in a long transmission delay.
The following formula models the communication time between nodes i and j:

d_{i,j} = d^prop_{i,j} + d^trans_{i,j},  d^trans_{i,j} = s_{i,j} / b_{i,j},

where d^prop_{i,j} denotes the propagation delay, and d^trans_{i,j} denotes the transmission delay, calculated by dividing the data size s_{i,j} by the network bandwidth b_{i,j}.
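A small sketch of this formula (the numbers reuse the fog-cloud link values reported in Section 5 but are otherwise illustrative):

```python
def communication_time(prop_delay_s, data_size_bits, bandwidth_bps):
    """Communication time = propagation delay + transmission delay,
    where the transmission delay is data size divided by bandwidth."""
    return prop_delay_s + data_size_bits / bandwidth_bps

# 17 ms propagation delay, 1 Mbit of data over a 34 Mbps link.
t = communication_time(0.017, 1_000_000, 34_000_000)
print(round(t * 1000, 1), "ms")  # prints 46.4 ms
```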

Notification time.
The notification time is measured from when an IoT node receives data until the data are processed at either a fog node or a cloud server and the reasoning result arrives back at the fog node with which the originating IoT node is associated. There are two paths depending on the node where the reasoning is performed. In Figure 3, the notification time of data reasoned at a fog node f follows the path colored purple and blue, while that of data reasoned at a cloud server c follows the path colored purple and orange. Thus, T_f and T_c can be represented by the following formulas using the propagation and transmission delays:

T_f = d_{e,f} + T^reason_f,
T_c = d_{e,f} + d_{f,c} + T^reason_c + d_{c,f},

where d_{i,j} is the communication time defined above, and T^reason_n = Σ_r x_{n,r} · t_{n,r} with the binary assignment x_{n,r} s.t. x_{n,r} = 1 means node n processes rule r, while x_{n,r} = 0 means the opposite. Note that each rule is assumed to be processed at either fog or cloud exactly once, i.e., x_{f,r} + x_{c,r} = 1. Note here that T_f and T_c increase with the number of rules processed at node n. If we transmitted the reasoning results individually, the performance would degrade due to massive communication of small data. Besides, the semantic reasoner needs a reasoning model corresponding to the set of rules to reason over. For these reasons, the fog nodes and the cloud servers process all assigned rules in a batch and transmit the results to the next node/server, increasing the reasoning time on a node with many reasoning rules to process. Thus, we have to care not only about the assignment of a rule itself but also about the other rules assigned to the same node, because each assignment affects the notification time of the rest.
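Under this model, the two notification paths can be sketched as follows (all delay and reasoning-time values are illustrative; each d argument stands for the propagation-plus-transmission delay of one hop):

```python
def notify_via_fog(d_edge_fog, t_reason_fog):
    """T_f: one hop from IoT node to fog, then reasoning at the fog node."""
    return d_edge_fog + t_reason_fog

def notify_via_cloud(d_edge_fog, d_fog_cloud, t_reason_cloud, d_cloud_fog):
    """T_c: extra round trip between fog and cloud around the reasoning step."""
    return d_edge_fog + d_fog_cloud + t_reason_cloud + d_cloud_fog

d_ef, d_fc = 0.017, 0.037  # per-hop delays in seconds (illustrative)
t_f = notify_via_fog(d_ef, t_reason_fog=0.5)
t_c = notify_via_cloud(d_ef, d_fc, t_reason_cloud=0.05, d_cloud_fog=d_fc)
print(round(t_f, 3), round(t_c, 3))  # here the cloud path wins for this rule
```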
As this problem is constrained to assign integer values to x_{n,r}, it is classified as an integer programming problem, which is NP-hard to optimize.
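Written out under the notation above, the optimization takes roughly the following integer-programming shape (a reconstruction for clarity, not the paper's verbatim formulation):

```latex
\min_{x} \; \sum_{r=1}^{N} \bigl( x_{f,r}\, T_{f} + x_{c,r}\, T_{c} \bigr)
\quad \text{s.t.} \quad
x_{f,r} + x_{c,r} = 1, \qquad
x_{n,r} \in \{0, 1\}, \quad r = 1, \dots, N, \; n \in \{f, c\},
```

where T_f and T_c themselves depend on the whole assignment x, since all rules placed on the same node are reasoned over in one batch.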

Cost-based allocation algorithm.
To optimize the cost, we propose a heuristic algorithm. The basic approach is to prioritize the reasoning rules according to their estimated processing time and assign the rules requiring longer processing time to cloud servers. This is because we can expect a larger improvement by moving a rule with a longer processing time to a cloud server. Having fixed the order of the reasoning rules, we do not have to examine all possible combinations, reducing the solution space from 2^N to N + 1, where N is the number of rules.
Algorithm 1 shows our optimization algorithm based on the above idea. It takes as inputs the constant effect of the CPU load on the reasoning time α, the number of rules N, the lists of reasoning times at the fog and the cloud for each rule, sorted in descending order, t_{f,1}, . . ., t_{f,N} and t_{c,1}, . . ., t_{c,N}, and the notification times of a rule processed at the fog (cloud), T_f (T_c); it outputs the optimal assignment X that minimizes the cost among the N + 1 candidate assignments. First, we measure the CPU usage ratio at the fog node and compute the additional latency to be added to the processing time (Lines 1-2). Next, we compute the estimated cost of the N + 1 candidates by reducing the number of rules assigned to the cloud (Lines 3-14). Lastly, we output the assignment with the least cost as the result (Line 22).
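A simplified sketch of the heuristic follows (the cost function here is a stand-in that takes the slower of the two batched paths, whereas the paper's algorithm uses the notification-time model above; all names and values are illustrative):

```python
def assign_rules(fog_times, cloud_times, net_delay, alpha, cpu_load):
    """Evaluate N+1 candidate splits: rules are sorted by fog-side reasoning
    time (descending), and the i longest-running rules are offloaded to the
    cloud. Returns the best split as (rules_at_cloud, rules_at_fog)."""
    n = len(fog_times)
    # Sort rule indices by fog-side reasoning time, longest first.
    order = sorted(range(n), key=lambda r: fog_times[r], reverse=True)
    best_cost, best_split = float("inf"), None
    for i in range(n + 1):  # i = number of rules offloaded to the cloud
        cloud, fog = order[:i], order[i:]
        # Batched reasoning: each node's time is the sum of its rules' times,
        # with the fog side slowed down by the current CPU load.
        t_fog = sum(fog_times[r] + alpha * cpu_load for r in fog)
        t_cloud = sum(cloud_times[r] for r in cloud) + (net_delay if cloud else 0.0)
        cost = max(t_fog, t_cloud)  # simplified cost: slower of the two paths
        if cost < best_cost:
            best_cost, best_split = cost, (cloud, fog)
    return best_split

cloud, fog = assign_rules(fog_times=[0.5, 0.1, 0.3],
                          cloud_times=[0.05, 0.01, 0.03],
                          net_delay=0.2, alpha=0.4, cpu_load=1.0)
print(sorted(cloud), sorted(fog))  # under full CPU load, everything offloads
```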
This algorithm is executed in the reasoning distributor (Figure 2). Each fog node initializes the following parameters when the system starts: 1) the processing time of each reasoning rule, sorting the rules accordingly, and 2) the constant effect of the CPU load on the reasoning time. Then, it measures the propagation and transmission latencies at the fog (and similarly at the cloud). The cloud server performs a similar initialization and notifies the fog nodes of its reasoning times for the rules. When the system runs for a long time, these parameters are updated periodically, making it possible to add or remove reasoning rules dynamically.

EXPERIMENTS
We conducted a set of experiments to evaluate the proposed method.

Experimental environment and setups
We implemented an IoT node on a MacBook Pro, fog nodes on Raspberry Pi 4, and a cloud server on an Amazon Elastic Compute Cloud (EC2) m5.2xlarge instance in the Tokyo region.
Table 4 shows the set of parameters. Each fog node manages at most five IoT nodes, resulting in a maximum of 25 IoT nodes.
For timing measurements, we ran ten trials and computed the average of eight results, removing the maximum and minimum values. Besides, we used tc to control the network latency, and ping and iPerf to measure the propagation delay and the network bandwidth. The concrete values in the experimental environment were 7 ms (IoT-fog) and 17 ms (fog-cloud) for the propagation delays and 18 Mbps (IoT-fog) and 34 Mbps (fog-cloud) for the network bandwidth.
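The trimmed averaging used for the measurements can be sketched as (the sample values are illustrative):

```python
def trimmed_mean(samples):
    """Average of the measurements after dropping the single maximum and
    single minimum value (ten trials -> average of the remaining eight)."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    trimmed = sorted(samples)[1:-1]
    return sum(trimmed) / len(trimmed)

trials = [102, 98, 97, 150, 101, 99, 100, 95, 103, 100]  # latencies in ms
print(trimmed_mean(trials))  # prints 100.0 (the outliers 95 and 150 are dropped)
```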

Algorithm 1 Determination of rule assignment based on cost
Input: fog load variation α, number of rules N, reasoning times per rule at fog and cloud t_{f,1}, . . ., t_{f,N} and t_{c,1}, . . ., t_{c,N}, notification times for rules processed by fog and cloud T_f and T_c. Output: assignment X. // Assign rules up to the I-th to the fog.

Dataset and rules for reasoning
As for the dataset, we employed the one provided by the City Pulse project [1], which offers smart-city-related data, e.g., traffic, weather, events, etc., along with related software. Specifically, we used synthetic air pollution data containing the sensor, geographical location (latitude and longitude), acquisition date and time, target material, and AQI (air quality index). AQI is an index that represents the degree of air pollution, e.g., 0 to 33 means Very Good, 34 to 66 means Good, etc. The data size was 3,000 triples/IoT node and, in total, 75,000 triples at the maximum.
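The AQI banding can be sketched as follows (only the first two bands are stated in the text; the behavior above 66 is a hypothetical placeholder):

```python
def aqi_band(aqi):
    """Map an AQI value to its qualitative band. The first two bands follow
    the text; band names above 66 are not given, so a placeholder is used."""
    if 0 <= aqi <= 33:
        return "Very Good"
    if aqi <= 66:
        return "Good"
    return "Worse than Good"  # hypothetical placeholder band

print(aqi_band(20), "/", aqi_band(50))  # prints Very Good / Good
```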
Table 5 shows the list of RDF reasoning rules. The reasoning time of each rule was measured in the experimental environment described above. Reasoning at the cloud was roughly 10x faster than at the fog nodes; for this reason, we estimated the reasoning time at the cloud to be 1/10 of that at the fog node. Besides, rules R1 to R6 were processed by Apache Jena's rule engine, while R7 and R8 were processed by Jena's SPARQL processor (ARQ).

Comparison of time for notification
In this experiment, we measured the performance with 1) different numbers of IoT nodes (denoted as IoT1, IoT3, and IoT5), 2) different CPU loads, and 3) different delays on the IoT-fog and fog-cloud communications, injected as latencies (0 ms, 100 ms, or 200 ms) in the TCP/IP driver of each machine.
Figure 4 compares the notification times. It depicts different combinations of delays and CPU loads in a 3-by-3 matrix of bar charts, where the rows and columns correspond to the delays (0 ms to 200 ms) and the CPU loads (0% to 100%), respectively. Each chart's horizontal axis corresponds to the number of IoT nodes, and the vertical axis represents the average notification time. Proposed1 (blue) considers only the CPU load at the fog nodes, corresponding to [6]. Proposed2 (orange) is the proposed scheme considering both CPU load and network conditions. Table 6 shows the result of the cost-based rule assignment, where each cell represents the assignment using a notation like "F6C2," meaning that six rules are assigned to the fog and the rest to the cloud.
Regardless of the CPU load and network status, Proposed1 and Proposed2 showed linearly increasing notification times as the number of IoT nodes increased, due to the increased reasoning load from the IoT nodes. When the network latency was 0 ms, the notification time of Proposed1 decreased significantly because it considered only the CPU load and more rules were assigned to the cloud server; e.g., all rules were assigned to the cloud server when the load was 100%. Since the network latency in the experimental environment was small, assigning all rules to the cloud resulted in a shorter notification time without much additional latency. On the other hand, Proposed2 achieved better performance than Proposed1 by assigning all rules to the cloud server regardless of the CPU load (Table 6). With an induced network latency of 100 ms, the two methods showed no significant difference. This was due to the balance between the performance gain from outsourcing reasoning rules to the cloud and the increased network latency, resulting in similar rule assignments between the methods. With an induced network latency of 200 ms, Proposed2 performed better than Proposed1 when the CPU load was 100%, while Proposed1 performed better with a CPU load of 0%. When the CPU load was 100%, Proposed1 attempted to assign all rules to the cloud regardless of the network latency, leading to a longer response time. In contrast, Proposed2 successfully found better assignments by considering both the CPU load and the network latency. According to Table 6, even with a long network latency (200 ms), some rules were assigned to the cloud, although the best strategy was to perform all reasoning on the fog nodes. The reason is that rules with long reasoning times tend to be assigned to the cloud, and the difference in processing speed between cloud and fog was significant (10x).

Notification time with different # of fog nodes
Figure 5 shows the response time with five IoT nodes when varying the number of fog nodes. We can observe that increasing the number of fog nodes slightly affected the notification time due to the additional cost at the cloud of processing requests from the fog nodes. With a network latency of 0 ms, there was no difference because all rules were assigned to the cloud. For the cases of 100 ms and 200 ms, we can see that there was almost no effect caused by the increased network latency.

Notification time analysis
Figure 6 shows the breakdown of the total notification time, evaluating the impact of the time for rule assignment. From the bottom, it shows 1) the transmission time from IoT to fog, 2) the time for rule assignment, 3) the reasoning time at the fog, 4) the transmission time from fog to cloud, and 5) the reasoning time at the cloud, with one IoT node and one fog node. From the figure, we can see that the time for the assignment is sufficiently short.

CONCLUSIONS
We have proposed a cost-based optimization method for distributed rule-based RDF reasoning in fog environments. We introduced a cost model considering CPU load and network latency and proposed a heuristic algorithm to optimize the average notification time. The experimental evaluation showed that the proposed method successfully finds rule assignments that achieve shorter notification times by considering the CPU and network statuses. Our future work includes testing the method in a larger environment. Besides, we plan to cope with more complex reasoning rules.

Figure 2: An overview of the proposed system.

Figure 4: Comparison of notification times of proposed methods 1 and 2.

Figure 5: Change in notification time when the number of fog nodes was changed.

Figure 6: Breakdown of notification time.

Table 5: Rule details.