Efficient and Resilient Edge Computing: Algorithms, Techniques and Research Opportunities

We are witnessing a huge proliferation of low-cost devices connected in the Internet of Things. Given the large amounts of data generated by these devices at the edge of the network, there is an increasing need to process the data near the network edge in order to meet the strict latency requirements of IoT applications. Edge computing is a promising paradigm to improve the quality of service for such applications by filling the latency gaps between the IoT devices and the typical cloud infrastructures. While Micro Data Centers provide computing resources that are geographically distributed, careful management of these resources near the edge of the network is vital for ensuring efficient, cost-effective and resilient operation of the system while providing low-latency access for applications executing near the network edge. This tutorial provides an introduction to edge computing, introduces the notion of Micro Data Centers, and illustrates the edge computing architecture. We will discuss the algorithms, techniques and design methodologies focusing on efficient and resilient resource allocation for latency-sensitive edge computing applications. Finally, we will go through open research problems in this area and discuss potential directions of future work.


INTRODUCTION
We are witnessing a huge proliferation of low-cost devices connected in the Internet of Things for supporting various applications including Smart Homes, Smart Grid, Smart Buildings, Public Safety and Environment Monitoring, Medical and Healthcare, Agriculture and Breeding, and Industrial Processing. The number of IoT devices is estimated to triple from 8.74 billion in 2020 to more than 25.44 billion by 2030 [8]. Traditional cloud datacenters are typically not located close to the devices and therefore, they become a limiting factor when it comes to latency-sensitive applications. Several applications require low latency computing (Figure 1) [3][5][6]. For instance, augmented reality applications often require a response time of less than 10 ms to achieve a good user experience. Similarly, connected autonomous vehicle applications (collision warning, autonomous driving, traffic efficiency, etc.) have latency requirements of less than 10 ms [4].

Figure 1: Low Latency Applications
The notion of Micro Data Centers (MDCs) in an edge computing platform makes it possible for IoT applications to process data and access computational resources located closer to the endpoints, providing low response time guarantees to latency-sensitive applications that may operate on these platforms (Figure 2). Here, small-scale MDCs that represent ad-hoc and distributed collections of computing infrastructure pose new challenges in terms of management and efficient resource sharing towards achieving a globally efficient resource allocation. Efficient resource allocation for edge computing considering the utility, cost and latency requirements of the applications is central for managing the geographically distributed MDC resources.
For handling many modern workloads, stream processing using edge computing resources is a promising approach to support low latency processing of large-scale data. Both the latency and throughput requirements of modern applications can be met by deploying stream processing applications using geo-distributed edge computing resources. While the approach is promising, most stream processing engines that are designed for cloud environments cannot handle the bandwidth and resource constraints associated with edge computing environments. For example, edge devices (e.g., smart gateways placed near IoT devices) may not be able to handle the same workload as regular servers in a cloud datacenter. Also, the heterogeneous nature of computing and networking resources in geo-distributed edge environments raises additional challenges in optimizing the performance of stream processing applications. This tutorial provides an introduction to edge computing and introduces the challenges associated with managing micro datacenter resources. We will discuss the state-of-the-art techniques and algorithms for resource allocation in edge computing in order to meet the strict latency requirements of IoT applications. The tutorial then focuses on stream processing as a specific application use case and explores the challenges associated with efficient management of edge computing resources and stream processing application scheduling in order to ensure resiliency and performance. We will cover the state-of-the-art solution techniques on optimizing stream processing for edge computing environments. Finally, we will discuss open research problems in this area and explore potential directions of future work.

SCOPE, MOTIVATION, SUMMARY OF TUTORIAL
The tutorial is organized along the following dimensions. We discuss the scope and motivation in each sub-section.

Edge Computing and Resource Allocation in Micro Data Centers
Edge Computing provides an additional layer of infrastructure to fill latency gaps between the IoT devices and the backend computing infrastructure (Figure 3). The notion of Micro Data Centers (MDCs) in an edge computing platform makes it possible for IoT applications to process data and access computational resources located closer to the endpoints, providing low response time guarantees to latency-sensitive applications that may operate on these platforms. A fundamental assumption in existing solutions is a tight coupling of the management of the Edge Computing Infrastructures (ECIs) with the service management performed by Service Providers (SPs), which means that the computational resources present at the edge MDCs are coupled and controlled directly by the SPs [10]. We will demonstrate that such a coupled model for management of ECIs by SPs significantly limits the cost-effectiveness and the opportunities for latency-optimized provisioning of edge infrastructure resources to applications. When the management of the edge computing infrastructure is controlled by the SPs, it results in an increased infrastructure cost and a decrease in the overall utilization of the system, leading to poor cost-effectiveness. We will discuss recent edge resource allocation models that decouple the infrastructure management from service management, enabling the ECIs to be managed by Edge Infrastructure Providers (EIPs). We will discuss how such a decoupled model allows EIPs to establish an Edge Computing Infrastructure Federation (ECIF) to provide resources to the edge computing applications managed by the SPs. We will illustrate that this also results in increased opportunities for resource consolidation and utilization, as the geo-distributed ECIs can be jointly managed and allocated to maximize application utility and to minimize cost [12]. We will also cover the benefits of decentralized management of edge resources [15] and introduce techniques for decentralized allocation of Micro Data Center resources.
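To make the utility/cost/latency trade-off concrete, the following is a minimal, hypothetical sketch (not an algorithm from the tutorial) of a greedy allocator: each application is placed on the feasible MDC that maximizes its utility minus resource cost, subject to the application's latency bound and the MDC's remaining capacity. All class names, fields and values are invented for illustration.

```java
import java.util.*;

// Hypothetical sketch: greedily place each application on the feasible
// micro data center (MDC) that maximizes utility minus cost, subject to
// the application's latency bound and the MDC's remaining capacity.
public class GreedyEdgeAllocator {
    record Mdc(String id, double latencyMs, double costPerUnit, int capacity) {}
    record App(String id, double maxLatencyMs, int demand, double utility) {}

    public static Map<String, String> allocate(List<App> apps, List<Mdc> mdcs) {
        Map<String, Integer> used = new HashMap<>();
        Map<String, String> placement = new LinkedHashMap<>();
        // Serve higher-utility applications first.
        List<App> order = new ArrayList<>(apps);
        order.sort(Comparator.comparingDouble(App::utility).reversed());
        for (App a : order) {
            Mdc best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Mdc m : mdcs) {
                int free = m.capacity() - used.getOrDefault(m.id(), 0);
                // Skip MDCs that violate the latency bound or lack capacity.
                if (m.latencyMs() > a.maxLatencyMs() || free < a.demand()) continue;
                double score = a.utility() - m.costPerUnit() * a.demand();
                if (score > bestScore) { bestScore = score; best = m; }
            }
            if (best != null) {
                used.merge(best.id(), a.demand(), Integer::sum);
                placement.put(a.id(), best.id());
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        List<Mdc> mdcs = List.of(new Mdc("mdc-near", 5, 2.0, 4),
                                 new Mdc("mdc-far", 30, 0.5, 8));
        List<App> apps = List.of(new App("ar", 10, 2, 50),    // latency-critical
                                 new App("batch", 100, 4, 20));
        System.out.println(allocate(apps, mdcs)); // prints {ar=mdc-near, batch=mdc-far}
    }
}
```

In this toy run the latency-critical "ar" application can only be served by the nearby MDC, while the latency-tolerant "batch" job falls back to the cheaper, more distant one; the techniques covered in the tutorial optimize such placements globally across a federation of MDCs rather than greedily.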

Performance Optimization for Stream Processing in Edge Computing
Figure 4 shows an example stream query in Apache Storm:

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("source", new MqttSpout(…), 1);
    builder.setBolt("filter", new SpeedFilterBolt(), 1)
           .shuffleGrouping("source");
    builder.setBolt("groupby", new AccidentAggregatorBolt()
               .withTumblingWindow(Duration.seconds(5))
               .withMessageIdField("source_ts"), 1)
           .fieldsGrouping("filter", "filtered_sp", new Fields("location"));
    builder.setBolt("sink", new PrinterBolt(), 1)
           .fieldsGrouping("groupby", "grouped", new Fields("location"));
    …

In an edge-based stream data processing system, the stream processing platform is responsible for interpreting and managing the user-defined stream processing programs at the network edge. Compared to cloud-based deployment of stream processing applications, edge-based stream processing extends the stream processing ability to the edge computing layer and makes it possible to achieve low-latency stream processing using geo-distributed edge resources. We will discuss various challenges in deploying stream processing applications in edge computing environments. We will show that ensuring low latency is critical for many stream processing applications and it requires new techniques for resource optimization and query processing [11,14]. Moreover, edge-based stream processing applications also benefit from automatic elasticity, as manual tuning incurs significant costs. We will describe the state-of-the-art solutions tackling these challenges. Specifically, we will describe techniques that handle the bottlenecks in stream processing applications that may arise from a lack of computing capacity or network bandwidth in edge environments. Core performance optimization techniques that we will cover include (i) data locality-awareness in joint physical plan generation and operator placement, (ii) load-aware operator placement and (iii) co-flow aware scheduling that considers the network flow dependencies and schedules the flows to retain the co-flow dependencies.
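As a hypothetical illustration of load-aware, locality-preferring operator placement (a sketch, not a technique from any specific system), the following walks an operator chain in order, co-locating each operator with its upstream neighbor when that node has headroom and otherwise falling back to the least-loaded feasible node. The operator names mirror the Figure 4 query; node names, loads and capacities are invented.

```java
import java.util.*;

// Hypothetical sketch of load-aware, locality-preferring operator placement.
public class LoadAwarePlacer {
    public static Map<String, String> place(List<String> chain,
                                            Map<String, Double> opLoad,
                                            Map<String, Double> nodeCapacity) {
        Map<String, Double> nodeLoad = new HashMap<>();
        Map<String, String> placement = new LinkedHashMap<>();
        String upstreamNode = null;
        for (String op : chain) {
            double load = opLoad.get(op);
            String chosen = null;
            // Prefer data locality: stay on the upstream operator's node if it fits.
            if (upstreamNode != null &&
                nodeLoad.getOrDefault(upstreamNode, 0.0) + load
                    <= nodeCapacity.get(upstreamNode)) {
                chosen = upstreamNode;
            } else {
                // Otherwise pick the feasible node with the most headroom.
                double bestHeadroom = Double.NEGATIVE_INFINITY;
                for (String n : nodeCapacity.keySet()) {
                    double headroom = nodeCapacity.get(n)
                            - nodeLoad.getOrDefault(n, 0.0) - load;
                    if (headroom >= 0 && headroom > bestHeadroom) {
                        bestHeadroom = headroom;
                        chosen = n;
                    }
                }
            }
            if (chosen == null) throw new IllegalStateException("No node can host " + op);
            nodeLoad.merge(chosen, load, Double::sum);
            placement.put(op, chosen);
            upstreamNode = chosen;
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> p = place(
            List.of("source", "filter", "groupby", "sink"),
            Map.of("source", 1.0, "filter", 1.0, "groupby", 3.0, "sink", 0.5),
            Map.of("edge-gw", 3.5, "mdc-1", 4.0));
        System.out.println(p);
        // prints {source=mdc-1, filter=mdc-1, groupby=edge-gw, sink=edge-gw}
    }
}
```

Note how the heavy windowed "groupby" operator spills from the MDC to the edge gateway once the MDC runs out of capacity, and the "sink" then co-locates with it; the techniques covered in the tutorial additionally account for network bandwidth and co-flow dependencies rather than CPU load alone.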

Fault-tolerant Stream Processing in Edge Computing
The tutorial will emphasize the importance of fault-tolerance in edge computing, as many IoT applications require both high accuracy and timeliness of results. As edge computing environments are highly dynamic and include less reliable computing and network resources, guaranteeing resilience and fault-tolerance is vital for providing reliable services. In general, the challenges of deploying resilient stream processing applications in edge computing environments are threefold. First, the edge infrastructure consists of a number of unreliable devices (low-profile smart gateways, IoT devices, etc.) and components. Second, applications typically have variable requirements that require the system to optimize the performance of the applications by carefully considering the fault-tolerance requirements and different runtime features (e.g., CPU, memory, network usage and recovery cost). Third, fault-tolerance in existing stream processing engines is based on checkpointing and replaying mechanisms that are not optimized directly for low-latency processing. We will discuss the state-of-the-art methods to combine checkpointing and active replication mechanisms to reduce the recovery cost while meeting the fault-tolerance budget.
We will analyze the challenges of applying adaptive checkpointing and replication methods for stream processing in edge computing. Specifically, operators in stream processing applications have different runtime features, and the recovery cost (time) depends on when checkpointing is applied to each operator, which makes the failure of some operators more expensive than others. We will discuss algorithms and techniques that generate a fault-tolerant physical plan based on the estimated recovery cost of each operator to partially replicate the stream processing application [9,13]. We will also discuss resilience-aware scheduling mechanisms to schedule fault-tolerance components, namely the checkpointing store and active replications.
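The idea of recovery-cost-driven partial replication can be sketched as follows. This is a hypothetical simplification (not the algorithm of [9,13]): operators with the highest estimated recovery cost are actively replicated until a replication budget of extra resource units is exhausted, and the remaining operators fall back to checkpointing. All names, costs and the budget value are invented for illustration.

```java
import java.util.*;

// Hypothetical sketch of recovery-cost-driven partial replication:
// replicate the most expensive-to-recover operators first, within a budget;
// everything not selected relies on checkpointing alone.
public class PartialReplicationPlanner {
    record Op(String id, double recoveryCost, double replicaCost) {}

    public static Set<String> selectReplicated(List<Op> ops, double budget) {
        List<Op> order = new ArrayList<>(ops);
        order.sort(Comparator.comparingDouble(Op::recoveryCost).reversed());
        Set<String> replicated = new LinkedHashSet<>();
        double spent = 0;
        for (Op op : order) {
            if (spent + op.replicaCost() <= budget) {
                replicated.add(op.id());
                spent += op.replicaCost();
            }
        }
        return replicated; // operators not selected are checkpointed
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(
            new Op("source", 1.0, 1.0),
            new Op("groupby", 8.0, 2.0),  // stateful window: expensive to replay
            new Op("filter", 2.0, 1.0),
            new Op("sink", 0.5, 1.0));
        System.out.println(selectReplicated(ops, 3.0)); // prints [groupby, filter]
    }
}
```

Here the stateful windowed operator, whose state would be costly to rebuild by replaying, is replicated first; the resilience-aware schedulers discussed in the tutorial must additionally decide where the replicas and the checkpointing store are placed on unreliable edge nodes.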

Open problems, Research Challenges and Opportunities
Finally, this tutorial will discuss the limitations of the state-of-the-art. We will go through open research problems in this area and discuss potential directions of future work by presenting both the challenges and opportunities. Specific directions of future research we will discuss include (i) research opportunities for automatic and autonomous resource allocation in edge computing environments, (ii) automatic application fine tuning in response to dynamic computing environment changes, and (iii) research opportunities in decentralized resource management for edge computing.

TARGET AUDIENCE AND RELEVANCE TO ICDCN 2024
This tutorial content is tailored to a broad audience at ICDCN 2024, including students and young researchers working in the networking and distributed computing areas. The tutorial does not require any prerequisite knowledge in edge computing besides some basic familiarity with networking and distributed computing concepts. While the content is primarily targeted at academic audiences, the tutorial does include some components that could be of interest to practitioners in this area as well.

TUTORIAL OUTLINE
Table 1 presents the outline of the tutorial and the learning objectives.

Figure 2: Data Processing in Edge MDC

Figure 3: Edge Micro Data Centers

Figure 4: An example stream query