Energy-Efficient Resource Management for Real-Time Applications in FaaS Edge Computing Platforms

Edge computing and Function-as-a-Service (FaaS) are two emerging paradigms that enable timely analysis of data in the proximity of cyber-physical systems and users. FaaS platforms deployed at the edge require resource management and allocation mechanisms to schedule function execution and to scale the available resources so as to ensure the proper quality of service to applications. Large-scale deployments will also require mechanisms to control the energy consumption of the overall system, to ensure long-term sustainability. In this paper, we propose a technique to schedule function invocations on edge resources that powers down idle edge nodes during periods of low demand. In doing so, our technique aims at reducing the overall energy consumption without incurring service level agreement violations. Experimental evaluations on synthetic and real-world datasets demonstrate that, w.r.t. different baselines, the proposed approach reduces service level agreement violations by at least 78.1% and energy consumption by at least 62.5% on average.


Introduction
In the era of rapidly advancing technology and ever-increasing demand for real-time data processing, a new paradigm known as edge computing has emerged as a promising solution. Edge computing represents a shift from the traditional centralized computing model, where data is processed in a remote data center or cloud, to a decentralized approach that brings computational resources closer to cyber-physical systems and users [8,17] to allow for faster data processing and improved efficiency. Instead of sending all data to a central location for processing, edge computing supports data analysis on edge nodes installed at the edge, thus enabling quicker insights and allowing for rapid actions [6,11]. This new paradigm is particularly relevant in the context of Internet of Things (IoT) applications, where the analysis of the data generated by IoT devices must often be performed within a certain timeframe, e.g., in autonomous vehicles, remote healthcare monitoring, industrial automation, and smart city applications [9,10,16]. By bringing computational resources closer to the point of action, edge computing not only enables rapid decision-making but also enhances the overall system performance [5,11]. This new paradigm, however, is not intended to replace cloud computing entirely but rather to complement it, creating a distributed architecture that optimizes the flow of data between the edge and the cloud [8].
Concurrently with edge computing, a novel service paradigm has emerged to improve the flexibility and efficiency of data analysis: the Function-as-a-Service (FaaS) model. FaaS offers developers the opportunity to run code in the form of discrete, stateless functions triggered by specific events, without worrying about the underlying infrastructure management [13]. FaaS is an ideal service model for edge computing: by distributing function execution to edge nodes, quicker response times can be ensured for event-triggered data analysis, which is crucial for applications requiring rapid decision-making.
In the context of large distributed computing platforms, efficient workload assignment and auto-scaling of available resources are two critical aspects to maximize performance and optimize resource utilization. In FaaS platforms, the former can be achieved by assigning function execution to the most suitable devices based on factors like proximity, resource availability, and computational capabilities. In edge computing platforms, auto-scaling can be performed by adding or removing nodes, a task that is crucial for the overall system efficiency [5,11], as it ensures that the proper amount of resources is available, thus avoiding under-utilization or overloading of edge devices.
In response to the increasing demand for processing, however, significant progress can still be made by integrating energy consumption considerations into task assignment for workloads on edge nodes. This advancement facilitates identifying and selecting the most suitable edge node to run functions for each workload demand, considering both processing capability and energy consumption levels. The latter in particular is vital to lower the energy and CO2 footprint while still ensuring the required Quality of Service (QoS) [2,13].
Our work designs a model for resource provisioning and execution of user functions that explicitly addresses energy use. In Section 4, we introduce a model that minimizes energy by smartly assigning workloads to edge nodes. For automatic scaling, we employ a dynamic approach, powering off idle nodes during low load and activating them during heavier workloads. The proposed approach is assessed considering a realistic use case of image analysis applications for video surveillance, where images are analyzed by invoking functions. Our performance evaluation, based on simulations, employed both synthetic and real workload traces, and the performance of the proposed approach is compared against other policies. The results show that our proposal reduces the overall system energy consumption while still ensuring the required QoS.
The remainder of this article is structured as follows: Section 2 presents an overview of related works. Section 3 describes the system model. In Section 4, our proposed solution is detailed. Its performance evaluation, together with that of other solutions, is then discussed in Section 5. Finally, Section 6 concludes the paper.

Related works
Auto-scaling is essential in cloud environments and encompasses three common strategies: horizontal scaling adjusts capacity by adding or removing instances/nodes, with each handling a portion of the workload; vertical scaling resizes existing resources to match demands; and hybrid scaling combines both. In FaaS platforms, scheduling is in addition a crucial component to ensure proper resource management, as it manages the actual execution of functions on the nodes of the platform. In the following, we provide a concise overview of related works, presenting first the auto-scaling methods proposed for edge computing platforms (Section 2.1) and then scheduling solutions in FaaS (Section 2.2). We conclude the section by highlighting the novelty of our contribution w.r.t. the literature in Section 2.3.
2.1 Auto-scaling approaches at the edge

2.1.1 Horizontal auto-scaling approaches
Lee et al. [7] propose an enhanced auto-scaling method for service management in a Mobile Edge Computing (MEC) environment. A crucial aspect of this approach is making accurate scaling decisions regarding the components to scale and where to apply these decisions. To address this, the proposed method incorporates a Deep Q-Network (DQN) model for selecting scaling actions based on the given state and a decision model that complements the DQN model. The decision model ensures that scaling actions are applied to the appropriate locations, considering factors such as QoS, operating costs, and resource availability. Silva et al. [15] present a method for horizontal auto-scaling at the network edge using online machine learning. Their approach employs the MAPE-K control loop architecture to adapt container numbers to workload changes. This method not only dynamically adjusts scaling based on real-time workload fluctuations but also transitions to proactive scaling once the prediction model reaches optimal performance, allowing it to predict and initiate scaling actions preemptively.

2.1.2 Vertical auto-scaling approaches
The Q-Learning algorithm's limitation of selecting actions from a restricted action space prevents it from achieving precise control. To address this, Gan et al. [4] introduce a vertical auto-scaling algorithm that extends the Proximal Policy Optimization (PPO) method to a continuous action space, enabling enhanced control. PPO is a reinforcement learning algorithm renowned for optimizing policies in sequential decision-making tasks, striking a balance between stability and sample efficiency.
Li et al. [8] focus on optimizing dynamic auto-scaling and adaptive service placement in edge computing environments. Their approach starts by representing the architecture of edge computing and microservice-based applications as graphs. The objective is to minimize request delays while taking into account resource and bandwidth limitations. To address this problem, the paper proposes a dynamic multi-stage auto-scaling model that incorporates workload prediction for microservices and evaluates the performance of edge nodes.
2.1.3 Hybrid auto-scaling approaches
Rossi et al. [12] examine a model where a black-box application serves requests. To manage heavier workloads, multiple concurrent instances can be created using containers, with each independent instance meeting response time requirements. Dynamic resource allocation handles varying workloads for performance, though elasticity could lead to downtime. Rzadca et al. [14] introduce "Autopilot," a machine learning-driven approach that optimizes resource scaling for requests. Autopilot reduces resource underutilization and the risk of request termination by forecasting vertical resource limits from historical data and employing rule-based horizontal scaling techniques. Vozmediano et al. [17] concentrate on responsive threshold-based methods, enabling users to set scaling rules using performance metrics. Edge nodes and cloud sites independently track metrics, triggering scaling actions if thresholds are breached or unmet within set time intervals. Horizontal scaling adds/removes a specified number of virtual machines (VMs) at the location, while vertical scaling adjusts resources within each VM based on workload. This approach suits small and medium-sized edge computing platforms.

2.2 Scheduling in FaaS
To efficiently distribute workloads and alleviate potential overload on edge devices within an edge environment, Ciavotta et al. [2] devised a decentralized FaaS architecture. This approach enables the authors to distribute function execution across all edge devices, mitigating the risk of nodes becoming overloaded. In their article, the authors employ predictive modeling to forecast the incoming workload for the upcoming time slot, calculating the resource requirements for various classes of functions. Russo et al. [13] introduce a comprehensive approach aimed at enhancing control over scheduling and resource allocation within FaaS platforms. Their strategy addresses the challenge of managing various services catering to diverse user classes. To accomplish this, the authors employ a First-Come First-Served methodology, ensuring the fulfillment of QoS requirements.

2.3 Our contribution
The literature on FaaS at the edge offers solutions for ensuring QoS in general, but none of them considers the problem of energy efficiency. To address this crucial aspect, we present a novel model designed to tackle the energy challenge. Our model aims to minimize energy consumption by calculating the amount of energy required for function execution. By doing so, we create a list of edge nodes along with their energy consumption levels, enabling energy-efficient function placement. To achieve efficient request handling, this study further formulates an optimization model to solve this problem effectively. Moreover, to enhance auto-scaling efficiency, our approach powers down idle edge nodes when they are not in use and dynamically adds more edge devices to provision resources whenever the demand increases.

System model
The overall architecture of a FaaS system deployed at the edge consists of a two-tier model, namely the Devices tier and the Edge tier, as Figure 1 shows. The Devices tier comprises a set of devices (such as smartphones, IoT devices, cameras, etc.) that generate events. An event could be data generated from a sensor or a physical occurrence that requires the execution of a certain code for data analysis or reaction on the edge nodes of the Edge tier. These edge nodes are capable of executing functions. In our system model, we assume that functions are triggered periodically, as each function is associated with a device/user that generates a continuous stream of events.
Two mechanisms are required to manage the Edge tier: a function scheduling mechanism that selects the node for the execution of a function, and an auto-scaling mechanism that manages the amount of resources available on the system at any time. In this paper, we propose a scheduling mechanism that selects the node for the execution of a function so as to ensure the required QoS while minimizing the energy consumption. To reach this objective, the proposed approach selects a node that has sufficient resources to guarantee the required QoS; however, it gives priority to nodes that are more energy efficient, i.e., capable of handling functions with lower energy consumption. In addition, an auto-scaling approach is adopted to minimize the overall energy consumption of the platform. To this aim, we propose an auto-scaling mechanism that powers down idle edge devices during low loads and turns them on again when more resources are required.
To illustrate the considered system model, we consider a practical example: a video analysis application implemented on a FaaS platform. In this scenario, cameras continuously generate images to be analyzed. Every image is treated as an event and triggers the execution of a function, which analyzes the single frame or a set of frames and produces some results, e.g., the objects that are present.
It is important to highlight that, although this use case is considered in the performance evaluation, the proposed model is applicable without changes to other scenarios employing FaaS edge computing platforms.

Energy-Efficient Resource Management
In this section, we present the proposed Energy-Efficient Resource Management mechanism (EERM). EERM includes both a scheduling mechanism to distribute function execution on different edge devices and an auto-scaling mechanism to turn on and off the nodes available on the system. The former enables faster and more efficient processing, reducing overall latency and improving system performance; the latter aims at minimizing the overall energy consumption.

The edge environment consists of multiple heterogeneous computing devices (edge nodes) which are responsible for running a set of functions. To ensure efficient resource allocation, each request must be assigned to an appropriate edge node. The selection criterion for determining the suitable node for a given request is the minimization of energy consumption: the goal is to identify a node that can execute functions while consuming the least amount of energy. The resource allocation is performed periodically and a continuous stream of events is considered.
We address two major issues in FaaS platforms at the edge by managing function execution and energy usage of edge devices. The main objective is to assign requests to nodes that possess sufficient processing capacity while maintaining energy efficiency to minimize overall energy consumption. The capacity of an edge node refers to the number of processing requests that the node can accept. To achieve higher efficiency, the system powers down nodes that are idle; conversely, whenever the system requires additional resources to run more functions, the idle nodes are reactivated.
The formulation of the model is based on the following definitions. There is a set $N$ of edge nodes, representing the collection of available computing devices at the edge. Additionally, there is a set $R$ of requests for function execution that need to be processed. Let $w_i$ denote the computational workload of the $i$-th request, expressed as a number of processing requests, and let $e_j$ denote the energy consumed by the $j$-th node for processing a single workload unit. Based on these definitions, the energy $E_{ij}$ consumed by workload $w_i$ processed on node $j$ is given by:

$$E_{ij} = w_i \, e_j \, x_{ij} \quad (1)$$

where $x_{ij}$ is a decision variable that is defined as:

$$x_{ij} = \begin{cases} 1 & \text{if request } i \text{ is assigned to node } j \\ 0 & \text{otherwise} \end{cases} \quad (2)$$

The total energy consumption $E$ to process all workloads is:

$$E = \sum_{i \in R} \sum_{j \in N} w_i \, e_j \, x_{ij} \quad (3)$$

Therefore, the goal is to minimize the total energy consumption due to processing workloads on edge nodes. Given the above details, the objective function is:

$$\min E \quad \text{s.t.} \quad \sum_{j \in N} x_{ij} = 1 \;\; \forall i \in R, \qquad \sum_{i \in R} w_i \, x_{ij} \le c_j \;\; \forall j \in N \quad (4)$$

where $c_j$ represents the capacity of edge node $j$, i.e., the number of requests that node $j$ can process. Regarding the above objective function, each request must be assigned to exactly one edge node, and the sum of the workloads assigned to node $j$ must not exceed the capacity of that node.

After receiving the current workload $w_i$ of request $i$ in the Edge tier, the goal is to assign this request to the most efficient edge node $j$ for executing its functions. To achieve this, the proposed algorithm employs a three-stage approach. In the first stage, the algorithm identifies nodes within the network that possess sufficient remaining capacity to handle the incoming request effectively. In the second stage, the system generates a list of eligible edge nodes with adequate capacity to accommodate the computational request $w_i$, and, to determine the most suitable node for processing request $i$, calculates the energy consumption for each node in the list using expression (1). The decision variable $x_{ij}$ defined in expression (2) comes into play at this point: as the request must be assigned to exactly one edge node, this decision variable identifies the most efficient node capable of accepting and processing the request. In the third stage, the system enhances overall efficiency by managing energy through horizontal auto-scaling of edge nodes. This energy management strategy ensures that, during periods of low load, idle nodes are powered down, conserving energy. Conversely, when the load is high and more edge devices are required, the algorithm activates idle nodes to provision additional resources, ensuring efficient performance.
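The three-stage procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the node records, their field names, and the energy and capacity values are assumptions (only the 16 F/S Jetson Nano figure of 0.0028 J per unit comes from the evaluation section).

```python
# Sketch of the three-stage EERM loop. Node records are illustrative
# assumptions: "free" is remaining capacity, "energy_per_unit" is e_j.

def schedule_request(workload, nodes):
    """Stages 1-2: shortlist nodes with enough free capacity, then pick
    the one that executes the request with the least energy (w_i * e_j)."""
    eligible = [n for n in nodes if n["on"] and n["free"] >= workload]
    if not eligible:
        # Stage 3, scale-up: reactivate a powered-down node if one fits.
        off = [n for n in nodes if not n["on"] and n["free"] >= workload]
        if not off:
            return None  # underprovisioned request (SLA violation)
        node = min(off, key=lambda n: n["energy_per_unit"])
        node["on"] = True
        eligible = [node]
    best = min(eligible, key=lambda n: workload * n["energy_per_unit"])
    best["free"] -= workload
    return best

def power_down_idle(nodes):
    """Stage 3, scale-down: switch off nodes with no assigned workload."""
    for n in nodes:
        if n["on"] and n["free"] == n["capacity"]:
            n["on"] = False
```

For example, with a Jetson-Nano-like node (capacity 16, 0.0028 J per unit) and a larger node at an assumed 0.0050 J per unit, a 10-unit request lands on the cheaper node, after which the idle larger node is powered down.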
We employ PuLP, a well-known open-source Python library for Linear Programming modeling.PuLP provides a user-friendly interface for formulating and solving linear programming problems, and it is compatible with various solvers including CPLEX [18].
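As a minimal illustration of how the model (1)-(4) can be expressed in PuLP, the sketch below builds the binary assignment variables, the energy objective, and the two constraint families for a toy instance. The workloads, energies, and capacities are made-up values, and the CBC solver bundled with PuLP is used here for self-containment, whereas the paper relies on CPLEX.

```python
import pulp

# Toy instance: three requests and two nodes (all numbers are made up).
w = [8, 12, 5]           # w_i: workload of request i
e = [0.0028, 0.0050]     # e_j: energy per workload unit on node j
c = [16, 32]             # c_j: capacity of node j
R, N = range(len(w)), range(len(e))

prob = pulp.LpProblem("eerm", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (R, N), cat="Binary")  # x_ij as in (2)

# Objective: minimize E = sum_i sum_j w_i * e_j * x_ij, as in (3)-(4).
prob += pulp.lpSum(w[i] * e[j] * x[i][j] for i in R for j in N)
# Each request must be assigned to exactly one node.
for i in R:
    prob += pulp.lpSum(x[i][j] for j in N) == 1
# Workload assigned to node j must not exceed its capacity c_j.
for j in N:
    prob += pulp.lpSum(w[i] * x[i][j] for i in R) <= c[j]

prob.solve(pulp.PULP_CBC_CMD(msg=0))
```

On this instance the solver packs requests 0 and 2 (total workload 13) onto the cheaper node and sends request 1 to the larger one.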

Performance evaluation
In this section, we carry out the performance evaluation of the proposed method by means of simulations. To this aim, the specific use case of image analysis presented in Section 3 is considered to ensure a more realistic evaluation. The objective of the simulations is to address several crucial research questions, which are as follows:
RQ.1 Which method can be employed to effectively minimize Service Level Agreement (SLA) violations?
RQ.2 Which method minimizes the energy consumption associated with request processing?
RQ.3 Which auto-scaling approach offers the quickest resource provisioning computation in terms of run time?
The remainder of the section is organized in two sub-sections. In Section 5.1 we present the terms of comparison adopted, the simulation methodology, and the real-world dataset used, while in Section 5.2 we present the results of the experiments and discuss them w.r.t. the research questions.

Experimental setup
In our experiments, we compared the proposed approach with the Smallest-First (S-F), Largest-First (L-F), and Energy-Aware (E-A) function scheduling approaches, as well as with the approach presented by Silva et al. [15] as a baseline technique. The approach proposed in Silva et al. [15] is specifically tailored for the use case of image analysis at the edge and aims at enhancing resource efficiency in the edge environment. To this aim, it dynamically scales resources in response to workloads, making it an apt choice for comparative analysis. The Smallest-First technique selects the edge nodes for function allocation in ascending order of processing capacity, giving priority to nodes with lower capacity when executing functions. Conversely, the Largest-First technique sorts edge devices in descending order, prioritizing nodes with higher capacity for requests to run functions. In the Energy-Aware approach, requests are assigned to nodes that exhibit more efficient energy consumption during workload processing; this method tries to assign workloads to nodes with higher capacity and lower energy consumption, to achieve better resource allocation with respect to the energy consumed by edge devices for function execution.
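The three simple baselines can be sketched as greedy first-fit policies over differently ordered node lists. This is our reading of the descriptions above, under an assumed node representation, not the original implementations.

```python
# Sketches of the S-F, L-F, and E-A baselines: first-fit assignment over
# nodes ordered by capacity (ascending/descending) or per-unit energy.

def _first_fit(workload, ordered_nodes):
    """Assign the request to the first node with enough free capacity."""
    for n in ordered_nodes:
        if n["free"] >= workload:
            n["free"] -= workload
            return n
    return None  # underprovisioned request

def smallest_first(workload, nodes):
    """S-F: try nodes in ascending capacity order."""
    return _first_fit(workload, sorted(nodes, key=lambda n: n["capacity"]))

def largest_first(workload, nodes):
    """L-F: try nodes in descending capacity order."""
    return _first_fit(workload,
                      sorted(nodes, key=lambda n: n["capacity"], reverse=True))

def energy_aware(workload, nodes):
    """E-A: prefer the node with the lowest per-unit energy."""
    return _first_fit(workload,
                      sorted(nodes, key=lambda n: n["energy_per_unit"]))
```

Unlike the proposed approach, none of these policies powers nodes up or down; they only choose among the nodes that are already available.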
All simulations were conducted in an ad-hoc simulator written in Python. As detailed in Section 4, the proposed approach is implemented using the CPLEX solver through the PuLP library.
The simulations were run using two datasets, namely Madrid and Synthetic, which are well-suited for the scenario involving image analysis and were provided in [15]. The Madrid dataset reports workload demands from traffic measuring points in Madrid, Spain, from September to October 2021. Each point provided the corresponding load to analyze an image and count the number of vehicles detected. According to the information provided in [15], the dataset includes approximately 30 million vehicle detections on a monthly basis.
The Synthetic dataset, also provided by the authors of [15], was generated using TimeSynth, an open-source library for synthetic time series generation. These synthetic time series are used to simulate image analysis requests for vehicle license plate recognition. Specifically, during the experiments, images of vehicles are captured at specific frames per second (F/S) to recognize license plates.
Different edge nodes with different capabilities are considered in the experiments. Table 1 summarizes the various edge devices considered, including their capabilities. For instance, the Jetson Nano, the cheapest one, can handle 16 F/S and consumes just 0.0028 J of energy for this processing. The simulations employed 17 edge nodes in total, with the following composition: 10 Nvidia Jetson Nano, 5 Nvidia Jetson Xavier NX, and 2 Nvidia Jetson AGX Xavier devices [15]. The simulations were executed on Windows 11 with an 11th Gen. Intel Core i5-11400H 2.70 GHz CPU and 16 GB RAM.

Results
To address RQ.1, we present the SLA violation percentages across both the Madrid and Synthetic datasets in Figure 2. SLAs, which define agreed-upon service levels between providers and customers, are measured in terms of underprovisioned requests w.r.t. the total request volume.SLA violations occur when insufficient resources are allocated to execute functions [1], thus the corresponding requests are underprovisioned.From Figure 2, our proposed solution exhibits the smallest average underprovisioning percentage, i.e., ≈ 0.1%, on both datasets.
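The SLA metric used above is a simple ratio, and the reported reduction follows directly from the failure counts. The short sketch below reproduces both figures from the numbers reported in this section (the function names are ours).

```python
def sla_violation_pct(underprovisioned, total):
    """SLA violations as underprovisioned requests over total volume."""
    return 100.0 * underprovisioned / total

def reduction_pct(baseline, proposed):
    """Relative reduction achieved by the proposed approach."""
    return 100.0 * (baseline - proposed) / baseline

# Reported counts: 6 of 5856 requests fail on Madrid (~0.1%), and on
# Synthetic the best competitor E-A fails 32 requests vs. our 7.
madrid_violations = sla_violation_pct(6, 5856)   # ~0.1%
sla_reduction = reduction_pct(32, 7)             # 78.125%, i.e. "at least 78.1%"
```

This confirms that the "at least 78.1%" figure corresponds to the Synthetic-dataset comparison against E-A, the closest competitor.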
On average, our proposed approach consistently outperforms the alternative techniques by a significant margin, reducing the number of SLA violations by at least 78.1% w.r.t. the baselines. In fact, our proposed method fails to process only 6 requests for the Madrid dataset and 7 requests for the Synthetic dataset, out of 5856 and 5857, respectively. The best competitor, E-A, fails to process 32 requests for the Synthetic dataset and 224 for the Madrid dataset.
In response to RQ.1, it becomes evident that other methods, except for our proposed solution, exhibit similar ratios due to underprovisioning of resources.In contrast, our approach consistently aligns resource allocation with the requirements for executing functions at the edge, demonstrating its effectiveness in meeting SLAs.
To answer RQ.2, Figure 3 presents the average energy consumption for request processing. To conclude on RQ.2, the proposed solution, by combining its two mechanisms, finds the most energy-efficient way to execute functions at the edge and manages the energy usage of edge devices by powering them off during low loads. As can be seen in Figure 3, this technique significantly improves this metric compared to the other approaches, by at least 62.5% w.r.t. the baselines. The proposed solution, in fact, consumes 78.47 for processing requests with the Synthetic dataset and 148.10 for the Madrid dataset. In terms of energy consumption, the most competitive technique is E-A, which requires 471.14 to process requests on the Synthetic dataset and 395.01 on the Madrid dataset.
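The "at least 62.5%" claim can be checked directly from the energy figures reported above for E-A, the best competitor (the function name is ours; units are those of Figure 3).

```python
def reduction_pct(baseline, proposed):
    """Relative energy reduction of the proposed approach w.r.t. a baseline."""
    return 100.0 * (baseline - proposed) / baseline

# Energy figures reported above for E-A vs. the proposed approach:
madrid = reduction_pct(395.01, 148.10)     # ~62.5% (Madrid dataset)
synthetic = reduction_pct(471.14, 78.47)   # ~83.3% (Synthetic dataset)
```

The minimum of the two reductions is the Madrid figure, which is where the "at least 62.5%" bound comes from.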
Figure 4 shows the average run time for the execution of the considered approaches, i.e., the time required to compute the allocation of the functions to be executed, which addresses RQ.3. As expected, the proposed approach is the one that results in the longest execution time. While the other approaches can be executed in the order of milliseconds, the proposed approach requires almost a second to run. This can be explained by the fact that the proposed approach requires solving a linear programming model via a solver. This longer execution time, however, yields a more accurate solution that is beneficial in terms of the other metrics previously presented in this section. To conclude on RQ.3, the proposed solution results in a longer run time than the competitors, and is thus suitable only when the allocation can be computed in advance, as in the case of periodic function invocation considered in this paper; for real-time invocation scenarios, faster approaches would be preferable.

Conclusion
In this paper, we investigated the management of FaaS functions on edge computing platforms. In particular, we focused on the energy consumption of existing solutions, and we proposed a new strategy for selecting the most convenient edge devices to handle functions efficiently while keeping the required quality of service. Our proposed solution packs function execution requests onto the edge resources to minimize the energy consumption, while satisfying their computational capacities, so as to minimize SLA violations. The experimental evaluation demonstrated that the proposed method reduces service level agreement violations by at least 78.1% and significantly lowers energy consumption by at least 62.5% on average w.r.t. the considered baselines.
For future works, we plan to design new strategies for real-time scheduling of function invocations, instead of batch processing.Moreover, we will target more complex execution platforms, where available edge resources are not known beforehand, but may appear and disappear dynamically.

Fig. 1. System model.

Table 1. Types of edge devices (nodes) and their capabilities.
Fig. 2. Average ratio of underprovisioning requests for all techniques [15].

Our proposed solution focuses on efficient, energy-aware function distribution. In comparison to the alternative techniques applied to both datasets, our approach achieves a substantial reduction in energy consumption. As illustrated in the bar chart, the work by Silva et al. [15] (Baseline) does not consider the energy usage of edge nodes, so it consumes the highest amount of energy for resource provisioning compared to the other techniques. The Smallest-First and Largest-First techniques concentrate only on the capacity of edge nodes, while the Energy-Aware technique focuses on energy consumption but still does not match the proposed approach.