Multi-Objective Optimisation of Container Orchestration Systems

The ever-increasing global demand for optimized resource utilization, energy efficiency, and rapid response times in cloud computing environments necessitates innovative approaches for resource management. Traditional cloud systems predominantly cater to large-scale infrastructures, often neglecting smaller-scale environments, such as edge computing infrastructures. In response to this gap, this paper introduces a novel Multi-Objective Stochastic Gradient Descent (MOSGD) approach designed to enhance the efficiency of application placement beyond the cloud and closer to the edge of the network. The MOSGD optimisation addresses two crucial objectives: energy consumption and execution time. We model and integrate the two conflicting objectives into a unified cost function to minimise resource consumption and response time. To validate the MOSGD approach, we deployed it in a real-life environment using the Carinthian Computing Continuum infrastructure as the target platform. The results exhibit significant performance enhancements compared to two conventional methods, indicating an improvement of up to 80% in energy efficiency and up to a 30% reduction in execution time. These outcomes underscore the potential of the MOSGD approach to outperform traditional techniques in scenarios where large-scale procedures may exhibit suboptimal performance.


INTRODUCTION
The rapid evolution of multiple societal and industrial branches owes much of its progress to cloud computing, a concept rooted in virtualization technologies dating back to the 1960s. By enabling computation over the Internet, cloud computing has ushered in a new era, embracing cutting-edge technologies like container orchestration systems and playing a pivotal role in managing Big Data. This transformation has introduced groundbreaking solutions for both industry and scientific endeavours, offering resources and adaptability that have driven down costs and fostered collaboration across diverse domains.
Despite its advantages, cloud computing has complexities, including high latency and energy inefficiency. The geographical separation between cloud data centres and users often results in delays, diminishing the efficiency of accessing computational resources, particularly in real-time applications like autonomous driving. These latency challenges are pervasive in both larger and smaller-scale environments, necessitating innovative approaches for resolution.
To mitigate latency concerns, edge computing was introduced, enhancing cloud computing capabilities by positioning services closer to users and processing data in close proximity to the computational units [1]. The primary goal of edge computing is to reduce latency in traditional cloud systems. However, this augmentation exacerbates the heterogeneity and complexity of the overall cloud infrastructure. With many devices and technologies deployed at the network's edge, deploying and managing applications become increasingly arduous, especially in the context of Big Data. This heightened complexity calls for novel solutions to address the distributed nature of cloud and edge computing.
This paper introduces the Multi-Objective Stochastic Gradient Descent (MOSGD) algorithm, an approach designed to optimize specific objectives in cloud and edge environments when assigning applications to physical machines using container orchestration tools like Kubernetes [2]. Efficient placement becomes increasingly critical given the diverse conditions of participating machines and the heterogeneity within the cloud and edge ecosystem. MOSGD is tailored to small-scale scenarios, offering trade-off solutions for potentially conflicting objectives like energy consumption and processing speed. The approach centres on optimizing the placement of applications on suitable devices, extending MOSGD to accommodate discrete problems and incorporating penalty procedures for handling constraints.
The paper has seven sections. We first survey the related work in Section 2. Afterwards, we present a formal model for our method in Section 3. We describe the container placement optimisation architecture in Section 4 and the experimental design in Section 5. Section 6 elaborates on the evaluation results, and Section 7 concludes the paper.
RELATED WORK

To begin with, in [3], the authors introduce a job scheduling algorithm for federated cloud computing clusters. The deep reinforcement learning-based job scheduler addresses the decision-making problem of scheduling jobs in a data centre, which is considered NP-complete. The authors point out that common approaches, from simple strategies up to complex meta-heuristics like genetic algorithms, require many manual configurations and settings to be adjusted.
In the study presented by [4], Mixed Integer Linear Programming (MILP) is applied to the problem of scheduling or placing pods in Kubernetes. This approach explicitly targets the dual objectives of minimizing both makespan and tardiness. They provide a detailed description of the model used, outlining the specific system conditions and constraints considered in their work. To solve the problem, they employ the CPLEX solver [5], an industry-standard optimization tool.
The work in [6] introduces a weight-based multi-objective GA, which simultaneously focuses on optimizing objectives like makespan and flow time. The authors highlight innovative aspects of their algorithm, such as eliminating similar chromosomes to promote diversity and employing a local search based on a defined dominance function to enhance solution quality. The evaluation indicates that the algorithm can discover solutions close to the optimal trade-offs, i.e., near the optimal Pareto front. It proved to be robust and effective, even in cases where Pareto fronts were disconnected or non-convex.
The work [7] proposes an adapted version of particle swarm optimization called Load Balancing Mutation Particle Swarm Optimization. The procedure addresses task scheduling in cloud systems. The goal of the algorithm is to keep the system reliable, based on available resources and the rescheduling of tasks that failed to allocate their requests. The factors considered are time, cost, and scalability. Compared to the standard PSO and the Longest Cloudlet to Fastest Processor (LCFP) algorithm, the results clearly show improvements in execution time and transmission costs.
Another similar approach, introduced in [8], uses a multi-objective optimization model to optimize the scheduling of microservices. The approach optimizes the scheduling by considering three objectives: network transmission, load balancing, and service reliability. The model is created based on the physical node resources, the relationships between services, their requirements, and the system architecture itself. The authors define microservices with selected properties and represent them as a tuple of the service set and their relationships.
Limitation. The current approaches are tailored explicitly for consolidated cloud data centres and do not consider the high heterogeneity of edge devices. Furthermore, they perform energy prediction based on synthetic data and cannot use direct energy monitoring information. Lastly, they do not consider dynamic container/pod placement strategies.

MODEL
This section presents a formal model of the optimization objectives, decision variables and the stochastic gradient descent optimization approach.

Objective Functions
For optimization purposes, we define two conflicting objectives, namely, energy consumption and execution time.

Energy consumption.
The energy consumption of a given resource (node) j can be modelled as:

E_j = E_idle,j + ε_j × Σ_c (E_full,j − E_idle,j) × u_{c,j}

where u_{c,j} is the utilization percentage of component c (e.g., CPU, memory, I/O) on node j, E_idle,j and E_full,j are the idle and full-load energy consumption of the node [mW], and ε_j is a factor describing the energy efficiency of node j. The term (E_full,j − E_idle,j) × u_{c,j} provides the energy consumption due to the utilization of each component c: the utilization percentage is calculated for all components and multiplied by the remaining energy usage range and by the node's energy-efficiency factor.
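The energy model above can be sketched as follows; a minimal illustration assuming per-component utilization shares in [0, 1] and a single efficiency factor (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

# Sketch of the energy model above, under stated assumptions:
# E_j = E_idle + eps * sum_c (E_full - E_idle) * u_c
# with u_c the utilization share of component c (CPU, memory, ...).
def node_energy(e_idle, e_full, utilization, eps=1.0):
    """Energy [mW] of one node given component utilizations (0..1)."""
    u = np.asarray(utilization, dtype=float)
    return e_idle + eps * (e_full - e_idle) * u.sum()

# Example: node idling at 2000 mW, 5000 mW at full load,
# CPU and memory each at 25% utilization.
print(node_energy(2000.0, 5000.0, [0.25, 0.25]))  # -> 3500.0
```

An idle node thus always costs its idle energy; only the range between idle and full load scales with utilization.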

Execution time.
We define the execution time objective as:

T_j = (N × CPI_j) / f_j

The equation calculates the time based on the computational size N: the number of operations multiplied by the cycles per instruction, divided by the clock frequency, where:
• N: number of operations to be performed
• CPI_j: cycles per instruction on node j
• f_j: clock frequency of node j
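A minimal sketch of this objective, with the reconstructed formula above as the assumed form:

```python
# Sketch of the execution-time objective under stated assumptions:
# T_j = (N * CPI_j) / f_j, with N operations, CPI_j cycles per
# instruction on node j, and f_j the node's clock frequency [Hz].
def execution_time(n_ops, cpi, clock_hz):
    return (n_ops * cpi) / clock_hz

# Example: 1.5e9 operations at CPI 2 on a 1.5 GHz edge node.
print(execution_time(1.5e9, 2.0, 1.5e9))  # -> 2.0 (seconds)
```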

Decision variables
The decision variables are defined in a matrix X, where each decision variable x_{i,j} ∈ {0, 1}. x_{i,j} equals 1 if pod i is placed on node j, and 0 otherwise.

Multi Objective Stochastic Gradient Descent Optimisation
Gradient descent, similar to line fitting in linear regression, aims to minimize a cost function [9]. The procedure needs the direction of descent and the so-called learning rate, denoted as alpha, which describes the step size toward the desired result. If the learning rate is too high, the risk of overshooting the minimum is considerable; if it is too low, execution times grow and convergence may not be reached within the iteration count.

For our optimisation problem, the objective functions are combined into a weighted cost function. The calculated gradient of the cost function points in the direction in which the function increases, so the decision variables are moved in the opposite direction, as the term gradient descent implies.

Applying this continuous optimization algorithm to the discrete placement problem requires specific steps. Given an initial decision variable matrix X, the values indicating whether pod i is placed on node j are relaxed to continuous variables. Gradient descent is applied, and the decision matrix is then de-relaxed: the value nearest to 1 in every row indicates the node on which the pod, containing an application, is placed. Additionally, the defined constraints must be included in the setup; a penalty concept therefore excludes infeasible solutions. The whole procedure is explained in detail in the following paragraphs.
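The relax → gradient step → de-relax cycle described above can be sketched as follows; the gradient values and all names here are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def mosgd_step(x, grad, alpha=0.1):
    """Move the relaxed decision variables against the gradient
    and clip them back into [0, 1]."""
    return np.clip(x - alpha * grad, 0.0, 1.0)

def derelax(x):
    """Round each row to a one-hot vector: the pod goes to the node
    whose relaxed variable is closest to 1."""
    hard = np.zeros_like(x)
    hard[np.arange(x.shape[0]), np.argmax(x, axis=1)] = 1.0
    return hard

# Example: 2 pods x 3 nodes, one gradient step then de-relaxation.
x = np.full((2, 3), 1.0 / 3.0)        # relaxed starting assignment
grad = np.array([[0.5, -0.2, 0.1],    # pretend cost gradient
                 [0.1, 0.4, -0.3]])
placement = derelax(mosgd_step(x, grad))
print(placement)  # -> pod 0 on node 1, pod 1 on node 2
```

In practice the step is repeated until convergence; de-relaxation happens once at the end to obtain a discrete placement.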

Gradient Calculation.
The gradient of a scalar field is a vector that describes the direction and rate of the steepest rise of the scalar function. The components of the gradient, one per variable of the function, specify the rate of change along the individual axes. The gradient vector consists of the first partial derivatives of the function f(x_1, …, x_n):

∇f = (∂f/∂x_1, …, ∂f/∂x_n)

Every element is a partial derivative of the function with respect to one variable: all variables remain constant except for the corresponding variable, which changes slightly.

Central difference method.
The central difference method is a finite difference method used to approximate derivatives. It approximates the derivative by computing the function value at an offset h in both directions:

∂f/∂x_{i,j} ≈ (f(x_{i,j} + h) − f(x_{i,j} − h)) / (2h)

This derivative, or gradient component, represents the sensitivity of the total cost to a small change in the assignment of pod i to node j, and is used to update the assignment in the direction that decreases the cost.

Constraints.
We define the constraints for the problem as follows. Each pod (group of applications) must be placed on one and only one node (cloud or edge container):

Σ_j x_{i,j} = 1, for every pod i
The sum of resources required by all pods placed on a node must not exceed the available resources of the node:

Σ_{i=1}^{P} x_{i,j} × u_{i,j} ≤ C_j, for every node j

where:
• P: total number of pods
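The central-difference gradient and the penalty-based constraint handling can be sketched together; a minimal illustration in which the cost function, the penalty weight mu, and all names are assumptions rather than the paper's implementation:

```python
import numpy as np

def penalty(x, usage, capacity, mu=100.0):
    """Penalty terms for the two constraints: each pod on exactly
    one node, and node capacity not exceeded."""
    one_node = np.sum((x.sum(axis=1) - 1.0) ** 2)
    load = x.T @ usage                      # per-node resource load
    overload = np.sum(np.maximum(load - capacity, 0.0) ** 2)
    return mu * (one_node + overload)

def penalized_cost(x, usage, capacity):
    """Stand-in weighted cost: total load plus constraint penalties."""
    return float(np.sum(x * usage[:, None]) + penalty(x, usage, capacity))

def central_diff(f, x, i, j, h=1e-4):
    """Central-difference gradient component for variable x[i, j]:
    df/dx_ij ~ (f(x + h*e_ij) - f(x - h*e_ij)) / (2h)."""
    x[i, j] += h
    up = f(x)
    x[i, j] -= 2 * h
    down = f(x)
    x[i, j] += h          # restore the original value
    return (up - down) / (2 * h)

# Example: 2 pods x 2 nodes; a feasible placement has zero penalty.
usage = np.array([0.4, 0.5])
capacity = np.array([1.0, 1.0])
x = np.array([[1.0, 0.0], [0.0, 1.0]])
print(penalty(x, usage, capacity))  # -> 0.0
g = central_diff(lambda m: penalized_cost(m, usage, capacity), x, 0, 1)
print(round(g, 6))  # -> 0.4
```

A violated constraint inflates the cost quadratically, so the gradient steps steer the relaxed variables back toward feasible placements.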

ARCHITECTURE
The conceptual architecture, depicted in Figure 1, shows the essential components of the MOSGD optimization process. We start the optimisation process by configuring the runtime environment, which includes the configuration of the applications, containers, pods, and the monitoring system. The runtime configuration and monitoring data are then provided for pre-processing. This step includes normalisation of the monitoring data and relaxation of the decision variables related to the configuration of the applications and the pods. Afterwards, the normalised and pre-processed data are provided to the stochastic gradient descent algorithm. In this step, we perform the actual optimisation by computing the gradient over the candidate solutions. When the algorithm converges to a near-optimal solution, it provides the output data for post-processing in a format suitable for the Kubernetes container and pod placement system.
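The normalisation step in the pre-processing stage can be sketched as follows; min-max scaling is an assumed choice here, and the function name is illustrative:

```python
import numpy as np

# Sketch of the pre-processing step: min-max normalisation of raw
# monitoring data so energy and time values share a common scale.
def normalise(values):
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

# Example: per-node energy readings in mW mapped to [0, 1].
print(normalise([2000.0, 3500.0, 5000.0]).tolist())  # -> [0.0, 0.5, 1.0]
```

Scaling both objectives to a common range keeps the weighted cost function from being dominated by the objective with the larger raw magnitude.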

EXPERIMENTAL DESIGN

Evaluation testbed
We used the Carinthian Computing Continuum (C3) to deploy our evaluation testbed [10]. Concretely, we set up a Kubernetes cluster over the edge layer of the C3. In Table 1, we present the utilized devices.

Software and network configuration
The Raspberry Pi (3, 3B+ and 4B) devices are grouped in a Kubernetes cluster and are configured with fixed IP addresses within the range assigned by the cluster router (48-port HP Aruba). Next, we enable memory control groups by editing cmdline.txt to include cgroup_enable=memory, a prerequisite for utilizing MicroK8s Kubernetes [11]. Lastly, we configure the deployment across the cluster using DaemonSets, which allow us to control on which nodes instances of a pod are executed. All edge devices in the Kubernetes cluster run Raspberry Pi OS version 11.
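As a sketch, the cmdline.txt edit can be performed as follows; the flags shown are the common Raspberry Pi cgroup parameters (including cgroup_memory=1, typically required alongside cgroup_enable=memory) and should be verified against the MicroK8s documentation for the OS release in use:

```shell
# Append the memory cgroup flags to cmdline.txt; all boot parameters
# must remain on a single line. Flags assumed from the common
# Raspberry Pi / MicroK8s setup; verify for your OS release.
sudo sed -i '$ s/$/ cgroup_enable=memory cgroup_memory=1/' /boot/cmdline.txt
sudo reboot
```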

Use case application
We created synthetic use case applications enclosed in pods with different resource requirements regarding CPU cores, operating memory and Input/Output (I/O).The variation in intensity and combination of the utilization levels approximates real-world situations in the synthetic scenario.The synthetic use case application comprises three nodes, each calculating Fibonacci numbers to stress the CPU cores, doubling the size of a single variable to stress the operating memory, and constantly performing I/O operations to simulate network load.
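The three stressors can be sketched as follows; a minimal illustration in which sizes and function names are illustrative, not the paper's actual workload parameters:

```python
import os
import tempfile

def fib(n):                      # CPU stress: naive recursion
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def grow_memory(doublings):      # memory stress: repeatedly double a buffer
    buf = bytearray(1024)
    for _ in range(doublings):
        buf = buf * 2
    return len(buf)

def io_stress(rounds):           # I/O stress: repeated write/read cycles
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
    for _ in range(rounds):
        with open(path, "wb") as f:
            f.write(os.urandom(4096))
        with open(path, "rb") as f:
            f.read()
    os.remove(path)
    return rounds

print(fib(20), grow_memory(10), io_stress(3))  # -> 6765 1048576 3
```

Varying the recursion depth, doubling count, and I/O rounds per pod approximates the different intensity levels described above.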

Related pod placement approaches
We compare MOSGD with two related approaches.

Kubernetes pod placement (Default). The approach implements a procedure to optimize and balance resource utilization in the cluster. The Kubernetes scheduler includes multiple factors that handle different aspects of the cluster, including load balancing and resource utilisation.

Integer Linear Programming pod placement (IP). We created a pod placement approach and scheduler based on an integer linear programming approach. The approach utilizes a weighted-sum method to optimise two objectives, namely the execution time and the energy requirements.
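The weighted-sum scalarization used by the IP baseline (and by MOSGD's combined cost function) can be illustrated as follows; the weights and the assumption of pre-normalised objectives are illustrative choices:

```python
# Weighted-sum scalarization of the two objectives. w_energy and
# w_time are illustrative weights; both objectives are assumed to be
# pre-normalised to a common [0, 1] scale.
def weighted_cost(energy, time, w_energy=0.5, w_time=0.5):
    return w_energy * energy + w_time * time

print(weighted_cost(0.2, 0.8))  # -> 0.5
```

Shifting the weights trades one objective against the other, which is how a single scalar optimiser can explore the energy/time trade-off.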

Evaluation scenarios
For evaluation purposes, we created three specific scenarios to assess the scalability and the performance of MOSGD and the related approaches, as depicted in Table 2. Avoiding nodes leads to an energy consumption drop equal to the idle energy of the avoided nodes, roughly between 2 W and 3 W per node. The Default placement focuses on balancing resource utilization; therefore, all nodes are utilized by this strategy, which leads to maximum balance in resource usage. The time diagram for the execution time and energy requirements of the small-scale scenario can be observed in Figure 3. MOSGD outperforms the Default Kubernetes scheduler by 15% and the IP by 28% in terms of execution time. In terms of energy consumption, however, the IP and MOSGD perform similarly, with an improvement of over 80% compared to the Default scheduler, which induced higher execution times and energy consumption. It is important to note that the energy consumption objective varies strongly throughout the process due to the possibility of taking nodes offline, especially with such a small number of pods. We present the time diagram for the medium-scale scenario for the MOSGD and IP approaches in Figure 5. The performance of the MOSGD approach is better than the IP, even though MOSGD requires more computational resources for the optimisation process.

Large-Scale
The last scenario is presented in Figure 6. The results show that MOSGD outperforms both the IP and the Default approach on the two objectives, by up to 5% and 20%, respectively. The higher energy consumption of the Default scheduler can be attributed to the additional nodes it utilizes for the placement. We present the timing diagram for the optimisation objectives in the large-scale scenario in Figure 7. The two compared strategies achieve proper results. The results of the MOSGD are slightly better, especially over longer execution periods, extending beyond 800 seconds.

CONCLUSION
This paper introduces a novel stochastic gradient descent algorithm called MOSGD, which is based on a multi-objective cost function. MOSGD significantly improves performance and energy efficiency when applied to pod placement within orchestration systems such as Kubernetes. The research encompasses several key aspects, including the deployment of new edge infrastructure, the formulation of optimization metrics, and the creation of associated optimisation models.
To validate the effectiveness of the proposed MOSGD approach, we conducted real-world experiments on the C3 computing infrastructure, exploring three distinct scenarios. Additionally, we conducted a comparative analysis against two state-of-the-art methods. Our evaluation results reveal that the MOSGD approach can achieve up to a 30% reduction in execution time, coupled with up to an 80% reduction in energy consumption.
Our future research endeavors will focus on devising more efficient algorithms to mitigate the risk of getting stuck in local minima during the optimization process within the stochastic gradient descent function.

Figure 2
Figure 2 depicts the results of the MOSGD for the small-scale scenario compared with the Default and IP approaches, considering the execution time and power requirements. The IP and MOSGD approaches utilize just one node, in contrast to the Default strategy.

Figure 3 :
Figure 3: Execution Time and Power Usage Small-Scale

Figure 4
Figure 4 depicts the evaluation results for the medium-scale scenario for the energy consumption and the execution time, considering four nodes and 40 pods. The behaviour of the approaches is very similar to the small-scale scenario, except that the higher number of pods requires roughly the same energy consumption. The IP and MOSGD approaches manage to avoid at least two of the nodes, which still leads to a drop in energy usage.

Figure 5 :
Figure 5: Execution Time and Power Usage Medium-Scale

Figure 7 :
Figure 7: Execution Time and Power Usage Large-Scale

Notation:
• u_{i,j}: resource request of pod i on node j (percentage usage of total capacity)
• C_{i,j}: total capacity of pod i on node j
• E_idle: idle energy consumption of the node [mW]
• E_full: full-load energy consumption of the node [mW]