Comparison of Microservice Call Rate Predictions for Replication in the Cloud

Today, many users deploy their microservice-based applications with various interconnections on clusters of Cloud machines, subject to stochastic changes due to dynamic user requirements. To address this problem, we compare three machine learning (ML) models for predicting microservice call rates based on microservice times, aiming to estimate scalability requirements. We apply the linear regression (LR), multilayer perceptron (MLP), and gradient boosting regression (GBR) models to the Alibaba microservice traces. The prediction results reveal that the LR model reaches a lower training time than the GBR and MLP models. However, the GBR reduces the mean absolute error and the mean absolute percentage error compared to the LR and MLP models. Moreover, the prediction results show that the number of replicas required for each microservice, as estimated by the gradient boosting model, is close to the number computed from the actual test data without any prediction.


INTRODUCTION
The recent shift towards an increasing number of microservice-based applications in Cloud-native infrastructures brings new scheduling, deployment, and orchestration challenges [1], such as scaling out overloaded microservices in response to increasing load.
Research problem. The research problem inspected in this work extends our previous work [2], where we explored microservice scheduling on provisioned resources. In [2], we did not inspect the scalability requirements of containerized microservices through prediction models considering different request arrival rates from end-users acting as producers [3]. Traditional microservice scaling methods [4, 5] focus on resource or application processing metrics without predicting the stochastic changes in user requirements, such as dynamic request rates.
Example. Table 1 presents an example involving three producers calling three microservices deployed on three resources. In this scenario, the microservices experience varying call rates initiated by the producers. Every producer request leads to an interaction with its corresponding microservice on a specific resource within a specific time. Typically, microservices with higher execution times necessitate horizontal scaling to accommodate the call rate. In other words, a direct correlation exists between the microservice time and the call rate, motivating the need to explore prediction models addressing horizontal scaling [6]. Table 1 shows that during a 2 s execution, the microservices m_0, m_1, and m_2 receive the following numbers of calls: m_0: 2 s · 2 calls/s = 4 calls; m_1: 2 s · 2 calls/s = 4 calls; m_2: 2 s · 3 calls/s = 6 calls. However, at the end of the 2 s interval, the microservices m_0, m_1, and m_2 are still responding to their third, second, and first calls, respectively. To reduce the bottleneck on the Cloud infrastructure [7], we need to scale the microservices based on the product of the correlated microservice time and call rate, up to the following numbers of replicas: m_0 on r_0: 2 calls/s · 0.7 s/call = 1.4 ≈ 2; m_1 on r_1: 2 calls/s · 1.5 s/call = 3; m_2 on r_2: 3 calls/s · 2 s/call = 6.
Method. The method proposed in this work addresses the scalability problem through microservice call rate prediction employing ML models involving two features:
• Microservice time, defining the processing time of each containerized microservice on the Cloud virtual machine;
• Microservice call rate, defining the number of calls/requests invoking a microservice.
We apply ML models to predict the microservice call rate based on the microservice time and estimate the number of microservice replicas to support stochastic changes due to dynamic user requirements. Recently, there has been a growing interest in the applicability of deep learning models to tabular data
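The replica counts in this example follow directly from the product of call rate and microservice time, rounded up to whole replicas. A minimal sketch reproducing the numbers above:

```python
import math

# Worked example from Table 1: replicas = ceil(call_rate * service_time).
# Call rates (calls/s) and per-call service times (s/call) are taken from the text.
examples = {"m0": (2, 0.7), "m1": (2, 1.5), "m2": (3, 2.0)}

replicas = {m: math.ceil(rate * time) for m, (rate, time) in examples.items()}
print(replicas)  # {'m0': 2, 'm1': 3, 'm2': 6}
```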
[8, 9]. However, tree-based ML models such as bagging (e.g., RandomForest) or boosting (e.g., XGBoost [10], gradient boosting trees, and gradient boosting regression) are among the most popular learners for tabular data and outperform deep learning methods [11]. Nevertheless, related work did not explore and evaluate the gradient boosting regression (GBR) and multilayer perceptron (MLP) learning methods for microservice call rate prediction. Therefore, we apply and compare the GBR, neural network-based MLP, and traditional linear regression (LR) models to estimate the number of replicas for each microservice.
Contributions. Our contributions comprise a comparative evaluation of the ML models on trace data collected from a real-world Alibaba Cloud cluster [12], indicating that the GBR reaches a balance between the prediction errors, including the mean absolute error (MAE) and mean absolute percentage error (MAPE), and the training time compared to the MLP and LR methods.
Outline. The paper has eight sections. We survey the related work in Section 2. Section 3 describes the application, resource, schedule, and microservice time models and the main objective, followed by the ML prediction models in Section 4. Section 5 presents the architecture of the replication predictions. Section 6 describes the experimental design and evaluation, followed by the results presented in Section 7. Finally, Section 8 concludes the paper.

RELATED WORK
This section reviews the state-of-the-art analysis of microservice traces, workload prediction, and autoscaling of microservices in the Cloud infrastructures.
Microservice prediction. Luo et al. [13] designed a proactive workload scheduling method adopting CPU and memory utilization to ensure service-level agreements while scaling up resources. The work in [14] predicted the end-to-end latency between microservices in the Cloud based on the MLP, LR, and GBR models. Cheng et al. [15] applied GBR to predict the resource requirements for executing the user's workload. Rossi et al. [16] proposed a reinforcement learning scaling method based on the microservice time. Ştefan et al. [17] presented a deep learning-based workload prediction to autoscale microservices, highlighting the MLP model.
Alibaba microservice trace analysis. Luo et al. [12] explored large-scale deployments of microservices based on their dependencies and runtime execution times on the Alibaba Cloud clusters, showing that service response time tightly depends on the call graph topology among microservices, which impacts runtime performance. He et al. [18] proposed a graph attention network-based method to predict resource usage based on the topological relationships among Cloud physical machines and validated this method on the Alibaba microservice dataset.
Autoscaling. Arkian et al. [4] presented a geo-distributed autoscaling model for the Apache Flink framework to sustain throughput among resources, optimizing network latency and resource utilization. Autopilot [5] proposed a method to scale the number of replicas of each service in/out over a time interval (e.g., 5 min) based on the CPU usage and the average required utilization of the microservices.
Research gap. Related methods designed microservice prediction models based on completion time or cost. We extend these methods by researching microservice call rate and replica prediction based on the microservice time using the LR, GBR, and MLP machine learning models.

DATA PROCESSING MODEL
This section presents the formal model underlying our work.
Data processing streams. A stream S = (M, M_P, D) consists of:
Producer. A producer P ∈ M_P generates data at the rate MCR_P that requires further processing by a microservice m_i ∈ M.
Dataflow. A dataflow data_P streams from a producer P ∈ M_P to a microservice m_i ∈ M: D = {(P, m_i, data_P) | (P, m_i) ∈ M_P × M}.
Resource requirements. The requirements req(m_i) for proper processing of a dataflow data_P by a microservice m_i represent the minimum number of cores CORE(m_i), the memory size MEM(m_i) (in MB), and the deadline for execution (in s) [19].
Minimum processing load. CPU(m_i) is the number of instructions (in MI) of dataflow data_P processed by a microservice m_i.
Resources. R = {r_j | 0 ≤ j < N_R} represents a set of N_R Cloud virtual machines. We define a resource r_j = (CORE_j, MEM_j) as a vector representing its available processing cores CORE_j and memory size MEM_j (in GB), depending on its utilization. Every resource has an available processing speed denoted as CPU_j (in MI per second).
Schedule. The schedule of a microservice m_i is a mapping on a resource r_j = sched(m_i) that satisfies its processing and memory requirements (CORE(m_i) ≤ CORE_j and MEM(m_i) ≤ MEM_j) within its deadline, where MT_{i,j} is the microservice time defined in the next paragraph.
Microservice time. The microservice time MT_{i,j} (or MT for short) of a microservice m_i on a resource r_j = sched(m_i) is the ratio between its computational workload CPU(m_i) (in MI) and the processing speed CPU_j (in MI per second) [2]: MT_{i,j} = CPU(m_i) / CPU_j.
Objective. Our objective is to estimate the number of replicas L_i for horizontally scaling a microservice m_i based on the producer call rate MCR_P and its microservice time MT_{i,j} on a resource r_j: L_i = MCR_P · MT_{i,j}, rounded up to a whole number of replicas.
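A minimal sketch of the two formulas above; the workload of 1500 MI and resource speed of 1000 MI/s are hypothetical values chosen for illustration only:

```python
import math

def microservice_time(workload_mi: float, speed_mips: float) -> float:
    """MT = CPU(m) / CPU_r: workload in MI over the resource speed in MI/s."""
    return workload_mi / speed_mips

def replicas(call_rate: float, mt: float) -> int:
    """L = MCR_P * MT, rounded up to the next whole replica."""
    return math.ceil(call_rate * mt)

mt = microservice_time(1500.0, 1000.0)  # hypothetical workload and speed
print(mt, replicas(2.0, mt))  # 1.5 3
```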

PREDICTION MODELS
This section summarizes the ML models used in this paper for predicting microservice call rates based on their service times, further used to decide their replicas.
The LR weights are calculated by minimizing the sum of the squared differences between the predicted microservice call rate MCR'_P and the weighted microservice time MT_{i,j} · w_P [20]. Moreover, the biases have independent and identical normal distributions with zero mean and constant variance [21].
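As an illustration of this least-squares fit, the following sketch trains scikit-learn's ordinary least squares LinearRegression on a toy, perfectly linear MT-to-MCR relation; the data are invented for illustration, not taken from the trace:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training pairs: microservice time MT (s/call) -> call rate MCR (calls/s).
MT = np.array([[0.5], [1.0], [1.5], [2.0]])
MCR = np.array([1.0, 2.0, 3.0, 4.0])  # a perfectly linear toy relation: MCR = 2 * MT

lr = LinearRegression().fit(MT, MCR)  # ordinary least squares
print(lr.coef_[0], lr.intercept_)     # learned weight w_P and bias b_P
print(lr.predict([[1.2]]))            # prediction for an unseen microservice time
```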
Multilayer perceptron. The MLP belongs to the category of feedforward artificial neural networks, comprising a minimum of three layers of neurons [22]: an input layer, one or more hidden layers, and an output layer (see Figure 1). The algorithm combines inputs with initial weights in a weighted sum and subsequently passes the result through an activation function, mirroring the process observed in the perceptron [23]. The model propagates the loss backward through the layers and iteratively updates the model parameters (see Figure 1), where w_P and b_P denote the learnable weights and biases of the linear MLP model and N_N defines the number of neurons in a hidden layer [22]. This paper defines a single-neuron input layer and a corresponding output layer.
Gradient boosting regression. The GBR estimates and constructs an additive model in a forward stage-wise manner. GBR [24] can ensemble multiple prediction models (e.g., regression trees) to create a more accurate model [25]:
MCR'_P = Σ_{e=1}^{E} h_e(MT_{i,j}), where h_e is a boosting estimator and E is a constant corresponding to the number of estimators [26], implemented as fixed-size decision tree regressors.
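A hedged sketch of both learners on synthetic data (the linear relation and noise level are assumptions, not the Alibaba trace); the MLP uses one hidden layer of two neurons and the GBR uses 15 estimators with a 0.4 learning rate, matching the configurations tuned in Section 6:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
MT = rng.uniform(0.01, 5.0, size=(300, 1))           # synthetic microservice times (s/call)
MCR = 2.0 * MT.ravel() + rng.normal(0.0, 0.1, 300)   # noisy linear call rates (calls/s)

# MLP with one hidden layer of two neurons, as configured in Section 6.
mlp = MLPRegressor(hidden_layer_sizes=(2,), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(MT, MCR)

# GBR with 15 estimators and learning rate 0.4, as configured in Section 6.
gbr = GradientBoostingRegressor(n_estimators=15, learning_rate=0.4,
                                random_state=0).fit(MT, MCR)

print(round(mlp.score(MT, MCR), 3), round(gbr.score(MT, MCR), 3))  # training R^2 scores
```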

ARCHITECTURE DESIGN
We present in this section the architecture design of our method implemented in the toolbox [27].

DataCloud
We designed the architecture of our method in the context of the DataCloud [28] project, which supports the lifecycle of microservice-based applications processing streams and batches of data on the computing continuum through the interaction of four tools.
− A definition tool defines the application services and structure from the user input using a domain-specific language model to specify the microservice requirements [29];
− A simulation tool simulates the dataflow execution based on the microservice's processing speed and memory size requirements before large-scale deployment [30];
− An adaptation tool receives the microservices, explores their requirements (such as processing and memory size), predicts the number of replicas for each microservice based on its resource requirements, adapts the execution, and sends the result to the deployment tool;
− A deployment tool deploys the dataflow processing microservices on the computing resources based on the adaptation tool's schedules and manages their execution on multiple Kubernetes clusters at the user's location or in the Cloud [31].
Orchestration. The orchestration manages the microservices on the Cloud virtual machines by utilizing Kubernetes replica scaling based on the predicted microservice call rates and the decisions taken by the integrated scheduler [2].

EXPERIMENTAL DESIGN
This section presents our experimental design for the dataset preparation, testbed, and tuning of hyperparameters.

Dataset preparation
We validated our method using simulation based on an Alibaba microservice dataset available in a public repository. The dataset contains dataflows with various communication paradigms among over 1300 microservices running on more than 90 000 containers for twelve hours, recorded at a time interval of 30 s [3]. We selected 180 000 rows of the dataset, with microservice times MT_{i,j} ranging from 0.01 ms/call to 5859 ms/call and microservice call rates MCR_P ranging from 0.025 calls/s to 4874 calls/s.
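A sketch of the range filtering described above on a toy table; the column names are hypothetical, as the real Alibaba trace schema differs:

```python
import pandas as pd

# Hypothetical column names; the real Alibaba trace schema differs.
df = pd.DataFrame({
    "mt_ms_per_call": [0.005, 0.01, 120.5, 5859.0, 7000.0],
    "mcr_calls_per_s": [0.01, 0.025, 310.0, 4874.0, 5000.0],
})

# Keep only rows inside the ranges used in the paper:
# MT in [0.01, 5859] ms/call and MCR in [0.025, 4874] calls/s.
mask = (df["mt_ms_per_call"].between(0.01, 5859.0)
        & df["mcr_calls_per_s"].between(0.025, 4874.0))
sample = df[mask]
print(len(sample))  # 3
```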

Testbed design
We implemented the ML algorithms in Python 3.9 using the scikit-learn API [32]. Afterward, we compared the runtime performance of the algorithms on two machines:
• Machine with a GPU accelerator and 16 GB of memory;
• Personal device with an 8-core Intel® Core™ i7-7600U processor and 16 GB of memory.

ML hyperparameter design
This section presents the procedure for fine-tuning and optimizing the hyperparameters of the GBR and MLP models, summarized in Table 2, in three steps: exhaustive search, hyperparameter tuning, and hyperparameter configuration. For the LR model, we rely on the default settings of ordinary least squares optimization [20].
6.3.1 Gradient boosting regressor. The GBR model uses a learning curve to evaluate the changes in the training loss across iterations based on the number of estimators and the learning rate (see Figure 3a).
Exhaustive search. This step uses the GridSearchCV utility of the ML toolkit scikit-learn and sets the number of estimators to 300 and the learning rate to 0.02, which results in overfitting the training data.
Hyperparameter tuning. This step modifies the number of estimators in the range 10 - 170 and the learning rate in the range 0.02 - 0.4 to converge to a stability point with a faster training time.
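This tuning step can be sketched with scikit-learn's GridSearchCV over the stated ranges, here on synthetic stand-in data rather than the trace itself:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.uniform(0.01, 5.0, size=(200, 1))
y = 2.0 * X.ravel() + rng.normal(0.0, 0.1, 200)  # synthetic stand-in for the trace

# Search over the ranges explored in the text: estimators 10-170, learning rate 0.02-0.4.
grid = {"n_estimators": [10, 50, 100, 170], "learning_rate": [0.02, 0.1, 0.4]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), grid, cv=3).fit(X, y)
print(search.best_params_)  # best combination found by cross-validation
```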
Hyperparameter configuration. This step sets the number of estimators to 15 and the learning rate to 0.4, achieving an improved training score without overfitting and a reduced training time. Figure 3a shows that, during the training loop, the model fits each gradient tree (estimator) to the previous tree's error until it reaches the maximum number of estimators.
6.3.2 Multilayer perceptron. The MLP model uses the same three-step learning curve, evaluating the model's performance through the changes in the training loss with different training iterations, numbers of layers, and numbers of neurons in the network (see Figure 3b).
Exhaustive search. This step uses the PyTorch library and sets the number of hidden layers to 3, the number of neurons to 100, and the learning rate to 0.4 for 50 epochs, overfitting the training data.
Hyperparameter tuning. This step decreases the learning rate to 0.0005 to better understand the training procedure. Afterward, we reduce the complexity of the model by lowering the number of neurons and hidden layers (see Figure 1), given the single input MT_{i,j} in the dataset.
Hyperparameter configuration. The MLP model with two epochs, one hidden layer of two neurons, and a learning rate of 0.003 predicts MCR_P without overfitting, as shown in Figure 3b.

Evaluation metrics
In this section, we evaluate the performance of LR, MLP, and GBR prediction models using five metrics.
Pearson correlation coefficient. We compute the Pearson correlation coefficient between the microservice time and its call rate in the Alibaba trace:

EXPERIMENTAL RESULTS
This section presents the performance evaluations of the ML models in predicting the microservice call rates and replicas.

Feature distribution and correlation
Figure 4 shows the distribution, correlation, and relative variation of the two features MT_{i,j} and MCR_P in the Alibaba dataset using the Pearson coefficient. The results show a high correlation of 75 % between the two feature sets, denoting that the prediction applies to a correlated set of features.

Model fitting
Figure 5a shows that the LR model fits a linear relation between the predicted microservice call rate MCR'_P and the microservice time MT_{i,j}. Figure 5b depicts fitting a linear MLP model to the test dataset. Although the model learns to fit a linear relation between the microservice time and call rate, it has a slightly lower MAE than the LR, as shown in Table 3. This indicates that the MLP model remains a proper fit for this dataset, despite its neural network baseline imposing a computationally intensive method compared to LR and GBR. Figure 5c shows that the GBR ensemble model does not follow a linear pattern because it iteratively fits new decision tree regressors to the loss of the previous ensemble. In other words, the model continuously tunes and boosts its predictions by fitting a new subset of training data to the ensemble of previous models to create a single low-error predictive model.

Training time
Table 3 illustrates the superiority of the LR method, which lowers the training time at the expense of increased prediction errors compared with the MLP and GBR. The neural network-based MLP increases the training time of the prediction, although it builds upon linear models as defined in Section 4. However, the GBR reaches a balance between the prediction errors (MAE and MAPE) and the training time.

CONCLUSION AND FUTURE WORK
We explored and compared three ML methods to improve resource provisioning affected by stochastic changes in user requirements, investigating the performance of a set of ML models on monitoring data. We used three different ML models, LR, GBR, and MLP, to predict the microservice call rate based on the microservice time on the Alibaba Cloud resources. Since utilizing the MLP for this problem with one input and one output was complex, we set a small number of neurons and layers in its prediction model. The experimental results show that the GBR reduces the MAE and MAPE compared to the LR and MLP models. Moreover, the results show that the gradient boosting model estimates the number of replicas for each microservice close to the actual data without any prediction. In the future, we plan to explore integrating the ML models in the Kubernetes autoscaling component [6] and evaluate the optimal deployment of microservices.
Linear regression. The LR model defines a relation between the microservice time MT_{i,j} (the input feature of the model) and the microservice call rate MCR_P (the output feature of the model), where r_j = sched(m_i). Thereafter, we model a linear relation between the predicted microservice call rate MCR'_P and the actual microservice time MT_{i,j}: MCR'_P = MT_{i,j} · w_P + b_P, where w_P and b_P denote the weight and bias of the LR model, learned to fit a linear relation between the predicted microservice call rate MCR'_P and the microservice time MT_{i,j}.

ρ = Σ_{P∈M_P ∧ m_i∈M} (MT_{i,j} − MT_avg) · (MCR_P − MCR_avg) / ( √(Σ_{P∈M_P ∧ m_i∈M} (MT_{i,j} − MT_avg)²) · √(Σ_{P∈M_P ∧ m_i∈M} (MCR_P − MCR_avg)²) ), where MT_avg and MCR_avg denote the average microservice time and call rate, respectively.
Predicted microservice call rate. MCR'_P, defined in Section 4.
Number of replicas. L_i, defined in Section 3.
Mean absolute error. Also referred to as L1Loss [33], the MAE represents the average sum of absolute differences between the predicted and actual microservice call rates MCR'_P and MCR_P in the testing and training sets: MAE = (1/N_M) · Σ_{P∈M_P ∧ m_i∈M} |MCR_P − MCR'_P|.
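The metrics above can be sketched with NumPy on hypothetical actual and predicted call rates (the arrays are invented for illustration):

```python
import numpy as np

mt = np.array([0.5, 1.0, 1.5, 2.0])        # microservice times (hypothetical)
mcr = np.array([1.1, 1.9, 3.2, 3.8])       # actual call rates
mcr_pred = np.array([1.0, 2.0, 3.0, 4.0])  # predicted call rates

pearson = np.corrcoef(mt, mcr)[0, 1]                  # correlation of the two features
mae = np.mean(np.abs(mcr - mcr_pred))                 # L1Loss between actual and predicted
mape = np.mean(np.abs((mcr - mcr_pred) / mcr)) * 100  # in percent
print(round(mae, 3), round(pearson, 3))
```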

Figure 4: Distribution and correlation of MT_{i,j} and MCR_P in the Alibaba microservices dataset.

Figure 5: ML model fitting to the test dataset.

Table 3: MAE, MAPE, and training times of the prediction models.

Table 4 shows that the ML models estimate the number of replicas with similar prediction errors. The results show that the GBR model reaches a lower MAPE for replica prediction compared to the LR and MLP.

Table 4: MAPE of the prediction models.