Automated Hyperparameter Tuning for Adaptive Cloud Workload Prediction

Efficient workload prediction is essential for enabling timely resource provisioning in cloud computing environments. However, achieving accurate predictions, ensuring adaptability to changing conditions, and minimizing computation overhead pose significant challenges for workload prediction models. Furthermore, the continuous streaming nature of workload metrics requires careful consideration when applying machine learning and data mining algorithms, as manual hyperparameter optimization can be time-consuming and suboptimal. We propose an automated parameter tuning and adaptation approach for workload prediction models and concept drift detection algorithms utilized in predicting future workload. Our method leverages a knowledge base pre-built from statistical features of historical data, enabling automatic adjustment of model weights and concept drift detection parameters. Additionally, model adaptation is facilitated through a transfer learning approach. We evaluate the effectiveness of our automated approach by comparing it with static approaches on synthetic and real-world datasets. By automating the parameter tuning process and integrating concept drift detection, the proposed method enhances the accuracy and efficiency of workload prediction models by 50% in our experiments.


INTRODUCTION
Cloud computing has transformed the manner in which organizations handle their data and applications, enabling them to store, access, and manage these assets in a more efficient manner [25]. This technology offers a flexible and scalable infrastructure that grants users the ability to access computing resources as required, eliminating the necessity for maintaining on-premises hardware and infrastructure [25]. An essential benefit of cloud computing is its capacity to dynamically assign resources in response to workload requirements, guaranteeing the optimal provisioning and utilization of resources [3].
Cloud workload prediction plays a critical role in resource provisioning and optimization in cloud computing environments. Efficient resource provisioning in such systems requires careful consideration of current and predicted future trends in factors such as workload characteristics, performance requirements, cost constraints, and service-level agreements (SLAs) [15]. Cloud providers need to strike a balance between meeting the performance needs of their customers and optimizing resource utilization to control costs [21].
The dynamic nature of cloud workloads introduces uncertainty when trying to forecast future workload patterns [7,14,22]. Consequently, it becomes crucial to investigate the phenomenon of concept drift, where the statistical properties of data change over time [10], within cloud workloads [1,12,23,28]. This work employs a systematic and rigorous experimental methodology. First, a comprehensive set of concept drift detection methods is selected and evaluated on real-world cloud workload datasets. Next, an automated hyperparameter selection approach is developed, leveraging techniques to identify optimal hyperparameter configurations that improve the accuracy of concept drift detection algorithms for cloud workload prediction.
One key aspect in concept drift detection is the selection of appropriate hyperparameters for the underlying detection algorithms. Manual tuning of these hyperparameters is time-consuming and may not yield optimal results due to the evolving nature of concept drift. To overcome this limitation, this study proposes an automatic hyperparameter selection framework tailored for concept drift detection methods and model parameter selection in the context of cloud workload prediction.
The effectiveness of the proposed approach is evaluated through extensive experiments, comparing the performance of the automatic hyperparameter selection framework against manually selected hyperparameters and baseline approaches. Evaluation metrics include accuracy and detection delay. The results contribute to advancing the field of cloud workload prediction by enabling more efficient and accurate detection of, and adaptation to, concept drift in real-time cloud environments.
The contribution of this research centers on enhancing the reliability and efficiency of cloud workload prediction systems by leveraging knowledge from historical data to automate the optimization of hyperparameters for concept drift detection methods and prediction models. We propose a system with automatic adaptation capability that requires no human intervention; this improves the system's ability to adapt to changing workload patterns in dynamic cloud computing environments. The process involves three key steps: firstly, constructing a knowledge base that stores historical data features along with corresponding hyperparameters and model weights; secondly, utilizing the knowledge base to recommend appropriate parameters and model weights for future predictions; and finally, employing transfer learning-based adaptation to handle concept drifts by obtaining suitable model weights from the knowledge base for the initial model and assisting in efficient adaptation to new patterns. The paper's contributions can be succinctly outlined as follows:
• Proposing an automated hyperparameter tuning mechanism for both cloud workload prediction models and concept drift detection algorithms, achieved by harnessing historical insights stored within a knowledge base.
• Enhancing adaptability by providing real-time parameter recommendations and harnessing model updates through transfer learning.
• Conducting experiments to assess the adaptive approach in comparison to the static baseline, emphasizing the differences and advantages between the two within a variety of scenarios. These evaluations were performed using real-world cloud workloads collected from actual systems.
The rest of the paper is organized as follows: Section 2 reviews the state-of-the-art related work. Section 3 provides a detailed description of the methodology of the proposed method. The experiments and evaluation of results are presented in Sections 4 and 5, respectively. Finally, Section 6 presents the conclusion and future work.

RELATED WORK
We provide an overview of the existing research on cloud workload prediction, including a discussion on concept drift technologies and the latest advancements in automatic parameterization techniques.We summarize key findings and contributions from previous studies in these areas, highlighting the state-of-the-art methods and approaches used.
In recent times, the utilization of neural network-based prediction models has become prevalent in cloud resource management systems. Kumar et al. [16] propose time series-based workload prediction using Long Short-Term Memory (LSTM) networks. To capture the temporal and spatial dependencies in resource utilization, [20] propose a prediction approach combining Convolutional Neural Networks (CNNs) and LSTMs.
Quang et al. [6] propose an efficient multivariate autoscaling framework using bidirectional long short-term memory (Bi-LSTM) for cloud computing. A method named GRU-ES, which combines Gated Recurrent Units (GRUs) and exponential smoothing, is proposed by [4]: intermediate results obtained by the GRU are processed by exponential smoothing to produce the final predictions. The work of [5] proposes deep learning-based autoscaling using Bi-LSTM for Kubernetes; the architecture incorporates a Bi-LSTM model for workload prediction, a cooling-down period to mitigate oscillation, and a resource removal strategy for faster handling of workload bursts. Furthermore, encoder LSTM networks can be utilized for straggler prediction in cloud computing environments, e.g., [24].
To achieve optimal performance, neural network-based models require the careful selection of appropriate hyperparameters. Several methods have been developed to automate the process of hyperparameter optimization in machine learning (ML) and neural network-based models. Traditional manual optimization algorithms, such as random search and grid search, often require multiple iterations over the data before yielding significant results [13]. [9] propose an incremental learning process for supervised learning over data streams with parameter optimization by the neural network. They combine this process with a hyperparameter search step based on a window of the data stream.
Lacombe et al. [17] present a meta-learning approach for automating the tuning of hyperparameters in evolving data streams.The paper addresses the challenge of optimizing hyperparameters in the context of data streams, where concept drifts occur frequently, necessitating re-tuning.The proposed approach leverages pre-built knowledge to adapt quickly to new tasks, aiming to improve classification performance in the presence of concept drift.
In summary, existing studies primarily focus on static time series forecasting of workloads, with LSTM and RNN-based models offering improved long-horizon and more accurate predictions. However, the dynamic nature of workloads necessitates the adaptation of neural network-based models to future changes, which can introduce unnecessary resource and time overhead. Additionally, the data mining algorithms used to detect changes in future workloads require fine-tuning for different data patterns to be effective. To address these limitations, we propose a hyperparameter autotuning step based on data features to enhance the accuracy of workload predictions and change detections. Furthermore, we employ a transfer-learning-based adaptation step for faster adaptation using relatively little data.

METHODOLOGY
In this section, we discuss the workflow of our proposed automatic hyperparameter selection approach for cloud workload prediction models and concept drift detection algorithms. First, we present the baseline workload prediction model. We then show the architecture of the knowledge base for parameter recommendation and describe the transfer learning-based model adaptation process in detail. Finally, we categorize the set of hyperparameters utilized in the concept drift detection algorithms into statistical parameters and window length settings. Statistical parameters are threshold-setting parameters that alter the sensitivity of the algorithm to the magnitude of drift in the streaming data.

Baseline workload prediction
Throughout the research, a time series prediction model based on LSTM [11] is employed to forecast forthcoming workloads. An LSTM cell is a special type of recurrent neural network (RNN) unit designed to handle long-term dependencies in sequential data and to mitigate the vanishing gradient problem of RNNs. The LSTM cell has several components, including input gates, forget gates, output gates, and a memory cell. These components work together to control the flow of information and memory within the cell, as illustrated in Fig. 1.
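To make the gating mechanism concrete, the following pure-Python sketch computes forward steps of a single-unit LSTM cell. The weights here are arbitrary placeholders for illustration, not parameters of the trained model used in the paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One forward step of a single-unit LSTM cell.

    w holds scalar weights/biases for the input (i), forget (f),
    and output (o) gates and the candidate memory (g).
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g         # update the memory cell
    h = o * math.tanh(c)           # new hidden state
    return h, c

# Arbitrary placeholder weights, for illustration only.
w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.35]:         # a tiny workload sequence
    h, c = lstm_cell_step(x, h, c, w)
```

The forget gate scales the previous memory, the input gate scales the candidate memory, and the output gate decides how much of the memory is exposed as the hidden state; this additive memory path is what sidesteps the vanishing gradient.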

Naive hyperparameter setting
A grid-based and random search approach for hyperparameter optimization is employed as an initial phase to assess the performance of the methods and their responses under different conditions.
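As a hedged sketch of this initial phase, the snippet below contrasts grid search (exhaustive) with random search (fixed sampling budget) over a hypothetical hyperparameter space; the `score` function stands in for "train the model, return validation error" and is not the paper's actual objective:

```python
import itertools
import random

# Hypothetical search space; names mirror typical LSTM hyperparameters.
grid = {
    "units": [32, 64, 128],
    "learning_rate": [1e-3, 1e-2],
    "window": [12, 24],
}

def score(cfg):
    # Stand-in for "train and return validation RMSE" (lower is better).
    return abs(cfg["units"] - 64) / 64 + cfg["learning_rate"] + cfg["window"] / 100

# Grid search: evaluate every combination exhaustively.
combos = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
best_grid = min(combos, key=score)

# Random search: sample a fixed budget of configurations.
rng = random.Random(0)
samples = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(5)]
best_random = min(samples, key=score)
```

Grid search grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget at the cost of possibly missing the optimum; this trade-off is why both are used only as a naive baseline here.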

Concept drift detection methods
Concept drift is a dynamic phenomenon in which the underlying data distribution generating the target variable changes over time.
In a dataset D = {(X_1, y_1), (X_2, y_2), ..., (X_n, y_n)}, where X_i represents the input features and y_i is the corresponding target variable at time i, concept drift is expressed as:

P_t(y | X) ≠ P_t'(y | X)

where the conditional probability distribution of the target variable y given the input features X at time t is not equal to the conditional probability distribution at a different time t'. This indicates that the relationship between the input features and the target variable changes over time, leading to concept drift. However, concept drift can take various forms and degrees. It is not always a binary change, and there might be gradual shifts in the data distribution. This complexity makes concept drift detection and adaptation challenging tasks in real-world applications such as cloud computing scenarios.

Building Knowledge Base
Our approach begins with the extraction of features from the dataset, encompassing statistical and temporal aspects. Statistical features capture properties such as the mean and standard deviation. These features, along with their corresponding values, are stored in a knowledge base for future reference, as illustrated in Fig. 2. In the realm of data streams, the issue of concept recurrence is a well-recognized challenge [18]. Consisting of dataset statistical features, model parameters, and model weights, this knowledge repository is cultivated with precision. The knowledge base structure is adapted from [19], where a knowledge base was created to select a suitable model for changing patterns in cloud workloads. However, the authors of that paper fail to consider concept drift in cloud workloads, which alters the data distribution and makes previously trained models ineffective over time. Thus, in this scenario, a way is needed to handle drift in workload patterns and formulate an adaptation strategy. Through a systematic process, historical data features are paired with the associated hyperparameters and model weights, resulting in a multifaceted knowledge base. The dynamic nature of this table ensures that it thrives as a repository of insights gleaned from past experiences.
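A minimal sketch of one knowledge-base entry is given below. The field names (`drift_params`, `weights_path`, and so on) and the file path are illustrative placeholders, not the paper's exact schema:

```python
import statistics

def extract_features(window):
    """Statistical features that key a knowledge-base entry."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "median": statistics.median(window),
        "min": min(window),
        "max": max(window),
    }

# One entry: dataset features + detector parameters + model hyperparameters
# + a pointer to saved model weights.
knowledge_base = []
cpu_window = [0.42, 0.47, 0.51, 0.46, 0.44, 0.49]   # toy CPU-utilization segment
knowledge_base.append({
    "features": extract_features(cpu_window),
    "drift_params": {"delta": 0.002},                # e.g. detector sensitivity
    "model_params": {"units": 64, "window": 24},
    "weights_path": "models/segment_000.h5",         # hypothetical path
})
```

Storing the feature vector alongside the parameters is what lets the system later answer "have we seen a workload like this before?" with a simple nearest-neighbour lookup.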
The utility of this knowledge base unfolds in the system's ability to recommend optimal parameters as new data streams into the predictive framework. This real-time responsiveness marks a pivotal shift in the approach to handling evolving workload patterns. The integration of historical insights empowers the system not only to comprehend new data, but also to propose parameter configurations that hold promise in aligning predictions with the current context. As this knowledge base extends its influence, the predictive system becomes inherently adaptable, harnessing the collective wisdom of prior instances to guide its responses to new situations efficiently.

Matching Criteria
The knowledge base is constructed based on the dataset features, incorporating optimal parameters obtained from concept drift detection algorithms and model weights. When new data arrives in the stream, this knowledge base serves as a reference for parameter selection. Features are extracted from the new data and compared with the entries in the knowledge base using the k-nn algorithm with Euclidean distance as the measure. The complete procedure is illustrated in Algorithm 1.
The Euclidean distance between two consecutive time series data points is defined as follows:

d(X_t, X_{t+1}) = sqrt( Σ_{j=1}^{m} (X_{t,j} - X_{t+1,j})^2 )

where X_t and X_{t+1} represent the time series data at time t and time t+1, respectively, and m is the number of dimensions in the time series data. A pattern change is flagged when d(X_t, X_{t+1}) ≥ θ. Within the context of data feature matching and recommendation, the process encompasses the evaluation of three distinct scenarios. The first scenario arises when an exact match is identified within the knowledge base, marked by the smallest Euclidean distance. This discovery triggers the integration of parameter recommendations into the concept drift detection algorithms, aligned with the findings from the knowledge base. Moreover, the model is initialized with the pretrained weights obtained from this process.
Moving to the second scenario, an instance of a data entry exhibiting substantial similarity, albeit falling short of an exact match, is encountered, as indicated by a distance exceeding the defined threshold. Here, the initial model is populated with the best-matching weights extracted from the knowledge base, effectively harnessing the best-matched information. Subsequently, a transfer learning approach is followed, serving to fine-tune the model to align with the prevailing data pattern.
The third scenario materializes when the knowledge base yields no corresponding match for incoming data within the stream.In this case, statistical features are extracted from the new dataset, facilitating the training of a new model from scratch.As the optimal parameters become discernible, a new entry is then created within the knowledge base, strategically earmarked for future reference and utilization.
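The three scenarios above can be sketched as a single matching function. The two threshold values (`exact_eps`, `similar_threshold`) and the two-feature vectors are illustrative assumptions, not values taken from the paper's experiments:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(new_features, knowledge_base, exact_eps=0.05, similar_threshold=0.5):
    """Return (scenario, entry) for incoming data; thresholds are illustrative.

    'exact'   -> reuse the stored parameters and weights as-is.
    'similar' -> initialize from stored weights, then fine-tune (transfer learning).
    'none'    -> train from scratch and add a new knowledge-base entry.
    """
    if not knowledge_base:
        return "none", None
    # 1-nearest neighbour over the stored feature vectors.
    best = min(knowledge_base, key=lambda e: euclidean(new_features, e["features"]))
    d = euclidean(new_features, best["features"])
    if d <= exact_eps:
        return "exact", best
    if d <= similar_threshold:
        return "similar", best
    return "none", None

kb = [{"features": [0.46, 0.03], "weights": "segment_000"},
      {"features": [0.80, 0.10], "weights": "segment_001"}]
scenario, entry = match([0.47, 0.04], kb)   # very close to the first entry
```

The "none" branch is what keeps the knowledge base growing: each genuinely new pattern contributes a fresh entry that future streams can match against.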

Transfer learning and baseline retraining approach
Transfer learning is the strategy utilized for adapting to concept drift, particularly when labeled data for the new concept is limited [8]. By leveraging a pre-trained model trained on a related or similar task, the model can benefit from the general knowledge and learned representations captured by the pre-trained model [26, 27]. This helps the model adapt more quickly to the new concept with a smaller amount of labeled data. Transfer learning allows the model to retain the previously learned knowledge while fine-tuning specific layers or parameters to align with the new concept. This approach can be particularly useful when the concept drift is gradual or incremental. Thus, the adaptation strategy incorporated in the proposed framework consists of a transfer learning approach via fine-tuning of models.
The decision to choose between retraining from scratch or utilizing transfer learning for model adaptation depends on several factors.Firstly, the availability of data features in the knowledge base representing the new concept is crucial.If a substantial similarity exists, transfer learning becomes more practical.
Transfer learning for LSTM time series forecasting involves leveraging knowledge from a source domain to improve forecasting in a target domain. As shown in Algorithm 2, let us consider a pretrained LSTM model M_S on a source time series domain (S); we want to transfer its knowledge to a target time series domain (T) for forecasting future workloads. The model has learned patterns and features from the source time series data.
Transfer learning process: the approach followed to adapt the source-domain model to the target domain, represented as X_T (input sequences) and y_T (target values), while preserving its learned features is as follows:
• Use the LSTM layers of M_S as a feature extractor.
• Add additional LSTM layers on top for domain-specific learning.
• Train only the added layers using target-domain data while keeping the source layers frozen.
Transfer model M_T: the adapted model for the target domain, denoted as M_T, has the following structure:
• LSTM layers from M_S (frozen) for feature extraction.
• Additional LSTM layers for domain-specific learning in the target domain.
The overall objective is to minimize the Mean Squared Error (MSE) loss L while adapting the source model to capture domain-specific patterns in the target domain:

minimize L(y_T, M_T(X_T))

Due to the aforementioned factors and the observed effectiveness of transfer learning in different domains, we employ transfer learning as an adaptive strategy to handle new concepts throughout our experiments, as shown in Algorithm 2. The transfer learning process begins by initializing the model with pre-trained weights obtained from the knowledge base recommendations. Subsequently, the initial model, which was originally trained on the source task, is fine-tuned specifically for the target task, which in our scenario involves workload data after concept drift has occurred.
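The freeze-then-fine-tune idea can be illustrated without a deep learning framework. In this deliberately simplified sketch, a fixed nonlinear transformation stands in for the frozen source-domain LSTM layers, and only a small trainable head is fit on target-domain data by gradient descent on the MSE loss; in the actual system the frozen part would be the pretrained LSTM layers from the knowledge base:

```python
import math
import random

def frozen_features(x):
    # Fixed (non-trainable) transformation, standing in for the frozen
    # source-domain LSTM layers M_S.
    return [math.tanh(0.8 * x), math.tanh(-0.3 * x + 0.2)]

def predict(head, x):
    f = frozen_features(x)
    return head[0] * f[0] + head[1] * f[1] + head[2]   # trainable head only

def mse(head, data):
    return sum((y - predict(head, x)) ** 2 for x, y in data) / len(data)

# Synthetic target-domain data that the frozen features can represent.
target_data = [(x / 10, 0.5 * math.tanh(0.8 * (x / 10)) + 0.1) for x in range(20)]

rng = random.Random(0)
head = [rng.uniform(-0.1, 0.1) for _ in range(3)]
lr = 0.1
for _ in range(500):                 # update only the head; extractor stays frozen
    for x, y in target_data:
        f = frozen_features(x)
        err = predict(head, x) - y
        head[0] -= lr * err * f[0]
        head[1] -= lr * err * f[1]
        head[2] -= lr * err

final_loss = mse(head, target_data)
```

Because only the head's few parameters are updated, far less target-domain data and compute are needed than retraining the whole model, which is the core argument for transfer learning under concept drift.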

EXPERIMENTS

Experimental Setting
We evaluate our proposed method of automatic hyperparameter tuning against static hyperparameter tuning scenarios as baselines. We first conduct experiments on automatic tuning of concept drift detection algorithms. Second, we conduct experiments on tuning the prediction model and adapting to changes through transfer learning.
We first set the static approaches as baselines. We use grid search optimization to find the best hyperparameters for the LSTM model. Concept drift detection methods are set with the default hyperparameters recommended in the literature. In the second scenario, the proposed automatic and adaptive tuning is leveraged based on a previously built knowledge base.
Initial hyperparameters for the prediction model. Table 1 illustrates the list of hyperparameters and possible values used to build the knowledge base for the LSTM model. Moreover, the corresponding model weights after training are saved in the knowledge base.
All experiments are performed on a machine with an Intel Core i7-7700 3.6 GHz CPU with four cores and 32 GB of RAM. Additionally, the LSTM model is implemented with a Keras and TensorFlow backend, whereas the stream processing algorithms are based on the scikit-multiflow and River stream machine learning packages.

Evaluation metrics
To assess the accuracy of the predictions, we employ several evaluation metrics including Root Mean Squared Error (RMSE), Normalized Root Mean Squared Error (NRMSE), and Mean Absolute Percentage Error (MAPE).These metrics are consistently utilized across the experiments to gauge the performance of the methods in terms of prediction quality.
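For reference, the three metrics can be computed as below. Note that NRMSE has several common normalizations; the range-based variant is shown here as an assumption, since the paper does not state which one it uses:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def nrmse(actual, predicted):
    """RMSE normalized by the observed range (mean-normalization is also common)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 12.0, 15.0, 11.0]      # toy workload observations
predicted = [11.0, 11.0, 14.0, 12.0]   # toy model outputs
```

RMSE penalizes large errors quadratically, NRMSE makes results comparable across traces with different scales, and MAPE expresses error relative to the observed magnitude.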

Datasets
We utilize synthetic and real-world cloud workload datasets across all the experiments.

Real-world Workload. A real-world dataset was collected from the Ericsson Research Data Center in Lund, Sweden. As illustrated in Table 2, we select two metrics from the traces, CPU utilization and memory utilization, for prediction. Percentage CPU utilization of a server is shown in Fig. 3.

EVALUATION 5.1 Knowledge base creation
To establish the knowledge foundation for every workload trace, we utilize 20,000 historical data records from both datasets, as illustrated in Figure 3. The data is structured at a 15-minute interval. We proceed to extract statistical attributes from this dataset. Employing the ADWIN concept drift detection algorithm, we identify shifts in the data distribution. For the initial phase, default hyperparameter values are applied to the algorithm. Subsequent to the detection of alterations in the data distribution using the ADWIN algorithm, a fresh LSTM model is trained. This process is repeated for each instance of identified data distribution alteration. After each training iteration, a new entry is integrated into the knowledge base. This entry encompasses statistical characteristics of the current dataset, model hyperparameters, and model weights.
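The drift-then-retrain loop can be illustrated with a deliberately simplified detector: it splits a fixed sliding window in half and flags drift when the halves' means diverge. This is only a stand-in for ADWIN, which adaptively sizes its window and uses a statistical bound rather than the fixed threshold assumed here:

```python
import random
import statistics

def detect_drift(stream, window=60, threshold=0.25):
    """Simplified window-split drift check (a stand-in for ADWIN).

    Returns the stream indices at which drift is flagged.
    """
    drifts, buf = [], []
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            old, new = buf[: window // 2], buf[window // 2:]
            if abs(statistics.fmean(new) - statistics.fmean(old)) > threshold:
                drifts.append(i)     # here, a fresh model would be trained
                buf = []             # reset, analogous to ADWIN dropping the old window
    return drifts

rng = random.Random(42)
# Synthetic CPU-utilization stream with an abrupt shift at t = 200.
stream = [0.3 + rng.gauss(0, 0.02) for _ in range(200)] + \
         [0.7 + rng.gauss(0, 0.02) for _ in range(200)]
drifts = detect_drift(stream)
```

In the actual pipeline, each flagged index would trigger training a new LSTM model and appending a knowledge-base entry with the current segment's statistics, hyperparameters, and weights.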
The initial configuration of the knowledge base employed for each dataset is presented in Table 3, which also depicts the quantity of entries preserved within the knowledge base for every dataset. The findings demonstrate that for each data entry, ADWIN successfully identifies shifts in distribution. Consequently, a multitude of models were trained to address each observed deviation in historical workload patterns. These trained models are then retained in the knowledge base, serving as a resource for future utilization. Additionally, the knowledge base incorporates statistical attributes such as the mean, standard deviation, median, minimum, and maximum values. The results illustrate that the historical workload data holds valuable information about the characteristics of the workload; this information needs to be retained for future use.

Tuning LSTM model and adaptation
To leverage adaptation to new patterns through the knowledge base, we employ k-nn (k-nearest neighbors) in conjunction with the Euclidean distance metric exclusively. We utilize a predefined threshold to identify the most closely resembling dataset within the knowledge base. Before inputting the dataset into the k-nn algorithm, each dataset undergoes normalization to ensure values fall within the range of (0, 1). We have specifically chosen a Euclidean distance threshold of 0.5 as our criterion. Consequently, datasets exhibiting a Euclidean distance of less than 0.5 are considered similar. Therefore, when new data arrives in the data stream, statistical features are extracted, and subsequently a search for a match in the knowledge base is conducted using k-nn. This process is exemplified in Table 4 for the selected datasets.
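A minimal sketch of the normalize-then-compare step follows. The feature vectors are illustrative, and min-max scaling is applied per vector here for simplicity; a production version would normalize each feature consistently across datasets:

```python
def min_max_normalize(values):
    """Scale a feature vector into [0, 1] before distance comparison."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def is_similar(features_a, features_b, threshold=0.5):
    """Treat two datasets as similar when the Euclidean distance between
    their normalized feature vectors is below the 0.5 threshold used in
    the experiments."""
    a, b = min_max_normalize(features_a), min_max_normalize(features_b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 < threshold

# Illustrative (mean, std, median, min, max) feature vectors for three traces.
trace_a = [0.46, 0.03, 0.45, 0.40, 0.55]
trace_b = [0.48, 0.04, 0.47, 0.41, 0.57]   # near-duplicate pattern
trace_c = [0.90, 0.50, 0.10, 0.00, 1.00]   # very different pattern
```

Without the normalization step, features with large magnitudes (e.g. the maximum) would dominate the distance and mask similarity in the smaller-scale features.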
In this experiment, we employ two distinct adaptation approaches. We designate the concept drift algorithm-informed retraining approach as the baseline, and introduce the proposed knowledge base-dependent adaptation as the alternative. The first approach operates independently of prior dataset information. It updates the model by retraining it from scratch each time a concept drift occurs in the data stream.
Conversely, the latter approach leverages the knowledge base to identify analogous patterns from the past, enabling the utilization of previously saved model parameters. If the data in the stream exhibits a Euclidean distance less than the established threshold, the saved model parameters are applied. In the event that there is no exact match for the current data in the knowledge base, a transfer learning approach is employed to adapt an older, similar model to the new pattern.
As illustrated in Figure 4, the second approach incurs less than 50% of the adaptation delay of the first approach across all workload dataset tests.

Runtime performance of adaptive and baseline approaches
In this context, we assess and contrast the performance of three different approaches: the baseline method, which relies on static predictions; the approach that takes concept drift into account for adaptation; and the novel approach that leverages the knowledge base for informed adaptation. Table 4 showcases the prediction performance of each model in terms of various performance metrics, encompassing each dataset utilized in the experiment. Additionally, it includes the number of model updates required to achieve these results.
As depicted in Table 4, both adaptation strategies exhibited notable improvements over the static approach, surpassing a 90% enhancement in performance. The two adaptation strategies demonstrated comparable performance, with the proposed approach showing improvements in specific instances. However, the proposed approach, which incorporates the knowledge base, significantly reduced the number of retrainings needed to attain satisfactory performance levels.

CONCLUSION AND FUTURE WORK
We demonstrate that the utilization of automatic hyperparameter tuning significantly enhances the efficiency and adaptability of our workload prediction model. This effectively eliminates unnecessary computations and time-consuming adjustments to accommodate concept drifts in cloud workloads. By improving upon the baseline approach, our method achieves timely and accurate predictions, thereby demonstrating its effectiveness and practicality.
In conclusion, this paper presents a significant advancement in the realm of cloud workload prediction systems. By focusing on the enhancement of both reliability and efficiency, the core contribution lies in the automation of hyperparameter selection for concept drift detection methods and prediction models. This pivotal step serves to fortify the system's capacity to seamlessly acclimate to the ever-changing landscape of dynamic cloud computing environments.
The proposed methodology entails a three-step process. It begins with the construction of a comprehensive knowledge base: historical data features combined with hyperparameters and model weights form a foundational knowledge repository. Subsequently, this data is harnessed not only to propose suitable parameters and model weights for forthcoming predictions, but also to actively assist in adaptation. In the face of concept drifts, an inevitable facet of dynamic environments, the research employs transfer learning-based adaptation. This approach, drawing from the knowledge base, empowers the system to navigate and accommodate evolving trends.
With the goal of enhancing cloud workload prediction systems, this research combines theoretical advancements with practical utility. The advancements in automating hyperparameter selection, leveraging historical data, and implementing transfer learning-based adaptation collectively emphasize the effectiveness of this methodology. By tackling the complexities inherent in dynamic cloud computing environments, this study signifies a noteworthy progression towards the establishment of more resilient and agile predictive systems.
Our future work includes thoroughly investigating the theoretical aspects of parameter optimization on streaming cloud workload datasets, as well as exploring meta-learning approaches, aiming to enhance the accuracy and efficacy of workload prediction in cloud environments.

Figure 2 :
Figure 2: Overview of the proposed framework.
A constraint is established through the definition of a threshold value (θ) for the distance measure: a pattern change is detected when the Euclidean distance is at least θ. This gives rise to the subsequent scenarios.

Figure 3 :
Figure 3: Workload characteristics in real-world traces

Figure 4 :
Figure 4: Delay in adaptation by concept drift-based adaptation (CD) and the proposed adaptation approach.

Table 1 :
Hyperparameters for prediction model

Table 3 :
Knowledge base (KB) entries for each dataset extracted from historical data.

Table 4 :
Prediction performance for the methods: static baseline, concept drift-aware adaptive (CD), and the proposed approach.