research-article
Open Access

An Intent-driven DaaS Management Framework to Enhance User Quality of Experience

Published: 14 November 2022


Abstract

Desktop as a Service (DaaS) has become widely used by enterprises. In 2020, the use of DaaS increased dramatically due to the demand to work remotely from home during the COVID-19 pandemic, and the DaaS market is expected to continue growing rapidly [1]. The quality of experience (QoE) of a DaaS service is one of the main factors in DaaS user satisfaction. To ensure user QoE, the amount of cloud computing resources for a DaaS service must be appropriately designed. We propose an Intent-driven DaaS Management (IDM) framework to autonomously determine cloud-resource-amount configurations for a given DaaS QoE requirement. IDM enables autonomous resource design by abstracting knowledge about the dependency between DaaS workload, resource configuration, and performance from previous DaaS performance log data. To ensure the IDM framework's applicability to actual DaaS services, we analyzed five main challenges in applying it: identifying the resource-design objective, quantifying DaaS QoE, addressing low log-data availability, designing performance-inference models, and addressing low resource variation in the log data. We addressed these challenges through the detailed design of the IDM modules. The effectiveness of the IDM framework was assessed from the aspects of DaaS performance-inference precision, DaaS resource design, and time and human-resource cost reduction.


1 INTRODUCTION

Over the past decade, a dramatic work-style revolution has taken place in which employers have started to adopt Desktop as a Service (DaaS) to provide employees with virtual desktop environments for their daily work. Different from conventional on-premises desktop environments, DaaS instances are implemented in the cloud instead of on personal computers. This revolution is driven by employers' demand for unified infrastructure management and reduced security risk and cost. Cloud technology has been one of the key drivers of this revolution. However, when a DaaS service operator manages the cloud resources, deciding the sufficient amount of computing resources to allocate for the DaaS service to meet the DaaS quality of experience (QoE) requirement, also called the DaaS performance requirement in this work, is a highly complex task. The decision requires a deep understanding of the dependency among DaaS resource configuration, DaaS workload patterns, and DaaS QoE. The resource decision currently relies heavily on the experience and skills of DaaS operators, increasing human-resource/time cost as the number of DaaS users rapidly increases. We address this challenge by proposing and implementing an Intent-driven DaaS Management (IDM) framework. The main contributions of this work are as follows:

(1)

Propose an IDM framework to determine the DaaS resource amount in accordance with the QoE requirement (we take the DaaS QoE requirement as intent). To ensure its practicability in actual DaaS services, we identified challenges through interviews with DaaS operators and carefully designed the modules of the IDM framework to address these challenges, as briefly mentioned in (2) through (6).

(2)

Identify the main resource-design objective among the various resource-amount configuration options to meet the QoE requirement.

(3)

Identify representative metrics to quantify DaaS QoE, thus enabling precise QoE management.

(4)

Introduce an availability check–alternative proposal (ACAP) mechanism to address the low log-data-availability issues when applying the IDM framework to actual DaaS services. Intuitive DaaS service performance and workload log data need to be collected from the users’ guest operating system (OS), which is usually difficult in the real world. Examples of intuitive workload and performance data include the operations users conduct on the guest OS applications and the response time of these applications. To address this issue, we use performance and workload metrics from the underlying DaaS infrastructure to represent DaaS service performance and workload instead of intuitive DaaS workload and performance metrics.

(5)

Design performance-inference models that infer the DaaS performance (QoE) for given DaaS workload and resource-amount configuration. On the basis of the inference, the IDM framework is able to identify the resource-amount configuration that meets the QoE requirement.

(6)

The log data regarding DaaS resource amount has low variation in actual DaaS services, which prevents performance-inference models from determining the overall dependency between resource amount and performance. To address this challenge, we introduce a workload-feature-abstraction module that calculates the expected workload in accordance with the number of virtual machines (VMs). This enables the IDM framework to precisely infer performance on the basis of the expected workload, which is a function of the number of VMs. Consequently, the IDM framework can more precisely determine the number of VMs that meets the performance (QoE) requirement by using the dependency between workload and performance.

(7)

Validate the IDM framework using the log data collected on an actual DaaS platform. We examined the IDM framework's performance-inference models’ precision in inferring DaaS performance including host central processing unit (CPU) ready time, disk write latency and disk read latency, and user input response time (sampled) for various patterns of DaaS workload and resource amount. We also evaluated the resource-amount design of the IDM framework and compared it with a conventional human-resource-based design approach and showed that the IDM framework reduces time/human-resource costs for resource-amount design.


2 RELATED WORKS IN DAAS RESOURCE MANAGEMENT

In this section, we introduce related works in the DaaS resource management area. We selected several of the most recently published and most-cited works.

Nakhai and Anuar [2] evaluated the influence of different guest OSes on DaaS performance in terms of memory usage, CPU response time, and application response time under three patterns of workload. The work identified the virtual desktop OS that has lower response time and memory usage under given conditions. Triyasona and Krathu [3] showed through an experiment that the screen size of user devices used to connect to a cloud-based DaaS service has an impact on DaaS user experience. The authors indicated that client screen size should be included in the QoE prediction model for evaluating DaaS user experience in addition to network parameters. Jin et al. [4] designed and implemented a virtual desktop system and designed its scheduling mechanism in accordance with actual needs for storage, computing, and other resources to lower the load difference among hosts. Li et al. [5] proposed a network performance evaluation method for the virtual desktop service, in which the QoE of the virtual desktop service is evaluated based on the timestamps of sent and received data packets and the packet loss ratio. Calyam et al. [6] proposed the VDC-Analyst tool for virtual desktop service providers to design and verify various resource allocation schemes, as well as to check net utility and service response time, which are metrics for DaaS service readiness. In another work, Calyam et al. [7] investigated the impact of memory and network conditions on application performance in a virtual desktop environment and proposed a benchmarking tool (VDBench) to guide online resource allocation.

According to our investigation of the related research, to the best of our knowledge, mapping the DaaS QoE requirement (intent) into the cloud resource amount has barely been studied.


3 BACKGROUND AND PROBLEM STATEMENT

An increasing number of enterprises are adopting DaaS to support their employees in accessing the standard virtual desktop environment from various locations. According to Gartner [1] (Table 1), DaaS is expected to have the most significant growth in the worldwide public cloud service market in 2020, increasing 95.4% to $1.2 billion. The expansion of the DaaS market has been accelerated due to the demand to work from home during the COVID-19 pandemic. It is also forecasted that the DaaS market will continue to grow rapidly in 2021 and 2022, and the revenue from DaaS services will double in 2022 compared with that in 2020.

Table 1.

                                    2019      2020      2021      2022
Desktop as a Service (DaaS)          616     1,203     1,951     2,535
Cloud Service Total Market       242,697   257,867   306,948   364,062

Table 1. Worldwide Public Cloud Service Revenue Forecast (Millions of U.S. Dollars) [1]

An overview of a DaaS platform is shown in Figure 1. DaaS is usually implemented on a private or public cloud platform empowered by virtualization technology such as Kernel-based Virtual Machine (KVM) [8] and Xen [9]. A DaaS VM instance is assigned to each DaaS user, and the DaaS user is able to access his or her DaaS instance and conduct daily operations on the guest OS of the instance.

Fig. 1.

Fig. 1. DaaS system view.

To ensure employee work efficiency, a DaaS service must maintain a high level of user QoE. Since QoE is directly impacted by the cloud resource amount allocated for the DaaS service, the DaaS operator must carefully design the cloud resource amount to meet the QoE requirement. According to our interviews with DaaS operators, deciding and adjusting the cloud resource amount mainly relies on experienced operators; this design approach is also called the human-based resource-design approach in this work. The decision relies on the DaaS operator's knowledge and experience about the dependency among cloud-resource-amount configurations, DaaS workload patterns, and DaaS performance. The task is highly complex due to the numerous resource-amount configurations as well as the high variation in DaaS workload.

Along with the expansion of DaaS, the demand for resource design for DaaS services is increasing, whereas the number of experienced DaaS operators is highly limited, and hiring and training of DaaS operators usually incurs huge cost. Furthermore, the manual resource-design process requires hours to days and may lead to long-term QoE degradation due to the latency of resource design.

In this study, to resolve the preceding challenge, we take DaaS QoE requirements as intent, and implemented the IDM framework, which translates DaaS QoE requirements (intent) into cloud-resource-amount configurations (e.g., the number of VMs to be allocated on each host).

Intent-based management has drawn wide attention from the telecommunication industry. It is used to meet a network operator's high-level management goals without requiring the operator to have knowledge of the underlying infrastructure configuration. Intent-based management has been studied and implemented by standards organizations including ETSI ENI [10, 11], TM Forum [12], 3GPP [13, 14], and ETSI ZSM [15]; open-source software communities including OpenDaylight [16] and ONOS [17]; and academic organizations. The current main application scenarios of intent-based management include intent-based software-defined-networking management and intent-based wireless network configuration. To the best of our knowledge, the application of intent-based management to DaaS has not been studied.


4 ANALYSIS AND PROPOSAL

The high cost of human-decision-based DaaS resource design calls for an autonomous resource-design mechanism to free the operator from complex resource design. We take the DaaS QoE requirement as the DaaS operator's intent and propose the IDM framework, which autonomously gives recommendations for the resource design to meet the DaaS performance requirement.

To apply the IDM framework to actual DaaS services, we carefully analyzed the challenges in such environments. In this section, we first introduce the basic architecture of the IDM framework (Section 4.1), then introduce the challenges that we must take into consideration when applying the IDM framework to actual DaaS services (Section 4.2). Finally, we discuss the detailed design of each module (Section 4.3) of the IDM framework to address these challenges.

4.1 Overview of the IDM Framework

In our preliminary studies [17, 18], we proposed a resource-design framework of intent-based cloud service management that derives the relationship between the cloud workload, cloud resource, and performance when processing the given workload using the given cloud resource. We adopted this basic concept of a resource-design framework when we designed the IDM framework. The IDM framework trains performance-inference models that capture the dependency between the DaaS resource, workload, and performance (QoE), and on the basis of the models, it calculates the DaaS resource amount that meets the QoE requirement.

The IDM framework assists the DaaS operator in the resource decision to meet the DaaS QoE requirement. The input of the IDM framework is the intent from the DaaS operator. For instance, the DaaS operator can specify the QoE requirement (intent) that "the DaaS service host CPU ready time should be no longer than 500 seconds from 9:00 to 12:00" through a graphical user interface (GUI), API call, or command line. The output of the framework is the recommendation of resource amount—for example, it recommends that no more than 50 DaaS users should be allocated on the same host for a given period to meet the intent. The recommendation is fed back to the DaaS operator for confirmation and revision if necessary. The resource recommendation can also be encoded in a machine-readable format for cloud resource management software (e.g., a HEAT template for OpenStack [19]) to autonomize the resource implementation and further reduce human-resource cost.
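To make this flow concrete, the following minimal Python sketch shows one way such an intent could be expressed in a machine-readable form and checked against an inferred performance value. The field names and structure are illustrative assumptions, not the IDM framework's actual interface:

```python
# Hypothetical machine-readable form of an operator intent; the keys and
# metric names are illustrative assumptions, not IDM's actual schema.
intent = {
    "period": {"start": "09:00", "end": "12:00"},
    "requirements": [
        {"metric": "host_cpu_ready_time_ms", "max": 500_000},  # 500 seconds
    ],
}

def satisfies(intent, inferred):
    """Check an inferred performance sample against every requirement."""
    return all(inferred[r["metric"]] <= r["max"] for r in intent["requirements"])

# An inferred CPU ready time of 320,000 ms meets the 500,000 ms bound.
print(satisfies(intent, {"host_cpu_ready_time_ms": 320_000}))  # True
```

A resource-amount configuration whose inferred performance satisfies every requirement in the intent would then be returned to the operator as a recommendation.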

To achieve this, the IDM framework is composed of three main modules: information collection, knowledge abstraction, and decision-making and implementation.

The IDM framework is closed-loop. In the information-collection module, log data about the DaaS resources, workload, and performance are collected from the DaaS infrastructure and DaaS guest OS. In the knowledge-abstraction module, the dependency between the cloud resources, workload, and performance (DaaS users' QoE) is abstracted from the logs as performance-inference models. In the decision-making and implementation module, these models are used to make decisions about the resource amount for the intent (DaaS performance requirement). The resource-amount configuration is then implemented onto the DaaS service, and more logs about the resource, workload, and performance are collected and may be used to improve the performance-inference models. However, it is challenging to apply the IDM framework to actual DaaS services due to the constraints of actual services. We introduce these challenges in Section 4.2 and address them through the detailed designs of these modules in Section 4.3.

4.2 Challenges of Applying IDM to Actual DaaS Services

Challenge 1: Determining the resource-design objective. Designing the resources to meet the QoE requirement is complex. One reason is that the DaaS operator must decide which resource level to adjust to meet the QoE requirement. Possible adjustable resource levels include the instance level (adjusting the number of virtual CPUs (vCPUs) and amount of memory allocated for each VM instance), the host level (adjusting the number of VMs allocated on each host), and the cluster level (adjusting the number of hosts in one cluster). From our investigation results, to ensure fairness among employees at the instance level, the resources allocated for VM instances are usually set to be identical. Therefore, we did not consider VM instance-level resource design in this study. Regarding the cluster level, since changing the number of hosts involves procurement of additional hosts and may take weeks to months to come into effect, it is not considered the primary solution to QoE degradation. Therefore, we also did not consider cluster-level resource design. We focused on host-level resource design since the host-level resource configuration can be adjusted through the resource management interfaces in seconds, enabling a prompt response to QoE changes. The IDM framework determines the number of VMs allocated for each host that meets the QoE requirement.

Challenge 2: Quantification of DaaS user QoE. According to our interviews, the DaaS user's QoE requirements are often expressed qualitatively rather than quantitatively (e.g., "the DaaS service's response needs to be improved" and "the virtual desktop freezes frequently"). Furthermore, the DaaS operator must rely on the user's feedback to determine whether the QoE requirements are satisfied. Feedback-based qualitative QoE measurements greatly lengthen the DaaS provider's response time to QoE degradation or QoE requirement changes, preventing precise resource-amount design. To enable precise resource-amount design in accordance with the QoE requirement, it is crucial to identify representative DaaS performance metrics to quantify QoE.

Challenge 3: Addressing low log-data availability. To implement the IDM framework, we use historical log data to train the performance-inference models, including log data about the DaaS resources, workload, and performance. The log data are not always available in actual DaaS services for the following three reasons. First, due to data confidentiality policy, the DaaS operator is not granted the right to access some of the log data, especially the guest OS log data. Second, collecting a huge amount of real-time log data may greatly impact the performance of the DaaS service, and thus it may be restricted in actual DaaS services. Third, for some log data, it is necessary to install data-collection agents on the DaaS platforms, VM instances, and guest OS, which is usually prohibited in DaaS services.

Challenge 4: Deriving the dependency between DaaS resources, workload, and performance. To design the resource correctly in accordance with the QoE requirements, the IDM framework must construct performance-inference models that precisely determine the dependency between the resource, workload, and performance in the knowledge-abstraction module. Thus, the input and outcome variables of this model must be carefully chosen on the basis of the requirement of the DaaS operator and data availability, and the model architecture and parameters need to be tuned carefully to enable high-precision DaaS performance inference.

Challenge 5: Addressing low resource-amount variation in the log data. In actual DaaS services, since the design and re-design of cloud resources for a DaaS service incur huge time and human-resource cost, the resource-amount configuration tends to remain unchanged for a long time. Therefore, when we collect log data from the DaaS environment, the resource amount falls in a small range. Consequently, when we train performance-inference models using the collected log data, due to the low variation in resource-amount log data, it is difficult for the model to determine the overall dependency between the resource amount and performance.

4.3 Designing the IDM Framework to Address Challenges to Applying It to Actual DaaS Services

To apply the IDM framework to actual DaaS services, we need to address the challenges introduced in Section 4.2. In this section, we discuss the detailed design of the three modules of the IDM framework (information collection, knowledge abstraction, and decision-making and implementation) and how the designs of these modules address Challenges 2 through 5 (note that the solution to Challenge 1 was introduced in Section 4.2).

4.3.1 Information-Collection Module (Addressing Challenges 2 and 3).

This module is responsible for continuously collecting and storing log data used for training performance-inference models. The module collects the log data of DaaS workload, resource amount, and performance. It is mainly composed of three sub-modules: DaaS workload data collection, DaaS performance data collection, and DaaS resource data collection.

To address Challenge 2—quantification of DaaS QoE—we chose several metrics to represent QoE. The principles used when choosing the metrics are as follows: (1) the metric needs to be highly correlated with QoE (i.e., degradation of the metric usually leads to degradation in QoE or vice versa), (2) the metric needs to be easy to monitor and collect using automatic monitoring tools, and (3) it is better for the metrics to come from various levels (e.g., the host level, VM instance level, and guest OS level). From principles (2) and (3), it is obvious that the metrics also need to be determined by taking into consideration Challenge 3—log-data availability.

To address Challenge 3—that the workload and performance log data may be difficult to collect—we designed a two-step ACAP mechanism. ACAP provides cheat sheets for the DaaS service provider, which are composed of two categories: intuitive log categories and alternative log categories. Taking an ACAP cheat sheet for DaaS performance (QoE) log collection as an example, based on principle (1) for QoE metrics, we enumerated the intuitive QoE log categories and alternative QoE log categories. Intuitive log categories are log data that intuitively show the performance level, such as DaaS user input response latency and guest OS application response time. However, these types of log data need to be collected from the guest OS, which has stricter limitations, and thus they are difficult to collect. The alternative log categories are log data that can indirectly show the DaaS performance level but have relatively lower restrictions and are easier to collect, such as CPU ready time and disk read/write latency. For instance, a longer CPU ready time indicates that the DaaS service is experiencing CPU resource drain, which may lead to DaaS QoE degradation. When designing IDM, we ask the DaaS service provider to check the intuitive logs' availability, and if the intuitive logs are unavailable, we propose alternative log-data categories for the DaaS service provider to represent the workload and performance level of the DaaS service.
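The two-step ACAP mechanism can be sketched as follows; the cheat-sheet contents and category names are illustrative assumptions based on the examples above, not the actual cheat sheets used in the study:

```python
# Hypothetical ACAP cheat sheet: each intuitive log category maps to
# alternative infrastructure-level categories (assumed names).
CHEAT_SHEET = {
    "user_input_response_latency": ["host_cpu_ready_time"],
    "app_response_time": ["disk_read_latency", "disk_write_latency"],
}

def acap_select(available_logs):
    """Step 1: check availability of each intuitive log category.
    Step 2: if unavailable, propose the available alternatives."""
    selected = {}
    for intuitive, alternatives in CHEAT_SHEET.items():
        if intuitive in available_logs:
            selected[intuitive] = [intuitive]
        else:
            selected[intuitive] = [a for a in alternatives if a in available_logs]
    return selected

# Guest OS logs unavailable; only infrastructure metrics can be collected.
print(acap_select({"host_cpu_ready_time", "disk_read_latency"}))
```

The selected categories then determine which log data the information-collection module gathers for model training.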

4.3.1.1 Hierarchical Workload-data-collection Module Enabled by ACAP.

Workload log data are essential for constructing performance-inference models, and the workload-collection module in the IDM framework was designed to collect both intuitive and alternative workload log data. The DaaS workload is produced by general DaaS users conducting operations on the guest OS, e.g., text editing, web browsing, and conferencing. Thus, to intuitively represent DaaS workload, it is necessary to collect logs about how many users are using text-editing software on the guest OS, how many users are attending web conferences from the guest OS, and so on. These logs need to be collected from each guest OS; thus, an agent in each guest OS is necessary. As mentioned in Challenge 3, in actual DaaS services, this collection approach is usually not suitable due to privacy restrictions as well as the consideration of the impact on DaaS users.

In accordance with ACAP, if the above-mentioned intuitive workload log data are not available due to the nature of the DaaS service, the IDM workload-data-collection module collects alternative workload data from the underlying DaaS infrastructure. When users conduct operations on the guest OS, the workload is processed by the underlying DaaS infrastructure. For instance, when users conduct video-editing operations in the guest OS, the workload is processed mainly by the underlying host's graphics processing unit/CPU and memory. When users conduct video-conferencing operations, the workload is processed by the underlying host's CPU and network interfaces. Thus, by collecting the workload data of the underlying DaaS infrastructure, including CPU workload, memory workload, disk I/O workload, and network interface workload, the DaaS workload-data-collection module provides alternative workload data to feed into the knowledge-abstraction module when collecting the intuitive workload log data from the guest OS is not possible. This enables the IDM framework to be applicable to various actual DaaS services with strict data-collection restrictions.

4.3.1.2 Comprehensive Performance-data-collection Module Enabled by ACAP.

As mentioned above, intuitive DaaS performance (QoE) log categories are logs that are able to directly measure users’ perceptions about the DaaS performance from the guest OS level. Representative intuitive performance log categories include user input response, application response, etc. However, collection of these metrics requires access to the guest OS, which is usually restricted in actual services.

When the intuitive logs are unavailable, the module collects the alternative performance logs in accordance with the ACAP results. The alternative performance log categories are infrastructure-level performance metrics that can be collected from the DaaS infrastructure and reveal performance that is highly correlated with user QoE. These metrics include host CPU ready time, disk read/write latency, and so forth. Longer host CPU ready time and disk read/write latency indicate that the DaaS service is experiencing resource drain, which may lead to QoE degradation. Thus, these log data can indirectly show the DaaS performance level. Based on this analysis and our investigations, the performance-data-collection module we implemented in this work collects the following metrics to represent DaaS performance: (i) CPU ready time of the host, (ii) disk read/write latency of the host, and (iii) user input response latency collected from the guest OS. Note that user input response latency is collected from a small number of randomly sampled users.

We implemented the log-data-collection modules by using available DaaS monitoring tools (e.g., vSphere) to collect the selected log categories in accordance with ACAP results instead of building them from scratch.

4.3.2 Knowledge Abstraction (Addressing Challenge 4).

As introduced in Section 4.1 and Figure 2, to design a resource in accordance with the QoE requirement, the IDM framework constructs performance-inference models to determine the dependency between the resource, workload, and performance (QoE) in the knowledge-abstraction module. The usable log-data categories to train the model are decided through the ACAP with the DaaS service provider.

Fig. 2.

Fig. 2. Overview of the IDM framework.

On the basis of the IDM framework and the ACAP results, we designed the explanatory (input) variables and outcome variables of these performance-inference models (Table 2). The DaaS infrastructure-level workload log data and resource-amount-configuration log data are used as the explanatory variables of the models. The workload log data we collected include the host CPU, memory, disk, and network usage metrics and the VM CPU, memory, disk, and network usage metrics (note that the VM workload data are collected from a random VM in each host as a sample to minimize the impact on the DaaS service). The resource-amount-configuration log data is the number of VMs on the host. For performance-data collection, the main performance data from the DaaS infrastructure are used as the outcome variables of the performance-inference models. The performance data include the host CPU ready time, host disk read/write latency, and user input response. Note that the user input response time is also collected from a random user's guest OS in each host as a sample to minimize the burden on the DaaS user.

Table 2.

Explanatory Variables of All Performance-Inference Models:
- Workload: Host CPU usage, Host disk usage, Host memory usage, Host network usage, Guest CPU usage, Guest disk usage, Guest memory usage, Guest network usage
- Resource amount: Number of VMs on host

Outcome Variables of Each Performance-Inference Model:
- Host CPU ready time (host-level performance)
- Disk read latency (host-level performance)
- Disk write latency (host-level performance)
- User input response latency (guest OS-level performance)

Table 2. Explanatory and Outcome Variable Design of Performance-Inference Models

We used a neural network regression model to train the performance-inference models using the collected log data. From preliminary experiments, the relationship between the explanatory and outcome variables of the model is neither linear nor polynomial, and neural-network-based models surpass other models (e.g., linear regression, polynomial regression) in performance-inference precision. To evaluate the precision of the performance-inference models, we used fivefold cross validation (CV). The log data were split into five sets randomly; in the first iteration, the first set was used to test the model, and the rest were used to train the model. In the second iteration, the second set was used as the testing set, whereas the rest served as the training set. This process was repeated until each of the five sets had been used as the testing set. The average precision of the five iterations was then calculated as the performance-inference precision of the model.
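The fivefold cross-validation procedure above can be sketched in a model-agnostic way; `train_fn` and `score_fn` are placeholders for the neural network training routine and the precision metric, which are not specified here:

```python
import random

def kfold_indices(n_records, k=5, seed=0):
    """Randomly partition record indices into k folds."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, train_fn, score_fn, k=5):
    """Average test-set precision over k train/test iterations:
    in iteration i, fold i is the testing set and the rest are training data."""
    folds = kfold_indices(len(records), k)
    scores = []
    for i in range(k):
        test = [records[j] for j in folds[i]]
        train = [records[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / k
```

In the study, `train_fn` would fit the neural network regression model and `score_fn` would compute the performance-inference precision on the held-out fold.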

To improve the performance-inference precision, we conducted feature selection for the explanatory variables by removing features that do not contribute to the improvement of the models’ performance-inference precision. We also conducted a grid search of neural network parameters including layers, neurons, optimizer, and activation function to determine the optimal hyper-parameter settings that maximize the precision. We will introduce them further in Section 5.1.
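A grid search over the hyper-parameters named above might be sketched as follows; the candidate values are illustrative assumptions, not the settings explored in the study:

```python
from itertools import product

# Assumed candidate values for the hyper-parameters named in the text.
GRID = {
    "layers": [1, 2, 3],
    "neurons": [16, 32, 64],
    "optimizer": ["adam", "sgd"],
    "activation": ["relu", "tanh"],
}

def grid_search(evaluate):
    """Try every hyper-parameter combination and return the one that
    maximizes performance-inference precision (as scored by evaluate)."""
    best, best_score = None, float("-inf")
    for combo in product(*GRID.values()):
        params = dict(zip(GRID.keys(), combo))
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

Here `evaluate` would train a model with the given parameters and return its cross-validated precision.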

4.3.3 Decision-Making and Implementation (Addressing Challenge 5).

As introduced in Section 4.1 and Figure 2, the DaaS operator inputs the intent to the IDM framework (i.e., the DaaS performance requirement for a given period). For example, a DaaS operator may specify the following intent through the user interface (e.g., GUI, API call): "Host-level performance requirement: host CPU ready time should be no longer than 500,000 ms and the disk read/write latency should be no longer than 100 ms from 9:00 to 12:00, 18 Jan. Mon (future time). Guest OS-level performance requirement: the user input response time should be no longer than 30 ms from 9:00 to 12:00, 18 Jan. Mon (future time)." The intent is passed to the decision-making and implementation module. In accordance with the period specified in the intent, this module calculates the expected DaaS workload amount and reads the candidate resource-amount configurations (i.e., the number of DaaS VMs per host) one by one from the resource-amount configuration database. The expected workload and resource-amount configuration are input to the performance-inference models to infer the performance for the given resource-amount configuration and expected workload. Next, if the inferred DaaS performance satisfies both the host-level and guest-OS-level performance requirements, the resource-amount configuration is output as a resource solution that meets the intent. There may be multiple resource-amount configurations that satisfy the intent, and another selection mechanism on the basis of cost and so forth can be executed to determine the optimal resource-amount configuration. Finally, the resource solution is also sent back to the DaaS operator for confirmation or manual adjustment if necessary. The operator can confirm/revise the resource decision and instruct the framework to implement the resource-amount configuration accordingly.
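The decision loop described above can be sketched as follows; the function names and stubs are illustrative assumptions standing in for the IDM modules:

```python
def design_resources(period, vm_options, expected_workload, infer, meets_intent):
    """Enumerate candidate VM counts per host and keep those whose inferred
    performance satisfies the host- and guest-OS-level requirements."""
    solutions = []
    for n_vms in vm_options:
        workload = expected_workload(period, n_vms)   # workload-feature abstraction
        inferred = infer(workload, n_vms)             # performance-inference models
        if meets_intent(inferred):
            solutions.append(n_vms)
    return solutions

# Toy stubs: workload grows with VM count, CPU ready time grows with workload,
# and the intent bounds CPU ready time at 500,000 ms.
sols = design_resources(
    "09:00-12:00",
    range(10, 101, 10),
    lambda period, n: 10 * n,
    lambda w, n: {"cpu_ready_ms": w * 1000},
    lambda perf: perf["cpu_ready_ms"] <= 500_000,
)
print(sols)  # [10, 20, 30, 40, 50]
```

A further selection step (e.g., by cost) could then pick one configuration from the returned solutions.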

However, when applying the process in an actual DaaS service, as mentioned in Challenge 5, the log data about the DaaS resource amount are of low variation, and due to the lack of the resource variation in the history log data, it is challenging for the performance-inference models to determine the dependency between the DaaS resource amount and performance.

In the dataset used in this study, the details of which are listed in Table 3, among 1,350 records of log data, the resource-amount log data (number of VMs per host) have only nine unique values and fall in a narrow range. In general, it is difficult for a regression model to determine the dependency between a feature variable x and an objective variable y when the variation of y is much larger than that of x. In our case, y is the DaaS performance (e.g., the host CPU ready time) and x is the resource configuration (i.e., the number of VMs per host). Thus, when the IDM framework enumerates the resource configurations (numbers of VMs per host) to be input to the performance-inference models, there is a high possibility that the inferred performance will be imprecise, and the resource amount (number of VMs) that meets the intent cannot be selected on the basis of the inference.

Table 3.

| Log Data Category | Metric | Counts | Unique Counts |
|---|---|---|---|
| Resource Configuration Log Data | Number of VMs per host | 1,350 | 9 |
| Workload Log Data | Host CPU usage | 1,350 | 1,182 |
| | Host disk usage | 1,350 | 1,348 |
| | Host network usage | 1,350 | 1,345 |
| Performance Log Data | Host CPU ready time | 1,350 | 1,311 |
| | Host disk read latency | 1,350 | 1,332 |
| | Host disk write latency | 1,350 | 1,262 |
| | User input response time | 1,350 | 96 |

Table 3. Counts and Unique Counts of Resource-Amount Configuration, and Partial Workload Log Data Used in This Study

According to our observation of the dataset, the host workload log data show relatively high variation. Taking the host CPU and disk workload (host CPU usage and host disk usage) as examples, the unique counts of the host CPU usage and host disk usage records are 1,182 and 1,348, respectively, out of 1,350 total records. Thus, compared with the number of VMs, the performance-inference models perform better in determining the dependency between the DaaS workload and performance. Another important fact is that the host workload is highly dependent on the number of VMs allocated to the host (i.e., the host CPU and disk usage increase as more VMs are allocated). This inspired us to represent the expected workload as a function of the number of VMs; when inferring the performance from this expected workload, the performance-inference models can determine the dependency between the number of VMs and performance more precisely by leveraging the dependency between the DaaS workload and performance.

On the basis of this consideration, we modified the decision-making and implementation module, as shown in Figure 3, by introducing a workload-feature-abstraction module that calculates the expected workload in accordance with the number of VMs.

Fig. 3.

Fig. 3. Modified decision-making and implementation module to address Challenge 5.

The design and implementation of the workload-feature-abstraction module are as follows. The task of this module is to calculate the expected workload amount at the host level for a given time \( t \) and number of VMs \( {N_{VM}}_j(t) \) at time \( t \).

Assume that the workload caused by a VM is a function of \( t \): (1) \( \begin{equation} w_{vm_i}\left( t \right) = h_{vm_i}\left( t \right), \end{equation} \) where \( w_{vm_i}(t) \) is the workload of \( vm_i \) at \( t \) and \( h_{vm_i}(t) \) is the workload function of \( t \).

The workload of the host is the sum of the workloads of the VMs on the host: (2) \( \begin{equation} w_{host_j}\left( t \right) = \sum_{vm_i \in host_j} h_{vm_i}\left( t \right), \end{equation} \) where \( w_{host_j}(t) \) is the workload of \( host_j \) at \( t \). Note that in our study, only the history log data of the host workload and the number of VMs per host from 1 week prior are available; thus, we need to derive \( w_{host_j}(t) \) from these available log data.

To represent \( w_{host_j}(t) \) in accordance with the number of VMs per host, we used the following two empirically grounded assumptions.

Assumption 1. A given DaaS user tends to conduct similar operations, thus producing a similar workload on the DaaS platform, during homogeneous periods. In other words, we can assume that a given DaaS user conducts similar operations and produces similar workload patterns and amounts every Monday morning, as long as there are no changes in work style. This assumption is based on interviews with DaaS operators and users. Our observation of log data on individual user workloads also supports it. For example, Figure 4 shows the CPU, memory, and disk workload produced by user A's operations for period 1 and the homogeneous period 2. The figure, together with samples from other DaaS users, shows that the workload patterns are quite close across homogeneous periods. Consequently, we can approximate the expected workload for a given period with the workload logs collected in the homogeneous period if no additional information is available.

Fig. 4.

Fig. 4. CPU, memory, and disk workload of homogeneous periods 1 and 2.

Assumption 2. Homogeneous users tend to have similar workload patterns for a given period. For example, two users A and B in the same department tend to conduct similar work, thus producing similar workload patterns. As with Assumption 1, this assumption is based on interviews and investigations with DaaS operators and users, and the workload log data also support it. For example, Figure 5 shows the CPU, memory, and disk workload produced by the operations of two homogeneous users, B and C, for a given period. The figure, together with samples from other DaaS users, shows that the workload patterns of homogeneous users in the same department are quite close for a given period. In other words, when the workload log data of user C are unavailable but those of homogeneous user B are available, we can use user B's workload log data to represent those of user C if no other additional information is available. Thus, when the history data of a given VM are not available, the workload function of that VM can be derived from the homogeneous users of that VM.

Fig. 5.

Fig. 5. CPU, memory, and disk workload of homogeneous users B and C.
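These two assumptions can be checked quantitatively by correlating workload traces. The sketch below uses Pearson correlation on invented CPU-usage samples for two homogeneous users; the traces and the correlation threshold are illustrative assumptions, not values from the dataset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length workload traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented CPU-usage traces (one sample per 5-minute interval) for two
# homogeneous users; real traces would come from the workload log data.
user_b = [12, 35, 40, 38, 20, 15]
user_c = [10, 33, 42, 36, 22, 14]
# A correlation close to 1 supports using user B's logs as a proxy for user C's.
```

The same computation applied to a user's trace and their own trace from the homogeneous period one week prior would test Assumption 1.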

Thus, in accordance with Assumption 1, we can rewrite Formula (1), the workload of VM \( vm_i \), as follows: (3) \( \begin{equation} w_{vm_i}\left( t \right) = h_{vm_i}\left( t \right) = h_{vm_i}\left( t - 1w \right). \end{equation} \)

Note that we assume the homogenous period for \( t \) is the same time 1 week prior \( ( {t - 1w} ) \) based on interviews and experience. The homogeneous time period can also be set to other values in accordance with actual conditions.

Then, in accordance with Assumption 2, we can derive the workload of VM \( vm_i \) on host \( host_j \) as follows: (4) \( \begin{equation} w_{vm_i}\left( t \right) = h_{vm_i}\left( t - 1w \right) = \frac{w_{host_j}\left( t - 1w \right)}{{N_{VM}}_j\left( t - 1w \right)}, \end{equation} \) where \( {N_{VM}}_j(t - 1w) \) is the number of VMs on \( host_j \) at \( t - 1w \).

Similarly, in accordance with Assumption 2, we can represent the workload on the host as follows: (5) \( \begin{equation} w_{host_j}\left( t \right) = \sum_{vm_i \in host_j} h_{vm_i}\left( t \right) = {N_{VM}}_j\left( t \right) \cdot h_{vm_i}\left( t \right), \end{equation} \) where \( {N_{VM}}_j(t) \) is the number of VMs on \( host_j \) at \( t \).

Finally, by substituting Formula (4) into Formula (5), we obtain the expected host workload for \( t \) as a function of the number of VMs on the host \( {N_{VM}}_j(t) \). Note that the history log data of the host workload \( w_{host_j}(t - 1w) \) and the number of VMs per host \( {N_{VM}}_j(t - 1w) \) from 1 week prior are available, as mentioned previously. (6) \( \begin{equation} w_{host_j}\left( t \right) = {N_{VM}}_j\left( t \right) \cdot h_{vm_i}\left( t \right) = \frac{{N_{VM}}_j\left( t \right)}{{N_{VM}}_j\left( t - 1w \right)} \cdot w_{host_j}\left( t - 1w \right). \end{equation} \) Therefore, the expected host workload \( w_{host_j}(t) \) (host CPU usage, disk usage, etc.) for the given \( t \) is derived as a function of the number of VMs \( {N_{VM}}_j(t) \) by the workload-feature-abstraction module.
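Formula (6) reduces to a one-line computation. The sketch below assumes the host workload is reported as a single scalar per interval; in practice the scaling is applied per workload metric (CPU usage, disk usage, etc.).

```python
def expected_host_workload(n_vm_t, n_vm_prev, w_host_prev):
    """Formula (6): expected host workload at time t, obtained by scaling
    the workload logged in the homogeneous period one week prior by the
    ratio of the planned VM count to the previous VM count."""
    return (n_vm_t / n_vm_prev) * w_host_prev

# Example: planning 45 VMs where 54 VMs produced 60% CPU usage last week
# yields an expected usage of (45 / 54) * 60 = 50%.
```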

Thus, the overall processing flow of the modified decision-making and implementation module is shown in Figure 6. First, the module enumerates the configurable numbers of VMs per host from low to high. Each candidate number of VMs is input to the workload-feature-abstraction module, which calculates the expected workload (expected host CPU usage, disk usage, etc.) in accordance with the number of VMs. The number of VMs and expected workload are then input to the performance-inference models, which are generated in the knowledge-abstraction module, and the performance is inferred. If the inferred performance meets the DaaS performance requirement, the number of VMs is recorded as one alternative for the recommended number of VMs. The maximum among the alternatives is then selected as the final recommended resource configuration; this selection mechanism can be modified in accordance with other resource cost policies and so forth. The recommended number of VMs is sent to the DaaS operator for confirmation or revision, and at the same time the recommended resource is embedded in resource-orchestration templates. Finally, the resource solution is implemented under the instruction of the DaaS operator.

Fig. 6.

Fig. 6. Overall processing flow of the decision-making and implementation module.
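The processing flow above can be sketched as follows. `infer_perf` stands in for the trained performance-inference models, and the toy linear model in the example is an assumption for illustration only, not the framework's actual model.

```python
def recommend_vm_count(candidates, n_vm_prev, w_host_prev, infer_perf, perf_limit):
    """Sketch of the modified decision loop: for each candidate VM count,
    derive the expected workload via Formula (6), infer the performance,
    and keep the largest count whose inferred performance meets the
    requirement. Returns None when no candidate satisfies the intent."""
    feasible = []
    for n in sorted(candidates):
        expected_workload = (n / n_vm_prev) * w_host_prev  # Formula (6)
        if infer_perf(n, expected_workload) <= perf_limit:
            feasible.append(n)
    return max(feasible) if feasible else None

# Toy stand-in for a trained model: performance degrades linearly with workload.
toy_model = lambda n, w: 10_000 * w

best = recommend_vm_count(range(40, 61), n_vm_prev=54, w_host_prev=60.0,
                          infer_perf=toy_model, perf_limit=600_000)
# With this toy model, 54 VMs is the largest count meeting the 600,000 ms limit.
```

Swapping `max` for a cost-aware selection over `feasible` implements the alternative selection mechanisms mentioned above.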


5 VALIDATION DATA AND RESULTS

We validated the IDM framework using the log data collected on an actual DaaS platform. In this section, we present the validation results including precision of the IDM framework's performance-inference models, which is one of the key factors for precise resource design (Section 5.1), and show the effectiveness of the IDM framework's resource design and discuss its advantages compared with the conventional human-based resource design approach (Section 5.2).

Table 4 provides the general information about the scale of the DaaS platform and that of the user population, as well as the software used for the platform. Note that further information about the hardware and software used in the DaaS platform is omitted due to confidentiality restrictions.

Table 4.

| Item | Value |
|---|---|
| No. of Computational Nodes (Hosts) | 1∼10 |
| Number of DaaS Users | 0∼500 |
| Hypervisor | VMware ESX |
| Guest OS | Windows 10 |
| Software on the Guest OS | Microsoft Office, web conferencing software, etc. |

Table 4. DaaS Platform Details

5.1 Evaluation of IDM Performance-Inference Models

As shown in Table 2, we selected the host CPU ready time, host disk read/write latency, and user input response time as the DaaS performance (QoE) metrics after taking data availability into consideration. In the knowledge-abstraction module, we trained the performance-inference models to infer the performance for a given workload and number of VMs on the host. As mentioned in Section 4.3.2, we used neural network regression for the models and conducted feature selection to remove redundant or non-informative input variables. We first conducted a coarse feature selection by calculating Spearman's rank coefficient between the potential input variables (i.e., the workload metrics from the host and guest) and the output variables; input variables with a low absolute Spearman's rank coefficient were removed. We then conducted a fine feature selection by omitting the remaining input variables one by one, constructing the model from those that remained, and checking whether the model's performance-inference precision decreased. If it did, the omitted input variable was identified as contributing to inference precision and kept as one of the input variables; otherwise, it was removed. The input variables identified as uninformative or redundant in our experiments included memory swap-in speed, memory swap-out speed, memory balloon size, and memory overhead consumption. To further improve the models' precision, we conducted a grid search for the optimal hyper-parameters (number of layers, number of neurons per layer, activation function, and optimization function); the results are listed in Table 5. We also used a RandomNormal initializer (mean = 0, standard deviation = 0.1) for the initial model parameters, which were then updated during training.
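The coarse feature-selection stage can be sketched as below. The Spearman computation uses the tie-free rank formula, and the 0.3 cutoff and sample data are illustrative assumptions, not values from this study.

```python
def spearman(x, y):
    """Spearman's rank coefficient for tie-free samples:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def coarse_select(features, target, threshold=0.3):
    """Coarse feature selection: keep input variables whose absolute rank
    correlation with the target exceeds the threshold (0.3 is an assumed
    cutoff, not the paper's value)."""
    return {name: vals for name, vals in features.items()
            if abs(spearman(vals, target)) >= threshold}
```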

Table 5.

| Grid Search Target | Search Range | Optimal Search Result |
|---|---|---|
| No. of layers | 1–10 | 4 |
| No. of neurons of each hidden layer | 32, 64, 96, 128, 160, 192, 224, 256 | 224 |
| Activation function | Softmax, Sigmoid, tanh, ReLU | ReLU |
| Optimization function | Adam, SGD, Nadam, RMSprop | Adam |

Table 5. Grid Search of Hyper-Parameters for the Model's Performance-Inference Precision
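An exhaustive grid search over the ranges in Table 5 can be sketched as below. `evaluate` is a hypothetical stand-in for training a model and returning its validation error; the actual training loop is not reproduced here.

```python
import itertools

# Hyper-parameter grid mirroring Table 5.
grid = {
    "layers": range(1, 11),
    "neurons": [32, 64, 96, 128, 160, 192, 224, 256],
    "activation": ["softmax", "sigmoid", "tanh", "relu"],
    "optimizer": ["adam", "sgd", "nadam", "rmsprop"],
}

def grid_search(grid, evaluate):
    """Exhaustively evaluate every hyper-parameter combination and return
    the configuration with the lowest validation error."""
    keys = list(grid)
    best_cfg, best_err = None, float("inf")
    for combo in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        err = evaluate(cfg)
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

# Hypothetical scoring function whose minimum sits at Table 5's optimum;
# in the real pipeline this would train a model and return its validation MAPE.
evaluate = lambda c: (abs(c["layers"] - 4) + abs(c["neurons"] - 224) / 32
                      + (c["activation"] != "relu") + (c["optimizer"] != "adam"))
best_cfg, best_err = grid_search(grid, evaluate)
```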

5.1.1 Precision of the Performance-Inference Model for Host CPU Ready Time.

The host CPU ready time inference results of the five sets of the validation data are shown in Figure 7. The vertical axis is the inferred host CPU ready time, and the horizontal axis is the real (observed) host CPU ready time for the given workload and number of VMs on the host. The inference results when we used mean absolute percentage error (MAPE), mean square error (MSE), and mean absolute error (MAE) are in the first, second, and third rows, respectively. For each loss function, we can see that the data points converge to the diagonal line where the inferred and real performance are equal.

Fig. 7.

Fig. 7. Comparison of inferred CPU ready time and real CPU ready time for fivefold CR when applying different loss functions.

Table 6 lists the statistical results of the host CPU ready time inference errors. We used the average MAPE, average root mean square error (RMSE), and average MAE of the fivefold CR to evaluate the precision of performance inference. The model had a minimal average MAPE of 8.57% when we used MAPE as the loss function. Similarly, the model had a minimal average RMSE of 58,860 when we used MSE as the loss function and minimal average MAE of 36,920 when we used MAE as the loss function.

Table 6.

Table 6. Inference Errors for Host CPU Ready Time
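For reference, the three error metrics used throughout Section 5.1 can be computed as follows. This is a minimal sketch; the averaging over the fivefold CR sets is omitted.

```python
from math import sqrt

def mape(y_true, y_pred):
    """Mean absolute percentage error (%); undefined when any true value
    is 0, which is why it was not used for user input response time."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true) * 100

def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```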

5.1.2 Precision of the Performance-Inference Model for Host Disk Write Latency.

The host disk write latency inference results of the five sets of the validation data are shown in Figure 8. The vertical axis is the inferred host disk write latency, and the horizontal axis is the real (observed) host disk write latency for the given workload and number of VMs. The inference results when we used MAPE, MSE, and MAE are in the first, second, and third rows, respectively. For each loss function, we can see that the data points converge to the diagonal line where the inferred and real performance are equal.

Fig. 8.

Fig. 8. Comparison of inferred host disk write latency and real host disk write latency for fivefold CR when applying different loss functions.

Table 7 lists the statistical results of the inference errors for host disk write latency. We used the average MAPE, average RMSE, and average MAE of the fivefold CR to evaluate the performance-inference precision. The bold numbers in Table 7 represent the minimal average inference errors (MAPE, RMSE, and MAE) of the fivefold CR for the different loss functions. The model had a minimal average MAPE of 13.8% when we used MAPE as the loss function. Similarly, the model had a minimal average RMSE of 296.92 when we used MSE as the loss function and a minimal average MAE of 210.36 when we used MAE as the loss function.

Table 7.

Table 7. Inference Errors for Host Disk Write Latency

5.1.3 Precision of the Performance-Inference Model for Host Disk Read Latency.

The host disk read latency inference results of the five sets of the validation data are shown in Figure 9. The vertical axis is the inferred host disk read latency, and the horizontal axis is the real (observed) host disk read latency for the given workload and number of VMs. The inference results when we used MAPE, MSE, and MAE are in the first, second, and third rows, respectively. For each loss function, we can see that the data points basically converge to the diagonal line where the inferred and real performance are equal.

Fig. 9.

Fig. 9. Comparison of inferred host disk read latency and real host disk read latency for fivefold CR when applying different loss functions.

Table 8 lists the statistical results of the inference errors for host disk read latency. We used the average MAPE, average RMSE, and average MAE of the fivefold CR to evaluate the precision of performance inference. The model had a minimal average MAPE of 35.7% when we used MAPE as the loss function. Similarly, the model had a minimal average RMSE of 3,440 when we used MSE as the loss function and minimal average MAE of 2,297 when we used MAE as the loss function.

Table 8.

Table 8. Inference Errors for Host Disk Read Latency

5.1.4 Precision of the Performance-Inference Model for User Input Response Time.

The user input response time inference results of the five sets of the validation data are shown in Figure 10. The vertical axis is the real user input response time, and the horizontal axis is the inferred user input response time for the given workload and number of VMs. The results shown are for MSE as the loss function. We can see that, unlike the inference of the other performance metrics, the data points do not converge to the diagonal line where the inferred and real performance are equal; that is, the model could not infer the user input response time for the given DaaS workload and resource (number of VMs) within a tolerable error. The situation was similar when we used MAE and MAPE as the loss functions.

Fig. 10.

Fig. 10. Comparison of inferred user input response time and real user input response time for fivefold CR when applying different loss functions.

Table 9 lists the statistical results of the inference errors for user input response time. We used the average RMSE and average MAE of the fivefold CR to evaluate the precision of performance inference. We did not use MAPE because most of the real user input response times are 0 (also mentioned in Discussion 2 in the following); with MAPE, the percentage error would be infinite for any data point where the real user input response time is 0. The table shows that the model had an RMSE of 175.8 and an MAE of 20.76 when we used MSE as the loss function. Combining these results with Figure 10, we can see that, unlike the performance-inference models for CPU ready time and disk read/write latency, the inference precision for user input response time is insufficient for resource decisions. We discuss possible reasons for this and future actions to improve the precision below.

Table 9.

Table 9. Inference Errors for the User Input Response Time

Discussion 1: Choosing among loss functions from a practical standpoint. As mentioned earlier, for the performance-inference models for CPU ready time and disk read/write latency, applying MAPE, MSE, and MAE as the loss function minimizes the average inference errors MAPE, RMSE, and MAE, respectively. When choosing among these loss functions, we must consider how the inference results impact the resource design and user QoE. Taking the host CPU ready time inference results (Figure 7) as an example, in the higher range of real host CPU ready time, the inferred host CPU ready time tends to fall below the real value, especially for MAPE. In practice, the difference between the inferred and real host CPU ready time in the higher range is important, and the DaaS operator pays more attention to inference errors there, since they are more likely to lead to user QoE degradation. Comparing the three loss functions, MSE and MAE perform better than MAPE in minimizing this difference in the higher range. The reason is that MAPE takes the percentage error; for an equal absolute error between the inferred and real CPU ready time occurring in the higher range, its impact on back propagation is weaker for MAPE than for MAE and MSE. Similar results were observed in the inference results for disk read and write latency. Thus, we conclude that MSE and MAE are preferable loss functions for training the performance-inference models.
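The effect described above can be illustrated numerically; the error and CPU-ready-time magnitudes below are illustrative, not taken from the experiments.

```python
# The same absolute error of 50,000 ms contributes ten times less to a
# percentage-based (MAPE-style) loss at a high real CPU ready time than at
# a low one, while its absolute (MAE-style) contribution is unchanged.
abs_err = 50_000.0
low, high = 100_000.0, 1_000_000.0
mape_term_low = abs_err / low     # 50% relative error at the low magnitude
mape_term_high = abs_err / high   # only 5% relative error at the high magnitude
mae_term = abs_err                # constant regardless of magnitude
```

This is why a MAPE-trained model under-penalizes errors exactly where QoE degradation is most likely, in the high range.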

Discussion 2: Possible reasons for low inference precision of user input response time. We observed low precision for the inference of user input response time. The possible reasons for this are as follows.

(1)

Limited samples of user input response log data: As mentioned earlier, for each host, the user input response log data are collected from only a small number of guest OSs to minimize the impact on DaaS users. Thus, there is a high possibility that the collected log data cannot represent the average user input response of all VMs on the host.

(2)

Limited valid data: For 87% of the log data, the user input response time is 0 because the sampled users did not conduct any input operations most of the time. Only 13% of the records were valid data in which the user input response time was larger than 0, which is considered insufficient for training neural network regression models.

To improve performance-inference precision, we are considering collecting user input response time log data from more VM instances on each host to increase the number of samples and valid data records. Furthermore, if collecting more data is not possible due to restrictions, we plan to use other inference methodologies, such as anomaly detection, to address the lack of valid data.

5.2 Implementation and Evaluation of IDM Resource Design

We implemented the IDM framework's decision-making and implementation module using the performance-inference models introduced in Section 4.3.3, and implemented a GUI for DaaS operators through which they can specify their intent (performance requirements) concerning the host CPU ready time and disk read/write latency.

In Section 4.3.4, we introduced a workload-feature-abstraction module for the decision-making and implementation module to address the lack of resource-amount variation in the history log data, which makes it difficult for the performance-inference models to determine the overall dependency between DaaS resources and performance. We implemented this module, which calculates the expected workload in accordance with the number of VMs, so the IDM framework can infer the performance more precisely by using the determined dependency between DaaS workload and performance. To validate the module's effectiveness, we input the expected workload calculated in accordance with the number of VMs into the performance-inference models. Figure 11 plots the inferred CPU ready time (blue line) in accordance with the given number of VMs using the workload-feature-abstraction module and compares it with the real CPU ready time (red line: 5-minute interval; green line: 30-minute interval; orange line: 1-hour interval) on each of five hosts for 1 day. The performance-inference models could precisely infer the performance in accordance with the number of VMs by using the workload-feature-abstraction module, thus enabling precise recommendation of the number of VMs per host on the basis of the performance inference.

Fig. 11.

Fig. 11. Inferred CPU ready time in accordance with given number of VMs and its comparison with real CPU ready time on each host.

Figures 12 and 13 show how the IDM framework decides the recommended numbers of VMs for two different intents.

Fig. 12.

Fig. 12. IDM resource design result and expected performance inference for the recommended resource-amount configuration for Intent 1.

Fig. 13.

Fig. 13. IDM resource design result and expected performance inference for the recommended resource-amount configuration for Intent 2.

[Intent 1] “The DaaS host CPU ready time needs to be lower than 600,000 ms from 06:00 to 22:00 next Friday.”

As shown in Figure 12, the DaaS operator first specifies the intent through the GUI. On the same screen, the expected performance given the current number of VMs (54 VMs on the host, in the lower right of the screen) for the specified period is shown on the right. The operator can thus see that with the current configuration there are periods when the host CPU ready time is longer than 600,000 ms; that is, the current configuration does not meet the intent. Note that the expected performance for the current number of VMs is also calculated using the performance-inference models. As introduced earlier, the IDM framework then enumerates each configurable number of VMs, infers the performance, determines that the maximum number of VMs that meets the intent is 45, and feeds back the design result of 45 VMs (in the lower right of the screen) to the DaaS operator. The expected performance (host CPU ready time) for the recommended number of VMs is shown to the operator (on the right of the screen). By reducing the number of VMs to 45 as recommended by the IDM framework, the DaaS operator can prevent the DaaS performance requirements from being violated.

[Intent 2] “The DaaS host CPU ready time needs to be lower than 1,200,000 ms from 06:00 to 22:00 next Friday.”

As shown in Figure 13, the DaaS operator first specifies the intent through the GUI. On the same screen, the expected performance given the current number of VMs (54 VMs on the host) is shown on the right. The operator can see that the performance for the current configuration meets the intent. However, the expected host CPU ready time is much shorter than the threshold specified by the intent; thus, it is possible to allocate more VMs to the host to increase the resource efficiency while satisfying intent. The IDM framework then enumerates each configurable number of VMs, infers the performance, and finally determines that the maximum number of VMs that meets the intent is 60, and feeds back the design result of 60 VMs on the host to the DaaS operator. The expected performance for the recommended number of VMs is shown to the operator. By increasing the number of VMs from 54 to 60 as recommended by the IDM framework, the DaaS operator can increase resource efficiency while satisfying the intent.

Discussion 3: Effectiveness of the IDM framework compared with a conventional human-based resource-design approach for DaaS. From our investigation of related research (Section 2) and to the best of our knowledge, designing the resource amount (number of VMs per host) in accordance with DaaS QoE requirements using history log data has not been studied. Therefore, we compared the IDM framework with the conventional human-based resource-design approach currently used in the cloud industry (introduced in Section 3) regarding time and human-resource costs and resource design results (Table 10).

Table 10.

| Aspect | Criterion | IDM Framework | Conventional Approach |
|---|---|---|---|
| DaaS-Resource-Design-Time Cost | | Seconds-minutes | Hours-days |
| Human-Resource Cost | Human experience with cloud design | Not necessary | Necessary |
| DaaS-Resource-Design Results | Able to determine all resource-amount configurations that meet requirements | Yes | Difficult |
| | Optimal resource design | Yes | Difficult |

Table 10. Effectiveness of the IDM Framework Compared with a Conventional Human-Based Resource-Design Approach

For the time cost, the execution-time-benchmarking result shows that the IDM framework takes seconds to infer the performance for all configurable resource-amount configurations and determines the resource-amount configuration that meets the intent. For the conventional approach, the design time largely varies in accordance with the DaaS operator's skill level and other factors and usually takes hours to days to design the DaaS resource. Thus, the IDM framework largely surpasses the conventional manual approach regarding time cost by reducing resource design time from hours-days to seconds.

For human-resource cost, the conventional approach requires the DaaS operator to have sufficient experience about the cloud platform and the relationship among DaaS workload, resource, and performance, which leads to a high human-resource cost for the DaaS service provider in training experienced operators. The IDM framework enables autonomous DaaS resource design by using the knowledge (the performance-inference models) abstracted from history log data; thus, less human-resource cost is incurred in the resource-design process. The comparison shows that the IDM framework can reduce human-resource cost compared with the conventional approach. Note that to implement the IDM framework, interviews with DaaS operators are necessary to identify the resource-design objective, available log data, and so forth. However, training DaaS operators for resource design is necessary for the conventional approach, and the training process usually incurs much more human-resource cost than the time required to interview DaaS operators for the IDM framework.

For the resource design result, the IDM framework determines all the resource-amount configurations that meet the DaaS performance requirements, since it infers the performance for each available resource-amount configuration and checks whether the requirement is met with that configuration. This enables the operator to choose the optimal resource design result in accordance with price policy and so forth; for example, allocating the maximum number of VMs on the host that meets the intent. It would be challenging for the conventional approach to determine all resource-amount configurations that meet the DaaS performance requirement.

Compared with the conventional approach, the IDM framework shortens the resource-design time from hours-days to seconds and significantly reduces the human-resource cost by autonomizing the resource-design process using the knowledge abstracted from log data. It can also determine all resource-amount configurations that meet the intent (DaaS QoE requirement). Therefore, it enables the DaaS operator to choose the optimal resource design.


6 CONCLUSION AND FUTURE WORK

We aimed to address the challenge that designing cloud resources in accordance with various DaaS QoE requirements requires a high level of skill and experience, which increases human-resource and time costs for DaaS service providers. To address this challenge, we proposed an IDM framework that takes the DaaS QoE requirements as intent and autonomously calculates the resource-amount configuration that meets that intent. The IDM framework constructs models that infer the DaaS performance (QoE) for a given DaaS workload and resource-amount configuration; on the basis of the inference, it identifies the resource-amount configuration that meets the QoE requirements. To ensure the IDM framework's applicability to actual DaaS services, we collaborated closely with DaaS operators to identify the challenges in applying the framework to such services and addressed them through its detailed design: we identified the objective for DaaS resource design, defined QoE metrics to quantify DaaS QoE, designed an ACAP mechanism to address low log-data availability, designed the framework's DaaS performance-inference models, and designed a workload-feature-abstraction module to address the difficulty of calculating the resource amount precisely given the low resource variation in the log data.

We implemented the IDM framework and validated it using the log data collected on a DaaS platform. The performance-inference models achieved 8.57%, 13.8%, and 35.07% MAPE for the inference of the DaaS QoE (performance) metrics of host CPU ready time, disk write latency, and disk read latency, respectively.
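The accuracy figures above are mean absolute percentage error (MAPE), which can be computed from observed and inferred QoE values as follows. The data in the example is illustrative only, not from the paper's evaluation.

```python
# MAPE over observed (actual) vs. inferred (predicted) QoE values.

def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is 0."""
    return 100.0 / len(actual) * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Illustrative data: three observed latencies vs. model inferences.
error = mape([10.0, 20.0, 40.0], [11.0, 18.0, 40.0])
print(round(error, 2))  # 6.67
```

A lower MAPE means the inference model tracks the measured QoE metric more closely; the spread across the three metrics (8.57% for host CPU ready time up to 35.07% for disk read latency) reflects how predictable each metric is from workload and resource configuration.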

We also examined the IDM framework's effectiveness in designing resources. As mentioned in Section 5.2, the IDM framework can precisely determine the number of VMs needed to meet the intent (the DaaS performance requirement), avoiding QoE degradation while increasing cloud-resource efficiency. We compared the IDM framework with the conventional human-based resource-design approach and showed that it reduces the time and human-resource costs of resource design and improves the results by enabling optimal resource design.

In this study, the quantification of user QoE relied on consultation with DaaS operators and was highly restricted to the log-data categories collectable from actual services. We will therefore study how to autonomize the quantification of QoE for DaaS and other applications in future work.

REFERENCES

  [1] Gartner Newsroom. 2020. Gartner forecasts worldwide public cloud revenue to grow 6.3% in 2020. Gartner. Retrieved March 27, 2021 from https://www.gartner.com/en/newsroom/press-releases/2020-07-23-gartner-forecasts-worldwide-public-cloud-revenue-to-grow-6point3-percent-in-2020.
  [2] Nakhai P. H. and Anuar N. B. 2017. Performance evaluation of virtual desktop operating systems in virtual desktop infrastructure. In Proceedings of the 2017 IEEE Conference on Application, Information, and Network Security (AINS'17). 105–110.
  [3] Triyasona T. and Krathu W. 2017. The impact of screen size toward QoE of cloud-based virtual desktop. Procedia Computer Science 111 (2017), 203–208.
  [4] Jin Y., Zhu J., Bai H., Chen H., and Sun N. 2020. Design of virtual cloud desktop system based on OpenStack. In Advances in Intelligent Information Hiding and Multimedia Signal Processing. Springer, 393–401.
  [5] Li W., Sheng J., Yan Y., Zhang S., Deng X., and Huang W. 2019. The optimization of network performance evaluation method for virtual desktop QoE based on SPICE. In Smart City and Informatization. Communications in Computer and Information Science, Vol. 1122. Springer, 141–151.
  [6] Calyam P., Rajagopalan S., Seetharam S., Selvadhurai A., Salah K., and Ramnath R. 2014. VDC-Analyst: Design and verification of virtual desktop cloud resource allocations. Computer Networks 68 (2014), 110–122.
  [7] Calyam P., Patali R., Berryman A., Lai A. M., and Ramnath R. 2011. Utility-directed resource allocation in virtual desktop clouds. Computer Networks 55, 18 (2011), 4112–4130.
  [8] Kivity A., Kamay Y., Laor D., Lublin U., and Liguori A. 2007. KVM: The Linux Virtual Machine Monitor. Retrieved July 30, 2021 from https://www.kernel.org/doc/ols/2007/ols2007v1-pages-225-230.pdf.
  [9] Barham P., Dragovic B., Fraser K., Hand S., Harris T., Ho A., Neugebauer R., Pratt I., and Warfield A. 2003. Xen and the art of virtualization. SIGOPS Operating Systems Review 37, 5 (2003), 164–177.
  [10] Strassner J. n.d. Experiential Networked Intelligence (ENI) System Architecture (work in progress). Draft ETSI GS ENI 005 V2.0.30. Retrieved July 30, 2021 from https://portal.etsi.org/webapp/WorkProgram/Report_WorkItem.asp?WKI_ID=58576.
  [11] ETSI. n.d. Experiential Networked Intelligence (ENI); InTent Aware Network Autonomicity (ITANA). ETSI GR ENI 008 V2.1.1. Retrieved July 30, 2021 from https://www.etsi.org/deliver/etsi_gr/ENI/001_099/008/02.01.01_60/gr_ENI008v020101p.pdf.
  [12] TM Forum. n.d. Autonomous Networks: Empowering Digital Transformation for Smart Societies and Industries. Retrieved July 30, 2021 from https://www.tmforum.org/resources/whitepapers/autonomous-networks-empowering-digital-transformation-for-smart-societies-and-industries/.
  [13] 3GPP. n.d. Telecommunication Management: Study on Scenarios for Intent-Driven Management Services for Mobile Networks (work in progress). 3GPP TR 28.812. Retrieved July 30, 2021 from https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3553.
  [14] 3GPP. n.d. Management and Orchestration: Intent-Driven Management Services for Mobile Networks (work in progress). 3GPP TR 28.312. Retrieved July 30, 2021 from https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3554.
  [15] ETSI. n.d. Zero-Touch Network and Service Management (ZSM); Closed-Loop Automation; Part 1: Enablers. DGS/ZSM-009-1 V1.1.1. Retrieved July 30, 2021 from https://www.etsi.org/deliver/etsi_gs/ZSM/001_099/00901/01.01.01_60/gs_ZSM00901v010101p.pdf.
  [16] OpenDaylight. n.d. Network Intent Composition. Retrieved March 27, 2021 from https://wiki.opendaylight.org/view/NetworkIntentCompositionUseCase.
  [17] ONOS. n.d. Intent Framework. Retrieved March 27, 2021 from https://wiki.onosproject.org/display/ONOS/Intent+Framework.
  [18] Wu C. and Shingo H. 2018. Intent-based service management. In Proceedings of the 2018 21st Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN'18).
  [19] Wu C., Horiuchi S., and Tayama K. 2019. A resource design framework to realize intent-based cloud management. In Proceedings of the 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom'19).
  [20] OpenStack. n.d. Home Page. Retrieved July 30, 2021 from https://www.openstack.org/.


Published in ACM Transactions on Internet Technology, Volume 22, Issue 4 (November 2022), 642 pages. ISSN: 1533-5399. EISSN: 1557-6051. DOI: 10.1145/3561988.

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History: Received 31 March 2021; revised 20 August 2021; accepted 27 September 2021; online AM 23 March 2022; published 14 November 2022.


Qualifiers: research-article, refereed.
