VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances

Video streaming constitutes 65 % of global internet traffic, prompting an investigation into its energy consumption and CO2 emissions. Video encoding, a computationally intensive part of streaming, has moved to cloud computing for its scalability and flexibility. However, the energy consumption of cloud data centers, especially for video encoding, poses environmental challenges. This paper presents VEED, a FAIR Video Encoding Energy and CO2 Emissions Dataset for Amazon Web Services (AWS) EC2 instances. The dataset also contains the duration, CPU utilization, and cost of the encoding. To prepare this dataset, we introduce a model and conduct a benchmark to estimate the energy and CO2 emissions of different Amazon EC2 instances during the encoding of 500 video segments with various complexities and resolutions using Advanced Video Coding (AVC) and High-Efficiency Video Coding (HEVC). VEED and its analysis can provide valuable insights for video researchers and engineers to model energy consumption, manage energy resources, and distribute workloads, contributing to the sustainability and cost-effectiveness of cloud-based video encoding. VEED is available at https://github.com/cd-athena/VEED-dataset.


INTRODUCTION
Global greenhouse gas (GHG) emissions cause severe climate change and environmental warming, impacting ecosystems and human well-being [24]. In particular, Internet data traffic contributes approximately 3.7 % of GHG emissions, which is comparable to the global airline industry [9]. A significant portion of this impact is attributed to video streaming, which accounts for 65 % of global Internet data traffic [27].
Video encoding represents a critical stage in video streaming, characterized by its computational intensity and energy consumption. Recognizing the importance of the encoding process, cloud computing has emerged as a preferred framework for video encoding [2,11], primarily due to its scalability [20]: computing resources can be seamlessly adjusted to handle fluctuating workloads. Furthermore, cloud computing offers flexibility [20], allowing companies to dynamically scale their operations in response to demand and specific workload requirements. Cloud service providers extend computing power on demand, allowing businesses to fine-tune their operations as needed. However, cloud computing consumes significant energy. Data centers, the backbone of cloud services, account for approximately 3 % of global electricity consumption [15,18]. Notably, video encoding stands out as one of the most energy-intensive tasks performed within these data centers. Therefore, it is essential to address the energy consumption of video encoding in the cloud because it is a significant contributor to the environmental impact of cloud computing.
AWS [4], as an example of a cloud provider, rents out instances on demand and offers instances tailored to video encoding scenarios. The careful selection of instances for the wide range of encoding variants plays a crucial role in determining the overall encoding energy. In modern data center servers (without GPU), the CPU and memory are identified as the two largest power consumers [26].
To contribute to this area, our paper proposes a model for estimating CPU energy consumption during video encoding. Subsequently, we conduct a benchmark involving the encoding of 500 video segments [5] on six types of AWS EC2 instances across different regions. We gather the resulting energy and CO2 emissions in the Video Encoding Energy and CO2 Emission Dataset (VEED) presented in this paper. We also provide a detailed analysis of the dataset in terms of energy, time, CPU utilization, cost, and CO2 emissions. This paper provides valuable insights and can be used to create recommendations for optimizing video streaming workflows, enhancing efficiency, and ultimately reducing environmental impact and costs.
The core of the paper is VEED, a FAIR-compliant dataset detailing the energy consumption and CO2 emissions of video encoding on AWS EC2 instances [34]. To enhance Findability, the dataset is accompanied by a metadata description file. Accessibility is ensured through the repository at https://github.com/cd-athena/VEED-dataset, from which users can easily download the dataset. The dataset is provided in standard CSV format to promote Interoperability and allow integration with analysis applications. The dataset's Reusability is facilitated by description files, enabling researchers to understand and leverage the data for diverse analytical purposes. By adhering to the FAIR principles, VEED is a valuable and accessible resource for the video streaming research community.
The remainder of the paper is organized as follows. We introduce a model in Section 2 to calculate the energy consumption and CO2 emissions for Virtual Machines (VMs). Section 3 outlines our testbed and the setup of the benchmark. Subsequently, the accuracy of the model is evaluated for VMs in Section 4.1, and the structure of the dataset is explained in Section 4.2. Section 5 presents our analysis using the dataset. Finally, we highlight potential applications in Section 6 and conclude the paper in Section 7.

MODEL
Inspired by Codecarbon [13] and Cloud Carbon Footprint [31], we introduce a model that calculates energy consumption based on key metrics that play a crucial role in video encoding [10,19,32], i.e., (i) the number of CPU and vCPU cores, (ii) the Thermal Design Power (TDP), (iii) variations in CPU utilization, and (iv) the duration of the encoding.
The energy consumption $E$ of the CPU is calculated as follows:
$$E = P \cdot t,$$
where $P$ is the power draw and $t$ is the time taken to encode a video segment.
The power $P$ of the CPU is the product of the processor's $TDP$ [16], i.e., the average power in watts that the processor draws when running at 100 % utilization, and the average utilization $U$ of the processor while encoding one video segment.
We define $N_{vCPU}$ as the number of vCPUs and $N_{core}$ as the total number of CPU cores.
The CO2 emission $M$ is the product of the energy used and the carbon intensity $I$ [14], i.e., $M = E \cdot I$. The CO2 intensity is the number of grams of CO2 emitted per kWh of energy consumed in each country.
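The model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and all numeric inputs are made up for the example. In particular, the text does not spell out how the vCPU and core counts enter the power term, so `core_share` is a hypothetical scaling factor that defaults to 1.

```python
def cpu_power_watts(tdp_watts: float, avg_utilization: float,
                    core_share: float = 1.0) -> float:
    """Estimate CPU power draw as TDP scaled by average utilization (0..1).

    core_share is a hypothetical factor (assumption) for the fraction of the
    physical CPU attributed to the instance; 1.0 means the whole CPU.
    """
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be between 0 and 1")
    return tdp_watts * avg_utilization * core_share


def energy_kwh(power_watts: float, duration_s: float) -> float:
    """Energy = power * time, converted from watt-seconds to kWh."""
    return power_watts * duration_s / 3_600_000.0


def co2_grams(energy_kwh_value: float, intensity_g_per_kwh: float) -> float:
    """CO2 emissions = energy * carbon intensity (grams of CO2 per kWh)."""
    return energy_kwh_value * intensity_g_per_kwh


if __name__ == "__main__":
    # Placeholder inputs, not values from the dataset.
    power = cpu_power_watts(tdp_watts=100.0, avg_utilization=0.8)
    energy = energy_kwh(power, duration_s=90.0)
    print(power, energy, co2_grams(energy, intensity_g_per_kwh=400.0))
```

Keeping power, energy, and emissions in separate functions mirrors the three steps of the model and makes it easy to swap in a different carbon intensity per country.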

BENCHMARK SETUP
We measured the CPU utilization during the benchmarks using psutil [25], a Python library that provides system utilization metrics. The CO2 intensity was obtained from Electricity Maps, which takes into account the energy production sources and imports of each country.
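As a rough sketch of how utilization could be averaged over the duration of one encode, the snippet below polls a sampler while a job is running. The helper names and the polling design are assumptions for illustration; only `psutil.cpu_percent(interval=...)` is an actual psutil call, and the benchmark script itself may measure utilization differently.

```python
from statistics import mean
from typing import Callable, List


def psutil_sampler(interval_s: float = 1.0) -> float:
    """Return system-wide CPU utilization in percent over one interval."""
    import psutil  # third-party; imported lazily so the rest runs without it
    return psutil.cpu_percent(interval=interval_s)


def average_utilization(is_running: Callable[[], bool],
                        sampler: Callable[[], float]) -> float:
    """Average the sampled CPU utilization (as a 0..1 fraction) while
    is_running() returns True. Returns 0.0 if no sample was taken."""
    samples: List[float] = []
    while is_running():
        samples.append(sampler())  # blocks for one sampling interval
    return mean(samples) / 100.0 if samples else 0.0
```

For example, with an encoder started via `proc = subprocess.Popen(cmd)`, one could call `average_utilization(lambda: proc.poll() is None, psutil_sampler)`. Passing the sampler as a parameter keeps the averaging logic testable without psutil installed.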
We conducted a benchmark in which 500 video segments [5] were encoded in HEVC and AVC at different resolutions. The benchmark was implemented as a Python 3.10 script and executed on different AWS EC2 instances.
For bare-metal scenarios, we employed Codecarbon 2.1.3 [13,14] to measure energy consumption. On bare-metal machines, Codecarbon can read the energy consumption from the Running Average Power Limit (RAPL) interface [8,17].
Video Sequence Dataset. We used the Video Complexity Dataset (VCD) [5], a collection of 500 lossless HEVC-encoded segments with diverse spatial and temporal complexities. Each segment is 5 s long with a resolution of 4K (i.e., 3840 × 2160) and a framerate of 23.98 fps.
Video Encoding. We encoded each video segment in six representations with different resolutions and bitrates, as shown in Table 2. We selected these representations in accordance with the Apple HLS authoring specification [6]. Specifically, we selected the highest HEVC bitrates specified for each resolution and applied them to both our HEVC and AVC encoding benchmarks. We encoded the video segments using FFmpeg 5.0.2 with libx265 [22,23] for HEVC and libx264 [21] for AVC. We used the medium preset for the encoding process.
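An encode of one representation could be launched roughly as follows. This is a hedged sketch: the exact FFmpeg options used in the benchmark are not given in the text, so the flag selection (scaling filter, target bitrate, dropped audio) and the helper name `build_ffmpeg_command` are assumptions, and the resolution and bitrate in the usage example are placeholders rather than values from Table 2.

```python
from typing import List

# Encoder libraries named in the text for each codec.
ENCODERS = {"hevc": "libx265", "avc": "libx264"}


def build_ffmpeg_command(src: str, dst: str, codec: str, width: int,
                         height: int, bitrate_kbps: int,
                         preset: str = "medium") -> List[str]:
    """Build an FFmpeg argument list for one representation (assumed flags)."""
    return [
        "ffmpeg", "-y",                     # overwrite output without asking
        "-i", src,                          # input segment
        "-vf", f"scale={width}:{height}",   # downscale to target resolution
        "-c:v", ENCODERS[codec],            # libx265 or libx264
        "-preset", preset,                  # medium preset, as in the text
        "-b:v", f"{bitrate_kbps}k",         # target video bitrate
        "-an",                              # drop audio (assumption)
        dst,
    ]


if __name__ == "__main__":
    # Placeholder rendition; pass the list to subprocess.run to execute it.
    print(" ".join(build_ffmpeg_command("in.mp4", "out.mp4", "hevc",
                                        1280, 720, 2400)))
```

Returning an argument list rather than a shell string avoids quoting issues when the command is passed to `subprocess.run`.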
Virtual Machine. We instantiated an Ubuntu VM using the multipass command-line interface (CLI) program [7] on an Ubuntu laptop equipped with an Intel Core i7-8700 CPU featuring 6 CPU cores and 12 threads (6 of them used for encoding), as well as 16 GB of RAM. The VM was assigned 6 cores, 6 threads, and 5 GB of RAM, and ran the Ubuntu operating system.
AWS EC2 Instances. We selected six different AWS EC2 instances from the compute-optimized, memory-optimized, and general-purpose instance families for the benchmark (cf. Table 4). Most instances are from the compute-optimized category since the benchmark assessed CPU encoding performance, making c5.large, c5.xlarge, c5.2xlarge, and c5.9xlarge the optimal choices. For the memory-optimized and general-purpose families, we opted for their 2xlarge variants, specifically r5.2xlarge and m5.2xlarge, to enable a reasonable comparison of instance types [3]. Table 4 also details the CPU models, TDPs, the number of vCPU cores ($N_{vCPU}$), the number of physical cores of the CPU ($N_{core}$), the amount of RAM available to the instance, and the cost of running the instance for one hour.

ENCODING ENERGY BENCHMARK

Model validation
To evaluate the accuracy of the model, we conducted a benchmark on bare metal and on an Ubuntu VM. Since RAPL is supported on Ubuntu, Codecarbon was able to retrieve the energy consumption via RAPL [8] on bare metal [12,17]. The percentage difference between the average energy consumption measured by Codecarbon on bare-metal Ubuntu and the model's estimates for the Ubuntu VM is approximately 4.5 % on average for AVC and HEVC.
Relying on our model evaluation results, we extended our benchmark to study the energy consumption of video encoding on AWS EC2 instances in cloud environments, preparing VEED for this purpose. Next, we provide details about the dataset structure.

VEED structure
Figure 1 shows the structure of the dataset. It comprises dedicated folders for HEVC and AVC, each further organized into subfolders corresponding to different instance types. These instance type folders contain a data.csv file with the benchmark results and a description.txt file with information about the instance and its CPU model. The description file inside the dataset folder contains information about the CO2 intensities used to compare the CO2 emissions in different countries. Table 3 provides an overview of the fields in the dataset and describes their meaning. The same information can be found in the description.txt file in the dataset folder.
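Given this layout, the per-instance CSV files can be collected with the standard library alone. The sketch below assumes a `<codec>/<instance>/data.csv` nesting directly under the dataset root; the exact folder names and CSV column names are not reproduced here, so the function `load_veed` and the added keys `codec_folder` and `instance_folder` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_veed(root: str) -> List[Dict[str, str]]:
    """Read every <codec>/<instance>/data.csv under root into one row list.

    Each row keeps the CSV columns as strings and gains two keys taken from
    the folder names (assumed layout): codec_folder and instance_folder.
    """
    rows: List[Dict[str, str]] = []
    for csv_path in sorted(Path(root).glob("*/*/data.csv")):
        instance = csv_path.parent.name
        codec = csv_path.parent.parent.name
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["codec_folder"] = codec
                row["instance_folder"] = instance
                rows.append(row)
    return rows
```

Tagging each row with its folder names lets later analysis group results by codec or instance type without re-reading the directory tree.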

VEED DATASET ANALYSIS
We define three scenarios that (i) analyze AWS EC2 instances' energy consumption, (ii) assess computing cost, and (iii) measure CO2 emissions.

AWS EC2 Energy Consumption Analysis
Codec. Figure 2 compares the average energy consumption of HEVC and AVC encodings at various resolutions on a c5.2xlarge instance. We selected c5.2xlarge as a representative instance because the other types show similar trends. The figure shows that HEVC consumes more energy than AVC across all resolutions, with energy consumption rising as the resolution increases. Table 5 shows that the difference in average energy consumption is small for low resolutions and becomes significant for high resolutions. For example, encoding at 360p using HEVC on the c5.2xlarge instance type consumes on average 2.73 % more energy than encoding it using AVC. On the other hand, encoding at 4K using HEVC consumes 51.42 % more energy than encoding it using AVC (see Table 5).
Resolution. Figure 3a shows the average CPU utilization compared to the energy consumption for the different resolutions for AVC on a single instance (c5.2xlarge). One trend we can observe from this figure is how different video resolutions affect a device's energy consumption and CPU utilization. The figure illustrates that higher video resolutions require more CPU resources and consume more energy than lower video resolutions. The figure also reveals a significant increase in energy consumption for the 4K resolution when CPU utilization reaches 80 %. The same analysis is applied to HEVC and shown in Figure 3b. Similarly, encoding 4K resolutions increases CPU utilization and consequently consumes more energy.
Encoding duration. In Figures 4a and 4b, we compare the time required to encode 4K video segments and the corresponding energy consumption across various instance types for AVC and HEVC, respectively. An observation is that a longer encoding time corresponds to a higher energy consumption for each instance. Additionally, a direct relationship exists between time and energy consumption. As video files become more complex, time and energy increase linearly. Another observation is that the c5.large instance, with only two vCPUs, is the slowest among our instance types, while the c5.9xlarge, with 36 vCPUs, is the fastest (see Table 4). The 2xlarge variants of the instance families take nearly the same amount of time to encode. Only r5.2xlarge takes slightly longer. This trend holds for both the HEVC and AVC encoding benchmarks.
CPU utilization. Figure 5 shows the average CPU utilization compared to the energy consumption for the 4K encodings. For the c5.9xlarge instance, the CPU utilization never exceeds 70 %, neither for HEVC nor for AVC encodings. The c5.xlarge instance consumes more energy than the 2xlarge variants. The remaining instance types are always between 70 % and 100 % utilization.
Profiling AWS EC2 instances. Figure 6 shows the total energy each instance consumed to encode all 500 segments in the resolutions from Table 2. For AVC, c5.2xlarge, m5.2xlarge, and c5.large consumed roughly the same amount of energy at around 0.755 kWh, while c5.9xlarge and r5.2xlarge consumed slightly more energy at 0.844 kWh, and c5.xlarge consumed the most at around 1.022 kWh. For HEVC, the results are similar; the only differences are that the r5.2xlarge instance consumes less energy than the m5.2xlarge instance and that all instances used more energy overall. The instances c5.2xlarge, r5.2xlarge, and c5.large consumed roughly 1.136 kWh, c5.9xlarge and m5.2xlarge consumed

Computing cost assessment
Figure 7 illustrates a comparison between the cost of encoding 4K resolution video segments and the corresponding energy consumption across various instance types. Notably, an increase in energy consumption translates to higher costs for each instance. Additionally, cost and energy consumption increase linearly as video files become more complex. Moreover, Figure 7 highlights that the c5.large instance is the most cost-effective, while the c5.9xlarge instance emerges as the most expensive choice for encoding all video files. The main reason is that the c5.large instance is the cheapest among our testbed instances, while the c5.9xlarge is the most expensive one (see Table 4). This pattern can also be observed for HEVC, except for some resolutions where the r5.2xlarge instance is more expensive. In conclusion, the choice of instance type significantly influences the cost and energy consumption of video segment encoding. Opting for a more efficient instance can effectively reduce both cost and energy consumption.
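The relation between encoding time and cost can be illustrated with a short sketch. It assumes that cost is the hourly instance price pro-rated linearly by encoding duration, which is a simplification and not necessarily how the dataset's cost field was derived; the function names, instance labels, prices, and durations are placeholders.

```python
from typing import Dict


def encoding_cost_usd(hourly_price_usd: float, duration_s: float) -> float:
    """Pro-rate an hourly instance price by the encoding duration (assumption)."""
    return hourly_price_usd * duration_s / 3600.0


def cheapest_instance(durations_s: Dict[str, float],
                      hourly_prices_usd: Dict[str, float]) -> str:
    """Return the instance whose pro-rated encoding cost is lowest."""
    costs = {
        name: encoding_cost_usd(hourly_prices_usd[name], duration)
        for name, duration in durations_s.items()
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Placeholder numbers: a slow, cheap instance vs. a fast, expensive one.
    durations = {"small": 100.0, "big": 10.0}
    prices = {"small": 0.1, "big": 2.0}
    print(cheapest_instance(durations, prices))
```

The example shows why a slower instance can still be the cheaper choice: the tenfold speed-up of the larger instance does not offset its twentyfold hourly price.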

AWS EC2 CO2 emissions analysis
CO2 impact factor. Figure 8 shows the average CO2 emissions produced in Germany, Austria, Sweden, Poland, South Africa, Taiwan, and Great Britain between 19:00 on 13 August 2023 and 18:00 on 14 August 2023. The CO2 emission data was retrieved from the Electricity Maps API. It can be seen that choosing a country with a low CO2 impact, for example, Austria or Sweden, can lead to significantly lower CO2 emissions compared to, for instance, Poland or South Africa. For example, the difference in CO2 emissions between Sweden, with the lowest CO2 emissions, and Poland, with the highest CO2 emissions, is 4336.36 %.
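The comparison across countries amounts to multiplying the same encoding energy by each country's carbon intensity and comparing the results. The following sketch shows this under stated assumptions: the region labels and intensity values are placeholders, not figures from the dataset, and the percentage is taken relative to the lower-emitting region.

```python
from typing import Dict


def emissions_by_region(energy_kwh: float,
                        intensities_g_per_kwh: Dict[str, float]) -> Dict[str, float]:
    """CO2 in grams per region for the same encoding energy."""
    return {
        region: energy_kwh * intensity
        for region, intensity in intensities_g_per_kwh.items()
    }


def percent_increase(low: float, high: float) -> float:
    """Relative increase of high over low, in percent."""
    return (high - low) / low * 100.0


if __name__ == "__main__":
    # Placeholder intensities in gCO2/kWh.
    grams = emissions_by_region(2.0, {"region_a": 50.0, "region_b": 500.0})
    print(grams, percent_increase(grams["region_a"], grams["region_b"]))
```

Because the energy term is identical for all regions, the relative difference depends only on the ratio of the carbon intensities.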

USAGE OF VEED
VEED is crucial in advancing research and practices related to energy-efficient video encoding in cloud computing environments.
Energy consumption modeling. VEED provides a valuable foundation for developing energy consumption models by thoroughly analyzing various encoding scenarios, parameters, and instance specifications.
Energy management and workload distribution. Leveraging VEED analysis, researchers can propose effective energy management methods and scheduling algorithms [1], strategically distributing workloads across various computing devices to significantly reduce overall energy consumption in cloud-based video encoding processes.
Cost savings. Understanding the energy efficiency of various video streaming approaches in cloud computing can lead to significant cost savings. Energy-efficient methods require fewer resources, resulting in lower operational costs for service providers and end-users.
Developing sustainable approaches. Findings from the VEED benchmark serve as a practical guide to enhance the eco-friendliness and efficiency of video encoding processes in cloud computing environments.

Figure 2: Average energy consumption of HEVC and AVC encodings at different resolutions on c5.2xlarge instance type.

Figure 3: Average CPU utilization vs. energy consumed for each input segment encoded in different resolutions on c5.2xlarge.

Figure 6: Total energy consumption of all video segments across instances for a) AVC and b) HEVC.

Figure 4: Duration of the encoding vs. energy consumed for each input segment encoded at 4K for each instance.

Figure 5: CPU utilization vs. energy consumed for each input segment encoded at 4K for each instance.

Figure 7: Comparing the cost of the instance to the energy consumed for each input segment encoded at 4K for each instance.

Table 2: Encoding ladder used for the benchmark.

Table 3: Explanation for the different fields in the dataset.

Table 5: Difference of average energy consumption for HEVC compared to AVC.