PyStream: Enhancing Video Streaming Evaluation

As streaming services become more commonplace, analyzing their behavior effectively under different network conditions is crucial. This is normally quite expensive, requiring multiple players with different bandwidth configurations to be emulated by a powerful local machine or a cloud environment. Furthermore, emulating a realistic network behavior or guaranteeing adherence to a real network trace is challenging. This paper presents PyStream, a simple yet powerful way to emulate a video streaming network, allowing multiple simultaneous tests to run locally. By leveraging a network of Docker containers, many of the implementation challenges are abstracted away, keeping the resulting system easily manageable and upgradeable. We demonstrate how PyStream not only reduces the requirements for testing a video streaming system but also improves the accuracy of the emulations with respect to the current state-of-the-art. On average, PyStream reduces the error between the original network trace and the bandwidth emulated by video players by a factor of 2-3 compared to Wondershaper, a common network traffic shaper in many video streaming evaluation environments. Moreover, PyStream decreases the cost of running experiments compared to existing cloud-based video streaming evaluation environments such as CAdViSE.


INTRODUCTION
Video streaming is by far the most data-intensive application on the Internet, accounting for approximately 65% of global network traffic [1]. Therefore, it is crucial to develop tools for bandwidth management, as any 1% increase in video streaming efficiency would save almost 1 billion TBs of Internet traffic [2]. Adaptive video streaming is the current state-of-the-art multimedia delivery mechanism, designed to optimize the viewer's experience by dynamically adjusting video quality based on real-time network conditions and device capabilities [3]. At its core, this technology leverages HTTP-based protocols and employs a segmented representation approach [4]. Video content is encoded into discrete segments, each representing a fraction of the entire video stream. Leveraging adaptive bitrate algorithms (ABRs), video players dynamically select each segment's appropriate representation (bitrate and resolution), targeting crisp, seamless playback. By constantly monitoring network metrics such as available bandwidth and latency, ABRs autonomously switch between representations, striking a balance between video quality and rebuffering [5].
Evaluating the performance of a video streaming pipeline (codec, ABR, etc.) requires monitoring many video playbacks under many combinations of network conditions, players, devices, bitrate ladders, etc. [6]. This comes at a potentially prohibitive cost unless the testbed used is efficient in terms of cost and accuracy. Such a testbed must emulate multiple clients, each with a different player and a dedicated network trace approximating a real streaming scenario. In this way, monitoring the behavior of streaming components can be parallelized, and performance can be evaluated efficiently. For example, CAdViSE [7] is a cloud-based adaptive video streaming evaluation framework for the automated testing of adaptive media players (see Figure 1). To run an experiment in CAdViSE with n players, we need to instantiate n EC2 machines and load a video player script (e.g., dashjs [8], shaka [9]) in each. Therefore, the evaluation cost of CAdViSE is proportional to the number of EC2 instances used. Moreover, in this context, adhering to realistic network traces is crucial. For example, an inefficient network traffic shaper may produce results that do not generalize well for an ABR in the real world, leading to sub-optimal performance and low user satisfaction. Wondershaper [10] is a network shaper used not only in CAdViSE but also in many other emulators that need to control network speed [11, 12]. Wondershaper shapes the network bandwidth by attaching directly to a computer's network interface card (NIC) and bottlenecking it to the required bandwidth using the built-in Linux tc command [13].

In this paper, we address the cost and accuracy of video streaming emulators and introduce PyStream, a simple yet powerful emulator tool that initiates and monitors multiple players on a physical machine or a virtual machine (e.g., AWS EC2 [14]). PyStream is a Docker-based emulator that, as Figure 2 shows, allows developers and researchers to run different numbers of players with various player types and network traces.
Section 2 introduces the current state-of-the-art landscape of network shapers and video streaming monitoring tools and emulators. Section 3 describes the current problems with network emulation and motivates our work. Section 4 delves into the specific design and implementation of PyStream, while Section 5 provides benchmarks and observations about PyStream's performance. Finally, Section 6 summarizes use cases and concludes our work.

RELATED WORK
With respect to the objectives we defined for PyStream, we have identified several tools that handle one or more steps of the traffic emulation pipeline.
Wondershaper [10] is a foundational network shaping tool that was released in 2002 and is still relevant today. Conceptually simple, it is a script that allows the user to limit the bandwidth of one or more network adapters. It uses Linux's iproute tc command [13] but greatly simplifies its operation. Mahimahi [15] is another network emulation and traffic shaping framework that works on the record-replay principle: HTTP traffic can be recorded, stored, and replayed under different network conditions. For this purpose, Mahimahi provides several tools, such as RecordShell and ReplayShell for recording and replaying traffic, DelayShell for emulating delays, and LinkShell for emulating packet loss. Trickle [16] is a user-space tool for rate limiting TCP connections in smaller, unmanaged networks. It utilizes the preloading functionality of the Unix dynamic loader to load its own socket library wrappers, which allows Trickle to delay and truncate socket I/O, and thereby limit bandwidth, without requiring administrator privileges. Dummynet [17] is a network emulation tool widely used in networking research. It includes a traffic shaper and a packet scheduler, enables the emulation of a whole range of network environments, can model delay and packet loss, is easy to integrate, and provides consistent results. With Mininet [18], users can create, configure, and connect virtual network nodes using lightweight containerization technologies. This enables the emulation of complex network topologies on a single computer, in line with Mininet's focus on SDN research, education, and testing. ns-3 [19] is a discrete-event network simulator targeted at research and education, allowing the modeling and simulation of computer networks with a focus on network protocols and scenarios. Sabre [20] is an open-source framework that emulates real-world conditions to evaluate ABRs via quality of experience (QoE) metrics. Sabre can also be used to test new ABR algorithms without learning the intricacies of implementing the production layer. AdViSE [21] is a framework that enables testing media players under various network conditions. It accomplishes this by creating software-defined networks that are then shaped into the desired network; this network is hosted on a physical machine so the players can access it. CAdViSE [7] improves on AdViSE by removing the system's physical deployment. This testbed can be operated on cloud-based systems such as Amazon AWS, making upscaling easier and standardizing the machines' performance. The architecture is continuously updated with new features and improvements; one such improvement is LLL-CAdViSE [22], which enables the testing of low-latency metrics.
Table 1 presents a quick overview of video streaming emulators and network traffic shaper tools.

PROBLEM DESCRIPTION AND MOTIVATION
As mentioned earlier, PyStream aims to tackle two main challenges in video streaming emulators: the cost of emulation/evaluation and the need for accuracy. As Figure 1 illustrates, the cost of CAdViSE escalates as the number of players increases. This increase occurs because an EC2 virtual machine has to be assigned to each player, pulling its image from Docker Hub. The limitation stems from Wondershaper [10], the traffic shaper used in CAdViSE, which can only run on a physical or virtual network interface. Note that Wondershaper has been used as the main traffic shaper module in many emulators (see Table 1).

Table 1: Overview of video streaming emulators and network traffic shaper tools (TS: traffic shaper, NE: network emulator, SE: SDN emulator, NS: network simulator, VSE: video streaming emulator).

Tool               Type     TS            Platform
Wondershaper [10]  TS       -             Linux
Trickle [16]       TS       -             Unix-like
Mahimahi [15]      NE, TS   -             Linux/Windows
Dummynet [17]      NE, TS   -             Unix-like
Mininet [18]       SE       iproute TC    Linux
ns-3 [19]          NS       iproute TC    Unix-like
Sabre [20]         VSE      Wondershaper  Linux
AdViSE [21]        VSE      Wondershaper  Linux
CAdViSE [7]        VSE      Wondershaper  Linux
PyStream           VSE      PyShaper      Linux/Windows
From an accuracy standpoint, we compare the original network trace with the bandwidth predicted across various network traces [7] and two ABR algorithms: L2A [23] and throughput-based [24]. As depicted in Figure 3, the bandwidth predicted by both ABR algorithms is inaccurate. This behavior could be attributed either to the inaccuracy of the ABR bandwidth prediction module or to the low performance of Wondershaper. While numerous studies have focused on optimizing the bandwidth prediction module in ABRs [25], the primary reason for this behavior is the low accuracy of Wondershaper's traffic shaping. The evaluation section demonstrates that ABRs can accurately predict the bandwidth when paired with an efficient traffic shaper. Therefore, in this paper, we introduce PyStream, a Docker-based video streaming emulator that includes an efficient Python-based network traffic shaper called PyShaper. Our system is designed not only to enhance the accuracy of traffic shaping but also to reduce the cost of streaming emulation.

PYSTREAM DETAILS
Figure 2 illustrates the conceptual architecture of PyStream, which can be executed on either a physical or a virtual (e.g., AWS EC2) machine. PyStream operates through Docker and consists of the following modules: coordinator, virtual reverse proxy (VRP), PyShaper, and the local database. Before describing the details of these modules, let us explain how PyStream can be configured to run various video streaming scenarios.
The system manager, which orchestrates the experiments, first defines the quartet <[PID], PT, NT, L>, where [PID] is the list of player IDs that should run the player type PT (e.g., dashjs [8], shaka [9], etc.) over the network trace NT, similar to the JSON format proposed in CAdViSE. Moreover, L is the duration of the emulation in seconds. We note here that PID can be a unique name for each player. When a player sends an HTTP request, its PID is embedded in the URL; thus, we can easily customize the manifest address for each player by adding the PID. The quartets are defined and stored in the local database by the system manager. The main responsibility of the coordinator module is to fetch a Docker image from Docker Hub, update it with the specified player script (PT), and run the container. The coordinator also instructs PyShaper to apply the assigned network traces (NT) to the players' traffic. Dockerized players are configured to connect to the VRP module to start streaming. The VRP acts as a man-in-the-middle between the players and the origin/CDN server. In the VRP, PyShaper adjusts the bandwidth for each player by introducing synthetic delay when sending HTTP packets, according to the determined network traces.
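For concreteness, the quartet configuration could be sketched as follows; the field names (players, player_type, network_trace, duration) and the use of a plain JSON file as the local database are assumptions made for this illustration, not PyStream's actual schema.

```python
import json

# One experiment entry per quartet <[PID], PT, NT, L> (field names are hypothetical).
experiment = [
    {
        "players": ["p1", "p2", "p3"],            # [PID]: unique player IDs
        "player_type": "dashjs",                  # PT: player script to load
        "network_trace": [1500, 800, 1200, 400],  # NT: bandwidth values in kbps, one per second
        "duration": 120,                          # L: emulation length in seconds
    },
    {
        "players": ["p4", "p5"],
        "player_type": "shaka",
        "network_trace": [3000, 2500, 500, 900],
        "duration": 120,
    },
]

# The system manager stores the quartets in the local database
# (modeled here as a plain JSON file for simplicity).
with open("experiment_config.json", "w") as f:
    json.dump(experiment, f, indent=2)
```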
Whenever a player sends an HTTP request, the VRP downloads the requested segment from the origin/CDN server. The VRP then calculates the synthetic time the segment should have taken to download, according to the player's network trace, and waits that long before forwarding the response to the player. To accurately follow the given network traces, a separate CPU thread updates the variables used for the bandwidth calculations of each player every second. Crucially, before setting the thread to sleep for the calculated delay, part of the segment must be sent to ensure that the TCP connection remains active and the player maintains an accurate bandwidth estimation. Once the thread's sleep period concludes, the remaining segment content is transmitted to the player (see Figure 4).
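A minimal sketch of this forwarding step, assuming a raw socket-like connection object and a hypothetical 4 KB "head" size; PyStream's actual VRP implementation may split the segment differently.

```python
import time

def forward_segment(conn, segment: bytes, synthetic_delay: float, head_bytes: int = 4096):
    """Sketch of the VRP forwarding step (names and the 4 KB head size are illustrative).

    A small head of the segment is sent immediately so the TCP connection stays
    active and the player's bandwidth estimator keeps receiving data; the thread
    then sleeps for the synthetic delay before sending the rest.
    """
    head, tail = segment[:head_bytes], segment[head_bytes:]
    conn.sendall(head)              # keep the connection alive during the wait
    time.sleep(synthetic_delay)     # delay computed from the player's network trace
    conn.sendall(tail)              # deliver the remaining segment content
```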
Algorithm 1 shows the logic for calculating the synthetic delay. To that end, it uses the following parameters:
Input.
• Player ID (PID): Uniquely identifies the player calling the algorithm.
• Segment Size (SS): The size of the segment currently requested by player PID.
• Network Trace (NT): The set of network trace values used for player PID.
• Streaming Time (ST): A pointer that tracks the current position inside NT for player PID.
Global Variables.
• Leftover Bandwidth (LeftOverBW): A two-dimensional array, with one row per player, that stores the unused bandwidth from previously requested segments.
Output.
• Synthetic Delay (d): The delay determined by the algorithm for player PID.

Let us demonstrate Algorithm 1 through a simple example. Consider a dockerized player with PID=1 requesting a segment of size SS=25 kb (see Figure 4), and a simple network trace NT={10, 20, 30} kbps. The first trace value (10 kbps) covers only part of the segment, and the second value (20 kbps) covers the remainder, so the unused portion of that second is stored in LeftOverBW[1], followed by setting RBW to 0. Since RBW is now 0, the while loop terminates in the next iteration, and d is returned in line 23.
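The following is a minimal Python sketch of the delay calculation as we understand it from the description and the worked example; the variable names follow the paper's notation, but the exact bookkeeping (e.g., how LeftOverBW is charged and carried over) is an assumption, not PyStream's actual Algorithm 1.

```python
def synthetic_delay(pid: int, ss: float, nt: list, st: int, leftover_bw: dict) -> float:
    """Approximate the per-segment delay for player `pid`.

    ss: segment size in kb; nt: network trace in kbps (one value per second);
    st: index of the current position in nt; leftover_bw: unused bandwidth
    carried over from the player's previous segments.
    """
    rbw = ss - leftover_bw.get(pid, 0.0)   # data still to be "paid for" by the trace
    leftover_bw[pid] = 0.0
    d = 0.0
    while rbw > 0:
        budget = nt[st % len(nt)]          # bandwidth available in the current second
        if budget >= rbw:
            # Segment fits in this second: charge only the fraction actually needed
            # and remember the surplus for the player's next request.
            d += rbw / budget
            leftover_bw[pid] = budget - rbw
            rbw = 0
        else:
            rbw -= budget                  # consume the whole second and move on
            d += 1.0
            st += 1
    return d

# Example from the text: SS = 25 kb, NT = {10, 20, 30} kbps.
# Under this sketch, the first second delivers 10 kb, the second covers the
# remaining 15 kb, leaving 5 kbps in LeftOverBW[1]; the returned delay is
# 1 + 15/20 = 1.75 s.
print(synthetic_delay(1, 25, [10, 20, 30], 0, {}))
```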

PERFORMANCE
In order to assess PyStream's performance, we set up an AWS EC2 instance of type c6a.32xlarge, which is equipped with the following components:
• CPU: 64 cores @ 3.6 GHz with hyperthreading, for a total of 128 threads
• RAM: 256 GB
• OS: Ubuntu 20.04 LTS
The machine's performance is monitored with the built-in Docker stats facility [26], and a custom script logs each player's load every second. To benchmark these results, the same tests are run on CAdViSE, emulating the same traces used in PyStream.
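The custom logging script is not described further; a minimal sketch of per-second container logging built on `docker stats --no-stream` could look as follows (the CSV layout and the one-hour logging window are arbitrary choices for this illustration, not the script used in the experiments).

```python
import csv
import subprocess
import time

# Illustrative per-second resource logger; not the script used in the paper.
FORMAT = "{{.Name}},{{.CPUPerc}},{{.MemUsage}}"
DURATION_S = 3600  # arbitrary logging window

with open("player_load.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "container", "cpu_percent", "mem_usage"])
    for _ in range(DURATION_S):
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", FORMAT],
            capture_output=True, text=True, check=True,
        ).stdout
        now = time.time()
        for line in out.strip().splitlines():
            name, cpu, mem = line.split(",", maxsplit=2)
            writer.writerow([now, name, cpu, mem])
        time.sleep(1)
```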

Accuracy Tests
The accuracy tests are performed to see how close the players' predictions come to the real network traces. For this, 20 players run for 120 seconds, and the average bandwidth predictions of the L2A and throughput-based ABRs are plotted in Figures 5 and 6, respectively. Around the average value, the range between the minimum and maximum player values is shown as an indication of variability. The figures highlight that the bandwidth predicted by the ABRs is both closer to the original network trace and much more consistent across players when PyShaper is used, compared to the experiments using Wondershaper.
For a broader evaluation of PyStream's performance, in the next scenario we use the Linux tc command [13] to shape the bandwidth for the players. As shown in Figure 7, PyStream efficiently emulates the bandwidth of the given network trace compared with the tc tool.

Load Tests
Having shown PyStream's accuracy across different ABRs and network traces, we then test PyStream's scalability, specifically how increasing the number of players impacts the accuracy. To this end, the experiment is repeated with 8, 16, 32, 64, and 128 players. These counts were chosen based on the EC2 machine's resources, to check PyStream's response when the number of threads needed for the players becomes equal to (64) and greater than (128) the number of physical cores (64). Figure 8 shows the results of this stress test, where the red dot corresponds to the mean absolute error (MAE), calculated as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|b_i - \hat{b}_i\right|,$$
where $b_i$ is the real bitrate for client $i$, $\hat{b}_i$ is the bitrate predicted by the player for client $i$, and $n$ is the number of players. The results confirm that, while the number of players is much lower than the number of physical cores, PyStream-caused deviations from the network traces remain small, and they increase exponentially (linearly in our log plot) up to the physical number of cores. Once the number of processes surpasses the number of cores, the performance appears to stabilize, but this is due to the machine being forced to freeze several threads. Therefore, results above this threshold should not be considered.
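For reference, the MAE above can be computed directly from the per-player real and predicted bitrates; a trivial helper might look like this (names and example values are illustrative).

```python
def mean_absolute_error(real_bitrates, predicted_bitrates):
    """MAE over n players: (1/n) * sum(|b_i - b_hat_i|)."""
    assert len(real_bitrates) == len(predicted_bitrates)
    n = len(real_bitrates)
    return sum(abs(b - b_hat) for b, b_hat in zip(real_bitrates, predicted_bitrates)) / n

# Example: real vs. player-predicted bitrates (kbps) for four clients.
print(mean_absolute_error([1500, 800, 1200, 400], [1400, 900, 1150, 500]))  # -> 87.5
```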

Resource Tests
In the final test, we investigate the connection between the machine's resources and PyStream's utilization of those resources. This is extrapolated from running the same experiments as above and logging each Docker container's (n players + 1 controller) resource utilization over time. Figure 9 shows a linear dependence of both CPU and RAM usage on the number of players, but not from the same containers: while CPU usage is negligible for the player containers but high for the controller, the opposite is true for RAM. This is because the controller is fundamentally a Python script that needs to run calculations for each player, while each player does not perform complex calculations but has to host a player and, crucially, store its streamed video segments. This is a fundamental point in favor of PyStream's architecture over CAdViSE: typical virtual machines' CPU count and RAM scale together, so a system that can leverage both from the same machine uses resources more efficiently than one that needs to spin up multiple machines of which only one dimension is fully utilized. As can be seen from Figure 9, the CPU count is the bottleneck of this system, while the required RAM is much lower than the physical limit of an EC2 machine with a sufficient number of cores. Therefore, both PyStream and CAdViSE need the same-sized machine to perform an experiment with the same number of players, and quantifying the resource saving is a simple matter of summing the costs of the EC2 machines required for CAdViSE's players. Figure 9 shows that, for 64 players, the total RAM utilization is around 7.5% of the 256 GB available, i.e., about 0.3 GB of RAM per player. This amount of RAM is already provided by the smallest EC2 instance type, t2.nano, which costs 0.6 cents/hour, for a total of 38 cents/hour. Given that a 64-core EC2 instance type such as c6a.32xlarge costs about 5 euros/hour, the savings in this experiment amount to around 8%. However, this calculation assumes that the t2.nano instance type is actually able to run a CAdViSE player instance and that CAdViSE's server requires computational power similar to PyStream's controller. Therefore, more experiments are necessary to quantify the cost comparison between the two systems, and possibly others, with precision.
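One back-of-the-envelope reading of the figures above, under the assumption stated in the text (CAdViSE would add one t2.nano per player on top of a large shared instance comparable to PyStream's c6a.32xlarge; prices are the approximate values quoted here, not current AWS pricing):

```python
# Rough cost illustration only; all prices are approximations taken from the text.
players = 64
t2_nano_per_hour = 0.006        # ~0.6 cents/hour per extra player instance
c6a_32xlarge_per_hour = 5.0     # ~5 per hour for the shared 64-core machine

extra_player_cost = players * t2_nano_per_hour            # ~0.38 per hour
relative_saving = extra_player_cost / c6a_32xlarge_per_hour
print(f"avoided cost: {extra_player_cost:.2f}/h (~{relative_saving:.0%} of the large instance)")
```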

CONCLUSION
In this study, we presented PyStream, a novel way to emulate network traces through a Docker environment, making it possible to run multiple simultaneous tests locally. We showed the limitations of current approaches, especially with regard to traffic shaping, and how PyStream significantly improves the accuracy of the emulations through PyShaper, an algorithm that calculates how long each segment should take to reach each player given their assigned network trace and keeps the player from receiving it beforehand. Furthermore, by combining the resource requirements of the different parts of a network emulator, PyStream is able to make full use of a single virtual/physical machine, reducing the cost of running experiments. While further exploration and more comprehensive tests could provide improvements, we showed that PyStream's approach is a valuable addition to the current landscape of network emulation.

Figure 1: CAdViSE architecture.
Figure 2: Conceptual architecture of PyStream.

Figure 4: An example of requesting a segment by a dockerized player in PyStream.

Algorithm 1: Calculating the synthetic delay for a requested segment.
Figure 5: Comparing the variation between the original network trace and the bandwidth predicted by the L2A ABR, using PyStream and Wondershaper to shape the network traffic across various network traces.
Figure 6: Comparing the variation between the original network trace and the bandwidth predicted by the Throughput ABR, using PyStream and Wondershaper to shape the network traffic across various network traces.

Figure 7: Comparing the variation between the original network trace and the bandwidth predicted by the Throughput ABR, using PyStream and the Linux tc command to shape the network traffic across various network traces.

Figure 9: Resource usage in relation to concurrent players.