Abstract
In HTTP Adaptive Streaming (HAS), videos are encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to the heterogeneity of network conditions, device attributes, and end-user preferences. Encoding the same video segment at multiple representations increases costs for content providers. State-of-the-art multi-encoding schemes improve the encoding process by utilizing encoder analysis information from already encoded representation(s) to reduce the encoding time of the remaining representations. These schemes typically use the highest bitrate representation as the reference to accelerate the encoding of the remaining representations. Nowadays, most streaming services utilize cloud-based encoding techniques, enabling a fully parallel encoding process to reduce the overall encoding time. The highest bitrate representation has a higher encoding time than the other representations. Thus, utilizing it as the reference encoding is unfavorable in a parallel encoding setup as the overall encoding time is bound by its encoding time. This article provides a comprehensive study of various multi-rate and multi-encoding schemes in both serial and parallel encoding scenarios. Furthermore, it introduces novel heuristics to limit the Rate Distortion Optimization (RDO) process across various representations. Based on these heuristics, three multi-encoding schemes are proposed, which rely on encoder analysis sharing across different representations: (i) optimized for the highest compression efficiency, (ii) optimized for the best compression efficiency-encoding time savings tradeoff, and (iii) optimized for the best encoding time savings. Experimental results demonstrate that the proposed multi-encoding schemes (i), (ii), and (iii) reduce the overall serial encoding time by 34.71%, 45.27%, and 68.76% with a 2.3%, 3.1%, and 4.5% bitrate increase to maintain the same VMAF, respectively, compared to stand-alone encodings. 
The overall parallel encoding time is reduced by 22.03%, 20.72%, and 76.82% compared to stand-alone encodings for schemes (i), (ii), and (iii), respectively.
1 INTRODUCTION
Motivation: The Moving Picture Experts Group (MPEG) has developed a standard called Dynamic Adaptive Streaming over HTTP (MPEG-DASH) [24] to meet the high demand for streaming high-quality video content over the Internet and overcome the associated challenges in HTTP Adaptive Streaming (HAS). The main idea behind HAS is to divide the video content into segments and to encode each segment at various bitrates and resolutions, called representations, as shown in Figure 1. These representations enable a continuous adaptation of the video delivery to the client’s network conditions and device capabilities. The increase in video traffic and improvements in video characteristics such as resolution, framerate [15], and bit-depth raise the need to develop a large-scale, highly efficient video encoding environment [6]. This is even more crucial for DASH-based content provisioning as it requires encoding multiple representations of the same video content.
Fig. 1. An example video representation’s storage in HAS. The input video is encoded at multiple resolutions and bitrates. \(B_{i,0}\) to \(B_{i, M_{i}-1}\) represent the target bitrates in descending order for the representations in the \((i)^{th}\) resolution, where \(M_{i}\) denotes the number of representations in the \((i)^{th}\) resolution.
High Efficiency Video Coding (HEVC) [26] is a standard video codec that significantly improves coding efficiency over its predecessor Advanced Video Coding (AVC) [27]. This improvement is achieved at the cost of significantly increased encoding time, which is a challenge for content and service providers. As various representations of the same video content are encoded at different bitrates or resolutions, the encoding analysis information from the already encoded representations can be shared to accelerate the encoding of other representations. Several state-of-the-art schemes [7, 9, 12, 17, 23, 28] first encode a single representation, called a reference representation. During this encoding, the encoder creates an analysis metadata file with information such as the slice-type decisions and the CU [18], PU, and TU partitioning, alongside the HEVC bitstream itself. The remaining representations, called dependent representations, read this metadata and reuse it to skip parts of the partitioning search, thus reducing the encoding time. With the emergence of cloud-based encoding services,\(^{1}\) video encoding is accelerated by utilizing an increased number of resources; i.e., with multi-core CPUs, multiple representations can be encoded in parallel.
When encoding multiple representations serially, i.e., one after the other, the overall serial encoding time \(\tau _{S}\) is the sum of the encoding times of all representations as shown in Equation (1): (1) \(\begin{equation} \tau _{S} = \sum _{i=0}^{N-1} \sum _{j=0}^{M_{i}-1} \tau _{B_{i,j}}, \end{equation}\) where N denotes the number of resolutions, \(M_{i}\) denotes the number of bitrate representations in the \((i)^{th}\) resolution, and \(\tau _{B_{i,j}}\) represents the time taken for encoding the representation \(B_{i,j}\), with \(i \in [0, N-1]\) and \(j \in [0, M_{i}-1]\). When encoding multiple representations in parallel, the overall parallel encoding time \(\tau _{P}\) is bounded by the encoding time of the representation with the highest encoding time. In the case of stand-alone encoding, the highest bitrate representation has the highest encoding time. Although the schemes utilizing the highest bitrate representation as the reference [1] reduce the encoding time for dependent representations, the overall parallel encoding time remains unchanged if all representations are encoded in parallel, because the reference encoding itself is not accelerated. This is represented in Equation (2), where \(\tau _{P}\) denotes the total time taken for parallel encoding: (2) \(\begin{equation} \tau _{P} = \max \limits _{i,j} (\tau _{B_{i,j}}); j \in [0, M_{i}-1], i \in [0, N-1]. \end{equation}\)
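Equations (1) and (2) can be illustrated with a minimal sketch. The per-representation encoding times below are made-up values chosen only for illustration, and the function names are hypothetical; the point is that accelerating the dependent representations lowers the serial time but leaves the parallel time untouched when the slowest (reference) encoding is unchanged.

```python
from typing import Dict, Tuple

# Encoding times in seconds, keyed by (resolution index i, bitrate index j).
# j = 0 is the highest bitrate representation. Values are illustrative only.
EncodeTimes = Dict[Tuple[int, int], float]


def serial_time(times: EncodeTimes) -> float:
    """Equation (1): representations are encoded one after the other."""
    return sum(times.values())


def parallel_time(times: EncodeTimes) -> float:
    """Equation (2): all representations run concurrently, so the slowest dominates."""
    return max(times.values())


standalone = {(0, 0): 40.0, (0, 1): 30.0, (1, 0): 120.0, (1, 1): 90.0}

# Hypothetical analysis sharing: only the dependent representations (j = 1) get faster.
shared = dict(standalone)
shared[(0, 1)] = 15.0
shared[(1, 1)] = 45.0

print(serial_time(standalone), serial_time(shared))      # serial time drops
print(parallel_time(standalone), parallel_time(shared))  # parallel time is unchanged
```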
In this article, the schemes are analyzed for both serial and parallel encoding environments. The term multi-rate is used when all representations are encoded at a single resolution but at different bitrates. The term multi-resolution is used when all representations are encoded at multiple resolutions. Multi-encoding is used when a single video is provided at various resolutions, and each resolution is encoded at different bitrates.
Contributions: In this article, a double-bound approach, which is based on our previous work [2], is used for fast CU depth estimation in multi-rate encoding. In this scheme, both the highest and the lowest bitrate representations are used as references to speed up the encoding of the intermediate bitrate representations. This article also introduces prediction mode and motion estimation heuristics to accelerate multi-rate encoding further. Novel multi-resolution encoder analysis sharing methods are presented to accelerate encoding in more than one resolution. Three fast multi-encoding schemes are introduced: (i) optimized for the highest compression efficiency, (ii) optimized for the best compression efficiency-encoding time savings tradeoff, and (iii) optimized for the highest encoding time savings. All the state-of-the-art and proposed schemes in this article are implemented and evaluated using the x265 HEVC open-source encoder\(^{2}\) for serial and parallel encoding environments, and the conclusions are presented.
Article organization: An overview of the schemes presented in this article is shown in Table 1. Section 2 briefly describes the background and related work, including the state-of-the-art schemes used for multi-rate encoding. Section 3 presents the proposed heuristics. Sections 4 and 5 introduce the proposed efficient schemes for multi-rate encoding and multi-encoding, respectively. In Section 6, the schemes are evaluated, and the corresponding experimental results are presented. Finally, Section 7 concludes the article.
| Scheme | Name | Abbreviation | Reference | Section |
|---|---|---|---|---|
| Multi-rate encoding | State-of-the-art Multi-Rate Encoding Scheme-1 | SMRES-1 | [23] | 2 |
| | State-of-the-art Multi-Rate Encoding Scheme-2 | SMRES-2 | Footnote 2 | 2 |
| | State-of-the-art Multi-Rate Encoding Scheme-3 | SMRES-3 | [8, 13] | 2 |
| | Efficient Multi-Rate Encoding Scheme-1 | EMRES-1 | [2] | 4.1 |
| | Efficient Multi-Rate Encoding Scheme-2 | EMRES-2 | [17] | 4.2 |
| | Efficient Multi-Rate Encoding Scheme-3 | EMRES-3 | [17] | 4.2 |
| | Efficient Multi-Rate Encoding Scheme-4 | EMRES-4 | - | 4.2 |
| Multi-encoding | State-of-the-art Multi-Encoding Scheme | SMES | [8, 13] | 2 |
| | Efficient Multi-Encoding Scheme-1 | EMES-1 | [17] | 5.1 |
| | Efficient Multi-Encoding Scheme-2 | EMES-2 | [17] | 5.2 |
| | Efficient Multi-Encoding Scheme-3 | EMES-3 | - | 5.3 |
Table 1. Overview of the Schemes Presented in This Article
2 BACKGROUND AND RELATED WORK
In HAS, the same video content is encoded at multiple representations, referred to as a bitrate ladder, to continuously adapt the video delivery to the user’s needs. Bitrate ladders are usually optimized per content over different dimensions, including bitrate [21], resolution [11, 16], framerate [15], and codec [20]. Encoding the same video content at multiple representations, however, significantly increases the encoding cost. Because the same content is encoded multiple times, the encoder analysis information from the already encoded representation(s) can be reused to speed up the encoding process of the remaining representations. In HEVC, frames are first divided into slices, and then they are further divided into square regions called Coding Tree Units (CTUs) [26], which are the main building blocks of HEVC. To encode each CTU, it is recursively divided into smaller square regions called Coding Units (CUs) (see Figure 2). Depth values from 0 to 3 are assigned to CU sizes from \(64\times 64\) to \(8\times 8\) pixels. Therefore, to find the best CTU partitioning, 85 CUs including one \(64\times 64\) CU, four \(32 \times 32\) CUs, sixteen \(16 \times 16\) CUs, and sixty-four \(8 \times 8\) CUs are searched.
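The count of 85 CUs follows directly from the quadtree structure: each depth level splits every CU into four. A short sketch of that arithmetic, assuming a \(64\times 64\) CTU and depths 0 to 3 as stated above (the function name is hypothetical):

```python
def cu_candidates(ctu_size: int = 64, max_depth: int = 3) -> dict:
    """Map each CU size to the number of CUs of that size inside one CTU."""
    # At depth d, the CU edge is halved d times and there are 4**d such CUs.
    return {ctu_size >> d: 4 ** d for d in range(max_depth + 1)}


counts = cu_candidates()
total = sum(counts.values())
print(counts)  # CU edge length -> number of CUs at that depth
print(total)   # number of CUs visited by an exhaustive depth search
```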
Fig. 2. In HEVC encoding, frames are divided into CTUs, and each CTU is then divided into CUs. Each CU is subdivided into PUs, and the prediction residuals of CUs are partitioned into TUs. The optimal CTU partitioning is found after an exhaustive search process through all CUs, PUs, and TUs.
Additionally, for both intra-picture and inter-picture prediction, each CU can be further subdivided into smaller blocks called Prediction Units (PUs) along the coding tree boundaries. The inter-prediction modes comprise Merge/Skip 2N \(\times\) 2N, Inter 2N \(\times\) 2N, Symmetric Motion Partition (SMP, including Inter 2N \(\times\) N and Inter N \(\times\) 2N), Asymmetric Motion Partition (AMP, including Inter 2N \(\times\) nU, Inter 2N \(\times\) nD, Inter nL \(\times\) 2N, and Inter nR \(\times\) 2N), and Inter N \(\times\) N. In contrast, the intra-prediction modes involve Intra 2N \(\times\) 2N and Intra N \(\times\) N. The best PU mode is selected according to all modes’ minimum rate-distortion cost (RD-cost). Furthermore, for transform coding of the prediction residuals, each CU can be partitioned into multiple transform blocks (TBs), the size of which can also vary from 4 \(\times\) 4 to 32 \(\times\) 32 pixels [26]. In general, finding the optimal partitioning for each CTU is time-consuming, given the possible CU, PU, and TU partitionings; multiple reference frame search processes for inter-coded CUs; and the actual motion estimation algorithm.
Schroeder et al. [23] propose a single bound approach for CU depth estimation denoted as State-of-the-art Multi-Rate Encoding Scheme (SMRES)-1.
Fig. 3. Encoder analysis flow diagram of SMRES-1 [23]. The gray arrow represents the single bound for CU estimation.
| Resolution | 540p (i = 0) | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Bitrate | 0.5 Mbps | 1.0 Mbps | 1.5 Mbps | 3.0 Mbps | 4.5 Mbps | 5.8 Mbps | 11.6 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) |
| Average | 94.62% | 96.31% | 95.76% | 91.27% | 93.29% | 94.13% | 90.23% | 89.68% | 90.73% |
Table 2. Statistics of CUs in the Bitrate Representations (cf. Figure 1) That Have Depth Values Lower Than the Highest Bitrate Representation (j = 0) for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
In an analysis reuse scheme of x265\(^{3}\) (
Amirpour et al. [1] consider all representations from the highest to the lowest bitrate representation as a potential, single reference to speed up the encoding of dependent representations. Using the median bitrate representation as the reference shows the best performance compared to using other representations as the reference encoding. De Praeter et al. [7] first encode the highest-quality representation in the highest resolution as the reference encoding. They predict encoding decisions for dependent representations by a random forest model. The variance of the transform coefficients, motion vector variance, and information about the block structure such as mean, variance, maximum, and minimum values of the co-located CU, PU, and TU block sizes are used as inputs to the random forest model. Grellert et al. [9] use the information from the highest-quality representation to speed up the encoding of other representations based on heuristic and learning-based methods. Random Forests were used as CTU depth upper-bound estimators to skip larger CUs in lower-quality representations. Yang et al. [28] identify CTUs that cover objects of interest in the highest-quality representation and allocate higher bitrates for lower-quality representations. Lindino et al. [12] inherit PU mode information from the highest-quality representation to accelerate the PU mode decision during the encoding of lower-quality representations. Gu et al. [10] utilize the high correlation between low- and high-quality representations along with information on low-quality representations to optimize and accelerate the rate control model of high-quality representations.
In this article, the multi-encoding scheme proposed in [8, 13] is considered as the State-of-the-art Multi-Encoding Scheme (SMES). It comprises the following steps:
(i) The highest bitrate representation of the lowest resolution is encoded.
(ii) The remaining representations of the lowest resolution are encoded using
(iii) As the multi-resolution approach (referred to as
(iv) The remaining bitrate representations of the next higher resolution are encoded using
(v) Repeat Step (iii) for the remaining resolution layers in ascending order.
The encoding analysis flow diagram of this scheme is shown in Figure 4.
Fig. 4. Encoder analysis flow diagram of SMES. The golden arrow denotes encoder analysis sharing using SMRES-3. The red line denotes the multi-resolution Mode-S.
3 PROPOSED HEURISTICS
This section presents heuristics that can be combined with multi-rate encoding approaches based on CU depth search optimization to improve the overall encoding speed without compromising the coding efficiency.
Prediction Mode Heuristics: If the CU is split to the CU size chosen in the highest bitrate representation, the following prediction mode heuristics are proposed for the remaining representations:
(i) As shown in Table 3, if
(ii) As observed in Table 4, if
(iii) It is observed in Table 5 that if any inter-prediction mode was selected as the best mode in the RDO process of the highest bitrate representation, it is less likely that any intra-prediction mode would be selected in the other representations. Hence, if any inter-prediction mode was chosen in the highest bitrate representation, RDO is skipped for intra-prediction modes.
(iv) As noticed in Table 6, the probability of PUs in intermediate bitrate representations choosing an intra-prediction mode is very high when the co-located PUs in the highest and lowest bitrate representations have selected an intra-prediction mode. Hence, if an intra-prediction mode was chosen for the highest and the lowest bitrate representations, RDO is evaluated for only Merge/Skip 2N \(\times\) 2N and intra-prediction modes in the intermediate representations.
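Heuristics (iii) and (iv) can be read as a filter on the list of PU modes passed to RDO. The sketch below is an illustration under stated assumptions, not the encoder's implementation: the mode names are hypothetical string labels for the modes listed in Section 2, Merge/Skip is counted as an inter-prediction mode, and heuristics (i) and (ii) are left out because their conditions are not fully stated here.

```python
from typing import List, Optional

# Hypothetical labels for the HEVC PU modes named in Section 2.
INTER_MODES = ["MERGE_SKIP_2Nx2N", "INTER_2Nx2N", "INTER_2NxN", "INTER_Nx2N",
               "INTER_2NxnU", "INTER_2NxnD", "INTER_nLx2N", "INTER_nRx2N",
               "INTER_NxN"]
INTRA_MODES = ["INTRA_2Nx2N", "INTRA_NxN"]


def candidate_modes(high_mode: str, low_mode: Optional[str] = None) -> List[str]:
    """Return the PU modes that still go through RDO in a dependent representation.

    high_mode: best mode of the co-located PU in the highest bitrate representation.
    low_mode:  best mode in the lowest bitrate representation, or None when it is
               not available (i.e., the PU is not in an intermediate representation).
    """
    if high_mode in INTRA_MODES and low_mode in INTRA_MODES:
        # Heuristic (iv): both bounds chose intra, so test only Merge/Skip and intra.
        return ["MERGE_SKIP_2Nx2N"] + INTRA_MODES
    if high_mode in INTER_MODES:
        # Heuristic (iii): the reference chose inter, so skip RDO for intra modes.
        return list(INTER_MODES)
    # No heuristic applies: fall back to the full mode list.
    return INTER_MODES + INTRA_MODES


print(candidate_modes("INTER_2NxN"))
print(candidate_modes("INTRA_2Nx2N", "INTRA_NxN"))
```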
| Resolution | 540p (i = 0) | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Bitrate | 0.5 Mbps | 1.0 Mbps | 1.5 Mbps | 3.0 Mbps | 4.5 Mbps | 5.8 Mbps | 11.6 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) |
| Average | 89.79% | 89.03% | 87.95% | 87.98% | 86.57% | 87.01% | 86.80% | 85.72% | 87.22% |
Table 3. Statistics of PUs in the Bitrate Representations (cf. Figure 1) That Chose Skip 2N \(\times\) 2N Mode When the Co-located PU in the Highest Bitrate Representation Had Chosen Skip 2N \(\times\) 2N Mode for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
| Resolution | 540p (i = 0) | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Bitrate | 0.5 Mbps | 1.0 Mbps | 1.5 Mbps | 3.0 Mbps | 4.5 Mbps | 5.8 Mbps | 11.6 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) |
| Average | 1.34% | 1.47% | 1.42% | 1.59% | 1.54% | 1.61% | 2.08% | 1.97% | 2.13% |
Table 4. Statistics of PUs in the Bitrate Representations (cf. Figure 1) That Chose AMP Mode When the Co-located PU in the Highest Bitrate Representation Had Chosen 2N \(\times\) 2N Mode for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
| Resolution | 540p (i = 0) | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Bitrate | 0.5 Mbps | 1.0 Mbps | 1.5 Mbps | 3.0 Mbps | 4.5 Mbps | 5.8 Mbps | 11.6 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) |
| Average | 3.78% | 3.39% | 4.06% | 3.85% | 3.72% | 3.97% | 4.07% | 4.31% | 4.23% |
Table 5. Statistics of PUs in the Bitrate Representations (cf. Figure 1) That Chose Intra-prediction Mode When the Co-located PU in the Highest Bitrate Representation Had Chosen Inter-prediction Mode for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
| Resolution | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|
| Bitrate | 1.0 Mbps | 1.5 Mbps | 4.5 Mbps | 5.8 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 2) | (j = 1) | (j = 2) | (j = 1) | (j = 2) | (j = 1) |
| Average | 89.76% | 87.31% | 86.38% | 85.32% | 87.35% | 89.97% |
Table 6. Statistics of PUs in the Intermediate Bitrate Representations (cf. Figure 1) That Chose Intra-prediction Mode When the Co-located PU in the Highest and Lowest Bitrate Representation Had Chosen Intra-prediction Mode for JVET Sequences Used in This Paper (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
Motion Estimation Heuristics: If the CU is split to the CU size chosen in the highest bitrate representation and the PU size is also the same as that of the highest bitrate representation, the following motion estimation heuristics are proposed for the remaining representations:
(i) It is observed in Table 7 that the probability of the dependent representations choosing the same reference frame as that in the highest bitrate representation is very high. Hence, the same reference frame is selected as the highest bitrate representation, and other reference frame searches are skipped in the dependent representations.
(ii) The Motion Vector Predictor (MVP) is set to be the Motion Vector (MV) of the highest bitrate representation.
(iii) The motion search range is decreased to a smaller window size if the MV of the highest bitrate representation and the MV of the lowest bitrate representation are close. The search range is determined to be the maximum difference between the x and y coordinates of the MVs.
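Heuristic (iii) can be sketched as follows. The text does not quantify when two MVs count as "close", so this sketch assumes the reduced window applies whenever it is smaller than the encoder's default search range; the default of 57 matches x265's default `merange`, and the function name is hypothetical.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (x, y) displacement


def reduced_search_range(mv_high: MotionVector, mv_low: MotionVector,
                         default_range: int = 57) -> int:
    """Search range for an intermediate representation, per heuristic (iii)."""
    dx = abs(mv_high[0] - mv_low[0])
    dy = abs(mv_high[1] - mv_low[1])
    # The window is the larger of the two coordinate differences ...
    proposed = max(dx, dy)
    # ... but never wider than the default range (assumed meaning of "close").
    return min(proposed, default_range)


print(reduced_search_range((12, -4), (9, 2)))   # MVs are close: small window
print(reduced_search_range((0, 0), (100, 2)))   # MVs are far apart: default window
```

A result of 0 means the two bounding representations agree on the MV, in which case only the predictor position would need to be evaluated.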
| Resolution | 540p (i = 0) | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Bitrate | 0.5 Mbps | 1.0 Mbps | 1.5 Mbps | 3.0 Mbps | 4.5 Mbps | 5.8 Mbps | 11.6 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) | (j = 3) | (j = 2) | (j = 1) |
| Average | 82.79% | 81.36% | 79.80% | 81.98% | 79.96% | 79.83% | 82.03% | 80.17% | 80.15% |
Table 7. Statistics of PUs in the Bitrate Representations (cf. Figure 1) That Chose the Same Reference Frame as That of the Co-located PU in the Highest Bitrate Representation for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
4 EFFICIENT MULTI-RATE ENCODING SCHEMES
This section introduces the proposed Efficient Multi-Rate Encoding Schemes (EMRES).
4.1 Efficient Multi-Rate Encoding Scheme (EMRES)-1
It is observed in Table 8 that the CU depth in the intermediate bitrate representations is highly likely to be between the CU depth of the highest and lowest bitrate representations. Thus, in this scheme, which is based on previous work [2], the information from the highest and the lowest bitrate representations is used to reduce the encoding time of the intermediate representations. First, the highest bitrate representation is encoded as a stand-alone encoding (i.e., independent of RDO of other representations). The CU depth information obtained from this encoding is used as the upper bound of the CU depth estimation process of the lowest bitrate representation similar to [23]. The remaining intermediate-quality representations are encoded with a double-bound approach for CU depth search and reference frame selection as shown in Figure 5(a). Hence, in the CU depth search of the intermediate bitrate representations, the possible depth-level search window is limited by the depth values of the highest and lowest bitrate representations, as shown in Equation (4): (4) \(\begin{equation} {[}d_{L}, d_{U}{]} = {\left\lbrace \begin{array}{ll} {[}0, 3{]}, & {B_{i,0}} \\ {[}0, {d_{i,0}}{]}, & {B_{i,M_{i}-1}} \\ {[}{d_{i,M_{i}-1}, d_{i,0}}{]}, & {B_{i,j}} \forall j \in {{[}1, M_{i}-2{]},} \end{array}\right.} \end{equation}\) where \(d_{L}\) and \(d_{U}\) denote the lower and upper bounds of the CU depth search window, and \(d_{i,M_{i}-1}\) and \(d_{i,0}\) denote the CU depths chosen in the lowest and highest bitrate representations of the \((i)^{th}\) resolution, respectively. In addition, other reference frame searches are skipped if both the highest and lowest bitrate representations have the same reference frame [2].
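The depth search window of Equation (4) can be expressed as a small lookup. This is a minimal sketch with hypothetical function and parameter names; `d_high` and `d_low` stand for \(d_{i,0}\) and \(d_{i,M_{i}-1}\) of the co-located CU.

```python
from typing import Optional, Tuple


def depth_window(j: int, num_bitrates: int,
                 d_high: Optional[int] = None,
                 d_low: Optional[int] = None,
                 max_depth: int = 3) -> Tuple[int, int]:
    """Return (d_L, d_U) for representation j of one resolution, per Equation (4).

    j = 0 is the highest bitrate, j = num_bitrates - 1 the lowest.
    """
    if j == 0:
        # Highest bitrate: stand-alone encoding, full depth search.
        return (0, max_depth)
    if j == num_bitrates - 1:
        # Lowest bitrate: upper-bounded by the depth of the highest bitrate.
        return (0, d_high)
    # Intermediate bitrates: bounded on both sides.
    return (d_low, d_high)


# Example: the co-located CU has depth 2 at the highest and depth 1 at the lowest bitrate.
for j in range(4):
    print(j, depth_window(j, num_bitrates=4, d_high=2, d_low=1))
```

Because the lowest bitrate representation is itself searched within \([0, d_{i,0}]\), the lower bound never exceeds the upper bound in this scheme.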
Fig. 5. Encoder analysis flow diagram of (a) EMRES-1, (b) EMRES-2, (c) EMRES-3, and (d) EMRES-4. The gray line denotes the upper bound for CU depth estimation, the blue line denotes the lower bound of CU depth estimation, and the purple line represents the upper bound for CU depth estimation coupled with the proposed heuristics.
| Resolution | 540p (i = 0) | 540p (i = 0) | 1080p (i = 1) | 1080p (i = 1) | 2160p (i = 2) | 2160p (i = 2) |
|---|---|---|---|---|---|---|
| Bitrate | 1.0 Mbps | 1.5 Mbps | 4.5 Mbps | 5.8 Mbps | 16.8 Mbps | 20.0 Mbps |
| | (j = 2) | (j = 1) | (j = 2) | (j = 1) | (j = 2) | (j = 1) |
| Average | 88.76% | 87.92% | 90.21% | 89.83% | 90.04% | 91.39% |
Table 8. Statistics of CUs in the Intermediate Bitrate Representations (cf. Figure 1) That Have Depth Values between Highest and Lowest Bitrate Representations for JVET Sequences Used in This Article (cf. Table 11) Encoded Using the veryslow Preset of x265 HEVC Encoder
4.2 Heuristics-based Efficient Multi-rate Encoding Schemes
This section introduces the proposed multi-rate encoding schemes based on the proposed prediction mode and motion estimation heuristics.
Efficient Multi-rate Encoding Scheme (EMRES)-2.
This combines the
Efficient Multi-rate Encoding Scheme (EMRES)-3.
This combines the
Efficient Multi-rate Encoding Scheme (EMRES)-4.
This scheme is tailor-made for parallel encoding environments, i.e., for encoding multiple bitrate representations of a single resolution in parallel. The lowest bitrate representation is encoded first. The CU depth search in the highest bitrate representation is lower-bounded by the CU depth values of the lowest bitrate representation. The remaining intermediate-quality representations are encoded with a double-bound approach for CU depth search and the proposed heuristics. Figure 5(d) shows the encoder analysis flowchart for this scheme.
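Read against Equation (4), the description above suggests the following depth search windows for EMRES-4. This is a reconstruction from the prose rather than a formula stated for the scheme; it reuses the symbols of Equation (4) and assumes that the lowest bitrate representation, being encoded first, searches the full depth range:

\(\begin{equation*} {[}d_{L}, d_{U}{]} = {\left\lbrace \begin{array}{ll} {[}0, 3{]}, & {B_{i,M_{i}-1}} \\ {[}{d_{i,M_{i}-1}}, 3{]}, & {B_{i,0}} \\ {[}{d_{i,M_{i}-1}, d_{i,0}}{]}, & {B_{i,j}}~\forall ~j \in {[}1, M_{i}-2{]} \end{array}\right.} \end{equation*}\)

Compared with Equation (4), the roles of the first two rows are swapped: the representation that is fastest to encode acts as the unconstrained reference, and the highest bitrate representation, which bounds the parallel encoding time in Equation (2), has its search restricted.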
5 EFFICIENT MULTI-ENCODING SCHEMES
In this section, multi-encoding schemes based on the proposed heuristics and multi-resolution approaches are introduced.
5.1 Efficient Multi-encoding Scheme (EMES)-1
In the first proposed multi-encoding scheme,
The encoding analysis flow diagram of this scheme is shown in Figure 6(a).
5.2 Efficient Multi-encoding Scheme (EMES)-2
The second proposed multi-encoding scheme is a small variation of
5.3 Efficient Multi-encoding Scheme (EMES)-3
The third proposed multi-encoding scheme is a small variation of
Fig. 7. Encoder analysis flow diagram of EMES-3. The red line denotes the multi-resolution Mode-S (explained in Section 2).
6 EVALUATION
This section first introduces the test methodology used in this article and then presents the experimental results.
6.1 Test Methodology
All schemes presented in this article are implemented using x265 v3.5\(^{6}\) with the veryslow preset and the Video Buffering Verifier (VBV) rate control mode. Psycho-visual optimizations and adaptive quantization are not used in the evaluation. For encoder analysis sharing in the considered schemes, a per-segment encoding analysis metadata file is generated by the reference representations along with the HEVC bitstream and shared with the dependent representations. All experiments are run on a dual-processor server with Intel Xeon Gold 5218R (80 cores, frequency at 2.10 GHz), which utilizes multi-threading optimizations [19] (i.e., Wavefront Parallel Processing (WPP) and frame-threading) of x265. ABR ladder encoding of four test sequences was run in parallel. In this configuration, the full multi-threading capabilities of x265 are utilized.
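For orientation, the configuration described above maps roughly onto x265 command-line options as sketched below. This is an assumption-laden illustration, not the exact command lines used in the evaluation: the file names, bitrates, and VBV values are placeholders, disabling psycho-visual optimizations is assumed to mean `--psy-rd 0 --psy-rdoq 0`, and the analysis reuse values are taken from Footnotes 4 and 5. A matching `--analysis-save-reuse-level` on the reference encode is likely needed as well; it is omitted because its value is not stated here.

```python
from typing import List, Optional


def x265_args(src: str, out: str, bitrate_kbps: int,
              vbv_maxrate_kbps: int, vbv_bufsize_kbit: int,
              analysis_save: Optional[str] = None,
              analysis_load: Optional[str] = None,
              load_reuse_level: int = 10,
              refine: bool = True,
              scale_factor: int = 0) -> List[str]:
    """Build an x265 argument list for one representation (y4m input assumed)."""
    args = ["x265", "--input", src, "--output", out,
            "--preset", "veryslow",
            "--bitrate", str(bitrate_kbps),
            "--vbv-maxrate", str(vbv_maxrate_kbps),
            "--vbv-bufsize", str(vbv_bufsize_kbit),
            # Assumed mapping of "no psycho-visual optimizations, no adaptive quantization".
            "--psy-rd", "0", "--psy-rdoq", "0", "--aq-mode", "0"]
    if analysis_save:
        # Reference representation: write the analysis metadata file.
        args += ["--analysis-save", analysis_save]
    if analysis_load:
        # Dependent representation: reuse the reference analysis.
        args += ["--analysis-load", analysis_load,
                 "--analysis-load-reuse-level", str(load_reuse_level)]
        if refine:
            args += ["--refine-intra", "4", "--refine-inter", "2", "--refine-mv", "1"]
        if scale_factor:
            # Multi-resolution reuse from a lower-resolution reference (Footnote 5).
            args += ["--scale-factor", str(scale_factor)]
    return args


# Placeholder file names and rate values, for illustration only.
reference = x265_args("seq_1080p.y4m", "ref.hevc", 5800, 5800, 11600,
                      analysis_save="ref.dat")
dependent = x265_args("seq_1080p.y4m", "dep.hevc", 4500, 4500, 9000,
                      analysis_load="ref.dat")
print(" ".join(reference))
print(" ".join(dependent))
```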
Video sequences from JVET [4], MCML [5], and SJTU [25] datasets are used, representing various types of contents (cf. Table 11). The discussion of the multi-encoding schemes in this article is restricted to the resolutions that are integer power-of-2 multiples of each other. As mentioned in Table 9, 12 representations are considered in the ABR ladder: three resolutions with four bitrates for each resolution (\(M_{0} = M_{1} = M_{2} = 4\)). The bitrates are chosen in compliance with the HTTP Live Streaming (HLS) specification.\(^{7}\) The lowest resolution is \(960 \times 540\), the intermediate resolution is \(1920 \times 1080\), and the highest resolution is \(3840 \times 2160\) pixels. The lower-resolution sources were generated from the original video source by applying bi-cubic scaling using FFmpeg.\(^{8}\)
The resulting encoding time, quality in terms of PSNR and VMAF,\(^{9}\) and achieved bitrate are compared for each test sequence. Since it is assumed that representations are displayed at the highest resolution, i.e., 2160p, all representations are scaled (bi-cubic) to 2160p to calculate VMAF and PSNR [6]. In the experimental results, \(\Delta T_{S}\) and \(\Delta T_{P}\) represent the cumulative encoding time savings for all bitrate representations compared to the stand-alone encoding in serial and parallel encoding scenarios, respectively; Bjøntegaard delta rates [3] \(BDR_{P}\) and \(BDR_{V}\) refer to the average increase in bitrate of the representations relative to that of the stand-alone encoding to maintain the same PSNR and VMAF, respectively. A positive BDR indicates a drop in coding efficiency of the proposed method compared to the stand-alone encoding, while a negative BDR represents a coding gain.
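The time-saving metrics can be made concrete with a short sketch. It assumes that \(\Delta T_{S}\) and \(\Delta T_{P}\) are relative reductions of \(\tau _{S}\) and \(\tau _{P}\) (Equations (1) and (2)) with respect to stand-alone encoding; the function names and the encoding times are illustrative, not measured values.

```python
from typing import Sequence, Tuple


def saving_percent(standalone: float, scheme: float) -> float:
    """Relative time saving of a scheme over stand-alone encoding, in percent."""
    return 100.0 * (standalone - scheme) / standalone


def delta_t(standalone_times: Sequence[float],
            scheme_times: Sequence[float]) -> Tuple[float, float]:
    """Return (delta_T_S, delta_T_P) for one bitrate ladder."""
    dt_s = saving_percent(sum(standalone_times), sum(scheme_times))  # serial
    dt_p = saving_percent(max(standalone_times), max(scheme_times))  # parallel
    return dt_s, dt_p


# Illustrative times (seconds), highest bitrate first. The scheme below keeps the
# highest bitrate representation as an unaccelerated reference.
standalone = [100.0, 80.0, 60.0, 40.0]
scheme = [100.0, 50.0, 35.0, 20.0]
dt_s, dt_p = delta_t(standalone, scheme)
print(round(dt_s, 2), dt_p)
```

In this example the serial saving is substantial while the parallel saving is zero, which is the situation described in Section 1 for schemes that use the highest bitrate representation as the reference.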
6.2 Experimental Results
In this section, the results of multi-rate encoding schemes and multi-encoding schemes are presented. Both serial and parallel encoding scenarios are analyzed.
6.2.1 Multi-rate Encoding.
Serial Encoding: Table 10 summarizes the encoding time savings (\(\Delta T_{S}\)) and BDR results, and Figure 8 presents them graphically.
Fig. 8. Comparison of (a) \(BDR_P\) and (b) \(BDR_V\) and \(\Delta T\) of multi-rate encoding schemes in serial encoding. The blue and orange points represent the state-of-the-art and proposed multi-rate encoding schemes, respectively. The red dashed line in each figure has a slope of \(\frac{BDR}{\Delta T_{S}}\) of SMRES-1, which shows the least BDR, while the black line in each figure has a slope of \(\frac{BDR}{\Delta T_{S}}\) of SMRES-3, which shows the highest BDR.
Parallel Encoding: As shown in Table 10, the encoding time savings (\(\Delta T_{P}\)) is 0% for
6.2.2 Multi-encoding.
Serial Encoding: As shown in Table 11, using the
Fig. 9. Comparison of (a) \(BDR_P\) and (b) \(BDR_V\) and \(\Delta T\) of multi-encoding schemes in serial encoding. The blue and orange points represent the state-of-the-art and proposed multi-encoding schemes, respectively. The gray dashed line in each figure has a slope of \(\frac{BDR}{\Delta T_{S}}\) of SMES, which shows the highest BDR.
Table 11. Results for SMES
Table 12. Results for the Proposed Multi-encoding Schemes Using veryslow Preset of x265
Parallel Encoding: As shown in Table 11, using the
Fig. 10. Comparison of (a) \(BDR_P\) and (b) \(BDR_V\) and \(\Delta T\) of multi-encoding schemes in parallel encoding. The black dashed line in each figure has a slope of \(\frac{BDR}{\Delta T_{P}}\) of SMES, which shows the highest BDR.
Figure 11 summarizes the BD-PSNR and BD-VMAF results of the proposed multi-encoding schemes. The maximum-quality degradation in terms of PSNR is observed for the
Fig. 11. BD-PSNR and BD-VMAF results of the multi-encoding schemes.
Fig. 12. Relative encoding time (in percentage) of all bitrate representations for the considered multi-encoding schemes. The encoding times are normalized by the stand-alone encoding time of the 25 Mbps representation.
Table 13 provides a comprehensive analysis of the encoding time savings and BDR results using the ultrafast preset of x265. The encoding time savings decreases as we use faster presets. Since the amount of RDO computations is already very low in the ultrafast preset, the scope of significant encoding time savings is also low. The difference between the BDR values compared to the results using the veryslow preset is negligible. Figure 13 shows an example frame from the Bunny test sequence encoded with
Fig. 13. An example frame from the Bunny test sequence encoded at 4.5 Mbps (1080p) using (a) SMES, (b) its zoomed patches, (c) EMES-3, and (d) its zoomed patches. SMES yields a bitrate of 4.955 Mbps for VMAF 90, while EMES-3 yields a bitrate of 4.400 Mbps for VMAF 89.
Table 13. Results for the Proposed Multi-encoding Schemes Using ultrafast Preset of x265
Since
7 CONCLUSIONS
This article presented efficient multi-encoding schemes for HTTP Adaptive Streaming deployments. The schemes are analyzed by integrating them into the open-source x265 HEVC encoder. The article first explores existing state-of-the-art schemes for multi-rate encoding and multi-encoding. Prediction mode heuristics and motion estimation heuristics are proposed for multi-rate encoding. The experimental results demonstrate that, on average, the proposed heuristics improve the overall encoding time savings of CU depth-bound-based schemes by 12%, with negligible reduction in compression efficiency. Another efficient scheme for multi-rate encoding is proposed, saving 20.74% of the overall encoding time in parallel encoding with a negligible increase in bitrate. The article then proposes novel multi-encoding schemes that extend the proposed multi-rate encoding schemes across resolutions, emphasizing the best compression efficiency, best compression efficiency-encoding time savings tradeoff, and best encoding time savings, respectively. The experimental results suggest that the multi-encoding schemes (i) optimized for the highest compression efficiency, (ii) optimized for the best compression efficiency-encoding time savings tradeoff, and (iii) optimized for the best encoding time savings reduce the serial encoding time by 34.71%, 45.27%, and 68.76%, respectively, with a 2.3%, 3.1%, and 4.5% respective increase in bitrate to maintain the same VMAF. In parallel encoding, the overall encoding time is reduced by 22.03%, 20.72%, and 76.82%, respectively. The proposed multi-encoding scheme for the best encoding time savings also yields the best compression efficiency-encoding time savings tradeoff.
Footnotes
1 https://bitmovin.com/introducing-cloud-connect-encoding-aws-gcp-azure/ (accessed June 30, 2022).
2 http://x265.org/ (accessed June 30, 2022).
3 analysis-load-reuse-level = 6 in https://x265.readthedocs.io/en/master/cli.html.
4 analysis-load-reuse-level = 10, refine-intra = 4, refine-inter = 2, refine-mv = 1 in https://x265.readthedocs.io/en/master/cli.html.
5 scale-factor = 2, analysis-load-reuse-level = 10, refine-intra = 4, refine-inter = 2, refine-mv = 1 in https://x265.readthedocs.io/en/master/cli.html.
6 http://x265.org/ (accessed June 30, 2022).
7 https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices (accessed June 30, 2022).
8 https://ffmpeg.org/ffmpeg.html (accessed June 30, 2022).
9 https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652 (accessed June 30, 2022).
Footnote
- [1] . 2021. Towards optimal multirate encoding for HTTP adaptive streaming. In MultiMedia Modeling: 27th International Conference (MMM’21), Proceedings, Part I. Springer-Verlag, Berlin, 469–480.
DOI: Google ScholarDigital Library
- [2] . 2020. Fast multi-rate encoding for adaptive HTTP streaming. In 2020 Data Compression Conference (DCC’20). 358–358.
DOI: Google ScholarCross Ref
- [3] . 2001. Calculation of average PSNR differences between RD-curves. In VCEG-M33.Google Scholar
- [4] . 2018. JVET-J1010: JVET Common Test Conditions and Software Reference Configurations.Google Scholar
- [5] . 2018. Subjective and objective quality assessment of compressed 4K UHD videos for immersive experience. IEEE Transactions on Circuits and Systems for Video Technology 28, 7 (2018), 1467–1480.
DOI: Google ScholarCross Ref
- [6] . 2016. Complexity-based consistent-quality encoding in the cloud. In 2016 IEEE International Conference on Image Processing (ICIP’16). 1484–1488.
DOI: Google ScholarCross Ref
- [7] . 2015. Fast simultaneous video encoder for adaptive streaming. In 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP’15). 1–6.
DOI: Google ScholarCross Ref
- [8] . 2018. Adaptive multi-resolution encoding for ABR streaming. In 2018 25th IEEE International Conference on Image Processing (ICIP’18). 1008–1012.
DOI: Google ScholarCross Ref
- [9] . 2021. Coding mode decision algorithm for fast HEVC transrating using heuristics and machine learning. J. Real-Time Image Process. 18, 6 (
Dec. 2021), 1881–1896.DOI: Google ScholarDigital Library
- [10] . 2018. Multi-representations encoding framework for adaptive http streaming. In 2018 25th IEEE International Conference on Image Processing (ICIP’18). 988–992.
DOI: Google ScholarCross Ref
- [11] . 2021. Efficient bitrate ladder construction for content-optimized adaptive video streaming. IEEE Open Journal of Signal Processing 2 (2021), 496–511.
DOI: Conference Name: IEEE Open Journal of Signal Processing .Google ScholarCross Ref
- [12] . 2021. Low-complexity HEVC transrating based on prediction unit mode inheritance. In 2020 28th European Signal Processing Conference (EUSIPCO’21). 550–554.
DOI: Google ScholarCross Ref
- [13] . 2020. Open source framework for reduced-complexity multi-rate HEVC encoding. In Applications of Digital Image Processing XLIII, and (Eds.), Vol. 11510. International Society for Optics and Photonics, SPIE, 461–471.
DOI: Google ScholarCross Ref
- [14] . 2021. Efficient content-adaptive feature-based shot detection for HTTP adaptive streaming. In 2021 IEEE International Conference on Image Processing (ICIP’21). 2174–2178.
DOI: Google ScholarCross Ref
- [15] . 2022. CODA: Content-aware frame dropping algorithm for high frame-rate video streaming. In 2022 Data Compression Conference (DCC’22). 475–475.
DOI: Google ScholarCross Ref
- [16] . 2022. OPTE: Online per-title encoding for live video streaming. In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’22). 1865–1869.
DOI: ISSN: 2379-190X .Google ScholarCross Ref
- [17] . 2021. Efficient multi-encoding algorithms for HTTP adaptive bitrate streaming. (2021), 1–5.
DOI: Google ScholarCross Ref
- [18] . 2021. INCEPT: Intra CU depth prediction for HEVC. In 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP’21). 1–6.
DOI: Google ScholarCross Ref
- [19] . 2014. A multi-threaded full-feature HEVC encoder based on wavefront parallel processing. In 2014 International Conference on Signal Processing and Multimedia Applications (SIGMAP’14). 90–98.
DOI: Google ScholarDigital Library
- [20] . 2019. Optimal multi-codec adaptive bitrate streaming. In 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW’19). 348–353.
DOI: Google ScholarCross Ref
- [21] . 2021. Per-clip and per-bitrate adaptation of the Lagrangian multiplier in video coding. In Applications of Digital Image Processing XLIV, Vol. 11842. SPIE, 185–194.
DOI: Google ScholarCross Ref
- [22] . 2018. Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming. IEEE Transactions on Circuits and Systems for Video Technology 28, 1 (
Jan. 2018), 143–157.DOI: Google ScholarDigital Library
- [23] . 2015. Block structure reuse for multi-rate high efficiency video coding. In 2015 IEEE International Conference on Image Processing (ICIP’15). 3972–3976.
DOI: Google ScholarDigital Library
- [24] . 2011. The MPEG-DASH standard for multimedia streaming over the Internet. IEEE MultiMedia 18, 4 (
April 2011), 62–67.DOI: Google ScholarDigital Library
- [25] . 2013. The SJTU 4K video sequence dataset. In 2013 5th International Workshop on Quality of Multimedia Experience (QoMEX’13). 34–35.
DOI: Google ScholarCross Ref
- [26] . 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649–1668.
DOI: Google ScholarDigital Library
- [27] . 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
DOI: Google ScholarDigital Library
- [28] . 2019. Object-based rate adjustment for HEVC transrating. In 2019 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW’19). 1–2.
DOI: Google ScholarCross Ref