ABSTRACT
In a typical transcoding pipeline for adaptive video streaming, raw videos are pre-chunked and pre-encoded on the server side according to a set of resolution-bitrate or resolution-quality pairs, commonly called a bitrate ladder. In contrast to existing heuristics, we argue that a good bitrate ladder should be optimized by considering video content features, network capacity, and cloud storage costs. We propose DeepLadder, a per-chunk optimization scheme that adopts a state-of-the-art deep reinforcement learning (DRL) method to optimize the bitrate ladder with respect to these concerns. Technically, DeepLadder selects the proper setting for each video resolution autoregressively. We use over 8,000 video chunks, measure over 1,000,000 perceptual video qualities, collect more than 50 hours of real-world network traces, and build faithful virtual environments to help train DeepLadder efficiently. Across a series of comprehensive experiments on both Constant Bitrate (CBR) and Variable Bitrate (VBR) encoded videos, we demonstrate significant improvements in average video quality, bandwidth utilization, and storage overhead in comparison to prior work, as well as the ability to be deployed in a real-world transcoding framework.
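To make the notion of autoregressive ladder construction concrete, the following is a minimal Python sketch, not the DeepLadder implementation. It assumes a ladder is a list of (height, bitrate) pairs and that each rung is chosen conditioned on the rungs already selected. The names `build_ladder` and `toy_score`, the candidate bitrates, and the scoring rule are all hypothetical; in particular, `toy_score` is a hand-written stand-in for a learned DRL policy and ignores content features, network capacity, and storage cost.

```python
from typing import Callable, List, Sequence, Tuple

# One rung of the ladder: (frame height in pixels, bitrate in kbps).
Rung = Tuple[int, int]


def build_ladder(
    resolutions: Sequence[int],
    candidates: Sequence[int],
    score: Callable[[List[Rung], int, int], float],
) -> List[Rung]:
    """Pick one bitrate per resolution, in order.

    Each choice is conditioned on the rungs chosen so far, which is
    what makes the selection autoregressive.
    """
    ladder: List[Rung] = []
    for height in resolutions:
        best = max(candidates, key=lambda kbps: score(ladder, height, kbps))
        ladder.append((height, best))
    return ladder


def toy_score(ladder: List[Rung], height: int, kbps: int) -> float:
    """Hypothetical stand-in for a learned policy (assumption, not from the paper).

    Prefers bitrates close to a made-up per-resolution target and heavily
    penalizes a rung whose bitrate does not exceed the previous rung.
    """
    target = height * 4  # made-up target bitrate in kbps
    penalty = abs(kbps - target)
    if ladder and kbps <= ladder[-1][1]:
        penalty += 10_000
    return -float(penalty)


# Hypothetical inputs chosen only for illustration.
RESOLUTIONS = [360, 720, 1080]
CANDIDATES = [500, 1000, 1500, 3000, 4500, 6000]

ladder = build_ladder(RESOLUTIONS, CANDIDATES, toy_score)
print(ladder)
```

In this sketch the dependence on earlier rungs enters only through the monotonicity penalty. A learned policy would replace `toy_score` and could condition on the partial ladder in richer ways.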
Index Terms
Deep reinforced bitrate ladders for adaptive video streaming