Abstract
Immersive video streaming technologies improve the Virtual Reality (VR) user experience by giving users more intuitive ways to move in simulated worlds, e.g., with the 6 Degrees-of-Freedom (6DoF) interaction mode. A naive way to achieve 6DoF is to deploy cameras at the numerous positions and orientations that users’ movements may require, which is unfortunately expensive, tedious, and inefficient. A better solution is to synthesize target views on-the-fly from a limited number of source views. While such view synthesis is enabled by the recent Test Model for Immersive Video (TMIV) codec, TMIV relies on manually-composed configurations, which cannot exploit the tradeoff among video quality, decoding time, and bandwidth consumption. In this article, we study this limitation of TMIV and solve its configuration optimization problem by searching for the optimal configuration in a huge configuration space. We first identify the critical parameters in the TMIV configurations. Then, we introduce two Neural Network (NN)-based algorithms that approach the problem from two different angles: (i) a Convolutional Neural Network (CNN) algorithm that solves a regression problem and (ii) a Deep Reinforcement Learning (DRL) algorithm that solves a decision-making problem. We conduct both objective and subjective experiments to evaluate the CNN and DRL algorithms on two diverse datasets: an equirectangular projection dataset and a perspective projection dataset. The objective evaluations reveal that both algorithms significantly outperform the default configurations. In particular, with the equirectangular (perspective) projection dataset, the proposed algorithms require only 95% (23%) of the decoding time and stream only 79% (23%) of the views, while improving the utility by 6% (73%) on average. The subjective evaluations confirm that the proposed algorithms consume fewer resources while achieving Quality of Experience (QoE) comparable to that of the default and the optimal TMIV configurations.
Index Terms
Optimizing Immersive Video Coding Configurations Using Deep Learning: A Case Study on TMIV