
Optimizing Immersive Video Coding Configurations Using Deep Learning: A Case Study on TMIV

Published: 27 January 2022

Abstract

Immersive video streaming technologies improve the Virtual Reality (VR) user experience by giving users more intuitive ways to move through simulated worlds, e.g., via the six-Degree-of-Freedom (6DoF) interaction mode. A naive way to achieve 6DoF is to deploy cameras at the numerous positions and orientations that users’ movements may require, which unfortunately is expensive, tedious, and inefficient. A better solution for realizing 6DoF interactions is to synthesize target views on-the-fly from a limited number of source views. While such view synthesis is enabled by the recent Test Model for Immersive Video (TMIV) codec, TMIV dictates manually composed configurations, which cannot navigate the tradeoff among video quality, decoding time, and bandwidth consumption. In this article, we study this limitation of TMIV and solve its configuration optimization problem by searching a huge configuration space for the optimal configuration. We first identify the critical parameters in the TMIV configurations. Then, we introduce two Neural Network (NN)-based algorithms that approach the problem from two heterogeneous angles: (i) a Convolutional Neural Network (CNN) algorithm that solves a regression problem and (ii) a Deep Reinforcement Learning (DRL) algorithm that solves a decision-making problem. We conduct both objective and subjective experiments to evaluate the CNN and DRL algorithms on two diverse datasets: an equirectangular and a perspective projection dataset. The objective evaluations reveal that both algorithms significantly outperform the default configurations. In particular, with the equirectangular (perspective) projection dataset, the proposed algorithms require only 95% (23%) of the decoding time, stream 79% (23%) of the views, and improve the utility by 6% (73%) on average. The subjective evaluations confirm that the proposed algorithms consume fewer resources while achieving Quality of Experience (QoE) comparable to the default and optimal TMIV configurations.
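The configuration search that motivates the learned approaches can be pictured with a toy brute-force baseline. Everything below is a hypothetical sketch: the utility function, its weights, and the parameter grid (number of streamed views and quantization parameter) are illustrative stand-ins, not the article’s actual utility model or TMIV parameters. The point is only that exhaustively scoring a configuration space scales poorly, which is why a learned predictor (CNN regressor or DRL agent) is attractive.

```python
import itertools

# Hypothetical utility: reward quality, penalize decoding time and bandwidth.
# The weights alpha/beta/gamma are made up for illustration.
def utility(quality, decode_time, bandwidth, alpha=1.0, beta=0.5, gamma=0.5):
    return alpha * quality - beta * decode_time - gamma * bandwidth

def brute_force_search(configs, evaluate):
    """Score every configuration and return the best one.

    This is infeasible at TMIV scale (huge configuration space, slow
    encode/decode per evaluation), which motivates learned predictors.
    """
    best = max(configs, key=evaluate)
    return best, evaluate(best)

# Toy configuration space: (number of source views streamed, quantization parameter).
configs = list(itertools.product([2, 4, 8], [22, 27, 32, 37]))

def evaluate(cfg):
    views, qp = cfg
    quality = 50 - qp + 2 * views       # toy model: more views / lower QP -> better quality
    decode_time = 1.5 * views           # toy model: decoding cost grows with views
    bandwidth = views * (40 - qp) / 4   # toy model: bitrate falls as QP rises
    return utility(quality, decode_time, bandwidth)

best_cfg, best_u = brute_force_search(configs, evaluate)
```

With these toy numbers the search favors streaming fewer views at a low QP; in practice, each evaluation would require an actual TMIV encode/decode pass, so a model that predicts utility from the configuration replaces the exhaustive loop.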

REFERENCES

  1. [1] Altamimi S. and Shirmohammadi S.. 2020. QoE-fair DASH video streaming using server-side reinforcement learning. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 2s (2020), 68:1–68:21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. [2] Avidan S. and Shashua A.. 1997. Novel view synthesis in tensor space. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR’97). 10341040. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. [3] Bentaleb A., Taani B., Begen A., Timmerer C., and Zimmermann R.. 2019. A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Communications Surveys Tutorials 21, 1 (2019), 562585.Google ScholarGoogle ScholarCross RefCross Ref
  4. [4] Chang Y., Chen K., Wu C., Ho C., and Lei C.. 2010. Online game QoE evaluation using paired comparisons. In Proc. of IEEE International Workshop Technical Committee on Communications Quality and Reliability (CQR’10). 16.Google ScholarGoogle Scholar
  5. [5] Chen S. and Williams L.. 1993. View interpolation for image synthesis. In Proc. of ACM Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’93). 279288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. [6] Cheng B., Yang J., Wang S., and Chen J.. 2015. Adaptive video transmission control system based on reinforcement learning approach over heterogeneous networks. IEEE Transactions on Automation Science and Engineering 12, 3 (2015), 11041113.Google ScholarGoogle ScholarCross RefCross Ref
  7. [7] Chiariotti F., D’Aronco S., Toni L., and Frossard P.. 2016. Online learning adaptation strategy for DASH clients. In Proc. of ACM International Conference on Multimedia Systems (MMSys’16). 8:1–8:12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. [8] Corbillon X., Simone F., Simon G., and Frossard P.. 2018. Dynamic adaptive streaming for multi-viewpoint omnidirectional videos. In Proc. of ACM International Conference on Multimedia Systems Conference (MMSys’18). 237249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. [9] Costero L., Iranfar A., Zapater M., Igual F., Olcoz K., and Atienza D.. 2019. MAMUT: Multi-agent reinforcement learning for efficient real-time multi-user video transcoding. In Proc. of IEEE Design, Automation Test in Europe Conference Exhibition (DATE’19). 558563.Google ScholarGoogle ScholarCross RefCross Ref
  10. [10] Doré R. and Lafruit G.. 2018. Updated Call for Test Materials for 3DoF+ Visual. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG2018/N17617. (2018).Google ScholarGoogle Scholar
  11. [11] Dziembowski A., Samelak J., and Domański M.. 2018. View selection for virtual view synthesis in free navigation systems. In Proc. of IEEE International Conference on Signals and Electronic Systems (ICSES’18). 8387.Google ScholarGoogle ScholarCross RefCross Ref
  12. [12] Fan C., Lo W., Pai Y., and Hsu C.. 2019. A survey on 360\(^{\circ }\) video streaming: Acquisition, transmission, and display. Comput. Surveys 52, 4 (2019), 71:1–71:36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. [13] Fleureau J., Chupeau B., Thudor F., Briand G., Tapie T., and Doré R.. 2020. An immersive video experience with real-time view synthesis leveraging the upcoming MIV distribution standard. In 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 12.Google ScholarGoogle ScholarCross RefCross Ref
  14. [14] Fu J., Chen X., Zhang Z., Wu S., and Chen Z.. 2019. 360SRL: A sequential reinforcement learning approach for ABR tile-based 360 video streaming. In Proc. of IEEE International Conference on Multimedia and Expo (ICME’19). 290295.Google ScholarGoogle ScholarCross RefCross Ref
  15. [15] Gadaleta M., Chiariotti F., Rossi M., and Zanella A.. 2017. D-DASH: A deep Q-learning framework for DASH video streaming. IEEE Transactions on Cognitive Communications and Networking 3, 4 (2017), 703718.Google ScholarGoogle ScholarCross RefCross Ref
  16. [16] Gescheider G.. 2013. Psychophysics: The Fundamentals. Psychology Press.Google ScholarGoogle ScholarCross RefCross Ref
  17. [17] Ghosh A., Aggarwal V., and Qian F.. 2017. A rate adaptation algorithm for tile-based 360-degree video streaming. arXiv preprint arXiv:1704.08215 (2017).Google ScholarGoogle Scholar
  18. [18] Hooft J., Petrangeli S., Claeys M., Famaey J., and Turck F.. 2015. A learning-based algorithm for improved bandwidth-awareness of adaptive streaming clients. In Proc. of IFIP/IEEE International Symposium on Integrated Network Management (IM’15). 131138.Google ScholarGoogle Scholar
  19. [19] Hosseini M., Kurillo G., Etesami S., and Yu J.. 2017. Towards coordinated bandwidth adaptations for hundred-scale 3D tele-immersive systems. Multimedia Systems 23, 4 (2017), 421434. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. [20] VIVE HTC. 2019. HTC VIVE. (2019).Retrieved April 21, 2020 from https://www.vive.com/tw/product/vive.Google ScholarGoogle Scholar
  21. [21] Hu J., Peng W., and Chung C.. 2017. HEVC/H.265 coding unit split decision using deep reinforcement learning. In Proc. of IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS’17). 570575.Google ScholarGoogle Scholar
  22. [22] Hu J., Peng W., and Chung C.. 2018. Reinforcement learning for HEVC/H.265 intra-frame rate control. In Proc. of IEEE International Symposium on Circuits and Systems (ISCAS’18). 15.Google ScholarGoogle Scholar
  23. [23] Huang J., Chen Z., Ceylan D., and Jin H.. 2017. 6-DOF VR videos with a single 360-camera. In Proc. of IEEE Virtual Reality Conference (VR’17). 3744.Google ScholarGoogle ScholarCross RefCross Ref
  24. [24] Huang T., Zhang R., Zhou C., and Sun L.. 2018. QARC: Video quality aware rate control for real-time video streaming based on deep reinforcement learning. In Proc. of ACM International Conference on Multimedia (MM’18). 12081216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. [25] Jeong J., Lee S., Ryu I., Le T., and Ryu E.. 2020. Towards viewport-dependent 6DoF 360 video tiled streaming for virtual reality systems. In Proc. of ACM International Conference on Multimedia (MM’20). 36873695. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. [26] Jiang X., Chiang Y., Zhao Y., and Ji Y.. 2018. Plato: Learning-based adaptive streaming of 360-degree videos. In Proc. of IEEE Conference on Local Computer Networks (LCN’18). 393400.Google ScholarGoogle ScholarCross RefCross Ref
  27. [27] Jung J., Kroon B., and Boyce J.. 2019. Common Test Conditions for Immersive Video. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/N18563. (2019).Google ScholarGoogle Scholar
  28. [28] Kapov L., Varela M., Hoßfeld T., and Chen K.. 2018. A survey of emerging concepts and challenges for QoE management of multimedia services. ACM Transactions on Multimedia Computing, Communications, and Applications 14, 2s (2018), 29:1–29:29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. [29] Kingma D. and Ba J.. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations Track Proceedings (poster).Google ScholarGoogle Scholar
  30. [30] Maxim C., Steven L., Jeroen F., and Filip D.. 2014. Design and evaluation of a self-learning HTTP adaptive video streaming client. IEEE Communications Letters 18, 4 (2014), 716719.Google ScholarGoogle ScholarCross RefCross Ref
  31. [31] MPEG. 2019. HM 16.16. (2019). Retrieved April 21, 2020 from https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.16/.Google ScholarGoogle Scholar
  32. [32] MPEG. 2020. Text of ISO/IEC FDIS 23090-5 Visual Volumetric Video-based Coding and Video-based Point Cloud Compression. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/w19579. (2020).Google ScholarGoogle Scholar
  33. [33] MPEG. 2021. Text of ISO/IEC DIS 23090-12 MPEG Immersive Video. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/w20003. (2021).Google ScholarGoogle Scholar
  34. [34] Mueller K., Smolic A., Dix K., Merkle P., Kauff P., and Wiegand T.. 2009. View synthesis for advanced 3D video systems. EURASIP Journal on Image and Video Processing (2009), 438148:1–438148:11.Google ScholarGoogle Scholar
  35. [35] Pang H., Zhang C., Wang F., Liu J., and Sun L.. 2019. Towards low latency multi-viewpoint 360\(^{\circ }\) interactive video: A multimodal deep reinforcement learning approach. In Proc. of IEEE Conference on Computer Communications (INFOCOM’19). 991999.Google ScholarGoogle Scholar
  36. [36] Placket R.. 1975. The analysis of permutations. Journal of the Royal Statistical Society. Series C (Applied Statistics) 24, 2 (1975), 193202.Google ScholarGoogle Scholar
  37. [37] Research ZION Market. 2018. Virtual Reality (VR) Market by Hardware and Software for (Consumer, Commercial, Enterprise, Medical, Aerospace and Defense, Automotive, Energy and Others): Global Industry Perspective, Comprehensive Analysis and Forecast, 2016–2022. (2018). Retrieved April 21, 2020 from https://www.zionmarketresearch.com/report/virtual-reality-market.Google ScholarGoogle Scholar
  38. [38] Salahieh B., Bhatia S., and Boyce J.. 2019. Multi-pass Add-on Tool for Coherent and Complete View Synthesis (US Patent 2019/0320164 A1). (2019).Google ScholarGoogle Scholar
  39. [39] Salahieh B., Kroon B., Jung J., and Domański M.. 2019. Test Model 2 for Immersive Video. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/N18577. (2019).Google ScholarGoogle Scholar
  40. [40] Salahieh B., Kroon B., Jung J., and Domański M.. 2019. Test Model 3 for Immersive Video. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/N18795. (2019).Google ScholarGoogle Scholar
  41. [41] Salahieh B., Kroon B., Jung J., and Domański M.. 2019. Test Model for Immersive Video. International Organization for Standardization Meeting Document ISO/IEC JTC1/SC29/WG11 MPEG/N18470. (2019).Google ScholarGoogle Scholar
  42. [42] Sullivan G., Ohm J., Han W., and Wiegand T.. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 16491668. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. [43] Sun Y., Lu A., and Yu L.. 2017. Weighted-to-spherically-uniform quality evaluation for omnidirectional video. IEEE Signal Processing Letters 24, 9 (2017), 14081412.Google ScholarGoogle Scholar
  44. [44] Sutton R. and Barto A.. 2018. Reinforcement Learning: An Introduction (2 ed.). A Bradford Book. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. [45] Tian D., Lai P., Lopez P., and Gomila C.. 2009. View synthesis techniques for 3D video. In Proc. of SPIE Conference on Applications of Digital Image Processing (ADIP’09). 74430T:1–74430T:11.Google ScholarGoogle ScholarCross RefCross Ref
  46. [46] Yaqoob A., Bi T., and Muntean G.. 2020. A survey on adaptive 360\(^{\circ }\) video streaming: Solutions, challenges and opportunities. IEEE Communications Surveys Tutorials 22, 4 (2020), 28012838.Google ScholarGoogle ScholarCross RefCross Ref
  47. [47] Zhang Y., Zhao P., Bian K., Liu Y., Song L., and Li X.. 2019. DRL360: 360-degree video streaming with deep reinforcement learning. In Proc. of IEEE Conference on Computer Communications (INFOCOM’19). 12521260.Google ScholarGoogle ScholarCross RefCross Ref


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
  January 2022
  517 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3505205


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 January 2022
      • Accepted: 1 June 2021
      • Revised: 1 May 2021
      • Received: 1 December 2020
Published in TOMM Volume 18, Issue 1


      Qualifiers

      • research-article
      • Refereed
