Spherical Convolution Empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback

Published: 05 January 2023

Abstract

Field of view (FoV) prediction is critical in 360-degree video multicast, a key component of emerging virtual reality and augmented reality applications. Most current prediction methods that combine saliency detection with FoV information neither account for the fact that the projection distortion of 360-degree videos invalidates the weight sharing of traditional convolutional networks, nor adequately consider the difficulty of obtaining complete multi-user FoV information; both shortcomings degrade prediction performance. This article proposes a spherical convolution-empowered FoV prediction method: a multi-source prediction framework that combines salient features extracted from the 360-degree video with limited FoV feedback information. A spherical convolutional neural network replaces the traditional two-dimensional convolutional neural network, eliminating the weight-sharing failure caused by video projection distortion. Specifically, salient spatial-temporal features are extracted through a spherical convolution-based saliency detection model, and the limited FoV feedback is represented as a time series by a spherical convolution-empowered gated recurrent unit network. Finally, the extracted salient video features are combined with this model to predict future user FoVs. Experimental results show that the proposed method outperforms other prediction methods.
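The core idea behind the spherical convolution mentioned above is that a planar kernel's fixed sampling grid is wrong on an equirectangular frame, whose pixels stretch toward the poles; if the kernel is instead laid out on the sphere's tangent plane and mapped back to the image, one set of weights stays valid at every latitude. The following sketch (our illustration of the SphereNet-style gnomonic sampling idea, not the authors' code; image size and kernel size are assumptions) computes the distortion-aware sampling positions for a 3x3 kernel:

```python
import numpy as np

def sphere_kernel_coords(h, w, row, col, ksize=3):
    """Return the (ksize x ksize x 2) fractional (row, col) positions that a
    spherical convolution samples on an h x w equirectangular image when its
    kernel is centred at pixel (row, col).  The kernel lives on the tangent
    plane of the sphere and is mapped back with the inverse gnomonic
    projection, so shared weights remain valid at every latitude."""
    delta = np.pi / h                                 # angular size of one pixel
    lat0 = (0.5 - (row + 0.5) / h) * np.pi            # centre latitude
    lon0 = ((col + 0.5) / w - 0.5) * 2.0 * np.pi      # centre longitude
    r = ksize // 2
    out = np.empty((ksize, ksize, 2))
    for i, dy in enumerate(range(-r, r + 1)):
        for j, dx in enumerate(range(-r, r + 1)):
            kx, ky = dx * delta, -dy * delta          # east / north tangent offsets
            rho = np.hypot(kx, ky)
            if rho == 0.0:                            # kernel centre: no offset
                lat, lon = lat0, lon0
            else:
                c = np.arctan(rho)                    # inverse gnomonic projection
                lat = np.arcsin(np.cos(c) * np.sin(lat0)
                                + ky * np.sin(c) * np.cos(lat0) / rho)
                lon = lon0 + np.arctan2(kx * np.sin(c),
                                        rho * np.cos(lat0) * np.cos(c)
                                        - ky * np.sin(lat0) * np.sin(c))
            out[i, j, 0] = (0.5 - lat / np.pi) * h - 0.5
            out[i, j, 1] = (lon / (2.0 * np.pi) + 0.5) * w - 0.5
    return out
```

Near the equator these positions coincide with the regular 3x3 grid of a planar convolution; toward the poles they spread out horizontally, compensating for the horizontal stretching of the equirectangular projection. A full spherical CNN or spherical GRU would bilinearly sample the feature map at these positions before applying the shared kernel weights.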


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
  January 2023, 505 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3572858
  Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 January 2023
      • Online AM: 12 March 2022
      • Revised: 12 January 2022
      • Accepted: 12 January 2022
      • Received: 9 July 2021
Published in TOMM Volume 19, Issue 1


      Qualifiers

      • research-article
      • Refereed
