Abstract
Field-of-view (FoV) prediction is critical in 360-degree video multicast, a key component of emerging virtual reality and augmented reality applications. Most current prediction methods that combine saliency detection with FoV information neither account for the projection distortion of 360-degree videos, which invalidates the weight sharing of traditional convolutional networks, nor adequately consider the difficulty of obtaining complete multi-user FoV information; both shortcomings degrade prediction performance. This article proposes a spherical convolution-empowered FoV prediction method: a multi-source prediction framework that combines salient features extracted from the 360-degree video with limited FoV feedback. A spherical convolutional neural network replaces the traditional two-dimensional convolutional neural network, eliminating the weight-sharing failure caused by video projection distortion. Specifically, salient spatial-temporal features are extracted by a spherical convolution-based saliency detection model, and the limited FoV feedback is modeled as a time series by a spherical convolution-empowered gated recurrent unit network. Finally, the extracted salient video features are fused with the feedback to predict future user FoVs. Experimental results show that the proposed method outperforms other prediction methods.
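The core observation above, that equirectangular projection distortion breaks the weight sharing of planar convolutions, can be illustrated with a short sketch. The helper below is hypothetical (not code from the paper) and follows SphereNet-style tangent-plane sampling: a 3x3 kernel is laid out on the tangent plane at a given viewing direction and mapped back to the equirectangular grid through the inverse gnomonic projection, so its pixel footprint widens toward the poles, which is exactly the region a fixed 2D kernel covers incorrectly.

```python
import numpy as np

def sphere_kernel_offsets(lat, lon, step, H, W):
    """Pixel coordinates of a 3x3 kernel's taps, laid out on the tangent
    plane at (lat, lon) with angular spacing `step` (radians), then mapped
    back to an H x W equirectangular image via inverse gnomonic projection."""
    ts = np.tan(step) * np.array([-1.0, 0.0, 1.0])
    x, y = np.meshgrid(ts, ts)                       # tangent-plane grid
    rho = np.hypot(x, y)
    c = np.arctan(rho)                               # angular distance from kernel center
    with np.errstate(invalid="ignore", divide="ignore"):
        lat_s = np.arcsin(np.cos(c) * np.sin(lat)
                          + np.where(rho > 0, y * np.sin(c) * np.cos(lat) / rho, 0.0))
    lon_s = lon + np.arctan2(x * np.sin(c),
                             rho * np.cos(lat) * np.cos(c) - y * np.sin(lat) * np.sin(c))
    rows = (0.5 - lat_s / np.pi) * H                 # latitude  -> row
    cols = ((lon_s / (2 * np.pi) + 0.5) % 1.0) * W   # longitude -> column (wrapped)
    return rows, cols

# The same 3x3 kernel spans few columns at the equator but many near a pole:
H, W = 256, 512
_, eq_cols = sphere_kernel_offsets(0.0, 0.0, np.radians(1.0), H, W)
_, po_cols = sphere_kernel_offsets(np.radians(80.0), 0.0, np.radians(1.0), H, W)
print(eq_cols.ptp(), po_cols.ptp())  # horizontal footprint grows roughly as 1/cos(latitude)
```

A planar convolution applies the same square pixel neighborhood at every latitude, so near the poles it aggregates a much smaller solid angle than at the equator; sampling on the sphere, as above, keeps the receptive field angularly consistent, which is what the spherical convolutions in this method restore.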
Spherical Convolution Empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback