Abstract
Accurate and fast estimation or prediction of the near-future location of head-mounted-device users within a virtual omnidirectional environment opens up a plethora of opportunities in application domains such as interactive immersive gaming and tele-surgery. The past years have therefore seen growing attention to models for viewport prediction in 360° environments. Among these approaches, content-agnostic, trajectory-based methods have the potential to provide very fast solutions, as they do not require complex analysis of the video content to produce a prediction. However, accurate trajectory-based viewport prediction is rather difficult due to the intrinsic variability in user behaviour. Furthermore, even when machine learning is applied, current approaches tend to be brute-force and heavily tailored to specific datasets, with little comparison to existing benchmarks or publicly available studies. This article presents a generic, content-agnostic viewport prediction method that combines a window-based approach with a preprocessing system that classifies behavioural patterns through user clustering and trajectory correlation. Moreover, since the state of the art lacks a comparative analysis of the different approaches, this work provides one. Based on the obtained results, a combined prediction model is proposed and evaluated. Our method shows a 36.8% to 53.9% improvement over the static prediction baseline for a prediction horizon of 8 seconds. In addition, an 11.5% to 24.0% improvement over a brute-force machine learning prediction approach is obtained. As such, this work contributes towards more generic and structured solutions for content-agnostic viewport prediction in terms of data representation, preprocessing, and modelling.
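To illustrate the kind of content-agnostic, trajectory-based prediction the abstract contrasts with the static baseline, the sketch below compares a static predictor (repeat the last known viewport angle) with a minimal window-based predictor that extrapolates a linear fit over the most recent samples. This is an illustrative toy, not the paper's actual model: the function names, the linear fit, and the window size are all assumptions made here for demonstration.

```python
import numpy as np

def static_baseline(history, horizon):
    # Static baseline: assume the viewport stays at its last known angle
    # for the entire prediction horizon.
    return np.full(horizon, history[-1])

def window_linear_predictor(history, horizon, window=30):
    # Window-based sketch: fit a line to the last `window` samples of the
    # angular trajectory and extrapolate it over the prediction horizon.
    # (A real 360-degree predictor would also handle angle wraparound.)
    w = min(window, len(history))
    t = np.arange(w)
    slope, intercept = np.polyfit(t, history[-w:], 1)
    future_t = np.arange(w, w + horizon)
    return slope * future_t + intercept

# Toy trace: a user panning at a constant angular velocity (yaw in degrees).
yaw_history = np.linspace(0.0, 29.0, 30)   # 1 degree per sample
pred_static = static_baseline(yaw_history, horizon=10)
pred_window = window_linear_predictor(yaw_history, horizon=10)
```

On such a smoothly panning trace the window-based extrapolation tracks the motion while the static baseline lags behind, which is the intuition behind the reported improvements over static prediction at long horizons.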
Index Terms
Machine Learning Based Content-Agnostic Viewport Prediction for 360-Degree Video