Abstract
Virtual reality stimulates the senses convincingly enough that users accept the virtual environment as real. Creating this sense of immersion requires high-resolution images to satisfy the human visual system and low latency for smooth interaction, which places heavy demands on data processing and transmission. When exploring a virtual environment, however, viewers perceive only the content within the current field of view. If head movements, which are key viewer behaviors, can be predicted, more processing resources can be allocated to the active field of view. In this article, we propose a model that predicts the trajectory of head movement, employing deep reinforcement learning to mimic the viewer's decision making. In our framework, each state is characterized by features extracted from the viewport image with convolutional neural networks. In addition, a spherical coordinate map and a visited map are generated for each viewport image, enriching the state representation with position information and the history of head movement. To ensure that the model accurately simulates visual behavior while watching panoramas, we require it to imitate the behavior of human demonstrators. To help the model generalize to more conditions, intrinsic motivation guides the agent's actions toward reducing uncertainty, which enhances robustness during exploration. The experimental results demonstrate the effectiveness of the proposed stepwise head movement predictor.
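To make the state construction described above concrete, the sketch below (not the authors' code) shows one plausible way to stack the current viewport image with a spherical coordinate map and a visited map, and to feed the result to a small CNN policy over discrete head-movement actions. All names, tensor shapes, the viewport resolution, and the action set are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming a square viewport and a discrete action set.
# Not the authors' implementation; shapes and names are hypothetical.
import torch
import torch.nn as nn

VIEWPORT = 84      # assumed viewport resolution (H = W)
N_ACTIONS = 8      # assumed discrete head-movement directions

def build_state(viewport_rgb, lon, lat, visited):
    """Stack viewport features with position and history channels.

    viewport_rgb: (3, H, W) tensor, current field-of-view image
    lon, lat:     current viewing direction in radians
    visited:      (H, W) tensor marking previously explored directions
    """
    h, w = viewport_rgb.shape[1:]
    # Spherical coordinate map: every pixel carries the current (lon, lat),
    # giving the agent explicit position information.
    coord_map = torch.stack([
        torch.full((h, w), lon),
        torch.full((h, w), lat),
    ])
    # Visited map encodes historical head movement (1 = already seen).
    return torch.cat([viewport_rgb, coord_map, visited.unsqueeze(0)])  # (6, H, W)

class PolicyNet(nn.Module):
    """Tiny CNN mapping the stacked state to action logits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 6, VIEWPORT, VIEWPORT)).shape[1]
        self.head = nn.Linear(n_flat, N_ACTIONS)

    def forward(self, state):
        return self.head(self.features(state))

# Usage: one forward pass over a dummy state.
state = build_state(torch.rand(3, VIEWPORT, VIEWPORT), 0.5, -0.2,
                    torch.zeros(VIEWPORT, VIEWPORT))
logits = PolicyNet()(state.unsqueeze(0))   # (1, N_ACTIONS)
```

In this reading, the coordinate and visited channels let a plain CNN condition on where the agent is looking and where it has already looked, which is what allows the policy to produce a stepwise trajectory rather than a single fixation; the imitation and intrinsic-motivation terms described in the abstract would then shape the training reward, not the state itself.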