Research Article

Learning a Deep Agent to Predict Head Movement in 360-Degree Images

Published: 17 December 2020

Abstract

Virtual reality stimulates the senses to persuade users to accept the virtual environment as real. To create a sense of immersion, high-resolution images are required to satisfy the human visual system, and low latency is essential for smooth interaction, which places great demands on data processing and transmission. In practice, when exploring a virtual environment, viewers perceive only the content in the current field of view. Therefore, if we can predict head movements, which are important viewer behaviors, more processing resources can be allocated to the active field of view. In this article, we propose a model to predict the trajectory of head movement. Deep reinforcement learning is employed to mimic human decision making. In our framework, each state is characterized by features extracted from viewport images with convolutional neural networks. In addition, a spherical coordinate map and a visited map are generated for each viewport image, enriching the state representation with historical head-movement and position information. To accurately simulate visual behavior while viewing panoramas, the model is trained to imitate the behavior of human demonstrators. To help the model generalize to a wider range of conditions, intrinsic motivation guides the agent's actions toward reducing uncertainty, which enhances robustness during exploration. The experimental results demonstrate the effectiveness of the proposed stepwise head movement predictor.
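The state described in the abstract combines viewport image features with a spherical coordinate map and a visited map. The following is a minimal sketch of how such a multi-channel state tensor could be assembled; the channel layout, value ranges, and function names are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def spherical_coordinate_map(height, width):
    """Per-pixel (latitude, longitude) channels for a viewport grid.

    Latitude spans [pi/2, -pi/2] from top to bottom;
    longitude spans [-pi, pi) from left to right.
    """
    lat = np.linspace(np.pi / 2, -np.pi / 2, height)
    lon = np.linspace(-np.pi, np.pi, width, endpoint=False)
    lon_grid, lat_grid = np.meshgrid(lon, lat)
    return np.stack([lat_grid, lon_grid], axis=-1)  # shape (H, W, 2)

def build_state(viewport_rgb, visited_map, coord_map):
    """Concatenate image, coordinate, and visit-history channels.

    viewport_rgb: (H, W, 3) float image in [0, 1]
    visited_map:  (H, W) binary mask of previously visited regions
    coord_map:    (H, W, 2) spherical coordinates
    Returns an (H, W, 6) state tensor for the CNN feature extractor.
    """
    return np.concatenate(
        [viewport_rgb, coord_map, visited_map[..., None]], axis=-1
    )
```

Stacking position and history as extra channels lets a standard CNN condition its features on where the viewport sits on the sphere and which regions the agent has already explored.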

