
Online Estimation of Evolving Human Visual Interest

Published: 04 September 2014

Abstract

Regions in video streams that attract human interest contribute significantly to human understanding of the video. Predicting salient and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. Applications such as content-aware retargeting of videos to different aspect ratios while preserving informative regions, and smart insertion of dialog (closed-caption text) into the video stream, can be significantly improved using the predicted ROIs. We propose an interactive human-in-the-loop framework to model eye movements and predict visual saliency for yet-unseen frames. Eye tracking and video content are used to model visual attention in a manner that accounts for important eye-gaze characteristics such as temporal discontinuities due to sudden eye movements, noise, and behavioral artifacts. A novel statistical method, gaze buffering, is proposed for eye-gaze analysis and its fusion with content-based features. Our robust saliency prediction is instantiated for two challenging and exciting applications. The first application alters video aspect ratios on-the-fly using content-aware video retargeting, thus making videos suitable for a variety of display sizes. The second application dynamically localizes active speakers and places dialog captions on-the-fly in the video stream. Our method ensures that dialogs are faithful to active speaker locations and do not interfere with salient content in the video stream. Our framework naturally accommodates personalization of the application to suit the biases and preferences of individual users.
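The gaze-buffering idea sketched in the abstract (smoothing noisy gaze samples, discarding stale history after sudden saccades, and fusing the result with content-based saliency) could, in minimal form, look like the following. This is an illustrative sketch only: the `GazeBuffer` class, the pixel saccade threshold, and the fixed linear blend with a content saliency peak are assumptions for exposition, not the authors' actual algorithm.

```python
from collections import deque
import math

class GazeBuffer:
    """Illustrative sketch: smooth recent gaze samples and fuse the
    centroid with a content-based saliency peak to estimate the ROI
    center. Not the paper's actual method."""

    def __init__(self, size=10, saccade_thresh=120.0, gaze_weight=0.7):
        self.buf = deque(maxlen=size)          # recent gaze samples (px)
        self.saccade_thresh = saccade_thresh   # jump (px) treated as a saccade
        self.gaze_weight = gaze_weight         # gaze vs. content-feature mix

    def add(self, x, y):
        # A sudden large jump indicates a saccade; clear the buffered
        # history so stale fixations do not bias the new estimate.
        if self.buf:
            px, py = self.buf[-1]
            if math.hypot(x - px, y - py) > self.saccade_thresh:
                self.buf.clear()
        self.buf.append((x, y))

    def estimate(self, content_peak):
        """Blend the buffered gaze centroid with the content saliency peak."""
        if not self.buf:                       # no reliable gaze: fall back
            return content_peak                # to content features alone
        cx = sum(p[0] for p in self.buf) / len(self.buf)
        cy = sum(p[1] for p in self.buf) / len(self.buf)
        w = self.gaze_weight
        return (w * cx + (1 - w) * content_peak[0],
                w * cy + (1 - w) * content_peak[1])
```

A buffer like this handles two of the gaze characteristics the abstract names: averaging suppresses tracker noise, and the clear-on-saccade rule handles temporal discontinuities, while the content-based fallback covers dropouts when no gaze samples are available.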




Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 1
  August 2014
  151 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/2665935

  Copyright © 2014 ACM

  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Published: 4 September 2014
  • Accepted: 1 April 2014
  • Revised: 1 December 2013
  • Received: 1 August 2013


      Qualifiers

      • research-article
      • Research
      • Refereed
