research-article
Open Access

FovVideoVDP: a visible difference predictor for wide field-of-view video

Published: 19 July 2021

Abstract

FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field of view, such as Virtual and Augmented Reality displays, and for associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification, and contrast masking. It accounts for the physical specification of the display (luminance, size, and resolution) and for the viewing distance. To validate the metric, we collected a novel foveated-rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate the algorithm's generality, we test it on three independent foveated video datasets and on a large image-quality dataset, achieving the best performance across all datasets when compared to the state of the art.
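The abstract notes that the metric accounts for the display's physical specification and the viewing distance. In practice, such metrics convert the signal into visual units, i.e. pixels per degree of visual angle, before applying contrast-sensitivity models. A minimal sketch of that geometric conversion is below; the display width, resolution, and distance are illustrative assumptions, not values taken from the paper:

```python
import math

def pixels_per_degree(resolution_px: int,
                      display_width_m: float,
                      viewing_distance_m: float) -> float:
    """Average pixels per degree across the horizontal field of view.

    The field of view subtended by the display is
    2 * atan(width / (2 * distance)); dividing the pixel count by
    that angle (in degrees) gives the mean angular resolution.
    """
    fov_rad = 2.0 * math.atan(0.5 * display_width_m / viewing_distance_m)
    fov_deg = math.degrees(fov_rad)
    return resolution_px / fov_deg

# A 4K monitor 0.6 m wide viewed from 1 m yields roughly 115 ppd,
# whereas a wide-field-of-view VR display is typically far coarser,
# which is why peripheral (eccentricity-dependent) modeling matters.
print(pixels_per_degree(3840, 0.6, 1.0))
```

Note that this is the average over the field of view; angular resolution is highest at the screen center and falls off toward the edges, an effect that matters more as the display grows wider.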


Supplemental Material

a49-mantiuk.mp4
3450626.3459831.mp4



• Published in

  ACM Transactions on Graphics, Volume 40, Issue 4
  August 2021
  2170 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3450626

  Copyright © 2021 Owner/Author

  This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
