Abstract
FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field of view, such as virtual and augmented reality displays, and for associated methods, such as foveated rendering. Our metric is derived from psychophysical models of the early visual system, accounting for spatio-temporal contrast sensitivity, cortical magnification, and contrast masking. It also accounts for the physical specification of the display (luminance, size, and resolution) and the viewing distance. To validate the metric, we collected a novel foveated rendering dataset that captures quality degradation due to sampling and reconstruction. To demonstrate the algorithm's generality, we test it on three independent foveated video datasets and on a large image quality dataset, achieving the best performance across all datasets compared with state-of-the-art metrics.
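The abstract notes that the metric depends on the display's size, resolution, and viewing distance. These quantities are typically combined into an angular pixel density (pixels per degree of visual angle), which is what perceptual models operate on. The sketch below illustrates that standard geometric calculation; the function name and the use of the central visual angle are our own illustration, not code from the paper.

```python
import math

def pixels_per_degree(resolution_px: int,
                      display_width_m: float,
                      viewing_distance_m: float) -> float:
    """Approximate angular pixel density at the center of a display.

    Uses the visual angle subtended by the full display width; real
    implementations may instead evaluate the density per pixel, since
    it varies slightly across a flat screen.
    """
    # Visual angle of the whole display width, in degrees.
    width_deg = 2.0 * math.degrees(
        math.atan(display_width_m / (2.0 * viewing_distance_m)))
    return resolution_px / width_deg

# Example: a 0.6 m-wide 4K (3840-pixel) monitor viewed from 0.8 m
# yields roughly 90-95 pixels per degree.
ppd = pixels_per_degree(3840, 0.6, 0.8)
```

Higher pixels-per-degree values push display artifacts toward higher spatial frequencies, where contrast sensitivity is lower, which is why the same video can look acceptable on one display setup and objectionable on another.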
FovVideoVDP: a visible difference predictor for wide field-of-view video