Abstract
Image quality is an important practical challenge that is often overlooked in the design of machine vision systems. Machine vision systems are commonly trained and tested on high-quality image datasets, yet in practical applications the input images cannot be assumed to be of high quality. Modern deep neural networks (DNNs) have been shown to perform poorly on images affected by blur or noise distortions. In this work, we investigate whether human subjects also perform poorly on distorted stimuli and provide a direct comparison with the performance of DNNs. Specifically, we study the effect of Gaussian blur and additive Gaussian noise on human and DNN classification performance. We perform two experiments: one crowdsourced experiment with unlimited stimulus display time, and one lab experiment with 100 ms display time. In both cases, we find that humans outperform neural networks on distorted stimuli, even when the networks are retrained with distorted data.
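The two distortion types studied here are standard image-processing operations. As a minimal sketch (not the authors' exact stimulus-generation code; the distortion strengths used here are illustrative, not the levels from the experiments), Gaussian blur and additive Gaussian noise can be applied to an image as follows:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def distort(image, blur_sigma=0.0, noise_sigma=0.0, rng=None):
    """Apply Gaussian blur and/or additive Gaussian noise to an image.

    image: float array in [0, 1], shape (H, W) or (H, W, C).
    blur_sigma: standard deviation of the Gaussian blur kernel, in pixels.
    noise_sigma: standard deviation of the zero-mean additive noise.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    out = np.asarray(image, dtype=np.float64)
    if blur_sigma > 0:
        # Blur only the two spatial axes; leave any channel axis untouched.
        sigma = [blur_sigma, blur_sigma] + [0] * (out.ndim - 2)
        out = gaussian_filter(out, sigma=sigma)
    if noise_sigma > 0:
        out = out + rng.normal(0.0, noise_sigma, size=out.shape)
    # Clip back to the valid pixel range after distortion.
    return np.clip(out, 0.0, 1.0)
```

Applying the distortion at several strengths (e.g. increasing `blur_sigma` or `noise_sigma`) yields a stimulus set over which human and DNN accuracy can be compared level by level.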
- Timothy J. Andrews and David M. Coppola. 1999. Idiosyncratic characteristics of saccadic eye movements when viewing different visual environments. Vision Research 39, 17 (1999), 2947--2953.
- Talis Bachmann. 1991. Identification of spatially quantised tachistoscopic images of faces: How many pixels does it take to carry identity? European Journal of Cognitive Psychology 3, 1 (1991), 87--103.
- Ali Borji and Laurent Itti. 2014. Human vs. computer in scene and object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’14). 113--120.
- Tejas S. Borkar and Lina J. Karam. 2017. DeepCorrect: Correcting DNN models against image distortions. arXiv:1705.02406.
- Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, et al. 2014. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology 10, 12 (2014), e1003963.
- Yue Chen, Ryan McBain, and Daniel Norton. 2015. Specific vulnerability of face perception to noise: A similar effect in schizophrenia patients and healthy individuals. Psychiatry Research 225, 3 (2015), 619--624.
- Steven Diamond, Vincent Sitzmann, Stephen Boyd, Gordon Wetzstein, and Felix Heide. 2016. Dirty pixels: Optimizing image classification architectures for raw sensor data. arXiv:1701.06487.
- Samuel Dodge and Lina Karam. 2016. Understanding how image quality affects deep neural networks. In Proceedings of the 2016 8th International Conference on Quality of Multimedia Experience (QoMEX’16). 1--6.
- Samuel Dodge and Lina Karam. 2017. Quality resilient neural networks. arXiv:1703.08119.
- Li Fei-Fei, Rob Fergus, and Pietro Perona. 2007. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding 106, 1 (2007), 59--70.
- François Fleuret, Ting Li, Charles Dubout, Emma K. Wampler, Steven Yantis, and Donald Geman. 2011. Comparing machines and humans on a visual categorization test. Proceedings of the National Academy of Sciences 108, 43 (2011), 17621--17625.
- Robert Geirhos, David H. J. Janssen, Heiko H. Schütt, Jonas Rauber, Matthias Bethge, and Felix A. Wichmann. 2017. Comparing deep neural networks against humans: Object recognition when the signal gets weaker. arXiv:1706.06969.
- Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’14). 580--587.
- Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations (ICLR’15).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. arXiv:1512.03385.
- Md Tahmid Hossain, Shyh Wei Teng, Dengsheng Zhang, Suryani Lim, and Guojun Lu. 2018. Distortion robust image classification with deep convolutional neural network based on discrete cosine transform. arXiv:1811.05819.
- Lina J. Karam and Tong Zhu. 2015. Quality labeled faces in the wild (QLFW): A database for studying face recognition in real-world environments. In IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, Bellingham, WA, 93940B1--93940B10.
- Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196.
- C. Keysers, D.-K. Xiao, P. Földiák, and D. I. Perrett. 2001. The speed of sight. Journal of Cognitive Neuroscience 13, 1 (2001), 90--101.
- Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, and Timothée Masquelier. 2016. Deep networks can resemble human feed-forward vision in invariant object recognition. Scientific Reports 6 (2016), 32672.
- Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 3431--3440.
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
- Aude Oliva and Antonio Torralba. 2001. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision 42, 3 (2001), 145--175.
- Mary C. Potter, Brad Wyble, Carl Erick Hagmann, and Emily S. McCourt. 2014. Detecting meaning in RSVP at 13 ms per picture. Attention, Perception, and Psychophysics 76, 2 (2014), 270--279.
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211--252.
- Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR’15).
- Sebastian Stabinger, Antonio Rodríguez-Sánchez, and Justus Piater. 2016. 25 years of CNNs: Can we compare to human abstraction capabilities? In Proceedings of the International Conference on Artificial Neural Networks. 380--387.
- Jiawei Su, Danilo Vasconcellos Vargas, and Sakurai Kouichi. 2017. One pixel attack for fooling deep neural networks. arXiv:1710.08864.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, et al. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 1--9.
- JianWen Tao, Wenjun Hu, and Shiting Wen. 2016. Multi-source adaptation joint kernel sparse representation for visual classification. Neural Networks 76 (2016), 135--151.
- Antonio Torralba, Rob Fergus, and William T. Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 11 (2008), 1958--1970.
- Shimon Ullman, Liav Assif, Ethan Fetaya, and Daniel Harari. 2016. Atoms of recognition in human and computer vision. Proceedings of the National Academy of Sciences 113, 10 (2016), 2744--2749.
- Igor Vasiljevic, Ayan Chakrabarti, and Gregory Shakhnarovich. 2016. Examining the impact of blur on recognition by convolutional networks. arXiv:1611.05760.
- Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, et al. 2017. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
- Amir R. Zamir, Te-Lin Wu, Lin Sun, William Shen, Jitendra Malik, and Silvio Savarese. 2017. Feedback networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
- Yiren Zhou, Sibo Song, and Ngai-Man Cheung. 2017. On classification of distorted images with deep convolutional neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’17). 1213--1217.
Index Terms
Human and DNN Classification Performance on Images With Quality Distortions: A Comparative Study