Saliency Prediction in the Deep Learning Era: Successes and Limitations

Abstract
Visual saliency models have seen a large leap in performance in recent years, thanks to advances in deep learning and large-scale annotated data. Despite enormous effort and major breakthroughs, however, models still fall short of human-level accuracy. In this work, I explore the landscape of the field, emphasizing new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large-scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss the remaining issues that need to be addressed to build the next generation of more powerful saliency models. Specific questions addressed include: in what ways do current models fail, how can they be remedied, what can be learned from cognitive studies of attention, how do explicit saliency judgments relate to fixations, how should fair model comparison be conducted, and what are the emerging applications of saliency models?
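To make the model-comparison setting concrete, here is a minimal NumPy sketch of Normalized Scanpath Saliency (NSS), one of the standard fixation-prediction metrics used in the benchmarks the survey compares models on. This is an illustration only, not code from the paper; the function name and array conventions are assumptions.

```python
import numpy as np

def nss(saliency_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """Normalized Scanpath Saliency (higher is better).

    saliency_map: 2-D array of predicted saliency values.
    fixation_map: 2-D binary array of the same shape, 1 where a
                  human fixation landed.
    Returns the mean z-scored saliency value at fixated locations.
    """
    s = saliency_map.astype(np.float64)
    # Standardize the prediction so scores are comparable across maps.
    s = (s - s.mean()) / (s.std() + 1e-12)
    # Average the standardized saliency at the human fixation points.
    return float(s[fixation_map.astype(bool)].mean())
```

A chance-level prediction scores near 0 under NSS, while a map that concentrates mass on fixated locations scores well above 1; because the map is z-scored first, the metric is invariant to monotonic rescaling of the saliency values, which is one reason it is popular for cross-model comparison.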