Abstract
The collection of internet images has been growing at an astonishing speed. These images undoubtedly contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this article, we focus on methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can support various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional neural network is designed to infer proposal objectness, the probability that a proposal contains an optimally located foreground object. In our work, objectness is quantitatively measured in terms of completeness and fullness, reflecting two complementary features of an optimal proposal: a complete foreground and a relatively small background. Our experiments indicate that object proposals re-ranked according to the output of our network generally achieve higher performance than those produced by other state-of-the-art methods. As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method and has been successfully used in various data-driven image applications.
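To make the two objectness criteria concrete, the following is a minimal sketch, not the network described in the article. It assumes axis-aligned bounding boxes and a known object box, and it adopts hypothetical definitions: completeness as the fraction of the object covered by the proposal, fullness as the fraction of the proposal occupied by the object, and their product as the ranking score. The article infers these quantities with a deep network and does not state these formulas in the abstract; all function names here are illustrative.

```python
from typing import List, Tuple

# Box format is an assumption: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes have zero area."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def completeness(proposal: Box, obj: Box) -> float:
    """Hypothetical definition: share of the object covered by the proposal."""
    obj_area = area(obj)
    return intersection(proposal, obj) / obj_area if obj_area > 0 else 0.0


def fullness(proposal: Box, obj: Box) -> float:
    """Hypothetical definition: share of the proposal occupied by the object."""
    prop_area = area(proposal)
    return intersection(proposal, obj) / prop_area if prop_area > 0 else 0.0


def rerank(proposals: List[Box], obj: Box) -> List[Box]:
    """Sort proposals by completeness * fullness, best first (assumed score)."""
    return sorted(
        proposals,
        key=lambda p: completeness(p, obj) * fullness(p, obj),
        reverse=True,
    )


obj = (10.0, 10.0, 50.0, 50.0)
tight = (10.0, 10.0, 50.0, 50.0)    # complete and full
loose = (0.0, 0.0, 100.0, 100.0)    # complete, but mostly background
cropped = (10.0, 10.0, 30.0, 50.0)  # full, but cuts off half the object
ranked = rerank([loose, cropped, tight], obj)
```

Under these assumed definitions, a loose box scores high on completeness but low on fullness, and a cropped box does the opposite, which illustrates why the two measures are described as complementary.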
Supplemental Material
Supplemental movie and image files for "Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment" are available for download.