Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment

Published: 08 August 2019

Abstract

The collection of internet images has been growing at an astonishing speed. These images undoubtedly contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this article, we focus on methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can help with various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional neural network is designed to infer proposal objectness, the probability that a proposal contains an optimally located foreground object. In our work, objectness is quantitatively measured in terms of completeness and fullness, reflecting two complementary features of an optimal proposal: a complete foreground and a relatively small background. Our experiments indicate that object proposals re-ranked according to the output of our network generally achieve higher performance than those produced by other state-of-the-art methods. As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method and has been successfully used in various data-driven image applications.
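To make the two objectness criteria concrete, the following is a minimal sketch, not the authors' implementation. It assumes a known binary foreground mask, whereas the article's network infers objectness from the image itself. The definitions used here are illustrative assumptions: completeness is taken as the fraction of foreground pixels that fall inside a proposal box, fullness as the fraction of the box occupied by foreground, and their product as the re-ranking score. The names `box_scores` and `rerank` are hypothetical.

```python
from typing import List, Tuple

# A box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
# Boxes are assumed to lie inside the mask bounds.
Box = Tuple[int, int, int, int]


def box_scores(mask: List[List[int]], box: Box) -> Tuple[float, float]:
    """Return (completeness, fullness) of a box against a 0/1 foreground mask.

    Assumed definitions (illustrative only):
      completeness = foreground pixels inside the box / all foreground pixels
      fullness     = foreground pixels inside the box / box area
    """
    x0, y0, x1, y1 = box
    total_fg = sum(sum(row) for row in mask)
    inside_fg = sum(sum(row[x0:x1]) for row in mask[y0:y1])
    area = max(0, x1 - x0) * max(0, y1 - y0)
    completeness = inside_fg / total_fg if total_fg else 0.0
    fullness = inside_fg / area if area else 0.0
    return completeness, fullness


def rerank(mask: List[List[int]], boxes: List[Box]) -> List[Box]:
    """Sort proposals by an assumed combined score: completeness * fullness."""
    def score(box: Box) -> float:
        completeness, fullness = box_scores(mask, box)
        return completeness * fullness

    return sorted(boxes, key=score, reverse=True)


if __name__ == "__main__":
    # 6x6 image with a 2x2 foreground object at rows 2-3, columns 2-3.
    mask = [[1 if 2 <= x < 4 and 2 <= y < 4 else 0 for x in range(6)]
            for y in range(6)]
    proposals = [(0, 0, 6, 6), (2, 2, 3, 4), (2, 2, 4, 4)]
    for box in rerank(mask, proposals):
        print(box, box_scores(mask, box))
```

Under these assumed definitions, a box covering the whole image is complete but not full, a box covering half the object is full but not complete, and only the tight box scores highly on both, which is the trade-off the abstract describes.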
