Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition

Published: 24 July 2018

Abstract

Large-scale image datasets and deep convolutional neural networks (DCNNs) are the two primary driving forces behind the rapid recent progress in generic object recognition. While many network architectures have been designed in pursuit of lower error rates, few efforts have been devoted to enlarging existing datasets, owing to high labeling costs and unfair comparison issues. In this article, we aim to achieve lower error rates by augmenting existing datasets automatically. Our method leverages both the web and DCNNs: the web provides massive numbers of images with rich contextual information, and a DCNN replaces human annotators, labeling images automatically under the guidance of that web contextual information. Experiments show that our method can automatically scale up existing datasets significantly, and with high accuracy, from billions of web pages. Performance on object recognition and transfer learning tasks improves significantly when the automatically augmented datasets are used, which demonstrates that more supervisory information has been gathered automatically from the web. Both the dataset and the models trained on it have been made publicly available.
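
The abstract does not spell out the labeling rule, so the following is only a minimal sketch of the general idea it describes: a classifier's prediction is accepted as a label when the text surrounding the image on its web page supports it. Everything here is an assumption made for illustration, not the authors' implementation: the `WebImage` record, the substring-based `context_candidates` matcher, the agreement rule in `auto_label`, and the 0.9 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WebImage:
    """Hypothetical record for one crawled image (not from the article)."""
    url: str
    context_text: str              # text surrounding the image on its web page
    class_probs: Dict[str, float]  # classifier output: class name -> probability


def context_candidates(text: str, class_names: List[str]) -> List[str]:
    """Return the class names mentioned in the surrounding text.

    Naive case-insensitive substring matching, assumed for illustration only.
    """
    lowered = text.lower()
    return [name for name in class_names if name.lower() in lowered]


def auto_label(image: WebImage, class_names: List[str],
               threshold: float = 0.9) -> Optional[str]:
    """Label an image only when web context and classifier agree.

    Assumed rule: among the classes named in the context, take the one with
    the highest classifier probability and accept it if that probability
    reaches `threshold`; otherwise leave the image unlabeled.
    """
    candidates = context_candidates(image.context_text, class_names)
    if not candidates:
        return None
    best = max(candidates, key=lambda name: image.class_probs.get(name, 0.0))
    if image.class_probs.get(best, 0.0) >= threshold:
        return best
    return None


def augment(images: List[WebImage], class_names: List[str],
            threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Return (url, label) pairs for every image that receives a label."""
    labeled = []
    for image in images:
        label = auto_label(image, class_names, threshold)
        if label is not None:
            labeled.append((image.url, label))
    return labeled
```

In this sketch the web context narrows the set of plausible labels and the classifier confidence filters out noisy matches, so an image whose page mentions "dog" but which the classifier scores as a cat is simply skipped rather than added with a doubtful label.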

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 3
August 2018
249 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3241977

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 24 July 2018
• Revised: 1 April 2018
• Accepted: 1 April 2018
• Received: 1 January 2018

Qualifiers

• research-article
• Research
• Refereed