Structure-Aware Deep Learning for Product Image Classification

Published: 24 January 2019

Abstract

Automatic product image classification is a crucial task in the management of online retailers. Motivated by recent advances of deep Convolutional Neural Networks (CNNs) in image classification, this work revisits the problem for product images that come with a predefined category hierarchy and attributes, aiming to leverage both to improve classification accuracy. We argue that these structure-aware cues allow more advanced deep models to be developed beyond the flat one-versus-all classification performed by conventional CNNs. To this end, this work makes three contributions: a salient-sensitive CNN that focuses on the product foreground by inserting a dedicated spatial attention module; a multiclass regression-based refinement that is expected to predict more accurately by merging prediction scores from multiple preceding CNNs, each corresponding to a distinct classifier in the hierarchy; and a multitask deep learning architecture that exploits correlations among categories and attributes for categorical label prediction. Experiments on nearly 1 million real-world product images validate the effectiveness of the proposed components, both individually and jointly, with performance gains observed.
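The score-merging refinement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the refinement is a single softmax (multiclass logistic) regression layer applied to the concatenated scores of two hypothetical classifiers, one over 2 coarse categories and one over 3 leaf categories. The function names, the toy scores, and the hand-set weights are all assumptions for illustration; in practice the weights would be learned from training data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_scores(score_vectors, weights, bias):
    """Concatenate per-classifier scores and apply one softmax regression layer.

    score_vectors: list of score lists, one per classifier in the hierarchy.
    weights: one row per output (leaf) category, one column per input score.
    bias: one value per output category.
    """
    features = [s for vec in score_vectors for s in vec]
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical scores from two preceding classifiers (assumed values).
coarse_scores = [0.9, 0.1]          # 2 coarse categories
leaf_scores = [0.40, 0.35, 0.25]    # 3 leaf categories

# Hand-set weights for illustration only (would normally be learned).
# Columns: [coarse 0, coarse 1, leaf 0, leaf 1, leaf 2].
weights = [
    [1.0, 0.0, 2.0, 0.0, 0.0],  # leaf 0, assumed child of coarse 0
    [0.0, 1.0, 0.0, 2.0, 0.0],  # leaf 1, assumed child of coarse 1
    [0.0, 1.0, 0.0, 0.0, 2.0],  # leaf 2, assumed child of coarse 1
]
bias = [0.0, 0.0, 0.0]

refined = merge_scores([coarse_scores, leaf_scores], weights, bias)
print(refined)
```

In this toy setup the confident coarse-level score raises the refined probability of leaf 0, which is how hierarchy-level evidence could sharpen a leaf-level prediction under the stated assumptions.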

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1s
      Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
      January 2019
      265 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3309769

      Copyright © 2019 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 January 2019
      • Accepted: 1 June 2018
      • Revised: 1 April 2018
      • Received: 1 November 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
