
DeepProduct: Mobile Product Search With Portable Deep Features

Published: 25 April 2018

Abstract

Features extracted by deep networks have been popular in many visual search tasks. This article studies deep network structures and training schemes for mobile visual search. The goal is to learn an effective yet portable feature representation that bridges the domain gap between mobile user photos and (mostly) professionally taken product images while keeping the computational cost acceptable for mobile-based applications. The technical contributions are twofold. First, we propose an alternative to the contrastive loss commonly used for training deep Siamese networks, namely robust contrastive loss, where we relax the penalty on some positive and negative pairs to alleviate overfitting. Second, a simple multitask fine-tuning scheme is leveraged to train the network, which not only utilizes knowledge from the provided training photo pairs but also harnesses additional information from the large ImageNet dataset to regularize the fine-tuning process. Extensive experiments on challenging real-world datasets demonstrate that both the robust contrastive loss and the multitask fine-tuning scheme are effective, leading to very promising results with a time cost suitable for mobile product search scenarios.
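
The abstract does not give the formula for the robust contrastive loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It contrasts the standard contrastive loss for one pair with a relaxed variant that stops penalizing matching pairs once they are already close. The `pos_slack` parameter, the choice to relax only positive pairs, and the weighted-sum form of the multitask objective (with a hypothetical `cls_weight`) are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d, is_match, margin=1.0):
    """Standard contrastive loss for one pair at feature distance d."""
    if is_match:
        # Matching pairs are always pulled together, however close they are.
        return 0.5 * d ** 2
    # Non-matching pairs are pushed apart only while inside the margin.
    return 0.5 * max(0.0, margin - d) ** 2


def relaxed_contrastive_loss(d, is_match, margin=1.0, pos_slack=0.2):
    """Illustrative relaxed variant (assumption, not the paper's formula).

    Matching pairs closer than pos_slack incur no penalty, so the network
    is not forced to collapse every positive pair to distance zero.
    """
    if is_match:
        return 0.5 * max(0.0, d - pos_slack) ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def multitask_objective(pair_loss, classification_loss, cls_weight=0.5):
    """Assumed weighted sum of the pair loss and an auxiliary
    classification loss (e.g. on ImageNet) acting as a regularizer."""
    return pair_loss + cls_weight * classification_loss


if __name__ == "__main__":
    user_photo = [0.10, 0.20, 0.30]
    product_image = [0.15, 0.25, 0.30]
    d = euclidean(user_photo, product_image)
    print("distance:", d)
    print("standard loss:", contrastive_loss(d, is_match=True))
    print("relaxed loss:", relaxed_contrastive_loss(d, is_match=True))
```

In this sketch a close matching pair still contributes a small gradient under the standard loss but contributes nothing under the relaxed one, which is one plausible way that relaxing the penalty could reduce overfitting to noisy training pairs.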

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 2
      May 2018
      208 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3210458

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery
      New York, NY, United States

      Publication History

      • Published: 25 April 2018
      • Accepted: 1 February 2018
      • Revised: 1 January 2018
      • Received: 1 October 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
