From Selective Deep Convolutional Features to Compact Binary Representations for Image Retrieval

Published: 05 June 2019

Abstract

In large-scale image retrieval, the two most important requirements are the discriminability of image representations and the efficiency of computing and storing them. Regarding the former, Convolutional Neural Networks (CNNs) have proven to be a very powerful tool for extracting highly discriminative local descriptors for effective image search. To further improve the discriminative power of the descriptors, recent works adopt fine-tuning strategies. In this article, taking a different approach, we propose a novel, computationally efficient, and competitive framework. Specifically, we first propose various strategies to compute masks, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and eliminate redundant features. Our in-depth analyses demonstrate that the proposed masking schemes are effective in addressing the burstiness drawback and improving retrieval accuracy. Second, we propose to employ recent embedding and aggregating methods that can significantly boost feature discriminability. Regarding computation and storage efficiency, we include a hashing module to produce very compact binary image representations. Extensive experiments on six image retrieval benchmarks demonstrate that our proposed framework achieves state-of-the-art retrieval performance.
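The abstract names the masks but does not define them, so the following is only a minimal sketch of how mask-based feature selection might look, under stated assumptions. It assumes that a SUM-mask keeps spatial locations whose channel-summed activation exceeds the median of the summed map, and that a MAX-mask keeps, for each channel, the location of its strongest response. The function names (`sum_mask`, `max_mask`, `select_features`) and the nested-list layout of the feature map are hypothetical and chosen for illustration; they are not taken from the article.

```python
from statistics import median


def sum_mask(fmap):
    """fmap: list of C channels, each an H x W nested list of activations.

    Assumed rule: keep locations whose activation summed over channels
    is above the median of the summed map.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    # Sum activations over all channels at each spatial location.
    sums = [[sum(ch[y][x] for ch in fmap) for x in range(w)] for y in range(h)]
    threshold = median(v for row in sums for v in row)
    return [[sums[y][x] > threshold for x in range(w)] for y in range(h)]


def max_mask(fmap):
    """Assumed rule: keep, for every channel, the location of its maximum."""
    h, w = len(fmap[0]), len(fmap[0][0])
    mask = [[False] * w for _ in range(h)]
    locations = [(y, x) for y in range(h) for x in range(w)]
    for ch in fmap:
        # Location of the strongest response in this channel (first on ties).
        y, x = max(locations, key=lambda p: ch[p[0]][p[1]])
        mask[y][x] = True
    return mask


def select_features(fmap, mask):
    """Return one C-dimensional local descriptor per location kept by the mask."""
    h, w = len(mask), len(mask[0])
    return [
        [ch[y][x] for ch in fmap]
        for y in range(h)
        for x in range(w)
        if mask[y][x]
    ]


# Toy feature map: 2 channels, 2 x 2 spatial grid.
toy = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 3]],
]
selected = select_features(toy, max_mask(toy))
```

In this toy case both masks keep the same two locations, so only two of the four local descriptors are passed on to the later embedding and aggregation stages.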
