Abstract
In large-scale image retrieval, the two most important requirements are the discriminability of image representations and the efficiency of computing and storing them. Regarding the former, Convolutional Neural Networks (CNNs) have proven to be a very powerful tool for extracting highly discriminative local descriptors for effective image search. To further improve the discriminative power of the descriptors, recent works adopt fine-tuning strategies. In this article, taking a different approach, we propose a novel, computationally efficient, and competitive framework. Specifically, we first propose various strategies to compute masks, namely, SIFT-masks, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and eliminate redundant features. Our in-depth analyses demonstrate that the proposed masking schemes are effective in addressing the burstiness drawback and improving retrieval accuracy. Second, we propose to employ recent embedding and aggregating methods that can significantly boost feature discriminability. Regarding computation and storage efficiency, we include a hashing module to produce very compact binary image representations. Extensive experiments on six image retrieval benchmarks demonstrate that our proposed framework achieves state-of-the-art retrieval performance.
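The abstract names the masks without defining them, so the following is only an illustrative sketch under assumed rules, not the article's implementation. It assumes a convolutional feature map given as C channels of H x W non-negative activations, a SUM-mask that keeps locations whose channel-summed activation exceeds the median over all locations, and a MAX-mask that keeps the location of the maximum activation in each channel. The function names `sum_mask`, `max_mask`, and `select_features` are hypothetical.

```python
from statistics import median


def sum_mask(fmap):
    """Assumed SUM-mask rule: keep (i, j) if the activation summed over
    channels is strictly greater than the median of those sums."""
    h, w = len(fmap[0]), len(fmap[0][0])
    sums = [[sum(ch[i][j] for ch in fmap) for j in range(w)] for i in range(h)]
    thresh = median(v for row in sums for v in row)
    return [[sums[i][j] > thresh for j in range(w)] for i in range(h)]


def max_mask(fmap):
    """Assumed MAX-mask rule: keep the location holding the maximum
    activation of each channel (ignoring channels that are all zero)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    mask = [[False] * w for _ in range(h)]
    for ch in fmap:
        i, j = max(
            ((i, j) for i in range(h) for j in range(w)),
            key=lambda p: ch[p[0]][p[1]],
        )
        if ch[i][j] > 0:
            mask[i][j] = True
    return mask


def select_features(fmap, mask):
    """Return the C-dimensional local descriptors at the masked locations,
    in row-major order."""
    h, w = len(mask), len(mask[0])
    return [
        [ch[i][j] for ch in fmap]
        for i in range(h)
        for j in range(w)
        if mask[i][j]
    ]


# Toy feature map: 2 channels, each 2 x 2.
fmap = [
    [[1, 0], [0, 5]],
    [[0, 2], [0, 3]],
]
selected = select_features(fmap, sum_mask(fmap))
```

Under these assumptions, each mask discards low-response locations before the selected descriptors are passed on to embedding, aggregation, and hashing, which is how a masking step could reduce redundant (bursty) local features.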
From Selective Deep Convolutional Features to Compact Binary Representations for Image Retrieval