Abstract
In this article, we propose a method for approximate asymmetric nearest-neighbor search over binary embedding codes. The asymmetric distance benefits from the smaller information loss on the query side, since the query need not be binarized. However, computing asymmetric distances by exhaustive search is prohibitive on a large-scale dataset. We present a novel method, called multi-index voting, that integrates the multi-index hashing technique with a voting mechanism to select appropriate candidates and then computes their asymmetric distances. We show that the candidate selection scheme can be formulated as the tail of the binomial distribution function. In addition, a binary feature selection method based on minimal quantization error is proposed to address insufficient memory and to improve search accuracy. Extensive experimental evaluations demonstrate that the proposed method achieves accuracy close to that of exhaustive search while significantly reducing runtime. For example, on a dataset of one billion 256-bit binary codes, examining only 0.5% of the dataset reaches 95–99% of the exhaustive-search accuracy and speeds up the search by 73–128 times. The method also offers an excellent tradeoff between search accuracy and time efficiency compared with state-of-the-art nearest-neighbor search methods. Moreover, the proposed feature selection method proves effective, improving accuracy by up to 8.35% compared with other feature selection methods.
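The candidate-selection-then-rerank pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: codes are stored as Python integers; each of the m substrings is looked up exactly (search radius 0, whereas multi-index hashing generally also probes neighboring buckets); a code becomes a candidate once it collects at least `min_votes` votes; and the asymmetric distance used for reranking (the sum of |y_i| over bits where the sign of the real-valued query projection y disagrees with the stored bit) is one common choice standing in for the paper's. All function names are hypothetical.

```python
from collections import defaultdict


def split_code(code: int, nbits: int, m: int) -> list[int]:
    """Split an nbits-bit code (stored as an int) into m disjoint substrings."""
    assert nbits % m == 0, "sketch assumes nbits is divisible by m"
    s = nbits // m
    mask = (1 << s) - 1
    # Substring j holds bits j*s .. (j+1)*s - 1.
    return [(code >> (j * s)) & mask for j in range(m)]


def build_tables(db: list[int], nbits: int, m: int) -> list[dict[int, list[int]]]:
    """One hash table per substring position: substring value -> database ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(db):
        for j, sub in enumerate(split_code(code, nbits, m)):
            tables[j][sub].append(idx)
    return tables


def vote(query_code: int, tables, nbits: int, m: int, min_votes: int) -> list[int]:
    """Count how many substrings of each database code exactly match the query's."""
    votes: dict[int, int] = defaultdict(int)
    for j, sub in enumerate(split_code(query_code, nbits, m)):
        for idx in tables[j].get(sub, ()):  # .get does not insert empty buckets
            votes[idx] += 1
    return sorted(idx for idx, v in votes.items() if v >= min_votes)


def asym_dist(y: list[float], code: int) -> float:
    """Assumed asymmetric distance: sum of |y_i| over bits where sign(y_i) != bit i."""
    return sum(abs(yi) for i, yi in enumerate(y) if (yi > 0) != bool((code >> i) & 1))


def search(y, db, tables, nbits: int, m: int, min_votes: int, k: int) -> list[int]:
    """Binarize the query by sign, vote for candidates, rerank them asymmetrically."""
    q = sum(1 << i for i, yi in enumerate(y) if yi > 0)
    candidates = vote(q, tables, nbits, m, min_votes)
    return sorted(candidates, key=lambda idx: asym_dist(y, db[idx]))[:k]
```

Raising `min_votes` shrinks the candidate list, and hence the number of asymmetric distance computations, at the risk of missing true neighbors that differ from the query in several substrings.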
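To see why a binomial tail can describe the candidate selection, consider a simplified model (our assumption, not necessarily the paper's derivation): a database code agrees with the query on each bit independently with probability p, and the n-bit code is split into m substrings of s = n/m bits. A substring then matches exactly with probability q = p^s, the number of votes X follows Binomial(m, q), and with a vote threshold t the code is selected with probability P(X ≥ t), the sum over k = t..m of C(m, k) q^k (1 − q)^(m − k). The hypothetical helpers below evaluate this tail.

```python
from math import comb


def binomial_tail(m: int, t: int, q: float) -> float:
    """P[X >= t] for X ~ Binomial(m, q)."""
    return sum(comb(m, k) * q**k * (1 - q) ** (m - k) for k in range(t, m + 1))


def selection_probability(p_bit: float, nbits: int, m: int, t: int) -> float:
    """Probability that a code collects >= t votes, assuming independent bits that
    each agree with the query with probability p_bit, and exact matching on
    m substrings of nbits // m bits."""
    q = p_bit ** (nbits // m)  # probability that one substring matches exactly
    return binomial_tail(m, t, q)
```

Because q = p^s decays quickly with the substring length s, the choice of m and t largely determines how many true neighbors survive candidate selection.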
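The abstract does not spell out the quantization-error criterion used for binary feature selection, so the following is only one plausible reading, offered as a hypothetical sketch: each bit is produced by taking the sign of a real-valued projection, the per-dimension quantization error is the squared error of approximating each value x by α·sign(x) with α = mean |x| (the least-squares optimal scale for a fixed sign pattern), and the k dimensions with the smallest error are kept. The function names and the scaling are assumptions.

```python
def quantization_error(column: list[float]) -> float:
    """Squared error of approximating each x by alpha * sign(x), alpha = mean |x|.
    sign(x) is taken as +1 for x > 0 and -1 otherwise (an assumed convention)."""
    alpha = sum(abs(x) for x in column) / len(column)
    return sum((x - alpha * (1.0 if x > 0 else -1.0)) ** 2 for x in column)


def select_bits(X: list[list[float]], k: int) -> list[int]:
    """Return indices of the k projection dimensions with the smallest error.
    X is a list of rows, one real-valued projection vector per training item."""
    n = len(X[0])
    errors = [quantization_error([row[i] for row in X]) for i in range(n)]
    return sorted(range(n), key=lambda i: errors[i])[:k]
```

Dimensions whose values cluster tightly around ±α lose little information when binarized, which is the intuition behind preferring them when memory allows only k bits per code.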
Approximate Asymmetric Search for Binary Embedding Codes