Approximate Asymmetric Search for Binary Embedding Codes

Published: 25 October 2016

Abstract

In this article, we propose a method of approximate asymmetric nearest-neighbor search for binary embedding codes. The asymmetric distance benefits from the smaller information loss on the query side. However, calculating asymmetric distances through exhaustive search is prohibitively expensive on a large-scale dataset. We present a novel method, called multi-index voting, that integrates the multi-index hashing technique with a voting mechanism to select appropriate candidates and calculate their asymmetric distances. We show that the candidate selection scheme can be formulated as the tail of the binomial distribution function. In addition, a binary feature selection method based on minimal quantization error is proposed to address the memory insufficiency issue and improve the search accuracy. Extensive experimental evaluations demonstrate that the proposed method achieves accuracy close to that of the exhaustive search method while significantly reducing the runtime. For example, one result shows that, on a dataset of one billion 256-bit binary codes, examining only 0.5% of the dataset reaches 95% to 99% of the accuracy of the exhaustive search method and accelerates the search by 73 to 128 times. The method also offers an excellent tradeoff between search accuracy and time efficiency compared to state-of-the-art nearest-neighbor search methods. Moreover, the proposed feature selection method improves the accuracy by up to 8.35% compared with other feature selection methods.
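
The abstract names the ingredients of multi-index voting without giving details, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that each code is split into m equal substrings, that each substring position has its own hash table, that a database item receives one vote per table in which its substring exactly matches the query's, and that the asymmetric distance is the squared Euclidean distance between the real-valued query and the code decoded to +1/-1. All function names are hypothetical. The `binomial_tail` helper assumes, purely for illustration, that each of the m tables matches independently with probability p, in which case the chance of collecting at least t votes is a binomial tail.

```python
from collections import Counter, defaultdict
from math import comb


def split(code, m):
    """Split a tuple of bits into m equal-length substrings."""
    step = len(code) // m
    return [tuple(code[i * step:(i + 1) * step]) for i in range(m)]


def build_index(codes, m):
    """Build one hash table per substring position: substring -> item ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for item_id, code in enumerate(codes):
        for table, sub in zip(tables, split(code, m)):
            table[sub].append(item_id)
    return tables


def vote(tables, query_code, min_votes):
    """Return ids whose substrings match the query in >= min_votes tables.

    Assumption: exact substring match only (no Hamming-radius probing).
    """
    votes = Counter()
    for table, sub in zip(tables, split(query_code, len(tables))):
        for item_id in table.get(sub, ()):
            votes[item_id] += 1
    return [i for i, v in votes.items() if v >= min_votes]


def asymmetric_distance(query_real, code):
    """Squared distance between a real-valued query and a +1/-1 decoded code.

    Assumption: this decoding is one common choice, not taken from the source.
    """
    return sum((q - (1.0 if b else -1.0)) ** 2 for q, b in zip(query_real, code))


def search(codes, tables, query_real, min_votes, k):
    """Vote for candidates, then rank only those by asymmetric distance."""
    query_code = tuple(1 if q > 0 else 0 for q in query_real)
    candidates = vote(tables, query_code, min_votes)
    ranked = sorted(candidates,
                    key=lambda i: asymmetric_distance(query_real, codes[i]))
    return ranked[:k]


def binomial_tail(m, p, t):
    """P[X >= t] for X ~ Binomial(m, p)."""
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(t, m + 1))
```

Raising `min_votes` shrinks the candidate set that needs an asymmetric distance computation, at the cost of possibly missing true neighbors; under the independence assumption above, `binomial_tail` gives a rough way to reason about that tradeoff.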

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 1
February 2017
278 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3012406

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 25 October 2016
• Revised: 1 August 2016
• Accepted: 1 August 2016
• Received: 1 January 2016
