Active query sensing: Suggesting the best query view for mobile visual search

Published: 16 October 2012

Abstract

While much exciting progress is being made in mobile visual search, one important question has been left unexplored in all current systems: when searching for objects or scenes in the 3D world, which viewing angle is more likely to succeed? In particular, if the first query fails to find the right target, how should the user move the mobile camera to form the second query? In this article, we propose a novel Active Query Sensing system for mobile location search, which actively suggests the best subsequent query view for recognizing the physical location in the mobile environment. The proposed system includes two unique components: (1) an offline process that analyzes the saliency of the different views associated with each geographical location and predicts the location search precision of each view by modeling its self-retrieval score distribution; and (2) an online process that estimates the view of an unseen query and suggests the best subsequent view change. Specifically, the optimal viewing angle change for the next query is determined by an online information-theoretic approach. Using a scalable visual search system implemented over a NYC street view dataset (0.3 million images), we show a performance gain: the failure rate of mobile location search is reduced to only 12% after the second query. We have also implemented an end-to-end functional system, including user interfaces on iPhones, client-server communication, and a remote search server. This work may open up an exciting new direction for developing interactive mobile media applications through the innovative exploitation of active sensing and query formulation.
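
The online suggestion step can be illustrated with a minimal sketch. The abstract does not spell out the information-theoretic formulation, so the code below swaps in a simpler expected-saliency criterion: given a belief over candidate locations after the first query and a per-view saliency table from the offline stage, it picks the view with the highest expected saliency and reports the signed turn from the current view. The function name `suggest_view_change`, the six-view layout at 60 degrees per view, and all numbers are assumptions made for illustration; they are not taken from the article.

```python
from typing import Dict, List, Tuple


def suggest_view_change(
    posterior: Dict[str, float],
    saliency: Dict[str, List[float]],
    current_view: int,
    degrees_per_view: float,
) -> Tuple[int, float]:
    """Pick the next view by expected saliency (illustrative criterion only).

    posterior: belief over candidate locations after the first query.
    saliency:  per-location list of predicted search precision for each view,
               as an offline analysis might produce (hypothetical values).
    Returns the chosen view index and the signed turn in degrees.
    """
    n_views = len(next(iter(saliency.values())))
    total = sum(posterior.values())  # normalise in case beliefs do not sum to 1

    # Expected saliency of each view, averaged over the candidate locations.
    expected = [0.0] * n_views
    for location, belief in posterior.items():
        for view, score in enumerate(saliency[location]):
            expected[view] += (belief / total) * score

    best_view = max(range(n_views), key=lambda v: expected[v])

    # Signed angle from the current view, wrapped into [-180, 180).
    delta = (best_view - current_view) * degrees_per_view
    delta = (delta + 180.0) % 360.0 - 180.0
    return best_view, delta


# Hypothetical example: two candidate locations, six views spaced 60 degrees apart.
posterior = {"A": 0.6, "B": 0.4}
saliency = {
    "A": [0.2, 0.9, 0.1, 0.3, 0.4, 0.2],
    "B": [0.3, 0.5, 0.8, 0.2, 0.1, 0.4],
}
print(suggest_view_change(posterior, saliency, current_view=0, degrees_per_view=60.0))
# prints (1, 60.0)
```

In this toy setting the second view is salient for both candidates, so the sketch suggests a 60-degree turn. An entropy-based criterion, closer to what the abstract describes, would instead score each view by how much it is expected to reduce uncertainty about the location.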

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 8, Issue 3s
Special section of best papers of ACM Multimedia 2011, and special section on 3D mobile multimedia
September 2012
173 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2348816

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 October 2012
• Accepted: 1 June 2012
• Revised: 1 May 2012
• Received: 1 March 2012

Qualifiers

• research-article
• Research
• Refereed