Robust and accurate mobile visual localization and its applications

Published: 17 October 2013

Abstract

Mobile applications are becoming increasingly popular. More and more people are using their phones to enjoy ubiquitous location-based services (LBS). The increasing popularity of LBS creates a fundamental problem: mobile localization. Besides traditional localization methods that use GPS or wireless signals, using phone-captured images for localization has drawn significant interest from researchers. Photos contain more scene context information than the embedded sensors, leading to a more precise location description. With the goal of accurately sensing real geographic scene contexts, this article presents a novel approach to mobile visual localization from a given image (typically associated with a rough GPS position). The proposed approach is capable of providing a complete set of more accurate parameters about the scene geo-context, including the real locations of both the mobile user and, perhaps more importantly, the captured scene, as well as the viewing direction. To make image localization fast and accurate, we investigate various techniques for large-scale image retrieval and 2D-to-3D matching. Specifically, we first generate scene clusters using joint geo-visual clustering, with each scene being represented by a 3D model reconstructed from a set of images. The 3D models are then indexed using a visual vocabulary tree structure. Taking the geo-tags of the database images as prior knowledge, a novel location-based codebook weighting scheme is proposed to embed this additional information into the codebook. The discriminative power of the codebook is thereby enhanced, leading to better image retrieval performance. The query image is aligned with the models obtained from the image retrieval results, and eventually registered to a real-world map.
We evaluate the effectiveness of our approach using several large-scale datasets and achieve estimation accuracy of a user's location within 13 meters, viewing direction within 12 degrees, and viewing distance within 26 meters. Of particular note is our showcase of three novel applications based on the localization results: (1) an on-the-spot tour guide, (2) collaborative routing, and (3) a sight-seeing guide. Evaluations through user studies demonstrate that these applications are effective in facilitating the ideal rendezvous for mobile users.
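
As an illustration of how error figures such as those reported above (meters for location, degrees for viewing direction) could be measured, the following is a minimal sketch and not the authors' evaluation code. It assumes positions are given as latitude/longitude pairs in degrees and viewing directions as compass headings in degrees; the function names `haversine_m` and `heading_error_deg` and the spherical-Earth radius are assumptions introduced here.

```python
import math

# Assumption: a spherical Earth with the mean radius below (meters).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def heading_error_deg(estimated, truth):
    """Smallest absolute difference between two compass headings, in [0, 180]."""
    diff = abs(estimated - truth) % 360.0
    # Headings wrap around at 360 degrees, so take the shorter arc.
    return min(diff, 360.0 - diff)
```

Under these assumptions, the location error would be `haversine_m` applied to the estimated and ground-truth user positions, and a viewing distance could be obtained by applying the same function to a user position and the corresponding scene position.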

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 9, Issue 1s
Special Sections on the 20th Anniversary of ACM International Conference on Multimedia, Best Papers of ACM Multimedia 2012
October 2013
218 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2523001

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 October 2013
• Revised: 1 May 2013
• Accepted: 1 May 2013
• Received: 1 February 2013

Qualifiers

• research-article
• Research
• Refereed
