Abstract
Mobile applications are becoming increasingly popular, and more and more people use their phones to access ubiquitous location-based services (LBS). The growing popularity of LBS raises a fundamental problem: mobile localization. Beyond traditional localization methods that rely on GPS or wireless signals, using phone-captured images for localization has drawn significant interest from researchers. Photos contain more scene context than the readings of embedded sensors, which allows a more precise description of location. With the goal of accurately sensing the real geographic scene context, this article presents a novel approach to mobile visual localization from a given image (typically associated with a rough GPS position). The proposed approach provides a complete and more accurate set of scene geo-context parameters: the real locations of both the mobile user and, perhaps more importantly, the captured scene, as well as the viewing direction. To make image localization fast and accurate, we investigate various techniques for large-scale image retrieval and 2D-to-3D matching. Specifically, we first generate scene clusters using joint geo-visual clustering, with each scene represented by a 3D model reconstructed from a set of images. The 3D models are then indexed using a visual vocabulary tree. Taking the geo-tags of the database images as prior knowledge, we propose a novel location-based codebook weighting scheme that embeds this additional information into the codebook. This enhances the discriminative power of the codebook and thus improves image retrieval performance. The query image is aligned with the models obtained from the image retrieval results and finally registered to a real-world map.
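The abstract does not give the formula for the location-based codebook weighting, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' scheme. It assumes each database image has already been quantized into integer visual-word IDs (for example by a vocabulary tree) and assigned to a geographic scene cluster. It combines standard IDF weighting with a hypothetical location weight, one minus the normalized entropy of a word's distribution over clusters, so that words concentrated in one location count more than words spread evenly across many. The functions `location_weights` and `combined_weights` and the parameter `beta` are illustrative names, not part of the original work.

```python
import math
from collections import Counter, defaultdict


def idf_weights(db_words):
    """Standard inverse document frequency for each visual word.

    db_words: one list of visual-word IDs per database image."""
    n = len(db_words)
    df = Counter()
    for words in db_words:
        df.update(set(words))
    return {w: math.log(n / c) for w, c in df.items()}


def location_weights(db_words, db_clusters):
    """Hypothetical location weight in [0, 1]: 1 minus the normalized entropy of
    a word's distribution over geographic clusters (counted per image).
    A word seen in only one cluster gets 1; an evenly spread word gets 0."""
    num_clusters = len(set(db_clusters))
    max_entropy = math.log(num_clusters) if num_clusters > 1 else 1.0
    per_word = defaultdict(Counter)
    for words, cluster in zip(db_words, db_clusters):
        for w in set(words):
            per_word[w][cluster] += 1
    weights = {}
    for w, counts in per_word.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        weights[w] = max(0.0, 1.0 - entropy / max_entropy)
    return weights


def combined_weights(db_words, db_clusters, beta=1.0):
    """Assumed combination: IDF boosted by (1 + beta * location weight)."""
    idf = idf_weights(db_words)
    loc = location_weights(db_words, db_clusters)
    return {w: idf[w] * (1.0 + beta * loc[w]) for w in idf}


def bow_vector(words, weights):
    """L2-normalized weighted bag-of-visual-words vector as a sparse dict."""
    tf = Counter(words)
    vec = {w: tf[w] * weights.get(w, 0.0) for w in tf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else {}


def rank(query_words, db_words, weights):
    """Database image indices sorted by cosine similarity to the query,
    highest first; ties keep ascending index order (stable sort)."""
    q = bow_vector(query_words, weights)
    scores = []
    for words in db_words:
        d = bow_vector(words, weights)
        scores.append(sum(v * d.get(w, 0.0) for w, v in q.items()))
    return sorted(range(len(db_words)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    db = [[1, 2, 3], [1, 4, 5], [1, 2, 6], [1, 7, 8]]
    clusters = [0, 1, 0, 1]
    w = combined_weights(db, clusters)
    print(rank([2, 3, 1], db, w))  # [0, 2, 1, 3]
```

In this toy example, word 1 occurs in every image and equally in both clusters, so both its IDF and its location weight are zero; the ranking is driven by words 2 and 3, which occur only in cluster 0.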
We evaluate the effectiveness of our approach on several large-scale datasets and achieve estimation accuracy within 13 meters for the user's location, 12 degrees for the viewing direction, and 26 meters for the viewing distance. Of particular note, we showcase three novel applications built on the localization results: (1) an on-the-spot tour guide, (2) collaborative routing, and (3) a sightseeing guide. User studies demonstrate that these applications are effective in facilitating the ideal rendezvous for mobile users.
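The abstract reports errors in meters and degrees but does not state exactly how they were computed. A common way to measure such errors, sketched here as an assumption rather than the authors' evaluation protocol, is the great-circle (haversine) distance between estimated and ground-truth positions and the wrapped absolute difference between compass headings. The spherical Earth radius `EARTH_RADIUS_M`, the definition of viewing distance as the user-to-scene ground distance, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def heading_error_deg(est, truth):
    """Smallest absolute difference between two compass headings, in [0, 180]."""
    diff = abs(est - truth) % 360.0
    return min(diff, 360.0 - diff)


def viewing_distance_error_m(est_user, est_scene, true_user, true_scene):
    """Hypothetical: absolute error of the user-to-scene distance.
    Each argument is a (lat, lon) tuple in degrees."""
    est = haversine_m(*est_user, *est_scene)
    true = haversine_m(*true_user, *true_scene)
    return abs(est - true)


if __name__ == "__main__":
    print(heading_error_deg(350, 10))  # 20.0
    print(round(haversine_m(0, 0, 1, 0)))  # one degree of latitude, about 111 km
```

Wrapping the heading difference matters: a naive `abs(350 - 10)` would report 340 degrees for two directions that are only 20 degrees apart. Per-query errors computed this way could then be aggregated (for example as a mean or median) over a test set.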
Robust and accurate mobile visual localization and its applications
Recommendations
Finding perfect rendezvous on the go: accurate mobile visual localization and its applications to routing
MM '12: Proceedings of the 20th ACM international conference on Multimedia
While on the go, more and more people are using their phones to enjoy ubiquitous location-based services (LBS). One of the fundamental problems of LBS is localization. Researchers are now investigating ways to use a phone-captured image for localization ...
Accurate sensing of scene geo-context via mobile visual localization
Image geo-tagging has drawn a great deal of attention in recent years. The geographic information associated with images can be used to promote potential applications such as location recognition or virtual navigation. In this paper, we propose a novel ...
AMIGO: accurate mobile image geotagging
ICIMCS '12: Proceedings of the 4th International Conference on Internet Multimedia Computing and Service
With location-based services gaining popularity among mobile users, researchers are exploring the way using the phone-captured image for localization as it contains more context information than the embedded sensory GPS coordinates. We present in this ...