Abstract
This article considers the problem of automatically discovering geo-informative attributes for location recognition and exploration. The attributes are expected to be both discriminative and representative: each should correspond to a distinctive visual pattern and carry a semantic interpretation. We analyze attributes at the region level. Each segmented region in the training set is assigned a binary latent variable indicating whether it is discriminative. A latent learning framework is proposed for discriminative region detection and geo-informative attribute discovery. Moreover, we use user-generated content to obtain semantic interpretations for the discovered visual attributes. Discriminative and search-based attribute annotation methods are developed for geo-informative attribute interpretation. The proposed approach is evaluated on a challenging dataset of Google Street View and Flickr photos. Experimental results show that (1) geo-informative attributes are discriminative and useful for location recognition, and (2) the discovered semantic interpretations are meaningful and can be exploited for further location exploration.
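The idea of a binary latent variable per region can be illustrated with a toy sketch. This is not the article's formulation: the perceptron update, the rule of flagging exactly one region per image, the two-dimensional features, and every name below (`infer_latent`, `update_weights`, `train`, `images`) are assumptions chosen only to show how alternating between latent flags and classifier weights might work.

```python
from typing import List, Tuple

Vector = List[float]
# One image: (location label in {+1, -1}, list of region feature vectors).
Image = Tuple[int, List[Vector]]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def infer_latent(w: Vector, images: List[Image]) -> List[List[int]]:
    """Latent step: per image, flag the region scoring highest for its label.

    Assumption: exactly one region per image is flagged (ties go to the first).
    """
    flags = []
    for label, regions in images:
        scores = [label * dot(w, x) for x in regions]
        best = max(range(len(regions)), key=lambda i: scores[i])
        flags.append([1 if i == best else 0 for i in range(len(regions))])
    return flags


def update_weights(w: Vector, images: List[Image], flags: List[List[int]],
                   lr: float = 0.1, epochs: int = 20) -> Vector:
    """Weight step: perceptron updates using only the flagged regions."""
    w = list(w)
    for _ in range(epochs):
        for (label, regions), h in zip(images, flags):
            for x, flagged in zip(regions, h):
                if flagged and label * dot(w, x) <= 0:
                    w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w


def train(images: List[Image], dim: int, rounds: int = 5):
    """Alternate between the weight step and the latent step."""
    w = [0.0] * dim
    flags = infer_latent(w, images)
    for _ in range(rounds):
        w = update_weights(w, images, flags)
        flags = infer_latent(w, images)
    return w, flags


# Hypothetical toy data: feature 0 is background shared by both locations,
# feature 1 is a location-specific cue (positive for +1, negative for -1).
images: List[Image] = [
    (+1, [[1.0, 0.0], [0.0, 1.0]]),
    (+1, [[0.0, 1.0], [1.0, 0.0]]),
    (-1, [[1.0, 0.0], [0.0, -1.0]]),
    (-1, [[0.0, -1.0], [1.0, 0.0]]),
]

w, flags = train(images, dim=2)
print(w, flags)
```

On this toy data the flags settle on the regions that carry the location-specific feature, while the shared background regions are left unflagged. A real system would replace the perceptron with a margin-based latent learner and the hand-made vectors with actual region descriptors.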