Abstract
Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g., windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that is able to take the weak geographic supervision into account. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically-informed image retrieval.
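To make the idea of discriminative clustering under weak geographic supervision concrete, below is a minimal illustrative sketch, not the authors' implementation. It assumes `paris_feats` and `other_feats` are precomputed patch descriptors (e.g., HOG vectors) extracted from Street View images of Paris and of other cities; the seed selection, the choice of `LinearSVC`, and all thresholds are placeholder assumptions for illustration only.

```python
# Illustrative sketch of discriminative patch mining under weak geographic
# supervision (only the city label of each image is known). Not the authors'
# implementation; parameters and the scoring heuristic are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestNeighbors


def mine_discriminative_patches(paris_feats, other_feats,
                                n_seeds=20, n_iters=3, top_k=5):
    """Return one linear detector (weight vector) per surviving seed patch."""
    rng = np.random.default_rng(0)
    seeds = rng.choice(len(paris_feats), size=n_seeds, replace=False)
    nn = NearestNeighbors(n_neighbors=top_k).fit(paris_feats)
    detectors = []

    for s in seeds:
        # Initialize the positive set with the seed's nearest neighbors
        # among Paris patches.
        _, idx = nn.kneighbors(paris_feats[s:s + 1])
        pos = paris_feats[idx[0]]

        for _ in range(n_iters):
            # Train a detector separating the current cluster from
            # patches sampled in other cities.
            X = np.vstack([pos, other_feats])
            y = np.hstack([np.ones(len(pos)), np.zeros(len(other_feats))])
            clf = LinearSVC(C=0.1).fit(X, y)

            # Re-estimate the cluster: the top-scoring Paris patches under
            # the current detector become the new positive set.
            scores = clf.decision_function(paris_feats)
            pos = paris_feats[np.argsort(-scores)[:top_k]]

        # Crude geo-informativeness check: among the highest-scoring patches
        # overall, most should come from Paris rather than the other cities.
        all_scores = clf.decision_function(np.vstack([paris_feats, other_feats]))
        top = np.argsort(-all_scores)[:2 * top_k]
        if np.mean(top < len(paris_feats)) > 0.75:
            detectors.append(clf.coef_.ravel())

    return detectors
```

The sketch mirrors the high-level idea in the abstract: each candidate element alternates between training a discriminative detector and re-selecting its top-scoring same-city patches, and only detectors that fire far more often in Paris than elsewhere are kept as geo-informative elements.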