What makes Paris look like Paris?

Published: 01 July 2012

Abstract

Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that can take the weak geographic supervision into account. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.
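
The search criterion stated in the abstract (patches that are both frequently occurring and geographically informative) can be illustrated with a toy sketch. This is not the paper's discriminative clustering method; it is a simplified nearest-neighbour purity heuristic, assuming each patch is already described by a feature vector and a city label. The function names, the 2-D features, and the labels `"paris"` and `"other"` are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geo_purity(candidate, features, labels, target, k=3):
    """Fraction of the k nearest neighbours of a patch that carry the target label.

    A high value means the patch has many look-alikes (it occurs frequently)
    and those look-alikes come mostly from the target area (it is informative).
    """
    dists = sorted(
        (euclidean(features[candidate], f), i)
        for i, f in enumerate(features)
        if i != candidate
    )
    nearest = [i for _, i in dists[:k]]
    return sum(labels[i] == target for i in nearest) / k


def rank_candidates(features, labels, target, k=3):
    """Rank all target-area patches by neighbour purity, best first."""
    candidates = [i for i, lab in enumerate(labels) if lab == target]
    scored = [(geo_purity(i, features, labels, target, k), i) for i in candidates]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Toy data (assumed, not from the paper): six "paris" patches that resemble
# each other, plus one "paris" patch that looks like patches seen elsewhere.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2),
    (5.0, 5.0),                                                   # generic patch
    (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (4.9, 5.0), (5.0, 4.9),   # other cities
]
labels = ["paris"] * 7 + ["other"] * 5

ranked = rank_candidates(features, labels, target="paris", k=3)
for score, index in ranked:
    print(index, score)
```

In this toy setting the six mutually similar patches rank above the generic one, whose neighbours all come from other cities. A discriminative approach, as the abstract proposes, would go further by learning a classifier per cluster rather than relying on raw feature distances.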

Supplemental Material

tp212_12.mp4

      • Published in

        ACM Transactions on Graphics, Volume 31, Issue 4
        July 2012
        935 pages
        ISSN:0730-0301
        EISSN:1557-7368
        DOI:10.1145/2185520
        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
