DOI: 10.1145/1963405.1963443
research-article

Geographical topic discovery and comparison

Published: 28 March 2011

ABSTRACT

This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents have become popular with the pervasiveness of location-acquisition technologies. For example, on Flickr, geo-tagged photos are associated with tags and GPS locations. On Twitter, the locations of tweets can be identified from the GPS locations reported by smartphones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how can we discover different topics of interest that are coherent in geographical regions? (2) how can we compare several topics across different geographical locations? To answer these questions, this paper proposes and compares three ways of modeling geographical topics: a location-driven model, a text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from the Flickr website, including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work on some datasets but fail on others. LGTA works well on all of these datasets, not only finding regions of interest but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that geographical distributions can help model topics, while topics provide important cues for grouping different geographical regions.
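The abstract states that LGTA combines location and text but does not give its formulation here. The sketch below is a minimal illustration under loudly stated assumptions, not the paper's exact model: it assumes each latent region r has a prior p(r), an isotropic 2-D Gaussian over GPS coordinates (treated as plane coordinates), and a topic mixture p(z | r), and that each topic z has a word distribution p(w | z). Under those assumptions it computes p(r | location, tags) for one document, the kind of posterior an EM-style learner would evaluate. The function names `log_gaussian_2d` and `region_posterior`, and all parameter values, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: (latitude, longitude) is treated as a point in a flat plane.
Point = Tuple[float, float]


def log_gaussian_2d(loc: Point, mean: Point, sigma: float) -> float:
    """Log-density of an isotropic 2-D Gaussian N(mean, sigma^2 I) at loc."""
    dx = loc[0] - mean[0]
    dy = loc[1] - mean[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma * sigma) - math.log(2.0 * math.pi * sigma * sigma)


def region_posterior(
    loc: Point,
    words: Sequence[str],
    region_prior: List[float],           # p(r)
    region_mean: List[Point],            # Gaussian centre of region r
    region_sigma: List[float],           # Gaussian spread of region r
    region_topic: List[List[float]],     # p(z | r)
    topic_word: List[Dict[str, float]],  # p(w | z)
    smoothing: float = 1e-9,             # avoids log(0) for unseen words
) -> List[float]:
    """Hypothetical helper: p(r | loc, words), combining location and text evidence."""
    log_scores = []
    for r, prior in enumerate(region_prior):
        # Location evidence: how well the GPS point fits region r's Gaussian.
        score = math.log(prior) + log_gaussian_2d(loc, region_mean[r], region_sigma[r])
        # Text evidence: each word is drawn from region r's topic mixture.
        for w in words:
            p_w = sum(p_z * topic_word[z].get(w, 0.0) for z, p_z in enumerate(region_topic[r]))
            score += math.log(p_w + smoothing)
        log_scores.append(score)
    # Normalize with the log-sum-exp trick so tiny likelihoods do not underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical toy parameters: topic 0 ~ "coast", topic 1 ~ "mountains".
TOPIC_WORD = [
    {"beach": 0.5, "surf": 0.4, "snow": 0.1},
    {"snow": 0.5, "ski": 0.4, "beach": 0.1},
]
REGION_TOPIC = [[0.9, 0.1], [0.1, 0.9]]  # region 0 mostly coast, region 1 mostly mountains

if __name__ == "__main__":
    # The document at (1, 0) is equally far from both region centres,
    # so location alone cannot decide and the tags break the tie.
    post = region_posterior((1.0, 0.0), ["snow", "ski"], [0.5, 0.5],
                            [(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0],
                            REGION_TOPIC, TOPIC_WORD)
    print([round(p, 3) for p in post])  # prints [0.033, 0.967]
```

In this toy setup, tags decide when the location is ambiguous, and a distant location overrides weak text evidence. That is one way, under the assumptions above, to read the abstract's claim that geographical distributions and topics inform each other.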

    • Published in

      WWW '11: Proceedings of the 20th international conference on World wide web
      March 2011
      840 pages
      ISBN:9781450306324
      DOI:10.1145/1963405

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      WWW '11 Paper Acceptance Rate 81 of 658 submissions, 12%
      Overall Acceptance Rate 2,771 of 13,232 submissions, 21%