ABSTRACT
This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, the geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of the tweets can be identified by the GPS locations from smart phones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how to discover different topics of interests that are coherent in geographical regions? (2) how to compare several topics across different geographical locations? To answer these questions, this paper proposes and compares three ways of modeling geographical topics: location-driven model, text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from Flickr website including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work in some datasets but fail in others. LGTA works well in all these datasets at not only finding regions of interests but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that the geographical distributions can help modeling topics, while topics provide important cues to group different geographical regions.
References
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993--1022, 2003. Google Scholar
Digital Library
- L. Cao and L. Fei-Fei. Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. In ICCV, pages 1--8, 2007.Google Scholar
Cross Ref
- D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5):603--619, 2002. Google Scholar
Digital Library
- D. J. Crandall, L. Backstrom, D. P. Huttenlocher, and J. M. Kleinberg. Mapping the world's photos. In WWW, pages 761--770, 2009. Google Scholar
Digital Library
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.Google Scholar
Digital Library
- T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999. Google Scholar
Digital Library
- L. S. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In WWW, pages 297--306, 2008. Google Scholar
Digital Library
- Q. Mei, D. Cai, D. Zhang, and C. Zhai. Topic modeling with network regularization. In WWW, pages 101--110, 2008. Google Scholar
Digital Library
- Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In WWW, pages 533--542, 2006. Google Scholar
Digital Library
- J. C. Niebles, H. Wang, and F.-F. Li. Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3):299--318, 2008. Google Scholar
Digital Library
- T. Rattenbury, N. Good, and M. Naaman. Towards automatic extraction of event and place semantics from flickr tags. In SIGIR, pages 103--110, 2007. Google Scholar
Digital Library
- B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR (2), pages 1605--1614, 2006. Google Scholar
Digital Library
- S. Sizov. Geofolk: latent spatial semantics in web 2.0 social media. In WSDM, pages 281--290, 2010. Google Scholar
Digital Library
- C. Wang, J. Wang, X. Xie, and W.-Y. Ma. Mining geographic knowledge using location aware topic model. In GIR, pages 65--70, 2007. Google Scholar
Digital Library
- X. Wang and A. McCallum. Topics over time: a non-markov continuous-time model of topical trends. In KDD, pages 424--433, 2006. Google Scholar
Digital Library
- C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In KDD, pages 743--748, 2004. Google Scholar
Digital Library
Index Terms
Geographical topic discovery and comparison




Comments