
A Pseudo-likelihood Approach for Geo-localization of Events from Crowd-sourced Sensor-Metadata

Published: 20 August 2019

Abstract

Events such as live concerts, protest marches, and exhibitions are often video recorded by many people at the same time, typically using smartphones. In this work, we address the problem of geo-localizing such events from crowd-generated data. Traditional approaches to this problem, which rely on multiple video sequences of the event, require highly complex computer vision (CV) methods that are computationally intensive and are not robust when the visual data are collected through a crowd-sourced medium. In the present work, we approach the problem in a probabilistic framework using only the sensor metadata obtained from smartphones. We model the event location and the camera locations and orientations (camera parameters) as the hidden states in a Hidden Markov Model. The sensor metadata from the GPS and the digital compass of user smartphones are used as the observations associated with the hidden states of the model. We use a suitable potential function to capture the complex interaction between the hidden states (i.e., event location and camera parameters). The non-Gaussian densities involved in the model, such as the potential function involving hidden states, make maximum-likelihood estimation intractable. We propose a pseudo-likelihood-based approach that maximizes an approximate likelihood, which provides a tractable solution to the problem. Experimental results on simulated as well as real data show correct event geo-localization using the proposed method. When compared with several baselines, the proposed method shows superior performance. The overall computation time required is much smaller, since only the sensor metadata are used instead of visual data.
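
To give a feel for how GPS positions and compass headings alone can constrain an event location, the following is a minimal sketch of a naive geometric estimate: the point that is closest, in the least-squares sense, to every camera's viewing line. It is not the pseudo-likelihood method proposed in the article, and nothing in the abstract states that it is one of the baselines. The function name `intersect_bearings`, the local planar (east, north) frame in metres, and the noise-free inputs are all assumptions made for illustration.

```python
import numpy as np


def intersect_bearings(positions, bearings_deg):
    """Return the point closest (least squares) to all camera viewing lines.

    positions    : (N, 2) camera locations as (east, north) in metres,
                   assumed already projected from GPS to a local planar frame.
    bearings_deg : (N,) compass headings in degrees, clockwise from north.

    Illustrative assumption only: each camera is treated as an infinite line
    through its position, so the viewing direction's sign is not enforced.
    """
    p = np.asarray(positions, dtype=float)
    theta = np.radians(np.asarray(bearings_deg, dtype=float))
    # Compass heading -> unit direction vector in (east, north) coordinates.
    d = np.stack([np.sin(theta), np.cos(theta)], axis=1)

    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p_i, d_i in zip(p, d):
        # (I - d d^T) projects onto the direction perpendicular to the line,
        # so ||(I - d d^T)(x - p_i)|| is the distance from x to that line.
        P = np.eye(2) - np.outer(d_i, d_i)
        A += P
        b += P @ p_i
    # Normal equations of the summed squared distances; singular if all
    # bearings are parallel.
    return np.linalg.solve(A, b)
```

For example, `intersect_bearings([(0, 0), (10, 0), (5, -5)], [45, 315, 0])` describes three cameras whose viewing lines all pass through the same point. With noisy real-world metadata the lines no longer meet exactly, which is one motivation for a probabilistic treatment such as the one the abstract describes.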


