Abstract
Events such as live concerts, protest marches, and exhibitions are often video recorded by many people at the same time, typically with smartphones. In this work, we address the problem of geo-localizing such events from crowd-generated data. Traditional approaches that solve this problem from multiple video sequences of the event require highly complex computer vision (CV) methods, which are computationally intensive and are not robust when the visual data are collected through crowd-sourcing. We instead approach the problem in a probabilistic framework that uses only the sensor metadata obtained from smartphones. We model the event location and the camera locations and orientations (camera parameters) as the hidden states in a Hidden Markov Model. The sensor metadata from the GPS and the digital compass of the users' smartphones serve as the observations associated with the hidden states of the model. A suitable potential function captures the complex interaction between the hidden states (i.e., the event location and the camera parameters). The non-Gaussian densities involved in the model, such as the potential function over the hidden states, make maximum-likelihood estimation intractable. We therefore propose a pseudo-likelihood-based approach that maximizes an approximate likelihood and provides a tractable solution to the problem. Experimental results on simulated as well as real data show correct event geo-localization using the proposed method. Compared with several baselines, the proposed method shows superior performance. The overall computation time is also much smaller, since only the sensor metadata are used instead of the visual data.
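To give a feel for how GPS positions and compass headings alone can constrain an event location, the following is a minimal sketch of a purely geometric estimate: the least-squares intersection of the cameras' viewing lines. This is not the pseudo-likelihood method of the article and is not claimed to be one of its baselines. The function name `localize_event`, the local planar frame (x east, y north, in metres), and the noise-free headings in the usage lines are all assumptions made for illustration.

```python
import math


def localize_event(cameras):
    """Least-squares intersection of camera viewing lines (illustrative only).

    cameras: list of (x, y, heading_deg) tuples, with x east and y north in
    metres in an assumed local planar frame, and heading measured clockwise
    from north, as a digital compass reports it.
    Returns the point that minimises the sum of squared perpendicular
    distances to all viewing lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y, heading_deg in cameras:
        t = math.radians(heading_deg)
        dx, dy = math.sin(t), math.cos(t)  # unit viewing direction
        # Projector onto the normal of the viewing line: I - d d^T
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * x + m12 * y
        b2 += m12 * x + m22 * y
    # Solve the 2x2 normal equations [[a11, a12], [a12, a22]] e = [b1, b2]
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-9:
        raise ValueError("viewing directions are (nearly) parallel")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Three hypothetical cameras whose headings all point at (10, 10)
cams = [(0.0, 0.0, 45.0), (20.0, 0.0, 315.0), (10.0, -5.0, 0.0)]
print(localize_event(cams))  # approximately (10.0, 10.0)
```

The sketch treats each camera as an infinite line rather than a forward-facing ray and gives every camera equal weight, so it has no notion of GPS or compass noise. Handling that uncertainty is what the probabilistic model described in the abstract is for.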
Supplemental Material
Supplemental movie, appendix, image, and software files are available for download for "A Pseudo-likelihood Approach for Geo-localization of Events from Crowd-sourced Sensor-Metadata".