Abstract
Given N geo-located point instances (e.g., crime or disease cases) in a spatial domain, we aim to detect sub-regions (i.e., hotspots) that have a higher probability density of generating such instances than the rest of the domain. Hotspot detection is widely used in important urban applications, including public safety, public health, urban planning, and equity. The problem is challenging because its societal applications often have low tolerance for false positives and therefore require significance testing, which is computationally intensive. In related work, the spatial scan statistic introduced a likelihood ratio-based framework for hotspot evaluation and significance testing. However, it does not consider the effect of spatial non-determinism, which causes many hotspots to be missed. Our previous work introduced a non-deterministic normalization-based scan statistic to mitigate this issue, but its robustness against false positives is not stably controlled. To address these limitations, we propose a unified framework that improves the completeness of results without incurring more false positives. We also propose a reduction algorithm to improve computational efficiency. Experimental results confirm that the unified framework greatly improves the recall of hotspot detection without increasing the number of false positives, and that the reduction algorithm greatly reduces execution time.
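For readers unfamiliar with the likelihood ratio-based framework mentioned above, the following is a minimal sketch of the classical Poisson spatial scan statistic with Monte Carlo significance testing. It is not the unified framework or the reduction algorithm proposed here. It assumes a unit-square study area, a uniform baseline, and a small hypothetical set of circular candidate regions; all function names and parameters are illustrative.

```python
import math
import random


def log_likelihood_ratio(c, n, b):
    """Poisson log likelihood ratio for a region holding c of n points,
    where b is the count expected under a uniform baseline."""
    if c <= b:
        return 0.0  # only regions with more points than expected count as hotspots
    inside = c * math.log(c / b)
    outside = (n - c) * math.log((n - c) / (n - b)) if c < n else 0.0
    return inside + outside


def candidate_circles(step=0.1, radii=(0.1, 0.2), eps=1e-9):
    """Hypothetical candidate set: grid-centered circles fully inside the unit square."""
    circles = []
    steps = int(round(1 / step))
    for r in radii:
        for i in range(steps + 1):
            for j in range(steps + 1):
                x, y = i * step, j * step
                if r - eps <= x <= 1 - r + eps and r - eps <= y <= 1 - r + eps:
                    circles.append((x, y, r))
    return circles


def max_llr(points, circles):
    """Test statistic: the largest log likelihood ratio over all candidates."""
    n = len(points)
    best = 0.0
    for cx, cy, r in circles:
        c = sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
        b = n * math.pi * r * r  # study area is 1, so expected count = n * circle area
        best = max(best, log_likelihood_ratio(c, n, b))
    return best


def monte_carlo_p_value(points, circles, trials=99, seed=0):
    """Compare the observed statistic with its distribution under complete
    spatial randomness, estimated from `trials` simulated datasets."""
    rng = random.Random(seed)
    n = len(points)
    observed = max_llr(points, circles)
    exceed = 0
    for _ in range(trials):
        simulated = [(rng.random(), rng.random()) for _ in range(n)]
        if max_llr(simulated, circles) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (trials + 1)
```

Each Monte Carlo trial repeats the full candidate enumeration, so the cost grows with the product of trials, candidates, and points. This illustrates why the significance testing is described as computationally intensive.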
A Unified Framework for Robust and Efficient Hotspot Detection in Smart Cities