research-article
Open Access

A Unified Framework for Robust and Efficient Hotspot Detection in Smart Cities

Published: 14 September 2020

Abstract

Given N geo-located point instances (e.g., crime or disease cases) in a spatial domain, we aim to detect sub-regions (i.e., hotspots) that have a higher probability density of generating such instances than the rest of the domain. Hotspot detection has been widely used in a variety of important urban applications, including public safety, public health, urban planning, and equity. The problem is challenging because its societal applications often have low tolerance for false positives and require computationally intensive significance testing. In related work, the spatial scan statistic introduced a likelihood ratio-based framework for hotspot evaluation and significance testing. However, it fails to consider the effect of spatial non-determinism, causing many missed detections. Our previous work introduced a non-deterministic normalization-based scan statistic to mitigate this issue. However, its robustness against false positives is not stably controlled. To address these limitations, we propose a unified framework that can improve the completeness of results without incurring more false positives. We also propose a reduction algorithm to improve the computational efficiency. Experimental results confirm that the unified framework can greatly improve the recall of hotspot detection without increasing the number of false positives, and the reduction algorithm can greatly reduce execution time.
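To make the likelihood ratio-based evaluation and significance testing mentioned above more concrete, the following is a minimal sketch of a classic Poisson-style spatial scan with a Monte Carlo test. It is not the unified framework or the reduction algorithm proposed in this article. It assumes a unit-square domain, a uniform baseline, and circular candidate windows that lie fully inside the domain; all function names and parameters (`candidate_circles`, `best_window`, `monte_carlo_p_value`, `step`, `radii`, `trials`) are hypothetical choices made for illustration.

```python
import math
import random


def log_likelihood_ratio(c, b, n):
    """Poisson log likelihood ratio of a window with c observed and
    b expected points, out of n points in total (0 if not elevated)."""
    if c <= b:
        return 0.0
    inside = c * math.log(c / b)
    outside = (n - c) * math.log((n - c) / (n - b)) if c < n else 0.0
    return inside + outside


def candidate_circles(step=0.1, radii=(0.1, 0.2)):
    """Assumed candidate set: grid-centered circles fully inside the unit square."""
    eps = 1e-9
    steps = int(round(1 / step))
    circles = []
    for r in radii:
        for i in range(steps + 1):
            for j in range(steps + 1):
                x, y = i * step, j * step
                if r - eps <= x <= 1 - r + eps and r - eps <= y <= 1 - r + eps:
                    circles.append((x, y, r))
    return circles


def best_window(points, circles):
    """Return (max log likelihood ratio, window) over all candidate circles."""
    n = len(points)
    best_score, best_circle = 0.0, None
    for (x, y, r) in circles:
        c = sum(1 for (px, py) in points
                if (px - x) ** 2 + (py - y) ** 2 <= r * r)
        b = n * math.pi * r * r  # expected count under a uniform null
        score = log_likelihood_ratio(c, b, n)
        if score > best_score:
            best_score, best_circle = score, (x, y, r)
    return best_score, best_circle


def monte_carlo_p_value(points, circles, trials=99, seed=0):
    """Estimate the p-value of the best window by simulating uniform data."""
    rng = random.Random(seed)
    observed, window = best_window(points, circles)
    n = len(points)
    exceed = 0
    for _ in range(trials):
        sim = [(rng.random(), rng.random()) for _ in range(n)]
        if best_window(sim, circles)[0] >= observed:
            exceed += 1
    return observed, window, (1 + exceed) / (1 + trials)
```

The repeated re-scan of every simulated dataset is what makes significance testing costly in this sketch: each Monte Carlo trial repeats the full search over candidate windows, so the running time grows with the product of trials, candidates, and points.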


Published in

ACM/IMS Transactions on Data Science, Volume 1, Issue 3
Special Issue on Urban Computing and Smart Cities
August 2020
217 pages
ISSN: 2691-1922
DOI: 10.1145/3424342

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 14 September 2020
• Online AM: 7 May 2020
• Accepted: 1 December 2019
• Revised: 1 November 2019
• Received: 1 June 2019

Qualifiers

• research-article
• Research
• Refereed