research-article
Open Access

GeoMatch: Efficient Large-scale Map Matching on Apache Spark

Published: 14 September 2020

Abstract

We develop GeoMatch as a novel, scalable, and efficient big-data pipeline for large-scale map matching on Apache Spark. GeoMatch improves existing spatial big-data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves. Thanks to its partitioning scheme, GeoMatch can effectively balance operations across different processing units and achieve significant performance gains. GeoMatch also incorporates a dynamically adjustable error-correction technique that provides robustness against positioning errors. We demonstrate the effectiveness of GeoMatch through rigorous and extensive empirical benchmarks that consider large-scale urban spatial datasets ranging from 166,253 to 3.78B location measurements. We separately assess execution performance and accuracy of map matching and develop a benchmark framework for evaluating large-scale map matching. Results of our evaluation show up to 27.25-fold performance improvements compared to previous works while achieving better processing accuracy than current solutions. We also showcase the practical potential of GeoMatch with two urban management applications. GeoMatch and our benchmark framework are open-source.
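The abstract attributes GeoMatch's load balancing to a partitioning scheme inspired by Hilbert space-filling curves. The sketch below is not GeoMatch's implementation; it only illustrates the general idea under assumed details: points are snapped to an n-by-n grid (n a power of two), each cell receives its index along the Hilbert curve, and the sorted indices are cut into contiguous ranges holding roughly equal numbers of points. The function names, the bounding-box argument, and the equal-count splitting rule are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two. This is the standard iterative conversion,
    not code taken from GeoMatch.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cell_of(px, py, bbox, n):
    """Snap a point to a grid cell; bbox = (min_x, min_y, max_x, max_y) is assumed."""
    min_x, min_y, max_x, max_y = bbox
    cx = min(n - 1, int((px - min_x) / (max_x - min_x) * n))
    cy = min(n - 1, int((py - min_y) / (max_y - min_y) * n))
    return cx, cy


def balanced_boundaries(keys, num_partitions):
    """Upper-bound keys that cut the sorted keys into equal-count ranges."""
    ordered = sorted(keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * (i + 1)) - 1] for i in range(num_partitions - 1)]


def partition_of(key, boundaries):
    """Partition id for a Hilbert key: index of the first boundary >= key."""
    return bisect_left(boundaries, key)
```

Because consecutive Hilbert indices always belong to adjacent grid cells, each contiguous key range covers a spatially compact region, while cutting by count rather than by area keeps dense urban cells from overloading a single partition.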

