Abstract
We present GeoMatch, a novel, scalable, and efficient big-data pipeline for large-scale map matching on Apache Spark. GeoMatch improves on existing spatial big-data solutions with a novel spatial partitioning scheme inspired by Hilbert space-filling curves. This partitioning scheme lets GeoMatch balance operations effectively across processing units and achieve significant performance gains. GeoMatch also incorporates a dynamically adjustable error-correction technique that provides robustness against positioning errors. We demonstrate the effectiveness of GeoMatch through extensive empirical benchmarks on large-scale urban spatial datasets ranging from 166,253 to 3.78 billion location measurements. We assess execution performance and map-matching accuracy separately, and we develop a benchmark framework for evaluating large-scale map matching. Our evaluation shows up to 27.25-fold performance improvements over previous work, together with better processing accuracy than current solutions. We also showcase the practical potential of GeoMatch with two urban management applications. GeoMatch and our benchmark framework are open-source.
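To illustrate the general idea behind Hilbert-curve-inspired partitioning, the sketch below maps grid cells to their position along a Hilbert curve and then cuts the sorted sequence into equally sized chunks. It is not the GeoMatch implementation: it uses the standard textbook coordinate-to-index conversion, and the names `hilbert_index` and `balanced_partitions`, the integer grid, and the equal-count split are assumptions made for illustration only.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve.

    The grid has side length 2**order; x and y must lie in [0, 2**order).
    This is the standard iterative conversion, not GeoMatch's own code.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def balanced_partitions(cells, order, num_partitions):
    """Hypothetical helper: sort cells by Hilbert index, then split the
    sorted list into chunks of (nearly) equal size, so that each chunk
    holds spatially close cells and a similar amount of work."""
    ordered = sorted(cells, key=lambda c: hilbert_index(order, c[0], c[1]))
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    cells = [(x, y) for x in range(8) for y in range(8)]
    for i, part in enumerate(balanced_partitions(cells, order=3, num_partitions=4)):
        print(i, len(part), part[:3])
```

Because cells that are close along the curve tend to be close in space, splitting the sorted order by count keeps each partition spatially compact while giving every partition a similar number of items, which is the load-balancing property the abstract attributes to the partitioning scheme.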