ABSTRACT
Spatial analysis of big data is a key component of Cyber-GIS. However, using existing cyberinfrastructure (e.g., large computing clusters) to perform parallel and distributed spatial analysis on big data remains a major challenge. Problems such as efficient spatial weights creation, spatial statistics, and spatial regression on big data still need investigation. In this research, we propose a MapReduce algorithm for creating contiguity-based spatial weights. The algorithm creates spatial weights from very large spatial datasets efficiently by using computing resources organized in the Hadoop framework. It follows the MapReduce paradigm: mappers are distributed across a computing cluster to find contiguous neighbors in parallel, and reducers then collect the results and generate the weights matrix. To test the performance of the algorithm, we design an experiment that creates a contiguity-based weights matrix from artificial spatial data with up to 190 million polygons using Elastic MapReduce, Amazon's Hadoop framework. The experiment demonstrates the scalability of this parallel algorithm, which uses large computing clusters to solve the problem of creating contiguity weights on big data.
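The mapper/reducer division described in the abstract can be illustrated with a small in-process sketch. It is not the paper's implementation: it assumes queen contiguity (two polygons are neighbors if they share at least one vertex), assumes shared vertices have exactly equal coordinates, and uses hypothetical function names (`map_polygon`, `shuffle`, `reduce_vertex`, `queen_weights`). The `shuffle` helper stands in for the grouping step that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import combinations


def map_polygon(poly_id, vertices):
    """Mapper (assumed design): emit one (vertex, polygon id) pair per distinct vertex."""
    for vertex in set(vertices):
        yield vertex, poly_id


def shuffle(pairs):
    """Group values by key; a stand-in for Hadoop's shuffle phase."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups


def reduce_vertex(poly_ids):
    """Reducer: all polygons that share a vertex are queen neighbors of each other."""
    for a, b in combinations(sorted(poly_ids), 2):
        yield a, b
        yield b, a


def queen_weights(polygons):
    """Run map, shuffle, and reduce, then merge neighbor pairs per polygon."""
    mapped = (kv for pid, verts in polygons.items() for kv in map_polygon(pid, verts))
    by_vertex = shuffle(mapped)
    pairs = (p for ids in by_vertex.values() for p in reduce_vertex(ids))
    neighbors = shuffle(pairs)
    # Polygons with no shared vertex get an empty neighbor list.
    return {pid: sorted(neighbors.get(pid, ())) for pid in polygons}


# Illustrative data: a 2x2 grid of unit squares plus one isolated square.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "D": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "E": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
weights = queen_weights(polygons)
```

Because each mapper call touches only one polygon and each reducer call touches only one vertex group, both phases can in principle be spread across cluster nodes. Rook contiguity (shared edges) would need edges rather than vertices as keys, and real coordinate data may need rounding or snapping before exact matching is reliable.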
Index Terms
A MapReduce algorithm to create contiguity weights for spatial analysis of big data



