Abstract
Graph databases use graph structures to store data sets as nodes, edges, and properties. They are used to store and search the relationships among a large number of nodes in applications such as social networking services and recommendation engines that use customer social graphs. Because the computation cost of graph search queries increases as the graph grows, in this paper we accelerate the graph search functions (Dijkstra and A* algorithms) of the graph database Neo4j in two ways: with a multi-threaded library and with a CUDA library for graphics processing units (GPUs). We use 100,000-node graphs generated from a degree distribution of the Facebook social graph for the evaluation. Our multi-threaded and GPU-based implementations require an auxiliary adjacency matrix for the target graph. The results show that, when the additional overhead of generating the auxiliary adjacency matrix is not taken into account, the multi-threaded version improves the Dijkstra and A* search performance by 16.2x and 13.8x, respectively, compared to the original implementation. The GPU-based implementation improves the Dijkstra and A* search performance by 26.2x and 32.8x, respectively. When the overhead is taken into account, the speed-ups of our implementations are reduced, but reusing the auxiliary adjacency matrix across multiple graph search queries still significantly improves graph search performance.
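The role of the auxiliary adjacency matrix can be illustrated with a minimal sketch. This is not the authors' code: it is a sequential Python illustration that assumes an undirected graph with non-negative weights and a dense matrix, and the names `build_adjacency_matrix` and `dijkstra` are hypothetical. It builds the matrix once and reuses it for two shortest-path queries, which is the reuse pattern the abstract describes for amortizing the generation overhead.

```python
import math


def build_adjacency_matrix(num_nodes, edges):
    """Build a dense weight matrix once; math.inf marks a missing edge.

    Assumption: the graph is undirected and all weights are non-negative.
    """
    matrix = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for node in range(num_nodes):
        matrix[node][node] = 0.0
    for src, dst, weight in edges:
        # Keep the cheapest edge if the same pair appears more than once.
        matrix[src][dst] = min(matrix[src][dst], weight)
        matrix[dst][src] = min(matrix[dst][src], weight)
    return matrix


def dijkstra(matrix, source):
    """Array-based Dijkstra over the dense matrix; returns all distances."""
    num_nodes = len(matrix)
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    visited = [False] * num_nodes

    for _ in range(num_nodes):
        # Select the closest unvisited node.
        current, best = -1, math.inf
        for node in range(num_nodes):
            if not visited[node] and dist[node] < best:
                current, best = node, dist[node]
        if current == -1:
            break  # Remaining nodes are unreachable.
        visited[current] = True

        # Relax every neighbour of the selected node. Each iteration is
        # independent of the others, so this is the loop that a
        # multi-threaded or GPU version could split across workers.
        row = matrix[current]
        for node in range(num_nodes):
            candidate = best + row[node]
            if candidate < dist[node]:
                dist[node] = candidate
    return dist


# Hypothetical five-node graph; node 4 is isolated.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
matrix = build_adjacency_matrix(5, edges)  # Built once ...
distances_from_0 = dijkstra(matrix, 0)     # ... and reused per query.
distances_from_3 = dijkstra(matrix, 3)
```

A dense matrix needs memory quadratic in the number of nodes, so this sketch only shows the access pattern on a toy graph and says nothing about how the matrix is stored for the 100,000-node graphs in the evaluation.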