RealGraphGPUWeb: A Convenient and Efficient GPU-Based Graph Analysis Platform on the Web

In this demo paper, we present RealGraphGPUWeb, a web-based graph analysis platform with the following features: (1) an easy-to-use, user-friendly GUI, (2) high processing performance, (3) support for various graph algorithms and data formats, (4) high accessibility from anywhere on the web, and (5) no coding requirements. In our demo, we show how a naive user (e.g., a non-CS researcher) obtains graph analysis results conveniently and efficiently with a few clicks through the web-based GUI in RealGraphGPUWeb. We also let the user experience the performance improvement obtained by the optimization strategies employed in RealGraphGPUWeb. We believe that RealGraphGPUWeb could be a good platform not only for CS users but also for non-CS users who want to analyze big graphs for their applications easily and efficiently.


INTRODUCTION
A graph is a data structure widely used to model relationships among objects in various application domains [8,16,17]. We analyze graphs to discover interesting patterns through tasks such as node ranking, community detection, and structural analysis [4,6,7,12]. As real-world graphs grow rapidly into big graphs, efficient processing of such graphs is becoming more important.
Toward this end, various graph engines based on a single machine (single-machine graph engines, in short) have been developed [3,9,10,13,16]. However, it is still difficult for users to analyze graphs easily and conveniently. From the viewpoint of a naive user (e.g., a non-CS researcher), developing her application and its graphical user interface (GUI) on top of a graph engine is a difficult and time-consuming task because it requires significant programming skills.
To address this issue, we developed RealGraphGPUWeb, a platform that runs on a single machine with an interactive web-based GUI, thereby allowing users to access various graph analysis services easily and conveniently from anywhere on the web. In our platform, a user simply uploads a target graph to a server and chooses a graph algorithm along with its required parameters. Then, the server runs the selected algorithm on the graph and presents the analysis result graphically in a variety of forms such as tables, charts, or graphs.
We developed RealGraphGPUWeb on top of RealGraphGPU [9], a GPU version of our RealGraph [16], which is a state-of-the-art single-machine graph engine designed by carefully considering the unique characteristics of real-world graphs (e.g., power-law degree distribution). RealGraphGPU achieves up to a five-fold improvement in performance over the original RealGraph, enabling graph analysis much faster than other state-of-the-art single-machine graph engines such as GraphChi [1], X-Stream [2], FlashGraph [3], and TurboGraph [13]. The graph algorithms already implemented and provided in RealGraphGPUWeb allow users to analyze their graphs easily without any programming.
In our demo, we will present two main scenarios. In the first, a user performs graph analysis easily and simply by clicking some options through the web-based GUI implemented in RealGraphGPUWeb. In the second, the user experiences the performance change in graph analysis by turning on/off the optimization options employed in RealGraphGPU, thereby understanding the power of each optimization strategy.
The organization of this paper is as follows. Section 2 briefly describes RealGraphGPU, a high-performance GPU-based single-machine graph engine that supports efficient graph analysis in RealGraphGPUWeb. Section 3 illustrates the overall demo scenarios with RealGraphGPUWeb. Finally, Section 4 summarizes and concludes this paper.

OVERVIEW OF REALGRAPHGPU

Architecture
Figure 1 illustrates the five-layer architecture and the processing flow of RealGraphGPU. Each layer can be characterized as follows.
• Storage layer: This layer manages the data in storage, where data is stored in fixed-size blocks and processed in a block-based

Processing Steps and Optimizations
RealGraphGPU performs a graph algorithm as follows. First, for the bits set to 1 in an indicator vector, which correspond to the nodes to be processed in the current iteration, RealGraphGPU loads the blocks containing those nodes from storage into the CPU buffer and then transfers them to the GPU buffer. Then, it assigns the nodes in each loaded block to GPU threads for processing. The GPU threads perform the operations of the graph algorithm and update the results; they also identify the nodes to be processed in the next iteration and set the corresponding bits to 1 in the other indicator vector. This process is repeated until no bits are set to 1 in the indicator vector.
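The iteration scheme above can be illustrated with a small CPU-only sketch. It is not RealGraphGPU code: the block size, the `load_block` helper, and the use of BFS as the example algorithm are assumptions made for illustration, and the inner loop over nodes stands in for the work that GPU threads would perform.

```python
BLOCK_SIZE = 2  # hypothetical: number of nodes stored per block


def load_block(adjacency, block_id):
    """Stand-in for the storage -> CPU buffer -> GPU buffer transfer."""
    start = block_id * BLOCK_SIZE
    stop = min(start + BLOCK_SIZE, len(adjacency))
    return {v: adjacency[v] for v in range(start, stop)}


def bfs_levels(adjacency, source):
    """BFS driven by two indicator vectors (current and next iteration)."""
    n = len(adjacency)
    level = [-1] * n      # attribute vector: BFS level of each node
    current = [0] * n     # indicator vector for the current iteration
    level[source] = 0
    current[source] = 1
    depth = 0
    while any(current):   # stop when no bit is set to 1
        nxt = [0] * n     # indicator vector for the next iteration
        # Only blocks containing at least one active node are loaded.
        active_blocks = sorted({v // BLOCK_SIZE for v in range(n) if current[v]})
        for block_id in active_blocks:
            block = load_block(adjacency, block_id)
            for v, neighbors in block.items():  # work done by GPU threads
                if not current[v]:
                    continue
                for u in neighbors:
                    if level[u] == -1:
                        level[u] = depth + 1
                        nxt[u] = 1              # activate u for the next iteration
        current = nxt
        depth += 1
    return level


graph = [[1, 2], [3], [3], [4], [], [0]]  # adjacency lists, node IDs 0..5
print(bfs_levels(graph, 0))
```

Swapping the two indicator vectors at the end of each iteration is what lets the loop terminate as soon as an iteration activates no new node.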
The important features that allow RealGraphGPU to provide high performance in big graph analysis are as follows.
Workload allocation: Other graph engines distribute workloads over GPU threads in a node-based manner. However, nodes in a real-world graph have significantly different numbers of edges. Thus, a thread with a small workload (i.e., one assigned a node with few edges) finishes earlier than the others and must wait for them, which degrades the overall performance [11]. To address this, RealGraphGPU distributes workloads over GPU threads in an edge-based manner, allocating the same number of edges to each thread. This edge-based allocation balances the workloads across threads, achieving better performance.
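To make the imbalance concrete, the following sketch compares per-thread workloads under the two allocation schemes on a small skewed degree sequence. The degree values, the round-robin node assignment, and the helper names are illustrative assumptions; the actual allocation logic inside RealGraphGPU is not described at this level of detail in the text.

```python
from bisect import bisect_right
from itertools import accumulate


def node_based_loads(degrees, num_threads):
    """Assign nodes round-robin; a thread's load is the sum of its nodes' degrees."""
    loads = [0] * num_threads
    for i, d in enumerate(degrees):
        loads[i % num_threads] += d
    return loads


def edge_based_loads(degrees, num_threads):
    """Split the flattened edge list into (almost) equal chunks per thread."""
    base, extra = divmod(sum(degrees), num_threads)
    return [base + (1 if t < extra else 0) for t in range(num_threads)]


def owner_of_edge(prefix, edge_index):
    """Map a global edge index back to its source node via prefix sums of degrees."""
    return bisect_right(prefix, edge_index)


degrees = [1000, 2, 3, 1, 4, 2, 1, 3]   # skewed, power-law-like out-degrees
prefix = list(accumulate(degrees))      # cumulative edge counts per node

print(max(node_based_loads(degrees, 4)))  # slowest thread under node-based allocation
print(max(edge_based_loads(degrees, 4)))  # slowest thread under edge-based allocation
```

Since an iteration finishes only when its slowest thread does, the maximum per-thread load is the quantity to compare. The prefix-sum lookup shows one common way for a thread to recover which node an edge belongs to once node boundaries no longer align with thread boundaries.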
Buffer pre-checking: RealGraphGPU avoids redundant block transfers between the CPU and GPU buffers. In each iteration, it checks the CPU/GPU buffer tables before transferring a block. If the tables indicate that the block already exists in the GPU buffer, no transfer occurs; if the block does not exist in the GPU buffer but exists in the CPU buffer, it is copied from the CPU buffer to the GPU buffer without being loaded from disk. This reuse of data blocks already in the buffers eliminates the overhead of unnecessary block transfers and leads to a significant improvement in overall performance.
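The lookup order described above can be sketched as a two-level cache. The class name, the dictionary-based buffer tables, and the `read_from_disk` callback are hypothetical; buffer capacity and the replacement policy are deliberately left out because the text does not specify them.

```python
class TwoLevelBuffer:
    """Toy model of the CPU/GPU buffer tables consulted before a block transfer."""

    def __init__(self, read_from_disk):
        self.read_from_disk = read_from_disk  # hypothetical storage accessor
        self.cpu_table = {}                   # block ID -> block in main memory
        self.gpu_table = {}                   # block ID -> block in device memory
        self.stats = {"gpu_hit": 0, "cpu_hit": 0, "disk_read": 0}

    def fetch(self, block_id):
        # Case 1: the block is already in the GPU buffer, so nothing is transferred.
        if block_id in self.gpu_table:
            self.stats["gpu_hit"] += 1
            return self.gpu_table[block_id]
        # Case 2: the block is in the CPU buffer, so only the CPU -> GPU copy is needed.
        if block_id in self.cpu_table:
            self.stats["cpu_hit"] += 1
        else:
            # Case 3: the block is in neither buffer and must be read from disk.
            self.cpu_table[block_id] = self.read_from_disk(block_id)
            self.stats["disk_read"] += 1
        self.gpu_table[block_id] = self.cpu_table[block_id]
        return self.gpu_table[block_id]

    def evict_from_gpu(self, block_id):
        """Drop a block from the GPU buffer only (it may remain in the CPU buffer)."""
        self.gpu_table.pop(block_id, None)


buf = TwoLevelBuffer(lambda block_id: f"block-{block_id}")
buf.fetch(3)           # read from disk, then copied to the GPU buffer
buf.fetch(3)           # found in the GPU buffer
buf.evict_from_gpu(3)
buf.fetch(3)           # found in the CPU buffer, no disk access
print(buf.stats)
```

In an iterative algorithm such as PageRank, where the same blocks are touched in every iteration, most fetches after the first iteration would fall into the first two cases as long as the blocks fit in the buffers.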

Performance
Table 1 compares the performance of RealGraphGPU with that of other single-machine graph engines on the Yahoo dataset, which has 1,413M nodes and 6.6B edges (top half), and with that of distributed graph engines on the Twitter dataset, which has 61M nodes and 1.4B edges (bottom half). "O.O.M" indicates an out-of-memory case; "O.O.T" indicates that the experiment did not finish within 24 hours; "-" indicates that the result is absent from the original work. We observe that the performance of RealGraphGPU is an order of magnitude better than that of state-of-the-art single-machine graph engines such as TurboGraph [13] and GridGraph [15], and that, unlike FlashGraph [3], it does not run out of memory. Also, RealGraphGPU, equipped with far fewer computing resources, consistently and significantly outperforms distributed graph engines such as PowerGraph [5] and GraphLab [18].
Graph analysis services using RealGraphGPUWeb are performed in three steps: graph upload, algorithm execution, and result visualization & download (refer to Figure 1).
Graph upload: In this step, a user uploads a target graph to a RealGraphGPUWeb server. RealGraphGPUWeb provides the user with a sample input graph, which helps the user upload her own graph in the correct format. Then, it converts the uploaded graph into its internal data structure and stores the converted graph in the storage of RealGraphGPU.
Algorithm execution: Figures 1-(a)∼(d) show the substeps of the algorithm execution step. The server provides a list of graphs already stored in the storage of RealGraphGPUWeb, among which the user selects her target for analysis (Figure 1-(a)). She can see the characteristics of the graph, such as the numbers of nodes and edges. The server shows the graph algorithms already implemented in RealGraphGPUWeb, such as outdegree/indegree distribution, breadth-first search (BFS) [12], PageRank [7], weakly connected component (WCC) [6], betweenness centrality (BC) [20], hypertext induced topic selection (HITS) [19], random walk with restart (RWR) [4], and community detection (CD) [14], among which the user selects one for her analysis (Figure 1-(b)). Each algorithm has its own parameters: for instance, BFS requires the starting node, while PageRank requires the number of iterations. The user sets the parameters to the values she wants (Figure 1-(c)). Then, she requests the execution of the algorithm on the graph (Figure 1-(d)). After the execution is completed, the server stores the result and the elapsed time of the algorithm execution in a binary file and also prints them in text format.

Result visualization & download: The server provides visualization and download functions in this step. It visualizes the execution result stored in a binary file as a table, a chart, or a graph by using libraries such as Matplotlib, Echarts, and Shingle.js. The server allows the user to download the image of a visualized result as well as the whole binary file that contains the execution result. For compatibility, it also provides an option to convert the binary file to a CSV file or an ARFF file, because some tools such as Matlab, R, and Weka require those formats.
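As an illustration of the conversion option, the sketch below turns a binary result file into CSV and ARFF text. The binary layout used here (little-endian records of a 32-bit node ID followed by a 64-bit score) is purely an assumption, since the actual file format of RealGraphGPUWeb is not given in the text; the column names and relation name are likewise hypothetical.

```python
import csv
import io
import struct

# Hypothetical record layout: uint32 node ID followed by float64 score.
RECORD = struct.Struct("<Id")


def read_records(blob):
    """Decode a byte string of fixed-size records into (node_id, score) pairs."""
    return [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]


def to_csv(records):
    """Render the records as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node_id", "score"])
    writer.writerows(records)
    return out.getvalue()


def to_arff(records, relation="pagerank"):
    """Render the records as ARFF text with two numeric attributes."""
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE node_id NUMERIC",
        "@ATTRIBUTE score NUMERIC",
        "",
        "@DATA",
    ]
    lines += [f"{node},{score}" for node, score in records]
    return "\n".join(lines) + "\n"


blob = RECORD.pack(0, 0.5) + RECORD.pack(7, 0.25)  # two sample records
records = read_records(blob)
print(to_csv(records))
print(to_arff(records))
```

Both targets are plain-text formats, so the conversion reduces to decoding each record once and writing it with the header each tool expects.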
Figure 2 shows some examples of visualizing the results of different graph algorithms. For example, in the case of HITS, which produces two kinds of scores (i.e., hub score and authority score), the distribution of each set of scores is shown in a chart (Figure 2-(a)). In the case of PageRank, the graph is drawn so that the size of a node is proportional to its PageRank score and the nodes with the top-N PageRank scores are highlighted in different colors (Figure 2-(b)). In the case of BFS, the graph is drawn with the BFS traversal paths of each iteration highlighted (Figure 2-(c)). In the case of CD, the graph is drawn so that nodes belonging to the same community have the same color and nodes belonging to different communities have different colors (Figure 2-(d)).
We have two main scenarios in our demo. In the first scenario, we will show how a naive user performs graph analysis just by clicking some options through the web-based GUI of RealGraphGPUWeb. A user will upload graphs, analyze them, and visualize/download the results, simply following the three steps provided by our platform. In the second scenario, we will let the user experience the superior performance of RealGraphGPUWeb; specifically, during the graph analysis, she will observe the performance improvement obtained from the optimizations employed in RealGraphGPU just by turning the optimization options on/off. Furthermore, we provide the original (i.e., CPU-based) RealGraph with its optimization options of hierarchical indicator, block-wise workload allocation, and efficient data layout [16]. Users can thus compare the performance of the two versions of RealGraph and also experience the performance improvement obtained by a variety of options. Figure 3 shows the GUI implemented in our RealGraphGPUWeb.

Figure 3: Graphical User Interface of RealGraphGPUWeb.

CONCLUSIONS
In this paper, we introduced RealGraphGPUWeb, a web-based graph analysis platform with the following features.
• Easy to use, thanks to user-friendly GUI
• Efficient, thanks to the great power of RealGraphGPU
• Convenient, thanks to various graph algorithms and data formats supported
• Easily accessible, thanks to web-based support
• Comfortable, thanks to no coding requirements
We believe that our RealGraphGPUWeb could be a good platform not only for CS users but also for non-CS users who want to analyze big graphs to discover interesting patterns for their applications.

Table 1: Performance comparison of RealGraphGPU with other graph engines on Yahoo and Twitter data

manner. Each block includes multiple objects, each of which stores a node and its related edges.
• Buffer layer: This layer maintains the CPU/GPU buffers for storing blocks loaded in main memory/device memory, and the CPU/GPU buffer tables for indexing the loaded blocks.
• Object layer: This layer manages the objects contained in blocks. In the object index table, the i-th column represents the indices of the objects in the i-th block, where an object is indexed with its corresponding node ID.
• CPU thread layer: This layer manages CPU threads and attribute/indicator vectors in main memory. CPU threads oversee the data transfer between main memory and device memory. The attribute vectors store the analysis result (e.g., PageRank scores) during the current/next iteration, and the indicator vectors indicate which nodes are processed in the current/next iteration.
• GPU thread layer: This layer manages GPU threads and attribute/indicator vectors in device memory. GPU threads perform the actual operations of a graph algorithm (e.g., PageRank). The attribute/indicator vectors play the same roles as in the CPU thread layer. After the operations are completed, the results are transferred back to the CPU thread layer.