Abstract
Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study on their execution properties and thus programmers have to manually choose a mode, either requiring a deep understanding of underlying graph engines, or suffering from suboptimal performance. This paper makes the first comprehensive characterization on the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on-the-fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches combined with a set of heuristics to accurately predicting future performance in the two modes. A prototype called PowerSwitch has been built based on PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switch between two modes.
- The laboratory for web algorithmics. http://law.dsi.unimi.it/datasets.php.Google Scholar
- A PACHE. The Apache Giraph Project. http://giraph.apache.org/.Google Scholar
- A PACHE. The Apache Hama Project. http://hama.apache.org/.Google Scholar
- B EAMER, S., A SANOVI ´ C, K., AND P ATTERSON, D. Directionoptimizing breadth-first search. Scientific Programming 21, 3 (2013), 137–148. Google Scholar
Digital Library
- B ERTSEKAS, D. P., G UERRIERO, F., AND M USMANNO, R. Parallel asynchronous label-correcting methods for shortest paths. Journal of Optimization Theory and Applications 88, 2 (1996), 297–320. Google Scholar
Digital Library
- B ERTSEKAS, D. P., AND T SITSIKLIS, J. N. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., 1989. Google Scholar
Digital Library
- B RIN, S., AND P AGE, L. The anatomy of a large-scale hypertextual web search engine. In Proc. WWW (1998). Google Scholar
Digital Library
- C HANDY, K., AND L AMPORT, L. Distributed snapshots: determining global states of distributed systems. ACM TOCS 3, 1 (1985), 63–75. Google Scholar
Digital Library
- C HEN, R., D ING, X., W ANG, P., C HEN, H., Z ANG, B., AND G UAN, H. Computation and communication efficient graph processing with distributed immutable view. In Proc. HPDC (2014). Google Scholar
Digital Library
- C HEN, R., S HI, J., C HEN, Y., G UAN, H., AND C HEN, H. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. Tech. Rep. 2013-11-001, Shanghai Jiao Tong University, 2013.Google Scholar
- C HEN, R., S HI, J., Z ANG, B., AND G UAN, H. Bipartite-oriented distributed graph partitioning for big learning. In Proc. APSys (2014). Google Scholar
Digital Library
- C HENG, R., H ONG, J., K YROLA, A., M IAO, Y., W ENG, X., W U, M., Y ANG, F., Z HOU, L., Z HAO, F., AND C HEN, E. Kineograph: taking the pulse of a fast-changing and connected world. In Proc. EuroSys (2012). Google Scholar
Digital Library
- C HIERICHETTI, F., K UMAR, R., L ATTANZI, S., M ITZENMACHER, M., P ANCONESI, A., AND R AGHAVAN, P. On compressing social networks. In Proc. SIGKDD (2009). Google Scholar
Digital Library
- D EAN, J., AND G HEMAWAT, S. Mapreduce: simplified data processing on large clusters. CACM 51, 1 (2008), 107–113. Google Scholar
Digital Library
- E FRON, B., H ASTIE, T., J OHNSTONE, I., AND T IBSHIRANI, R. Least angle regression. The Annals of statistics 32, 2 (2004), 407–499.Google Scholar
- E LIDAN, G., M C G RAW, I., AND K OLLER, D. Residual belief propagation: Informed scheduling for asynchronous message passing. In Proc. UAI (2006).Google Scholar
- G ONZALEZ, J., L OW, Y., G RETTON, A., AND G UESTRIN, C. Parallel gibbs sampling: From colored fields to thin junction trees. In Proc. ICAIS (2011).Google Scholar
- G ONZALEZ, J., L OW, Y., G U, H., B ICKSON, D., AND G UESTRIN, C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. OSDI (2012). Google Scholar
Digital Library
- G ONZALEZ, J. E., L OW, Y., G UESTRIN, C., AND O’H ALLARON, D. Distributed parallel inference on large factor graphs. In Proc. UAI (2009). Google Scholar
Digital Library
- G ONZALEZ, J. E., X IN, R. S., D AVE, A., C RANKSHAW, D., F RANKLIN, M. J., AND S TOICA, I. Graphx: Graph processing in a distributed dataflow framework. In Proc. OSDI (2014). Google Scholar
Digital Library
- H AN, W., M IAO, Y., L I, K., W U, M., Y ANG, F., Z HOU, L., P RAB - HAKARAN, V., C HEN, W., AND C HEN, E. Chronos: a graph engine for temporal graph analysis. In Proc. EuroSys (2014). Google Scholar
Digital Library
- H ASELGROVE, H. Wikipedia page-to-page link database. http://haselgrove.id.au/wikipedia.htm, 2010.Google Scholar
- H ERODOTOU, H., AND B ABU, S. Profiling, what-if analysis, and cost-based optimization of mapreduce programs. Proceedings of the VLDB Endowment (2011).Google Scholar
- H ERODOTOU, H., D ONG, F., AND B ABU, S. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proc. SoCC (2011). Google Scholar
Digital Library
- J AIN, N., L IAO, G., AND W ILLKE, T. L. Graphbuilder: scalable graph etl framework. In Proc. GDM (2013). Google Scholar
Digital Library
- K WAK, H., L EE, C., P ARK, H., AND M OON, S. What is twitter, a social network or a news media? In Proc. WWW (2010). Google Scholar
Digital Library
- K YROLA, A., B LELLOCH, G., AND G UESTRIN, C. GraphChi: Largescale graph computation on just a PC. In Proc. OSDI (2012). Google Scholar
Digital Library
- L ESKOVEC, J., L ANG, K., D ASGUPTA, A., AND M AHONEY, M. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.Google Scholar
- L IU, Y., W U, B., W ANG, H., AND M A, P. Bpgm: A big graph mining tool. Tsinghua Science and Technology 19, 1 (2014), 33–38.Google Scholar
- L OW, Y., B ICKSON, D., G ONZALEZ, J., G UESTRIN, C., K YROLA, A., AND H ELLERSTEIN, J. M. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB (2012). Google Scholar
Digital Library
- M ALEWICZ, G., A USTERN, M. H., B IK, A. J., D EHNERT, J. C., H ORN, I., L EISER, N., AND C ZAJKOWSKI, G. Pregel: a system for large-scale graph processing. In Proc. SIGMOD (2010). Google Scholar
Digital Library
- M URRAY, D. G., M C S HERRY, F., I SAACS, R., I SARD, M., B ARHAM, P., AND A BADI, M. Naiad: a timely dataflow system. In Proc. SOSP (2013). Google Scholar
Digital Library
- N GUYEN, D., L ENHARTH, A., AND P INGALI, K. A lightweight infrastructure for graph analytics. In Proc. SOSP (2013). Google Scholar
Digital Library
- P ANDA, B., H ERBACH, J., B ASU, S., AND B AYARDO, R. PLANET: massively parallel learning of tree ensembles with MapReduce. Proc. VLDB (2009). Google Scholar
Digital Library
- P ROJECT, S. N. A. Stanford large network dataset collection. http://snap.stanford.edu/data/.Google Scholar
- R OY, A., M IHAILOVIC, I., AND Z WAENEPOEL, W. X-stream: Edgecentric graph processing using streaming partitions. In Proc. SOSP (2013). Google Scholar
Digital Library
- S ALIHOGLU, S., AND W IDOM, J. GPS: A Graph Processing System. http://infolab.stanford.edu/gps/, 2012. Google Scholar
Digital Library
- S HAO, B., W ANG, H., AND L I, Y. Trinity: A distributed graph engine on a memory cloud. In Proc. SIGMOD (2013). Google Scholar
Digital Library
- S HUN, J., AND B LELLOCH, G. E. Ligra: a lightweight graph processing framework for shared memory. In Proc. PPoPP (2013). Google Scholar
Digital Library
- S MOLA, A., AND N ARAYANAMURTHY, S. An architecture for parallel topic models. Proc. VLDB (2010). Google Scholar
Digital Library
- V ALIANT, L. G. A bridging model for parallel computation. CACM 33, 8 (1990), 103–111. Google Scholar
Digital Library
- W ANG, G., X IE, W., D EMERS, A. J., AND G EHRKE, J. Asynchronous large-scale graph processing made easy. In Proc. CIDR (2013).Google Scholar
- W ILSON, C., B OE, B., S ALA, A., P UTTASWAMY, K. P., AND Z HAO, B. Y. User interactions in social networks and their implications. In Proc. EuroSys (2009). Google Scholar
Digital Library
- Y E, J., C HOW, J., C HEN, J., AND Z HENG, Z. Stochastic gradient boosted distributed decision trees. In Proc. CIKM (2009). Google Scholar
Digital Library
- Y OU, Y., S ONG, S. L., AND K ERBYSON, D. An adaptive crossarchitecture combination method for graph traversal. In Proc. SC (2014). Google Scholar
Digital Library
- Z HANG, K., C HEN, R., AND C HEN, H. Numa-aware graphstructured analytics. In Proc. PPoPP (2015). Google Scholar
Digital Library
- Z HAO, X., C HANG, A., S ARMA, A. D., Z HENG, H., AND Z HAO, B. Y. On the embeddability of random walk distances. Proc. VLDB (2013). Google Scholar
Digital Library
- Z HONG, J., AND H E, B. Medusa: Simplified graph processing on gpus. IEEE TPDS (2013). Google Scholar
Digital Library
- Z HONG, J., AND H E, B. Parallel graph processing on graphics processors made easy. Proc. VLDB (2013). Google Scholar
Digital Library
Index Terms
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
Recommendations
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingLarge-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different ...
(Sync|Async)+ MPI search engines
PVM/MPI'07: Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing InterfaceWe propose a parallel MPI search engine that is capable of automatically switching between asynchronous message passing and bulk-synchronous message passing modes of operation. When the observed query traffic is small or moderate the standard multiple ...
Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation
PPoPP '18Replicas 1 of a vertex play an important role in existing distributed graph processing systems which make a single vertex to be parallel processed by multiple machines and access remote neighbors locally without any remote access. However, replicas of ...






Comments