ABSTRACT
Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing, and it has led to the development of several graph-parallel abstractions, including Pregel and GraphLab. However, the natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions and limit performance and scalability.
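To make the skew concrete, the following minimal sketch computes how much of the edge mass sits on high-degree vertices when the degree probability is proportional to d^(-alpha). The exponent alpha = 2.0, the maximum degree, the threshold of 100, and the function names are illustrative assumptions, not values or code taken from the paper.

```python
def power_law_pmf(alpha, max_degree):
    """Return P(d) for d = 1..max_degree, with P(d) proportional to d ** -alpha."""
    weights = [d ** -alpha for d in range(1, max_degree + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_high_degree(alpha, max_degree, threshold):
    """Return (fraction of vertices, fraction of edge endpoints) with degree >= threshold."""
    pmf = power_law_pmf(alpha, max_degree)
    degrees = range(1, max_degree + 1)
    # Fraction of vertices whose degree is at least the threshold.
    vertex_frac = sum(p for d, p in zip(degrees, pmf) if d >= threshold)
    # Each vertex of degree d contributes d edge endpoints.
    endpoint_total = sum(d * p for d, p in zip(degrees, pmf))
    endpoint_high = sum(d * p for d, p in zip(degrees, pmf) if d >= threshold)
    return vertex_frac, endpoint_high / endpoint_total


# Assumed, illustrative parameters (not from the paper).
vertex_frac, endpoint_frac = share_of_high_degree(
    alpha=2.0, max_degree=100_000, threshold=100
)
print(f"{vertex_frac:.2%} of vertices hold {endpoint_frac:.2%} of edge endpoints")
```

Under these assumed parameters, fewer than 1% of the vertices account for more than half of all edge endpoints, which is the kind of imbalance that makes it hard to split work evenly by vertex.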
In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction, which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction, we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits, with empirical evaluations on large-scale real-world problems demonstrating order-of-magnitude gains.
Index Terms
PowerGraph: distributed graph-parallel computation on natural graphs