DOI: 10.5555/2387880.2387883

PowerGraph: distributed graph-parallel computation on natural graphs

Published: 08 October 2012

ABSTRACT

Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability.
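To make the notion of a "highly skewed" degree distribution concrete, the following is a minimal sketch, not taken from the paper. It builds a hypothetical Zipf-like degree sequence (the degree of the vertex with rank r is roughly max_degree / r^exponent) and measures what share of all edge endpoints belongs to the highest-degree 1% of vertices. The function names, the graph size, and the exponent are illustrative assumptions, and the rank-based construction is only one simple way to obtain a heavy-tailed sequence.

```python
from statistics import median


def zipf_degrees(num_vertices, max_degree, exponent=1.0):
    """Hypothetical Zipf-like degree sequence: rank r gets ~ max_degree / r**exponent."""
    return [
        max(1, int(max_degree / rank ** exponent))
        for rank in range(1, num_vertices + 1)
    ]


def top_share(degrees, fraction):
    """Share of all edge endpoints held by the top `fraction` of vertices by degree."""
    ordered = sorted(degrees, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    # Sizes and exponent are assumptions chosen for illustration only.
    skewed = zipf_degrees(num_vertices=10_000, max_degree=10_000)
    uniform = [5] * 10_000  # every vertex has the same degree

    print("max degree (skewed):   ", max(skewed))
    print("median degree (skewed):", median(skewed))
    print("top 1% share (skewed): ", round(top_share(skewed, 0.01), 3))
    print("top 1% share (uniform):", round(top_share(uniform, 0.01), 3))
```

In the uniform sequence the top 1% of vertices hold exactly 1% of the edge endpoints, whereas in the skewed sequence a handful of very high-degree vertices account for a large share. A system that assigns work per vertex would therefore see very uneven load on such a graph.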

In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction, which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction, we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits, with empirical evaluations on large-scale real-world problems demonstrating order-of-magnitude gains.


Published in

OSDI'12: Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
October 2012
362 pages
ISBN: 9781931971966
Publisher: USENIX Association, United States
