research-article
Public Access

A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing

Published: 13 June 2017

Abstract

Graph edge partition models have recently become an appealing alternative to graph vertex partition models for distributed computing, owing to their flexibility in balancing loads and their performance in reducing communication cost [6, 16]. In this paper, we propose a simple yet effective graph edge partitioning algorithm. In practice, our algorithm delivers good partition quality (better than similar state-of-the-art edge partition approaches, at least on power-law graphs) while keeping partition overhead low. In theory, previous work [6] showed that an approximation guarantee of O(d_max √(log n log k)) applies to graphs with m = Ω(k²) edges, where k is the number of partitions. We further prove rigorously that this approximation guarantee holds for all graphs.
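The abstract does not spell out the algorithm itself, but the objective it describes — assign edges to k balanced parts while keeping the number of vertex copies (the communication-cost proxy) low — can be illustrated with a generic greedy heuristic. The sketch below is an illustration of the problem setting, not the paper's method; the load-slack parameter and function names are assumptions.

```python
from collections import defaultdict

def greedy_edge_partition(edges, k, balance_slack=1.1):
    """Assign each edge to one of k parts, preferring a part that already
    holds one of its endpoints, subject to a per-part load cap.
    A generic greedy heuristic, not the paper's algorithm."""
    cap = balance_slack * len(edges) / k
    load = [0] * k
    copies = defaultdict(set)   # vertex -> set of parts holding a copy
    assignment = {}
    for (u, v) in edges:
        # candidate parts: those already holding u or v, still under the cap
        cands = [p for p in (copies[u] | copies[v]) if load[p] < cap]
        if cands:
            p = min(cands, key=lambda q: load[q])
        else:
            p = min(range(k), key=lambda q: load[q])  # least-loaded fallback
        assignment[(u, v)] = p
        load[p] += 1
        copies[u].add(p)
        copies[v].add(p)
    return assignment, copies

def replication_factor(copies):
    """Average number of partitions each vertex appears in — the quantity
    that balanced edge partitioning tries to keep small."""
    return sum(len(s) for s in copies.values()) / len(copies)
```

Preferring parts that already hold an endpoint is what keeps vertex replication (and hence communication) down; the cap is what keeps the edge loads balanced.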

We also show how our edge partition model can be applied to parallel computing. Drawing our example from GPU program locality enhancement, we demonstrate that the graph edge partition model applies not only to distributed computing across many compute nodes, but also to parallel computing within a single node equipped with a many-core processor.
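The abstract does not detail the GPU mapping, but the locality idea can be sketched: once edges are partitioned, reordering them so each part is contiguous lets one thread block process one part, and the few vertices that part touches form a small working set. The following is a hypothetical illustration of that reordering step, not the paper's implementation.

```python
def partition_order(edges, assignment, k):
    """Reorder edges so those assigned to the same partition are contiguous.
    If each GPU thread block processes one contiguous group, the vertices it
    touches fit a small working set (shared memory / cache) -- the locality
    benefit of edge partitioning on a many-core processor."""
    groups = [[] for _ in range(k)]
    for e in edges:
        groups[assignment[e]].append(e)
    ordered, offsets = [], [0]
    for g in groups:
        ordered.extend(g)
        offsets.append(len(ordered))
    # edges ordered[offsets[p]:offsets[p+1]] are block p's work list
    return ordered, offsets
```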

References

  1. Fusion of parallel array operations. Pages 71--85, 2016.
  2. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509--512, 1999.
  3. N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, Dec. 2008.
  4. R. F. Boisvert, R. Pozo, K. A. Remington, R. F. Barrett, and J. Dongarra. Matrix Market: a web resource for test matrix collections. In Quality of Numerical Software, pages 125--137, 1996.
  5. U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '08, pages 101--113, New York, NY, USA, 2008. ACM.
  6. F. Bourse, M. Lelarge, and M. Vojnovic. Balanced graph edge partition. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 1456--1465, New York, NY, USA, 2014. ACM.
  7. U. V. Catalyürek and C. Aykanat. PaToH: a multilevel hypergraph partitioning tool, version 3.0. Bilkent University, Department of Computer Engineering, Ankara, 6533, 1999.
  8. S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IISWC '09, pages 44--54, Washington, DC, USA, 2009. IEEE Computer Society.
  9. A. Corrigan, F. Camelli, R. Löhner, and J. Wallin. Running unstructured grid CFD solvers on modern graphics hardware. In 19th AIAA Computational Fluid Dynamics Conference, number AIAA 2009-4001, June 2009.
  10. A. Corrigan, F. Camelli, R. Löhner, and J. Wallin. Running unstructured grid-based CFD solvers on modern graphics hardware. Int. J. Numer. Meth. Fluids, 66:221--229, 2011.
  11. S. Dalton and N. Bell. CUSP: A C++ templated sparse matrix library, 2014.
  12. T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Trans. Math. Softw., 38(1):1:1--1:25, Dec. 2011.
  13. C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, PLDI '99, pages 229--241, New York, NY, USA, 1999. ACM.
  14. C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1):108--134, 2004.
  15. G. Even. Fast approximate graph partitioning algorithms. SIAM J. Comput., 28(6):2187--2214, Aug. 1999.
  16. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, volume 12, page 2, 2012.
  17. B. Hendrickson and T. G. Kolda. Graph partitioning models for parallel computing. Parallel Comput., 26(12):1519--1534, Nov. 2000.
  18. B. Hendrickson and T. G. Kolda. Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing. SIAM Journal on Scientific Computing, 21(6):2048--2072, 2000.
  19. M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. 1952.
  20. G. Karypis and V. Kumar. METIS: unstructured graph partitioning and sparse matrix ordering system, version 2.0. 1995.
  21. G. Karypis and V. Kumar. hMETIS 1.5: A hypergraph partitioning package. Technical report, Department of Computer Science, University of Minnesota, 1998.
  22. R. Krauthgamer, J. S. Naor, and R. Schwartz. Partitioning graphs into balanced components. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '09, pages 942--949, Philadelphia, PA, USA, 2009. Society for Industrial and Applied Mathematics.
  23. Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716--727, 2012.
  24. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 135--146. ACM, 2010.
  25. NVIDIA. NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110. 2012.
  26. NVIDIA. cuSPARSE library. 2014.
  27. C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. FENNEL: Streaming graph partitioning for massive scale graphs. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM '14, pages 333--342, New York, NY, USA, 2014. ACM.
  28. B. Wu, Z. Zhao, E. Z. Zhang, Y. Jiang, and X. Shen. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 57--68, New York, NY, USA, 2013. ACM.
  29. E. Z. Zhang, Y. Jiang, Z. Guo, K. Tian, and X. Shen. On-the-fly elimination of dynamic irregularities for GPU computing. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, pages 369--380, New York, NY, USA, 2011. ACM.


Published in

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 1, Issue 1, June 2017, 712 pages
EISSN: 2476-1249
DOI: 10.1145/3107080
Copyright © 2017 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
