DOI: 10.1145/2287076.2287103
research-article

Work stealing and persistence-based load balancers for iterative overdecomposed applications

Published: 18 June 2012

ABSTRACT

Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid systems (163,840 cores), and on up to 128,000 cores on OLCF Titan.
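As a rough illustration of the persistence idea in the abstract (task costs measured in one iteration guide the assignment for the next), here is a minimal sketch. It is not the paper's hierarchical algorithm: it assumes a centralized greedy heuristic (longest-processing-time first), and the names `rebalance`, `max_load`, `task_costs`, and `num_workers` are hypothetical.

```python
import heapq


def rebalance(task_costs, num_workers):
    """Assign tasks to workers using costs measured in the previous iteration.

    task_costs: dict mapping task id -> measured cost (hypothetical input).
    Returns a list of task-id lists, one per worker.
    Greedy longest-processing-time-first heuristic, chosen for illustration
    only; it is an assumption, not the paper's hierarchical algorithm.
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    # Place the most expensive tasks first.
    by_cost = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in by_cost:
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + cost, w))
    return assignment


def max_load(assignment, task_costs):
    """Return the load of the most heavily loaded worker."""
    return max(sum(task_costs[t] for t in tasks) for tasks in assignment)
```

In an iterative application, a routine like this would run between iterations, fed with the task timings just observed. Overdecomposition (many more tasks than workers) is what gives such a heuristic room to even out the load.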


Published in

HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
June 2012
308 pages
ISBN: 9781450308052
DOI: 10.1145/2287076
General Chair: Dick Epema
Program Chairs: Thilo Kielmann, Matei Ripeanu

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

HPDC '12 paper acceptance rate: 23 of 143 submissions, 16%. Overall acceptance rate: 166 of 966 submissions, 17%.
