Periodic hierarchical load balancing for large supercomputers

Published: 01 November 2011

Abstract

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and the longer running times of traditional distributed schemes. It does so by creating multiple levels of load balancing domains that form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data for the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
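
To make the idea of a tree of load balancing domains concrete, the following Python sketch groups processors into fixed-size domains and then groups those domains again until a single root remains. It is only an illustration under stated assumptions: the names `build_tree` and `max_fanout`, the fixed group size, and the contiguous grouping are hypothetical and are not taken from the paper or from the Charm++ API.

```python
from typing import List

# One level of the tree: a list of domains, each a list of member ids.
Level = List[List[int]]


def build_tree(num_procs: int, group_size: int) -> List[Level]:
    """Group processors into domains, then group domains, up to one root.

    Hypothetical helper: level 0 holds processor ids; each higher level
    holds indices of the domains on the level directly below it.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")

    levels: List[Level] = []
    members = list(range(num_procs))
    while True:
        # Split the current members into contiguous chunks of group_size.
        domains = [members[i:i + group_size]
                   for i in range(0, len(members), group_size)]
        levels.append(domains)
        if len(domains) == 1:  # a single root domain: the tree is complete
            return levels
        # The domains just built become the members of the next level up.
        members = list(range(len(domains)))


def max_fanout(levels: List[Level]) -> int:
    """Largest number of members that any single domain has to handle."""
    return max(len(domain) for level in levels for domain in level)


if __name__ == "__main__":
    tree = build_tree(num_procs=65536, group_size=64)
    print(len(tree), max_fanout(tree))  # prints "3 64"
```

With an assumed group size of 64, the 65,536-core case yields three levels, and no domain in this sketch has more than 64 members, whereas a single centralized balancer would have to handle all 65,536 processors at once.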
