Abstract
Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes both the scalability challenges of centralized schemes and the longer running times of traditional distributed schemes. It does so by creating multiple levels of load balancing domains that form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We also discuss techniques for addressing the scalability challenges of load balancing at very large scale. We present performance data for the hierarchical load balancing method on a synthetic benchmark, using up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory). Finally, we demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
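The idea of load balancing domains arranged as a tree can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the Charm++ implementation: the names `balance`, `greedy_leaf`, and `fanout` are hypothetical, each processor is modeled as a list of measured object loads, and a simple heaviest-first greedy strategy is assumed inside each leaf domain. Each interior level looks only at the aggregate load of its child domains to decide which objects cross domain boundaries, so no single level needs the full per-object picture of the machine.

```python
from typing import List

# Hypothetical model (not the Charm++ API): procs[i] lists the measured loads
# of the migratable objects currently on processor i.
Procs = List[List[float]]


def range_load(procs: Procs, lo: int, hi: int) -> float:
    """Total load of processors lo..hi-1."""
    return sum(sum(p) for p in procs[lo:hi])


def greedy_leaf(procs: Procs, lo: int, hi: int) -> None:
    """Leaf domain (assumed strategy): reassign objects heaviest-first
    to the currently least-loaded processor in the domain."""
    objs = sorted((w for p in procs[lo:hi] for w in p), reverse=True)
    for i in range(lo, hi):
        procs[i] = []
    for w in objs:
        target = min(range(lo, hi), key=lambda i: sum(procs[i]))
        procs[target].append(w)


def balance(procs: Procs, lo: int, hi: int, fanout: int) -> None:
    """Balance processors [lo, hi) over a tree of domains with the given fanout."""
    n = hi - lo
    if n <= fanout:
        greedy_leaf(procs, lo, hi)
        return
    chunk = -(-n // fanout)  # ceiling division: processors per child domain
    kids = [(c, min(c + chunk, hi)) for c in range(lo, hi, chunk)]
    avg = range_load(procs, lo, hi) / n

    def surplus(k: int) -> float:
        a, b = kids[k]
        return range_load(procs, a, b) - avg * (b - a)

    # This level uses only per-child totals: shift objects from the most
    # overloaded child domain to the most underloaded one without overshooting.
    while True:
        s = [surplus(k) for k in range(len(kids))]
        donor = max(range(len(kids)), key=s.__getitem__)
        recv = min(range(len(kids)), key=s.__getitem__)
        limit = min(s[donor], -s[recv])
        candidates = [(w, i, j)
                      for i in range(*kids[donor])
                      for j, w in enumerate(procs[i]) if 0 < w <= limit]
        if not candidates:
            break  # no move reduces the imbalance between child domains
        _, i, j = max(candidates)  # largest object that fits
        procs[kids[recv][0]].append(procs[i].pop(j))

    # Each child domain then balances internally (could proceed in parallel).
    for a, b in kids:
        balance(procs, a, b, fanout)


if __name__ == "__main__":
    procs = [[1.0] * 32] + [[] for _ in range(15)]  # all work on processor 0
    balance(procs, 0, len(procs), fanout=4)
    print(max(sum(p) for p in procs))  # prints 2.0
```

With 16 processors and a fanout of 4, the root moves whole objects between four domains of four processors using only their totals, and each domain then spreads its share locally. A real framework would also weigh migration cost and communication, which this sketch ignores.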
Periodic hierarchical load balancing for large supercomputers