Abstract
Load balancing for clusters has been investigated extensively, mainly with a focus on the effective usage of global CPU and memory resources. However, previous CPU- or memory-centric load-balancing schemes suffer a significant performance drop under I/O-intensive workloads because of I/O load imbalance. To solve this problem, we propose two simple yet effective I/O-aware load-balancing schemes for two types of clusters: (1) homogeneous clusters, in which all nodes are identical, and (2) heterogeneous clusters, which comprise a variety of nodes with different performance characteristics in computing power, memory capacity, and disk speed. In addition to assigning I/O-intensive sequential and parallel jobs to nodes with light I/O loads, the proposed schemes judiciously take into account both CPU and memory load sharing in the system. Therefore, our schemes are able to maintain high performance for a wide spectrum of workloads. We develop analytic models to study mean slowdowns, task arrival, and transfer processes at the system level. Using a set of real I/O-intensive parallel applications and synthetic parallel jobs with various I/O characteristics, we show that our proposed schemes consistently improve performance over existing non-I/O-aware load-balancing schemes, including CPU- and memory-aware schemes and a PBS-like batch scheduler for parallel and sequential jobs, across a diverse set of workload conditions. Importantly, this improvement becomes much more pronounced when the applications are I/O-intensive. For example, the proposed approaches deliver performance improvements of 23.6% to 88.0% for I/O-intensive applications such as LU decomposition, sparse Cholesky, Titan, parallel text searching, and data mining. When I/O load is low or well balanced, the proposed schemes maintain the same level of performance as the existing non-I/O-aware schemes.
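The core placement idea described above is to send I/O-intensive jobs to nodes with light I/O load while still respecting CPU and memory load. It can be illustrated with a small sketch. This is a minimal Python sketch under stated assumptions, not the authors' algorithm. The load metrics, the linear weighting by a job's I/O fraction, the memory check, and all names (`Node`, `Job`, `node_cost`, `choose_node`, `mean_slowdown`) are hypothetical. Slowdown is taken in its usual sense: a job's response time on the shared cluster divided by its execution time on an idle, dedicated node.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INF = float("inf")


@dataclass
class Node:
    name: str
    cpu_load: float     # hypothetical: runnable processes per CPU
    mem_free_mb: float  # hypothetical: memory currently available to new jobs
    io_load: float      # hypothetical: pending disk requests scaled by disk speed


@dataclass
class Job:
    mem_demand_mb: float
    io_fraction: float  # hypothetical: share of run time spent on disk I/O, in [0, 1]


def node_cost(node: Node, job: Job) -> float:
    """Load seen by `job` on `node`; lower is better (illustrative weighting)."""
    # Memory check: a job whose working set does not fit would page,
    # which itself generates disk I/O, so such nodes are excluded.
    if job.mem_demand_mb > node.mem_free_mb:
        return INF
    # Weight CPU and I/O load by how I/O-intensive the job is.
    return (1.0 - job.io_fraction) * node.cpu_load + job.io_fraction * node.io_load


def choose_node(nodes: Sequence[Node], job: Job) -> Optional[Node]:
    """Return the lowest-cost node, or None if no node can hold the job."""
    best = min(nodes, key=lambda n: node_cost(n, job))
    return None if node_cost(best, job) == INF else best


def mean_slowdown(response: Sequence[float], dedicated: Sequence[float]) -> float:
    """Average of (response time / dedicated execution time) over all jobs."""
    if not response or len(response) != len(dedicated):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(r / d for r, d in zip(response, dedicated)) / len(response)


if __name__ == "__main__":
    nodes = [
        Node("a", cpu_load=0.5, mem_free_mb=2048, io_load=3.0),
        Node("b", cpu_load=2.0, mem_free_mb=2048, io_load=0.2),
        Node("c", cpu_load=0.1, mem_free_mb=256, io_load=0.0),
    ]
    print(choose_node(nodes, Job(mem_demand_mb=512, io_fraction=0.9)).name)  # b
    print(choose_node(nodes, Job(mem_demand_mb=512, io_fraction=0.1)).name)  # a
    print(mean_slowdown([10.0, 30.0], [5.0, 10.0]))  # 2.5
```

For a job that spends most of its time on disk I/O, the sketch prefers the node with the lightest I/O load even when that node's CPU is busier. As `io_fraction` approaches 0, the cost reduces to CPU load alone. This loosely mirrors the abstract's point that an I/O-aware scheme should match a CPU-aware one when I/O load is low.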
Index Terms
Dynamic load balancing for I/O-intensive applications on clusters