DOI: 10.1145/1693453.1693475
research-article

Load balancing on speed

Published: 09 January 2010

ABSTRACT

To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications, whose threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on "faster" and "slower" cores. We provide a user-level implementation of speed balancing on UMA and NUMA multi-socket architectures running Linux and discuss its behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing, compared to the native Linux load balancing, improves performance and provides good performance isolation in all cases considered. Speed balancing also provides comparable or better performance than DWRR, a fair multiprocessor scheduling implementation inside the Linux kernel. Furthermore, parallel application performance is often determined by the implementation of synchronization operations, and speed balancing alleviates the need for tuning such primitives.
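The core idea in the abstract, equalizing each thread's accumulated time on "faster" and "slower" cores rather than equalizing run-queue lengths, can be illustrated with a toy balancing step. This is a sketch only, not the paper's user-level implementation: the function name, the fast/slow-time bookkeeping, and the pairwise-swap policy are all illustrative assumptions.

```python
# Toy sketch of the speed-balancing idea: rather than moving threads to
# equalize run-queue lengths, periodically swap thread placements so that
# each thread's cumulative share of time on "fast" cores evens out.
# All names and the swap policy here are assumptions for illustration.

def speed_balance(threads, fast_time, slow_time):
    """Return a list of (favored, starved) thread pairs to swap.

    fast_time[t] / slow_time[t]: cumulative time thread t has run on
    faster / slower cores (e.g. lightly vs. heavily loaded cores).
    """
    # Rank threads by the fraction of their execution spent on fast cores.
    by_fast_share = sorted(
        threads,
        key=lambda t: fast_time[t] / ((fast_time[t] + slow_time[t]) or 1),
    )
    swaps = []
    i, j = 0, len(by_fast_share) - 1
    while i < j:
        starved, favored = by_fast_share[i], by_fast_share[j]
        # Swap the most-favored thread with the most-starved one, so that
        # over repeated balancing rounds all threads make equal progress.
        swaps.append((favored, starved))
        i, j = i + 1, j - 1
    return swaps
```

In a real user-level balancer the swap would be enacted by re-pinning threads (e.g. via CPU affinity calls), and the fast/slow times would be sampled periodically rather than known exactly.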


Published in

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2010, 372 pages. ISBN: 9781605588773. DOI: 10.1145/1693453.

Also in ACM SIGPLAN Notices, Volume 45, Issue 5 (PPoPP '10), May 2010, 346 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1837853.

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 230 of 1,014 submissions, 23%
