research-article

Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments

Published: 03 June 2015

Abstract

Matching program parallelism to platform parallelism through thread selection is difficult when the environment and the available resources change dynamically. Existing compiler and runtime approaches are typically based on a one-size-fits-all policy, with little ability to evaluate or adapt that policy when new external workloads or hardware resources are encountered. This paper focuses on selecting the best number of threads for a parallel application in dynamic environments. It develops a new scheme based on a mixture of experts approach. The scheme learns online which of a number of existing policies, or experts, is best suited to a particular environment, without having to try out each policy. It does this by using a novel environment predictor as a proxy for the quality of an expert thread selection policy. Additional expert policies can easily be added and are selected only when appropriate. We evaluate our scheme in environments with varying external workloads and hardware resources. We then consider the cases where workloads use affinity scheduling or are themselves adaptive, and show that our approach outperforms existing schemes in all cases and, surprisingly, improves workload performance. On average, our approach achieves a 1.66x improvement over the OpenMP default, 1.34x over an online scheme, 1.25x over an offline policy, and 1.2x over a state-of-the-art analytic model. Determining the right number and type of experts is an open problem; our initial analysis shows that adding more experts improves accuracy and performance.
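The selection idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the experts, the environment features, or the error measure, so everything below is an assumption made for illustration. Each hypothetical expert pairs a thread-count policy with its own guess of the next environment, and the selector trusts the expert whose previous guess was closest to the environment actually observed, so no expert's thread choice has to be tried out to be ranked. The names `Expert`, `MixtureSelector`, `env_error`, and `make_demo_experts`, and the feature vector `(load, free_cores)`, are all invented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical environment features: (external load, free cores).
Env = Sequence[float]


@dataclass
class Expert:
    """A thread selection policy paired with its own environment predictor."""
    name: str
    predict_threads: Callable[[Env], int]   # policy: environment -> thread count
    predict_env: Callable[[Env], Env]       # guess of the next environment
    last_env_guess: Optional[Env] = None    # guess made at the previous step


def env_error(guess: Env, actual: Env) -> float:
    """Squared distance between a predicted and an observed environment."""
    return sum((g - a) ** 2 for g, a in zip(guess, actual))


class MixtureSelector:
    """Picks the expert whose last environment guess best matched reality."""

    def __init__(self, experts: List[Expert]) -> None:
        self.experts = list(experts)

    def choose(self, env: Env) -> Tuple[str, int]:
        scored = [e for e in self.experts if e.last_env_guess is not None]
        if scored:
            # Environment prediction error acts as a proxy for policy quality.
            best = min(scored, key=lambda e: env_error(e.last_env_guess, env))
        else:
            # No history yet: fall back to the first expert (an assumption).
            best = self.experts[0]
        # Every expert records its guess for the next step.
        for e in self.experts:
            e.last_env_guess = e.predict_env(env)
        return best.name, best.predict_threads(env)


def make_demo_experts() -> List[Expert]:
    """Two toy experts: one assumes a steady system, one assumes rising load."""
    steady = Expert(
        name="steady",
        predict_threads=lambda env: max(1, int(env[1])),
        predict_env=lambda env: (env[0], env[1]),
    )
    ramp = Expert(
        name="ramp",
        predict_threads=lambda env: max(1, int(env[1]) // 2),
        predict_env=lambda env: (env[0] + 1.0, env[1] - 1.0),
    )
    return [steady, ramp]


if __name__ == "__main__":
    selector = MixtureSelector(make_demo_experts())
    for observed in [(1.0, 8.0), (1.0, 8.0), (2.0, 7.0)]:
        print(observed, selector.choose(observed))
```

In this toy setup a further expert is added by appending another `Expert` to the list; it is only chosen when its environment guesses turn out to be the most accurate, which mirrors the abstract's remark that additional experts are selected only when appropriate.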



Published in

ACM SIGPLAN Notices, Volume 50, Issue 6 (PLDI '15)
June 2015, 630 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2813885
Editor: Andy Gill

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015, 630 pages
ISBN: 9781450334686
DOI: 10.1145/2737924

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
