Abstract
Matching program parallelism to platform parallelism through thread selection is difficult when the environment and available resources change dynamically. Existing compiler or runtime approaches are typically based on a one-size-fits-all policy, with little ability to either evaluate or adapt the policy when encountering new external workloads or hardware resources. This paper focuses on selecting the best number of threads for a parallel application in dynamic environments. It develops a new scheme based on a mixture-of-experts approach, which learns online which of a number of existing policies, or experts, is best suited to a particular environment without having to try out each policy. It does this by using a novel environment predictor as a proxy for the quality of an expert thread-selection policy. Additional expert policies can easily be added and are selected only when appropriate. We evaluate our scheme in environments with varying external workloads and hardware resources. We then consider the case where workloads use affinity scheduling or are themselves adaptive, and show that our approach outperforms existing schemes in all cases and, surprisingly, improves workload performance. On average, we achieve speedups of 1.66x over the OpenMP default, 1.34x over an online scheme, 1.25x over an offline policy, and 1.2x over a state-of-the-art analytic model. Determining the right number and type of experts remains an open problem; our initial analysis shows that adding more experts improves both accuracy and performance.
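The core idea — scoring each expert by the accuracy of its environment predictor, then using the best-scoring expert's thread recommendation without trialling every policy — can be sketched as follows. This is a minimal illustration under assumed details, not the paper's implementation: the expert policies, the environment features (external load, free cores), and the exponential-decay scoring rule are all hypothetical stand-ins.

```python
# Minimal mixture-of-experts sketch for online thread-count selection.
# All policy names, environment features, and scoring rules here are
# illustrative stand-ins, not the paper's actual implementation.

class Expert:
    """Pairs a thread-selection policy with an environment predictor;
    the predictor's accuracy serves as a proxy for how well the policy
    suits the current environment."""

    def __init__(self, name, policy, predictor):
        self.name = name
        self.policy = policy          # env -> recommended thread count
        self.predictor = predictor    # env -> predicted next env
        self.score = 0.0              # running predictor accuracy
        self.last_prediction = None

    def update(self, observed_env, decay=0.9):
        # Reward experts whose last environment prediction matched reality.
        if self.last_prediction is not None:
            error = sum(abs(p - o) for p, o in
                        zip(self.last_prediction, observed_env))
            self.score = decay * self.score + (1 - decay) / (1 + error)
        self.last_prediction = self.predictor(observed_env)


def select_threads(experts, observed_env):
    """Use the currently most accurate expert's recommendation,
    without executing the other policies."""
    for e in experts:
        e.update(observed_env)
    best = max(experts, key=lambda e: e.score)
    return best.name, best.policy(observed_env)


# Example: two toy experts on a 16-core machine. The environment tuple
# is (external_load, free_cores); both features are made up for this demo.
conservative = Expert(
    "conservative",
    policy=lambda env: max(1, env[1] // 2),
    predictor=lambda env: env)  # assumes a stable environment
aggressive = Expert(
    "aggressive",
    policy=lambda env: env[1],
    predictor=lambda env: (env[0] + 1, max(0, env[1] - 1)))  # assumes rising load

name, threads = select_threads([conservative, aggressive], (2, 8))
print(name, threads)  # -> conservative 4 (ties break toward the first expert)
```

In this structure, adding a new expert only means appending it to the list; it is selected only once its environment predictor proves accurate, mirroring the abstract's claim that additional policies can be added and chosen when appropriate.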
Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments. In PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation.