Abstract
In this paper we propose a novel approach that automates task partitioning in heterogeneous systems. Our framework is based on the Insieme Compiler and Runtime infrastructure. The compiler translates a single-device OpenCL program into a multi-device OpenCL program. The runtime system then performs dynamic task partitioning based on an offline-generated prediction model. To derive the prediction model, we use a machine learning approach that incorporates static program features as well as dynamic, input-sensitive features. Our approach has been evaluated on a suite of 23 programs and achieves performance improvements compared to executing the benchmarks on a single CPU only or on a single GPU only.
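As a rough illustration of what partitioning a task across devices involves, the sketch below splits a one-dimensional, OpenCL-style index range into contiguous per-device chunks according to a vector of predicted shares. It is not the Insieme runtime's implementation: the function name `split_ndrange`, the share vector standing in for a prediction model's output, and the rounding to whole work-groups are all assumptions made for illustration.

```python
def split_ndrange(global_size, local_size, shares):
    """Split a 1-D index range into contiguous (offset, size) chunks.

    global_size: total number of work-items (multiple of local_size).
    local_size:  work-group size; every chunk is a multiple of it.
    shares:      hypothetical per-device fractions, e.g. the output of a
                 prediction model; non-negative and summing to 1.0.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    if any(s < 0 for s in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be non-negative and sum to 1.0")

    groups = global_size // local_size  # partition whole work-groups only
    chunks = []
    assigned = 0      # work-groups handed out so far
    cumulative = 0.0  # running sum of shares
    for i, share in enumerate(shares):
        cumulative += share
        # Round the cumulative boundary so the chunks always add up to the
        # full range; the last device takes whatever remains.
        end = groups if i == len(shares) - 1 else round(cumulative * groups)
        chunks.append((assigned * local_size, (end - assigned) * local_size))
        assigned = end
    return chunks


if __name__ == "__main__":
    # Assumed example: 30% of the work to one device, 70% to another.
    print(split_ndrange(1024, 64, [0.3, 0.7]))
```

Each `(offset, size)` pair could then be used as the global offset and global size of one kernel launch per device. A device with a share of zero receives an empty chunk and can simply be skipped.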
Automatic problem size sensitive task partitioning on heterogeneous parallel systems
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming