Abstract
High-performance computing systems increasingly combine multi-core processors with heterogeneous resources such as graphics-processing units and field-programmable gate arrays. However, the significant complexity of designing applications for such systems has often left performance potential untapped. Application designers targeting such systems currently must determine how to parallelize computation, create device-specialized implementations for each heterogeneous resource, and determine how to partition work among the resources. In this paper, we present the RACECAR heuristic, which automates the optimization of applications for multi-core heterogeneous systems by exploring implementation alternatives that include different algorithms, parallelization strategies, and work distributions. Experimental results show that RACECAR-specialized implementations achieve speedups of up to 117x, and 11x on average, over a single CPU thread when parallelizing computation across multiple cores, graphics-processing units, and field-programmable gate arrays.
RACECAR: a heuristic for automatic function specialization on multi-core heterogeneous systems
PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming