Abstract

Multiprocessor System-on-Chips (MPSoCs) are now widely used in embedded devices. MPSoCs typically contain a range of specialised processors: alongside the CPU, there are microcontrollers, DSPs and other hardware accelerators. Programming these MPSoCs is difficult because the processors differ in instruction-set architecture (ISA) and have disjoint address spaces. In this paper we consider MPSoCs as a target for individual benchmarks. We examine how data-parallel programs can be optimally mapped to heterogeneous multicores for different criteria such as performance, power and energy. We investigate the partitioning of seven benchmarks taken from the DSPstone, UTDSP and Polybench suites. Based on design space exploration, we show that the best partition depends on the compiler optimization level, the program, the input size and, crucially, the optimization criterion. We develop a straightforward approach that attempts to select the best partitioning for a given program. On average, it achieves speedups of 2.2x and energy improvements of 1.45x on the OMAP 4430 platform.
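To make the idea of a time and energy design space exploration concrete, the following is a minimal sketch, not the paper's method or measurements. It assumes a toy cost model with two cores, a single data-parallel loop whose iterations can be split freely, and no communication cost. The `Core` fields, the `evaluate` and `explore` helpers, and all rate and power figures are hypothetical values chosen for illustration; they do not describe the OMAP 4430.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    rate: float          # loop iterations per second (hypothetical)
    active_power: float  # watts while computing (hypothetical)
    idle_power: float    # watts while waiting for the other core (hypothetical)


def evaluate(split, total_iters, cpu, dsp):
    """Return (time_s, energy_j) when fraction `split` of iterations runs on the CPU."""
    t_cpu = split * total_iters / cpu.rate
    t_dsp = (1.0 - split) * total_iters / dsp.rate
    # Both cores run concurrently, so the slower share sets the finish time.
    time_s = max(t_cpu, t_dsp)
    energy_j = 0.0
    for core, busy in ((cpu, t_cpu), (dsp, t_dsp)):
        # A core draws active power while busy and idle power until the end.
        energy_j += core.active_power * busy + core.idle_power * (time_s - busy)
    return time_s, energy_j


def explore(total_iters, cpu, dsp, steps=10):
    """Enumerate split fractions and return the best point for time and for energy."""
    points = []
    for i in range(steps + 1):
        split = i / steps
        time_s, energy_j = evaluate(split, total_iters, cpu, dsp)
        points.append((split, time_s, energy_j))
    best_time = min(points, key=lambda p: p[1])
    best_energy = min(points, key=lambda p: p[2])
    return best_time, best_energy


# Illustrative numbers only: a fast, power-hungry CPU and a slow, frugal DSP.
CPU = Core("cpu", rate=4.0e6, active_power=1.0, idle_power=0.1)
DSP = Core("dsp", rate=1.0e6, active_power=0.1, idle_power=0.01)

if __name__ == "__main__":
    best_time, best_energy = explore(1_000_000, CPU, DSP)
    print("best for time   (split, s, J):", best_time)
    print("best for energy (split, s, J):", best_energy)
```

Under these assumed figures, the split that minimises execution time differs from the split that minimises energy, which is the kind of criterion dependence the abstract describes. A realistic exploration would also have to account for data transfer between disjoint address spaces, compiler optimization level and input size.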
Partitioning data-parallel programs for heterogeneous MPSoCs: time and energy design space exploration
LCTES '14: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems