Abstract
Multicore embedded systems are widely used in telecommunication systems, robotics, medical applications, and more. While they offer high performance at low power, programming them efficiently remains a challenge. To exploit the capabilities the hardware offers, software developers are expected to handle many low-level programming details, including utilizing DMA, ensuring cache coherency, and explicitly inserting synchronization primitives. State-of-the-art solutions rely on software toolchains that are too vendor-specific, tying the software to particular hardware and leaving no room for portability.
In this paper we present a runtime system that explores mapping a high-level programming model, OpenMP, onto multicore embedded systems. A key feature of our scheme is that, unlike existing approaches that largely rely on POSIX threads, it leverages the Multicore Association (MCA) APIs as an OpenMP translation layer. The MCA APIs are a set of low-level APIs that handle resource management, inter-process communication, and task scheduling for multicore embedded systems. By deploying the MCA APIs, our runtime captures the characteristics of multicore embedded systems more effectively than POSIX threads do. Furthermore, the MCA layer makes our runtime implementation portable across various architectures. Programmers thus need to maintain only a single OpenMP code base that is compatible with various compilers and portable across different types of platforms. We have evaluated our runtime system using several embedded benchmarks. The experiments demonstrate promising performance that is competitive with the native approach for the platform.
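To make the single-code-base idea concrete, the following is a minimal sketch of the kind of OpenMP C source a programmer would maintain. The kernel `dot_product` and its parameters are hypothetical and are not taken from the paper or its benchmarks. How a runtime lowers the pragma (onto POSIX threads, the MCA APIs, or anything else) is deliberately not shown, since the abstract does not describe that interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not from the paper: dot product of two vectors.
 * The pragma asks the OpenMP runtime to split the loop across threads
 * and combine the per-thread partial sums. A compiler without OpenMP
 * support ignores the pragma and runs the loop serially, so the same
 * source stays valid on either kind of toolchain. */
double dot_product(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);

    double sum = 0.0;
    long i; /* signed index, accepted by OpenMP versions before 3.0 too */

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The source contains no thread creation, synchronization, or vendor-specific calls; those details are left to whichever OpenMP runtime the platform provides, which is the portability argument the abstract makes.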
Portable mapping of OpenMP to multicore embedded systems using MCA APIs
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems