
Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs

Published: 27 September 2017

Abstract

Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) containing CPU and GPU cores are typically required to execute applications concurrently. However, as will be shown in this paper, existing approaches are not well suited for concurrent applications because they either consider only a single application or do not exploit both CPU and GPU cores at the same time. In this paper, we propose an energy-efficient run-time mapping and thread partitioning approach for executing concurrent OpenCL applications on both CPU and GPU cores while satisfying performance requirements. Depending upon the performance requirements, for each concurrently executing application, the mapping process finds the appropriate number of CPU cores and the operating frequencies of the CPU and GPU cores, and the partitioning process identifies an efficient partitioning of the application's threads between CPU and GPU cores. We validate the proposed approach experimentally on the Odroid-XU3 hardware platform with various mixes of applications from the Polybench benchmark suite. Additionally, a case study is performed with a real-world application, SLAMBench. Results show an average energy saving of 32% compared to existing approaches while still satisfying the performance requirements.
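
To make the mapping and partitioning steps more concrete, the sketch below shows one way such a search could look for a single application: it enumerates the number of CPU cores, the CPU and GPU frequencies, and the fraction of work-items sent to the GPU, and keeps the lowest-energy configuration that meets a deadline. This is an illustrative toy, not the approach proposed in the paper. The function names, the linear throughput model, the cubic power model, and every constant are hypothetical assumptions, and the sketch does not model concurrent applications.

```python
from itertools import product

# All values below are hypothetical placeholders, not measured data.
CPU_CORE_COUNTS = [1, 2, 3, 4]
CPU_FREQS_MHZ = [800, 1200, 1600, 2000]
GPU_FREQS_MHZ = [177, 350, 480, 600]
GPU_SHARES = [i / 10 for i in range(11)]  # fraction of work-items sent to the GPU

CPU_ITEMS_PER_MHZ = 50.0    # work-items per second per MHz per core (assumed)
GPU_ITEMS_PER_MHZ = 400.0   # work-items per second per MHz (assumed)
CPU_POWER_COEFF = 1.5e-10   # watts per MHz^3 per core (assumed)
GPU_POWER_COEFF = 8.0e-9    # watts per MHz^3 (assumed)


def estimate(work_items, cores, cpu_mhz, gpu_mhz, gpu_share):
    """Return (time_s, energy_j) for one configuration under the toy model."""
    gpu_items = work_items * gpu_share
    cpu_items = work_items - gpu_items
    # Assumed: throughput scales linearly with core count and frequency.
    cpu_time = cpu_items / (cores * cpu_mhz * CPU_ITEMS_PER_MHZ)
    gpu_time = gpu_items / (gpu_mhz * GPU_ITEMS_PER_MHZ)
    # Assumed: dynamic power grows with the cube of frequency; idle power is ignored.
    cpu_energy = cores * CPU_POWER_COEFF * cpu_mhz ** 3 * cpu_time
    gpu_energy = GPU_POWER_COEFF * gpu_mhz ** 3 * gpu_time
    # CPU and GPU process their shares in parallel, so the slower side sets the time.
    return max(cpu_time, gpu_time), cpu_energy + gpu_energy


def best_config(work_items, deadline_s):
    """Exhaustively search for the lowest-energy configuration meeting the deadline.

    Returns ((cores, cpu_mhz, gpu_mhz, gpu_share), energy_j, time_s),
    or None when no configuration is fast enough.
    """
    best = None
    for cfg in product(CPU_CORE_COUNTS, CPU_FREQS_MHZ, GPU_FREQS_MHZ, GPU_SHARES):
        time_s, energy_j = estimate(work_items, *cfg)
        if time_s <= deadline_s and (best is None or energy_j < best[1]):
            best = (cfg, energy_j, time_s)
    return best


if __name__ == "__main__":
    print(best_config(work_items=1_000_000, deadline_s=10.0))
```

Exhaustive search is used here only because the toy design space is small (704 configurations). A run-time manager on a real platform would more plausibly rely on profiled data or a cheaper heuristic, and it would have to share cores among several applications.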

