Abstract
A hardware configuration is a set of processors and their frequency levels in a multicore heterogeneous system. This article presents a compiler-based technique to match functions with hardware configurations. The technique uses multivariate linear regression to associate function arguments with particular hardware configurations. By showing that this classification space tends to be convex in practice, this article demonstrates that linear regression is not only an efficient tool to map computations to heterogeneous hardware, but also an effective one. To evaluate the viability of multivariate linear regression as a way to perform adaptive compilation for heterogeneous architectures, we have implemented our ideas in the Soot Java bytecode analyzer. The code that we produce can predict the best configuration for a large class of Java and Scala benchmarks running on an Odroid XU4 big.LITTLE board, thereby outperforming prior techniques such as ARM’s GTS and CHOAMP, a recently released static program scheduler.
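To make the idea concrete, here is a minimal sketch of how function arguments might be mapped to a hardware configuration with multivariate linear regression. It is not the article's implementation: the configuration names, the two input features (an input length and an iteration count), and the training samples are all hypothetical. The one-model-per-configuration scheme, fitted by ordinary least squares on 0/1 indicator targets, is just one simple way to turn regression into a classifier.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (function-argument features, best configuration)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs: List[Sequence[float]], ys: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def train(samples: List[Sample]) -> Dict[str, List[float]]:
    """Fit one linear model per configuration on 0/1 indicator targets."""
    xs = [x for x, _ in samples]
    configs = sorted({label for _, label in samples})
    return {
        c: fit(xs, [1.0 if label == c else 0.0 for _, label in samples])
        for c in configs
    }


def predict(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the configuration whose linear model scores highest on x."""
    row = [1.0] + list(x)
    return max(model, key=lambda c: sum(w * v for w, v in zip(model[c], row)))


# Hypothetical profiling data: (input length, iteration count) -> best configuration.
SAMPLES: List[Sample] = [
    ((10, 1), "4xLITTLE-low"),
    ((20, 2), "4xLITTLE-low"),
    ((30, 1), "4xLITTLE-low"),
    ((500, 8), "4xbig-high"),
    ((800, 4), "4xbig-high"),
    ((1000, 6), "4xbig-high"),
]

model = train(SAMPLES)
print(predict(model, (15, 1)))
print(predict(model, (900, 5)))
```

Prediction costs only one dot product per configuration, which is the sense in which a linear model is cheap to evaluate when a function is called. It can only pick the right configuration reliably when the regions of the input space favoring each configuration are close to linearly separable, which is where the convexity observation in the abstract matters.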
Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs