research-article

Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Program Inputs

Published: 18 October 2021

Abstract

A hardware configuration is a set of processors and their frequency levels in a multicore heterogeneous system. This article presents a compiler-based technique to match functions with hardware configurations. The technique uses multivariate linear regression to associate function arguments with particular hardware configurations. By showing that this classification space tends to be convex in practice, this article demonstrates that linear regression is not only an efficient tool to map computations to heterogeneous hardware, but also an effective one. To demonstrate the viability of multivariate linear regression as a way to perform adaptive compilation for heterogeneous architectures, we have implemented our ideas on top of the Soot Java bytecode analyzer. The code that we produce can predict the best configuration for a large class of Java and Scala benchmarks running on an Odroid XU4 big.LITTLE board, and it outperforms prior techniques such as ARM’s GTS and CHOAMP, a recently released static program scheduler.
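The following minimal sketch illustrates one plausible reading of the idea in the abstract. It is not the authors' implementation. It fits a least-squares linear model with one output per hardware configuration (indicator targets) and, for a new call, selects the configuration with the highest score. The configuration labels, the two numeric arguments, the training samples, and the helper names `solve`, `fit`, and `predict` are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the square matrix with the right-hand-side columns.
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: training inputs are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit(inputs: Sequence[Sequence[float]], labels: Sequence[int],
        num_configs: int) -> List[List[float]]:
    """Least-squares fit of one linear score per configuration.

    Solves the normal equations (X^T X) W = X^T Y, where X holds the
    function arguments plus a constant term and Y is a one-hot matrix
    of the best configuration observed for each sample.
    """
    xs = [[1.0, *x] for x in inputs]
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] for x, y in zip(xs, labels) if y == k)
            for k in range(num_configs)] for i in range(d)]
    return solve(xtx, xty)


def predict(weights: List[List[float]], args: Sequence[float]) -> int:
    """Return the index of the configuration with the highest linear score."""
    x = [1.0, *args]
    num_configs = len(weights[0])
    scores = [sum(xi * w_row[k] for xi, w_row in zip(x, weights))
              for k in range(num_configs)]
    return max(range(num_configs), key=scores.__getitem__)


# Hypothetical configuration labels (not taken from the article).
CONFIGS = ["LITTLE-cluster", "big-cluster"]

# Made-up training samples: two numeric function arguments per call, and
# the index of the configuration assumed to have been best for that call.
TRAIN_ARGS = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0),
              (10.0, 1.0), (11.0, 3.0), (12.0, 2.0)]
TRAIN_LABELS = [0, 0, 0, 1, 1, 1]

WEIGHTS = fit(TRAIN_ARGS, TRAIN_LABELS, len(CONFIGS))
```

Calling `CONFIGS[predict(WEIGHTS, (2.0, 2.0))]` selects a configuration for a new pair of argument values. Because the decision is a comparison of linear scores, each configuration's region of the input space is convex, which is why this kind of model suits a classification space that is convex in practice.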

Published in

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 6
November 2021
256 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3485150
Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 March 2021
• Revised: 1 July 2021
• Accepted: 1 July 2021
• Published: 18 October 2021

Qualifiers

• research-article
• Refereed