Abstract
Heterogeneous processors such as ARM’s big.LITTLE have become popular for embedded systems. They offer a choice between running workloads on a high-performance core or a low-energy core, leading to increased energy efficiency. However, the core configurations are fixed at design time, which limits how far the hardware can adapt.
Dynamic Multicore Processors (DMPs) bridge the gap between homogeneous and fully reconfigurable systems. Cores can fuse dynamically to adapt the computational resources to the needs of different workloads. There exist multiple examples of DMPs in the literature, yet the focus has mainly been on static partitioning.
This paper conducts the first thorough study of the potential for dynamic reconfiguration of DMPs at runtime. We study how performance varies with static partitioning and what software optimizations are required to achieve high performance. We show that energy consumption is reduced considerably when the number of cores is adapted to program phases, and we introduce a simple online model that predicts the number of cores that minimizes energy consumption while maintaining high performance. Using the San Diego Vision Benchmark Suite as a use case, the dynamic scheme leads to ∼40% energy savings on average without decreasing performance.
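The trade-off described above, choosing a core count per program phase that minimizes energy without giving up performance, can be illustrated with a minimal sketch. This is not the paper's online model: it is an offline selection rule over measured profiles, and the names `pick_cores` and `slowdown_tolerance`, as well as all numbers, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical per-phase profile: fused core count -> (run time, energy).
# The values used below are made up for illustration, not measured data.
Profile = Dict[int, Tuple[float, float]]


def pick_cores(profile: Profile, slowdown_tolerance: float = 0.05) -> int:
    """Return the core count with the lowest energy among configurations
    whose run time is within `slowdown_tolerance` of the fastest one.
    Ties on energy are broken in favor of fewer cores."""
    best_time = min(t for t, _ in profile.values())
    limit = best_time * (1.0 + slowdown_tolerance)
    # Keep only configurations that stay close to the best performance.
    candidates = {c: e for c, (t, e) in profile.items() if t <= limit}
    return min(candidates, key=lambda c: (candidates[c], c))


# Two illustrative phases: the first benefits from fusing cores,
# the second barely speeds up, so extra cores only cost energy.
phases = [
    {1: (100.0, 10.0), 2: (60.0, 12.0), 4: (58.0, 20.0)},
    {1: (100.0, 10.0), 2: (99.0, 18.0), 4: (98.0, 30.0)},
]
schedule = [pick_cores(p) for p in phases]
print(schedule)
```

In this sketch the first phase is assigned two cores and the second a single core, which mirrors the idea that the best configuration changes from phase to phase. An online scheme would have to predict these profiles rather than measure every configuration.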
A Study of Dynamic Phase Adaptation Using a Dynamic Multicore Processor