ABSTRACT
Asymmetric multicore processors have demonstrated strong potential for improving performance and energy efficiency. Shared-ISA asymmetric multicore processors overcome the programmability problems of disjoint-ISA systems and enhance single-ISA architectures with instruction-based asymmetry. In such a design, processors share a common baseline ISA, and performance-enhanced (PE) cores extend the baseline ISA with instructions that accelerate performance-critical operations. To exploit asymmetry, the scheduler should be able to migrate threads based on their acceleration potential.
The contribution of this paper is a low-overhead binary code rewriting method for shared-ISA multicore processors that transforms a binary executable at runtime according to the PE capabilities of the processor on which it is scheduled. The mutable binary code can be re-targeted among heterogeneous cores at any point in execution while preserving functional equivalence, and it transparently uses PE instructions when they are available, thus enabling migrations among heterogeneous cores. We emulate a realistic shared-ISA asymmetric multicore system using actual hardware, an FPGA experimental prototype. Experimental analysis shows that dynamic binary rewriting is feasible with little overhead. Rewritten code speeds up baseline code and performs close, with 70% average efficiency, to non-portable, compiler-generated code that is statically optimized to use PE instructions.
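The re-targeting idea can be illustrated with a toy model. The sketch below is not the paper's rewriter: the three-opcode ISA, the fused `mac` instruction, the reserved `tmp` scratch register, and the choice to pad with `nop` so that instruction positions stay fixed are all assumptions made for illustration. It only shows how one code sequence might be patched in either direction, depending on whether the target core has the PE extension, without changing the result.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]   # (opcode, operands...), a hypothetical toy encoding
SCRATCH = "tmp"           # assumed register reserved for the rewriter, dead after each pair


def to_pe(code: List[Instr]) -> List[Instr]:
    """Patch `mul tmp,a,b ; add acc,acc,tmp` into `mac acc,a,b ; nop`."""
    out = list(code)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]
        if (first[0] == "mul" and first[1] == SCRATCH
                and second[0] == "add" and second[1] == second[2]
                and second[3] == SCRATCH):
            out[i] = ("mac", second[1], first[2], first[3])
            out[i + 1] = ("nop",)  # padding keeps later instruction positions unchanged
    return out


def to_baseline(code: List[Instr]) -> List[Instr]:
    """Expand `mac acc,a,b ; nop` back into the baseline mul/add pair."""
    out = list(code)
    for i in range(len(out) - 1):
        if out[i][0] == "mac" and out[i + 1] == ("nop",):
            _, acc, a, b = out[i]
            out[i] = ("mul", SCRATCH, a, b)
            out[i + 1] = ("add", acc, acc, SCRATCH)
    return out


def retarget(code: List[Instr], core_has_pe: bool) -> List[Instr]:
    """Rewrite the code for the core the thread is about to run on."""
    return to_pe(code) if core_has_pe else to_baseline(code)


def run(code: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Tiny interpreter used only to check functional equivalence."""
    regs = dict(regs)
    for ins in code:
        op = ins[0]
        if op == "mul":
            regs[ins[1]] = regs[ins[2]] * regs[ins[3]]
        elif op == "add":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif op == "mac":
            regs[ins[1]] += regs[ins[2]] * regs[ins[3]]
        # "nop" does nothing
    regs.pop(SCRATCH, None)  # the scratch register is not architecturally visible here
    return regs


baseline = [("mul", "tmp", "x", "y"), ("add", "acc", "acc", "tmp"), ("add", "z", "x", "y")]
pe_code = retarget(baseline, core_has_pe=True)
back = retarget(pe_code, core_has_pe=False)
```

Because both directions preserve the code length, a thread in this toy model could be migrated at any instruction boundary outside a patched pair; handling a migration point inside a pair is one of the details a real rewriter has to address and that this sketch leaves out.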
Index Terms
Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores