ABSTRACT
While technology trends have ushered in the age of chip multiprocessors (CMP), a fundamental question is what size to make each core. Most current commercial designs are symmetric CMPs (SCMP), in which all cores are identical; the cores in these designs range from simple RISC processors to complex out-of-order x86 processors. Some researchers have proposed asymmetric CMPs (ACMP), which consist of multiple types of cores. The fixed nature of both architectures makes them vulnerable to mismatches between the granularity of the cores and the parallelism in the workload, although this is less of an issue for ACMPs, and such mismatches can cause inefficient execution. To remedy this weakness, recent research has proposed flexible-core CMPs (FCMP), which can aggregate multiple small processing cores to form larger logical processors. FCMPs introduce a new resource allocation and scheduling problem: determining how many logical processors should be configured, how powerful each processor should be, and where and when each task should run. This paper introduces and motivates this problem, describes the challenges associated with it, and evaluates algorithms appropriate for multitasking on FCMPs. We also evaluate static-core CMPs of various configurations and compare them to FCMPs for various multitasking workloads.
Index Terms
Multitasking workload scheduling on flexible-core chip multiprocessors