ABSTRACT
The industry has shifted towards multi-core designs after hitting the memory and power walls. However, single-thread performance remains of paramount importance: some applications have limited thread-level parallelism (TLP), and, as Amdahl's law explains, even a small portion of code with limited TLP imposes a significant constraint on overall performance.
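To make the Amdahl's law argument concrete, the following minimal sketch computes the ideal speedup for a given parallel fraction and core count. The function name and the sample values are illustrative assumptions, not figures from the paper.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law (illustrative helper, not from the paper)."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at single-thread speed; only the parallel part scales.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes perfectly.
    for cores in (2, 4, 16, 1024):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With a 10% serial portion the speedup can never reach 10x regardless of core count, which is why speeding up the sequential part still matters on a multi-core chip.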
In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations.
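The four mechanisms listed above can be pictured with a small software model. This is only a toy sketch under simplifying assumptions: all speculative threads run against the same checkpointed state and are validated at commit time. The class and method names (`SpecThread`, `SpecMemory`, `commit_in_order`) are hypothetical and do not describe the hardware design in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    order: int                                   # position in the original sequential order
    reads: set = field(default_factory=set)      # addresses read from committed state
    writes: dict = field(default_factory=dict)   # private (speculative) version of memory


class SpecMemory:
    """Toy model: versioning, violation detection, ordered commit, checkpoint/recovery."""

    def __init__(self, initial):
        self.committed = dict(initial)    # architectural state
        self.checkpoint = dict(initial)   # last known-good state

    def load(self, thread, addr):
        # (1) Multiple versions: a thread sees its own speculative value first.
        if addr in thread.writes:
            return thread.writes[addr]
        thread.reads.add(addr)
        return self.committed[addr]

    def store(self, thread, addr, value):
        thread.writes[addr] = value

    def commit_in_order(self, threads):
        written_by_earlier = set()
        # (3) Reconstruct the sequential order before making anything visible.
        for t in sorted(threads, key=lambda th: th.order):
            # (2) Violation: a later thread read a location an earlier thread wrote.
            if t.reads & written_by_earlier:
                # (4) Recovery: discard everything since the last checkpoint.
                self.committed = dict(self.checkpoint)
                return False
            self.committed.update(t.writes)
            written_by_earlier.update(t.writes)
        self.checkpoint = dict(self.committed)
        return True
```

In this model, two threads that touch disjoint addresses commit successfully in sequential order, while a later thread that read an address written by an earlier one triggers a rollback to the checkpoint. A real design would detect violations while threads execute and would typically squash only part of the speculative work.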
The proposed scheme outperforms previous hardware-only schemes that combine cores to execute single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance improves by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.
Index Terms
Boosting single-thread performance in multi-core systems through fine-grain multi-threading