Abstract
The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine concurrency for high performance while tolerating emerging technology scaling challenges, such as increasing wire delays and power consumption. This paper evaluates how well TRIPS meets this goal through a detailed ISA and performance analysis. We compare performance, using cycle counts, to commercial processors. On SPEC CPU2000, the Intel Core 2 outperforms compiled TRIPS code in most cases, although TRIPS matches a Pentium 4. On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3. Compared to conventional ISAs, the block-atomic model provides a larger instruction window, increases concurrency at the cost of more instructions executed, and replaces register and memory accesses with more efficient direct instruction-to-instruction communication. Our analysis suggests ISA, microarchitecture, and compiler enhancements for addressing weaknesses in TRIPS and indicates that EDGE architectures have the potential to exploit greater concurrency in future technologies.
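To illustrate the direct instruction-to-instruction communication the abstract describes, the following is a minimal sketch (not the actual TRIPS ISA encoding) of a toy dataflow block: each instruction statically names the consumers of its result, and an instruction fires as soon as both operands arrive, so values flow producer-to-consumer without passing through a shared register file. All names and encodings here are illustrative assumptions.

```python
# Toy dataflow block (illustrative only, not the TRIPS ISA).
# Computes (a + b) * (a - b) for a = 7, b = 3.
# Each instruction: (opcode, targets), where each target is
# (consumer_instruction_id, operand_slot) -- the "target form" idea of EDGE.
block = {
    0: ("add", [(2, 0)]),   # a + b  -> left operand of instruction 2
    1: ("sub", [(2, 1)]),   # a - b  -> right operand of instruction 2
    2: ("mul", []),         # no consumers: this value is the block output
}

ops = {"add": lambda x, y: x + y,
       "sub": lambda x, y: x - y,
       "mul": lambda x, y: x * y}

def execute_block(block, inputs):
    """Fire each instruction once both of its operands have arrived."""
    operands = {i: {} for i in block}          # instr_id -> slot -> value
    # Block inputs are injected as operand writes to the consuming instructions.
    for (instr, slot), value in inputs.items():
        operands[instr][slot] = value
    ready = [i for i in block if len(operands[i]) == 2]
    result = None
    while ready:
        i = ready.pop()
        opcode, targets = block[i]
        value = ops[opcode](operands[i][0], operands[i][1])
        if not targets:                        # block output
            result = value
        for tgt, slot in targets:              # direct producer->consumer delivery
            operands[tgt][slot] = value
            if len(operands[tgt]) == 2:        # consumer now has both operands
                ready.append(tgt)
    return result

a, b = 7, 3
inputs = {(0, 0): a, (0, 1): b, (1, 0): a, (1, 1): b}
print(execute_block(block, inputs))  # (7+3) * (7-3) = 40
```

Note that instructions 0 and 1 have no ordering constraint between them; the hardware analogue is that independent instructions in a block issue concurrently, which is the concurrency the block-atomic model exposes.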
An evaluation of the TRIPS computer system
ASPLOS XIV: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2009)