Abstract
A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that we can achieve performance and code size improvements despite a very low-level instruction representation. We also demonstrate that static pipelining of instructions reduces energy usage by simplifying hardware, avoiding many unnecessary operations, and allowing the compiler to perform optimizations that are not possible on traditional architectures.
Improving processor efficiency by statically pipelining instructions
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems