Abstract
To meet the growing demand for high-performance computing in modern embedded devices, processor designs have adopted several architectural and microarchitectural enhancements. Extended instructions (EI) are a common architectural enhancement, while issuing multiple instructions per cycle is a common microarchitectural one. The impact of combining both approaches in the same design is not well understood. Previous studies have shown that EI can improve the performance of some applications on certain multiple-issue architectures. However, the algorithms used to identify EI for such architectures yield only limited performance improvement, because not all arithmetic operations are suitable candidates for EI on a multiple-issue architecture. To explore the full potential of EI for multiple-issue architectures, two factors must be considered: (1) the execution performance of an application is dominated by critical operations (those located on the critical path) and highly resource-contentious operations (those with a high probability of being delayed during execution because of hardware resource limitations), and (2) an operation may become critical and/or highly resource-contentious after other operations are added to the EI. This article presents an EI exploration algorithm for multiple-issue architectures that focuses on these two factors. Simulation results show that the proposed algorithm outperforms previously published algorithms.
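The second factor, that an operation can become critical only after other operations are folded into an EI, can be illustrated on a toy data-flow graph. The sketch below is not the article's exploration algorithm. It rests on our own assumptions: the graph is acyclic, every operation has a fixed latency in cycles, the merged group is convex, and an operation counts as "critical" when its slack is zero (its earliest, ASAP, start equals its latest, ALAP, start). Resource contention is not modeled. The function names `analyze`, `critical_ops`, and `merge_into_ei` are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def analyze(latency, edges):
    """Return (asap, alap, critical-path length) for a DAG of operations.

    latency: dict mapping operation -> latency in cycles.
    edges:   iterable of (producer, consumer) pairs.
    """
    preds = {op: set() for op in latency}
    succs = {op: set() for op in latency}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    order = list(TopologicalSorter(preds).static_order())
    # Earliest start: after the slowest predecessor finishes.
    asap = {}
    for op in order:
        asap[op] = max((asap[p] + latency[p] for p in preds[op]), default=0)
    length = max(asap[op] + latency[op] for op in order)
    # Latest start that still meets the critical-path length.
    alap = {}
    for op in reversed(order):
        alap[op] = min((alap[s] for s in succs[op]), default=length) - latency[op]
    return asap, alap, length


def critical_ops(latency, edges):
    """Operations with zero slack, i.e. on a critical path."""
    asap, alap, _ = analyze(latency, edges)
    return {op for op in latency if asap[op] == alap[op]}


def merge_into_ei(latency, edges, group, name, ei_latency):
    """Collapse the (assumed convex) operations in `group` into one EI node."""
    group = set(group)

    def rename(op):
        return name if op in group else op

    new_latency = {op: lat for op, lat in latency.items() if op not in group}
    new_latency[name] = ei_latency
    # Edges internal to the group disappear; external edges attach to the EI.
    new_edges = {(rename(u), rename(v)) for u, v in edges if rename(u) != rename(v)}
    return new_latency, new_edges


if __name__ == "__main__":
    # Two independent chains: a -> b -> c (3 cycles) and d -> e (2 cycles).
    lat = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
    deps = {("a", "b"), ("b", "c"), ("d", "e")}
    print(sorted(critical_ops(lat, deps)))  # ['a', 'b', 'c']

    # Fold a and b into a single-cycle EI: chain d -> e now becomes critical.
    lat2, deps2 = merge_into_ei(lat, deps, {"a", "b"}, "ei", 1)
    print(sorted(critical_ops(lat2, deps2)))  # ['c', 'd', 'e', 'ei']
```

Before the merge, only `a`, `b`, and `c` lie on the three-cycle critical path; once `a` and `b` execute as one single-cycle EI, the path shortens to two cycles and the previously non-critical `d` and `e` join it. An exploration algorithm that ranks candidates only by their initial criticality would miss this shift, which is the effect the second factor describes.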
Extended Instruction Exploration for Multiple-Issue Architectures