Abstract
Dynamic configuration of application-specific implicit instructions has been proposed to better exploit the available instruction-level parallelism in pipelined processors. Supporting implicit instruction issue requires extending the pipeline with a trigger table that describes the instruction to be implicitly issued in response to a value written into a triggering register by a triggering instruction (which may be an add or sub instruction). In this article, we explore the design optimization of the trigger table to maximize the number of instructions that can be implicitly issued within the limited size of the table. The concept of an implicitly issued instruction is formally defined through inter-basic-block analysis of control and data dependencies. A compilation tool chain has been developed to automatically identify optimization opportunities, taking into account the constraints imposed by control and data dependencies as well as by architectural limitations. The proposed solutions have been applied to a baseline scalar MIPS processor where, for the selected set of benchmarks (DSPstone and MiBench/automotive), we obtained an average speedup of 17%.
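The trigger-table mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Instr`, `TriggerTable`, `configure`, `lookup`, and `run`, the table capacity, and the register names are all hypothetical, and the sketch assumes that an implicitly issued instruction does not itself trigger another table lookup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str    # "add" or "sub"
    dst: str   # destination register
    src1: str
    src2: str

OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}

class TriggerTable:
    """Hypothetical fixed-size table: triggering register -> implicit instruction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def configure(self, trigger_reg, implicit):
        # The table is small by design, so reject entries beyond its capacity.
        if trigger_reg not in self.entries and len(self.entries) >= self.capacity:
            raise ValueError("trigger table is full")
        self.entries[trigger_reg] = implicit

    def lookup(self, reg):
        return self.entries.get(reg)

def execute(instr, regs):
    regs[instr.dst] = OPS[instr.op](regs[instr.src1], regs[instr.src2])

def run(program, regs, table):
    """Execute explicit instructions; a write to a triggering register
    also issues the implicit instruction configured for that register."""
    explicit = implicit = 0
    for instr in program:
        execute(instr, regs)
        explicit += 1
        triggered = table.lookup(instr.dst)
        if triggered is not None:
            # Assumption: implicit instructions do not chain further triggers.
            execute(triggered, regs)
            implicit += 1
    return explicit, implicit

regs = {"r1": 5, "r2": 3, "r3": 0, "r4": 10, "r5": 0}
table = TriggerTable(capacity=1)
table.configure("r3", Instr("sub", "r5", "r4", "r3"))  # issued when r3 is written
counts = run([Instr("add", "r3", "r1", "r2")], regs, table)
```

In this example a single explicit `add` writes `r3`, which triggers the configured `sub`, so two instructions complete while only one is fetched from the program. The capacity check mirrors the trade-off the abstract describes: more table entries allow more implicit issues, but the table size is limited.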
Index Terms
Architecture Optimization of Application-Specific Implicit Instructions