Architecture Optimization of Application-Specific Implicit Instructions

Published: 01 August 2012

Abstract

Dynamic configuration of application-specific implicit instructions has been proposed to better exploit the instruction-level parallelism available in pipelined processors. Supporting such implicit instruction issue requires extending the pipeline with a trigger table that describes the instruction implicitly issued in response to a value written into a triggering register by a triggering instruction (which may be an add or sub instruction). In this article, we explore the design optimization of the trigger table to maximize the number of instructions that can be implicitly issued while respecting the limited size of the trigger table. The concept of an implicitly issued instruction has been formally defined by considering inter-basic-block analysis of control and data dependencies. A compilation tool chain has been developed to automatically identify the optimization opportunities, taking into account the constraints imposed by control and data dependencies as well as by architectural limitations. The proposed solutions have been applied to a baseline scalar MIPS processor, where, for the selected set of benchmarks (DSPstone and MiBench/automotive), we obtained an average speedup of 17%.
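To make the trigger mechanism concrete, the following Python sketch models a trigger table of bounded capacity attached to a toy register interpreter. It is an illustration under stated assumptions, not the architecture evaluated in the article: the entry layout (one implicit instruction per triggering register), the tuple instruction encoding, and the names `TriggerTable`, `configure`, `lookup`, `execute`, and `run` are all hypothetical, and implicit instructions are assumed not to fire further triggers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (opcode, dest, src1, src2)
Instr = Tuple[str, str, str, str]

# Per the abstract, add and sub may act as triggering instructions.
TRIGGERING_OPS = {"add", "sub"}


@dataclass
class TriggerEntry:
    # Hypothetical layout: the instruction issued implicitly when the
    # associated triggering register is written by a triggering instruction.
    implicit: Instr


class TriggerTable:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # architectural limit on the number of entries
        self.entries: Dict[str, TriggerEntry] = {}

    def configure(self, reg: str, implicit: Instr) -> bool:
        """Install (or overwrite) an entry; return False if the table is full."""
        if reg not in self.entries and len(self.entries) >= self.capacity:
            return False
        self.entries[reg] = TriggerEntry(implicit)
        return True

    def lookup(self, op: str, dest: str) -> Optional[Instr]:
        """Return the implicit instruction fired by this write, if any."""
        if op in TRIGGERING_OPS and dest in self.entries:
            return self.entries[dest].implicit
        return None


def execute(instr: Instr, regs: Dict[str, int]) -> None:
    op, dest, a, b = instr
    if op == "add":
        regs[dest] = regs[a] + regs[b]
    elif op == "sub":
        regs[dest] = regs[a] - regs[b]
    elif op == "mul":
        regs[dest] = regs[a] * regs[b]
    else:
        raise ValueError(f"unsupported opcode: {op}")


def run(program: List[Instr], table: TriggerTable,
        regs: Dict[str, int]) -> Tuple[int, int]:
    """Execute explicit instructions, issuing implicit ones after triggering writes.

    Returns (instructions fetched, operations executed). Implicit
    instructions are not looked up again, so triggers never chain.
    """
    fetched = executed = 0
    for instr in program:
        fetched += 1
        execute(instr, regs)
        executed += 1
        implicit = table.lookup(instr[0], instr[1])
        if implicit is not None:
            execute(implicit, regs)
            executed += 1
    return fetched, executed


if __name__ == "__main__":
    regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 4}
    table = TriggerTable(capacity=1)
    # Writing r3 with add/sub implicitly issues r3 = r3 * r4.
    table.configure("r3", ("mul", "r3", "r3", "r4"))
    fetched, executed = run([("add", "r3", "r1", "r2")], table, regs)
    print(regs["r3"], (fetched, executed))  # prints "20 (1, 2)"
```

Here the single explicit `add` causes the `mul` to be issued implicitly, so one fetched instruction yields two executed operations. Because the entry fires on every `add` or `sub` write to `r3`, installing it is only safe when control and data dependencies allow it, which is the role the abstract assigns to the compilation tool chain; the fixed `capacity` mirrors the limited trigger-table size.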
