Merged Dictionary Code Compression for FPGA Implementation of Custom Microcoded PEs

Published: 01 June 2008

Abstract

Horizontal Microcoded Architecture (HMA) is a paradigm for designing programmable high-performance processing elements (PEs). However, it suffers from large code size, which can be addressed by compression. In this article, we study the code size of one of the new HMA-based technologies called No-Instruction-Set Computer (NISC). We show that NISC code size can be several times larger than that of a typical RISC processor, and we propose several low-overhead dictionary-based code compression techniques to reduce it. Our compression algorithm leverages the knowledge of “don't care” values in the control words and can reduce the code size by 3.3 times, on average. Despite such good results, as shown in this article, these compression techniques lead to poor FPGA implementations because they require many on-chip RAMs. To address this issue, we introduce an FPGA-aware dictionary-based technique that uses the dual-port feature of on-chip RAMs to reduce the number of utilized block RAMs by half. Additionally, we propose cascading two levels of dictionaries for code size and block RAM reduction of large programs. For an MP3 application, a merged, cascaded, three-dictionary implementation reduces the number of utilized block RAMs by 4.3 times (76%) compared to a NISC without compression. This corresponds to 20% additional savings over the best single-level dictionary-based compression.
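
To illustrate how "don't care" values can help a dictionary-based scheme, the following is a minimal sketch, not the algorithm from the article. It assumes control words are written as strings over `0`, `1`, and `X` (don't care), and it uses a simple greedy first-fit merge. The function names (`compatible`, `merge`, `build_dictionary`, `compressed_bits`) and the size model (one fixed-width index per control word plus the dictionary itself) are assumptions made for this example only.

```python
from math import ceil, log2


def compatible(a, b):
    """Two control words are compatible if no bit position holds
    conflicting specified values (an 'X' matches anything)."""
    return all(x == y or x == "X" or y == "X" for x, y in zip(a, b))


def merge(a, b):
    """Combine two compatible words, keeping every specified bit."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def build_dictionary(words):
    """Greedy first-fit (an assumption of this sketch): place each word
    in the first compatible dictionary entry, else open a new entry."""
    dictionary, indices = [], []
    for word in words:
        for i, entry in enumerate(dictionary):
            if compatible(entry, word):
                dictionary[i] = merge(entry, word)
                indices.append(i)
                break
        else:
            dictionary.append(word)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def compressed_bits(dictionary, indices, width):
    """Hypothetical size model: index memory plus dictionary memory."""
    index_bits = max(1, ceil(log2(len(dictionary))))
    return len(indices) * index_bits + len(dictionary) * width


if __name__ == "__main__":
    # Five made-up 4-bit control words; real control words are much wider.
    words = ["1X0X", "10XX", "0XX1", "1X01", "00X1"]
    dictionary, indices = build_dictionary(words)
    original = len(words) * 4
    compressed = compressed_bits(dictionary, indices, width=4)
    print(dictionary, indices, original, compressed)
```

Without the don't-care information, the five words above would all be distinct dictionary entries. With it, they collapse into two entries, so the program memory only has to store a short index per control word. Because the paper's reference list includes works on graph coloring and NP-completeness, the published technique may well solve this merging step differently; the greedy pass here is only a stand-in.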
