
Rapid evaluation of custom instruction selection approaches with FPGA estimation

Published: 10 March 2014

Abstract

The main aim of this article is to demonstrate that a fast and accurate FPGA estimation engine is indispensable in design flows for custom instruction (template) selection. The need for an FPGA estimation engine stems from the difficulty of predicting the FPGA performance measures of selected custom instructions. We present an FPGA estimation technique that partitions the high-level representation of custom instructions into clusters based on the structural organization of the target FPGA, while taking into account general logic synthesis principles adopted by FPGA tools. In this work, we evaluate a widely used graph covering algorithm with various heuristics for custom instruction selection. In addition, we present an algorithm called Refined Largest Fit First (RLFF) that relies on a graph covering heuristic to select non-overlapping superset templates, which typically incorporate frequently used basic templates. The initial solution is then refined by considering overlapping templates that were previously ignored, to see whether introducing them could lead to higher performance. While RLFF provides the most efficient cover compared to the ILP method and other graph covering heuristics, FPGA estimation results reveal that RLFF leads to the worst performance in certain applications. It is therefore worthwhile to equip design flows with accurate FPGA estimation in order to rapidly determine the most profitable custom instruction selection approach for a given application.
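The abstract only outlines RLFF, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes each template occurrence is encoded as a set of dataflow-graph node IDs, and the names `largest_fit_first`, `refine`, and the `gain` callback are hypothetical. The first function builds a non-overlapping cover greedily, largest match first; the second revisits each ignored match and swaps it in only if a caller-supplied gain estimate strictly improves.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Set

# Assumed encoding: one template occurrence = the set of graph node IDs it covers.
Match = FrozenSet[int]


def largest_fit_first(candidates: Sequence[Match],
                      seed: Iterable[Match] = ()) -> List[Match]:
    """Greedy cover: keep `seed`, then add non-overlapping matches, largest first."""
    chosen: List[Match] = list(seed)
    covered: Set[int] = set().union(*chosen)
    # Larger matches first; ties broken by node IDs so the result is deterministic.
    for m in sorted(candidates, key=lambda m: (-len(m), sorted(m))):
        if covered.isdisjoint(m):
            chosen.append(m)
            covered |= m
    return chosen


def refine(chosen: Sequence[Match],
           candidates: Sequence[Match],
           gain: Callable[[Match], float]) -> List[Match]:
    """Try each ignored match in place of the matches it overlaps.

    `gain` is a hypothetical per-match benefit estimate (for example, cycles
    saved as reported by an estimation engine). A swap is kept only if the
    summed gain of the whole cover strictly increases.
    """
    def total(cover: Iterable[Match]) -> float:
        return sum(gain(m) for m in cover)

    best = list(chosen)
    for m in candidates:
        if m in best:
            continue
        # Drop the selected matches that overlap m, force m in, then refill greedily.
        kept = [c for c in best if c.isdisjoint(m)]
        trial = largest_fit_first(candidates, seed=kept + [m])
        if total(trial) > total(best):
            best = trial
    return best
```

In this sketch the quality of the final selection depends entirely on how `gain` is computed, which is where an estimation engine of the kind the abstract argues for would plug in.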

