Abstract
The main aim of this article is to demonstrate that a fast and accurate FPGA estimation engine is indispensable in design flows for custom instruction (template) selection. The need for an FPGA estimation engine stems from the difficulty of predicting the FPGA performance measures of selected custom instructions. We present an FPGA estimation technique that partitions the high-level representation of custom instructions into clusters based on the structural organization of the target FPGA, while taking into account general logic synthesis principles adopted by FPGA tools. In this work, we evaluate a widely used graph covering algorithm with various heuristics for custom instruction selection. In addition, we present an algorithm called Refined Largest Fit First (RLFF) that relies on a graph covering heuristic to select non-overlapping superset templates, which typically incorporate frequently used basic templates. The initial solution is then refined by considering overlapping templates that were previously ignored, to see whether introducing them could lead to higher performance. Although RLFF provides the most efficient cover compared to the ILP method and other graph covering heuristics, the FPGA estimation results reveal that RLFF leads to the worst performance in certain applications. It is therefore worthwhile to equip design flows with accurate FPGA estimation in order to rapidly determine the most profitable custom instruction approach for a given application.
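The two-phase idea behind RLFF (a greedy non-overlapping cover followed by a refinement pass over the templates that were skipped) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Match` record, the per-match `gain` figure (standing in for an estimated performance benefit), the tie-breaking rule, and the swap-based refinement criterion are all assumptions made only to show the general shape of such a heuristic.

```python
from typing import FrozenSet, List, NamedTuple


class Match(NamedTuple):
    """One occurrence of a template in a data-flow graph (hypothetical model)."""
    name: str
    nodes: FrozenSet[int]  # graph node ids covered by this occurrence
    gain: int              # assumed estimate of the benefit (e.g. cycles saved)


def largest_fit_first(matches: List[Match]) -> List[Match]:
    """Greedy cover: take the largest matches first, skipping any that overlap."""
    chosen: List[Match] = []
    covered = set()
    # Sort by size (descending), then by name so the result is deterministic.
    for m in sorted(matches, key=lambda m: (-len(m.nodes), m.name)):
        if covered.isdisjoint(m.nodes):
            chosen.append(m)
            covered |= m.nodes
    return chosen


def refine(chosen: List[Match], matches: List[Match]) -> List[Match]:
    """Assumed refinement: swap in a skipped match when it beats what it displaces."""
    chosen = list(chosen)
    for cand in matches:
        if cand in chosen:
            continue
        clashing = [m for m in chosen if m.nodes & cand.nodes]
        if cand.gain > sum(m.gain for m in clashing):
            chosen = [m for m in chosen if m not in clashing] + [cand]
    return chosen


# Toy example with made-up node ids and gains.
matches = [
    Match("A", frozenset({1, 2, 3}), 3),
    Match("B", frozenset({3, 4}), 5),
    Match("C", frozenset({5, 6}), 2),
    Match("D", frozenset({4, 5}), 1),
]
initial = largest_fit_first(matches)
refined = refine(initial, matches)
```

In this toy input the greedy pass keeps `A` and `C` because `A` is the largest match, while the refinement pass replaces `A` with the smaller but more profitable `B`. The gains here are invented; in a flow like the one the abstract argues for, such figures would come from an FPGA estimation step rather than from cover size alone.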
Index Terms
Rapid evaluation of custom instruction selection approaches with FPGA estimation