Abstract
Runtime reconfiguration is a promising way to reduce hardware cost in embedded systems without compromising performance. We present a framework that aims to increase the performance benefits of reconfigurable processors that support full or partial runtime reconfiguration. The framework achieves this by (1) providing a means of choosing suitable custom instruction selection heuristics, (2) leveraging FPGA-aware merging of custom instructions to maximize reconfigurable logic block utilization in each configuration, and (3) incorporating a hierarchical loop partitioning strategy to reduce runtime reconfiguration overhead. We show that the performance gain can be improved by employing custom instruction selection heuristics suited to the reconfigurable resource constraints and to the merging factor (the extent to which the selected custom instructions can be merged). The hierarchical loop partitioning strategy yields average performance gains of over 31% and 46% for full and partial runtime reconfiguration, respectively. Exploiting FPGA-aware merging of custom instructions raises these gains further, to over 52% and 70% for full and partial runtime reconfiguration, respectively.
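To make the trade-off behind point (3) concrete, here is a minimal toy cost model in Python. It is not the framework's actual partitioning algorithm. It assumes a hypothetical profiled loop execution trace, a fixed cycle cost per reconfiguration, and a fixed cycle saving each time a loop runs with its custom instructions loaded. The function name `net_cycles_saved` and all of its parameters are illustrative, and the initial configuration load is counted as one reconfiguration.

```python
from typing import Dict, List, Optional


def net_cycles_saved(
    trace: List[str],
    partition: Dict[str, int],
    saved_per_exec: Dict[str, int],
    reconfig_cost: int,
) -> int:
    """Toy model (assumption): cycles saved by custom instructions minus
    runtime reconfiguration overhead.

    trace          -- hypothetical order in which loops execute at runtime
    partition      -- loop name -> index of the configuration it is mapped to
    saved_per_exec -- cycles saved per execution of each loop
    reconfig_cost  -- cycles to load a configuration (full reconfiguration assumed)
    """
    saved = 0
    loaded: Optional[int] = None  # configuration currently on the fabric
    for loop in trace:
        cfg = partition[loop]
        if cfg != loaded:
            # Switching (or initially loading) a configuration costs time.
            saved -= reconfig_cost
            loaded = cfg
        saved += saved_per_exec[loop]
    return saved


if __name__ == "__main__":
    trace = ["A", "B", "A", "B"]  # two loops that alternate at runtime
    gains = {"A": 100, "B": 100}
    # Loops in separate configurations: a reconfiguration before every run.
    print(net_cycles_saved(trace, {"A": 0, "B": 1}, gains, 150))  # -200
    # Both loops in one configuration: only the initial load is paid.
    print(net_cycles_saved(trace, {"A": 0, "B": 0}, gains, 150))  # 250
```

In this toy setting, placing loops that execute together into the same configuration pays the reconfiguration cost once instead of on every switch. Fitting more custom instructions into one configuration, which is what FPGA-aware merging aims to enable, makes such groupings more likely to fit within the available logic.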