Abstract
While soft processor cores provided by FPGA vendors offer designers increased flexibility, such processors typically incur performance and energy penalties compared to hard processor core alternatives. The recently developed technology of warp processing can help reduce those penalties. Warp processing is the dynamic and transparent transformation of critical software regions from microprocessor execution to much faster circuit execution on an FPGA. In this article, we describe an implementation of a warp processor on Xilinx Virtex-II Pro and Spartan3 FPGAs incorporating one or more MicroBlaze soft processor cores. We further provide a detailed analysis of the energy overhead of dynamically partitioning an application's kernels to hardware executing within an FPGA. For an implementation that partitions the executing application once every minute, a MicroBlaze-based warp processor implemented on a Spartan3 FPGA achieves an average speedup of 5.8× and an average energy reduction of 49% compared to the MicroBlaze soft processor core alone, providing performance and energy consumption competitive with existing hard processor cores.
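To make the role of the partitioning period concrete, the following is a minimal first-order sketch, not the authors' energy model. It assumes that during one partitioning period of `period_s` seconds the warp processor draws a constant average power `p_warp` (processor plus FPGA circuits), that each dynamic partitioning costs a fixed energy `e_partition`, and that the soft core alone would need `speedup` times as long at average power `p_sw` to complete the same work. The function name and every numeric value below are hypothetical; only the one-minute period comes from the abstract.

```python
def warp_energy_reduction(speedup, p_sw, p_warp, e_partition, period_s=60.0):
    """Fractional energy reduction of warped vs. software-only execution over
    one partitioning period (hypothetical first-order model, illustrative only)."""
    if speedup <= 0 or p_sw <= 0 or period_s <= 0:
        raise ValueError("speedup, p_sw and period_s must be positive")
    # Energy the warp processor spends in one period: average power for the
    # whole period plus the one-time cost of dynamically partitioning.
    e_warp = p_warp * period_s + e_partition
    # Software-only energy for the same work: the soft core alone needs
    # speedup * period_s seconds to finish it.
    e_sw = p_sw * speedup * period_s
    return 1.0 - e_warp / e_sw


# Hypothetical numbers: 4x speedup, 1 W soft core alone, 2 W warped system,
# 12 J per partitioning.
r60 = warp_energy_reduction(4.0, 1.0, 2.0, 12.0)          # one-minute period
r120 = warp_energy_reduction(4.0, 1.0, 2.0, 12.0, 120.0)  # two-minute period
print(f"{r60:.3f} {r120:.3f}")  # 0.450 0.475
```

In this simplified model the partitioning cost contributes an amortized power of `e_partition / period_s`, so partitioning less often shrinks the overhead term, at the price of reacting more slowly to changes in the application's critical regions.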
Design and implementation of a MicroBlaze-based warp processor