Abstract
Field-programmable gate arrays (FPGAs) are used for a wide variety of computations in low-cost embedded systems. Although these systems often have modest performance constraints, their energy consumption must typically be limited. Many FPGA applications employ repetitive loops that cannot be straightforwardly split into parallel computations. Performing a loop sequentially generally requires high-speed clocks that consume considerable clock power and sometimes require clock generation using a phase-locked loop (PLL). Loop unrolling addresses the high-speed clock issue, but its use often leads to significant combinational glitch power.
In this work, a computer-aided design (CAD) approach that unrolls loops for designs targeted to low-cost FPGAs is described. Our approach considers latency constraints in an effort to minimize energy consumption for loop-based computation. To reduce glitch power, a glitch-filtering approach is introduced that balances glitch reduction against design performance. Glitch-filter enable signals are generated and routed to the filters using resources best suited to the target FPGA. Our approach automatically inserts glitch filters and associated control logic into a design prior to processing with FPGA synthesis, place, and route tools. The approach has been evaluated using five benchmarks often used in low-cost FPGAs, with energy savings determined by board-level power measurement on an Intel Cyclone IV and a Xilinx Artix-7 FPGA. The use of unrolling and glitch filtering is shown to reduce energy by at least 65% for an Artix-7 device and 50% for a Cyclone IV device while meeting design latency constraints.
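The link between unrolling, clock frequency, and a latency constraint can be illustrated with a small sketch. It is a toy timing model, not the CAD flow described above: the function names, the fixed per-iteration combinational delay `iter_delay_s`, and the feasibility rule are all assumptions made for illustration, and the model ignores glitch power, glitch filters, routing delay, and register setup time.

```python
import math


def required_clock_hz(iterations: int, unroll: int, latency_s: float) -> float:
    """Lowest clock frequency that finishes the loop within latency_s.

    Unrolling by `unroll` performs that many iterations per clock cycle,
    so the loop needs ceil(iterations / unroll) cycles in total.
    """
    cycles = math.ceil(iterations / unroll)
    return cycles / latency_s


def feasible_unroll_factors(iterations: int, latency_s: float, iter_delay_s: float):
    """List (unroll, clock_hz) pairs that meet the latency constraint.

    Assumption (hypothetical model): each iteration adds a fixed
    combinational delay iter_delay_s, and an unrolled stage chains
    `unroll` iterations between registers, so the chained delay must
    fit within one clock period.
    """
    feasible = []
    for unroll in range(1, iterations + 1):
        cycles = math.ceil(iterations / unroll)
        period_s = latency_s / cycles  # longest period that still meets latency
        if unroll * iter_delay_s <= period_s:
            feasible.append((unroll, cycles / latency_s))
    return feasible


if __name__ == "__main__":
    # Hypothetical numbers: a 16-iteration loop, a 1 microsecond latency
    # budget, and 5 ns of logic delay per iteration.
    for unroll, clock_hz in feasible_unroll_factors(16, 1e-6, 5e-9):
        print(f"unroll={unroll:2d}  clock={clock_hz / 1e6:.2f} MHz")
```

In this model a larger unroll factor lowers the clock frequency needed to meet the same latency, which is the effect that reduces clock power. The longer combinational chains that result are where glitch power grows, and the model deliberately leaves that cost out.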
Loop Unrolling for Energy Efficiency in Low-Cost Field-Programmable Gate Arrays