Abstract
We consider the impact of compiler optimizations on the quality of high-level synthesis (HLS)-generated field-programmable gate array (FPGA) hardware. Using an HLS tool implemented within the state-of-the-art LLVM compiler, we study the effect of compiler optimizations on the hardware metrics of circuit area, execution cycles, FMax, and wall-clock time. We evaluate 56 different compiler optimizations implemented within LLVM and show that some optimizations significantly affect hardware quality. Moreover, we show that hardware quality is also affected by some optimization parameter values, as well as the order in which optimizations are applied. We then present a new HLS-directed approach to compiler optimizations, wherein we execute partial HLS and profiling at intermittent points in the optimization process and use the results to judiciously undo the impact of optimization passes predicted to be damaging to the generated hardware quality. Results show that our approach produces circuits with 16% better speed performance, on average, versus using the standard -O3 optimization level.
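The HLS-directed flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Estimate` type, the `hls_directed_optimize` function, the toy passes, and the choice to check after every pass (rather than at intermittent points) are all assumptions made for illustration. The sketch takes wall-clock time to be execution cycles divided by FMax and discards any pass whose estimated wall-clock time is worse than the best seen so far.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Estimate:
    """Hypothetical result of a partial HLS run plus profiling."""
    cycles: int      # profiled execution cycles
    fmax_mhz: float  # estimated maximum clock frequency in MHz

    @property
    def wall_clock_us(self) -> float:
        # cycles divided by MHz gives microseconds
        return self.cycles / self.fmax_mhz


def hls_directed_optimize(
    ir,
    passes: Sequence[Tuple[str, Callable]],
    estimate: Callable[..., Estimate],
):
    """Apply passes in order, keeping a pass only if it does not hurt.

    `ir` is an opaque program representation, `passes` pairs a name with
    a function mapping IR to IR, and `estimate` stands in for partial HLS
    and profiling. All three are placeholders, not a real LLVM API.
    """
    best_ir, best = ir, estimate(ir)
    kept: List[str] = []
    for name, apply_pass in passes:
        candidate = apply_pass(best_ir)
        candidate_est = estimate(candidate)
        if candidate_est.wall_clock_us <= best.wall_clock_us:
            best_ir, best = candidate, candidate_est
            kept.append(name)
        # Otherwise the pass is "undone" by discarding the candidate IR.
    return best_ir, kept, best


# Toy model: the "IR" is just a (cycles, fmax_mhz) pair, and each pass
# edits those numbers directly. The pass names are labels only.
def toy_estimate(ir) -> Estimate:
    return Estimate(cycles=ir[0], fmax_mhz=ir[1])


TOY_PASSES = [
    ("unroll", lambda ir: (600, 90.0)),     # fewer cycles, slightly lower FMax
    ("inline", lambda ir: (590, 70.0)),     # FMax drop outweighs cycle gain
    ("licm", lambda ir: (540, ir[1])),      # fewer cycles, FMax unchanged
]

final_ir, kept_passes, final_est = hls_directed_optimize(
    (1000, 100.0), TOY_PASSES, toy_estimate
)
print(kept_passes, final_est.wall_clock_us)
```

In this toy run the second pass lowers the cycle count but reduces FMax enough that wall-clock time gets worse, so it is rejected, while the other two are kept. A real flow would replace `toy_estimate` with an actual scheduling and profiling step and would likely evaluate less often to limit compile time.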
The Effect of Compiler Optimizations on High-Level Synthesis-Generated Hardware