Abstract
Performance optimization is an important goal of High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that this sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiments show that DG can lead to a 76% increase in instruction parallelism over CDFG. Based on DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm outperforms the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
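To illustrate why an acyclic dependency graph needs no loop-breaking heuristic, the following minimal sketch computes an as-soon-as-possible (ASAP) schedule by visiting instructions in topological order. It is not the bottom-up algorithm proposed in the article; the instruction names, the latencies, and the `asap_schedule` helper are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def asap_schedule(deps, latency):
    """Return the earliest start cycle of each instruction.

    deps maps an instruction to the set of instructions it depends on.
    latency maps an instruction to its duration in cycles.
    Both are illustrative assumptions, not data from the article.
    """
    start = {}
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the graph contains a cycle.
    for node in TopologicalSorter(deps).static_order():
        start[node] = max(
            (start[p] + latency[p] for p in deps.get(node, ())),
            default=0,
        )
    return start


# Hypothetical five-instruction dependency graph.
deps = {
    "load_a": set(),
    "load_b": set(),
    "mul": {"load_a", "load_b"},
    "add": {"mul"},
    "cmp": {"load_a"},
}
latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1, "cmp": 1}

schedule = asap_schedule(deps, latency)
for name, cycle in sorted(schedule.items(), key=lambda item: item[1]):
    print(f"{name}: cycle {cycle}")
```

In this toy graph, `mul` and `cmp` both start at cycle 2 because neither depends on the other, which is the kind of parallelism that a sequential ordering of basic blocks can hide.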
Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism