research-article

Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

Published: 13 September 2021

Abstract

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that this sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiments show that DG can lead to a 76% increase in instruction parallelism over CDFG. Based on DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm can outperform the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× on circuit latency, respectively.
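The abstract does not spell out how the DG is built or how the bottom-up scheduler works, so the following is only a generic illustration of why an acyclic dependency graph is convenient for scheduling. It is a minimal sketch of as-soon-as-possible (ASAP) scheduling over a directed acyclic graph, not the algorithm proposed in the article. The function names, the unit-latency assumption, the unlimited-resource assumption, and the "operations per control step" parallelism measure are all assumptions made for this example.

```python
from collections import defaultdict, deque


def asap_schedule(deps):
    """Give every operation the earliest control step its dependencies allow.

    deps maps an operation name to the list of operations it depends on.
    Assumptions (not from the article): unit latency, unlimited resources.
    """
    ops = set(deps)
    for preds in deps.values():
        ops.update(preds)

    succs = defaultdict(list)
    indegree = {op: 0 for op in ops}
    for op, preds in deps.items():
        for pred in preds:
            succs[pred].append(op)
            indegree[op] += 1

    # Operations without dependencies start in control step 0.
    step = {op: 0 for op in ops}
    ready = deque(sorted(op for op in ops if indegree[op] == 0))
    processed = 0
    while ready:
        op = ready.popleft()
        processed += 1
        for succ in succs[op]:
            # A successor starts one step after its latest predecessor.
            step[succ] = max(step[succ], step[op] + 1)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    # Because the graph is acyclic, one topological pass reaches every node.
    if processed != len(ops):
        raise ValueError("dependency graph contains a cycle")
    return step


def average_parallelism(step):
    """Operations per control step for the schedule (a hypothetical metric)."""
    schedule_length = max(step.values()) + 1
    return len(step) / schedule_length


if __name__ == "__main__":
    # Hypothetical six-operation graph: c needs a and b, d needs c, e needs a.
    example = {"c": ["a", "b"], "d": ["c"], "e": ["a"], "f": []}
    schedule = asap_schedule(example)
    print(schedule)
    print(average_parallelism(schedule))
```

A single topological pass is enough here because the input has no cycles. A graph with back edges would first need some loop-breaking step, which is the kind of heuristic the abstract says a DG avoids.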

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 14, Issue 4
December 2021
165 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3483341
Editor: Deming Chen

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 13 September 2021
• Accepted: 1 May 2021
• Revised: 1 April 2021
• Received: 1 February 2021

Qualifiers

• research-article
• Refereed