research-article

Adaptive Clock Management of HLS-generated Circuits on FPGAs

Published: 14 December 2022

Abstract

In this article, we present Syncopation, a performance-boosting fine-grained timing analysis and adaptive clock management technique for High-Level Synthesis-generated circuits implemented on Field-Programmable Gate Arrays. The key idea is to use the HLS scheduling information along with the placement and routing results to determine the worst-case timing path for individual clock cycles. By adjusting the clock period on a cycle-by-cycle basis, we can increase performance of an HLS-generated circuit. Our experiments show that Syncopation improves performance by 3.2% (geomean) across all benchmarks (up to 47%). In addition, by employing targeted synthesis techniques along with Syncopation, we can achieve 10.3% performance improvement (geomean) across all benchmarks (up to 50%). Syncopation instrumentation is implemented entirely in soft logic without requiring alterations to the HLS-synthesis toolchain or changes to the FPGA, and has been validated on real hardware.
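The per-cycle idea in the abstract can be illustrated with a small sketch. It is not the Syncopation implementation. It assumes each HLS schedule state has a known worst-case path delay (as post-place-and-route timing analysis would report) and that the clock generator offers a small discrete set of periods. The names `static_time`, `adaptive_time`, and all delay values below are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def static_time(trace, state_delay_ns):
    """Total time with one fixed clock: every cycle uses the slowest state's delay."""
    period = max(state_delay_ns.values())
    return period * len(trace)


def adaptive_time(trace, state_delay_ns, available_periods_ns):
    """Total time when each cycle uses the shortest available period
    that still covers the worst-case delay of the state active in that cycle."""
    periods = sorted(available_periods_ns)
    total = 0.0
    for state in trace:
        idx = bisect_left(periods, state_delay_ns[state])
        if idx == len(periods):
            raise ValueError(f"no available period covers state {state}")
        total += periods[idx]
    return total


# Hypothetical per-state worst-case delays (ns) and selectable clock periods (ns).
delays = {"S0": 4.0, "S1": 9.5, "S2": 6.0}
periods = [5.0, 7.5, 10.0]
# Hypothetical sequence of schedule states visited during one execution.
trace = ["S0"] + ["S2"] * 8 + ["S1"]

t_static = static_time(trace, delays)
t_adaptive = adaptive_time(trace, delays, periods)
print(f"static: {t_static} ns, adaptive: {t_adaptive} ns")
```

In this made-up trace, only one cycle exercises the slow state, so most cycles can run with a shorter period. The benefit in a real design depends on how often the long paths are active and on how fine-grained the available periods are.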


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 4
December 2022
476 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3540252
Editor: Deming Chen

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 14 December 2022
• Online AM: 30 May 2022
• Accepted: 20 February 2022
• Revised: 21 December 2021
• Received: 14 September 2021

Qualifiers

• research-article
• Refereed