Abstract
In this article, we present Syncopation, a fine-grained timing analysis and adaptive clock management technique that boosts the performance of circuits generated by High-Level Synthesis (HLS) and implemented on Field-Programmable Gate Arrays (FPGAs). The key idea is to combine the HLS scheduling information with the placement and routing results to determine the worst-case timing path for each individual clock cycle. By adjusting the clock period on a cycle-by-cycle basis, we can increase the performance of an HLS-generated circuit. Our experiments show that Syncopation improves performance by 3.2% (geomean) across all benchmarks, and by up to 47%. When targeted synthesis techniques are employed along with Syncopation, the improvement rises to 10.3% (geomean) across all benchmarks, and up to 50%. The Syncopation instrumentation is implemented entirely in soft logic, requires no alterations to the HLS-synthesis toolchain or changes to the FPGA, and has been validated on real hardware.
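The cycle-by-cycle idea can be illustrated with a small back-of-the-envelope sketch. It is not the Syncopation implementation: the state names, per-state delays, and execution trace below are invented for illustration, and the model assumes the clock period can be set to exactly the worst-case delay of whichever HLS schedule state is active, whereas a real clock generator would offer only a limited set of periods.

```python
from typing import Dict, List


def fixed_clock_time(state_delay_ns: Dict[str, float], trace: List[str]) -> float:
    """Total time when every cycle uses the period of the slowest state."""
    period = max(state_delay_ns.values())
    return period * len(trace)


def adaptive_clock_time(state_delay_ns: Dict[str, float], trace: List[str]) -> float:
    """Total time when each cycle uses the worst-case delay of its own state."""
    return sum(state_delay_ns[state] for state in trace)


def speedup_percent(state_delay_ns: Dict[str, float], trace: List[str]) -> float:
    """Performance gain of the per-cycle clock over the fixed clock, in percent."""
    fixed = fixed_clock_time(state_delay_ns, trace)
    adaptive = adaptive_clock_time(state_delay_ns, trace)
    return (fixed / adaptive - 1.0) * 100.0


# Hypothetical worst-case path delay (ns) per schedule state, as might be
# obtained by combining the HLS schedule with post-place-and-route timing.
delays = {"S0": 8.0, "S1": 10.0, "S2": 9.0}
# Hypothetical sequence of states visited at run time, one per clock cycle.
trace = ["S0", "S1", "S2", "S2"]

fixed_ns = fixed_clock_time(delays, trace)        # 4 cycles at 10 ns each
adaptive_ns = adaptive_clock_time(delays, trace)  # 8 + 10 + 9 + 9 ns
gain = speedup_percent(delays, trace)
print(f"fixed: {fixed_ns} ns, adaptive: {adaptive_ns} ns, gain: {gain:.1f}%")
```

In this toy model the gain depends entirely on how often the circuit spends cycles in states whose worst-case path is shorter than the global critical path, which is consistent with the wide spread between the geomean and best-case improvements reported above.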