Abstract
In today’s FPGA-based soft-processors, integer division is one of the slowest instructions. Other arithmetic operations complete in a low single-digit number of cycles, whereas radix-2 division has a fixed latency of 32 cycles. Given that today’s soft-processors typically implement only radix-2 division, if they support hardware division at all, there is significant potential to improve the performance of integer dividers.
In this work, we present Quick-Div, a set of high-performance, data-dependent, variable-latency integer dividers for FPGA-based soft-processors. We compare them to various radix-N dividers and analyze their latency and resource usage. In addition, we analyze the frequency scaling of these divider designs when (1) treated as a stand-alone unit and (2) integrated into a high-performance soft-processor. We also provide a theoretical analysis of the different dividers’ behaviour and develop a new, better-performing Quick-Div variant called Quick-radix-4. Experimental results show that our Quick-radix-4 design can achieve up to 6.8× better performance and 6.1× better performance-per-LUT than the radix-2 divider for applications such as random number generation. Even when division operations constitute as little as 1% of all executed instructions, Quick-radix-4 provides a performance uplift of 16% compared to the radix-2 divider.
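To make the contrast between fixed and data-dependent latency concrete, the following Python sketch models a fixed-latency radix-2 restoring divider next to a generic early-terminating variant that skips quotient bits known to be zero from the operands' bit lengths. This is an illustration under stated assumptions, not the Quick-Div design itself: the function names, the 32-bit default width, and the cycle model (one cycle per iteration, with no cost charged for the leading-zero count) are all hypothetical.

```python
def fixed_radix2_divide(dividend, divisor, width=32):
    """Restoring radix-2 division: always performs `width` iterations."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, then try to subtract the divisor.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, width  # cycles == width (assumed model)


def variable_latency_divide(dividend, divisor):
    """Generic early-terminating shift-subtract division (an assumption,
    not the paper's algorithm): iterate only over quotient bits that
    can be non-zero, as determined by the operands' bit lengths."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    shift = dividend.bit_length() - divisor.bit_length()
    if shift < 0:
        return 0, dividend, 0  # dividend < divisor: no iterations needed
    remainder = dividend
    quotient = 0
    cycles = 0
    for i in range(shift, -1, -1):
        cycles += 1
        trial = divisor << i  # divisor aligned to the current quotient bit
        quotient <<= 1
        if remainder >= trial:
            remainder -= trial
            quotient |= 1
    return quotient, remainder, cycles


if __name__ == "__main__":
    print(fixed_radix2_divide(100, 7))      # (14, 2, 32)
    print(variable_latency_divide(100, 7))  # (14, 2, 5)
```

In this simplified model the iteration count depends on the difference in operand bit lengths rather than on the datapath width, which is the general sense in which a divider's latency can be data-dependent.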
Index Terms
Quick-Div: Rethinking Integer Divider Design for FPGA-based Soft-processors