research-article

Quick-Div: Rethinking Integer Divider Design for FPGA-based Soft-processors

Published: 04 February 2022

Abstract

In today’s FPGA-based soft-processors, one of the slowest instructions is integer division. Compared to the low single-digit latency of other arithmetic operations, the fixed 32-cycle latency of radix-2 division is substantially longer. Given that today’s soft-processors typically only implement radix-2 division—if they support hardware division at all—there is significant potential to improve the performance of integer dividers.
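To make the fixed 32-cycle figure concrete, the following is a minimal software sketch of textbook restoring radix-2 division for unsigned operands. It is an illustrative model rather than the hardware of any particular soft-processor: the function name `radix2_divide` and the `cycles` counter are assumptions introduced here, and one loop iteration stands in for one clock cycle.

```python
def radix2_divide(dividend: int, divisor: int, width: int = 32):
    """Restoring radix-2 division of unsigned integers.

    Returns (quotient, remainder, cycles). One quotient bit is resolved
    per iteration, so the iteration count always equals `width`.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")

    quotient = 0
    remainder = 0
    cycles = 0  # hypothetical cycle counter: one iteration per cycle
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep the result only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
        cycles += 1
    return quotient, remainder, cycles


if __name__ == "__main__":
    print(radix2_divide(100, 7))
```

In this model the iteration count is independent of the operand values, which is the property that variable-latency designs aim to improve on.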

In this work, we present a set of high-performance, data-dependent, variable-latency integer dividers for FPGA-based soft-processors that we call Quick-Div. We compare them to various radix-N dividers and provide a thorough analysis in terms of latency and resource usage. In addition, we analyze the frequency scaling for such divider designs when (1) treated as a stand-alone unit and (2) integrated as part of a high-performance soft-processor. Moreover, we provide additional theoretical analysis of different dividers’ behaviour and develop a new better-performing Quick-Div variant, called Quick-radix-4. Experimental results show that our Quick-radix-4 design can achieve up to 6.8× better performance and 6.1× better performance-per-LUT over the radix-2 divider for applications such as random number generation. Even in cases where division operations constitute as little as 1% of all executed instructions, Quick-radix-4 provides a performance uplift of 16% compared to the radix-2 divider.
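The idea of data-dependent, variable latency can be illustrated with a small sketch. The code below is not the Quick-Div algorithm, whose details are not given in the abstract; it is a generic early-start scheme, assumed here for illustration only, that skips the quotient bits known to be zero from the operands' bit lengths. The name `early_exit_divide` and the returned step count are hypothetical, and the step count is an iteration count of this model, not a measured hardware latency.

```python
def early_exit_divide(dividend: int, divisor: int, width: int = 32):
    """Unsigned radix-2 division that skips leading zero quotient bits.

    Returns (quotient, remainder, steps), where `steps` depends on the
    operand values instead of always being `width`.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")

    # The quotient needs at most this many bits; higher bits are zero.
    steps = max(dividend.bit_length() - divisor.bit_length() + 1, 0)

    # Pre-load the high dividend bits. They form a value with fewer bits
    # than the divisor, so the partial remainder starts below the divisor.
    remainder = dividend >> steps
    quotient = 0
    for bit in range(steps - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, steps


if __name__ == "__main__":
    for a, b in [(100, 7), (5, 1000), (0xFFFFFFFF, 1)]:
        print(a, b, early_exit_divide(a, b))
```

Under this assumed scheme, small dividends or large divisors finish in a few steps, while the worst case (a full-width dividend divided by 1) still takes `width` steps.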

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 15, Issue 3
September 2022
353 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3508070
Editor: Deming Chen

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 4 February 2022
• Accepted: 1 November 2021
• Revised: 1 October 2021
• Received: 1 May 2021

Qualifiers

• research-article
• Refereed