Abstract
Approximate arithmetic operators, such as adders and multipliers, are increasingly used to meet the energy and performance requirements of resource-constrained embedded systems. However, most available approximate operators are designed with an application-agnostic methodology, and their efficacy can only be evaluated by deploying them in the target applications. Furthermore, the existing libraries of approximate operators do not share a standard approximation-induction policy for designing new operators according to an application’s accuracy and performance constraints. These limitations also hinder the use of machine learning models to explore and identify approximate operators that meet an application’s requirements. In this work, we present a generic design methodology for implementing application-specific approximate arithmetic operators on FPGAs. Our technique uses the lookup tables (LUTs) and carry chains of FPGAs to implement approximate operators according to an input configuration. For instance, for an accurate \( M \times N \) multiplier that uses \( K \) LUTs, our methodology uses \( K \)-bit configurations to design \( 2^K \) approximate multipliers. We then use various machine learning models to evaluate and select the configurations that satisfy an application’s accuracy and performance constraints. We have evaluated the proposed methodology on three benchmark applications: biomedical signal processing, image processing, and artificial neural networks (ANNs). For these applications, the proposed methodology yields more non-dominated approximate multipliers, with a better hypervolume contribution, than state-of-the-art designs.
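The idea of a \( K \)-bit configuration space, and of selecting non-dominated designs by hypervolume, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the AppAxO implementation. It assumes that each partial-product bit of an unsigned \( M \times N \) multiplier comes from its own hypothetical "LUT" (so \( K = M \cdot N \)), that a configuration bit of 0 removes that LUT's output, and it uses mean absolute error and the number of kept LUTs as stand-ins for the application accuracy and performance metrics. Real FPGA mappings pack several partial products into each LUT and use the carry chain, which this toy model ignores. All function names are hypothetical.

```python
# Toy model (hypothetical, not the paper's FPGA mapping): partial-product bit
# a_i AND b_j is produced by its own "LUT" k = i * n + j, so K = m * n.
# Config bit k = 1 keeps that LUT; bit k = 0 removes it (its output becomes 0).


def approx_multiply(a: int, b: int, config: int, m: int, n: int) -> int:
    """Unsigned m x n multiply that sums only the partial-product bits kept by config."""
    result = 0
    for i in range(m):
        for j in range(n):
            k = i * n + j  # index of the hypothetical LUT for a_i * b_j
            if (config >> k) & 1:
                result += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return result


def evaluate(config: int, m: int, n: int) -> tuple[float, int]:
    """Return (mean absolute error over all inputs, number of kept LUTs)."""
    total = 0
    for a in range(1 << m):
        for b in range(1 << n):
            total += abs(a * b - approx_multiply(a, b, config, m, n))
    mean_error = total / (1 << (m + n))
    luts_kept = bin(config).count("1")  # crude area/performance proxy
    return mean_error, luts_kept


def non_dominated(points: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """Keep the unique points that no other point dominates (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated and p not in front:
            front.append(p)
    return sorted(front)


def hypervolume_2d(front, ref) -> float:
    """Area dominated by a 2-D minimization front and bounded by reference point ref."""
    volume = 0.0
    prev_y = ref[1]
    for x, y in sorted(front):  # ascending error, so descending LUT count on a front
        if x < ref[0] and y < prev_y:
            volume += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return volume


# Usage: exhaustively explore a 3 x 3 multiplier (K = 9, so 2**9 = 512 configurations).
m, n = 3, 3
K = m * n
configs = range(1 << K)
points = [evaluate(c, m, n) for c in configs]
front = non_dominated(points)
exact = evaluate((1 << K) - 1, m, n)  # all LUTs kept: the accurate multiplier
ref = (max(p[0] for p in points) + 1, K + 1)
print(len(configs), exact)
print(front)
print(hypervolume_2d(front, ref))
```

Exhaustive enumeration like this is only practical for very small \( K \). For realistic operator sizes, the abstract's approach is to use machine learning models to estimate the metrics of candidate configurations instead of characterizing all \( 2^K \) designs.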
AppAxO: Designing Application-specific Approximate Operators for FPGA-based Embedded Systems