research-article

AppAxO: Designing Application-specific Approximate Operators for FPGA-based Embedded Systems

Published: 28 May 2022

Abstract

Approximate arithmetic operators, such as adders and multipliers, are increasingly used to satisfy the energy and performance requirements of resource-constrained embedded systems. However, most available approximate operators follow an application-agnostic design methodology, so their efficacy can only be evaluated by employing them in the target applications. Furthermore, the available libraries of approximate operators do not share a standard approximation-induction policy for designing new operators according to an application’s accuracy and performance constraints. These limitations also hinder the use of machine learning models to explore and select approximate operators according to an application’s requirements. In this work, we present a generic design methodology for implementing FPGA-based application-specific approximate arithmetic operators. Our proposed technique utilizes the lookup tables and carry chains of FPGAs to implement approximate operators according to input configurations. For instance, for an \( \text{M} \times \text{N} \) accurate multiplier utilizing K lookup tables, our methodology uses K-bit configurations to design \( 2^K \) approximate multipliers. We then use various machine learning models to evaluate and select configurations that satisfy application accuracy and performance constraints. We have evaluated the proposed methodology on three benchmark applications: biomedical signal processing, image processing, and ANNs. For these benchmark applications, the proposed design methodology yields more non-dominated approximate multipliers, with better hypervolume contribution, than state-of-the-art designs.
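The idea of a bit-string configuration that selects one of \( 2^K \) approximate multipliers can be illustrated with a small behavioural sketch. This is not the LUT-level mapping used in the article: it assumes, purely for illustration, that each configuration bit enables or disables one partial-product bit of an unsigned multiplier (so K = M·N here), uses the number of enabled bits as a hypothetical stand-in for resource cost, and filters non-dominated points exhaustively rather than with machine learning models. All function names are hypothetical.

```python
from itertools import product


def approx_multiply(a, b, config, m=4, n=4):
    """Unsigned m x n multiply that keeps only the enabled partial-product bits.

    Assumption: configuration bit (i * n + j) gates the term a_i & b_j.
    An all-ones configuration reproduces the accurate product.
    """
    result = 0
    for i in range(m):
        for j in range(n):
            if (config >> (i * n + j)) & 1:
                result += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return result


def mean_abs_error(config, m=4, n=4):
    """Mean absolute error of a configuration over all input pairs."""
    total = 0
    for a, b in product(range(1 << m), range(1 << n)):
        total += abs(a * b - approx_multiply(a, b, config, m, n))
    return total / ((1 << m) * (1 << n))


def non_dominated(points):
    """Keep the points that no other point dominates (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    M, N = 2, 2
    K = M * N  # one configuration bit per partial-product bit (assumption)
    # (cost proxy, error) for each of the 2**K configurations
    candidates = [
        (bin(c).count("1"), mean_abs_error(c, M, N)) for c in range(1 << K)
    ]
    front = sorted(set(non_dominated(candidates)))
    print(len(candidates), "configurations;", len(front), "non-dominated points")
```

Exhaustive enumeration is only practical for very small K; for larger operators the design space grows as \( 2^K \), which is why the abstract turns to learned models to estimate and select configurations.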



Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 3
May 2022
365 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3530307
Editor: Tulika Mitra

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 28 May 2022
• Online AM: 11 February 2022
• Accepted: 1 January 2022
• Revised: 1 October 2021
• Received: 1 August 2021

Qualifiers

• research-article
• Refereed
