
qLUT: Input-Aware Quantized Table Lookup for Energy-Efficient Approximate Accelerators

Published: 27 September 2017

Abstract

Approximate computing has emerged as a popular design paradigm for optimizing the performance and energy consumption of error-resilient applications in domains such as machine learning, graphics, and data analytics. Numerous techniques for approximate computing have been proposed at different layers of the system stack, from circuits to architecture to software. In this work, we propose a new technique, called quantized table lookup, for approximating the meta-functions used in the core computational kernels of error-resilient applications. In contrast to prior work that directly approximates the functionality of the meta-functions, the proposed technique instead approximates the input data to the meta-functions by quantizing them to a much smaller set of values that we call quantized inputs. The small number of quantized inputs enables us to completely replace the energy-intensive arithmetic units in the meta-function with small and energy-efficient lookup tables (called quantized lookup tables, or qLUTs) that contain precomputed output values corresponding to the quantized inputs. The proposed approximation technique is not only highly generic, but also inherently quality-configurable and input-aware. Quality-configurability and input-awareness are achieved by modulating the size of the qLUT and by judiciously selecting the values of the quantized inputs based on the statistics of the original input data. To evaluate the proposed technique, we have implemented the dominant meta-functions of nine error-resilient application benchmarks as quantized-table-lookup-based hardware accelerators using 45 nm technology. Experimental results demonstrate average energy savings of 46% at the application level for a minimal (<1%) loss in output quality.
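The idea described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design or their level-selection algorithm, which the abstract does not specify. It assumes a single-input meta-function and picks the quantized inputs as representatives of equal-population bins of sample data, one plausible way to use input statistics. All names (`choose_levels`, `build_qlut`, `qlut_eval`) are hypothetical.

```python
import bisect


def choose_levels(samples, k):
    """Pick up to k quantized input values from sample data.

    Assumed rule (not from the paper): split the sorted samples into k
    equal-population bins and use the middle sample of each bin.
    """
    ordered = sorted(samples)
    n = len(ordered)
    levels = []
    for i in range(k):
        idx = min(n - 1, (2 * i + 1) * n // (2 * k))
        levels.append(ordered[idx])
    # Remove duplicates that appear when the data has few distinct values.
    return sorted(set(levels))


def build_qlut(func, levels):
    """Precompute the meta-function output for every quantized input."""
    return [func(v) for v in levels]


def nearest_index(levels, x):
    """Return the index of the quantized input closest to x."""
    pos = bisect.bisect_left(levels, x)
    if pos == 0:
        return 0
    if pos == len(levels):
        return len(levels) - 1
    # Ties go to the lower neighbour.
    return pos if levels[pos] - x < x - levels[pos - 1] else pos - 1


def qlut_eval(levels, table, x):
    """Approximate func(x) with a table lookup instead of arithmetic."""
    return table[nearest_index(levels, x)]


def mean_abs_error(func, samples, k):
    """Average lookup error over the samples for a table of size k."""
    levels = choose_levels(samples, k)
    table = build_qlut(func, levels)
    return sum(abs(func(x) - qlut_eval(levels, table, x)) for x in samples) / len(samples)
```

In this sketch the table size `k` plays the role of the quality knob: a larger table gives a smaller approximation error at the cost of more storage, and the choice of levels follows the distribution of the sample inputs.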

