Abstract
Approximate computing has emerged as a popular design paradigm for optimizing the performance and energy consumption of error-resilient applications in domains such as machine learning, graphics, and data analytics. Numerous techniques for approximate computing have been proposed at different layers of the system stack, from circuits to architecture to software. In this work, we propose a new technique, called quantized table lookup, for approximating the meta-functions used in the core computational kernels of error-resilient applications. In contrast to prior work that directly approximates the functionality of the meta-functions, the proposed technique approximates the input data to the meta-functions by quantizing them to a much smaller set of values that we call quantized inputs. The small number of quantized inputs enables us to completely replace the energy-intensive arithmetic units in the meta-function with small, energy-efficient lookup tables (called quantized lookup tables, or qLUTs) that contain precomputed output values corresponding to the quantized inputs. The proposed approximation technique is not only highly generic, but also inherently quality-configurable and input-aware. Quality-configurability and input-awareness are achieved by modulating the size of the qLUT and by selecting the values of the quantized inputs judiciously based on the statistics of the original input data. To evaluate the proposed technique, we have implemented the dominant meta-functions of nine error-resilient application benchmarks as quantized-table-lookup-based hardware accelerators using 45 nm technology. Experimental results demonstrate average energy savings of 46% at the application level for minimal (<1%) loss in output quality.
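The idea of quantized table lookup can be illustrated in software with a minimal sketch. This is not the authors' hardware design: the names `build_qlut`, `qlut_lookup`, and `max_error` are hypothetical, the quantile-based choice of quantized inputs is only one assumed way to use input statistics, and nearest-value matching is an assumed lookup rule. The sketch shows the two knobs the abstract describes: the table size and the placement of the quantized inputs.

```python
import bisect
import math


def build_qlut(samples, func, size):
    """Build a quantized lookup table from representative input samples.

    Assumption: quantized inputs are taken at evenly spaced sample quantiles,
    so regions where inputs are dense receive more table entries.
    """
    if size < 1 or not samples:
        raise ValueError("need at least one sample and one table entry")
    ordered = sorted(samples)
    n = len(ordered)
    picks = {ordered[min(n - 1, int((i + 0.5) * n / size))] for i in range(size)}
    levels = sorted(picks)                 # the quantized inputs
    table = [func(q) for q in levels]      # precomputed outputs, one per level
    return levels, table


def qlut_lookup(levels, table, x):
    """Return the precomputed output of the quantized input nearest to x."""
    i = bisect.bisect_left(levels, x)
    if i == 0:
        return table[0]
    if i == len(levels):
        return table[-1]
    # Pick whichever neighbouring level is closer to x.
    if x - levels[i - 1] <= levels[i] - x:
        return table[i - 1]
    return table[i]


def max_error(levels, table, func, xs):
    """Worst-case absolute error of the table against the exact function."""
    return max(abs(func(x) - qlut_lookup(levels, table, x)) for x in xs)


# Illustrative data only: tanh stands in for a meta-function.
samples = [i / 100 for i in range(-300, 301)]
coarse = build_qlut(samples, math.tanh, 8)    # small table, lower quality
fine = build_qlut(samples, math.tanh, 64)     # larger table, higher quality
```

Comparing `max_error(*coarse, math.tanh, samples)` with `max_error(*fine, math.tanh, samples)` shows the quality knob: a larger table gives a smaller worst-case error at the cost of more storage.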
qLUT: Input-Aware Quantized Table Lookup for Energy-Efficient Approximate Accelerators