Abstract
Deliberate use of approximate computing has recently become an active research area. Observing that many application programs from different domains can tolerate less-than-perfect accuracy, existing techniques trade program output accuracy for performance and energy savings. While these works provide point solutions, they leave three critical questions about approximate computing unanswered, especially in the context of dropping (skipping) costly data accesses: (i) What is the maximum potential of skipping (i.e., not performing) data accesses under a given inaccuracy bound? (ii) Can the data accesses to drop be chosen randomly, or is it critical to be architecture aware (i.e., to identify the costliest data accesses in a given architecture)? (iii) Do two executions that skip the same number of data accesses always result in the same output quality (error)? This paper first answers these questions using ten multithreaded workloads and then, motivated by the negative answer to the third question, presents a program slicing-based approach that identifies the set of data accesses to drop such that (a) the resulting performance and energy benefits are maximized and (b) the execution remains within the error (inaccuracy) bound specified by the user. Our slicing-based approach applies backward slicing first and then forward slicing to decide which data accesses to drop. Our experimental evaluations using ten multithreaded workloads show that, averaged over all benchmark programs, an 8.8% performance improvement and a 13.7% energy saving are possible with an error bound of 2%; the corresponding improvements rise to 15% and 25%, respectively, when the error bound is raised to 4%.
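To make the slicing idea more concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a toy statement-level dependence graph and treats a backward slice as everything the output depends on, and a forward slice as everything a load feeds into. The function names (`reachable`, `build_graphs`, `candidate_drops`), the per-load `cost` table, and the `max_influence` threshold are all hypothetical. In particular, counting affected statements is only a crude stand-in for the user-specified error bound described in the abstract.

```python
from collections import defaultdict


def reachable(graph, start):
    """Return every node reachable from `start` in `graph` (start included)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def build_graphs(deps):
    """`deps` maps each statement to the statements whose values it reads."""
    backward = defaultdict(set)  # statement -> statements it depends on
    forward = defaultdict(set)   # statement -> statements that depend on it
    for stmt, sources in deps.items():
        for src in sources:
            backward[stmt].add(src)
            forward[src].add(stmt)
    return backward, forward


def candidate_drops(deps, output, loads, cost, max_influence):
    """Pick loads whose influence on `output` is small, costliest first.

    Hypothetical heuristic: influence = number of output-relevant statements
    (other than the load itself) found in the load's forward slice.
    """
    backward, forward = build_graphs(deps)
    relevant = reachable(backward, output)      # backward slice of the output
    picks = []
    for load in loads:
        if load not in relevant:
            influence = 0                       # load never reaches the output
        else:
            affected = reachable(forward, load) & relevant  # forward slice
            influence = len(affected) - 1
        if influence <= max_influence:
            picks.append(load)
    # Architecture awareness is modeled only through the assumed cost table.
    return sorted(picks, key=lambda load: -cost[load])


# Toy program: out = f(t2), t2 = g(t1, ld_c), t1 = h(ld_a, ld_b), dbg = k(ld_d)
deps = {
    "t1": ["ld_a", "ld_b"],
    "t2": ["t1", "ld_c"],
    "out": ["t2"],
    "dbg": ["ld_d"],
}
cost = {"ld_a": 200, "ld_b": 20, "ld_c": 300, "ld_d": 150}  # assumed latencies
drops = candidate_drops(deps, "out", list(cost), cost, max_influence=2)
print(drops)
```

In this toy graph, `ld_d` never reaches `out`, and `ld_c` touches fewer output-relevant statements than `ld_a` or `ld_b`, so only those two loads pass the threshold. A real implementation would replace the statement count with a measured or estimated output error and would derive costs from the target architecture.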
Index Terms
Architecture-Aware Approximate Computing