Architecture-Aware Approximate Computing

Published: 19 June 2019

Abstract

Deliberate use of approximate computing has recently been an active research area. Observing that many application programs from different domains can live with less-than-perfect accuracy, existing techniques try to trade program output accuracy for performance and energy savings. While these works provide point solutions, they leave three critical questions regarding approximate computing unanswered, especially in the context of dropping/skipping costly data accesses: (i) What is the maximum potential of skipping (i.e., not performing) data accesses under a given inaccuracy bound? (ii) Can we identify the data accesses to drop randomly, or is being architecture aware (i.e., identifying the costliest data accesses in a given architecture) critical? (iii) Do two executions that skip the same number of data accesses always result in the same output quality (error)? This paper first provides answers to these questions using ten multithreaded workloads, and then, motivated by the negative answer to the third question, presents a program slicing-based approach that identifies the set of data accesses to drop such that (i) the resulting performance/energy benefits are maximized and (ii) the execution remains within the error (inaccuracy) bound specified by the user. Our slicing-based approach first uses backward slicing and then forward slicing to decide the set of data accesses to drop. Our experimental evaluations using ten multithreaded workloads show that, averaged over all benchmark programs, an 8.8% performance improvement and a 13.7% energy saving are possible when the error bound is set to 2%, and the corresponding improvements rise to 15% and 25%, respectively, when the error bound is raised to 4%.
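
The negative answer to the third question can be seen on a toy example. The sketch below is only an illustration and is not the paper's slicing-based method: it assumes a reduction (a plain sum) in which a skipped data access simply contributes 0, and the array contents and the helper name `relative_error` are hypothetical. Two runs skip the same number of accesses, yet only one of them stays within a 2% error bound.

```python
def relative_error(values, skipped):
    """Relative error of a sum when the loads at indices in `skipped` are dropped.

    Assumption for illustration: a dropped access contributes 0 to the result.
    """
    exact = sum(values)
    approx = sum(v for i, v in enumerate(values) if i not in skipped)
    return abs(exact - approx) / abs(exact)


# Hypothetical data: two large contributions and six small ones.
values = [100.0, 1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0]

drop_small = {1, 2}  # skips two accesses with little effect on the output
drop_large = {0, 4}  # skips two accesses that dominate the output

err_small = relative_error(values, drop_small)
err_large = relative_error(values, drop_large)

error_bound = 0.02  # the 2% bound mentioned in the abstract
print(f"skip {len(drop_small)} small-impact accesses: error = {err_small:.4f}, "
      f"within bound: {err_small <= error_bound}")
print(f"skip {len(drop_large)} large-impact accesses: error = {err_large:.4f}, "
      f"within bound: {err_large <= error_bound}")
```

Both runs drop exactly two accesses, so the count of skipped accesses alone does not determine output quality; which accesses are dropped matters, which is the motivation the abstract gives for selecting them deliberately.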
