REDSPY: Exploring Value Locality in Software

Published: 04 April 2017

Abstract

Complex code bases with several layers of abstraction contain abundant inefficiencies that increase execution time. Value redundancy is one kind of inefficiency, in which the same values are repeatedly computed, stored, or retrieved over the course of execution. Not all redundancies can be easily detected or eliminated by compiler optimization passes because of the inherent limitations of static analysis.

Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. We have developed REDSPY, a fine-grained profiler that pinpoints and quantifies redundant operations in program executions. Value redundancy may occur over time at the same location or across adjacent locations, and thus it has both temporal and spatial locality. REDSPY identifies both temporal and spatial value locality. Furthermore, REDSPY can identify values that are approximately the same, which exposes optimization opportunities in HPC codes that often rely on floating-point computations. REDSPY provides intuitive optimization guidance by apportioning redundancies to their provenance: source lines and execution calling contexts. REDSPY pinpointed a dramatically high volume of redundancies in programs that have been optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmark suite, and NWChem, a production computational chemistry code. Guided by REDSPY, we were able to eliminate redundancies, which resulted in significant speedups.
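To make the three notions in the abstract concrete (temporal locality, spatial locality, and approximate equality), here is a minimal toy sketch in Python. It is not REDSPY's implementation: the trace format, the function name `profile_stores`, the unit-stride notion of "adjacent", and the use of a relative tolerance are all assumptions made for illustration only. The sketch replays a list of store events and charges each redundant store to the context (for example, a source line) that issued it.

```python
import math
from collections import defaultdict


def profile_stores(trace, rel_tol=0.0):
    """Count redundant stores per context in a toy store trace.

    trace   -- iterable of (context, address, value) tuples (hypothetical format)
    rel_tol -- relative tolerance; 0.0 means values must match exactly
    Returns (temporal, spatial), two dicts mapping context -> count.
    """
    def same(a, b):
        # With rel_tol=0.0 and abs_tol=0.0 this reduces to exact equality.
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)

    shadow = {}                   # address -> last value stored there
    temporal = defaultdict(int)   # value rewritten to the same address
    spatial = defaultdict(int)    # value equal to the one at address - 1

    for context, address, value in trace:
        if address in shadow and same(shadow[address], value):
            temporal[context] += 1
        elif address - 1 in shadow and same(shadow[address - 1], value):
            spatial[context] += 1
        shadow[address] = value   # update the shadow copy after checking

    return dict(temporal), dict(spatial)


# Hypothetical trace: contexts are written as "file:line" strings.
trace = [
    ("init.c:10", 100, 0.0),
    ("init.c:10", 101, 0.0),        # same value as neighbouring address 100
    ("loop.c:42", 100, 0.0),        # same value already at address 100
    ("loop.c:42", 101, 1.0),
    ("loop.c:42", 101, 1.0000001),  # nearly the same value as before
]

exact = profile_stores(trace)
approx = profile_stores(trace, rel_tol=1e-6)
```

With exact matching, the last store is not counted because `1.0000001` differs from `1.0`; with the relative tolerance of `1e-6` it is counted as a temporal redundancy, which mirrors the approximate matching the abstract describes for floating-point codes.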
