Abstract
Complex code bases with several layers of abstraction harbor abundant inefficiencies that inflate execution time. Value redundancy is one such inefficiency: the same values are repeatedly computed, stored, or retrieved over the course of an execution. Not all redundancies can be detected or eliminated by compiler optimization passes, owing to the inherent limitations of static analysis.
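As a deliberately simplified illustration of a redundant computation, here is a small sketch. It is not REDSPY's implementation. It replays a hypothetical trace of `(site, operands)` events, where `site` stands in for an instruction. An execution is flagged as redundant when the site receives exactly the operands it received on its previous execution, because a deterministic operation must then recompute the same result. The function name and trace format are assumptions made for this example.

```python
from collections import Counter


def count_redundant_computations(events):
    """Count executions that repeat the previous operands at the same site.

    events: iterable of (site, operands) pairs in execution order.
            `site` is a hypothetical stand-in for an instruction address;
            `operands` is a tuple of the input values it consumed.
    A deterministic operation fed the same operands as on its previous
    execution recomputes the same result, so that execution is redundant.
    Returns a Counter mapping site -> number of redundant executions.
    """
    last_operands = {}     # site -> operands seen on its previous execution
    redundant = Counter()
    for site, operands in events:
        if site in last_operands and last_operands[site] == operands:
            redundant[site] += 1
        last_operands[site] = operands
    return redundant


# A loop that squares an unchanged value four times, then a new value.
trace = [("square", (3.0,))] * 4 + [("square", (5.0,))]
print(count_redundant_computations(trace))  # Counter({'square': 3})
```

Only the first of the four identical executions does useful work, and the new operand 5.0 resets the comparison. Comparing against the previous execution alone is a simplification. It captures repetition over time, but it misses a value that is recomputed after an intervening different one.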
Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies hidden in complex programs. We have developed REDSPY, a fine-grained profiler that pinpoints and quantifies redundant operations in program executions. Value redundancy may occur over time at the same locations or across adjacent locations; that is, it exhibits temporal and spatial locality. REDSPY identifies both temporal and spatial value locality. Furthermore, REDSPY can identify values that are approximately the same, enabling optimization opportunities in HPC codes, which often rely on floating-point computations. REDSPY provides intuitive optimization guidance by apportioning redundancies to their provenance: source lines and execution calling contexts. REDSPY pinpointed a dramatically high volume of redundancies in programs that have been optimization targets for decades, such as the SPEC CPU2006 suite, the Rodinia benchmark suite, and NWChem, a production computational chemistry code. Guided by REDSPY, we eliminated redundancies and obtained significant speedups.
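A small offline model can illustrate the temporal/spatial distinction and the approximate mode. The sketch is not REDSPY's implementation, which observes instructions and operands in real executions and attributes redundancies to full calling contexts. It relies on several assumptions:

- A hypothetical trace of `(line, addr, value)` store events stands in for instrumentation.
- The source line is the only provenance.
- "Adjacent" means the preceding `elem_size`-byte element.
- `rel_tol` is an illustrative tolerance, not REDSPY's actual threshold.

```python
import math
from collections import Counter


def profile_stores(trace, elem_size=8, rel_tol=0.0):
    """Classify redundant stores and attribute them to source lines.

    trace:     iterable of (line, addr, value) store events in execution
               order (a hypothetical trace format).
    elem_size: byte distance to the "adjacent" location checked for
               spatial redundancy (assumed here to be one array element).
    rel_tol:   relative tolerance passed to math.isclose; 0.0 demands
               exact equality, a small positive value also counts
               nearly equal floating-point values as redundant.
    Returns (temporal, spatial), two Counters mapping line -> count.
    """
    shadow = {}           # addr -> last value stored at that address
    temporal = Counter()  # store rewrote the value already at addr
    spatial = Counter()   # store wrote the value held by the neighbour
    for line, addr, value in trace:
        old = shadow.get(addr)
        if old is not None and math.isclose(old, value, rel_tol=rel_tol):
            temporal[line] += 1
        else:
            # Only stores that are not temporally redundant are checked
            # for spatial redundancy, so each store is counted at most once.
            left = shadow.get(addr - elem_size)
            if left is not None and math.isclose(left, value, rel_tol=rel_tol):
                spatial[line] += 1
        shadow[addr] = value
    return temporal, spatial


trace = [
    (10, 0x1000, 1.0),        # first write: not redundant
    (10, 0x1008, 1.0),        # same value as neighbour 0x1000: spatial
    (20, 0x1000, 1.0),        # rewrites 1.0 at 0x1000: temporal
    (20, 0x1000, 1.0000001),  # redundant only under approximation
]
print(profile_stores(trace))                # (Counter({20: 1}), Counter({10: 1}))
print(profile_stores(trace, rel_tol=1e-6))  # (Counter({20: 2}), Counter({10: 1}))
```

With exact comparison, the last store differs from 1.0 by about 1e-7 and is not reported. With `rel_tol=1e-6`, it counts as a second temporally redundant store at line 20. This is the kind of extra opportunity that an approximate mode surfaces in floating-point code.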
REDSPY: Exploring Value Locality in Software
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems