The CSI Framework for Compiler-Inserted Program Instrumentation

Published: 19 December 2017

Abstract

The CSI framework provides comprehensive static instrumentation that a compiler can insert into a program-under-test so that dynamic-analysis tools - memory checkers, race detectors, cache simulators, performance profilers, code-coverage analyzers, etc. - can observe and investigate runtime behavior. Heretofore, tools based on compiler instrumentation would each separately modify the compiler to insert their own instrumentation. In contrast, CSI inserts a standard collection of instrumentation hooks into the program-under-test. Each CSI-tool is implemented as a library that defines relevant hooks, and the remaining hooks are "nulled" out and elided during either compile-time or link-time optimization, resulting in instrumented runtimes on par with custom instrumentation. CSI allows many compiler-based tools to be written as simple libraries without modifying the compiler, lowering the bar for the development of dynamic-analysis tools.
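
The division of labor described above can be modeled in a few lines. The sketch below is a conceptual illustration in Python, not the CSI API: the hook names (`on_func_entry`, `on_load`, and so on), the `NullTool` and `CallCounter` classes, and the hand-instrumented `instrumented_sum` function are all hypothetical. It shows only the shape of the idea: the program-under-test calls a fixed set of hooks, a tool supplies the hooks it cares about, and every other hook stays a no-op. In this model the no-op hooks are still called at run time, whereas the abstract states that CSI elides them during compile-time or link-time optimization.

```python
# Conceptual model only: hook names and signatures are hypothetical,
# not taken from the CSI API.

class NullTool:
    """Default tool: every hook in the standard collection is a no-op."""

    def on_func_entry(self, func_id):
        pass

    def on_func_exit(self, func_id):
        pass

    def on_load(self, load_id, addr):
        pass

    def on_store(self, store_id, addr):
        pass


class CallCounter(NullTool):
    """A tool that defines only the one hook it needs."""

    def __init__(self):
        self.calls = 0

    def on_func_entry(self, func_id):
        self.calls += 1


def instrumented_sum(values, tool):
    """Stand-in for a program-under-test after hooks have been inserted."""
    tool.on_func_entry(0)
    total = 0
    for i, v in enumerate(values):
        tool.on_load(0, i)  # hook placed before each (modeled) memory read
        total += v
    tool.on_func_exit(0)
    return total


null_tool = NullTool()
counter = CallCounter()
result_null = instrumented_sum([1, 2, 3], null_tool)
result_counted = instrumented_sum([1, 2, 3], counter)
print(result_null, result_counted, counter.calls)
```

The same instrumented function runs unchanged under both tools; only the library of hook definitions differs, which is the property that lets tools be written without touching the compiler.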

We have defined a standard API for CSI and modified LLVM to insert CSI hooks into the compiler's internal representation (IR) of the program. The API organizes IR objects - such as functions, basic blocks, and memory accesses - into flat and compact ID spaces, which not only simplifies the building of tools, but surprisingly enables faster maintenance of IR-object data than do traditional hash tables. CSI hooks contain a "property" parameter that allows tools to customize behavior based on static information without introducing overhead. CSI provides "forensic" tables that tools can use to associate IR objects with source-code locations and to relate IR objects to each other.
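
A small sketch may make the ID-space and property ideas concrete. Everything named here is an assumption made for illustration: the `on_load` hook, the `PROP_IS_CONST` flag, the object count `NUM_LOADS`, and the contents of the `load_source` table are hypothetical and do not come from the CSI API. The sketch assumes the tool knows how many load sites exist, so per-object data can live in a flat array indexed by ID instead of a hash table keyed by address. The property check below runs at run time; the abstract's claim of no added overhead relies on the compiler, which this Python model does not reproduce.

```python
# Hypothetical sketch: IDs, the property flag, and the hook name are
# illustrative assumptions, not the CSI API.

NUM_LOADS = 4          # assumed to be known to the tool up front
PROP_IS_CONST = 0x1    # hypothetical static-property bit

# Flat, compact ID space: the load ID is the array index, so no hashing.
load_counts = [0] * NUM_LOADS

# Hypothetical "forensic" table: load ID -> (source file, line number).
load_source = [("demo.c", 10), ("demo.c", 11), ("demo.c", 11), ("demo.c", 14)]


def on_load(load_id, prop):
    """Count a load unless static information marks it as uninteresting."""
    if prop & PROP_IS_CONST:
        return
    load_counts[load_id] += 1


# Simulated stream of hook calls from an instrumented program.
for load_id, prop in [(0, 0), (1, 0), (1, 0), (2, PROP_IS_CONST), (3, 0)]:
    on_load(load_id, prop)

# Map the most frequently executed load back to its source location.
hottest = max(range(NUM_LOADS), key=lambda i: load_counts[i])
print(load_counts, load_source[hottest])
```

Because IDs are dense integers, the counter table and the source-location table share the same index, so relating runtime data to source code is a direct lookup.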

To evaluate the efficacy of CSI, we implemented six demonstration CSI-tools. One of our studies shows that compiling with CSI and linking with the "null" CSI-tool produces a tool-instrumented executable that is as fast as the original uninstrumented code. Another study, using a CSI port of Google's ThreadSanitizer, shows that the CSI-tool rivals the performance of Google's custom compiler-based implementation. All other demonstration CSI-tools slow down the execution of the program-under-test by less than 70%.
