research-article

A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code

Published: 26 January 2011

Abstract

A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied to parallel code, for reasons of correctness. This paper presents a technique to automatically, aggressively, yet safely apply sequentially-sound data-flow transformations, without change, to shared-memory programs. The technique is founded on the notion of program references being "siloed" on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references.

The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all the tested programs over all thread counts.
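
The intuition behind siloed references can be illustrated with a small sketch. The Python below is not the paper's algorithm or its formal definition; it is a toy redundant-load elimination over one thread's straight-line code, under the simplifying assumption that a shared variable is interference-free exactly while the lock that guards it is held. The instruction format, the function name `eliminate_redundant_loads`, and the `guarded_by` map are all hypothetical names introduced for this illustration.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]  # e.g. ("load", "t1", "x"), ("lock", "m")


def eliminate_redundant_loads(code: List[Instr],
                              guarded_by: Dict[str, str]) -> List[Instr]:
    """Toy redundant-load elimination for one thread's straight-line code.

    Assumptions (illustrative only):
    - guarded_by maps a shared variable to the lock that every thread
      holds whenever it accesses that variable;
    - temporaries are assigned at most once.
    """
    held = set()     # locks currently held on this path
    available = {}   # variable -> temporary that already holds its value
    out = []
    for ins in code:
        op = ins[0]
        if op == "lock":
            held.add(ins[1])
            out.append(ins)
        elif op == "unlock":
            held.discard(ins[1])
            # Other threads may now change variables guarded by this lock.
            available = {v: t for v, t in available.items()
                         if guarded_by.get(v) != ins[1]}
            out.append(ins)
        elif op == "load":
            _, dst, var = ins
            if var in available:
                # Interference-free since the earlier load: reuse its value.
                out.append(("copy", dst, available[var]))
            else:
                out.append(ins)
                # Remember the value only while the guarding lock is held.
                if guarded_by.get(var) in held:
                    available[var] = dst
        elif op == "store":
            available.pop(ins[1], None)  # ("store", var, src) kills the value
            out.append(ins)
        else:
            out.append(ins)
    return out


if __name__ == "__main__":
    example = [
        ("load", "t0", "y"),
        ("load", "t1", "y"),   # y is unguarded: second load is kept
        ("lock", "m"),
        ("load", "t2", "x"),
        ("load", "t3", "x"),   # x is guarded by m, which is held
        ("unlock", "m"),
        ("lock", "m"),
        ("load", "t4", "x"),   # m was released in between: load is kept
        ("unlock", "m"),
    ]
    for ins in eliminate_redundant_loads(example, {"x": "m"}):
        print(ins)
```

In this sketch the sequential transformation itself is unchanged; the only addition is the bookkeeping that decides where it may fire. The second load of `x` inside the first locked region becomes a copy, while the repeated load of the unguarded `y` and the load of `x` after the lock has been released and reacquired are left alone.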

Published in

ACM SIGPLAN Notices, Volume 46, Issue 1 (POPL '11)
January 2011
624 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1925844

POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
January 2011
652 pages
ISBN: 9781450304900
DOI: 10.1145/1926385

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
