On a Technique for Transparently Empowering Classical Compiler Optimizations on Multithreaded Code

Published: 01 June 2012

Abstract

A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied on parallel code, for reasons of correctness. This article presents a technique to automatically, aggressively, yet safely apply sequentially-sound data-flow transformations, without change, on shared-memory programs. The technique is founded on the notion of program references being “siloed” on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references.

The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all the tested programs over all thread counts.

Published in

ACM Transactions on Programming Languages and Systems, Volume 34, Issue 2
June 2012
212 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/2220365

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 June 2012
• Accepted: 1 March 2012
• Revised: 1 November 2011
• Received: 1 May 2011

      Qualifiers

      • research-article
      • Research
      • Refereed
