Abstract
A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied to parallel code, for reasons of correctness. This article presents a technique to automatically, aggressively, yet safely apply sequentially-sound data-flow transformations, without change, to shared-memory programs. The technique is founded on the notion of program references being “siloed” on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references.
The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all the tested programs over all thread counts.
On a Technique for Transparently Empowering Classical Compiler Optimizations on Multithreaded Code