Abstract
A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be directly applied to parallel code, for reasons of correctness. This paper presents a technique to automatically, aggressively, yet safely apply sequentially sound data-flow transformations, without change, to shared-memory programs. The technique is founded on the notion of program references being "siloed" on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references.
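To make the "siloed" intuition concrete, here is a minimal C/POSIX-threads sketch. It is not the paper's algorithm. It hand-applies one sequential transformation, redundant-load elimination, to a reference that is interference-free along a path. The names `counter_a`, `counter_b`, `a_lock`, `b_lock`, `transfer_original`, `transfer_optimized`, and `run` are hypothetical. The safety argument assumes that the program is data-race free and that `counter_a` is accessed only while `a_lock` is held.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state. Assumed discipline: counter_a is only ever
   accessed while holding a_lock, and counter_b only while holding b_lock,
   so the program is data-race free. */
static long counter_a = 0;
static long counter_b = 0;
static pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b_lock = PTHREAD_MUTEX_INITIALIZER;

/* As written: counter_a is loaded twice, with lock/unlock calls on a
   different mutex in between. A compiler that treats those calls as
   opaque must assume they may change counter_a, so it keeps the reload. */
static long transfer_original(void) {
    pthread_mutex_lock(&a_lock);
    long before = counter_a;
    pthread_mutex_lock(&b_lock);
    counter_b += before;
    pthread_mutex_unlock(&b_lock);
    long after = counter_a;          /* reload of counter_a */
    counter_a = after + 1;
    pthread_mutex_unlock(&a_lock);
    return after;
}

/* Hand-applied redundant-load elimination. This thread holds a_lock along
   the whole path, and (by assumption) no other thread may touch counter_a
   without it, so the reload can reuse `before`. */
static long transfer_optimized(void) {
    pthread_mutex_lock(&a_lock);
    long before = counter_a;
    pthread_mutex_lock(&b_lock);
    counter_b += before;
    pthread_mutex_unlock(&b_lock);
    counter_a = before + 1;          /* reload eliminated */
    pthread_mutex_unlock(&a_lock);
    return before;
}

/* Each worker performs 1000 transfers using the selected version. */
static void *worker(void *arg) {
    int optimized = *(const int *)arg;
    for (int i = 0; i < 1000; i++) {
        if (optimized)
            transfer_optimized();
        else
            transfer_original();
    }
    return NULL;
}

/* Runs up to 8 workers from a fresh state and returns the final counter_a. */
static long run(int nthreads, int optimized) {
    pthread_t tids[8];
    if (nthreads > 8)
        nthreads = 8;
    counter_a = 0;
    counter_b = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &optimized);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter_a;
}
```

With 4 workers, either version should leave `counter_a` at 4000. Because `a_lock` serializes the transfers, each transfer observes a distinct value 0 through 3999, so `counter_b` ends at 4000 × 3999 / 2. The nested `b_lock` calls are what make the reload look necessary to a conservative compiler. Under the stated locking discipline, however, `counter_a` cannot be disturbed by other threads on that path.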
The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all tested programs and thread counts.
A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code
POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages