Abstract
Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic parallelization. Privatization relieves these constraints by replicating data structures, thus enabling scalable parallelization. Prior privatization schemes are limited to arrays and scalar variables because they are sensitive to the layout of dynamic data structures. This work presents Privateer, the first fully automatic privatization system to handle dynamic and recursive data structures, even in languages with unrestricted pointers. To reduce sensitivity to memory layout, Privateer speculatively separates memory objects. Privateer's lightweight runtime system validates speculative separation and speculative privatization to ensure correct parallel execution. Privateer enables automatic parallelization of general-purpose C/C++ applications, yielding a geomean whole-program speedup of 11.4x over best sequential execution on 24 cores, while non-speculative parallelization yields only 0.93x.
Speculative separation for privatization and reductions. In PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation.