skip to main content
research-article

Speculative separation for privatization and reductions

Published:11 June 2012Publication History
Skip Abstract Section

Abstract

Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic parallelization. Privatization relieves these constraints by replicating data structures, thus enabling scalable parallelization. Prior privatization schemes are limited to arrays and scalar variables because they are sensitive to the layout of dynamic data structures. This work presents Privateer, the first fully automatic privatization system to handle dynamic and recursive data structures, even in languages with unrestricted pointers. To reduce sensitivity to memory layout, Privateer speculatively separates memory objects. Privateer's lightweight runtime system validates speculative separation and speculative privatization to ensure correct parallel execution. Privateer enables automatic parallelization of general-purpose C/C++ applications, yielding a geomean whole-program speedup of 11.4x over best sequential execution on 24 cores, while non-speculative parallelization yields only 0.93x.

References

  1. E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. H.-J. Boehm. Simple garbage-collector-safety. In Proceedings of the ACM SIGPLAN 1996 conference on Programming Language Design and Implementation, pages 89--98, New York, NY, 1996. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Chen, J. Lin, X. Dai, W.-C. Hsu, and P.-C. Yew. Data dependence profiling for speculative optimizations. In E. Duesterwald, editor, Compiler Construction, volume 2985 of Lecture Notes in Computer Science, pages 2733--2733. Springer Berlin / Heidelberg, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  5. W. Y. Chen, S. A. Mahlke, and W. W. Hwu. Tolerating first level memory access latency in high-performance systems. In Proceedings of the 1992 International Conference on Parallel Processing, pages 36--43, Boca Raton, Florida, 1992. CRC Press.Google ScholarGoogle Scholar
  6. R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451--490, October 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. H. Dang, H. Yu, and L. Rauchwerger. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium, pages 20--29, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Distributed Computing, pages 194--208, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 223--234, New York, NY, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Feautrier. Array expansion. In Proceedings of the 2nd International Conference on Supercomputing, pages 429--441. ACM, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. Gabbay and A. Mendelson. Can program profiling support value prediction? In Proceedings of the 30th annual ACM/IEEE International Symposium on Microarchitecture, pages 270--280, Washington, DC, 1997. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, 2001. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Kim, N. P. Johnson, J. W. Lee, S. A. Mahlke, and D. I. August. Automatic speculative DOALL for clusters. Proceedings of the 10th IEEE/ACM International Symposium on Code Generation and Optimization, March 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 107--120, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the Annual International Symposium on Code Generation and Optimization, pages 75--86, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Array-data flow analysis and its use in array privatization. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 2--15, New York, NY, 1993. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Mehrara, J. Hao, P.-C. Hsu, and S. Mahlke. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Ni, A. Welc, A.-R. Adl-Tabatabai, M. Bach, S. Berkowits, J. Cownie, R. Geva, S. Kozhukow, R. Narayanaswamy, J. Olivier, S. Preis, B. Saha, A. Tal, and X. Tian. Design and implementation of transactional constructs for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, pages 195--212, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. G. Quiünones, C. Madriles, J. Sánchez, P. Marcuello, A. Gonzleáz, and D. M. Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In Proceedings of the 2005 ACM SIGPLAN conference on Programming Language Design and Implementation, pages 269--279, New York, NY, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the Annual International Symposium on Code Generation and Optimization, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Rauchwerger and D. Padua. The Privatizing DOALL test: A run-time technique for DOALL loop identification and array privatization. In Proceedings of the 8th International Conference on Supercomputing, pages 33--43, New York, NY, 1994. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. ACM SIGPLAN Notices, 30(6):218--232, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Rus, G. He, C. Alias, and L. Rauchwerger. Region Array SSA. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 43--52. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Rus, L. Rauchwerger, and J. Hoeflinger. Hybrid analysis: static & dynamic memory reference analysis. International Journal of Parallel Programming, 31:251--283, August 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Standard Performance Evaluation Corporation. http://spec.org.Google ScholarGoogle Scholar
  26. The GNU Project. GNU Binutils. http://gnu.org/software/binutils.Google ScholarGoogle Scholar
  27. C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Trimaran. Trimaran Benchmarks Packages. http://trimaran.org.Google ScholarGoogle Scholar
  29. P. Tu and D. A. Padua. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 500--521, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 49--59, Washington, DC, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. Vanbroekhoven, G. Janssens, M. Bruynooghe, and F. Catthoor. A practical dynamic single assignment transformation. ACM Transactions on Design Automation of Electronic Systems, 12, September 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. Vandierendonck, S. Rul, and K. De Bosschere. The Paralax infrastructure: Automatic parallelization with a helping hand. In Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Veeraraghavan, D. Lee, B.Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. Doubleplay: parallelizing sequential logging and replay. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 15--26, New York, NY, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Q. Wu, A. Pyatakov, A. N. Spiridonov, E. Raman, D. W. Clark, and D. I. August. Exposing memory access regularities using objectrelative memory profiling. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008Google ScholarGoogle Scholar

Index Terms

  1. Speculative separation for privatization and reductions

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 47, Issue 6
      PLDI '12
      June 2012
      534 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2345156
      Issue’s Table of Contents
      • cover image ACM Conferences
        PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2012
        572 pages
        ISBN:9781450312059
        DOI:10.1145/2254064

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 June 2012

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!