Abstract
Speculative parallelization divides a sequential program into possibly parallel tasks and permits these tasks to run in parallel if and only if they show no dependences with each other. The parallelization is safe in that a speculative execution always produces the same output as the sequential execution.
In this paper, we present the dependence hint, an interface that lets a user specify possible dependences between possibly parallel tasks. Dependence hints may be incorrect or incomplete, but they never change the program output. The interface extends Cytron's do-across and recent OpenMP ordering primitives and makes them safe and safely composable. We use it to express conditional and partial parallelism and to parallelize large legacy code. The prototype system is implemented as a software library. It improves performance by nearly 10 times on average on current multicore machines for 8 programs, including 5 SPEC benchmarks.
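The safety property stated above can be illustrated with a toy model. The sketch below is not the paper's library or its interface: the names `Tracked`, `run_sequential`, `run_speculative`, and the `(earlier, later)` hint pairs are hypothetical, tasks are assumed to be deterministic, and the "parallel" phase is simulated one task at a time. Each unhinted task runs against the initial snapshot, tasks commit in sequential order, and a task is re-executed if it read a key that an earlier task wrote. A hinted task simply waits until its predecessors have committed.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

State = Dict[str, int]


class Tracked:
    """View of shared state that records reads and buffers writes."""

    def __init__(self, snapshot: State) -> None:
        self._snapshot = snapshot
        self.reads: Set[str] = set()
        self.writes: State = {}

    def get(self, key: str) -> int:
        if key in self.writes:          # a task sees its own writes
            return self.writes[key]
        self.reads.add(key)
        return self._snapshot[key]

    def set(self, key: str, value: int) -> None:
        self.writes[key] = value


Task = Callable[[Tracked], None]
Hint = Tuple[int, int]  # hypothetical: (earlier task, later task that may depend on it)


def run_sequential(tasks: List[Task], state: State) -> State:
    state = dict(state)
    for task in tasks:
        view = Tracked(state)
        task(view)
        state.update(view.writes)
    return state


def run_speculative(tasks: List[Task], state: State,
                    hints: FrozenSet[Hint] = frozenset()) -> Tuple[State, int]:
    """Return (final state, number of mis-speculated tasks that were re-run)."""
    state = dict(state)
    start = dict(state)                       # snapshot seen by speculative tasks
    hinted = {later for (_, later) in hints}  # these tasks wait instead of speculating

    # Speculation phase (simulated): unhinted tasks run on the initial snapshot.
    views: Dict[int, Tracked] = {}
    for i, task in enumerate(tasks):
        if i not in hinted:
            views[i] = Tracked(start)
            task(views[i])

    # Commit phase: strictly in sequential order.
    dirty: Set[str] = set()                   # keys written by committed tasks
    reruns = 0
    for i, task in enumerate(tasks):
        view = views.get(i)
        if view is None or view.reads & dirty:
            if view is not None:
                reruns += 1                   # speculation failed, discard its result
            view = Tracked(state)             # run on the up-to-date state
            task(view)
        state.update(view.writes)
        dirty.update(view.writes)
    return state, reruns


def inc_x(s: Tracked) -> None:
    s.set("x", s.get("x") + 1)


def double_y(s: Tracked) -> None:
    s.set("y", s.get("y") * 2)


def sum_z(s: Tracked) -> None:               # depends on both earlier tasks
    s.set("z", s.get("x") + s.get("y"))


TASKS: List[Task] = [inc_x, double_y, sum_z]
INITIAL: State = {"x": 1, "y": 2, "z": 0}
```

In this model, running `run_speculative(TASKS, INITIAL)` with no hints, with the correct hints `{(0, 2), (1, 2)}`, or with the incorrect hint `{(0, 1)}` yields the same final state as `run_sequential(TASKS, INITIAL)`. The hints only change how much speculative work is wasted, which is the sense in which incorrect or incomplete hints do not change the program output.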
Safe parallel programming using dynamic dependence hints
OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications