Abstract
Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and input-dependent parallelism. These difficulties led us to build a software system for behavior-oriented parallelization (BOP), which allows a program to be parallelized based on partial information about program behavior, for example, from a user who has read only part of the source code, or from a profiling tool that has examined only one or a few executions.
The basis of BOP is programmable software speculation, in which a user or an analysis tool marks possibly parallel regions in the code and the run-time system executes these regions speculatively. It is imperative to protect the entire address space during speculation. The main goal of the paper is to demonstrate that this general protection can be made cost-effective through three novel techniques: programmable speculation, critical-path minimization, and value-based correctness checking. On a recently acquired multi-core, multi-processor PC, the BOP system reduced the end-to-end execution time by integer factors for a Lisp interpreter, a data compressor, a language parser, and a scientific library, with no change to the underlying hardware or operating system.
Software behavior oriented parallelization. In PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation.