Software behavior oriented parallelization

Published: 10 June 2007

Abstract

Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and input-dependent parallelism. These difficulties led us to build a software system for behavior oriented parallelization (BOP), which allows a program to be parallelized based on partial information about program behavior: for example, information from a user who has read only part of the source code, or from a profiling tool that has examined only one or a few executions.

The basis of BOP is programmable software speculation, in which a user or an analysis tool marks possibly parallel regions in the code and the run-time system executes these regions speculatively. It is imperative to protect the entire address space during speculation. The main goal of the paper is to demonstrate that this general protection can be made cost-effective by three novel techniques: programmable speculation, critical-path minimization, and value-based correctness checking. On a recently acquired multi-core, multi-processor PC, the BOP system reduced the end-to-end execution time by integer factors for a Lisp interpreter, a data compressor, a language parser, and a scientific library, with no change to the underlying hardware or operating system.
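The idea of speculating on a marked region and then checking correctness by value can be illustrated with a small sketch. This is not the BOP implementation: the abstract describes protecting the entire address space, whereas this toy protects only a Python dictionary, and the names `TrackedState` and `run_speculatively` are hypothetical. It assumes the program splits into a lead task followed by a continuation, that both operate on a flat dictionary of immutable values, and that the lead task does not raise.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackedState:
    """Private copy of the shared state that records reads and buffers writes."""

    def __init__(self, base):
        self._data = dict(base)  # shallow snapshot; values assumed immutable
        self.reads = {}          # key -> value observed from the snapshot
        self.writes = {}         # key -> value written speculatively

    def __getitem__(self, key):
        if key in self.writes:   # reading our own write is not a dependence
            return self.writes[key]
        value = self._data[key]
        self.reads.setdefault(key, value)
        return value

    def __setitem__(self, key, value):
        self.writes[key] = value


def run_speculatively(state, lead, spec):
    """Run lead(state) while spec runs on a snapshot; return True if spec committed."""
    snapshot = TrackedState(state)  # taken before the lead task starts
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(spec, snapshot)  # speculative task, isolated from state
        lead(state)                           # lead task updates the real state
        future.result()                       # wait for speculation to finish

    # Value-based check: speculation is valid if every value it read is
    # still the value in the real state, even if the lead task rewrote it.
    valid = all(k in state and state[k] == v for k, v in snapshot.reads.items())
    if valid:
        state.update(snapshot.writes)  # commit the buffered writes
        return True
    spec(state)                        # misspeculation: redo sequentially
    return False
```

Because the check compares values rather than addresses, a lead task that rewrites a location with the same value does not force re-execution. When the check fails, the continuation is simply rerun on the real state, so the final result matches sequential execution in both cases.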

Published in

ACM SIGPLAN Notices, Volume 42, Issue 6
Proceedings of the 2007 PLDI conference
June 2007
491 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273442

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN: 9781595936332
DOI: 10.1145/1250734

Copyright © 2007 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
