DOI: 10.1145/1806651.1806663
research-article

Speculative parallelization using state separation and multiple value prediction

Published: 05 June 2010

ABSTRACT

With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread-level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized due to the presence of dependences. To address this problem, different solutions have been proposed. Some make the optimistic assumption that such dependences rarely manifest themselves at runtime; however, when this assumption is violated, recovery incurs very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are unable to speed up the execution. In this paper we propose a compiler technique that uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of the values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed simultaneously in separate memory states. After the execution, if one of these versions is correct (i.e., its predicted values are found to be correct), then we merge its state with the state of the preceding iteration, because the dependence between the two iterations has been correctly resolved. The memory states of the other, incorrect versions are completely discarded. Based on this idea, we further propose a runtime adaptive scheme that not only delivers good performance but also achieves better CPU utilization. We conducted experiments on 10 benchmark programs on a real machine. The results show that our technique achieves a 1.7x speedup on average across all benchmarks.
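The core mechanism described above can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' compiler-generated code: each speculative version of an iteration runs in its own OS process, whose separate address space stands in for the paper's separated memory states, and `iteration_body` is a hypothetical loop body that depends on a single live-in value.

```python
import multiprocessing as mp

def iteration_body(live_in):
    # Hypothetical loop body: its result depends on the cross-iteration
    # live-in value produced by the preceding iteration.
    return live_in * 2 + 1

def speculative_iteration(predictions, true_value):
    """Run one speculative version per predicted live-in value, each in a
    separate process (i.e., a separate memory state). Once the true value
    is known, commit the result of the version whose prediction matched
    and discard the rest; if every prediction missed, re-execute."""
    with mp.Pool(len(predictions)) as pool:
        results = pool.map(iteration_body, predictions)
    for predicted, result in zip(predictions, results):
        if predicted == true_value:   # prediction verified: merge this state
            return result
    return iteration_body(true_value) # all mispredicted: non-speculative redo
```

In the paper's scheme the "commit" is a merge of memory states rather than returning a value, and predictions come from a value predictor observing prior iterations; the sketch only shows the run-many, keep-one control structure.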


Published in

ISMM '10: Proceedings of the 2010 International Symposium on Memory Management
June 2010, 140 pages
ISBN: 9781450300544
DOI: 10.1145/1806651
General Chair: Jan Vitek; Program Chair: Doug Lea

Also in ACM SIGPLAN Notices, Volume 45, Issue 8 (ISMM '10)
August 2010, 129 pages
ISSN: 0362-1340; EISSN: 1558-1160
DOI: 10.1145/1837855

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 72 of 156 submissions, 46%
