ABSTRACT
With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread-level parallelism from sequential programs has become crucial for improving performance. However, many sequential programs cannot be easily parallelized because of dependences. Several solutions to this problem have been proposed. Some make the optimistic assumption that such dependences rarely manifest at runtime; when this assumption is violated, recovery incurs very large overhead. Others incur large synchronization or computation overhead when resolving the dependences. Consequently, previous techniques cannot speed up a loop with frequently arising cross-iteration dependences. In this paper we propose a compiler technique that uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration, each based on a different prediction of the values of the variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration execute simultaneously in separate memory states. After execution, if one of the versions is correct (i.e., its predicted values are found to be correct), we merge its state with the state of the preceding iteration, because the dependence between the two iterations has been correctly resolved. The memory states of the incorrect versions are discarded entirely. Based on this idea, we further propose a runtime adaptive scheme that not only delivers good performance but also achieves better CPU utilization. We conducted experiments on 10 benchmark programs on a real machine. The results show that our technique achieves an average speedup of 1.7x across all benchmarks.
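The key idea can be illustrated with a minimal Python sketch. It is not the paper's compiler implementation: the loop body, the `predict` callback, the write-log representation of a separate memory state, and all function names are assumptions made for illustration. Iteration i runs non-speculatively while several versions of iteration i+1 run with different guesses for the live-in value `x`. Each version records its writes privately, and only the version whose guess matches the actual live-out of iteration i is merged.

```python
from concurrent.futures import ThreadPoolExecutor


def body(i, x, data):
    """Hypothetical loop body: reads live-in x, returns (private writes, live-out)."""
    writes = {i: x * data[i]}                 # private write log, not shared memory
    next_x = x + 1 if data[i] % 2 else x      # cross-iteration dependence on x
    return writes, next_x


def run_sequential(data, x0):
    """Reference execution: one iteration after another."""
    committed, x = {}, x0
    for i in range(len(data)):
        writes, x = body(i, x, data)
        committed.update(writes)
    return committed, x


def run_speculative(data, x0, predict):
    """Run iteration i together with several guessed versions of iteration i+1."""
    committed, x, i, hits = {}, x0, 0, 0
    with ThreadPoolExecutor() as pool:
        while i < len(data):
            main = pool.submit(body, i, x, data)
            specs = {}
            if i + 1 < len(data):
                # One speculative version per distinct predicted live-in value.
                for guess in set(predict(x)):
                    specs[guess] = pool.submit(body, i + 1, guess, data)
            writes, actual = main.result()
            committed.update(writes)
            if actual in specs:
                # Correct prediction: merge that version's state, skip ahead.
                spec_writes, after = specs[actual].result()
                committed.update(spec_writes)
                x, i, hits = after, i + 2, hits + 1
            else:
                # All versions wrong (or none launched): their write logs are
                # never merged, and iteration i+1 runs again in the next round.
                x, i = actual, i + 1
    return committed, x, hits
```

With two candidate values per iteration, for example `predict = lambda x: [x, x + 1]`, every guess set in this toy loop contains the correct value, whereas a single last-value guess misses whenever `x` changes. Python threads here only model the control flow; a real system would run the versions on separate cores with isolated memory.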
Speculative parallelization using state separation and multiple value prediction. ISMM '10.