Research Article | Open Access

Compiler-Driven Software Speculation for Thread-Level Parallelism

Published: 22 December 2015

Abstract

Current parallelizing compilers can tackle applications that exercise regular access patterns on arrays or affine indices, where data dependencies can be expressed in a linear form. Unfortunately, there are cases in which independence between statements of code cannot be guaranteed, and the compiler then conservatively produces sequential code. Programs that involve extensive pointer use, irregular access patterns, and loops with an unknown number of iterations are examples of such cases. This limits the extraction of parallelism even where dependencies are rarely or never triggered at runtime. Speculative parallelism refers to methods employed during program execution that aim to produce a valid parallel execution schedule for programs that resist static parallelization. The motivation for this article is to review recent developments in the area of compiler-driven software speculation for thread-level parallelism and how they came about. The article is divided into two parts. The first part explains the fundamentals of speculative parallelization for thread-level parallelism, along with a categorization of the design choices involved in implementing such systems. These design choices include how speculative data is handled, how data dependence violations are detected and resolved, how correct data are made visible to other threads, and how speculative threads are scheduled. The second part is structured around those design choices, presenting advances and trends in the literature with reference to key developments in the area. Although the focus of the article is on software speculative parallelization, a section is dedicated to providing the interested reader with pointers and references for exploring related topics such as hardware thread-level speculation, transactional memory, and automatic parallelization.
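The design choices named above (buffering speculative state, detecting data dependence violations, and committing or re-executing iterations) can be illustrated with a minimal sketch of a lazy-versioning software TLS runtime. This is an illustrative assumption, not the interface of any system surveyed in the article: memory is modeled as a Python dictionary, each speculative iteration records a read set and a private write buffer, and commits happen in program order with a read-set/write-set intersection test for violations.

```python
import threading

def speculative_loop(iterations, body, data):
    """Run loop iterations speculatively in parallel against a shared
    snapshot; validate in original program order and re-execute any
    iteration whose reads were invalidated by an earlier commit."""
    results = [None] * len(iterations)

    def run(i, snapshot):
        # Each thread records the locations it read and buffers its
        # writes privately (lazy versioning: nothing is published yet).
        reads, writes = set(), {}
        body(iterations[i], snapshot, reads, writes)
        results[i] = (reads, writes)

    snapshot = dict(data)  # read-only view of pre-loop memory
    threads = [threading.Thread(target=run, args=(i, snapshot))
               for i in range(len(iterations))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Commit in program order; reading a location that an earlier,
    # already-committed iteration wrote is a dependence violation and
    # forces a sequential re-execution against up-to-date memory.
    committed_writes = set()
    for i in range(len(iterations)):
        reads, writes = results[i]
        if reads & committed_writes:            # violation detected
            reads, writes = set(), {}
            body(iterations[i], data, reads, writes)
        data.update(writes)                     # make writes visible
        committed_writes.update(writes)
    return data

# Irregular accesses through index arrays: independence cannot be
# proven statically, but no conflict actually occurs at runtime,
# so every iteration commits its speculative work.
data = {("a", j): j for j in range(8)}

def body(i, mem, reads, writes):
    reads.add(("a", i))                          # read a[i]
    writes[("a", (i + 4) % 8)] = mem[("a", i)] * 2  # write a[(i+4)%8]

speculative_loop(list(range(4)), body, data)
# data[("a", 4)] is now 0, data[("a", 5)] is 2, and so on.
```

When iterations do conflict (say, iteration i writes a location that iteration i+1 reads), the commit loop detects the stale read and falls back to in-order re-execution, which is the worst-case behavior the abstract alludes to.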



Reviews

Sergei Gorlatch

Compilers quickly reach their limits when parallelizing sequential code (for example, C++): static analysis often fails because of insufficient information about values that become known only at runtime, so compilers usually act conservatively and produce mostly sequential code. Efficient utilization of modern multicore processors is nevertheless desirable, and one way to reach this goal is to let the compiler speculate. Thread-level speculation (TLS) is a software technique that allows the compiler to generate parallel code even when correct parallel execution cannot be guaranteed at compile time. To preserve correctness, a runtime environment monitors the speculatively executing threads and, upon detecting an incorrect execution, performs a rollback and executes a correct (in the worst case, sequential) version of the code. Part 1 of the paper provides an overview of the requirements for realizing a TLS system. Readers unfamiliar with the topic, but with a basic background in parallel programming, are gently brought to the level of knowledge needed to understand TLS. Key components of a TLS system (for example, version management and conflict detection) are discussed in detail, and different implementation strategies are presented. The reader will recognize the complexity of the topic and quickly see where the challenges of TLS lie. Part 2 goes into more detail. Advances and research topics in TLS from the last two decades are gathered, and their pros and cons are reviewed and discussed. The knowledge gained in Part 1 is sufficient to follow this broad introduction to the state of the art in TLS. This part also contains a critical, if debatable, discussion of the applicability and performance of particular TLS systems, with further information on the advantages and drawbacks of the implementation decisions in each one.
The authors compare existing TLS systems with respect to their suitability for general-purpose applications and show the limitations of the different approaches, such as the need for synchronization, memory overhead, and scalability issues. Finally, the reader is introduced to advanced topics (such as distributed TLS) and pointed to important related areas (for example, transactional memory).

Online Computing Reviews Service
