ABSTRACT
Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are effective only in the scientific and data-parallel domains, where array-dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a lightweight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of the checking and locking overhead of conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, the strong atomicity requirements of generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.
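The idea of running loop iterations speculatively and keeping the commit phase separate from transaction execution can be illustrated with a small simulation. This is only a sketch under stated assumptions, not STMlite itself: the abstract gives no implementation details, so the names `Txn`, `Memory`, `try_commit`, and `run_loop`, the per-address version counters, and the chunk size are all hypothetical. The simulation is single-threaded: each chunk of iterations first executes against a snapshot with no locking, and a separate in-order commit step then validates the logged reads and writes back the buffered stores.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """One speculative loop iteration (hypothetical structure)."""
    iteration: int
    reads: dict = field(default_factory=dict)   # addr -> version observed
    writes: dict = field(default_factory=dict)  # addr -> buffered value


class Memory:
    """Shared state with one version counter per address (an assumption)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {addr: 0 for addr in self.data}

    def load(self, txn, addr):
        if addr in txn.writes:                  # read-your-own-write
            return txn.writes[addr]
        # Log the version seen on first read; nothing is locked here.
        txn.reads.setdefault(addr, self.version.get(addr, 0))
        return self.data[addr]

    def store(self, txn, addr, value):
        txn.writes[addr] = value                # buffered until commit


def try_commit(mem, txn):
    """Commit phase, kept apart from execution: validate, then write back."""
    for addr, seen in txn.reads.items():
        if mem.version.get(addr, 0) != seen:
            return False                        # an earlier iteration wrote addr
    for addr, value in txn.writes.items():
        mem.data[addr] = value
        mem.version[addr] = mem.version.get(addr, 0) + 1
    return True


def run_loop(mem, body, n, chunk=4):
    """Run body(mem, txn, i) for i in range(n); return the number of aborts."""
    aborts = 0
    for start in range(0, n, chunk):
        # Execution phase: every iteration in the chunk runs speculatively.
        batch = []
        for i in range(start, min(start + chunk, n)):
            txn = Txn(i)
            body(mem, txn, i)
            batch.append(txn)
        # Commit phase: validate and write back in original iteration order.
        for txn in batch:
            if not try_commit(mem, txn):
                aborts += 1
                retry = Txn(txn.iteration)      # re-execute on committed state
                body(mem, retry, txn.iteration)
                committed = try_commit(mem, retry)
                assert committed, "a serial re-execution cannot conflict"
    return aborts


def scale(mem, txn, i):
    """Independent iterations: a[i] = 2 * a[i]."""
    mem.store(txn, ("a", i), 2 * mem.load(txn, ("a", i)))


def accumulate(mem, txn, i):
    """Cross-iteration dependence: total += a[i]."""
    mem.store(txn, "total", mem.load(txn, "total") + mem.load(txn, ("a", i)))
```

With independent iterations such as `scale`, every transaction validates on its first attempt. With a loop-carried dependence such as `accumulate`, all but the first iteration of each chunk fail validation and are re-executed in order, so the result stays equal to the sequential one while the speculation buys nothing. That is the case profile-guided loop selection is meant to avoid.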
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
PLDI '09