DOI: 10.1145/1542476.1542495 · PLDI Conference Proceedings · Research article

Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Published: 15 June 2009

ABSTRACT

Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications, and the industry has already fallen short of the decades-old trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are effective only in the scientific and data-parallel domains, where array-dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to extend parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a lightweight software transactional memory model customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of the checking and locking overhead found in conventional software transactional memory models by decoupling the commit phase from the main transaction execution. Further, the strong atomicity requirements of generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.
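To make the idea of decoupling execution from commit more concrete, the following is a minimal, single-threaded sketch of speculative loop parallelization with a separate, in-order commit phase. It assumes lazy versioning (each iteration buffers its writes), read-set logging, and a single committer that validates iterations in loop order and re-executes any iteration that read a value an earlier iteration wrote. All names (`Txn`, `run_speculative_loop`, the `body` callback) are hypothetical, concurrent execution is only simulated by running every iteration against the same pre-loop snapshot, and none of this is claimed to be STMlite's actual design or data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Memory = Dict[str, int]


@dataclass
class Txn:
    """Speculative state for one loop iteration (hypothetical structure)."""
    index: int
    reads: Set[str] = field(default_factory=set)
    writes: Dict[str, int] = field(default_factory=dict)  # buffered, not yet visible

    def load(self, mem: Memory, addr: str) -> int:
        # Read-your-own-writes first; otherwise read memory and log the address.
        if addr in self.writes:
            return self.writes[addr]
        self.reads.add(addr)
        return mem.get(addr, 0)

    def store(self, addr: str, value: int) -> None:
        # Writes are only buffered; no locks are taken during execution.
        self.writes[addr] = value


def run_speculative_loop(n: int, body: Callable[[Txn, Memory, int], None],
                         mem: Memory) -> int:
    """Run iterations 0..n-1 speculatively, then commit them in loop order.

    Returns the number of iterations that had to be re-executed.
    """
    snapshot = dict(mem)
    # Execution phase: every iteration sees the same pre-loop snapshot, as if
    # all iterations had started concurrently. No validation happens here.
    txns = []
    for i in range(n):
        t = Txn(i)
        body(t, snapshot, i)
        txns.append(t)

    # Commit phase (decoupled): one committer validates and publishes in order.
    written_since_snapshot: Set[str] = set()
    reexecuted = 0
    for t in txns:
        if t.reads & written_since_snapshot:
            # Conflict: an earlier iteration wrote an address this one read.
            # Re-execute on committed memory, which is already sequentially
            # correct because every earlier iteration has committed.
            t = Txn(t.index)
            body(t, mem, t.index)
            reexecuted += 1
        mem.update(t.writes)
        written_since_snapshot.update(t.writes)
    return reexecuted


if __name__ == "__main__":
    # Independent iterations: a[i] = a[i] * 2, so no conflicts are expected.
    arr = {f"a{i}": i for i in range(4)}

    def double(t: Txn, m: Memory, i: int) -> None:
        t.store(f"a{i}", t.load(m, f"a{i}") * 2)

    print(run_speculative_loop(4, double, arr), arr)

    # Loop-carried dependence: s += a[i], so later iterations conflict.
    acc = {"s": 0, **{f"a{i}": i + 1 for i in range(4)}}

    def accumulate(t: Txn, m: Memory, i: int) -> None:
        t.store("s", t.load(m, "s") + t.load(m, f"a{i}"))

    print(run_speculative_loop(4, accumulate, acc), acc["s"])
```

In this sketch, the doubling loop commits with no re-executions, while the accumulation loop forces every iteration after the first to be re-executed yet still produces the sequential result (`s == 10`). That contrast is why a profile-guided framework would prefer to speculate only on loops whose cross-iteration dependences are rare in practice.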

