research-article

Energy-Efficient and High-Performance Lock Speculation Hardware for Embedded Multicore Systems

Published: 21 May 2015

Abstract

Embedded systems are becoming increasingly common in everyday life and, like their general-purpose counterparts, have shifted toward shared-memory multicore architectures. However, they are much more resource constrained, and because they often run on batteries, energy efficiency is critically important. In such systems, high concurrency is key to delivering satisfactory performance at low energy cost. Achieving it requires keeping the shared memory hierarchy consistent in a way that is cost-effective in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution that supports transparent lock speculation without requiring special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.

