Abstract
Embedded systems are becoming increasingly common in everyday life, and, like their general-purpose counterparts, they have shifted towards shared-memory multicore architectures. However, they are much more resource-constrained, and because they often run on batteries, energy efficiency is critically important. In such systems, high concurrency is key to delivering satisfactory performance at low energy cost. Achieving it requires maintaining consistency across the shared-memory hierarchy in a way that is cost-effective in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution that supports transparent lock speculation without requiring special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that, on resource-constrained platforms, lock speculation can improve both concurrency and energy efficiency, provided the underlying hardware support is carefully configured.
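The abstract mentions retry policies without detailing them. As a rough software analogy, and not Embedded-Spec's hardware mechanism (which the abstract does not describe), the sketch below shows the common lock-elision pattern: run the critical section speculatively a bounded number of times, and fall back to actually acquiring the lock after repeated aborts. The callbacks `attempt_speculative` and `run_with_lock`, the `max_attempts` parameter, and the `ElisionStats` counters are hypothetical names introduced for illustration. In hardware, an abort would instead be triggered by a conflicting access detected through the coherence protocol, with the speculative writes discarded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ElisionStats:
    """Hypothetical counters for one critical section's outcomes."""
    commits: int = 0    # speculative runs that completed without conflict
    aborts: int = 0     # speculative runs discarded because of a conflict
    fallbacks: int = 0  # times the lock was actually acquired


def run_critical_section(
    attempt_speculative: Callable[[], bool],
    run_with_lock: Callable[[], None],
    max_attempts: int,
    stats: ElisionStats,
) -> str:
    """Elide the lock for up to max_attempts tries, then acquire it.

    attempt_speculative() stands in for hardware speculation: it returns
    True if the critical section committed and False if it aborted
    (its speculative writes are assumed to have been discarded).
    """
    for _ in range(max_attempts):
        if attempt_speculative():
            stats.commits += 1
            return "speculative"
        stats.aborts += 1
    # Retry budget exhausted: serialize by really taking the lock.
    run_with_lock()
    stats.fallbacks += 1
    return "locked"


if __name__ == "__main__":
    outcomes = iter([False, True])  # first attempt conflicts, second commits
    stats = ElisionStats()
    mode = run_critical_section(lambda: next(outcomes), lambda: None, 3, stats)
    print(mode, stats)  # speculative ElisionStats(commits=1, aborts=1, fallbacks=0)
```

A fixed attempt budget is only one possible policy; the abstract states that a range of contention management and retry policies was explored, but their specifics are not given here.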
Energy-Efficient and High-Performance Lock Speculation Hardware for Embedded Multicore Systems