Edge-TM: Exploiting Transactional Memory for Error Tolerance and Energy Efficiency

Published: 27 September 2017

Abstract

Scaling of semiconductor devices has enabled higher levels of integration and performance improvements, at the price of making devices more susceptible to the effects of static and dynamic variability. Adding safety margins (guardbands) on the operating frequency or supply voltage prevents timing errors, but has a negative impact on performance and energy consumption. We propose Edge-TM, an adaptive hardware/software error management policy that (i) optimistically scales the voltage beyond the edge of safe operation for better energy savings and (ii) works in combination with a Hardware Transactional Memory (HTM)-based error recovery mechanism. The policy applies dynamic voltage scaling (DVS), with the frequency kept fixed, based on the feedback provided by HTM, which makes it simple and generally applicable. Experiments on an embedded platform show that our technique achieves a 57% energy improvement compared to using voltage guardbands and an extra 21-24% improvement over existing state-of-the-art error tolerance solutions, at a nominal area and time overhead.
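The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's policy, whose details are not given here. It only shows the general idea of lowering the supply voltage while few transactions abort because of errors, and raising it again when too many do. The class name, the millivolt bounds, the step size, and the 10% error-rate threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VoltageController:
    """Hypothetical feedback-driven voltage controller (illustrative only)."""

    # All values below are assumptions, not figures from the paper.
    v_min_mv: int = 600          # lowest voltage the controller may select
    v_max_mv: int = 1000         # guardbanded (safe) voltage
    step_mv: int = 50            # size of one voltage adjustment
    max_error_rate: float = 0.10 # tolerated fraction of error-induced aborts
    voltage_mv: int = 1000       # current voltage; frequency is assumed fixed

    def update(self, committed: int, error_aborts: int) -> int:
        """Adjust the voltage from one interval of transaction statistics.

        committed    -- transactions that committed in the interval
        error_aborts -- transactions rolled back because an error was detected
        """
        total = committed + error_aborts
        if total == 0:
            # No feedback in this interval: leave the voltage unchanged.
            return self.voltage_mv

        error_rate = error_aborts / total
        if error_rate > self.max_error_rate:
            # Too many rollbacks: step back toward the safe voltage.
            self.voltage_mv = min(self.v_max_mv, self.voltage_mv + self.step_mv)
        else:
            # Errors are rare enough: scale the voltage down optimistically.
            self.voltage_mv = max(self.v_min_mv, self.voltage_mv - self.step_mv)
        return self.voltage_mv


if __name__ == "__main__":
    ctrl = VoltageController()
    print(ctrl.update(committed=100, error_aborts=0))  # error-free interval
    print(ctrl.update(committed=80, error_aborts=20))  # error-heavy interval
```

In this sketch the only input is the count of committed and aborted transactions, which reflects the abstract's point that relying on HTM feedback keeps the policy simple. A real controller would also need to weigh the energy cost of re-executing aborted transactions against the savings from the lower voltage.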
