Low-Cost Memory Fault Tolerance for IoT Devices

Published: 27 September 2017

Abstract

IoT devices need reliable hardware at low cost, yet it is challenging to cope efficiently with both hard and soft faults in embedded scratchpad memories. To address this problem, we propose a two-step approach: FaultLink and Software-Defined Error-Localizing Codes (SDELC). FaultLink avoids hard faults found during testing by generating a custom-tailored application binary image for each individual chip. At software deployment time, FaultLink optimally packs small sections of program code and data into fault-free segments of the memory address space and generates a custom linker script for a lazy-linking procedure. At run time, SDELC deals with unpredictable soft faults via novel and inexpensive Ultra-Lightweight Error-Localizing Codes (UL-ELCs). These require fewer parity bits than single-error-correcting Hamming codes, yet they are more powerful than basic single-error-detecting parity: they localize single-bit errors to a specific chunk of a codeword. SDELC then heuristically recovers from these localized errors using a small embedded C library that exploits observable side information (SI) about the application’s memory contents. SI can take the form of redundant data (value locality), legal/illegal instructions, etc. Our combined FaultLink+SDELC approach improves min-VDD by up to 440 mV. With just three parity bits per 32-bit word, it correctly recovers from up to 90% of random single-bit soft faults in data and up to 70% in instructions.
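The localize-then-recover idea can be illustrated with a minimal sketch. It is not the authors' code or their embedded C library: it assumes one possible construction in which the 32-bit word is split into four 8-bit chunks and each chunk is assigned one of the four 3-bit syndromes of weight two or more, leaving the three weight-one syndromes for errors in the parity bits themselves. The names `CHUNK_SIGNATURES`, `encode`, `localize`, `candidates`, and `recover` are hypothetical, and the Hamming-distance score against neighboring words is only a stand-in for value-locality side information.

```python
# Illustrative sketch only: the chunk layout, signatures, and recovery heuristic
# are assumptions made for this example, not the paper's exact UL-ELC design.

CHUNK_BITS = 8
# One distinct 3-bit syndrome (weight >= 2) per 8-bit chunk of the 32-bit word.
CHUNK_SIGNATURES = (0b011, 0b101, 0b110, 0b111)


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def encode(word):
    """Compute the 3 parity bits for a 32-bit word."""
    parity = 0
    for chunk, signature in enumerate(CHUNK_SIGNATURES):
        byte = (word >> (chunk * CHUNK_BITS)) & 0xFF
        if popcount(byte) & 1:          # odd chunk parity toggles its signature
            parity ^= signature
    return parity


def localize(word, stored_parity):
    """Return None (no error seen), 'parity' (a parity bit flipped),
    or the index of the chunk that holds the single-bit error."""
    syndrome = encode(word) ^ stored_parity
    if syndrome == 0:
        return None
    if syndrome in CHUNK_SIGNATURES:
        return CHUNK_SIGNATURES.index(syndrome)
    return "parity"


def candidates(word, chunk):
    """All words reachable by flipping one bit inside the given chunk."""
    base = chunk * CHUNK_BITS
    return [word ^ (1 << (base + i)) for i in range(CHUNK_BITS)]


def recover(word, stored_parity, neighbors):
    """Heuristically pick the candidate most similar to neighboring words."""
    where = localize(word, stored_parity)
    if where is None or where == "parity":
        return word                     # message bits are assumed intact
    return min(candidates(word, where),
               key=lambda c: sum(popcount(c ^ n) for n in neighbors))


original = 0x12345678
parity = encode(original)
corrupted = original ^ (1 << 13)        # single-bit soft fault in chunk 1
repaired = recover(corrupted, parity, neighbors=[0x12345678, 0x12345679])
```

The code narrows the error to eight candidate words rather than correcting it outright, so the final choice is a guess guided by side information. That guess is why recovery here is heuristic and not guaranteed.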
