research-article

Using managed runtime systems to tolerate holes in wearable memories

Published: 16 June 2013

Abstract

New memory technologies, such as phase-change memory (PCM), promise denser and cheaper main memory, and are expected to displace DRAM. However, many of them experience permanent failures far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to PCM, are extremely inefficient: Discarding a page when the first line fails wastes 98% of the memory.
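The 98% figure can be checked with a little arithmetic. The sketch below assumes 4 KiB pages and 64-byte lines; neither size is stated in the abstract, so both constants are illustrative assumptions.

```python
# Assumed sizes (not given in the abstract): 4 KiB pages, 64-byte lines.
PAGE_BYTES = 4096
LINE_BYTES = 64

lines_per_page = PAGE_BYTES // LINE_BYTES  # 64 lines per page


def wasted_fraction(failed_lines: int) -> float:
    """Fraction of a page that still works but is lost if the whole page is discarded."""
    return (lines_per_page - failed_lines) / lines_per_page


# One failed line out of 64: the other 63 lines are thrown away with it.
print(f"{wasted_fraction(1):.1%}")  # prints 98.4%
```

Under these assumed sizes, discarding a page on its first line failure throws away 63 of 64 working lines, which matches the roughly 98% waste quoted above.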

This paper proposes low-complexity cooperative software and hardware that handle failure rates as high as 50%. Our approach makes error handling transparent to the application by using the memory abstraction offered by managed languages. Once hardware error correction for a memory line is exhausted, rather than discarding the entire page, the hardware communicates the failed line to a failure-aware OS and runtime. The runtime ensures memory allocations never use failed lines and moves data when lines fail during program execution. This paper describes minimal extensions to an Immix mark-region garbage collector, which correctly utilizes pages with failed physical lines by skipping over failures. This paper also proposes hardware support that clusters failed lines at one end of a memory region to reduce fragmentation and improve performance under failures. Contrary to accepted hardware wisdom that advocates for wear-leveling, we show that, with software support, non-uniform failures delay the impact of memory failure. Together, these mechanisms incur no performance overhead when there are no failures and at failure levels of 10% to 50% suffer only an average overhead of 4% and 12%, respectively. These results indicate that hardware and software cooperation can greatly extend the life of wearable memories.
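A minimal sketch of the idea of allocating around failed lines, and of why clustering failures reduces fragmentation. This is not the paper's Immix extension: `free_runs`, `RegionAllocator`, the 16-line region, and the line-granularity object sizes are all hypothetical names and parameters chosen for illustration.

```python
from typing import List, Optional, Set, Tuple


def free_runs(num_lines: int, failed: Set[int]) -> List[Tuple[int, int]]:
    """Return (start, length) for each maximal run of working lines."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(num_lines):
        if i in failed:
            if start is not None:
                runs.append((start, i - start))
                start = None
        elif start is None:
            start = i
    if start is not None:
        runs.append((start, num_lines - start))
    return runs


class RegionAllocator:
    """Hypothetical bump allocator over one region that never uses failed lines."""

    def __init__(self, num_lines: int, failed: Set[int]) -> None:
        self.runs = free_runs(num_lines, failed)
        self.run_index = 0
        self.cursor = self.runs[0][0] if self.runs else 0

    def allocate(self, lines_needed: int) -> Optional[int]:
        """Return the first line of a contiguous working span, or None if none fits."""
        while self.run_index < len(self.runs):
            start, length = self.runs[self.run_index]
            if self.cursor + lines_needed <= start + length:
                addr = self.cursor
                self.cursor += lines_needed
                return addr
            # The object does not fit in the rest of this run: skip over the
            # failed line(s) to the next run. The leftover space is not reused
            # in this simplified sketch.
            self.run_index += 1
            if self.run_index < len(self.runs):
                self.cursor = self.runs[self.run_index][0]
        return None


# Same failure level (4 of 16 lines), two different layouts.
scattered = RegionAllocator(16, {3, 7, 11, 15})
clustered = RegionAllocator(16, {12, 13, 14, 15})

print(scattered.allocate(4))                      # None: no run of 4 working lines
print([clustered.allocate(4) for _ in range(4)])  # [0, 4, 8, None]
```

With the same number of failed lines, the scattered layout cannot place a single 4-line object, while the clustered layout places three. That is the fragmentation effect that clustering failed lines at one end of a region is meant to avoid.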

Published in

ACM SIGPLAN Notices, Volume 48, Issue 6 (PLDI '13)
June 2013, 515 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499370

PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2013, 546 pages
ISBN: 9781450320146
DOI: 10.1145/2491956

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
