DOI: 10.1145/1736020.1736064 (ACM Conferences, ASPLOS Conference Proceedings)
research-article

Virtualized and flexible ECC for main memory

Published: 13 March 2010

ABSTRACT

We present a general scheme for virtualizing main memory error-correction mechanisms, which maps the redundant information needed to correct errors into the memory namespace itself. We rely on this basic idea, which increases flexibility, to increase error protection capabilities, improve power efficiency, and reduce system cost, with only small performance overheads. We augment the virtual memory system architecture to detach the physical mapping of data from the physical mapping of its associated ECC information. We then use this mechanism to develop two-tiered error protection techniques that separate the process of detecting errors from the rare need to also correct errors, and thus save energy. We describe how to provide strong chipkill and double-chipkill protection using existing DRAM and packaging technology. We show how to maintain access granularity and redundancy overheads, even when using ×8 DRAM chips. We also evaluate error correction for systems that do not use ECC DIMMs. Overall, analysis of demanding SPEC CPU 2006 and PARSEC benchmarks indicates that performance overhead is only 1% with ECC DIMMs and less than 10% using standard non-ECC DIMM configurations, that DRAM power savings can be as high as 27%, and that the system energy-delay product is improved by 12% on average.
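
The two ideas in the abstract, a separate mapping for ECC information and a two-tiered split between detection and correction, can be illustrated with a toy model. The sketch below is not the paper's design or its codes: the class `VirtualizedEccMemory`, the 8-byte line with one byte per chip, the per-byte parity used for tier-1 detection, and the XOR byte used for tier-2 correction are all assumptions chosen to keep the example short.

```python
from functools import reduce

LINE_BYTES = 8  # assumption: one byte per DRAM chip in a line


def parity(byte):
    """Return the parity bit of one byte (toy tier-1 detection code)."""
    return bin(byte).count("1") & 1


def xor_all(values):
    """XOR a sequence of integers together."""
    return reduce(lambda a, b: a ^ b, values, 0)


class VirtualizedEccMemory:
    """Hypothetical model: data and correction info are mapped separately."""

    def __init__(self):
        self.data = {}       # data address -> (line bytes, tier-1 parity bits)
        self.ecc_map = {}    # data address -> ECC address (the separate mapping)
        self.ecc_store = {}  # ECC address -> tier-2 correction byte
        self.t2_reads = 0    # counts how often correction info is fetched
        self._next_ecc_addr = 0

    def write(self, addr, line):
        assert len(line) == LINE_BYTES
        self.data[addr] = (list(line), [parity(b) for b in line])
        if addr not in self.ecc_map:
            self.ecc_map[addr] = self._next_ecc_addr
            self._next_ecc_addr += 1
        # Tier-2 info lives in ordinary memory, found through ecc_map.
        self.ecc_store[self.ecc_map[addr]] = xor_all(line)

    def read(self, addr):
        line, t1 = self.data[addr]
        bad = [i for i, b in enumerate(line) if parity(b) != t1[i]]
        if not bad:
            return bytes(line)  # common case: detection only, no tier-2 access
        if len(bad) > 1:
            raise RuntimeError("uncorrectable by this toy code")
        # Rare case: fetch tier-2 info and rebuild the one failing byte.
        self.t2_reads += 1
        t2 = self.ecc_store[self.ecc_map[addr]]
        i = bad[0]
        others = xor_all(b for j, b in enumerate(line) if j != i)
        fixed = list(line)
        fixed[i] = t2 ^ others
        return bytes(fixed)


mem = VirtualizedEccMemory()
mem.write(0x40, bytes(range(1, 9)))
clean = mem.read(0x40)        # error-free read, tier 2 untouched
mem.data[0x40][0][3] ^= 0x10  # inject a single-bit fault in "chip" 3
repaired = mem.read(0x40)     # tier 1 flags the byte, tier 2 repairs it
```

In this model an error-free read never touches `ecc_store`, which is the intuition behind the energy savings the abstract attributes to separating detection from correction. Because `ecc_map` is independent of where the data lives, the amount and placement of correction information could in principle vary per region, although the toy code does not explore that.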
