
Mojim: A Reliable and Highly-Available Non-Volatile Memory System

Published: 14 March 2015

Abstract

Next-generation non-volatile memories (NVMs) promise DRAM-like performance, persistence, and high density. They can attach directly to processors to form non-volatile main memory (NVMM) and offer the opportunity to build very low-latency storage systems. These high-performance storage systems would be especially useful in large-scale data center environments where reliability and availability are critical. However, providing reliability and availability to NVMM is challenging, since the latency of data replication can overwhelm the low latency that NVMM should provide. We propose Mojim, a system that provides the reliability and availability that large-scale storage systems require, while preserving the performance of NVMM. Mojim achieves these goals by using a two-tier architecture in which the primary tier contains a mirrored pair of nodes and the secondary tier contains one or more secondary backup nodes with weakly consistent copies of data. Mojim uses highly optimized replication protocols, software, and networking stacks to minimize replication costs and expose as much of NVMM's performance as possible. We evaluate Mojim using raw DRAM as a proxy for NVMM and using an industrial NVMM emulation system. We find that Mojim provides replicated NVMM with similar or even better performance than un-replicated NVMM (reducing latency by 27% to 63% and delivering 0.4 to 2.7X the throughput). We demonstrate that replacing MongoDB's built-in replication system with Mojim improves MongoDB's performance by 3.4 to 4X.
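
The two-tier layout described in the abstract can be pictured with a small toy model. The sketch below is not Mojim's protocol or API: the names `Node`, `TwoTierReplicator`, `write`, and `drain` are hypothetical, memory is modeled as a Python dict, and it assumes that the mirrored pair is updated together before a write returns while the secondary backups are updated later from a queue, which is one simple way to end up with weakly consistent copies.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for one machine's NVMM region (address -> bytes)."""

    def __init__(self, name):
        self.name = name
        self.memory = {}

    def apply(self, addr, data):
        self.memory[addr] = data


class TwoTierReplicator:
    """Toy model: primary and mirror are updated together, backups lazily."""

    def __init__(self, primary, mirror, backups):
        self.primary = primary
        self.mirror = mirror
        self.backups = list(backups)
        self.pending = deque()  # writes not yet sent to the secondary tier

    def write(self, addr, data):
        # Primary tier: both nodes of the mirrored pair are updated before
        # the call returns (assumption: synchronous mirroring).
        self.primary.apply(addr, data)
        self.mirror.apply(addr, data)
        # Secondary tier: the write is only queued, so backups may lag.
        self.pending.append((addr, data))

    def drain(self, max_writes=None):
        """Send queued writes to every backup in order; return how many were sent."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            addr, data = self.pending.popleft()
            for backup in self.backups:
                backup.apply(addr, data)
            sent += 1
        return sent


def demo():
    primary, mirror, backup = Node("primary"), Node("mirror"), Node("backup")
    rep = TwoTierReplicator(primary, mirror, [backup])
    rep.write(0x10, b"a")
    rep.write(0x20, b"b")
    lagging = dict(backup.memory)  # snapshot taken before the backup catches up
    rep.drain()
    return primary.memory, mirror.memory, lagging, backup.memory
```

In this model the application-visible cost of a write covers only the mirrored pair, and the backups converge whenever `drain` runs. A real system would also have to handle failures, ordering across nodes, and network transport, none of which this sketch attempts.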

