Abstract
Next-generation non-volatile memories (NVMs) promise DRAM-like performance, persistence, and high density. They can attach directly to processors to form non-volatile main memory (NVMM) and offer the opportunity to build very low-latency storage systems. These high-performance storage systems would be especially useful in large-scale data center environments, where reliability and availability are critical. However, providing reliability and availability for NVMM is challenging, since the latency of data replication can overwhelm the low latency that NVMM should provide. We propose Mojim, a system that provides the reliability and availability that large-scale storage systems require while preserving the performance of NVMM. Mojim achieves these goals with a two-tier architecture in which the primary tier contains a mirrored pair of nodes and the secondary tier contains one or more backup nodes with weakly consistent copies of data. Mojim uses highly optimized replication protocols, software, and networking stacks to minimize replication costs and expose as much of NVMM's performance as possible. We evaluate Mojim using raw DRAM as a proxy for NVMM and using an industrial NVMM emulation system. We find that Mojim provides replicated NVMM with similar or even better performance than unreplicated NVMM (reducing latency by 27% to 63% and delivering 0.4 to 2.7X the throughput). We demonstrate that replacing MongoDB's built-in replication system with Mojim improves MongoDB's performance by 3.4 to 4X.
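The two-tier arrangement described in the abstract can be illustrated with a small sketch. This is not Mojim's protocol: it is a toy in-memory model, and it assumes that the mirror is updated before a write returns and that the secondary tier is updated later from an ordered queue. The names `Node`, `TwoTierReplicator`, `write`, and `drain` are hypothetical and do not come from the paper.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a machine holding a region of NVMM."""

    def __init__(self, name):
        self.name = name
        self.memory = {}

    def apply(self, addr, value):
        self.memory[addr] = value


class TwoTierReplicator:
    """Toy model: a mirrored primary tier plus lazily updated backups."""

    def __init__(self, primary, mirror, backups):
        self.primary = primary
        self.mirror = mirror
        self.backups = list(backups)
        # Updates accepted by the primary tier but not yet sent onward.
        self.pending = deque()

    def write(self, addr, value):
        # Assumption: both primary-tier nodes hold the update on return.
        self.primary.apply(addr, value)
        self.mirror.apply(addr, value)
        self.pending.append((addr, value))

    def drain(self):
        # Secondary tier: apply queued updates in order, some time later,
        # so backups may lag behind the primary tier (weak consistency).
        while self.pending:
            addr, value = self.pending.popleft()
            for backup in self.backups:
                backup.apply(addr, value)
```

In this model, a backup that has not yet been drained can return stale data, which is the sense in which its copy is weakly consistent; the cost of that staleness is kept off the write path.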
Mojim: A Reliable and Highly-Available Non-Volatile Memory System
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems