NVMM-Oriented Hierarchical Persistent Client Caching for Lustre

Published: 18 January 2021

Abstract

In high-performance computing (HPC), data and metadata are stored on dedicated server nodes, and client applications access them over a network, which introduces network latency and resource contention. These server nodes are typically equipped with (slow) magnetic disks, while the client nodes store temporary data on fast SSDs or even on non-volatile main memory (NVMM). Therefore, the full potential of parallel file systems can only be reached if fast client-side storage devices are included in the overall storage architecture.

In this article, we propose an NVMM-based hierarchical persistent client cache for the Lustre file system (NVMM-LPCC for short). NVMM-LPCC implements two caching modes: a read-write mode (RW-NVMM-LPCC for short) and a read-only mode (RO-NVMM-LPCC for short). NVMM-LPCC integrates with the Lustre Hierarchical Storage Management (HSM) solution and the Lustre layout lock mechanism to provide consistent persistent caching services for I/O applications running on client nodes, while maintaining a global unified namespace across the entire Lustre file system. The evaluation results presented in this article show that NVMM-LPCC can increase the average read throughput by up to 35.80 times and the average write throughput by up to 9.83 times compared with the native Lustre system, while providing excellent scalability.
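To illustrate how such a client-side persistent cache is typically driven, the following is a minimal command-line sketch based on the Lustre Persistent Client Cache (PCC) interface that the LPCC work evolved into; the mount points, cache directory, rule string, and archive ID shown here are illustrative assumptions, not values taken from the article.

    # Register an NVMM-backed cache directory on a client node
    # (paths, the projid rule, and the rwid archive ID are assumed for this sketch).
    lctl pcc add /mnt/lustre /mnt/nvmm_cache --param "projid={1000} rwid=2"

    # Explicitly attach a file to the read-write cache, inspect its state, and detach it.
    lfs pcc attach -i 2 /mnt/lustre/output.dat
    lfs pcc state /mnt/lustre/output.dat
    lfs pcc detach /mnt/lustre/output.dat

In this style of deployment, the HSM restore path and the layout lock described in the abstract are what keep a cached copy consistent with the global namespace when other clients access the same file.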

        • Published in

          ACM Transactions on Storage, Volume 17, Issue 1
          Special Section on USENIX FAST 2020
          February 2021
          165 pages
          ISSN: 1553-3077
          EISSN: 1553-3093
          DOI: 10.1145/3446939
          • Editor: Sam H. Noh

          Copyright © 2021 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 January 2021
          • Accepted: 1 June 2020
          • Revised: 1 March 2020
          • Received: 1 November 2019
Published in TOS Volume 17, Issue 1


          Qualifiers

          • research-article
          • Research
          • Refereed
