NVMM-Oriented Hierarchical Persistent Client Caching for Lustre

Abstract
In high-performance computing (HPC), data and metadata are stored on dedicated server nodes, and client applications access them over a network, which induces network latency and resource contention. These server nodes are typically equipped with (slow) magnetic disks, while the client nodes store temporary data on fast SSDs or even on non-volatile main memory (NVMM). The full potential of parallel file systems can therefore only be reached if these fast client-side storage devices are integrated into the overall storage architecture.
In this article, we propose an NVMM-based hierarchical persistent client cache for the Lustre file system (NVMM-LPCC for short). NVMM-LPCC implements two caching modes: a read-write mode (RW-NVMM-LPCC) and a read-only mode (RO-NVMM-LPCC). NVMM-LPCC integrates with Lustre's Hierarchical Storage Management (HSM) solution and its layout lock mechanism to provide consistent persistent caching services to I/O applications running on client nodes, while maintaining a single unified namespace across the entire Lustre file system. The evaluation results presented in this article show that NVMM-LPCC increases the average read throughput by up to 35.80 times and the average write throughput by up to 9.83 times compared with the native Lustre system, while providing excellent scalability.
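The caching behaviour summarized above can be sketched as a toy model. This is an illustrative sketch only, not the actual Lustre implementation: plain directory copies stand in for the NVMM cache device and for HSM archive/restore, the class and method names are hypothetical, and layout-lock-based consistency is not modeled.

```python
import os
import shutil

class LPCCModel:
    """Toy model of a hierarchical persistent client cache (illustrative only)."""

    def __init__(self, backend_dir, cache_dir, mode="RW"):
        assert mode in ("RW", "RO")
        self.backend = backend_dir   # stands in for the Lustre servers
        self.cache = cache_dir       # stands in for the local NVMM cache
        self.mode = mode

    def _cached(self, name):
        return os.path.join(self.cache, name)

    def attach(self, name):
        # Pull a copy of the file into the local cache (the real system
        # uses the HSM mechanism for this step).
        shutil.copy(os.path.join(self.backend, name), self._cached(name))

    def read(self, name):
        path = self._cached(name)
        if not os.path.exists(path):
            # Cache miss: fall back to the server-side copy.
            path = os.path.join(self.backend, name)
        with open(path, "rb") as f:
            return f.read()

    def write(self, name, data):
        if self.mode == "RO":
            raise PermissionError("RO cache: writes must go to the servers")
        # RW mode absorbs the write on the fast local device.
        with open(self._cached(name), "wb") as f:
            f.write(data)

    def detach(self, name):
        # Flush cached data back to the servers (HSM-restore analogue),
        # keeping the global namespace consistent.
        cached = self._cached(name)
        if os.path.exists(cached):
            shutil.copy(cached, os.path.join(self.backend, name))
            os.remove(cached)
```

In this model, the read-only mode lets many clients cache the same file safely, while the read-write mode gives a single client exclusive, low-latency access until the file is detached and written back.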