Abstract
Journaling file systems are widely adopted to support applications that demand data consistency. However, we observe that the overhead of journaling can cause a performance drop of up to 48.2% under certain kinds of workloads. Emerging high-performance, byte-addressable non-volatile memory (NVM) has the potential to minimize this overhead when used as the journal device. The traditional journaling mechanism, which is designed for block devices, is nevertheless unsuitable for NVM because of the write amplification we observed in the metadata journal. In this article, we propose a fine-grained metadata journaling mechanism that fully utilizes low-latency, byte-addressable NVM so that the overhead of journaling can be significantly reduced. Based on the observation that a conventional block-based metadata journal contains up to 90% clean metadata that need not be journaled, we design a fine-grained journal format for byte-addressable NVM that contains only modified metadata. Moreover, we redesign transaction committing, checkpointing, and recovery in journaling file systems to use the new journal format. Because the amount of ordered writes for the journal is reduced, the overhead of journaling is reduced without compromising file system consistency. To evaluate our fine-grained metadata journaling mechanism, we implemented a journaling file system prototype based on Ext4 and JBD2 in Linux. Experimental results show that our NVM-based fine-grained metadata journaling is up to 15.8× faster than the traditional approach under FileBench workloads.
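The write amplification described above can be illustrated with a small back-of-the-envelope sketch. It is not the prototype's journal format: the 4096-byte block size, the 256-byte inode size, the 8-byte per-entry header, and all function names are assumptions chosen for illustration, and descriptor and commit records are ignored.

```python
BLOCK_SIZE = 4096   # assumed file system block size in bytes
INODE_SIZE = 256    # assumed on-disk inode size in bytes
ENTRY_HEADER = 8    # hypothetical per-entry header (e.g. an inode number)


def block_journal_bytes(dirty_inodes):
    """Bytes journaled when every block holding a dirty inode is logged whole."""
    inodes_per_block = BLOCK_SIZE // INODE_SIZE
    dirty_blocks = {ino // inodes_per_block for ino in dirty_inodes}
    return len(dirty_blocks) * BLOCK_SIZE


def fine_grained_journal_bytes(dirty_inodes):
    """Bytes journaled when only the dirty inodes themselves are logged."""
    return len(set(dirty_inodes)) * (INODE_SIZE + ENTRY_HEADER)


def clean_fraction(dirty_inodes):
    """Fraction of the block-based journal occupied by unmodified metadata."""
    total = block_journal_bytes(dirty_inodes)
    dirty = len(set(dirty_inodes)) * INODE_SIZE
    return (total - dirty) / total


if __name__ == "__main__":
    # Hypothetical transaction touching six inodes spread over three blocks.
    dirty = [3, 4, 40, 41, 200, 201]
    print("block-based journal bytes: ", block_journal_bytes(dirty))
    print("fine-grained journal bytes:", fine_grained_journal_bytes(dirty))
    print("clean fraction in blocks:  ", clean_fraction(dirty))
```

In this hypothetical transaction, six modified inodes fall into three blocks, so a block-granularity journal logs 12288 bytes, of which 87.5% is clean metadata, while logging only the modified inodes takes 1584 bytes under the assumed entry layout.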
Optimizing File Systems with Fine-grained Metadata Journaling on Byte-addressable NVM