Abstract
In this work, we develop the Orchestrated File System (OrcFS) for Flash storage. OrcFS vertically integrates the log-structured file system and the Flash-based storage device to eliminate the redundancies across the layers. A few modern file systems adopt sophisticated append-only data structures in an effort to align the behavior of the file system with the append-only nature of Flash memory. While the benefit of adopting an append-only data structure is fairly promising, it leaves the software stack full of unnecessary redundancies, and therefore substantial room for improvement. The redundancies include (i) redundant levels of indirection (address translation), (ii) duplicate efforts to reclaim invalid blocks (i.e., segment cleaning in the file system and garbage collection in the storage device), and (iii) excessive over-provisioning (i.e., separate over-provisioning areas in each layer). OrcFS eliminates these redundancies by distributing the address translation, segment cleaning (or garbage collection), bad block management, and wear-leveling across the layers. Existing solutions suffer from high segment cleaning overhead and cause significant write amplification due to the mismatch between the file system block size and the Flash page size. To optimize the I/O stack while avoiding these problems, OrcFS adopts three key technical elements.
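The double indirection the abstract describes can be made concrete with a minimal sketch: a log-structured file system maps file blocks to logical block addresses (LBAs), and the FTL inside the SSD then maps LBAs to physical pages. Both layers update out-of-place, so a single overwrite leaves stale state in both. The dictionaries and function names below are illustrative, not OrcFS or F2FS APIs.

```python
# Sketch of redundant indirection: a write passes through the file
# system's block map, then through the FTL's page map inside the SSD.

fs_map = {}   # file block number -> logical block address (LBA)
ftl_map = {}  # LBA -> physical Flash page number (PPN)

next_lba = 0
next_ppn = 0

def ftl_write(lba: int) -> int:
    """FTL layer: out-of-place update, so every write consumes a fresh page."""
    global next_ppn
    ftl_map[lba] = next_ppn
    next_ppn += 1
    return ftl_map[lba]

def fs_write(block_no: int) -> int:
    """Log-structured FS layer: append-only, so every write gets a fresh LBA."""
    global next_lba
    fs_map[block_no] = next_lba
    next_lba += 1
    return ftl_write(fs_map[block_no])

# Overwriting the same file block twice leaves stale entries behind in
# BOTH layers: the FS must later clean its old segment, and the FTL must
# garbage collect the old page -- the duplicated reclamation effort.
fs_write(block_no=7)
fs_write(block_no=7)
print(len(fs_map), next_lba, next_ppn)  # 1 live block, 2 LBAs and 2 pages consumed
```

Collapsing these two translation layers into one, so that a single cleaner reclaims space, is exactly the redundancy elimination OrcFS targets.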
First, OrcFS uses disaggregate mapping, whereby it partitions the Flash storage into two areas, managed by the file system and the storage device, respectively, at different granularities: the metadata area is maintained at 4 Kbyte page granularity and the data area at 256 Mbyte superblock granularity. The superblock-based storage management aligns the file system section size, the unit of segment cleaning, with the superblock size of the underlying Flash storage. This alignment fully exploits the internal parallelism of the underlying Flash storage while leveraging the sequential write pattern of the log-structured file system. Second, OrcFS adopts quasi-preemptive segment cleaning to prevent foreground I/O operations from being interfered with by segment cleaning. The latency to reclaim free space can be prohibitive in OrcFS due to its large file system section size, 256 Mbyte; OrcFS effectively addresses this issue by adopting a polling-based segment cleaning scheme. Third, OrcFS introduces block patching to avoid the unnecessary write amplification caused by partial page programs. OrcFS is an enhancement of the F2FS file system. We develop a prototype based on F2FS and a server-class SSD with modified firmware (Samsung 843TN). OrcFS reduces the device mapping table requirement to 1/465 and 1/4 of that of page mapping and of the smallest publicly known mapping scheme, respectively. By eliminating the redundancy between segment cleaning and garbage collection, OrcFS reduces the write volume by 1/3 under a heavy random write workload. OrcFS achieves a 56% performance gain over EXT4 in the varmail workload.
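A back-of-envelope sketch shows why disaggregate mapping shrinks the device mapping table: only the small metadata area needs 4 Kbyte page-granularity entries, while the data area needs just one entry per 256 Mbyte superblock. The device capacity (256 GiB), metadata-area fraction (0.2%), and 4-byte entry size below are illustrative assumptions, not the 843TN's actual geometry, so the resulting reduction factor is indicative rather than a reproduction of the paper's 1/465 figure.

```python
# Sketch: mapping-table footprint under pure page mapping vs.
# OrcFS-style disaggregate mapping (assumed geometry, see lead-in).

PAGE = 4 * 1024                 # 4 KByte Flash page
SUPERBLOCK = 256 * 1024 * 1024  # 256 MByte superblock
ENTRY = 4                       # assumed bytes per mapping entry

def page_mapping_bytes(capacity: int) -> int:
    """Pure page mapping: one entry per 4 KByte page of the device."""
    return (capacity // PAGE) * ENTRY

def disaggregate_mapping_bytes(capacity: int, meta_fraction: float) -> int:
    """Disaggregate mapping: page-granularity entries for the metadata
    area only, plus one entry per 256 MByte superblock for the data area."""
    meta = int(capacity * meta_fraction)
    data = capacity - meta
    return (meta // PAGE + data // SUPERBLOCK) * ENTRY

capacity = 256 * 1024**3   # assumed 256 GiB device
full = page_mapping_bytes(capacity)
orc = disaggregate_mapping_bytes(capacity, meta_fraction=0.002)

print(f"page mapping:         {full / 2**20:.1f} MiB")
print(f"disaggregate mapping: {orc / 2**20:.3f} MiB")
print(f"reduction factor:     1/{full // orc}")
```

The reduction factor scales almost linearly with the metadata fraction: the smaller the page-mapped metadata area, the closer the table cost gets to one entry per superblock.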
OrcFS: Orchestrated File System for Flash Storage