
OrcFS: Orchestrated File System for Flash Storage

Published: 12 April 2018

Abstract

In this work, we develop the Orchestrated File System (OrcFS) for Flash storage. OrcFS vertically integrates the log-structured file system and the Flash-based storage device to eliminate the redundancies across the layers. A few modern file systems adopt sophisticated append-only data structures to match the behavior of the file system to the append-only nature of Flash memory. While the benefit of adopting an append-only data structure seems promising, it fills the stack of software layers with unnecessary redundancies, leaving substantial room for improvement. The redundancies include (i) redundant levels of indirection (address translation), (ii) duplicate efforts to reclaim the invalid blocks (i.e., segment cleaning in the file system and garbage collection in the storage device), and (iii) excessive over-provisioning (i.e., separate over-provisioning areas in each layer). OrcFS eliminates these redundancies by distributing address translation, segment cleaning (or garbage collection), bad block management, and wear-leveling across the layers. Existing solutions suffer from high segment cleaning overhead and cause significant write amplification due to the mismatch between the file system block size and the Flash page size. To optimize the I/O stack while avoiding these problems, OrcFS adopts three key technical elements.

First, OrcFS uses disaggregate mapping, whereby it partitions the Flash storage into two areas, managed by the file system and the Flash storage, respectively, with different granularity. In OrcFS, the metadata area and the data area are maintained at 4 Kbyte page granularity and 256 Mbyte superblock granularity, respectively. The superblock-based storage management aligns the file system section size, which is the unit of segment cleaning, with the superblock size of the underlying Flash storage. This lets OrcFS fully exploit the internal parallelism of the underlying Flash storage by taking advantage of the sequential workload characteristics of the log-structured file system.

Second, OrcFS adopts quasi-preemptive segment cleaning to prevent segment cleaning from interfering with foreground I/O operations. The latency to reclaim free space can be prohibitive in OrcFS due to its large file system section size of 256 Mbyte. OrcFS addresses this issue by adopting a polling-based segment cleaning scheme.

Third, OrcFS introduces block patching to avoid unnecessary write amplification in the partial page program.

OrcFS is an enhancement of the F2FS file system. We develop a prototype of OrcFS based on F2FS and a server-class SSD with modified firmware (Samsung 843TN). OrcFS reduces the device mapping table requirement to 1/465 of that of page mapping and to 1/4 of that of the smallest publicly known mapping scheme. By eliminating the redundancy between segment cleaning and garbage collection, OrcFS reduces the write volume by 1/3 under a heavy random write workload. OrcFS achieves a 56% performance gain over EXT4 under the varmail workload.
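To give a feel for why disaggregate mapping shrinks the mapping table, the following is a minimal sketch that only counts mapping entries. It assumes one entry per mapped unit and uses the 4 Kbyte and 256 Mbyte granularities stated in the abstract. The device capacity and metadata area size are hypothetical values chosen for illustration, and the function names are not from the paper. It does not model the actual OrcFS table layout and is not meant to reproduce the 1/465 figure reported above.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZE = 4 * KIB          # metadata-area granularity (from the abstract)
SUPERBLOCK_SIZE = 256 * MIB  # data-area granularity (from the abstract)


def mapping_entries(area_bytes, granularity_bytes):
    """Entries needed to cover an area, one entry per unit, rounded up."""
    return -(-area_bytes // granularity_bytes)


def disaggregate_entries(capacity_bytes, metadata_bytes):
    """Entries when the metadata area is mapped per page and the
    data area is mapped per superblock (illustrative model only)."""
    data_bytes = capacity_bytes - metadata_bytes
    return (mapping_entries(metadata_bytes, PAGE_SIZE)
            + mapping_entries(data_bytes, SUPERBLOCK_SIZE))


if __name__ == "__main__":
    # Hypothetical device: 256 GiB in total with a 1 GiB metadata area.
    capacity = 256 * GIB
    metadata = 1 * GIB

    page_mapped = mapping_entries(capacity, PAGE_SIZE)
    disaggregated = disaggregate_entries(capacity, metadata)

    print("page mapping entries:        ", page_mapped)
    print("disaggregate mapping entries:", disaggregated)
    print("reduction factor: %.1f" % (page_mapped / disaggregated))
```

In this simplified model the page-mapped metadata area dominates the entry count, so the saving depends mostly on how small the metadata area is relative to the whole device.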



        • Published in

          ACM Transactions on Storage, Volume 14, Issue 2
          May 2018
          210 pages
          ISSN: 1553-3077
          EISSN: 1553-3093
          DOI: 10.1145/3208078
          • Editor: Sam H. Noh

          Copyright © 2018 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 April 2018
          • Accepted: 1 November 2017
          • Revised: 1 September 2017
          • Received: 1 November 2016


          Qualifiers

          • research-article
          • Research
          • Refereed
