Abstract
Traditionally, developers could implement file systems (FSs) only as drivers within the operating system kernel. However, a growing number of FSs, notably distributed FSs for the cloud, implement their interfaces solely in user space to (i) isolate FS logic, (ii) take advantage of user space libraries, and/or (iii) prototype FSs rapidly. Common interfaces for implementing FSs in user space exist, but they either do not guarantee POSIX compliance in all cases or suffer considerable performance penalties due to the large number of wait context switches between kernel and user space processes.
We propose DEFUSE: an interface for user space FSs that provides fast accesses while ensuring access correctness and requiring no modifications to applications. DEFUSE achieves significant performance improvements over existing user space FS interfaces thanks to a novel design that drastically reduces the number of wait context switches for FS accesses. Additionally, to ensure access correctness, DEFUSE maintains POSIX compliance for FS accesses through three novel concepts: bypassed file descriptor (FD) lookup, FD stashing, and user space paging. Our evaluation, spanning a variety of workloads, shows that by reducing the number of wait context switches per workload from as many as 16,000 or 41,000 with Filesystem in Userspace (FUSE) down to 9 on average, DEFUSE increases performance 2× over existing interfaces for typical workloads and by as much as 10× in certain instances.