GPUfs: integrating a file system with GPUs

Abstract
GPU hardware is becoming increasingly general purpose, quickly outgrowing the traditional but constrained GPU-as-coprocessor programming model. To make GPUs easier to program and easier to integrate with existing systems, we propose making the host's file system directly accessible from GPU code. GPUfs provides a POSIX-like API for GPU programs, exploits GPU parallelism for efficiency, and optimizes GPU file access by extending the buffer cache into GPU memory. Our experiments, based on a set of real benchmarks adapted to use our file system, demonstrate the feasibility and benefits of our approach. For example, we demonstrate a simple self-contained GPU program which searches for a set of strings in the entire tree of Linux kernel source files over seven times faster than an eight-core CPU run.
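The abstract's idea of extending the buffer cache toward the compute device can be illustrated with a toy model: an offset-indexed page cache with LRU eviction, where misses fall back to a host read. This Python sketch is purely illustrative — the class, names, and page size are assumptions made here, not GPUfs's actual implementation, which manages pages in GPU memory and coordinates with the host OS page cache.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per cache page (illustrative choice)

class BufferCache:
    """Toy page cache keyed by (path, page_index), with LRU eviction.

    Sketches the concept of caching file pages near the compute device;
    hypothetical names, not GPUfs's own code.
    """
    def __init__(self, capacity_pages, read_page):
        self.capacity = capacity_pages
        self.read_page = read_page  # miss handler: fetch a page from the host
        self.pages = OrderedDict()  # (path, index) -> bytes, in LRU order
        self.hits = self.misses = 0

    def read(self, path, offset, size):
        """Serve a byte range, pulling each covered page through the cache."""
        out = bytearray()
        while size > 0:
            index, start = divmod(offset, PAGE_SIZE)
            page = self._get_page(path, index)
            chunk = page[start:start + size]
            if not chunk:
                break  # past end of file
            out += chunk
            offset += len(chunk)
            size -= len(chunk)
        return bytes(out)

    def _get_page(self, path, index):
        key = (path, index)
        if key in self.pages:
            self.pages.move_to_end(key)  # mark most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        page = self.read_page(path, index * PAGE_SIZE, PAGE_SIZE)
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        self.pages[key] = page
        return page

# Demo against an in-memory "file" four pages long.
data = bytes(range(256)) * 64  # 16 KiB backing store
cache = BufferCache(capacity_pages=2,
                    read_page=lambda path, off, n: data[off:off + n])
first = cache.read("kernel.c", 0, 8)   # cold: one miss
again = cache.read("kernel.c", 0, 8)   # warm: served from the cache
```

A second read of the same range is a pure cache hit; once more than two distinct pages are touched, the least recently used page is evicted, mirroring how a device-side cache would bound its memory footprint.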
Published in ASPLOS '13: Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems.