Abstract
As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. This article advocates extending standard operating system services and abstractions to GPUs in order to facilitate program development and enable harmonious integration of GPUs in computing systems. As an example, we describe the design and implementation of GPUfs, a software layer that provides operating system support for accessing host files directly from GPU programs. GPUfs provides a POSIX-like API, exploits GPU parallelism for efficiency, and optimizes GPU file access by extending the host CPU's buffer cache into GPU memory. Our experiments, based on a set of real benchmarks adapted to use our file system, demonstrate the feasibility and benefits of the GPUfs approach. For example, a self-contained GPU program that searches for a set of strings throughout the Linux kernel source tree runs over seven times faster than on an eight-core CPU.
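The buffer-cache idea mentioned in the abstract can be illustrated with a toy, host-only sketch in Python. This is not GPUfs code and does not use its API. It only shows, under stated assumptions, how a page-granular cache lets repeated reads be served from local memory instead of returning to the backing file each time. The names `PageCache`, `PAGE_SIZE`, `host_reads`, and `demo` are hypothetical, and the 4096-byte page size is an arbitrary choice for illustration, not a figure from the article.

```python
import os
import tempfile

PAGE_SIZE = 4096  # hypothetical page size, chosen only for illustration


class PageCache:
    """Toy page-granular read cache standing in for a device-side buffer cache."""

    def __init__(self, path, page_size=PAGE_SIZE):
        self.path = path
        self.page_size = page_size
        self.pages = {}      # page index -> cached bytes of that page
        self.host_reads = 0  # how many pages were fetched from the backing file

    def _load_page(self, index):
        # Fetch the page from the backing file only on a cache miss.
        if index not in self.pages:
            with open(self.path, "rb") as f:
                f.seek(index * self.page_size)
                self.pages[index] = f.read(self.page_size)
            self.host_reads += 1
        return self.pages[index]

    def read(self, offset, size):
        """Return up to `size` bytes starting at `offset`, assembled page by page."""
        out = bytearray()
        end = offset + size
        while offset < end:
            index, start = divmod(offset, self.page_size)
            page = self._load_page(index)
            chunk = page[start:start + (end - offset)]
            if not chunk:  # reached end of file
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)


def demo():
    data = bytes(range(256)) * 40  # 10240 bytes, spanning three pages
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        cache = PageCache(path)
        first = cache.read(4000, 200)   # crosses the boundary between pages 0 and 1
        second = cache.read(4000, 200)  # served entirely from cached pages
        return first == data[4000:4200], second == first, cache.host_reads
    finally:
        os.remove(path)
```

In this sketch the second read causes no additional fetches from the backing file. That is the effect a cache in GPU memory is meant to achieve for repeated accesses, although the real system must also handle parallel access and consistency with the host, which this toy ignores.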