GPUfs: Integrating a file system with GPUs

Published: 26 February 2014

Abstract

As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. This article advocates for extending standard operating system services and abstractions to GPUs in order to facilitate program development and enable harmonious integration of GPUs in computing systems. As an example, we describe the design and implementation of GPUfs, a software layer that provides operating system support for accessing host files directly from GPU programs. GPUfs provides a POSIX-like API, exploits GPU parallelism for efficiency, and optimizes GPU file access by extending the host CPU's buffer cache into GPU memory. Our experiments, based on a set of real benchmarks adapted to use our file system, demonstrate the feasibility and benefits of the GPUfs approach. For example, a self-contained GPU program that searches for a set of strings throughout the Linux kernel source tree runs over seven times faster than on an eight-core CPU.
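To make the POSIX-like API concrete, the following is a hedged sketch of what GPU-side file access in the string-search example might look like. The call names (gopen, gread, gclose) follow the article's convention of "g"-prefixed POSIX analogues, but the exact signatures, the O_RDONLY flag, and the grep-style kernel below are illustrative assumptions, not the article's verbatim code.

```cuda
// Illustrative sketch only: signatures and flags are assumed, not taken
// verbatim from the GPUfs implementation. In GPUfs, file-system calls
// are issued cooperatively at threadblock granularity, so one
// open/read/close sequence serves all threads in the block.
__global__ void grep_kernel(const char* pattern, int pattern_len,
                            const char* path, int* match_count) {
    __shared__ int fd;
    __shared__ char buf[4096];

    if (threadIdx.x == 0)
        fd = gopen(path, O_RDONLY);        // GPU-side analogue of open(2)
    __syncthreads();

    size_t offset = 0;
    int n;
    // Stateless, pread(2)-style reads at an explicit offset; data is
    // served from the GPU-resident extension of the host buffer cache.
    while ((n = gread(fd, offset, sizeof(buf), buf)) > 0) {
        // Threads in the block cooperatively scan the chunk.
        for (int i = threadIdx.x; i <= n - pattern_len; i += blockDim.x) {
            bool hit = true;
            for (int j = 0; j < pattern_len; j++)
                if (buf[i + j] != pattern[j]) { hit = false; break; }
            if (hit) atomicAdd(match_count, 1);
        }
        offset += n;
        __syncthreads();
    }

    if (threadIdx.x == 0)
        gclose(fd);
}
```

The key design point the sketch illustrates is that the file-system calls are collective block-level operations rather than per-thread system calls, which is how a POSIX-like interface can coexist with the GPU's data-parallel execution model.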



Published in ACM Transactions on Computer Systems, Volume 32, Issue 1 (February 2014), 132 pages.
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2584468
Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 1 October 2013
• Accepted: 1 November 2013
• Published: 26 February 2014

Qualifiers: research-article, refereed
