GPUrpc: Exploring Transparent Access to Remote GPUs

Published: 13 October 2016

Abstract

Graphics processing units (GPUs) are increasingly used for high-performance computing. Programming frameworks for general-purpose computing on GPUs (GPGPU), such as CUDA and OpenCL, are also maturing. Driving this trend is the recent proliferation of mobile devices such as smartphones and wearable computers. These devices increasingly run computationally intensive applications that involve some form of environmental recognition, such as augmented reality (AR) or voice recognition. However, devices with low computational power cannot satisfy such demanding computing requirements. The CPU load of these devices could be reduced by offloading computation onto GPUs in the cloud. This paper presents GPUrpc, a remote procedure call (RPC) extension to Gdev, which is a rich set of runtime libraries and device drivers for achieving first-class GPU resource management. GPUrpc allows developers to use CUDA for GPGPU development work. Existing research uses RPCs based on the CUDA application programming interfaces (APIs); hence, every CUDA API call requires communication. To reduce communication overhead, we use an RPC based on an API at a lower level than the CUDA API and reduce the number of API calls that require communication. Our evaluation, conducted on Linux with NVIDIA GPUs, shows that the basic performance of our prototype implementation is reliable compared with the existing method. An evaluation using the Rodinia benchmark suite, which is designed for research in heterogeneous parallel computing, shows that GPUrpc is effective for applications such as image processing and data mining. GPUrpc can also reduce power consumption to approximately 1/6 of that of CPU processing when performing 512 × 512 matrix multiplication.
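
The contrast the abstract draws between forwarding every API call and forwarding only the calls that need the remote GPU can be illustrated with a toy model. The sketch below is not GPUrpc, Gdev, or CUDA code: the call names, the `LOCAL_ONLY` set, and the `CountingTransport` class are all hypothetical, and which calls could really be served without communication is an assumption made purely for illustration. It only counts round trips for a made-up workload.

```python
from dataclasses import dataclass, field


@dataclass
class CountingTransport:
    """Stand-in for a network link: it only counts round trips."""
    round_trips: int = 0
    log: list = field(default_factory=list)

    def call(self, name, *args):
        self.round_trips += 1
        self.log.append(name)
        return 0  # pretend the remote side succeeded


class ApiLevelClient:
    """Forwards every call to the server, so every call costs a round trip."""

    def __init__(self, transport):
        self.transport = transport

    def invoke(self, name, *args):
        return self.transport.call(name, *args)


class LowLevelClient:
    """Serves some calls on the client and forwards only the rest."""

    # Hypothetical set of calls assumed to need no remote communication.
    LOCAL_ONLY = frozenset({"get_device_count", "get_last_error"})

    def __init__(self, transport):
        self.transport = transport

    def invoke(self, name, *args):
        if name in self.LOCAL_ONLY:
            return 0  # answered locally, no round trip
        return self.transport.call(name, *args)


# Hypothetical sequence of calls issued by one small GPGPU job.
WORKLOAD = [
    "get_device_count",
    "mem_alloc",
    "memcpy_h2d",
    "launch",
    "get_last_error",
    "memcpy_d2h",
    "mem_free",
]


def count_round_trips(client_cls):
    """Run WORKLOAD through a client and return the number of round trips."""
    transport = CountingTransport()
    client = client_cls(transport)
    for name in WORKLOAD:
        client.invoke(name)
    return transport.round_trips


if __name__ == "__main__":
    print("forward everything:", count_round_trips(ApiLevelClient))
    print("forward selectively:", count_round_trips(LowLevelClient))
```

In this toy workload the selective client saves one round trip for each call it can answer locally. The real saving depends on the application's call mix and on network latency, neither of which this sketch models.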
