EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU

Published: 26 January 2017

Abstract

Modern GPUs are broadly adopted in many multitasking environments, including data centers and smartphones. However, current support for scheduling multiple GPU kernels (from different applications) is limited, forming a major barrier for GPUs to meet many practical needs. This work demonstrates for the first time that, on existing GPUs, efficient preemptive scheduling of GPU kernels is possible even without special hardware support. Specifically, it presents EffiSha, a pure software framework that enables preemptive scheduling of GPU kernels with very low overhead. The enabled preemptive scheduler offers flexible support for kernels of different priorities, and demonstrates significant potential for reducing the average turnaround time and improving the overall system throughput of programs that time-share a modern GPU.
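A common way to realize software-only preemption of this kind is to have the kernel voluntarily check an eviction flag at safe points (e.g., between units of work), save its progress, and yield so it can be relaunched later. The following Python sketch simulates that protocol on the CPU purely for illustration; all names (`run_kernel`, `evict_flag`, the `state` dictionary) are hypothetical and not part of EffiSha's actual API.

```python
# Hypothetical simulation of software-only voluntary preemption:
# a "kernel" processes work in chunks and checks an eviction flag
# between chunks, saving its progress so it can be relaunched.

def run_kernel(state, total_items, chunk, evict_flag):
    """Run until done or until evict_flag() returns True.
    Returns True when all items are processed."""
    while state["next"] < total_items:
        if evict_flag():            # preemption point between work chunks
            return False            # yield; progress is kept in `state`
        end = min(state["next"] + chunk, total_items)
        for i in range(state["next"], end):
            state["sum"] += i       # stand-in for real per-item work
        state["next"] = end
    return True

# Scheduler side: preempt once, then relaunch to completion.
state = {"next": 0, "sum": 0}
evictions = iter([False, False, True])        # evict on the 3rd check
done = run_kernel(state, 100, 10, lambda: next(evictions, False))
assert not done and state["next"] == 20       # yielded after 2 chunks
done = run_kernel(state, 100, 10, lambda: False)
assert done and state["sum"] == sum(range(100))
```

Because the kernel yields only at chunk boundaries, no hardware context-switch support is needed; the trade-off is that preemption latency is bounded by the size of one work chunk.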

