Research Article (Public Access)

Dynamic Resource Management for Efficient Utilization of Multitasking GPUs

Published: 04 April 2017

Abstract

As graphics processing units (GPUs) are broadly adopted, running multiple applications on a GPU at the same time is beginning to attract wide attention. Recent proposals for multitasking GPUs have focused on either spatial multitasking, which partitions GPU resources at a streaming multiprocessor (SM) granularity, or simultaneous multikernel (SMK), which runs multiple kernels on the same SM. However, multitasking performance varies heavily depending on the resource partition within each scheme and on the application mix. In this paper, we propose GPU Maestro, which performs dynamic resource management for efficient utilization of multitasking GPUs. GPU Maestro can discover the best-performing GPU resource partition, exploiting both spatial multitasking and SMK. Furthermore, dynamism within a kernel and interference between kernels are automatically accounted for, because GPU Maestro finds the best-performing partition through direct measurements. Evaluations show that GPU Maestro improves average system throughput by 20.2% and 13.9% over baseline spatial multitasking and SMK, respectively.
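The abstract's core idea of choosing a resource partition by direct measurement rather than by a static model can be illustrated with a small sketch. This is not the paper's implementation: the candidate set, the `measure_stp` callback, and the toy throughput model below are all hypothetical stand-ins for short online measurements on real hardware.

```python
# Illustrative sketch: measurement-driven search over GPU resource
# partitions for two co-running kernels, in the spirit of a scheme that
# considers both spatial multitasking and SMK. All names are hypothetical.

def candidate_partitions(num_sms):
    """Enumerate coarse candidates: spatial splits that give kernel A a
    block of SMs (kernel B gets the rest), plus one 'smk' candidate in
    which both kernels share every SM."""
    parts = [("spatial", a) for a in range(1, num_sms)]  # SMs for kernel A
    parts.append(("smk", None))
    return parts

def find_best_partition(num_sms, measure_stp):
    """Run a short measurement for each candidate and keep the one with
    the highest observed system throughput (STP). Interference between
    the kernels is captured implicitly, because each candidate is
    evaluated by measurement rather than by an analytical model."""
    best, best_stp = None, float("-inf")
    for part in candidate_partitions(num_sms):
        stp = measure_stp(part)
        if stp > best_stp:
            best, best_stp = part, stp
    return best, best_stp

# Toy measurement model: kernel A's throughput saturates at 10 SMs,
# kernel B's at 6; SMK sharing yields a fixed combined throughput.
def toy_measure(part):
    kind, sms_a = part
    if kind == "smk":
        return 1.45
    return min(sms_a, 10) / 10 + min(16 - sms_a, 6) / 6

best, stp = find_best_partition(16, toy_measure)
```

Under this toy model, the search settles on the spatial split that gives each kernel just enough SMs to saturate (10 and 6), since that beats the fixed SMK throughput. A real system would additionally re-run the search periodically to track phase changes within a kernel.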



• Published in: ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17), April 2017, 811 pages. ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/3093336.
• Also in: ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, April 2017, 856 pages. ISBN: 9781450344654; DOI: 10.1145/3037697.
• Copyright © 2017 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.
• Published: 4 April 2017.
