Analyzing memory management methods on integrated CPU-GPU systems

Published: 18 June 2017

Abstract

Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. On these systems, the CPU and GPU share the same physical memory rather than using separate memory dies. Although integration eliminates the need to copy data between the CPU and the GPU, arranging transparent memory sharing between the two devices can carry large overheads. Memory on CPU/GPU systems is typically managed by a software framework such as OpenCL or CUDA, which includes a runtime library and communicates with a GPU driver. These frameworks offer a range of memory management methods that vary in ease of use, consistency guarantees, and performance. In this study, we analyze some of the common memory management methods of the most widely used software frameworks for heterogeneous systems: CUDA, OpenCL 1.2, OpenCL 2.0, and HSA, on NVIDIA and AMD hardware. We focus on performance/functionality trade-offs, with the goal of exposing the performance impact of each method and simplifying the choice of memory management method for programmers.

      Published in

        ACM SIGPLAN Notices, Volume 52, Issue 9
        ISMM '17
        September 2017
        127 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/3156685

        ISMM 2017: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management
        June 2017
        127 pages
        ISBN: 9781450350440
        DOI: 10.1145/3092255

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
