SemCache++: semantics-aware caching for efficient multi-GPU offloading

Published: 24 January 2015
Abstract

Offloading computations to multiple GPUs is not an easy task: it requires decomposing data, distributing computations, and handling communication manually. Drop-in GPU libraries have made offloading easier by hiding this complexity inside library calls. However, such encapsulation prevents the reuse of data between successive kernel invocations, resulting in redundant communication. This limitation exists in multi-GPU libraries like CUBLASXT. In this paper, we introduce SemCache++, a semantics-aware GPU cache that automatically manages communication between the CPU and multiple GPUs and optimizes it by caching data to eliminate redundant transfers. SemCache++ is used to build the first multi-GPU drop-in replacement library that (a) uses virtual memory to automatically manage and optimize multi-GPU communication and (b) requires no program rewriting or annotations. Our caching technique is efficient; it uses a two-level caching directory to track matrices and sub-matrices. Experimental results show that our system can eliminate redundant communication and deliver significant performance improvements over multi-GPU libraries like CUBLASXT.
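The abstract does not spell out how the two-level caching directory is organized. As a rough illustration only, the following Python sketch assumes one plausible layout: the first level is keyed by matrix, the second by sub-matrix (tile), and each tile records which devices hold an up-to-date copy. The names `TwoLevelDirectory`, `TileEntry`, `read`, `write`, and the `CPU` device id are all hypothetical and are not taken from SemCache++; transfers are only counted, not performed.

```python
from dataclasses import dataclass, field

CPU = -1  # hypothetical device id standing in for host memory


@dataclass
class TileEntry:
    # Devices (CPU or GPU ids) that currently hold an up-to-date copy of this tile.
    valid_on: set = field(default_factory=lambda: {CPU})


class TwoLevelDirectory:
    """Toy directory: level 1 maps matrix id -> tiles, level 2 maps tile index -> TileEntry."""

    def __init__(self):
        self.matrices = {}   # matrix id -> {tile index -> TileEntry}
        self.transfers = 0   # number of simulated copies between devices

    def _entry(self, matrix_id, tile):
        tiles = self.matrices.setdefault(matrix_id, {})
        return tiles.setdefault(tile, TileEntry())

    def read(self, matrix_id, tile, device):
        """Make `tile` readable on `device`, copying only if that device's copy is stale."""
        entry = self._entry(matrix_id, tile)
        if device not in entry.valid_on:
            self.transfers += 1          # simulated transfer
            entry.valid_on.add(device)

    def write(self, matrix_id, tile, device):
        """Record that `device` produced a new version; every other copy becomes stale."""
        entry = self._entry(matrix_id, tile)
        entry.valid_on = {device}


directory = TwoLevelDirectory()
for _ in range(2):  # two successive library calls that read the same operand tiles
    directory.read("A", (0, 0), device=0)
    directory.read("A", (0, 1), device=1)
print(directory.transfers)  # 2: the second call finds both tiles already cached
```

In this toy model, a library without such a directory would copy both tiles again on the second call. A write on one GPU invalidates the host copy and the copies on other GPUs, so a later read elsewhere triggers exactly one new transfer.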

Published in

ACM SIGPLAN Notices, Volume 50, Issue 8 (PPoPP '15), August 2015, 290 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2858788
Editor: Andy Gill

PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2015, 290 pages
ISBN: 9781450332057
DOI: 10.1145/2688500

Copyright © 2015 Owner/Author

Publisher: Association for Computing Machinery, New York, NY, United States