research-article

Iterative data-parallel mark&sweep on a GPU

Published: 04 June 2011

Abstract

Automatic memory management makes programming easier. This also holds for general-purpose GPU computing, where no garbage collectors currently exist. In this paper we present a parallel mark-and-sweep collector that collects GPU memory on the GPU itself, and we tune its performance. Performance is increased by: (1) data-parallel marking and sweeping of regions of memory, (2) marking all elements of large arrays in parallel, and (3) trading parallelism for recursion to match deeply linked data structures.
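As a rough illustration of technique (1), the following is a minimal sequential simulation of iterative data-parallel marking and sweeping. It is a sketch under stated assumptions, not the collector from the paper: the heap model (a list where entry i holds the slot indices referenced by object i) and the names `mark` and `sweep` are hypothetical, and each iteration of the inner loop stands in for one GPU thread.

```python
# Hypothetical heap model (an assumption for illustration):
# heap[i] is the list of slot indices that object i references.

def mark(heap, roots):
    """Iteratively mark reachable slots; returns (marked flags, number of passes)."""
    n = len(heap)
    marked = [False] * n
    scanned = [False] * n
    for r in roots:
        marked[r] = True
    passes = 0
    while True:
        # Snapshot of the work for this pass: marked but not yet scanned slots.
        work = [s for s in range(n) if marked[s] and not scanned[s]]
        if not work:
            return marked, passes
        passes += 1
        # On a GPU, every slot in the region would be handled by its own thread.
        for s in work:
            scanned[s] = True
            for child in heap[s]:
                marked[child] = True  # idempotent, so concurrent writes would be benign


def sweep(marked):
    """Return the slots that can be reclaimed (one thread per slot on a GPU)."""
    return [s for s, m in enumerate(marked) if not m]
```

For example, with `heap = [[1, 2], [3], [], [], [5], [4]]` and `roots = [0]`, slots 4 and 5 form an unreachable cycle and are the only ones returned by `sweep`.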

Step (1) is achieved by coarsely processing all potential objects in a region of memory in parallel. When a large array is detected during (1), it is put aside, and a parallel-for is later issued on the GPU to mark its elements. For a data structure that is a large linked list, we dynamically switch to a marking version with less overhead that performs a few recursive steps sequentially (while processing multiple lists in parallel).
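The two refinements described above can be sketched in the same simulated setting. Everything here is an assumption made for illustration: the heap model (entry i lists the slots referenced by object i), the threshold `LARGE_ARRAY`, the parameter `chain_steps`, and the rule that any object with exactly one reference is treated as a list node. The paper switches marking versions dynamically; this sketch simply always follows short chains.

```python
LARGE_ARRAY = 4  # hypothetical threshold: objects with at least this many refs are deferred


def mark_with_deferral(heap, roots, chain_steps=3):
    """Simulated data-parallel marking with large-array deferral and chain following.

    heap[i] lists the slots referenced by object i (hypothetical model).
    Returns (marked flags, number of passes over the region).
    """
    n = len(heap)
    marked = [False] * n
    scanned = [False] * n
    for r in roots:
        marked[r] = True
    passes = 0
    while True:
        work = [s for s in range(n) if marked[s] and not scanned[s]]
        if not work:
            return marked, passes
        passes += 1
        deferred = []
        for s in work:  # one GPU thread per slot
            steps = 0
            while True:
                scanned[s] = True
                refs = heap[s]
                if len(refs) >= LARGE_ARRAY:
                    deferred.append(s)  # put the large array aside for later
                    break
                for child in refs:
                    marked[child] = True
                # Follow a list link sequentially for a few steps instead of
                # waiting for another full pass over the region.
                if len(refs) == 1 and steps < chain_steps and not scanned[refs[0]]:
                    s = refs[0]
                    steps += 1
                else:
                    break
        # Stand-in for the parallel-for: one thread per array element.
        for array in deferred:
            for child in heap[array]:
                marked[child] = True
```

On an 8-node singly linked list rooted at its head, this sketch needs 8 passes with `chain_steps=0` and 2 passes with `chain_steps=3`, which hints at why a few sequential steps can reduce overhead for deeply linked structures.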

The collector achieves a speedup of up to a factor of 11 over a sequential collector on the same GPU.


Published in

ACM SIGPLAN Notices, Volume 46, Issue 11 (ISMM '11), November 2011, 135 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2076022

ISMM '11: Proceedings of the international symposium on Memory management, June 2011, 148 pages
ISBN: 9781450302630
DOI: 10.1145/1993478

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
