research-article

Non-blocking programming on multi-core graphics processors (extended abstract)

Published: 20 June 2009

Abstract

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures such as CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need for synchronization primitives other than reads and writes.

Moreover, based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects, such as wait-free and t-resilient read-modify-write objects, for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need for strong synchronization primitives in hardware, and that wait-free programming is possible for GPUs.
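As a rough illustration of what "wait-free with O(N) time complexity, using only reads and writes" can mean, the following sketch shows a textbook-style increment-only counter. It is not the construction from the paper, and it is a much weaker object than the read-modify-write objects described above. The class name `CollectCounter`, its methods, and the per-thread slot layout are hypothetical, and the code is plain C++ rather than CUDA.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical illustration (not the paper's construction): an increment-only
// counter for N threads that uses atomic loads and stores alone.
class CollectCounter {
public:
    explicit CollectCounter(std::size_t n) : slots_(n) {
        for (auto& s : slots_) s.store(0);
    }

    // Assumption: only thread `id` ever writes slot `id`. A load followed by
    // a store is then enough; no test-and-set or compare-and-swap is used.
    void increment(std::size_t id) {
        unsigned long v = slots_[id].load();
        slots_[id].store(v + 1);
    }

    // Reads every slot exactly once: N loads, independent of what the other
    // threads are doing, so the call always finishes in O(N) steps.
    unsigned long read() const {
        unsigned long sum = 0;
        for (const auto& s : slots_) sum += s.load();
        return sum;
    }

private:
    std::vector<std::atomic<unsigned long>> slots_;
};
```

In use, each of the N threads would call `increment` with its own index, and any thread may call `read` at any time. No call ever waits for another thread, which is the wait-free property. The one-writer-per-slot rule is what lets this sketch avoid stronger primitives.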



Published in

ACM SIGARCH Computer Architecture News, Volume 36, Issue 5
December 2008
111 pages
ISSN: 0163-5964
DOI: 10.1145/1556444

Copyright © 2009 Authors

Publisher

Association for Computing Machinery, New York, NY, United States
