Synchronization Using Remote-Scope Promotion

Published: 14 March 2015
Abstract

Heterogeneous System Architecture (HSA) and OpenCL define scoped synchronization to facilitate low-overhead communication across a subset of threads. Scoped synchronization works well for static sharing patterns, where consumer threads are known a priori. It works poorly for dynamic sharing patterns (e.g., work stealing), where programmers cannot use a faster, small scope because of the rare possibility that the work is stolen by a thread in a distant, slower scope. This puts programmers in a conundrum: optimize the common case by synchronizing at a faster, small scope, or use work stealing at a slower, large scope. In this paper, we propose to extend scoped synchronization with remote-scope promotion. This allows the most frequent sharers to synchronize through a small scope, while infrequent sharers synchronize by promoting that remote small scope to a larger shared scope. Synchronization using remote-scope promotion provides performance robustness for dynamic workloads, where the benefits provided by scoped synchronization and work stealing are hard to anticipate. Compared to a naïve baseline, static scoped synchronization alone achieves a 1.07x speedup on average, and dynamic work stealing alone achieves a 1.18x speedup on average. In contrast, synchronization using remote-scope promotion achieves a robust 1.25x speedup on average across a diverse set of graph benchmarks and inputs.
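The idea in the abstract can be illustrated with a toy visibility model. The sketch below is not the paper's mechanism or any HSA/OpenCL API: the class `ToyMemory` and its methods `store_local`, `load_local`, and `load_promoted` are hypothetical names invented for illustration. It assumes one private cache per work-group plus a shared global memory, and treats promotion as a single expensive step, paid by the thief, that publishes the owner's small-scope writes.

```python
class ToyMemory:
    """Toy model (an assumption, not the paper's design): one private
    cache per work-group plus a shared global memory."""

    def __init__(self, num_groups):
        self.global_mem = {}
        self.caches = [dict() for _ in range(num_groups)]
        self.global_syncs = 0  # counts expensive large-scope operations

    def store_local(self, group, key, value):
        # Small-scope release: the write is visible only inside `group`.
        self.caches[group][key] = value

    def load_local(self, group, key):
        # Small-scope acquire: check the group's own cache, then global memory.
        cache = self.caches[group]
        return cache[key] if key in cache else self.global_mem.get(key)

    def load_promoted(self, thief, owner, key):
        # Toy remote-scope promotion: the thief pays for one large-scope
        # operation that publishes the owner's cached writes.
        self.global_syncs += 1
        self.global_mem.update(self.caches[owner])
        self.caches[owner].clear()
        self.caches[thief].pop(key, None)  # drop any stale copy held by the thief
        return self.global_mem.get(key)


if __name__ == "__main__":
    mem = ToyMemory(num_groups=2)
    for i in range(100):
        mem.store_local(0, "tail", i)  # common case: owner stays at small scope
    print("thief, small scope:", mem.load_local(1, "tail"))
    print("thief, promoted:", mem.load_promoted(thief=1, owner=0, key="tail"))
    print("large-scope operations:", mem.global_syncs)
```

In this toy model the owner performs all of its updates without a large-scope operation, and the cost of wider visibility is paid only when a steal actually happens, which is the trade-off the abstract describes.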

