Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

Published: 26 January 2017
Abstract

Over the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, partly because of the complexity of the analysis needed, and partly because of the difficulty of obtaining relevant asymptotic bounds: when the analysis takes into account important practical factors, such as contention, it is difficult or even impossible to prove desirable bounds.

In this paper, we show that considering structured concurrency or relaxed concurrency models can enable establishing strong bounds, also for contention. To this end, we first present a dynamic relaxed counter data structure that indicates the non-zero status of the counter. Our data structure extends a recently proposed data structure, called SNZI, allowing our structure to grow dynamically in response to the increasing degree of concurrency in the system.
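A minimal sketch of the interface such a non-zero indicator exposes, using the `arrive`, `depart`, and `query` operations of the original SNZI. The class name and the single lock-protected integer are assumptions made for illustration. This is not the tree-based SNZI algorithm, nor the dynamic variant presented in the paper, and it makes no attempt to reduce contention.

```python
import threading


class NonZeroIndicator:
    """Relaxed counter: callers can only ask whether the count is non-zero.

    Illustrative only: one lock-protected integer, so every operation
    contends on the same location. SNZI spreads arrivals over a tree
    to avoid exactly that.
    """

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def arrive(self):
        # Register one unit of surplus (increment).
        with self._lock:
            self._count += 1

    def depart(self):
        # Remove one unit of surplus; must follow a matching arrive.
        with self._lock:
            if self._count == 0:
                raise RuntimeError("depart without a matching arrive")
            self._count -= 1

    def query(self):
        # Report only whether the count is non-zero, never its value.
        with self._lock:
            return self._count > 0
```

Because `query` exposes a single bit rather than the exact count, an implementation is free to keep the count distributed, which is the relaxation that SNZI-style structures exploit.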

Using the dynamic SNZI data structure, we then present a concurrent data structure for series-parallel directed acyclic graphs (sp-dags), a key data structure widely used in the implementation of modern parallel programming languages. The key component of sp-dags is an in-counter data structure that is an instance of our dynamic SNZI. We analyze the efficiency of our concurrent sp-dags and in-counter data structures under the nested-parallel computing paradigm. This paradigm offers a structured model for concurrency. Under this model, we prove that our data structures require amortized O(1) shared memory steps, including contention. We present an implementation and an experimental evaluation that suggests that the sp-dags data structure is practical and can perform well in practice.
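To illustrate the role of an in-counter, the following sketch shows a join vertex in a fork-join computation that becomes ready once its last incoming dependency is resolved. The names `Vertex`, `add_dependency`, `resolve_dependency`, and `fork_join` are hypothetical, and a lock-protected integer stands in for the in-counter. In the paper the in-counter is an instance of the dynamic SNZI, which this sketch does not implement.

```python
import threading


class Vertex:
    """A dag vertex whose in-counter tracks unresolved incoming edges.

    Assumption: a plain lock-protected integer replaces the paper's
    dynamic SNZI in-counter.
    """

    def __init__(self, name):
        self.name = name
        self._pending = 0
        self._lock = threading.Lock()

    def add_dependency(self):
        with self._lock:
            self._pending += 1

    def resolve_dependency(self):
        """Return True if this call removed the last dependency."""
        with self._lock:
            self._pending -= 1
            return self._pending == 0


def fork_join(left, right):
    """Run two branches in parallel; the join vertex fires exactly once."""
    join = Vertex("join")
    join.add_dependency()
    join.add_dependency()
    results = {}
    ready = []

    def run(key, fn):
        results[key] = fn()
        # Only the branch that finishes last sees the counter reach zero.
        if join.resolve_dependency():
            ready.append(join.name)

    threads = [
        threading.Thread(target=run, args=("left", left)),
        threading.Thread(target=run, args=("right", right)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, ready
```

Both branches decrement the same counter here, so contention grows with the number of concurrent branches. Bounding that cost is the problem the paper's analysis addresses.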

