research-article

Delegated isolation

Published: 22 October 2011

Abstract

Isolation---the property that a task can access shared data without interference from other tasks---is one of the most basic concerns in parallel programming. In this paper, we present Aida, a new model of isolated execution for parallel programs that perform frequent, irregular accesses to pointer-based shared data structures. The three primary benefits of Aida are dynamism, safety and liveness guarantees, and programmability. First, Aida allows tasks to dynamically select and modify, in an isolated manner, arbitrary fine-grained regions in shared data structures, all the while maintaining a high level of concurrency. Consequently, the model can achieve scalable parallelization of regular as well as irregular shared-memory applications. Second, the model offers freedom from data races, deadlocks, and livelocks. Third, no extra burden is imposed on programmers, who access the model via a simple, declarative isolation construct that is similar to that for transactional memory. The key new insight in Aida is a notion of delegation among concurrent isolated tasks (known in Aida as assemblies). Each assembly A is equipped with a region in the shared heap that it owns---the only objects accessed by A are those it owns, guaranteeing race-freedom. The region owned by A can grow or shrink flexibly---however, when A needs to own a datum owned by B, A delegates itself, as well as its owned region, to B. From now on, B has the responsibility of re-executing the task A set out to complete. Delegation as above is the only inter-assembly communication primitive in Aida. In addition to reducing contention in a local, data-driven manner, it guarantees freedom from deadlocks and livelocks.
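The delegation protocol described above can be illustrated with a toy, single-threaded simulation. This is our own sketch, not Aida's actual API or implementation: all class and method names here are hypothetical, and the real system adds concurrency, the isolation construct, and the liveness machinery the abstract describes. The sketch only shows the core move: when assembly A needs an object owned by B, A hands its owned region and its remaining work to B, which re-executes the task.

```java
import java.util.*;

// Hypothetical names; a minimal sequential model of Aida-style delegation.
final class Assembly {
    final String name;
    final Set<Integer> owned = new HashSet<>();
    // Each task is modeled as the set of objects it must own to commit.
    final Deque<List<Integer>> tasks = new ArrayDeque<>();

    Assembly(String name, Integer... objs) {
        this.name = name;
        owned.addAll(Arrays.asList(objs));
    }
}

final class DelegationSim {
    private final Map<Integer, Assembly> ownerOf = new HashMap<>();

    void register(Assembly a) {
        for (int o : a.owned) ownerOf.put(o, a);
    }

    // Try to run a's next task. If it needs an object owned by another
    // assembly b, a delegates itself to b: b absorbs a's owned region and
    // a's pending tasks, and becomes responsible for re-executing them.
    // Returns the assembly now responsible for the task.
    Assembly step(Assembly a) {
        List<Integer> task = a.tasks.peekFirst();
        if (task == null) return a;
        for (int obj : task) {
            Assembly owner = ownerOf.get(obj);
            if (owner != a) {                    // conflict: delegate a to owner
                owner.owned.addAll(a.owned);
                for (int o : a.owned) ownerOf.put(o, owner);
                owner.tasks.addAll(a.tasks);     // owner re-executes a's work
                a.owned.clear();
                a.tasks.clear();
                return owner;
            }
        }
        a.tasks.pollFirst();                     // all objects owned: commit
        return a;
    }
}
```

Because delegation always moves work toward the current owner of the contended datum, conflicts resolve locally in one direction, which is the intuition behind Aida's deadlock- and livelock-freedom.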

We offer an implementation of Aida on top of the Habanero Java parallel programming language. The implementation employs several novel ideas, including the use of a union-find data structure to represent tasks and the regions that they own. A thorough evaluation using several irregular data-parallel benchmarks demonstrates the low overhead and excellent scalability of Aida, as well as its benefits over existing approaches to declarative isolation. Our results show that Aida performs on par with the state-of-the-art customized implementations of irregular applications and much better than coarse-grained locking and transactional memory approaches.
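The abstract notes that the implementation represents assemblies and their owned regions with a union-find data structure: merging two sets is a natural fit for one assembly absorbing another's region on delegation. As a reminder of the underlying structure, here is a minimal sequential union-find sketch with union by rank and path halving. This is not the concurrent variant the implementation would need; it only illustrates the operations involved.

```java
// Minimal sequential union-find (disjoint-set) sketch.
// find() returns a set's representative; union() merges two sets,
// as when one assembly's owned region is folded into another's.
final class UnionFind {
    private final int[] parent;
    private final int[] rank;

    UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;  // each element is its own set
    }

    // Follow parent pointers to the root, halving the path as we go.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Merge the sets containing a and b; union by rank keeps trees shallow.
    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }
}
```

With this representation, checking whether two tasks have been merged by delegation reduces to comparing representatives, and delegating is a single `union`.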




Published in

ACM SIGPLAN Notices, Volume 46, Issue 10 (OOPSLA '11), October 2011, 1063 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2076021

OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
October 2011, 1104 pages
ISBN: 9781450309400
DOI: 10.1145/2048066

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

