Abstract
Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph is known at compile-time or on dynamic scheduling of independent loop iterations such as in OpenMP.
In irregular algorithms, dependences between activities are complex functions of runtime values so these algorithms are not amenable to compile-time analysis nor do they consist of independent activities. Moreover, the amount of work can vary dramatically with the scheduling policy. To handle these complexities, implementations of irregular algorithms employ carefully handcrafted, algorithm-specific schedulers but these schedulers are themselves parallel programs, complicating the parallel programming problem further.
In this paper, we present a flexible and efficient approach for specifying and synthesizing scheduling policies for irregular algorithms. We develop a simple compositional specification language and show how it can concisely encode scheduling policies in the literature. Then, we show how to synthesize efficient parallel schedulers from these specifications. We evaluate our approach for five irregular algorithms on three multicore architectures and show that (1) the performance of some algorithms can improve by orders of magnitude with the right scheduling policy, and (2) for the same policy, the overheads of our synthesized schedulers are comparable to those of fixed-function schedulers.
- N. Amenta, S. Choi, and G. Rote. Incremental constructions con brio. In Proc. Symp. on Computational Geometry (SCG), pages 211--219, 2003. Google Scholar
Digital Library
- L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994. (DIKU report 94/19).Google Scholar
- D. Bader and V. Sachdeva. A cache-aware parallel implementation of the push-relabel network flow algorithm and experimental evaluation of the gap relabeling heuristic. In Proc. 18th ISCA International Conference on Parallel and Distributed Computing Systems (PDCS), September 2005.Google Scholar
- R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. SIGPLAN Not., 30(8):207--216, 1995. Google Scholar
Digital Library
- A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162--166, 1981.Google Scholar
Cross Ref
- B. V. Cherkassy and A. V. Goldberg. On implementing push-relabel method for the maximum flow problem. In Proc. Intl. Conf. on Integer Programming and Combinatorial Optimization (IPCO), pages 157--171, London, UK, 1995. Google Scholar
Digital Library
- L. P. Chew. Guaranteed-quality mesh generation for curved surfaces. In Proc. Symp. on Computational Geometry (SCG), 1993. Google Scholar
Digital Library
- K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, ii. Discrete Comput. Geom., 4(5):287--421, 1989.Google Scholar
Digital Library
- T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms. MIT Press, 2001. Google Scholar
Digital Library
- L. Dagum and R. Menon. OpenMP: An industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng., 5(1):46--55, 1998. Google Scholar
Digital Library
- A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proc. Conf. on OpenMP in a new era of parallelism (IWOMP), pages 100--110, 2008. Google Scholar
Digital Library
- A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. ACM, 35(4):921--940, 1988. Google Scholar
Digital Library
- R. H. Halstead, Jr. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7:501--538, October 1985. Google Scholar
Digital Library
- B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In Proc. Conf. on Programming Language Design and Implementation (PLDI), 2007. Google Scholar
Digital Library
- M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proc. of International Symposium on Computer Architecture (ISCA), 1993. Google Scholar
Digital Library
- M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463--492, 1990. Google Scholar
Digital Library
- Intel Corporation. Intel threading building blocks 3.0. http://threadingbuildingblocks.org.Google Scholar
- M. Kulkarni, P. Carribault, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Scheduling strategies for optimistic parallel execution of irregular programs. In Proc. Symp. on Parallelism in algorithms and architectures (SPAA), pages 217--228, New York, NY, USA, 2008. Google Scholar
Digital Library
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. SIGPLAN Not. (Proceedings of PLDI), 42(6):211--222, 2007. Google Scholar
Digital Library
- M. Kulkarni, D. Prountzos, D. Nguyen, and K. Pingali. Defining and implementing commutativity conditions for parallel execution. regular rech report TR-ECE-09-11, School of Electrial and Computer Engineering, Purdue University, August 2009.Google Scholar
- D. Lea. A Java fork/join framework. In Proc. Conf. on Java Grande, pages 36--43, 2000. Google Scholar
Digital Library
- D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proc. Conf. on object-oriented programming, systems, languages, and applications (OOPSLA), pages 227--242, 2009. Google Scholar
Digital Library
- K. Madduri, D. Bader, J. Berry, and J. Crobak. Parallel shortest path algorithms for solving large-scale instances. Technical Report GT-CSE-06-19, Georgia Institute of Technology, September 2006.Google Scholar
- M. Méndez-Lojo, A. Mathew, and K. Pingali. Parallel inclusion-based points-to analysis. In Proc. Intl. Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH), 2010. Google Scholar
Digital Library
- U. Meyer and P. Sanders. Delta-stepping: A parallel single source shortest path algorithm. In Proc. European Symp. on Algorithms (ESA), pages 393--404, 1998. Google Scholar
Digital Library
- M. M. Michael, M. T. Vechev, and V. A. Saraswat. Idempotent work stealing. In Proc. Symp. on Principles and practice of parallel programming (PPoPP), pages 45--54, 2009. Google Scholar
Digital Library
- G. L. Miller. A time efficient Delaunay refinement algorithm. In Proc. Symposium on Discrete Algorithms (SODA), pages 400--409, New Orleans, 2004. Google Scholar
Digital Library
- F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999. Google Scholar
Digital Library
- D. J. Pearce, P. H. J. Kelly, and C. L. Hankin. Online cycle detection and difference propagation for pointer analysis. In Intl. IEEE Workshop on Source Code Analysis and Manipulation (SCAM), pages 3--12, 2003.Google Scholar
Cross Ref
- K. Pingali, M. Kulkarni, D. Nguyen, M. Burtscher, M. Mendez-Lojo, D. Prountzos, X. Sui, and Z. Zhong. Amorphous data-parallelism in irregular algorithms. regular tech report TR-09-05, The University of Texas at Austin, 2009.Google Scholar
- J. Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation. J. Algorithms, 18(3):548--585, 1995. Google Scholar
Digital Library
- J. R. Shewchuk. Triangle: Engineering a 2d quality mesh generator and Delaunay triangulator. In Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203--222. Springer-Verlag, 1996. Google Scholar
Digital Library
- A. Solar-Lezama, C. G. Jones, and R. Bodik. Sketching concurrent data structures. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 136--148, New York, NY, USA, 2008. Google Scholar
Digital Library
- M. T. Vechev, E. Yahav, D. F. Bacon, and N. Rinetzky. CGCExplorer: a semi-automated search procedure for provably correct concurrent collectors. In Proc. Conf. on Programming Language Design and Implementation (PLDI), pages 456--467, New York, NY, USA, 2007. Google Scholar
Digital Library
- D. F. Watson. Computing the n-dimensional tessellation with application to Voronoi polytopes. The Computer Journal, 24(2):167--172, 1981.Google Scholar
Cross Ref
- J. Wu, R. Das, J. Saltz, H. Berryman, and S. Hiranandani. Distributed memory compiler design for sparse problems. IEEE Transactions on Computers, 44, 1995. Google Scholar
Digital Library
Index Terms
Synthesizing concurrent schedulers for irregular algorithms
Recommendations
Synthesizing concurrent schedulers for irregular algorithms
ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systemsScheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Synthesizing concurrent schedulers for irregular algorithms
ASPLOS '11Scheduling is the assignment of tasks or activities to processors for execution, and it is an important concern in parallel programming. Most prior work on scheduling has focused either on static scheduling of applications in which the dependence graph ...
Modified Rate-Monotonic Algorithm for Scheduling Periodic Jobs with Deferred Deadlines
The deadline of a request is the time instant at which its execution must complete. The deadline of the request in any period of a job with deferred deadline is some time instant after the end of the period. The authors describe a semi-static priority-...







Comments