ABSTRACT
The trend in microprocessor design toward multicore and manycore processors means that future performance gains in software will largely come from harnessing parallelism. To realize such gains, we need languages and implementations that can enable parallelism at many different levels. For example, an application might use both explicit threads to implement coarse-grained parallelism for independent tasks and implicit threads for fine-grained data-parallel computation over a large array. An important aspect of this requirement is supporting a wide range of different scheduling mechanisms for parallel computation.
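The combination described above can be sketched in Python (purely illustrative; Manticore is an ML-family language and this is not its API): one explicitly spawned thread handles an independent task, while a worker pool provides implicit data parallelism over an array.

```python
# Illustrative sketch (not Manticore code): combining a coarse-grained
# explicit thread for an independent task with fine-grained
# data-parallel computation over an array.
from concurrent.futures import ThreadPoolExecutor
import threading

def data_parallel_map(f, xs, pool):
    # Implicit fine-grained parallelism: the pool decides how the
    # elementwise work is partitioned across workers.
    return list(pool.map(f, xs))

def main():
    results = {}
    with ThreadPoolExecutor() as pool:
        # Coarse-grained explicit thread: an independent task that
        # runs concurrently with the data-parallel computation.
        logger = threading.Thread(target=lambda: results.setdefault("log", "done"))
        logger.start()
        # Fine-grained data parallelism over a (small, for the sketch) array.
        results["squares"] = data_parallel_map(lambda x: x * x, range(8), pool)
        logger.join()
    return results
```

The two forms of parallelism place different demands on the scheduler, which is what motivates supporting multiple scheduling mechanisms in one runtime.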
In this paper, we describe the scheduling framework that we have designed and implemented for Manticore, a strict parallel functional language. We take a micro-kernel approach in our design: the compiler and runtime support a small collection of scheduling primitives upon which complex scheduling policies can be implemented. This framework is extremely flexible and can support a wide range of different scheduling policies. It also supports the nesting of schedulers, which is key to both supporting multiple scheduling policies in the same application and to hierarchies of speculative parallel computations.
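The micro-kernel idea can be illustrated with a toy user-level scheduler (an assumed sketch, not Manticore's actual primitives, which are built on continuations): fibers hand control back to a scheduler loop, and the entire policy lives in ordinary user-level code, so different policies can coexist or nest.

```python
# Minimal sketch of the micro-kernel approach: the "kernel" exposes
# only two operations -- make a fiber runnable, and dispatch the next
# one -- while the policy (here, round-robin) is ordinary user code.
from collections import deque

class RoundRobinScheduler:
    def __init__(self):
        self.ready = deque()              # the ready queue IS the policy

    def spawn(self, fiber):
        self.ready.append(fiber)          # primitive 1: make runnable

    def run(self):
        trace = []
        while self.ready:                 # primitive 2: dispatch
            fiber = self.ready.popleft()
            try:
                trace.append(next(fiber)) # run fiber until it yields
                self.ready.append(fiber)  # yielded: requeue at the back
            except StopIteration:
                pass                      # fiber terminated
        return trace

def worker(name, steps):
    # A fiber, modeled as a generator; `yield` hands control back
    # to the scheduler loop.
    for i in range(steps):
        yield (name, i)

sched = RoundRobinScheduler()
sched.spawn(worker("a", 2))
sched.spawn(worker("b", 2))
trace = sched.run()
# Round-robin interleaving: [("a", 0), ("b", 0), ("a", 1), ("b", 1)]
```

A nested scheduler is then just a fiber that itself runs a loop like `run`, which is what makes mixing several policies in one application tractable.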
In addition to describing our framework, we illustrate its expressiveness with several popular scheduling techniques. We present a (mostly) modular approach to extending our schedulers to support cancellation, a mechanism that is essential for implementing eager and speculative parallelism. Finally, we evaluate our framework with a series of benchmarks and an analysis.
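The role of cancellation in speculative parallelism can be sketched as a tree of cancelable computations (class and field names here are assumptions for illustration, not Manticore's API): canceling a node transitively cancels every speculative computation nested beneath it, which is what a hierarchy of speculative computations requires.

```python
# Hedged sketch of hierarchical cancellation: cancelables form a tree,
# and canceling a parent also cancels its speculative descendants.
class Cancelable:
    def __init__(self, parent=None):
        self.canceled = False
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def cancel(self):
        # Mark this computation canceled, then propagate to every
        # speculative child spawned under it.
        self.canceled = True
        for child in self.children:
            child.cancel()

# Speculatively evaluate two branches; the loser of the race is
# canceled along with any work it spawned.
root = Cancelable()
branch_a = Cancelable(parent=root)
branch_b = Cancelable(parent=root)
nested = Cancelable(parent=branch_b)

branch_b.cancel()   # branch_a "won"; branch_b and its nested child stop
```

In a real implementation the `canceled` flag would be polled at safe points (or checked at preemptions) rather than consulted eagerly as here.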
Supplemental Material
Supplemental material for: A scheduling framework for general-purpose parallel languages
A scheduling framework for general-purpose parallel languages (ICFP '08)