Abstract
Research on parallel computing has historically revolved around compute-intensive applications drawn from traditional areas such as high-performance computing. With the growing availability and use of multicore chips, parallel computing now also encompasses interactive applications that mix compute-intensive tasks with interaction, e.g., with the user or, more generally, with the external world. Recent theoretical work on responsive parallelism presents abstract cost models and type systems for ensuring and reasoning about the responsiveness and throughput of such interactive parallel programs.
In this paper, we extend prior work by considering a crucial metric: fairness. To express rich interactive parallel programs, we allow programmers to assign priorities to threads and instruct the scheduler to obey a notion of fairness. We then propose the fairly prompt scheduling principle for executing such programs, which specifies how multithreaded programs are scheduled on multiple processors. For such schedules, we prove theoretical bounds on the execution and response times of jobs of various priorities. In particular, we bound the amount, i.e., the stretch, by which a low-priority job can be delayed by higher-priority work. We also present an algorithm designed to approximate the fairly prompt scheduling principle on multicore computers, implement the algorithm by extending the Standard ML language, and present an empirical evaluation. A small illustrative sketch of priority-annotated threading appears below.
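To make the programming model concrete, the following is a minimal Standard ML sketch of what priority-annotated threads with fairness weights might look like. The signature and names (PRIO_THREADS, newPriority, spawn, sync, the weight parameter) are illustrative assumptions for exposition, not the paper's actual API or the extended compiler's surface syntax.

signature PRIO_THREADS =
sig
  type priority
  type 'a thread

  (* Declare a priority level with a fairness weight (hypothetical);
   * the intent is that higher weights receive a larger guaranteed
   * share of processor time under a fair scheduler. *)
  val newPriority : {weight : int} -> priority

  (* Spawn a thread at the given priority; sync waits for its result. *)
  val spawn : priority -> (unit -> 'a) -> 'a thread
  val sync  : 'a thread -> 'a
end

(* Example use: a latency-sensitive interactive loop alongside
 * low-priority background compute, both guaranteed progress. *)
functor Example (T : PRIO_THREADS) =
struct
  val background  = T.newPriority {weight = 1}
  val interactive = T.newPriority {weight = 10}

  fun run heavyWork handleEvents =
    let
      val bg = T.spawn background heavyWork      (* long-running compute *)
      val ui = T.spawn interactive handleEvents  (* responds to events *)
    in
      (T.sync ui; T.sync bg)
    end
end

Under a fairly prompt scheduler, the high-priority thread would be served promptly while the low-priority thread's delay (its stretch) remains bounded, in line with the guarantees stated above.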