ABSTRACT
Control planes of cloud frameworks trade off scheduling granularity against performance. Centralized systems schedule at task granularity, but schedule only a few thousand tasks per second. Distributed systems schedule hundreds of thousands of tasks per second, but changing the schedule is costly.
We present execution templates, a control plane abstraction that can schedule hundreds of thousands of tasks per second while supporting fine-grained, per-task scheduling decisions. Execution templates leverage a program's repetitive control flow to cache blocks of frequently executed tasks. Executing a task in a template requires sending a single message. Large-scale scheduling changes install new templates, while small changes apply edits to existing templates.
Evaluations of execution templates in Nimbus, a data analytics framework, find that they provide the fine-grained scheduling flexibility of centralized control planes while matching the strong scaling of distributed ones. Execution templates support complex, real-world applications, such as a fluid simulation with a triply nested loop and data-dependent branches.
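The caching idea described above can be illustrated with a minimal Python sketch. It is not the Nimbus implementation: the `Task` and `Controller` classes, their method names, and the message accounting (one message per shipped task at install time, one per instantiation, one per edit) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str    # hypothetical label for the function a worker would run
    worker: int  # scheduling decision: which worker the task is placed on


@dataclass
class Controller:
    """Toy control plane that caches blocks of tasks as templates."""

    templates: Dict[str, List[Task]] = field(default_factory=dict)
    messages_sent: int = 0  # assumed cost model, see lead-in

    def install(self, template_id: str, tasks: List[Task]) -> None:
        # Large-scale change: ship the whole block, one message per task.
        self.templates[template_id] = list(tasks)
        self.messages_sent += len(tasks)

    def instantiate(self, template_id: str) -> List[Tuple[str, int]]:
        # Repeated execution: a single message names the cached block.
        self.messages_sent += 1
        return [(t.name, t.worker) for t in self.templates[template_id]]

    def edit(self, template_id: str, index: int, new_worker: int) -> None:
        # Small change: patch one task's placement inside the cached block.
        block = self.templates[template_id]
        block[index] = replace(block[index], worker=new_worker)
        self.messages_sent += 1


ctrl = Controller()
ctrl.install("inner_loop", [Task("gradient", 0), Task("gradient", 1), Task("reduce", 0)])
for _ in range(100):              # the loop body repeats, so the block is reused
    plan = ctrl.instantiate("inner_loop")
ctrl.edit("inner_loop", 1, new_worker=2)  # move one task without reinstalling
```

Under this assumed cost model, 100 iterations of a three-task block cost 3 install messages, 100 instantiation messages, and 1 edit message, rather than 300 per-task messages. The per-task placement stays adjustable through `edit`.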
Index Terms
Execution templates: caching control plane decisions for strong scaling of data analytics