research-article

Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing

Published: 19 March 2018

Abstract

In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly important because they typically achieve more than 10× speedups over traditional On-Disk cluster Computing (ODC) frameworks for iterative and interactive applications. Like ODC frameworks, IMC frameworks typically run the same program repeatedly on a given cluster, with an input dataset of similar size each time. Building a performance model for an IMC program is challenging for two reasons: 1) the performance of IMC programs is more sensitive to the size of the input dataset, which is difficult to incorporate into a performance model because of its complex effects on performance; and 2) IMC has far more performance-critical configuration parameters than ODC (more than 40 vs. around 10), and this high dimensionality requires more sophisticated models to achieve high accuracy. To address this challenge, we propose DAC, a datasize-aware auto-tuning approach that efficiently identifies the high-dimensional configuration that gives a given IMC program optimal performance on a given cluster. DAC is a significant advance over the state of the art because it can take both the input dataset size and 41 configuration parameters as inputs to the performance model of a given IMC program, which is unprecedented in previous work. This is made possible by two key techniques: 1) Hierarchical Modeling (HM), which combines a number of individual sub-models in a hierarchical manner; and 2) a Genetic Algorithm (GA), which searches for the optimal configuration. To evaluate DAC, we use six typical Spark programs, each with five different input dataset sizes. The evaluation results show that, compared to the default configurations, DAC improves the performance of these programs by a factor of 30.4x on average and by up to 89x.
We also report that the geometric-mean speedups of DAC over the default configurations, expert-tuned configurations, and configurations found by RFHOC are 15.4x, 2.3x, and 1.5x, respectively.
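The two key techniques can be illustrated with a small, self-contained sketch. The abstract does not say what the sub-models are or exactly how they are combined, so this sketch assumes one plausible reading: each level of the hierarchy is a one-split regression tree (a CART stump) fitted to the residual error left by the levels above it, and the prediction is the sum over all levels. A simple genetic algorithm then searches the configuration space using the model's predicted execution time as the fitness, while the input dataset size is held fixed because it is a model input rather than a tunable knob. The function names (`fit_stump`, `fit_hierarchical`, `predict`, `ga_search`), the synthetic ground truth `true_time`, the two normalized knobs, and all GA settings are hypothetical illustrations, not DAC's actual implementation.

```python
import random

def fit_stump(X, y):
    """One-split regression tree (a CART stump): choose the feature and
    threshold that minimize squared error when each side predicts its mean."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no valid split: fall back to the global mean
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_hierarchical(X, y, levels=20):
    """Hypothetical hierarchy: level k is fitted to the residual error left by
    levels 0..k-1, and the final prediction is the sum over all levels."""
    model, residual = [], list(y)
    for _ in range(levels):
        stump = fit_stump(X, residual)
        model.append(stump)
        residual = [r - stump_predict(stump, row) for r, row in zip(residual, X)]
    return model

def predict(model, row):
    return sum(stump_predict(s, row) for s in model)

def ga_search(model, datasize, bounds, pop=30, gens=40, seed=0):
    """Genetic search for the config with the lowest predicted time.
    The dataset size is a fixed model input (feature 0), not a gene."""
    rng = random.Random(seed)

    def cost(cfg):
        return predict(model, [datasize] + cfg)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        elite = population[: pop // 2]  # truncation selection; elites survive
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            # Gaussian mutation of one gene, clamped to that gene's bounds.
            i = rng.randrange(len(child))
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        population = elite + children
    return min(population, key=cost)

# Hypothetical ground truth: time grows with dataset size d, and two
# normalized knobs c1, c2 in [0, 1] have sweet spots at 0.7 and 0.3.
def true_time(d, c1, c2):
    return d * (1 + 4 * (c1 - 0.7) ** 2) + 2 * (c2 - 0.3) ** 2

grid = [i / 10 for i in range(11)]
X = [[d, c1, c2] for d in (1, 2, 3) for c1 in grid for c2 in grid]
y = [true_time(*row) for row in X]

model = fit_hierarchical(X, y)
bounds = [(0.0, 1.0), (0.0, 1.0)]
best = ga_search(model, datasize=2, bounds=bounds)
print("best config for d=2:", best, "true time:", true_time(2, *best))
```

Because the model is trained on runs at several dataset sizes, the same model can be queried for whichever size is passed to `ga_search`; the exact configuration returned depends on the seed and on how coarsely the stumps approximate the ground truth.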

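The abstract reports an average speedup over the default configurations (30.4x) and a geometric-mean speedup (15.4x). Assuming both figures summarize the same set of runs, the gap is expected: the geometric mean of positive values never exceeds their arithmetic mean, and it is far less dominated by a few very large ratios, which is the usual reason for reporting it for speedups. The sketch below shows both calculations; the speedup values are made up for illustration and are not DAC's measurements.

```python
from math import exp, log
from statistics import geometric_mean, mean

def geo_mean(speedups):
    """Geometric mean: exponentiate the mean of the natural logarithms."""
    return exp(sum(log(s) for s in speedups) / len(speedups))

# Hypothetical per-run speedups (baseline time / tuned time), NOT DAC data.
speedups = [2.0, 8.0, 4.0, 64.0]

print(mean(speedups))            # 19.5: pulled up by the single 64x outlier
print(geo_mean(speedups))        # about 8.0, the fourth root of 2*8*4*64 = 4096
print(geometric_mean(speedups))  # stdlib equivalent (Python 3.8+)
```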
References

  1. Faraz Ahmad, Srimat T Chakradhar, Anand Raghunathan, and TN Vijaykumar. 2014. ShuffleWatcher: Shuffle-aware Scheduling in Multitenant MapReduce Clusters. In Proceedings of USENIX Annual Technical Conference (ATC) (ATC'14). USENIX Association, Philadelphia, PA, 1-12.
  2. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O'Reilly, and Saman Amarasinghe. 2014. OpenTuner: An Extensible Framework for Program Autotuning. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT) (PACT'14). ACM Press, Edmonton, Canada, 303-316.
  3. Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1383-1394.
  4. Zhendong Bei, Zhibin Yu, Huiling Zhang, Wen Xiong, Chengzhong Xu, Lieven Eeckhout, and Shengzhong Feng. 2016. RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop's Configuration. IEEE Transactions on Parallel and Distributed Systems 27, 5 (June 2016), 1470-1483.
  5. Dazhao Cheng, Jia Rao, Yanfei Guo, and Xiaobo Zhou. 2014. Improving MapReduce Performance in Heterogeneous Environments with Adaptive Task Tuning. In Proceedings of the 15th International Middleware Conference (Middleware) (Middleware'14). USENIX Association, Bordeaux, France, 97-108.
  6. Tatsuhiro Chiba and Tamiya Onodera. 2015. Workload Characterization and Optimization of TPC-H Queries on Apache Spark. Technical Report. IBM Research - Tokyo, IBM Japan, Ltd.
  7. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the Symposium on Operating Systems Design and Implementation (OSDI) (OSDI'04). USENIX Association, San Francisco, CA, 137-150.
  8. Christina Delimitrou and Christos Kozyrakis. 2013. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (ASPLOS'13). ACM Press, Houston, TX, 77-88.
  9. Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (ASPLOS'14). ACM Press, Salt Lake City, UT, 1-12.
  10. Adem Efe Gencer, David Bindel, Emin Gun Sirer, and Robbert van Renesse. 2015. Configuring Distributed Computations Using Response Surfaces. In Proceedings of the annual ACM/IFIP/USENIX Middleware conference (Middleware) (Middleware'15). USENIX Association, Vancouver, Canada, 235-246.
  11. Robert Gentleman and Ross Ihaka. 2016. The R Project for Statistical Computing. (Sept. 2016). Retrieved January 20, 2018 from https://www.r-project.org/
  12. Herodotos Herodotou. 2011. Hadoop Performance Models. Technical Report CS-2011-05. Duke University, Durham, NC.
  13. Herodotos Herodotou and Shivnath Babu. 2011. Profiling, What-If Analysis, and Cost-Based Optimization of MapReduce programs. Proceedings of the VLDB Endowment 4, 11 (Jan. 2011), 1111-1122.
  14. Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, and Shivnath Babu. 2011. Starfish: A Self-tuning System for Big Data Analytics. In Proceedings of the Biennial International Conference on Innovative Data Systems Research (CIDR'11). CIDRDB, 261-272.
  15. Peng Huang, William J. Bolosky, Abhishek Singh, and Yuanyuan Zhou. 2015. ConfValley: A Systematic Configuration Validation Framework for Cloud Services. In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys) (EuroSys'15). USENIX Association, Bordeaux, France, 1-16.
  16. Cloudera Inc. 2016. Tuning Spark Applications. (June 2016). Retrieved January 20, 2018 from https://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_spark_tuning.html
  17. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys) (EuroSys'07). USENIX Association, Lisbon, Portugal, 59-72.
  18. Manoj Kumar, Mohammad Husian, Naveen Upreti, and Deepti Gupta. 2010. Genetic algorithm: Review and Application. International Journal of Information Technology and Knowledge Management 2, 2 (Jan. 2010), 451-454.
  19. Palden Lama and Xiaobo Zhou. 2012. AROMA: Automated Resource Allocation and Configuration of MapReduce Environment in the Cloud. In Proceedings of the 9th ACM International Conference on Autonomic Computing (ICAC) (ICAC'12). ACM Press, San Jose, CA, 63-72.
  20. Jacek Laskowski. 2016. Mastering Apache Spark. (Jan. 2016). Retrieved January 20, 2018 from https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dagscheduler-stages.html
  21. Benjamin C. Lee and David Brooks. 2010. Applied Inference: Case Studies in Micro-architectural Design. ACM Transactions on Architecture and Code Optimization (TACO) 7, 2 (Sept. 2010), 8:1-8:35.
  22. Roger J Lewis. 2000. An introduction to classification and regression tree (CART) analysis. In Proceedings of Annual Meeting of the Society for Academic Emergency Medicine. San Francisco, CA, 1-14.
  23. Haoyuan Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. 2014. Tachyon: Reliable, memory speed storage for cluster computing frameworks. In Proceedings of the ACM Symposium on Cloud Computing (SoCC) (SoCC'14). ACM Press, Seattle, WA, 1-15.
  24. Shen Li, Shaohan Hu, Shiguang Wang, Lu Su, Tarek Abdelzaher, Indranil Gupta, and Richard Pace. 2014. Woha: Deadline-aware map-reduce workflow scheduling framework over hadoop clusters. In Proceedings of the 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS) (ICDCS'14). IEEE, Madrid, Spain, 93-103.
  25. Guangdeng Liao, Kushal Datta, and Theodore L Willke. 2013. Gunther: Search-Based Auto-Tuning of MapReduce. In Proceedings of Euro-Par 2013 Parallel Processing (EuroPar'13). Springer, Berlin, Heidelberg, 406-419.
  26. Luo Lie. 2010. Heuristic Artificial Intelligent Algorithm for Genetic Algorithm. Key Engineering Materials 439 (May 2010), 516-521.
  27. Weiqing Liu, Jiannong Cao, Lei Yang, Lin Xu, Xuanjia Qiu, and Jing Li. 2017. AppBooster: Boosting the Performance of Interactive Mobile Applications with Computation Offloading and Parameter Tuning. IEEE Transactions on Parallel and Distributed Systems 28, 6 (June 2017), 1593-1606.
  28. Zhaolei Liu and TS Eugene Ng. 2017. Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks. IEEE Transactions on Parallel and Distributed Systems 28, 1 (March 2017), 128-140.
  29. Martin Maas, Tim Harris, Krste Asanovic, and John Kubiatowicz. 2015. Trash Day: Coordinating Garbage Collection in Distributed Systems. In Proceedings of the 15th USENIX Workshop on Hot Topics in Operating Systems (HotOS) (HotOS XV). USENIX Association, Kartause Ittingen, Switzerland, 1-6.
  30. Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. The Journal of Machine Learning Research 17, 1 (Jan. 2016), 1-7.
  31. Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Shan Lu, Sanazsadat Alamian, and Onur Mutlu. 2016. Yak: A high-performance big-data-friendly garbage collector. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI) (OSDI'16). USENIX Association, Savannah, GA, 349-365.
  32. Andrew Or and Josh Rosen. 2016. Unified Memory Management in Spark 1.6. (Jan. 2016). Retrieved January 20, 2018 from https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-spark-10000.pdf
  33. Kay Ousterhout, Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun. 2015. Making Sense of Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI) (NSDI'15). USENIX Association, Oakland, CA, 293-307.
  34. Pankaj. 2017. Java (JVM) Memory Model - Memory Management in Java. (March 2017). Retrieved January 20, 2018 from http://www.journaldev.com/2856/java-jvm-memory-model-memory-management-in-java
  35. Simone Pellegrini, Radu Prodan, and Thomas Fahringer. 2012. Tuning MPI Runtime Parameter Setting for High Performance Computing. In Proceedings of IEEE International Conference on Cluster Computing Workshops. IEEE Computer Society, Washington, DC, 213-221.
  36. Zujie Ren, Xianghua Xu, Jian Wan, Weisong Shi, and Min Zhou. 2012. Workload Characterization on a Production Hadoop Cluster: A Case Study on Taobao. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC) (IISWC'12). IEEE Computer Society, San Diego, CA, 1-11.
  37. Anooshiravan Saboori, Guofei Jiang, and Haifeng Chen. 2008. Autotuning Configurations in Distributed Systems for Performance Improvements using Evolutionary Strategies. In Proceedings of the 28th International Conference on Distributed Computing Systems (ICDCS) (ICDCS'08). IEEE Computer Society, Beijing, China, 769-776.
  38. Juwei Shi, Yunjie Qiu, Umar Farooq Minhas, Limei Jiao, Chen Wang, Berthold Reinwald, and Fatma Ozcan. 2015. Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics. Proceedings of the VLDB Endowment 8, 13 (2015), 2110-2121.
  39. Xueyuan Su, Garret Swart, Brian Goetz, Brian Oliver, and Paul Sandoz. 2014. Changing engines in midstream: A java stream computational model for big data processing. Proceedings of the VLDB Endowment 7, 13 (Sept. 2014), 1343-1354.
  40. Apache HBase Team. 2016. Apache HBase. (June 2016). Retrieved January 20, 2018 from http://hadoop.apache.org/hbase/
  41. Apache Spark Team. 2016. Apache Spark. (March 2016). Retrieved January 20, 2018 from http://spark.apache.org/
  42. Apache Spark Team. 2016. Spark Configuration. (May 2016). Retrieved January 20, 2018 from http://spark.apache.org/docs/latest/configuration.html
  43. Apache Spark Team. 2016. Tuning Spark. (June 2016). Retrieved January 20, 2018 from http://spark.apache.org/docs/latest/tuning.html
  44. Spark Streaming Team. 2016. Spark Streaming. (March 2016). Retrieved January 20, 2018 from http://spark.apache.org/streaming/
  45. Tom White. 2012. Hadoop: The definitive guide. O'Reilly Media, Inc.
  46. Virginia Torczon and Michael W Trosset. n.d. From Evolutionary Operation to Parallel Direct Search: Pattern Search Algorithms for Numerical Optimization. Computing Science and Statistics 29 (n.d.).
  47. Typesafe Inc. 2015. Apache Spark Survey from Typesafe. (Jan. 2015). Retrieved January 20, 2018 from https://dzone.com/articles/apache-spark-survey-typesafe-0
  48. Md. Wasi ur Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda. 2016. MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers. In Proceedings of 2016 IEEE 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'16). IEEE Computer Society, Los Angeles, CA, 198-205.
  49. Guolu Wang, Jungang Xu, and Ben He. 2016. A Novel Method for Tuning Configuration parameters of Spark Based on Machine Learning. In Proceedings of the 2016 IEEE 18th International Conference on High Performance Computing and Communications (HPCC) (HPCC'16). IEEE Computer Society, Sydney, Australia, 586-593.
  50. Jingjing Wang and Magdalena Balazinska. 2016. Toward elastic memory management for cloud data analytics. In Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond. ACM Press, San Francisco, CA, 1-7.
  51. Reynold S. Xin, Joseph E. Gonzalez, Michael J. Franklin, and Ion Stoica. 2013. GraphX: A Resilient Distributed Graph System on Spark. In Proceedings of the First International Workshop on Graph Data Management Experience and Systems. 1-5.
  52. Wen Xiong, Zhibin Yu, Lieven Eeckhout, Zhengdong Bei, Fan Zhang, and Chengzhong Xu. 2015. SZTS: A Novel Big Data Transportation System Benchmark Suite. In Proceedings of the 44th International Conference on Parallel Processing (ICPP) (ICPP'15). IEEE, Beijing, China, 819-828.
  53. Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. Hey, You Have Given Me Too Many Knobs. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering. ACM Press, Bergamo, Italy, 307-319.
  54. Tianyin Xu, Jiaqi Zhang, Peng Huang, Jing Zheng, Tianwei Sheng, Ding Yuan, Yuanyuan Zhou, and Shankar Pasupathy. 2013. Do Not Blame Users for Misconfigurations. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (SOSP'13). USENIX Association, Farmington, Pennsylvania, 244-259.
  55. Tianyin Xu and Yuanyuan Zhou. 2015. Systems Approaches to Tackling Configuration Errors: A Survey. Comput. Surveys 47, 4 (July 2015), 1-41.
  56. Tao Ye and Shivkumar Kalyanaraman. n.d. A Recursive Random Search Algorithm for Large-Scale Network Parameter Configuration. ACM SIGMETRICS Performance Evaluation Review 31, 1 (n.d.).
  57. Nezih Yigitbasi, Theodore L. Willke, Guangdeng Liao, and Dick H. J. Epema. 2013. Towards Machine Learning-Based Auto-tuning of MapReduce. In Proceedings of the 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) (MASCOTS'13). IEEE Computer Society, San Francisco, CA, 11-20.
  58. Zuoning Yin, Xiao Ma, Jing Zheng, Yuanyuan Zhou, Lakshmi N. Bairavasundaram, and Shankar Pasupathy. 2011. An Empirical Study on Configuration Errors in Commercial and Open Source Systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP) (SOSP'11). USENIX Association, Cascais, Portugal, 159-172.
  59. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud) (HotCloud'10). USENIX Association, Boston, MA, 1-8.
  60. Jiaqi Zhang, Lakshminarayanan Renganarayana, Xiaolan Zhang, Niyu Ge, Vasanth Bala, Tianyin Xu, and Yuanyuan Zhou. 2014. EnCore: Exploiting System Environment and Correlation Information for Misconfiguration Detection. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (ASPLOS'14). ACM Press, Salt Lake City, UT, 687-700.
  61. Yao Zhao, Fei Hu, and Haopeng Chen. 2016. An Adaptive Tuning Strategy on Spark Based on In-memory Computation Characteristics. In Proceedings of the 18th International Conference on Advanced Communication Technology (ICACT) (ICACT'16). PyeongChang, Korea (South), 484-488.

      Published in

        ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18)
        February 2018
        809 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/3296957

        ACM Conferences
        ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
        March 2018
        827 pages
        ISBN: 9781450349116
        DOI: 10.1145/3173162

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
