research-article

Understanding and Auto-Adjusting Performance-Sensitive Configurations

Published: 19 March 2018

Abstract

Modern software systems are often equipped with hundreds to thousands of configurations, many of which greatly affect performance. Unfortunately, setting these configurations properly is challenging for developers because system workloads and environments are complex and dynamic. In this paper, we first conduct an empirical study to understand performance-sensitive configurations and the challenges of setting them in the real world. Guided by our study, we design a systematic and general control-theoretic framework, SmartConf, to automatically set and dynamically adjust performance-sensitive configurations to meet required operating constraints while optimizing other performance metrics. Evaluation shows that SmartConf is effective in solving real-world configuration problems, often providing better performance than even the best static configuration developers can choose under existing configuration systems.
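To make the idea of dynamically adjusting a configuration against a constraint more concrete, the following is a minimal sketch of a generic feedback controller. It is not SmartConf's actual design or API, which the abstract does not specify. The names `make_controller` and `simulate`, the linear memory model, and all numeric values are hypothetical and chosen only for illustration.

```python
def make_controller(goal, gain, pole=0.5, lo=1.0, hi=1000.0):
    """Build a step function for a simple integral-style controller.

    goal: target value of the constrained metric (assumed: memory in MB).
    gain: assumed change in the metric per unit change in the configuration.
    pole: value in [0, 1); smaller values react faster, larger ones are smoother.
    lo, hi: hypothetical legal range for the configuration value.
    """
    def step(conf, measured):
        error = goal - measured
        # Move the configuration by a fraction of the change needed to close the gap.
        new_conf = conf + (1.0 - pole) * error / gain
        # Clamp to the legal range of the configuration.
        return max(lo, min(hi, new_conf))
    return step


def simulate(steps=30):
    """Run the controller against a hypothetical linear system model."""
    # Assumed plant: memory (MB) = 2.0 * queue_size + 100.0
    true_gain, base = 2.0, 100.0
    goal = 500.0
    step = make_controller(goal, gain=true_gain)
    conf = 10.0  # deliberately conservative starting value
    history = []
    for _ in range(steps):
        measured = true_gain * conf + base
        history.append((conf, measured))
        conf = step(conf, measured)
    return history


if __name__ == "__main__":
    for conf, measured in simulate()[-3:]:
        print(f"queue_size={conf:.3f} memory_mb={measured:.3f}")
```

In this toy model the gap between the measured metric and the goal shrinks by the factor `pole` at every step, so the configuration settles at the largest value that still respects the constraint. A real system would have a noisy, nonlinear, and time-varying relationship between configuration and metric, which this sketch does not attempt to capture.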



Published in

ACM SIGPLAN Notices, Volume 53, Issue 2 (ASPLOS '18), February 2018, 809 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3296957

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, March 2018, 827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
