Abstract
Modern software systems are often equipped with hundreds to thousands of configurations, many of which greatly affect performance. Unfortunately, properly setting these configurations is challenging for developers due to the complex and dynamic nature of system workload and environment. In this paper, we first conduct an empirical study to understand performance-sensitive configurations and the challenges of setting them in the real world. Guided by our study, we design a systematic and general control-theoretic framework, SmartConf, to automatically set and dynamically adjust performance-sensitive configurations to meet required operating constraints while optimizing other performance metrics. Evaluation shows that SmartConf is effective in solving real-world configuration problems, often providing better performance than even the best static configuration developers can choose under existing configuration systems.
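To make the idea of dynamically adjusting a configuration against an operating constraint concrete, the following is a minimal sketch of a generic feedback loop. It is not SmartConf's algorithm or API, which the abstract does not describe. The class `IntegralController`, the toy model `toy_memory_mb`, the 2 MB-per-slot cost, and all numeric values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntegralController:
    """Hypothetical controller that steers one numeric setting toward a goal."""
    goal: float   # constraint on the measured metric (here: memory in MB)
    gain: float   # assumed metric change per unit change of the setting
    pole: float   # 0 <= pole < 1; larger values react more slowly
    lo: float     # smallest allowed setting
    hi: float     # largest allowed setting

    def next_setting(self, setting: float, measured: float) -> float:
        # Move the setting by a fraction of the error, scaled by the gain.
        error = self.goal - measured
        proposed = setting + (1.0 - self.pole) * error / self.gain
        # Keep the setting inside its legal range.
        return min(self.hi, max(self.lo, proposed))


def toy_memory_mb(queue_size: float, background_mb: float) -> float:
    """Stand-in for a real system: each queue slot is assumed to cost 2 MB."""
    return background_mb + 2.0 * queue_size


def run(steps: int = 30) -> list[float]:
    """Simulate the loop and return the memory measured at every step."""
    controller = IntegralController(goal=500.0, gain=2.0, pole=0.5,
                                    lo=1.0, hi=1000.0)
    queue_size = 50.0  # initial value of the (hypothetical) configuration
    history = []
    for step in range(steps):
        # The workload shifts halfway through: background memory jumps.
        background = 100.0 if step < 15 else 300.0
        measured = toy_memory_mb(queue_size, background)
        history.append(measured)
        queue_size = controller.next_setting(queue_size, measured)
    return history


if __name__ == "__main__":
    for step, mem in enumerate(run()):
        print(f"step {step:2d}: memory = {mem:7.2f} MB")
```

In this toy setting, a static queue size tuned for the first workload phase would exceed the 500 MB goal once the background load rises, whereas the loop re-converges on the goal after the shift. A real system would need a measured gain and a metric that responds less predictably than this linear model.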
Understanding and Auto-Adjusting Performance-Sensitive Configurations
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems