Abstract
A key challenge in developing large-scale applications (large in both system size and input size) is finding bugs that are latent at the small scales of testing and manifest only when a program is deployed at large scales. Traditional statistical techniques fail because no error-free run is available at deployment scales for training purposes. Prior work used scaling models to detect anomalous behavior at large scales without being trained on correct behavior at that scale. However, that work cannot localize bugs automatically. In this paper, we extend that work in three ways: (i) we develop an automatic diagnosis technique based on feature reconstruction; (ii) we design a heuristic to effectively prune the feature space; and (iii) we validate our design through one fault-injection study, finding that our system can effectively localize bugs in a majority of cases.
WuKong: effective diagnosis of bugs at large system scales
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming






