Abstract
Multi-core systems are rapidly becoming more prevalent. Consequently, developers frequently face performance bugs caused by unexpected interactions between parallel software components. The location of these bugs is difficult to identify with current tools. Indeed, the process exhibiting the slowness may be separated from the root cause of the problem by a blocking chain involving several other processes.
This article introduces a new approach for analyzing blocking on multi-core systems and reports on its implementation in the LTTV Delay Analyzer. The tool lets developers quickly understand the dependencies among processes and see how the total elapsed time is divided into its main components. The LTTV Delay Analyzer was used to analyze and rapidly correct complex performance problems, something not possible with existing tools. The Linux Trace Toolkit Next Generation (LTTng) is used for most of the instrumentation and for trace recording, which allows production systems to be traced with great accuracy and minimal impact. The approach relies solely on kernel instrumentation and does not require instrumenting or recompiling processes. Analysis time is linear with respect to trace size.
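To make the idea of a blocking chain and a linear-time analysis concrete, here is a minimal sketch. It is not the LTTV Delay Analyzer's implementation. The event kinds (`"block"`, `"wakeup"`), the tuple layout, the nanosecond timestamps, and the function names are all assumptions chosen for illustration. The sketch makes one pass over time-ordered events, charges each blocked interval to the process that issued the wakeup, and then follows the dominant waker from one process to the next.

```python
from collections import defaultdict

def analyze_blocking(events):
    """One pass over time-ordered (ts_ns, kind, pid, waker) tuples.

    Hypothetical event model: "block" means pid stopped running to wait;
    "wakeup" means pid was made runnable again by process `waker`.
    Returns {pid: {waker: total_blocked_ns}}.
    """
    blocked_since = {}                                  # pid -> time it blocked
    blocked_on = defaultdict(lambda: defaultdict(int))  # pid -> waker -> ns
    for ts, kind, pid, waker in events:
        if kind == "block":
            blocked_since[pid] = ts
        elif kind == "wakeup" and pid in blocked_since:
            # Charge the whole blocked interval to the process that woke us.
            blocked_on[pid][waker] += ts - blocked_since.pop(pid)
    return {pid: dict(wakers) for pid, wakers in blocked_on.items()}

def blocking_chain(blocked_on, pid):
    """Follow the waker responsible for the most blocked time, hop by hop."""
    chain, seen = [pid], {pid}
    while pid in blocked_on:
        pid = max(blocked_on[pid], key=blocked_on[pid].get)
        if pid in seen:          # stop on a cycle rather than loop forever
            break
        chain.append(pid)
        seen.add(pid)
    return chain

# Process 100 waits on 200, which in turn waits on 300.
events = [
    (0,    "block",  100, None),
    (500,  "block",  200, None),
    (2000, "wakeup", 200, 300),
    (2500, "wakeup", 100, 200),
]
result = analyze_blocking(events)
chain = blocking_chain(result, 100)
```

In this toy trace the slow process is 100, but the chain ends at process 300, which illustrates how the root cause can sit several hops away from the visible symptom. Each event is handled once with constant-time dictionary operations, so the pass is linear in the number of events.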