Fay: Extensible Distributed Tracing from Kernels to Clusters

Published: 01 November 2012

Abstract

Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, including even untrusted, fully optimized machine code, and Fay can be applied to running user-mode or kernel-mode software without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events.

We have implemented the Fay tracing platform for Windows and integrated it with two powerful, expressive systems for distributed programming. Our implementation is easy to use, can be applied to unmodified production systems, and provides primitives that allow the overhead of tracing to be greatly reduced, compared to previous dynamic tracing platforms. To show the generality of Fay tracing, we reimplement, in experiments, a range of tracing strategies and several custom mechanisms from existing tracing frameworks.

Fay shows that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Revisiting a lesson from the late 1960s [Deutsch and Grant 1971], Fay also demonstrates the efficiency and extensibility benefits of using safe, statically verified machine code as the basis for low-level execution tracing. Finally, Fay establishes that, by automatically deriving optimized query plans and code for safe extensions, the expressiveness and performance of high-level tracing queries can equal or even surpass that of specialized monitoring tools.
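The two-level aggregation described in the abstract (within each machine first, then across the cluster) can be illustrated with a small sketch. This is not Fay's interface: Python is used purely for illustration, and the event fields, function names, and sample data below are all hypothetical. The sketch only shows why filtering and counting close to the event source keeps the data that crosses machine boundaries small.

```python
from collections import Counter

def aggregate_on_machine(events, predicate, key):
    """Filter and count events locally, so only compact counts leave the machine."""
    counts = Counter()
    for event in events:
        if predicate(event):
            counts[key(event)] += 1
    return counts

def aggregate_across_cluster(per_machine_counts):
    """Merge the per-machine partial counts into one cluster-wide result."""
    total = Counter()
    for counts in per_machine_counts:
        total.update(counts)
    return total

# Hypothetical trace events from two machines (made-up data).
machine_a = [
    {"function": "read", "process": "db"},
    {"function": "write", "process": "db"},
    {"function": "read", "process": "web"},
]
machine_b = [
    {"function": "read", "process": "db"},
    {"function": "read", "process": "db"},
    {"function": "open", "process": "web"},
]

# "Query": count calls per function, restricted to the "db" process.
is_db = lambda e: e["process"] == "db"
by_function = lambda e: e["function"]

partials = [aggregate_on_machine(m, is_db, by_function) for m in (machine_a, machine_b)]
total = aggregate_across_cluster(partials)
print(dict(total))
```

Because counting is associative and commutative, the partial results can be merged in any order, which is what makes this kind of hierarchical aggregation possible.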

References

  1. Ansel, J., Marchenko, P., Erlingsson, Ú., Taylor, E., Chen, B., Schuff, D. L., Sehr, D., Biffle, C. L., and Yee, B. 2011. Language-independent sandboxing of just-in-time compilation and self-modifying code. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
  2. Apache. Hadoop project. http://hadoop.apache.org/.
  3. Avgustinov, P., Tibble, J., Bodden, E., Hendren, L., Lhotak, O., de Moor, O., Ongkingco, N., and Sittampalam, G. 2006. Efficient trace monitoring. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA).
  4. Balazinska, M., Balakrishnan, H., Madden, S., and Stonebraker, M. 2005. Fault-tolerance in the Borealis distributed stream processing system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD).
  5. Barham, P., Donnelly, A., Isaacs, R., and Mortier, R. 2004. Using Magpie for request extraction and workload modelling. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
  6. Bershad, B. N., Savage, S., Pardyak, P., Becker, D., Fiuczynski, M., and Sirer, E. G. 1995. Protection is a software issue. In Proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS-V).
  7. Bhatia, S., Kumar, A., Fiuczynski, M. E., and Peterson, L. 2008. Lightweight, high-resolution monitoring for troubleshooting production systems. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
  8. Bungale, P. P. and Luk, C.-K. 2007. PinOS: A programmable framework for whole-system dynamic instrumentation. In Proceedings of the 3rd International ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environment (VEE).
  9. Burrows, M., Erlingsson, Ú., Leung, S.-T. A., Vandevoorde, M. T., Waldspurger, C. A., Walker, K., and Weihl, W. E. 2000. Efficient and flexible value sampling. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
  10. Cantrill, B. 2006. Hidden in plain sight. ACM Queue 4.
  11. Cantrill, B. M., Shapiro, M. W., and Leventhal, A. H. 2004. Dynamic instrumentation of production systems. In Proceedings of the USENIX Annual Technical Conference.
  12. Cao, Q., Abdelzaher, T., Stankovic, J., Whitehouse, K., and Luo, L. 2008. Declarative tracepoints: A programmable and application independent debugging system for wireless sensor networks. In Proceedings of the International Conference on Embedded Networked Sensor Systems (SenSys).
  13. Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R. R., Bradshaw, R., and Weizenbaum, N. 2010. FlumeJava: Easy, efficient data-parallel pipelines. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation (PLDI).
  14. Dean, J. and Ghemawat, S. 2010. MapReduce: A flexible data processing tool. Comm. ACM 53, 1.
  15. Deutsch, P. and Grant, C. A. 1971. A flexible measurement tool for software systems. In Proceedings of the IFIP Congress 71.
  16. Eclipse. Callgraph plug-in. http://wiki.eclipse.org/Linux_Tools_Project/Callgraph/User_Guide.
  17. Eigler, F. C. 2010. Systemtap tutorial. http://sourceware.org/systemtap/tutorial/.
  18. Erlingsson, Ú., Abadi, M., Vrable, M., Budiu, M., and Necula, G. C. 2006a. XFI: Software guards for system address spaces. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
  19. Erlingsson, Ú., Manasse, M., and McSherry, F. 2006b. A cool and practical alternative to traditional hash tables. In Proceedings of the Workshop on Distributed Data and Structures.
  20. Etsion, Y., Tsafrir, D., Kirkpatrick, S., and Feitelson, D. G. 2007. Fine grained kernel logging with KLogger: Experience and insights. In Proceedings of the 2007 EuroSys Conference.
  21. flume. Flume: Open source log collection system. http://github.com/cloudera/flume.
  22. Gao, D., Jensen, S., Snodgrass, R. T., and Soo, M. D. 2005. Join operations in temporal databases. Int. J. Very Large Datab. (VLDB Journal) 14, 2.
  23. Glerum, K., Kinshumann, K., Greenberg, S., Aul, G., Orgovan, V., Nichols, G., Grant, D., Loihle, G., and Hunt, G. 2009. Debugging in the (very) large: Ten years of implementation and experience. In Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP’09).
  24. Goldsmith, S. F., O’Callahan, R., and Aiken, A. 2005. Relational queries over program traces. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA’05).
  25. Gupta, A., Mumick, I. S., and Subrahmanian, V. S. 1993. Maintaining views incrementally. In Proceedings of the ACM International Conference on Management of Data.
  26. Hunt, G. and Brubacher, D. 1998. Detours: Binary interception of Win32 functions. In Proceedings of the USENIX Windows NT Symposium.
  27. Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. 2007. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys.
  28. Lee, G. L., Schulz, M., Ahn, D. H., Bernat, A., de Supinski, B. R., Ko, S. Y., and Rountree, B. 2007. Dynamic binary instrumentation and data aggregation on large scale systems. Int. J. Parall. Prog. 35, 3.
  29. Liblit, B., Aiken, A., Zheng, A. X., and Jordan, M. I. 2003. Bug isolation via remote program sampling. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI) 38, 5.
  30. Marguerie, F., Eichert, S., and Wooley, J. 2008. LINQ in Action. Manning Publications Co.
  31. Marian, T., Sagar, A., Chen, T., and Weatherspoon, H. 2011. Fmeter: Extracting indexable low-level system signatures by counting kernel function calls. Tech. rep., Cornell University, Computing and Information Science. http://hdl.handle.net/1813/23568.
  32. Martin, M., Livshits, B., and Lam, M. S. 2005. Finding application errors and security flaws using PQL: A program query language. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA’05).
  33. Massie, M. L., Chun, B. N., and Culler, D. E. 2003. The Ganglia distributed monitoring system: Design, implementation and experience. Int. J. Parall. Comput. 30.
  34. McSherry, F., Yu, Y., Budiu, M., Isard, M., and Fetterly, D. 2011. Scaling Up Machine Learning. Cambridge Univ. Press.
  35. Microsoft Corp. Determine which queries are holding locks. MSDN. http://msdn.microsoft.com/en-us/library/bb677357.aspx.
  36. Microsoft Corp. 2003. Introduction to hotpatching. Microsoft TechNet.
  37. Microsoft Corp. 2006. Kernel patch protection: Frequently asked questions. Windows Hardware Developer Central. http://www.microsoft.com/whdc/driver/kernel/64bitpatch_FAQ.mspx.
  38. Microsoft Corp. 2010. WDK and developer tools. Windows Hardware Developer Central. http://www.microsoft.com/whdc/DevTools/default.mspx.
  39. Microsoft Corp. 2011a. Diagnosing and resolving latch contention on SQL Server. Microsoft Download Center. http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=%%26665.
  40. Microsoft Corp. 2011b. Introducing SQL Server extended events. MSDN. http://msdn.microsoft.com/en-us/library/bb630354.aspx.
  41. Microsoft Corp. 2011c. Use the Microsoft Symbol Server to obtain debug symbol files. http://support.microsoft.com/kb/311503.
  42. Microsoft Corp. 2012. Microsoft StreamInsight. MSDN. http://msdn.microsoft.com/en-us/library/ee362541.aspx.
  43. Morrisett, G., Walker, D., Crary, K., and Glew, N. 1998. From System F to typed assembly language. In Proceedings of the Symposium on Principles of Programming Languages (POPL).
  44. Necula, G. C. 1997. Proof-carrying code. In Proceedings of the Symposium on Principles of Programming Languages (POPL).
  45. Nethercote, N. and Seward, J. 2007. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of the Conference on Programming Language Design and Implementation (PLDI).
  46. Oney, W. 2002. Programming the Microsoft Windows Driver Model. Microsoft Press.
  47. Park, I. and Buch, R. 2007. Improve debugging and performance tuning with ETW. MSDN Magazine.
  48. Passing, J., Schmidt, A., von Lowis, M., and Polze, A. 2009. NTrace: Function boundary tracing for Windows on IA-32. In Proceedings of the Working Conference on Reverse Engineering.
  49. Peter, S., Baumann, A., Roscoe, T., Barham, P., and Isaacs, R. 2008. 30 seconds is not enough!: A study of operating system timer usage. In Proceedings of the 2008 EuroSys Conference.
  50. Pietrek, M. 1997. A crash course on the depths of Win32 structured exception handling. Microsoft Syst. J.
  51. Prasad, V., Cohen, W., Eigler, F. C., Hunt, M., Keniston, J., and Chen, B. 2005. Locating system problems using dynamic instrumentation. In Proceedings of the Ottawa Linux Symposium.
  52. Ren, G., Tune, E., Moseley, T., Shi, Y., Rus, S., and Hundt, R. 2010. Google-wide profiling: A continuous profiling infrastructure for data centers. IEEE Micro 30, 4.
  53. Romer, T. H., Lee, D., Voelker, G. M., Wolman, A., Wong, W. A., Baer, J.-L., Bershad, B. N., and Levy, H. M. 1996. The structure and performance of interpreters. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
  54. Rostedt, S. 2009. Debugging the kernel using Ftrace. lwn.net.
  55. Russinovich, M. E., Solomon, D. A., and Ionescu, A. 2009. Microsoft Windows Internals. Microsoft Press.
  56. Sigelman, B. H., Barroso, L. A., Burrows, M., Stephenson, P., Plakal, M., Beaver, D., Jaspan, S., and Shanbhag, C. 2010. Dapper, a large-scale distributed systems tracing infrastructure. Tech. rep. 2010-1, Google Inc.
  57. Skadron, K., Ahuja, P. S., Martonosi, M., and Clark, D. W. 1998. Improving prediction for procedure returns with return-address-stack repair mechanisms. In Proceedings of the Annual ACM/IEEE International Symposium on Microarchitecture (MICRO).
  58. Small, C. and Seltzer, M. I. 1998. MiSFIT: Constructing safe extensible systems. IEEE Concurr.: Parall. Distrib. Mobile Comput. 6, 3.
  59. Sookoor, T., Hnat, T., Hooimeijer, P., Weimer, W., and Whitehouse, K. 2009. Macrodebugging: Global views of distributed program execution. In Proceedings of the International Conference on Embedded Networked Sensor Systems (SenSys).
  60. Srivastava, A., Edwards, A., and Vo, H. 2001. Vulcan: Binary transformation in a distributed environment. Tech. rep. MSR-TR-2001-50, Microsoft Research.
  61. Stanek, W. 2009. Windows PowerShell(TM) 2.0 Administrator’s Pocket Consultant. Microsoft Press.
  62. Strosaker, M. Sample real-world use of systemtap. http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/.
  63. SystemTap. Examples. http://sourceware.org/systemtap/examples/.
  64. SystemTap. 2006. Bug 2725: function(“*”) probes sometimes crash & burn. http://sources.redhat.com/bugzilla/show_bug.cgi?id=2725.
  65. Varghese, G. and Lauck, A. 1997. Hashed and hierarchical timing wheels. IEEE/ACM Trans. Netw. 5, 6.
  66. Verbowski, C., Kiciman, E., Kumar, A., Daniels, B., Lu, S., Lee, J., Wang, Y.-M., and Roussev, R. 2006. Flight data recorder: Monitoring persistent-state interactions to improve systems management. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
  67. Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L. 1993. Efficient software-based fault isolation. In Proceedings of the 14th ACM Symposium on Operating System Principles (SOSP’93).
  68. Wisniewski, R. W. and Rosenburg, B. 2003. Efficient, unified, and scalable performance monitoring for multiprocessor operating systems. In Supercomputing.
  69. Woodard, D. B. and Goldszmidt, M. 2009. Model-based clustering for online crisis identification in distributed computing. Tech. rep. TR-2009-131, MSR.
  70. Yee, B., Sehr, D., Dardyk, G., Chen, J. B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., and Fullagar, N. 2010. Native client: A sandbox for portable, untrusted x86 native code. Comm. ACM 53, 1, 91-99.
  71. Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Kumar, P. G., and Currey, J. 2008. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the Conference on Operating System Design and Implementation (OSDI).
  72. Yu, Y., Gunda, P. K., and Isard, M. 2009. Distributed aggregation for data-parallel computing: Interfaces and implementations. In Proceedings of the 22nd ACM Symposium on Operating System Principles (SOSP’09).

Published in

ACM Transactions on Computer Systems, Volume 30, Issue 4
November 2012
136 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2382553

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 November 2012
• Accepted: 1 August 2012
• Received: 1 July 2012

Qualifiers

• research-article
• Research
• Refereed
