Research article, ASPLOS conference proceedings. DOI: 10.1145/1736020.1736051

ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

Published: 13 March 2010

ABSTRACT

Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and security exploits. Because such lifeguards impose a 10-100X slowdown on existing platforms, previous studies have proposed hardware designs to accelerate lifeguard processing. However, these accelerators are either tailored to a specific class of lifeguards or suitable only for monitoring single-threaded programs.

We present ParaLog, the first design of a system enabling fast online parallel monitoring of multithreaded applications. ParaLog supports a broad class of software-defined lifeguards. We show how three existing accelerators can be enhanced to support online multithreaded monitoring, dramatically reducing lifeguard overheads. We identify and solve several challenges in monitoring parallel applications and/or parallelizing these accelerators, including (i) enforcing inter-thread data dependences, (ii) dealing with inter-thread effects that are not reflected in coherence traffic, (iii) dealing with unmonitored operating system activity, and (iv) ensuring lifeguards can access shared metadata with negligible synchronization overheads. We present our system design for both Sequentially Consistent and Total Store Ordering processors. We implement and evaluate our design on a 16-core simulated CMP, using benchmarks from SPLASH-2 and PARSEC and two lifeguards: a data-flow tracking lifeguard and a memory-access checker lifeguard. Our results show that (i) our parallel accelerators improve performance by 2-9X and 1.13-3.4X for our two lifeguards, respectively, (ii) we are 5-126X faster than the time-slicing approach required by existing techniques, and (iii) our average overheads for applications with eight threads are 51% and 28% for the two lifeguards, respectively.
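To make challenge (i) concrete, the following is a minimal, software-only sketch of why a data-flow tracking lifeguard must respect inter-thread data dependences. It is not ParaLog's mechanism, and it does not model the hardware or the accelerators. It assumes each application thread produces a log of events, and that a cross-thread dependence is recorded as an arc naming the producer thread and record index. The names `Event`, `dep`, and `run_lifeguards` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Event:
    op: str                                  # "taint", "load", or "store"
    addr: str                                # hypothetical address label
    reg: str = ""                            # register used by load/store
    dep: Optional[Tuple[int, int]] = None    # assumed arc: (producer thread, record index)


def run_lifeguards(logs):
    """Simulate one lifeguard per log, round-robin, stalling on unmet arcs."""
    progress = [0] * len(logs)               # records consumed per lifeguard
    mem_taint = {}                           # shared metadata: address -> tainted?
    reg_taint = [dict() for _ in logs]       # private per-thread register metadata
    order = []                               # (thread, record index) in processing order
    while any(progress[t] < len(logs[t]) for t in range(len(logs))):
        advanced = False
        for t, log in enumerate(logs):
            if progress[t] == len(log):
                continue                     # this lifeguard has finished its log
            ev = log[progress[t]]
            if ev.dep is not None:
                src, idx = ev.dep
                if progress[src] <= idx:
                    continue                 # producer record not processed yet: stall
            if ev.op == "taint":
                mem_taint[ev.addr] = True
            elif ev.op == "load":
                reg_taint[t][ev.reg] = mem_taint.get(ev.addr, False)
            elif ev.op == "store":
                mem_taint[ev.addr] = reg_taint[t].get(ev.reg, False)
            order.append((t, progress[t]))
            progress[t] += 1
            advanced = True
        if not advanced:
            raise RuntimeError("no lifeguard can advance: cyclic or invalid arcs")
    return mem_taint, order


# Thread 0 taints A and copies it to B; thread 1 reads B and copies it to C.
# The arc (0, 2) says thread 1's load depends on thread 0's store to B.
logs = [
    [Event("taint", "A"), Event("load", "A", "r1"), Event("store", "B", "r1")],
    [Event("load", "B", "r2", dep=(0, 2)), Event("store", "C", "r2")],
]
mem_taint, order = run_lifeguards(logs)
```

Without the arc, the lifeguard for thread 1 could read the metadata for `B` before thread 0's lifeguard has updated it, and the taint on `C` would be missed. The sketch stalls the consumer until the producer's progress counter passes the arc's source record, which is one simple way to express the ordering requirement.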
