research-article
Towards Practical Default-On Multi-Core Record/Replay

Published: 4 April 2017

Abstract

We present Castor, a record/replay system for multi-core applications that provides consistently low and predictable overheads. With Castor, developers can leave record and replay on by default, making it practical to record and reproduce production bugs, or employ fault tolerance to recover from hardware failures.

Castor is inspired by several observations. First, an efficient mechanism for logging non-deterministic events is critical for recording demanding workloads with low overhead. Through careful use of hardware we were able to increase log throughput by 10x or more, e.g., we could record a server handling 10x more requests per second for the same record overhead. Second, most applications can be recorded without modifying source code by using the compiler to instrument language-level sources of non-determinism, in conjunction with more familiar techniques like shared library interposition. Third, while Castor cannot deterministically replay all data races, this limitation is generally unimportant in practice, contrary to what prior work has assumed.

Castor currently supports applications written in C, C++, and Go on FreeBSD. We have evaluated Castor on parallel and server workloads, including a commercial implementation of memcached in Go, which runs Castor in production.
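
The general idea of logging non-deterministic events during recording and feeding them back during replay can be illustrated with a small sketch. This is not Castor's implementation: the abstract describes compiler instrumentation, shared library interposition, and hardware-assisted logging, none of which is modeled here. The `EventLog` class, its `call` method, and the `workload` function are hypothetical names chosen for illustration only.

```python
import random
import time


class EventLog:
    """Toy record/replay log for non-deterministic calls (illustrative only)."""

    def __init__(self, mode, events=None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.events = list(events) if events is not None else []
        self._cursor = 0  # position of the next event to replay

    def call(self, name, fn, *args):
        """Record mode: run fn and log its result. Replay mode: return the logged result."""
        if self.mode == "record":
            result = fn(*args)
            self.events.append((name, result))
            return result
        if self._cursor >= len(self.events):
            raise RuntimeError("replay ran past the end of the log")
        logged_name, result = self.events[self._cursor]
        if logged_name != name:
            raise RuntimeError(
                f"replay diverged: expected {logged_name!r}, got {name!r}"
            )
        self._cursor += 1
        return result


def workload(log):
    """A toy program whose output depends on two non-deterministic inputs."""
    now = log.call("time", time.time)
    roll = log.call("randint", random.randint, 1, 6)
    return f"{now:.6f}:{roll}"


# Record one execution, then replay it from the captured log.
recorder = EventLog("record")
original = workload(recorder)

replayer = EventLog("replay", recorder.events)
replayed = workload(replayer)
```

In this sketch the replayed run never touches the clock or the random number generator, so it reproduces the recorded output exactly. A real system must additionally capture the ordering of events across threads, which this single-threaded example leaves out.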

Published in

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17), April 2017, 811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
