Abstract
We present Castor, a record/replay system for multi-core applications that provides consistently low and predictable overheads. With Castor, developers can leave record and replay on by default, making it practical to record and reproduce production bugs, or employ fault tolerance to recover from hardware failures.
Castor is inspired by several observations. First, an efficient mechanism for logging non-deterministic events is critical for recording demanding workloads with low overhead. Through careful use of hardware, we increased log throughput by 10x or more; for example, we could record a server handling 10x more requests per second for the same record overhead. Second, most applications can be recorded without modifying source code by using the compiler to instrument language-level sources of non-determinism, in conjunction with more familiar techniques like shared library interposition. Third, while Castor cannot deterministically replay all data races, this limitation is generally unimportant in practice, contrary to what prior work has assumed.
Castor currently supports applications written in C, C++, and Go on FreeBSD. We have evaluated Castor on parallel and server workloads, including a commercial implementation of memcached in Go, which runs Castor in production.
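The general record/replay idea behind logging non-deterministic events can be illustrated with a small sketch. This is not Castor's implementation: the `EventLog` class, its `intercept` method, and the `workload` function are hypothetical names chosen for illustration, and the sketch is single-threaded, so it ignores the multi-core ordering and log-throughput concerns that the abstract identifies as the hard part. It only shows the basic pattern of interposing on a non-deterministic call, logging its result while recording, and serving the logged result during replay.

```python
import random
import time


class EventLog:
    """Illustrative record/replay log for results of non-deterministic calls."""

    def __init__(self, mode, events=None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.events = [] if events is None else list(events)
        self._cursor = 0  # index of the next event to consume during replay

    def intercept(self, name, fn):
        """Wrap fn: log its result when recording, return the logged result when replaying."""
        def wrapper(*args, **kwargs):
            if self.mode == "record":
                value = fn(*args, **kwargs)
                self.events.append((name, value))
                return value
            # Replay mode: the real function is never called.
            if self._cursor >= len(self.events):
                raise RuntimeError("replay ran past the end of the log")
            logged_name, value = self.events[self._cursor]
            if logged_name != name:
                raise RuntimeError(
                    f"replay diverged: log has {logged_name!r}, program called {name!r}"
                )
            self._cursor += 1
            return value
        return wrapper


def workload(now, rand):
    """Toy application whose output depends on two non-deterministic inputs."""
    return [(now(), rand()) for _ in range(3)]


# Record a run, then replay it from the log alone.
recorder = EventLog("record")
original = workload(recorder.intercept("time", time.time),
                    recorder.intercept("random", random.random))

replayer = EventLog("replay", recorder.events)
replayed = workload(replayer.intercept("time", time.time),
                    replayer.intercept("random", random.random))
```

In this sketch the replayed run produces the same output as the recorded run because every non-deterministic input comes from the log. A system for multi-core applications would additionally need to capture the order of events across threads, which this example does not attempt.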
Towards Practical Default-On Multi-Core Record/Replay
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems