research-article

Gatling: Automatic Performance Attack Discovery in Large-Scale Distributed Systems

Published: 24 April 2015

Abstract

In this article, we propose Gatling, a framework that automatically finds performance attacks caused by insider attackers in large-scale message-passing distributed systems. In performance attacks, malicious nodes deviate from the protocol when sending or creating messages, with the goal of degrading system performance. We identify a representative set of basic malicious message delivery and lying actions and design a greedy search algorithm that finds effective attacks consisting of a subset of these actions. Although lying malicious actions are protocol dependent, requiring knowledge of the format and meaning of messages, Gatling captures them without needing to modify the target system by using a type-aware compiler. We have implemented Gatling and used it on nine systems: a virtual coordinate system; a distributed hash table lookup service and application; two multicast systems and one file-sharing application; and three secure systems designed specifically to tolerate insiders, of which two are based on virtual coordinates (one using outlier detection, the other using invariants derived from physical laws) and the last is a Byzantine-resilient replication system. We found a total of 48 attacks, with the time needed to find each attack ranging from a few minutes to a few hours.
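The abstract does not spell out the search procedure, so the following is only a minimal sketch of what a greedy search over basic malicious actions could look like, not the authors' algorithm. The action names, the `measure` callback, the `min_gain` threshold, and the toy throughput model are all hypothetical and chosen for illustration. In a real setting, `measure` would presumably run the target system with the given actions enabled and report a performance metric.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: (message type, malicious action).
Action = Tuple[str, str]


def greedy_attack_search(
    candidates: Iterable[Action],
    measure: Callable[[FrozenSet[Action]], float],
    min_gain: float,
) -> List[Action]:
    """Greedily grow a set of malicious actions while performance keeps dropping.

    measure(actions) returns a performance score (higher is better for the
    system) observed with those actions enabled. This interface is an
    assumption made for the sketch.
    """
    remaining = set(candidates)
    chosen: List[Action] = []
    current = measure(frozenset())  # baseline with no malicious actions
    while remaining:
        # Try each remaining action on top of the ones already chosen.
        scored = [(measure(frozenset(chosen + [a])), a) for a in sorted(remaining)]
        best_score, best_action = min(scored)
        # Stop when the best extra action no longer degrades performance enough.
        if current - best_score < min_gain:
            break
        chosen.append(best_action)
        remaining.remove(best_action)
        current = best_score
    return chosen


def toy_measure(actions: FrozenSet[Action]) -> float:
    # Hypothetical throughput model: each action removes a fixed amount.
    damage = {("Data", "drop"): 40.0, ("Ack", "delay"): 25.0, ("Join", "lie"): 2.0}
    return 100.0 - sum(damage.get(a, 0.0) for a in actions)


candidates = [("Data", "drop"), ("Data", "duplicate"), ("Ack", "delay"), ("Join", "lie")]
found = greedy_attack_search(candidates, toy_measure, min_gain=5.0)
print(found)
```

With the toy model, the search picks the two most damaging actions and stops once the best remaining action degrades the score by less than `min_gain`. A greedy strategy like this evaluates only a linear number of candidates per round rather than every subset, which is the usual reason to prefer it when each measurement is expensive.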

      Reviews

      RuayShiung Chang

      A distributed system consists of many independent nodes that interact with one another using a standardized set of protocols. Because such systems often comprise thousands or even millions of nodes, it is very hard, if not impossible, to detect whether any one node is malfunctioning. A node can malfunction for several reasons, for example, errors in the protocol design, bugs in the code, or compromise by a malicious user. This paper proposes a framework called Gatling to automatically find "performance attacks caused by insider attackers in large-scale message-passing distributed systems." In a performance attack, malicious nodes send or create messages "with the goal of degrading system performance." It is understandable that an exhaustive search for malicious nodes is not possible. Gatling identifies a representative set of basic malicious message delivery and lying actions and uses a greedy search algorithm to find effective attacks consisting of a subset of these actions. The system has been tested on nine distributed systems, and the results are promising. However, distributed systems, though not black boxes, are hard to test and visualize. The algorithms proposed in this paper are also heuristics: they may have worked on the test cases, but there is no guarantee that they will work on your systems. Use Gatling with this in mind and at your own risk. Finally, a 34-page paper seems a bit too long; it could use a little brevity.

      • Published in

        ACM Transactions on Information and System Security, Volume 17, Issue 4
        April 2015
        127 pages
        ISSN: 1094-9224
        EISSN: 1557-7406
        DOI: 10.1145/2756875
        Editor: Gene Tsudik

        Copyright © 2015 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 April 2015
        • Accepted: 1 January 2015
        • Revised: 1 October 2014
        • Received: 1 July 2013

        Qualifiers

        • research-article
        • Research
        • Refereed
