skip to main content
research-article
Open Access
Artifacts Evaluated & Functional

String constraints with concatenation and transducers solved efficiently

Published:27 December 2017Publication History
Skip Abstract Section

Abstract

String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature.

Skip Supplemental Material Section

Supplemental Material

concatenationandtransducers.webm

References

  1. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In CAV. 150–166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Davide Balzarotti, Marco Cova, Viktoria Felmetsger, Nenad Jovanovic, Engin Kirda, Christopher Kruegel, and Giovanni Vigna. 2008. Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications. In S&P. 387–401.Google ScholarGoogle Scholar
  3. Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013). DOI: Google ScholarGoogle ScholarCross RefCross Ref
  4. Pablo Barceló, Leonid Libkin, A. W. Lin, and Peter T. Wood. 2012. Expressive Languages for Path Queries over GraphStructured Data. ACM Trans. Database Syst. 37, 4 (2012), 31.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In Proc. of SMT’10.Google ScholarGoogle Scholar
  6. Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. 2016. Efficient solving of string constraints for security analysis. In Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016. 4–6. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Daniel Le Berre and Anne Parrain. 2010. The Sat4j library, release 2.2. JSAT 7, 2-3 (2010), 59–6. http://jsat.ewi.tudelft.nl/ content/volume7/JSAT7_4_LeBerre.pdfGoogle ScholarGoogle Scholar
  8. Jean Berstel. 1979. Transductions and Context-Free Languages. Teubner-Verlag. Google ScholarGoogle ScholarCross RefCross Ref
  9. Armin Biere, Keijo Heljanko, and Siert Wieringa. 2017. AIGER 1.9 and Beyond (Draft). http://fmv.jku.at/hwmcc11/beyond1.pdf (cited in 2017). (2017).Google ScholarGoogle Scholar
  10. Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In TACAS. 307–321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Aaron R. Bradley. 2012. Understanding IC3. In Theory and Applications of Satisfiability Testing - SAT 2012 - 15th International Conference, Trento, Italy, June 17-20, 2012. Proceedings. 1–14. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Robert Brayton and Alan Mishchenko. 2010. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification: 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, Tayssir Touili, Byron Cook, and Paul Jackson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 24–40. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2008. EXE: Automatically Generating Inputs of Death. ACM Trans. Inf. Syst. Secur. 12, 2 (2008), 10:1–10:38. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cristian Cadar, Patrice Godefroid, Sarfraz Khurshid, Corina S. Pasareanu, Koushik Sen, Nikolai Tillmann, and Willem Visser. 2011. Symbolic execution for software testing in practice: preliminary assessment. In Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28, 2011. 1066–1071. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Roberto Cavada, Alessandro Cimatti, Michele Dorigatti, Alberto Griggio, Alessandro Mariotti, Andrea Micheli, Sergio Mover, Marco Roveri, and Stefano Tonetta. 2014. The nuXmv Symbolic Model Checker. In CAV’14 (Lecture Notes in Computer Science), Vol. 8559. Springer, 334–342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. 1999. Model Checking. The MIT Press, Cambridge, Massachusetts.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Google co. 2015. Google Closure Library (referred in Nov 2015). https://developers.google.com/closure/library/ . (2015).Google ScholarGoogle Scholar
  18. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Arlen Cox and Jason Leasure. 2017. Model Checking Regular Language Constraints. CoRR abs/1708.09073 (2017). arXiv: 1708.09073 http://arxiv.org/abs/1708.09073Google ScholarGoogle Scholar
  20. Loris D’Antoni, Zachary Kincaid, and Fang Wang. 2016. A Symbolic Decision Procedure for Symbolic Alternating Finite Automata. CoRR abs/1610.01722 (2016). http://arxiv.org/abs/1610.01722Google ScholarGoogle Scholar
  21. Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In VMCAI. 209–228. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.Google ScholarGoogle Scholar
  24. Laurent Doyen and Jean-François Raskin. 2010. Antichain Algorithms for Finite Automata. In TACAS’10 (Lecture Notes in Computer Science), Vol. 6015. Springer, 2–22. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Alain Finkel. 1987. A Generalization of the Procedure of Karp and Miller to Well Structured Transition Systems. In Automata, Languages and Programming, 14th International Colloquium, ICALP87, Karlsruhe, Germany, July 13-17, 1987, Proceedings (Lecture Notes in Computer Science), Thomas Ottmann (Ed.), Vol. 267. Springer, 499–508. DOI: Google ScholarGoogle ScholarCross RefCross Ref
  26. Xiang Fu and Chung-Chih Li. 2010. Modeling Regular Replacement for String Constraint Solving. In NFM. 67–76.Google ScholarGoogle Scholar
  27. Xiang Fu, Michael C. Powell, Michael Bantegui, and Chung-Chih Li. 2013. Simple linear string constraints. Formal Asp. Comput. 25, 6 (2013), 847–891. Google ScholarGoogle ScholarCross RefCross Ref
  28. Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin Rinard. 2013. Word equations with length constraints: what’s decidable? In Hardware and Software: Verification and Testing. Springer, 209–226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Graeme Gange, Jorge A. Navas, Peter J. Stuckey, Harald Søndergaard, and Peter Schachte. 2013. Unbounded Model-Checking with Interpolation for Regular Language Constraints. In TACAS’2013 (Lecture Notes in Computer Science), Vol. 7795. Springer, 277–291. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Seymour Ginsburg and Edwin H. Spanier. 1966. Semigroups, Presburger formulas, and languages. Pacific J. Math. 16, 2 (1966), 285–296. http://projecteuclid.org/euclid.pjm/1102994974 Google ScholarGoogle ScholarCross RefCross Ref
  31. Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005. 213–223. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Claudio Gutiérrez. 1998. Solving Equations in Strings: On Makanin’s Algorithm. In LATIN. 358–373.Google ScholarGoogle Scholar
  33. Mario Heiderich, Jörg Schwenk, Tilman Frosch, Jonas Magazinius, and Edward Z. Yang. 2013. mXSS attacks: attacking well-secured web-applications by using innerHTML mutations. In CCS. 777–788.Google ScholarGoogle Scholar
  34. Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In USENIX Security Symposium. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdfGoogle ScholarGoogle Scholar
  35. Pieter Hooimeijer and Westley Weimer. 2012. StrSolve: solving string constraints lazily. Autom. Softw. Eng. 19, 4 (2012), 531–559. Google ScholarGoogle ScholarCross RefCross Ref
  36. Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Scott Kausler and Elena Sherman. 2014. Evaluation of String Constraint Solvers in the Context of Symbolic Execution. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, New York, NY, USA, 259–270. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Christoph Kern. 2014. Securing the Tangled Web. Commun. ACM 57, 9 (Sept. 2014), 38–47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Adam Kiezun and others. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Nils Klarlund, Anders Møller, and Michael I. Schwartzbach. 2002. MONA Implementation Secrets. International Journal of Foundations of Computer Science 13, 4 (2002), 571–586. Google ScholarGoogle ScholarCross RefCross Ref
  41. Johannes Kloos, Rupak Majumdar, Filip Niksic, and Ruzica Piskac. 2013. Incremental, Inductive Coverability. In CAV’13 (Lecture Notes in Computer Science), Vol. 8044. Springer, 158–173. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.Google ScholarGoogle Scholar
  43. Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In CAV. 646–662.Google ScholarGoogle Scholar
  44. Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2016. An efficient SMT solver for string constraints. Formal Methods in System Design 48, 3 (2016), 206–234. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. 2015. A Decision Procedure for Regular Membership and Length Constraints over Unbounded Strings. In Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings. 135–150. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Anthony Widjaja Lin and Pablo Barceló. 2016. String solving with word equations and transducers: towards a logic for analysing mutation XSS. In POPL. 123–136. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Blake Loring, Duncan Mitchell, and Johannes Kinder. 2017. ExpoSE: Practical Symbolic Execution of Standalone JavaScript. In SPIN.Google ScholarGoogle Scholar
  48. Gennady S Makanin. 1977. The problem of solvability of equations in a free semigroup. Sbornik: Mathematics 32, 2 (1977), 129–198. Google ScholarGoogle ScholarCross RefCross Ref
  49. John McCarthy. 1980. Circumscription - A Form of Non-Monotonic Reasoning. Artif. Intell. 13, 1-2 (1980), 27–39. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Kenneth L. McMillan. 2003. Interpolation and SAT-Based Model Checking. In Computer Aided Verification, 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003, Proceedings. 1–13. DOI: Google ScholarGoogle ScholarCross RefCross Ref
  51. Christophe Morvan. 2000. On Rational Graphs. In FoSSaCS. 252–266. Google ScholarGoogle ScholarCross RefCross Ref
  52. Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In LPAR’04 (LNCS), Vol. 3452. Springer, 36–50.Google ScholarGoogle Scholar
  53. OWASP. 2013. https://www.owasp.org/images/f/f8/OWASP_Top_10_- _2013.pdf . (2013).Google ScholarGoogle Scholar
  54. Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Wojciech Plandowski. 2006. An efficient algorithm for solving word equations. In STOC. 467–476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Gideon Redelinghuys, Willem Visser, and Jaco Geldenhuys. 2012. Symbolic execution of programs with strings. In SAICSIT. 139–148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Proceedings, 15th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LNCS), Vol. 5330. Springer, 274–289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Jacques Sakarovitch. 2009. Elements of automata theory. Cambridge University Press. Google ScholarGoogle ScholarCross RefCross Ref
  59. Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In S&P. 513–528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. 488–498. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. 2000. Checking Safety Properties Using Induction and a SAT-Solver. In FMCAD (LNCS), Vol. 1954. Springer, 108–125.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Deian Tabakov and Moshe Y. Vardi. 2005. Experimental Evaluation of Classical Automata Constructions. In Logic for Programming, Artificial Intelligence, and Reasoning, 12th International Conference, LPAR 2005, Montego Bay, Jamaica, December 2-6, 2005, Proceedings (Lecture Notes in Computer Science), Geoff Sutcliffe and Andrei Voronkov (Eds.), Vol. 3835. Springer, 396–411. DOI: Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In CCS. 1232–1243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. 218–240. DOI: Google ScholarGoogle ScholarCross RefCross Ref
  65. Moshe Y. Vardi. 1995. An Automata-Theoretic Approach to Linear Temporal Logic. In Logics for Concurrency - Structure versus Automata (8th Banff Higher Order Workshop, August 27 - September 3, 1995, Proceedings). 238–266. DOI: Google ScholarGoogle ScholarCross RefCross Ref
  66. Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and Nikolaj Bjørner. 2012. Symbolic finite state transducers: algorithms and applications. In POPL. 137–150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260. DOI: Google ScholarGoogle ScholarCross RefCross Ref
  68. Gary Wassermann, Dachuan Yu, Ajay Chander, Dinakar Dhurjati, Hiroshi Inamura, and Zhendong Su. 2008. Dynamic test input generation for web applications. In ISSTA. 249–260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In ESORICS. 150–171. Google ScholarGoogle ScholarCross RefCross Ref
  70. Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In TACAS. 154–157. Benchmark can be found at http://www.cs.ucsb.edu/~vlab/stranger/ .Google ScholarGoogle Scholar
  71. Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based symbolic string analysis for vulnerability detection. Formal Methods in System Design 44, 1 (2014), 44–70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2009. Symbolic String Verification: Combining String Analysis and Size Analysis. In TACAS. 322–336.Google ScholarGoogle Scholar
  73. Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2011. Relational String Verification Using Multi-Track Automata. Int. J. Found. Comput. Sci. 22, 8 (2011), 1909–1924. Google ScholarGoogle Scholar
  74. Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In ESEC/SIGSOFT FSE. 114–124. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. String constraints with concatenation and transducers solved efficiently

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader
              About Cookies On This Site

              We use cookies to ensure that we give you the best experience on our website.

              Learn more

              Got it!