Abstract
String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting, and automatic test-case generation. A popular string analysis technique includes symbolic executions, which at their core use constraint solvers over the string domain, a.k.a. string solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator (i.e. replace all occurrences of a string by another string) and, more generally, finite-state transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). Although this results in an undecidable theory in general, it was recently shown that the straight-line fragment of the theory is decidable, and is sufficiently expressive in practice. In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite-state transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory (double-exponential time), which is exponentially harder than the case without finite-state transductions. To this end, we propose a method that exploits succinct alternating finite-state automata as concise symbolic representations of string constraints. In contrast to previous approaches using nondeterministic automata, alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from cross-site scripting analysis and other examples in the literature.
Supplemental Material
- Parosh Aziz Abdulla, Mohamed Faouzi Atig, Yu-Fang Chen, Lukás Holík, Ahmed Rezine, Philipp Rümmer, and Jari Stenman. 2014. String Constraints for Verification. In CAV. 150–166. Google Scholar
Digital Library
- Davide Balzarotti, Marco Cova, Viktoria Felmetsger, Nenad Jovanovic, Engin Kirda, Christopher Kruegel, and Giovanni Vigna. 2008. Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications. In S&P. 387–401.Google Scholar
- Pablo Barceló, Diego Figueira, and Leonid Libkin. 2013. Graph Logics with Rational Relations. Logical Methods in Computer Science 9, 3 (2013). DOI: Google Scholar
Cross Ref
- Pablo Barceló, Leonid Libkin, A. W. Lin, and Peter T. Wood. 2012. Expressive Languages for Path Queries over GraphStructured Data. ACM Trans. Database Syst. 37, 4 (2012), 31.Google Scholar
Digital Library
- Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In Proc. of SMT’10.Google Scholar
- Clark W. Barrett, Cesare Tinelli, Morgan Deters, Tianyi Liang, Andrew Reynolds, and Nestan Tsiskaridze. 2016. Efficient solving of string constraints for security analysis. In Proceedings of the Symposium and Bootcamp on the Science of Security, Pittsburgh, PA, USA, April 19-21, 2016. 4–6. DOI: Google Scholar
Digital Library
- Daniel Le Berre and Anne Parrain. 2010. The Sat4j library, release 2.2. JSAT 7, 2-3 (2010), 59–6. http://jsat.ewi.tudelft.nl/ content/volume7/JSAT7_4_LeBerre.pdfGoogle Scholar
- Jean Berstel. 1979. Transductions and Context-Free Languages. Teubner-Verlag. Google Scholar
Cross Ref
- Armin Biere, Keijo Heljanko, and Siert Wieringa. 2017. AIGER 1.9 and Beyond (Draft). http://fmv.jku.at/hwmcc11/beyond1.pdf (cited in 2017). (2017).Google Scholar
- Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. 2009. Path feasibility analysis for string-manipulating programs. In TACAS. 307–321. Google Scholar
Digital Library
- Aaron R. Bradley. 2012. Understanding IC3. In Theory and Applications of Satisfiability Testing - SAT 2012 - 15th International Conference, Trento, Italy, June 17-20, 2012. Proceedings. 1–14. DOI: Google Scholar
Digital Library
- Robert Brayton and Alan Mishchenko. 2010. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification: 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, Tayssir Touili, Byron Cook, and Paul Jackson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 24–40. DOI: Google Scholar
Digital Library
- Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2008. EXE: Automatically Generating Inputs of Death. ACM Trans. Inf. Syst. Secur. 12, 2 (2008), 10:1–10:38. DOI: Google Scholar
Digital Library
- Cristian Cadar, Patrice Godefroid, Sarfraz Khurshid, Corina S. Pasareanu, Koushik Sen, Nikolai Tillmann, and Willem Visser. 2011. Symbolic execution for software testing in practice: preliminary assessment. In Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28, 2011. 1066–1071. DOI: Google Scholar
Digital Library
- Roberto Cavada, Alessandro Cimatti, Michele Dorigatti, Alberto Griggio, Alessandro Mariotti, Andrea Micheli, Sergio Mover, Marco Roveri, and Stefano Tonetta. 2014. The nuXmv Symbolic Model Checker. In CAV’14 (Lecture Notes in Computer Science), Vol. 8559. Springer, 334–342. Google Scholar
Digital Library
- Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. 1999. Model Checking. The MIT Press, Cambridge, Massachusetts.Google Scholar
Digital Library
- Google co. 2015. Google Closure Library (referred in Nov 2015). https://developers.google.com/closure/library/ . (2015).Google Scholar
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.Google Scholar
Digital Library
- Arlen Cox and Jason Leasure. 2017. Model Checking Regular Language Constraints. CoRR abs/1708.09073 (2017). arXiv: 1708.09073 http://arxiv.org/abs/1708.09073Google Scholar
- Loris D’Antoni, Zachary Kincaid, and Fang Wang. 2016. A Symbolic Decision Procedure for Symbolic Alternating Finite Automata. CoRR abs/1610.01722 (2016). http://arxiv.org/abs/1610.01722Google Scholar
- Loris D’Antoni and Margus Veanes. 2013. Static Analysis of String Encoders and Decoders. In VMCAI. 209–228. Google Scholar
Digital Library
- Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: introduction and applications. Commun. ACM 54, 9 (2011), 69–77. Google Scholar
Digital Library
- Volker Diekert. 2002. Makanin’s Algorithm. In Algebraic Combinatorics on Words, M. Lothaire (Ed.). Encyclopedia of Mathematics and its Applications, Vol. 90. Cambridge University Press, Chapter 12, 387–442.Google Scholar
- Laurent Doyen and Jean-François Raskin. 2010. Antichain Algorithms for Finite Automata. In TACAS’10 (Lecture Notes in Computer Science), Vol. 6015. Springer, 2–22. DOI: Google Scholar
Digital Library
- Alain Finkel. 1987. A Generalization of the Procedure of Karp and Miller to Well Structured Transition Systems. In Automata, Languages and Programming, 14th International Colloquium, ICALP87, Karlsruhe, Germany, July 13-17, 1987, Proceedings (Lecture Notes in Computer Science), Thomas Ottmann (Ed.), Vol. 267. Springer, 499–508. DOI: Google Scholar
Cross Ref
- Xiang Fu and Chung-Chih Li. 2010. Modeling Regular Replacement for String Constraint Solving. In NFM. 67–76.Google Scholar
- Xiang Fu, Michael C. Powell, Michael Bantegui, and Chung-Chih Li. 2013. Simple linear string constraints. Formal Asp. Comput. 25, 6 (2013), 847–891. Google Scholar
Cross Ref
- Vijay Ganesh, Mia Minnes, Armando Solar-Lezama, and Martin Rinard. 2013. Word equations with length constraints: what’s decidable? In Hardware and Software: Verification and Testing. Springer, 209–226. Google Scholar
Digital Library
- Graeme Gange, Jorge A. Navas, Peter J. Stuckey, Harald Søndergaard, and Peter Schachte. 2013. Unbounded Model-Checking with Interpolation for Regular Language Constraints. In TACAS’2013 (Lecture Notes in Computer Science), Vol. 7795. Springer, 277–291. Google Scholar
Digital Library
- Seymour Ginsburg and Edwin H. Spanier. 1966. Semigroups, Presburger formulas, and languages. Pacific J. Math. 16, 2 (1966), 285–296. http://projecteuclid.org/euclid.pjm/1102994974 Google Scholar
Cross Ref
- Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005. 213–223. DOI: Google Scholar
Digital Library
- Claudio Gutiérrez. 1998. Solving Equations in Strings: On Makanin’s Algorithm. In LATIN. 358–373.Google Scholar
- Mario Heiderich, Jörg Schwenk, Tilman Frosch, Jonas Magazinius, and Edward Z. Yang. 2013. mXSS attacks: attacking well-secured web-applications by using innerHTML mutations. In CCS. 777–788.Google Scholar
- Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. 2011. Fast and Precise Sanitizer Analysis with BEK. In USENIX Security Symposium. http://static.usenix.org/events/sec11/tech/full_papers/Hooimeijer.pdfGoogle Scholar
- Pieter Hooimeijer and Westley Weimer. 2012. StrSolve: solving string constraints lazily. Autom. Softw. Eng. 19, 4 (2012), 531–559. Google Scholar
Cross Ref
- Artur Jez. 2016. Recompression: A Simple and Powerful Technique for Word Equations. J. ACM 63, 1 (2016), 4:1–4:51. DOI: Google Scholar
Digital Library
- Scott Kausler and Elena Sherman. 2014. Evaluation of String Constraint Solvers in the Context of Symbolic Execution. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, New York, NY, USA, 259–270. DOI: Google Scholar
Digital Library
- Christoph Kern. 2014. Securing the Tangled Web. Commun. ACM 57, 9 (Sept. 2014), 38–47. Google Scholar
Digital Library
- Adam Kiezun and others. 2012. HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars. ACM Trans. Softw. Eng. Methodol. 21, 4 (2012), 25. Google Scholar
Digital Library
- Nils Klarlund, Anders Møller, and Michael I. Schwartzbach. 2002. MONA Implementation Secrets. International Journal of Foundations of Computer Science 13, 4 (2002), 571–586. Google Scholar
Cross Ref
- Johannes Kloos, Rupak Majumdar, Filip Niksic, and Ruzica Piskac. 2013. Incremental, Inductive Coverability. In CAV’13 (Lecture Notes in Computer Science), Vol. 8044. Springer, 158–173. Google Scholar
Digital Library
- Daniel Kroening and Ofer Strichman. 2008. Decision Procedures. Springer.Google Scholar
- Tianyi Liang, Andrew Reynolds, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2014. A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions. In CAV. 646–662.Google Scholar
- Tianyi Liang, Andrew Reynolds, Nestan Tsiskaridze, Cesare Tinelli, Clark Barrett, and Morgan Deters. 2016. An efficient SMT solver for string constraints. Formal Methods in System Design 48, 3 (2016), 206–234. DOI: Google Scholar
Digital Library
- Tianyi Liang, Nestan Tsiskaridze, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. 2015. A Decision Procedure for Regular Membership and Length Constraints over Unbounded Strings. In Frontiers of Combining Systems - 10th International Symposium, FroCoS 2015, Wroclaw, Poland, September 21-24, 2015. Proceedings. 135–150. DOI: Google Scholar
Digital Library
- Anthony Widjaja Lin and Pablo Barceló. 2016. String solving with word equations and transducers: towards a logic for analysing mutation XSS. In POPL. 123–136. DOI: Google Scholar
Digital Library
- Blake Loring, Duncan Mitchell, and Johannes Kinder. 2017. ExpoSE: Practical Symbolic Execution of Standalone JavaScript. In SPIN.Google Scholar
- Gennady S Makanin. 1977. The problem of solvability of equations in a free semigroup. Sbornik: Mathematics 32, 2 (1977), 129–198. Google Scholar
Cross Ref
- John McCarthy. 1980. Circumscription - A Form of Non-Monotonic Reasoning. Artif. Intell. 13, 1-2 (1980), 27–39. DOI: Google Scholar
Digital Library
- Kenneth L. McMillan. 2003. Interpolation and SAT-Based Model Checking. In Computer Aided Verification, 15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003, Proceedings. 1–13. DOI: Google Scholar
Cross Ref
- Christophe Morvan. 2000. On Rational Graphs. In FoSSaCS. 252–266. Google Scholar
Cross Ref
- Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. Abstract DPLL and Abstract DPLL Modulo Theories. In LPAR’04 (LNCS), Vol. 3452. Springer, 36–50.Google Scholar
- OWASP. 2013. https://www.owasp.org/images/f/f8/OWASP_Top_10_- _2013.pdf . (2013).Google Scholar
- Wojciech Plandowski. 2004. Satisfiability of word equations with constants is in PSPACE. J. ACM 51, 3 (2004), 483–496. Google Scholar
Digital Library
- Wojciech Plandowski. 2006. An efficient algorithm for solving word equations. In STOC. 467–476. Google Scholar
Digital Library
- Gideon Redelinghuys, Willem Visser, and Jaco Geldenhuys. 2012. Symbolic execution of programs with strings. In SAICSIT. 139–148. Google Scholar
Digital Library
- Philipp Rümmer. 2008. A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In Proceedings, 15th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LNCS), Vol. 5330. Springer, 274–289. Google Scholar
Digital Library
- Jacques Sakarovitch. 2009. Elements of automata theory. Cambridge University Press. Google Scholar
Cross Ref
- Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In S&P. 513–528. Google Scholar
Digital Library
- Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. 2013. Jalangi: a selective record-replay and dynamic analysis framework for JavaScript. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013. 488–498. DOI: Google Scholar
Digital Library
- Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. 2000. Checking Safety Properties Using Induction and a SAT-Solver. In FMCAD (LNCS), Vol. 1954. Springer, 108–125.Google Scholar
Digital Library
- Deian Tabakov and Moshe Y. Vardi. 2005. Experimental Evaluation of Classical Automata Constructions. In Logic for Programming, Artificial Intelligence, and Reasoning, 12th International Conference, LPAR 2005, Montego Bay, Jamaica, December 2-6, 2005, Proceedings (Lecture Notes in Computer Science), Geoff Sutcliffe and Andrei Voronkov (Eds.), Vol. 3835. Springer, 396–411. DOI: Google Scholar
Digital Library
- Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2014. S3: A Symbolic String Solver for Vulnerability Detection in Web Applications. In CCS. 1232–1243. Google Scholar
Digital Library
- Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive Reasoning over Recursively-Defined Strings. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I. 218–240. DOI: Google Scholar
Cross Ref
- Moshe Y. Vardi. 1995. An Automata-Theoretic Approach to Linear Temporal Logic. In Logics for Concurrency - Structure versus Automata (8th Banff Higher Order Workshop, August 27 - September 3, 1995, Proceedings). 238–266. DOI: Google Scholar
Cross Ref
- Margus Veanes, Pieter Hooimeijer, Benjamin Livshits, David Molnar, and Nikolaj Bjørner. 2012. Symbolic finite state transducers: algorithms and applications. In POPL. 137–150. Google Scholar
Digital Library
- Hung-En Wang, Tzung-Lin Tsai, Chun-Han Lin, Fang Yu, and Jie-Hong R. Jiang. 2016. String Analysis via Automata Manipulation with Logic Circuit Representation. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part I (Lecture Notes in Computer Science), Vol. 9779. Springer, 241–260. DOI: Google Scholar
Cross Ref
- Gary Wassermann, Dachuan Yu, Ajay Chander, Dinakar Dhurjati, Hiroshi Inamura, and Zhendong Su. 2008. Dynamic test input generation for web applications. In ISSTA. 249–260. Google Scholar
Digital Library
- Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Eui Chul Richard Shin, and Dawn Song. 2011. A Systematic Analysis of XSS Sanitization in Web Application Frameworks. In ESORICS. 150–171. Google Scholar
Cross Ref
- Fang Yu, Muath Alkhalaf, and Tevfik Bultan. 2010. Stranger: An Automata-Based String Analysis Tool for PHP. In TACAS. 154–157. Benchmark can be found at http://www.cs.ucsb.edu/~vlab/stranger/ .Google Scholar
- Fang Yu, Muath Alkhalaf, Tevfik Bultan, and Oscar H. Ibarra. 2014. Automata-based symbolic string analysis for vulnerability detection. Formal Methods in System Design 44, 1 (2014), 44–70. Google Scholar
Digital Library
- Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2009. Symbolic String Verification: Combining String Analysis and Size Analysis. In TACAS. 322–336.Google Scholar
- Fang Yu, Tevfik Bultan, and Oscar H. Ibarra. 2011. Relational String Verification Using Multi-Track Automata. Int. J. Found. Comput. Sci. 22, 8 (2011), 1909–1924. Google Scholar
- Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. 2013. Z3-str: a Z3-based string solver for web application analysis. In ESEC/SIGSOFT FSE. 114–124. Google Scholar
Digital Library
Index Terms
String constraints with concatenation and transducers solved efficiently
Recommendations
An efficient SMT solver for string constraints
An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a diverse set of data types that includes character strings. Until recently, ...
From bidirectionality to alternation
We describe an explicit simulation of 2-way nondeterministic automata by 1-way alternating automata with quadratic blow-up. We first describe the construction for automata on finite words, and extend it to automata on infinite words.
An alternating hierarchy for finite automata
We study the polynomial state complexity classes [email protected]"k and [email protected]"k, that is, the hierarchy of problems that can be solved with a polynomial number of states by two-way alternating finite automata (2Afas) making at most k-1 alternations between existential ...






Comments