Abstract
Reasoning about string variables, in particular program inputs, is an important aspect of many program analyses and testing frameworks. Program inputs invariably arrive as strings, and are often manipulated using high-level string operations such as equality checks, regular expression matching, and string concatenation. It is difficult to reason about these operations because they are not well-integrated into current constraint solvers.
We present a decision procedure that solves systems of equations over regular language variables. Given such a system of constraints, our algorithm finds satisfying assignments for the variables in the system. We define this problem formally and give a mechanized correctness proof of the core of the algorithm. We evaluate its scalability and practical utility by applying it to the problem of automatically finding inputs that cause SQL injection vulnerabilities.
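The abstract does not spell out the algorithm. To show concretely what "finding satisfying assignments" means for a constraint involving concatenation, here is a minimal Python sketch. It is our own illustrative construction, not the paper's procedure. It solves a single constraint x ∈ L(A), y ∈ L(B), x·y ∈ L(C), where A, B, and C are deterministic finite automata. The `DFA` class and the functions `accepts`, `shortest_word`, and `solve_concat` are hypothetical names introduced only for this illustration.

The sketch relies on one fact about a DFA C: x·y ∈ L(C) exactly when x drives C from its start state to some state q, and y then drives C from q to an accepting state. Every solution therefore lies in X_q × Y_q for some q. The sketch checks each q using breadth-first search over product automata.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple


class DFA:
    """A (possibly partial) DFA; a missing transition means the word is rejected."""

    def __init__(self, start: int, accept: FrozenSet[int],
                 delta: Dict[Tuple[int, str], int]):
        self.start = start
        self.accept = accept
        self.delta = delta

    def states(self) -> Set[int]:
        found = {self.start} | set(self.accept)
        for (src, _), dst in self.delta.items():
            found.update((src, dst))
        return found


def accepts(m: DFA, word: str) -> bool:
    """Run the DFA on `word`; falling off a missing transition rejects."""
    state = m.start
    for ch in word:
        if (state, ch) not in m.delta:
            return False
        state = m.delta[(state, ch)]
    return state in m.accept


def shortest_word(m1: DFA, s1: int, m2: DFA, s2: int,
                  goal: Callable[[int, int], bool]) -> Optional[str]:
    """BFS over the product m1 x m2 from (s1, s2). Returns a shortest word that
    drives both machines to a state pair satisfying `goal`, or None."""
    seen = {(s1, s2)}
    queue = deque([((s1, s2), "")])
    while queue:
        (p1, p2), word = queue.popleft()
        if goal(p1, p2):
            return word
        for (src, ch), r1 in m1.delta.items():
            if src != p1 or (p2, ch) not in m2.delta:
                continue
            nxt = (r1, m2.delta[(p2, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None


def solve_concat(a: DFA, b: DFA, c: DFA) -> Optional[Tuple[str, str]]:
    """Find (x, y) with x in L(a), y in L(b), and x + y in L(c), or None.
    Every solution passes through some state q of c after reading x, so it
    suffices to try each q: x must drive c from its start to q, and y must
    drive c from q to an accepting state."""
    for q in sorted(c.states()):
        x = shortest_word(a, a.start, c, c.start,
                          lambda pa, pc: pa in a.accept and pc == q)
        if x is None:
            continue
        y = shortest_word(b, b.start, c, q,
                          lambda pb, pc: pb in b.accept and pc in c.accept)
        if y is not None:
            return x, y
    return None


# Example: A = a+, B = b+, C = all strings over {a, b} of even length.
A = DFA(0, frozenset({1}), {(0, "a"): 1, (1, "a"): 1})
B = DFA(0, frozenset({1}), {(0, "b"): 1, (1, "b"): 1})
C_EVEN = DFA(0, frozenset({0}), {(0, "a"): 1, (0, "b"): 1,
                                 (1, "a"): 0, (1, "b"): 0})
print(solve_concat(A, B, C_EVEN))  # ('aa', 'bb')

# Unsatisfiable: C = {"ba"}, but every x in a+ starts with 'a'.
C_BA = DFA(0, frozenset({2}), {(0, "b"): 1, (1, "a"): 2})
print(solve_concat(A, B, C_BA))  # None
```

The sketch returns the first witness it finds rather than the full solution set. That set is the union of X_q × Y_q over all states q. A solver for the problem the abstract describes must also handle systems of several constraints with shared variables, which this sketch does not attempt.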
A decision procedure for subset constraints over regular languages
PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation