research-article

Sigma*: symbolic learning of input-output specifications

Published: 23 January 2013

Abstract

We present Sigma*, a novel technique for learning symbolic models of software behavior. Sigma* addresses the challenge of synthesizing models of software by using symbolic conjectures and abstraction. By combining dynamic symbolic execution, which discovers the symbolic input-output steps of a program, with counterexample-guided abstraction refinement, which over-approximates program behavior, Sigma* transforms arbitrary source representations of programs into faithful input-output models. We define a class of stream filters (programs that process streams of data items) for which Sigma* converges to a complete model if abstraction refinement eventually builds up a sufficiently strong abstraction. In other words, Sigma* is complete relative to abstraction. To represent inferred symbolic models, we use a variant of symbolic transducers that can be effectively composed and checked for equivalence. Thus, Sigma* enables fully automatic analysis of behavioral properties such as commutativity, reversibility, and idempotence, which is useful for verifying web sanitizers and for compiler optimizations of stream programs, as we show experimentally. We also show how models inferred by Sigma* can boost the performance of stream programs through parallelized code generation.
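To make the idea of composing transducer models and checking a property such as idempotence more concrete, here is a minimal Python sketch. It is not Sigma*'s algorithm or its transducer variant: the class `SymbolicTransducer`, the helpers `compose`, `equivalent_up_to` and `is_idempotent_up_to`, and the quote-escaping sanitizers are all hypothetical illustrations. Guards and outputs are plain Python functions rather than solver terms, and `equivalent_up_to` only compares outputs on every input up to a fixed length over a small alphabet, which is a bounded test rather than a true equivalence check.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

Guard = Callable[[str], bool]          # predicate over one input character
Output = Callable[[str], List[str]]    # characters emitted for that input
Rule = Tuple[Guard, Output, str]       # (guard, output, target state)


@dataclass
class SymbolicTransducer:
    """Toy deterministic symbolic transducer (hypothetical, for illustration).

    Each state owns rules whose guards are mutually exclusive and together
    cover every character, so exactly one rule fires per input character.
    """
    initial: str
    rules: Dict[str, List[Rule]]

    def run(self, text: str) -> str:
        state, out = self.initial, []
        for ch in text:
            for guard, output, target in self.rules[state]:
                if guard(ch):
                    out.extend(output(ch))
                    state = target
                    break
            else:
                raise ValueError(f"no rule for {ch!r} in state {state!r}")
        return "".join(out)


def compose(f: SymbolicTransducer, g: SymbolicTransducer) -> Callable[[str], str]:
    """Sequential composition: feed the output of f into g."""
    return lambda s: g.run(f.run(s))


def equivalent_up_to(p: Callable[[str], str], q: Callable[[str], str],
                     alphabet: str, max_len: int) -> bool:
    """Bounded check: p and q agree on every string of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if p(s) != q(s):
                return False
    return True


def is_idempotent_up_to(t: SymbolicTransducer, alphabet: str, max_len: int) -> bool:
    """Bounded idempotence check: t(t(s)) == t(s) for all short inputs s."""
    return equivalent_up_to(compose(t, t), t.run, alphabet, max_len)


# Hypothetical sanitizer: escape '"' as '\"' unless it is already escaped.
escape_quotes = SymbolicTransducer(
    initial="plain",
    rules={
        "plain": [
            (lambda c: c == "\\", lambda c: [c], "after_backslash"),
            (lambda c: c == '"', lambda c: ["\\", c], "plain"),
            (lambda c: c not in '"\\', lambda c: [c], "plain"),
        ],
        "after_backslash": [
            (lambda c: True, lambda c: [c], "plain"),  # copy the escaped char
        ],
    },
)

# Stateless variant that escapes every quote, even already-escaped ones.
naive_escape = SymbolicTransducer(
    initial="q",
    rules={"q": [
        (lambda c: c == '"', lambda c: ["\\", c], "q"),
        (lambda c: c != '"', lambda c: [c], "q"),
    ]},
)

print(is_idempotent_up_to(escape_quotes, 'a"\\', 5))  # True
print(is_idempotent_up_to(naive_escape, 'a"\\', 5))   # False
```

A commutativity test for two sanitizers `f` and `g` would follow the same pattern, comparing `compose(f, g)` with `compose(g, f)`. Because this bounded check only covers short strings, a `True` result is evidence rather than proof; the point of representing inferred models as symbolic transducers is that they admit a genuine equivalence check instead.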


Supplemental Material

r2d3_talk1.mp4

