research-article

Program Boosting: Program Synthesis via Crowd-Sourcing

Published: 14 January 2015

Abstract

In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification.

We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness.

We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.
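As a toy illustration of why blending imperfect programs can yield a boost, the sketch below scores two flawed email regular expressions against a handful of labeled strings and then scores their conjunction. It is not CROWDBOOST's algorithm, which the abstract does not describe. The example strings, the two candidate patterns, the choice of conjunction as the blending operator, and all function names are assumptions made purely for illustration.

```python
import re

# Hypothetical labeled examples: (input string, should it be accepted?)
EXAMPLES = [
    ("alice@example.com", True),
    ("bob.smith@mail.example.org", True),
    ("no-at-sign.example.com", False),
    ("alice@@example.com", False),
    ("@example.com", False),
    ("alice@example", False),
]

# Two hypothetical imperfect candidates, standing in for crowd-sourced answers.
CANDIDATE_A = r"[^@]+@[^@]+"       # wrongly accepts "alice@example"
CANDIDATE_B = r".+@.+\.[a-z]+"     # wrongly accepts "alice@@example.com"


def as_predicate(pattern):
    """Turn a regex into a whole-string accept/reject function."""
    compiled = re.compile(pattern)
    return lambda text: compiled.fullmatch(text) is not None


def accuracy(predicate, examples):
    """Fraction of examples on which the predicate agrees with the label."""
    correct = sum(1 for text, label in examples if predicate(text) == label)
    return correct / len(examples)


def blend_and(first, second):
    """Assumed blending operator: accept only what both candidates accept."""
    return lambda text: first(text) and second(text)


pred_a = as_predicate(CANDIDATE_A)
pred_b = as_predicate(CANDIDATE_B)
blended = blend_and(pred_a, pred_b)

acc_a = accuracy(pred_a, EXAMPLES)
acc_b = accuracy(pred_b, EXAMPLES)
acc_blend = accuracy(blended, EXAMPLES)

print(f"A: {acc_a:.2f}  B: {acc_b:.2f}  blended: {acc_blend:.2f}")
```

On this small example set each candidate misclassifies a different string, so the conjunction agrees with every label while neither candidate does on its own. A real system would need a far larger and more adversarial example set, and a richer way of combining candidates than a fixed conjunction.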


Supplemental Material

p677-sidebyside.mpg

Published in

ACM SIGPLAN Notices, Volume 50, Issue 1 (POPL '15), January 2015, 682 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2775051
Editor: Andy Gill

POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2015, 716 pages
ISBN: 9781450333009
DOI: 10.1145/2676726

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States