Abstract
In this paper, we investigate a crowd-sourced approach to program synthesis. We aim to capture the "wisdom of the crowds" to find good, if not perfect, solutions to inherently tricky programming tasks that elude even expert developers and lack an easy-to-formalize specification.
We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness.
We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.
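To make the idea of blending concrete, the following is a minimal sketch, not the CROWDBOOST implementation, whose internals this abstract does not describe. It assumes two hypothetical crowd-sourced email regular expressions (`crowd_a`, `crowd_b`), a small hypothetical labeled test set, and a deliberately naive blending step (union and intersection of the accepted languages) scored by accuracy. All names and examples here are illustrative assumptions.

```python
import re
from itertools import combinations

# Hypothetical labeled examples standing in for a crowd-sourced test set.
POSITIVE = ["alice@example.com", "bob.smith@example.org", "carol@mail.example.io"]
NEGATIVE = ["alice@", "@example.com", "bob@@example.com", "plain-text"]

# Hypothetical imperfect candidates: each one rejects a valid address.
CANDIDATES = {
    "crowd_a": r"[\w.]+@\w+\.\w+",   # allows dots before '@', but no subdomains
    "crowd_b": r"\w+@[\w.]+\.\w+",   # allows subdomains, but no dots before '@'
}


def matcher(pattern):
    """Turn a regex into a predicate that requires a full-string match."""
    compiled = re.compile(pattern)
    return lambda s: compiled.fullmatch(s) is not None


def accuracy(accepts):
    """Fraction of labeled examples that the predicate classifies correctly."""
    correct = sum(accepts(s) for s in POSITIVE)
    correct += sum(not accepts(s) for s in NEGATIVE)
    return correct / (len(POSITIVE) + len(NEGATIVE))


def blends(name_a, f, name_b, g):
    """Naive blends (an assumption of this sketch): union and intersection."""
    yield f"{name_a} | {name_b}", lambda s: f(s) or g(s)
    yield f"{name_a} & {name_b}", lambda s: f(s) and g(s)


def boost(candidates):
    """Score every candidate and every pairwise blend; return the best name."""
    pool = {name: matcher(p) for name, p in candidates.items()}
    scored = {name: accuracy(f) for name, f in pool.items()}
    for (name_a, f), (name_b, g) in combinations(pool.items(), 2):
        for name, blended in blends(name_a, f, name_b, g):
            scored[name] = accuracy(blended)
    return max(scored, key=scored.get), scored


if __name__ == "__main__":
    best, scored = boost(CANDIDATES)
    for name, score in sorted(scored.items(), key=lambda kv: -kv[1]):
        print(f"{score:.2f}  {name}")
    print("best:", best)
```

On this toy data the union of the two candidates classifies every example correctly while each candidate alone misses one valid address, which is the kind of boost the abstract refers to. A blend is not guaranteed to help in general; it depends on the candidates making different mistakes.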
Program Boosting: Program Synthesis via Crowd-Sourcing