Abstract
Static Analysis tools have rules for several code quality issues and these rules are created by experts manually. In this paper, we address the problem of automatic synthesis of code quality rules from examples. We formulate the rule synthesis problem as synthesizing first order logic formulas over graph representations of code. We present a new synthesis algorithm RhoSynth that is based on Integer Linear Programming-based graph alignment for identifying code elements of interest to the rule. We bootstrap RhoSynth by leveraging code changes made by developers as the source of positive and negative examples. We also address rule refinement in which the rules are incrementally improved with additional user-provided examples. We validate RhoSynth by synthesizing more than 30 Java code quality rules. These rules have been deployed as part of Amazon CodeGuru Reviewer and their precision exceeds 75% based on developer feedback collected during live code-reviews within Amazon. Through comparisons with recent baselines, we show that current state-of-the-art program synthesis approaches are unable to synthesize most of these rules.
Supplemental Material
Available for Download
This supplement to "Synthesizing Code Quality Rules from Examples" contains proofs and additional details about the concepts described in the main paper.
- Mansour Ahmadi, Reza Mirzazade Farkhani, Ryan Williams, and Long Lu. 2021. Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, Michael Bailey and Rachel Greenstadt (Eds.). USENIX Association, 2025–2040. https://www.usenix.org/conference/usenixsecurity21/presentation/ahmadi
Google Scholar
- Miklós Ajtai and Yuri Gurevich. 1994. Datalog vs First-Order Logic. J. Comput. Syst. Sci., 49, 3 (1994), 562–588. https://doi.org/10.1016/S0022-0000(05)80071-6
Google Scholar
Digital Library
- Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. 2017. Constraint-Based Synthesis of Datalog Programs. In Principles and Practice of Constraint Programming, J. Christopher Beck (Ed.). Springer International Publishing, Cham. 689–706. isbn:978-3-319-66158-2 https://doi.org/10.1007/978-3-319-66158-2_44
Google Scholar
Cross Ref
- Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In International Conference on Learning Representations. https://openreview.net/forum?id=BJOFETxR-
Google Scholar
- Miltiadis Allamanis, Henry Jackson-Flux, and Marc Brockschmidt. 2021. Self-Supervised Bug Detection and Repair. CoRR, abs/2105.12787 (2021), arXiv:2105.12787. arxiv:2105.12787
Google Scholar
- Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-Guided Synthesis. In Proceedings of the IEEE International Conference on Formal Methods in Computer-Aided Design (FMCAD). 1–17. https://doi.org/10.1109/FMCAD.2013.6679385
Google Scholar
Cross Ref
- Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra. 2019. Getafix: Learning to Fix Bugs Automatically. Proc. ACM Program. Lang., 3, OOPSLA (2019), Article 159, Oct., 27 pages. https://doi.org/10.1145/3360585
Google Scholar
Digital Library
- Clark Barrett and Cesare Tinelli. 2018. Satisfiability Modulo Theories. Springer International Publishing, Cham. 305–343. isbn:978-3-319-10575-8 https://doi.org/10.1007/978-3-319-10575-8_11
Google Scholar
Cross Ref
- Rohan Bavishi, Caroline Lemieux, Koushik Sen, and Ion Stoica. 2021. Gauss: Program Synthesis by Reasoning over Graphs. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 134, oct, 29 pages. https://doi.org/10.1145/3485511
Google Scholar
Digital Library
- Rohan Bavishi, Hiroaki Yoshida, and Mukul R. Prasad. 2019. Phoenix: Automated Data-Driven Synthesis of Repairs for Static Analysis Violations. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019). Association for Computing Machinery, New York, NY, USA. 613–624. isbn:9781450355728 https://doi.org/10.1145/3338906.3338952
Google Scholar
Digital Library
- Kent Beck. 2002. Test Driven Development. By Example (Addison-Wesley Signature). Addison-Wesley Longman, Amsterdam. isbn:0321146530
Google Scholar
- Berkay Berabi, Jingxuan He, Veselin Raychev, and Martin Vechev. 2021. TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer. In Proceedings of the 38th International Conference on Machine Learning, Marina Meila and Tong Zhang (Eds.) (Proceedings of Machine Learning Research, Vol. 139). PMLR, 780–791. https://proceedings.mlr.press/v139/berabi21a.html
Google Scholar
- Pan Bian, Bin Liang, Wenchang Shi, Jianjun Huang, and Yan Cai. 2018. NAR-Miner: Discovering Negative Association Rules from Code for Bug Detection. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA. 411–422. isbn:9781450355735 https://doi.org/10.1145/3236024.3236032
Google Scholar
Digital Library
- Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08/ETAPS’08). Springer-Verlag, Berlin, Heidelberg. 337–340. isbn:3540787992
Google Scholar
Digital Library
- Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. CoRR, abs/1703.07469 (2017), arxiv:1703.07469. arxiv:1703.07469
Google Scholar
- Jacob Devlin, Jonathan Uesato, Rishabh Singh, and Pushmeet Kohli. 2017. Semantic Code Repair using Neuro-Symbolic Transformation Networks. CoRR, abs/1710.11054 (2017), arxiv:1710.11054
Google Scholar
- Elizabeth Dinella, Hanjun Dai, Ziyang Li, Mayur Naik, Le Song, and Ke Wang. 2020. Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=SJeqs6EFvB
Google Scholar
- Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-Grained and Accurate Source Code Differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). Association for Computing Machinery, New York, NY, USA. 313–324. isbn:9781450330138 https://doi.org/10.1145/2642937.2642982
Google Scholar
Digital Library
- Kasra Ferdowsifard, Shraddha Barke, Hila Peleg, Sorin Lerner, and Nadia Polikarpova. 2021. LooPy: Interactive Program Synthesis with Control Structures. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 153, oct, 29 pages. https://doi.org/10.1145/3485530
Google Scholar
Digital Library
- Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst., 9, 3 (1987), July, 319–349. issn:0164-0925 https://doi.org/10.1145/24039.24041
Google Scholar
Digital Library
- Xiang Gao, Arjun Radhakrishna, Gustavo Soares, Ridwan Shariffdeen, Sumit Gulwani, and Abhik Roychoudhury. 2021. APIfix: Output-Oriented Program Synthesis for Combating Breaking Changes in Libraries. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 161, oct, 27 pages. https://doi.org/10.1145/3485538
Google Scholar
Digital Library
- Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. SIGPLAN Not., 46, 1 (2011), Jan., 317–330. issn:0362-1340 https://doi.org/10.1145/1925844.1926423
Google Scholar
Digital Library
- Sumit Gulwani and Prateek Jain. 2017. Programming by Examples: PL Meets ML. In Programming Languages and Systems - 15th Asian Symposium, APLAS 2017, Suzhou, China, November 27-29, 2017, Proceedings, Bor-Yuh Evan Chang (Ed.) (Lecture Notes in Computer Science, Vol. 10695). Springer, 3–20. https://doi.org/10.1007/978-3-319-71237-6_1
Google Scholar
Cross Ref
- David Haussler. 1989. Learning Conjunctive Concepts in Structural Domains. Mach. Learn., 4 (1989), 7–40. https://doi.org/10.1007/BF00114802
Google Scholar
Cross Ref
- Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question selection for interactive program synthesis. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 1143–1158. https://doi.org/10.1145/3385412.3386025
Google Scholar
Digital Library
- Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, and Sumit Gulwani. 2018. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=rywDjg-RW
Google Scholar
- Patrick Kreutzer, Georg Dotzler, Matthias Ring, Bjoern M. Eskofier, and Michael Philippsen. 2016. Automatic Clustering of Code Changes. In Proceedings of the 13th International Conference on Mining Software Repositories (MSR ’16). Association for Computing Machinery, New York, NY, USA. 61–72. isbn:9781450341868 https://doi.org/10.1145/2901739.2901749
Google Scholar
Digital Library
- Vu Le, Daniel Perelman, Oleksandr Polozov, Mohammad Raza, Abhishek Udupa, and Sumit Gulwani. 2017. Interactive Program Synthesis. CoRR, abs/1703.03539 (2017), arXiv:1703.03539. arxiv:1703.03539
Google Scholar
- Julien Lerouge, Zeina Abu-Aisheh, Romain Raveaux, Pierre Héroux, and Sébastien Adam. 2017. New binary linear programming formulation to compute the graph edit distance. Pattern Recognition, 72 (2017), Dec., 254 – 265. https://doi.org/10.1016/j.patcog.2017.07.029
Google Scholar
Digital Library
- Tao Li, Sheng Ma, and Mitsunori Ogihara. 2004. Entropy-based criterion in categorical clustering. In Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4-8, 2004, Carla E. Brodley (Ed.) (ACM International Conference Proceeding Series, Vol. 69). ACM. https://doi.org/10.1145/1015330.1015404
Google Scholar
Digital Library
- Ziyang Li, Aravind Machiry, Binghong Chen, Mayur Naik, Ke Wang, and Le Song. 2021. ARBITRAR: User-Guided API Misuse Detection. In 2021 IEEE Symposium on Security and Privacy (SP). 1400–1415. https://doi.org/10.1109/SP40001.2021.00090
Google Scholar
Cross Ref
- Zhenmin Li and Yuanyuan Zhou. 2005. PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-13). Association for Computing Machinery, New York, NY, USA. 306–315. isbn:1595930140 https://doi.org/10.1145/1081706.1081755
Google Scholar
Digital Library
- Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. SIGPLAN Not., 51, 1 (2016), jan, 298–312. issn:0362-1340 https://doi.org/10.1145/2914770.2837617
Google Scholar
Digital Library
- Jonathan Mendelson, Aaditya Naik, Mukund Raghothaman, and Mayur Naik. 2021. GENSYNTH: Synthesizing Datalog Programs without Language Bias. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, 6444–6453. https://ojs.aaai.org/index.php/AAAI/article/view/16799
Google Scholar
- Na Meng, Miryung Kim, and Kathryn S. McKinley. 2013. LASE: Locating and Applying Systematic Edits by Learning from Examples. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, 502–511. isbn:9781467330763
Google Scholar
- Anders Miltner, Sumit Gulwani, Vu Le, Alan Leung, Arjun Radhakrishna, Gustavo Soares, Ashish Tiwari, and Abhishek Udupa. 2019. On the Fly Synthesis of Edit Suggestions. Proc. ACM Program. Lang., 3, OOPSLA (2019), Article 143, Oct., 29 pages. https://doi.org/10.1145/3360569
Google Scholar
Digital Library
- Tom M. Mitchell. 1997. Machine Learning. McGraw-Hill, New York. isbn:978-0-07-042807-2
Google Scholar
Digital Library
- Rajdeep Mukherjee, Omer Tripp, Ben Liblit, and Michael Wilson. 2022. Static Analysis for AWS Best Practices in Python Code. In 36th European Conference on Object-Oriented Programming, ECOOP 2022, June 6-10, 2022, Berlin, Germany, Karim Ali and Jan Vitek (Eds.) (LIPIcs, Vol. 222). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 14:1–14:28. https://doi.org/10.4230/LIPIcs.ECOOP.2022.14
Google Scholar
Cross Ref
- Aaditya Naik, Jonathan Mendelson, Nathaniel Sands, Yuepeng Wang, Mayur Naik, and Mukund Raghothaman. 2021. Sporq: An Interactive Environment for Exploring Code Using Query-by-Example. In The 34th Annual ACM Symposium on User Interface Software and Technology (UIST ’21). Association for Computing Machinery, New York, NY, USA. 84–99. isbn:9781450386357 https://doi.org/10.1145/3472749.3474737
Google Scholar
Digital Library
- Ansong Ni, Daniel Ramos, Aidan Z. H. Yang, Inês Lynce, Vasco M. Manquinho, Ruben Martins, and Claire Le Goues. 2021. SOAR: A Synthesis Approach for Data Science API Refactoring. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 112–124. https://doi.org/10.1109/ICSE43902.2021.00023
Google Scholar
Digital Library
- Rumen Paletov, Petar Tsankov, Veselin Raychev, and Martin Vechev. 2018. Inferring Crypto API Rules from Code Changes. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). Association for Computing Machinery, New York, NY, USA. 450–464. isbn:9781450356985 https://doi.org/10.1145/3192366.3192403
Google Scholar
Digital Library
- Rangeet Pan, Vu Le, Nachiappan Nagappan, Sumit Gulwani, Shuvendu K. Lahiri, and Mike Kaufman. 2021. Can Program Synthesis be Used to Learn Merge Conflict Resolutions? An Empirical Analysis. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 785–796. https://doi.org/10.1109/ICSE43902.2021.00077
Google Scholar
Digital Library
- Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). Association for Computing Machinery, New York, NY, USA. 107–126. isbn:9781450336895 https://doi.org/10.1145/2814270.2814310
Google Scholar
Digital Library
- Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang., 2, OOPSLA (2018), Article 147, Oct., 25 pages. https://doi.org/10.1145/3276517
Google Scholar
Digital Library
- Mukund Raghothaman, Jonathan Mendelson, David Zhao, Mayur Naik, and Bernhard Scholz. 2019. Provenance-Guided Synthesis of Datalog Programs. Proc. ACM Program. Lang., 4, POPL (2019), Article 62, Dec., 27 pages. https://doi.org/10.1145/3371130
Google Scholar
Digital Library
- Veselin Raychev. 2021. Learning to Find Bugs and Code Quality Problems - What Worked and What not? In 2021 International Conference on Code Quality (ICCQ). 1–5. https://doi.org/10.1109/ICCQ51190.2021.9392977
Google Scholar
Cross Ref
- Oscar Rodriguez-Prieto, Alan Mycroft, and Francisco Ortin. 2020. An Efficient and Scalable Platform for Java Source Code Analysis Using Overlaid Graph Representations. IEEE Access, 8 (2020), 72239–72260. https://doi.org/10.1109/ACCESS.2020.2987631
Google Scholar
Cross Ref
- Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. 2017. Learning syntactic program transformations from examples. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, Sebastián Uchitel, Alessandro Orso, and Martin P. Robillard (Eds.). IEEE / ACM, 404–415. https://doi.org/10.1109/ICSE.2017.44
Google Scholar
Digital Library
- Reudismam Rolim, Gustavo Soares, Rohit Gheyi, and Loris D’Antoni. 2018. Learning Quick Fixes from Code Repositories. CoRR, abs/1803.03806 (2018), arxiv:1803.03806. arxiv:1803.03806
Google Scholar
- Evgeniy Ryzhkov. 2011. How to add a new diagnostic rule into PVS-Studio? Days from developers’ life. https://pvs-studio.com/en/blog/posts/0110/
Google Scholar
- Alexander Schrijver. 1986. Theory of Linear and Integer Programming. John Wiley & Sons, Inc., USA. isbn:0471908541
Google Scholar
- Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-Guided Synthesis of Datalog Programs. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA. 515–527. isbn:9781450355735 https://doi.org/10.1145/3236024.3236034
Google Scholar
Digital Library
- Xujie Si, Mukund Raghothaman, Kihong Heo, and Mayur Naik. 2019. Synthesizing Datalog Programs using Numerical Relaxation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, Sarit Kraus (Ed.). ijcai.org, 6117–6124. https://doi.org/10.24963/ijcai.2019/847
Google Scholar
Cross Ref
- Aishwarya Sivaraman, Tianyi Zhang, Guy Van den Broeck, and Miryung Kim. 2019. Active Inductive Logic Programming for Code Search. In Proceedings of the 41st International Conference on Software Engineering (ICSE ’19). IEEE Press, 292–303. https://doi.org/10.1109/ICSE.2019.00044
Google Scholar
Digital Library
- Victor Sobreira, Thomas Durieux, Fernanda Madeiral, Martin Monperrus, and Marcelo de Almeida Maia. 2018. Dissection of a bug dataset: Anatomy of 395 patches from Defects4J. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). 130–140. https://doi.org/10.1109/SANER.2018.8330203
Google Scholar
Cross Ref
- Armando Solar Lezama. 2008. Program Synthesis By Sketching. Ph.D. Dissertation. EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-177.html
Google Scholar
- Amann Sven and Hoan Anh Nguyen. 2019. MUDetect : An API-Misuse Detector. https://github.com/stg-tud/MUDetect
Google Scholar
- Aalok Thakkar, Aaditya Naik, Nathaniel Sands, Rajeev Alur, Mayur Naik, and Mukund Raghothaman. 2021. Example-Guided Synthesis of Relational Queries. Association for Computing Machinery, New York, NY, USA. 1110–1125. isbn:9781450383912 https://doi.org/10.1145/3453483.3454098
Google Scholar
Digital Library
- Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, and Denys Poshyvanyk. 2018. An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018). Association for Computing Machinery, New York, NY, USA. 832–837. isbn:9781450359375 https://doi.org/10.1145/3238147.3240732
Google Scholar
Digital Library
- Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, and Rishabh singh. 2019. Neural Program Repair by Jointly Learning to Localize and Repair. In International Conference on Learning Representations. https://openreview.net/forum?id=ByloJ20qtm
Google Scholar
- Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477
Google Scholar
Digital Library
- Jingbo Wang, Chungha Sung, Mukund Raghothaman, and Chao Wang. 2021. Data-Driven Synthesis of Provably Sound Side Channel Analyses. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 810–822. https://doi.org/10.1109/ICSE43902.2021.00079
Google Scholar
Digital Library
- Junchi Yan, Yu Tian, Hongyuan Zha, Xiaokang Yang, Ya Zhang, and Stephen M. Chu. 2013. Joint Optimization for Consistent Multiple Graph Matching. In 2013 IEEE International Conference on Computer Vision. 1649–1656. https://doi.org/10.1109/ICCV.2013.207
Google Scholar
Digital Library
- Insu Yun, Changwoo Min, Xujie Si, Yeongjin Jang, Taesoo Kim, and Mayur Naik. 2016. APISAN: Sanitizing API Usages through Semantic Cross-Checking. In Proceedings of the 25th USENIX Conference on Security Symposium (SEC’16). USENIX Association, USA. 363–378. isbn:9781931971324
Google Scholar
Index Terms
Synthesizing code quality rules from examples
Recommendations
Synthesizing an instruction selection rule library from semantic specifications
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and OptimizationInstruction selection is the part of a compiler that transforms intermediate representation (IR) code into machine code. Instruction selectors build on a library of hundreds if not thousands of rules. Creating and maintaining these rules is a tedious ...
Synthesizing entity matching rules by examples
Entity matching (EM) is a critical part of data integration. We study how to synthesize entity matching rules from positive-negative matching examples. The core of our solution is program synthesis, a powerful tool to automatically generate rules (or ...
Rule quality measure-based induction of unordered sets of regression rules
AIMSA'12: Proceedings of the 15th international conference on Artificial Intelligence: methodology, systems, and applicationsThis paper presents the algorithm for induction of unordered sets of regression rules. It uses sequential covering strategy and dynamic reduction to classification approach. The main focus is put on quality measures which control the process of rule ...






Comments