Abstract
While unstructured merge tools rely only on textual analysis to detect and resolve conflicts, semistructured merge tools go further by partially exploiting the syntactic structure and semantics of the artifacts involved. Previous studies compare these merge approaches with respect to the number of reported conflicts and show, for most projects and merge situations, a reduction in favor of semistructured merge. However, these studies do not investigate whether this reduction actually reduces integration effort (productivity) without negatively affecting the correctness of the merging process (quality). To analyze this, and to better understand how merge tools could be improved, in this paper we reproduce more than 30,000 merges from 50 open source projects, identifying conflicts incorrectly reported by one approach but not by the other (false positives), and conflicts correctly reported by one approach but missed by the other (false negatives). Our results and complementary analysis indicate that, in the studied sample, the number of false positives is significantly reduced when using semistructured merge. We also find evidence that its false positives are easier to analyze and resolve than those reported by unstructured merge. However, we find no evidence that semistructured merge leads to fewer false negatives, and we argue that its false negatives are harder to detect and resolve than those of unstructured merge. Driven by these findings, we implement an improved semistructured merge tool that further combines both approaches to reduce the false positives and false negatives of semistructured merge. We find evidence that the improved tool, when compared to unstructured merge in our sample, reduces the number of reported conflicts by half, has no additional false positives, has at least 8% fewer false negatives, and is not prohibitively slower.
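To make the contrast between the two approaches concrete, the following minimal sketch shows the core idea behind semistructured merge: declarations are matched by signature rather than by textual position, so two developers adding different methods at the same place in a file need not conflict. This is an illustration under stated assumptions, not the tool evaluated in the paper. The names `Elements` and `merge_elements` are hypothetical, a class is reduced to a mapping from signature to body text, and bodies changed differently on both sides are simply reported as conflicts, whereas an actual semistructured tool would hand them to a textual merge.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: a class is a mapping from a declaration's
# signature to its body text. Declaration order is deliberately ignored.
Elements = Dict[str, str]


def merge_elements(
    base: Elements, left: Elements, right: Elements
) -> Tuple[Elements, List[str]]:
    """Three-way merge of declarations matched by signature.

    Returns the merged declarations and the signatures in conflict.
    """
    merged: Elements = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(key), left.get(key), right.get(key)
        if l == r:
            result = l  # both sides agree (including "both deleted")
        elif l == b:
            result = r  # only the right revision changed this element
        elif r == b:
            result = l  # only the left revision changed this element
        else:
            # Both sides changed the same element in different ways.
            # Assumption: we report a conflict here instead of falling
            # back to a line-based merge of the two bodies.
            conflicts.append(key)
            continue
        if result is not None:
            merged[key] = result
    return merged, conflicts


base = {"int size()": "return n;"}

# Each developer adds a different method; matched by signature, the
# additions are independent even if they sit at the same text position.
left = {**base, "void push(int x)": "data[n++] = x;"}
right = {**base, "int pop()": "return data[--n];"}
merged_ok, conflicts_ok = merge_elements(base, left, right)

# Both developers edit the body of the same method differently.
left_edit = {"int size()": "return count;"}
right_edit = {"int size()": "return n + 1;"}
merged_bad, conflicts_bad = merge_elements(base, left_edit, right_edit)

print(sorted(merged_ok), conflicts_ok)
print(sorted(merged_bad), conflicts_bad)
```

In the first case a line-based tool can report a conflict merely because both additions touch the same text region, which is the kind of false positive the study counts against unstructured merge. The second case is a genuine overlap that either approach should report.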