Obfuscation-Resilient Software Plagiarism Detection with JPlag

The rise of automated obfuscation techniques challenges the wide-spread assumption that evading a software plagiarism detector requires more effort than completing programming and modeling assignments in computer science education. This threatens pla-giarism detectors without comprehensive obfuscation resilience and, ultimately, academic integrity. This paper summarizes recent enhancements of JPlag, a widely-used software plagiarism detector, enabling it to achieve broad resilience against automated obfuscation. The findings demonstrate that JPlag significantly outperforms the state-of-the-art in terms of obfuscation resilience.


INTRODUCTION
Plagiarism is widespread in computer science education, and students often try to obfuscate their plagiarism, e.g., by renaming, reordering, or insertion [12].Manual inspection, however, is impractical due to larger class sizes.Software plagiarism detectors have been employed to address this issue [2,8].JPlag [8], in use and maintained since 1996, is one of the most used detectors worldwide.These detectors ensure that it requires more effort and skill to evade detection than completing the actual assignment [4].Yet, this paradigm is challenged by the emergence of automated obfuscation attacks [9].Mossad [4], e.g., is a plagiarism generator that repeatedly inserts statements into a plagiarized program without changing its behavior.It is unclear what other automated obfuscation attacks exist [3], requiring reevaluating existing detection methods.This paper presents enhancements of JPlag with the recently released version 5.0, enabling it to achieve broad resilience against  automated obfuscation attacks.Thus, JPlag is not only a reliable, open-source 1 , and GDPR-compliant software plagiarism detector with an intuitive result visualization, but it also outperforms the state-of-the-art regarding obfuscation resilience (see Figure 1).In the following, we first introduce JPlag and its capabilities.Then, we discuss the latest research on obfuscation resilience.

PLAGIARISM DETECTION WITH JPLAG
JPlag compares the structure of programs to identify suspicious candidates [8].While it ranks these candidates and highlights outliers, instructors make the final decision.The result visualization of JPlag includes features such as histograms, clustering, code comparison, and anonymization, enhancing its ease of use and report visualization.It also provides a side-by-side view to explain how similarity scores are calculated.Thus, JPlag allows for ethical academic misconduct investigations by combining automated analysis with human judgment.While JPlag was first developed in 1996, it is actively maintained and a subject of active research.It currently supports 17 languages.Recent additions include the support of modeling languages [11], thus addressing plagiarism in modeling assignments [12].JPlag scales well, facilitating plagiarism detection even in large courses.Internally, JPlag works as depicted in Figure 2. First, it parses the program, extracting a subset of the parse tree nodes as tokens, thus linearizing the tree [9].Second, it uses newly introduced normalization techniques [9,10] to achieve obfuscation resilience.Third, it conducts pairwise comparisons (using GST with KR matching [13]) to identify matching sections.After that, it applies post-processing for both obfuscation resilience and clustering calculation.Fifth, JPlag calculates a similarity score for each pair based on the matches.Finally, the results are visualized in a human-readable way.

ACHIEVING OBFUSCATION RESILIENCE
As JPlag is token-based, it abstracts from the underlying program and thus inherits resilience against some obfuscation attacks, including renaming, retyping, and obscuring values [8].Furthermore, this includes all lexical [8] attacks.However, for broad obfuscation resilience, we introduce three defense mechanisms.Token sequence normalization via token normalization graphs (TNGs) [9] is specifically tailored to counter insertion-and reordering-based obfuscation attacks on programming assignments.After the tokenization, the mechanism constructs a TNG for each program, a language-independent version of a PDG for token sequences.The TNG enables the removal of dead tokens and establishes a deterministic token order.Thus, it generates a normalized token sequence invariant to insertion-based and reordering-based obfuscation.Model subtree reordering [10] is tailored for assignments involving tree-based modeling artifacts, such as EMF models.This mechanism counters reordering attacks, common for modeling assignments [12].Reordering attacks are more straightforward for them since, for code, the statement sequence strongly impacts the behavior.To counteract this, we normalize the order of the nodes in the model tree.This is done through a multi-step algorithm, including a token distribution of a given subtree, its interpretation as a coordinate space, and a nearest neighbor path calculation.For any obfuscation attack to be effective against token-based plagiarism detectors, it must alter the token sequence to interfere with the subsequence matching.Subsequence merging [6] heuristically combines pairs of neighboring matches, thus reverting the effects of obfuscation attacks.Subsequence merging is language-independent and attack-type-independent and thus achieves broad resilience.As this defense mechanism solely operates on the match subsequences, it is independent of the language of the programs and can be applied for both modeling and programming assignments.

EVALUATION
To demonstrate the obfuscation resilience of JPlag, we evaluate it with a Java dataset from PROGpedia [7] and generate plagiarism based on reordering and insertion [9].We compare the results with MOSS [1] and Dolos [5].For all three detectors, we use default parameters.Dolos only supports single file submissions, thus limiting the choice of datasets.MOSS does not return all comparisons, only the ones with the highest scores.JPlag has neither of these limitations.Figure 1 shows the evaluation results.For each tool, we show the similarity distribution of the plagiarism instances (higher is better) and the unrelated originals (lower is better).For JPlag, note that the plagiarism instances have a median similarity of 100%.Thus, JPlag significantly outperforms Dolos with a median of 26.7% and MOSS with a median of 7.5%.Moreover, JPlag achieves low similarity scores for unrelated originals with a median similarity of 5.8%.Dolos achieves 14.3%, while MOSS achieves 4.5%.A high separation between the plagiarism instances and the unrelated originals allows straightforward detection and is thus preferable.For JPlag, we observe no overlap, with a median delta of 94.2 percentage points (pp).We observe an overlap and a median delta of only 12.4pp for Dolos.We observe the most substantial overlap and the lowest median delta with 3.0pp for MOSS.Thus, reliably differentiating plagiarism from the original is impaired for Dolos and impossible for MOSS.These results can be explained by Dolos and MOSS not providing any defense against automated obfuscation attacks.In practice, reordering and inserting statements is a viable attack for Dolos and MOSS but no longer for JPlag.

Figure 1 :
Figure 1: Results of JPlag, Dolos, and MOSS for PROGpedia-19 dataset with plagiarism instances automatically obfuscated through statement reordering and insertion.