On the Effects of Program Slicing for Vulnerability Detection During Code Inspection: Extended Abstract

[Background]: Slicing was first introduced to support debugging as a fault localization technique. Yet, program slicing as support for identifying vulnerabilities during code inspection has received limited attention. [Aims]: Evaluate the effectiveness of slicing as a general concept to support code inspectors in detecting vulnerabilities in source code. [Method]: We designed a controlled experiment whose goal is identifying the vulnerable lines in original or sliced Java files from Apache Tomcat. The treatments differ in the pair (Vulnerability, Original/Sliced file), in a balanced design with four vulnerabilities from the OWASP Top 10. The participants are MSc students attending security courses (n = 236). [Observations]: Using a notion of neighborhood based on the context size of the command git diff, we observed that slicing helps in ‘finding something’ as opposed to ‘finding nothing’. However, once some correct lines have been found, analyzing a slice and analyzing the original file are statistically equivalent.


RESEARCH PROBLEM
The notion of slicing was proposed by Weiser in 1979 [12]. The concept is to extract the parts of a program that satisfy a given criterion in order to ease further processing. Besides being used as a fault localization technique [13], slicing has been used in recent approaches for detecting vulnerable lines of code [9,10].
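To make the idea of a slice concrete, the following is a minimal sketch of a backward slice over straight-line code. It follows data dependences only and ignores control flow, so it is far simpler than the slicers used in practice. The toy program, the variable names, and the function `backward_slice` are hypothetical and are not taken from the study.

```python
# Toy backward slice over straight-line code (data dependences only).
# Each statement is (line_number, defined_variable, used_variables).
# The program below is hypothetical and only illustrates the concept.

def backward_slice(statements, criterion_line, criterion_vars):
    """Return the sorted line numbers that may affect the value of
    `criterion_vars` just after `criterion_line`."""
    relevant = set(criterion_vars)
    kept = []
    # Walk backwards from the criterion, tracking the variables still needed.
    for line, defined, used in sorted(statements, reverse=True):
        if line > criterion_line:
            continue  # statements after the criterion cannot affect it
        if defined in relevant:
            kept.append(line)
            relevant.discard(defined)  # this definition satisfies the need
            relevant.update(used)      # its inputs are now needed instead
    return sorted(kept)

program = [
    (1, "user", ()),            # user = read_input()
    (2, "limit", ()),           # limit = 10
    (3, "query", ("user",)),    # query = "SELECT ... " + user
    (4, "count", ("limit",)),   # count = limit + 1
    (5, "result", ("query",)),  # result = execute(query)
]

slice_lines = backward_slice(program, criterion_line=5, criterion_vars={"result"})
print(slice_lines)  # [1, 3, 5]
```

In this toy case the slice for `result` drops lines 2 and 4, which is the sense in which a slice is shorter than the original program.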
Vulnerabilities are hard to find, and giving developers a slice of the original file (for example, based on the outputs of a static analyzer or an ML algorithm) may help them find vulnerabilities. This seems obvious: a sliced program is by construction shorter than the original program, and thus finding something by inspection should be easier as well. However, the obvious is not necessarily always true.
Hence, we first investigate whether slicing supports developers while they inspect code to identify vulnerabilities. Since vulnerabilities differ widely, we also investigate whether some kinds of vulnerabilities are easier to detect during code inspection. Completing this study requires an underlying methodological design decision to establish a ground truth: what does it mean for a human inspector to identify a vulnerable fragment in a file spanning a thousand lines of code? Since vulnerable fragments are typically small and involve few lines in otherwise large files (e.g., [8,4]), the definition of how to match the lines identified by the inspector with the lines identified by the security expert is critical.

BACKGROUND
Several studies proposed and improved tools to support developers in vulnerability detection [3,11]. Besides identifying vulnerabilities, organizations are adopting modern code inspection techniques [7] to reduce the time spent on code inspection. Rather than assessing tools that identify vulnerabilities, in our study we focus on code inspections by human assessors.
From the perspective of supporting debugging, several studies proposed program slicing as a fault localization technique [2], and different algorithms have been proposed to compute slices. Moreover, program slicing has been used in recent approaches for detecting vulnerable lines of code [9,10]. Controlled experiments are often used to empirically evaluate and study new techniques, and several papers have experimentally assessed new solutions to support developers in debugging and fault localization [6,5,14,1]. In our study we designed a controlled experiment to investigate the effectiveness of slicing in supporting developers during code inspection.

APPROACH
We designed a controlled experiment with 6 different groups, each of which is analyzed for the cases when no vulnerable lines have been found and when some vulnerable lines have been found. Each participant is randomly assigned four different assessments, each described in a square in Figure 1.
Experiment. The participants were tasked with identifying the vulnerabilities in non-runnable Java code. They could inspect the code using the IDE they were most comfortable with. We collected the answers through a Qualtrics survey, in which the participants had to write the numbers of the vulnerable lines identified for each file. Each participant inspected four different files (one for each type of vulnerability): two of them were original files, and the other two were slices of the original files.
Ground Truth Determination. To allow for a small margin of error by the participants, we consider an identified vulnerable line as correct whenever it falls in the 'neighborhood' of a line marked as vulnerable in the ground truth. Choosing a neighborhood that is too big would lead to insignificant results. Choosing one that is too small would mean that nobody got any result even if, for practical purposes, the participants essentially identified the code region where the vulnerability was. We set the neighborhood size to the context size of the command git diff (footnote 1), which is 3 lines.
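The neighborhood rule can be sketched as follows. This is one plausible reading and not the authors' scoring script. It assumes that "falling in the neighborhood" means being at most 3 lines away, inclusive, from some ground-truth vulnerable line. The function names and the example line numbers are hypothetical.

```python
# Sketch of neighborhood-based matching between reported and ground-truth lines.
# Assumption: a reported line counts as correct when it lies within
# NEIGHBORHOOD lines (inclusive) of any line marked vulnerable by the expert.

NEIGHBORHOOD = 3  # same value as the default context size of `git diff`

def correct_lines(reported, ground_truth, neighborhood=NEIGHBORHOOD):
    """Return the sorted reported line numbers that fall in the neighborhood
    of at least one ground-truth vulnerable line."""
    return sorted(
        line
        for line in set(reported)
        if any(abs(line - truth) <= neighborhood for truth in ground_truth)
    )

def found_something(reported, ground_truth, neighborhood=NEIGHBORHOOD):
    """True if at least one reported line counts as correct."""
    return bool(correct_lines(reported, ground_truth, neighborhood))

# Hypothetical example: the expert marked lines 120 and 121 as vulnerable.
truth = {120, 121}
print(correct_lines([118, 125, 300], truth))  # [118]
print(found_something([300], truth))          # False
```

With this reading, line 118 is accepted because it is 2 lines from line 120, while line 125 is rejected because its closest vulnerable line, 121, is 4 lines away.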

PRELIMINARY FINDINGS
Slicing effectiveness. We did some preliminary analysis by comparing the two main groups: original files vs. sliced files. Specifically, we analyzed whether inspecting a sliced file helps in identifying more vulnerabilities compared to inspecting the entire file. We found that slicing is only useful for 'finding something' as opposed to 'finding nothing'. Once participants find something (i.e., they identify at least one vulnerable line), there is no difference between inspecting the original file or a slice.
Slicing usefulness for each vulnerability type. We also analyzed whether the slicing intervention is more effective for identifying certain types of vulnerabilities than others. Again, the only difference between the types of vulnerabilities is whether we 'find something' or we 'find nothing'. However, once we find the regions where the vulnerability lies, there is no difference in the number of identified lines between sliced and original files.
1 https://git-scm.com/docs/git-diff
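The two-part reading of the outcome used in this section (first whether anything correct was found, then how many correct lines were found among those who found something) can be sketched as below. The numbers in `toy_outcomes` are invented for illustration only and are not data from the study. The function name and the dictionary keys are also hypothetical, and no statistical test is performed here.

```python
from statistics import mean

# Invented per-inspection outcomes: (treatment, number of correct lines).
# These values are NOT results from the experiment.
toy_outcomes = [
    ("sliced", 2), ("sliced", 0), ("sliced", 1), ("sliced", 3),
    ("original", 0), ("original", 0), ("original", 2), ("original", 2),
]

def two_stage_summary(outcomes):
    """Summarize each treatment in two stages:
    1. the share of inspections that found at least one correct line;
    2. the mean number of correct lines among those inspections only."""
    summary = {}
    for treatment in sorted({t for t, _ in outcomes}):
        counts = [n for t, n in outcomes if t == treatment]
        hits = [n for n in counts if n > 0]
        summary[treatment] = {
            "found_something_rate": len(hits) / len(counts),
            "mean_correct_given_found": mean(hits) if hits else None,
        }
    return summary

summary = two_stage_summary(toy_outcomes)
for treatment, stats in summary.items():
    print(treatment, stats)
```

Separating the two stages keeps the 'found nothing' cases from dominating the comparison of how many lines were correctly identified.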

CONCLUSIONS
We present a controlled experiment to investigate whether the slicing intervention supports developers in detecting security vulnerabilities during code inspection. Using a notion of neighborhood based on the context size of the command git diff, we found that slicing helps in 'finding something' as opposed to 'finding nothing'. However, we also found that once 'some' correct lines have been identified, slicing makes no significant difference in the number of correctly identified fragments.
More experiments with more vulnerability types are needed to determine whether this result is due to the different vulnerability types or just to our choices (e.g., using Java as the language of choice). Still, we believe that these results are promising and open new directions for exploration. Such an experiment could also be replicated in other contexts, such as within companies with professional developers.
Training. The first part of the experiment consists of a training phase on the detection of vulnerabilities in source code. Its purpose is to fill the security knowledge gaps of the participants.
2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). ICSE-Companion '24, April 14-20, 2024, Lisbon, Portugal. Papotti et al.

Figure 1: Steps of our experiment