Accurate Architectural Threat Elicitation From Source Code Through Hybrid Information Flow Analysis

Software processes a vast amount of sensitive data. However, tracing information flows in complex programs and eliciting the resulting threats, such as potential information leaks, remains a significant challenge because suitable approaches are lacking. Symbolic verification is too restrictive for practical use, taint analysis suffers from overapproximation, and fuzzers can only identify crashes and hangs. In my doctoral research, I introduce an approach for reconstructing and refining information flow graphs in order to elicit threats. Using static analysis, I automatically reconstruct an information flow graph. Subsequently, I refine the found information flows using information flow fuzzing and associate threats through a rule-based system. My approach provides a validated information flow graph of the software and a list of elicited threats.


INTRODUCTION
Given that software routinely handles sensitive data such as passwords, certificates, and personal information, it is essential to integrate security-enhancing methods and assessments into software development. However, we lack generally applicable approaches to investigate information flow-related threats in software. For instance, a threat could be an insecure or unintended information flow, leading to potential information leakage.
Though theoretically robust, static information flow analysis and symbolic verification are hard to scale due to their strictness [18]. Static taint analysis is restricted to information flows within the program and loses track of flows that are processed externally. In contrast, dynamic taint analysis is limited to the explored execution paths [3]. Fuzzing methodologies, while effective at crash detection, lack inherent capabilities to reveal information flow issues.
In my doctoral research, I will develop an approach focused on reconstructing and refining information flow graphs, enabling the elicitation of threats. I will use static analysis to automatically reconstruct an information flow graph. Addressing the limitations of static analysis, I introduce information flow fuzzing, a dynamic technique aimed at refining identified flows and revealing previously unrecognized ones. Finally, I will automate threat elicitation. My contribution is to provide a validated information flow graph of the software system and a list of elicited threats.

THREAT ELICITATION AND REFINEMENT
My approach comprises three steps, as illustrated in Figure 1. First, I present a brief overview of the information flow graph. Next, I introduce information flow fuzzing to refine information flows. Subsequently, I detail the reconstruction of the information flow graph. Finally, I describe a rule-based method for threat elicitation based on the validated information flow graph.

Information Flow Graphs
Information flow graphs are abstract representations of software systems consisting of five different types of elements: process, data store, external entity, data flow, and trust boundary. They are used as input for popular threat analysis methods like Linddun [4] and Stride [20]. These graphs are often referred to as (abstract) data flow graphs, but this frequently results in confusion with detailed low-level data flow graphs based on the data exchange between variables [11]. Because of their coarse granularity, their abstraction of variables to general information, their scope, and their practical application, I refer to these (abstract) data flow graphs as information flow graphs.
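The five element types listed above can be captured in a small data model. The following is a minimal sketch only: the class names, fields, and the `crosses_boundary` helper are illustrative assumptions and are not taken from the approach itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    """The three node-like element types of an information flow graph."""
    PROCESS = "process"
    DATA_STORE = "data store"
    EXTERNAL_ENTITY = "external entity"


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind


@dataclass(frozen=True)
class DataFlow:
    """A directed flow of information between two nodes."""
    source: Node
    target: Node
    label: str = ""


@dataclass
class TrustBoundary:
    """A named group of nodes that share a trust level (hypothetical modeling choice)."""
    name: str
    members: set[Node] = field(default_factory=set)


@dataclass
class InformationFlowGraph:
    nodes: set[Node] = field(default_factory=set)
    flows: set[DataFlow] = field(default_factory=set)
    boundaries: list[TrustBoundary] = field(default_factory=list)

    def add_flow(self, source: Node, target: Node, label: str = "") -> DataFlow:
        """Register both endpoints and the flow between them."""
        self.nodes.update((source, target))
        flow = DataFlow(source, target, label)
        self.flows.add(flow)
        return flow

    def crosses_boundary(self, flow: DataFlow) -> bool:
        """True if exactly one endpoint of the flow lies inside some trust boundary."""
        return any(
            (flow.source in b.members) != (flow.target in b.members)
            for b in self.boundaries
        )
```

Modeling the trust boundary as a set of member nodes keeps the check for boundary-crossing flows to a single comparison per boundary; other representations, such as nested boundaries, would need a different check.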

Refinement of Information Flows
Information flow fuzzing is an approach I introduce to steer a fuzzer toward identifying information flows between a source and a sink. It is used to validate the statically discovered information flows and to uncover missed ones. My implementation is named FlowFuzz and functions with any coverage-guided fuzzer. It is under review as a registered report at ACM-TOSEM.
As part of information flow fuzzing, I propose an oracle for detecting information flows. It operates as follows: for each input,  Moreover, I present a guidance strategy to explore information flows more effectively. In this strategy, the fuzzer not only strives to maximize coverage but also focuses on inducing changes in data between the two consecutive runs. This is achieved by translating data changes into coverage by comparing the global variables of the two runs.
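One way to picture the translation of data changes into coverage is sketched below. This is not FlowFuzz's implementation: it assumes that snapshots of the global variables from the two runs are available as dictionaries, and the map size, the hashing scheme, and all function names are hypothetical.

```python
import zlib

MAP_SIZE = 1 << 16  # hypothetical size of a coverage-style map


def data_change_coverage(run_a: dict[str, bytes], run_b: dict[str, bytes]) -> set[int]:
    """Map every global variable whose value differs between two runs
    to a pseudo-coverage index, so a data change looks like new coverage."""
    changed = set()
    for name in run_a.keys() & run_b.keys():
        if run_a[name] != run_b[name]:
            # Assumption: hash the variable name into the map, similar in spirit
            # to how coverage-guided fuzzers hash edges into a fixed-size bitmap.
            changed.add(zlib.crc32(name.encode()) % MAP_SIZE)
    return changed


def is_interesting(seen: set[int], new_entries: set[int]) -> bool:
    """Keep an input if it yields at least one index not seen before.
    The `seen` set is updated in place."""
    fresh = new_entries - seen
    seen |= fresh
    return bool(fresh)
```

With such a mapping, an unmodified coverage-guided fuzzer would retain inputs that change previously unaffected variables, in addition to inputs that reach new code.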

Reconstruction of Information Flow Graphs
I will automate the process of reconstructing information flow graphs from the source code of a software project to be used as input for threat elicitation. Prior studies have identified a compelling need for automation in this process because of its unstructured nature and considerable resource demands [12,22,2].
My objective is to establish a pipeline for the intricate reconstruction task, breaking it down into the subtasks depicted in Figure 1. Most of the challenges will be addressed through static analysis, including detecting external entities, data stores, trust boundaries, and information flows. Additionally, I will incorporate clustering techniques to identify abstract processes and natural language processing-based methods to name the graph elements.

Threat Elicitation
The reconstructed and validated information flow graph will be used to elicit threats, for example, insecure information flows or unencrypted data stores. I will develop an automated, rule-based system by building upon previous research [23], using threat mapping rules from Linddun and Stride.
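A rule-based elicitation step of this kind could look roughly as follows. The sketch is illustrative only: the two rules mirror the examples named above (insecure flows, unencrypted data stores) and are not the actual Linddun or Stride mapping rules, and the `encrypted` and `crosses_boundary` attributes are assumed annotations on the graph elements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Element:
    name: str
    kind: str                       # "process", "data store", "external entity", or "data flow"
    encrypted: bool = False         # hypothetical annotation
    crosses_boundary: bool = False  # hypothetical annotation, meaningful for data flows


@dataclass(frozen=True)
class Rule:
    threat: str
    applies: Callable[[Element], bool]


# Illustrative rules only; a real system would load the Linddun and Stride mappings.
RULES = [
    Rule(
        "Information disclosure: unencrypted data store",
        lambda e: e.kind == "data store" and not e.encrypted,
    ),
    Rule(
        "Information disclosure: unencrypted flow across a trust boundary",
        lambda e: e.kind == "data flow" and e.crosses_boundary and not e.encrypted,
    ),
]


def elicit_threats(elements: list[Element], rules: list[Rule] = RULES) -> list[tuple[str, str]]:
    """Return (element name, threat) pairs for every rule that matches an element."""
    return [(e.name, r.threat) for e in elements for r in rules if r.applies(e)]
```

Keeping each rule as a predicate over a single element makes the rule set easy to extend; rules that depend on several elements at once would need access to the whole graph instead.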

EVALUATION PLAN
I will investigate the following research questions (RQ): (1) How many and which elements of the information flow graph can be reconstructed and correctly assembled? (2) Can FlowFuzz effectively identify information flows and what is its efficiency based on key metrics such as execution time, invocations, and code coverage? (3) Considering the reconstructed and refined information flow graph, what is the nature and number of threats elicited?
In the registered report on FlowFuzz, I demonstrate its capability by identifying an unintended information flow in OpenSSL caused by Heartbleed [5]. Next, I will tackle RQ 2 and evaluate its effectiveness by assessing the information flows of nine other subjects. To address RQ 1 and 3, I will create a dataset comprising ten open-source repositories with information flow graphs and threat models, manually supplementing any missing artifacts. For RQ 1, I will employ metrics such as Precision-Recall [1], EdgeSim [15], and MeCL [15] to facilitate a comparison between the predicted and ground truth information flow graphs. Concerning RQ 3, I will compare the elicited threats against the corresponding threat model. In all RQs, I will compare my approach against suitable state-of-the-art methods.
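For the precision and recall part of RQ 1, the comparison between predicted and ground truth graphs can be illustrated on edge sets. The sketch below assumes that flows are represented as (source, target) pairs with already matched element names; matching elements between the two graphs is a separate problem that is not addressed here.

```python
def precision_recall(predicted: set, truth: set) -> tuple[float, float]:
    """Precision and recall of predicted flows against ground truth flows."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example graphs, given as sets of (source, target) flows.
predicted = {("user", "server"), ("server", "db"), ("server", "log")}
truth = {("user", "server"), ("server", "db"), ("db", "backup"), ("server", "mail")}
precision, recall = precision_recall(predicted, truth)
```

In this example, two of the three predicted flows are correct and two of the four ground truth flows are found.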

RELATED WORK
Automated Threat Analysis: While several research papers focus on automating threat analysis [23,19], none explicitly tackle the reconstruction of graphs required for analysis. Jamil et al. attempt to address this aspect in their work [10] but do not provide results due to major runtime problems.
Software Architecture Reconstruction: Research mainly focuses on the clustering of modules and component diagrams [21,13], which are much more abstract than information flow graphs.
Natural Language Processing: There are approaches to predict module [8] or method names [17] that could be adapted to suggest names for the graph elements.
Information Flow: Static analysis and symbolic verification are applied to show the absence of inference, but this is often too strict to be practical [18]. However, in testing, the objective is to demonstrate the presence of the flow, as exemplified in the case of MUTAFLOW [14] through mutations.
Data Flow: Taint analysis can be used to track data flow, which approximates the information flow [11]. Static methods [24,6] are limited to the program under test, and dynamic methods [16,7] can only reason about the actually explored execution paths. These limitations can lead to overapproximation [3].
Fuzzing: FlowFuzz extends the capabilities of fuzzing approaches like evolutionary fuzzing [25] or analysis-based fuzzing [9] to include the validation of information flows.

SUMMARY
I will develop an approach that enables eliciting threats based on a reconstructed and refined information flow graph. This involves using static analysis to automatically reconstruct the information flow graph of a software project. By introducing information flow fuzzing and implementing FlowFuzz, I demonstrated promising results in refining information flows.

Figure 1: My approach: Information flow graph reconstruction, dynamic refinement, and threat elicitation