Tracking Assets in Source Code with Security Annotations

Small and medium enterprises (SMEs) that build individualized software require lightweight solutions to trace cybersecurity concerns across the codebase. This includes tracking where potentially vulnerable assets are handled in the codebase. The solution that provides this tracking should be fully integrated into the developers' workflow and should be usable by developers who are not cybersecurity experts. To address this need, we propose Security Annotations, which can be added to any codebase regardless of programming language and allow linking blocks of code, functions, or single statements with assets. To use the main functionality of Security Annotations, an asset catalog of sufficient quality is needed. The assets in this catalog can be identified either upfront or while annotating. We conducted a preliminary evaluation in which four pairs of developers created an asset catalog for a legacy software system and then annotated the code using Security Annotations. All groups successfully identified assets in a codebase largely unknown to them. We also found that the annotation patterns differed between pairs but that there were significant overlaps. The workload of identifying assets and performing annotations was demanding, but feasible.


INTRODUCTION
A common approach to developing secure software is to start by identifying a system's assets, performing a threat and risk analysis, and creating security requirements that describe countermeasures for threats that are deemed too risky. Such an approach is suggested, e.g., by Microsoft's Secure Development Lifecycle (SDL) [5] or OWASP's Software Assurance Maturity Model [8]. In addition, in order to show compliance with modern security standards such as ISO 21434 [4] in the automotive domain or ISO 14971 [3] in the medical domain, it is necessary to demonstrate traceability of the implementation of the security requirements in the architecture and the code and to verify and validate that the requirements have been met. This means that a chain of evidence (see, e.g., [9] for how this term is used in safety assurance) needs to be established to show that a cybersecurity risk has been adequately addressed. As SMEs have limited resources and therefore often lack dedicated IT security experts [1], an approach is needed that can be used by developers without specific knowledge of cybersecurity.

SECURITY ANNOTATIONS
We introduce the novel concept of Security Annotations, a lightweight approach to link assets with the code segments that handle them. Once assets are identified and linked to the source code, it is also possible to connect other artifacts such as potential vulnerabilities or security requirements with the assets and the code base, but this is left for future work. Security Annotations were designed to reduce the barrier of entry and to integrate into the development workflow as much as possible to ensure adoption by developers.
Assigning assets and potential vulnerabilities to a specific section of code creates an indirect link between them that can be used in the risk management process, providing full traceability. This establishes one part of the chain of evidence that connects assets, threats, risks, security requirements, security mitigations, and security tests, and that is required to demonstrate compliance.
Using Security Annotations, the developer has two tasks: 1) to define a catalog of the potentially vulnerable assets; and 2) to annotate the source code, either all at once or one segment at a time. The annotation task is supported by a Code Annotation Tool which integrates into JetBrains IDEs. Annotations in the code have a certain scope: they can annotate an arbitrary block of code by using a start and end marker, a single function, or an individual line of code. Security Annotations are programming language agnostic.
Technical Description. An important characteristic of source code is its continuous evolution during development. To minimize the overhead of maintaining annotations, they need to integrate well with an ever-changing code base and ideally adapt to evolving risk analysis requirements. This is the main reason for keeping code annotations and meta-data such as assets and vulnerabilities separate in our implementation. Annotations are as close to the code as possible to ensure seamless integration and long-term traceability, in particular considering the capabilities of modern version-control systems. As annotations are part of the code, developers see them and are aware of them and thus have a reminder to update and maintain them as well. Meta-data, on the other hand, is stored in a database, as that prevents polluting the source code with additional information and allows updating and changing the meta-data without changing the code base.
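To make the three annotation scopes concrete, the following is a minimal sketch of a language-agnostic scanner that finds annotation markers in comments. The marker syntax `@sec:<kind> <uuid>`, the `Annotation` class, and the `scan` function are assumptions made for illustration; they are not the format or API of the Code Annotation Tool. Because the scanner does not parse any programming language, it records only the first line of an annotated function and leaves the end of the function unresolved.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical marker syntax (an assumption, not the tool's real format):
# "@sec:<kind> <uuid>" placed inside a comment of any programming language.
MARKER = re.compile(
    r"@sec:(block-start|block-end|function|line)\s+"
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


@dataclass
class Annotation:
    uuid: str
    scope: str                 # "block", "function", or "line"
    first_line: int            # 1-based number of the first annotated line
    last_line: Optional[int]   # None when the scanner cannot know the end


def scan(source: str) -> List[Annotation]:
    """Collect annotations without parsing the host language."""
    found: List[Annotation] = []
    open_blocks: Dict[str, int] = {}  # uuid -> first line inside the block
    for number, text in enumerate(source.splitlines(), start=1):
        match = MARKER.search(text)
        if match is None:
            continue
        kind, uuid = match.group(1), match.group(2).lower()
        if kind == "block-start":
            open_blocks[uuid] = number + 1
        elif kind == "block-end":
            start = open_blocks.pop(uuid, None)
            if start is not None:  # ignore end markers without a start
                found.append(Annotation(uuid, "block", start, number - 1))
        elif kind == "line":
            # A line annotation covers the single line that follows it.
            found.append(Annotation(uuid, "line", number + 1, number + 1))
        else:
            # A function annotation precedes the function; finding the end
            # of the function would require a language-specific parser.
            found.append(Annotation(uuid, "function", number + 1, None))
    return found
```

Matching the marker anywhere in a line, rather than after a specific comment prefix, is what keeps this sketch independent of the host language's comment syntax.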
The database contains all the additional information, including linked assets, linked potential vulnerabilities, the content hash, and the git commit. The Code Annotation Tool uses git commits to create a history of the annotations. This history represents the evolution of the annotated segment and makes it possible to keep track of changes within the segment, or of whether threats and their respective security measures are still current. Annotations and meta-data are linked using a globally unique identifier, which is generated by the Code Annotation Tool when the annotation is added to the source code.
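The separation between in-code annotations and database meta-data can be sketched as follows. The record layout, the field and function names, and the choice of SHA-256 as the content hash are assumptions made for illustration; the text above states only that the database holds linked assets, linked potential vulnerabilities, a content hash, and a git commit, keyed by a unique identifier.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import List


def content_hash(segment: str) -> str:
    """Hash the annotated code segment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


@dataclass
class AnnotationRecord:
    """Hypothetical meta-data row stored outside the source code."""
    annotation_id: str   # the identifier that also appears in the code comment
    git_commit: str      # commit at which this state of the segment was seen
    content_hash: str    # hash of the annotated segment at that commit
    assets: List[str] = field(default_factory=list)
    vulnerabilities: List[str] = field(default_factory=list)


def new_record(segment: str, git_commit: str, assets: List[str]) -> AnnotationRecord:
    """Create the meta-data for a freshly added annotation."""
    return AnnotationRecord(
        annotation_id=str(uuid.uuid4()),
        git_commit=git_commit,
        content_hash=content_hash(segment),
        assets=list(assets),
    )


def segment_changed(record: AnnotationRecord, current_segment: str) -> bool:
    """Report whether the annotated code differs from the recorded state."""
    return record.content_hash != content_hash(current_segment)
```

In such a design, a mismatch reported by `segment_changed` could prompt a developer to re-check whether the linked assets and vulnerabilities still apply, and appending one record per commit would yield the annotation history.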

PRELIMINARY EVALUATION
As a first step to evaluate our hypothesis that developers are able to perform the two steps of identifying assets and annotating the code base even with little knowledge and guidance, we conducted an experiment in which three pairs of developers from an SME went through 42,090 lines of code and independently created an asset catalog as well as code annotations. We evaluated the resulting asset catalogs and code annotations against a catalog and set of annotations created by the authors, who are familiar with the code.
In the workshop, the different groups were given only a basic introduction to assets. However, during the workshop they developed a sense of many deeper concepts of assets on their own, such as criticality [7], granularity (in the form of a hierarchical structure as used by Touhiduzzaman et al. [12] and the European Network and Information Security Agency (ENISA) [11]), and the distinction between primary and secondary assets (Farzana et al. [2], Sterbak et al. [10]). All groups also structured their assets hierarchically, independently of each other and without specific instructions.
We find that participants and researchers have significant overlap in their identified assets. All groups agreed on the most relevant assets that were manipulated in the code, such as sensitive user data, in a single round of identification without review or feedback. This suggests that developers can replace experts in identifying assets on a legacy code base in SMEs.
For the Security Annotations, the developers used various strategies. Mostly, they systematically traversed the code, starting at entry points and following function chains. Along the way they refined their understanding of criticality, demonstrating a learning curve. The teams also differed in their sense of criticality and the size of the sections they annotated. Feedback from the developers also shows that teamwork is beneficial for asset identification, while a single person is sufficient for annotation. This makes it even more important to develop a common understanding of criticality within the organization to facilitate high-quality annotations.
As Security Annotations are intended to save resources by allowing for an efficient way of providing a chain of evidence, the initial investment required by an SME must be low. In this workshop, we tested the workload for a project with approx. 42k lines of code. Each group took less than 4.5 hours to complete the first round of asset identification. As for the annotation task, two groups finished in the given time frame of 9 hours, while one team progressed through 70% of the code base. This means that even with further rounds of review and refinement, the assets can be identified and annotated within a common iteration length in an iterative development method.

FUTURE WORK
In future work, we will provide a more detailed and nuanced analysis of our evaluation. We will also extend Security Annotations to include potential vulnerabilities as well as security requirements to broaden the scope of the approach and to include further security-related artifacts into the tracing and therefore into the chain of evidence. This also allows us to approach the annotation task from another direction and convert the output of static code analysis to Security Annotations. Furthermore, we will test if AI can further assist developers in the annotation task to lower the entry barrier. We also plan to investigate how Security Annotations can be used to semi-automatically derive parts of security assurance cases [6].

Figure 1: Example of an annotated code section. The Security Annotation marks a function. It consists of a comment containing the annotation type and a UUID.