Modelling Indicators of Behaviour for Cyber Threat Hunting via Sysmon

Hunting for threats is of capital importance for security teams. Establishing multifaceted contexts around the evolving behaviours of threat actors is paramount for enabling threat hunting teams to tell the malicious from the benign. The MITRE ATT&CK framework is the state-of-the-art knowledge base for referencing how threat actors conduct their tactics, techniques and procedures. Although the abstract concepts of techniques are well defined, it is challenging to hunt from an abstract technique concept down to security event data. In this work, we develop a data driven knowledge base of threat actor behaviours, called Indicators of Behaviour, that uses semantic reasoning to infer threat actor behaviours. Unlike generalised techniques in MITRE ATT&CK, these behaviours can be queried from a low level indicator and from the behaviour itself. We use MITRE’s Caldera platform to emulate threat actor behaviours and Sysmon for capturing security events and defining the knowledge base’s semantics. With this approach, the semantic reasoner aids threat hunting teams by inferring threat actor behaviour chains from individual interconnected events.


INTRODUCTION
The proliferation of increasingly advanced cyber-attacks against organisations and critical infrastructure presents scalability problems for cyber defence responders. The scale of security log data handled by large organisations and the cost of manual iterative analysis by humans make it challenging to do effective response and proactive threat hunting in cyber-relevant time. On average, companies take more than 200 days to identify and contain a data breach [11]. The natural approach to face this challenging situation is to let automated security systems aid in reducing the time taken to analyse and contain incidents.
Currently, investigating and reconstructing attacks is mostly a manual and retroactive process that typically is time consuming, cumbersome and subject to individual analyst bias.
Although automated intrusion detection mechanisms generate alerts for investigation and to some extent alleviate the burden of human analytics, such alerts only reflect single events without a holistic overview of multiphase attacks. Moreover, these detection mechanisms are reactive in the sense that an alert is triggered each time an observed indicator matches a predefined known attack indicator. Relying only on signature based detection of known attack indicators is inadequate when faced with adaptable threat actors. Anomaly detection based on (un)supervised learning of system behaviour is an alternative approach, but it lacks context and is prone to high false positive rates, creating overhead for security analysts due to the volume of security alerts and the need to fine-tune algorithms.
Threat hunting is an approach that complements security monitoring. It is a proactive, hypothesis-driven approach to searching for a series of attack patterns, encompassing a threat actor's tactics, techniques and procedures (TTPs), that can be observed within an organisational network. Threat hunting aims to reveal the presence of an attack and to identify the threat actor within the organisational network.
A threat actor is an individual, a group, an organisation or a nation state. Each threat actor has its own motivations, methodologies and behaviours that drive its attacks and that can be detected and derived.
This research aims to develop a model and an approach for context enrichment and threat hunting to enhance cyber-situational awareness. Our approach is to construct an ontology on top of Sysmon event data, accompanied by cyber threat intelligence (CTI) and Windows domain knowledge. By utilising an ontology, it will be possible to conduct reasoning and inference across elements of the ontology to generate context on individual alerts. Furthermore, this ontology applies temporal characteristics to event data to analyse dynamic attack behaviours.
We contribute to the field of cyber threat hunting by providing a semantic representation of event data as a chain of events, by representing event chains as a set of behaviours, and by showing how these behaviours can be queried at a higher level of abstraction. Similarly, we supplement the MITRE ATT&CK framework by providing a methodology for representing queryable procedures in event data captured by Sysmon.
The paper is structured as follows. Section 2 provides an overview of the problem and several domain concepts. Section 3 explains how the indicators of behaviour are modelled. Section 4 explains how the research was conducted. Section 5 evaluates the implementation. Section 6 analyses and compares the related work. Finally, Section 7 discusses some challenges and future work.

BACKGROUND
Tactics, techniques and procedures (TTPs) represent methodologies used by a threat actor. A particular set of TTPs describes a security event or type of security event that can be searched for. For a threat actor it is more difficult to change TTPs than low level artifacts such as a hash value or IP address. David Bianco introduced the pyramid-of-pain model from the perspective of a threat actor, where a defender can cause the threat actor great pain by detecting their methodologies rather than only detecting low level indicators such as hash values or IP addresses [6]. Hence, being able to hunt for and detect TTPs is of great advantage for the defender.
MITRE ATT&CK for Enterprise is a knowledge base of adversarial techniques. This framework provides descriptions and context to offensively orientated adversarial tactics and techniques. The framework forms a basis for our research to bridge the gap between the low level indicators, such as hashes and IP addresses, and the high level abstractions of adversarial tactics and techniques and ultimately their behaviours. However, while MITRE ATT&CK Enterprise provides contextual information on individual techniques, it does not describe how these techniques can be chained together as a series to represent behaviour. Our hypothesis is that detection based on behaviour will reduce the false-positive rate of alarms, and will give defenders a more holistic overview of adversarial behaviours. An example of the limitation of MITRE ATT&CK Enterprise is technique T1059.001 - Command and Scripting Interpreter: PowerShell, which is a subtechnique of T1059 - Command and Scripting Interpreter. T1059.001 contains the following excerpt: "Examples include the Start-Process cmdlet which can be used to run an executable and the Invoke-Command cmdlet which runs a command locally or on a remote computer (though administrator permissions are required to use PowerShell to connect to remote systems). PowerShell may also be used to download and run executables from the Internet, which can be executed from disk or in memory without touching disk." [19].
The excerpt provides two descriptions of low level behaviours which on their own are not necessarily malicious. Hence, an analyst would need additional context for further actions correlated to that process. Similarly, 'PowerShell' is rather vague as a behaviour. In the detection configuration logic for Hartong's Sysmon Modular (https://github.com/olafhartong/sysmon-modular/blob/master/sysmonconfig.xml), identification of the technique T1059.001 depends on the following conditions:
• Image Name = powershell.exe
• Image Name = powershell_ise.exe
Whilst it is true that any event instance with these image names will most likely be Powershell, this says very little about what the activity actually is, thus prompting the need for additional context. MITRE provides example software instances that are known to utilise Powershell, such as POWERTON, with a set of techniques that POWERTON uses and some generalised context surrounding the techniques used. A challenge is that information such as "T1547.001 - POWERTON can install a Registry Run key for persistence" and "T1059.001 - POWERTON is written in PowerShell" offers no more than the technique attribution already given by Sysmon's rule configuration. To understand these use contexts in more detail an analyst must open the referenced report (assuming the source is still active), search for the technique or technique use context, find how this technique is conducted and extract the indicators of compromise. From this threat intelligence report [1], the adversarial activity chain that occurred over the course of 485 days is the following:
• Download and execute payload from external website.
• Payload retrieves and executes additional PowerShell payloads from external website.
• PowerShell payload performs reconnaissance on system.
• Based on reconnaissance results of infected system the payload acquires the appropriate variant of PowerSploit.
• PowerSploit reflectively loads another software suite - PUPYRAT.
• The threat actor escalates their privileges and utilises SysInternals PROCDUMP to dump the LSASS process.
• MIMIKATZ is then present to steal additional credentials.
Each of these activities is a low level behaviour that the threat actor is trying to achieve and, when correlated together, they form a behaviour chain. Rather than focusing on complex IoC detection logic for this type of exploit, the focus of our research is to define indicators of behaviour to detect this behaviour chain. This can be achieved through semantic representation of process behaviours, driven by a generalised Sysmon syntax.

INDICATOR-OF-BEHAVIOUR MODELLING
To avoid the reliance solely on indicators of compromise for analytics, the behaviours of processes and their interactions across the Windows environment are modelled in the ontology as Indicators of Behaviour.
These indicators are a combination of low level behaviours, mid level behaviours, top level behaviours and known APT behaviours. The low level behaviours are indicative of what a process is doing at that very moment. Depending on the individual behaviour it may not be malicious, but when combined with other related behaviours a pattern of malicious activity can be observed. For example, a low level behaviour could be the instance of Command Prompt launching Powershell. Individually this does not indicate much, but it certainly is suspicious behaviour when combined with other indicators such as launching Powershell with a script that imports a Powershell module, establishing a connection to a remote network, uploading a series of files and then removing those artefacts including the module. Through the relationships between each individual process event and any associated process correlations provided by Sysmon, a chain of events can be inferred as a higher level of behaviour.
The advantage of this approach is that it allows analysts to run queries on the behaviours themselves rather than constructing complex threat hunting queries with the technical observations returned for each relevant behaviour. Similarly, if utilised with an alert logging system, an alert will be generated for the generalised behaviour such as "Non administrative user has successfully escalated privileges", rather than every instance of a rule triggering on technique T1548 - Abuse Elevation Control Mechanism. For example, querying the data exfiltration behaviour will return all events in this behaviour class. Then, path queries, descriptions and predefined or ad-hoc construct queries can be utilised to get additional context surrounding these events.
A data model for the relationships between a process and its subsequent events, and the impact that modelling these relationships has on security analytics, is introduced in [20], as shown in figure 1. These relationships focus on event chains and provenance that can be modelled. They build the foundation of the ontology, which subsequently moves beyond just the relationships of the processes and into the inference rules that can be generated surrounding behaviours. Similarly, these relationship concepts are aspects of D3-PA - Process Analysis in MITRE's D3FEND knowledge base of security countermeasure techniques, where Process Analysis → analyses → process artifacts [18]. However, the semantics of this ontology are driven by the semantics of Sysmon and the Windows event logging syntax rather than the generalised semantics of D3-PA. Table 1 lists a subset of all the relationships present in the ontology. These relationships are typically causal and chronological in the sense that they correlate the individual events together to aid in provenance, behaviour analysis and root cause analysis. Each relationship is an action that a process may take during its lifecycle and denotes the type of basic low level behaviour that is occurring when captured by Sysmon. The domains and ranges are property descriptions that connect the relationships to the axioms in the ontology through inference. These descriptions have been declared via Protege within its object property description view. An example inference is the loaded relationship. If individual X loaded individual Y then X is classified as an Image and Y is classified as a Module; the inverse of this is inferred via the loadedBy relationship. If individual Z interacts with either X or Y then they are related to one another.
Preprocessing is conducted within the RDF database using a forward property path search. This forward property path search is used to indirectly relate events and subevents together without applying a transitive property characteristic to the relationship. To achieve this the ontology utilises a SPARQL SELECT query for each individual action relationship in Table 1. For each returned edge a related relationship is asserted via a SPARQL INSERT query; this allows for handling action recursion in path queries. For example, if X created Y and Y created Z then X is related to Y and Z and vice versa, utilising the created+ property path notation to look ahead in the behaviour chain. Querying the parent property related for X, Y or Z will return X, Y and Z. In using this approach, process events can be related to one another without creating nonsensical inference inherited by a transitive property. Similarly, this allows for rule based inference across groups of behaviours (forming a behaviour chain) via the asserted related relationship instead of transitive inference properties.
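The preprocessing step can be illustrated outside the RDF database with a minimal Python sketch. It assumes the created edges have already been extracted as (parent, child) pairs, emulates the created+ look-ahead with a forward graph walk, and asserts a symmetric related edge for every hit. The function name assert_related and the in-memory dictionaries are hypothetical stand-ins for the SPARQL SELECT and INSERT queries described above, not the authors' implementation.

```python
from collections import defaultdict


def assert_related(created_edges):
    """Emulate `?a :created+ ?b` and assert `related` in both directions."""
    children = defaultdict(set)
    for parent, child in created_edges:
        children[parent].add(child)

    related = defaultdict(set)
    for start in list(children):
        # Walk forward over one or more created hops, as created+ would.
        stack, seen = list(children[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        for node in seen:
            if node != start:
                related[start].add(node)  # X related Y
                related[node].add(start)  # and vice versa
    return related


# X created Y and Y created Z, as in the example in the text.
related = assert_related([("X", "Y"), ("Y", "Z")])
print(sorted(related["X"]))
```

Because related is asserted per path match rather than declared transitive, a reasoner never has to propagate it further, which is the design choice the paragraph above argues for.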
The Sysmon event log parsing tool is utilised to parse the individual event log data and assert the event relationships into the ontology. Sysmon's structured syntax drives the ontology and dictates the asserted relations. The Sysmon parser will extract event data properties associated with the image and the module that is loaded, insert this data into the ontology and then build the relationship. A simplified output relationship is notepad.exe loaded ole32.dll; its inferred inverse relationship is ole32.dll loadedBy notepad.exe.
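As a rough illustration of this parsing step, the following Python sketch reads one Sysmon image-load event in the standard Windows event XML layout and emits the loaded relationship together with its inverse. The sample event is hand-written for illustration, the function event_to_triples is hypothetical, and plain tuples are used in place of OWL assertions. In the ontology the inverse is inferred by the reasoner; here it is emitted explicitly.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hand-written sample event (illustrative only).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>7</EventID></System>
  <EventData>
    <Data Name="Image">C:\\Windows\\System32\\notepad.exe</Data>
    <Data Name="ImageLoaded">C:\\Windows\\System32\\ole32.dll</Data>
  </EventData>
</Event>"""


def event_to_triples(xml_text):
    """Turn an image-load event (EventID 7) into loaded / loadedBy triples."""
    root = ET.fromstring(xml_text)
    event_id = int(root.findtext("e:System/e:EventID", namespaces=NS))
    data = {d.get("Name"): (d.text or "")
            for d in root.findall("e:EventData/e:Data", NS)}
    triples = []
    if event_id == 7:
        # Keep only the file name, matching the simplified output in the text.
        image = data["Image"].rsplit("\\", 1)[-1]
        module = data["ImageLoaded"].rsplit("\\", 1)[-1]
        triples.append((image, "loaded", module))
        triples.append((module, "loadedBy", image))
    return triples


triples = event_to_triples(SAMPLE)
print(triples)
```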
The concept behind the behaviours is to model the syntactical properties of process events and their subsequent actions across the environment to detect what an attacker, or user, is trying to achieve. They infer a higher level of abstraction with the associated technical observations for these abstractions. While individual indicators such as file names and hashes can be changed or modified, the threat actor is limited to the syntax of the chosen programs and the chain of events that are spawned by the processes. For example, if a threat actor utilises Powershell to interact with an external server, they are limited to using Powershell syntax such as Invoke-WebRequest, which cannot change. While the individual files have the potential to change, this Powershell process behaviour persists as it must be syntactically correct for Powershell to execute the command. This simple behaviour is an instance of a user attempting to establish a network connection. Subsequent behaviours that follow from this Invoke-WebRequest will follow a pattern based on how the Windows operating system and Powershell execute these commands. These subsequent events are then related via their associated action relationship to form a behavioural chain. Here, the associated abstract behaviours are: a user has attempted to establish a network connection, and the user has attempted to upload data to a network connection. Each behaviour has its own unique identifier for indexing and querying. If the DNS is resolved then the assumption is that this network connection attempt was successful. Generating an alert or querying on the combined set of behaviours returns the individual interconnected events, a provenance graph and additional contextual nodes in the knowledge graph.

Behaviour Modelling Continued
In this study the behaviours have been modelled to reflect the threat actor operations emulated with Caldera.
An example behaviour is an adversary attempting to copy a target file from a remote file share through an existing C2 channel. This behaviour is reliant on the syntactical parameters of the Powershell command: Import-Module, Invoke-MultipartFormDataUpload and -Uri. Each of these parameters is an individual low level behaviour which, when combined, makes the higher level abstracted behaviour. The first behaviour is an instance of a module being imported, the second behaviour is an instance of a multipart form data upload and the third behaviour is an instance of a remote network connection.
The higher level abstracted behaviour is formDataUploadViaPowershell, for which inference is configured using the Manchester OWL Syntax2 when an individual has some behaviours of Import-Module, some behaviours of FormDataUpload and some behaviours of NetworkConnection. Additionally an SWRL3 rule is configured with the following notation:
:Image(?x) ∧ :indicates(?x, importModule) ∧ :indicates(?x, formDataUpload) ∧ :networkConnection(?x, ?nc) → :formDataUploadViaPowershell(?x)
If an individual event has these parameters it will be inferred as being a member of the ExfiltrationOverCommandControl behaviour. These higher level behavioural abstractions can also be chained together to form generalised behavioural observations. Caldera provides YAML file configurations for its adversarial abilities, which are examined for the low level parameter configurations.
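The rule logic can be approximated in a few lines of Python. This is a simplified sketch, not the ontology's reasoner: it derives the three low level behaviours directly from command line tokens (whereas the rule above treats the network connection as a relationship to another individual) and then checks that all three are present. The token table, function names, file names and the example command line are hypothetical.

```python
# Hypothetical token-to-behaviour table built from the three parameters
# named in the text.
LOW_LEVEL = {
    "import-module": "importModule",
    "invoke-multipartformdataupload": "formDataUpload",
    "-uri": "networkConnection",
}

REQUIRED = {"importModule", "formDataUpload", "networkConnection"}


def low_level_behaviours(command_line):
    """Return the low level behaviours whose token appears in the command line."""
    tokens = command_line.lower().split()
    return {behaviour for token, behaviour in LOW_LEVEL.items() if token in tokens}


def high_level_behaviours(command_line):
    """Infer the abstracted behaviour only when every required atom holds."""
    inferred = set()
    if REQUIRED <= low_level_behaviours(command_line):
        inferred.add("formDataUploadViaPowershell")
    return inferred


# Illustrative command line; paths and address are made up.
cmd = ("powershell.exe Import-Module .\\upload.ps1; "
       "Invoke-MultipartFormDataUpload -InFile .\\staged.zip "
       "-Uri http://203.0.113.5/file/upload")
print(high_level_behaviours(cmd))
```

A command line containing only Import-Module yields no abstracted behaviour, mirroring the conjunction in the rule body.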
Modelling the behaviours is not limited solely to command line parameters. The relationships between individuals are also indicative of behaviours. While the specific indicators and registry values may change, these process behaviours are persistent. Other indicators such as the integrity levels of processes help ascertain whether privilege escalation has occurred. The integrity level of a Windows process can be defined as "low", "medium", "high" and "system", where standard users have access levels medium and below and elevated users receive high and below [16]. System is an integrity level that is reserved for the system, segmenting kernel and core services from elevated administrative accounts [9]. It is not possible for a process with a lower integrity level to open a handle with full access to another process that requires a higher integrity level [7]. If the integrity level of the subsequent Powershell handle changes from low or medium to high, and the handle is related to the UAC bypass event previously discovered, then it can be deduced that the UAC bypass was successful, with an inferred behaviour of User has successfully elevated privileges via UAC bypass.
This can be further deduced by tracking and contextualising user and system access levels, where the integrity levels are mapped to that user's or system's access level. If user X does not have any access control group memberships that permit high integrity, an instance of a high integrity process has occurred by that user, and an optional match for UAC bypass behaviours has occurred, then the likelihood of successful privilege escalation increases. This can be represented as user hasPermission userGroup, where the user group belongs to an administrative or non administrative access group. The following SWRL rule contextualises this automatically when analysing new data that enters the knowledge base.
Image(?x) ∧ StandardUser(?su) ∧ byUser(?y, ?su) ∧ related(?x, ?y) ∧ hasIntegrity(?y, highIntegrity) → indicates(?y, StandardUserPrivilegeEscalation)
Alternatively this can be represented as a SPARQL rule that inserts the relation with the following notation:
Listing 1: Example of SPARQL Insert rule to assert a non administrative account escalating its privileges when instantiating a process
Assuming process Image X is created or interacted with by a standard user SU with no administrative access, and a transitively related process Y is launched with high integrity by SU, then Y has the behaviour StandardUserPrivilegeEscalation. Here the behavioural indicator looks at the context of the event chain rather than a detection rule based on command line arguments or image names and locations.
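To make the rule body concrete, the sketch below evaluates the same conjunction over plain Python dictionaries. It is an illustrative stand-in only and does not reproduce the SPARQL rule referred to above. The data structures, process identifiers and user name are hypothetical, and the related mapping is assumed to have been asserted beforehand.

```python
def standard_user_escalations(images, standard_users, by_user, related, integrity):
    """Return every process y that satisfies the rule body for some image x."""
    flagged = set()
    for x in images:                      # Image(?x)
        for y in related.get(x, ()):      # related(?x, ?y)
            if (by_user.get(y) in standard_users      # byUser(?y, ?su), StandardUser(?su)
                    and integrity.get(y) == "High"):  # hasIntegrity(?y, highIntegrity)
                flagged.add(y)            # indicates(?y, StandardUserPrivilegeEscalation)
    return flagged


# Hypothetical knowledge base fragment: p1 leads to p2, which runs at
# High integrity under a standard user account.
images = {"p1", "p2"}
standard_users = {"alice"}
by_user = {"p1": "alice", "p2": "alice"}
related = {"p1": {"p2"}, "p2": {"p1"}}
integrity = {"p1": "Medium", "p2": "High"}

flagged = standard_user_escalations(images, standard_users, by_user, related, integrity)
print(flagged)
```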
Alternatively, if the permissions of a user are not known or not currently inventoried by the organisation, then investigating whether a user has ever launched an instance of a "High" integrity level process can be conducted to gain such context. Similarly, the computer that the event has been generated on is also a key indicator of these contexts through its permissions.
Other information sources can be abstracted from the process behaviour's level of granted access and what this level of granted access means in relation to the user activity. Microsoft defines a security model for controlling the level of access a process has to objects on the system, realised as an assigned token value that contextualises the security level a process has in relation to the user's levels of security access [17]. The token value of this access is returned as the data property GrantedAccess in Sysmon. One example is GrantedAccess: "0x1fffff", which means a process has been granted all possible access rights to an object on the system [21], a level of access that would not be provided to a process launched by a user with non administrative access. Subsequently, the behaviour here is a process launching with full access rights. Detection rules such as Sigma rule ID fa34b441-961a-42fa-a100-ecc28c886725 [22] can be transformed into a SPARQL query to return instances of these access tokens, but can be prone to false positives, as indicated by the author [22]. Combining the abstracted behaviour with the context that this process was launched by a non administrative account infers the user behaviour: process launched with full access rights by non administrative account. An analyst can view this behavioural detection context in relation to the event that was generated and what this means in a wider context, rather than a general fact of indicator indicates technique.
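The combined check described above can be sketched in Python, assuming events have already been parsed into dictionaries. Only the GrantedAccess property and the value 0x1fffff come from the text; the User key, the admin_users set and the account names are hypothetical.

```python
FULL_ACCESS = 0x1FFFFF  # value quoted in the text as all possible access rights


def full_access_by_standard_user(event, admin_users):
    """True when a full-access handle was obtained under a non-admin account."""
    granted = int(event["GrantedAccess"], 16)  # accepts the "0x" prefix
    return granted == FULL_ACCESS and event["User"] not in admin_users


admin_users = {"CORP\\administrator"}
suspicious = {"GrantedAccess": "0x1fffff", "User": "CORP\\alice"}
expected = {"GrantedAccess": "0x1fffff", "User": "CORP\\administrator"}
partial = {"GrantedAccess": "0x1410", "User": "CORP\\alice"}

print(full_access_by_standard_user(suspicious, admin_users))
```

Pairing the access mask with the account context is what separates this from a bare indicator match, which on its own would also fire for the administrator event.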

Behaviour
A behaviour is a layer of abstraction that can be inferred based on the classes, relationships and data properties of events captured by Sysmon and other event log sources. A concept of data properties for semantically structured behaviours is introduced by [27]. These concepts are implemented within the ontology to represent an individual within a given class that is a reference point of contextual information for any classified behaviours. An example is a behaviour class named SystemInformationGathering, which will have an individual named systemInformationGathering with the data properties and values from [27] alongside any relevant relationships of ATT&CK techniques and tactics. Any event that is classified as being a member of the SystemInformationGathering class axiom will have an inferred relationship named indicates to the individual named systemInformationGathering. An analyst can then query the knowledge base for all members of the SystemInformationGathering class axiom or for anything that indicates the systemInformationGathering individual. The knowledge base then returns the subjects that satisfy this query with their technical observations. Low level behaviours are then chained together to form medium and higher abstracted behaviours. These individual behaviours or related behaviour chains also act as procedural informers and procedural analytics, rather than external prose formatted security reports commonly supplemented with MITRE ATT&CK techniques.
Classifying low level behaviours is a combined approach. It utilises inference based upon the values of data properties, be that the syntactical parameters observed within the command line, the value of the EventID parameter or other data properties, and utilises the Semantic Web Rule Language (SWRL) for additional logic. Class expressions are utilised for simple data property value checks to infer behaviours. An example is the ImageLoaded low level behaviour: if an event has the data property value eventID 7 then it is inferred to be an ImageLoaded event as per the Sysmon EventID syntax [23].
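A simple data property check of this kind can be sketched as a lookup table in Python. Only the mapping of EventID 7 to ImageLoaded is taken from the text; the other entries follow the Sysmon event numbering, and their class names are assumptions chosen for illustration rather than names from the ontology.

```python
# EventID -> low level behaviour class. Only 7 -> ImageLoaded is from the
# text; the remaining class names are illustrative assumptions.
EVENT_ID_TO_BEHAVIOUR = {
    1: "ProcessCreate",
    3: "NetworkConnect",
    7: "ImageLoaded",
    11: "FileCreate",
    22: "DnsQuery",
}


def classify(event):
    """Return the low level behaviour class for an event, or None if unmapped."""
    return EVENT_ID_TO_BEHAVIOUR.get(int(event["EventID"]))


print(classify({"EventID": "7"}))
```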
Higher abstracted behaviours are a combination of low level behaviours that can be observed based on the data property values of individual events, the relationships of nodes in the knowledge graph, pattern chains and the relations of behaviours. Figure 2 shows an example of how chaining these behaviours together leads to abstracted behaviours at a higher level, where low level behaviours infer medium level behaviours that ultimately lead to the higher behaviour abstraction. This aids in establishing better contexts for individual and correlated event detections which are prone to false positives. Similarly, the behaviours provide understanding of what processes and users are trying to achieve at the behavioural level. Examples of where a wider context is needed can be found in the several publicly available Sigma rules for detecting UAC bypass 4. Many of these UAC bypass detection Sigma rules analyse the Parent and Child indicators and the integrity level indicators of the spawned process. These Sigma rules try to detect the patterns associated with UACME5 activities but often target standard Windows services or executables, where the benign or malicious activity needs to be investigated further to confirm a true positive.
While considered suspicious activity, a Parent process instance of cmd.exe creating a Child process of eventvwr.exe with Integrity level High isn't inherently malicious or an instance of UAC bypass. The key behavioural indicator in this chain of low level behaviours is whether this is an instance of a non-administrative user launching a process with High integrity. If this behavioural activity has been detected, the individual event matching the data property values for this behaviour is inferred with the relationship E → indicates → B, where E is the security event and B is the behaviour.

METHODOLOGY
The framework requires security event data captured via Sysmon to be parsed to OWL data for reasoning and inference. To achieve this, the framework uses a Python script that analyses the Sysmon XML schema and translates the XML schema to the ontology. Once this OWL output file is created it is imported to the Stardog database. Relationship preprocessing is conducted on the ontology to generate the direct and indirect relationships between individual event data. Once completed the ontology is ready for IoB reasoning. A summary of these events can be seen in figure 3.

Emulating Threat Actor Behaviours
MITRE's Caldera platform (https://caldera.mitre.org/) was chosen to emulate the actions of an Advanced Persistent Threat (APT). The Caldera framework provides autonomous red-team engagements where users can define adversarial threat profiles, develop attack campaigns and launch these attacks within a provided network. This framework provides solutions to challenges related to adversarial emulation, which can be resource intensive and require domain expert knowledge to deploy.
Caldera utilises the ATT&CK framework to emulate the tactics, techniques and procedures utilised by adversaries. In the Caldera framework, tactics and techniques are represented as abilities, where an ability includes commands, payloads, modules and target platforms for deploying the tactics and techniques. A combination of these abilities forms the adversary profile. Through the usage of adversarial profiles, the Caldera framework provides a systematic approach to adversarial emulation and analysing the adversarial procedures for data driven threat hunting.
Furthermore, these adversarial profiles emulate the procedure an adversary will conduct, showing the behaviours of elements of an adversarial operation and the individual elements of the abilities that can be queried. For example, from analysis of an adversarial profile, a chain of Windows processes can be discovered that are conducted in the same or a similar fashion, thus indicating a behavioural procedure analytic that can be reused in other threat hunting scenarios in place of indicator based threat hunting.
MITRE provides adversarial emulation plans for the automated deployment of some adversarial campaigns, including their techniques, procedures and abilities. More about this is covered in section 4.2.

APT29 Emulation Plan
Adversary emulation plans detail the procedures of advanced persistent threats with mappings to the ATT&CK framework.
In this research, MITRE's adversarial emulation plan for APT29 is utilised within Caldera to analyse adversarial behaviours in the target environment.
MITRE's APT29 emulation plan is available as a plugin for Caldera, providing full automation and deployment of the defined APT29 campaign and the actions that APT29 performs. From the perspective of the attacker, these are high level indicators of behaviour that can be abstracted from security events captured by the security monitoring tools deployed in the environment. Each of these high level behaviours has a set of low level behaviours that infer this high level of abstraction.
One limitation of this emulation plan is that it is designed to be run on an earlier version of Caldera, which references an older version of the MITRE ATT&CK framework. For instance, it references the Powershell technique as T1086, which is now T1059.001 in the latest version of the framework.

Data and Github Repository
The original dataset of the captured security events on each Windows device is saved in the Windows Event log extension (.evtx) format. This allows for the import and export of the .evtx files on any device or software capable of parsing .evtx files. The Sysmon .evtx files are then processed by the Sysmon OWL parser into a subsequent OWL file.
The original .evtx files, output semantic data and processing scripts can be found on the following Github7 repository.

Sysmon
System Monitor (Sysmon) is a Windows system service and device driver that persistently monitors and logs system activity to the Windows event log [23] and is utilised in this methodology for logging activities on the target hosts.
Sysmon provides detailed information regarding process creations, network connections, file modifications, registry modifications and many other Windows events. Since our target environment is a Windows based environment, this makes Sysmon an optimal sensor for monitoring and capturing events occurring in the operating system.
Sysmon does not analyse events or act as a threat prevention system; it simply reports events based on rule detection logic defined in a supplied configuration file. Olaf Hartong's Sysmon Modular8 configuration file is utilised in this methodology. This indicator and rule based configuration file matches event content on individual or grouped rule conditions and generates a logged event in Sysmon. Furthermore, this configuration file tries to map all configurations to the MITRE ATT&CK framework whenever Sysmon is able to detect it [10]. This mapping provides additional reasoning and inference for why the event was triggered and its relation to the TTPs in the MITRE ATT&CK knowledge base. Figure 2 demonstrates an example rule for linking the presence of a file name containing "Mavinject.exe" or "mavinject64.exe" with the presence of "/INJECTRUNNING" in the command line. If these conditions are met then this is saved in the Sysmon event logs with the Rule name "technique_id=T1218,technique_name=Signed Binary Proxy Execution".
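Because the rule name carries the ATT&CK mapping as comma separated key=value pairs, it can be split into fields before being asserted into a knowledge base. The Python sketch below shows one way to do this; the function name parse_rule_name is hypothetical, and the approach assumes that technique names do not themselves contain commas.

```python
def parse_rule_name(rule_name):
    """Split a rule name of key=value pairs into a dictionary."""
    fields = {}
    for part in rule_name.split(","):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without an '=' sign
            fields[key.strip()] = value.strip()
    return fields


rule = "technique_id=T1218,technique_name=Signed Binary Proxy Execution"
mapping = parse_rule_name(rule)
print(mapping["technique_id"], "-", mapping["technique_name"])
```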

Knowledge Base
The framework's knowledge base is data driven: it is designed primarily through the analysis of data properties asserted by the Sysmon syntax, whilst utilising additional domain concepts for context and reasoning.
The primary focus of the knowledge base is to relate events together as a chain of events with their associated asserted or inferred behaviours. These chains of events require relatable characteristics to enable reasoning and the relation of behaviours over individually logged events. To enable this reasoning, the ontology utilises Sysmon data properties for linking domain objects with the appropriate predicate. For example, "Sysmon Event ID 11 - File Created" is logged at a particular timestamp; this event contains the Target File Name __PSScriptPolicyTest_ggntoy5i.qee.ps1, created by the Image powershell_ise.exe. All these individual events are linked to the Process GUID 747f3d96-2859-61f3-be07-000000001b00, creating a process hierarchy and provenance from start to finish.
We create an event chain utilising the direct and indirect relationship named related. This related property is linked to the actions a process can take or has taken and is used as a pivoting point for relating security events. For example, if process X created process Y then X and its subsequent events are related to Y. This extends to any further actions conducted by X and Y. If the actions of process Z intersect with X or Y, then Z is related to X and Y, as its security events intersect with theirs in this event chain.
This intersecting relationship makes it easier to understand how individual events are related to one another based on their intersecting interactions, what occurred before or after these intersections, and how individual behavioural indicators can be associated with one another to infer and abstract higher level contexts.
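The intersecting behaviour of related can be illustrated with a small Python sketch that groups subjects into event chains. It assumes, for illustration only, that related behaves as a symmetric and transitive grouping; the function name related_chains and the sample actions are hypothetical.

```python
def related_chains(actions):
    """Group subjects and objects into event chains.

    actions is an iterable of (subject, action, object) tuples. Any two
    nodes joined by an action, directly or through other nodes, land in
    the same chain (a union-find over the action graph).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for subject, _action, obj in actions:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_o] = root_s

    chains = {}
    for node in parent:
        chains.setdefault(find(node), set()).add(node)
    return list(chains.values())


# X created Y, Y created F, and Z intersects the chain by accessing F.
ACTIONS = [
    ("X", "created", "Y"),
    ("Y", "created", "F"),
    ("Z", "accessed", "F"),
    ("A", "created", "B"),  # an unrelated chain
]
chains = related_chains(ACTIONS)
```

Here Z joins the chain of X and Y because its action intersects with File F, while A and B remain a separate chain.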

Event and Behavioural Chains
The chains of events are represented in the ontology using relationship characteristics influenced by elements of Allen's interval algebra [2]. Allen's interval algebra is a set of jointly exhaustive and pairwise disjoint binary relations for representing temporal relationships between intervals. The key elements used in this ontology are the starts and ends relations and their inverse relations started and ended, where starts is element x starting element y and ends is element x ending element y. These relationships have been adapted to conform with the domain specific properties generated by Windows event processes, and to characteristics that adhere to the OWL 2 Manchester Syntax9, to generate event chains from the returned Sysmon data. For example, starts is translated to created.
When an action occurs between two individuals, these are asserted as related using Basic Graph Pattern (BGP) rules that utilise conditional if-then statements for the logic. Similarly, using a recursive property path for a given action achieves this across a behavioural chain. We do not use the OWL:equivalentTo or OWL:sameAs property, in order to avoid nonsensical transitive relationship inference. Take the following example: Process X created Process Y, Process Y created File F, and Process Z accessed File F.
Using an OWL:equivalentTo or OWL:sameAs property to link related, created and accessed with a transitive object property characteristic would cause the following nonsensical inference: Process X accessed File F. This occurs because the created and accessed relationships inherit the transitive object property equivalence.
Instead, we use a custom SPARQL INSERT rule to assert the relationship, where the antecedent is the recursive actions a process takes and the consequent is the relation. An example antecedent, X created Y, is represented in the custom rule shown in Listing 3. This approach provides the knowledge base with a means to assert any direct and indirect relationships between security events, based upon the actions created by security events on systems, without the actions inheriting transitive properties that are nonsensical when chained together.
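The difference between the two modelling choices can be sketched in Python on the X, Y, Z and F example. This is a set-based illustration under simplifying assumptions, not the reasoner's actual behaviour; the helper transitive_closure is hypothetical.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


TRIPLES = {
    ("X", "created", "Y"),
    ("Y", "created", "F"),
    ("Z", "accessed", "F"),
}

# Naive modelling: created and accessed are treated as one equivalent,
# transitive property, so every chained pair is also read as "accessed".
naive_accessed = transitive_closure({(s, o) for s, _p, o in TRIPLES})

# Rule based modelling: the original actions are left untouched and a
# separate related edge is asserted for each action, then chained.
accessed = {(s, o) for s, p, o in TRIPLES if p == "accessed"}
related = transitive_closure({(s, o) for s, _p, o in TRIPLES})
```

Under the naive modelling the pair (X, F) is read as an access, whereas under the rule based modelling X is only related to F and the set of accessed facts stays exactly as logged.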
A second relation rule is also made via the ProcessGUID data property provided by Sysmon. In this ontology the ProcessGUID data property is parsed as an individual within a ProcessCorrelation class. If Process X and Process Y hasCorrelation ProcessCorrelation Z, then X and Y are related.
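A minimal Python sketch of this second rule, assuming each logged event is reduced to a pair of event identifier and ProcessGUID value. The function name related_by_guid and the sample identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def related_by_guid(events):
    """Relate every pair of events that share a ProcessGUID.

    events is an iterable of (event_id, process_guid) pairs. The shared
    GUID plays the role of the ProcessCorrelation individual.
    """
    by_guid = defaultdict(list)
    for event_id, guid in events:
        by_guid[guid].append(event_id)

    related = set()
    for event_ids in by_guid.values():
        # Sorting gives each unordered pair a single canonical form.
        for first, second in combinations(sorted(event_ids), 2):
            related.add((first, second))
    return related


# Hypothetical events: two share a GUID, one does not.
EVENTS = [("ev:1", "guid-a"), ("ev:2", "guid-a"), ("ev:3", "guid-b")]
guid_related = related_by_guid(EVENTS)
```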
A globally unique identifier (GUID) is generated for each event's Uniform Resource Identifier (URI) to ensure that each individual logged entity is unique and does not impact ontology reasoning. The URI is an ASCII string used to identify things on the semantic web [26]. For example, two individual process events logged by Sysmon could have the same process identifier '0000-1111-2222-3333' as their GUID but have two different logged Sysmon event IDs. If the first process has an event ID 1 it is classified as a ProcessCreation behaviour [23]; if the second has an event ID 7 it is classified as an ImageLoaded behaviour [23]. Reasoning would be affected if both events' event ID data properties were added to a single process, since the ontology looks at the behaviour chain of processes and their classified actions. It would be inferred or asserted that each process chain is both a 'process create' and an 'image loaded' event at the same time, losing its behavioural characteristics. If two captured events X and Y have the same process ID but different event data, then X is asserted as the Master process and Y is asserted as the SubProcess of X with the relation X hasSub Y. This ensures consistency by relating X to Y whilst maintaining their individual behavioural characteristics.
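The URI minting and Master/SubProcess assertion can be sketched in Python as follows. The namespace BASE is a hypothetical placeholder, and treating the first logged event for a process identifier as the Master is an assumption of this sketch rather than a rule stated in the text.

```python
import uuid

BASE = "http://example.org/threat#"  # hypothetical namespace

# Event ID to behaviour class, as given in the text for IDs 1 and 7.
EVENT_CLASSES = {1: "ProcessCreation", 7: "ImageLoaded"}


def mint_individuals(events):
    """Give every logged event its own URI and link events per process.

    events is a list of dicts with "ProcessGuid" and "EventID" keys, in
    log order. Returns (subject, predicate, object) triples.
    """
    triples = []
    master_for_guid = {}
    for event in events:
        uri = BASE + str(uuid.uuid4())  # unique per logged event
        triples.append((uri, "rdf:type", EVENT_CLASSES[event["EventID"]]))
        guid = event["ProcessGuid"]
        if guid not in master_for_guid:
            master_for_guid[guid] = uri  # assumption: first event is Master
        else:
            triples.append((master_for_guid[guid], "hasSub", uri))
    return triples


triples = mint_individuals([
    {"ProcessGuid": "0000-1111-2222-3333", "EventID": 1},
    {"ProcessGuid": "0000-1111-2222-3333", "EventID": 7},
])
```

Each event keeps exactly one behaviour class, and the shared process identifier is expressed through hasSub instead of merging both event IDs onto a single individual.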

Tools Used for Designing the Ontology
The ontology has been designed using the Protégé10 ontology tool developed by Stanford University and is processed using the Stardog knowledge graph platform11. Stardog offers multiple reasoning options for its reasoning engine; the optimal reasoning configuration for this ontology is the Blackout reasoner utilising the SL inference type configuration setting12. This is so that the reasoning and inference are compatible with the user defined behaviour rulesets for the ontology. For reasoners in Protégé, it is recommended to use the Pellet reasoner13, as this reasoner supports the SWRL built-ins14 utilised in this ontology.

EVALUATION
A limitation of the chosen Sysmon configuration file lies in the technique attribution produced by its rule set. A commonly occurring technique within the captured event data is T1059.001 - PowerShell15. Because Sysmon triggers this rule based on the process image name containing the value Powershell, and APT29's Caldera agent conducts most of its attacks via PowerShell, this technique attribution occurred frequently. While it is technically correct that a technique utilising PowerShell is occurring, the threat actor is actually trying to compress all data within a directory and exfiltrate the results, represented as techniques T1074 - Data Staged and T1041 - Exfiltration Over C2 Channel. This indicates a differing perspective between the defender and the threat actor: the defender views an instance of PowerShell whilst the threat actor is exfiltrating data. For further reading, the full capability has ID a612311d-a802-48da-bb7f-88a4b9dd7a24 in the APT29 emulation plan.
This can be better represented as IoBs for the chain of events. For ground truth and IoB generation, the adversary's command is used as the source indicator. In this instance the knowledge base was queried via SPARQL for security events whose commandLine property contains the string MultipartFormDataUpload, a syntax used in the command ability that uploads file(s) to an HTTP server. The corresponding SPARQL query returns the RDF tuple seen in Table 2. Using the returned ?image subject, one can pivot on all transitively related images via the related relationship, for any Image class where image is related to relatedImage. Where true, this part of the SPARQL query returns all relatedImage subjects; one can then pivot again on relatedImage to return all related subjects that have egress network connections. These steps can be represented as a single SPARQL query analytic; however, converting them into behavioural steps reduces the required workflow and correlates the behavioural contexts together. In this instance there are three individual behaviours that are related based on the actions taken by the process chains. The first behaviour is an instance of the PowerShell process creating a compressed output file, the second is an instance of a PowerShell process uploading this compressed output file, and the final behaviour is the network connection.
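The pivoting workflow described above can be sketched in Python over in-memory data. This is an illustration of the three steps and not the SPARQL analytic itself; all identifiers and command lines are hypothetical, and following related edges in both directions is an assumption of the sketch.

```python
def find_images_by_command(command_lines, needle):
    """Step 1: images whose commandLine contains the source indicator."""
    return {image for image, cmd in command_lines.items() if needle in cmd}


def pivot_related(seeds, related_pairs):
    """Step 2: follow related edges until no new images appear."""
    seen = set(seeds)
    frontier = list(seeds)
    while frontier:
        current = frontier.pop()
        for a, b in related_pairs:
            if a == current:
                neighbour = b
            elif b == current:
                neighbour = a
            else:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen - set(seeds)


# Hypothetical data standing in for the knowledge base contents.
COMMAND_LINES = {
    "image:upload": "powershell.exe -Command MultipartFormDataUpload -File out.zip",
    "image:other": "notepad.exe",
}
RELATED = {
    ("image:parent", "image:upload"),
    ("image:upload", "image:net"),
    ("image:x", "image:y"),
}
EGRESS = {"image:net", "image:y"}  # subjects with egress connections

seeds = find_images_by_command(COMMAND_LINES, "MultipartFormDataUpload")
related_images = pivot_related(seeds, RELATED)
egress_hits = related_images & EGRESS  # Step 3: keep egress connections
```

Only image:net is returned in the last step, since image:y has an egress connection but is not related to the seed image.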
To represent the low level IoB for the PowerShell process trying to compress a file, we use the rule syntax shown in Listing 6. The consequent of this rule is that any Image indicates b1-002 when the rule is true. b1-002 is asserted in the ontology as being a member of the class B1-002, with a class equivalence of indicates some B1-002.
Querying this IoB grouping in SPARQL can be achieved in one of two ways:
• SELECT DISTINCT * WHERE { ?subject threat:indicates threat:b1-002 }
• SELECT DISTINCT * WHERE { ?subject a threat:B1-002 }
Either of these queries returns the following subjects with their human readable label:
• threat:dce4f184-e4df-4407-b405-bd06ce10afbc 'powershell.exe'
• threat:d35f242b-7552-4ff7-84f2-b254a0984afa 'powershell.exe'
A second rule is created for the multi-part form data upload behaviour conducted by PowerShell. This is inferred from the presence of MultiPartFormData within the commandLine data property.
Combining these behaviours forms a higher level of abstraction: a behaviour that indicates a user uploading a compressed zip file to a network connection via PowerShell. This behaviour addresses the overlap between different MITRE ATT&CK techniques, since the staging is T1560 - Archive Collected Data, the medium is T1059.001 - PowerShell, and the output is T1041 - Exfiltration Over C2 Channel, all combined in one security event. The behaviour chain has been given ID B1005 for querying.
After creating a low level IoB for each stage, these IoBs are chained together to form the final behaviour, where querying for subject indicates B1005 returns the corresponding subjects for each behaviour in context with one another. The user defined inference rules that represent these behaviours aid in the automated collection of key aspects of adversarial behaviours and in reducing the time spent on threat hunting query construction. Analysts can simply query for the behavioural IDs that have been indicated by a subject in the captured security event data.
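The chaining of low level IoBs into B1005 can be sketched in Python as follows. This is not the ontology's rule syntax. The ID b1-002 is taken from the text, whereas the ID b-upload, the Compress-Archive match string and the sample command lines are hypothetical placeholders chosen for the illustration.

```python
# Low level IoB rules: behaviour ID -> lowercase commandLine substring.
# "b-upload" and the "compress-archive" condition are assumptions.
LOW_LEVEL_RULES = {
    "b1-002": "compress-archive",
    "b-upload": "multipartformdata",
}
REQUIRED_FOR_B1005 = {"b1-002", "b-upload"}


def indicated_iobs(command_line):
    """Return the low level IoB IDs indicated by one commandLine value."""
    lowered = command_line.lower()
    return {iob for iob, needle in LOW_LEVEL_RULES.items() if needle in lowered}


def subjects_indicating_b1005(chain, command_lines, egress_subjects):
    """Return the subjects of one related chain that indicate B1005.

    The chain must cover both low level IoBs and contain at least one
    subject with an egress network connection.
    """
    per_subject = {s: indicated_iobs(command_lines.get(s, "")) for s in chain}
    covered = set().union(*per_subject.values())
    network = set(chain) & set(egress_subjects)
    if REQUIRED_FOR_B1005 <= covered and network:
        return {s for s, iobs in per_subject.items() if iobs} | network
    return set()


# Hypothetical related chain and its command lines.
CHAIN = {"ps:compress", "ps:upload", "net:1", "other"}
COMMANDS = {
    "ps:compress": "powershell.exe Compress-Archive -Path data -DestinationPath out.zip",
    "ps:upload": "powershell.exe MultipartFormDataUpload -File out.zip",
}
b1005_subjects = subjects_indicating_b1005(CHAIN, COMMANDS, {"net:1"})
```

An analyst querying for B1005 would then receive the compressing process, the uploading process and the network connection together, without constructing the three pivots by hand.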
Where similar approaches to utilising knowledge graphs for security event data focus on attribution at the individual event, this knowledge base's behaviours apply inference logic to generate high level abstractions that can be queried to return low level technical event data. This related or non-related data is then represented as a procedure that can be reused for querying or exported for uses such as a STIX attack-pattern object, JSON formatted machine readable analytics or SIEM tool integrated analytics (SIGMA). This behavioural procedure can be utilised by threat hunters for analysis, rather than the prose formatted texts commonly found in the Procedure section of MITRE ATT&CK techniques.
[15] the data models lack full granular contexts surrounding process behaviours. This research builds upon this knowledge and these semantic representations by providing additional domain level knowledge of network environments, more granular semantics for host and network based security events, and reusable analytics for querying the ontology. Furthermore, the ontology is enriched with security event data captured from emulated threat actors in a simulated enterprise environment. Similar research on automating the threat hunting process utilises natural language processing to extract indicators from published threat intelligence reports [8], but is limited to the structuring of the report texts and the usage of a domain specific language. The SEPSES knowledge graph [12] and the KRYSTAL knowledge graph [13] are similar knowledge graph approaches. SEPSES' main purpose is to describe the relations between CVE, CWE, CVSS and CAPEC resources extracted from the NVD database; while this ontology does not address IoBs, it is designed as a publicly accessible knowledge graph for integrating these domain concepts. The KRYSTAL knowledge graph creates attack graphs from audit data using SIGMA rule detection; while the knowledge base can represent system provenance and semantically represented graph patterns between domain concept relations, it is IoC based and does not conceptualise behavioural indicators. AttacKG [14] uses natural language processing techniques to generate knowledge graphs from open source threat reports; whilst it can abstract IoCs and causal techniques, it does not generate abstract behaviours for process behaviours, leaving it reliant on IoC and technique intelligence gathering.

The Unified Cyber Ontology (UCO) [24] [25] is an ontology that aims to standardise a variety of concepts within the cyber security domain. Whilst UCO provides a plethora of knowledge represented cyber security observables, it does not currently analyse behaviours from the perspective of what the attacker is trying to achieve. In contrast to UCO, our ontology is not designed to conceptualise the entire cybersecurity domain, as we assume the ontology is being utilised by experts within the field of security analytics. A file is a file and a process image is a process image, as asserted by Sysmon.
From a detection and threat hunting perspective, MITRE have a public knowledge base called the Cyber Analytics Repository (CAR)16, which is a knowledge base of detection analytics that are associated with attacker techniques in the MITRE ATT&CK framework and defender techniques in MITRE D3FEND. A limitation of CAR is the inconsistency with which detection data is presented to the analyst. Often the detailed analytic is presented as a prose based description accompanied by generalised data model objects that an analyst must correlate. Alternatively, these analytics come with pseudocode detection, or a combination of prose and pseudocode.
Frameworks such as Open Indicators of Compromise (OpenIOC) and Structured Threat Information eXpression (STIX) provide structured data for automated threat hunting. While OpenIOC is primarily used for sharing individually observed IoCs, STIX provides a wider context around threat actor activities with its STIX bundles and Groupings, a series of interconnected observable IoCs with their associated TTPs, but it is still heavily reliant on IoCs. A series of TTPs can be included as a STIX attack pattern object17, but these provide limited insight for hunting on behavioural patterns and instead focus on providing the observable IoCs for these TTPs. STIX is extended with STIX patterning for detecting activities on a system or network, but this is limited to the STIX Indicator object and cannot reason. In our approach, we use SPARQL for detecting behaviours and for reasoning on low level indicators, high level abstracted behaviours and domain concepts.

CHALLENGES & FUTURE WORK
Initial challenges when parsing the Sysmon data were inconsistent syntax mappings for XSD. Key values were not always the same, for example Hash vs Hashes depending on the event type. These overlapping keys are merged as one in the ontology at preprocessing. Currently the knowledge base is limited to data collected for Windows environments with Sysmon. Future work will look at normalised data and how best to semantically represent multiple log sources. Similarly, Sysmon can be deployed in Linux environments18.
Reasoning over an ontology can be computationally expensive with ever expanding content. In this research the captured security events are not aggregated, nor is the infrastructure focused on optimising the knowledge base contents. The ontology includes semantic representations of user and computer information provided by the event data logged by Sysmon. A comprehensive inventory of these assets within the organisation would aid in providing additional contexts for more nuanced reasoning within the ontology. This contextualised inventory will aid in behavioural decision making and context enrichment regarding the state of security events and their relevance to wider cyber threat intelligence.
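The merging of overlapping keys mentioned at the start of this section can be sketched in Python as a small preprocessing step. The alias table and the choice of Hashes as the canonical key are assumptions made for illustration; the sample values are hypothetical.

```python
# Alias table: variant key -> canonical key (canonical choice is assumed).
KEY_ALIASES = {"Hash": "Hashes"}


def normalise_keys(event):
    """Return a copy of the event with variant keys merged as one."""
    return {KEY_ALIASES.get(key, key): value for key, value in event.items()}


# Hypothetical events from two event types using different key names.
first = normalise_keys({"EventID": 1, "Hashes": "SHA256=AAAA"})
second = normalise_keys({"EventID": 15, "Hash": "SHA256=BBBB"})
```

After this step both events expose the same key, so a single data property can represent the hash value in the ontology.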

Listing 3: Creating a direct relation between two elements based on the action "created"
INSERT { ?threat threat:related ?relation . }
WHERE { ?threat threat:created ?relation . }

Table 1: List of relationships in the ontology for relating security event domain objects
Example of a low level IoB for uploading data via a webform
Listing 6: Example of a low level IoB for a powershell process creating a compressed archive file