Abstract
Today’s threats use multiple means of propagation, such as social engineering, email, and application vulnerabilities, and often operate in different phases, such as single device compromise, lateral network movement, and data exfiltration. These complex threats, such as advanced persistent threats, rely on advanced tactics to appear unknown to traditional security defenses. As organizations realize that attacks are increasing in size and complexity, cyber threat intelligence (TI) is growing in popularity and use. This trend followed the evolution of advanced persistent threats, as they require a different level of response that is more specific to the organization. TI can be obtained in many forms, with open-source intelligence being one of the most common, and through threat intelligence platforms (TIPs) that help organizations consume, produce, and share TI. TIPs have multiple advantages that enable organizations to quickly bootstrap the core processes of collecting, analyzing, and sharing threat-related information. However, current TIPs have some limitations that prevent their mass adoption. This article proposes AECCP, a platform that addresses some of these limitations. AECCP improves the quality of TI by classifying it according to a single unified taxonomy, removing information of low value, enriching it with valuable information from open-source intelligence sources, and aggregating it to complement information associated with the same threat. AECCP was validated and evaluated with three datasets of events and compared with two other platforms, showing that it can generate quality TI automatically and help security analysts analyze security incidents in less time.
1 INTRODUCTION
In today’s world, most organizations are digital, operating with technologies and processes of the Internet era. The changes in IT infrastructure and usage models, including mobility, cloud computing, and virtualization, have dissolved traditional enterprise security perimeters, creating a vast attack surface for hackers and other threat actors [45]. Managing the digital landscape in which an organization operates is a challenge that has never been more difficult, making an organization vulnerable to many forms of attack.
Not only has the digital landscape evolved, but cyber threats have also evolved significantly as adversaries have advanced their knowledge. They have deployed increasingly sophisticated means of circumventing individual controls within users’ local environments and probed further into their systems to execute well-planned and orchestrated attacks [44]. As the digital landscape grows and the threat landscape becomes more complex, organizations are more likely to be targeted and suffer a severe cyber-attack, with high financial and reputational impact. Given the high probability and impact of cyber-attacks, in addition to the significant regulatory pressure to protect information, such as the European Union’s General Data Protection Regulation, organizations are encouraged to look for new solutions to reduce their vulnerabilities [14].
One domain that has emerged during the past decade is cyber threat intelligence (TI). This new domain combines key aspects of incident response and traditional intelligence, and it can be defined as “the process and product resulting from the interpretation of raw data into information that meets a requirement as it relates to the adversaries that have the intent, opportunity and capability to do harm” [38]. However, compared to other cyber domains, such as incident response and security operations, TI is still in the early adoption phase, limited by the lack of suitable technologies, known as threat intelligence platforms (TIPs) [45, 47]. Although organizations recognize the potential of TI, the lack of tools that would help them manage the collected information and convert it into actions is preventing the mass adoption of this kind of solution.
With the emergence of new threat actors, such as advanced persistent threats (APTs), organizations cannot rely on a single solution for protection. The static approach of traditional security, based on heuristics and signatures, does not match new threats known to be evasive, resilient, and complex. These complex threats rely on advanced tactics to appear unknown to signature-based tools and yet authentic enough to bypass spam filters [16]. To fight these threats, today’s organizations must deploy a multi-layered defense that improves their chances of detecting or disrupting an attack.
In the form of open-source intelligence (OSINT), TI can provide knowledge to a vast selection of systems and processes that form this multi-layered defense, such as antivirus and intrusion prevention systems and the processes that manage these solutions and review the events generated by them. This knowledge can be collected from many sources using TIPs. However, TIPs receive thousands of security events, which makes it hard to analyze them and extract relevant data about threats. The volume and quality of data are the most common barriers to effective information exchange. In addition, shared data is often outdated and not specific enough to aid the decision-making process, which makes it unactionable [48]. The confidence level of information is another barrier, since most sources do not provide it, forcing security operations center (SOC) analysts to put additional effort into evaluating and verifying the received data. In addition, most organizations cannot make valuable use of their threat data because there is too much of it, ranging from approximately 250 to millions of indicators of compromise (IoCs) per day [48]. Considering the volume of shared threat information, most platforms end up being data warehouses rather than platforms where threat information can be analyzed. Moreover, the time SOC analysts spend analyzing and classifying incidents has increased due to this volume of data, the presence of low-value information, and the duplication of incident classifications across several public incident taxonomies (e.g., eCSIRT and ENISA). A few platforms [1, 3, 18] deal with these drawbacks. They aggregate diverse OSINT data related to the same threat into a single event. At first, this approach is beneficial, as it avoids the manual analysis of several individual events and the manual attempt to establish their relationships. As a result, it decreases the time SOC analysts spend on this task.
However, aggregating a set of events into one will increase the amount of information that analysts must check. This amount can reach more than 1,000 attributes in an event, and therefore, the time required to analyze it can be longer than the time needed to analyze the set of events individually.
This article proposes an automated event classification and correlation platform (AECCP) that implements an approach to address some TIP limitations by generating highly information-rich objects under a standard format and a single unified taxonomy (UT), with its threat categories characterized by their main threat attributes. In addition, it correlates and aggregates these objects into clusters of objects that share the same threat type and other information, thus generating quality TI. To improve the collection and automatic classification of actionable TI, as well as to define the UT, we first need to understand the TI life cycle, the available information sources, and current TIPs, and to identify the main attributes that characterize each threat category of the UT. This requires working on all levels of the intelligence-gathering operation, using an automated system to (i) receive data from multiple sources, (ii) improve the enrichment process and validate the information collected by cross-referencing it, (iii) produce objects under a standard format and taxonomy, and (iv) store the obtained intelligence in such a way that it can be applied in the optimization of defense mechanisms. Moreover, by using a UT and the main threat attributes, the problem raised by the aforementioned platforms is addressed.
To the best of our knowledge, this article is the first to (i) propose a unified taxonomy to classify security events; (ii) study and identify the main attributes that better describe threat types; (iii) classify security events automatically into an incident category and remove the overlap of classification tags, without human intervention; and (iv) propose a platform to reduce the amount of information aggregated in a single event, after an event correlation and clustering task. Moreover, our approach aims to improve the response of threat analysts and all of the systems used by the organization against today’s complex threats. In addition, it aims at finding ways to benefit from OSINT to increase the detection capabilities of defense mechanisms, such as security information and event management systems (SIEMs) or intrusion detection systems (IDS), reducing the number of false positives and negatives.
We validated and evaluated AECCP with three datasets of security events. Our results suggest that AECCP can automatically classify TI into an incident category and generate new and enriched TI that associates different security events regarding the same threat in a unified way. In addition, we compared AECCP with two platforms from the literature, and the results show that our approach performs better than the others.
The main contributions of the article are as follows. First, we present a UT to reduce the overlap of taxonomies with the same meaning and simplify event classification while maintaining its details. Second, we identify the main attributes that characterize each incident category in the proposed taxonomy, which makes it possible to reduce the volume of shared information. Third, we offer an approach that aims to improve the quality of the threat intelligence produced by TIPs by automatically classifying and enriching it. The approach is composed of a set of modules, each one focused on one or more limitations of TIPs and verified in our data analysis. Fourth, we present AECCP and its assessment with three event datasets and two other platforms.
2 BACKGROUND AND RELATED WORK
2.1 Advanced Persistent Threats
Today’s generation of threats is multi-vectored and often multi-stage; that is, most attacks use multiple means of propagation, such as social engineering, email, and application vulnerabilities, and most attacks operate in different phases, such as single device compromise, lateral network movement, and data exfiltration [48]. These complex threats rely on social engineering techniques, the latest zero-day vulnerabilities, and advanced tactics to appear unknown to signature-based tools and yet authentic enough to bypass spam filters. Traditional security defenses were developed to inspect each attack vector as a separate path and each stage of an attack as an independent event, and therefore fail to identify and analyze an attack as an orchestrated series of cyber incidents [16].
APTs are among today’s generation of threats and have had a significant impact on the rise of cybercrime. Their origin shifted from young hackers in the Black Hat community, whose objective was mayhem and reputation, to organized crime groups supported by states and private entities [45]. Chen et al. [5] distinguish APTs from other online criminal enterprises by four characteristics: specific targets and clear objectives, highly organized and well-resourced attackers, long-term campaigns with repeated attempts, and stealthy and evasive techniques.
2.2 Open Source Intelligence
The earliest forms of OSINT date back to World War II, marked by the ability to find relevant information and combine it in a way that treats information as a resource rather than a commodity [17, 23]. OSINT can be defined as intelligence produced from publicly available information, known as open-source information (OSINF), such as information gathered from radio, television, newspapers, websites, blogs, papers, and conferences. Today, due to the development of the Internet, this type of information has become significantly more accessible and cheaper to gather than the traditional public information acquired by clandestine services. In comparison to other sources of information, such as human intelligence, OSINF can sometimes provide extra information and be a more reliable and safer way of acquiring intelligence [11].
To produce OSINT, OSINF is analyzed, edited, filtered, and validated. Moreover, the information gathered is linked with other sources to verify, complement, and contextualize the collected data. The more public sources are available, the better the intelligence produced [11, 17]. OSINT has become one of the most common forms of intelligence and is considered a goldmine for organizations [36]. For instance, recent studies stated that valuable and early information can be provided by social networks, such as Twitter [39, 48]. One of the biggest advantages of using OSINT is the cost, as it is much less expensive than traditional information-gathering tools. In addition to the cost advantage, OSINT has many benefits when it comes to sharing and accessing information, as it can be legally and easily shared with anyone, and open sources are always available and up to date [19]. However, OSINT has some constraints, such as the high quantity of available information that needs to be processed to create valid intelligence, which demands considerable work to extract useful information from the noise. This task requires a large amount of analytical work from security specialists to distinguish valid, verified information from false, misleading, or inaccurate data. A final constraint of OSINT is that it may not always provide the needed answer, since it only uses available information [19].
2.3 Threat Intelligence
Threat intelligence (TI) can be defined as “evidence-based knowledge, including context, mechanisms, indicators . . . about the hazard to assets that can be used to inform decisions regarding the subject’s response to that menace or hazard” [50].
In its simplest form, TI is the process of understanding the threats toward an organization based on available information. However, there must also be an understanding of how the information relates to the organization. Hence, it must be combined with contextual information to determine relevant threats to the organization. Moreover, TI is valuable to an organization only if it is actionable. If the SOC cannot determine how to best respond to, combat, or mitigate a threat to the organization, then the information provides little to no value [4]. Detecting incidents sooner and potentially even preventing them is the overall goal of TI. Organizations often see TI as a way to reinforce the environment and prepare for both known and unknown threats.
TI has grown in popularity and use among organizations as they realize that attacks have increased in size and complexity. According to a TI survey, 85.5% of respondents have at least one person responsible for consuming or producing TI in their organization and 7.1% of respondents plan to have one shortly. This trend followed the evolution of targeted attacks and APTs as they require a different level of response that is more specific to the organization [21]. Many organizations are convinced that TI is a valuable tool to help them better understand their attackers.
As stated, the objective of creating TI is the creation and delivery of a product that can be acted upon. While threat intelligence professionals find value in sharing threat information through informal and traditional communication channels, the results are inconsistent and unscalable. Hence, better frameworks are needed for communicating TI to provide an adequate answer to today’s complex threats. Such frameworks should include standardized reporting terminology and processes, benefit in information sharing for cybersecurity purposes, the ability for users to create trusted communities, and technical infrastructure to share and analyze TI at machine speed. In the absence of an industry-standard framework, current sharing mechanisms include private or restricted face-to-face meetings and phone calls; emails, forums, and message boards; web portals with wiki-type capabilities; web portals acting as document management systems; web portals (some with APIs) allowing downloads of structured data; and web portals offering social networking facilities with secure access and sharing controls [12].
TI represents security threat activities that are provided in the form of IoCs, that is, information artifacts obtained from a forensic analysis that aggregate data on malicious activity in a system or within a network that was attacked [26]. For sharing TI among entities and security platforms and structuring its information, diverse standard formats have been proposed, with OpenIOC [9], STIX [32], TAXII [33], CSV, and the MISP format being the most popular. However, their use is not widespread, and they are poorly implemented [37].
2.4 Threat Intelligence Platforms
Threat intelligence sharing platforms (TIPs) were introduced to fill the industry-standard gap in TI sharing, as well as the gaps and limitations of the current detection and monitoring defense mechanisms placed in IT infrastructures [46]. In this sense, TIPs are used to collect OSINT and TI, to process, store, and share them, and to integrate the resulting data with other security platforms and tools related to incident response and threat management (e.g., SOC, CSIRTs). They retrieve (structured and unstructured) data from several external sources (e.g., OSINT feeds) and process these data by applying various operations, such as filtering, normalization, aggregation, and some correlation [3].
TIPs usually vary in (i) their objective, as some are used for operational information, whereas others focus on long-term risk analysis; (ii) the scope of their action, from accepting only processed inputs to possessing natural language processing capabilities; and (iii) their capabilities, which in current platforms range from data acquisition and storage to advanced analytics using machine learning. Despite their differences, the functionalities of TIPs follow the steps of the threat intelligence life cycle, namely planning and direction, collection, processing and exploitation, analysis and production, dissemination, and integration [4, 20, 25, 34].
Since TIPs emerged, their adoption by organizations has grown, and they have played an important role in spreading information about security threat activity among the collaborating entities working in this field. However, their adoption and implementation are still in their infancy [43], with many limitations to be resolved, such as automatic trust assessment and classification of TI and advanced analysis capabilities, where SOC intervention continues to be required to filter and retrieve TI that is relevant and effectively actionable.
Some open-source TIPs have been adopted by organizations, with the following four being the most widely used [48]: MISP (the Malware Information Sharing Platform) [30], CIF (the Collective Intelligence Framework) [8], CRITs (Collaborative Research Into Threats) [31], and SoltraEdge [22], with MISP being the most popular.
2.5 MISP
MISP was initially created by the NATO Computer Incident Response Capability Technical Centre (NCIRC TC) to implement the Smart Defense concept and presently is owned by the Computer Incident Response Centre Luxembourg (CIRCL). One of the key concepts of MISP is the sharing of intelligence among members of the same community [30, 49].
Currently, the main capabilities of MISP include sharing; storage; automatic correlation of IoCs; advanced filtering; and export and import of data in the most popular formats, namely STIX, OpenIOC, CSV, and the MISP standardized format [10, 49]. IoCs, also called MISP events, contain technical and general TI information, which is represented in MISP format and stored in a database of indicators.
A new entry in MISP’s database is called an event object, which can be defined as a set of characteristics and all kinds of descriptions of an IoC. These characteristics and relevant information are called attributes. Examples of attribute types are hash, filename, hostname, and IP address. An attribute can even be a complex object that contains multiple attributes. An example of a complex attribute is an antivirus signature, which can include the name of the antivirus, the name of the signature, and the detection date [49]. Furthermore, each attribute can be correlated with other simple or complex attributes. In addition, IoCs, when stored, are automatically correlated to describe the relationships between attributes and indicators [10].
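The event and attribute model described above can be pictured with a small data structure. The following is a minimal sketch in plain Python, not MISP's actual schema or API: the class names `Attribute` and `Event`, the function `correlate`, and the sample values are hypothetical, and correlation is reduced here to an exact match on attribute type and value.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Attribute:
    """One characteristic of an event, such as a hash or an IP address."""
    type: str   # illustrative type names, e.g. "ip-dst", "filename", "hostname"
    value: str


@dataclass
class Event:
    """An event: an identifier plus the attributes that describe it."""
    event_id: int
    attributes: List[Attribute] = field(default_factory=list)


def correlate(events: List[Event]) -> Dict[Attribute, Set[int]]:
    """Map each attribute seen in more than one event to the ids of those events."""
    seen: Dict[Attribute, Set[int]] = defaultdict(set)
    for event in events:
        for attribute in event.attributes:
            seen[attribute].add(event.event_id)
    # Keep only attributes that actually link two or more events.
    return {attr: ids for attr, ids in seen.items() if len(ids) > 1}


events = [
    Event(1, [Attribute("ip-dst", "198.51.100.7"), Attribute("filename", "invoice.exe")]),
    Event(2, [Attribute("ip-dst", "198.51.100.7")]),
    Event(3, [Attribute("hostname", "example.org")]),
]
shared = correlate(events)
```

In this example, `shared` links events 1 and 2 through the common destination IP address, while event 3 stays uncorrelated. A complex attribute, such as an antivirus signature, could be modeled under the same assumptions as an `Attribute` whose value is itself a collection of attributes.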
2.5.1 Taxonomies.
Data classification is often bound to internal, community, or national classification schemes. One common problem is the mapping of events into categories. This is a complex task since categories are not always known in advance. Since defining a centralized, pre-defined set of definitions that satisfies all potential users is hard, MISP uses a distributed approach based on machine tags. However, the freedom to define tags can easily lead to a situation where multiple tags have the same meaning, which complicates filtering. To overcome this problem, a new tagging concept was introduced: taxonomies. A taxonomy is based on a triple tag structure with a namespace, a predicate, and a value, for example,
In its default configuration, MISP includes a set of public incident classification taxonomies [29]. Some of the most used are described next, and their tags are presented in Table 1 as they are recognized in the MISP tag structure:
| eCSIRT.net Taxonomy Main Category | Microsoft Implementation of the CARO Naming Scheme |
|---|---|
| ecsirt:abusive-content | ms-caro-malware:malware-type=“Adware” |
| ecsirt:malicious-code | ms-caro-malware:malware-type=“Backdoor” |
| ecsirt:information-gathering | ms-caro-malware:malware-type=“Behavior” |
| ecsirt:intrusion-attempts | ms-caro-malware:malware-type=“BrowserModifier” |
| ecsirt:intrusions | ms-caro-malware:malware-type=“Constructor” |
| ecsirt:availability | ms-caro-malware:malware-type=“DDoS” |
| ecsirt:information-content-security | ms-caro-malware:malware-type=“Dialer” |
| ecsirt:fraud | ms-caro-malware:malware-type=“DoS” |
| ecsirt:vulnerable | ms-caro-malware:malware-type=“Exploit” |
| ecsirt:other | ms-caro-malware:malware-type=“HackTool” |
| ecsirt:test | ms-caro-malware:malware-type=“Joke” |
| | ms-caro-malware:malware-type=“Misleading” |
| CIRCL.LU Taxonomy | ms-caro-malware:malware-type=“MonitoringTool” |
| circl:incident-classification=“spam” | ms-caro-malware:malware-type=“Program” |
| circl:incident-classification=“system-compromise” | ms-caro-malware:malware-type=“PUA” |
| circl:incident-classification=“scan” | ms-caro-malware:malware-type=“PWS” |
| circl:incident-classification=“denial-of-service” | ms-caro-malware:malware-type=“Ransom” |
| circl:incident-classification=“copyright-issue” | ms-caro-malware:malware-type=“RemoteAccess” |
| circl:incident-classification=“phishing” | ms-caro-malware:malware-type=“Rogue” |
| circl:incident-classification=“malware” | ms-caro-malware:malware-type=“SettingsModifier” |
| circl:incident-classification=“XSS” | ms-caro-malware:malware-type=“SoftwareBundler” |
| circl:incident-classification=“vulnerability” | ms-caro-malware:malware-type=“Spammer” |
| circl:incident-classification=“fastflux” | ms-caro-malware:malware-type=“Spoofer” |
| circl:incident-classification=“sql-injection” | ms-caro-malware:malware-type=“Spyware” |
| circl:incident-classification=“information-leak” | ms-caro-malware:malware-type=“Tool” |
| circl:incident-classification=“scam” | ms-caro-malware:malware-type=“Trojan” |
| circl:incident-classification=“cryptojacking” | ms-caro-malware:malware-type=“TrojanClicker” |
| circl:incident-classification=“locker” | ms-caro-malware:malware-type=“TrojanDownloader” |
| circl:incident-classification=“screenlocker” | ms-caro-malware:malware-type=“TrojanDropper” |
| circl:incident-classification=“wiper” | ms-caro-malware:malware-type=“TrojanNotifier” |
| circl:incident-classification=“sextortion” | ms-caro-malware:malware-type=“TrojanProxy” |
| | ms-caro-malware:malware-type=“TrojanSpy” |
| | ms-caro-malware:malware-type=“VirTool” |
| | ms-caro-malware:malware-type=“Virus” |
| | ms-caro-malware:malware-type=“Worm” |
Table 1. eCSIRT.net, CIRCL.LU and Microsoft Implementation of CARO Taxonomies Recognized in the MISP Tag Structure
eCSIRT.net [7] (upper part of column 1): This taxonomy was developed many years ago, but the main categories are still current and can easily be used. However, the subcategories can lead to problems when classifying an incident. Despite its defects, many European Computer Security Incident Response Teams (CSIRTs) use it, which allows teams to cooperate with one another.
CIRCL.LU [6] (lower part of column 1): The owners and main contributors of MISP use their own taxonomy for classifying incidents. Although it has some similarities with the eCSIRT.net taxonomy, CIRCL.LU has only one level of classification.
Microsoft implementation of CARO Naming Scheme [27] (second column): Microsoft names malware and unwanted software according to the Computer Antivirus Research Organization (CARO) malware naming scheme. This scheme was created by a committee at CARO and was the first attempt to make malware naming consistent.
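The tags in Table 1 take one of two shapes, `namespace:predicate` or `namespace:predicate="value"`. As a minimal sketch of how such tags could be split into their parts, the following Python helper parses both shapes. The name `parse_machine_tag` and the regular expression are assumptions inferred from the tags listed in Table 1; this is not MISP's own parser.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: Optional[str]  # None when the tag carries no value part


# Accepts namespace:predicate and namespace:predicate="value".
_TAG_RE = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)(?:="(?P<value>[^"]*)")?$'
)


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a tag string into namespace, predicate, and optional value."""
    # Normalize typographic quotes, which appear when tags are copied from documents.
    tag = tag.strip().replace("\u201c", '"').replace("\u201d", '"')
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(match["namespace"], match["predicate"], match["value"])


phishing = parse_machine_tag('circl:incident-classification="phishing"')
malicious = parse_machine_tag("ecsirt:malicious-code")
```

Once tags are parsed this way, two tags from different namespaces that describe the same kind of incident can be compared on their predicate and value rather than as opaque strings, which is the overlap problem that a unified taxonomy targets.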
2.6 Limitations of TIPs
TIPs have multiple advantages that enable organizations to easily bootstrap the core processes of collecting, normalizing, enriching, correlating, analyzing, disseminating, and sharing threat information. However, current solutions have some limitations that prevent their mass adoption. Next, we present the limitations related to the current state and usage of TIPs [13, 35, 47]:
LT1—Shared threat information is too voluminous: One of the problems is the overload of threat information shared via open source, commercial sources, and communities. Combining shared threat information from different sources makes the relevant intelligence hard to find and makes it difficult to generate value.
LT2—Limited technology enablement in threat triage: There is limited technology enablement to facilitate the relevancy determination process. Currently, this process is done manually, in a complex way, and dependent on the analyst.
LT3—Data quality: The confidence level of information is not provided by most feeds, forcing analysts to put additional effort into evaluating and verifying the received data.
LT4—Limited analysis capabilities: Most TIPs have limited capabilities related to browsing, attribute-based filtering, advanced searching information, pivoting, exploration, and visualization.
LT5—Limited advanced analytics capabilities and automation tasks: Most TIPs have limited capabilities related to aggregation, composition, and generalization of data, as well as the capability to de-duplicate, tag, and classify data automatically.
LT6—Focus on data collection: Considering the volume of shared threat information and the limited analysis capabilities provided by TIPs, most of the platforms end up being data warehouses rather than platforms where threat information can be shared and analyzed.
LT7—Limited threat knowledge management: No common vocabulary is used for describing threat actors, tactics, techniques, procedures, and tools.
LT8—Focus on tactical IoCs: Tactical IoCs are mostly shared, lacking comprehensive threat information. Standardized formats are underused or even not used during information sharing, noting that most information is exchanged in unstructured files.
LT9—Trust-related issues: Most TIPs have limitations in the way that organizations interact and contribute to specific communities, and most platforms do not allow organizations to share only specific types of threat data with particular communities.
LT10—Diverse data formats: Although there are community efforts to provide connectors between different standards and formats, converting information without losing any elements or context from the source format is a challenge. Most TIPs tend to stay with one format, limiting the flexibility of the TIP users.
LT11—Shared intelligence without expiration date: Currently, the time-to-live information is not provided by most of the feeds, and TIPs have limited capabilities in handling this type of metadata information.
LT12—Diverse APIs and requirements for integration: TIPs integrate with a standard set of services and tools while the owners prioritize requests for additional integrations.
LT13—Limited workflow enablement: Currently, TIPs provide limited workflow capabilities that would make the process of threat management more efficient, such as the capability of stakeholders to send requests for information.
2.7 Platforms for Resolving Limitations of TIPs
A few platforms try to reduce some TIPs’ limitations and improve TI processing.
PURE [3] is a platform that generates improved intelligence based on OSINT. This enhanced intelligence translates into new enriched IoCs obtained by correlating and combining IoCs from different OSINT feeds that share information about the same threat. The novel clustering method used by PURE creates clusters that can be summarized and converted into an enriched IoC, allowing the discovery of unidentified patterns and the detection of new complex attacks. The platform normalizes the different IoC formats into a single one and compares the IoCs received with the IoCs stored in the database to check for duplicates. Besides discarding duplicated IoCs, it discards those that provide no new information. The set of IoCs of interest resulting from this filter step is sent to a clustering module, which applies similarity and weight metrics to aggregate similar and related IoCs and create quality TI. IoCs belonging to a cluster are correlated to find the most relevant information that characterizes a threat and are then converted into a single enriched IoC.
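The filter, cluster, and merge steps described for PURE can be illustrated with a short sketch. This is not PURE's implementation: its actual similarity and weight metrics are not given here, so the code below assumes a plain Jaccard similarity over normalized `(type, value)` pairs, a greedy single-pass clustering, and a hypothetical threshold of 0.3. All function names and sample values are illustrative.

```python
from typing import FrozenSet, Iterable, List, Tuple

IoC = FrozenSet[Tuple[str, str]]  # a normalized IoC as a set of (type, value) pairs


def jaccard(a: IoC, b: IoC) -> float:
    """Share of attributes two IoCs have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_new(incoming: Iterable[IoC], stored: Iterable[IoC]) -> List[IoC]:
    """Drop duplicates and IoCs whose attributes are all contained in a known IoC."""
    known = list(stored)
    kept = []
    for ioc in incoming:
        # Equal to a known IoC (duplicate) or a subset of one (nothing new).
        if any(ioc <= other for other in known):
            continue
        kept.append(ioc)
        known.append(ioc)
    return kept


def cluster(iocs: Iterable[IoC], threshold: float = 0.3) -> List[List[IoC]]:
    """Greedy clustering: join the first cluster that holds a similar enough IoC."""
    clusters: List[List[IoC]] = []
    for ioc in iocs:
        for members in clusters:
            if any(jaccard(ioc, m) >= threshold for m in members):
                members.append(ioc)
                break
        else:
            clusters.append([ioc])
    return clusters


stored = [frozenset({("ip", "198.51.100.7"), ("domain", "bad.example")})]
incoming = [
    frozenset({("ip", "198.51.100.7")}),                                 # subset
    frozenset({("ip", "198.51.100.7"), ("domain", "bad.example")}),      # duplicate
    frozenset({("ip", "198.51.100.7"), ("md5", "0" * 32)}),              # new attribute
    frozenset({("domain", "other.example")}),                            # unrelated
]
kept = filter_new(incoming, stored)
clusters = cluster(stored + kept)
# Merge each cluster into one enriched IoC holding the union of its attributes.
enriched = [frozenset().union(*members) for members in clusters]
```

Here the first two incoming IoCs are discarded, the third joins the stored IoC through the shared IP address, and the fourth forms its own cluster. The merge step also shows why enriched IoCs grow: each one carries the union of all attributes in its cluster.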
ETIP [15, 18] is a platform that extends the importing capabilities, the quality assessment processes, and the information-sharing capabilities in current TIPs. ETIP gathers and processes structured information from external sources, such as OSINT and a monitored IT infrastructure. It comprises two main modules: a composed IoC module, in charge of collecting, normalizing, processing, and aggregating IoCs from OSINT feeds, and a context-aware intelligence sharing module, able to correlate, assess, and share static and real-time information with data obtained from multiple OSINT sources. ETIP computes a threat score (TS) associated with each IoC before sharing it with other tools and trusted external parties. Enriched IoCs produced by ETIP contain a TS that allows SOC analysts to prioritize the analysis of incidents. The TS evaluates heuristics with two weights: individual weights assigned to every attribute based on their relevance, accuracy, and variety, and a global weight (i.e., completeness criterion) assigned to the heuristic. The higher the TS value, the more reliable the IoC.
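The two-level weighting behind the TS can be made concrete with a small sketch. The exact formula used by ETIP is not stated here, so the following assumes the simplest reading: each heuristic is a weighted mean of per-attribute scores in [0, 1], and the TS is a weighted mean of the heuristics using their global weights. The function names, the tuple layout, and the sample numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

# (global weight, per-attribute scores, per-attribute individual weights)
Heuristic = Tuple[float, Dict[str, float], Dict[str, float]]


def heuristic_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of attribute scores using the individual attribute weights."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def threat_score(heuristics: List[Heuristic]) -> float:
    """Weighted mean of heuristic scores using each heuristic's global weight."""
    total_global = sum(global_weight for global_weight, _, _ in heuristics)
    if total_global == 0:
        return 0.0
    weighted = sum(g * heuristic_score(s, w) for g, s, w in heuristics)
    return weighted / total_global


heuristics = [
    (2.0, {"ip": 1.0, "hash": 0.5}, {"ip": 1.0, "hash": 1.0}),  # scores 0.75
    (1.0, {"source": 0.0}, {"source": 1.0}),                    # scores 0.0
]
ts = threat_score(heuristics)  # (2.0 * 0.75 + 1.0 * 0.0) / 3.0
```

Under these assumptions the score stays in [0, 1], so IoCs can be ranked directly by TS when analysts prioritize incidents.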
SYNAPSE [1], a Twitter-based streaming threat monitor for threat detection in SOCs, implements a pipeline that gathers tweets from a set of accounts, filters them based on the monitored infrastructure, and classifies the remaining tweets as either relevant or not. The pipeline is composed of a data collector, a filter, a pre-processing and feature extraction module, a classifier, and a clustering module. The data collector requires a set of accounts, from which it collects every posted tweet using Twitter’s stream API. The filtering approach assumes that a tweet referring to a threat to a specific IT infrastructure asset must mention that asset. Only tweets that include at least one of the keywords pass the filter. The pre-processing and feature extraction module is then used to normalize the tweet representation before classification. Two classifiers were explored for the classification of tweets according to their security relevance: Support Vector Machines (SVM) and Multi-Layer Perceptron (MLP) neural networks. Finally, SYNAPSE uses clustering to aggregate similar tweets in the newsfeed stream, adapting the CluStream algorithm to achieve the desired threat aggregation. Relevant tweets are grouped in dynamic clusters and presented as IoCs that can be manually inspected or fed to SIEMs and other TI tools.
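The keyword filter in this pipeline is simple enough to sketch. The code below is an illustration rather than SYNAPSE's implementation: the function name `build_filter`, the case-insensitive whole-word matching, and the sample keywords and texts are assumptions. Keywords are expected to start and end with a letter or digit, since the word-boundary match would not work for names such as "c++".

```python
import re
from typing import Callable, Iterable


def build_filter(keywords: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that is True when a text mentions at least one keyword."""
    escaped = [re.escape(k) for k in keywords if k]
    if not escaped:
        # With no monitored assets, nothing can pass the filter.
        return lambda text: False
    pattern = re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None


# Hypothetical asset keywords describing the monitored infrastructure.
mentions_asset = build_filter(["openssl", "apache struts"])

tweets = [
    "Critical OpenSSL flaw patched today",
    "New phone released",
    "Apache Struts exploit in the wild",
]
relevant = [t for t in tweets if mentions_asset(t)]
```

Only the first and third texts pass; in the pipeline described above they would then go to pre-processing, classification, and clustering.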
Table 2 shows which TIP limitations (stated in Section 2.6) are addressed by these platforms (columns 3 through 5). All three share the main objective of creating quality TI automatically through new analytical approaches. The new TI is obtained by filtering OSINT and combining the data associated with the same threat into a single security event. Achieving this objective addresses the first six limitations (LT1 to LT6), since the resulting TI reduces the number of individual, unrelated security events that SOC analysts must analyze. However, because the resulting event merges several events and therefore holds much more information than any individual event, analyzing it can be more challenging for SOC analysts. PURE and ETIP also deal with LT10 because they can receive OSINT in diverse formats. ETIP deals with LT8 because it consumes data from the organization’s IT infrastructure and analyzes it jointly with OSINT, and with LT11 because the resulting TI can be exported for use in defense mechanisms. SYNAPSE also addresses LT11, for the same reason as ETIP.
| ID | Limitation | PURE | ETIP | SYNAPSE | AECCP |
|---|---|---|---|---|---|
| LT1 | Shared threat information is too voluminous | x | x | x | x |
| LT2 | Limited technology enablement in threat triage | x | x | x | x |
| LT3 | Data quality | x | x | x | x |
| LT4 | Limited analysis capabilities | x | x | x | x |
| LT5 | Limited advanced analytics capabilities and tasks automation | x | x | x | x |
| LT6 | Focus on data collection | x | x | x | x |
| LT7 | Threat knowledge management limitations | | | | x |
| LT8 | Focus on tactical IoCs | | x | | x |
| LT9 | Trust-related issues | | | | x |
| LT10 | Diverse data formats | x | x | | x |
| LT11 | Shared intelligence without expiration date | | x | x | x |
| LT12 | Diverse APIs and requirements for integration | | | | |
| LT13 | Limited workflow enablement | | | | |
Table 2. TIPs Limitations Addressed by PURE, ETIP, SYNAPSE, and AECCP Platforms
The platform we propose, AECCP (last column of the table), addresses all TIP limitations except the last two (LT12 and LT13). Although AECCP shares the main objective of the other platforms, it employs different types of analysis for filtering and combining data (detailed in Section 4). It goes a step further by proposing a UT and main threat attributes to classify OSINT data. Both reduce the amount of information consolidated in a single resulting event (a problem the other platforms face) and therefore decrease the effort SOC analysts must spend analyzing such data. These capabilities address limitations LT7 and LT9, making AECCP the first platform to do so. In addition, it is the first platform that classifies security events into incident categories and removes the classification overlap among public taxonomies’ tags automatically (i.e., without human intervention). Our platform also consumes diverse OSINT data formats (LT10) and external data (LT8) to improve the quality of TI, and the generated TI can be shared and used in organizations’ defense mechanisms (LT11).
3 DATA ANALYSIS FOR A UNIFIED TAXONOMY AND THREAT MAIN ATTRIBUTES
As stated before, the primary goal of this work is to address some of the limitations of TIPs described in Section 2.6. We handle all of them except the last two (LT12 and LT13), focusing on the first seven. More specifically, we aim to solve those related to the processing of data in the platforms (i.e., classifying, analyzing, and generating data automatically), thus minimizing human intervention in this process. However, to produce the most accurate and complete TI, we also have to resolve the other four limitations, since they are related to these seven. For example, to obtain more comprehensive data about a given attack, it is necessary to consider and process OSINT data that can come in diverse formats (LT10). To design a solution capable of addressing these limitations, we first had to understand them. Hence, this section presents the data analysis performed to obtain that understanding.
The analysis is based on MISP events, as MISP is the most widely adopted open-source TIP among organizations. The section first gives an overview of the data sources used to collect the events and of how the dataset used in the analysis was built (Section 3.1). Second, it presents an analysis of MISP taxonomies, which shows how the vast set of public incident classification schemes that MISP offers to classify the same threat can add unnecessary complexity and redundant information. To reduce such unnecessary information, we propose a UT, which is defined in Section 3.2. In addition, an analysis of MISP event attributes is provided, showing that too many attributes in a single event can also add unnecessary complexity, specifically if they do not add useful information. To address this problem, we propose a solution in Section 3.3 that involves discovering the most prevalent attributes that underlie a threat. Finally, Section 3.4 briefly explains how references to external platforms can be used to increase the quality of TI.
3.1 Data Sources and Dataset
The dataset for the analysis was built from external OSINT feeds, and MISP was the TIP used to collect and process them. MISP can process different feed formats, namely MISP standardized format, CSV, and free text. CSV and free text feeds are only parsed as MISP Attributes and do not take advantage of all MISP functionalities. In contrast, MISP formatted feeds can be parsed into anything from simple MISP Attributes to the more complex MISP Objects and benefit from all MISP functionalities. Therefore, we left aside CSV and free text feeds and worked only with MISP formatted feeds, resulting in the following three feeds: CIRCL OSINT Feed,1 Botvrij.eu Data,2 and inThreat OSINT Feed.3
From these three feeds, we collected 1,366 events published by 14 different organizations, such as CIRCL, CUDESO, InThreat, CthuluSPRL.be, Synovus Financial, VK-Intel, ESET, and NCSC-NL. However, some of these events date back to 2014, near the embryonic phase of MISP, which means poorer events with minimal information and more events containing collections of IoCs from multiple attacks (e.g., blacklists). In contrast, recent events (since 2016) were richer in knowledge, and many more of them corresponded to a single attack. Consequently, we reduced the initial dataset to the richer events only, resulting in 1,168 out of 1,366 events, most of which were provided by CIRCL and CUDESO, with 907 and 120 events, respectively.
3.2 Unified Taxonomy
Over the past decades, multiple cyber threat classification systems have been proposed; some of them focus on the classification of actors and methods [35], whereas others focus on specific techniques [28] or specific targets [40]. With more than 100 classification systems, this complex array of taxonomies adds confusion when a security analyst manually analyzes a threat and, consequently, increases the time and effort the analyst spends. MISP adds to this complexity with unnecessary information, since an analyst can classify an event for a given incident with different taxonomies, so that the event ends up with several tags that have the same meaning. For example, an event classified as ransomware has five tags from different taxonomies, namely [ecsirt:malicious-code=“ransomware”], [malware_classification:malware-category=“Ransomware”], [veris:action:malware:variety=“Ransomware”], [enisa:nefarious-activity-abuse=“ransomware”], and [ms-caro-malware:malware-type=“Ransom”]. Based on this evidence, in this section we present a solution to reduce this complexity by proposing a UT.
As explained previously, events in MISP are classified with tags that follow taxonomies, meaning that a classified event must have at least one tag. By this criterion, our dataset contains 1,166 tagged events and 2 untagged events. However, a more detailed analysis showed that many of the tagged events did not have a tag that allowed them to be classified correctly into an incident category. Only 691 (out of 1,166) events were tagged with an incident category. Furthermore, we found that several events had multiple overlapping classification tags from different taxonomies, meaning duplicated information about their type.
From the 1,166 tagged events, 493 different tags were extracted. Table 3 shows the 16 most used tags. A more extensive table can be found in Appendix A [24]. Only 13% of the extracted tags (62) corresponded to a known incident classification taxonomy (IDs 4–6); most of the remaining tags did not add information about the type of threat but rather about its source (IDs 2, 8, 9, and 14) and its sharing, such as the Traffic Light Protocol (TLP) and OSINT (IDs 1 and 3). Additionally, 61% of the tags (302) corresponded to MISP Galaxies. MISP Galaxies are highly customizable and can correspond not only to known attacks (ID 7) but also to attack patterns, threat actors (ID 11), and tools (ID 13). Therefore, we opted not to treat MISP Galaxy tags or the other non-classification tags mentioned above as classification tags, due to their high heterogeneity and the little information they carry about the type of threat. Hence, for further analysis, we only considered the 62 tags associated with incident classification, which belong to 10 different incident classification taxonomies (the first 10 IDs of Table 4).
| ID | Tag | Hits | ID | Tag | Hits |
|---|---|---|---|---|---|
| 1 | tlp:white | 1,133 | 9 | osint:source-type=“block-or-filter-list” | 32 |
| 2 | osint:source-type=“blog-post” | 275 | 10 | circl:topic=“finance” | 31 |
| 3 | Type:OSINT | 273 | 11 | misp-galaxy:threat-actor=“Sofacy” | 26 |
| 4 | circl:incident-classification=“malware” | 218 | 12 | OSINT | 26 |
| 5 | malware_classification:malware-category=“Ransomware” | 113 | 13 | misp-galaxy:tool=“Trick Bot” | 24 |
| 6 | ecsirt:malicious-code=“ransomware” | 98 | 14 | osint:source-type=“technical-report” | 23 |
| 7 | misp-galaxy:ransomware=“Locky” | 70 | 15 | workflow:todo=“expansion” | 22 |
| 8 | inthreat:event-src=“feed-osint” | 32 | 16 | osint:lifetime=ephemeral | 21 |
Table 3. The 16 Most Used Tags in Events
| ID | Taxonomy | ID | Taxonomy |
|---|---|---|---|
| 1 | CIRCL.LU taxonomy | 12 | Information security indicators from ETSI GS ISI |
| 2 | eCSIRT.net incident taxonomy | 13 | Malware Attribute Enumeration and Characterization (MAEC) |
| 3 | ENISA threat taxonomy | 14 | Reference security incident classification taxonomy |
| 4 | Microsoft implementation of CARO Naming Scheme | 15 | Threats targeting cryptocurrency, based on CipherTrace report |
| 5 | Internal taxonomy for Canadian Centre for Cyber Security (CCCS) | 16 | Open Threat Taxonomy |
| 6 | Europol common taxonomy for law enforcement and CSIRTs | 17 | Penetration test (pentest) classification |
| 7 | Vocabulary for Event Recording and Incident Sharing (VERIS) | 18 | Infoleak taxonomy |
| 8 | ENISA threat taxonomy in the scope of securing smart airports | 19 | Common Taxonomy for Law enforcement and CSIRTs |
| 9 | SANS malware classification based on “Malware 101—Viruses” | 20 | MONARC |
| 10 | CERT-XLM Security Incident Classification | 21 | Distributed Denial of Service (DDoS) taxonomy |
| 11 | GSMA—Fraud and Security Group | 22 | Incident disposition based on the NASA Incident Response and Management Handbook |
Table 4. The 10 Taxonomies Used for Incident Classification and the 22 Taxonomies Analyzed to Define the Unified Taxonomy
The UT we propose is based on structures of the eCSIRT.net incident taxonomy and CARO malware naming scheme, and it aims to simplify the event classification while maintaining its details. In addition, since most taxonomies have two tiers of classification, such as the eCSIRT.net incident taxonomy, we opted to follow this level of detail. This allows us to choose the granularity level of the classification. To define UT, we analyzed the 22 public taxonomies listed in Table 4 for the tags related to incident classification.4 UT is composed of 8 incident categories of Tier 1 (like the other two taxonomies) and 38 sub-categories of Tier 2 distributed by Tier 1 categories.
Table 5 shows how each public taxonomy of Table 4 contributed to the definition of UT, in terms of the number of incident classification tags mapped to each Tier 2 sub-category (column 3) and the number of taxonomies that contributed to each Tier 2 sub-category (column 26). In total, 354 tags from public taxonomies were mapped to our taxonomy, with VERIS, CARO, and Europol contributing the most tags (second-to-last row). In addition, eCSIRT.net, VERIS, CERT-XLM, and CARO contributed to the largest number of Tier 2 sub-categories (last row).
| Unified Taxonomy | Public Taxonomies | |||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tier 1 | Tier 2 | #Tg | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | #Tx | #W |
| Abusive content | spam | 13 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1 | 10 | 13 | ||||||||||||
| Malicious code | adware | 4 | 1 | 1 | 1 | 1 | 4 | 1 | ||||||||||||||||||
| backdoor | 4 | 2 | 1 | 1 | 3 | 1 | ||||||||||||||||||||
| browser-modifier | 3 | 2 | 1 | 2 | 2 | |||||||||||||||||||||
| cryptominer | 3 | 1 | 1 | 1 | 3 | 6 | ||||||||||||||||||||
| dialer | 4 | 1 | 2 | 1 | 3 | 1 | ||||||||||||||||||||
| dos | 14 | 4 | 1 | 9 | 3 | 5 | ||||||||||||||||||||
| exploit | 6 | 1 | 2 | 1 | 2 | 4 | 1 | |||||||||||||||||||
| hack-tool | 1 | 1 | 1 | 2 | ||||||||||||||||||||||
| misleading | 8 | 1 | 1 | 6 | 3 | 6 | ||||||||||||||||||||
| monitoring-tool | 7 | 2 | 2 | 3 | 3 | 8 | ||||||||||||||||||||
| password-stealer | 6 | 1 | 1 | 4 | 3 | 6 | ||||||||||||||||||||
| ransomware | 12 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 10 | 2 | |||||||||||||
| remote-access-tool | 7 | 1 | 2 | 2 | 1 | 1 | 5 | 1 | ||||||||||||||||||
| settings-modifier | 3 | 1 | 2 | 2 | 4 | |||||||||||||||||||||
| spammer | 4 | 1 | 1 | 2 | 3 | 2 | ||||||||||||||||||||
| spoofer | 2 | 2 | 1 | 2 | ||||||||||||||||||||||
| spyware | 8 | 1 | 2 | 2 | 1 | 1 | 1 | 6 | 2 | |||||||||||||||||
| trojan | 15 | 1 | 10 | 1 | 2 | 1 | 5 | 7 | ||||||||||||||||||
| virtool | 8 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 7 | 3 | ||||||||||||||||
| virus | 7 | 1 | 1 | 2 | 1 | 1 | 1 | 6 | 2 | |||||||||||||||||
| wiper | 5 | 1 | 2 | 2 | 3 | 6 | ||||||||||||||||||||
| worm | 9 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 8 | 2 | |||||||||||||||
| Information- | scanning | 11 | 1 | 1 | 1 | 2 | 1 | 1 | 3 | 1 | 8 | 3 | ||||||||||||||
| gathering | sniffing | 6 | 1 | 3 | 1 | 1 | 4 | 2 | ||||||||||||||||||
| social-engineering | 17 | 1 | 1 | 6 | 4 | 1 | 1 | 2 | 1 | 8 | 12 | |||||||||||||||
| Intrusion- | ids-alert | 12 | 1 | 8 | 1 | 1 | 1 | 5 | 5 | |||||||||||||||||
| attempts | brute-force | 9 | 1 | 4 | 1 | 1 | 1 | 1 | 6 | 3 | ||||||||||||||||
| unknown-exploit | 3 | 1 | 1 | 1 | 3 | 3 | ||||||||||||||||||||
| account-compromise | 6 | 2 | 2 | 2 | 3 | 6 | ||||||||||||||||||||
| system-or-application-compromise | 60 | 4 | 4 | 1 | 1 | 7 | 34 | 2 | 2 | 1 | 2 | 2 | 11 | 6 | ||||||||||||
| botnet-member | 2 | 1 | 1 | 2 | 2 | |||||||||||||||||||||
| Availability | dos-or-ddos | 24 | 1 | 3 | 4 | 4 | 1 | 1 | 2 | 1 | 2 | 5 | 10 | 6 | ||||||||||||
| information- | unauthorised-information-access | 9 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 7 | 3 | |||||||||||||||
| content-security | unauthorised-information-modification | 9 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 7 | 3 | |||||||||||||||
| Fraud | masquerade | 6 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 2 | ||||||||||||||||
| phishing | 23 | 1 | 1 | 3 | 2 | 4 | 1 | 1 | 1 | 4 | 2 | 1 | 1 | 1 | 13 | 4 | ||||||||||
| Vulnerable | vulnerable-service | 4 | 1 | 1 | 1 | 1 | 4 | 2 | ||||||||||||||||||
| Contribution of each public taxonomy in #Tags | 354 | 12 | 31 | 23 | 53 | 18 | 35 | 66 | 5 | 9 | 23 | 3 | 12 | 27 | 13 | 1 | 1 | 8 | 2 | 3 | 3 | 5 | 1 | |||
| #Tier 2 categories in which public taxonomies contributed | 9 | 25 | 16 | 20 | 16 | 10 | 21 | 2 | 8 | 21 | 3 | 7 | 10 | 11 | 1 | 1 | 5 | 1 | 3 | 3 | 1 | 1 | ||||
Table 5. Contribution of Each Public Taxonomy of Table 4 in the Definition of the Unified Taxonomy
Table 6 contains an excerpt of UT, showing the relationship map we created for all public taxonomies (columns 1 to 3). The complete definition of UT can be found in Appendix B [24].
| UT Tier 1 | UT Tier 2 | Public Taxonomies | Bag of Words |
|---|---|---|---|
| Abusive content | spam | cccs:email-type=“spam” | spam, junk email, junk mail, junk e-mail, unsolicited email, unsolicited mail, unsolicited e-mail, bulk email, bulk mail, bulk e-mail, unwanted email, unwanted mail, unwanted e-mail |
| | | circl:incident-classification=“spam” | |
| | | ecsirt:abusive-content=“spam” | |
| | | enisa:nefarious-activity-abuse=“spam” | |
| | | europol-event:email-flooding | |
| | | europol-event:spam | |
| | | europol-incident:abusive-content=“spam” | |
| | | gsma-fraud:technical=“spamming” | |
| | | information-security-indicators:iex=“spm.1” | |
| | | maec-malware-capabilities:maec-malware-capability=“email-spam” | |
| | | rsit:abusive-content=“spam” | |
| | | veris:action:malware:variety=“spam” | |
| | | veris:action:social:variety=“spam” | |
| malware | adware | cccs:malware-category=“adware” | adware |
| | | malware_classification:malware-category=“adware” | |
| | | ms-caro-malware:malware-type=“adware” | |
| | | veris:action:malware:variety=“adware” | |
| | backdoor | maec-malware-behavior:maec-malware-behavior=“install-backdoor” | backdoor |
| | | ms-caro-malware:malware-type=“backdoor” | |
| | | ms-caro-malware-full:malware-type=“backdoor” | |
| | | veris:action:malware:variety=“backdoor” | |
| | browser-modifier | cccs:malware-category=“browser-hijacker” | browser hijacker, browser modifier |
| | | ms-caro-malware:malware-type=“broswermodifier” | |
| | | ms-caro-malware-full:malware-type=“broswermodifier” | |
Table 6. Unified Taxonomy (Excerpt of) with Public Taxonomy and Bag of Words Mappings
Additionally, a bag of words was defined for each Tier 2 sub-category of UT to describe it and allow further classification. Each bag was created from words extracted from the public taxonomies and from synonyms of these words. These bags of words will not only support further analyses over events with public taxonomy tags but, most importantly, will be used to analyze events without public taxonomy tags (e.g., the two untagged events from our dataset that were not classified yet). The last column of Table 5 contains the number of words assigned to each category, for a total of 147 words, and the last column of Table 6 presents the bag of words mapped to each category of UT. The complete list of bags of words can be found in Appendix B as part of the definition of UT [24].
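To illustrate how a bag of words could be applied to an event that has no public taxonomy tag, the following minimal Python sketch matches an event description against the four bags shown in the Table 6 excerpt. The helper name `match_tier2`, the case-insensitive substring matching, and the sample description are assumptions made for illustration only; this is not the matching procedure implemented in AECCP.

```python
# Bags of words copied from the Table 6 excerpt (Tier 2 sub-category -> words).
BAGS_OF_WORDS = {
    "spam": [
        "spam", "junk email", "junk mail", "junk e-mail",
        "unsolicited email", "unsolicited mail", "unsolicited e-mail",
        "bulk email", "bulk mail", "bulk e-mail",
        "unwanted email", "unwanted mail", "unwanted e-mail",
    ],
    "adware": ["adware"],
    "backdoor": ["backdoor"],
    "browser-modifier": ["browser hijacker", "browser modifier"],
}


def match_tier2(description: str) -> list[str]:
    """Return the Tier 2 sub-categories whose bag has a word in the text.

    Hypothetical helper: plain case-insensitive substring matching, so a
    word such as "spammer" would also match the bag entry "spam".
    """
    text = description.lower()
    return sorted(
        tier2
        for tier2, words in BAGS_OF_WORDS.items()
        if any(word in text for word in words)
    )


# Hypothetical event description, not taken from the dataset.
sample = "Bulk e-mail campaign dropping a BACKDOOR on victim hosts"
matches = match_tier2(sample)
```

Substring matching is the simplest possible choice here; a tokenizer or stemming step would be a natural refinement if false matches became a problem.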
3.3 Main Threat Attributes
As stated previously, the volume of shared information is one of the TIPs’ limitations (see Section 2.6). We observed this limitation in our dataset in two forms:
Events containing collections of IoCs from multiple attacks: Most of these events contain IoCs with few or no correlations. For example, some of them contain lists of malicious IPs whose primary purpose is to serve as input for a detection or prevention component. Since these events contain long lists of attributes with little or no shared context, we discarded them from further analyses, which does not negatively impact our results. In total, 17 of the 1,168 events were discarded.
Events with too many attributes: Twenty percent of the events in our dataset have more than 100 attributes. From the point of view of a security analyst, the more attributes an event has, the more difficult it is to analyze.
To discover the most prevalent attributes that underlie an incident category (i.e., the main threat attributes), the following analyses cover both the events with at most 100 attributes and those with too many attributes. For the latter, we intend to understand why they have so many attributes and which important information can be extracted from them. The three analyses below therefore break the results down by number of attributes, so that the results from smaller and bigger events can be told apart and the main attributes determined. For this purpose, four attribute intervals were considered: I1 (less than or equal to 100), I2 (between 100 and 500), I3 (between 500 and 1,000), and I4 (greater than 1,000).
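The four intervals can be expressed as a small binning function. The sketch below is only illustrative: the text does not state whether the upper bounds of I2 and I3 are inclusive, so inclusive bounds (I2 = 101 to 500, I3 = 501 to 1,000) are assumed, and the attribute counts are hypothetical.

```python
from collections import Counter


def attribute_interval(n_attributes: int) -> str:
    """Map an event's attribute count to one of the intervals I1..I4.

    Assumption: upper bounds are inclusive (I2 = 101..500, I3 = 501..1000).
    """
    if n_attributes <= 100:
        return "I1"
    if n_attributes <= 500:
        return "I2"
    if n_attributes <= 1000:
        return "I3"
    return "I4"


# Hypothetical attribute counts for six events.
counts = [12, 100, 101, 500, 750, 4200]
distribution = Counter(attribute_interval(n) for n in counts)
```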
3.3.1 Distribution of Events by Attributes.
This first analysis examined the distribution of events over the four attribute intervals. However, since we aim to find the attributes that best characterize an incident category, it was first necessary to determine which events are classified as an incident and which are not, and then distribute them over the intervals. We resorted to the public taxonomies’ tags to classify each event according to UT. More precisely, each tag of each event was compared with the public tags mapped to UT and, when matched, the event was classified into the corresponding Tier 1 category of UT. The 691 events tagged with an incident category were correctly classified in UT, whereas the remaining 460 (out of 1,151) were not classified because they had no classification tags related to incidents and therefore did not match any taxonomy. A total of 666 of the classified events fall in the first two intervals, with 550 events in I1 and 116 in I2. It is important to note that some events were classified with more than one Tier 1 category because they had more than one public tag corresponding to different UT categories.
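The tag comparison described above amounts to a dictionary lookup. The following minimal sketch assumes a handful of mappings: the spam tags appear in Table 6, while the ransomware tags are taken from the example in Section 3.2 and are assumed to map to the ransomware sub-category of Malicious code (the full mapping is in Appendix B). The names `PUBLIC_TAG_TO_UT` and `classify_tier1` are hypothetical.

```python
# Public taxonomy tag -> (Tier 1, Tier 2) of the unified taxonomy (UT).
# Illustrative subset only; the ransomware entries are assumed mappings.
PUBLIC_TAG_TO_UT = {
    'circl:incident-classification="spam"': ("Abusive content", "spam"),
    'ecsirt:abusive-content="spam"': ("Abusive content", "spam"),
    'ecsirt:malicious-code="ransomware"': ("Malicious code", "ransomware"),
    'malware_classification:malware-category="Ransomware"': (
        "Malicious code", "ransomware"),
    'veris:action:malware:variety="Ransomware"': (
        "Malicious code", "ransomware"),
}


def classify_tier1(event_tags: list[str]) -> set[str]:
    """Return the UT Tier 1 categories matched by an event's tags.

    An empty set means the event carries no incident classification tag.
    Overlapping tags from different taxonomies collapse into one category.
    """
    return {
        PUBLIC_TAG_TO_UT[tag][0]
        for tag in event_tags
        if tag in PUBLIC_TAG_TO_UT
    }


# Hypothetical event: two overlapping ransomware tags plus a spam tag.
tags = [
    "tlp:white",
    'ecsirt:malicious-code="ransomware"',
    'malware_classification:malware-category="Ransomware"',
    'ecsirt:abusive-content="spam"',
]
categories = classify_tier1(tags)
```

Returning a set reflects the observation that one event can fall into more than one Tier 1 category, while duplicate tags with the same meaning produce a single classification.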
3.3.2 Identification of Similar Attribute Types.
Due to the large number of MISP-supported attribute types, a second analysis was made to identify attributes with similar types (i.e., properties) and aggregate them. For example, both MD5 and SHA1 attributes are hash values used as checksums to verify data integrity, so they are aggregated into the same group, named file hash. By aggregating similar types of attributes, the results of the subsequent analysis focus on the characteristics of the attributes and not only on their type. This means that even if our dataset only had attributes of type MD5, attributes of type SHA1 would not be discarded from the results, since they belong to the same group.
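A grouping of this kind can be sketched as a lookup table from MISP attribute type to attribute group. Only the MD5/SHA1 to file hash aggregation is stated in the text; every other entry below, as well as the choice to leave unknown types ungrouped, is an assumption made for illustration.

```python
# MISP attribute type -> attribute group.
# Only md5/sha1 -> "File hash" comes from the text; the rest is assumed.
ATTRIBUTE_GROUPS = {
    "md5": "File hash",
    "sha1": "File hash",
    "sha256": "File hash",
    "ip-src": "Network address",
    "ip-dst": "Network address",
    "domain": "Network name",
    "hostname": "Network name",
    "url": "URL",
    "filename": "File name",
}


def attribute_group(attr_type: str) -> str:
    """Return the group of an attribute type, or the type itself if ungrouped."""
    return ATTRIBUTE_GROUPS.get(attr_type, attr_type)
```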
3.3.3 Identification of Threat Main Attributes.
This analysis aimed to identify the most predominant attribute groups for each Tier 1 category, based on the previous two analyses. The four intervals of the number of attributes were considered cumulatively. This means that the first cumulative interval (CI1) is equal to I1, the second cumulative interval (CI2) contains all events with up to 500 attributes (i.e., I1 and I2), and so on. Table 7 shows the results of this analysis, that is, the most predominant attribute groups for each Tier 1 category of UT. The complete tables can be found in Appendix C [24].
As expected, events with more attributes have a higher impact on the statistical results, because the weight of an event is directly proportional to its number of attributes. The results presented in the table confirm this observation. When the analysis was performed over all classified events (CI4), some of the results differed significantly from those restricted to events with at most 100 attributes. For example, for the information-gathering Tier 1 category, the attribute group Network name accounts for 12% of all groups when only events with at most 100 attributes are analyzed, but for 61% when all classified events are included (CI4). Since most of our dataset consists of events with at most 100 attributes, we have higher trust in the results gathered from those events. Thus, we opted to use the results from the CI1 (or I1) interval. In a more detailed analysis of this interval for all Tier 1 categories, we noticed that four attribute groups are present in every category, namely Network address, File hash, Other Info, and File name. In addition, the groups URL and Network name are present in all categories except Vulnerable and information-content-security. This information will be used to improve the overall quality of the events by keeping only the most important attributes of each category.
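The effect of large events on the group shares can be reproduced with a toy computation. In the sketch below, each event is represented as a list of attribute-group names (one entry per attribute), and the share of a group is computed over all attributes of the events that fit a cumulative interval. The data are hypothetical and chosen only to show how a single very large event can dominate the CI4 result; they do not reproduce the figures of Table 7.

```python
from collections import Counter


def group_shares(events: list[list[str]], max_attributes: float) -> dict[str, float]:
    """Share of each attribute group over events with at most `max_attributes`.

    Each event is a list of group names, one per attribute, so an event's
    weight in the result grows with its number of attributes.
    """
    counts = Counter(
        group
        for event in events
        if len(event) <= max_attributes
        for group in event
    )
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical data: nine small events and one very large event.
small = [["File hash"] * 8 + ["Network name"] * 2 for _ in range(9)]
large = [["Network name"] * 1200 + ["File hash"] * 10]

ci1 = group_shares(small + large, max_attributes=100)           # CI1 = I1
ci4 = group_shares(small + large, max_attributes=float("inf"))  # CI4 = all events
```

With these numbers, Network name holds a 20% share in CI1 but more than 90% in CI4, even though nine of the ten events are small.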
3.4 OSINT References to External Platforms
Another key finding from our dataset was the large number of references to external platforms in the form of links, namely 5,325 links from 228 domains. More than 90% of the links pointed to VirusTotal,5 an online service that analyzes files and URLs, enabling the detection of viruses, worms, trojans, and other kinds of malicious content using antivirus engines and website scanners. Additionally, platforms like VirusTotal tend to provide APIs to access information without using the website interface. However, these references increase the time an analyst requires to analyze the event, since the analyst needs to jump between platforms to gather information and process it manually. We consider this a TIP limitation (identified neither in Section 2.6 nor by other works [13, 14, 44]) that can easily be turned into a benefit, and it is considered in our proposed solution.
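A tally of this kind can be obtained with the Python standard library alone. The sketch below counts the share of links per host name; the function name and the three sample links are hypothetical and serve only to show the computation.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(links: list[str]) -> dict[str, float]:
    """Return the fraction of links that point to each host name."""
    hosts = Counter(urlparse(link).hostname for link in links)
    total = sum(hosts.values())
    return {host: n / total for host, n in hosts.items()}


# Hypothetical links, not taken from the dataset.
links = [
    "https://www.virustotal.com/file/abc/analysis/",
    "https://www.virustotal.com/url/def/analysis/",
    "https://example.org/report.html",
]
shares = domain_shares(links)
```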
4 AUTOMATED EVENT CLASSIFICATION AND CORRELATION PLATFORM
This section presents the overall design of AECCP, our proposed solution, which aims to improve the quality of the threat intelligence produced by TIPs by classifying and enriching it automatically. In practice, the solution is composed of four core modules, each one focused on one or more of the limitations observed in the data analysis of Section 3 and some of those presented in Section 2.6, and a fifth module that interconnects the other four and manages all of AECCP’s operations.
Regarding the limitation related to the volume of shared information, we propose an approach that reduces the number of attributes per event based on the most predominant attributes of its category, which were determined in Section 3.3. Moreover, for incident taxonomy management, we propose to classify every event according to the unified taxonomy defined in Section 3.2. Since AECCP analyzes and classifies events in an automated way, it also increases technology enablement in threat triage. Furthermore, we propose a solution to enrich the data quality of an event based on OSINT from the VirusTotal platform. To increase the advanced analytic capabilities of MISP, we propose to create new events as clusters of enriched events that belong to the same threat and have related attributes in common, after a correlation process that looks for relationships between attributes of different events. Table 8 lists the limitations addressed in AECCP, the proposed solution for each one, the AECCP module that implements the solution, and the section in which it is presented. For a better understanding of the solutions, we first present the symbolic representation of an event used in the following sections (Section 4.1), and in Section 4.2 we give an overview of the platform, showing the workflow and interactions between the four modules.
| ID | Limitation | Solution | Module | Section |
|---|---|---|---|---|
| LT10 | Diverse data formats | Every event will be normalized to a standard format | Classifier | 4.3 |
| LT7 | Threat knowledge management limitations | Every event will be classified according to the unified taxonomy defined in Section 3.2 | Classifier | 4.3 |
| LT2, LT5 | Limited technology enablement in threat triage; Limited advanced analytics capabilities and tasks automation | The classification of each event will be automated, based on its data (description of the attack, antivirus reports, etc.) | Classifier | 4.3 |
| LT1 | Shared threat information is too voluminous | Each event will have a simplified view only containing the most predominant attributes stated in Section 3.3 | Trimmer | 4.4 |
| LT3, LT8, LT9 | Data quality; Focus on tactical IoCs; Trust-related issues | Events containing links to VirusTotal will be enriched with information provided by the platform. Additionally, events containing hashes and URLs will also be enriched using the same method | Enricher | 4.5 |
| LT4, LT6, LT11 | Limited analytics capabilities; Focus on data collection; Shared intelligence without expiration date | When at least 2 events from the same category have an attribute in common, a cluster will be created to help an analyst identify related events and to be included in network defense mechanisms | Clusterer | 4.6 |
Table 8. Addressed Limitations and Corresponding Proposed Solutions
4.1 Symbolic Representation of an Event
A TIP event can be represented by the tuple \( E_x = \, \lt d, ot, T, A, R\gt \), identified by x, where d is its description; \( T = \lbrace NULL | T_1\ldots T_n\rbrace \) represents the public taxonomy tags that classify it into malicious threat categories, together with custom tags created by SOCs (for example, to identify the event within the organization); \( A = \lbrace A_1\ldots A_m\rbrace \) represents the attributes, ranging from 1 to m, that characterize the event; and \( R = \lbrace NULL | (A_i, A_j)\ldots (A_u, A_v)\rbrace \) represents the relations between attributes. For example, \( (A_1, A_2) \) represents the relation between attributes \( A_1 \) and \( A_2 \). NULL indicates that the event is not yet classified (for T) or that there is no relation between its attributes (for R). Finally, all other event data of minor relevance for this work is compacted into the field ot.
AECCP follows this event representation, but the elements of AECCP’s events are sets associated with UT, main and enriched attributes, and their relations. We denote by \( ^uE_x = \, \lt d, ^uT, ^uA, ^uR\gt \) the AECCP event that results when the platform processes \( E_x \), and we use the following nomenclature: \( ^uT = \lbrace {^uT_1}\ldots {^uT_m}\rbrace \) is the set of UT tags that classify the event, and \( ^uA = \lbrace ^gA, ^eA\rbrace \) is the set of attributes that characterize the event, which can be main threat attributes (\( ^gA = \lbrace ^gA_1\ldots ^gA_j\rbrace \)) and enriched attributes (\( ^eA = \lbrace ^eA_1\ldots ^eA_v\rbrace \)). A \( ^eA_j \) attribute is the result of enriching a \( ^gA_j \) attribute, that is, a \( ^gA_j \) attribute is enriched with external information from VirusTotal and with antivirus information associated with the result of VirusTotal (resulting in \( ^eA_j \)). \( ^uR = R(^uA) \) denotes the relations between attributes from \( ^uA \). In addition, we denote by \( ^uC_y \) the cluster resulting from the correlation and aggregation tasks performed by AECCP over \( ^uE \) events.
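For readers who prefer code to notation, the two tuples can be mirrored by a pair of data classes. This is a minimal sketch: the class and field names are hypothetical, attributes are reduced to plain strings, and an empty set stands in for NULL.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """TIP event E_x = <d, ot, T, A, R>."""
    description: str                                              # d
    other: dict = field(default_factory=dict)                     # ot
    tags: set[str] = field(default_factory=set)                   # T (empty = NULL)
    attributes: list[str] = field(default_factory=list)           # A
    relations: set[tuple[str, str]] = field(default_factory=set)  # R (empty = NULL)


@dataclass
class AeccpEvent:
    """AECCP event uE_x = <d, uT, uA, uR>."""
    description: str                                              # d
    ut_tags: set[str] = field(default_factory=set)                # uT
    main_attributes: list[str] = field(default_factory=list)      # gA
    enriched_attributes: list[str] = field(default_factory=list)  # eA
    relations: set[tuple[str, str]] = field(default_factory=set)  # uR

    @property
    def attributes(self) -> list[str]:
        """uA: main threat attributes followed by enriched attributes."""
        return self.main_attributes + self.enriched_attributes


# Hypothetical event with one main attribute and its enriched counterpart.
example = AeccpEvent(
    description="ransomware sample",
    ut_tags={"Malicious code"},
    main_attributes=["md5:abc"],
    enriched_attributes=["vt-report:abc"],
    relations={("md5:abc", "vt-report:abc")},
)
```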
4.2 AECCP Overview
AECCP is a platform that interacts with TIPs (e.g., MISP) to generate new events with higher-quality threat intelligence. In other words, it classifies, enriches, and correlates the events received by TIPs, and does all of this work in an automated manner. The platform is composed of five modules (Classifier, Trimmer, Enricher, Clusterer, and Orchestrator), of which the first four together perform all of the work and the last coordinates the workflow between the other four. Figure 1 depicts the overview of its architecture and the workflow between the four modules:
Fig. 1. Overview of AECCP.
An event \( E_a \), from the TIP database (e.g., MISP), serves as input to the Classifier module, without any pre-processing by the TIP. The module aims at classifying each event according to UT. To get the most accurate classification, \( E_a \) is first normalized to a standard format and then is only classified according to the Tier 1 category of UT. Afterward, the event is updated with Tier 1 tags (\( ^uT \) tag set), transforming it into \( E_{a^{\prime }} \).
The Trimmer module aims at reducing the volume of attributes in an event based on the relevancy of those attributes. The module receives \( E_{a^{\prime }} \), iterates over its attributes, and creates \( ^uE_a \), an AECCP event with the most relevant attributes \( ^uA_i \) and the \( ^uT \) tag set from \( E_{a^{\prime }} \).
The new event (\( ^uE_a \)) is then sent to the Enricher module to enrich it with information from VirusTotal. In this module, \( ^uA \) attributes in the event containing URLs or hashes are updated with information from VirusTotal. Additionally, the module adds an associated enriched attribute to the event for each \( ^uA_i \) attribute that was updated (enriched). This new attribute holds the output of the antivirus engines, website scanners, and analysis tools that allowed the update. Finally, \( ^uE_a \) is updated with both attributes and their relationship (\( R(^uA) \)).
\( ^uE_a \) is now reprocessed by the Classifier module, but this time according to the Tier 2 category of UT. Since the event was enriched (by the Enricher) with information that did not exist at the beginning of the processing, the Classifier module can classify the event more accurately. In this step, the Tier 1 \( ^uT_x \) tags are updated with Tier 2 \( ^uT_x._y \) tags (e.g., [\( unified:^uT_1=^uT_1._2 \)]).
The Clusterer module aims at creating clusters of events that share the same threat category and have at least one \( ^uA_i \) attribute in common. Other events that share at least one Tier 2 \( ^uT_x._y \) tag with \( ^uE_a \) and have at least one valuable attribute \( ^uA_i \) (attributes that provide context to a specific attack) in common with \( ^uE_a \) are clustered in a new cluster event \( ^uC_i \). Moreover, this module is recursive, meaning that it tries to find other events related to every event added to the cluster. Additionally, multiple new \( ^uC_i \) can be created by Clusterer if \( ^uE_a \) has more than one distinct Tier 2 category tag.
Both results provided by the second pass of Classifier and Clusterer can be integrated into defense mechanisms (e.g., firewalls, IDS, IPS, and SIEMs) installed in the organization’s IT infrastructure to protect the organization from cyber-attacks.
Figure 2 presents the detailed workflow within and between the four modules. The following four sections are dedicated to each module to describe its operation in detail.
Fig. 2. The detailed workflow within and between the modules of the AECCP.
4.3 Automated Event Classification
As explained in Section 3.2, the high diversity of classification tags can be a disadvantage from the point of view of threat knowledge management (LT7). Furthermore, the diversity of data formats that OSINT can take (LT10) can have a negative impact on this management, making OSINT processing difficult. Additionally, due to this diversity, most events must be manually analyzed to identify their categories and classify them as such. Since most threat triage and prioritization processes rely on the event category (LT2), this manual process creates an unwanted delay in the subsequent processes (LT3). To reduce these limitations, AECCP comprises the Classifier module, which automatically classifies events according to the UT, after their data format is normalized, based on the tag, description, and attribute information of the TIP’s events. To do so, the Classifier module resorts to two methods: classification based on public taxonomy tags and classification based on keywords.
Regarding the first method, Classifier takes advantage of the mapping information from Table 6 to find every public taxonomy tag \( T_i \) to map to a UT tag \( ^uT_i \). In other words, each TIP’s event will have its tags scanned and matched against the UT mapping table. When matched, the corresponding UT tag \( ^uT_i \) is added to the \( ^uT \) list, if not already in the list. In the end, the T tag list of the event is updated with the \( ^uT \) list it found. For example, if an event has two public tags related to the same threat category (e.g., the tags
For the second method, the Classifier module uses the bag of words from the last column of Table 6 to identify keywords related to a UT category based on the information contained in the description, attributes, and custom tags (tags that do not belong to a public taxonomy) of the TIP’s events. As mentioned previously, some events hold important details in their descriptions that can help an analyst identify the category of the incident. Moreover, it is also possible to gather important information from attributes and custom tags of an event to better classify it. Therefore, events will also have their custom tags, description, and attributes scanned and matched against the bag of words. When matched, the related UT tag \( ^uT_i \) is added to the \( ^uT \) tags list, if not already in the list. Later, this list will be added to the T list. Unlike the first method, this method can classify events that were not tagged yet (i.e., without classification tags; \( T=NULL \)). As an example, if the word phishing is found in the description of an event with no public taxonomy tags, the event will be updated to contain the \( ^uT_i \) tag
Each event is processed two times by the Classifier module, in steps 1 and 4 of Figure 2, each time according to a different UT Tier. In step 1, the module classifies \( E_a \) according to Tier 1 and updates it with the Tier 1 \( ^uT \) tags it found, thus resulting in \( E_{a^{\prime }} \). This step uses the two classification methods described previously. However, in step 4, the Classifier module updates the \( ^uT \) tags determined in step 1, but now according to Tier 2. It uses the classification based on keywords method, but now it resorts to information driven by the processing of the Trimmer and Enricher modules (see the next two sections), which add information that did not belong to the initial event (\( E_a \)), respectively, the main attributes (\( ^gA \)) and the enriched attributes (\( ^eA \)). Therefore, this information is matched against the bag of words for each Tier 1 category already found, obtaining the Tier 2 associated with Tier 1. In addition, new \( ^uT_i \) Tier 1 can be found during the analysis if those attributes contain information that allows this. Afterward, the Tier 1 tags from the \( ^uT \) list are updated with Tier 2 tags, in the form
As final remarks, if \( E_a \) could not be classified according to the Tier 1 category (in step 1) due to lack of information, the event proceeds without \( ^uT \) tags since the subsequent modules will enrich it; so it will receive other information. Step 4 will reprocess and classify it according to Tier 1 and Tier 2 categories. If it still could not be classified, the event exits the pipeline and is not processed by the further modules.
Algorithm 1 represents the main logic behind Classifier, where the processing of each event is separated into Tier 1 classification (step 1, lines 1 through 3) and Tier 2 classification (step 4, lines 5 through 9) based on the state of the event that was passed into the Classifier module. For each tier classification, the functions classifyTier1 and classifyTier2 are called. The classifyTier1 function (presented in Algorithm 2) uses the Public Taxonomy Mapping (lines 5 through 8) and the Bag of Words (lines 9 through 16) for discovering the \( ^uT_i \) Tier 1 tags. Algorithm 3 shows the logic behind the classifyTier2 function, which also uses the same repositories for processing the information of step 4.
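The two classification methods can be sketched in a few lines of Python. This is an illustrative sketch, not a transcription of Algorithms 1 through 3: the mapping and keyword dictionaries are small hypothetical stand-ins for Table 6, and the function names `classify_tier1` and `classify_tier2` merely echo the ones used in the algorithms.

```python
import re

# Hypothetical excerpts standing in for Table 6 (not the actual mapping).
PUBLIC_TO_UT = {
    'ecsirt:malicious-code="worm"': "malicious-code",
    'ecsirt:abusive-content="spam"': "abusive-content",
}
TIER1_KEYWORDS = {
    "abusive-content": ["spam"],
    "malicious-code": ["worm", "virus", "malware"],
}
TIER2_KEYWORDS = {
    "abusive-content": {"spam": ["spam"]},
    "malicious-code": {"worm": ["worm"], "virus": ["virus"]},
}


def _contains(word, text):
    """Whole-word, case-insensitive keyword match."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def classify_tier1(public_tags, texts):
    """Return Tier 1 UT categories found via tag mapping and via keywords."""
    ut_tags = []
    # Method 1: map public taxonomy tags to UT categories.
    for tag in public_tags or []:
        category = PUBLIC_TO_UT.get(tag)
        if category and category not in ut_tags:
            ut_tags.append(category)
    # Method 2: scan description, attributes, and custom tags for keywords.
    blob = " ".join(texts)
    for category, words in TIER1_KEYWORDS.items():
        if category not in ut_tags and any(_contains(w, blob) for w in words):
            ut_tags.append(category)
    return ut_tags


def classify_tier2(tier1_tags, texts):
    """Refine each Tier 1 category into 'unified:<tier1>="<tier2>"' tags."""
    blob = " ".join(texts)
    refined = []
    for category in tier1_tags:
        for sub, words in TIER2_KEYWORDS.get(category, {}).items():
            if any(_contains(w, blob) for w in words):
                refined.append('unified:%s="%s"' % (category, sub))
    return refined
```

In this sketch an event with no public tags (\( T=NULL \)) can still be classified, since the keyword scan does not depend on the tag list.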



4.4 Event Simplification
The amount of shared information derived from events with too many attributes (LT1) was another limitation verified in Section 3.3. Both manual and automated analyses of events are impacted by unnecessary information. This type of information mainly acts as good to know, opposite to need to know, creating noise and consequently adding complexity to the event. To minimize this limitation, we propose the Trimmer module. Trimmer automatically trims the less relevant attributes from events, based on their UT Tier 1 tags and according to the predominant attributes (i.e., good to know information) resulting from the analysis presented in Section 3.3.
Each event served as an input to the module will have its attributes scanned and mapped according to the attribute groups. Afterward, based on a global relevancy threshold defined by the security analyst for each attribute group (e.g., 10%) and the Tier 1 tags, if the attribute in analysis belongs to a group with greater relevance than the threshold and based on results of Table 7, the attribute will be marked as being a main threat attribute. For cases where the event has no Tier 1 \( ^uT \), it is processed in the same way as if it had all Tier 1 of \( ^uT \) tags, thus not losing any predominant attributes. Finally, if both attributes of an event’s relation were considered main threat attributes, the relation is added to the final event (i.e., to \( ^uE_a \)). This verification and addition are made for every relation the event contains.
Summarily, the module receives \( E_{a^{\prime }} \) as input, identifies its main attributes and the relations between them, and then creates the \( ^uE_a \) event with the description of \( E_{a^{\prime }} \), the \( ^uT \) tags, the list \( ^gA \) of main attributes, and their relations (\( R(^gA) \)). Algorithm 4 shows the logic behind this module, which follows the process described throughout this section.
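The trimming rule described in this section can be sketched as follows. The relevance values and the attribute-type-to-group mapping are hypothetical placeholders for Table 7 and the attribute groups of Section 3.3, and the `trim` function and its signature are assumptions made for illustration rather than the logic of Algorithm 4.

```python
# Hypothetical stand-in for Table 7: relevance of each attribute group
# per Tier 1 category (fractions, not the paper's measured values).
RELEVANCE = {
    "malicious-code": {"hash": 0.45, "network": 0.30, "comment": 0.02},
    "abusive-content": {"network": 0.40, "email": 0.35, "comment": 0.03},
}
# Hypothetical mapping from attribute type to attribute group.
GROUP_OF_TYPE = {
    "sha256": "hash",
    "md5": "hash",
    "url": "network",
    "domain": "network",
    "email-src": "email",
    "comment": "comment",
}


def trim(attributes, relations, tier1_tags, threshold=0.10):
    """Keep attributes whose group exceeds the threshold for any Tier 1 tag.

    attributes: list of dicts with "id", "type", and "value" keys.
    relations:  list of (id, id) pairs.
    An event without Tier 1 tags is treated as if it had all of them.
    """
    categories = tier1_tags if tier1_tags else list(RELEVANCE)
    main = []
    for attr in attributes:
        group = GROUP_OF_TYPE.get(attr["type"])
        if any(RELEVANCE.get(c, {}).get(group, 0.0) > threshold for c in categories):
            main.append(attr)
    kept_ids = {attr["id"] for attr in main}
    # A relation survives only if both of its endpoints are main attributes.
    kept_relations = [(a, b) for a, b in relations if a in kept_ids and b in kept_ids]
    return main, kept_relations
```

For example, an event tagged only as malicious code would keep its hash and URL attributes and drop a free-text comment, together with any relation that involves the comment.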

4.5 OSINT-based Event Enrichment
As explained in Section 3.4, more than 90% of the links contained in events pointed to the VirusTotal online platform. The references to external platforms increase the time an analyst requires to analyze an event since he needs to jump manually between platforms to gather information. Moreover, enriching events with additional information gathered from external sources can significantly improve other processes and tasks (LT3, LT8) if this information is related to a predominant attribute group (a main threat attribute) (LT9).
AECCP integrates an event Enricher module that takes advantage of the references to external platforms to enrich the quality threat intelligence of events. Hence, the module automatically enriches events containing main threat attributes with links to VirusTotal, URLs, or file hashes.
Algorithm 5 illustrates the dataflow made by this module, which follows the process presented next. Each \( ^uE_a \) event processed by Enricher will have its \( ^gA \) main attributes scanned. If any of these attributes have any URL or file hash, it is parsed to extract them. In addition, since VirusTotal links contain IoCs in the target URL, they are also extracted by the same procedure. For each extracted IoC (URL or file hash), a request is sent to VirusTotal, and a report is received containing the most known antivirus engines, website scanners, and analysis tools regarding that IoC. This information will update those \( ^gA_i \) attributes with URLs and file hashes, transforming them into enriched attributes, \( ^eA_i \). Additionally, complementary information can be received like hashes according to different hashing algorithms. Such information is also stored in \( ^eA_i \) attributes, and a relationship between them is created (denoted by \( R(^eA_i) \)).
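The IoC extraction step that precedes the VirusTotal request can be sketched with two regular expressions. This is a minimal sketch under stated assumptions: the function name `extract_iocs` is hypothetical, only MD5, SHA-1, and SHA-256 hex digests are recognized, and a VirusTotal link is assumed to carry the file hash in its path. The request to VirusTotal itself is deliberately left out.

```python
import re
from urllib.parse import urlparse

# Hex digests of length 64 (SHA-256), 40 (SHA-1), or 32 (MD5).
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_iocs(value):
    """Return the (kind, ioc) pairs found in one attribute value."""
    iocs = []
    remaining = value
    for url in URL_RE.findall(value):
        host = urlparse(url).hostname or ""
        if host == "virustotal.com" or host.endswith(".virustotal.com"):
            # The link is only a reference; the IoC is the hash in its path.
            iocs.extend(("hash", h.lower()) for h in HASH_RE.findall(url))
        else:
            iocs.append(("url", url))
        remaining = remaining.replace(url, " ")
    # Hashes that appear outside any URL.
    iocs.extend(("hash", h.lower()) for h in HASH_RE.findall(remaining))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(iocs))
```

Each extracted pair would then be submitted to VirusTotal, and the returned report stored in an \( ^eA_i \) attribute related to the attribute it came from.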

4.6 Event Clustering
Creating correlations between events is one key feature that helps SOC analysts identify threats with similarities, such as source, target, payload, threat actor, and used tools. However, as mentioned previously, most TIPs have limited advanced analytics capabilities (LT4) related to event correlation. MISP has its built-in correlation algorithm that allows an analyst to identify events that have attributes in common. However, this algorithm relies on the values of the attributes and one key piece of information, a flag, that specifies if that attribute can be correlated. This flag is inserted manually and, if not appropriately used, negatively impacts the correlation of events. For example, if a user adds an attribute to an event that indicates that the payload was sent over HTTP, the correlation of this attribute with attributes from other events will mostly be useless since many attacks use HTTP to send the payload. Therefore, we must know which attributes should be flagged as correlation information and why some attributes should not be flagged as such. Thus, it is crucial to manage event correlation properly. Moreover, this built-in algorithm does not use the information related to the event category, creating a relation between events without context.
AECCP aims to improve the analytic capabilities (LT4) of TIPs, namely the event correlation capabilities, making TIPs more than a data collector and repository (LT6). For that, it contains the Clusterer module for automatically creating clusters of events that share the same incident category and have at least one valuable main attribute in common (attributes that provide context to a specific attack, e.g., hashes). The resulting clusters are AECCP events that combine information about the same attack and that can be shared in a timely manner with external entities and used in defense mechanisms (LT11).
Hence, each event received by Clusterer will have its main attributes scanned, looking for connection points with other events. For each scanned attribute, if its content does not add value when correlated, it will be skipped. Attribute contents such as Booleans, dates, and small sets of possible values like HTTP methods fit in this case because multiple unrelated events have them in common. A concrete example of this case is an HTTP flood attack, which is categorized on UT as
In Figure 2, we can see the transformation of event \( ^uE_a \) processed by Clusterer. When processed, attributes from the \( ^gA \) and \( ^eA \) lists are scanned to identify valuable attributes (attributes that provide context to a specific attack). With \( ^gA_x \) being a valuable attribute, a search is made over the \( ^uE \) events database to identify other events with \( ^gA_x \). With \( ^uE_b \) being an event that has \( ^gA_x \) in common with \( ^uE_a \), the \( ^uT \) tags from \( ^uE_a \) and \( ^uE_b \) are scanned to find at least one UT tag in common. With \( ^uT_i \) being a common tag for both events, the \( ^uC_{ab} \) cluster is created with the tag \( ^uT_i \). Furthermore, all attributes from \( ^uE_a \) and \( ^uE_b \) are added to the cluster, where for those valuable attributes in common (i.e., that formed the cluster), their contents are concatenated (e.g., \( ^gA_x = [^uE_a(^gA_x)||^uE_b(^gA_x)] \)). Additionally, \( ^uE_a \) and \( ^uE_b \) are also added as attributes to avoid losing the original events that generated the cluster, and relations are created between them. In Section 5.2.4, a real example is provided to better understand Clusterer output.
Algorithm 6 shows the dataflow of Clusterer explained earlier. In lines 3 through 9, the algorithm searches upon events \( ^uE \) on the database to get other events with at least one attribute in common with event \( ^uE_a \).
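The clustering rule (same UT tag plus at least one valuable attribute in common, applied again to every event that joins the cluster) can be sketched as below. This is an illustrative sketch rather than Algorithm 6: the in-memory `events` dictionary replaces the database search, the set of skipped attribute types is a hypothetical example, and the recursion is written as an iterative worklist.

```python
# Hypothetical attribute types whose values add no context when correlated.
VALUELESS_TYPES = {"boolean", "datetime", "http-method"}


def valuable_values(event):
    """Set of (type, value) pairs worth correlating on."""
    return {
        (a["type"], a["value"])
        for a in event["attributes"]
        if a["type"] not in VALUELESS_TYPES
    }


def build_clusters(seed_id, events):
    """Build one cluster per UT tag of the seed event.

    events: dict mapping event id -> {"tags": set of UT tags,
                                      "attributes": list of {"type", "value"}}.
    Returns a dict mapping tag -> {"events": set of ids, "mutual_iocs": set}.
    """
    clusters = {}
    for tag in sorted(events[seed_id]["tags"]):
        members = {seed_id}
        mutual = set()
        frontier = [seed_id]
        while frontier:  # revisit every event that joined the cluster
            current = events[frontier.pop()]
            current_values = valuable_values(current)
            for other_id, other in events.items():
                if other_id in members or tag not in other["tags"]:
                    continue
                common = current_values & valuable_values(other)
                if common:
                    members.add(other_id)
                    mutual |= common
                    frontier.append(other_id)
        if len(members) > 1:
            clusters[tag] = {"events": members, "mutual_iocs": mutual}
    return clusters
```

Because every new member is pushed back onto the worklist, an event can join a cluster through an attribute it shares with a later member even if it has nothing in common with the seed event.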

4.7 Orchestrator
The Orchestrator module is responsible for ensuring that each event, at any time, follows a specific flow and is only processed by a module if it meets that module’s requirements (e.g., it can only be enriched if it was already trimmed). Additionally, this module is responsible for checking for new TIP events, which were added via sharing or manually, and initiating the AECCP processing for each event. In sum, Orchestrator is responsible for the following tasks:
Fetch new TIP’s events: Periodically, it checks if there are new events from the selected OSINT feeds and adds them to the TIP’s database.
Initiate processing of new TIP events: Periodically, it checks for events that were added since the last time AECCP processed an event.
Assure the correct workflow order: Orchestrator acts as a manager by sending each event to the correct next module. This module takes advantage of custom tags that are only used by it, and these tags store the current state of the event regarding the AECCP processing order.
Resume the process: If the processing of an event is interrupted, the module can resume the processing of that event without impacting the event database by falling back to the previous event state.
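The workflow-ordering and resume tasks can be sketched as a small state machine driven by a custom state tag. This is a sketch under stated assumptions: the tag prefix `aeccp-state:`, the step names, and the `advance` function are hypothetical, and events are plain dictionaries instead of TIP objects.

```python
import copy

# Processing order of Figure 2, expressed as hypothetical state names.
PIPELINE = ["classified-tier1", "trimmed", "enriched", "classified-tier2", "clustered"]
STATE_PREFIX = "aeccp-state:"  # hypothetical custom tag namespace


def current_state(tags):
    """Return the last completed step stored in the tags, or None."""
    for tag in tags:
        if tag.startswith(STATE_PREFIX):
            return tag[len(STATE_PREFIX):]
    return None


def next_step(tags):
    """Return the step the event must go through next, or None when done."""
    state = current_state(tags)
    if state is None:
        return PIPELINE[0]
    index = PIPELINE.index(state)
    return PIPELINE[index + 1] if index + 1 < len(PIPELINE) else None


def advance(event, handlers):
    """Run the next module on the event and record the new state.

    If the module fails, the event falls back to its previous state.
    """
    step = next_step(event["tags"])
    if step is None:
        return event
    snapshot = copy.deepcopy(event)
    try:
        handlers[step](event)
    except Exception:
        event.clear()
        event.update(snapshot)  # discard partial changes
        raise
    other_tags = [t for t in event["tags"] if not t.startswith(STATE_PREFIX)]
    event["tags"] = other_tags + [STATE_PREFIX + step]
    return event
```

Storing the state on the event itself means an interrupted run can be resumed simply by calling `advance` again, since the next step is derived from the tag.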
4.8 Implementation
We implemented AECCP using Python 3.7 on top of MISP. For that, AECCP resorts to PyMISP,6 a Python library to access the MISP platform via its REST API. The implementation leverages built-in PyMISP functionalities to search, add, or update events and attributes.
AECCP implements the five modules described in Section 4. Its modules can be considered smaller solutions and therefore can work independently of each other. In addition, the platform has the capability of exporting its events (i.e., \( ^uE \) events and \( ^uC \) clusters) to be used by external entities, such as SIEMs, CSIRTs, and SOCs.
5 EVALUATION
The objective of the experimental evaluation was to answer the following questions:
Is AECCP able to classify events that are not initially tagged?
Is AECCP able to reclassify events previously tagged with a known incident classification taxonomy?
Does AECCP simplify event triage?
Is Trimmer able to reduce the number of attributes of events without losing valuable information for their classification?
Does Enricher improve the quality of the events?
Is AECCP able to correlate different events (threats) that share the same IoC?
Is AECCP more effective than PURE and ETIP platforms?
We validated and evaluated AECCP with three datasets of events. For validation, we used as ground truth the dataset we analyzed in Section 3 (Section 5.1), whereas for evaluation, we used two datasets whose events were unknown to us, one of which consists of events generated by PURE [3] (Sections 5.2 and 5.3). In addition, Section 5.3 presents a comparison of AECCP with the PURE and ETIP platforms.
5.1 Validation with the Ground Truth Dataset
To validate AECCP, we used as the ground truth dataset the 1,168 events we analyzed in Section 3. The dataset comprises 2 totally untagged events and 1,166 tagged events, of which, from the latter, 691 events are tagged into an incident category, but several of them have multiple overlapping classification tags from different public taxonomies. The remaining 475 events are not tagged into an incident category; hence, we consider them untagged. Summing up, the ground truth contains 691 tagged events and 477 untagged events. The tagged events will serve to validate the classification based on public taxonomies tags method, whereas the untagged events will validate the classification based on keywords method, both methods from the Classifier module (see Section 4.3). However, note that we want to classify events for both UT tiers, meaning that the Classifier, Trimmer, and Enricher modules will be used and validated, and Classifier will be executed twice.
Processing the 691 tagged events with AECCP, we verified that they were correctly classified into incident categories of UT for both Tier 1 and Tier 2. The resulting classification was checked based on the manual classification we made in the data analysis section (see Section 3). The second column of Table 9 shows these events classified through the eight Tier 1 categories of UT. Notice that an event can fit into different Tier 1 categories.
| Tier 1 | Tagged Events | Untagged Events |
|---|---|---|
| Abusive content | 145 | 99 |
| Malicious code | 607 | 408 |
| Information-gathering | 63 | 55 |
| Intrusion-attempts | 37 | 43 |
| Availability | 5 | 10 |
| Information-content-security | 2 | 12 |
| Fraud | 34 | 40 |
| Vulnerable | 3 | 5 |
| Total | 896 | 672 |
Table 9. Ground Truth Dataset Classified by AECCP over the Tier 1 Incident Categories of UT
For the 477 untagged events, when Classifier processed them the first time, the classification based on keywords method was able to classify 453 of them into Tier 1 categories of UT, based on their descriptions and attribute values. The other 24 remained untagged, were carried on to the Trimmer and Enricher modules, and were then re-evaluated by Classifier. We observed after this processing that 16 of them were enriched with external data, but the external data only allowed 8 of them to be tagged with an incident category (i.e., with UT Tier 1 and Tier 2 tags). Curiously, the 2 totally untagged events were among these 8 events. For all 461 classified events, we manually inspected their information before and after they were processed by AECCP and verified that AECCP correctly tagged them. For the 16 events that the platform failed to classify, we also inspected them to find out why. We found that they did not provide enough information in their descriptions and attributes to permit them to be associated with an incident category. In addition, the attributes that Enricher enriched did not bring valuable information that would allow their classification. The last column of Table 9 presents the 461 events classified into the eight Tier 1 categories.
Most of the events were classified into the Malicious code (malware) and Abusive content Tier 1 incident categories of UT, reflecting well the number of cyber-attacks that have been made over the Internet. As a result, we can conclude that AECCP has a precision7 of 1 (i.e., 100%) when it classifies events previously labeled by public taxonomies. In contrast, when processing untagged events, AECCP’s precision depends on the information that their descriptions, attributes, and external data can provide about the threats they report. Based on our ground truth, from the 477 untagged events, the platform correctly classified 461 (TP) and did not have false positives (FP, events classified wrongly into incident categories), meaning that it had a precision of 1. However, since it was not able to classify 16 out of the 477 events, we consider these events as being false negatives (FN), and so it had a false-negative rate of 0.033 and a recall8 of 0.966. Overall, based on the 1,168 events, AECCP classified 1,152 (without false positives) and missed 16. Thus, it had a precision of 1, a recall of 0.986, a false-negative rate of 0.013, and an F1-score9 of 0.992.
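The figures above follow from the standard definitions of precision, recall, false-negative rate, and F1-score applied to the reported counts. The short sketch below recomputes them; the function name is an assumption, and the computed values agree with the reported ones to within 0.002, the small differences coming from how the third decimal is rounded or truncated.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, false-negative rate, F1-score)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    false_negative_rate = fn / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, false_negative_rate, f1


# Untagged events: 461 classified correctly, 0 wrongly, 16 missed.
untagged = classification_metrics(tp=461, fp=0, fn=16)

# Whole ground truth: 1,152 classified correctly, 0 wrongly, 16 missed.
overall = classification_metrics(tp=1152, fp=0, fn=16)

for label, values in (("untagged", untagged), ("overall", overall)):
    print(label, ["%.4f" % v for v in values])
```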
We measured the time that AECCP takes to process both types of events (tagged and untagged). This time is strongly related to the quantity of data included in the events that the platform has to analyze, which depends on diverse factors, namely the number of public taxonomy tags, the number of attributes, and the amount of external data. As expected, the greater the amount of data, the longer it takes to process it. In addition, tagged events take longer than untagged events, considering that both types of events have the same number of attributes and the same amount of external data. This is explained by the fact that the former have their tags analyzed by the classification based on public taxonomies tags method, whereas the latter do not. For the tagged events with fewer than 100 attributes, the average time for processing an event by AECCP is 30 seconds. Considering all 691 tagged events, it takes an average of 41 seconds to consume an event, with a standard deviation of 17 seconds, which means that, at most, it takes approximately 1 minute to process an event. Regarding untagged events, the processing times are shorter, namely (i) 24 seconds on average for events with fewer than 100 attributes; (ii) 31 seconds on average for processing any event out of the 477 events, with an 11-second standard deviation; and (iii) a maximum of 42 seconds to process an event. Therefore, the maximum time AECCP takes to process an event is 1 minute. Although it seems a bit long, we consider it acceptable given that it is the cost of reducing to zero the time spent by SOC analysts in analyzing and classifying events, a manual task that is itself prone to classification errors.
5.2 Processing Dataset of MISP’s Events
This section assesses the ability of AECCP to process a dataset composed of 64 MISP events that were not previously processed by the platform. The following sections present the characterization of the dataset and its processing by AECCP’s modules.
5.2.1 Dataset Characterization.
The dataset’s events were provided from different providers—CIRCL, CUDESO, inThreat, VK-Intel, ESET, and MalwareMustDie—where 54 of the events were from the first two sources. From the 64 events, approximately 77% (49 events) of them did not contain any tags related to a known incident classification taxonomy, meaning that those events were not yet classified. These events will serve to evaluate AECCP’s ability to classify events with the classification based on keywords method and to answer question 1. Regarding the volume of attributes of the events and distributing them according to the same four intervals used in Section 3.3, the dataset is mainly composed of events with fewer than 100 attributes, 90% of the 64 events.
To get a detailed evaluation of our solution, we chose to perform a more in-depth analysis of the (remaining) 15 events that, contrary to the other 49 events, were initially classified with a known incident classification taxonomy. We chose these events since they can be used to evaluate almost all use cases that AECCP deals with, except AECCP’s ability to classify events that are not initially classified, which can be evaluated by comparing the number of unclassified events initially and after being processed by AECCP. Table 10 shows a more detailed view of the tags and the attributes of these 15 events, namely their public taxonomy tags (column 2); the total number of tags (TT, column 3), including tags that did not add information about the type of the threat (e.g., TLP); the number of classification tags related to threat incidents (CT, column 4); and the number of attributes (Att, column 5). As we can observe, all of the events have more tags than those that really classify events with known incidents, with some of them having a considerable number of tags not associated with incidents, such as events 1, 11, and 12. As already stated, such tags do not add information about threats, making the SOC analyst waste time analyzing irrelevant information.
Table 10. Characterization of the Dataset of MISP’s Events and Results of Processing of It by AECCP
5.2.2 Event Classification.
This section seeks to evaluate AECCP’s ability to classify events into UT for Tier 1 and Tier 2. Thus, the Classifier module will be evaluated for all of its functionalities, as well as the Trimmer and Enricher modules since these two modules support Classifier in the classification of events. In addition, this section aims to answer the first three questions.
After AECCP processed the dataset, 61 out of the 64 events were classified, up from the 15 events that were initially classified with public taxonomy tags, which raises the share of classified events by 72 percentage points. Only 3 (out of the 64) events were not classified into UT due to the lack of information in their descriptions and the absence of indicators that Enricher could process (e.g., URL) to add information helpful to Classifier. The classification was verified manually, confirming that AECCP correctly processed all events.
The 49 out of the 64 events without any tags related to a known incident classification taxonomy were processed only using the classification based on keywords method. AECCP was able to classify 46 of them, meaning that the 3 events that were not classified belong to this data subset. Overall, 75% (46) of 61 classified events by AECCP were classified only based on keywords, meaning that AECCP can classify events that are not initially classified, answering positively to question 1.
Regarding the analysis of the 15 events initially classified with a known incident classification taxonomy, the platform was able to use both classification methods and classify them correctly. Almost every event was classified with a new type of threat that was not initially considered in the public taxonomy tags. For example, event \( E_1 \) from Table 10 was identified only as spam before being processed by AECCP, but after being processed by AECCP it was also classified as malicious code with virus, worm, and spammer, meaning that AECCP is able to reclassify events, thus answering question 2. The sixth column of Table 10 shows this reclassification for the 15 events, where their original classification (second column) was transformed into the tags of the sixth column.
From the 15 events, on average, each had five more tags than before being processed by AECCP, thus increasing their tags from two to seven (columns 4 and 8). As explained in Sections 3.2 and 4.3, AECCP classifies events according to UT and also based on information contained in their description, meaning that each event classification can be improved. These assumptions can increase the number of tags per event. In addition, it is important to note that after being processed by AECCP, all of the tags on the events tag list are classification tags, contrary to before being processed by AECCP where most of the tags were not classification tags but added information about its source and its sharing (e.g., TLP). In columns 4 and 8 of Table 10, the number of tags is shown regarding known incident classification taxonomy, before and after being processed by AECCP.
From the 15 events, 14 of them had their total number of tags significantly reduced (columns 3 and 7) due to two factors. The first is when an event has overlapping classification tags in its initial tag list (e.g.,
5.2.3 Attribute Trimming and Enrichment.
This section looks to evaluate AECCP’s ability to trim and enrich events. More precisely, we evaluated the Trimmer and Enricher modules and sought to answer the fourth and fifth questions.
Before being processed by AECCP, our dataset had approximately 90% of events with fewer than 100 attributes. After being processed by AECCP, the number of events with fewer than 100 attributes decreased to 85% of the initial number. This means, at first glance, that our solution enriches more than it trims, adding more attributes than removing.
To understand this overall attribute increment, we analyzed the number of attributes of the events in three specific phases: before being processed by Trimmer, immediately after being processed by Trimmer, and finally after being processed by Enricher. From the results of this analysis, we can see that, on average, Trimmer removes 12 attributes per event and Enricher adds 54 attributes per event, yielding a net increase of 42 attributes per event. Enricher’s increase is because it can add a maximum of 6 new attributes for each hash and 12 new attributes for each URL. For example, if an event has attributes containing three hashes and three URLs, Enricher will add up to 54 attributes to the event (3 × 6 + 3 × 12). Summing up, on average, the number of attributes in the three phases is 49, 37, and 91. Therefore, the attribute increment is due to Enricher, whose additions outweigh Trimmer’s effect even though Trimmer trims the event attributes effectively.
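The arithmetic behind these figures can be checked with a few lines of Python. The per-IoC maxima (6 per hash, 12 per URL) and the phase averages are taken from the text; the function and variable names are assumptions made for illustration.

```python
MAX_NEW_PER_HASH = 6   # maximum attributes Enricher adds per hash
MAX_NEW_PER_URL = 12   # maximum attributes Enricher adds per URL


def max_enriched_attributes(n_hashes, n_urls):
    """Upper bound on the attributes Enricher can add to one event."""
    return n_hashes * MAX_NEW_PER_HASH + n_urls * MAX_NEW_PER_URL


# Example from the text: three hashes and three URLs.
example_bound = max_enriched_attributes(3, 3)

# Average attribute counts across the three phases.
before_trimmer = 49
after_trimmer = before_trimmer - 12    # Trimmer removes 12 on average
after_enricher = after_trimmer + 54    # Enricher adds 54 on average
net_change = after_enricher - before_trimmer

print(example_bound, after_trimmer, after_enricher, net_change)
```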
Similar to the Classifier evaluation, we also evaluated the impact of Trimmer and Enricher on the 15 events. Table 10 shows the number of attributes in the three phases, namely before they are processed by Trimmer and Enricher (Att, column 5), after Trimmer (AT, column 9), and after Enricher (AE, last column). We verified that AECCP could reduce the number of attributes of some events depending on the type of attributes of those events, so Trimmer, in these cases, reduced the number of attributes effectively. This was observed in 6 out of the 15 events. However, we also verified that in those events whose attributes contain hashes and URLs, the number of attributes was increased by Enricher. Summing up, Enricher added attributes to 7 events, 4 of which were first trimmed. Two of the remaining 8 events were trimmed but not enriched, and the other 6 were neither trimmed nor enriched. Overall, 6 events ended with more attributes, 3 ended with fewer, and the remaining 6 kept the same number of attributes.
We evaluated the 15 events with and without these two modules to answer the fourth and fifth questions. Table 11 shows the results of this evaluation, which compares the number of classification tags when events were not processed by Trimmer and Enricher (columns 2, 6, and 10) with the number of classification tags when they only were processed by Trimmer (columns 3, 7, and 11), and with the number of classification tags when processed by both modules (columns 4, 8, and 12). As we can observe, all events have the same number of tags in columns 2 and 3, 6 and 7, and 10 and 11, meaning that Trimmer does not remove valuable information for the classification of events, answering positively to question 4. We can also observe from columns 4, 8, and 12 that the number of classification tags of 4 events were increased (\( E_3 \), \( E_8 \), \( E_9 \), and \( E_{15} \)), where 2 of them leveraged from the enrichment provided by Enricher (\( E_8 \), and \( E_{15} \)). Therefore, we conclude that Enricher improved the quality of the events, answering question 5.
5.2.4 Clustering.
This section aims to assess AECCP’s ability to correlate different events that share mutual IoCs (i.e., the Clusterer module) and answers the sixth question.
Since our evaluation dataset is small (64 events) and therefore Clusterer might not create many clusters, we allowed these events to be correlated with events from our ground truth dataset, thus totaling 1,232 events. With this approach, we were able to create 24 clusters. Table 12 details some of these clusters, whereas the rest are omitted since they have the same properties, except their taxonomies, as one of the clusters in this table. For example, clusters 100, 101, and 102 have exactly the same attributes and correlations, but they were created with different taxonomies (
| \( ^uC_x \) | # Events | Taxonomy and Description | # Att | Mutual IoCs |
|---|---|---|---|---|
| 1 | 2 | malicious-code=“worm” | 416 | www.tashdqdxp.com |
| | | - Soft Cell case indicators | | |
| | | - Malware with Ties to SunOrcal | | |
| 9 | 3 | malicious-code=“trojan” | 68 | https://twitter.com/VK_Intel/status/1128079463785349121 |
| | | - FIN7 JScript Loader Malware | | |
| | | - APT28 XTunnel Backdoor | | |
| | | - Turla Kazuar RAT | | |
| 10 | 2 | malicious-code=“virus” | 47 | https://twitter.com/VK_Intel/status/1128079463785349121 |
| | | - FIN7 JScript Loader Malware | | |
| | | - APT28 XTunnel Backdoor | | |
| 11 | 2 | malicious-code=“ransomware” | 69 | All except one |
| | | - Sodinokibi ransomware | | |
| | | - Ransomware exploits WebLogic vulnerability | | |
| 14 | 2 | malicious-code=“cryptominer” | 65 | CVE-2019-3396 |
| | | - Botnet Malware Exploits CVE-2019-3396 | | |
| | | - SystemTen (ELF trojan, miner, bot and rootkit) | | |
| 119 | 2 | malicious-code=“backdloor” | 53 | All except three |
| | | - Operation ShadowHammer | | |
| | | - Operation ShadowHammer | | |
| 21 | 2 | malicious-code=“ransomware” | 28 | https://www.bleepingcomputer.com/new-lockergoga-ransomware-allegedly-used-in-altran-attack/ |
| | | - The Norsk Hydro ransomware attack | | |
| | | - New LockerGoga Ransomware in Altran Attack | | |
Table 12. Clusters Created by AECCP
Figure 3 presents one of the clusters that were created by AECCP, identified with ID 21 in Table 12. This cluster is formed by two events (1518 and 1520) that have a common attribute, a link, and a common UT tag,
Fig. 3. Cluster 21 created by AECCP and composed of two events: 1518 on the right and 1520 on the left.
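The correlation idea behind Clusterer can be illustrated with a minimal sketch. It assumes that two events belong to the same cluster when they carry the same taxonomy tag and share at least one IoC value, which matches the description of cluster 21 but is not necessarily how the module is implemented. The event IDs 1518 and 1520, the ransomware tag, and the shared link come from Table 12 and Figure 3; every other event, IoC value, and function name is hypothetical.

```python
from collections import defaultdict

RANSOMWARE = 'malicious-code="ransomware"'
TROJAN = 'malicious-code="trojan"'
LINK = ("https://www.bleepingcomputer.com/"
        "new-lockergoga-ransomware-allegedly-used-in-altran-attack/")

# Events 9001 and 9002 and all "hash-*" values are made up for illustration.
events = [
    {"id": 1518, "tags": {RANSOMWARE}, "iocs": {LINK, "hash-a"}},
    {"id": 1520, "tags": {RANSOMWARE}, "iocs": {LINK, "hash-b"}},
    {"id": 9001, "tags": {RANSOMWARE, TROJAN}, "iocs": {"hash-c"}},
    {"id": 9002, "tags": {TROJAN}, "iocs": {LINK}},
]


def cluster_events(events):
    """Return (tag, [event ids]) for events sharing a tag and at least one IoC."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Index event ids by (tag, IoC) so that matching events end up together.
    by_tag_and_ioc = defaultdict(list)
    for event in events:
        for tag in event["tags"]:
            for ioc in event["iocs"]:
                by_tag_and_ioc[(tag, ioc)].append(event["id"])

    # Within each tag, merge all events that share an IoC value.
    for (tag, _ioc), ids in by_tag_and_ioc.items():
        for other in ids[1:]:
            union((tag, ids[0]), (tag, other))

    # Collect connected components; only merged events appear in `parent`.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node[1])
    return sorted((root[0], sorted(ids)) for root, ids in groups.items())


clusters = cluster_events(events)
for tag, ids in clusters:
    print(tag, ids)
```

In this sketch, event 9002 shares the link with events 1518 and 1520 but has a different tag, so it stays out of the cluster. Clustering per tag also means that the same group of events can appear in several clusters that differ only in their taxonomy.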
5.3 Processing Events with the PURE and ETIP Platforms
To demonstrate that AECCP can process events produced by other platforms from the literature, trimming their attributes without losing relevant information and enriching the information they carry (and hence their threat impact), we processed six events from PURE [3]. In addition, we compared the resulting events with the PURE versions by submitting both to ETIP [15] to calculate their threat score (TS).
Table 13 shows the characterization of the six events of PURE—namely, for each eIoC, the number of events it aggregates (#E, column 2), its description (column 3), the number of attributes it contains (#att, column 4), and its threat score measured by ETIP (TS, column 5).
| PURE and ETIP | | | | | AECCP and ETIP | | | |
|---|---|---|---|---|---|---|---|---|
| ID | #E | Description | #att | TS | #AT | #AE | Unified Taxonomy | TS |
| E1 | 2 | - OSINT Aveo Malware Family Targets Japanese Speaking | 82 | 1.29 | 77 | 87 | malicious-code=“backdloor” | 1.29 |
| | | - Pivot on whois registrant [email protected] | | | | | malicious-code=“trojan” | |
| E2 | 2 | - OSINT - Packrat: Seven Years of a South American Threat Actor | 267 | 2.54 | 257 | 423 | availability=“dos-or-ddos” | 2.68 |
| | | - Packrat: Seven Years of a South American Threat Actor | | | | | fraud=“phishing” | |
| | | | | | | | malicious-code=“backdloor” | |
| | | | | | | | malicious-code=“dos” | |
| | | | | | | | malicious-code=“ransomware” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | malicious-code=“worm” | |
| E3 | 2 | - Expansion on [email protected] | 274 | 3.22 | 273 | 401 | malicious-code=“backdloor” | 3.50 |
| | | - New Variant of Gh0st Malware by Palo Alto Networks Unit 42 | | | | | malicious-code=“trojan” | |
| E4 | 3 | - Spear Phishing Attack Using Cobalt Strike Against Financial Institutions | 85 | 2.53 | 78 | 159 | abusive-content=“spam” | 2.58 |
| | | - RTF files for Hancitor utilize exploit for CVE-2017-11882 | | | | | fraud=“phishing” | |
| | | - Targeted Attack in the Middle East by APT34, using CVE-2017-11882 | | | | | malicious-code=“exploit” | |
| | | | | | | | malicious-code=“spammer” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | vulnerable=“vulnerable-service” | |
| E5 | 3 | - EPS Processing Zero-Days Exploited by Multiple Threat Actors | 156 | 2.87 | 146 | 361 | information-gathering=“scanning” | 3.12 |
| | | - Malicious Documents Targeting Security Professionals | | | | | malicious-code=“backdloor” | |
| | | - APT28 Targets Hospitality Sector, Presents Threat to Travelers | | | | | malicious-code=“exploit” | |
| | | | | | | | malicious-code=“ransomware” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | malicious-code=“worm” | |
| | | | | | | | vulnerable=“vulnerable-service” | |
| E6 | 4 | - Sakula Malware Family | 842 | 3.11 | 821 | 2907 | information-gathering=“scanning” | 3.40 |
| | | - Cyber-Kraken (Threat Group 3390 / Emissary Panda) | | | | | malicious-code=“backdloor” | |
| | | - Korean Website Installs Banking Malware | | | | | malicious-code=“trojan” | |
| | | - Sakula Reloaded | | | | | | |
#E, number of events; #att, number of attributes;
#AT: number of attributes after Trimmer; #AE: number of attributes after Enricher.
Table 13. PURE Events Characterization, Processed by AECCP, and TS Calculation by ETIP
The six events received from PURE were processed by AECCP, producing the results shown in columns 6 through 8 of the table. As we can observe, AECCP could process events from an external platform. All of the events, which were not initially tagged, were classified by AECCP (column 8). In addition, the initial number of attributes (column #att) was slightly reduced by Trimmer (column #AT). However, as explained in Section 5.2.3, Enricher adds, on average, 54 attributes per event, more than the 12 that Trimmer removes. This increase can be seen in column #AE and is the price to pay for the added value. The added information appears to be relevant, since the TS grew for five of the six events and stayed at 1.29 for E1 (last column).
Based on these results, we can answer question 7 positively, meaning that AECCP improves the quality of TI more than the other two platforms. Note that the ETIP platform calculates the TS of events (enriched IoCs), meaning that it contains an enricher module that aggregates and correlates events before calculating the TS. Therefore, if the TS of AECCP’s events is higher than that of ETIP’s events, AECCP generates events of better quality than ETIP. The same is concluded about PURE.
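The TS comparison in Table 13 can be tabulated with a short sketch. The scores are transcribed from the two TS columns of the table (PURE version first, AECCP version second); the variable names are hypothetical, and the sketch only summarizes the reported values rather than reproducing how ETIP computes them.

```python
# Threat scores (TS) transcribed from Table 13: (PURE version, AECCP version).
threat_scores = {
    "E1": (1.29, 1.29),
    "E2": (2.54, 2.68),
    "E3": (3.22, 3.50),
    "E4": (2.53, 2.58),
    "E5": (2.87, 3.12),
    "E6": (3.11, 3.40),
}

# Per-event TS difference, rounded to the two decimals used in the table.
deltas = {
    event_id: round(aeccp - pure, 2)
    for event_id, (pure, aeccp) in threat_scores.items()
}

improved = [event_id for event_id, delta in deltas.items() if delta > 0]
unchanged = [event_id for event_id, delta in deltas.items() if delta == 0]
mean_delta = round(sum(deltas.values()) / len(deltas), 3)

for event_id, delta in deltas.items():
    print(f"{event_id}: {delta:+.2f}")
print("Improved:", improved)
print("Unchanged:", unchanged)
print("Mean TS increase:", mean_delta)
```

Listing the per-event differences makes it easy to see which events gained threat impact from the enrichment and which kept the same score.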
6 IMPROVEMENTS AND FUTURE WORK
The prevention and detection of cyber-attacks have received significant attention from organizations, which have been adopting new strategies and defense mechanisms to protect themselves. TI has emerged as an ally of organizations, allowing them to access information about threats that have occurred. They use TI for various purposes, namely to verify whether their assets are vulnerable to an attack that has occurred, to update their defense mechanisms with rules and patterns on announced threats, and to check whether their assets have been victims of an attack.
TI must be timely for organizations to act proactively and avoid severe damage. However, TI only announces attacks after they have already occurred, thus being a reactive notification [41, 51] that is not very useful for victim organizations. To develop proactive TI, it is necessary to obtain data from the online hacker community to understand what is happening in that community and try to predict possible malicious actions. One way to do this is to access underground forums where, for example, hackers exchange technical mechanisms and tutorials of malicious tools that they can use to carry out attacks [41]. These tools can be found and purchased within the dark web (DW), more precisely in dark-net markets. In addition, the DW hosts dark-net forums for the hacker community [2]. By accessing, collecting, and analyzing DW data, it is possible to identify emerging hacker threats and thus produce proactive TI [42].
AECCP was designed in light of traditional TI, meaning that the unified taxonomy and the main threat attributes were defined based on public taxonomies and security events of traditional TI. AECCP can benefit from DW data in various ways:
- Extend the unified taxonomy with Tier 2 tags and bags of words based on terms that are only observed in the DW and that are related to an incident category (Tier 1 level) of the UT.
- Process data provided by DW sources, classify it with the extended UT, and aggregate it with (i) other DW data associated with the same attack intent, in which case SOC analysts can gain insight into malicious actions, anticipate attacks that are being planned, and take decisions to prevent them from reaching the organization; (ii) traditional TI that already exists about some announced misbehaviour but has no associations and has gone unnoticed by security analysts (e.g., attacks that have been planned but not yet fully executed), in which case the SOC analyst can be proactive and activate the necessary protections against the attack; or (iii) traditional TI from an attack that has already occurred, in which case the resulting information is reactive, but the analyst gains access to information about the attack plan and can make decisions based on it.
- Make the necessary modifications in AECCP to accept the different formats in which DW data is provided.
7 CONCLUSION
In this article, we proposed and presented AECCP, an implementation of an approach that improves the quality of the threat intelligence produced by TIPs by classifying and enriching it automatically. AECCP is composed of a set of smaller solutions, each focused on one or more limitations of TIPs, which were verified in a detailed analysis of an intelligence dataset of more than 1,000 security events. To address the limitations in threat knowledge management and in technology enablement for threat triage, the platform integrates a Classifier module that classifies each event according to a UT proposed by us. To deal with the high volume of shared threat information, we proposed a Trimmer module that trims low-value information from each event, based on the main threat attributes we identified in the data analysis. For data improvement, AECCP contains an Enricher module that enriches each event with intelligence collected from VirusTotal. Last, to address advanced analytics limitations, we proposed a Clusterer module that creates clusters of events that share information and context about the same threat and represents each cluster as an AECCP event.
To prove the applicability and feasibility of AECCP, the platform was developed based on the MISP platform. AECCP was validated over more than 1,000 events and tested against a dataset of 64 newer, previously unused events and 6 events produced by a different platform, PURE. From these tests, we created 24 clusters, which were classified, trimmed, and enriched by AECCP, and we were able to trim and enrich the events produced by PURE. In addition, these events were processed by another platform, ETIP, to calculate their TS. The results showed that AECCP produces higher-quality TI than the other platforms.
Footnotes
1 https://www.circl.lu/doc/misp/feed-osint/.
2 http://www.botvrij.eu/data/feed-osint/.
3 https://feeds.inthreat.com/osint/misp/.
4 https://www.misp-project.org/taxonomies.html.
5 https://www.virustotal.com/.
6 https://pymisp.readthedocs.io/.
7 Precision \( = TP/(TP+FP) \).
9 F1-score \( = 2*(Precision*Recall/(Precision+Recall)) \).
REFERENCES
- [1] 2021. Processing tweets for cybersecurity threat awareness. Information Systems 95 (2021), 101586.
- [2] 2019. Dark-net ecosystem cyber-threat intelligence (CTI) tool. In Proceedings of the 2019 IEEE International Conference on Intelligence and Security Informatics (ISI’19). 92–97.
- [3] 2019. PURE: Generating quality threat intelligence by clustering and correlating OSINT. In Proceedings of the 18th IEEE International Conference on Trust, Security, And Privacy in Computing and Communications (TrustCom’19). 483–490.
- [4] 2016. Threat Intelligence: What It Is, and How to Use It Effectively. Retrieved February 22, 2022 from https://nsfocusglobal.com/wp-content/uploads/2017/01/SANS_Whitepaper_Threat_Intelligence__What_It_Is__and_How_to_Use_It_Effectively.pdf.
- [5] 2014. A study on advanced persistent threats. In Proceedings of the 15th IFIP International Conference on Communications and Multimedia Security. 63–72.
- [6] 2018. CIRCL Taxonomy - Schemes of Classification in Incident Response and Detection. Retrieved February 22, 2022 from https://www.circl.lu/pub/taxonomy/.
- [7] 2015. Incident Classification/Incident Taxonomy According to eCSIRT.net - Adapted. Retrieved February 22, 2022 from https://www.trusted-introducer.org/Incident-Classification-Taxonomy.pdf.
- [8] 2020. The FASTEST Way to Consume Threat Intelligence. Period. Retrieved February 22, 2022 from https://csirtgadgets.com/collective-intelligence-framework.
- [9] 2020. OpenIOC - Sharing Threat Intelligence. Retrieved February 22, 2022 from https://www.darknet.org.uk/2016/06/openioc-sharing-threat-intelligence/.
- [10] 2020. A methodology to evaluate standards and platforms within cyber threat intelligence. Future Internet 12, 6 (2020), 108.
- [11] 2013. Open source intelligence and privacy dilemmas: Is it time to reassess state accountability? Security and Human Rights 23, 4 (April 2013), 1–12.
- [12] 2015. Standards and Tools for Exchange and Processing of Actionable Information. Technical Report. ENISA.
- [13] 2017. Exploring the Opportunities and Limitations of Current Threat Intelligence Platforms. Technical Report. ENISA.
- [14] 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). EUR-Lex. Retrieved February 22, 2022 from https://eur-lex.europa.eu/eli/reg/2016/679/oj.
- [15] 2019. Enriching threat intelligence platforms capabilities. In Proceedings of the 16th International Conference on Security and Cryptography (SECRYPT’19). 37–48.
- [16] 2013. Taking a Lean-Forward Approach to Combat Today’s Cyber Attacks. Technical Report. FireEye.
- [17] 2012. Intelligence in the internet age: The emergence and evolution of Open Source Intelligence (OSINT). Computers in Human Behavior 28 (March 2012), 673–682.
- [18] 2021. ETIP: An enriched threat intelligence platform for improving OSINT correlation, analysis, visualisation and sharing capabilities. Journal of Information Security and Applications 58 (May 2021), 102715.
- [19] 2014. OSINT: A “Grey Zone”? International Journal of Intelligence and Counterintelligence 27 (May 2014), 529–549.
- [20] 2016. Threat Intelligence: Planning and Direction. SANS Institute - InfoSec Reading Room.
- [21] 2020. 2020 SANS Cyber Threat Intelligence (CTI) Survey. SANS Institute - InfoSec Reading Room.
- [22] 2020. TheHive Project: Open Source, Free and Scalable Cyber Threat Intelligence & Security Incident Response Solutions. Retrieved February 22, 2022 from https://blog.thehive-project.org/tag/soltra-edge/.
- [23] 2013. Joint Intelligence (JP 2-0). Technical Report. U.S. Army.
- [24] 2020. Additional Info on the Paper Submitted to ACM TOPS. Retrieved February 22, 2022 from https://sites.google.com/view/siteaddinfo-tops.
- [25] 2014. Operational levels of cyber intelligence. International Journal of Intelligence and Counterintelligence 27, 4 (Dec. 2014), 702–719.
- [26] 2016. Reduce Business Risk with an Effective Threat Intelligence Capability. Retrieved February 22, 2022 from https://www.recordedfuture.com/threat-intelligence-capability/.
- [27] 2018. Security Intelligence. Retrieved February 22, 2022 from https://docs.microsoft.com/en-us/windows/security/threat-protection/intelligence/.
- [28] 2004. A taxonomy of DDoS attack and DDoS Defense mechanisms. ACM SIGCOMM Computer Communication Review 34, 2 (May 2004), 39–53.
- [29] 2020. MISP Taxonomies. Retrieved February 22, 2022 from https://www.misp-project.org/datamodels/#misp-taxonomies.
- [30] 2020. Open Source Threat Intelligence Platform & Open Standards for Threat Information Sharing. Retrieved February 22, 2022 from http://www.misp-project.org.
- [31] 2020. CRITs: Collaborative Research into Threats. Retrieved February 22, 2022 from https://crits.github.io/.
- [32] 2020. Introduction to STIX. Retrieved February 22, 2022 from https://oasis-open.github.io/cti-documentation/stix/intro.html.
- [33] 2020. Introduction to TAXII. Retrieved February 22, 2022 from https://oasis-open.github.io/cti-documentation/taxii/intro.html.
- [34] 2016. Understanding Cyber Threat Intelligence Operations. Retrieved February 22, 2022 from https://www.bankofengland.co.uk/-/media/boe/files/financial-stability/financial-sector-continuity/understanding-cyber-threat-intelligence-operations.pdf.
- [35] 2020. Cyber threat intelligence: A product without a process? International Journal of Intelligence and CounterIntelligence 34, 2 (2020), 300–315.
- [36] 2020. The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends. IEEE Access 8 (2020), 10282–10304.
- [37] 2020. A comparative analysis of cyber-threat intelligence sources, formats and languages. Electronics 9, 5 (May 2020), 824.
- [38] 2016. Intelligence Defined and Its Impact on Cyber Threat Intelligence. Retrieved February 22, 2022 from https://www.robertmlee.org/intelligence-defined-and-its-impact-on-cyber-threat-intelligence/.
- [39] 2015. Vulnerability disclosure in the age of social media: Exploiting Twitter for predicting real-world exploits. In Proceedings of the 24th USENIX Security Symposium. 1041–1056.
- [40] 2014. A taxonomy of browser attacks. In Handbook of Research on Digital Crime, Cyberspace Security, and Information Assurance. IGI Global, 291–313.
- [41] 2017. Exploring emerging hacker assets and key hackers for proactive cyber threat intelligence. Journal of Management Information Systems 34, 4 (2017), 1023–1053.
- [42] 2020. Proactively identifying emerging hacker threats from the dark web: A diachronic graph embedding framework (D-GEF). ACM Transactions on Privacy and Security 23, 4 (Aug. 2020), Article 21, 33 pages.
- [43] 2017. Threat intelligence sharing platforms: An exploratory study of software vendors and research perspectives. Wirtschaftsinformatik und Angewandte Informatik 2017 (2017), 1–15.
- [44] 2019. The Evolving Cyber Threat to the Global Banking Community. Retrieved February 22, 2022 from https://www.swift.com/pt/node/147646.
- [45] 2011. Advanced Persistent Threats: A Symantec Perspective. Technical Report. Symantec.
- [46] 2019. Threat Intelligence Platforms. Everything You’ve Ever Wanted to Know But Didn’t Know to Ask. ThreatConnect.
- [47] 2019. What is cyber threat intelligence and how is it evolving? In Cyber-Vigilance and Digital Trust: Cyber Security in the Era of Cloud Computing and IoT. John Wiley & Sons, 1–49.
- [48] 2018. A survey on technical threat intelligence in the age of sophisticated cyber attacks. Computers & Security 72 (Jan. 2018), 212–233.
- [49] 2016. MISP: The design and implementation of a collaborative threat intelligence sharing platform. In Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security. 49–56.
- [50] 2014. Threat Intelligence: What Is It, and How Can It Protect You from Today’s Advanced Cyber-Attacks. Technical Report. Gartner.
- [51] 2018. Incremental hacker forum exploit collection and classification for proactive cyber threat intelligence: An exploratory study. In Proceedings of the 2018 IEEE International Conference on Intelligence and Security Informatics. 94–99.