research-article
Open Access

Generating Quality Threat Intelligence Leveraging OSINT and a Cyber Threat Unified Taxonomy

Published: 19 May 2022

Abstract

Today’s threats use multiple means of propagation, such as social engineering, email, and application vulnerabilities, and often operate in different phases, such as single device compromise, lateral network movement, and data exfiltration. These complex threats rely on advanced persistent threats supported by well-advanced tactics for appearing unknown to traditional security defenses. As organizations realize that attacks are increasing in size and complexity, cyber threat intelligence (TI) is growing in popularity and use. This trend followed the evolution of advanced persistent threats, as they require a different level of response that is more specific to the organization. TI can be obtained in many formats, with open-source intelligence being one of the most common, and through threat intelligence platforms (TIPs) that help organizations consume, produce, and share TI. TIPs have multiple advantages that enable organizations to quickly bootstrap the core processes of collecting, analyzing, and sharing threat-related information. However, current TIPs have some limitations that prevent their mass adoption. This article proposes AECCP, a platform that addresses some of these limitations. AECCP improves the quality of TI by classifying it according to a single unified taxonomy, removing information of low value, enriching it with valuable information from open-source intelligence sources, and aggregating it to complement information associated with the same threat. AECCP was validated and evaluated with three datasets of events and compared with two other platforms, showing that it can generate quality TI automatically and help security analysts analyze security incidents in less time.

1 INTRODUCTION

In today’s world, most organizations are digital, operating with technologies and processes of the Internet era. The changes in IT infrastructure and usage models, including mobility, cloud computing, and virtualization, have dissolved traditional enterprise security perimeters, creating a vast attack surface for hackers and other threat actors [45]. Managing the digital landscape in which an organization operates is a challenge that has never been more difficult, making an organization vulnerable to many forms of attack.

Not only has the digital landscape evolved, but there has also been a significant evolution in cyber threats, as adversaries have advanced their knowledge. They have deployed increasingly sophisticated means of circumventing individual controls within users’ local environments and probed further into their systems to execute well-planned and orchestrated attacks [44]. As the digital landscape grows and the threat landscape becomes more complex, organizations are more likely to be targeted and to suffer a severe cyber-attack with high financial and reputational impact. Given the high probability and impact of cyber-attacks, in addition to significant regulatory pressure to protect information, such as the European Union’s General Data Protection Regulation, organizations are encouraged to look for new solutions to reduce their vulnerabilities [14].

One domain that has emerged during the past decade is cyber threat intelligence (TI). This new domain combines key aspects from incident response and traditional intelligence, and it can be defined as “the process and product resulting from the interpretation of raw data into information that meets a requirement as it relates to the adversaries that have the intent, opportunity and capability to do harm” [38]. However, compared to other cyber domains, such as incident response and security operations, TI is still in the early adoption phase, limited by the lack of suitable technologies, known as threat intelligence platforms (TIPs) [45, 47]. Although organizations recognize the potentiality of TI, the lack of tools that would help them manage the collected information and convert it to actions is preventing the mass adoption of this kind of solution.

With the emergence of new threat actors, like advanced persistent threats (APTs), organizations cannot rely on a single solution to protect from this type of threat. The static approach of traditional security based on heuristic and signature does not match new threats known to be evasive, resilient, and complex. These complex threats rely on well-advanced tactics to appear unknown to signature-based tools and yet authentic enough to bypass spam filters [16]. Today’s organizations must deploy a multi-layered defense to improve their chances of detecting or disrupting an attack to fight these threats.

In the form of open-source intelligence (OSINT), TI can provide knowledge to a vast selection of systems and processes that form this multi-layered defense, such as antivirus and intrusion prevention systems and the processes that manage these solutions and review the events they generate. This knowledge can be collected from many sources using TIPs. However, TIPs receive thousands of security events, which makes it hard to analyze them and extract relevant data about threats. The volume and quality of data are the most common barriers to effective information exchange. In addition, shared data is often outdated and not specific enough to aid the decision-making process, becoming unactionable [48]. The confidence level of information is another barrier, since most sources do not provide it, forcing security operations center (SOC) analysts to put additional effort into evaluating and verifying the received data. In addition, most organizations cannot make valuable use of their threat data because there is too much: approximately 250 to millions of indicators of compromise (IoCs) per day [48]. Considering the volume of shared threat information, most platforms end up being data warehouses rather than platforms where threat information can be analyzed. Moreover, the time SOC analysts spend analyzing and classifying incidents has increased due to this volume of data, the presence of low-value information, and the duplication of incident classifications across several public incident taxonomies (e.g., eCSIRT and ENISA). A few platforms [1, 3, 18] deal with these drawbacks. They aggregate diverse OSINT data related to the same threat into a single event. At first sight, this approach is beneficial, as it avoids the manual analysis of several individual events and the manual attempt to establish their relationships, and should therefore decrease the time SOC analysts spend on this task. However, aggregating a set of events into one increases the amount of information that analysts must check. This amount can exceed 1,000 attributes in a single event, and therefore the time required to analyze it can be longer than the time needed to analyze the set of events individually.

This article proposes the automated event classification and correlation platform (AECCP), which implements an approach to address some TIP limitations by generating highly information-rich objects under a standard format and a single unified taxonomy (UT), with its threat categories characterized by main threat attributes. In addition, it correlates and aggregates these objects into clusters of objects that share the same threat type and other information, thus generating quality TI. To improve the collection and automatic classification of actionable TI, as well as to define the UT, we first need to understand the TI life cycle, the available information sources, and current TIPs, and to identify the main attributes that characterize each threat category of the UT. This requires working on all levels of the intelligence-gathering operation, using an automated system to (i) receive data from multiple sources, (ii) improve the enrichment process and validate the information collected by cross-referencing it, (iii) produce objects under a standard format and taxonomy, and (iv) store the obtained intelligence in such a way that it can be applied in the optimisation of defense mechanisms. Moreover, using a UT and the main threat attributes addresses the problem raised by the aforementioned platforms, namely the excessive amount of information aggregated in a single event.

To the best of our knowledge, this article is the first to (i) propose a unified taxonomy to classify security events; (ii) study and identify the main attributes that better describe threat types; (iii) classify security events automatically into an incident category and remove the overlap of classification tags, without human intervention; and (iv) propose a platform to reduce the amount of information aggregated in a single event, after an event correlation and clustering task. Moreover, our approach aims to improve the response of threat analysts and all of the systems used by the organization against today’s complex threats. In addition, it aims at finding ways to benefit from OSINT to increase the detection capabilities of defense mechanisms, such as security information and event management systems (SIEMs) or intrusion detection systems (IDS), reducing the number of false positives and negatives.

We validated and evaluated AECCP with three datasets of security events. Our results suggest that AECCP can automatically classify TI into an incident category and generate new and enriched TI that associate different security events regarding the same threat in a single way. In addition, we compared AECCP with two platforms from the literature, and the results show that our approach performs better than the others.

The main contributions of the article are as follows. First, we present a UT to reduce the overlapping of taxonomies with the same meaning and simplify the event classification while maintaining its details. Second, we present the identification of the main attributes that characterize each incident category into the proposed taxonomy, which will allow reducing the volume of shared information. Third, we offer an approach that aims to improve quality threat intelligence produced by TIPs by automatically classifying and enriching it. The approach is composed of a set of modules, each one focused on one or more limitations of TIPs and verified in our data analysis. Fourth, we present AECCP and its assessment with three event datasets and two other platforms.

2 BACKGROUND AND RELATED WORK

2.1 Advanced Persistent Threats

Today’s generation threats are multi-vectored and often multi-stage—that is, most attacks use multiple means of propagation, such as social engineering, email, and application vulnerabilities, and most attacks operate in different phases, such as single device compromise, lateral network movement, and data exfiltration [48]. These complex threats rely on social engineering techniques, the latest zero-day vulnerabilities, and well-advanced tactics for appearing unknown to signature-based tools and yet authentic enough to bypass spam filters. Traditional security defenses were developed to inspect each attack vector as a separate path and each stage of an attack as an independent event, failing in identifying and analyzing an attack as an orchestrated series of cyber incidents [16].

APTs, one of today’s generation of threats that has had a significant impact on the rise of cybercrime, evolved from young hackers in the Black Hat community, whose objectives were mayhem and reputation, to organized crime groups backed by states and private entities [45]. Chen et al. [5] distinguish APTs from other online criminal enterprises by four characteristics: specific targets and clear objectives, highly organized and well-resourced attackers, long-term campaigns with repeated attempts, and stealthy and evasive techniques.

2.2 Open Source Intelligence

The earliest forms of OSINT date back to World War II, marked by the ability to find relevant information and combine it in a way that treats information as a resource rather than a commodity [17, 23]. OSINT can be defined as intelligence produced from publicly available information, known as open-source information (OSINF), such as information gathered from radio, television, newspapers, websites, blogs, papers, and conferences. Today, due to the development of the Internet, this type of information has become significantly more accessible and cheaper to gather than the traditional information acquired by clandestine services. In comparison to other sources of information, like human intelligence, OSINF can sometimes provide extra information and be a more reliable and safer way of acquiring intelligence [11].

To produce OSINT, OSINF is analyzed, edited, filtered, and validated. Moreover, the information gathered is linked with other sources to verify, complement, and contextualize the collected data. The more public sources are available, the better the intelligence that can be produced [11, 17]. OSINT has become one of the most common forms of intelligence and is considered a goldmine for organizations [36]. For instance, recent studies stated that valuable and early information can be provided by social networks, such as Twitter [39, 48]. One of the biggest advantages of OSINT is its cost, as it is much less expensive than traditional information-gathering tools. OSINT also has benefits when it comes to sharing and accessing information: it can be legally and easily shared with anyone, and open sources are always available and up to date [19]. However, OSINT has some constraints. The high quantity of available information must be processed to create valid intelligence, which demands a large amount of analytical work from security specialists to extract useful information from the noise and to distinguish valid, verified information from false, misleading, or inaccurate data. A final constraint is that OSINT may not always provide the needed answer, since it only uses available information [19].

2.3 Threat Intelligence

Threat intelligence (TI) can be defined as “evidence-based knowledge, including context, mechanisms, indicators . . . about the hazard to assets that can be used to inform decisions regarding the subject’s response to that menace or hazard” [50].

In its simplest form, TI is the process of understanding the threats toward an organization based on available information. However, there must also be an understanding of how the information relates to the organization. Hence, it must be combined with contextual information to determine relevant threats to the organization. Moreover, TI is valuable to an organization only if it is actionable. If the SOC cannot determine how to best respond to, combat, or mitigate a threat to the organization, then the information provides little to no value [4]. Detecting incidents sooner and potentially even preventing them is the overall goal of TI. Organizations often see TI as a way to reinforce the environment and prepare for both known and unknown threats.

TI has grown in popularity and use among organizations as they realize that attacks have increased in size and complexity. According to a TI survey, 85.5% of respondents have at least one person responsible for consuming or producing TI in their organization and 7.1% of respondents plan to have one shortly. This trend followed the evolution of targeted attacks and APTs as they require a different level of response that is more specific to the organization [21]. Many organizations are convinced that TI is a valuable tool to help them better understand their attackers.

As stated, the objective of creating TI is the creation and delivery of a product that can be acted upon. While threat intelligence professionals find value in sharing threat information through informal and traditional communication channels, the results are inconsistent and unscalable. Hence, better frameworks are needed for communicating TI to provide an adequate answer to today’s complex threats. Such frameworks should include standardized reporting terminology and processes, benefit in information sharing for cybersecurity purposes, the ability for users to create trusted communities, and technical infrastructure to share and analyze TI at machine speed. In the absence of an industry-standard framework, current sharing mechanisms include private or restricted face-to-face meetings and phone calls; emails, forums, and message boards; web portals with wiki-type capabilities; web portals acting as document management systems; web portals (some with APIs) allowing downloads of structured data; and web portals offering social networking facilities with secure access and sharing controls [12].

TI represents security threat activities in the form of IoCs, that is, information artifacts obtained from a forensic analysis that aggregate data on malicious activity in a system or within a network that was attacked [26]. For structuring TI and sharing it among entities and security platforms, diverse standard formats have been proposed, with OpenIoC [9], STIX [32], TAXII [33], CSV, and the MISP format being the most popular. However, their use is not widespread, and they are often poorly implemented [37].

2.4 Threat Intelligence Platforms

Threat intelligence sharing platforms (TIPs) were introduced to fill the gap left by the absence of an industry standard for TI sharing, as well as the gaps and limitations of the current detection and monitoring defense mechanisms deployed in IT infrastructures [46]. In this sense, TIPs are used to collect OSINT and TI, to process, store, and share it, and to integrate the resulting data with other security platforms and tools related to incident response and threat management (e.g., SOC, CSIRTs). They retrieve structured and unstructured data from several external sources (e.g., OSINT feeds) and process these data by applying various operations, such as filtering, normalization, aggregation, and some correlation [3].

TIPs usually vary in (i) their objective, as some are used for operational information, whereas others focus on long-term risk analysis; (ii) the scope of their action, from accepting only processed inputs to possessing natural language processing capacities; and (iii) their capabilities, which in current platforms range from data acquisition and storage to advanced analytics using machine learning. Despite their differences, the functionalities of TIPs follow the steps of the threat intelligence life cycle, namely planning and direction, collection, processing and exploitation, analysis and production, dissemination, and integration [4, 20, 25, 34].

Since the existence of TIPs, their adoption by organizations has grown and played an important role in spreading security threat activity among the collaborative entities working in this field. However, their adoption and implementation are still in their infancy [43], with many limitations to be resolved, such as automatic trust assessment and classification of TI and advanced capabilities of analysis, where SOC intervention continues to be required to filter and retrieve TI information that is relevant and effectively actionable.

Some open-source TIPs have been adopted by organizations, the following four being the most widely used [48]: MISP (the Malware Information Sharing Platform) [30], CIF (the Collective Intelligence Framework) [8], CRITs (Collaborative Research Into Threats) [31], and SoltraEdge [22], with MISP being the most popular.

2.5 MISP

MISP was initially created by the NATO Computer Incident Response Capability Technical Centre (NCIRC TC) to implement the Smart Defense concept and presently is owned by the Computer Incident Response Centre Luxembourg (CIRCL). One of the key concepts of MISP is the sharing of intelligence among members of the same community [30, 49].

Currently, the main capabilities of MISP include sharing; storage; automatic correlation of IoCs; advanced filtering; and export and import of data in the most popular formats, namely STIX, OpenIOC, CSV, and the MISP standardized format [10, 49]. IoCs, also called MISP events, contain technical and general TI information, are represented in the MISP format, and are stored in a database of indicators.

A new entry in MISP’s database is called an event object, which can be defined as a set of characteristics and all kinds of descriptions of an IoC. These characteristics and relevant information are called attributes. Examples of attribute types are hash, filename, hostname, and IP address. An attribute can even be a complex object that contains multiple attributes. An example of a complex attribute is an antivirus signature, which can include the name of the antivirus, the name of the signature, and the detection date [49]. Furthermore, each attribute can be correlated with other simple or complex attributes. In addition, IoCs, when stored, are automatically correlated to describe the relationships between attributes and indicators [10].
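The event, attribute, and complex-attribute relationship described above can be illustrated with a minimal sketch. The class and field names below are hypothetical and chosen for readability; they are not MISP's actual schema, and the `correlate` function only approximates the idea of automatic correlation by matching identical attribute values across two events.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Attribute:
    """A single characteristic of an event, e.g. a hash or an IP address."""
    type: str   # illustrative type names such as "md5", "filename", "ip"
    value: str


@dataclass
class ComplexAttribute:
    """A named group of attributes, e.g. an antivirus signature."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Event:
    """An event object: a description plus simple and complex attributes."""
    info: str
    attributes: List[Attribute] = field(default_factory=list)
    objects: List[ComplexAttribute] = field(default_factory=list)


def correlate(a: Event, b: Event) -> Set[Tuple[str, str]]:
    """Return the (type, value) pairs that both events have in common."""
    pairs_a = {(x.type, x.value) for x in a.attributes}
    pairs_b = {(x.type, x.value) for x in b.attributes}
    return pairs_a & pairs_b


# Usage: two events that share one IP address.
signature = ComplexAttribute(
    name="antivirus-signature",
    attributes=[
        Attribute("av-name", "ExampleAV"),
        Attribute("signature-name", "Trojan.Example"),
        Attribute("detection-date", "2022-05-19"),
    ],
)
first = Event(
    info="Phishing campaign",
    attributes=[Attribute("ip", "203.0.113.7"), Attribute("filename", "invoice.doc")],
    objects=[signature],
)
second = Event(
    info="Malware download",
    attributes=[Attribute("ip", "203.0.113.7"), Attribute("hostname", "bad.example")],
)
shared = correlate(first, second)
```

Here `shared` contains the single pair that links the two events, which is the kind of relationship a platform can surface automatically when a new event is stored.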

2.5.1 Taxonomies.

Data classification is often bound to internal, community, or national classification schemes. One common problem is the mapping of events into categories. This is a complex task since categories are not always known in advance. Because a centralized, pre-defined set of definitions that satisfies all potential users is hard to achieve, MISP uses a distributed approach based on machine tags. However, the freedom to define tags can easily lead to a situation where multiple tags have the same meaning, making filtering complicated. To overcome this problem, a new tagging concept was introduced: taxonomies. A taxonomy is based on a triple tag structure with a namespace, a predicate, and a value, for example, [enisa:nefarious-activity-abuse=“ransomware”]. This flexible concept allows classifying and tagging events following an organization’s own classification schemes or existing taxonomies used by other organizations. A clear advantage of this concept is that the machine tags remain human-readable [49].
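As an illustration of the triple tag structure, the following minimal sketch splits a machine tag into its namespace, predicate, and optional value. The function name `parse_machine_tag` and the regular expression are assumptions made for this example; they are not taken from MISP's code base, and the pattern only covers the tag shapes shown in this section.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: Optional[str]  # None for two-part tags such as ecsirt:malicious-code


# namespace:predicate, optionally followed by ="value"
_TAG_RE = re.compile(
    r'^(?P<namespace>[^:="]+):(?P<predicate>[^:="]+)(?:="(?P<value>[^"]*)")?$'
)


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a machine tag into (namespace, predicate, value)."""
    # Normalize typographic quotes, which often appear in copied text.
    normalized = tag.strip().replace("\u201c", '"').replace("\u201d", '"')
    match = _TAG_RE.match(normalized)
    if match is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(match["namespace"], match["predicate"], match["value"])


# Usage with tags that appear in this section.
ransomware = parse_machine_tag('enisa:nefarious-activity-abuse="ransomware"')
malicious = parse_machine_tag("ecsirt:malicious-code")
```

Keeping the three parts separate makes it straightforward to filter events by namespace alone, or to compare values across namespaces when looking for tags with the same meaning.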

In its default configuration, MISP includes a set of public incident classification taxonomies [29]. Some of the most used are described next, and their tags, as recognized in the MISP tag structure, are presented in Table 1:

| eCSIRT.net Taxonomy Main Category | Microsoft Implementation of the CARO Naming Scheme |
|---|---|
| ecsirt:abusive-content | ms-caro-malware:malware-type=“Adware” |
| ecsirt:malicious-code | ms-caro-malware:malware-type=“Backdoor” |
| ecsirt:information-gathering | ms-caro-malware:malware-type=“Behavior” |
| ecsirt:intrusion-attempts | ms-caro-malware:malware-type=“BrowserModifier” |
| ecsirt:intrusions | ms-caro-malware:malware-type=“Constructor” |
| ecsirt:availability | ms-caro-malware:malware-type=“DDoS” |
| ecsirt:information-content-security | ms-caro-malware:malware-type=“Dialer” |
| ecsirt:fraud | ms-caro-malware:malware-type=“DoS” |
| ecsirt:vulnerable | ms-caro-malware:malware-type=“Exploit” |
| ecsirt:other | ms-caro-malware:malware-type=“HackTool” |
| ecsirt:test | ms-caro-malware:malware-type=“Joke” |
| | ms-caro-malware:malware-type=“Misleading” |
| CIRCL.LU Taxonomy | ms-caro-malware:malware-type=“MonitoringTool” |
| circl:incident-classification=“spam” | ms-caro-malware:malware-type=“Program” |
| circl:incident-classification=“system-compromise” | ms-caro-malware:malware-type=“PUA” |
| circl:incident-classification=“scan” | ms-caro-malware:malware-type=“PWS” |
| circl:incident-classification=“denial-of-service” | ms-caro-malware:malware-type=“Ransom” |
| circl:incident-classification=“copyright-issue” | ms-caro-malware:malware-type=“RemoteAccess” |
| circl:incident-classification=“phishing” | ms-caro-malware:malware-type=“Rogue” |
| circl:incident-classification=“malware” | ms-caro-malware:malware-type=“SettingsModifier” |
| circl:incident-classification=“XSS” | ms-caro-malware:malware-type=“SoftwareBundler” |
| circl:incident-classification=“vulnerability” | ms-caro-malware:malware-type=“Spammer” |
| circl:incident-classification=“fastflux” | ms-caro-malware:malware-type=“Spoofer” |
| circl:incident-classification=“sql-injection” | ms-caro-malware:malware-type=“Spyware” |
| circl:incident-classification=“information-leak” | ms-caro-malware:malware-type=“Tool” |
| circl:incident-classification=“scam” | ms-caro-malware:malware-type=“Trojan” |
| circl:incident-classification=“cryptojacking” | ms-caro-malware:malware-type=“TrojanClicker” |
| circl:incident-classification=“locker” | ms-caro-malware:malware-type=“TrojanDownloader” |
| circl:incident-classification=“screenlocker” | ms-caro-malware:malware-type=“TrojanDropper” |
| circl:incident-classification=“wiper” | ms-caro-malware:malware-type=“TrojanNotifier” |
| circl:incident-classification=“sextortion” | ms-caro-malware:malware-type=“TrojanProxy” |
| | ms-caro-malware:malware-type=“TrojanSpy” |
| | ms-caro-malware:malware-type=“VirTool” |
| | ms-caro-malware:malware-type=“Virus” |
| | ms-caro-malware:malware-type=“Worm” |

Table 1. eCSIRT.net, CIRCL.LU and Microsoft Implementation of CARO Taxonomies Recognized in the MISP Tag Structure

  • eCSIRT.net [7] (upper part of column 1): This taxonomy was developed many years ago, but the main categories are still current and can easily be used. However, the subcategories can lead to problems when classifying an incident. Despite its defects, many European Computer Security Incident Response Teams (CSIRTs) use it, which allows teams to collaborate with one another.

  • CIRCL.LU [6] (lower part of column 1): The MISP owners and main contributors use their own taxonomy for classifying incidents. Although it has some similarities with the eCSIRT.net taxonomy, CIRCL.LU has only one level of classification.

  • Microsoft implementation of CARO Naming Scheme [27] (second column): Microsoft names malware and unwanted software according to the Computer Antivirus Research Organization (CARO) malware naming scheme. This scheme was created by a committee at CARO and was the first attempt to make malware naming consistent.
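The overlap between these taxonomies can be made concrete with a small sketch that collapses tags with the same meaning into one category. The mapping below is purely hypothetical: it is not the unified taxonomy proposed in this article, and the category names "malware" and "availability" are placeholders chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from public taxonomy tags to a single category.
# Assumption for illustration only; not the article's unified taxonomy.
UNIFIED_CATEGORY: Dict[str, str] = {
    "ecsirt:malicious-code": "malware",
    'circl:incident-classification="malware"': "malware",
    'ms-caro-malware:malware-type="Trojan"': "malware",
    "ecsirt:availability": "availability",
    'circl:incident-classification="denial-of-service"': "availability",
}


def unify(tags: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collapse overlapping tags into unique categories.

    Returns (categories, unmapped_tags), both in first-seen order.
    """
    categories: List[str] = []
    unmapped: List[str] = []
    for tag in tags:
        category = UNIFIED_CATEGORY.get(tag)
        if category is None:
            unmapped.append(tag)
        elif category not in categories:
            categories.append(category)
    return categories, unmapped


# Usage: three tags from different taxonomies describing the same incident.
categories, unmapped = unify([
    "ecsirt:malicious-code",
    'circl:incident-classification="malware"',
    'ms-caro-malware:malware-type="Trojan"',
    "example:unknown-tag",
])
```

In this example the three overlapping tags reduce to one category, while the tag that has no entry in the mapping is returned separately so that it can be reviewed rather than silently dropped.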

2.6 Limitations of TIPs

TIPs have multiple advantages that enable organizations to easily bootstrap the core processes of collecting, normalizing, enriching, correlating, analyzing, disseminating, and sharing threat information. However, current solutions have some limitations that prevent their mass adoption. Next, we present the limitations related to the current state and usage of TIPs [13, 35, 47]:

  • LT1—Shared threat information is too voluminous: One of the problems is the overload of threat information shared via open source, commercial sources, and communities. Combining shared threat information from different sources makes the relevant intelligence hard to find and makes it difficult to generate value.

  • LT2—Limited technology enablement in threat triage: There is limited technology enablement to facilitate the relevancy determination process. Currently, this process is done manually, in a complex way, and dependent on the analyst.

  • LT3—Data quality: The confidence level of information is not provided by most of the feed, forcing analysts to put additional effort into evaluating and verifying the received data.

  • LT4—Limited analysis capabilities: Most TIPs have limited capabilities related to browsing, attribute-based filtering, advanced searching information, pivoting, exploration, and visualization.

  • LT5—Limited advanced analytics capabilities and automation tasks: Most TIPs have limited capabilities related to aggregation, composition, and generalization of data, as well as the capability to de-duplicate, tag, and classify data automatically.

  • LT6—Focus on data collection: Considering the volume of shared threat information and the limited analysis capabilities provided by TIPs, most of the platforms end up being data warehouses rather than platforms where threat information can be shared and analyzed.

  • LT7—Limited threat knowledge management: No common vocabulary is used for describing threat actors, tactics, techniques, procedures, and tools.

  • LT8—Focus on tactical IoCs: Tactical IoCs are mostly shared, lacking comprehensive threat information. Standardized formats are underused or even not used during information sharing, noting that most information is exchanged in unstructured files.

  • LT9—Trust-related issues: Most TIPs have limitations in the way that organizations interact and contribute to specific communities, and most platforms do not allow organizations to share only specific types of threat data with particular communities.

  • LT10—Diverse data formats: Although there are community efforts to provide connectors between different standards and formats, converting information without losing any elements or context from the source format is a challenge. Most TIPs tend to stay with one format, limiting the flexibility of the TIP users.

  • LT11—Shared intelligence without expiration date: Currently, the time-to-live information is not provided by most of the feeds, and TIPs have limited capabilities in handling this type of metadata information.

  • LT12—Diverse APIs and requirements for integration: TIPs integrate with a standard set of services and tools while the owners prioritize requests for additional integrations.

  • LT13—Limited workflow enablement: Currently, TIPs provide limited workflow capabilities that would make the process of threat management more efficient, such as the capability of stakeholders to send requests for information.

2.7 Platforms for Resolving Limitations of TIPs

A few platforms try to reduce some TIPs’ limitations and improve TI processing.

PURE [3] is a platform that generates improved intelligence based on OSINT. This enhanced intelligence takes the form of new enriched IoCs obtained by correlating and combining IoCs from different OSINT feeds that share information about the same threat. The clustering method used by PURE creates clusters that can be summarized and converted into an enriched IoC, allowing the discovery of unidentified patterns and the detection of new complex attacks. The platform normalizes the different IoC formats into a single one and compares the IoCs received with the IoCs stored in the database to check for duplicates. Besides discarding duplicated IoCs, it discards those that provide no new information. The set of IoCs of interest resulting from this filter step is sent to a clustering module, which applies similarity and weight metrics to aggregate similar and related IoCs and create quality TI. IoCs belonging to a cluster are correlated to find the most relevant information that characterizes a threat and are then converted into a single enriched IoC.
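The filter, cluster, and merge steps can be sketched as follows. This is not PURE's algorithm: the source does not specify its similarity or weight metrics, so the sketch assumes each IoC is a set of attribute values, uses Jaccard similarity as a stand-in metric, and uses a simple greedy grouping with an arbitrary threshold of 0.3. All function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List

IoC = FrozenSet[str]  # assumption: an IoC is reduced to its attribute values


def jaccard(a: IoC, b: IoC) -> float:
    """Jaccard similarity: shared values divided by all distinct values."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_redundant(iocs: Iterable[IoC]) -> List[IoC]:
    """Discard empty IoCs, duplicates, and IoCs that add no new values."""
    kept: List[IoC] = []
    for ioc in iocs:
        # A subset of an already kept IoC brings no new information.
        if ioc and not any(ioc <= other for other in kept):
            kept.append(ioc)
    return kept


def cluster(iocs: Iterable[IoC], threshold: float = 0.3) -> List[List[IoC]]:
    """Greedy grouping: join the first cluster holding a similar member."""
    clusters: List[List[IoC]] = []
    for ioc in iocs:
        for members in clusters:
            if any(jaccard(ioc, member) >= threshold for member in members):
                members.append(ioc)
                break
        else:
            clusters.append([ioc])
    return clusters


def merge(members: List[IoC]) -> IoC:
    """Combine a cluster into one enriched IoC holding every value."""
    return frozenset().union(*members)


# Usage with made-up indicator values.
feed = [
    frozenset({"198.51.100.4", "evil.example"}),
    frozenset({"198.51.100.4", "evil.example"}),   # exact duplicate
    frozenset({"198.51.100.4"}),                   # no new information
    frozenset({"evil.example", "hash-abc123"}),
    frozenset({"192.0.2.9"}),
]
unique = drop_redundant(feed)
groups = cluster(unique)
enriched = [merge(group) for group in groups]
```

Note that merging every value of a cluster illustrates the drawback discussed in the introduction: the enriched IoC grows with the cluster, so a selection of the most relevant attributes is still needed afterwards.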

ETIP [15, 18] is a platform that extends the importing capabilities, the quality assessment processes, and the information-sharing capabilities in current TIPs. ETIP gathers and processes structured information from external sources, such as OSINT and a monitored IT infrastructure. It comprises two main modules: a composed IoC module, in charge of collecting, normalising, processing, and aggregating IoCs from OSINT feeds, and a context-aware intelligence sharing module, able to correlate, assess, and share static and real-time information with data obtained from multiple OSINT sources. ETIP computes a threat score (TS) associated with each IoC before sharing it with other tools and trusted external parties. Enriched IoCs produced by ETIP contain a TS that allows SOC analysts to prioritize the analysis of incidents. The TS evaluates heuristics with two weights: individual weights assigned to every attribute based on their relevance, accuracy, and variety, and a global weight (i.e., completeness criterion) assigned to the heuristic. The higher the TS value, the more reliable the IoC.
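To make the two levels of weighting concrete, the sketch below computes a score from per-attribute weights and a global weight per heuristic. The aggregation formula is an assumption: the source only states that both kinds of weight exist, so the normalized weighted sums used here, along with the function names and the example weights, are illustrative and should not be read as ETIP's actual threat score.

```python
from typing import Dict, List, Set, Tuple

# A heuristic is (global_weight, {attribute_name: individual_weight}).
Heuristic = Tuple[float, Dict[str, float]]


def heuristic_score(weights: Dict[str, float], present: Set[str]) -> float:
    """Weighted share of a heuristic's attributes found in the IoC (0 to 1)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    found = sum(w for name, w in weights.items() if name in present)
    return found / total


def threat_score(heuristics: List[Heuristic], present: Set[str]) -> float:
    """Combine heuristic scores using their global weights (0 to 1)."""
    total_global = sum(g for g, _ in heuristics)
    if total_global == 0:
        return 0.0
    weighted = sum(g * heuristic_score(w, present) for g, w in heuristics)
    return weighted / total_global


# Usage with made-up weights: a network heuristic and a file heuristic.
heuristics: List[Heuristic] = [
    (2.0, {"ip": 1.0, "domain": 1.0}),
    (1.0, {"hash": 3.0, "filename": 1.0}),
]
score = threat_score(heuristics, present={"ip", "hash"})
```

With these weights, the network heuristic scores 0.5 and the file heuristic 0.75, so the combined value is (2 × 0.5 + 1 × 0.75) / 3. Normalizing to the range 0 to 1 is a design choice of this sketch that keeps scores comparable across IoCs with different numbers of heuristics.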

SYNAPSE [1], a Twitter-based streaming threat monitor for threat detection in SOCs, implements a pipeline that gathers tweets from a set of accounts, filters them based on the monitored infrastructure, and classifies the remaining tweets as either relevant or not. The pipeline is composed of a data collector, a filter, pre-processing and feature extraction module, a classifier, and a clustering module. The data collector requires a set of accounts, from which it will collect every posted tweet using Twitter’s stream API. The filtering approach assumes that a tweet must mention a particular IT infrastructure asset when referring to a threat to a specific IT infrastructure asset. Only tweets that include at least one of the keywords will pass the filter. The pre-processing and feature extraction module is then used to normalize the tweet representation before the classifier. Two classifiers were explored for the classification of tweets according to their security relevance: Support Vector Machines (SVM) and Multi-Layer Perceptron (MLP) neural networks. Finally, SYNAPSE uses clustering to aggregate similar tweets in the newsfeed stream, adapting a Clustream algorithm to achieve the desired threat aggregation. Relevant tweets are grouped in dynamic clusters and presented as IoCs that can be manually inspected or fed to SIEMs and other TI tools.
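The filtering step described above, where only tweets mentioning a monitored asset pass, can be sketched as a keyword match. The matching rule (case-insensitive, whole words only) and the function name `build_asset_filter` are assumptions made for this example; the source does not state how SYNAPSE compares keywords with tweet text.

```python
import re
from typing import Callable, Iterable


def build_asset_filter(keywords: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that is True when a tweet mentions any keyword."""
    terms = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not terms:
        # Without keywords nothing can refer to the infrastructure.
        return lambda tweet: False
    # \b keeps "nginx" from matching inside a longer word.
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    return lambda tweet: pattern.search(tweet) is not None


# Usage with made-up assets and tweets.
mentions_asset = build_asset_filter(["apache struts", "nginx", "openssl"])
tweets = [
    "New RCE in Apache Struts, patch now",
    "Lovely weather at the conference today",
    "OpenSSL advisory published this morning",
]
passed = [t for t in tweets if mentions_asset(t)]
```

Only the tweets that pass this filter would move on to pre-processing, classification, and clustering, which keeps the later and more expensive stages focused on the monitored infrastructure.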

Table 2 shows which TIP limitations (stated in Section 2.6) are addressed by these platforms (columns 3 through 5). All three share the main objective of creating quality TI through new analytical approaches and in an automated way. The new TI is obtained by filtering and combining OSINT associated with the same threat into a single security event. Achieving this objective addresses the first six limitations (LT1 to LT6), since the resulting TI reduces the amount of individual, unrelated data (security events) that SOC analysts must analyze. However, because the resulting TI aggregates in a single event much more information (the merging of several events) than the individual events contain, analyzing it can be more challenging for SOC analysts. PURE and ETIP also deal with LT10 because they can receive OSINT in diverse formats. Because ETIP consumes data from the organization’s IT infrastructure to analyze it jointly with OSINT, and the resulting TI can be exported for use in defense mechanisms, it deals with LT8 and LT11, respectively. SYNAPSE also addresses LT11 for the same reason as ETIP.

ID | Limitation | PURE | ETIP | SYNAPSE | AECCP
LT1 | Shared threat information is too voluminous | x | x | x | x
LT2 | Limited technology enablement in threat triage | x | x | x | x
LT3 | Data quality | x | x | x | x
LT4 | Limited analysis capabilities | x | x | x | x
LT5 | Limited advanced analytics capabilities and tasks automation | x | x | x | x
LT6 | Focus on data collection | x | x | x | x
LT7 | Threat knowledge management limitations |  |  |  | x
LT8 | Focus on tactical IoCs |  | x |  | x
LT9 | Trust-related issues |  |  |  | x
LT10 | Diverse data formats | x | x |  | x
LT11 | Shared intelligence without expiration date |  | x | x | x
LT12 | Diverse APIs and requirements for integration |  |  |  |
LT13 | Limited workflow enablement |  |  |  |

Table 2. TIPs Limitations Addressed by PURE, ETIP, SYNAPSE, and AECCP Platforms

The platform we propose, AECCP (last column of the table), addresses all TIP limitations except the last two (LT12 and LT13). Although AECCP shares the main objective of the other platforms, it employs different types of analysis for filtering and combining data (detailed in Section 4). It goes a step further by proposing a UT and a set of main threat attributes to classify OSINT data. Both reduce the amount of information consolidated in a single resulting event (a problem the other platforms face) and therefore decrease the effort SOC analysts must spend analyzing such data. These capabilities address LT7 and LT9, making AECCP the first platform to do so. In addition, it is the first platform that classifies security events into incident categories and removes the overlap between classification tags of public taxonomies automatically (i.e., without human intervention). Our platform also consumes diverse OSINT data formats (LT10) and external data (LT8) to improve the quality of TI, and the generated TI can be shared and used in organizations’ defense mechanisms (LT11).


3 DATA ANALYSIS FOR A UNIFIED TAXONOMY AND THREAT MAIN ATTRIBUTES

As stated before, the primary goal of this work is to address some of the limitations of TIPs described in Section 2.6. We address all of them except the last two (LT12 and LT13), focusing on the first seven. More specifically, we aim to solve those related to the processing of data in the platforms (i.e., classifying, analyzing, and generating data automatically), thus minimizing human intervention in this process. However, to produce the most accurate and complete TI, we also have to consider the other four limitations, since they are related to these seven. For example, to obtain more comprehensive data about a given attack, it is necessary to process OSINT data that can come in diverse formats (LT10). To design a solution capable of treating and minimizing the limitations, we first had to understand them. Hence, this section presents the data analysis performed to obtain that understanding.

The analysis is based on MISP events, as MISP is the most widely adopted open-source TIP among organizations. The section first gives an overview of the data sources used to collect the events and how the dataset used in the analysis was built (Section 3.1). Second, it presents an analysis of MISP taxonomies, which shows how the vast set of public incident classification schemes that MISP offers for classifying the same threat can add unnecessary complexity and redundant information. To tackle this, we propose a UT, which is defined in Section 3.2. In addition, an analysis of MISP event attributes is provided, showing that too many attributes in a single event can also add unnecessary complexity, especially if they do not add useful information. To address this problem, we propose a solution in Section 3.3 that involves discovering the most prevalent attributes that underlie a threat. Finally, Section 3.4 briefly explains how references to external platforms can be used to increase the quality of TI.

3.1 Data Sources and Dataset

The source information for the dataset came from external OSINT feeds, and MISP was the TIP used to collect and process them. MISP can process different feed formats, namely MISP standardized format, CSV, and free text. CSV and free text feeds are only parsed as MISP Attributes and do not take advantage of all MISP functionalities. In contrast, MISP formatted feeds can be parsed into anything from simple MISP Attributes to the more complex MISP Objects and benefit from all MISP functionalities. Therefore, we left aside CSV and free text feeds and worked only with MISP formatted feeds, resulting in the following three feeds: CIRCL OSINT Feed,1 Botvrij.eu Data,2 and inThreat OSINT Feed.3

From these three feeds, we collected 1,366 events published by 14 different organizations, such as CIRCL, CUDESO, InThreat, CthuluSPRL.be, Synovus Financial, VK-Intel, ESET, and NCSC-NL. However, some of these events date back to 2014, near the embryonic phase of MISP, and tend to be poorer, with minimal information, or to contain collections of IoCs from multiple attacks (e.g., blacklists). In contrast, recent events (since 2016) are richer in knowledge, and many more of them correspond to a single attack. Consequently, we reduced the initial dataset to the richer events only, resulting in 1,168 out of 1,366 events, most of which were provided by CIRCL and CUDESO (907 and 120 events, respectively).

3.2 Unified Taxonomy

Over the past decades, multiple cyber threat classification systems have been proposed; some of them focus on the classification of actors and methods [35], whereas others focus on specific techniques [28] or specific targets [40]. With more than 100 classification systems, this complex array of taxonomies adds confusion when a security analyst manually analyzes a threat and, consequently, increases the time and effort spent. This complexity is compounded in MISP by unnecessary information, since an analyst can classify an event for a given incident with different taxonomies, meaning that the event will have several tags with the same meaning. For example, an event classified as ransomware has five tags mapping different taxonomies, namely [ecsirt:malicious-code=“ransomware”], [malware_classification:malware-category=“Ransomware”], [veris:action:malware:variety=“Ransomware”], [enisa:nefarious-activity-abuse=“ransomware”], and [ms-caro-malware:malware-type=“Ransom”]. Based on this evidence, in this section we present a solution to reduce this complexity by proposing a UT.

As explained previously, events in MISP are classified with tags following taxonomies, meaning that a classified event must have at least one tag. By this criterion, our dataset contains 1,166 tagged events and 2 untagged events. However, a more detailed analysis showed that many of the tagged events did not have a tag that allowed them to be classified correctly into an incident category: only 691 (out of 1,166) events were tagged with an incident category. Furthermore, we found that several events had multiple overlapping classification tags from different taxonomies, meaning duplicated information about their type.

From the 1,166 tagged events, 493 different tags were extracted. Table 3 shows the 16 most used tags in their classification. A more extensive table can be found in Appendix A [24]. From the extracted tags, only 13% of them (62) corresponded to a known incident classification taxonomy (IDs 4–6), meaning that most remaining tags did not add information about the type of the threat but added information about its source (IDs 2, 8, 9, and 14) and its sharing, such as the Traffic Light Protocol (TLP) and OSINT (IDs 1 and 3). Additionally, 61% of the tags (i.e., 302) corresponded to MISP Galaxies. MISP Galaxies are highly customizable and can correspond not only to known attacks (ID 7) but also to attack patterns, threat actors (ID 11), and tools (ID 13). Therefore, we opted not to consider MISP Galaxy tags and the other tags referred to earlier as classification tags due to the high heterogeneity and low information about the type of threat they carried. Hence, for further analysis, we only considered the 62 tags associated with incident classification, which belong to 10 different incident classification taxonomies (the first 10 IDs of Table 4).

ID | Tag | Hits
1 | tlp:white | 1,133
2 | osint:source-type=“blog-post” | 275
3 | Type:OSINT | 273
4 | circl:incident-classification=“malware” | 218
5 | malware_classification:malware-category=“Ransomware” | 113
6 | ecsirt:malicious-code=“ransomware” | 98
7 | misp-galaxy:ransomware=“Locky” | 70
8 | inthreat:event-src=“feed-osint” | 32
9 | osint:source-type=“block-or-filter-list” | 32
10 | circl:topic=“finance” | 31
11 | misp-galaxy:threat-actor=“Sofacy” | 26
12 | OSINT | 26
13 | misp-galaxy:tool=“Trick Bot” | 24
14 | osint:source-type=“technical-report” | 23
15 | workflow:todo=“expansion” | 22
16 | osint:lifetime=ephemeral | 21

Table 3. The 16 Most Used Tags in Events
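
The grouping of tags discussed above can be illustrated with a small tally. This is a minimal sketch, not the authors' tooling: the `tag_family` and `percent` helpers are hypothetical, the classification predicates are limited to the three that appear in Table 3 (IDs 4 to 6), and the sample list reuses only a few tags from that table.

```python
from collections import Counter

# Assumption: (namespace, predicate) pairs treated as incident classification.
# Only the three pairs visible in Table 3 are listed; the paper's 62 tags
# span 10 taxonomies and would need a longer list.
CLASSIFICATION_PREDICATES = {
    ("circl", "incident-classification"),
    ("malware_classification", "malware-category"),
    ("ecsirt", "malicious-code"),
}


def tag_family(tag: str) -> str:
    """Assign a MISP tag to a coarse family (hypothetical helper)."""
    head = tag.split("=", 1)[0]  # drop the ="value" part, if any
    parts = head.split(":")
    namespace = parts[0].lower()
    predicate = parts[1].lower() if len(parts) > 1 else ""
    if namespace == "misp-galaxy":
        return "galaxy"
    if (namespace, predicate) in CLASSIFICATION_PREDICATES:
        return "incident-classification"
    return "other"


def percent(part: int, total: int) -> int:
    """Share of `part` in `total`, rounded to a whole percentage."""
    return round(100 * part / total)


sample_tags = [
    'tlp:white',
    'osint:source-type="blog-post"',
    'circl:incident-classification="malware"',
    'malware_classification:malware-category="Ransomware"',
    'ecsirt:malicious-code="ransomware"',
    'misp-galaxy:ransomware="Locky"',
    'misp-galaxy:threat-actor="Sofacy"',
    'circl:topic="finance"',
]

families = Counter(tag_family(tag) for tag in sample_tags)
print(families)

# Shares reported in the text: 62 and 302 out of 493 distinct tags.
print(percent(62, 493), percent(302, 493))
```

Note that `circl:topic="finance"` falls into the "other" family even though it shares a namespace with a classification tag, which is why the sketch matches on the predicate as well as the namespace.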

ID | Taxonomy
1 | CIRCL.LU taxonomy
2 | eCSIRT.net incident taxonomy
3 | ENISA threat taxonomy
4 | Microsoft implementation of CARO Naming Scheme
5 | Internal taxonomy for Canadian Centre for Cyber Security (CCCS)
6 | Europol common taxonomy for law enforcement and CSIRTs
7 | Vocabulary for Event Recording and Incident Sharing (VERIS)
8 | ENISA threat taxonomy in the scope of securing smart airports
9 | SANS malware classification based on “Malware 101—Viruses”
10 | CERT-XLM Security Incident Classification
11 | GSMA—Fraud and Security Group
12 | Information security indicators from ETSI GS ISI
13 | Malware Attribute Enumeration and Characterization (MAEC)
14 | Reference security incident classification taxonomy
15 | Threats targeting cryptocurrency, based on CipherTrace report
16 | Open Threat Taxonomy
17 | Penetration test (pentest) classification
18 | Infoleak taxonomy
19 | Common Taxonomy for Law enforcement and CSIRTs
20 | MONARC
21 | Distributed Denial of Service (DDoS) taxonomy
22 | Incident disposition based on the NASA Incident Response and Management Handbook

Table 4. The 10 Taxonomies Used for Incident Classification (IDs 1–10) and the 22 Taxonomies Analyzed to Define the Unified Taxonomy

The UT we propose is based on structures of the eCSIRT.net incident taxonomy and CARO malware naming scheme, and it aims to simplify the event classification while maintaining its details. In addition, since most taxonomies have two tiers of classification, such as the eCSIRT.net incident taxonomy, we opted to follow this level of detail. This allows us to choose the granularity level of the classification. To define UT, we analyzed the 22 public taxonomies listed in Table 4 for the tags related to incident classification.4 UT is composed of 8 incident categories of Tier 1 (like the other two taxonomies) and 38 sub-categories of Tier 2 distributed by Tier 1 categories.

Table 5 shows how each public taxonomy of Table 4 contributed to the definition of UT, in terms of the number of incident classification tags mapped to each Tier 2 sub-category (column #Tg) and the number of taxonomies that contributed to each Tier 2 sub-category (column #Tx). In total, 354 tags from public taxonomies were mapped to our taxonomy, with VERIS, CARO, and Europol being the taxonomies that contributed most (penultimate line). In addition, eCSIRT.net, VERIS, CERT-XLM, and CARO were the taxonomies that participated most in the definition of Tier 2 sub-categories (last line).

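Tier 1 | Tier 2 | #Tg | Tags per contributing public taxonomy (non-zero cells of columns 1–22, left to right) | #Tx | #W
Abusive content | spam | 13 | 1, 1, 1, 1, 3, 2, 1, 1, 1, 1 | 10 | 13
Malicious code | adware | 4 | 1, 1, 1, 1 | 4 | 1
 | backdoor | 4 | 2, 1, 1 | 3 | 1
 | browser-modifier | 3 | 2, 1 | 2 | 2
 | cryptominer | 3 | 1, 1, 1 | 3 | 6
 | dialer | 4 | 1, 2, 1 | 3 | 1
 | dos | 14 | 4, 1, 9 | 3 | 5
 | exploit | 6 | 1, 2, 1, 2 | 4 | 1
 | hack-tool | 1 | 1 | 1 | 2
 | misleading | 8 | 1, 1, 6 | 3 | 6
 | monitoring-tool | 7 | 2, 2, 3 | 3 | 8
 | password-stealer | 6 | 1, 1, 4 | 3 | 6
 | ransomware | 12 | 1, 1, 1, 2, 1, 1, 1, 1, 2, 1 | 10 | 2
 | remote-access-tool | 7 | 1, 2, 2, 1, 1 | 5 | 1
 | settings-modifier | 3 | 1, 2 | 2 | 4
 | spammer | 4 | 1, 1, 2 | 3 | 2
 | spoofer | 2 | 2 | 1 | 2
 | spyware | 8 | 1, 2, 2, 1, 1, 1 | 6 | 2
 | trojan | 15 | 1, 10, 1, 2, 1 | 5 | 7
 | virtool | 8 | 1, 1, 2, 1, 1, 1, 1 | 7 | 3
 | virus | 7 | 1, 1, 2, 1, 1, 1 | 6 | 2
 | wiper | 5 | 1, 2, 2 | 3 | 6
 | worm | 9 | 1, 1, 2, 1, 1, 1, 1, 1 | 8 | 2
Information-gathering | scanning | 11 | 1, 1, 1, 2, 1, 1, 3, 1 | 8 | 3
 | sniffing | 6 | 1, 3, 1, 1 | 4 | 2
 | social-engineering | 17 | 1, 1, 6, 4, 1, 1, 2, 1 | 8 | 12
Intrusion-attempts | ids-alert | 12 | 1, 8, 1, 1, 1 | 5 | 5
 | brute-force | 9 | 1, 4, 1, 1, 1, 1 | 6 | 3
 | unknown-exploit | 3 | 1, 1, 1 | 3 | 3
 | account-compromise | 6 | 2, 2, 2 | 3 | 6
 | system-or-application-compromise | 60 | 4, 4, 1, 1, 7, 34, 2, 2, 1, 2, 2 | 11 | 6
 | botnet-member | 2 | 1, 1 | 2 | 2
Availability | dos-or-ddos | 24 | 1, 3, 4, 4, 1, 1, 2, 1, 2, 5 | 10 | 6
information-content-security | unauthorised-information-access | 9 | 1, 2, 2, 1, 1, 1, 1 | 7 | 3
 | unauthorised-information-modification | 9 | 1, 1, 3, 1, 1, 1, 1 | 7 | 3
Fraud | masquerade | 6 | 1, 1, 1, 1, 1, 1 | 6 | 2
 | phishing | 23 | 1, 1, 3, 2, 4, 1, 1, 1, 4, 2, 1, 1, 1 | 13 | 4
Vulnerable | vulnerable-service | 4 | 1, 1, 1, 1 | 4 | 2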
Contribution of each public taxonomy in #Tags354123123531835665923312271311823351
#Tier 2 categories in which public taxonomies contributed9251620161021282137101111513311

Table 5. Contribution of Each Public Taxonomy of Table 4 in the Definition of the Unified Taxonomy

Table 6 contains an excerpt of UT, showing the relationship map we created for all public taxonomies (columns 1 to 3). The complete definition of UT can be found in Appendix B [24].

Tier 1 | Tier 2 | Public taxonomy tags | Bag of words
Abusive content | spam | cccs:email-type=“spam”; circl:incident-classification=“spam”; ecsirt:abusive-content=“spam”; enisa:nefarious-activity-abuse=“spam”; europol-event:email-flooding; europol-event:spam; europol-incident:abusive-content=“spam”; gsma-fraud:technical=“spamming”; information-security-indicators:iex=“spm.1”; maec-malware-capabilities:maec-malware-capability=“email-spam”; rsit:abusive-content=“spam”; veris:action:malware:variety=“spam”; veris:action:social:variety=“spam” | spam, junk email, junk mail, junk e-mail, unsolicited email, unsolicited mail, unsolicited e-mail, bulk email, bulk mail, bulk e-mail, unwanted email, unwanted mail, unwanted e-mail
malware | adware | cccs:malware-category=“adware”; malware_classification:malware-category=“adware”; ms-caro-malware:malware-type=“adware”; veris:action:malware:variety=“adware” | adware
 | backdoor | maec-malware-behavior:maec-malware-behavior=“install-backdoor”; ms-caro-malware:malware-type=“backdoor”; ms-caro-malware-full:malware-type=“backdoor”; veris:action:malware:variety=“backdoor” | backdoor
 | browser-modifier | cccs:malware-category=“browser-hijacker”; ms-caro-malware:malware-type=“broswermodifier”; ms-caro-malware-full:malware-type=“broswermodifier” | browser hijacker, browser modifier

Table 6. Unified Taxonomy (Excerpt of) with Public Taxonomy and Bag of Words Mappings

Additionally, a bag of words was defined for each Tier 2 sub-category of UT to describe it and allow further classification. Each bag was created from words extracted from the public taxonomies and synonyms of these words. These bags of words will not only support further analyses of events with public taxonomy tags but, most importantly, will be used to analyze events without public taxonomy tags (e.g., the two untagged events from our dataset that were not yet classified). The last column of Table 5 contains the number of words assigned to each category, for a total of 147 words, and the last column of Table 6 presents the bag of words mapped to each category of UT. The complete list of bags of words can be found in Appendix B as part of the definition of UT [24].
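
To make the role of the bags of words concrete, the following minimal sketch matches free text against the four bags listed in the excerpt of Table 6. It is an illustration under stated assumptions, not the keyword classifier used by AECCP: the `classify_text` function, the whole-phrase matching rule, and the sample description are all hypothetical.

```python
import re

# Bags of words copied from the excerpt in Table 6 (4 of the 38 Tier 2 bags).
BAGS = {
    ("Abusive content", "spam"): [
        "spam", "junk email", "junk mail", "junk e-mail",
        "unsolicited email", "unsolicited mail", "unsolicited e-mail",
        "bulk email", "bulk mail", "bulk e-mail",
        "unwanted email", "unwanted mail", "unwanted e-mail",
    ],
    ("malware", "adware"): ["adware"],
    ("malware", "backdoor"): ["backdoor"],
    ("malware", "browser-modifier"): ["browser hijacker", "browser modifier"],
}


def classify_text(text: str) -> set:
    """Return the (Tier 1, Tier 2) pairs whose bag matches the text.

    Assumption: a bag matches when any of its entries occurs as a whole
    word or phrase, case-insensitively.
    """
    lowered = text.lower()
    matches = set()
    for category, words in BAGS.items():
        for word in words:
            if re.search(r"\b" + re.escape(word) + r"\b", lowered):
                matches.add(category)
                break  # one hit per bag is enough
    return matches


description = "Backdoor dropped after a bulk e-mail campaign"
print(classify_text(description))
```

Whole-word matching keeps "spammer" from triggering the spam bag, which matters because spammer is a separate Tier 2 sub-category in Table 5.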

3.3 Main Threat Attributes

As stated previously, the volume of shared information is one of the TIPs’ limitations (see Section 2.6). This limitation was observed during the analysis of our dataset in the following formats:

  • Events containing collections of IoCs from multiple attacks: Most of these events contain IoCs with few or no correlations. For example, some of these events contain lists of malicious IPs whose primary purpose is to serve as input for a detection or prevention component. Since these events contain long lists of attributes with little to no shared context, we discarded them from further analyses, which does not negatively impact our results. In total, 17 of the 1,168 events were discarded.

  • Events with too many attributes: Twenty percent of our dataset contained events with more than 100 attributes. From the point of view of a security analyst, the more attributes an event has, the more difficult it is to analyze.

To discover the most prevalent attributes that underlie an incident category (i.e., the main threat attributes), the following analyses cover both the events with up to 100 attributes and those with too many attributes. For the latter, we intend to understand why they have so many attributes and which important information might be extracted from them. The three analyses therefore break down the results by number of attributes, aiming to differentiate the results from smaller and bigger events and consequently determine the main attributes. For this purpose, four attribute intervals were considered: I1 (less than or equal to 100), I2 (between 100 and 500), I3 (between 500 and 1,000), and I4 (greater than 1,000).
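
The four intervals can be expressed as a simple bucketing function. This is a minimal sketch; the function name is hypothetical, and treating 500 and 1,000 as the inclusive upper bounds of I2 and I3 is an assumption, since the text only fixes the boundary at 100.

```python
def attribute_interval(n_attributes: int) -> str:
    """Map an event's attribute count to one of the intervals I1 to I4.

    Assumption: upper bounds are inclusive (I2 = 101..500, I3 = 501..1000).
    """
    if n_attributes < 0:
        raise ValueError("attribute count cannot be negative")
    if n_attributes <= 100:
        return "I1"
    if n_attributes <= 500:
        return "I2"
    if n_attributes <= 1000:
        return "I3"
    return "I4"


for count in (12, 100, 101, 750, 4200):
    print(count, attribute_interval(count))
```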

3.3.1 Distribution of Events by Attributes.

This first analysis was based on the distribution of events across the four attribute intervals. However, since we aim to find the attributes that best characterize an incident category, it was necessary to determine which events are classified as an incident and which are not, and to distribute them across the intervals. We used the public taxonomies’ tags to classify each event according to UT. More precisely, each tag of each event was compared with the public tags mapped in UT and, when matched, the event was classified according to the corresponding Tier 1 category of UT. The 691 events tagged with an incident category were correctly classified in UT, whereas the remaining 460 (out of 1,151) were not classified because they had no incident-related classification tags and therefore matched no taxonomy. A total of 666 of the classified events fall into the first two intervals, with 550 events in I1 and 116 in I2. Note that some events were classified with more than one Tier 1 category because they had more than one public tag corresponding to different UT categories.

3.3.2 Identification of Similar Attribute Types.

Due to the high amount of MISP-supported attribute types, a second analysis was made to identify attributes with similar types (i.e., properties) and aggregate them. For example, both MD5 and SHA1 attributes are hash values that are used as a checksum to verify data integrity, so they will be aggregated into the same group named file hash. By aggregating similar types of attributes, the results of the subsequent analysis will be focused on the characteristics of the attributes and not only on their type, meaning that even if our dataset only has attributes with the type MD5, attributes with the type SHA1 will not be discarded from the results since they belong to the same group.
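
The aggregation of similar attribute types can be sketched as a lookup table. Only the MD5/SHA1 to file hash grouping comes from the text; the remaining assignments, the `ungrouped` fallback, and the function names are assumptions made for illustration, using MISP attribute type names and the group names that appear in the analysis results.

```python
from collections import Counter

# Assumption: illustrative assignment of MISP attribute types to groups.
# Only md5 and sha1 -> "File hash" is stated in the text.
TYPE_TO_GROUP = {
    "md5": "File hash",
    "sha1": "File hash",
    "sha256": "File hash",
    "ip-src": "Network address",
    "ip-dst": "Network address",
    "domain": "Network name",
    "hostname": "Network name",
    "url": "URL",
    "filename": "File name",
}


def attribute_group(attr_type: str) -> str:
    """Return the group of an attribute type, or 'ungrouped' if unknown."""
    return TYPE_TO_GROUP.get(attr_type.lower(), "ungrouped")


def group_counts(attr_types) -> Counter:
    """Count attributes per group rather than per individual type."""
    return Counter(attribute_group(t) for t in attr_types)


print(group_counts(["md5", "SHA1", "ip-dst", "url", "yara"]))
```

Counting by group means a dataset that only contains MD5 values still yields a result for the whole file hash group, which is the property the paragraph above relies on.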

3.3.3 Identification of Threat Main Attributes.

This analysis had the objective of identifying the most predominant attribute groups for each Tier 1 category, based on the previous two analyses. The four intervals of the number of attributes were considered cumulatively. This means that the first cumulative interval (CI1) is equal to I1, the second cumulative interval (CI2) contains all events with up to 500 attributes (i.e., I1 and I2), and so on. Table 7 shows the results of this analysis, that is, the most predominant attribute groups for each Tier 1 category of UT. The complete tables can be found in Appendix C [24].


Table 7. The Most Predominant Attribute Groups for Tier 1 Categories of the Unified Taxonomy
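
The cumulative analysis can be sketched as follows. This is a toy illustration, not the authors' computation: the `group_shares` function, the event layout, and the two sample events are hypothetical, and each attribute is assumed to count once toward its group.

```python
from collections import Counter, defaultdict

# Upper attribute-count bound of each cumulative interval.
CUMULATIVE_LIMITS = {"CI1": 100, "CI2": 500, "CI3": 1000, "CI4": float("inf")}


def group_shares(events, limit):
    """Percentage of each attribute group per Tier 1 category.

    Only events with at most `limit` attributes are considered.
    """
    counts = defaultdict(Counter)
    for event in events:
        groups = event["groups"]  # one group label per attribute
        if len(groups) <= limit:
            counts[event["tier1"]].update(groups)
    shares = {}
    for tier1, counter in counts.items():
        total = sum(counter.values())
        shares[tier1] = {g: round(100 * n / total) for g, n in counter.items()}
    return shares


# Hypothetical data: one small event and one very large event.
events = [
    {"tier1": "Information-gathering",
     "groups": ["File hash", "File hash", "Network name", "URL"]},
    {"tier1": "Information-gathering",
     "groups": ["Network name"] * 150},
]

ci1 = group_shares(events, CUMULATIVE_LIMITS["CI1"])
ci4 = group_shares(events, CUMULATIVE_LIMITS["CI4"])
print(ci1["Information-gathering"])
print(ci4["Information-gathering"])
```

In this toy data the single 150-attribute event pushes Network name from 25% in CI1 to 98% in CI4, the same kind of skew from large events that the analysis has to account for.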

As expected, events with more attributes have a higher impact on the statistical results because the weight of an event is directly proportional to its number of attributes. This can be confirmed from the results presented in the table. As a result, when the analysis was performed over all classified events (the CI4 interval), some of the results differed significantly from those restricted to events with fewer than 100 attributes. For example, for the information-gathering Tier 1 category, the attribute group Network name accounts for 12% of all groups when the analysis covers only events with fewer than 100 attributes, but for 61% when all classified events are included (CI4). Since most of our dataset comprises events with fewer than 100 attributes, we have higher trust in the results gathered from those. Thus, we opted to use the results from the CI1 (or I1) interval. A more detailed analysis of this interval for all Tier 1 categories showed that four attribute groups are present in every category, namely Network address, File hash, Other Info, and File name. In addition, the attribute groups URL and Network name are present in all categories except Vulnerable and information-content-security. This information will be used to improve the global quality of the events by keeping only the most important attributes of each category.
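
A minimal sketch of how these findings could drive attribute reduction is given below. It only encodes the group membership stated in the paragraph above; the `main_groups` and `trim` functions, the attribute layout, and the sample values are hypothetical and do not describe how the Trimmer module is actually implemented.

```python
# Groups reported as present in every Tier 1 category (CI1 interval).
COMMON_GROUPS = {"Network address", "File hash", "Other Info", "File name"}
# Groups reported as present in all categories except the two listed below.
EXTRA_GROUPS = {"URL", "Network name"}
CATEGORIES_WITHOUT_EXTRA = {"Vulnerable", "information-content-security"}


def main_groups(tier1: str) -> set:
    """Attribute groups treated as main threat attributes for a category."""
    if tier1 in CATEGORIES_WITHOUT_EXTRA:
        return set(COMMON_GROUPS)
    return COMMON_GROUPS | EXTRA_GROUPS


def trim(attributes, tier1: str):
    """Keep only the attributes whose group is a main group for `tier1`."""
    allowed = main_groups(tier1)
    return [attr for attr in attributes if attr["group"] in allowed]


# Hypothetical attributes; "Payload" is an invented group name.
attributes = [
    {"value": "hash-1", "group": "File hash"},
    {"value": "url-1", "group": "URL"},
    {"value": "blob-1", "group": "Payload"},
]
print(trim(attributes, "Fraud"))
print(trim(attributes, "Vulnerable"))
```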

3.4 OSINT References to External Platforms

Another key finding from our dataset was the large number of references to external platforms in the form of links, namely 5,325 links from 228 domains. More than 90% of the links pointed to VirusTotal,5 an online service that analyzes files and URLs, enabling the detection of viruses, worms, trojans, and other kinds of malicious content using antivirus engines and website scanners. Additionally, platforms like VirusTotal tend to provide APIs to access information without using the website interface. However, the number of these references increases the time an analyst requires to analyze an event, since the analyst needs to jump between platforms to gather information and process it manually. We consider this a TIP limitation (identified neither in Section 2.6 nor by other works [13, 14, 44]) that can easily be turned into a benefit, and it is considered in our proposed solution.
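
Counting references per external platform amounts to grouping links by domain. The sketch below shows one way to do this with the Python standard library; the helper names and the sample links are hypothetical, and stripping a leading "www." is an assumption about how domains should be merged.

```python
from collections import Counter
from urllib.parse import urlparse


def link_domain(link: str) -> str:
    """Return the lowercased host of a link, without a leading 'www.'."""
    host = urlparse(link).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_counts(links) -> Counter:
    """Count links per domain."""
    return Counter(link_domain(link) for link in links)


# Hypothetical links standing in for references found in event attributes.
links = [
    "https://www.virustotal.com/gui/file/aaa",
    "https://virustotal.com/gui/url/bbb",
    "https://example.org/report.html",
]
counts = domain_counts(links)
print(counts.most_common())
print(round(100 * counts["virustotal.com"] / len(links)))
```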


4 AUTOMATED EVENT CLASSIFICATION AND CORRELATION PLATFORM

This section presents the overall design of AECCP, our proposed solution that aims to improve the quality of the threat intelligence produced by TIPs by classifying and enriching it automatically. The solution is composed of four core modules, each focused on one or more of the limitations observed in our data analysis (Section 3) and some of those presented in Section 2.6, and a fifth module that interconnects the other four and manages all of AECCP’s operations.

Regarding the limitation related to the volume of shared information, we propose an approach to reduce the number of attributes per event based on the most predominant attributes of its category, which were determined in Section 3.3. Moreover, for incident taxonomy management, we propose to classify every event according to the unified taxonomy defined in Section 3.2. Since AECCP analyzes and classifies events in an automated way, it also increases technology enablement in threat triage. Furthermore, we propose a solution to enrich the data quality of an event based on OSINT from the VirusTotal platform. To increase the advanced analytic capabilities of MISP, we propose to create new events as clusters of enriched events from the same threat that have related attributes in common, after a correlation process that looks for relationships between attributes of different events. Table 8 lists the limitations addressed in AECCP, the proposed solution for each one, the AECCP module that implements the solution, and the section in which it is presented. For a better understanding of the solutions, we first present the symbolic representation of an event used in the following sections (Section 4.1), and in Section 4.2 we give an overview of the platform, showing the workflow and interactions between the four modules.

ID | Limitation | Solution | Module | Section
LT10 | Diverse data formats | Every event will be normalized to a standard format | Classifier | 4.3
LT7 | Threat knowledge management limitations | Every event will be classified according to the unified taxonomy defined in Section 3.2 | Classifier | 4.3
LT2, LT5 | Limited technology enablement in threat triage; Limited advanced analytics capabilities and tasks automation | The classification of each event will be automated, based on its data (description of the attack, antivirus reports, etc.) | Classifier | 4.3
LT1 | Shared threat information is too voluminous | Each event will have a simplified view only containing the most predominant attributes stated in Section 3.3 | Trimmer | 4.4
LT3, LT8, LT9 | Data quality; Focus on tactical IoCs; Trust-related issues | Events containing links to VirusTotal will be enriched with information provided by the platform. Additionally, events containing hashes and URLs will also be enriched using the same method | Enricher | 4.5
LT4, LT6, LT11 | Limited analytics capabilities; Focus on data collection; Shared intelligence without expiration date | When at least 2 events from the same category have an attribute in common, a cluster will be created to help an analyst identify related events and to be included in network defense mechanisms | Clusterer | 4.6

Table 8. Addressed Limitations and Correspondent Proposed Solutions

4.1 Symbolic Representation of an Event

A TIP’s event can be represented by the tuple \( E_x = \, \lt d, ot, T, A, R\gt \), identified by x and where d is its description, \( T = \lbrace NULL | T_1\ldots T_n\rbrace \) represents the public taxonomy tags that classify it into malicious threat categories and custom tags created by SOCs, for example, to identify the event within the organization; \( A = \lbrace A_1\ldots A_m\rbrace \) represents the attributes, ranging from 1 to m, that characterize the event; and \( R = \lbrace NULL | (A_i, A_j)\ldots (A_u, A_v)\rbrace \) represents the relations between attributes. For example, \( (A_1, A_2) \) represents the relation between \( A_1 \) and \( A_2 \) attributes. If the event is not yet classified and there is no relation between their attributes, NULL is used to indicate this. Finally, all of the other data of an event with minor relevance for this work will be compacted into the field ot.

AECCP follows this event representation, but the elements of AECCP’s events are sets associated with UT, main and enriched attributes, and their relations. We denote by \( ^uE_x = \, \lt d, ^uT, ^uA, ^uR\gt \) the AECCP event that results when the platform processes \( E_x \), and we use the following nomenclature: \( ^uT = \lbrace {^uT_1}\ldots {^uT_m}\rbrace \) is the set of UT tags that classify the event, and \( ^uA = \lbrace ^gA, ^eA\rbrace \) is the set of attributes that characterize the event, which can be main threat attributes (\( ^gA = \lbrace ^gA_1\ldots ^gA_j\rbrace \)) and enriched attributes (\( ^eA = \lbrace ^eA_1\ldots ^eA_v\rbrace \)). An \( ^eA_j \) attribute is the result of enriching a \( ^gA_j \) attribute, that is, a \( ^gA_j \) attribute is enriched with external information from VirusTotal and with antivirus information associated with the VirusTotal result (resulting in \( ^eA_j \)). \( ^uR = R(^uA) \) denotes the relations between attributes from \( ^uA \). In addition, we denote by \( ^uC_y \) the cluster resulting from the correlation and aggregation tasks performed by AECCP over \( ^uE \) events.
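
The two tuples can be mirrored as plain data structures. The following is a minimal sketch for orientation only: the class names, the field types, and the sample values are assumptions, with field names chosen to follow the symbols above and `None` standing for NULL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Relation = Tuple[str, str]  # a pair (A_i, A_j) of related attributes


@dataclass
class Event:
    """TIP event E_x = <d, ot, T, A, R>."""
    d: str                        # description
    ot: Dict[str, str]            # other data of minor relevance
    T: Optional[Set[str]]         # public/custom tags, None if unclassified
    A: List[str]                  # attributes A_1 .. A_m
    R: Optional[Set[Relation]]    # relations, None if there are none


@dataclass
class AeccpEvent:
    """AECCP event uE_x = <d, uT, uA, uR>, with uA split into gA and eA."""
    d: str
    uT: Set[str]                                      # UT tags
    gA: List[str]                                     # main threat attributes
    eA: List[str] = field(default_factory=list)       # enriched attributes
    uR: Set[Relation] = field(default_factory=set)    # relations over uA

    @property
    def uA(self) -> List[str]:
        """All attributes of the event: main plus enriched."""
        return self.gA + self.eA


# Hypothetical values for illustration.
raw = Event(d="Port scan report", ot={}, T=None, A=["ip-1", "note-1"], R=None)
processed = AeccpEvent(
    d=raw.d,
    uT={'unified:information-gathering="scanning"'},
    gA=["ip-1"],
    eA=["ip-1-report"],
    uR={("ip-1", "ip-1-report")},
)
print(processed.uA)
```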

4.2 AECCP Overview

AECCP is a platform that interacts with TIPs (e.g., MISP) to generate new events with higher-quality threat intelligence. In other words, it classifies, enriches, and correlates the events received by TIPs, and does all of this work in an automated manner. The platform is composed of five modules (Classifier, Trimmer, Enricher, Clusterer, and Orchestrator), of which the first four together perform all of the work and the last coordinates the workflow between them. Figure 1 depicts an overview of its architecture and the workflow between the four modules:


Fig. 1. Overview of AECCP.

  • An event \( E_a \) from the TIP database (e.g., MISP) serves as input to the Classifier module, without any pre-processing by the TIP. The module classifies each event according to UT. To get the most accurate classification, \( E_a \) is first normalized to a standard format and is then classified only according to the Tier 1 categories of UT. Afterward, the event is updated with Tier 1 tags (\( ^uT \) tag set), transforming it into \( E_{a^{\prime }} \).

  • The Trimmer module aims at reducing the volume of attributes in an event based on the relevancy of those attributes. The module receives \( E_{a^{\prime }} \), iterates over its attributes, and creates \( ^uE_a \), an AECCP event with the most relevant attributes \( ^uA_i \) and the \( ^uT \) tag set from \( E_{a^{\prime }} \).

  • The new event (\( ^uE_a \)) is then sent to the Enricher module to enrich it with information from VirusTotal. In this module, \( ^uA \) attributes in the event containing URLs or hashes are updated with information from VirusTotal. Additionally, for each \( ^uA_i \) attribute that was updated (enriched), the module adds an associated enriched attribute to the event. This new attribute holds the output of the antivirus engines, website scanners, and analysis tools that allowed the update. Finally, \( ^uE_a \) is updated with both attributes and their relationship (\( R(^uA) \)).

  • \( ^uE_a \) is now reprocessed by the Classifier module, but this time according to the Tier 2 category of UT. Since the event was enriched (by the Enricher) with information not existent at the beginning of the processing, the Classifier module can classify the event more accurately. In this step, the Tier 1 \( ^uT_x \) tags are updated with Tier 2 \( ^uT_x._y \) tags (e.g., [\( unified:^uT_1=^uT_1._2 \)]).

  • The Clusterer module aims at creating clusters of events that share the same threat category and have at least an \( ^uA_i \) attribute in common. Other events that share at least one Tier 2 \( ^uT_x._y \) with \( ^uE_a \) and have at least one valuable attribute \( ^uA_i \) (attributes that provide context to a specific attack) in common with \( ^uE_a \) are clustered in a new cluster event \( ^uC_i \). Moreover, this module is recursive, meaning that it tries to find other events related to every event added to the cluster. Additionally, multiple new \( ^uC_i \) can be created by Clusterer if \( ^uE_a \) has more than one distinct Tier 2 category tag.

Both results provided by the second pass of Classifier and Clusterer can be integrated into defense mechanisms (e.g., firewalls, IDS, IPS, and SIEMs) installed in the organization’s IT infrastructure to protect the organization from cyber-attacks.
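
The recursive behavior described for the Clusterer can be sketched as a breadth-first expansion. This is a simplified illustration under stated assumptions, not the module's implementation: the function names and the event layout are hypothetical, every attribute is treated as valuable, and Tier 2 tags are abbreviated to their values.

```python
from collections import deque


def build_cluster(seed_id, tier2_tag, events):
    """Collect events reachable from the seed under one Tier 2 tag.

    An event joins the cluster if it carries the tag and shares at least
    one attribute with an event already in the cluster.
    """
    cluster = {seed_id}
    queue = deque([seed_id])
    while queue:
        current = events[queue.popleft()]
        for other_id, other in events.items():
            if other_id in cluster:
                continue
            if tier2_tag in other["tags"] and current["attrs"] & other["attrs"]:
                cluster.add(other_id)
                queue.append(other_id)  # look for relatives of this event too
    return cluster


def clusters_for(seed_id, events):
    """One cluster per Tier 2 tag of the seed, kept only if it has 2+ events."""
    result = {}
    for tag in sorted(events[seed_id]["tags"]):
        cluster = build_cluster(seed_id, tag, events)
        if len(cluster) >= 2:
            result[tag] = cluster
    return result


# Hypothetical events: e3 is only reachable through e2.
events = {
    "e1": {"tags": {"ransomware"}, "attrs": {"hash-1", "ip-1"}},
    "e2": {"tags": {"ransomware"}, "attrs": {"ip-1", "domain-2"}},
    "e3": {"tags": {"ransomware"}, "attrs": {"domain-2"}},
    "e4": {"tags": {"phishing"}, "attrs": {"hash-1"}},
    "e5": {"tags": {"ransomware"}, "attrs": {"ip-9"}},
}
print(clusters_for("e1", events))
```

In the example, e4 shares an attribute with e1 but not its category, and e5 shares the category but no attribute, so neither joins the cluster.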

Figure 2 presents the detailed workflow within and between the four modules. The following four sections are dedicated to each module to describe its operation in detail.

Fig. 2. The detailed workflow within and between the modules of the AECCP.

4.3 Automated Event Classification

As explained in Section 3.2, the high diversity of classification tags can be a disadvantage from the point of view of threat knowledge management (LT7). Furthermore, the diversity of data formats that OSINT can take (LT10) can have a negative impact on this management, making OSINT processing difficult. Additionally, due to this diversity, most events must be manually analyzed to identify their categories and classify them as such. Since most threat triage and prioritization processes rely on the event category (LT2), this manual process creates an unwanted delay in the subsequent processes (LT3). To reduce these limitations, AECCP comprises the Classifier module, which automatically classifies events according to the UT once their data format has been normalized, based on the tag, description, and attribute information of the TIP’s events. To do so, the Classifier module resorts to two methods: classification based on public taxonomies tags and classification based on keywords.

Regarding the first method, Classifier takes advantage of the mapping information from Table 6 to map every public taxonomy tag \( T_i \) to a UT tag \( ^uT_i \). In other words, each TIP’s event will have its tags scanned and matched against the UT mapping table. When matched, the corresponding UT tag \( ^uT_i \) is added to the \( ^uT \) list, if not already in the list. In the end, the T tag list of the event is updated with the \( ^uT \) list that was found. For example, if an event has two public tags related to the same threat category (e.g., the tags [cert-xlm:information-gathering=“scanner”] and [circl:incident-classification=“scan”]), the UT tag [unified:information-gathering=“scanning”] will be added to the \( ^uT \) tag list once, and then this list will be added to the T list. Note that the \( ^uT \) tag follows the same scheme as tags from public taxonomies (i.e., [taxonomy:Tier1 tag=“Tier2 value”]). We identify the UT by the taxonomy name unified.
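The first method can be sketched as a dictionary lookup with de-duplication. This is a minimal illustration, not the authors' implementation: the `UT_MAPPING` dictionary is a two-entry stand-in for Table 6 built from the example above, and the function name and the `tlp:white` tag are hypothetical.

```python
# Illustrative excerpt of the public-taxonomy-to-UT mapping (stand-in for Table 6).
UT_MAPPING = {
    'cert-xlm:information-gathering="scanner"': 'unified:information-gathering="scanning"',
    'circl:incident-classification="scan"': 'unified:information-gathering="scanning"',
}


def classify_by_taxonomy(event_tags, mapping=UT_MAPPING):
    """Return the event's tag list (T) extended with the UT tags it maps to."""
    ut_tags = []
    for tag in event_tags:
        ut_tag = mapping.get(tag)
        # Add each UT tag only once, even if several public tags map to it.
        if ut_tag is not None and ut_tag not in ut_tags:
            ut_tags.append(ut_tag)
    # The T list of the event is updated with the UT list that was found.
    return list(event_tags) + [t for t in ut_tags if t not in event_tags]


tags = [
    'cert-xlm:information-gathering="scanner"',
    'circl:incident-classification="scan"',
    "tlp:white",  # hypothetical non-classification tag
]
result = classify_by_taxonomy(tags)
```

Here the two overlapping public tags yield a single `unified:information-gathering="scanning"` entry in `result`.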

For the second method, the Classifier module uses the bag of words from the last column of Table 6 to identify keywords related to a UT category based on the information contained in the description, attributes, and custom tags (tags that do not belong to a public taxonomy) of the TIP’s events. As mentioned previously, some events hold important details in their descriptions that can help an analyst identify the category of the incident. Moreover, it is also possible to gather important information from attributes and custom tags of an event to better classify it. Therefore, events will also have their custom tags, description, and attributes scanned and matched against the bag of words. When matched, the related UT tag \( ^uT_i \) is added to the \( ^uT \) tags list, if not already in the list. Later, this list will be added to the T list. Unlike the first method, this method can classify events that were not tagged yet (i.e., without classification tags; \( T=NULL \)). As an example, if the word phishing is found in the description of an event with no public taxonomy tags, the event will be updated to contain the \( ^uT_i \) tag [unified:fraud=“phishing”] in its \( ^uT \) list.
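The keyword-based method can be sketched as follows. The bag of words shown is a hypothetical two-entry stand-in for the last column of Table 6 (only the phishing keyword comes from the example above), and the function signature is an assumption made for illustration.

```python
import re

# Hypothetical bag of words; the real one is the last column of Table 6.
BAG_OF_WORDS = {
    'unified:fraud="phishing"': {"phishing"},
    'unified:availability="dos-or-ddos"': {"ddos", "flood"},
}


def classify_by_keywords(description, attributes, custom_tags, ut_tags=None):
    """Scan description, attributes, and custom tags for UT keywords."""
    ut_tags = list(ut_tags or [])
    text = " ".join([description, *attributes, *custom_tags]).lower()
    words = set(re.findall(r"[a-z0-9-]+", text))
    for ut_tag, keywords in BAG_OF_WORDS.items():
        # Add the UT tag once if any of its keywords occurs in the event.
        if words & keywords and ut_tag not in ut_tags:
            ut_tags.append(ut_tag)
    return ut_tags


# An event with no public taxonomy tags (T = NULL) can still be classified.
found = classify_by_keywords("Phishing campaign targeting bank customers.", [], [])
```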

Each event is processed two times by the Classifier module, in steps 1 and 4 of Figure 2, each time according to a different UT Tier. In step 1, the module classifies \( E_a \) according to Tier 1 and updates it with the Tier 1 \( ^uT \) tags it found, thus resulting in \( E_{a^{\prime }} \). This step uses the two classification methods described previously. In step 4, the Classifier module updates the \( ^uT \) tags determined in step 1, but now according to Tier 2. It uses the classification based on keywords method, but now it resorts to information derived from the processing of the Trimmer and Enricher modules (see the next two sections), which add information that was not part of the initial event (\( E_a \)), namely the main attributes (\( ^gA \)) and the enriched attributes (\( ^eA \)), respectively. This information is matched against the bag of words for each Tier 1 category already found, obtaining the Tier 2 value associated with each Tier 1 category. In addition, new Tier 1 \( ^uT_i \) tags can be found during the analysis if those attributes contain information that allows this. Afterward, the Tier 1 tags from the \( ^uT \) list are updated with Tier 2 tags, in the form [unified:\( ^uT_i \) (Tier 1) = \( ^uT_i._j \) (Tier 2)] (e.g., [unified:fraud=“phishing”]).
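The step 4 refinement can be illustrated with a small sketch. The Tier 2 keyword table, the function name, and the antivirus label in the example are all hypothetical. The text also does not say what happens to a Tier 1 category that finds no Tier 2 match, so this sketch simply reports such categories as unresolved.

```python
# Hypothetical Tier 2 bag of words, grouped by Tier 1 category.
TIER2_KEYWORDS = {
    "fraud": {"phishing": {"phishing", "credential"}},
    "malicious-code": {"ransomware": {"ransomware"}, "trojan": {"trojan"}},
}


def classify_tier2(tier1_categories, main_attrs, enriched_attrs):
    """Match main (gA) and enriched (eA) attribute text against Tier 2 keywords."""
    text = " ".join([*main_attrs, *enriched_attrs]).lower()
    tier2_tags, matched = [], set()
    # All categories are scanned, so new Tier 1 categories can also be found.
    for category, values in TIER2_KEYWORDS.items():
        for value, keywords in values.items():
            if any(keyword in text for keyword in keywords):
                tier2_tags.append(f'unified:{category}="{value}"')
                matched.add(category)
    # Assumption: Tier 1 categories with no Tier 2 match are reported separately.
    unresolved = [c for c in tier1_categories if c not in matched]
    return tier2_tags, unresolved


tags, unresolved = classify_tier2(
    ["malicious-code"],
    ["invoice.exe"],
    ["engine-x: Trojan.Win32.Agent"],  # hypothetical antivirus label
)
```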

As a final remark, if \( E_a \) cannot be classified into a Tier 1 category in step 1 due to lack of information, the event proceeds without \( ^uT \) tags, since the subsequent modules will enrich it with additional information. Step 4 then reprocesses it and classifies it according to the Tier 1 and Tier 2 categories. If it still cannot be classified, the event exits the pipeline and is not processed by the remaining modules.

Algorithm 1 represents the main logic behind Classifier, where the processing of each event is split into Tier 1 classification (step 1, lines 1 through 3) and Tier 2 classification (step 4, lines 5 through 9) based on the state of the event that was passed into the Classifier module. For each tier classification, the functions classifyTier1 and classifyTier2 are called. The classifyTier1 function (presented in Algorithm 2) uses the Public Taxonomy Mapping (lines 5 through 8) and the Bag of Words (lines 9 through 16) for discovering the \( ^uT_i \) Tier 1 tags. Algorithm 3 shows the logic behind the classifyTier2 function, which also uses the same repositories for processing the information of step 4.

4.4 Event Simplification

The amount of shared information derived from events with too many attributes (LT1) was another limitation verified in Section 3.3. Both manual and automated analyses of events are impacted by unnecessary information. This type of information mainly acts as good to know, as opposed to need to know, creating noise and consequently adding complexity to the event. To minimize this limitation, we propose the Trimmer module. Trimmer automatically trims the less relevant attributes (i.e., good to know information) from events, based on their UT Tier 1 tags and on the predominant attributes resulting from the analysis presented in Section 3.3.

Each event given as input to the module will have its attributes scanned and mapped to the attribute groups. Afterward, based on the Tier 1 tags and a global relevancy threshold defined by the security analyst for each attribute group (e.g., 10%), if the attribute under analysis belongs to a group whose relevance, according to the results of Table 7, is greater than the threshold, the attribute is marked as a main threat attribute. An event with no Tier 1 \( ^uT \) tags is processed as if it had all Tier 1 \( ^uT \) tags, thus not losing any predominant attributes. Finally, if both attributes of an event’s relation are considered main threat attributes, the relation is added to the final event (i.e., to \( ^uE_a \)). This verification and addition are made for every relation the event contains.

In summary, the module receives \( E_{a^{\prime }} \) as input, identifies its main attributes and the relations between them, and then creates the \( ^uE_a \) event with the description of \( E_{a^{\prime }} \), the \( ^uT \) tags, the list \( ^gA \) of main attributes, and their relations (\( R(^gA) \)). Algorithm 4 shows the logic behind this module, which follows the process described throughout this section.
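The trimming rule described in this section can be sketched as below. The relevance percentages are invented placeholders for Table 7, and the attribute group names, data layout, and function name are assumptions made for illustration only.

```python
# Hypothetical relevance (in %) of each attribute group per Tier 1 category,
# standing in for the results of Table 7.
RELEVANCE = {
    "fraud": {"network": 45.0, "file": 8.0, "other": 2.0},
    "malicious-code": {"network": 30.0, "file": 55.0, "other": 3.0},
}


def trim(attributes, relations, tier1_categories, threshold=10.0):
    """Keep main threat attributes and the relations between them."""
    # An event with no Tier 1 tags is treated as if it had all of them.
    categories = tier1_categories or list(RELEVANCE)
    kept = {
        name: attr
        for name, attr in attributes.items()
        if any(
            RELEVANCE.get(c, {}).get(attr["group"], 0.0) > threshold
            for c in categories
        )
    }
    # A relation survives only if both of its attributes were kept.
    kept_relations = [(a, b) for a, b in relations if a in kept and b in kept]
    return kept, kept_relations


attrs = {
    "url": {"group": "network", "value": "http://example.test/login"},
    "sample": {"group": "file", "value": "invoice.exe"},
    "note": {"group": "other", "value": "seen on Monday"},
}
rels = [("url", "sample"), ("sample", "note")]
main_fraud, rel_fraud = trim(attrs, rels, ["fraud"])
main_untagged, rel_untagged = trim(attrs, rels, [])
```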

4.5 OSINT-based Event Enrichment

As explained in Section 3.4, more than 90% of the links contained in events pointed to the VirusTotal online platform. The references to external platforms increase the time an analyst requires to analyze an event, since the analyst needs to jump manually between platforms to gather information. Moreover, enriching events with additional information gathered from external sources can significantly improve other processes and tasks (LT3, LT8) if this information is related to a predominant attribute group (a main threat attribute) (LT9).

AECCP integrates an event Enricher module that takes advantage of the references to external platforms to improve the quality of the threat intelligence in events. Hence, the module automatically enriches events whose main threat attributes contain links to VirusTotal, URLs, or file hashes.

Algorithm 5 illustrates the data flow of this module, which follows the process presented next. Each \( ^uE_a \) event processed by Enricher will have its \( ^gA \) main attributes scanned. If any of these attributes contains a URL or file hash, the attribute is parsed to extract it. In addition, since VirusTotal links contain IoCs in the target URL, they are also extracted by the same procedure. For each extracted IoC (URL or file hash), a request is sent to VirusTotal, and a report is received containing the results of the best-known antivirus engines, website scanners, and analysis tools regarding that IoC. This information will update those \( ^gA_i \) attributes with URLs and file hashes, transforming them into enriched attributes, \( ^eA_i \). Additionally, complementary information can be received, such as hashes computed with different hashing algorithms. Such information is also stored in \( ^eA_i \) attributes, and a relationship between them is created (denoted by \( R(^eA_i) \)).
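The IoC extraction step can be sketched with regular expressions. This is an illustration under stated assumptions: the patterns cover only MD5, SHA-1, and SHA-256 hex digests, the VirusTotal link pattern is a simplification, and `lookup` is a hypothetical callable standing in for the VirusTotal request (no real API call is made here).

```python
import re

# Hex digests of MD5 (32), SHA-1 (40), or SHA-256 (64) length.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
URL_RE = re.compile(r"https?://\S+")
VT_LINK_RE = re.compile(r"https?://(?:www\.)?virustotal\.com/")


def extract_iocs(value):
    """Return the URLs and file hashes found in one attribute value."""
    iocs = []
    for url in URL_RE.findall(value):
        if VT_LINK_RE.match(url):
            # VirusTotal links carry the IoC inside the target URL.
            iocs.extend(HASH_RE.findall(url))
        else:
            iocs.append(url)
    # Hashes that appear outside of URLs.
    iocs.extend(HASH_RE.findall(URL_RE.sub(" ", value)))
    return iocs


def enrich(main_attrs, lookup):
    """Build enriched attributes and their relations to the main attributes."""
    enriched, relations = {}, []
    for name, value in main_attrs.items():
        for ioc in extract_iocs(value):
            report = lookup(ioc)  # hypothetical stand-in for a VirusTotal request
            if report:
                enriched[(name, ioc)] = report
                relations.append((name, ioc))
    return enriched, relations
```

Passing `lookup` as a parameter keeps the sketch independent of any particular client library; a real deployment would replace it with a function that queries VirusTotal.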

4.6 Event Clustering

Creating correlations between events is one key feature that helps SOC analysts identify threats with similarities, such as source, target, payload, threat actor, and used tools. However, as mentioned previously, most TIPs have limited advanced analytics capabilities (LT4) related to event correlation. MISP has its built-in correlation algorithm that allows an analyst to identify events that have attributes in common. However, this algorithm relies on the values of the attributes and on one key piece of information, a flag, that specifies whether that attribute can be correlated. This flag is set manually and, if not appropriately used, negatively impacts the correlation of events. For example, if a user adds an attribute to an event that indicates that the payload was sent over HTTP, the correlation of this attribute with attributes from other events will mostly be useless since many attacks use HTTP to send the payload. Therefore, we must know which attributes should be flagged as correlation information and why some attributes should not be flagged as such. Thus, it is crucial to manage event correlation properly. Moreover, this built-in algorithm does not use the information related to the event category, creating relations between events without context.

The AECCP aims to improve the analytic capabilities (LT4) of TIPs, namely the event correlation capabilities, turning TIPs into more than a data collector and repository (LT6). For that, it contains the Clusterer module for automatically creating clusters of events that share the same incident category and have at least one valuable main attribute in common (attributes that provide context to a specific attack, e.g., hashes). The resulting clusters are AECCP events that combine information about the same attack and that can be shared in a timely manner with external entities and used in defense mechanisms (LT11).

Hence, each event received by Clusterer will have its main attributes scanned, looking for connection points with other events. For each scanned attribute, if its content does not add value when correlated, it will be skipped. Attribute contents such as Booleans, dates, and small sets of possible values like HTTP methods fit in this case because multiple unrelated events have them in common. A concrete example of this case is an HTTP flood attack, which is categorized in UT as [unified:availability=“dos-or-ddos”], and an intrusion using an unknown exploit, categorized as [unified:intrusion-or-attempts=“unknown-exploit”]. Both attacks could be carried out using the HTTP GET method, but sharing this attribute content (HTTP GET) does not imply that the two events are related. However, if the scanned attribute adds value when correlated, a search is made over the database of events to identify other events that contain the same attribute. If the event correlates with at least one other event and both share a \( ^uT_i \) tag, a cluster is created. The resulting cluster contains the \( ^uT_i \) tag shared by the events that compose the cluster, as well as all of their attributes. Finally, all events that compose the cluster are added as attributes and, for each, relations are created with the attributes obtained from the corresponding source events.

In Figure 2, we can see the transformation of event \( ^uE_a \) processed by Clusterer. When processed, attributes from the \( ^gA \) and \( ^eA \) lists are scanned to identify valuable attributes (attributes that provide context to a specific attack). With \( ^gA_x \) being a valuable attribute, a search is made over the \( ^uE \) events database to identify other events with \( ^gA_x \). With \( ^uE_b \) being an event that contains \( ^gA_x \) in common with \( ^uE_a \), \( ^uT \) tags from \( ^uE_a \) and \( ^uE_b \) are scanned to find at least one UT tag in common. With \( ^uT_i \) being a common tag for both events, the \( ^uC_{ab} \) cluster is created with the tag \( ^uT_i \). Furthermore, all attributes from \( ^uE_a \) and \( ^uE_b \) are added to the cluster, where for those valuable attributes in common (i.e., that formed the cluster), their contents are concatenated (e.g., \( ^gA_x = [^uE_a(^gA_x)||^uE_b(^gA_x)] \)). Additionally, \( ^uE_a \) and \( ^uE_b \) are also added as attributes to avoid losing the original events that generated the cluster, and relations are created between them. In Section 5.2.4, a real example is provided to better understand the Clusterer output.

Algorithm 6 shows the dataflow of Clusterer explained earlier. In lines 3 through 9, the algorithm searches upon events \( ^uE \) on the database to get other events with at least one attribute in common with event \( ^uE_a \).
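A compact way to picture the clustering rule is a breadth-first search over events that share a UT tag and at least one valuable attribute value. The sketch below is an assumption-laden illustration, not Algorithm 6: the in-memory `events` dictionary replaces the event database, the low-value filter lists only a few example values, and attribute concatenation and relations are omitted.

```python
import re
from collections import deque

# Examples of contents that add no value when correlated (illustrative list).
LOW_VALUE = {"true", "false", "get", "post", "put", "delete", "head"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valuable(value):
    v = str(value).strip().lower()
    return bool(v) and v not in LOW_VALUE and not DATE_RE.fullmatch(v)


def build_clusters(seed_id, events):
    """Return {UT tag: set of event ids} for clusters that include the seed."""
    clusters = {}
    # One candidate cluster per distinct UT tag of the seed event.
    for tag in events[seed_id]["tags"]:
        members, queue = {seed_id}, deque([seed_id])
        while queue:  # recursive expansion: visit every event added so far
            current = events[queue.popleft()]
            shared = {v for v in current["attrs"] if is_valuable(v)}
            for other_id, other in events.items():
                if other_id in members or tag not in other["tags"]:
                    continue
                if shared & set(other["attrs"]):
                    members.add(other_id)
                    queue.append(other_id)
        if len(members) > 1:
            clusters[tag] = members
    return clusters


T1 = 'unified:availability="dos-or-ddos"'
T2 = 'unified:intrusion-or-attempts="unknown-exploit"'
events = {
    "a": {"tags": {T1}, "attrs": ["GET", "c" * 32]},
    "b": {"tags": {T1}, "attrs": ["c" * 32, "evil.test"]},
    "c": {"tags": {T1}, "attrs": ["evil.test"]},   # reached through b
    "d": {"tags": {T2}, "attrs": ["GET", "c" * 32]},  # same hash, other category
    "e": {"tags": {T1}, "attrs": ["GET"]},         # shares only a low-value attribute
}
clusters = build_clusters("a", events)
```

In this toy database, event c joins the cluster only because it is related to b, which shows the recursive behavior, while d and e are left out for lacking a common tag and a valuable common attribute, respectively.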

4.7 Orchestrator

The Orchestrator module is responsible for ensuring that each event, at any time, follows a specific flow and is only processed by a module if it meets that module's requirements (e.g., it can only be enriched if it was already trimmed). Additionally, this module is responsible for checking for new TIP events, which were added via sharing or manually, and for initiating the AECCP processing for each event. In sum, Orchestrator is responsible for the following tasks:

  • Fetch new TIP’s events: Periodically, it checks if there are new events from the selected OSINT feeds and adds them to the TIP’s database.

  • Initiate processing of new TIP events: Periodically, it checks for events that were added since the last time AECCP processed an event.

  • Assure the correct workflow order: Orchestrator acts as a manager by sending each event to the correct next module. This module takes advantage of custom tags that are only used by it, and these tags store the current state of the event regarding the AECCP processing order.

  • Resume the process: If the processing of an event is interrupted, the module can resume the processing of that event without impacting the event database by falling back to the previous event state.
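The workflow-ordering and resume tasks above can be sketched as a small state machine driven by a custom tag. The tag prefix `aeccp:state=`, the module names, and the event layout are hypothetical; the text only states that Orchestrator uses custom tags to store the processing state.

```python
import copy

PIPELINE = ["classifier-tier1", "trimmer", "enricher", "classifier-tier2", "clusterer"]
STATE_PREFIX = "aeccp:state="  # hypothetical custom tag used only by Orchestrator


def next_module(tags):
    """Return the module that must process the event next, or None if done."""
    done = [t[len(STATE_PREFIX):] for t in tags if t.startswith(STATE_PREFIX)]
    if not done:
        return PIPELINE[0]
    position = PIPELINE.index(done[-1]) + 1
    return PIPELINE[position] if position < len(PIPELINE) else None


def advance(event, modules):
    """Run the next module on a copy; the state tag moves only on success."""
    name = next_module(event["tags"])
    if name is None:
        return event
    # If the module raises, the original event (and its state tag) is untouched,
    # so processing can later resume from the previous state.
    candidate = modules[name](copy.deepcopy(event))
    candidate["tags"] = [t for t in candidate["tags"] if not t.startswith(STATE_PREFIX)]
    candidate["tags"].append(STATE_PREFIX + name)
    return candidate
```

Working on a copy is one simple way to obtain the fall-back behavior; other designs (e.g., transactional updates in the TIP database) would serve the same purpose.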

4.8 Implementation

We implemented AECCP using Python 3.7 on top of MISP. For that, AECCP resorts to PyMISP,6 a Python library to access the MISP platform via its REST API. The implementation leverages built-in PyMISP functionalities to search, add, or update events and attributes.

AECCP implements the five modules described in Section 4. Its modules can be considered smaller solutions and can therefore work independently of each other. In addition, the platform has the capability of exporting its events (i.e., \( ^uE \) events and \( ^uC \) clusters) to be used by external entities, such as SIEMs, CSIRTs, and SOCs.

5 EVALUATION

The objective of the experimental evaluation was to answer the following questions:

  • Is AECCP able to classify events that are not initially tagged?

  • Is AECCP able to reclassify events previously tagged with a known incident classification taxonomy?

  • Does AECCP simplify event triage?

  • Is Trimmer able to reduce the number of attributes of events without losing valuable information for their classification?

  • Does Enricher improve the quality of the events?

  • Is AECCP able to correlate different events (threats) that share the same IoC?

  • Is AECCP more effective than PURE and ETIP platforms?

We validated and evaluated AECCP with three datasets of events. For validation, we used as ground truth the dataset we analyzed in Section 3 (Section 5.1), whereas for evaluation, we used two datasets whose events were unknown to us, one of which consists of events generated by PURE [3] (Sections 5.2 and 5.3). In addition, Section 5.3 presents a comparison of AECCP with the PURE and ETIP platforms.

5.1 Validation with the Ground Truth Dataset

To validate AECCP, we used as the ground truth dataset the 1,168 events we analyzed in Section 3. The dataset comprises 2 totally untagged events and 1,166 tagged events. Of the latter, 691 events are tagged with an incident category, although several of them have multiple overlapping classification tags from different public taxonomies. The remaining 475 events are not tagged with an incident category; hence, we consider them untagged. Summing up, the ground truth contains 691 tagged events and 477 untagged events. The tagged events will serve to validate the classification based on public taxonomies tags method, whereas the untagged events will validate the classification based on keywords method, both methods from the Classifier module (see Section 4.3). However, note that we want to classify events for both UT tiers, meaning that the Classifier, Trimmer, and Enricher modules will be used and validated, and Classifier will be executed twice.

Processing the 691 tagged events with AECCP, we verified that they were correctly classified into incident categories of UT for both Tier 1 and Tier 2. The resulting classification was checked based on the manual classification we made in the data analysis section (see Section 3). The second column of Table 9 shows these events classified through the eight Tier 1 categories of UT. Notice that an event can fit into different Tier 1 categories.

Table 9.
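| Tier 1 | Tagged Events | Untagged Events |
| --- | --- | --- |
| Abusive content | 145 | 99 |
| Malicious code | 607 | 408 |
| Information-gathering | 63 | 55 |
| Intrusion-attempts | 37 | 43 |
| Availability | 5 | 10 |
| Information-content-security | 2 | 12 |
| Fraud | 34 | 40 |
| Vulnerable | 3 | 5 |
| Total | 896 | 672 |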

Table 9. Ground Truth Dataset Classified by AECCP over the Tier 1 Incident Categories of UT

For the 477 untagged events, when Classifier processed them the first time, the classification based on keywords method was able to classify 453 of them into Tier 1 categories of UT, based on their descriptions and attribute values. The other 24 remained untagged, were carried on to the Trimmer and Enricher modules, and were then re-evaluated by Classifier. We observed after this processing that 16 of them were enriched with external data, but the external data allowed only 8 of them to be tagged with an incident category (i.e., with UT Tier 1 and Tier 2 tags). Curiously, the 2 totally untagged events were among these 8 events. For all 461 classified events, we manually inspected their information before and after they were processed by AECCP and verified that AECCP correctly tagged them. For the 16 events that the platform failed to classify, we also inspected them to find out why. We found that they did not provide enough information in their descriptions and attributes to be associated with an incident category. In addition, the attributes that Enricher enriched did not bring valuable information that would allow their classification. The last column of Table 9 presents the 461 events classified into the eight Tier 1 categories.

Most of the events were classified into the Malicious code (malware) and Abusive content Tier 1 incident categories of UT, reflecting well the number of cyber-attacks that have been made over the Internet. As a result, we can conclude that AECCP has a precision7 of 1 (i.e., 100%) when it classifies events previously labeled by public taxonomies. In contrast, when processing untagged events, AECCP’s precision depends on the information that their descriptions, attributes, and external data can provide about the threats they report. Based on our ground truth, from the 477 untagged events, the platform correctly classified 461 (TP) and did not have false positives (FP, events classified wrongly into incident categories), meaning that it had a precision of 1. However, since it was not able to classify 16 out of the 477 events, we consider these events as being false negatives (FN), and so it had a false-negative rate of 0.033 and a recall8 of 0.966. Overall, based on the 1,168 events, AECCP classified 1,152 (without false positives) and missed 16. Thus, it had a precision of 1, a recall of 0.986, a false-negative rate of 0.013, and an F1-score9 of 0.992.
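The reported metrics can be reproduced from the counts given above. The sketch assumes the standard definitions of precision, recall, false-negative rate, and F1-score; the function name is ours.

```python
def metrics(tp, fp, fn):
    """Standard precision, recall, false-negative rate, and F1-score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fnr = fn / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, fnr, f1


# Untagged events: 461 classified correctly, 16 missed, no false positives.
untagged = metrics(tp=461, fp=0, fn=16)
# All 1,168 events: 1,152 classified correctly, 16 missed.
overall = metrics(tp=1152, fp=0, fn=16)
```

Evaluating these expressions gives values that agree with the reported figures up to rounding in the third decimal place.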

We measured the time that AECCP takes to process both types of events (tagged and untagged). This time is strongly related to the quantity of data included in the events and that which the platform has to analyze, which depends on diverse factors, namely the number of the public taxonomy tags, the number of attributes, and the amount of external data. As expected, the greater the amount of data, the longer it takes to process it. In addition, tagged events take longer than untagged events, considering that both types of events have the same number of attributes and the same amount of external data. This is explained by the fact that the former have their tags analyzed by the classification based on public taxonomies tags method, whereas the latter does not. For the tagged events with fewer than 100 attributes, the average time for processing an event by AECCP is 30 seconds. Considering all 691 tagged events, it takes an average of 41 seconds to consume an event, with a standard deviation of 17 seconds, which means that, at most, it takes approximately 1 minute to process an event. Regarding untagged events, the processing times are shorter, namely (i) 24 seconds on average for events with fewer than 100 attributes; (ii) 31 seconds on average for processing any event out of the 477 events, with an 11-second standard deviation; and (iii) a maximum of 42 seconds to process an event. Therefore, the maximum time AECCP takes to process an event is 1 minute. Although it seems a bit long, we consider it acceptable given that it is the cost of reducing to zero the time spent by SOC analysts in analyzing and classifying events, which might incur classification errors.

5.2 Processing Dataset of MISP’s Events

This section assesses the ability of AECCP to process a dataset composed of 64 MISP events that were not previously processed by the platform. The following sections present the characterization of the dataset and its processing by AECCP’s modules.

5.2.1 Dataset Characterization.

The dataset’s events came from different providers (CIRCL, CUDESO, inThreat, VK-Intel, ESET, and MalwareMustDie), with 54 of the events coming from the first two sources. Of the 64 events, approximately 77% (49 events) did not contain any tags related to a known incident classification taxonomy, meaning that those events were not yet classified. These events will serve to evaluate AECCP’s ability to classify events with the classification based on keywords method and to answer question 1. Regarding the volume of attributes of the events, distributed according to the same four intervals used in Section 3.3, the dataset is mainly composed of events with fewer than 100 attributes (90% of the 64 events).

To get a detailed evaluation of our solution, we chose to perform a more in-depth analysis of the remaining 15 events that, unlike the other 49 events, were initially classified with a known incident classification taxonomy. We chose these events since they can be used to evaluate almost all use cases that AECCP deals with, except AECCP’s ability to classify events that are not initially classified, which can be evaluated by comparing the number of unclassified events initially and after being processed by AECCP. Table 10 shows a more detailed view of the tags and the attributes of these 15 events, namely their public taxonomy tags (column 2); the total number of tags (TT, column 3), including tags that do not add information about the type of the threat (e.g., TLP); the number of classification tags related to threat incidents (CT, column 4); and the number of attributes (Att, column 5). As we can observe, all of the events have more tags than those that actually classify them into known incidents, with some of them having a considerable number of tags not associated with incidents, such as events 1, 11, and 12. As already stated, such tags add no information about threats, making the SOC analyst waste time analyzing irrelevant information.

Table 10.
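| \( E_x \) | Public taxonomy tags (MISP’s Events) | TT (MISP’s Events) | CT (MISP’s Events) | Att (MISP’s Events) | Unified taxonomy tags (AECCP) | TT (AECCP) | CT (AECCP) | AT (AECCP) | AE (AECCP) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | circl:incident-classification=“spam” | 12 | 1 | 17 | malicious-code=“virus” | 4 | 4 | 13 | 13 |
| | | | | | malicious-code=“worm” | | | | |
| | | | | | malicious-code=“spammer” | | | | |
| | | | | | abusive-content=“spam” | | | | |
| 2 | enisa:nefarious-activity-abuse=“spear-phishing-attacks” | 4 | 1 | 84 | fraud=“phishing” | 1 | 1 | 78 | 92 |
| 3 | malware_classification:malware-category=“Botnet” | 4 | 1 | 10 | availability=“dos-or-ddos” | 6 | 6 | 10 | 10 |
| | | | | | malicious-code=“exploit” | | | | |
| | | | | | malicious-code=“dos” | | | | |
| | | | | | malicious-code=“backdoor” | | | | |
| | | | | | malicious-code=“remote-access-tool” | | | | |
| | | | | | malicious-code=“cryptominer” | | | | |
| 4 | malware_classification:malware-category=“Ransomware” | 5 | 1 | 18 | vulnerable=“vulnerable-service” | 3 | 3 | 18 | 42 |
| | | | | | malicious-code=“exploit” | | | | |
| | | | | | malicious-code=“ransomware” | | | | |
| 5 | malware_classification:malware-category=“Ransomware” | 3 | 1 | 9 | malicious-code=“wiper” | 2 | 2 | 8 | 8 |
| | | | | | malicious-code=“ransomware” | | | | |
| 6 | circl:incident-classification=“malware” | 8 | 4 | 73 | malicious-code=“virtool” | 4 | 4 | 43 | 53 |
| | malware_classification:malware-category=“Downloader” | | | | malicious-code=“cryptominer” | | | | |
| | malware_classification:malware-category=“Rootkit” | | | | malicious-code=“trojan” | | | | |
| | malware_classification:malware-category=“Botnet” | | | | malicious-code=“remote-access-tool” | | | | |
| 7 | malware_classification:malware-category=“Ransomware” | 5 | 1 | 7 | malicious-code=“ransomware” | 1 | 1 | 7 | 7 |
| 8 | circl:incident-classification=“malware” | 8 | 1 | 29 | malicious-code=“virus” | 2 | 2 | 29 | 36 |
| | | | | | malicious-code=“trojan” | | | | |
| 9 | circl:incident-classification=“malware” | 4 | 1 | 11 | malicious-code=“trojan” | 1 | 1 | 11 | 11 |
| 10 | enisa:nefarious-activity-abuse=“spear-phishing-attacks” | 8 | 1 | 115 | fraud=“phishing” | 1 | 1 | 105 | 173 |
| 11 | ecsirt:intrusions=“backdoor” | 38 | 4 | 17 | malicious-code=“virtool” | 4 | 4 | 15 | 34 |
| | veris:action:malware:variety=“Backdoor” | | | | malicious-code=“trojan” | | | | |
| | ms-caro-malware:malware-type=“Backdoor” | | | | malicious-code=“backdoor” | | | | |
| | ms-caro-malware-full:malware-type=“Backdoor” | | | | fraud=“phishing” | | | | |
| 12 | ms-caro-malware:malware-type=“Trojan” | 10 | 5 | 10 | malicious-code=“trojan” | 1 | 1 | 10 | 10 |
| | ms-caro-malware-full:malware-type=“Trojan” | | | | | | | | |
| | ecsirt:malicious-code=“trojan” | | | | | | | | |
| | CERT-XLM:malicious-code=“trojan-malware” | | | | | | | | |
| | malware_classification:malware-category=“Trojan” | | | | | | | | |
| 13 | ecsirt:intrusions=“backdoor” | 10 | 4 | 34 | malicious-code=“virtool” | 4 | 4 | 34 | 34 |
| | veris:action:malware:variety=“Backdoor” | | | | malicious-code=“backdoor” | | | | |
| | ms-caro-malware:malware-type=“Backdoor” | | | | malicious-code=“virus” | | | | |
| | ms-caro-malware-full:malware-type=“Backdoor” | | | | malicious-code=“cryptominer” | | | | |
| 14 | circl:incident-classification=“malware” | 12 | 2 | 86 | malicious-code=“trojan” | 1 | 1 | 86 | 86 |
| | ecsirt:malicious-code=“malware” | | | | | | | | |
| 15 | ecsirt:malicious-code=“trojan” | 7 | 1 | 27 | malicious-code=“trojan” | 1 | 1 | 27 | 166 |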

Table 10. Characterization of the Dataset of MISP’s Events and Results of Processing of It by AECCP

5.2.2 Event Classification.

This section seeks to evaluate AECCP’s ability to classify events into UT for Tier 1 and Tier 2. Thus, the Classifier module will be evaluated for all of its functionalities, as well as the Trimmer and Enricher modules since these two modules support Classifier in the classification of events. In addition, this section aims to answer the first three questions.

After AECCP processed the dataset, 61 out of the 64 events were classified, whereas only 15 events were initially classified with public taxonomy tags; this is an increase of 46 events, corresponding to 72% of the dataset. Only 3 (out of the 64) events were not classified into UT, due to the lack of information in their descriptions and the absence of indicators that Enricher could process (e.g., URLs) to add information helpful to Classifier. We verified the classification manually and confirmed that AECCP correctly processed all events.

The 49 events (out of the 64) without any tags related to a known incident classification taxonomy were processed using only the classification based on keywords method. AECCP was able to classify 46 of them, meaning that the 3 events that were not classified belong to this data subset. Overall, 75% (46) of the 61 events classified by AECCP were classified based only on keywords, meaning that AECCP can classify events that are not initially classified, answering question 1 positively.

Regarding the analysis of the 15 events initially classified with a known incident classification taxonomy, the platform was able to use both classification methods and classify them correctly. Almost every event was classified with a new type of threat that was not initially considered in the public taxonomy tags. For example, event \( E_1 \) from Table 10 was identified only as spam before being processed by AECCP, but after being processed by AECCP it was also classified as malicious code with virus, worm, and spammer, meaning that AECCP is able to reclassify events, thus answering question 2. The sixth column of Table 10 shows this reclassification for the 15 events, where their original classification (second column) was transformed into the tags of the sixth column.

From the 15 events, on average, each had five more tags than before being processed by AECCP, thus increasing their tags from two to seven (columns 4 and 8). As explained in Sections 3.2 and 4.3, AECCP classifies events according to UT and also based on information contained in their description, meaning that each event classification can be improved. These assumptions can increase the number of tags per event. In addition, it is important to note that after being processed by AECCP, all of the tags on the events tag list are classification tags, contrary to before being processed by AECCP where most of the tags were not classification tags but added information about its source and its sharing (e.g., TLP). In columns 4 and 8 of Table 10, the number of tags is shown regarding known incident classification taxonomy, before and after being processed by AECCP.

Of the 15 events, 14 had their total number of tags significantly reduced (columns 3 and 7) due to two factors. The first is when an event has overlapping classification tags in its initial tag list (e.g., [cccs:malware-category=“ransomware”], [cert-xlm:malicious-code=“ransomware”]) since they are transformed into a single UT tag after being processed by AECCP. The second is when an event has non-classification tags in its initial tag list (e.g., TLP) since they are removed after being processed by AECCP. From the point of view of an SOC analyst, the exclusion of non-classification tags and the inclusion of new classification tags based on OSINT can simplify event triage since all of the tags in the event tag list add value to the analysis, thus answering question 3.

5.2.3 Attribute Trimming and Enrichment.

This section looks to evaluate AECCP’s ability to trim and enrich events. More precisely, we evaluated the Trimmer and Enricher modules and sought to answer the fourth and fifth questions.

Before being processed by AECCP, our dataset had approximately 90% of events with fewer than 100 attributes. After being processed by AECCP, the number of events with fewer than 100 attributes decreased to 85% of the initial number. This means, at first glance, that our solution enriches more than it trims, adding more attributes than it removes.

To understand this overall attribute increment, we analyzed the number of attributes of the events in three specific phases: before being processed by Trimmer, immediately after being processed by Trimmer, and finally after being processed by Enricher. The results show that, on average, Trimmer removes 12 attributes per event and Enricher adds 54 attributes per event, for a net increase of 42 attributes per event. Enricher's increase arises because it can add a maximum of 6 new attributes for each hash and 12 new attributes for each URL. For example, if an event has attributes containing three hashes and three URLs, Enricher can add up to 54 attributes to the event. In summary, the average number of attributes in the three phases is 49, 37, and 91. Therefore, the attribute increment is due to Enricher, whose additions outweigh Trimmer's reductions, even though Trimmer trims the event attributes effectively.
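The arithmetic behind these figures can be checked with a short Python sketch. The per-hash and per-URL limits and the three phase averages come from the text; the function and constant names are illustrative assumptions.

```python
MAX_ATTRS_PER_HASH = 6   # upper bound per hash, as stated in the text
MAX_ATTRS_PER_URL = 12   # upper bound per URL, as stated in the text


def max_enrichment(num_hashes, num_urls):
    """Upper bound on the number of attributes Enricher can add to one event."""
    return num_hashes * MAX_ATTRS_PER_HASH + num_urls * MAX_ATTRS_PER_URL


# Average number of attributes per event in the three phases (from the text).
AVG_BEFORE_TRIMMER = 49
AVG_AFTER_TRIMMER = 37
AVG_AFTER_ENRICHER = 91

removed_by_trimmer = AVG_BEFORE_TRIMMER - AVG_AFTER_TRIMMER
added_by_enricher = AVG_AFTER_ENRICHER - AVG_AFTER_TRIMMER

print("bound for 3 hashes and 3 URLs:", max_enrichment(3, 3))
print("removed by Trimmer:", removed_by_trimmer)
print("added by Enricher:", added_by_enricher)
```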

Similar to the Classifier evaluation, we also evaluated the impact of Trimmer and Enricher on the 15 events. Table 10 shows the number of attributes in the three phases, namely before the events are processed by Trimmer and Enricher (Att, column 5), after Trimmer (AT, column 9), and after Enricher (AE, last column). We verified that AECCP could reduce the number of attributes of some events, depending on the type of attributes they contain; in these cases, Trimmer reduced the number of attributes effectively. This was observed in 6 of the 15 events. However, we also verified that for events whose attributes contain hashes and URLs, the number of attributes was increased by Enricher. In total, 7 events were enriched, 4 of which were first trimmed. Two of the remaining 8 events were trimmed but not enriched, and the other 6 were neither trimmed nor enriched. Overall, 6 events had their number of attributes increased, 3 had it reduced, and the remaining 6 kept the same number of attributes.

We evaluated the 15 events with and without these two modules to answer the fourth and fifth questions. Table 11 shows the results of this evaluation, which compares the number of classification tags when events were not processed by Trimmer and Enricher (columns 2, 6, and 10) with the number of classification tags when they were processed only by Trimmer (columns 3, 7, and 11), and with the number of classification tags when they were processed by both modules (columns 4, 8, and 12). As we can observe, all events have the same number of tags in columns 2 and 3, 6 and 7, and 10 and 11, meaning that Trimmer does not remove information that is valuable for the classification of events, which answers question 4 positively. We can also observe from columns 4, 8, and 12 that the number of classification tags increased for 4 events (\( E_3 \), \( E_8 \), \( E_9 \), and \( E_{15} \)), 2 of which benefited from the enrichment provided by Enricher (\( E_8 \) and \( E_{15} \)). Therefore, we conclude that Enricher improved the quality of the events, answering question 5.

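| \( E_x \) | Without T & E | With T | With T & E | \( E_x \) | Without T & E | With T | With T & E | \( E_x \) | Without T & E | With T | With T & E |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 11 | 4 | 4 | 4 |
| 2 | 1 | 1 | 1 | 7 | 1 | 1 | 1 | 12 | 1 | 1 | 1 |
| 3 | 5 | 5 | 6 | 8 | 1 | 1 | 2 | 13 | 4 | 4 | 4 |
| 4 | 3 | 3 | 3 | 9 | 0 | 0 | 1 | 14 | 1 | 1 | 1 |
| 5 | 2 | 2 | 2 | 10 | 1 | 1 | 1 | 15 | 0 | 0 | 1 |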

Table 11. Trimmer and Enricher Impact on the Number of Tags of the 15 Events
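The two observations drawn from Table 11 can be re-derived mechanically. The following Python sketch transcribes the table values into a dictionary (the name `TABLE_11` and the tuple layout are illustrative assumptions) and checks which events keep or gain classification tags.

```python
# Event number -> (tags without T & E, tags with T only, tags with T & E),
# transcribed from Table 11.
TABLE_11 = {
    1: (4, 4, 4), 2: (1, 1, 1), 3: (5, 5, 6), 4: (3, 3, 3), 5: (2, 2, 2),
    6: (4, 4, 4), 7: (1, 1, 1), 8: (1, 1, 2), 9: (0, 0, 1), 10: (1, 1, 1),
    11: (4, 4, 4), 12: (1, 1, 1), 13: (4, 4, 4), 14: (1, 1, 1), 15: (0, 0, 1),
}

# Trimmer preserves classification if the first two counts match for every event.
trimmer_preserves_tags = all(
    without == trimmed for without, trimmed, _ in TABLE_11.values()
)

# Events whose tag count grows once both modules are applied.
gained_tags = [
    event for event, (_, trimmed, both) in TABLE_11.items() if both > trimmed
]

print("Trimmer preserves tags:", trimmer_preserves_tags)
print("Events with more tags:", gained_tags)
```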

5.2.4 Clustering.

This section aims to assess AECCP’s ability to correlate different events that share mutual IoCs (i.e., the Clusterer module) and answers the sixth question.

Since our evaluation dataset is small (64 events) and therefore Clusterer might not create many clusters, we allowed these events to be correlated with events from our ground truth dataset, thus totaling 1,232 events. With this approach, we were able to create 24 clusters. Table 12 details some of these clusters, whereas the rest are omitted since they have the same properties, except their taxonomies, as one of the clusters in this table. For example, clusters 100, 101, and 102 have exactly the same attributes and correlations, but they were created with different taxonomies ([unified:malicious-code=“worm”], [unified:malicious-code=“backdoor”] and [unified:malicious-code=“trojan”]) due to the logic behind the Clusterer module.
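The idea of correlating events per taxonomy tag and mutual IoC can be sketched as follows. This is a simplified pairwise illustration, not the Clusterer module itself: the function `cluster_events`, the event layout, the placeholder IoC strings, and event 9999 are all hypothetical, while events 1518 and 1520 and their shared link are taken from cluster 21 in Table 12.

```python
from collections import defaultdict
from itertools import combinations


def cluster_events(events):
    """Pair up events that share a UT tag and at least one IoC value.

    `events` maps an event ID to a dict with two sets, "tags" and "iocs".
    Returns a list of (tag, event_ids, mutual_iocs) tuples. Because clusters
    are built per tag, the same pair of events yields one cluster for each
    tag they have in common.
    """
    by_tag = defaultdict(list)
    for event_id, event in events.items():
        for tag in event["tags"]:
            by_tag[tag].append(event_id)

    clusters = []
    for tag, ids in sorted(by_tag.items()):
        for a, b in combinations(sorted(ids), 2):
            mutual = events[a]["iocs"] & events[b]["iocs"]
            if mutual:
                clusters.append((tag, (a, b), mutual))
    return clusters


LINK = "https://www.bleepingcomputer.com/new-lockergoga-ransomware-allegedly-used-in-altran-attack/"
RANSOMWARE = 'unified:malicious-code="ransomware"'

events = {
    1518: {"tags": {RANSOMWARE}, "iocs": {LINK, "placeholder-ioc-a"}},
    1520: {"tags": {RANSOMWARE}, "iocs": {LINK, "placeholder-ioc-b"}},
    # Hypothetical event with the same tag but no shared IoC: not clustered.
    9999: {"tags": {RANSOMWARE}, "iocs": {"placeholder-ioc-c"}},
}

clusters = cluster_events(events)
for tag, ids, mutual in clusters:
    print(tag, ids, sorted(mutual))
```

Building clusters per tag is one way to explain why otherwise identical clusters can appear several times with different taxonomies, as noted above.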

| \( ^uC_x \) | # Events | Taxonomy and Description | # Att | Mutual IoCs |
|---|---|---|---|---|
| 1 | 2 | malicious-code=“worm” | 416 | www.tashdqdxp.com |
| | | - Soft Cell case indicators | | |
| | | - Malware with Ties to SunOrcal | | |
| 9 | 3 | malicious-code=“trojan” | 68 | https://twitter.com/VK_Intel/status/1128079463785349121 |
| | | - FIN7 JScript Loader Malware | | |
| | | - APT28 XTunnel Backdoor | | |
| | | - Turla Kazuar RAT | | |
| 10 | 2 | malicious-code=“virus” | 47 | https://twitter.com/VK_Intel/status/1128079463785349121 |
| | | - FIN7 JScript Loader Malware | | |
| | | - APT28 XTunnel Backdoor | | |
| 11 | 2 | malicious-code=“ransomware” | 69 | All except one |
| | | - Sodinokibi ransomware | | |
| | | - Ransomware exploits WebLogic vulnerability | | |
| 14 | 2 | malicious-code=“cryptominer” | 65 | CVE-2019-3396 |
| | | - Botnet Malware Exploits CVE-2019-3396 | | |
| | | - SystemTen (ELF trojan, miner, bot and rootkit) | | |
| 119 | 2 | malicious-code=“backdloor” | 53 | All except three |
| | | - Operation ShadowHammer | | |
| | | - Operation ShadowHammer | | |
| 21 | 2 | malicious-code=“ransomware” | 28 | https://www.bleepingcomputer.com/new-lockergoga-ransomware-allegedly-used-in-altran-attack/ |
| | | - The Norsk Hydro ransomware attack | | |
| | | - New LockerGoga Ransomware in Altran Attack | | |

Table 12. Clusters Created by AECCP

Figure 3 presents one of the clusters created by AECCP, identified with ID 21 in Table 12. This cluster is formed by two events (1518 and 1520) that have a common attribute, a link, and a common UT tag, [unified:malicious-code=“ransomware”]. The attribute in common is a link to https://www.bleepingcomputer.com with news related to the LockerGoga ransomware, meaning that both events are related to the same threat. Because these two events carry different information, except for the single shared link, they complement each other. This type of event correlation can be valuable to an SOC analyst, who can easily gather more information about an event based on previously received events and obtain more indicators that can be used in block rules and other types of defenses, thus answering question 6.

Fig. 3. Cluster 21 created by AECCP and composed of two events: 1518 on the right and 1520 on the left.

5.3 Processing Events with the PURE and ETIP Platforms

To demonstrate AECCP’s ability to process events produced by other platforms from the literature, trimming their attributes without losing relevant information and enriching the information they carry (and, hence, their threat impact), we processed six events from PURE [3]. In addition, we compared the resulting events with the PURE versions by submitting them to ETIP [15], which calculates the threat score (TS) of the threat value they carry.

Table 13 shows the characterization of the six events of PURE—namely, for each eIoC, the number of events it aggregates (#E, column 2), its description (column 3), the number of attributes it contains (#att, column 4), and its threat score measured by ETIP (TS, column 5).

| ID | #E | Description | #att | TS (PURE and ETIP) | #AT | #AE | Unified Taxonomy | TS (AECCP and ETIP) |
|---|---|---|---|---|---|---|---|---|
| E1 | 2 | - OSINT Aveo Malware Family Targets Japanese Speaking | 82 | 1.29 | 77 | 87 | malicious-code=“backdloor” | 1.29 |
| | | - Pivot on whois registrant [email protected] | | | | | malicious-code=“trojan” | |
| E2 | 2 | - OSINT - Packrat: Seven Years of a South American Threat Actor | 267 | 2.54 | 257 | 423 | availability=“dos-or-ddos” | 2.68 |
| | | - Packrat: Seven Years of a South American Threat Actor | | | | | fraud=“phishing” | |
| | | | | | | | malicious-code=“backdloor” | |
| | | | | | | | malicious-code=“dos” | |
| | | | | | | | malicious-code=“ransomware” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | malicious-code=“worm” | |
| E3 | 2 | - Expansion on [email protected] | 274 | 3.22 | 273 | 401 | malicious-code=“backdloor” | 3.50 |
| | | - New Variant of Gh0st Malware by Palo Alto Networks Unit 42 | | | | | malicious-code=“trojan” | |
| E4 | 3 | - Spear Phishing Attack Using Cobalt Strike Against Financial Institutions | 85 | 2.53 | 78 | 159 | abusive-content=“spam” | 2.58 |
| | | - RTF files for Hancitor utilize exploit for CVE-2017-11882 | | | | | fraud=“phishing” | |
| | | - Targeted Attack in the Middle East by APT34, using CVE-2017-11882 | | | | | malicious-code=“exploit” | |
| | | | | | | | malicious-code=“spammer” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | vulnerable=“vulnerable-service” | |
| E5 | 3 | - EPS Processing Zero-Days Exploited by Multiple Threat Actors | 156 | 2.87 | 146 | 361 | information-gathering=“scanning” | 3.12 |
| | | - Malicious Documents Targeting Security Professionals | | | | | malicious-code=“backdloor” | |
| | | - APT28 Targets Hospitality Sector, Presents Threat to Travelers | | | | | malicious-code=“exploit” | |
| | | | | | | | malicious-code=“ransomware” | |
| | | | | | | | malicious-code=“trojan” | |
| | | | | | | | malicious-code=“worm” | |
| | | | | | | | vulnerable=“vulnerable-service” | |
| E6 | 4 | - Sakula Malware Family | 842 | 3.11 | 821 | 2907 | information-gathering=“scanning” | 3.40 |
| | | - Cyber-Kraken (Threat Group 3390 / Emissary Panda) | | | | | malicious-code=“backdloor” | |
| | | - Korean Website Installs Banking Malware | | | | | malicious-code=“trojan” | |
| | | - Sakula Reloaded | | | | | | |

  • #E, number of events; #att, number of attributes;

  • #AT: number of attributes after Trimmer; #AE: number of attributes after Enricher.

Table 13. PURE Events Characterization, Processed by AECCP, and TS Calculation by ETIP

The six events received from PURE were processed by AECCP, producing the results shown in columns 6 through 8 of the table. As we can observe, AECCP could process events from an external platform. All of the events, which were not initially tagged, were classified by AECCP (column 8). In addition, the initial number of attributes (column #att) was slightly reduced by Trimmer (column #AT). However, as explained in Section 5.2.3, enrichment outweighs trimming, with a net average gain of 42 attributes per event. This increase can be seen in column #AE and is the price to pay for the added value. The added information appears to be relevant, since the TS (last column) increased for five of the six events and stayed the same for E1.

Based on these results, we can answer question 7 positively, meaning that AECCP improves TI quality more than the other two platforms. Note that the ETIP platform calculates the TS of events (enriched IoCs), meaning that it contains an enricher module that aggregates and correlates events before calculating the TS. Therefore, if the TS value of AECCP’s events is higher than that of ETIP’s events, AECCP generates events of better quality than ETIP. The same conclusion applies to PURE.
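The TS comparison can be made explicit with a short Python sketch over the values in Table 13. The dictionary name `THREAT_SCORES` and the tuple layout are illustrative assumptions; the scores themselves are transcribed from the two TS columns.

```python
# Event ID -> (TS of the PURE event, TS after AECCP processing), both computed
# by ETIP and transcribed from Table 13.
THREAT_SCORES = {
    "E1": (1.29, 1.29),
    "E2": (2.54, 2.68),
    "E3": (3.22, 3.50),
    "E4": (2.53, 2.58),
    "E5": (2.87, 3.12),
    "E6": (3.11, 3.40),
}

improved = [eid for eid, (pure, aeccp) in THREAT_SCORES.items() if aeccp > pure]
unchanged = [eid for eid, (pure, aeccp) in THREAT_SCORES.items() if aeccp == pure]
decreased = [eid for eid, (pure, aeccp) in THREAT_SCORES.items() if aeccp < pure]

for eid, (pure, aeccp) in THREAT_SCORES.items():
    # Round the difference to two decimals to avoid floating-point noise.
    print(eid, pure, "->", aeccp, "delta:", round(aeccp - pure, 2))

print("improved:", improved, "unchanged:", unchanged, "decreased:", decreased)
```

No event loses score after AECCP processing, and only E1 keeps its original value.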

6 IMPROVEMENTS AND FUTURE WORK

The prevention and detection of cyber-attacks have received significant attention from organizations, which have been adopting new strategies and defense mechanisms to protect themselves. TI has emerged as an ally of organizations, allowing them to access information about threats that have occurred. They use TI for various purposes, namely to verify whether their assets are vulnerable to an attack that has occurred, to update their defense mechanisms with rules and patterns on announced threats, and to check whether their assets have been victims of an attack.

TI must be timely for organizations to act proactively and avoid severe damage. However, TI only announces attacks after they have already occurred, making it a reactive notification [41, 51] that is of limited use to victim organizations. To develop proactive TI, it is necessary to obtain data from the online hacker community to understand what is happening in that community and try to predict possible malicious actions. One way to do this is to access underground forums where, for example, hackers exchange technical mechanisms and tutorials for malicious tools that they can use to carry out attacks [41]. These tools can be found and purchased within the dark web (DW), more precisely in dark-net markets. In addition, dark-net forums for the hacker community are hosted within the DW [2]. By collecting and analyzing DW data, it is possible to identify emerging hacker threats and thus produce proactive TI [42].

AECCP was designed in light of traditional TI, meaning that the unified taxonomy and the main threat attributes were defined based on public taxonomies and security events of traditional TI. AECCP can benefit from DW data in various ways:

  • Extending the unified taxonomy with Tier 2 tags and bags of words based on terms that are observed only in the DW and that relate to an incident category (Tier 1 level) of the UT.

  • Processing data provided by DW sources, classifying it with the extended UT, and aggregating it with (i) other DW data associated with the same attack intent, so that SOC analysts can gain insight into malicious actions, anticipate planned attacks, and proactively decide how to prevent them from reaching the organization; (ii) traditional TI that already exists from some announced misbehaviour but has no associations and has gone unnoticed by security analysts (e.g., attacks that have been planned but not yet fully executed), so that the SOC analyst can proactively activate the necessary protections against the attack; (iii) traditional TI from an attack that has already occurred, in which case the resulting information is reactive, but the analyst gains access to information about the attack plan and can make decisions based on it.

  • Modifying AECCP to accept the different formats in which DW data is provided.

7 CONCLUSION

In this article, we proposed and presented AECCP, an implementation of an approach to improve the quality of the threat intelligence produced by TIPs by classifying and enriching it automatically. AECCP is composed of a set of smaller solutions, each one focused on one or more limitations of TIPs, which were verified in a detailed data analysis over an intelligence dataset of more than 1,000 security events. To address the limitations in threat knowledge management and in technology enablement for threat triage, the platform integrates a Classifier module that classifies each event according to a UT proposed by us. To deal with the high volume of shared threat information, we proposed a Trimmer module that trims the low-value information from each event, based on the main threat attributes we identified in the data analysis. For data improvement, AECCP contains an Enricher module that enriches each event based on intelligence collected from VirusTotal. Last, to address advanced analytics limitations, we proposed a Clusterer module that creates clusters of events that share information and context about the same threat and represents each cluster as an AECCP event.

To demonstrate the applicability and feasibility of AECCP, the platform was developed on top of the MISP platform. AECCP was validated over more than 1,000 events and tested against a dataset of 64 newer, previously unused events and 6 events produced by a different platform, PURE. From these tests, we created 24 clusters, which were classified, trimmed, and enriched by AECCP, and we were able to trim and enrich the events produced by PURE. In addition, these events were processed by another platform, ETIP, to calculate their TS. The results showed that AECCP produces higher-quality TI than the other platforms.

REFERENCES

  [1] Fernando Alves, Aurélien Bettini, Pedro M. Ferreira, and Alysson Bessani. 2021. Processing tweets for cybersecurity threat awareness. Information Systems 95 (2021), 101586.
  [2] Nolan Arnold, Mohammadreza Ebrahimi, Ning Zhang, Ben Lazarine, Mark Patton, and Sagar Samtani. 2019. Dark-net ecosystem cyber-threat intelligence (CTI) tool. In Proceedings of the 2019 IEEE International Conference on Intelligence and Security Informatics (ISI’19). 92–97.
  [3] Rui Azevedo, Ibéria Medeiros, and Alysson Bessani. 2019. PURE: Generating quality threat intelligence by clustering and correlating OSINT. In Proceedings of the 18th IEEE International Conference on Trust, Security, And Privacy in Computing and Communications (TrustCom’19). 483–490.
  [4] Matt Bromiley. 2016. Threat Intelligence: What It Is, and How to Use It Effectively. Retrieved February 22, 2022 from https://nsfocusglobal.com/wp-content/uploads/2017/01/SANS_Whitepaper_Threat_Intelligence__What_It_Is__and_How_to_Use_It_Effectively.pdf.
  [5] Ping Chen, Lieven Desmet, and Christophe Huygens. 2014. A study on advanced persistent threats. In Proceedings of the 15th IFIP International Conference on Communications and Multimedia Security. 63–72.
  [6] CIRCL.lu. 2018. CIRCL Taxonomy - Schemes of Classification in Incident Response and Detection. Retrieved February 22, 2022 from https://www.circl.lu/pub/taxonomy/.
  [7] A. Cormack, X. Jansen, A. Moens, and P. Peters. 2015. Incident Classification/Incident Taxonomy According to eCSIRT.net - Adapted. Retrieved February 22, 2022 from https://www.trusted-introducer.org/Incident-Classification-Taxonomy.pdf.
  [8] CSIRTG. 2020. The FASTEST Way to Consume Threat Intelligence. Period. Retrieved February 22, 2022 from https://csirtgadgets.com/collective-intelligence-framework.
  [9] Darknet. 2020. OpenIOC - Sharing Threat Intelligence. Retrieved February 22, 2022 from https://www.darknet.org.uk/2016/06/openioc-sharing-threat-intelligence/.
  [10] Alessandra de Melo e Silva, João José Costa Gondim, Robson de Oliveira Albuquerque, and Luis Javier García-Villalba. 2020. A methodology to evaluate standards and platforms within cyber threat intelligence. Future Internet 12, 6 (2020), 108.
  [11] Quirine Eijkman and Daan Weggemans. 2013. Open source intelligence and privacy dilemmas: Is it time to reassess state accountability? Security and Human Rights 23, 4 (April 2013), 1–12.
  [12] ENISA. 2015. Standards and Tools for Exchange and Processing of Actionable Information. Technical Report. ENISA.
  [13] ENISA. 2017. Exploring the Opportunities and Limitations of Current Threat Intelligence Platforms. Technical Report. ENISA.
  [14] European Commission. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). EUR-Lex. Retrieved February 22, 2022 from https://eur-lex.europa.eu/eli/reg/2016/679/oj.
  [15] Mario Faiella, Gustavo Gonzalez-Granadillo, Ibéria Medeiros, Rui Azevedo, and Susana Gonzalez-Zarzosa. 2019. Enriching threat intelligence platforms capabilities. In Proceedings of the 16th International Conference on Security and Cryptography (SECRYPT’19). 37–48.
  [16] FireEye. 2013. Taking a Lean-Forward Approach to Combat Today’s Cyber Attacks. Technical Report. FireEye.
  [17] Michael Glassman and Min Ju Kang. 2012. Intelligence in the internet age: The emergence and evolution of Open Source Intelligence (OSINT). Computers in Human Behavior 28 (March 2012), 673–682.
  [18] Gustavo Gonzalez-Granadillo, Mario Faiella, Ibéria Medeiros, Rui Azevedo, and Susana Gonzalez-Zarzosa. 2021. ETIP: An enriched threat intelligence platform for improving OSINT correlation, analysis, visualisation and sharing capabilities. Journal of Information Security and Applications 58 (May 2021), 102715.
  [19] Gasper Hribar, Iztok Podbregar, and Teodora Ivanusa. 2014. OSINT: A “Grey Zone”? International Journal of Intelligence and Counterintelligence 27 (May 2014), 529–549.
  [20] Brian Kime. 2016. Threat Intelligence: Planning and Direction. SANS Institute InfoSec Reading Room.
  [21] Robert M. Lee. 2020. 2020 SANS Cyber Threat Intelligence (CTI) Survey. SANS Institute InfoSec Reading Room.
  [22] Jerome Leonard. 2020. TheHive Project: Open Source, Free and Scalable Cyber Threat Intelligence & Security Incident Response Solutions. Retrieved February 22, 2022 from https://blog.thehive-project.org/tag/soltra-edge/.
  [23] Martin E. Dempsey. 2013. Joint Intelligence (JP 2-0). Technical Report. U.S. Army.
  [24] Cláudio Martins and Ibéria Medeiros. 2020. Additional Info on the Paper Submitted to ACM TOPS. Retrieved February 22, 2022 from https://sites.google.com/view/siteaddinfo-tops.
  [25] Troy Mattern, John Felker, Randy Borum, and George Bamford. 2014. Operational levels of cyber intelligence. International Journal of Intelligence and Counterintelligence 27, 4 (Dec. 2014), 702–719.
  [26] Amanda McKeon. 2016. Reduce Business Risk with an Effective Threat Intelligence Capability. Retrieved February 22, 2022 from https://www.recordedfuture.com/threat-intelligence-capability/.
  [27] Microsoft. 2018. Security Intelligence. Retrieved February 22, 2022 from https://docs.microsoft.com/en-us/windows/security/threat-protection/intelligence/.
  [28] Jelena Mirkovic and Peter Reiher. 2004. A taxonomy of DDoS attack and DDoS Defense mechanisms. ACM SIGCOMM Computer Communication Review 34, 2 (May 2004), 39–53.
  [29] MISP. 2020. MISP Taxonomies. Retrieved February 22, 2022 from https://www.misp-project.org/datamodels/#misp-taxonomies.
  [30] MISP. 2020. Open Source Threat Intelligence Platform & Open Standards for Threat Information Sharing. Retrieved February 22, 2022 from http://www.misp-project.org.
  [31] MITRE. 2020. CRITs: Collaborative Research into Threats. Retrieved February 22, 2022 from https://crits.github.io/.
  [32] OASIS. 2020. Introduction to STIX. Retrieved February 22, 2022 from https://oasis-open.github.io/cti-documentation/stix/intro.html.
  [33] OASIS. 2020. Introduction to TAXII. Retrieved February 22, 2022 from https://oasis-open.github.io/cti-documentation/taxii/intro.html.
  [34] Bank of England. 2016. Understanding Cyber Threat Intelligence Operations. Retrieved February 22, 2022 from https://www.bankofengland.co.uk/-/media/boe/files/financial-stability/financial-sector-continuity/understanding-cyber-threat-intelligence-operations.pdf.
  [35] Kris Oosthoek and Christian Doerr. 2020. Cyber threat intelligence: A product without a process? International Journal of Intelligence and CounterIntelligence 34, 2 (2020), 300–315.
  [36] J. Pastor-Galindo, P. Nespoli, F. Gómez Mármol, and G. Martínez Pérez. 2020. The not yet exploited goldmine of OSINT: Opportunities, open challenges and future trends. IEEE Access 8 (2020), 10282–10304.
  [37] Andrew Ramsdale, Stavros Shiaeles, and Nicholas Kolokotronis. 2020. A comparative analysis of cyber-threat intelligence sources, formats and languages. Electronics 9, 5 (May 2020), 824.
  [38] Robert M. Lee. 2016. Intelligence Defined and Its Impact on Cyber Threat Intelligence. Retrieved February 22, 2022 from https://www.robertmlee.org/intelligence-defined-and-its-impact-on-cyber-threat-intelligence/.
  [39] C. Sabottke, O. Suciu, and T. Dumitras. 2015. Vulnerability disclosure in the age of social media: Exploiting Twitter for predicting real-world exploits. In Proceedings of the 24th USENIX Security Symposium. 1041–1056.
  [40] A. Saini, Manoj Gaur, and Vijay Laxmi. 2014. A taxonomy of browser attacks. In Handbook of Research on Digital Crime, Cyberspace Security, and Information Assurance. IGI Global, 291–313.
  [41] Sagar Samtani, Ryan Chinn, Hsinchun Chen, and Jay F. Nunamaker Jr. 2017. Exploring emerging hacker assets and key hackers for proactive cyber threat intelligence. Journal of Management Information Systems 34, 4 (2017), 1023–1053.
  [42] Sagar Samtani, Hongyi Zhu, and Hsinchun Chen. 2020. Proactively identifying emerging hacker threats from the dark web: A diachronic graph embedding framework (D-GEF). ACM Transactions on Privacy and Security 23, 4 (Aug. 2020), Article 21, 33 pages.
  [43] C. Sauerwein, C. Sillaber, Andrea Mussmann, and R. Breu. 2017. Threat intelligence sharing platforms: An exploratory study of software vendors and research perspectives. Wirtschaftsinformatik und Angewandte Informatik 2017 (2017), 1–15.
  [44] SWIFT. 2019. The Evolving Cyber Threat to the Global Banking Community. Retrieved February 22, 2022 from https://www.swift.com/pt/node/147646.
  [45] Symantec World Headquarters. 2011. Advanced Persistent Threats: A Symantec Perspective. Technical Report. Symantec.
  [46] ThreatConnect. 2019. Threat Intelligence Platforms. Everything You’ve Ever Wanted to Know But Didn’t Know to Ask. ThreatConnect.
  [47] Wiem Tounsi (Ed.). 2019. What is cyber threat intelligence and how is it evolving? In Cyber-Vigilance and Digital Trust: Cyber Security in the Era of Cloud Computing and IoT. John Wiley & Sons, 1–49.
  [48] Wiem Tounsi and Helmi Rais. 2018. A survey on technical threat intelligence in the age of sophisticated cyber attacks. Computers & Security 72 (Jan. 2018), 212–233.
  [49] Cynthia Wagner, Alexandre Dulaunoy, Gérard Wagener, and Andras Iklody. 2016. MISP: The design and implementation of a collaborative threat intelligence sharing platform. In Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security. 49–56.
  [50] Webroot. 2014. Threat Intelligence: What Is It, and How Can It Protect You from Today’s Advanced Cyber-Attacks. Technical Report. Gartner.
  [51] Ryan Williams, Sagar Samtani, Mark Patton, and Hsinchun Chen. 2018. Incremental hacker forum exploit collection and classification for proactive cyber threat intelligence: An exploratory study. In Proceedings of the 2018 IEEE International Conference on Intelligence and Security Informatics. 94–99.

              • Published in

                ACM Transactions on Privacy and Security, Volume 25, Issue 3
                August 2022
                288 pages
                ISSN: 2471-2566
                EISSN: 2471-2574
                DOI: 10.1145/3530305

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 19 May 2022
                • Accepted: 1 January 2022
                • Revised: 1 August 2021
                • Received: 1 December 2020
                Published in tops Volume 25, Issue 3

                Qualifiers

                • research-article
                • Refereed
