skip to main content
research-article
Public Access

CAN Bus Intrusion Detection Based on Auxiliary Classifier GAN and Out-of-distribution Detection

Published:05 September 2022Publication History

Skip Abstract Section

Abstract

The Controller Area Network (CAN) is a ubiquitous bus protocol present in the Electrical/Electronic (E/E) systems of almost all vehicles. It is vulnerable to a range of attacks once the attacker gains access to the bus through the vehicle’s attack surface. We address the problem of Intrusion Detection on the CAN bus and present a series of methods based on two classifiers trained with Auxiliary Classifier Generative Adversarial Network (ACGAN) to detect and assign fine-grained labels to Known Attacks and also detect the Unknown Attack class in a dataset containing a mixture of (Normal + Known Attacks + Unknown Attack) messages. The most effective method is a cascaded two-stage classification architecture, with the multi-class Auxiliary Classifier in the first stage for classification of Normal and Known Attacks, passing Out-of-Distribution (OOD) samples to the binary Real-Fake Classifier in the second stage for detection of the Unknown Attack class. Performance evaluation demonstrates that our method achieves both high classification accuracy and low runtime overhead, making it suitable for deployment in the resource-constrained in-vehicle environment.

Skip 1INTRODUCTION Section

1 INTRODUCTION

Modern Connected and Automated Vehicles (CAVs) are prototypical Cyber-physical Systems [10], where the in-vehicle Electrical/Electronic (E/E) system interacts closely with its physical surroundings to form a sense-compute-actuate feedback loop, including the vehicle itself and the external driving environment. With increasing deployment of CAVs, the issue of automotive cyber-security is gaining increasing attention from both industry and academia. A CAV has a large attack surface, and attackers may gain access to its internal networks either physically through the On-Board Diagnostics (OBD-II) connector or wirelessly through the interface of the On-Board Unit (OBU) (e.g., cellular, WiFi, Bluetooth). Attackers may target non-essential functions, e.g., window lifters, indicator lights, the signal horn, or the display panel, to cause minor annoyances, or safety-critical functions, e.g., acceleration, brake, steering control inputs, to cause severe consequences.

The automotive E/E system is a distributed system consisting of multiple Electronic Control Units (ECUs) interconnected with different in-vehicle networking protocols, including Ethernet, FlexRay, Media Oriented Systems Transport (MOST), CAN, and Local Interconnect Network (LIN). The CAN bus is used for low or medium-bandwidth in-vehicle networking to connect the E/E components, including ECUs, sensors, and actuators. We focus on the CAN bus in this article, which is ubiquitously deployed in almost all vehicles, regardless of the price range or the manufacturer. The CAN bus is vulnerable to attacks due to several factors: broadcast transmission, where all nodes can hear all message transmissions on the bus; ID-based priority arbitration; and lack of authentication/encryption mechanisms. One possible defense mechanism is Message Authentication for defending against message spoofing attacks, where the sender adds a Message Authentication Code (MAC) to each message, and the receiver recomputes and verifies the MAC. This approach is constrained by the limited computing power of ECUs and the small payload size of CAN messages (8 Bytes). Proposed solutions include: adding hardware coprocessors to offload computation-intensive cryptographic algorithms from the ECU [31]; selectively adding coprocessors to a partial set of ECUs to meet timing constraints while minimizing cost [21]; truncating the MAC to reduce its length and performing successful authentication of several consecutively received messages [21, 47]; dynamically adjusting the MAC size to maximize it while meeting timing constraints [54]; security-aware obfuscated priority assignment for CAN with Flexible Data-rate (CAN-FD) [16, 53], and others. Despite these research advances, MA on the CAN bus is not yet widely adopted in the automotive industry due to its high cost or high complexity. In addition, MA can only defend against message spoofing attacks, not other types of attacks such as DoS or Fuzzing.

A more practical alternative approach is to adopt an onboard Intrusion Detection System (IDS) to monitor the CAN bus traffic to detect possible attacks. Since CAN is a multi-master broadcast bus protocol, the IDS may be deployed as a separate dedicated node on the CAN bus, implemented as either hardware (FPGA or ASIC) or integrated into one of the ECUs as a software application. Upon detecting intrusions/attacks, the Intrusion Prevention System (IPS) may take immediate countermeasures, e.g., sending error frames to stop the current transmission [18]. In addition, a detailed log of the onboard security events may be sent to the Security Operations Center in the cloud, which performs in-depth post-attack analysis of the collected data both for single vehicles and the whole fleet, e.g., root cause analysis, impact analysis, for developing threat response strategies. The automotive software standard AUTOSAR (AUTomotive Open System ARchitecture) is in the process of introducing a new standard for onboard IDS [15].

There has been significant related work on CAN bus IDS (as discussed in Section 4). When formulated as a machine learning problem, existing methods either perform coarse-grained binary classification of Normal vs. Attack (Figure 1(a)); or perform multi-class classification with \( n+1 \) classes, including Normal and n Known Attacks (KAs), i.e., to detect KAs and assign fine-grained labels to them, but cannot detect any Unknown Attack (UA) (Figure 1(b)). In this article, we consider the novel problem formulation of multi-class classification with \( n+2 \) classes, including Normal, n KA classes and one UA class (Figure 1(c)), i.e., to detect both KAs and any UA, and also assign fine-grained labels to detected KAs. To justify our problem formulation: On the one hand, it is important to be able to assign fine-grained labels to KAs, since knowing the specific attack type can be quite useful for selecting appropriate counter-measures and performing post-attack analysis; on the other hand, it is important to be able to detect any UA, since attackers may come up with novel zero-day attacks that may not fit the patterns of existing attacks.

Fig. 1.

Fig. 1. Machine Learning-based approaches to CAN bus IDS.

We propose and evaluate four different methods1 and demonstrate that the best-performing method is the one based on Auxiliary Classifier Generative Adversarial Network (ACGAN) [38] in combination with Out-of-Distribution (OOD) detection [24], where the Auxiliary Classifier (AC) in ACGAN is used for assigning fine-grained labels to KAs, OOD detection is used for preliminary detection of the UA class, and the Real-Fake Classifier (RFC) in ACGAN is used for final determination of the UA class.

Table 1 contains frequently used abbreviations used in this article.

Table 1.
IDSIntrusion Detection System
KAKnown Attack
UAUnknown Attack
FFNNFeed-Forward Neural Network
CNNConvolutional Neural Network
GANGenerative Adversarial Network
DGAN Discriminator
GGAN Generator
OODOut-of-Distribution
MSPMaximum SoftMax Probability

Table 1. Abbreviations Used in This Article

The rest of the article is structured as follows: We present the problem formulation and our approach in Section 2, including the input encoding method, ACGAN, OOD detection, and our four proposed methods; performance evaluation in Section 3; discussions of related work in Section 4; and conclusions and future work in Section 6.

Skip 2PROBLEM FORMULATION AND OUR APPROACH Section

2 PROBLEM FORMULATION AND OUR APPROACH

2.1 Introduction to the CAN Bus Protocol

Figure 2 shows the format of a CAN message, which consists of a header (CAN ID) with the standard format of 11 bits and a payload of up to 8 Bytes. (We assume the standard CAN ID length of 11 bits in this article, but the extended CAN ID length of 29 bits can be easily handled by adjusting the input encoding.) Each bit in the CAN ID is either dominant (0) or recessive (1), and the dominant bit wins in case of a collision on the bus. This implies that the CAN ID determines message priority, and a message with a smaller CAN ID has higher priority. The broadcast nature of the CAN bus creates opportunities for attackers to inject malicious messages from a compromised ECU, and the simple priority arbitration protocol means that malicious messages with smaller CAN IDs can cause undue delays to legitimate messages.

Fig. 2.

Fig. 2. Format of a standard CAN message.

Figure 3 illustrates the three common types of attacks on the CAN bus addressed in this article:

Fig. 3.

Fig. 3. Three common types of attacks on the CAN bus.

  • DoS (Denial-of-Service): The attacker injects high-frequency messages with high-priority (e.g., with CAN ID 0 \( \times \) 000 or some other ID that is known to be smaller than all legitimate message IDs) to flood the bus and occupy the bus bandwidth to prevent the legitimate messages from transmitting successfully, e.g., in Figure 3(a), the messages with CAN IDs 0 \( \times \) 387 and 0 \( \times \) 590 are both delayed by the attack messages with CAN ID 0 \( \times \) 000.

  • Fuzzing: The attacker injects CAN messages with random CAN ID, to cause delays to certain legitimate messages with lower priority (larger CAN IDs), e.g., in Figure 3(b), the message with CAN ID 0 \( \times \) 387 sent by the legitimate node ECU1 is not delayed by the two attack messages due to its smaller ID value; the message with CAN ID 0 \( \times \) 590 sent by the legitimate node ECU2 is delayed by the attack message with CAN ID 0 \( \times \) 419, but not by the attack message with CAN ID 0 \( \times \) 897. This type of attack is more stealthy than DoS attack, thanks to the randomness of the CAN IDs of the injected messages.

  • Spoofing: The attacker injects messages with specific CAN IDs that belong to legitimate nodes, which may be captured by observing the traffic on the CAN bus before the attack and with modified payloads, e.g., the two attack messages have the same CAN IDs 0 \( \times \) 387 and 0 \( \times \) 590 as the legitimate nodes. The attacker’s goal is not to cause delays to normal messages, but to inject false signals to be read by other nodes. (In the public car-hacking dataset [50], two types of spoofing attacks are included, including GEAR for Gear-Shift and RPM for the engine’s rotation speed measured with Revolutions-Per-Minute.)

2.2 Input Encoding

To apply Deep Learning to CAN bus intrusion detection, we need to encode a sequence of CAN IDs into a format suitable for input to a Feed-Forward Neural Network (FFNN) (including Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN)), which takes an image as input data, which may be a 1D time series or a 2D image of pixels. We perform a preprocessing step to encode a consecutive sequence of CAN IDs into a 2D Image. Figure 4(a) shows one method of constructing an \( N\times 11 \) 2D square image by stacking a set of N (here, \( N=11 \)) CAN IDs, each 11 bits long [45]. We adopt another method of constructing a \( N\times 48 \) 2D square image with one-hot encoding [42], shown Figure 4(b). Each 11-bit CAN ID is converted into a 48-bit vector, which consists of three 16-bit vectors, each one-hot encoding one hex digit, e.g., the ID of \( 0 \times 316 \) in hexadecimal format is converted into three 16-bit vectors, each encoding a hex number 3, 1, 6, respectively. We then stack a set of N (here, \( N=48 \)) CAN IDs, each with length 48 bits, to form a \( 48\times 48 \) 2D square image. The terms “input sample” or “sample” are used to refer to each CAN image. This approach to CAN image construction is inspired by GIDS (GAN-based Intrusion Detection) [42].

Fig. 4.

Fig. 4. Two different ways of encoding a sequence of CAN IDs into a 2D image.

  • We choose the one-hot encoding method, which was shown [42] to result in larger differences and easier separation between normal images and attack images than the direct encoding method.

  • The input size N, i.e., the number of CAN messages grouped into a single CAN image, is an important hyper-parameter that can be selected to trade off between detection accuracy and computation overhead. Experimental results in Reference [42] indicate that the input size of 64 achieves the best accuracy. (In our experiments, we set \( N=48 \) to form a square image, which is the most common image shape in computer vision.)

Figure 5 shows some example CAN images obtained with one-hot encoding (refer to Section 3.1 for details on the different attack types). The differences among them may not be apparent by visual inspection, as the number of attack messages may be a small percentage of the total 48 messages. However, these differences enable the application of powerful Machine Learning (ML), especially Deep Learning (DL) methods, to distinguish between them.

Fig. 5.

Fig. 5. Example \( 48\times 48 \) CAN images for normal data and different types of KAs.

2.3 GAN and ACGAN

Figure 6(a) shows Generative Adversarial Network (GAN) [19], a framework for joint training of two models simultaneously, generator G and discriminator D. G aims to capture the data distribution. It takes as input a random noise vector z from an arbitrary latent distribution (noise prior) and generates a synthetic (fake) sample \( X_{fake} = G(z) \). D aims to distinguish between real samples and fake samples generated by G. It receives as input sample, either a real sample \( X_{real} \) or a fake sample \( X_{fake} \) from G, and outputs a probability distribution \( P(S\mid X) \) over two possible sample sources S = {real, fake}. D is a CNN backbone for feature extraction, with a binary classifier head, called the Real-Fake Classifier (RFC) in this article. The RFC is typically implemented as a Sigmoid function with optional fully connected layers preceding it. (A minor technical detail is whether the RFC is viewed as part of D or separate from D, which consists of the CNN backbone only. We take the latter view in this article.) The GAN loss function is defined as the loglikelihood of the correct source: (1) \( \begin{equation} L = E[\log P(S = real \mid X_{real})] + E[\log P(S = fake \mid X_{fake})]. \end{equation} \)

Fig. 6.

Fig. 6. Regular GAN vs. ACGAN.

D is trained to maximize L, i.e., to use the RFC to distinguish between real vs. fake samples, and G is trained to minimize the second term in L, i.e., to generate realistic fake samples that fool the RFC of D into classifying it as real.

Figure 6(b) shows ACGAN [38], an extension of GAN with an additional multiclass Auxiliary Classifier (AC) head, a SoftMax layer with optional fully connected layers preceding it, which shares the same CNN backbone D with the RFC for feature extraction. Every generated sample has a corresponding class label \( c\sim p_c \) in addition to the noise z. G uses both to generate class-conditional fake samples \( X_{fake}=G(c, z) \). For each input sample, D performs two tasks with its two classifier heads: The binary RFC classifies it into either real or fake by outputting a probability distribution \( P(S\mid X) \), regardless of its class label c; the multiclass AC assigns it a class label by outputting a probability distribution \( P(C\mid X) \) over all possible class labels, regardless of its source S being real or fake. The loss function for ACGAN has two parts: (2) \( \begin{align} L_S &= E[\log P(S = real \mid X_{real})] + E[\log P(S = fake \mid X_{fake})], \nonumber \nonumber\\ L_C &= E[\log P(C = c \mid X_{real})] + E[\log P(C = c \mid X_{fake})]. \end{align} \)

D is trained to maximize \( L_C + L_S \), and G is trained to maximize \( L_C - L_S \). In addition to \( L_S \), the same loss function of GAN (Equation (1)), the additional term \( L_C \) in ACGAN’s loss function forces G to generate realistic fake samples with some given class label c, and the AC of D to perform accurate classification of both real and fake samples with label c. It has been shown that this small extension to the standard GAN helps stabilize training and learn better representations [38].

The original design intention of ACGAN is to use the AC to help improve the GAN training process and train a better G that can generate diverse and high-resolution class-conditional photo-realistic image samples; D serves in an auxiliary role to help the training of G, and is not useful after training. In contrast, we apply ACGAN in an unconventional way: Our goal is not to generate photo-realistic image samples, but to train a better D, with its two classifier heads RFC and AC; G serves in an auxiliary role to help the training of D and is not useful after training.

2.4 Out-of-Distribution (OOD) Detection

OOD samples refer to samples that fall outside of the distribution of the training dataset, i.e., outliers/anomalies. The opposite of OOD is In-Distribution (InD), or inliers/normal data. An OOD detector [24] is a binary classifier that predicts an InD or OOD label to an input sample. OOD detection is an active research area with a wide range of algorithms and techniques [9], including our previous work [35] on using Isolation Forest (IF) or Local Outlier Factor (LOF) for outlier detection in one or more hidden layers of a CNN.

Hendrycks and Gimpel [24] presented a simple baseline method of using the predicted Maximum SoftMax Probability (MSP) to separate OOD samples (with low MSP) from InD samples (with high MSP) based on a threshold th, called the OOD threshold in this article. An input sample is determined to be OOD if its MSP does not exceed the OOD threshold. As an example, Figure 7 shows two possible probability distributions as output from the SoftMax layer for two different input samples. In both cases, the classifier predicts the MSP class Normal, but the classifier has different levels of confidence in its prediction. Suppose we set the OOD threshold \( th=0.8 \), then the input sample \( X_1 \) corresponding to the distribution \( P(C\mid X_1) \) in Figure 7(a) has \( MSP=0.9\ge th \) and is determined to be Normal; the input sample \( X_2 \) corresponding to the distribution \( P(C\mid X_2) \) in Figure 7(b) has \( MSP=0.5\lt th \) and is determined to be OOD. The MSP method is not the most accurate OOD detection algorithm, since Neural Networks often give incorrect yet over-confident predictions, but it is widely used in practice, since its accuracy is often good enough, and it is very efficient, incurring no additional overhead beyond the forward inference time. Most of the more sophisticated OOD detection algorithms [9], including our own work based on Isolation Forest (IF) and Local Outlier Factor (LOF) [35] have too high runtime overhead to be used for CAN bus IDS.

Fig. 7.

Fig. 7. Two different SoftMax distributions for illustration of OOD detection based on MSP.

2.5 Four Methods for CAN Bus IDS

2.5.1 General Observations and Statements.

The CAN bus IDS may be developed by a supplier as a general-purpose product sold to OEMs (Original Equipment Manufacturers), hence, it must be able to handle a wide range of diverse applications across different OEMs and vehicle models. Hence, the Normal class is likely to contain CAN messages from different applications with diverse features, i.e., an irregular, multimodal distribution in the feature space. In contrast, each KA is likely to have its own distinct features, with a concentrated, sharp distribution in the feature space. The UA class by definition will have diverse features, since it is novel and unseen during training time. If there is no UA, i.e., all attacks are KAs with available training data, then it is straightforward to train a classifier with Supervised Learning for either binary classification of Normal vs. Attack, or multiclass classification of Normal and n KAs. However, the problem becomes more difficult if we want to detect both KAs and any UA, and also assign fine-grained labels to detected KAs.

We make a set of important statements in Table 2. The reason for S1 is that UA samples are not in the training dataset. Hence, they are likely to result in lower MSP values. The reason for S2 is that Normal samples may have more diverse features than any of the KAs, hence, they are more difficult to classify and result in lower MSP than any of the KA classes. The reasons for S3 and S4 are that Normal samples and UA samples are more easily confused with each other, whereas each KA’s samples are more easily classified and distinguished from Normal samples and UA samples. These statements will be confirmed by experimental results later.

Table 2.
S1A UA sample is more likely to be OOD (with MSP \( \lt \) th) than a Normal or a KA sample.
S2A Normal sample is more likely to be OOD (with MSP \( \lt \) th) than a KA sample.
S3A UA sample is more likely to be misclassified as Normal than as one of the KAs.
S4A UA sample is more likely to be misclassified as Normal than a KA sample being misclassified as Normal.

Table 2. Important Statements as Design Rationales

We propose four CAN bus IDS methods as shown in Figure 8. Assuming n KAs, the fine-grained known/unknown IDS performs multiclass classification among \( n+2 \) classes \( C=\lbrace \text{Normal}, \text{KA}_1, \ldots , \text{KA}_n, \text{UA}\rbrace \), including \( n+1 \) classes (Normal and n KAs) with labeled samples in the training dataset, plus one additional UA class not in the training dataset. The four methods share the common first step of performing multiclass classification among \( n+1 \) classes \( C=\lbrace \text{Normal}, \text{KA}_1, \ldots , \text{KA}_n\rbrace \), including Normal and n KAs, This (\( n+1 \))-class classifier may be trained with Supervised Learning, as in CNN-Th, or may be trained as the AC in ACGAN. They differ in the next step of identifying/classifying UA samples, based on the statements in Table 2. The RFC in ACGAN is used by two of the four methods, which is a binary classifier with label real corresponding to one of the \( n+1 \) classes (Normal and n KAs) and the label fake corresponding to the UA class. (Note that we assign the class label UA to all possible UAs, since we do not distinguish among different types of UAs.)

Fig. 8.

Fig. 8. Four proposed methods for CAN bus IDS.

2.5.2 CNN-Th.

Figure 8(a) shows the CNN-Th method. The CNN is used for assigning fine-grained labels to KAs, and OOD detection is used for detection of the UA class (with MSP \( \lt \) th). Its design rationale is Statement S1 in Table 2: “A UA sample is more likely to be OOD than a Normal or a KA sample.” CNN-Th adopts the simplistic assumption that all OOD samples should be labeled as UA. As shown in Equation (3), we use the MSP method [24] for OOD detection: (1) For a given input sample X, compute the MSP \( P(c^*\mid X) \) over the known \( n+1 \) classes \( c_i \in C=\lbrace \text{Normal}, \text{KA}_1, \ldots , \text{KA}_n\rbrace \); (2) if the MSP exceeds the OOD threshold (\( MSP\ge th \)), then we trust this prediction and return the MSP class \( c^* \); otherwise, the input sample is determined to be OOD and labeled as UA. (3) \( \begin{align} c^*= {\left\lbrace \begin{array}{ll}\operatorname{argmax}_{c_{i} \in \mathcal {C}} P\left(c_{i} \mid X\right) & \text{ if } P\left(c^{*} \mid X\right) \ge th \\ UA & \text{ Otherwise } \end{array}\right.} \end{align} \)

2.5.3 ACGAN-AC-Th.

Figure 8(b) shows the ACGAN-AC-Th method. It is similar to CNN-Th except that the CNN is replaced by the AC in ACGAN, which is used for assigning fine-grained labels to KAs; OOD detection is used for detection of the UA class (with MSP \( \lt \) th). The RFC in ACGAN is not used here. Our experimental results indicate that the AC trained with ACGAN generally has better performance than a CNN with similar architecture but trained with Supervised Learning, since the training dataset for the AC consists of both the labeled training dataset (the real samples) and the synthetic (fake) samples generated by G. If G is well-trained and generates realistic fake samples, then this is an effective form of data augmentation, hence, the training dataset for the AC is enlarged significantly compared to the CNN, which only uses the real samples. We expect the ACGAN-based approach to be especially beneficial in the case of unbalanced datasets with limited attack data samples, which is a common situation in practice.

2.5.4 ACGAN-AC-RFC-Normal.

Figure 8(c) shows the ACGAN-AC-RFC-Normal method, a cascaded two-stage classification architecture. The AC in ACGAN is used for assigning fine-grained labels to KAs; assuming the KAs are classified correctly but UA samples may be misclassified as the Normal class, the RFC in ACGAN is used for determination of the UA class. Its design rationale includes statements S3 and S4 in Table 2: “A UA sample is more likely to be misclassified as Normal than as one of the KAs” and “A UA sample is more likely to be misclassified as Normal than a KA sample being misclassified as Normal.” The logits (feature vector from the last hidden layer of the CNN backbone D) are passed to both the first-stage multiclass AC and the second-stage binary RFC. If the AC predicts one of the KAs as the MSP class among the \( n+1 \) classes, then we trust its prediction and return that class label. Otherwise, the AC predicts Normal as the MSP class. We do not trust its prediction and refer to the second-stage RFC. If the RFC predicts real, then return Normal; if it predicts fake, then return UA.

2.5.5 ACGAN-AC-RFC-Th.

Figure 8(d) shows the ACGAN-AC-RFC-Th method, another cascaded two-stage classification architecture. The AC in ACGAN is used for assigning fine-grained labels to KAs, OOD detection is used for preliminary detection of the UA class (with MSP \( \lt \) th), and the RFC in ACGAN is used for final determination of the UA class. Its design rationale includes statements S1 and S2 in Table 2: “A UA sample is more likely to be OOD than a Normal or a KA sample” and “A Normal sample is more likely to be OOD than a KA sample.” Since not all OOD samples are UA samples, i.e., some OOD samples may be Normal or KA samples, we add the RFC to classify the UA samples more precisely. After the first-stage AC has computed the probability distribution \( P(C\mid X) \) over the \( n+1 \) classes C for input sample X, if \( MSP\ge th \), then we trust its prediction and return that class label. Otherwise, the input sample X is determined to be OOD, so we do not trust the AC’s prediction of the MSP class and look at the second stage RFC. If the RFC predicts real, then return the MSP class label from the AC, which may be Normal or one of the n KAs; if RFC predicts fake, then return UA. (Note that the RFC has the same decision logic in ACGAN-AC-RFC-Normal: If the RFC predicts real, then return Normal, which is also the MSP class label from the AC.)

For further clarification, Tables 3 and 4 compare the decision logics of ACGAN-AC-RFC-Normal in Figure 8(c) and ACGAN-AC-RFC-Th in Figure 8(d) in tabular form.

Table 3.

Table 3. With ACGAN-AC-RFC-Normal, Samples with MSP Class Normal Are Passed to the RFC

Table 4.

Table 4. With ACGAN-AC-RFC-Th, OOD Samples (with MSP \( \lt \) th) are Passed to the RFC

Skip 3PERFORMANCE EVALUATION Section

3 PERFORMANCE EVALUATION

In this section, we present the details of constructing the training and test datasets in Section 3.1; the ACGAN model details in Section 3.2; performance comparisons among our four proposed methods in Section 3.3; performance comparisons with related work in Section 3.4; and timing performance in Section 3.5.

3.1 Dataset Construction

We use the public car-hacking dataset [50] in our experiments, which was constructed by the authors of Reference [42] by logging CAN traffic via the OBD-II port from a real vehicle while message injection attacks were carried out. It consists of CAN message sequences for the Normal class and four attack classes: DoS, Fuzzing, Spoofing of GEAR or RPM messages (denoted as GEAR and RPM in short), as shown in Figure 3.

We group a consecutive sequence of \( N=48 \) CAN IDs into a CAN image with size \( 48\times 48 \), as shown in Figure 4(b). Each normal CAN image contains 48 normal CAN messages, and each KA or UA CAN image contains 48 CAN messages with at least one attack message of the specific attack type (this corresponds to attack threshold of 1 in GIDS [42]). Inspired by the common method of emulating OOD samples in research works on OOD detection [9, 35], we emulate UA samples by samples of one KA that are excluded from the training dataset. Suppose we have 3 KAs (Fuzzing, RPM, GEAR) and 1 UA (DoS). The training dataset consists of samples of \( n+1=4 \) classes, including Normal and \( n=3 \) KAs (Fuzzing, RPM, GEAR). We exclude samples of the DoS attack class from the training dataset to use them as UA samples during testing. The test dataset contains samples of \( n+2=5 \) classes, including Normal, 3 KAs (Fuzzing, RPM, GEAR), and 1 UA (DoS). In our experiments, the training dataset consists of 40,000 samples (1.92M messages), 10,000 for each of the 4 classes (Normal and 3 KAs); the test dataset consists of a total of 25,000 samples (1.2M messages), 5,000 for each of the 5 classes (Normal, 3 KAs, and 1 UA). (In addition to using DoS as the UA, we also tried other KA/UA splits and obtained similar results).

3.2 The ACGAN Model and Training Procedure

Table 5 shows the ACGAN model architecture based on the open-source repository [32], adapted to fit our problem setting, i.e., the set of classes for the AC. The CNN architecture used in the CNN-th method is identical to the Discriminator D, hence, not included in the table. (Note that more sophisticated CNN architectures may be adopted to achieve even better performance, e.g., the Inception-ResNet [23] as used by Song et al. [45], but we view this as an orthogonal issue to our main contribution, which is the high-level architecture design of the two-stage classifier shown in Figure 8.)

Table 5.
OperationFilter (K \( \times \) K/S, P)Output Feature MapBNDropoutActivation
Generator
Linear Embedding\( 6\times 6\times 128 \)
Upsample\( 12\times 12\times 128 \)
Convolution\( 3\times 3/1,1 \)\( 12\times 12\times 64 \)\( \checkmark \)Leaky ReLU
Upsample\( 24\times 24\times 64 \)
Convolution\( 3\times 3/1,1 \)\( 24\times 24\times 32 \)\( \checkmark \)Leaky ReLU
Upsample\( 48\times 48\times 32 \)
Convolution\( 3\times 3/1,1 \)\( 48\times 48\times 16 \)\( \checkmark \)Leaky ReLU
Convolution\( 3\times 3/1,1 \)\( 48\times 48\times 1 \)Tanh
Latent Dimension (z)256
Leaky ReLU slope0.2
BN momentum0.8
OptimizerAdam(lr = 0.0002, betas = (0.5,0.999))
Discriminator
Convolution\( 3\times 3/2, 1 \)\( 24\times 24\times 16 \)\( \checkmark \)Leaky ReLU
Convolution\( 3\times 3/2, 1 \)\( 12\times 12\times 32 \)\( \checkmark \)\( \checkmark \)Leaky ReLU
Convolution\( 3\times 3/2, 1 \)\( 6\times 6\times 64 \)\( \checkmark \)\( \checkmark \)Leaky ReLU
Convolution\( 3\times 3/2, 1 \)\( 3\times 3\times 128 \)\( \checkmark \)\( \checkmark \)Leaky ReLU
Linear (RFC)2sigmoid
Linear (AC)4logSoftmax
Leaky ReLU slope0.2
BN momentum0.8
Dropout0.5
OptimizerSGD(lr = 0.001 down to 0.0002, momentum = 0.9)
  • Each convolutional filter has size K \( \times \) K, with stepsize S and padding P. BN denotes Batch Normalization.

Table 5. Architecture and Hyperparameter Settings of the ACGAN

  • Each convolutional filter has size K \( \times \) K, with stepsize S and padding P. BN denotes Batch Normalization.

Our hardware platform for model training is a Linux workstation with CPU: Intel(R) Xeon(R) E5-2650 v4 @ 2.20 GHz; RAM: 16 GB; GPU: NVIDIA GeForce GTX 2080Ti. We use the deep learning framework PyTorch. We adopted the hyperparameter settings for GAN training primarily from Reference [32], but adjusted the learning rates based on Reference [29] to mitigate instability during training. For GAN training, we use the Adam optimizer for G with a constant learning rate of 0.0002; the SGD optimizer for D with an initial high learning rate of 0.001, gradually reduced to 0.0002 during the training process. It is well-known that model selection for GAN is challenging due to possible instability during training, and a longer training time does not necessarily lead to better performance [29]. Typically, the goal of GAN training is to train the G to generate realistic images, so one may select the G by visual inspection of generated images from different model checkpoints during the training process. However, our goal is different from the conventional GAN training, as we want to train the D, including both AC and RFC, to have good performance. Therefore, we adopt the approach from Reference [29]: Set aside 10% of the entire training dataset as the validation dataset, and use the remaining 90% as the actual training dataset. (Note that the typical k-fold cross-validation in Supervised Learning is not suitable for GAN training.) The training process consists of 1,000 epochs, with Batch Size 64. After each epoch, we checkpoint the model parameters and measure the overall classification performance on the validation dataset in terms of the macro F1-score (computed with Equation (6)). At the end of the training process, we select the D (the AC and the RFC) that achieves the highest macro F1-score. This implies that we may select different D for different methods in Figure 8, or even different OOD thresholds with the same method. To have fair performance comparisons, the hyperparameters for training the CNN are set to be the same as those for training D, in terms of the SGD optimizer and learning rate schedule. The entire training process lasts on average 11.5 hours for ACGAN and 9 hours for CNN, including both training time on the GPU and testing time on the validation dataset on the CPU.

3.3 Performance Comparisons among Proposed Methods

For a binary classifier, the metrics of accuracy A, recall R, precision P, and F1 score F1 are defined as follows: (4) \( \begin{align} A = \frac{TP+TN}{TP+TN+FP+FN}, \qquad R = \frac{TP}{TP+FN}, \qquad P = \frac{TP}{TP+FP}, \qquad F1 = \frac{2P\cdot R}{P+R}, \end{align} \) where TP, FP, FN stand for True Positive, False Positive, False Negative, respectively. For anomaly detection, typically the Normal class is viewed as the negative class, and the Attack class is viewed as the positive class.

For a multiclass classifier with m classes, the metrics of accuracy \( A_i \), recall \( R_i \), precision \( P_i \), and F1 score \( F1_i \) are defined for each class \( 1 \le i \le m \): (5) \( \begin{align} A_i = \frac{TP_i+TN_i}{TP_i+TN_i+FP_i+FN_i}, \quad R_i = \frac{TP_i}{TP_i+FN_i}, \quad P_i = \frac{TP_i}{TP_i+FP_i}, \quad F1_i = \frac{2P_i\cdot R_i}{P_i+R_i}, \end{align} \) where class i is the positive class, and all other classes in aggregate form the negative class in a One-vs.-Rest approach. For example, if class i is a KA, then the Normal class and all the other KAs are viewed as the negative class; If class i is Normal, then all the KAs are viewed as the negative class (which is counter-intuitive).

For a multiclass classifier, the macro versions of recall, precision, and F1 score are defined to measure the average performance of the classifier across each of the m classes [20] (note that there is no macro version of the accuracy metric): (6) \( \begin{align} R_{macro} = \frac{1}{m}\sum _i^m R_i, \qquad P_{macro} = \frac{1}{m}\sum _i^m P_i, \qquad F1_{macro} = \frac{2*P_{macro}\cdot R_{macro}}{P_{macro}+R_{macro}}. \end{align} \)

Among all the metrics above, \( F1_{macro} \) is the most comprehensive and important performance metric, since it takes into account both recall and precision of all classes.

All our four proposed methods are multiclass classifiers with \( n+2=5 \) classes. For cascaded two-stage classification architectures, we treat the two classifiers as a single aggregate classifier. In subsequent tables showing performance results, we include the metrics for each of the 4 attack classes (3 KAs and 1 UA), computed with Equation (5), but not those for the Normal class (which can be derived from the metrics for the 4 attack classes if needed), as well as the macro metrics, computed with Equation (6), by averaging across all 5 classes.

The OOD threshold is an important hyper-parameter with a large impact on the IDS performance. (The decision threshold for the sigmoid function in the RFC is another hyper-parameter, which is set to be the default threshold of 0.5 after experimentation.) Figure 9 shows how the macro F1 scores of the four methods are affected by the OOD threshold. We observe that: (1) the optimal OOD threshold setting is different for each of the three methods that are based on OOD detection; (2) different methods have different levels of sensitivity to the OOD threshold, with ACGAN-AC-RFC-Th being the least sensitive among the three. It is beneficial to be insensitive to the OOD threshold, as this reduces the burden of searching for the optimal OOD threshold at deployment time; (3) ACGAN-AC-RFC-Th has the best performance among all four methods, regardless of the OOD threshold value; (4) Since ACGAN-AC-RFC-Normal is not based on OOD detection, its F1 score is a horizontal line in Figure 9 and the same across Tables 6 to 14.

Fig. 9.

Fig. 9. Tuning the OOD threshold to maximize the macro F1 score.

Table 6.
ClassMethodRecall(%)Precision(%)F1-Score(%)
MacroCNN-Th(th=0.96)88.7089.3188.54
ACGAN-AC-Th (th=0.94)96.7196.7896.73
ACGAN-AC-RFC-Normal96.6296.7696.57
ACGAN-AC-RFC-Th (th=0.995)99.2399.2499.23
UA (DoS)CNN-Th(th=0.96)67.0874.6270.65
ACGAN-AC-Th (th=0.94)94.2490.5892.37
ACGAN-AC-RFC-Normal86.6699.8892.80
ACGAN-AC-RFC-Th (th=0.995)98.9299.6499.28
FuzzingCNN-Th(th=0.96)97.1277.7286.34
ACGAN-AC-Th (th=0.94)95.8299.3297.54
ACGAN-AC-RFC-Normal97.5092.2994.83
ACGAN-AC-RFC-Th (th=0.995)98.4099.9299.15
GEARCNN-Th(th=0.96)99.6099.7299.66
ACGAN-AC-Th (th=0.94)99.4099.6299.51
ACGAN-AC-RFC-Normal99.5895.9097.70
ACGAN-AC-RFC-Th (th=0.995)99.7099.3699.53
RPMCNN-Th(th=0.96)99.7095.4497.53
ACGAN-AC-Th (th=0.94)99.50100.0099.75
ACGAN-AC-RFC-Normal99.5699.8499.70
ACGAN-AC-RFC-Th (th=0.995)99.7099.9099.80
  • Red font denotes entries with the highest F1-Scores.

Table 6. Performance Results with Optimal OOD Thresholds for Each Method

  • Red font denotes entries with the highest F1-Scores.

Table 6 shows performance results for 4 attack classes (3 KAs: Fuzzing, GEAR, RPM and 1 UA: DoS), with optimal OOD thresholds for each of the three methods, i.e., 0.96 for CNN-Th, 0.94 for ACGAN-AC-Th; 0.995 for ACGAN-AC-RFC-Th, based on Figure 9. (We include additional performance evaluation results in the Appendix, with the same OOD threshold for all methods.) We make the following observations:

  • ACGAN-AC-RFC-Th has the best overall performance in terms of the macro metrics. It has a slightly lower F1 score (99.53%) than CNN-Th for the KA GEAR (99.66%), but the difference is small and within the range of statistical noise, and both have very good performance. This is likely because the class GEAR is inherently easy to classify.

  • CNN-th has the worst overall performance in terms of the macro metrics and especially for the UA class (emulated by DoS), with F1 score of 70.65%. This highlights the importance of having both the AC for distinguishing between Normal and KAs and the RFC for distinguishing between real (Normal and KAs) and fake data (UA). Comparing CNN-Th with ACGAN-AC-Th, we can see that AC trained with ACGAN by itself (without the RFC) already outperforms CNN trained with Supervised Learning, thanks to the massive amounts of synthetic (fake) data generated by G. Our training dataset is artificially balanced, with an equal number of samples (5,000) for each class. In practice, the training dataset is likely to be very unbalanced with limited attack data samples, and we expect the ACGAN-based approach to be even more beneficial and outperform CNN-th, as mentioned in Section 2.5.3.

3.3.1 Performance Evaluation of RFC.

Having evaluated the performance of the overall cascaded classifier, we now focus on the second-stage binary RFC used in the two methods ACGAN-AC-RFC-Normal and ACGAN-AC-RFC-Th. Recall that the RFC’s goal is to classify UA samples as fake, and all other samples (Normal and KAs) as real. Table 7 shows the Confusion Matrix definition for the RFC, where the Ground Truth (GT) positive is the UA class, and the GT negative is defined as either Normal or one of the KAs (denoted KAi) aggregated together. Note the difference from a traditional binary classifier for anomaly detection, where the GT positive is the Attack class (anomaly), and the GT negative is the Normal class. For example, an input sample with GT class KAi is a GT negative sample. It is an FP if it is classified as fake by RFC, hence, predicted to be the UA class; it is a TN if it is classified as real by RFC, hence, predicted to be either Normal in case of ACGAN-AC-RFC-Normal or the MSP class among the \( n+1 \) classes (Normal and n KAs) in case of ACGAN-AC-RFC-Th. In the latter case, it may be given the wrong label, hence, should be an FP or FN from the overall cascaded classifier’s perspective, yet it is still a TN from the RFC’s perspective.

Table 7.

Table 7. Confusion Matrix Definition for RFC

Table 8 shows the composition of the samples passed to the RFC, and Table 9 shows the confusion matrices and metrics for the RFC, with three different OOD threshold settings. (We do not show the multiclass confusion matrix for the overall classifier due to space limitations.) Table 8 can be used to confirm the statements in Table 2 in Section 2.4:

Table 8.
Method# SamplesGT PosGT Neg
UANormalKAi
ACGAN-AC-RFC-Normal9,5404,3934,992155
ACGAN-AC-RFC-Th (th=0.94)5,8064,818845143
ACGAN-AC-RFC-Th (th=0.96)5,9954,892915188
ACGAN-AC-RFC-Th (th=0.995)9,4274,9944,068365

Table 8. Composition of Samples Passed to the RFC

Table 9.
MethodTPTNFPFNRecall (%)Precision(%)F1 Score(%)
ACGAN-AC-RFC-Normal4,3335,14256098.6399.8899.26
ACGAN-AC-RFC-Th (th=0.94)4,767977115198.9499.7799.35
ACGAN-AC-RFC-Th (th=0.96)4,8671,09762599.4999.8899.68
ACGAN-AC-RFC-Th (th=0.995)4,9464,415184899.0499.6499.34

Table 9. Confusion Matrices and Classification Metrics for RFC

  • For S1 “A UA sample is more likely to be OOD than a Normal or a KA sample” and S2 “a Normal sample is more likely to be OOD than a KA sample,” consider the three rows for ACGAN-AC-RFC-Th with different OOD thresholds: They all have the numeral ranking # UA samples \( \gt \) # Normal samples \( \gt \gt \) # KA samples, e.g., the row for ACGAN-AC-RFC-Th (\( th=0.94 \)) shows that among 5,806 OOD samples, 4,818 are UA samples, 845 are Normal samples, and 143 are KA samples. (Keep in mind that the test dataset is balanced with 5,000 samples for each class.)

  • For S3 “A UA sample is more likely to be misclassified as Normal than as one of the KAs,” consider the row for ACGAN-AC-RFC-Normal: Among 5,000 UA samples, 4,393 are misclassified as Normal, and 607 are misclassified as one of the KAs.

  • For S4 “A UA sample is more likely to be misclassified as Normal than a KA sample being misclassified as Normal,” consider the row for ACGAN-AC-RFC-Normal: Among the 9,540 samples classified as Normal, 4,393 are UA samples (misclassified); only 155 are KA samples (misclassified); the remaining 4,992 are Normal samples (correctly classified).

3.3.2 Summary and Discussions.

We draw the following conclusions from the experimental results:

  • OOD detection alone is insufficient in identifying the UA samples, hence, we cannot blindly label all OOD samples to be UA and must rely on the second-stage RFC to classify UA samples. The OOD samples are likely to contain a mixture of Normal, KA, and UA samples (though UA samples make up the highest percentage), and a higher OOD threshold causes more input samples to be determined to be OOD across all the classes. Fortunately, the RFC has very good performance, proving the effectiveness of ACGAN training.

  • Among the two-stage multiclass classification architectures, ACGAN-AC-RFC-Th is superior to ACGAN-AC-RFC-Normal, i.e., it is better to pass OOD samples (with MSP \( \lt \) th) than to pass samples labeled Normal to the RFC. Among the total 5,000 UA samples in the test dataset, with ACGAN-AC-RFC-Normal, 4,393 UA samples are misclassified as Normal and passed to RFC; The remaining 607 UA samples are misclassified as one of the KAs, hence, not passed to the RFC. With ACGAN-AC-RFC-Th, much more UA samples (4,818–4,994 for three different OOD thresholds) are passed to the RFC, regardless of how the UA samples are misclassified. This explains the lower recall (86.66%) of ACGAN-AC-RFC-Normal in comparison to ACGAN-AC-RFC-Th in Table 6.

  • The optimal OOD threshold for ACGAN-AC-RFC-Th is dependent on the relative performance of the AC and the RFC. If the RFC has good performance relative to the AC, then we should set a higher OOD threshold and pass more OOD samples to it and vice versa. In other words, give more responsibility to someone you trust more. In the extreme case when the RFC has perfect performance (recall=precision=1), then the OOD threshold should be set to \( th=1.0 \), causing all samples to be determined OOD and passed to the RFC. This extreme case is equivalent to reversing the order of the AC and the RFC: The RFC is applied first to filter out the UA samples labeled fake, and the samples labeled real are passed further to the second-stage AC for fine-grained classification of Normal and KAs. In practice, with \( th=1.0 \), the RFC is likely to have significantly degraded precision due to the massive number of negative samples (with GT Normal or KAi) it receives. (We tried setting \( th=1.0 \) and obtained a macro F1 score of 98.48%, slightly worse than 99.23% with \( th=0.995 \), since the RFC is good, but not perfect).

3.4 Performance Comparisons with Related Work

Among the related works listed in Table 12, a handful of papers also use the same car-hacking dataset [50] in their experiments. Table 10 shows performance comparisons of such papers that use the same dataset on the accuracy, recall, precision, and F1 score, averaged across four attacks (DoS, Fuzzing, GEAR spoofing, and RPM spoofing). However, the metrics of our approach may not be directly comparable with the related works, due to the differences between the dataset construction method and the dataset features used.

Table 10.
WorkMethodAccuracy (%)Recall (%)Precision (%)F1 (%)
Alshammari et al. (2018)KNN [4]97.496.394.793.4
Alshammari et al. (2018)SVM [4]96.595.795.293.3
Barletta et al. (2020)XYF-K [6]99.198.3910098.79
Olufowobi et al. (2019)SAIDuCANT [39]87.2186.6698.2492.0
Lokman et al. (2018)SSAE [33]-98.598.098.0
Song et al. (2020)DCNN [45]99.9399.8499.8499.91
Song et al. (2021)DCNN+Self-Sup. [43]95.3793.4296.7194.51
Ashraf et al. (2020)LSTM-AE [5]99.099.010099.0
Yang et al. (2021)MTH-IDS [57]99.99999.99999.999499.999
Our work (macro metrics)ACGANN/A99.2399.2499.23
  • Refer to Table 12 for more details on each method. (Note that our work (last row) performs multiclass classification and uses macro metrics defined in Equation (6)).

Table 10. Performance Comparisons with Related Work That Use the Same Car-hacking Dataset [50]

  • Refer to Table 12 for more details on each method. (Note that our work (last row) performs multiclass classification and uses macro metrics defined in Equation (6)).

Dataset construction method: In all the related works in Table 10, the training and the test datasets each consist of four datasets, each containing a mixture of Normal messages and one type of attack, i.e., (Normal + DoS), (Normal + Fuzzing), (Normal + GEAR), (Normal + RPM). The metrics accuracy, recall, precision, and F1 score are computed by averaging the corresponding metrics for binary classification of Normal vs. Attack on each dataset, as shown in Figure 1(a). i.e., no fine-grained label is provided for each attack type. The implicit assumption is that there is at most one attack type at any given time. In contrast, our training dataset consists of a mixture of (Normal + Fuzzing + GEAR + RPM) messages, and our test dataset consists of a mixture of (Normal + Fuzzing + GEAR + RPM + DoS) messages (assuming DoS is the UA). The multiclass classifier returns one class among the five classes (Normal, Fuzzing, GEAR, RPM, UA), as shown in Figure 1(c). (Refer to Section 3.1 for more details.) Hence, our problem setup is more general and can handle simultaneous occurrence of multiple KAs, gives fine-grained labels for each KA, and also detects the UA. This results in a more challenging multiclass classification problem than related works, which perform binary classification for each dataset with one attack type only. Due to this difference in problem setup, the performance metrics are also necessarily different: Our multiclass classifier uses the macro versions of recall, precision, and F1 score metrics defined in Equation (6)) for a single test dataset, while related works use the binary classification metrics defined in Equation (4), averaged over multiple test datasets.

Dataset features used: As discussed in Section 5, it is not fair to use payload as the part of the input for the public car-hacking dataset; only the CAN ID should be used instead. Most related works in Table 10 use CAN ID + payload, except for Song et al. [45] and Song et al. [43], who are the authors that collected the original dataset; and SAIDuCANT [39], which relies on message timing instead of message content (ID or payload) for intrusion detection.

3.5 Timing Performance

In this section, we perform quantitative evaluation of the timing performance, i.e., detection latency, of our best-performing method ACGAN-AC-RFC-Th. For CAN IDS with input of a sequence of messages, the detection latency consists of two factors: the amount of time for assembling a given number of consecutive messages as input to the RNN or CNN and the execution time of the detector/classifier itself.

3.5.1 Time Window Sizes of Each CAN Image.

During runtime operation, we can adopt a sliding-window approach and move the input time window (in terms of the number of consecutive messages) forward with step size 1, i.e., by one message at a time whenever a new message appears on the bus; or a non-overlapping window approach, and move the window forward each time the step size equal to the window size; or an intermediate approach of moving the window forward with step size between 1 and the window size. Figure 10 illustrates different step sizes (assuming hypothetically a sliding window of size 4). Different step sizes lead to different tradeoffs between detection latency and runtime overhead, e.g., step size 1 results in the lowest detection latency but highest runtime overhead. Therefore, a larger time window size does not necessarily imply a longer detection latency, since we can adopt a sliding-window approach with small step size. (As the choice of different sliding-window approaches is orthogonal to the detection algorithm, related works typically do not specify this aspect explicitly).

Fig. 10.

Fig. 10. Different step sizes assuming sliding window of size 4. Each box represents a CAN message.

We gathered the statistics of mean and variance for each non-overlapping window of 48 messages forming a CAN image for each type of dataset in the original public car-hacking dataset, shown in Table 11. (We also gathered the same statistics for each sliding window with step size 1, and the numbers are almost the same). We observe that the Normal dataset has very small variance, consisting of periodic messages with period 0.5 ms. Other datasets, especially DoS and fuzzing datasets, have large variances, and GEAR/RPM spoofing datasets have medium variances. This is consistent with our intuition of these attack types. For datasets with large time windows or large variances, it is reasonable to adopt smaller step sizes to reduce the detection latency.

Table 11.
DatasetMean (ms)Variance
Normal24.073.38
DoS36.03927.27
Fuzzing36.091,330.85
GEAR25.7289.41
RPM24.7651.04

Table 11. Time Window Sizes for Each CAN Image of 48 Messages

3.5.2 Classifier Latency and Memory Overhead.

Our embedded hardware platform is Raspberry Pi 4 Model B with 8 GB memory and quad-core ARM Cortex-A72 CPU at 1.5 GHz clock speed. The Cortex-A72 ARM CPU is representative of a medium-to-high-end automotive ECU from a hardware resource and performance perspective. We adopt the TVM compiler for Deep Learning [12] to generate C code to run on the target device, which was shown to outperform hand-optimized TensorFlow Lite for well-known neural network architectures such as ResNet and MobileNet [12]. We consider both single-core execution, where the neural network inference is constrained to execute on a single core; and multi-core execution, where the neural network inference executes in parallel on all four cores. The weights are single-precision 32-bit floating point. Figure 11 shows the execution time distributions for single or multicore execution of the ACGAN-AC-RFC-Th model. For single-core execution, the average execution time and standard deviation are 0.538 ms and 0.030 ms, respectively. For multicore execution, the average execution time and standard deviation are 0.203 ms and 0.032 ms, respectively. This is comparable to the average execution time of 0.574 ms on a Raspberry Pi 3 device with quad-core ARM Cortex-A53 CPU at 1.2 GHz clock speed for the classic ML approach in MTH-IDS [57] (it is not specified whether single-core or multicore execution is used) and well within the 10 ms latency requirement for in-vehicle IDS.2 We also observe that multicore execution achieves significant speedup compared to single-core execution, but the speedup is much less than 4-fold on the quad-core Cortex-A72 processor.

Fig. 11.

Fig. 11. Execution time distributions for single or multicore execution of the ACGAN-AC-RFC-Th model.

In addition, we used the tool PyTorch-OpCounter3 to obtain the total number of parameters of the ACGAN-AC-RFC-Th model to be 104,518, which amounts to 418.1 KB for 32-bit weights and 104.5 KB for 8-bit weights. This is a quite small model compared to well-known models such as ResNet and MobileNet with millions of parameters and can fit into the L2 cache of the Cortex-A72 ARM CPU, which has cache size ranging from 512 KB to 4 MB.4

Skip 4RELATED WORK Section

4 RELATED WORK

CAN bus IDS is an active research area due to its critical importance in automotive cyber-security [52]. The detection approaches can be broadly categorized into either rule-based, where humans design detection rules manually, or ML-based, where an ML model is trained from data. Rule-based approaches include: methods based on physical fingerprints, e.g., clock [59] or voltage measurements [14]; methods based on message timing [39, 44] or frequency [48]; methods based on message ID entropy [51] or Hamming distance [46], and many others. Due to the increasing complexity and diversity of in-vehicle network workloads, rule-based methods are generally viewed as not as accurate or flexible/adaptable as ML-based methods [56]. There is a wide variety of ML algorithms, including classic ML algorithms such as Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Naive Bayes, Decision Trees, and so on, and the more recent Deep Learning (DL) algorithms. Next, we present a summary and comparative analysis of ML-based approaches to CAN bus IDS. They can be broadly categorized into [11]:

  • Supervised Learning for classification requires a labeled training dataset with samples of both Normal and the KAs and learns a discriminative decision boundary that separates them, with either binary classification of Normal vs. Attack or multiclass classification of Normal and fine-grained labels of KAs. It returns the probability \( P(c\mid X) \) that the input sample X has label c, with either a Sigmoid function for binary classification or a SoftMax function for multiclass classification. The drawback of Supervised Learning is that it cannot detect the UA class well. (We can combine Supervised learning with OOD detection to detect the UA class, e.g., CNN-Th in Figure 8(a), but its performance is generally unsatisfactory).

  • Semi-Supervised Learning for anomaly detection requires only Normal samples during training, not labeled KA samples. Anomalies are detected as outliers that deviate significantly from Normal samples. The drawback of this approach is that it cannot assign fine-grained labels to KAs.

  • Hybrid approaches that combine Supervised and Unsupervised/Semi-Supervised Learning, e.g., GAN, ACGAN (as in this article).

Deep Learning architectures based on multi-layer Neural Networks can be broadly categorized into FFNN and RNN. An FFNN by definition has no feedback or recurrent connections and consists of multiple layers that progressively extract more abstract and higher-level representations (features), which are used by downstream classification or regression tasks. CNNs are the most popular FFNN, but there are also other types, including Multi-Layer Perceptrons (MLPs), Deep Belief Network (DBN), Capsule Networks, and Transformers. An RNN contains recurrent connections to capture the temporal dimension of the input data. Vanilla RNNs suffer from short-term memory and the vanishing gradient problem, hence, Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) have been proposed to remedy their shortcomings.

CAN bus IDS is inherently a problem of detecting anomalous patterns in a time-series sequence of messages, and just observing one message may not be sufficient for anomaly detection in most cases. RNNs, including its variants LSTM and GRU, can be naturally applied to capture the temporal dimension of multiple messages in a time window. It is also possible to take into account the temporal dimension with a Feed-Forward Neural Network (FFNN) (including MLP and CNN) by stacking multiple messages in a time window into a 1D or 2D image as its input, as shown in Figure 4.

Next, we discuss representative related works, summarized in Table 12. (This selection is intended to be representative, not exhaustive, and we refer the interested reader to several survey papers for more comprehensive coverage [26, 52, 56]).

Table 12.

Table 12. Summary of Representative Approaches in Related Works

4.1 Specification-based IDS

Olufowobi et al. [39] present Specification-based (i.e., rule-based) Automotive Intrusion Detection using CAN Timing (SAIDuCANT), which uses the real-time model of the CAN bus to specify intended behavior and detect timing violations as signs of intrusions.

4.2 Classic ML-based IDS

Alshammari et al. [4] apply two classic machine learning techniques, KNN and SVM, for binary classification of Normal vs. Attack. Barletta et al. [6] present X–Y fused Kohonen network with K-means clustering (XYF–K) for binary classification. Yang et al. [57] present a multitiered hybrid IDS that incorporates a signature-based IDS and an anomaly-based IDS to detect KAs and UA with rather complex workflow consisting of four stages and 12 algorithms, which may present a steep learning curve to engineers who need to apply this framework in practice to address a new problem with different datasets and/or attack types.

4.3 FFNN-based IDS

Kang et al. [27] present an MLP-based binary classifier with two hidden layers using 64 bits of the CAN payload as 64-dimensional input. Song et al. [45] present an IDS for detecting KAs based on an advanced CNN architecture Inception-ResNet [23] for image classification. Song et al. [43] present a self-supervised method for detecting any UA using noised pseudo normal data, which consists of two deep-learning models of the generator and the detector (the same Inception-ResNet used in Reference [45]), which generates noised pseudo normal data and detects anomalies, respectively.

4.4 RNN-based IDS

Rehman et al. [40] present a hybrid CNN and attention-based GRU model, where the CNN extracts features as input to the downstream GRU model for classification. Hossain et al. [25] used an LSTM trained on both normal and attack messages to perform either multiclass classification with four labels: Normal and three KAs, or binary classification with two labels: Normal vs. Attack.

Taylor et al. [49] employ an LSTM trained on normal messages only to predict the next message’s payload from a past window of messages, as shown in Figure 12. A message is considered malicious if its payload deviates from the predicted value significantly.

Fig. 12.

Fig. 12. RNN-based prediction.

4.5 Autoencoder-based Reconstruction

An autoencoder (AE) is trained on normal data only and consists of an encoder that maps the input sample into the latent space representation in the bottleneck hidden layer and a decoder (generator) that maps the representation to a reconstruction of the input sample, as shown in Figure 13. The autoencoder is then used to compress an input sample with the encode then reconstruct it with the decoder. The reconstruction error, computed as the difference between the input sample and its reconstruction, can be used as the anomaly score. A large reconstruction error indicates a likely anomaly. The encoder and decoder in LSTM may be either FFNNs or RNNs.

Fig. 13.

Fig. 13. Autoencoder-based reconstruction.

Lokman et al. [33] present an MLP-based Stacked Sparse Autoencoder (SSAE) that computes the anomaly score based on the difference between an input message sequence and the reconstructed sequence. Longari et al. [34] present an LSTM-based autoencoder and an anomaly detector that works by computing the statistical characteristics of reconstruction errors over a separate validation dataset; then, it assigns a distance score that indicates how far a given reconstruction error is from the expected normal distribution. Hanselmann et al. [22] use one independent input LSTM model for each CAN, and the outputs of all input LSTM models are aggregated and fed into a fully connected subnetwork with an autoencoder structure to take into account inter-dependencies between messages of different IDs. Ashraf et al. [5] present an LSTM-based autoencoder and use a statistical feature extraction technique to capture contextual features as input instead of using the raw message bits as input.

4.6 GAN-based IDS

We can further divide GAN-based IDS into two types:

  • GAN-based classification: use D as a binary RFC and label the real input samples as Normal and the fake samples as Attack.

  • GAN-based reconstruction: similar to autoencoders, detect attacks as anomalies based on reconstruction errors.

4.6.1 GAN-based Classification.

Seo et al. [42] present GIDS, GAN-based CAN bus IDS with cascaded binary classifier architecture, as shown in Figure 14. The first-stage binary classifier is the GAN discriminator D trained with Supervised learning (without the GAN generator G) to distinguish between Normal and one specific KA. It has good performance for the specific KA class in the training dataset, but not for other types of KAs or any UA. The second-stage binary classifier is the D, trained with GAN jointly with G, to distinguish between real (Normal) and fake (Attack) data, using only Normal samples in the training dataset. GIDS has a binary classifier in the first stage, which is only effective for the one KAi class in its training dataset. For samples of another KA that is not KAi, the burden falls on the second-stage RFC to classify them accurately. There is thus a dilemma of which binary classifier to use among the n binary classifiers trained with each of the n KAs, with the most logical choice being the one trained with the most frequently occurring KAi class. We avoid this dilemma by using the multiclass AC as the first-stage classifier in ACGAN-AC-RFC-Normal and ACGAN-AC-RFC-Th.

Fig. 14.

Fig. 14. GAN-based CAN bus IDS (GIDS) [42].

Yang et al. [58] present a GAN-based IDS for multiclass classification of KAs. Xie et al. [55] considered the CAN communication matrix, and grouped all messages from the same sender into a data block as input to the IDS, and achieved improved performance compared to Reference [58].

4.6.2 GAN-based Reconstruction.

Donahue et al. present BiGAN (Bidirectional GAN) [17], a type of GAN where generator G not only generates data samples from the latent representation, but also includes an encoder for inverse mapping from data samples to the latent representation. This encoder enables an additional term of reconstruction loss in the GAN loss function, which can be used for anomaly detection based on the reconstruction error. Schlegl et al. present AnoGAN [41] with the loss function defined as a weighted sum of the discriminator loss and the reconstruction loss. Akcay et al. present GANomaly [2], which combines autoencoder and GAN, including the following subnetworks: an autoencoder as the generator, a real/fake classifier as the discriminator, and another encoder that compresses the generated image to its latent representation. The GAN-based reconstruction approaches have shown good performance for anomaly detection in high-resolution images, but they also have much higher runtime overhead due to the complex network architecture, hence, may not be suitable for deployment in resource-constrained in-vehicle embedded systems, and we are not aware of any work on CAN bus IDS using the reconstruction-based approach.

4.7 Summary and Comparative Analysis

Figure 15 compares different approaches to building a single one-stage classifier for CAN bus IDS, with the simplistic assumption of 1D Gaussian distribution for each class, showing a flatter distribution for the Normal class and sharper distributions for KA classes. Another simplistic assumption is that all UA samples are OOD, which has been shown to be not true in Section 3.3.1. The distribution of the UA class are not shown explicitly, which should have significant overlaps with that of the Normal class, based on our previous observation that “Normal samples and UA samples are more easily confused with each other, whereas each KA’s samples are more easily classified and distinguished from Normal samples and UA samples.” (Note that Figure 15 does not cover the GAN-based methods in Table 12.)

Fig. 15.

Fig. 15. Intuitive illustration of different approaches to building classifiers for CAN bus IDS. The decision boundaries are visualized as vertical dotted lines.

  • Figure 15(a): Binary classifier trained with Semi-Supervised Learning for anomaly detection using Normal samples only. It can perform coarse-grained classification of Normal vs. Attack, i.e., any input sample that deviates from distribution of the Normal class is classified as Attack, but not fine-grained classification of KAs. Since labeled samples of KAs are not included in the training dataset, it cannot detect KAs effectively by exploiting their unique features.

  • Figure 15(b): Binary classifier trained with Supervised Learning (the two Normal distributions on the sides represent two KAs that are given the same coarse-grained label Attack). It can perform coarse-grained classification of Normal vs. Attack, but is ineffective in fine-grained classification of KAs.

  • Figure 15(c): Multiclass classifier trained with Supervised Learning. It can perform fine-grained classification of Normal and KAs, but is ineffective in classifying UA samples.

  • Figure 15(d): Multiclass classifier trained with Supervised Learning, combined with OOD detection. It can perform fine-grained classification of Normal and KAs and also classify UA samples. This corresponds to CNN-Th (Figure 8(a)) and ACGAN-AC-Th (Figure 8(b)) architectures in this article.

To summarize: In the presence of the UA class, a binary classifier trained with Semi-Supervised Learning (Figure 15(a)) can be effective, but it can only perform coarse-grained classification. A binary classifier (Figure 15(b)) or multiclass classifier (Figure 15(c)) trained with Supervised Learning is ineffective and should not be used. Adding OOD detection (Figure 15(d)) mitigates the problem, but is still insufficient. We assert that a two-stage cascaded architecture must be adopted for achieving both fine-grained classification of KAs and effective detection of the UA class, and the second-stage RFC for classifying the UA samples plays a crucial role in the overall classifier performance. In certain Operational Design Domains, where the attack classes encountered during operation are likely to be all KAs with no or very low chance of UA, straightforward Supervised Learning methods may be sufficient.

Skip 5DISCUSSIONS ON FEATURE SELECTION: CAN ID OR PAYLOAD Section

5 DISCUSSIONS ON FEATURE SELECTION: CAN ID OR PAYLOAD

The CAN bus IDS may use CAN ID only, payload only, or both CAN ID and payload as the input features to a classifier. The appropriate choice of input features depends on the attack scenario. Since normal CAN messages are typically periodic, this regular timing pattern of messages makes it amenable to anomaly detection based on CAN-ID only. Consider the three common types of attacks in Figure 3: They all rely on setting fake CAN IDs for the attack messages, hence, they are all detectable with CAN ID only. For example, to detect RPM spoofing attacks, we can use the CAN ID and detect anomalies in the sequence of CAN IDs based on the periodicity of normal messages, e.g., if the normal RPM message has a period of 10 ms and we see much more frequent RPM messages within a small time window, since the spoofing attack messages are typically sent with much higher frequency than normal messages to increase the success of attack [39]. Or, we can use the payload and detect anomalies in the time series of RPM values encoded as signals in the message payload, e.g., abrupt increase or decrease of the RPM value in a sequence of smoothly varying values.

Certain types of attacks are payload-based attacks, and the payload contains relevant information and should be used to detect them, e.g., anomalous variations in the signal values contained in the payload. But certain types of attacks are CAN ID-based attacks, e.g., for DoS, fuzzing, and spoofing attacks shown in Figure 3 and included in the car-hacking dataset [50]; the CAN ID is the critical feature that distinguishes the attack messages from the normal messages. However, some related works use the payload as part of the input features for IDS, which may be problematic, since the payloads in the public car-hacking dataset are generated randomly and form very distinguishing/discriminative features that are an artifact of the dataset generation process and may not be present in real attacks. Seo et al. [42] describe the dataset generation procedure. The payloads of the attack messages are generated randomly, hence, they are very different from the payloads of normal messages, so of course it is very effective to use the message payload as features for Normal vs. Attack classification. But this is an artifact of the dataset generation method, and real-world attacks may be very different, as a smart attacker may not use random payloads for attack messages, but instead assign similar signal values to make them look like normal message payloads, or in case of spoofing attacks, capture and replay the signal values of normal messages and use them as attack message payloads. Therefore, at least for this dataset, the message payload does not contain relevant information and should not be used as input features. Methods that use the payload (e.g., References [4, 5, 6, 33, 39, 57] in Table 10) have an unfair advantage over those that use CAN ID only for classification. For example, MTH-IDS by Yang et al. [57] has four features (“CAN ID,” “DATA[5],” “DATA[3],” and “DATA[1]”) after feature selection as the input. It is interesting to note that the authors who collected the public dataset have published three papers themselves using this dataset [42, 43, 45], and all of them used CAN ID only, which is the correct approach.

Even for cases where both CAN ID and payload contain relevant information, using the CAN ID only for detection has the following benefits:

  • It incurs lower runtime overhead, since the payload does not need to be decoded, which is important for resource-constrained in-vehicle systems;

  • It is independent of the message format/payload length, hence, is applicable to both the classic CAN and the more recent higher-bandwidth CAN bus stands without change, e.g., CAN-FD with payload size of 64 Bytes [16] and CAN-XL with up to 2,048 Bytes [37]. The larger payload size, especially for CAN-XL, may render many current payload-based methods ineffective, since the large payload field may overwhelm the CAN ID field in the input feature space, so it may be necessary to reduce the feature dimension by feature engineering.

  • Payload-based detection is likely to be application-specific. Consider RPM spoofing attacks: The normal range and pattern of RPM values may be specific to each user/driver, and the classifier may need to be retrained/fine-tuned for different user behavior profiles, i.e., a classifier for detecting RPM spoofing attacks that is trained on a dataset of message payloads from a defensive driver may not work well on a test dataset from a race car driver, since the RPM signal may have very different patterns of variation for them. However, CAN ID-based detection is more application-neutral and more robust to user behavior variations.

Skip 6CONCLUSIONS AND FUTURE WORK Section

6 CONCLUSIONS AND FUTURE WORK

In this article, we address the problem of CAN bus intrusion detection, with the objective of detecting both KAs and any UA, and also assign fine-grained labels to detected KAs. Performance evaluation demonstrates the effectiveness of our proposed methods, especially ACGAN-AC-RFC-Th, a cascaded two-stage classification architecture, with the multi-class AC in the first stage for classification of Normal and KAs, passing OOD samples to the binary RFC in the second stage for detection of the UA class. Performance evaluation demonstrates that our method achieves both high classification accuracy and low runtime overhead. It is also conceptually simple and user-friendly, thanks to modern Deep Learning frameworks such as PyTorch and DNN compilers such as TVM.

Recently, researchers have proposed an emerging class of stealthy attacks that present significant challenges to current IDSes. Cho and Shin [13] present the bus-off attack, which exploits the error-handling mechanism of the CAN bus to shut down victim ECUs. Bloom [7] present WeepingCAN, a refinement of the bus-off attack that is more stealthy and can evade detection. Kulandaivel et al. [30] present CANnon, which leverages the peripheral clock gating feature to insert arbitrary bits at any time instance. A future research topic is the detection and mitigation of this type of stealthy attack, which may require secure network architectures [8], physical-layer IDS, and secure transceivers.

APPENDIX

A ADDITIONAL EXPERIMENTAL RESULTS

We present additional experimental results with different settings of the OOD threshold th, with \( th=0.96 \) (the optimal OOD threshold for CNN-Th) in Table 13 and \( th=0.94 \) (the optimal OOD threshold for ACGAN-AC-Th) in Table 14. (We use references to Figure 8 to replace the method names to save space.) Even though they are not the optimal OOD threshold setting for ACGAN-AC-RFC-Th, it still achieves the highest macro F1 score for both threshold values. This is consistent with the results shown in Figure 9.

Table 13.
ClassMethodRecall(%)Precision(%)F1-Score(%)
MacroFigure 8(a)88.7089.3188.54
Figure 8(b)96.5096.7096.54
Figure 8(c)96.6296.7696.57
Figure 8(d)98.9598.9898.95
UA (DoS)Figure 8(a)67.0874.6270.65
Figure 8(b)96.1888.3092.07
Figure 8(c)86.6699.8892.80
Figure 8(d)97.3499.8898.59
FuzzingFigure 8(a)97.1277.7286.34
Figure 8(b)93.5499.8196.57
Figure 8(c)97.5092.2994.83
Figure 8(d)98.4699.7099.07
GEARFigure 8(a)99.6099.7299.66
Figure 8(b)99.3699.9099.63
Figure 8(c)99.5895.9097.70
Figure 8(d)99.6699.3499.50
RPMFigure 8(a)99.7095.4497.53
Figure 8(b)99.46100.0099.73
Figure 8(c)99.5699.8499.70
Figure 8(d)99.6699.9499.80

Table 13. Performance Results with \( th=0.96 \) , the Optimal OOD Threshold for CNN-Th

Table 14.
ClassMethodRecall(%)Precision(%)F1-Score(%)
MacroFigure 8(a)88.6389.2588.26
Figure 8(b)96.7196.7896.73
Figure 8(c)96.6296.7696.57
Figure 8(d)98.5798.6098.57
UA (DoS)Figure 8(a)59.9479.0368.18
Figure 8(b)94.2490.5892.37
Figure 8(c)86.6699.8892.80
Figure 8(d)95.3499.7797.50
FuzzingFigure 8(a)97.4474.4084.38
Figure 8(b)95.8299.3297.54
Figure 8(c)97.5092.2994.83
Figure 8(d)98.8097.9898.39
GEARFigure 8(a)99.6099.6899.64
Figure 8(b)99.4099.6299.51
Figure 8(c)99.5895.9097.70
Figure 8(d)99.6899.1199.39
RPMFigure 8(a)99.7094.0796.81
Figure 8(b)99.50100.0099.75
Figure 8(c)99.5699.8499.70
Figure 8(d)99.6499.9499.79

Table 14. Performance Results with \( th=0.94 \) , the Optimal OOD Threshold for ACGAN-AC-Th

Footnotes

  1. 1 Our code is open-source, available at https://github.com/leyiweb/CAN_GAN_Anomaly.

    Footnote
  2. 2 From Reference [57]: According to the U.S. Department of Transportation, the highest priority vehicle safety services, such as collision and attack warnings, should have a latency of at most 10 to 100 ms [1]. However, for autonomous or cooperative driving, the V2X traffic safety applications require a stringent latency requirement of 10–20 ms [36]. Thus, for a vehicle-level IDS, the time needed to process each network packet is required to be less than 10 ms to meet the real-time or latency requirements.

    Footnote
  3. 3 https://github.com/Lyken17/pytorch-OpCounter.

    Footnote
  4. 4 https://en.wikipedia.org/wiki/AR_Cortex-A72.

    Footnote

REFERENCES

  1. [1] Abualhoul Mohammad Y., Shagdar Oyunchimeg, and Nashashibi Fawzi. 2016. Visible light inter-vehicle communication for platooning of autonomous vehicles. In IEEE Intelligent Vehicles Symposium (IV). IEEE, 508513.Google ScholarGoogle Scholar
  2. [2] Akcay Samet, Atapour-Abarghouei Amir, and Breckon Toby P.. 2018. GANomaly: Semi-supervised anomaly detection via adversarial training. In Asian Conference on Computer Vision. Springer, 622637.Google ScholarGoogle Scholar
  3. [3] Akçay Samet, Breckon Amir Atapour-Abarghouei, and Toby P.. 2019. Skip-GANomaly: Skip connected and adversarially trained encoder-decoder anomaly detection. In International Joint Conference on Neural Networks (IJCNN’19). IEEE, 18.Google ScholarGoogle ScholarCross RefCross Ref
  4. [4] Alshammari Abdulaziz, Zohdy Mohamed A., Debnath Debatosh, and Corser George. 2018. Classification approach for intrusion detection in vehicle systems. Wirel. Eng. Technol. 9, 4 (2018), 7994.Google ScholarGoogle Scholar
  5. [5] Ashraf Javed, Bakhshi Asim D., Moustafa Nour, Khurshid Hasnat, Javed Abdullah, and Beheshti Amin. 2020. Novel deep learning-enabled LSTM autoencoder architecture for discovering anomalous events from intelligent transportation systems. IEEE Trans. Intell. Transport. Syst. 22, 7 (2020), 45074518.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. [6] Barletta Vita Santa, Caivano Danilo, Nannavecchia Antonella, and Scalera Michele. 2020. A Kohonen SOM architecture for intrusion detection on in-vehicle communication networks. Appl. Sci. 10, 15 (2020), 5062.Google ScholarGoogle ScholarCross RefCross Ref
  7. [7] Bloom Gedare. 2021. WeepingCAN: A stealthy CAN bus-off attack. In Workshop on Automotive and Autonomous Vehicle Security (AutoSec).Google ScholarGoogle Scholar
  8. [8] Bloom Gedare, Cena Gianluca, Bertolotti Ivan Cibrario, Hu Tingting, and Valenzano Adriano. 2017. Supporting security protocols on CAN-based networks. In IEEE International Conference on Industrial Technology (ICIT). IEEE, 13341339.Google ScholarGoogle ScholarCross RefCross Ref
  9. [9] Bulusu Saikiran, Kailkhura Bhavya, Li Bo, Varshney Pramod K., and Song Dawn. 2020. Anomalous example detection in deep learning: A survey. IEEE Access 8 (2020), 132330132347.Google ScholarGoogle ScholarCross RefCross Ref
  10. [10] Chakraborty Samarjit, Faruque Mohammad Abdullah Al, Chang Wanli, Goswami Dip, Wolf Marilyn, and Zhu Qi. 2016. Automotive cyber-physical systems: A tutorial introduction. IEEE Des. Test 33, 4 (2016), 92108.Google ScholarGoogle ScholarCross RefCross Ref
  11. [11] Chalapathy Raghavendra and Chawla Sanjay. 2019. Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407 (2019).Google ScholarGoogle Scholar
  12. [12] Chen Tianqi, Moreau Thierry, Jiang Ziheng, Zheng Lianmin, Yan Eddie, Shen Haichen, Cowan Meghan, Wang Leyuan, Hu Yuwei, Ceze Luis, et al. 2018. TVM: An automated End-to-End optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI’18). 578594.Google ScholarGoogle Scholar
  13. [13] Cho Kyong-Tak and Shin Kang G.. 2016. Error handling of in-vehicle networks makes them vulnerable. In ACM SIGSAC Conference on Computer and Communications Security. 10441055.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. [14] Cho Kyong-Tak and Shin Kang G.. 2016. Fingerprinting electronic control units for vehicle intrusion detection. In 25th USENIX Security Symposium (USENIX Security’16). 911927.Google ScholarGoogle Scholar
  15. [15] Consortium The AUTOSAR. 2020. Requirements on Intrusion Detection System. R20-11. Foundation. AUTOSAR. Nov. 2020. Retrieved from: https://www.autosar.org/fileadmin/user_upload/standards/foundation/20-11/AUTOSAR_RS_IntrusionDetectionSystem.pdf.Google ScholarGoogle Scholar
  16. [16] Andrade Ricardo De, Hodel Kleber N., Justo Joao Francisco, Laganá Armando M., Santos Max Mauro, and Gu Zonghua. 2018. Analytical and experimental performance evaluations of CAN-FD bus. IEEE Access 6 (2018), 2128721295.Google ScholarGoogle ScholarCross RefCross Ref
  17. [17] Donahue Jeff, Krähenbühl Philipp, and Darrell Trevor. 2016. Adversarial feature learning. arXiv preprint arXiv:1605.09782 (2016).Google ScholarGoogle Scholar
  18. [18] Giannopoulos Hristos, Wyglinski Alexander M., and Chapman Joseph. 2017. Securing vehicular controller area networks: An approach to active bus-level countermeasures. IEEE Vehic. Technol. Mag. 12, 4 (2017), 6068.Google ScholarGoogle ScholarCross RefCross Ref
  19. [19] Goodfellow Ian, Pouget-Abadie Jean, Mirza Mehdi, Xu Bing, Warde-Farley David, Ozair Sherjil, Courville Aaron, and Bengio Yoshua. 2014. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 (2014).Google ScholarGoogle Scholar
  20. [20] Grandini Margherita, Bagli Enrico, and Visani Giorgio. 2020. Metrics for multi-class classification: An overview. arXiv preprint arXiv:2008.05756 (2020).Google ScholarGoogle Scholar
  21. [21] Gu Zonghua, Han Gang, Zeng Haibo, and Zhao Qingling. 2016. Security-aware mapping and scheduling with hardware co-processors for FlexRay-based distributed embedded systems. IEEE Trans. Parallel Distrib. Syst. 27, 10 (2016), 30443057.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. [22] Hanselmann Markus, Strauss Thilo, Dormann Katharina, and Ulmer Holger. 2020. CANet: An unsupervised intrusion detection system for high dimensional CAN bus data. IEEE Access 8 (2020), 5819458205.Google ScholarGoogle ScholarCross RefCross Ref
  23. [23] He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition. 770778.Google ScholarGoogle ScholarCross RefCross Ref
  24. [24] Hendrycks Dan and Gimpel Kevin. 2016. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016).Google ScholarGoogle Scholar
  25. [25] Hossain Md Delwar, Inoue Hiroyuki, Ochiai Hideya, Fall Doudou, and Kadobayashi Youki. 2020. Long short-term memory-based intrusion detection system for in-vehicle controller area network bus. In IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC’20). IEEE, 1017.Google ScholarGoogle ScholarCross RefCross Ref
  26. [26] Jo Hyo Jin and Choi Wonsuk. 2021. A Survey of Attacks on Controller Area Networks and Corresponding Countermeasures. IEEE Trans. Intell. Transport. Syst. 23, 7 (2021), 6123–6141.Google ScholarGoogle Scholar
  27. [27] Kang Min-Joo and Kang Je-Won. 2016. Intrusion detection system using deep neural network for in-vehicle network security. PloS One 11, 6 (2016), e0155781.Google ScholarGoogle ScholarCross RefCross Ref
  28. [28] Kang Min-Ju and Kang Je-Won. 2016. A novel intrusion detection method using deep neural network for in-vehicle network security. In IEEE 83rd Vehicular Technology Conference (VTC Spring). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  29. [29] Kong Shu and Ramanan Deva. 2021. OpenGAN: Open-set recognition via open data generation. arXiv preprint arXiv:2104.02939 (2021).Google ScholarGoogle Scholar
  30. [30] Kulandaivel Sekar, Jain Shalabh, Guajardo Jorge, and Sekar Vyas. 2021. Cannon: Reliable and stealthy remote shutdown attacks via unaltered automotive microcontrollers. In IEEE Symposium on Security and Privacy (SP). IEEE, 195210.Google ScholarGoogle Scholar
  31. [31] Labrado Carson and Thapliyal Himanshu. 2019. Hardware security primitives for vehicles. IEEE Consum. Electron. Mag. 8, 6 (2019), 99103.Google ScholarGoogle ScholarCross RefCross Ref
  32. [32] Linder-Norén Erik. 2019. PyTorch Implementations of Generative Adversarial Networks. Retrieved from: https://github.com/eriklindernoren/PyTorch-GAN.Google ScholarGoogle Scholar
  33. [33] Lokman Siti Farhana, Othman Abu Talib, Bakar Muhamad Husaini Abu, and Razuwan Rizal. 2018. Stacked sparse autoencodersbased outlier discovery for in-vehicle controller area network (CAN). Int. J. Eng. Technol. 7, 4.33 (2018), 375380.Google ScholarGoogle Scholar
  34. [34] Longari Stefano, Valcarcel Daniel Humberto Nova, Zago Mattia, Carminati Michele, and Zanero Stefano. 2020. CANnolo: An anomaly detection system based on LSTM autoencoders for controller area network. IEEE Trans. Netw. Serv. Manag. 18, 2 (2020), 1913–1924.Google ScholarGoogle Scholar
  35. [35] Luan Siyu, Gu Zonghua, Freidovich Leonid B., Jiang Lili, and Zhao Qingling. 2021. Out-of-distribution detection for deep neural networks with isolation forest and local outlier factor. IEEE Access 9 (2021), 132980132989.Google ScholarGoogle ScholarCross RefCross Ref
  36. [36] Moubayed Abdallah and Shami Abdallah. 2020. Softwarization, virtualization, & machine learning for intelligent & effective V2X communications. arXiv preprint arXiv:2006.04595 (2020).Google ScholarGoogle Scholar
  37. [37] Mutter Arthur. 2021. SIG CAN XL. Retrieved from: https://www.can-cia.org/groups/technical-groups/technical-committee-tc/ig-lower-layers/sig-can-xl/.Google ScholarGoogle Scholar
  38. [38] Odena Augustus, Olah Christopher, and Shlens Jonathon. 2017. Conditional image synthesis with auxiliary classifier gans. In International Conference on Machine Learning. PMLR, 26422651.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. [39] Olufowobi Habeeb, Young Clinton, Zambreno Joseph, and Bloom Gedare. 2019. SAIDuCANT: Specification-based automotive intrusion detection using controller area network (CAN) timing. IEEE Trans. Vehic. Technol. 69, 2 (2019), 14841494.Google ScholarGoogle ScholarCross RefCross Ref
  40. [40] Rehman Abdul, Rehman Saif Ur, Khan Mohib Ullah, Alazab Mamoun, and Reddy Thippa. 2021. CANintelliIDS: Detecting in-vehicle intrusion attacks on a controller area network using CNN and attention-based GRU. IEEE Trans. Netw. Sci. Eng. 8, 2 (2021), 1456–1466.Google ScholarGoogle Scholar
  41. [41] Schlegl Thomas, Seeböck Philipp, Waldstein Sebastian M., Schmidt-Erfurth Ursula, and Langs Georg. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging. Springer, 146157.Google ScholarGoogle ScholarCross RefCross Ref
  42. [42] Seo Eunbi, Song Hyun Min, and Kim Huy Kang. 2018. GIDS: GAN based intrusion detection system for in-vehicle network. In 16th Annual Conference on Privacy, Security and Trust (PST’18). IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Song Hyun Min and Kim Huy Kang. 2021. Self-supervised anomaly detection for in-vehicle network using noised pseudo normal data. IEEE Trans. Vehic. Technol. 70, 2 (2021), 10981108.Google ScholarGoogle ScholarCross RefCross Ref
  44. [44] Song Hyun Min, Kim Ha Rang, and Kim Huy Kang. 2016. Intrusion detection system based on the analysis of time intervals of CAN messages for in-vehicle network. In International Conference on Information Networking (ICOIN’16). IEEE, 6368.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. [45] Song Hyun Min, Woo Jiyoung, and Kim Huy Kang. 2020. In-vehicle network intrusion detection using deep convolutional neural network. Vehic. Commun. 21 (2020), 100198.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. [46] Stabili Dario, Marchetti Mirco, and Colajanni Michele. 2017. Detecting attacks to internal vehicle networks through Hamming distance. In AEIT International Annual Conference. IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  47. [47] Szilagyi Christopher and Koopman Philip. 2009. Flexible multicast authentication for time-triggered embedded control network applications. In IEEE/IFIP International Conference on Dependable Systems & Networks. IEEE, 165174.Google ScholarGoogle ScholarCross RefCross Ref
  48. [48] Taylor Adrian, Japkowicz Nathalie, and Leblanc Sylvain. 2015. Frequency-based anomaly detection for the automotive CAN bus. In World Congress on Industrial Control Systems Security (WCICSS’15). IEEE, 4549.Google ScholarGoogle Scholar
  49. [49] Taylor Adrian, Leblanc Sylvain, and Japkowicz Nathalie. 2016. Anomaly detection in automobile control network data with long short-term memory networks. In IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 130139.Google ScholarGoogle ScholarCross RefCross Ref
  50. [50] University Korea. 2018. Car-Hacking Dataset for the Intrusion Detection. Retrieved from: https://ocslab.hksecurity.net/Datasets/car-hacking-dataset.Google ScholarGoogle Scholar
  51. [51] Wu Wufei, Huang Yizhi, Kurachi Ryo, Zeng Gang, Xie Guoqi, Li Renfa, and Li Keqin. 2018. Sliding window optimized information entropy analysis method for intrusion detection on in-vehicle networks. IEEE Access 6 (2018), 4523345245.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Wu Wufei, Li Renfa, Xie Guoqi, An Jiyao, Bai Yang, Zhou Jia, and Li Keqin. 2019. A survey of intrusion detection for in-vehicle networks. IEEE Trans. Intell. Transport. Syst. 21, 3 (2019), 919933.Google ScholarGoogle ScholarCross RefCross Ref
  53. [53] Xie Guoqi, Li Renfa, and Hu Shiyan. 2020. Security-aware obfuscated priority assignment for CAN-FD messages in real-time parallel automotive applications. IEEE Trans. Comput.-aid. Des. Integ. Circ. Syst. 39, 12 (2020), 44134425.Google ScholarGoogle ScholarCross RefCross Ref
  54. [54] Xie Guoqi, Yang Laurence T., Liu Yao, Luo Haibo, Peng Xin, and Li Renfa. 2021. Security enhancement for real-time independent in-vehicle CAN-FD messages in vehicular networks. IEEE Trans. Vehic. Technol. 70, 6 (2021).Google ScholarGoogle ScholarCross RefCross Ref
  55. [55] Xie Guoqi, Yang Laurence T., Yang Yuanda, Luo Haibo, Li Renfa, and Alazab Mamoun. 2021. Threat analysis for automotive CAN networks: A GAN model-based intrusion detection technique. IEEE Trans. Intell. Transport. Syst. 22, 7 (2021), 4467–4477.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. [56] Xie Yong, Zhou Yu, Xu Jing, Zhou Jian, Chen Xiaobai, and Xiao Fu. 2021. Cybersecurity protection on in-vehicle networks for distributed automotive cyber-physical systems: state-of-the-art and future challenges. Softw.: Pract. Exper. 51, 11 (2021), 2108–2127.Google ScholarGoogle ScholarCross RefCross Ref
  57. [57] Yang Li, Moubayed Abdallah, and Shami Abdallah. 2021. MTH-IDS: A multitiered hybrid intrusion detection system for internet of vehicles. IEEE Internet Things J. 9, 1 (2021), 616632.Google ScholarGoogle Scholar
  58. [58] Yang Yuanda, Xie Guoqi, Wang Jilong, Zhou Jia, Xia Ze, and Li Renfa. 2021. Intrusion detection for in-vehicle network by using single GAN in connected vehicles. J. Circ., Syst. Comput. 30, 1 (2021), 2150007.Google ScholarGoogle ScholarCross RefCross Ref
  59. [59] Zhou Jia, Xie Guoqi, Yu Siyang, and Li Renfa. 2020. Clock-based sender identification and attack detection for automotive CAN network. IEEE Access 9 (2020), 26652679.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. CAN Bus Intrusion Detection Based on Auxiliary Classifier GAN and Out-of-distribution Detection

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Embedded Computing Systems
      ACM Transactions on Embedded Computing Systems  Volume 21, Issue 4
      July 2022
      330 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/3551651
      • Editor:
      • Tulika Mitra
      Issue’s Table of Contents

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 September 2022
      • Online AM: 3 June 2022
      • Revised: 1 May 2022
      • Accepted: 1 May 2022
      • Received: 1 December 2021
      Published in tecs Volume 21, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format .

    View HTML Format
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!