ThermWare: Toward Side-Channel Defense for Tiny IoT Devices

As malware in IoT devices flourishes, defenses are lacking. Traditional antivirus or intrusion detection-based defense techniques fail for the limited computational capabilities and the large diversity of platforms and environments. In this paper, we present ThermWare, a non-intrusive screening method to detect anomalous operations on embedded devices at run-time. ThermWare relies on the observation that electronic circuits generate subtle patterns of heat at the component level when the corresponding module is accessed by the micro-operations (e.g., file-write) of the running code. We propose the use of these side-channel heat signatures captured by a thermal camera to determine the sequence of underlying computations in real time. An early implementation of ThermWare shows success in detecting common malware routines in general-purpose IoT devices. We envision leveraging the thermal side-channel to track the internal operations of an embedded device, which can potentially lead to broader applications in engineering embedded systems, monitoring device health and run-time capacity, assisting embedded coding optimization, and physical layer security analysis.


INTRODUCTION
Miniaturized Internet of Things (IoT) devices have enabled a paradigm of connected everyday objects, ranging from a smart light bulb to implantable microchips for pets.Recent advancements show ubiquitous micro-motes [10] or insect-scale smart devices [7,8,19,35,47] for ne-grained sensing of our environments are not a distant future.The small size and the embedded nature of the future IoT devices forces a reconsideration of how we approach security -even the most basic defenses, like anti-virus software or process memory isolation, simply cannot run on resource-constrained systems.Often, hardware debugging ports are also sacriced in the architecture in the interest of saving space and computation power on the circuit boards.As a result, standard security methods do not apply to these devices, which makes them notoriously vulnerable despite their close access to users' private information.At the same time, it is dicult to make complete architectural changes to reinforce security in such systems [46].One of the approaches to ensure security in resource-constrained embedded devices seeks alternative solutions in leveraging sidechannel information for ooading anomaly detection, memory forensics, memory rewriting, and malware activity monitoring to more powerful machines.
The research community is constantly in search of new information sources to combine and complement side-channel defense techniques [20,36,42,62].This paper envisions a noninvasive malware detection approach using thermal maps.Activities on a processor and other peripherals lead to temperature variations at dierent regions of the circuit.The generated heat is directly related to the switching of transistors' state at the lowest level -an indicator of activities at various functional units of the circuit and inside the chips.When monitored with ne-grained resolution in time and space, the transient thermal map of the circuit can reveal underlying operations at run-time.Such monitoring can be done passively with thermal cameras, without any physical contact with the device.A system can potentially learn this thermal pattern to detect any deviation from the desired activities on the circuit.We envision characterizing malicious activities on IoT devices through spatiotemporal analysis of the circuit's heat dissipation.We call this system ThermWare.Figure 1 shows an overview of the system.If successful, this can become a noninvasive method for malware detection in small devices that are otherwise dicult to connect to standard monitoring equipment.
As a malware screening system, ThermWare requires only passive access to the circuit board for scanning thermal signatures using infrared sensors or thermal cameras.Thermal map observation does not require physical or electrical contact with the device, which adds to its advantage compared to other side-channel methods.Our system can be used as an add-on to the device under monitoring and run detection code based on real-time thermal features.ThermWare can also be useful for auditing the behavior of small embedded devices, such as in-body medical devices when a patient visits the doctor's oce or during regular maintenance or replacements.While we believe the use-case and physical implementation of the method will evolve, in this paper we focus on exploring the core capabilities of this side-channel in malware detection and assess the advantages and limitations of such a system.
A system-on-chip (SoC) is built with careful positioning of modules like register banks, memory units, CPU, communication radio, and I/O interface inside one chip.We ask the question -Can we leverage high-resolution thermal imaging techniques to detect anomalous computations inside a tiny SoC?A program or computing operation can be viewed as a sequence of activation of these modules at dierent locations of the chip.Given such activation inevitably release heat, the spatial heat pattern can be a direct indicator of the underlying computing operations.This can open an opportunity to infer the executed code without a physical access.We aim to explore the possibility.
Detection of a micro-operation in run-time, such as le writes, is particularly relevant since the majority of malware in Linux environments are ransomware and web shells, and therefore le writes comprise a dominant portion of their behavior [6].In this work, we show that we can detect le writes of at least 400KB with 80% accuracy using a relatively inexpensive thermal camera.Additionally, we achieve more than 90% accuracy with a higher-end thermal camera.Our results suggest that a high-end camera provides diminishing returns if an application tolerates a slight decrease in the accuracy level.Next, we show that an anomaly detector can achieve reliable detection of common malware routines in general-purpose IoT devices using an inexpensive thermal camera adapter to a smartphone.At the same time, we report a few undetectable actions, such as network port scanning on a voice assistant device.
ThermWare demonstrates the feasibility of using heat signatures as an alternative method to detect the states of computational operations inside an embedded device.We hypothesize that the fundamental units of the computations, such as le write events, memory access, network operations, etc., manifest as thermal signatures.If this side-channel information is carefully analyzed, it can track the behavior of running codes.This technique can have several implications by adding to the capabilities of reverse engineering embedded systems, monitoring device health and run-time capacity, assisting embedded coding optimization, and physical layer security analysis.Needless to say, this paper is only a rst step toward this broader vision and we aim to inquire about the opportunities it presents.We start by experimenting with the technique as a defense application that can identify anomalous behavior and malware in an embedded system without requiring system-level access.

FEASIBILITY STUDY
For initial proof-of-testing, we focus on detecting malware running on a single-board-computer.While many operations could be used as features to identify malware, we chose to specically focus on le writes for this initial work, as malware most commonly performs le writes [6].By extension, if we can identify the presence of erroneous le writes, we can detect the presence of malware.We perform a 100MB le write on a BeagleBone Black [1] and record the thermal images using a Seek thermal camera [2] at 9 fps. Figure 2 shows the board with three components indicated, as well as a series of thermal images of the board's heat signature over the course of a le write.All three chips provide clear visual indications of write activity, as each gets noticeably hotter over milliseconds.The eMMC ash storage is of particular interest as it only heats up during le writes, lending credibility to the hypothesis that the thermal signature could be used to identify the presence of malware.Complimenting power side-channel: Prior work [14] has shown the feasibility of using a power side channel to detect generic malware in the wild.We analyze the dierence between power and thermal side channels.In Figure 3 we show the power consumption and average pixel value of the power management chip of Beaglebone while running a Gafgyt malware.Formally, we measured that the 2 signals are strongly correlated with a 0.76 Pearson correlation coecient.However, unlike a power side channel, the thermal side channel is delayed by around 0.93B.This is because the chip takes time to heat, even though it immediately starts drawing power.Therefore, we consider the thermal side channel as a complementary set of features to the power side channel, for the task of malware detection.

DETECTING FILE-WRITE OPERATIONS FOR MALWARE DETECTION
File write operations involve writing some number of bytes into the onboard solid-state drive (SSD)/eMMC of the embedded system.This writing activity causes the SSD chip to heat up, as shown previously in Figure 2. To develop a malware detector, we focus on detecting these heat variations in the solid-state drive.In this section, we will describe a manual image processing-based and learning-based technique to localize temperature variations and classify if a le write operation is performed.

Image Processing-based Detection
We process the recorded video frames individually to extract the features needed for classication.As the rst work in this area, we chose the most common features: average pixel intensity and the ratio of heated area.The former is calculated as the mean value of the pixels in the image and the second is the number of pixels above a certain value.This step yields two values for each frame of the video.As a next step, we merge the results of the calculation for each frame into a time series.To remove the noise, we use a Butterworth low-pass lter.We manually tuned the hyperparameters to remove most of the noise without impacting small le writes.We found that a lter order of 3 and a cuto frequency of 0.1 yielded the best results for all the le writes.Figure 4 and 5 shows the mean pixel intensity values before and after ltering.We notice that the sharpest peaks correspond to actual le write operations, so we take the derivative of the time series signal.We put a threshold on the derivative to determine when a peak is due to an operation.The line in orange corresponds to part of the plot above the 0.001 threshold, which indicates a detected le write.

Learning-based Detection
We also explore a machine-learning-based approach to develop a general-purpose pipeline that can detect anomalies on any chip.We try Isolation Forests (IF) and Local Outlier Factors (LOF).After multiple trials, we concluded that for the correct hyperparameter values IF and LOF would produce the same anomalies.Therefore, we chose IF for our system, since it requires less memory for inference.
Feature extraction: For a general approach, independent of the type of Integrated Circuit, we rst nd the region of interest.We use a time series of all the values each pixel takes over time and calculate the min-max variability.The intuition is that the pixels with the largest variability are the pixels that heat up during an operation.We found that taking the top 3000 pixels gave us the best representation of the region in which we could visually observe a temperature increase.After slicing 3000 pixels from every video frame, we use the 9000 frames from the idle video to train an isolation forest with 50 trees.After training, we tested using the cropped video frames from the recording with le writes and got an anomaly score for each video frame.Finally, we use a condence threshold to classify frames with anomalies.
Classication: We merge consecutive detected frames and those with a distance of less than 4 undetected frames (⇠ 0.5 seconds) into a detection interval and count how many of the intervals overlap with the ground truth of when a le write occurred.We remove any interval that has less than 5 frames because they are likely false positives.Since there is a delay from the occurrence of a le write to when the heat from the write is discernible from the SSD, we consider a ±2 second interval from the operation was recorded as having been performed.Figure 6 shows the classication results along with the extracted signal from the region-of-interest.As we can see, the anomaly detector misses just 1 le write operation.

IMPLEMENTATION AND SETUP
In this section, we describe our experimental setup and various IoT device attack scenarios.

Experimental Setup
The components involved in generating, collecting, and analyzing thermal side-channel data are described as follows: 1) Single-Board Computer (SBC): The experiments are performed on a BeagleBone Black SBC [1].The SBC contains a 2) Thermal Cameras: Two of the three thermal cameras shown in Figure 7: FLIR A325sc [3] and Seek CompactPRO [2] were chosen to collect data, due to their diversity in thermal sensitivity, frame rate, and price point.The detailed characteristics of these cameras are mentioned in Table 1.The FLIR A325sc has a thermal sensitivity of 50< and is capable of capturing video at a rate of 60 frames per second.In contrast, the Seek CompactPRO has a slightly higher thermal sensitivity of 70< , and can only capture video at a rate of 9 frames per second.There is a signicant price dierence between these two cameras, with the FLIR A325sc costing $10, 000 and the Seek CompactPRO costing $500.FLIR A325sc is connected to the laptop running FLIR Research Studio version 1.1.3via ethernet cable.Seek CompactPRO is connected to the iPhone via a lightning cable and controlled through the ocial Seek Thermal application version 2.2.7.0.To maintain a stable video feed, the thermal camera is placed in a xed position, so that the majority of the board is within the camera's eld of view.We ensure that the camera is not in contact with any component on the board.

IoT Device Attack Scenarios
For testing dierent applications of IoT devices, we simulate a temperature sensor, air quality sensor, and a voice assistant on the Beaglebone Black.
•Temperature and air quality sensors: A simulation of the temperature and air quality sensor is performed by having the Beaglebone read /dev/urandom and log the values in a le, then send it through the network to a le server.
•Voice assistants: We simulate a voice assistant by recording audio of the top 6 Alexa commands and translating them to text using a speech-to-text library.We encompass a large number of commands that deal with data fetching from the internet and audio playback.We accomplish this with a script that downloads 10KB of data for 1000 iterations, performing text-to-speech and writing the results to a le.
•Malware: Malware is simulated while all the devices perform their usual tasks.We consider either idle or logging states for temperature and air quality sensors.For a voice assistant, we consider the following set: idle, getting voice commands, and playing music.We extract our features every 1 minute, such that we can capture all states.Our list of simulated malware actions is based on the top 4 most common Linux malware families [34].We extract the SYN-ood routine from lightaidra, the telnet brute force from Mirai, network scanning from pnscan, and a generic CPU cryptomining program.

PRELIMINARY RESULTS
In this section, we evaluate the performance of our le write detectors and malware detection in IoT devices.

Performance: File Write Detection
We perform a systematic study in detecting le writes of dierent sizes.We perform 20 le writes for le sizes from 1000KB -25KB.These les' write event sizes are chosen to be equal to or less than the average malware le size (in the range of hundreds of KB) [40]; in fact, Lua malware are even larger than 1MB.We capture thermal video with both the FLIR A325ac and the SeeK CompactPRO cameras.We then apply both our automatic and manual le write detection techniques from Sections 3.1 and 3.2, respectively.
Figure 8 shows the performance for le write detection using both types of cameras.We see that FLIR thermal camera has the highest performance at le sizes > 300 ⌫, and is at or above 90% at le sizes of 500 ⌫ and above.Conversely, the SeeK camera, averaged around 70% accuracy at le sizes > 300 ⌫ and outperformed the FLIR at detecting le writes <= 300 ⌫, with accuracy ranging between roughly 30 and 50%.There's a clear trade-o compared to the much less expensive SeeK if slightly lower detection performance is tolerable.Manual write detection slightly outperformed our automated write detector in both accuracy and recall, although additional evaluation of the automated procedure or alternate ML techniques would likely narrow the margin.

Performance: Malware Detection
In this section, we explore the capability of our features to detect malware in embedded devices.A detection is considered successful if the anomaly detector raises an alert within 1 minute of when the malware execution takes place.Similarly, a false positive is counted when an alert is raised during a time window where there is no malware running and a false negative is considered when no alert is raised during malware execution.
Table 2 shows the results of detecting various malicious routines as they run alongside the emulated devices.As mentioned, we run each of the malware routines 10 times for each benign action.Following are the key observations: 1) FP is usually 1 for Ransomware and Cryptomining activities.The reason is that these activities signicantly heat the CPU, which takes some time to cool down.This causes our system to register anomalies even after malware activity is nished.2) Our system nds it challenging to detect malware attacks that require very little CPU activity (e.g., scanning).For example, our system missed the rst two runs of telnet brute force, when the scanner was searching for IPs to attack.3) In the voice assistant case, the model is trained to accept long actions with high CPU requirements (e.g., playing music).Therefore, no alarm will be raised unless the CPU or power management chips heat beyond the detection threshold.That is why our detector missed some quick low computational network port scanning attacks.4) It also highlights the limitation that a targeted malware may be able to evade our detector by operating in sequences, allowing chips to cool down before performing the next batch of operations.

RELATED WORK
The literature in embedded systems security and side-channel information processing is extremely rich and rooted in many subelds of engineering.Here we zoom into some related areas.
Securing IoT devices: The susceptibility of IoT devices to attack has been studied thoroughly [30,51,52,54,55,58,62].In 2016, the  Mirai botnet was used to launch some of the largest recorded DDoS attacks in history, including a 623 Gbps attack [5], and a 1.2 Tbps attack [17].A Mirai bot simply spread by attempting log into a Telnet service using a list of factory default username/password combinations.The type of solutions that have been considered thus far focuses predominately on analyzing the network trac of these devices [9,22,38,60,62].ThermWare is complementary to these methods and provides an independent source of detection and validation.
Attack using side-channels: Past projects have shown methods for inferring information from a variety of signal sources [21], including electromagnetic elds [4,45], household AC power consumption [26], sounds and vibrations [12,20,49,53], inertial sensor data [15], and optical channels [56,59].Integrated circuit (IC) design and manufacturing research have explored on-chip heat distributions as one of the quality metrics [24,32,39].Security researchers have leveraged this as a side channel to probe internal activities and infer various information to reveal potential attacks through covert communications and data leakage [23,29,37,41,44].
Prior work [57] explores a reverse problem and has shown that CPU workload can be used to infer CPU temperature.In this paper, we seek to utilize the inherent problem of information leakage to our advantage in detecting malware.
Defense using side-channels: The use of side channels to detect malicious behavior has also been prevalent in several prior works [16,31,43,61].In [11], side-channel current analysis is used to detect the presence of modied rmware on an embedded solid-state drive.A few early papers [13,14,33,36], published between 2008 and 2013, have shown the possibility of using energy consumption patterns as an indicator of anomalous activity on computing devices.VirusMeter [36] and WattsUpDoc [14] show changes in power consumption during the execution of malware codes in cell phones and medical devices.Prior works [18,27,28] have used the thermal and EM side channels to detect backdoors in device manufacturing.While their threat model is dierent from ours (e.g., supply chain attacker), they demonstrate that these side channels can be used to detect anomalies in a chip's normal heating patterns.Our work demonstrates that thermal readings can be used to detect anomalous out-of-order operations for the purpose of detecting malware.

APPLICATIONS BEYOND SECURITY
In this paper, we explored the potential of using thermal side-channel techniques to detect malware in embedded systems.We believe that ThermWare can also be extended to more general use cases and application scenarios such as non-intrusive reverse engineering and remote performance auditing.
Non-intrusive reverse engineering: Reverse engineering embedded systems can be a challenging and time-consuming process, especially if the goal is to do so in a non-intrusive way.However, the thermal side-channel can be a useful tool to gain insight in the internals of the chip.By analyzing the changes in surface temperature of the chip, as it performs dierent operations, an engineer can create a spatio-temporal heatmap which can reveal the sequence of operations that were executed.This information can be helpful in understanding the design and function of the system and can help in reverse engineering the ongoing process.
Performance monitoring: ThermWare can also be used to remotely monitor and audit the performance of data centers and high performance computing (HPC) clusters.The thermal signatures can be obtained using the existing temperature sensors and infrared cameras in the centers.These sensors can be repurposed to create a more ne grained spatial heatmap which can be then used to monitor the performance and behaviour of the system, including memory transfer, job scheduling and activity of CPU, GPU, routers, etc.Using the performance metrics, engineers can gain insights into the functioning of the data centers and identify potential issues or scope of improvements.Additionally, any non-regular temperature patterns detected can serve as early warnings for data leaks, malware, or other security threats.
Covert communication: Alternative communication modalities are useful [48,50] for clandestine operations where information need to be exchanged without leaving any trace in the wireless channel or in the Internet.Recently, BitWhisper showed the use of heat generated by CPU or GPU as a modality for covert data communication [23].By detecting the heat signature of micro-operation at the circuit level, ThermWare essentially provides a way to observe complex patterns of heat signatures.If explored, this can be a building block for a high data-rate thermal communication channel between an embedded device and a thermal camera.The system can run a combination of micro-operations to generate a specic pattern of heat across dierent components encoding data bits.The camera can use methods shown in this paper to recover data from the temporal variations of the heat patterns.

CONCLUSION AND FUTURE WORK
This paper presents ThermWare, a malware detection system for IoT devices using thermal side-channel.We show the possibility of real-time malware detection by monitoring the thermal patterns of a le write operation on an embedded system.Needless to say, ThermWare is an exploratory rst realization of the concept and there is room for improvements and further work.
Formalizing the verication process: ThermWare requires knowing the original thermal signature of an application to detect anomalies.This limits the generalization to other applications software updates of the existing application since any change in code aects the thermal signature.We plan to learn the mapping of code to thermal signatures.Our intuition is to have a directed graph of computing operation to a spatio-temporal thermal sequence which can be used to generate new thermal signatures given any code.If successful, developers can create a version of true thermal signatures using the directed graph and release it along with their code.Generalize malware detection: ThermWare is a novel solution for detecting malware that leverages the inherent temperature variations inside a chip that occur when executing commands.At present, the system is primarily focused on read/write activities.However, the fundamental concept of detecting malware by monitoring temperature patterns is not restricted to read/write activities alone.In the future, we aim to generalize this technique for the detection of any unauthorized execution of commands, as any process or computation will result in heat variations in certain parts of the chip.These variations may be small or large, depending on the activity, but with nely calibrated temperature sensors, we can detect these variations and use them to detect malware.Obfuscation: Currently, the system does not account for an adversary that is invested in bypassing our thermal detection.For example, an adversary can obfuscate the attack by emulating thermal signatures.This can be done by carefully timing the le write operation along with other le write operations or by emulating a known application's signature.We plan to work on identifying such obfuscation attacks.
Resolution: In the future, we will explore the eect of the spatial and temporal spread of the heat to neighboring regions.If a compute-heavy task takes place, the heat gradually spread all over the chip spoiling the resolution until the chip cools down.We want to make the system robust to past operations by incorporating the eect of the multidimensional Point Spread Function of the heat.As this research evolves, we envision that our core idea of thermal side-channel will open new ways of defense in miniature IoT devices.

Figure 1 :
Figure 1: The overview of ThermWare, a thermal-side channel based malware detection system.

Figure 2 :
Figure 2: Thermal signature of a 100 MB le written over time.The left image shows a BeagleBone Black with three components highlighted: (1) Power Management Integrated Chip, (2) Embedded Multimedia Card, and (3) Processor, all of which produce distinct heat signatures over the course of the write operation.

Figure 3 :
Figure 3: Power vs. thermal side channel when running Gafgyt botnet on a Beaglebone Black.

Figure 4 :
Figure 4: The extracted signal from the thermal camera and ground truth of le-write events over time.

Figure 5 :
Figure 5: Filtered signal and detected anomalous le-writes events.

Figure 6 :
Figure 6: Detecting anomalous le write operations using the machine-learning-based technique.

Figure 8 :
Figure 8: (a) Accuracy, (b) precision, and (c) recall of SeeK Compact Pro and FLIR A325sc cameras, using both automated and manual detection.

Table 1 :
Comparison of the thermal cameras used in experiments.

Table 2 :
Malware detection results for dierent kinds of devices and activities.