Deception in Drone Surveillance Missions: Strategic vs. Learning Approaches

Unmanned Aerial Vehicles (UAVs) have been used for surveillance operations, search and rescue missions, and delivery services. Given their importance and versatility, they naturally become targets for cyberattacks. Denial-of-Service (DoS) attacks are commonly launched to exhaust their resources or crash UAVs (or drones). This work proposes a unique proactive defense using honey drones (HDs) for UAVs during surveillance operations. These HDs use lightweight virtual machines to lure and redirect potential DoS attacks. Both the attacker's choice of target and the HDs' deceptive tactics are influenced by the strength of the radio signal. However, a critical trade-off exists: stronger signals deplete battery life faster, while weaker signals can degrade the connectivity of the drone fleet network. To address this, we formulate an optimization problem in which the attacker and the defender each select their best signal strength level. We propose a novel HD-based defense that identifies the optimal setting using deep reinforcement learning (DRL) or game theory and compare its performance with that of non-HD-based methods, such as intrusion detection systems and ContainerDrone. Our experiments demonstrate the unique benefits and superior efficacy of each HD-based defense across various attack scenarios.


INTRODUCTION
Our work presents a novel approach to mitigating DoS attacks by employing defensive deception (DD) tactics within Unmanned Aerial Vehicle (UAV) systems [23]. We propose a surveillance mission system that utilizes honey drones (HDs), a specialized form of drone-based honeypot, to combat Denial-of-Service (DoS) attacks while performing relay services. Unlike techniques using Raspberry Pi devices to emulate static drones [6], our HDs are drone-based, equipped with intentionally vulnerable software and dynamic signal strength. These HDs function as proactive decoys, attracting and disorienting cyber attackers, collecting crucial attack intelligence, and dynamically reconfiguring system settings in response.
This work identifies the settings under which game-theoretic or deep reinforcement learning (DRL)-based HD defense performs best while investigating the advantages and constraints of each strategy. This work has the following key contributions:
• Defensive Deception using Honey Drones: We design a surveillance mission system where HDs, serving as drone-based mobile honeypots with intentionally vulnerable software, aim to attract DoS attacks. These drones act as proactive decoys to collect crucial attack intelligence and allow responses to the detected threat.
• Intelligent Attack-Defense Game Modeling Under Uncertainty: We create an attack-defense interaction model that allows both the attacker and defender to adopt intelligent strategies. This intelligent strategy selection will enrich defensive deception research by introducing promising proactive defense strategies under diverse cyber games.

RELATED WORK
The existing body of research has presented various strategies for defending UAVs against DoS attacks. These include a resource allocation technique [4], a hierarchical detection system [16], intrusion detection systems (IDS) [12], and an injection detector [17].
Game-Theoretic Defensive Deception: Various defensive deception (DD) applications have adopted game theory, such as Cumulative Prospect Theory [22] or a multi-stage Stackelberg game [2], to analyze attacker behavior. Hypergame Theory, another game-theoretic approach, has been employed to develop more advanced defensive deception techniques [1, 19, 20]. These studies underscore the potential of Hypergame Theory in devising robust strategies against cyber threats. However, none of the prior works above has considered UAV contexts with high dynamics, resource constraints, and time-sensitive tasks.
DRL-based Defensive Deception: DRL has also been popularly leveraged for optimizing the effectiveness of DD approaches and examining various vulnerabilities [10]. Deceptive signals as a defense against the vulnerabilities of RL have been examined [8]. In addition, the optimal selection of proactive defenses, such as moving target defense and DD, was studied using DRL [3]. DRL was also used to strengthen UAV communications [11].
Limitations of the Related State-of-the-Art Defensive Deception Research: Although defensive deception techniques often employ either game theory or Deep Reinforcement Learning (DRL), the two are seldom used in conjunction or compared directly to investigate the advantage of each approach. Since game-theoretic and DRL-based solutions to the same problem have rarely been compared, there is a lack of understanding of how best to leverage the merits of each technique under given conditions. We fill this gap and contribute to identifying the best approach to leverage depending on environmental and system conditions.

SYSTEM MODEL

3.1 Network Model
We envisage a drone fleet for surveillance operations within a targeted region, aiming to maximize mission effectiveness while defending against DoS attacks. The network design includes a regional leader drone (RLD) communicating with the ground control station (GCS) through a satellite network [5], and mission drones (MDs) and honey drones (HDs) connected to the RLD via WiFi, forming a flying ad hoc network (FANET) [9]. Each drone maintains a Neighbor Table (NT) and a Fleet Table (FT) for location and mission crew status. On receipt of a hello message from another drone, a connection setup procedure starts, encompassing a TCP handshake and data transmission over UDP [9].
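As a rough illustration of the FANET topology described above, the sketch below builds the multi-hop network with NetworkX and finds which drones can reach the RLD. The positions, WiFi range, and drone names are illustrative assumptions, not the paper's parameters.

```python
import networkx as nx

WIFI_RANGE = 300.0  # assumed WiFi link range in metres

def build_fanet(positions: dict) -> nx.Graph:
    """Connect every pair of drones within WIFI_RANGE of each other."""
    g = nx.Graph()
    g.add_nodes_from(positions)
    names = list(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (x1, y1), (x2, y2) = positions[a], positions[b]
            if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= WIFI_RANGE:
                g.add_edge(a, b)
    return g

def connected_to_rld(g: nx.Graph, rld: str) -> set:
    """Drones that can reach the RLD via multi-hop links."""
    return nx.node_connected_component(g, rld) if rld in g else set()

# MD2 is out of the RLD's direct range but reaches it via MD1 (multi-hop).
positions = {"RLD": (0, 0), "MD1": (250, 0), "MD2": (500, 0), "HD1": (0, 250)}
fleet = build_fanet(positions)
print(sorted(connected_to_rld(fleet, "RLD")))
```

MD2 illustrates the multi-hop communication the paper describes: it reaches the RLD only through MD1's relay link.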

Node Model
Our network incorporates a Ground Control Station (GCS) for task distribution, a charging station (CS) for drone power replenishment, and UAVs composed of an RLD, multiple MDs, and multiple HDs. The GCS monitors the mission's progression, the CS charges the drones, and the RLD adjusts the drones' routes and signal strength levels dynamically. MDs adhere to a specified path and transmit data to the RLD via multi-hop communication, while HDs can serve as mobile drone-based honeypots to lure DoS attacks.

Energy Model
For our simulation, we deploy Crazyflie 2.X quadrotor drones and consider the energy utilization of both MDs and HDs following a model based on their different operational rates [13].
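As a toy illustration of rate-based energy accounting, the sketch below charges each drone per round according to its activity durations and signal level. The rates and durations are placeholder assumptions, not the Crazyflie 2.X model from [13].

```python
# Illustrative per-round energy accounting; rates are assumed placeholders,
# not the Crazyflie 2.X figures used in the paper.
RATES = {           # energy units consumed per second of each activity
    "hover": 1.0,
    "move": 1.5,
    "transmit": 0.2,  # scaled by the chosen signal level below
}

def round_energy(move_s: float, hover_s: float, tx_s: float, sig_level: int) -> float:
    """Energy for one round; MDs and HDs differ via their activity durations
    and chosen signal level (1..10)."""
    return (RATES["move"] * move_s
            + RATES["hover"] * hover_s
            + RATES["transmit"] * sig_level * tx_s)

# An HD hovering as a decoy with a strong signal vs. an MD flying its path:
hd = round_energy(move_s=0, hover_s=60, tx_s=60, sig_level=9)
md = round_energy(move_s=60, hover_s=0, tx_s=60, sig_level=3)
print(hd, md)
```

The point of the sketch is the trade-off the paper exploits: a high signal level makes the HD an attractive decoy but raises its transmit energy.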

Threat Model
We focus on DoS attacks, a prevalent and severe threat to UAVs. These attacks are executed by sending numerous simultaneous JSON connection requests to a targeted drone, disrupting its network connectivity and potentially crashing it [7]. To evaluate a drone's software vulnerability, we employ the Common Vulnerability Scoring System (CVSS), symbolized as a real value vul_i ∈ [0, 1] signifying the probability of a successful attack compromising drone i [18]. Figure 1 portrays the high-level conceptual layout of our proposed honey drone mission system and the way agents choose attack/defense tactics.
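Treating the CVSS-derived score vul_i as a success probability amounts to a Bernoulli trial per attack attempt, as in the minimal sketch below (the scores and drone names are assumptions):

```python
import random

# Sketch: a drone's CVSS-derived score vul_i in [0, 1] is read as the
# probability that a DoS attack on it succeeds. Scores here are assumed.
def attack_succeeds(vul_i: float, rng: random.Random) -> bool:
    """Bernoulli trial with success probability vul_i."""
    return rng.random() < vul_i

rng = random.Random(42)
vul = {"MD1": 0.3, "HD1": 0.9}  # HDs carry intentionally vulnerable software
trials = 10_000
rate = sum(attack_succeeds(vul["HD1"], rng) for _ in range(trials)) / trials
print(round(rate, 2))  # empirical rate close to vul["HD1"]
```

An HD's deliberately high vul score is what makes it the preferred target, drawing attacks away from MDs.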

STRATEGY SELECTION
This section tackles the challenge in mission systems arising from the absence of pre-existing information about attack patterns, necessitating an autonomous decision-making mechanism. We employ game theory and DRL to address this. We characterize the time taken for mission completion as T, with an upper limit denoted as T_max. The mission is divided into several rounds of interaction between the attacker and the defender.
Both the attacker and defender have ten interaction strategies. This number was determined through preliminary experiments, where we assessed various strategy counts; ten strategies offered the best balance, ensuring computational efficiency without oversimplifying the environmental conditions, and other counts showed no significant difference. The design of the three subgames also considered the number of rounds a game plays before the mission terminates, ensuring players accumulate sufficient interaction experiences to form beliefs; introducing too many subgames may dilute these experiences and make belief formation challenging.

Attack Strategy Selection using Game Theory (GT).
The game theory (GT) agent for the attacker identifies an optimal attack strategy a_i* based on the expected utility yielded by choosing strategy a_i, calculated as

EU_A(a_i) = Σ_j b^A_j · (g^A_{i,j} − l^A_{i,j}),

where b^A_j is the attacker's belief regarding the defender's strategy choice d_j, and g^A_{i,j} and l^A_{i,j} refer to the attacker's gain and loss, respectively. D_target,i indicates the set of target drones when the attacker picks strategy a_i, while C_A refers to the attack budget. ASR'_j is the anticipated attack success ratio for drone m_j, cr_j denotes the criticality of drone m_j, and sig_max is the maximum signal strength (normalized to 10).
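The expected-utility maximization above can be sketched generically: pick the strategy whose belief-weighted gain-minus-loss is largest. The payoff numbers and belief distribution below are illustrative, not the paper's utility terms.

```python
# Sketch of a game-theoretic best response: choose the strategy maximizing
# expected utility under a belief distribution over the opponent's strategies.
def best_response(beliefs, gains, losses):
    """beliefs[j]: P(opponent plays j); gains/losses[i][j]: payoff terms when
    we play i against opponent j. Returns (best i, its expected utility)."""
    def eu(i):
        return sum(b * (gains[i][j] - losses[i][j])
                   for j, b in enumerate(beliefs))
    best = max(range(len(gains)), key=eu)
    return best, eu(best)

beliefs = [0.2, 0.5, 0.3]               # belief over 3 opponent strategies
gains = [[3, 1, 0], [2, 4, 1], [0, 2, 5]]
losses = [[1, 1, 1], [0, 1, 2], [2, 0, 1]]
i_star, value = best_response(beliefs, gains, losses)
print(i_star, value)
```

The same routine serves either player; only the beliefs and the gain/loss tables differ between the attacker's and defender's utilities.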

Attack Strategy Selection using DRL.
The objective of the attacker DRL agent is to select the optimal attack strategy to maximize the accumulated reward, R_A. The decision-making process of this agent, including its state, action set, and reward function, is described as follows: • State (S_A) can be expressed as S_A = (n_t), where n_t is the count of drones within each signal strength range at round t.
The signal strength of MDs is set as sig_MD = sig_HD − δ, where δ is a predefined integer to ensure a stronger signal strength for HDs. The signal transmission range is uniformly divided from 100 m to 1000 m. Game theory and DRL are utilized to find the optimal defensive strategy, d_i*, to modulate the signal strength levels of both MDs and HDs.
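The log-distance path-loss relation from Section 4 and the uniform division of the 100 m–1000 m transmission range into ten levels can be sketched as follows. The paper specifies η = 4 and the range; the reference power P0 and the near/far level orientation are our assumptions.

```python
import math

ETA = 4                 # path-loss exponent used in the paper
D0, P0 = 100.0, -40.0   # reference distance (m) and ASSUMED reference power (dBm)

def received_power(d: float) -> float:
    """P_r(d) = P_r(d0) - eta * 10 * log10(d / d0)."""
    return P0 - ETA * 10 * math.log10(d / D0)

def strength_level(d: float) -> int:
    """Uniformly bin distances in [100, 1000] m into levels 10 (near) .. 1 (far)."""
    d = min(max(d, 100.0), 1000.0)
    return 10 - min(int((d - 100.0) / 90.0), 9)

print(received_power(1000), strength_level(500))
```

With η = 4 the received power drops 40 dB per decade of distance, which is why modest changes in a drone's signal level move it across several of the attacker's observation bins.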

Defense Strategy Selection using Game Theory (GT).
The defender's expected utility when taking defense strategy d_i is computed from the defender's belief, b^D_j, that the attacker chooses attack strategy a_j, and the defender's utility for every defense strategy against each attack strategy. This utility signifies the difference between the defender's gain and loss. The gain takes into account the decreased security vulnerability achieved by defense strategy d_i and the attack cost incurred by attack strategy a_j. Conversely, the loss comprises the negative effect introduced by attack strategy a_j and the defense cost of opting for strategy d_i. The defender's expected utility when choosing strategy d_i is

EU_D(d_i) = Σ_j b^D_j · (g^D_{i,j} − l^D_{i,j}),

where b^D_j is the defender's belief toward attack strategies, and g^D_{i,j} and l^D_{i,j} denote the defender's gain and loss. vul_k refers to the vulnerability level of drone k in the range [0, 1], as mentioned in Section 3.4. The set of target drones perceived by the defender, D_target,j, is based on its experience: the defender maintains a record of which drones are targeted when the attacker opts for strategy a_j. C_A is the attack budget, N'_c is the anticipated number of connected drones after choosing d_i, and N_m is the total number of drones initially allocated to the mission team. sig_max denotes the maximum signal level (i.e., 10).
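The defender's experience-based beliefs could be formed, for instance, as smoothed empirical frequencies of the attack strategies observed in past rounds. The Laplace smoothing below is our assumption to keep unseen strategies possible, not the paper's exact belief mechanism.

```python
from collections import Counter

# Sketch: form beliefs b_j over attack strategies from the defender's record
# of past rounds (an empirical, fictitious-play-style estimate).
def update_beliefs(observed: list, n_strategies: int, smoothing: float = 1.0):
    counts = Counter(observed)
    total = len(observed) + smoothing * n_strategies
    return [(counts[j] + smoothing) / total for j in range(n_strategies)]

history = [2, 2, 5, 2, 7]                    # attack strategies seen so far
beliefs = update_beliefs(history, n_strategies=10)
print(round(beliefs[2], 3))                  # most frequently observed strategy
```

These beliefs are exactly the b^D_j weights in the defender's expected-utility computation; they sharpen as more rounds are observed, which is why belief formation needs enough rounds per subgame.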

Defense Strategy Selection using DRL.
The defender DRL agent aims to optimize the drones' signal strength, including both MDs and HDs, by maximizing the total accumulated reward. The state, action set, and reward used by the defender DRL agent are as follows: • State (S_D) is composed of the mission completion ratio and the scan progress map, defined as S_D = (R_t, M_t), where R_t is the ratio of completed mission tasks at round t, in the range [0, 1], and M_t is a map showing the scan progress for each cell at round t. Each cell value in the target area reflects its level of scanning progress, providing a detailed overview of the surveillance status of the target area.
• Action Set (A_D): each action a_i signifies a defense strategy d_i indicating the signal strength of the HDs; for MDs, the descriptions in Section 4.2.1 apply. The action selected by the defender in round t is represented as d_t. • Reward Function (R_D(d_t)), a defender's immediate reward for executing action d_t, is given by N^c_t, the number of mission tasks completed in round t. The accumulated defense reward is R_D = Σ_{t=0}^∞ (γ_D)^t · R_D(d_t), where γ_D is the defender's decay factor.
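The accumulated discounted reward can be computed directly, as in the short sketch below; the per-round rewards (tasks completed) are illustrative numbers.

```python
# Sketch of the accumulated discounted reward R = sum_t gamma^t * r_t used by
# both DRL agents; per-round rewards here are illustrative.
def accumulated_reward(rewards, gamma: float) -> float:
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

tasks_completed_per_round = [3, 4, 2, 5]
print(accumulated_reward(tasks_completed_per_round, gamma=0.9))
```

The same form serves the attacker with unfulfilled tasks as the per-round reward and γ_A as the decay factor.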

EXPERIMENT SETUP

5.1 Simulation Environment Setup
Experiments were conducted in a Python 3.10 simulated environment using PyTorch and NetworkX. The surveillance area is a 750 m × 750 m grid divided into 25 cells. The drone fleet consists of 15 MDs and 5 HDs. Drones with low battery return to the charging station; if no additional MDs are available, the mission proceeds with fewer MDs. We employed the A2C algorithm for DRL and used a memory buffer storing up to 10,000 transitions. Prioritized experience replay [15] was also integrated to emphasize transitions with high TD error.
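A minimal sketch of the prioritized replay idea follows: transitions with larger TD error are sampled more often. The eviction policy, priority exponent, and tiny epsilon are common choices we assume here; the paper only states the 10,000-transition capacity and the use of prioritized experience replay [15].

```python
import random

class PrioritizedReplay:
    """Minimal prioritized replay sketch: sampling weight ~ |TD error|^alpha."""
    def __init__(self, capacity: int = 10_000, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error: float):
        if len(self.data) >= self.capacity:       # evict the oldest transition
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k: int, rng: random.Random):
        return rng.choices(self.data, weights=self.priorities, k=k)

buf = PrioritizedReplay(capacity=4)
for i, err in enumerate([0.1, 5.0, 0.1, 0.1]):
    buf.push(("s", "a", "r", "s2", i), err)
batch = buf.sample(100, random.Random(0))
# Transition 1 (TD error 5.0) should dominate the sampled batch.
print(sum(1 for t in batch if t[4] == 1))
```

A production implementation would use a sum-tree for O(log n) sampling and importance-sampling weights; this sketch only shows the priority-weighted sampling itself.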

Metrics
For experimental verification, we consider the following criteria: (1) Ratio of Completed Mission Tasks (R_m), which quantifies the proportion of completed cells among all assigned cells during the mission duration; (2) Energy Consumption (EC), which accounts for the cumulative energy utilization of all drones, encompassing both HDs and MDs; and (3) Number of Active, Connected Drones (N_c), which estimates the number of non-compromised MDs participating in the mission execution.
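The three metrics reduce to simple computations over the raw mission state, as sketched below; the field names and example values are illustrative assumptions.

```python
def mission_ratio(completed_cells: int, total_cells: int) -> float:
    """R_m: proportion of completed cells among all assigned cells."""
    return completed_cells / total_cells

def energy_consumption(per_drone_energy: dict) -> float:
    """EC: cumulative energy used by all drones (MDs and HDs)."""
    return sum(per_drone_energy.values())

def active_connected(drones: list) -> int:
    """N_c: non-compromised MDs still connected to the fleet."""
    return sum(1 for d in drones
               if d["kind"] == "MD" and not d["compromised"] and d["connected"])

drones = [
    {"kind": "MD", "compromised": False, "connected": True},
    {"kind": "MD", "compromised": True,  "connected": True},
    {"kind": "HD", "compromised": False, "connected": True},
]
print(mission_ratio(18, 25), active_connected(drones))
```

Note that N_c deliberately excludes HDs: a compromised decoy is an acceptable cost, whereas a disconnected MD directly harms the mission.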

Comparing Schemes
We compare the performance of the following schemes: (1) HD-F: an HD-based approach using a fixed signal strength level (i.e., 5); (2) HD-DRL: an HD-based approach with the optimal signal strength level identified by DRL; (3) HD-GT: an HD-based approach with the optimal signal strength level identified by GT; (4) IDS [14]: an intrusion detection system-based approach to detect and isolate DoS attacks; (5) CD: ContainerDrone [4], which stops working when detecting DoS attacks; and (6) No-Defense: no defense is used.

NUMERICAL RESULTS AND ANALYSES

6.1 Ratio of Mission Completion
Figure 2 presents the performance of various defense schemes (see Section 5.3) against DoS attacks based on the ratio of completed mission tasks (R_m). Key observations include: (1) HD-DRL excels against fixed or DRL-based attack strategies. Clearer action patterns in these attacks, especially with DRL, allow the defender to counteract more effectively. The defender's reward, linked to completed mission tasks (Section 4.2.3), results in a higher R_m when the attacker's patterns are more discernible. (2) HD-GT performs well in early game rounds due to GT's strategic forecasting based on a rapidly formed payoff matrix. However, its advantage diminishes over time due to GT's limited adaptability. Conversely, DRL strategies, while initially slower due to their learning curve, improve over time, eventually outpacing GT-based approaches. (3) HD-DRL shows performance fluctuations, particularly against intelligent adversaries using GT or DRL. These adversaries amplify the defender's optimization complexity, causing more explorative behavior in the DRL agent. Additionally, the presence of an intelligent opponent introduces non-stationarity: in this multi-agent environment, strategies evolve over time, shifting the optimum and producing fluctuating learning curves.

Energy Consumption
Figure 3 illustrates the energy consumption of various defense strategies (i.e., HD-F, HD-DRL, HD-GT, IDS, CD, and No-Defense) against DoS attacks, measured using the EC metric. Based on Figure 3, we observe: (1) HD-GT is the most energy-efficient strategy among the defense mechanisms considered. This is attributed to the fact that the GT-based agent takes signal strength-based defense costs into account while deciding on a strategy (see the details in Section 4.2.2), leading to the lowest EC and hence conserving energy. (2) Notably, when the attacker employs GT or DRL to select its attack strategy, HDs employing intelligent defense mechanisms (i.e., HD-DRL and HD-GT) show lower EC than other defense techniques (i.e., IDS and CD). As seen in Figure 2, intelligent HD-based strategies can effectively balance mission performance with energy conservation, showing a high R_m while maintaining a lower energy cost in EC.

Number of Active, Connected Drones
Figure 4 presents the number of active, connected drones for various defense strategies (i.e., HD-F, HD-DRL, HD-GT, IDS, CD, and No-Defense) in response to DoS attacks, measured using the N_c metric. Based on Figure 4, we make the following observations: (1) HD-based defenses (i.e., HD-F, HD-DRL, and HD-GT) effectively maintain the connectivity of the drone fleet. This is because HDs can serve as relays, facilitating mission drones' connection to the regional leader drone. (2) High connectivity, ensured by HDs, does not necessarily lead to increased energy consumption, particularly when intelligent strategies are employed. Similar to Figure 3, HD-based defenses demonstrate high N_c, while HD-DRL and HD-GT exhibit lower energy consumption in Figures 3b and 3c. These intelligent strategies do not rely solely on high signal strength; instead, they occasionally utilize lower signal strengths to avoid being targeted by attackers, resulting in efficient energy conservation.

CONCLUSION & FUTURE WORK
This study compared various HD-based defenses with non-HD-based counterparts (i.e., IDS and CD) when intelligent strategy selection methods based on deep reinforcement learning or game theory are used, in terms of the mission completion ratio, energy consumption, and the number of active, connected drones. Via the extensive experiments, we obtained the following key findings: (1) HD-based defenses outperformed IDS and CD, with both HD-DRL and HD-GT offering distinctive advantages. These HD-based defenses efficiently maintained the connectivity of the drone fleet, ensured high mission completion ratios, and regulated energy consumption effectively by properly defending against DoS attacks. This was primarily achieved by intelligently varying the signal strength to minimize security vulnerabilities while maintaining strong mission performance and energy efficiency. (2) When comparing HD-DRL and HD-GT, each displayed unique strengths depending on the specific context. HD-DRL showed superior performance under fixed or DRL-based attack strategies, which tend to exhibit clearer action patterns, making it easier for HD-DRL to identify and counter the attacks. DRL's autonomous and continuous learning capabilities, based on complex neural networks, enabled the HD-DRL strategy to gradually improve its performance, eventually surpassing other strategies. (3) On the other hand, HD-GT demonstrated initial advantages, particularly in the early stages of the game, due to the strategic forecasting abilities of game theory. However, as the game advanced, this advantage diminished due to GT's limited ability to explore optimal solutions under high dynamics. Nevertheless, HD-GT stood out in energy efficiency, consuming less energy than other defenses while maintaining high mission completion rates and drone connectivity. Hence, HD-GT offers high merit in resource-constrained environments that require significantly low energy consumption. (4) Overall, the choice between HD-DRL and HD-GT would depend on the specific circumstances. For scenarios with predictable or fixed attacker strategies, or where long-term learning and adaptation are required or allowed, HD-DRL would be the preferred choice. Conversely, for situations requiring immediately effective defenses or where energy efficiency is a paramount concern, HD-GT can provide a feasible, attractive solution.
As for future work directions, we aim to extend this work by (1) exploring different types of cyberattacks beyond DoS to ascertain the effectiveness of HD-based deceptive defense techniques; (2) incorporating transfer learning [21] to counteract the initial performance drop in RL and to evaluate its advantages over GT in aspects such as mission efficacy and efficiency (including computational burden like training duration); (3) designing more realistic and methodical mechanisms to evaluate agents' perceived uncertainty that align with real-world scenarios; (4) conducting sensitivity analysis by varying scenario settings to thoroughly assess the robustness and scalability of our approach across different environments; and (5) exploring other application scenarios to evaluate and validate our technique further. This will provide a more comprehensive understanding of the limitations and potential of our method, aligning it more closely with practical implementations.

• Extensive Comparative Performance Validation & Analyses: We validate the performance of the developed HD-based defenses via extensive experiments and demonstrate their superiority under various attack scenarios in mission performance and energy conservation.

Figure 1: The proposed model of honey drone mission systems under the scenario where both the DoS attacker and defender leverage either DRL or GT to choose their most effective strategies, specifically the optimal signal strength.
Figure 2: Performance analysis of HD-DRL, HD-GT, IDS, CD, and fixed defense, given an attack strategy, with respect to the ratio of completed mission tasks (R_m). (Panel: R_m under a DRL-based attack.)
Figure 3: Performance analysis of HD-DRL, HD-GT, IDS, CD, and fixed defense, given an attack strategy, with respect to energy consumption (EC). (Panels: EC under a GT-based attack; EC under a DRL-based attack.)
Figure 4: Performance analysis of HD-DRL, HD-GT, IDS, CD, and fixed defense, given an attack strategy, with respect to the number of active, connected drones (N_c). (Panels: N_c under a GT-based attack; N_c under a DRL-based attack.)
4.1.1 Attacker's Action Space. The attacker observes the drones' signal strengths and selects its attack strategy, a_i ∈ {a_1, . . . , a_10}, where each action corresponds to a range of received signal strengths [sig^min_i, sig^max_i]. The signal strength decreases as the distance between the transmitter and receiver increases and is estimated by the signal attenuation formula P_r(d) = P_r(d_0) − η · 10 · log_10(d/d_0), where η = 4 and P_r(d)/P_r(d_0) is the observed signal strength at distance d/d_0.
• The Action Set (A_A) is defined by A_A = {a_1, . . . , a_i, . . . , a_10}, where each a_i determines the subset of target drones, represented as D_target,i. The action executed by the attacker DRL agent in round t is denoted as a_t.
• The Reward Function (R_A(a_t)) is the immediate reward an attacker obtains by executing a_t, given by N^u_t, the number of unfulfilled mission tasks in round t. The accumulated attack reward, symbolized as R_A, is computed by R_A = Σ_{t=0}^∞ (γ_A)^t · R_A(a_t), where γ_A represents the decay factor of the attacker.
4.2.1 Defender's Action Space. Our defense strategy d_i ∈ {d_1, . . . , d_10} controls the HDs' signal strength, sig_HD. The signal strength of MDs is set as sig_MD = sig_HD − δ, where δ is a predefined integer ensuring a stronger signal strength for HDs.