NeuroRadar: A Neuromorphic Radar Sensor for Low-Power IoT Systems

Radar sensors have recently been explored in the industrial and consumer Internet of Things (IoT). However, such applications often require self-sustainable or untethered operations, which are at odds with the high power consumption of radar. This paper proposes NeuroRadar, a neuromorphic radar sensor, to achieve low-power wireless sensing. NeuroRadar jointly optimizes the analog hardware and the computation model, in order to mimic the highly efficient biological sensing and neural processing system. NeuroRadar features a highly simplified radar front end, which eliminates the power-hungry components in conventional radars. It directly "encodes" ambient motion into spiking signals, which can be processed using spiking neural networks running on energy-efficient neuromorphic computing platforms. We have prototyped NeuroRadar and evaluated its performance in two use cases: gesture sensing and localization. Our experiments demonstrate that NeuroRadar can achieve high sensing accuracy, at orders of magnitude lower power consumption compared with traditional radar.


INTRODUCTION
Radar sensors in Internet of Things (IoT) systems have gained traction in recent years and are widely used in healthcare, smart homes, industrial automation, and intelligent transportation [26,54,59]. By 2025, industrial radar applications are anticipated to encompass 10 million devices, whereas the consumer market will reach a substantial $250 million [93]. Nevertheless, the high power consumption of radar hardware remains a significant challenge, particularly for battery-operated IoT devices and wearables where energy efficiency and battery lifespan are crucial. Compounding this issue, numerous smart sensing applications, such as motion-activated security radar, wearable gesture recognition, and activity classification, often employ power-intensive artificial neural networks (ANNs) for signal processing. Unlike human neurons that operate in short, pulse-based bursts, ANNs prolong the activity of their "neurons" using continuous activation functions, which substantially increases the power demands of IoT devices. Furthermore, ANNs typically run on the classical von Neumann architecture, which frequently shuttles data between physically separate CPU and memory units, resulting in additional processing overhead.
Recent advancements in neuromorphic engineering have inspired Spiking Neural Networks (SNNs) [46] and dedicated neuromorphic circuits [18] that better approach the efficiency of sensory signal processing in the brain. SNNs are structured to mirror the pulse-based behavior of the human nervous system. They consist of spiking neurons and the synaptic connections between them. Realized on dedicated neuromorphic circuits, SNNs showcase exceptional energy efficiency that surpasses traditional von Neumann computing units by orders of magnitude [10]. The revolution in neuromorphic computing has also given rise to state-of-the-art neuromorphic sensing hardware, such as the energy-efficient, fast-response event camera [39].
Inspired by these advancements, recent research has proposed SNN-based signal processing to facilitate low-power radar operation [6,7]. However, these systems do not incorporate a full-fledged neuromorphic hardware architecture. Primarily, the analog front-end of these "SNN radar" systems [6,7] remains the same as in traditional radars. Although SNN-based signal processing has lowered the signal processing power consumption to the order of hundreds of µW [77], the radar front-end can demand tens to hundreds of mW. This discrepancy poses a challenge to achieving truly energy-efficient radar sensing. Additionally, the "SNN radar" systems [6,7] continue to rely on conventional CPUs or digital signal processing (DSP) units for signal processing. The radar signals have to be first sampled by analog-to-digital converters (ADCs), mapped into spikes, and then processed by SNNs for ranging or environmental perception. Unfortunately, the extra sampling steps prior to the SNN involve traditional computing units, which adds a substantial overhead and prevents the full exploitation of neuromorphic computing's potential.
In this paper, we introduce NEURORADAR, a novel low-power radar sensing system that fully exploits the power of neuromorphic sensing and computing. NEURORADAR draws inspiration from neuromorphic sensors that mimic mammalian sensory systems, generating event-triggered outputs in response to external stimuli, as depicted in Fig. 1. Contrary to traditional radars with continuous frame-based outputs, NEURORADAR produces spiking patterns upon detecting motion in the surrounding area. Unlike the recently proposed SNN radars [6,7], NEURORADAR follows a neuromorphic architecture that jointly designs the analog sensing front-end and spiking signal processing:
(1) SIL-based radar sensor front-end. NEURORADAR employs a drastically simplified RF front-end that removes most power-intensive active RF components that exist in traditional radars, leaving only a low-power free-running oscillator. NEURORADAR senses environmental changes using the self-injection locking (SIL) principle [90], where the oscillator's frequency is influenced by motion in the surrounding area. Despite the simple design, NEURORADAR preserves a reasonable level of sensitivity due to the inherent properties of the SIL architecture that enhance the signal strength. However, a single SIL sensor is unable to provide angular resolution and accurate range information, as it only senses environmental motion information. To overcome this limitation, we draw inspiration from the compound structures found in certain biological eyes [29], and propose an array of SIL sensors with judiciously separated carrier frequencies. With the SIL sensor array, NEURORADAR can implicitly encode spatial information through the multi-channel spiking signals, which can subsequently be decoded using application-specific SNN models.
(2) Analog spike encoding and full SNN processing. NEURORADAR converts ambient motion signals from the sensor front-end into spikes using an analog spike encoding circuit. The spike encoder follows a biological neuron model, preserving all the essential sensing information in the spike sequences. The spike sequences can then be directly processed by the SNNs on neuromorphic computing systems, thereby eliminating the need for any non-spike-based computing units. Consequently, we can train the SNNs using these raw spike signals for various tasks, including gesture recognition and localization. This comprehensive SNN processing workflow allows NEURORADAR to deliver application-specific sensing results with superior energy efficiency.
To verify the effectiveness of our design, we prototype NEURORADAR using discrete RF circuits and further perform simulation for the integrated circuit (IC) version. Our experiments show that a single-RF-chain NEURORADAR can effectively sense motion in the environment while consuming only 780 µW of power (IC: 240 µW), which is 1-2 orders of magnitude lower than existing continuous-wave radar systems with similar operating frequencies. We further conduct two case studies to verify the usability of NEURORADAR for practical IoT sensing applications. Specifically, NEURORADAR can facilitate hand gesture recognition with an accuracy of 94.6% and perform moving target localization with an average error of 0.98 m. Compared with other SNN-based gesture recognition systems [6,7,68] with similar capability, NEURORADAR saves 78%-93% computing power. NEURORADAR reduces the end-to-end power consumption by 1-2 orders of magnitude for both use cases, compared with existing radars. Considering its spatial resolution and motion detection capabilities, NEURORADAR can potentially be used in a wide range of wireless IoT sensing applications, such as vital sign sensing and surveillance alarms.
In summary, we make the following contributions:
(1) We introduce NEURORADAR, a novel low-power radar paradigm that realizes the concept of neuromorphic radar sensing. NEURORADAR incorporates a spike-generation radar sensor that directly interfaces with SNN-based neuromorphic processors, leading to superior energy efficiency.
(2) We devise a low-power, low-complexity radar front end based on the SIL principle. Both our theoretical analysis and experimental results demonstrate that multi-chain SIL radar sensors can supply ample information for short-range, low-velocity sensing applications.
(3) We implement the neuromorphic radar system through a PCB prototype and carry out simulations for the IC version. Our experiments verify NEURORADAR's capability to empower resource-constrained IoT devices to perform low-power smart sensing.

BACKGROUND

2.1 Self-Injection Locking
Self-injection locking [14] is a phenomenon where an oscillator's frequency is affected by a reflected version of its own signal, as depicted in Fig. 2. Unlike conventional injection locking, where the oscillator's frequency is locked to the frequency of an external injection signal [1], the frequency of a self-injection-locked oscillator (SILO) is dependent on the amplitude and phase of the reflected signal.
Based on classical analysis of injection locking [61], we can model the frequency shift of the oscillator caused by the reflectors in the environment:

$$\Delta f(t) = \frac{f_0}{2Q} \sum_{k=1}^{K} \frac{|S_{\mathrm{inj},k}(t)|}{|S_{\mathrm{osc}}|} \sin\!\left(\frac{4\pi f_0\, r_k(t)}{c}\right) \tag{1}$$

Here, $f_0$ is the center frequency of the oscillator; $Q$ is the quality factor of the LC resonating tank; $S_{\mathrm{inj},k}(t)$ is the injection signal from the $k$-th reflector; $S_{\mathrm{osc}}$ is the oscillator signal; $r_k(t)$ is the distance of the $k$-th reflector; $c$ is the speed of light, and $K$ is the total number of reflectors.
Notably, the strength of the reflected signal, $|S_{\mathrm{inj}}|$, is proportional to $1/r^2$. Its phase, $\angle S_{\mathrm{inj}}$, encapsulated in the sine term, is also related to $r$. Thus, the oscillation frequency is modulated by the motion of the reflectors. In the case of multiple reflectors, the observed frequency shift of a SILO is the summation of the shifts caused by each reflector. Since static reflectors cause constant frequency shifts, only moving reflectors contribute to frequency modulation. This principle is leveraged by NEURORADAR for environmental perception.
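The superposition property described above can be illustrated with a minimal numerical sketch. It assumes the frequency shift takes the form $(f_0/2Q)\sum_k (|S_{\mathrm{inj},k}|/|S_{\mathrm{osc}}|)\sin(4\pi f_0 r_k/c)$; the sign convention, the quality factor, and the injection ratios below are illustrative assumptions rather than values from the prototype, and `injection_ratio` is a hypothetical helper that only encodes the $1/r^2$ law.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def sil_frequency_shift(f0_hz, q, reflectors):
    """Sum the per-reflector shifts; reflectors = [(injection_ratio, distance_m), ...]."""
    total = sum(ratio * math.sin(4.0 * math.pi * f0_hz * r / C)
                for ratio, r in reflectors)
    return f0_hz / (2.0 * q) * total

def injection_ratio(distance_m, ratio_at_1m=1e-3):
    """Hypothetical |S_inj|/|S_osc| following the 1/r^2 law."""
    return ratio_at_1m / distance_m ** 2

if __name__ == "__main__":
    f0, q = 915e6, 20.0                      # illustrative values only
    wall = (injection_ratio(3.0), 3.0)       # static reflector: constant offset
    for step in range(5):
        r = 1.0 + 0.02 * step                # target moving away in 2 cm steps
        hand = (injection_ratio(r), r)
        shift = sil_frequency_shift(f0, q, [wall, hand])
        print(f"r = {r:.2f} m -> shift = {shift / 1e3:+.2f} kHz")
```

The static reflector contributes the same offset in every iteration, so only the moving reflector changes the printed shift from step to step.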

2.2 Spiking Neural Networks
Biological neurons communicate by generating and propagating electrical pulses or spikes. Neurons are interconnected via specialized junctions termed synapses. A neuron fires a spike whenever enough incoming pulses accumulate to push its membrane potential above a certain threshold, following which the neuron resets itself. This process is often abstracted as Leaky-Integrate-and-Fire (LIF) [24]. In traditional ANNs, neurons encode information in a complex network of real-valued activations. Activation functions such as ReLU essentially approximate the spiking rates of biological neurons. In contrast, SNNs mimic the human neuron system more closely by: (i) using spiking signals directly for inter-neuronal communication and (ii) using the timing rather than the shape of spikes to convey neural information.
The computation and energy efficiency advantages of SNNs originate from two fundamental aspects. First, the neuromorphic architecture can realize massively parallel processing, since each neuron represents an integrated memory and computation unit, in contrast to the rigid separation of CPU and RAM in von Neumann architectures. Thus, SNNs can potentially continue to push the "intelligence per Joule" as Moore's law scaling comes to an imminent end. Second, the energy consumption of SNNs is proportional to the number of processed spikes, with each spike requiring as little as a few picojoules [10]. As information is sparsely encoded in the rates/timing of the spiking neurons, an SNN can implement the same end-to-end functionality as an ANN [70] but with much lower energy expenditure.
Notably, the advantages of SNNs can be manifested only on specialized non-von Neumann in-memory computing platforms specifically designed to process spiking inputs. The past decade has witnessed a variety of such platforms, such as Intel Loihi [18], uBrain [77], DynapSE [52], IBM TrueNorth [3], and SpiNNaker [22]. Although still an active area of research, neuromorphic computers have already demonstrated orders-of-magnitude higher energy efficiency than conventional computing architectures [10].

SYSTEM OVERVIEW
NEURORADAR consists of three main components: sensor front-end, spike encoders, and spike processors (Fig. 3). The sensor front-end senses ambient motion and the output signals are converted into spike sequences (referred to as spike trains) by the spike encoders. These spike trains are then directly processed by the energy-efficient SNNs.
Sensor front-end. The NEURORADAR front-end emits a weak, continuous-wave single-tone signal in the 0.3-3 GHz ultra-high frequency (UHF) band. The core component is a SILO whose frequency is modulated by the motion of the surrounding targets [90]. By demodulating this frequency shift, the system generates a baseband signal that carries the motion information. We further introduce a sensor array design that combines multiple SILOs with different operating frequencies to provide richer spatiotemporal information.
Spike encoder. The spike encoding circuit takes the baseband signal produced by the front-end and converts the signal into spike trains following the LIF model [24] (Sec. 2.2). Given that the input is AC-coupled and the signal comprises both positive and negative parts, two spike encoders are jointly employed to encode each channel of the radar sensor. The spike encoding circuits operate entirely in an event-driven manner; they generate spikes only when the sensor front-end detects motion and stay idle otherwise.
Spike processor. The spike encoders interface directly with the neuromorphic computing circuits, enabling all signals to be processed within the spike domain. Our approach involves designing multi-layer convolutional SNNs to process the multi-channel spike trains from the NEURORADAR sensor array. These SNNs execute pattern recognition and regression tasks according to the application requirements.

NEURORADAR SENSOR FRONT-END DESIGN

4.1 Design Principle
The main principle of the NEURORADAR front-end design is to minimize its power consumption. To achieve this, we first analyze the RF components that lead to the high power consumption of traditional radars. A typical continuous-wave (CW) radar front-end, as shown in Fig. 4(a), includes elements such as a voltage-controlled oscillator (VCO), phase-locked loop (PLL), crystal oscillator (XO), mixer, low-noise amplifier (LNA), and power amplifier (PA). While power consumption can vary depending on specific designs, we annotate a representative CW radar [86] for reference. These active RF components are necessary to maintain the high sensing performance required for advanced applications such as automotive perception. For instance, the phase noise of a radar directly impacts target detectability, spatial resolution, and maximum range [20,73]. To contain the phase noise, most radar systems use a VCO and PLL in a feedback loop, using a high-precision XO as the reference input to synthesize a low-phase-noise radar signal with a precise frequency. Consequently, such high-profile radar front-ends require a high power budget of several hundred mW, irrespective of the signal processing hardware.
In contrast to traditional radar systems, neuromorphic systems exhibit superior power efficiency and rapid response times by emulating the event-driven communication and computation in biological neural systems [11,43]. Event cameras [39], also known as dynamic vision sensors, represent an epitome of neuromorphic sensing systems. Instead of capturing full frames at a fixed rate, event cameras generate asynchronous events in response to changes in pixel-level brightness. This event-driven approach increases the camera's dynamic range, while substantially reducing power consumption and data processing load [39].
Inspired by the event camera, we design the NEURORADAR sensor front-end so that it only responds to changes in the radar channel (caused by motion) and produces asynchronous spike signals that contain relevant motion information. To attain these properties, we extend the SIL structure and develop spike encoders to convert radar signals into spike trains. A SIL radar, as a variant of Doppler radar [88], is inherently a motion detector, which aligns well with the event-driven neuromorphic sensing principle. Moreover, radars with more complex waveforms, like wide-band Frequency-Modulated Continuous Wave (FMCW) radar, require intricate baseband signal processing (such as FFTs) to extract basic sensing information. This is challenging to implement without ADC sampling and conventional DSP units. In contrast, a SIL radar emits single-tone signals and "demodulates" motion directly from the reflected signals, which eliminates the need for complicated wideband signal processing and facilitates the use of spike encoders. SIL radars feature a simplified architecture, which makes them power-efficient and cost-effective to implement. We elaborate on the SIL front-end design of NEURORADAR in the following sections.

4.2 SIL Sensor Design
The SIL radar adopts a simple architecture with only three RF components: an oscillator, a time delay unit, and a mixer (Fig. 4(b)). The oscillator emits an RF signal that becomes self-injection-locked due to environmental reflections. A time delay unit and mixer demodulate the frequency shift caused by moving targets. The system's total power consumption is kept under 300 µW by lowering the oscillator's output power and employing a passive low-power mixer.
While the removal of active RF components such as the LNA or PA typically results in low sensitivity, the SIL radar's unique architecture provides a sensitivity gain that compensates for the impact. This property ensures that despite the simplified design, a SIL radar can still support our targeted IoT sensing applications that require only limited range/velocity resolution (e.g., occupancy detection, coarse indoor tracking, hand gesture recognition).
A target's motion induces phase modulation on a conventional Doppler radar [57], in contrast to frequency modulation on the SIL radar [90]. The demodulation circuit extracts the phase change over the delay time $\tau_d$. As phase is the time integral of frequency, the SIL radar's demodulation process inherently integrates and enhances the motion signal [80]. A larger demodulation gain and hence higher sensitivity can be achieved by increasing $\tau_d$, if the motion frequency is much lower than $1/\tau_d$, which holds true for our targeted IoT use cases. Furthermore, in SIL radar, the oscillator signal, which contains sensing information, directly enters the mixer without any attenuation. In contrast, for conventional Doppler radar, the mixer's input comes from the attenuated reflected signal. Therefore, SIL radar effectively amplifies the received signal amplitude to the oscillator output level for free. Following the empirical model in [80], we find that the SIL radar can provide a sensitivity gain of around 19.97 dB with $\tau_d$ = 80 ns and a carrier frequency of 915 MHz (corresponding to our implementation), which can be traded for low-power operation.
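The dependence of the demodulation gain on the delay can be sketched with an idealized delay-line discriminator. This is not the empirical model of [80]; it assumes an ideal multiplier followed by a low-pass filter, so that the baseband output is proportional to $\cos(2\pi f \tau_d)$, and the 100 kHz frequency shift is an illustrative value.

```python
import math

def discriminator_output(f_hz, delay_s):
    """DC term of cos(2*pi*f*t) * cos(2*pi*f*(t - tau)), with the factor 1/2 dropped."""
    return math.cos(2.0 * math.pi * f_hz * delay_s)

def small_signal_gain(f0_hz, delay_s):
    """Slope d(output)/d(f) at f0, in 1/Hz; it carries an explicit factor of the delay."""
    return -2.0 * math.pi * delay_s * math.sin(2.0 * math.pi * f0_hz * delay_s)

if __name__ == "__main__":
    f0, shift = 915e6, 100e3                 # carrier and an illustrative 100 kHz shift
    for tau in (28e-9, 80e-9):
        exact = discriminator_output(f0 + shift, tau) - discriminator_output(f0, tau)
        linear = small_signal_gain(f0, tau) * shift
        print(f"tau = {tau * 1e9:.0f} ns: exact = {exact:+.5f}, linear = {linear:+.5f}")
```

Under these assumptions the output change is close to linear in the frequency shift, and its slope contains the delay as a multiplicative factor, alongside a term that depends on the operating point $2\pi f_0 \tau_d$.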

4.3 Array of SIL Sensors
4.3.1 Sensing Information from a Single SIL Radar. Suppose a target is moving randomly within the surrounding area and a continuous frequency shift (Eq. (1)) of the oscillator is observed. The demodulation circuit integrates the frequency shift $\Delta f(t)$ during the delay time $\tau_d$, and the demodulated output is $s(t) \approx \Delta f(t)\,\tau_d$ if $\Delta f(t)$ changes slowly and is considered constant over $\tau_d$. Therefore $s(t)$ is a direct representation of the motion-modulated frequency signal and, from Eq. (1),

$$s(t) = A(t)\,\sin\!\left[\frac{4\pi f_0\, r(t)}{c}\right], \tag{2}$$

where the amplitude $A(t)$ depends on $G$, $\sigma$, and $r(t)$. Here $G$ represents an abstracted gain encapsulating various factors (i.e., demodulation gain, antenna gain, and any practical system loss), and $\sigma$ is the radar cross section (RCS) of the target, which is an unknown parameter and may fluctuate over time.
The range information $r(t)$ is embedded in both $A(t)$ and $\sin[4\pi f_0 r(t)/c]$, but this alone is insufficient for localizing the target. Although $A(t)$ includes absolute range information, $r(t)$ cannot be estimated due to the unknown target RCS $\sigma$. Additionally, as $\sin[4\pi f_0 r(t)/c]$ is $2\pi$-periodic in its argument, it only contains ambiguous range information. Moreover, a single sensor fails to provide the angular information of the target. This implies that given a distance $r$, the actual location of the target could be anywhere on a circle with a radius of $r$. Therefore, further information is required to precisely localize the target.
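To make the range ambiguity concrete, the short sketch below computes the range step after which the phase term $\sin(4\pi f r / c)$ repeats, namely $c/(2f)$, or half a wavelength. The 915 MHz carrier matches the implementation frequency mentioned earlier; the helper names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def ambiguity_period(f_hz):
    """Range step (m) after which sin(4*pi*f*r/c) repeats: c / (2 f)."""
    return C / (2.0 * f_hz)

def phase_term(f_hz, r_m):
    """Phase-dependent part of the demodulated signal at range r_m."""
    return math.sin(4.0 * math.pi * f_hz * r_m / C)

if __name__ == "__main__":
    period = ambiguity_period(915e6)
    print(f"ambiguity period at 915 MHz: {period * 100:.1f} cm")
    print(phase_term(915e6, 1.0), phase_term(915e6, 1.0 + period))
```

At 915 MHz the period is roughly 16 cm, so two targets separated by that range step produce the same phase term, which is why a single carrier cannot resolve absolute range.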

4.3.2 Frequency-Diverse SIL Sensor Array Design.
To overcome the lack of range/angle resolution, NEURORADAR combines multiple SIL radar sensors operating at different frequencies, forming a frequency-diverse array (FDA). A colocated sensor array can infer the direction of the target by exploiting the phase difference of the received signals across the sensors, whereas the frequency diversity offers the potential to resolve the range ambiguity. Distinct from traditional antenna arrays that often employ a half-wavelength ($\lambda/2$) spacing to avoid angular aliasing [79], NEURORADAR employs a quarter-wavelength ($\lambda/4$) spacing, $\lambda$ being the average wavelength, because a SIL radar uses the same antenna for both transmission and reception, effectively doubling the phase difference due to antenna spacing.
To quantify the sensing capability of the array design, we derive a model-driven localization and speed estimation process. Consider a linear array of $N$ sensors, where the position and frequency of the $n$-th sensor are $\vec{p}_n$ and $f_n$, respectively. Since NEURORADAR only detects motion, we suppose a target moves at a constant speed within a short time (e.g., 0.5 s), and a total of $M$ observations are made with an interval of $\Delta t$. According to Eq. (2), when a target is located at $\vec{x}$ and moves with velocity $\vec{v}$, the theoretical observation vector is

$$\hat{\vec{s}}(\vec{x}, \vec{v}) = [\hat{s}_{n,m}], \quad \hat{s}_{n,m} = \sin\!\left(\frac{4\pi f_n\, r_{n,m}}{c}\right), \tag{3}$$

where

$$r_{n,m} = \left\lVert \vec{x} + m\Delta t\,\vec{v} - \vec{p}_n \right\rVert \tag{4}$$

is the distance between the target's location at time $m\Delta t$ and the $n$-th radar sensor at $\vec{p}_n$. In the model, the amplitude $A(t)$ is not considered because in practice $A(t)$ fluctuates randomly due to the time-varying RCS and provides unreliable information.
Given the real observation vector $\vec{s} = [s_{n,m}]$ from the radar sensor array, the location and speed of the target can be estimated by

$$(\hat{\vec{x}}, \hat{\vec{v}}) = \arg\max_{\vec{x}, \vec{v}}\ \mathrm{Corr}\big(\vec{s},\, \hat{\vec{s}}(\vec{x}, \vec{v})\big). \tag{5}$$

Here $\mathrm{Corr}(\cdot,\cdot)$ computes the correlation between the real observation vector and the theoretical observation vector.
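The correlation-based estimation can be sketched as an exhaustive grid search. The code below is a simplified illustration, not the authors' simulation: it assumes a phase-only observation model $\sin(4\pi f_n r_{n,m}/c)$, a normalized inner product as the correlation, noise-free measurements, and a coarse search grid that happens to contain the true state. The six frequencies, the grid, and all function names are hypothetical.

```python
import math
import random

C = 299_792_458.0  # speed of light (m/s)

def make_array(freqs_hz):
    """Linear array on the x-axis, spaced a quarter of the mean wavelength."""
    spacing = C / (sum(freqs_hz) / len(freqs_hz)) / 4.0
    offset = (len(freqs_hz) - 1) / 2.0
    return [((i - offset) * spacing, 0.0, f) for i, f in enumerate(freqs_hz)]

def observation(sensors, pos, vel, num_obs, dt):
    """Phase-only model: sin(4*pi*f_n*r_{n,m}/c) for every sensor n and time m."""
    obs = []
    for px, py, f in sensors:
        for m in range(num_obs):
            x = pos[0] + vel[0] * m * dt
            y = pos[1] + vel[1] * m * dt
            r = math.hypot(x - px, y - py)
            obs.append(math.sin(4.0 * math.pi * f * r / C))
    return obs

def correlation(a, b):
    """Normalised inner product of two observation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def estimate(sensors, measured, num_obs, dt, positions, velocities):
    """Grid search for the (score, position, velocity) with the highest correlation."""
    best = (-2.0, None, None)
    for pos in positions:
        for vel in velocities:
            model = observation(sensors, pos, vel, num_obs, dt)
            score = correlation(measured, model)
            if score > best[0]:
                best = (score, pos, vel)
    return best

def demo():
    freqs = [800e6, 830e6, 860e6, 890e6, 920e6, 950e6]
    random.Random(0).shuffle(freqs)          # random frequency permutation
    sensors = make_array(freqs)
    true_pos, true_vel = (1.0, 3.0), (1.0, -0.5)
    measured = observation(sensors, true_pos, true_vel, num_obs=10, dt=0.05)
    positions = [(-3.0 + 0.5 * i, 0.5 + 0.5 * j) for i in range(13) for j in range(14)]
    velocities = [(0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
    return estimate(sensors, measured, 10, 0.05, positions, velocities)

if __name__ == "__main__":
    score, pos, vel = demo()
    print(f"estimated position {pos}, velocity {vel}, correlation {score:.3f}")
```

In this idealized setting the search recovers the true state because the measurement is generated by the same model. With real data, noise and amplitude fluctuation would lower the peak and raise the side lobes.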
To establish the optimal number of radar sensors required to achieve reasonable resolution, we use the above model to numerically derive the sensing resolution. In the numerical simulation, the sensor operating frequencies are set between 800 MHz and 950 MHz. The simulation considers targets ranging from 0.5 to 7 meters and sensing directions between 45° and 135° relative to the sensor array. Target speeds range from 0.5 to 3 m/s, with their moving directions varying between 0° and 360°. We randomly sample 100 targets and calculate the average percentage of the area with a correlation value (Eq. (5)) exceeding -3 dB. This area represents the ambiguity of NEURORADAR. The results depicted in Fig. 5a reveal that the ambiguous area decreases as the number of sensors increases. A smaller ambiguous area signifies a reduction in ambiguous side lobes and a more concentrated main lobe. The empirical findings suggest that an array of 6 sensors is sufficient for NEURORADAR to resolve most targets, striking an effective balance between resolution and array size.
The configuration of frequencies in the sensor array impacts sensing ambiguity. As shown in Fig. 5b, a random frequency permutation results in a significantly smaller ambiguous area compared to an ascending permutation. This finding aligns with previous research on traditional FDA [2,47], which indicated that nonlinear and random frequency offsets result in a range-angle decoupled beam pattern. Fig. 6 presents the location ambiguity area given an observation vector from a target at a specific location. In this instance, we only consider the location for simplicity, assuming that the observation vector can be retrieved even if the target is stationary. Consequently, NEURORADAR takes advantage of this property by adopting a nonlinear frequency offset and a random frequency permutation for its sensor array.

SPIKE ENCODING AND PROCESSING

5.1 Spike Encoder Design
To facilitate end-to-end SNN signal processing, NEURORADAR employs an analog spike encoding circuit to directly transform the SIL radar signals into spike trains. The spike encoder must preserve the essential sensing information in Eq. (2). To this end, the encoder should perform spike rate encoding, in which the firing frequency increases linearly with the amplitude of the input signal. As a result, the phase information in Eq. (2) is represented as variations in spike density.
We design our spike encoder based on the aforementioned LIF neuron model [24] (Sec. 2.2). The LIF neuron model consists of a current injector, an RC parallel circuit, and a spike firing circuit, as depicted in Fig. 7a. In the human nervous system, a neuron's membrane potential $V(t)$ rises upon receiving input stimuli $I(t)$ from other neurons. Once $V(t)$ reaches a threshold $V_{\mathrm{th}}$, the neuron triggers a spike to adjacent neurons and resets its voltage to a resting value $V_{\mathrm{rest}}$, as shown in Fig. 7b. In the absence of input, the membrane potential decays exponentially to its resting value through a leaky resistance path.
The evolution of $V(t)$ can be characterized as:

$$\tau_m \frac{dV(t)}{dt} = -\big(V(t) - V_{\mathrm{rest}}\big) + R\,I(t). \tag{6}$$

Here, the membrane time constant $\tau_m = RC$ determines the decay time of the membrane voltage, with $C$ being the membrane capacitance and $R$ representing the leaky resistance. Given a constant input $I_0$ from the SIL radar, the spike firing interval $T$ can be found by solving the differential Eq. (6). Starting from $V(0) = V_{\mathrm{rest}}$, the membrane potential follows

$$V(t) = V_{\mathrm{rest}} + R I_0 \left(1 - e^{-t/\tau_m}\right), \tag{7}$$

and it reaches $V_{\mathrm{th}}$ after

$$T = \tau_m \ln\frac{R I_0}{R I_0 - (V_{\mathrm{th}} - V_{\mathrm{rest}})}. \tag{8}$$

When the leaky resistor is large, the spike firing rate is approximately $1/T \approx I_0 / \big(C\,(V_{\mathrm{th}} - V_{\mathrm{rest}})\big)$, which grows linearly with the input signal. The leaky resistor causes a small input dead zone where no spike is fired even when the input $I_0 > 0$, which is implied by Eq. (8): $R I_0 > V_{\mathrm{th}} - V_{\mathrm{rest}}$ must be satisfied. We utilize this property and design a proper dead zone to suppress random noise input and avoid spike misfiring.
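The rate-encoding behavior and the dead zone can be checked with a small discrete-time simulation. This is a software sketch of the LIF model, not the analog circuit: it assumes the dynamics $C\,dV/dt = -(V - V_{\mathrm{rest}})/R + I(t)$ with an instantaneous reset, and the component values ($R$ = 1 MΩ, $C$ = 1 nF, threshold 0.1 V) are illustrative assumptions.

```python
import math

def lif_spike_times(current, dt, r, c, v_th, v_rest=0.0):
    """Forward-Euler integration of C dV/dt = -(V - V_rest)/R + I(t) with reset."""
    v, spikes = v_rest, []
    for k, i_in in enumerate(current):
        v += dt * (i_in - (v - v_rest) / r) / c
        if v >= v_th:
            spikes.append((k + 1) * dt)
            v = v_rest
    return spikes

def firing_interval(i0, r, c, v_th, v_rest=0.0):
    """Closed-form inter-spike interval for a constant input; None inside the dead zone."""
    drive, gap = r * i0, v_th - v_rest
    if drive <= gap:
        return None
    return r * c * math.log(drive / (drive - gap))

if __name__ == "__main__":
    R, CAP, V_TH, DT = 1e6, 1e-9, 0.1, 1e-7   # illustrative component values
    for i0 in (5e-8, 1e-6, 2e-6):
        spikes = lif_spike_times([i0] * 50_000, DT, R, CAP, V_TH)
        print(f"I0 = {i0:.1e} A: {len(spikes)} spikes in 5 ms, "
              f"analytic interval = {firing_interval(i0, R, CAP, V_TH)}")
```

With these values, an input of 50 nA stays below the dead-zone boundary $(V_{\mathrm{th}} - V_{\mathrm{rest}})/R$ = 100 nA and produces no spikes, while larger inputs fire at a rate that grows almost linearly with the current.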
To encode a continuous signal into a spike sequence without information loss, a sufficiently large spike rate is required. Given a spike sequence with spike times $t_k$ and signal bandwidth $\Omega$,

$$\max_k\,(t_{k+1} - t_k) < \frac{\pi}{\Omega} \tag{9}$$

must be satisfied to guarantee perfect recovery [15]. As a larger spike firing rate increases power consumption, our design strikes a tradeoff between input bandwidth and power consumption.
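A recovery condition of this kind can be checked directly on a recorded spike train. The sketch below assumes the bound takes the form "largest inter-spike gap below $\pi/\Omega$", with $\Omega = 2\pi B$ the angular bandwidth for a signal band-limited to $B$ Hz; both the form of the bound and the function names are assumptions for illustration.

```python
import math

def max_interspike_interval(spike_times):
    """Largest gap (s) between consecutive spikes in a sorted list of spike times."""
    return max(b - a for a, b in zip(spike_times, spike_times[1:]))

def satisfies_recovery_bound(spike_times, bandwidth_hz):
    """True if the largest gap is below pi / Omega, where Omega = 2*pi*bandwidth_hz."""
    omega = 2.0 * math.pi * bandwidth_hz
    return max_interspike_interval(spike_times) < math.pi / omega

if __name__ == "__main__":
    spikes = [k * 1e-3 for k in range(20)]    # one spike per millisecond
    for bandwidth in (100.0, 1000.0):
        ok = satisfies_recovery_bound(spikes, bandwidth)
        print(f"bandwidth {bandwidth:.0f} Hz: bound satisfied = {ok}")
```

A spike every millisecond is dense enough for a 100 Hz signal under this bound but not for a 1 kHz signal, which illustrates the tradeoff between input bandwidth and spike rate.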

5.2 SNN Design
As the motion signals are converted into multiple parallel spike trains, we design SNNs to process the spiking signals and extract the spatiotemporal features. The overall structure of the SNN includes three main components: spike buffering units, convolution layers, and spike decoders (Fig. 8).
Spike buffering units. The input spike sequences initially arrive at the spike buffering units, which are made up of cascaded time delay units. Each delay unit imposes a consistent time delay of $T_d$ clock ticks, and the output spikes then enter the next-stage time delay unit. In the majority of neuromorphic computing hardware, SNNs are realized using digital circuits, with neuron states being updated synchronously according to a clock tick (e.g., 1 ms). Upon the completion of the input sequence, the spike buffering units concatenate the outputs from all delay units and present the spikes concurrently to the subsequent layer. To improve the performance of the SNN, the buffered spikes are repetitively dispatched to the next layer every $T_r$ clock ticks. By flattening the temporal dimension of the spike sequence, the spike buffering units simplify the task of the subsequent convolution layers in extracting the temporal features of the spike sequence. A similar method is employed in [4] for processing spike data from event cameras.
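The flattening performed by the spike buffering units can be pictured as a tapped delay line. The sketch below is a software analogy under simplifying assumptions: each spike train is a list of 0/1 values per clock tick, and each tap exposes the spike that entered the chain a whole number of delays earlier. The function name and the column ordering are illustrative choices.

```python
def buffer_spikes(spike_trains, num_taps, delay_ticks):
    """Flatten time: one row per channel, one column per delay tap (oldest first)."""
    image = []
    for train in spike_trains:
        last = len(train) - 1
        row = []
        for d in range(num_taps):
            idx = last - d * delay_ticks      # tap d sees the input d delays ago
            row.append(train[idx] if idx >= 0 else 0)
        image.append(row[::-1])               # reverse so columns run forward in time
    return image

if __name__ == "__main__":
    trains = [
        [1, 0, 0, 1, 0, 1],                   # channel 0
        [0, 1, 0, 0, 1, 0],                   # channel 1
    ]
    for row in buffer_spikes(trains, num_taps=3, delay_ticks=2):
        print(row)
```

The result is a two-dimensional binary array (channels by taps) that a convolution layer can treat like an image.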
Convolution layers. Convolutional layers are essential components of Convolutional Neural Networks (CNNs), which can detect local spatial patterns and structures within an image. With the spike buffering units flattening the temporal dimension of the input spike sequences, convolution layers can be similarly employed to extract the spatiotemporal features of the spike sequences. Consequently, we design a stack of convolution layers, accompanied by other types of layers such as pooling layers and fully connected layers, to process and classify the extracted features.
Spike decoders. In NEURORADAR, the SNNs are trained in such a way that the output values are represented by the spike firing rate of neurons in the final layer. Eventually, the output spike rate has to be converted into a continuous value that can be interpreted by the sensing applications. For classification tasks, the prediction probability for each class can be determined by applying low-pass filtering to the spikes from each output neuron representing the respective classes. For regression tasks, the output values are represented by an ensemble of neurons. We train decoders to perform a linear mapping between the neuron outputs and the final output following [76].
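Both decoding modes can be sketched in a few lines. The example assumes a first-order IIR low-pass filter with a hypothetical smoothing factor `alpha`, normalizes the filtered rates into class scores, and uses fixed illustrative weights for the linear regression readout; in practice the weights would be learned.

```python
def lowpass_rates(spike_trains, alpha=0.1):
    """Final value of a first-order low-pass filter applied to each output neuron."""
    rates = []
    for train in spike_trains:
        y = 0.0
        for s in train:
            y += alpha * (s - y)
        rates.append(y)
    return rates

def class_scores(spike_trains, alpha=0.1):
    """Normalise filtered rates so that they sum to one (uniform if nothing fired)."""
    rates = lowpass_rates(spike_trains, alpha)
    total = sum(rates)
    if total <= 0.0:
        return [1.0 / len(rates)] * len(rates)
    return [r / total for r in rates]

def linear_decode(rates, weights, bias=0.0):
    """Regression readout: weighted sum of the neuron rates plus a bias."""
    return sum(w * r for w, r in zip(weights, rates)) + bias

if __name__ == "__main__":
    fast = [1] * 50                           # output neuron firing on every tick
    slow = [1, 0, 0, 0, 0] * 10               # output neuron firing every fifth tick
    print(class_scores([fast, slow]))
    print(linear_decode(lowpass_rates([fast, slow]), weights=[2.0, -1.0]))
```

The neuron that fires more often receives the larger score, which is the rate-coded analogue of picking the largest output activation in an ANN.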

5.3 SNN Training
SNN training is crucial for extracting spatiotemporal features from input data. In NEURORADAR, the trainable parts of the SNNs are the convolution layers and the spike decoders. The ANN-SNN conversion method [66] is employed for training the SNNs in NEURORADAR. The method involves training a conventional ANN with the same structure as the desired SNN. Given an initial spike sequence, some spike buffering units fire spikes into the next layer at a constant firing rate, while others do not fire spikes at all. This allows the conversion of the input into a static image with 0/1 binary pixels. We then utilize conventional neuron models, like ReLU neurons, and conventional backpropagation algorithms to optimize the connection weights within the ANN. After training, all the ReLU neurons in the ANN are replaced with spiking neuron models, specifically LIF neurons. Lastly, weight scaling needs to be performed for the SNN to ensure a reasonable spike firing frequency. After completing these steps, the trained ANN is effectively converted into an SNN, which can then process spike input efficiently and accurately.
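The idea behind replacing ReLU units with spiking neurons and rescaling the weights can be illustrated on a single neuron. The sketch below makes two simplifications that differ from the described pipeline: it uses a non-leaky integrate-and-fire neuron with reset by subtraction instead of an LIF neuron, and it feeds the neuron a constant drive instead of input spikes. The weights and the assumed maximum activation are illustrative.

```python
def relu(x):
    """Standard rectified linear activation."""
    return max(0.0, x)

def if_rate(drive, num_ticks=1000, threshold=1.0):
    """Firing rate (spikes per tick) of a non-leaky integrate-and-fire neuron."""
    v, spikes = 0.0, 0
    for _ in range(num_ticks):
        v += drive
        if v >= threshold:
            v -= threshold                    # reset by subtraction keeps the remainder
            spikes += 1
    return spikes / num_ticks

def scale_weights(weights, max_activation):
    """Scale weights so the largest expected activation maps to one spike per tick."""
    return [w / max_activation for w in weights]

if __name__ == "__main__":
    inputs, weights, max_activation = [0.8, 0.4], [2.0, -1.0], 2.0
    activation = relu(sum(w * x for w, x in zip(weights, inputs)))
    scaled = scale_weights(weights, max_activation)
    drive = sum(w * x for w, x in zip(scaled, inputs))
    print(f"ANN activation / scale = {activation / max_activation:.3f}")
    print(f"spiking neuron rate    = {if_rate(drive):.3f}")
```

For drives between zero and the threshold, the firing rate tracks the scaled ReLU output, and negative drives never fire, which mirrors the rectification.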

SYSTEM IMPLEMENTATION
We build a NEURORADAR hardware prototype using discrete components on PCBs, which comprises up to 6 SIL radar channels with different operating frequencies.We also design a motherboard to enable a robust power distribution to each hardware module.
Ideally, for a real neuromorphic system, the spike encoders should directly interface with neuromorphic computing hardware and send spikes to a pre-trained SNN as input.However, due to a lack of highly  specialized neuromorphic processors, we implement the SNN using simulation frameworks that are well-established in the neuromorphic computing research community.Specifically, we adopt the Nengo-DL framework [60] because it supports deep SNN training and accurate emulation of real neuromorphic computing hardware such as Loihi [18].In addition, due to the need for offline SNN training on the simulated neuromorphic computer, we still need to sample the spikes digitally using an FPGA and store the timestamps on a host PC.
Self-injection locked oscillator.For our prototype, our SILO design employs a Clapp oscillator [63] due to its broad oscillation range and tunability in our targeted UHF band.The oscillator is built using an RF transistor along with discrete LCR components, as shown in Fig. 10a.Infineon BFP620 [33] is selected as the RF transistor for its high transition frequency, which ensures sufficient gain for oscillation at higher operational frequencies.RF inductors and capacitors with high self-resonance frequency are selected to ensure the proper function of the oscillator.The output power of this oscillator design is approximately -20 dBm, and a monopole antenna is used for RF signal transmission.For the IC simulation, we design a cross-coupled oscillator due to its high energy efficiency [63].It consists of a pair of transistors and features a differential output, as shown in Fig. 10b.The inductor's quality factor substantially impacts energy efficiency, and in the simulation, we use off-chip RF inductors, with their characteristics detailed by the manufacturer [34].The simulation is carried out using Cadence Virtuoso with a 90 nm generic process design kit.
When an array of SIL sensors is required (Sec. 4.3), their frequencies need to be sufficiently separated to avoid mutual injection locking or pulling [62]. The minimum required separation is determined by the quality factor of the LC oscillator and the coupled signal strength. We find empirically that with $\lambda/4$ spacing (Sec. 4.3), the impact of mutual coupling becomes negligible when the center frequencies are separated by at least 10 MHz.
Time delay unit. NEURORADAR requires a time delay to demodulate the motion signal (Sec. 4.2). A longer time delay with minimal insertion loss is desired to attain a large demodulation gain, so the goal is to maximize the gain-delay product. We choose Surface Acoustic Wave (SAW) filters (RFMi SF2098E [64]) to implement the time delay because of their compact size, low attenuation (1.3 dB), and reasonable delay time (28 ns). SAW filters operate by converting electrical signals into mechanical vibrations and back into electrical signals. As mechanical or acoustic waves propagate significantly slower than electromagnetic waves, a time delay is introduced when a signal passes through a SAW filter. Notably, the bandwidth of the selected SAW filter is 20-30 MHz, while the maximum observed frequency shift of the oscillator is only hundreds of kHz. Therefore, the frequency never shifts outside the passband into the region where the SAW filters exhibit large attenuation. Moreover, just like regular bandpass filters [51], multiple SAW filters can be cascaded to increase the gain-delay product. Given the delay ($\tau_d$) and gain ($G$) of a single-stage SAW filter, the optimal number of cascaded SAW filters $n$ can be determined by maximizing the product $n\tau_d \cdot G^n$. In the actual implementation, we find that cascading 3 to 4 SAW filters yields the strongest baseband signal.
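The trade-off between total delay and accumulated loss can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes the 1.3 dB attenuation is a power loss that adds in dB per stage, ignores inter-stage matching losses, and uses hypothetical function names. Under these assumptions the gain-delay product peaks at 3 stages, with 4 stages within about 2% of the peak, which agrees with the 3 to 4 stages reported above.

```python
import math

DELAY_NS = 28.0   # single-stage delay quoted in the text
LOSS_DB = 1.3     # single-stage attenuation quoted in the text


def gain_delay_product(n, delay_ns=DELAY_NS, loss_db=LOSS_DB):
    """Gain-delay product (ns) of n cascaded stages, using linear power gain."""
    total_gain = 10 ** (-n * loss_db / 10)   # losses add in dB, so gains multiply
    return n * delay_ns * total_gain


def best_stage_count(max_stages=10):
    """Integer stage count with the largest gain-delay product."""
    return max(range(1, max_stages + 1), key=gain_delay_product)


# Continuous optimum of n * 10**(-n * L / 10) is n = 10 / (L * ln 10).
continuous_optimum = 10 / (LOSS_DB * math.log(10))

for n in range(1, 7):
    print(n, round(gain_delay_product(n), 2))
print("best integer n:", best_stage_count())
```

If the attenuation were instead treated as an amplitude ratio, the optimum would shift to a larger stage count, so the interpretation of the gain term matters when applying this rule.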
Low-drive mixer. The mixer multiplies the oscillator signal with its delayed replica and extracts the frequency shift that occurs during the time delay. For power efficiency, we opt for passive diode mixers, which exploit the nonlinearity of diodes to achieve signal mixing. While more complex single-balanced and double-balanced mixers offer improved linearity and isolation, they need a high local oscillator (LO) power to drive the diodes [48]. We instead choose a simple single-diode mixer implemented using a low-barrier Schottky diode (Infineon BAT63 [32]), which operates efficiently at ultra-low drive power. Because the mixer diode passes only the positive half cycle of the input signal, we shunt a second, identical diode to ground to absorb the negative half cycle and keep the input impedance consistent. Further, a C-L impedance matching circuit precedes the mixer to maximize power delivery from the SILO, and a single-stage LC low-pass filter follows the mixer to attenuate high-frequency signals, as shown in Fig. 11. The amplitude of the demodulated baseband signal is much smaller than the input range of the spike encoding circuit. Therefore, we add an operational amplifier module (based on the TI TLV521 [81]) to boost the signal level.
Spike encoding and sampling. We design spike encoder circuits to emulate a LIF neuron, following the description in Sec. 5.1 and Fig. 12. Transistor $Q_1$ (2N3906 [56]) serves as the current injector of the "neuron" and charges the capacitor $C_1$ with a current approximately equal to the input voltage divided by $R_3$. $Q_2$ (2N3906) and $Q_3$ (2N3904 [55]) implement the spike firing circuit, which also sets the firing threshold. When the voltage across $C_1$ exceeds the firing threshold, $Q_2$ and $Q_3$ turn on briefly to fire a spike and discharge $C_1$. The leak resistor is connected directly to ground, which sets the resting potential to zero. To ensure event-driven output, the input signal is AC-coupled, and two instances of the LIF encoder are used to represent the negative and positive parts of the signal, respectively. This ensures that no spike is generated if the input signal is constant.
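The behavior of the encoder pair can be illustrated with a small discrete-time model. This is a behavioral sketch under stated assumptions, not a model of the actual circuit: the time constant, gain, threshold, and high-pass coefficient are hypothetical values chosen for illustration, and the function names are ours. It shows the two properties described above: a constant input produces no spikes after AC coupling, and the positive and negative encoders fire only during their own half of the signal.

```python
import math


def ac_couple(signal, alpha=0.995):
    """First-order high-pass filter that removes any constant (DC) component."""
    out = []
    y = 0.0
    x_prev = signal[0] if signal else 0.0
    for x in signal:
        y = alpha * (y + x - x_prev)
        x_prev = x
        out.append(y)
    return out


def lif_encode(drive, dt=1e-3, tau=20e-3, gain=100.0, v_th=1.0):
    """Return spike sample indices for a non-negative drive signal."""
    v = 0.0
    spikes = []
    for k, x in enumerate(drive):
        v += dt * (-v / tau + gain * x)   # leaky integration, leak pulls v to 0
        if v >= v_th:                     # threshold crossing: fire and reset
            spikes.append(k)
            v = 0.0
    return spikes


def encode_bipolar(signal, **params):
    """Encode the positive and negative parts with two separate LIF encoders."""
    coupled = ac_couple(signal)
    pos = lif_encode([max(s, 0.0) for s in coupled], **params)
    neg = lif_encode([max(-s, 0.0) for s in coupled], **params)
    return pos, neg


# Hypothetical test inputs: a 2 Hz sine and a constant 0.5 level, 1 s at 1 kHz.
t = [k * 1e-3 for k in range(1000)]
sine = [math.sin(2 * math.pi * 2 * tk) for tk in t]
pos_spikes, neg_spikes = encode_bipolar(sine)
const_spikes = encode_bipolar([0.5] * 1000)
print(len(pos_spikes), len(neg_spikes), const_spikes)
```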
To provide spike samples for the Nengo-DL emulation, we connect all output channels of the spike encoder to a lightweight FPGA (Xilinx CMOD-A7 [21]) for sampling. The FPGA samples the spike sequences by polling the digital I/Os. Whenever a spike is detected, the corresponding sampling channel creates a frame that contains the timestamp of the spike and the channel index. To support multi-channel parallel sampling, frames from multiple channels are sequenced together using a first-in-first-out (FIFO) buffering block. The output of the FIFO is connected to a universal asynchronous receiver/transmitter (UART) block, which sends the spike frames to the host PC, where the SNN emulation runs.
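The frame sequencing step can be mimicked on the host side as follows. This is only a software sketch of the idea: the text does not specify the frame layout, so the 5-byte format used here (a 32-bit tick count followed by an 8-bit channel index) and all function names are assumptions made for illustration.

```python
import heapq
import struct

# Hypothetical frame layout: little-endian 32-bit tick count + 8-bit channel index.
FRAME = struct.Struct("<IB")


def sequence_frames(channels):
    """Merge per-channel sorted tick lists into one time-ordered frame list."""
    tagged = ([(tick, ch) for tick in ticks] for ch, ticks in enumerate(channels))
    return list(heapq.merge(*tagged))


def to_bytes(frames):
    """Serialize frames as they might be streamed over a UART link."""
    return b"".join(FRAME.pack(tick, ch) for tick, ch in frames)


def from_bytes(blob):
    """Recover (tick, channel) frames on the host PC."""
    return [FRAME.unpack_from(blob, i) for i in range(0, len(blob), FRAME.size)]


frames = sequence_frames([[5, 20], [1, 20, 30]])
blob = to_bytes(frames)
print(frames, len(blob))
```

Keeping the channel index inside each frame means the host can rebuild every per-channel spike train from a single serial stream.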
Power distribution motherboard for the NEURORADAR array. Because the aforementioned amplifiers apply high gain to the baseband signal, the spike encoding circuit becomes susceptible to noise: even minor disturbances are amplified and may trigger false spikes. To mitigate this, a motherboard with a dedicated power distribution network is designed to provide each SIL radar channel with a stable power supply and to suppress noise along the power supply paths. This is accomplished by incorporating linear low-dropout (LDO) regulators (TPS7A02 [82]), which exhibit excellent noise suppression capability, into each amplifier module and spike encoder. Additionally, the motherboard's large power plane provides a low-resistance path for the supply current, further reducing power supply noise.

EVALUATION
Microbenchmark of the SIL oscillator. We carry out experiments in a multipath-rich lab setting to assess the motion modulation capability of the self-injection locked oscillator. To control the experimental conditions, we use a 20 cm × 20 cm aluminum sheet as a representative target, place it at predefined locations, and measure the SILO's frequency using a spectrum analyzer. As Fig. 13a illustrates, the measured frequency shift aligns closely with the theoretical pattern described by Eq. (1). These results indicate that the frequency of the SILO can indeed be effectively modulated by the movement of nearby reflectors, and that the frequency shift pattern is not affected by the presence of multipath clutter.
Microbenchmark of the motion demodulation circuit. We proceed to validate the motion demodulation circuit by attaching it to a running SILO. A target (an adult) moves away from the radar at an approximately constant speed from 0.5 m to 3 m. Fig. 13b shows the demodulated signal, which follows the sinusoidal pattern consistent with Eq. (2). As the target moves away, the reflected signal becomes weaker, causing less frequency variation in the oscillator. The overall baseband signal strength decays approximately in proportion to the inverse square of the target distance. In addition, the distance covered by the target can be estimated as $d = N \cdot \lambda/2 = 2.60$ m (close to the ground truth of 2.5 m), where $N = 14$ is the number of completed cycles and the wavelength is $\lambda = 37.2$ cm. Therefore, the result shows that the motion demodulation circuit can effectively convert the frequency shift of the oscillator into a continuous baseband signal.
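The distance estimate above is a one-line calculation, since each completed baseband cycle corresponds to half a wavelength of radial movement. A minimal sketch (the function name is ours) reproduces the 2.60 m figure and its deviation from the 2.5 m ground truth:

```python
def distance_from_cycles(cycles, wavelength_m):
    """Radial distance covered, given completed baseband cycles."""
    return cycles * wavelength_m / 2


estimated_m = distance_from_cycles(14, 0.372)   # values quoted in the text
ground_truth_m = 3.0 - 0.5                      # target moved from 0.5 m to 3 m
relative_error = abs(estimated_m - ground_truth_m) / ground_truth_m
print(round(estimated_m, 3), round(100 * relative_error, 1))
```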
Spike encoder properties. We profile the spike encoding circuit by applying different DC voltage levels at the input and changing the membrane capacitance (Sec. 5.1). Fig. 14a shows the spike density with respect to the input voltage. As analyzed, the spike firing rate increases with a smaller membrane capacitance. From $V_{in}$ = 0 to 0.07 V, the spike encoder is in the dead zone and produces no spikes. When 0.07 V < $V_{in}$ < 0.90 V, the spike firing rate increases approximately linearly with the input voltage, as described by Eq. (8). When $V_{in}$ > 0.90 V, the spike rate drops quickly to 0. The spike density plot shows that, within the linear region, the spike encoding circuit achieves a one-to-one mapping between the input voltage and the spike density. We then feed a real baseband signal from the motion demodulation circuit into the spike encoders and convert it to spike trains. Fig. 14b shows the spike representation of the signal. We find that the spike generation is indeed event-driven and asynchronous: no spike is generated when the input is 0 V, and spikes can be fired at any time without an explicit synchronization signal.
Power consumption of the front-end. Next, we characterize the power consumption of a single-channel SIL radar. The radar front-end comprises three main parts: the oscillator, the baseband amplifier, and the spike encoder. As shown in Fig. 15a, the system's power consumption is dominated by the oscillator, which is the sole active RF component in the system. Due to the low signal bandwidth, the baseband amplifier can be designed for low power, consuming merely 20 µW. The power consumption of the spike encoder is primarily due to the quiescent current induced by the resistor dividers that provide a DC bias. Each spike generation consumes only around 90 pJ and therefore has a negligible impact on the total power consumption. Fig. 15b shows the power consumption of the oscillator at popular operating frequencies in the UHF band. The IC version, which adopts a more power-efficient oscillator structure (discussed in Sec. 6), consumes less power than the discrete version. The total power consumption of the radar front-end falls below 300 µW, underscoring NEURORADAR's low-power operation across different operating frequencies in the UHF band.
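To see why the per-spike energy is negligible, it helps to convert it into average power. The sketch below assumes a hypothetical firing rate of 1000 spikes per second, which is not a figure from the text; the 90 pJ and 300 µW values are the ones quoted above.

```python
SPIKE_ENERGY_J = 90e-12       # energy per spike quoted in the text
FRONTEND_BUDGET_W = 300e-6    # upper bound on front-end power quoted in the text


def spike_power_w(rate_hz):
    """Average dynamic power spent on spike generation at a given firing rate."""
    return rate_hz * SPIKE_ENERGY_J


assumed_rate_hz = 1000        # hypothetical firing rate, for illustration only
share = spike_power_w(assumed_rate_hz) / FRONTEND_BUDGET_W
print(f"{spike_power_w(assumed_rate_hz) * 1e9:.0f} nW, {100 * share:.2f}% of budget")
```

Even at this assumed rate, spike generation accounts for roughly 90 nW, a small fraction of a percent of the front-end budget.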

CASE STUDIES
In this section, we implement and evaluate two use cases based on NEURORADAR: hand gesture recognition and moving target localization. For each case, we collect spike train data, use them to train and test the SNN, and run the neuromorphic processor emulation in Nengo-DL [60].
A model-based method is employed to estimate the power consumption of the signal processing units. The energy consumption of running an SNN can be calculated as follows:
$$E_{SNN} = E_n N_n T + E_s N_s \qquad (10)$$
Here, $E_n$ is the energy consumption of updating the status of a neuron, which must be done for all neurons in the SNN at each emulation timestep (1 ms). $N_n$ is the number of neurons, and $T$ is the number of timesteps. $E_s$ denotes the energy consumption of a synaptic operation, which includes generating a spike and passing it to other neurons through synapses, and $N_s$ is the total number of synaptic operations. We assume that all the SNNs in our comparisons run on the Intel Loihi neuromorphic chip [18], where $E_s$ = 23.6 pJ and $E_n$ = 81 pJ.
The energy consumption of a conventional ANN is calculated by summing over all the multiply-and-accumulate (MAC) operations:
$$E_{ANN} = E_{MAC} \sum_{l} N_{MAC}^{(l)} \qquad (11)$$
where $N_{MAC}^{(l)}$ is the number of MAC operations in layer $l$ and $E_{MAC}$ is the energy per MAC. We assume that all the ANNs run on the Nvidia GTX Titan Black GPU, whose MAC energy efficiency of 3.584 GMAC/W [19] determines $E_{MAC}$. Radar signal preprocessing is performed using DSP units. Since FFTs are widely used, we employ energy consumption data from the Texas Instruments (TI) TMS320VC5505 [49] to estimate their energy consumption. For other preprocessing algorithms such as digital filtering, we use the MAC energy efficiency of the TI TMS320C6678 (0.853 GMAC/W) [19]. These two TI chips are widely used and represent the state of the art in DSP performance.
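The two energy models can be written down directly. The sketch below is an illustration under stated assumptions rather than the authors' evaluation code: it assumes that 81 pJ is the per-neuron update energy and 23.6 pJ the per-synaptic-operation energy, it reads "GMAC/W" as 10^9 MAC operations per joule, and the network sizes in the example are hypothetical.

```python
E_NEURON_J = 81e-12     # assumed energy per neuron update on Loihi
E_SYNOP_J = 23.6e-12    # assumed energy per synaptic operation on Loihi


def snn_energy_j(n_neurons, n_timesteps, n_synops):
    """Neuron updates every timestep plus event-driven synaptic operations."""
    return E_NEURON_J * n_neurons * n_timesteps + E_SYNOP_J * n_synops


def ann_energy_j(macs_per_layer, gmac_per_joule=3.584):
    """Total MAC count divided by the processor's MAC efficiency."""
    return sum(macs_per_layer) / (gmac_per_joule * 1e9)


# Hypothetical workloads, for illustration only.
snn = snn_energy_j(n_neurons=1000, n_timesteps=80, n_synops=10_000)
ann = ann_energy_j([600_000, 400_000])
print(f"SNN: {snn * 1e6:.3f} uJ, ANN: {ann * 1e6:.1f} uJ")
```

In this example the neuron-update term dominates the SNN energy because spikes are sparse, so the total grows almost linearly with the number of timesteps.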

Gesture Recognition
We customize NEURORADAR for hand gesture recognition with two setups, one with a single SIL channel at 915 MHz, and the other with three SIL channels with distinct frequencies around the 866/915 MHz band. For the three-channel setup, two antennas are placed on a horizontal line with a spacing of $\lambda/4$, where $\lambda$ is the average carrier wavelength; the third antenna is placed above the horizontal line and forms an equilateral triangle with the other two antennas, as shown in Fig. 16a. The gestures are made in front of the antenna plane, facing the center of the triangle. The spacing is designed to be comparable to the displacement of a hand when making gestures, affording more distinguishable signal patterns.
The elevated antenna provides richer information for vertical hand movement (such as "swipe up" or "swipe down"). Similar to state-of-the-art gesture recognition radars such as Google Soli [40], we define a set of 12 gestures, as shown in Fig. 17. In the gesture set, the hand movement direction is diverse within a 3D space (e.g., push, pull, left, right, up, and down), and some gestures require two hands to move simultaneously. The three-channel version is employed to recognize all 12 gestures. However, the one-channel setup employs only one antenna and is therefore unable to acquire any angular information. We therefore select 4 out of the 12 gestures (5, 6, 7, 12) that primarily induce a distinguishable pattern in the range domain for the one-channel NEURORADAR to recognize.
As each SIL radar channel is paired with two spike encoders (Sec. 6), the three-channel setup produces six spike sequences, while the one-channel setup produces two, and the timestamps of the spikes are recorded for SNN training. Each gesture sample contains spike sequences spanning 1.5 s. As explained in Sec. 5.2, we buffer the input spike sequences, concatenate the outputs of the delay buffers, and present them together to the convolution layers. We set the buffer step to 6 timesteps (6 ms), which results in an input dimension of 6 × 250 for the three-channel setup and 2 × 250 for the one-channel setup. In total, we collected 2400 samples, 200 for each gesture. We divide the samples into a training set (1920 samples) and a test set (480 samples) with a random 80/20 split. Although each sample has a fixed length, the starting time of the gesture action is random. We thus perform data augmentation by time-shifting the samples.
Fig. 18 summarizes the gesture recognition outcome. As shown in Fig. 18a, the filtered spike signal at each output neuron is interpreted as the probability of each class, and it needs sufficient timesteps to stabilize. Fig. 18b indicates that the SNN needs approximately 80 timesteps to produce recognition results with an accuracy exceeding 90%. This means that after the gesture input is complete, the SNN needs a mere 80 ms to produce a reliable result, which is sufficient for most applications. The confusion matrix in Table 2 shows that the 3-channel NEURORADAR is able to distinguish 12 different gestures, with gesture #10 slightly less accurate than the others. Due to the sparsity of spikes, the neuron energy, which increases linearly with emulation time (see Eq. (10)), dominates the overall energy consumption per inference. The number of timesteps therefore sets a tradeoff between higher accuracy and lower energy consumption.
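The input dimensions quoted above follow from dividing the sample length by the buffer step: 1.5 s / 6 ms = 250 columns per spike sequence. The sketch below shows one plausible way to arrange recorded spike timestamps into such a grid by counting spikes per step. The exact buffering scheme is defined in Sec. 5.2, so the counting rule and the function name here are assumptions.

```python
def bin_spikes(spike_times_s, window_s=1.5, step_s=0.006):
    """Arrange per-channel spike timestamps into a channels x steps count grid."""
    n_steps = round(window_s / step_s)
    grid = [[0] * n_steps for _ in spike_times_s]
    for ch, times in enumerate(spike_times_s):
        for t in times:
            if 0 <= t < window_s:
                # Clamp guards against float rounding at the last step boundary.
                grid[ch][min(int(t / step_s), n_steps - 1)] += 1
    return grid


# Six hypothetical spike sequences (three channels, two encoders each).
example = [[0.0125, 0.0130, 1.2], [], [0.5], [], [], [1.499]]
grid = bin_spikes(example)
print(len(grid), len(grid[0]), grid[0][2])
```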
We present a comparison of NEURORADAR with other RF-based gesture recognition systems in Table 3, including works that utilize conventional ANNs [69,78], SNNs [6,7,68], and other simple machine-learning models [85]. When calculating the power consumption of the RF front-end, duty cycling is considered according to the descriptions provided in the corresponding publications.
Compared with multi-RF-chain radar systems [6,7,69,78], NEURORADAR demonstrates comparable gesture recognition capabilities. In comparison to ANN-based systems [69,78], NEURORADAR achieves a reduction of 3 orders of magnitude in end-to-end power consumption. Other SNN-based gesture recognition systems [6,7] still rely on conventional computing units (i.e., CPU or DSP) for radar signal pre-processing. This factor dominates the signal processing power and diminishes the benefits of using an SNN. Owing to its full-SNN architecture, NEURORADAR reduces the signal processing power (pre-processing and SNN) by 78% to 93%. In addition, compared with other single-channel radar gesture recognition systems with similar gesture sets [68,85], the single-channel NEURORADAR still reduces the power consumption by at least one order of magnitude.

Moving Target Localization
To localize a single moving target with an acceptable level of ambiguity, NEURORADAR employs a 6-sensor array with $\lambda/4$ spacing and diverse carrier frequencies, as simulated in Sec. 4.3. The area of interest is shown in Fig. 19. We employ a ZED-2i [75] depth camera to obtain the ground-truth location and speed. We collect 6 segments of 10-minute continuous data (600 s × 6 = 3600 s) for training and testing. For each segment, we allocate the first 480 s (80%) of data as training samples, reserving the last 120 s (20%) as test samples. This approach helps to mitigate any inconsistencies that might arise between segments, such as issues with frame alignment with the ground truth. We then further segment the continuous data into short 2 s frames with a 75% overlap, and each short frame becomes a training/test sample. This results in a total of 5742 training samples and 1422 test samples. Again, the input spike trains are buffered (with a buffer step of 4 timesteps) and presented collectively to the SNN, resulting in an input dimension of 12 × 500. From each frame, we evenly select four data points, yielding four sets of location and velocity data: $(x_1, y_1, u_1, v_1)$, $(x_2, y_2, u_2, v_2)$, ..., $(x_4, y_4, u_4, v_4)$, where $(x_i, y_i)$ denotes the location and $(u_i, v_i)$ the velocity. These sets are used as labels for the regression problem, making the output dimension of the neural network 1 × 16. The specific structure of the SNN model is outlined in Table 4.
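The sample counts above can be reproduced from the segmentation parameters. The sketch below assumes a sliding window of 2 s with a hop of 25% of the frame length (the complement of the 75% overlap) applied separately to the training and test portions of each of the 6 segments; the function name is ours.

```python
def n_frames(duration_s, frame_s=2.0, overlap=0.75):
    """Number of full frames obtained by sliding a window over one recording."""
    hop_s = frame_s * (1 - overlap)
    return int((duration_s - frame_s) / hop_s) + 1


N_SEGMENTS = 6
train_samples = N_SEGMENTS * n_frames(480)   # first 480 s of each segment
test_samples = N_SEGMENTS * n_frames(120)    # last 120 s of each segment
output_dim = 4 * 4                           # 4 data points x 4 label values
print(train_samples, test_samples, output_dim)
```

The result matches the 5742 training samples, 1422 test samples, and 1 × 16 output dimension stated in the text.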
Fig. 20a shows the localization result obtained by combining the output of consecutive frames. Similar to the gesture recognition use case, the SNN needs to run for enough timesteps to yield a reasonable result. Fig. 20b shows that with about 150 timesteps, a localization accuracy of 1 m can be achieved. The mean squared error for speed estimation stabilizes at 0.25 m$^2$/s$^2$. The result implies a tracking delay of 150 ms, which is sufficient for our low-velocity indoor applications. Since we filter spike sequences to obtain a continuous value, some error is inevitable, and it limits the achievable accuracy.
To showcase the advantages of NEURORADAR, we compare it with a multi-tone (2.4 and 5.8 GHz) Doppler radar system, Doorpler [35], which utilizes a conventional RF front-end architecture (Sec. 4.1) and signal processing method. Doorpler is a radar-based occupancy sensing system that can detect zone crossing events and estimate the direction of movement at zone transition spots (e.g., doorways). Similar to NEURORADAR, Doorpler employs an antenna array to acquire angular information about the target and leverages the Doppler effect to infer the moving direction.
Table 5 compares the performance of the two systems. Due to the extra demodulation gain of SIL radar (Sec. 4.2), NEURORADAR provides a more extensive coverage area than Doorpler, even with 10 dB lower Tx power. Due to its simple SIL structure and power-efficient design, NEURORADAR achieves a reduction of 1-2 orders of magnitude in front-end power. The combination of the FDA design and the neural network allows NEURORADAR to obtain richer and more accurate sensing information. Unlike Doorpler, which merely detects crossing events and their direction, NEURORADAR offers both location and speed estimation. At the same time, SNN processing significantly reduces the computational power, and the end-to-end system power consumption is reduced by 97%.

RELATED WORK
SNN-based radar signal processing. Recent research has attempted to use SNNs to process radar signals to reduce power consumption and latency. A majority of the proposed systems first preprocess the raw radar samples using FFTs on conventional computing units (i.e., CPU/DSP), convert the intermediate data, such as the range-Doppler matrix, into spikes, and then employ SNNs to perform tasks like gesture recognition [5,7,8,31,37,67,68]. Other works analyze the feasibility of using SNNs to perform the FFT directly on raw radar samples [6,44,83]. In [71], Shaaban et al. use an SNN to directly perform classification on time-domain radar frames. However, all these methods are based on traditional radar front-ends, and the radar signal must first be sampled using ADCs, then processed and converted into spikes digitally. In contrast, NEURORADAR adopts a novel front-end that produces spike sequences and can directly interface with energy-efficient neuromorphic computing hardware.
Self-injection locked radar. Wang et al. [90] were among the first to analyze the advantages of SIL radar and demonstrate its ability to perform vital sign sensing. Numerous variations of SIL radar have been explored, such as a single-antenna SIL radar [89], mutual injection locked radar [88], and bistatic SIL radar [87]. Tang et al. [80] compared the performance of SIL radar with Doppler radar and proposed innovative designs to eliminate null detection points. Hsu et al. [30] study a mutually injection-locked oscillator array and demonstrate that a beamforming SIL radar can be achieved by adjusting the tuning voltages of the oscillators. In contrast, NEURORADAR studies the sensing capabilities of a frequency-diverse SIL radar array, and performs gesture recognition and target localization.
Low-power radar sensing. The RF amplifiers in radar systems constitute a substantial source of power consumption. Prior research has investigated energy-efficient, high-power-density amplifiers, such as gallium nitride (GaN) amplifiers [17,27,91], to reduce the power consumption of the radar front-end. Radar waveform design represents another approach to power reduction. For instance, impulse radio ultra-wideband (IR-UWB) radars [16,53] emit short-duration pulses that help achieve lower power than the widely used FMCW radar. A third approach is to exploit existing radio sources. Passive radars, which rely on transmitters such as television [36] and Wi-Fi [23,38], eliminate the need to transmit their own signals. Consequently, they provide a cost-effective, low-power alternative to conventional active radar systems. NEURORADAR offers a distinctive solution to this challenge, achieving low power consumption by incorporating neuromorphic engineering into radar sensing.

DISCUSSIONS AND FUTURE WORKS
Multi-target localization. While our experiments demonstrate NEURORADAR's ability to sense a single moving target, the model can be extended to multi-target sensing. With N targets in the vicinity, Eq. (4) can be rewritten so that the observed frequency shift becomes the superposition of the frequency shifts induced by each moving target. The ability to localize targets depends on the ambiguity area that results from solving the optimization problem in Eq. (5). As more targets induce more ambiguity in the observation, a larger number of sensors is required to distinctly localize each target. Then, following the same paradigm as in the single-target case, the SNN can be trained to identify multiple targets based on the observations.
Reducing the form factor. For broader real-world applicability, minimizing the system's form factor is pivotal. Though the majority of the NEURORADAR components can be integrated into an IC, the system's form factor is predominantly dictated by the antenna dimensions and the inter-antenna spacing. The monopole antennas we employ, while robust, are large. Replacements such as compact PCB loop antennas, often adopted in low-frequency IoT devices [50], could be more practical. In addition, the current implementation is restricted to the UHF band because of the frequency limitation of the SAW filters (below 3 GHz). Developing NEURORADAR sensors at higher frequencies can substantially reduce the form factor, due to smaller antennas and a more compact inter-antenna spacing. We leave the exploration of such solutions for future work.
Limitations and suitable applications. While the SIL radar's streamlined architecture facilitates energy-efficient operation, its free-running oscillator remains vulnerable to external disturbances. In-band signals from external sources may cause frequency pulling or injection locking [61] of the oscillator, thereby partially or wholly compromising the SIL radar's sensing capability. Moreover, SIL radars need to be stationary to function; otherwise, all the reflectors in the environment would be moving relative to the radar, each adding a frequency shift to the oscillator and making sensing impractical. Nevertheless, in a controlled indoor setting devoid of interference, NEURORADAR can support various applications such as surveillance systems, vital sign monitors, motion tracking, and gesture-based controls.

CONCLUSION
In this work, we have introduced NEURORADAR, a novel approach to radar systems that fully embraces the principles of neuromorphic sensing. Through the joint design of analog hardware and spike signal processing, NEURORADAR achieves superior energy efficiency. In gesture recognition and localization tasks, NEURORADAR has demonstrated its sensing capability while maintaining a power consumption significantly lower than that of traditional radar systems. This research marks a significant step forward, providing a unique solution for radar sensing in energy-constrained IoT devices.

Figure 1: Analogy of neuromorphic radar sensor. NEURORADAR achieves energy-efficient sensing by emulating the structure and functionality of biological sensing systems.

Figure 5: Localization ambiguity area with different settings.

Figure 6: Localization ambiguity area. The true target location is marked with ×.

Figure 8: Illustration of the SNN structure.

Figure 18: Gesture recognition results.

Figure 19: The area of interest for target location. The maximum distance from the radar is around 6 meters.

Table 1: SNN specification for gesture recognition.

Table 2: Confusion matrix for gesture recognition.

Table 3: Comparison of radar systems for gesture recognition. R stands for Range, D for Doppler, and A for Angle. "conv." means the neural network comprises convolution layers.

Table 5: Indoor tracking system performance comparison.