Poster: Listening to Earth's Voice: Advanced Vehicle Recognition through Seismic Sensing

Vehicle recognition approaches have recently gained remarkable interest for improving intelligent transportation systems, especially in harsh weather or insufficient illumination. Vehicle recognition using seismic waves is a novel technique that records the vibrations of passing vehicles using geophones. We introduce SeismicSense, a deep learning-based system trained on vehicle vibrations. The proposed system includes modules that make it robust against noise. A real dataset is collected and used to implement and evaluate SeismicSense. Evaluation results demonstrate its high accuracy in vehicle recognition.


INTRODUCTION
Vehicle recognition (VR) schemes are integral components of intelligent transportation systems, enabling applications such as automated parking, traffic control, road condition monitoring, smart cities, and autonomous driving [3]. Accurate VR is crucial for recognizing vehicle dimensions and weights, which helps authorities redesign road infrastructure to optimize traffic flow and enhance road safety. Advanced technologies, including machine learning (ML) and sensing, have improved the accuracy and efficiency of VR systems. However, existing systems vary in attributes, operational settings, and costs. Vision-based systems [4], for example, cover large areas but struggle in severe weather and challenging lighting conditions, while inductive loop detectors [2] offer high accuracy but face implementation challenges and high costs. Our approach addresses these challenges by leveraging seismic sensing with geophones.
ACM ISBN 979-8-4007-0581-6/24/06. DOI: 10.1145/3643832.3661430
Seismic sensors offer straightforward deployment, requiring minimal modifications to existing infrastructure such as roadways, sidewalks, or buildings [3]. They are already deployed in many buildings and roadways for earthquake monitoring purposes. This simplicity streamlines the setup process, reducing time and resource requirements, particularly in urban environments. Additionally, seismic sensors provide extensive coverage when strategically placed at intersections, highways, or urban centers, allowing authorities to analyze traffic patterns and optimize transportation infrastructure effectively. Moreover, seismic sensors exhibit resilience to environmental conditions, mitigating concerns related to adverse weather, low light, or visual obstructions.
Motivated by these advantages, we introduce SeismicSense: a system that leverages seismic sensors to record vibrations induced by vehicle passage for vehicle recognition. However, SeismicSense needs to address challenges such as noise, interference, and the non-stationary nature of seismic signals, all of which hinder the accurate classification of vehicles. This difficulty is illustrated by the patterns of different vehicles shown in Fig. 1. To overcome these challenges, SeismicSense incorporates modules that mitigate noise and automatically map the data into a new feature space where the patterns of different vehicles are easily discriminated. This is achieved through a novel design of a deep Neighborhood Component Analysis (NCA) network, ensuring reliable operation of the vehicle recognition system.

THE SYSTEM DETAILS
The architecture of the system consists of the following modules.

Pre-processor
This module is crucial for optimizing seismic data quality and reliability, ensuring the data is clean, normalized, and suitable for further analysis. Seismic data naturally contains amplitude variations, influenced by factors like source distance or receiver sensitivity. To address this variability, a key step in the pre-processing pipeline is min-max normalization. This step rescales the data to a uniform range between 0 and 1, improving compatibility across different sources and receivers and enabling more precise and meaningful analysis. Additionally, this module converts the continuous stream of measurements into fixed-length vectors of u timesteps, each corresponding to 3 seconds of measurements. This conversion is necessary to prepare the data for input into the subsequent 1D CNN module. The module also includes a data augmentation process aimed at preventing overfitting and improving the model's generalization capacity. The original seismic data is augmented with synthetic counterparts to enrich the training dataset. The augmentation technique primarily adds random noise to the vibration waveforms, thereby altering their signal-to-noise ratio.
Figure 2: Network architecture.
Figure 3: SeismicSense's performance.
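The pre-processing steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, normalization is applied to the whole stream before windowing, windows do not overlap, and the added noise is zero-mean Gaussian with an arbitrary standard deviation. The window length of 750 samples follows from the 3-second window if the 250 Hz sampling rate reported in the experimental setup applies.

```python
import random

SAMPLE_RATE_HZ = 250   # sampling rate reported in the experimental setup
WINDOW_SECONDS = 3     # window duration stated in the text


def min_max_normalize(signal):
    """Rescale a list of samples into the range [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:                       # flat signal: avoid division by zero
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


def segment(signal, window_len):
    """Cut a stream into consecutive non-overlapping windows, dropping any incomplete tail."""
    n_windows = len(signal) // window_len
    return [signal[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


def add_noise(window, sigma, rng):
    """Return a copy of the window with zero-mean Gaussian noise added (assumed noise model)."""
    return [x + rng.gauss(0.0, sigma) for x in window]


rng = random.Random(0)
stream = [rng.uniform(-5.0, 5.0) for _ in range(2000)]   # stand-in for geophone samples
window_len = SAMPLE_RATE_HZ * WINDOW_SECONDS             # u = 750 timesteps
windows = segment(min_max_normalize(stream), window_len)
augmented = [add_noise(w, sigma=0.01, rng=rng) for w in windows]
```

Note that noisy copies can fall slightly outside [0, 1]; whether to re-normalize after augmentation is a design choice the text does not specify.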

Feature Extraction Model
This section describes the feature extraction model used in SeismicSense to extract features from sequential seismic data. The approach leverages a Deep Neighborhood Component Analysis (NCA) learning method, which consists of a convolutional neural network (CNN) architecture followed by a layer that computes the distance metric, as shown in Fig. 2. The CNN component of the feature extraction model is tasked with learning hierarchical representations of the input sequential seismic data. It is composed of multiple convolutional layers followed by max-pooling layers to extract spatial features from the data. The architecture of the CNN is designed to capture both local and global patterns in the sequential seismic data. The output of the CNN is a learned feature map that encodes the relevant spatial features. Following the CNN component, the feature extraction model includes an NCA layer. This layer learns a distance metric in the feature space that maximizes the similarity between samples of the same class and the dissimilarity between samples of different classes. This is achieved by training the network with a triplet loss function.
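For readers unfamiliar with the triplet loss, its commonly used form is L = max(0, d(a, p) - d(a, n) + m), where a is an anchor embedding, p an embedding of the same class, n an embedding of a different class, d a distance, and m a margin. The sketch below illustrates that standard form only; the exact formulation used in SeismicSense is not reproduced here, and the Euclidean distance, the margin of 1.0, and the toy 2-D embeddings are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (hypothetical values, for illustration only)
anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same class as the anchor, distance 1
far_negative = [3.0, 4.0]    # different class, distance 5
near_negative = [0.0, 1.5]   # different class, distance 1.5

easy = triplet_loss(anchor, positive, far_negative)    # 1 - 5 + 1 < 0, clipped to 0
hard = triplet_loss(anchor, positive, near_negative)   # 1 - 1.5 + 1 = 0.5
```

The loss is zero once the negative sample is at least one margin farther from the anchor than the positive sample, so only triplets that violate the margin contribute to training.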

Vehicle Recognition Model Training
This module is responsible for training a vehicle recognition model that efficiently classifies the types of vehicles present in the sequential seismic data. Given the learned feature embeddings generated by the feature extraction model, the recognition model employs a shallow feedforward neural network (FFN) architecture. The decision to use a shallow FFN is based on the observation that the features produced by the feature extraction model are separable, exhibit clear boundaries between classes, and are therefore easily distinguishable. The FFN consists of two fully connected dense layers, which process the learned feature embeddings and perform classification. The final layer of the FFN uses a softmax activation, which outputs a probability distribution over the possible vehicle classes. This enables the model to provide a probability score for each vehicle class, indicating the likelihood that the input data belongs to that class. During the training phase, the parameters of the FFN are optimized using backpropagation with Adam. The training objective is to minimize the categorical cross-entropy between the predicted class probabilities and the true labels of the input data.
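The forward pass and training objective described above can be illustrated with a small pure-Python sketch. It shows only a two-layer dense network with a softmax output and the categorical cross-entropy for one sample; it does not implement backpropagation or Adam. The layer sizes, the ReLU hidden activation, the weight values, and the 2-D input embedding are hypothetical choices made for illustration.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Categorical cross-entropy for a single sample with a one-hot label."""
    return -math.log(probs[true_index])


CLASSES = ["bus", "car", "motorcycle", "noise"]

# Hypothetical weights: 2-D embedding -> 3 hidden units -> 4 classes
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

embedding = [1.0, 0.0]                       # stand-in for a learned feature embedding
hidden = relu(dense(embedding, W1, b1))
probs = softmax(dense(hidden, W2, b2))       # one probability per vehicle class
predicted = CLASSES[probs.index(max(probs))]
loss = cross_entropy(probs, CLASSES.index("bus"))
```

In practice the weights would be learned by minimizing this loss over the training set, typically with a deep learning framework rather than hand-written layers.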

EVALUATION

Experimental setup
A large dataset collected at Kyushu University [1,3] was used to validate the performance of the proposed system. Three geophones were positioned 15 meters apart, 0.5 meters from the road, to measure vertical vibrations at a sampling rate of 250 Hz. We classified vehicles as large (such as buses and lorries), medium (such as private passenger cars), and small (such as motorcycles and scooters), with an additional class for the seismic noise present during data collection. A video was recorded during the experiment to give a visual reference for the manual labeling of the training data.
Each event (the passage of a vehicle close to the geophone) lasted 2-3 seconds. The average speed of the vehicles in this experiment was 25.35 km/h, with a top speed of 45 km/h.

Results and Discussion
To facilitate a rigorous evaluation, the data was partitioned into three subsets: training (60%), validation (20%), and testing (20%). This ensures balanced representation and reliable analysis. The testing set provided an unbiased assessment of the model's generalization. The performance of the proposed method is summarized in Fig. 3, where the testing set comprises 259 buses, 439 cars, 228 motorcycles, and 469 instances of noise (1,395 samples in total). The figure confirms the efficacy of the proposed method. Notably, most misclassifications arise from the difficulty of distinguishing between motorcycles and cars, because some lightweight cars closely resemble heavy motorcycles in weight. Despite this confusion, the system attains an accuracy of 97.3%, indicating its robustness.
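A 60/20/20 partition that keeps class proportions similar across subsets can be sketched as follows. This is an illustrative stratified split, assuming that "balanced representation" refers to stratification by class; the poster does not state how the split was performed. The function names, the random seed, and the toy label counts are hypothetical and are not the poster's dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists with similar class proportions in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels for illustration only
labels = ["bus"] * 50 + ["car"] * 100 + ["motorcycle"] * 50 + ["noise"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
```

Overall accuracy, as reported in the text, is then the fraction of test samples whose predicted class matches the label, which `accuracy` computes from two parallel lists.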