StableLev: Data-Driven Stability Enhancement for Multi-Particle Acoustic Levitation

Acoustic levitation is an emerging technique that has found application in contactless assembly and dynamic displays. It uses precise phase control in an ultrasound transducer array to manage the positions and movements of multiple particles. Yet, maintaining stable mid-air particles is challenging, with unexpected drops disrupting the intended motion and position. Here, we present StableLev, a data-driven pipeline for the detection and amendment of instabilities in multi-particle levitation. We first curate a hybrid levitation dataset, blending optimized simulations with labels based on actual trajectory outcomes. We then design an AutoEncoder to detect anomalies in the simulated data, correlating closely with observed particle drops. Finally, we reconstruct the acoustic field at anomaly regions to improve particle stability and experimentally demonstrate successful dynamic levitation for trajectories within our dataset. Our work provides new insights into multi-particle levitation and enhances its robustness, which will be valuable in a wide range of applications.


INTRODUCTION
Harnessing sound waves to levitate and manipulate objects has captured the interest of the Human-Computer Interaction (HCI) community due to its revolutionary possibilities. Acoustic levitation uses acoustic radiation pressure to create mid-air traps for levitating objects. This is achieved through phase retrieval algorithms, such as IBP [24], NAIVE [34], and GS-PAT [34], which shape the activation signals for a phased array of transducers (PAT).
This technology has spurred novel applications within the HCI and Graphics communities over the past 10 years, including levitation-based volumetric displays [18,28,31,34], data physicalisation [13,30], physical interaction with mid-air content [2,20], and contactless assembly and printing [7,8]. Platforms like OpenMPD [27] have democratized access to levitation systems, offering a comprehensive software tool for Unity-based application development. Yet, as we continue to push the boundaries of displays, assemblies, and interactions at the dynamic application level, there is a growing demand for more robust levitation to achieve widely accessible and user-friendly applications. In particular, the increasing number of levitation points and the evolving real-world trajectories place higher demands on phase retrieval algorithms.
Existing multi-point phase retrieval algorithms predominantly rely on simulated outcomes and overlook potential discrepancies when applied to real-world dynamic levitation scenarios. Dynamic levitation introduces myriad complexities that lead to unpredictable failures. Recent empirical studies, such as those by ArticuLev [8] and DataLev [13], underscore this challenge, reporting variability in success rates during assembly and animation phases. These findings illuminate the pressing need for enhanced strategies, especially as the complexity of 3D movements escalates with an increasing number of traps.
A significant hurdle in advancing the study of dynamic multi-point levitation has been the absence of a dedicated dataset to foster rigorous analysis and algorithm development. Recognizing this gap, our initial endeavor is the formulation of a comprehensive dataset. With over 180,000 data points sourced from two distinct phase retrieval solvers (NAIVE and GS-PAT), this dataset serves as the backbone of our data-driven approach. By combining the insights of our study with the potential of this dataset, we aspire to spark a surge in innovative acoustic levitation research, driving both dataset expansion and the exploration of more novel research avenues.
Building on this dataset, we introduce "StableLev", a multi-particle stability prediction and enhancement pipeline tailored to dynamic levitation data. With the goal of pinpointing and subsequently rectifying unstable segments within levitation trajectories, StableLev combines recurrent neural networks (RNN), such as long short-term memory (LSTM) networks or gated recurrent units (GRU), with AutoEncoders (AE) and variational AutoEncoders (VAE). This architecture yields an F-score of approximately 0.9, improving the reliability of dynamic levitation stability predictions.
In this paper, we make three key contributions that advance the pursuit of stable and reliable acoustic levitation:
(1) The curation of a first-of-its-kind comprehensive dataset, amassing over 180,000 data points from two distinct solvers, acting as a catalyst for data-driven approaches in future research.
(2) The development and introduction of "StableLev", a state-of-the-art stability prediction and enhancement pipeline. By integrating RNNs with VAE and AE, our model achieves a performance of 90% (F-score = 0.9) in predicting and rectifying unstable levitation trajectories.
(3) A series of detailed evaluations demonstrating StableLev's efficacy and applicability in real-world scenarios, ensuring dynamic levitation with unmatched stability.
These advancements not only enhance the robustness of current acoustic levitation systems but also provide a robust foundation for future innovations in the field.

RELATED WORKS

Applications with Acoustic Levitation
Recent advancements in acoustic levitation have positioned it as a versatile interface across diverse fields. For the purpose of physical displays, floating charts [30] employed the positions of expanded polystyrene (EPS) particles to depict mid-air scatter plots, effectively transforming static data into interactive, tangible forms. This novel representation demonstrated the ability of acoustic levitation to enable immersive data physicalizations. To represent more intricate shapes, both LeviProps [28] and ArticuLev [8] introduced optimized levitation structures and assembly processes. They integrated complex primitives such as threads and fabrics, offering the possibility of multi-material, dynamic physical displays. Moreover, DataLev [13] harnessed levitation to produce reconfigurable, multimodal data physicalizations with enhanced materiality. By combining different materials, DataLev not only presents data but also provides multi-sensory feedback, enriching the user experience.
LeviCursor [2] introduced a mechanism for manipulating and stabilizing a levitating particle. Focusing on indirect interactions, it offers a method in which distance-based interactions between a finger and the particle are used to select and interact with levitated particles. TipTrap [20] improved the interaction by enabling closer proximity of the finger to the levitated particle, utilizing sound scattering from the finger to create a levitation trap for direct, co-located interaction.
Leveraging the benefits of contactless manipulation, acoustic levitation has found novel applications in food delivery and 3D printing. In food delivery systems [38], acoustic levitation has been used to levitate and deliver food (like miniature burgers) as well as drinks (like gin and tonic), promising to redefine the gastronomic experience by allowing chefs to play with food textures and presentations [39]. Additionally, in the domain of 3D printing, it aids in maneuvering UV resin and sticks, opening doors to innovative design possibilities and structures that were previously challenging or impossible to achieve [7].

Acoustic Levitation Principles and Advances
Initial acoustic levitation studies employed a single ultrasonic transducer on a Langevin horn directed towards a reflective surface [42]. These "single-axis levitators" produce standing wave acoustic fields, allowing small particles to levitate at points with minimal acoustic pressure. Here, the Gor'kov potential ($U$) is minimum, and the acoustic radiation force $\mathbf{F} = -\nabla U$, given by the potential's negative gradient [4], becomes zero, thus holding particles in the trapping positions. Particles that are close to standing wave antinodes get pushed towards the nearest potential well [1].
Recently, phased arrays of transducers (PAT) have replaced the Langevin horn in acoustic levitation devices [29]. They offer more acoustic energy, arbitrary movement of levitated particles beyond the antinodes of a fixed standing wave, and the generation of multiple acoustic traps [24]. A PAT-based levitator includes a total of $N$ transducers arranged into an array or set of arrays [25]. The phase delays $\varphi \in \mathbb{R}^N$ of these transducers give rise to different complex acoustic fields $p \in \mathbb{C}^M$ at $M$ points in space. For the $m$th point in space, $m = [1, ..., M]$, this complex pressure is given by
$$p_m = \sum_{n=1}^{N} h_{m,n}\, e^{i\varphi_n} \qquad (1)$$
where $h_{m,n}$ describes the acoustic transmission of a transducer $n$ to a point $m$, given by the piston model [34]. Using phase retrieval algorithms, one can optimize the transducers' phases $\varphi$ to generate acoustic fields that include local standing wave patterns that correspond to acoustic traps (Figure 1). To guarantee the generation of proper sinusoidal standing wave patterns, i.e., the interchange between acoustic pressure minima and maxima, it has been quite common to compute phases that maximize the Laplacian of the Gor'kov potential (known as trapping stiffness) [7,25,28]. When trapping stiffness is maximum, the acoustic radiation forces converge to the trapping positions.
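The relation between transducer phases and the complex pressure at a point can be illustrated numerically. The following is a minimal sketch, not the piston model of [34]: the transmission term is simplified to an omnidirectional point source, and the function names, the 40 kHz frequency, and the speed of sound are assumptions made purely for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s in air (assumed value)
FREQUENCY = 40_000.0     # Hz (assumed value for the transducers)
WAVENUMBER = 2 * math.pi * FREQUENCY / SPEED_OF_SOUND


def transmission(transducer_pos, point_pos):
    """Simplified point-source transmission exp(i*k*d) / d.

    A piston model would additionally apply a directivity factor.
    """
    d = math.dist(transducer_pos, point_pos)
    return cmath.exp(1j * WAVENUMBER * d) / d


def pressure_at_points(transducer_positions, phases, points):
    """Complex pressure at each point: sum over transducers of h * exp(i*phase)."""
    return [
        sum(transmission(t, pt) * cmath.exp(1j * ph)
            for t, ph in zip(transducer_positions, phases))
        for pt in points
    ]


def focus_phases(transducer_positions, focus):
    """Phases that make all contributions arrive in phase at a single focus."""
    return [-cmath.phase(transmission(t, focus)) for t in transducer_positions]
```

With `focus_phases`, every transducer contribution is real and positive at the focus, so the pressure amplitude there equals the sum of the individual transmission magnitudes.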
However, many applications such as volumetric displays [12,18,34] require high update rates for the transducer phases (i.e., >10 kHz), and computing trapping stiffness via finite difference derivatives involves computing acoustic pressure at many points per trap (e.g., 55 points in [17]) using Eq. 1. Instead, direct minimization of the Gor'kov potential (or of its simplification that only considers spatial derivatives in the principal acoustic wave propagation direction [17]) has provided more viable alternatives for fast multi-point levitation, even in the presence of scattering objects. The most computationally efficient acoustic field optimization for levitation is to compute acoustic pressure only at the trapping location, aiming to create high-amplitude focal points that can be converted into acoustic tweezers via standard levitation signatures [25]. In symmetric levitation setups (e.g., when no rigid obstacles are involved), the acoustic pressure of focal points linearly correlates with their trapping stiffness after the application of levitation signatures [34].

Acoustic Levitation Stability
While advances have been made in acoustic levitation, particles frequently fall or stray from desired traps during experiments. Single-axis levitation studies have explored particle stability concerning size and medium viscosity [10], and others have analyzed levitated droplet oscillation [16]. However, while PAT levitation largely draws from single-axis principles that assume a plane standing wave field, the exploration of instabilities in dynamic levitation, especially involving multiple particles moving in free space, remains limited.
Recently, it has been shown that for large displacements, and thus fast movements, of a single particle in a PAT levitator, the assumption that forces are linear in the vicinity of the trapping position is no longer valid. Instead, the trapping stiffness becomes non-linear, which explains the period-doubling bifurcation of levitated particles [11]. Furthermore, in contrast to single-point levitation, which usually involves shifts of a focusing phase map to move particles in 3D space, multi-point levitation requires optimization of the transducers' phases as described in the previous section. So far, this optimization has been time-invariant, i.e., the transducer phases $\varphi^t$ for different movement time steps $t = [1, ..., T]$ are optimized independently, which leads to high phase changes $\Delta\varphi^t$ among transducers between movement frames. High abrupt phase changes lead to amplitude fluctuations in the transducers' emission [37]. That is, the delivered acoustic energy diminishes and is not sufficient to keep particles levitated in mid-air.
On the other hand, current research on acoustic levitation has only resolved particle misplacements but not particle drops. In an HCI context, LeviCursor used a motion capture system to avoid particle placements at the (weaker) secondary traps of the acoustic tweezer standing wave pattern [2], while LeviProps performed simulated annealing to find the trapping positions of highest trapping stiffness to hold an acoustically transparent fabric in mid-air [28]. Finally, other studies have focused on numerical optimization of single-particle trajectories [31], so that the showcased experimental trajectories better match the desired ones, effectively reducing particle misplacements.

LEVITATION DATASET
In this section, we detail the creation of our hybrid levitation dataset, which combines simulated and experimental data. This approach addresses the limitations of existing phase optimizations that excel in static simulations but falter in dynamic scenarios. We conducted levitation experiments on various multi-particle trajectories, recording particle positions and categorizing outcomes. Our research offers the first extensive levitation dataset based on rigorous feature extraction. This paves the way for deeper insights into acoustophoretic platforms and inspires future ML-driven applications.
Figure 2 graphically depicts our dataset generation process. This section outlines our levitation setup and the creation of multi-particle trajectories used for both experiments and model training/evaluation. We subsequently explain the intricacies of acoustic field propagation, optimization, and the hardware setup for capturing particle positions in transducer phase control experiments. We conclude this section by addressing the pre-processing steps for the analytical and observed features, setting the stage for the data-driven models discussed later.

Multi-particle Trajectories
We utilize a path planning algorithm [5] to generate feasible motion trajectories for multiple particles. We use a top-bottom 16 × 16 PAT levitation setup, which has been adopted as the standard configuration for levitating multiple particles [8,12,13,18,20,27,28,34]. While levitating 2 or 4 particles is straightforward, higher particle numbers often lead to more unsuccessful attempts. This arises from diminished acoustic energy per trap and increased particle occupation in the working volume. Additionally, due to the transducers' directivity, traps generated away from the central axis are also marginally weaker. A recent study using the same setup [13] shows that the success rates of 3D animation with 4, 6, and 8 particles are 90%, 60%, and 40%, respectively. Considering the balance of difficulty, we focus on generating data for six particles in the system.
Initially, particles are randomly assigned 3D start and end positions within the working volume. We then create collision-free trajectories, maintaining minimum horizontal and vertical distances of 1.4 cm and 3 cm between particles. Each particle moves at a unique velocity, with a maximum of $v_{\max} = 0.1\,\mathrm{m\,s^{-1}}$ as in [13]. The path planning algorithm provides checkpoints and constant speeds between them for each particle. With the PAT's update rate of 10 kHz, we interpolate trapping positions between waypoints, determining

Analytical Features
In this section, we acquire simulated data for multi-particle trajectories in the form of time series. First, we compute the acoustic transmission between transducers and trapping positions (see Eq. 1) and then optimize the transducers' phases $\varphi^t$ for each time step $t$. To generate a varied levitation dataset, we employ different multi-point phase retrieval solvers integrated within the OpenMPD development platform [27], such as the NAIVE and the GS-PAT [34] algorithms. Both algorithms use one pressure point per trap to estimate the phase at the focal points (or traps) and compute the phases of the transducers $\varphi^t$ using backward propagation. The NAIVE algorithm does not optimize the focal points' phases but rather assumes that all points share the same phase. For this reason, it is very common that, among multiple particles, there will be some outliers of lower focal point amplitude [34]. On the other hand, GS-PAT iteratively optimizes the estimated point phases, and the generated focal amplitudes are generally higher in simulation. However, optimizing phases leads to large phase changes $\Delta\varphi^t$ between successive time steps, which causes transducer amplitude fluctuations [37], and thus weaker traps and particle drops. We therefore merge the analytical features generated by the different phase retrieval solvers to reflect the different issues that arise in levitation experiments.
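The difference between assuming a shared focal point phase and iteratively re-estimating the point phases can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the OpenMPD, NAIVE, or GS-PAT implementation: `H` is a hypothetical M × N list of complex transmission values (points by transducers), all transducers emit at unit amplitude, and the iteration is a plain Gerchberg-Saxton-style loop without the amplitude normalisation that a production solver would apply.

```python
import cmath


def forward(H, phases):
    """Complex pressure at each focal point for unit-amplitude transducers."""
    return [sum(h * cmath.exp(1j * ph) for h, ph in zip(row, phases)) for row in H]


def backpropagate(H, point_phases):
    """Transducer phases from target point phases (conjugate back-propagation)."""
    n_transducers = len(H[0])
    return [
        cmath.phase(sum(H[m][n].conjugate() * cmath.exp(1j * point_phases[m])
                        for m in range(len(H))))
        for n in range(n_transducers)
    ]


def naive_solver(H):
    """NAIVE-style solve: every focal point is assigned the same phase (zero)."""
    return backpropagate(H, [0.0] * len(H))


def gs_solver(H, iterations=50):
    """Gerchberg-Saxton-style solve: iteratively re-estimate the point phases."""
    point_phases = [0.0] * len(H)
    for _ in range(iterations):
        transducer_phases = backpropagate(H, point_phases)
        point_phases = [cmath.phase(p) for p in forward(H, transducer_phases)]
    return backpropagate(H, point_phases)
```

Because the iterative variant is free to change the point phases at every time step, its transducer phases can differ substantially between consecutive frames, which is the effect the text attributes to GS-PAT.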
Our analytical features mostly include concise data representations based on the complex acoustic field generated by the computed transducers' phases $\varphi^t$ and Eq. 1. In this way, we can acquire smaller data representations by computing the pressure $p_m$ only at the trapping positions, or more physics-oriented data for levitation, like the trapping stiffness $S_m$ at these points. Similarly, we can compute the phase change $\Delta\phi^t$ between the computed focal points among time steps, as any large phase changes $\Delta\varphi^t$ in the transducer domain will transfer to the (much fewer) focal points. Notably, to incorporate the periodicity of phase values (i.e., $\phi^t, \phi^{t-1} \in [-\pi, \pi]$) into phase changes, we calculate the absolute phase change as shown in Eq. 2:
$$\Delta\phi^t = \min\left(\left|\phi^t - \phi^{t-1}\right|,\; 2\pi - \left|\phi^t - \phi^{t-1}\right|\right) \qquad (2)$$
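A periodicity-aware absolute phase change can be computed as the shortest angular distance between two phase values. The helper below is a minimal sketch of that idea; the function name is an assumption made for illustration.

```python
import math


def absolute_phase_change(phase_now, phase_prev):
    """Smallest absolute angular difference between two phases, in [0, pi]."""
    diff = abs(phase_now - phase_prev) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)
```

For example, phases of 3.0 rad and -3.0 rad are only about 0.28 rad apart on the circle, even though their plain difference is 6.0 rad.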

Experimental / Observed Features
Using the motion trajectories from Section 3.1, we conduct levitation experiments to observe actual trajectories and assess motion stability. The multi-point levitation solvers continuously adjust the trajectory trap positions at a 10 kHz rate using optimized transducer phases. We pair this with the OptiTrack Flex motion capture system, consisting of infrared cameras equipped with LEDs, to track real-time motion trajectories (see Figure 3). These cameras capture reflections from levitated particles to determine their 3D positions. Six cameras, placed at varied angles, cover the levitation volume.
Despite the system's 120Hz tracking rate being slower than the levitator's 10kHz update, it's apt for our purpose since the particles move slowly, keeping the dataset manageable.
To minimize the impact of external factors leading to unexpected drops, before initiating the experiment, we inspect every transducer to confirm they are functioning and ensure no wind disturbance is present.During experiments, we use 2 mm-diameter EPS particles for levitation and mitigate the particle initialization displacement (to a secondary trap) by using the tracking system's feedback.We also ensure the PAT operates at a consistent performance without overheating (i.e., taking a break and allowing it to cool every 30 minutes).
We track each group of motion trajectories three consecutive times and record the actual trajectories. When the displacement between the target and the actual (i.e., captured) trajectory of a particle becomes larger than 10 mm, we consider that particle to have dropped. If all three attempts are completed without any particle dropping, we label the group as 'stable' (normal); otherwise, we label it as 'unstable' (abnormal). These outcome features of motion performance are used in later model training and evaluation. The tracking process takes approximately 40 hours.
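The labelling rule above can be written down compactly. The sketch below is an illustration only: the data layout (lists of per-particle target and captured positions in millimetres) and the function names are assumptions, not the format of the released dataset.

```python
import math

DROP_THRESHOLD_MM = 10.0  # displacement threshold stated in the text


def particle_dropped(target_traj, captured_traj, threshold=DROP_THRESHOLD_MM):
    """True if the particle ever deviates from its target by more than threshold."""
    return any(
        math.dist(target, captured) > threshold
        for target, captured in zip(target_traj, captured_traj)
    )


def label_group(attempts, threshold=DROP_THRESHOLD_MM):
    """Label a trajectory group from its repeated attempts.

    attempts: list of attempts; each attempt is a list of
    (target_traj, captured_traj) pairs, one pair per particle.
    """
    for attempt in attempts:
        if any(particle_dropped(t, c, threshold) for t, c in attempt):
            return "unstable"
    return "stable"
```

A single drop in any of the repeated attempts is enough to mark the whole group as unstable, which matches the conservative labelling described in the text.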

Data-processing and Dataset Composition
To align the camera's coordinates with the levitator's, we applied an affine transformation to the position data. Before the trajectory experiments, we established this transformation matrix by levitating a particle at 27 predefined locations (a 3D grid of 3 × 3 × 3) and recording its positions using the motion capture system. This matrix then maps the captured data to the levitator's coordinate system, reducing position-tracking biases.
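One common way to obtain such a transformation from calibration points is a least-squares fit in homogeneous coordinates. The sketch below assumes NumPy is available and uses hypothetical function names; it is an illustration of the general technique rather than the calibration code used for the dataset.

```python
import numpy as np


def fit_affine(camera_points, levitator_points):
    """Least-squares affine map (4x3 matrix) from camera to levitator coordinates."""
    cam = np.asarray(camera_points, dtype=float)
    lev = np.asarray(levitator_points, dtype=float)
    cam_h = np.hstack([cam, np.ones((len(cam), 1))])  # homogeneous coordinates
    matrix, *_ = np.linalg.lstsq(cam_h, lev, rcond=None)
    return matrix


def apply_affine(matrix, points):
    """Map camera-frame points into the levitator frame."""
    pts = np.asarray(points, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ matrix
```

With 27 calibration points and only 12 unknowns, the fit is overdetermined, so measurement noise at individual calibration positions is averaged out.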
When tracking multiple particles, the motion capture system can mismatch particle positions, complicating the task of matching particle indices to position data. We address this by employing the Hungarian algorithm [21] to optimize assignments based on the proximity between target and captured positions at each time step, ensuring accurate tracking trajectories. Additionally, due to optical variations and particle occlusion, some positions might be missed. These gaps are filled using interpolation to provide a complete trajectory.
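The per-time-step matching can be illustrated as a minimum-cost assignment between target and captured positions. The sketch below deliberately swaps in a brute-force search over permutations as a stand-in for the Hungarian algorithm: it returns the same optimal assignment and is practical for a handful of particles, while a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would normally be preferred. The function name is an assumption.

```python
import itertools
import math


def match_particles(targets, detections):
    """Return order such that detections[order[i]] is assigned to targets[i].

    Brute-force minimum-cost assignment; fine for a handful of particles.
    """
    best_order, best_cost = None, math.inf
    for order in itertools.permutations(range(len(detections)), len(targets)):
        cost = sum(math.dist(targets[i], detections[j]) for i, j in enumerate(order))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order)
```

For six particles this evaluates 720 permutations per frame, which is acceptable for offline processing of 120 Hz tracking data.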
We merged the analytical and refined experimental data to craft our final dataset, which consists of time-series data from 200 groups spanning 902 time steps each. These groups are broken down into 90 from the NAIVE solver and 110 from the GS-PAT solver. This dataset encompasses analytical features like the Gor'kov potential ($U$), trapping stiffness ($S$), focal point amplitude ($A$), phase ($\phi$), amplitude change ($\Delta A$), point phase change ($\Delta\phi$), and average transducer phase change ($\overline{\Delta\varphi}$). The outcome labels are detailed in Table 1. Notably, while we used the same motion trajectories for the two solvers, the resulting features and instabilities varied. The phase instabilities of GS-PAT are more frequent and can affect all acoustic traps, leading to more abnormal groups. In contrast, the NAIVE algorithm shows higher success rates, as its amplitude discrepancies usually concern individual traps. Our hybrid dataset is available online.

STABLELEV
While research has noted instabilities and drops in multi-point levitation [8,13], there is no documented study predicting such behavior during dynamic levitation of multiple particles. Furthermore, strategies for improving stability in dynamic settings are uncharted in the current literature. Addressing this gap, we introduce StableLev, a data-driven solution for optimizing multi-particle stability. This method unfolds in three stages. First, from our dataset and expertise, we pinpoint essential levitation characteristics (stage: feature curation). Second, using diverse deep neural network models, we spot anomalies in unstable trajectories (stage: anomaly detection). Third, we rectify detected anomalies, bolstering motion steadiness (stage: anomaly amendment).

Feature Curation
In our analysis, we utilized a feature correlation heatmap (Figure 4) to determine the relationships between each analytical feature in our dataset (Section 3.4). The heatmap color scale indicates the strength of feature correlations. Notably, Gor'kov potential ($U$), stiffness ($S$), and focal point amplitude ($A$) emerge as tightly interrelated, all indicating trap intensity. Among these, we prioritize focal point amplitude ($A$) due to its computational efficiency, needing just a single pressure value per trap, unlike the Gor'kov-associated features that demand more complex computations [25]. Additionally, the heatmap reveals phase change ($\Delta\phi$) as another critical factor impacting trap intensity. Rapid phase transitions during motion adjustments might induce transducer emission fluctuations [37], an aspect not reflected in intensity-related analytical features. This underscores the significance of the point phase change ($\Delta\phi$) as an additional feature that can provide valuable information for anomaly detection.

Anomaly Detection
Having identified the focal point amplitude ($A$) and point phase change ($\Delta\phi$) as pivotal features, our next aim is to detect instances of particle drops during the dynamic levitation process. We characterize these drops as anomalies in a time-series dataset, posing unique challenges due to their unbalanced occurrences and unpredictable behaviors [9,32]. Deep learning has shown remarkable capabilities in learning underlying features to detect anomalies [32,41]. AutoEncoders (AE) emerge as a promising solution to this anomaly detection challenge. AEs excel at uncovering non-linear correlations in datasets, crucial for recognizing subtle, unpredicted deviations. Their encoder-decoder mechanism efficiently reduces data dimensionality, emphasizing crucial features while filtering out noise. This supports accurate anomaly identification even within intricate datasets [6,35].
To further bolster anomaly detection capabilities, we integrate deep AEs with the temporal dynamics of long short-term memory (LSTM) and gated recurrent unit (GRU) architectures and the robustness of deep generative models such as variational AutoEncoders (VAE):
LSTM: A specialized form of RNN, LSTMs adeptly handle long-term dependencies in sequential data. Equipped with memory cells and gate units, they can filter noise and retain significant patterns, rendering them particularly effective for our anomaly detection challenge [15].
GRU: A variant of RNNs, GRUs address gradient issues inherent in traditional RNNs. Their memory cells and gated units, including the update and reset gates, make them adept at processing complex time-series data and detecting anomalies [14,36].
VAE: Variational AutoEncoders blend probabilistic generative models with deep neural network capacities. Their encoders output conditional probability distributions, thus allowing superior data reconstruction, making them valuable for modeling standard behaviors in anomaly detection [23,43].
Using these building blocks, we propose three hybrid anomaly detector models, namely LSTM AE, GRU AE, and LSTM VAE, all built on the AE framework. Figure 5 shows our hybrid anomaly detector designs, where the encoder and decoder units can be represented with different components, such as LSTM or GRU layers. For LSTM VAE, the encoder unit consists of LSTM layers followed by the distribution function (mean and variance) to learn the encoder features, and the decoder unit consists of LSTM layers. A detailed exploration of each unit's functionality can be found in Homayouni et al. [19]. Table 2 presents the specifications and hyper-parameters of the hybrid AE models.
Utilizing our time-series dataset, our models train on the selected features of each levitated particle. Through the AE's encoder-decoder framework, our approach discerns patterns in stable trajectories and pinpoints anomalies in unstable ones. The results showcasing the efficacy of our anomaly detection approach are detailed in Figures 6 and 7 and in Section 5.1.

Anomaly Amendment
Following our anomaly detection process, we take corrective measures in the anomaly regions to rectify potential instabilities. Of the two prominent analytical features critical for detecting anomalies in real levitation trajectories, namely the focal point amplitude and phase change, we prioritize rectifying anomalies linked to amplitude (see results in Section 5.2).
Though our AE models predict absolute phase change values ($\Delta\phi$), and given that our dataset also includes the focal point phase ($\phi$), we could technically estimate phase information for amending anomalies. However, this estimation process introduces complexities. Namely, it necessitates additional constraints to ensure that the AE-predicted phase changes remain minimal across consecutive time frames. Presently, significant phase changes are managed by interpolating over the entire phase change range across various time frames. This interpolation, however, hampers the speed of focal point generation and transitions [37].
In practice, multi-point levitation solvers determine transducer phases based on a consistent target point amplitude for each trap, aiming for uniform intensity across traps. However, our anomaly detection and observation results indicate that this ideal criterion often remains unachieved in real experiments, leading to unintended particle drops. A direct solution to this is adjusting the amplitude of specific, unstable particles within identified anomaly regions. Given the anomaly time windows, we can trace back to the corresponding positions along the trajectory, where we create target trap points. At these points, we deliberately increase the target amplitude for unstable particles while maintaining the amplitude for stable ones.
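The amendment step can be pictured as building a per-step, per-trap table of target amplitudes that is raised only inside the detected anomaly windows. The sketch below is a hypothetical illustration: the function name, the (start, end) window format, and the default relative boost of 0.3 are assumptions, and how these targets are passed to a phase retrieval solver is left out.

```python
def amended_amplitudes(base_amplitude, n_traps, n_steps, anomaly_windows,
                       unstable_traps, boost=0.3):
    """Per-step, per-trap target amplitudes with a boost inside anomaly windows.

    anomaly_windows: list of (start_step, end_step) pairs, end exclusive.
    unstable_traps: indices of traps whose amplitude is raised.
    boost: relative increase (0.3 means +30%); a tunable assumption.
    """
    targets = [[base_amplitude] * n_traps for _ in range(n_steps)]
    for start, end in anomaly_windows:
        for step in range(max(start, 0), min(end, n_steps)):
            for trap in unstable_traps:
                targets[step][trap] = base_amplitude * (1.0 + boost)
    return targets
```

Keeping the boost confined to the anomaly windows leaves the rest of the trajectory untouched, so stable segments are solved exactly as before.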

Hybrid AE Models Training and Anomaly Detection
To train our hybrid models, we first represent the time-series dataset of the selected features as a sequence of time windows, where the selected features of all traps at time step $t$ of each time window can be written as $x_{1,t}, x_{2,t}, \ldots, x_{k,t}$, where $k$ is the total number of features of all traps. We explored various "look-back periods" or time window sizes (10, 50, 100, 200, and 500) to train our AE models. Given the abrupt variations in our feature dataset's short time steps and the 200 groups of sequences, each with 902 time steps, larger windows proved inefficient and unwieldy. Hence, we settled on a time window size of 20 to train our model.
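Cutting a trajectory into fixed-length windows can be sketched as follows. The function name and the stride of 1 (fully overlapping windows) are assumptions for illustration; the text does not state whether the windows overlap.

```python
def sliding_windows(series, window_size=20, stride=1):
    """Split a time series (list of per-step feature vectors) into windows."""
    return [
        series[start:start + window_size]
        for start in range(0, len(series) - window_size + 1, stride)
    ]
```

Under these assumptions, a group of 902 time steps yields 883 windows of 20 steps each.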
We split the dataset from Section 3.4 into training (77 normal samples: 55 NAIVE, 22 GS-PAT) and test sets, the latter containing both normal (20 NAIVE, 10 GS-PAT) and abnormal samples (15 NAIVE, 78 GS-PAT). Given the differing scales of the selected features, we apply the linear Min-Max scaling method, transforming the feature values as
$$x' = \frac{x - \min_{\text{normal}}}{\max_{\text{normal}} - \min_{\text{normal}}}$$
Here, $x'$ ranges between 0 and 1 for the training data, with $\min_{\text{normal}}$ and $\max_{\text{normal}}$ denoting the training dataset's minimum and maximum values. Utilizing 5-fold cross-validation [40], we train the hybrid models on sequential data $X$, encoding with LSTM, GRU, or VAE, then reconstructing to $\hat{X}$. The aim is to minimize the mean squared reconstruction error, $\mathrm{MSE}(X - \hat{X})$.
Table 2: Hyperparameters of hybrid AutoEncoder models.
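The scaling and the reconstruction error can be made concrete with a few helper functions. This is a minimal sketch with assumed function names; the key point it illustrates is that the minimum and maximum are taken from the normal training data only and then reused for the test data.

```python
def fit_min_max(train_values):
    """Minimum and maximum of the (normal) training values for one feature."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Scale with training statistics; unseen data may fall outside [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def reconstruction_error(window, reconstructed):
    """Mean squared error between a window and its reconstruction."""
    diffs = [
        (x - y) ** 2
        for row, rec_row in zip(window, reconstructed)
        for x, y in zip(row, rec_row)
    ]
    return sum(diffs) / len(diffs)
```

Values in abnormal test sequences can land above 1 or below 0 after scaling, which is expected and tends to increase their reconstruction error.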
Using the trained models, we predict on the test dataset and apply a threshold on the reconstruction error to distinguish normal from abnormal sequences. We select this threshold using the F-score as the metric of choice [41]. For our detection purpose, both Precision (ensuring that sequences flagged as anomalous are truly anomalous) and Recall (missing as few anomalies as possible) matter, and the F-score balances the two while remaining informative when the class distribution is imbalanced. In the K-fold cross-validation, we assessed the variability and uncertainty in the predictions of each fold, displaying them as error bars on the F-score values. Our analyses, shown in Figure 6, reveal consistent performance over a threshold range of 90% to 99%, with mean F-scores approximately between 0.80 and 0.90.
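Threshold selection by F-score can be sketched as a small search. The example below assumes that the percentage thresholds refer to percentiles of the reconstruction errors on normal data, which the text does not state explicitly; the function names and the nearest-rank percentile definition are likewise illustrative choices.

```python
import math


def percentile_threshold(normal_errors, percentile):
    """Nearest-rank percentile of reconstruction errors from normal data."""
    ordered = sorted(normal_errors)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def f_score(y_true, y_pred):
    """F1 score with 'abnormal' (True) as the positive class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(normal_errors, test_errors, y_true, percentiles=range(90, 100)):
    """Pick the percentile whose threshold gives the highest F-score."""
    scored = []
    for pct in percentiles:
        threshold = percentile_threshold(normal_errors, pct)
        y_pred = [err > threshold for err in test_errors]
        scored.append((f_score(y_true, y_pred), pct, threshold))
    return max(scored)
```

`best_threshold` returns a tuple of (F-score, percentile, threshold); ties in F-score are resolved in favour of the higher percentile.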
We opt for the LSTM AE hybrid model to enhance stability and improve dynamic levitation. With a 92% threshold, the model achieves a mean F-score of 0.9 (with corresponding Precision: 86%, Recall: 95%) in detecting anomalies on the test dataset. In our test set, most of the actual abnormal groups (88 groups) are correctly predicted as abnormal (true positives), while a few (5 groups) are false negatives. Of the actual normal groups, 14 are predicted as abnormal (false positives) and 16 are predicted as normal (true negatives). Among the true positive groups, we present a few examples where feature anomalies precede actual particle drop events (e.g., large position displacement captured by the camera) in Figure 7. Notably, a single anomaly step does not necessarily lead to one (or more) particle drops, and we often observe an accumulation of anomalies before drop events, as indicated by the red dashed lines in Figure 7.
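The reported Precision, Recall, and F-score follow directly from the group counts given above, as the short check below shows. The function name is an assumption; the counts are the ones stated in the text.

```python
def detection_metrics(tp, fn, fp, tn):
    """Precision, recall, F-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy


# Group counts reported for the LSTM AE at the 92% threshold.
precision, recall, f1, accuracy = detection_metrics(tp=88, fn=5, fp=14, tn=16)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.86 recall=0.95 f1=0.90 accuracy=0.85
```

The gap between the F-score and the plain accuracy reflects the class imbalance of the test set, where abnormal groups outnumber normal ones roughly three to one.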

Stability Enhancement
Here, we present a few anomalous groups (46, 61, 62, and 67 in Figures 8 and 9) as examples and report the stability enhancement achieved through the amplitude amendment approach proposed in Section 4.3.
First, we compared the processed tracking trajectories in our dataset to the target trajectories, identifying which particles dropped (e.g., particle 2 in Figure 8). Then, within the anomaly time regions, we set the phase retrieval solver parameters so that the unstable trap's target amplitude is higher than that of the other, previously stable particles (traps). Note that we increase the target amplitude gradually until we find a suitable increase. Here, with a 30% amplitude enhancement at the predicted anomalous regions, we ran the same trajectory three consecutive times as in Section 3, and particle 2 in both groups 46 and 61 arrived at the endpoint of the trajectory without dropping.
In our examination of amplitude modifications for groups 62 and 67, we noted that exclusively increasing the target amplitude of the unstable trap does not reliably enhance stability. Across three repeated tests of a few groups, we found that a particular particle does not drop consistently, which complicates the identification of the problematic trap. However, during the anomaly period, we can reduce the amplitude of the stronger trap, increase the amplitude of the relatively weaker one, or combine both strategies to address the instability. In group 62 (see Figure 9), after lowering the trap amplitude of particle 4 by 20%, no drop occurred. Likewise, by lowering the trap amplitude of particle 1 and increasing the trap amplitude of particle 2 by 20% within the suggested anomaly time region, we prevented the drop from happening in group 67.
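The amendment step can be sketched as a per-trap rescaling of the target amplitudes inside the predicted anomaly windows. The sketch below is a simplified illustration under stated assumptions, not our solver interface: `amend_amplitudes`, the list-of-lists amplitude layout, the half-open `(start, end)` windows, and the zero-based trap indices are all hypothetical choices made for this example.

```python
def amend_amplitudes(target_amps, anomaly_windows, scale_by_trap):
    """Rescale target amplitudes of selected traps inside anomaly windows.

    target_amps:     target_amps[t][i] is the target amplitude of trap i at step t.
    anomaly_windows: list of (start, end) step indices, end exclusive.
    scale_by_trap:   {trap index: factor}, e.g. 1.3 for +30% or 0.8 for -20%.
    Returns a new list; the input is left unchanged.
    """
    # Collect affected steps in a set so overlapping windows scale only once.
    steps = set()
    for start, end in anomaly_windows:
        steps.update(range(max(start, 0), min(end, len(target_amps))))

    amended = [list(step) for step in target_amps]
    for t in steps:
        for trap, factor in scale_by_trap.items():
            amended[t][trap] *= factor
    return amended


# Toy example (hypothetical values): two traps, five steps, unit amplitudes.
# Trap 0 is lowered by 20% and trap 1 raised by 20% inside overlapping windows,
# mirroring the combined strategy described for group 67.
targets = [[1.0, 1.0] for _ in range(5)]
amended = amend_amplitudes(targets, [(1, 3), (2, 4)], {0: 0.8, 1: 1.2})
```

The amended amplitudes would then be passed to the phase retrieval solver as its target amplitudes for the affected steps, and the scaling factor increased gradually until the trajectory completes without drops.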

DISCUSSION
6.1 Improvement of Stability Detection and Enhancement
Drawing on our domain knowledge and the feature correlation heat map, we have identified two crucial features that are instrumental in achieving trap stability during the dynamic levitation of multiple particles. Our current model, based on these features, attained an F-score of 0.9, but incorporating additional correlated intensity features such as the Gor'kov potential (U) and stiffness (S) could further enhance learning and improve model performance. For instance, selecting the Gor'kov potential (U) together with the focal point amplitude and the point phase change could unveil previously hidden patterns that greatly enhance the model's efficacy. However, as features like stiffness are more computationally demanding, the trade-off between the increase in feature dimensions and the associated training costs should be taken into account.
One limitation of our study is its emphasis on anomaly amendment using only one primary feature: the focal point amplitude, which is straightforward and feasible for existing phase retrieval algorithms to adjust in real time. Beyond intuitively tuning the amplitude, we envision that the time-series sequences reconstructed by our AE models are inherent indicators of stable amplitude and phase change. By further exploiting that information, it may be possible to tune both amplitude and phase changes to obtain robust levitation performance. Our dataset already encompasses time-series data on varied features. Therefore, a holistic approach to repairing anomalies could encompass amplitude, phase, and other features, marking a potential avenue for future extensions of our work.

Generalizability of Dataset and Finding
We observed that different phase retrieval solvers influence stability differently due to their inherent properties. To reflect this, we selected two representative solvers that showcase a range of levitation situations, thereby improving the generalizability of our dataset. Meanwhile, our data is exclusively derived from a setup with 16×16 top-bottom PATs, which are commonly used in levitation applications and have shown some performance issues. In the future, to extend the dataset to different setups (e.g., 8×8 top-bottom PATs, V-shaped PATs), researchers can start by establishing the levitator through the OpenMPD framework [27], which is becoming a standard hardware and solver solution in this field (e.g., adopted in the UIST student innovation contest). Once set up, researchers can replicate the dataset composition process and train their models with suitable analytical and experimental features.

Levitation Stability on Various Object Structures
In this work, we mainly discussed levitation stability for individual EPS particles, and the observed and experimental features are based on particle performance. Apart from point objects, other object structures such as threads [8] or fabrics [28] are often employed for levitation. To consider the stability of such levitated structures, collective effects (e.g., a set of traps maintaining a fixed relative distance and sharing the same velocity) should be taken into account when analyzing the features and making amendments. A single unstable trap can inevitably affect the neighboring traps, especially when they are physically connected by threads or fabrics. Therefore, more types of analytical features and experimental constraints will need to be introduced for multi-"object" levitation stability.

Levitation Stability on Various Object Materials
Note that levitation stability also varies depending on the material. As previous comparisons [13] suggest, liquid particles achieve less stable motion than solid EPS particles under the same number of traps and moving velocity. Properties such as the density and surface tension of liquid particles [38] play a crucial role in how particles respond to acoustic pressure and acoustic radiation force. These forces can cause deformations, changes in shape, or even particle destruction, which are all dynamic aspects of particle behavior during levitation. Therefore, when characterizing the levitation stability of different object materials, it is necessary to consider more of the underlying physical properties and different motion behaviors, going beyond the simple binary distinction of whether they drop or remain suspended.

Further Data-driven Explorations
This paper presents StableLev, the first data-driven approach tailored for dynamic multi-point acoustic levitation. It is crucial to recognize that prior deep learning endeavors (e.g., [22]) targeting acoustic phase-modulating devices have primarily hinged on simulated data and focused on creating complex acoustic fields, often visualized as images. However, such acoustic wavefront shaping strategies are predominantly suited for static particle manipulations, a prime example being one-step acoustic fabrication, as documented in [26]. In stark contrast, StableLev processes time-series data to bolster the stability of levitating particles in motion during experiments.
StableLev represents an initial foray into leveraging data for pinpointing and controlling dynamic acoustic fields. Acoustic levitation inherently grapples with non-linear acoustic phenomena, which the Gor'kov theory has sought to simplify by assuming enduring acoustic waves, essentially static in nature. Our open-sourced hybrid levitation dataset and methodology can inspire novel research that can, for example, seek explicit equations for the non-linear dynamics of multi-particle levitation, similar to ongoing research on data-driven discovery of governing physics [3] or hardware-in-the-loop modeling of interactive devices [33].

Figure 1: Illustration of the simulated sound field generated by top-bottom PATs with two traps: (a) depicts the pressure amplitude distribution around the trap. (b) depicts the distribution of the Gor'kov potential around the trap. Particles (green dots in (a), (b)) are suspended at the sound wave's antinode, coinciding with the minimum Gor'kov potential.

Figure 2: The building procedure of hybrid levitation dataset.

Figure 3: Motion capture system tracks multi-particle trajectories in the mid-air.

Figure 5: Anomaly detection model overview using AutoEncoder-based deep neural networks.

Figure 6: Performance F-score error plots for each k-fold hybrid AE model at different reconstruction error threshold values.

Time window (look-back period): 20
K-fold: 5
Number of layers for encoder/decoder: 2
Number of memory units or neurons in each layer: 64 (layer 1), 32 (layer 2)
Activation function: tanh
Reconstruction error threshold: 92%

Figure 7: The position displacements between target trajectories and real trajectories, with the red dashed lines indicating the predicted anomaly time windows. Particle 6 dropped in group 69 and particle 4 in group 89.

Figure 8: Two abnormal groups succeeded after adjustment to the target point amplitude of dropping particles. In group 46 (a) and group 61 (b), only particle 2 dropped when moving along the target trajectories (dashed line). Anomaly time windows (in red) are predicted by the anomaly detection model.

Figure 9: Two abnormal groups succeeded after adjusting the target point amplitude, including particles beyond those exclusively categorized as 'dropping'. (a) In group 62, all trajectories become stable after lowering the target amplitude of particle 4. (b) In group 67, all trajectories become stable after lowering the target amplitude of particle 1 and increasing the target amplitude of particle 2.

Table 1: Observed outcome labels of running 200 group levitation trajectories by NAIVE and GS-PAT solvers.