Parallel spatiotemporally adaptive DEM-based snow simulation

This paper applies spatial and temporal adaptivity to an existing discrete element method (DEM) based snow simulation on the GPU. For spatial adaptivity, visually significant spatial regions are identified and simulated at varying resolutions. To this end, we propose efficient splitting and merging of particles to generate adaptive resolution while maintaining simulation stability. We obtain further optimization by skipping computation on temporally inactive regions. Like the base solver, our method operates almost entirely on the GPU, including operations such as activity determination and the merging and splitting of particles. We demonstrate that a speedup of three times or more over the original non-adaptive simulation can be achieved on scenes containing about 3 million particles. We also discuss the advantages and drawbacks of our spatiotemporal optimization in different simulation scenarios.

Fig. 1. Animation progression as a set of gears plows through an accumulated mass of snow consisting of more than 1.15M snow particles, simulated using our spatiotemporally adaptive DEM simulation on the GPU.
Introduction

Snow forms an essential component of many winter scenes in computer graphics (CG). Various applications, for instance movies and games [RDR2 2024], require snow as a background component on the terrain or as a material with which agents interact. Unlike several other materials that can be simulated at efficient frame rates on modern GPUs with state-of-the-art algorithms, snow animation remains a challenge. This can be attributed to the complex physical properties of snow; it can deform, change state from porous to nearly solid, and even melt.
Whereas each method has its advantages, a common limiting factor with most methods is the high, albeit varying, computational cost involved. Most methods demand a high-resolution simulation to capture realistic, fine-grained effects, primarily when snow interacts with other objects in the scene. Given these simulators' already computationally intensive nature, this further diminishes the possibility of any real-time or interactive use.
The contribution of this paper is a parallel, spatially and temporally adaptive DEM-based snow simulator on the GPU that increases the simulation resolution at a reduced computational cost. Whereas spatial adaptivity ensures that more computation is delegated to visibly significant regions of the simulation domain, temporal adaptivity further optimizes particles that exhibit little or no overall movement. In addition to lightweight particle activity detection, our method handles efficient merging and splitting of particles on the GPU while maintaining the stability of the overall simulation. Spatial optimizations (splitting and merging of particles) [Winchenbach and Kolb 2021] and temporal activity optimizations [Reinhardt et al. 2017] have been shown to be effective in separate CPU-based implementations, especially for fluid simulations. Our approach demonstrates that these optimizations can be combined on parallel hardware and that the benefits extend to materials such as snow.
Our method utilizes the recently developed, efficient DEM-based method [Goswami et al. 2022] as our base solver. Furthermore, the proposed method can effectively reproduce all other processes involved in the original method, like compression and particle bonds, as the particles are split or merged. We also identify scenarios where our adaptive method is most helpful.
The remainder of this paper is organized as follows. Section 2 discusses the related work in the field and identifies the gap. In Section 3, we briefly recap the base DEM solver. This is followed by a detailed exposition of our spatiotemporal adaptation of the DEM solver in Section 4. We present our obtained results in Section 5 and finally conclude in Section 6.

Related work
High-performance animation of materials is of particular interest to the CG community. In the past few years, advancements have been made to simulate many materials, including fluids [Bender et al. 2017], sand [Longmore et al. 2013], cloth, etc., at efficient frame rates. This has been made possible by the development of efficient solutions as well as the rapid growth of parallel hardware like GPUs. However, despite its importance, physically-based snow simulation has received attention only in the last decade or so. Due to its complex nature and limiting frame rates, the gaming world often has to rely on procedural mesh deformation techniques instead to generate the effect of snow. Stomakhin et al. [2013] introduced the material point method (MPM) to the CG community for snow simulation; the proposed method is hybrid Eulerian-Lagrangian in nature. Whereas MPM had long been used in other fields, their method made some crucial simplifications and optimizations over the base solver proposed by Sulsky et al. [1995]. It was further accelerated by Hu et al. [2019], who presented efficient CPU- and GPU-based parallel techniques. Gissler et al. [2020] proposed an implicit compressible SPH solver for snow by modifying the underlying implicit incompressible method to account for compression in snow. Both of these methods can employ large time steps; however, a common limitation is the high time taken to compute every simulation step.
Techniques approaching physical snow simulation from the DEM angle have recently been proposed in CG [Goswami et al. 2019, 2022]. Both methods are implemented and tested on the GPU. Whereas Goswami et al. [2019] were forced to use small time steps to maintain the overall stability of the simulation, this was circumvented by employing an iterative scheme in a later work [Goswami et al. 2022]. An advantage of the proposed DEM methods is the relatively low time taken to compute a simulation step while maintaining high frame rates. We thus chose the DEM-based solver [Goswami et al. 2022] as our base method. A limitation that still exists with the accelerated solver is the limited movement of snow in every step, even if the frame rates are high enough for real-time considerations. A recent survey of existing snow methods in CG is presented in [Goswami 2024].

Spatial adaptivity:
A common technique to reduce computation time is to apply spatial adaptivity to the simulation: regions with less visual importance or reduced activity are simulated at a reduced resolution, whereas prioritized regions are simulated at a higher resolution. Desbrun and Cani [1999] applied spatial adaptivity to deformable materials. Spatially adaptive solutions that vary particle sizes have also been proposed for SPH simulations. Among others, these include adaptive sampling based on the distance from the medial axis [Adams et al. 2007], infinite continuous adaptivity for incompressible SPH [Winchenbach et al. 2017], and an iterative refinement process for low-viscosity turbulent flows using a discretized objective function [Winchenbach and Kolb 2021].
Temporal adaptivity: Another technique for reducing computation time is temporal adaptivity. Unlike spatial adaptivity, which primarily relies on processing different regions of the simulation domain at different resolutions, there are multiple possible means to achieve temporal optimization. Ihmsen et al. [2010] calculated a new global time step for every simulation step based on the most restrictive particle time step. Individual time stepping has been used for deformable objects by Desbrun and Cani [1999] and for SPH fluids by Reinhardt et al. [2017]. In some approaches, regions are identified using an efficient block-based scheme, and particles belonging to inactive fluid regions are temporally frozen and reactivated when needed [Goswami and Pajarola 2011]. Regional time stepping has been applied to identified block-based implicit regions for SPH fluids [Goswami and Batty 2014] and in MPM simulations [Fang et al. 2018].
In contrast to most of the above-stated works, which focus on SPH fluids, we propose applying both temporal and spatial adaptivity to a DEM-based snow simulator. Also, different from Fang et al. [2018], who applied only temporal adaptivity to MPM (a grid-based hybrid method), our approach targets a particle-based paradigm. With the combination of spatial and temporal adaptivity, our work provides insight into the performance gain that can be obtained with an efficient DEM-based particle snow simulator. Our method is GPU-based and hence, contrary to most existing methods, it investigates the possibility of performing operations such as splitting and merging particles on parallel hardware.

Base iterative method
In this section, we begin by briefly recapping the base DEM method. All particles in the simulation are initialized as pure snow particles with radius $r_{snow}$ and are allowed to compress to pure ice particles with radius $r_{ice}$. The state of a particle (denoted by $\lambda$) anywhere between that of snow and ice can be determined from its radius $r$ as $\lambda = \frac{r - r_{ice}}{r_{snow} - r_{ice}}$. The physical properties of this particle (Young's modulus, density, etc.) are computed by linearly interpolating between the properties of soft snow and ice using the determined $\lambda$; a minimal code sketch of this computation is given after the list below. The following points briefly summarize the highlights of the base approach.
• A virtual uniform grid covers the simulation volume (inspired by Green [2010]) to determine the neighbors of each particle efficiently.
• Boundary objects interacting with the snow particles are sampled as static, fully compressed ice particles.
• Each particle $i$ interacts with its neighboring particles $j$ under the effect of a normal force $\vec{f}^{N}_{ij}$.
The formula depends on the particle overlap $\xi$, the radius $r$, Young's modulus $E$, the contact normal $\vec{n}_{ij}$, and the normal cohesion $c_N$. A negative overlap between any two neighboring particles creates an attractive force, whereas a positive overlap results in a repulsive force.
• Each particle interacts with its neighboring particles under the effect of a frictional shear, causing a tangential force $\vec{f}^{T}_{ij}$.
Here, the angle of repose $\theta$ is the steepest angle a pile can form for a granular material, and $\vec{s}$ is the shear displacement.
• Snow particles lose air and get compressed to ice owing to the uniaxial compressive force pairs acting on each particle.
Here, $f_c$ is the compression force, $\vec{f}^{c+}$ and $\vec{f}^{c-}$ are the compressive forces acting in the positive and negative directions of the base axes, respectively, and $D$ is the particle durability coefficient.
• Durability $D$ for each particle is computed from the applied pressure $p_c$, the density $\rho$, a durability change coefficient $k_D$, and a threshold force $f_{thresh}$. The idea is that the particle resists deformation and enters plastic deformation only once the threshold force is applied.
• An approximate bonding behavior between snow particles is captured by introducing the bond threshold $b_t = n_c / n_m$, where $n_c$ is the current neighbor count of a particle and $n_m$ is the maximum count it has had so far.
• The snow/ice particles are also subjected to an air drag force and a basic thermodynamic model, as adopted in the base method.
• In [Goswami et al. 2019], snow and ice particles can melt into water, and water can freeze back to ice; their method therefore supports basic thermodynamics for this purpose. Although, for the sake of completeness, we provide extensions to the base method to handle the thermodynamics component while applying spatial adaptivity, similar to [Goswami et al. 2022], we focus on the snow simulation aspect in this paper. We refer the reader to [Goswami et al. 2019] for a more detailed exposition of the method.
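For concreteness, the following minimal CUDA sketch illustrates the state computation and linear property interpolation described above. The struct layout and function names are our own illustration, not the base solver's API.

```cuda
// Sketch of the per-particle state and property blend; the reference radii
// r_snow and r_ice come from the base method, the rest is illustrative.
struct SnowParams {
    float rSnow, rIce;          // reference radii for pure snow / pure ice
    float youngSnow, youngIce;  // Young's modulus endpoints
    float densSnow, densIce;    // density endpoints
};

__device__ float particleState(float radius, const SnowParams& p) {
    // lambda = (r - r_ice) / (r_snow - r_ice): 1 = pure snow, 0 = pure ice
    return (radius - p.rIce) / (p.rSnow - p.rIce);
}

__device__ float lerpProperty(float snowVal, float iceVal, float lambda) {
    // Linear interpolation between the soft-snow and ice property values.
    return iceVal + lambda * (snowVal - iceVal);
}
```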
Iterative DEM solver: Algorithm 1 describes the update loop for the DEM-based iterative snow solver, which employs the aforestated physical model. The code is divided into higher-level blocks (highlighted by the comments) for the purpose of explanation. Each of these blocks can be realized as a CUDA kernel for effective task distribution; in the launched kernels, each thread is responsible for computing the quantities of one particle, for instance the predicted forces of the next iteration in the iterative-force kernel. The core idea of this approach is to expose the basic forces of [Goswami et al. 2019] to an iterative loop, which allows for stability while taking larger time steps. After determining the neighbors of each particle (neighbor search), the external, compression, tangent, and accumulated forces are initialized (force initialization). To achieve faster convergence, one iteration is performed as a warm start, gathering forces from all neighbors (snow and boundary particles). The method then enters a global iterative phase that predicts positions and velocities for all particles and updates the accumulated and compression forces. The iterative loop terminates once the maximum overlap error between any two particles drops below a user-specified threshold or a maximum number of iterations has been performed. Thereafter, the bond status and compression of each particle are updated based on the forces obtained in the iteration. The final force determines the next velocity and position of the particle. An oscillation control is applied to the updated positions to prevent oscillations caused, in some cases, by a relatively large time step. The overlap error computation differs from incompressible materials in that it is adjusted based on the particles' hardness; softer overlapping particles have a lower error since they can compress. It is important to note that both the base and the iterative method are GPU-based and implemented in CUDA. The reader is referred to [Goswami et al. 2022] for more details of this method.
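A host-side skeleton of this update loop, as we read Algorithm 1, is sketched below. All kernel names are illustrative placeholders (the paper's own kernel names did not survive extraction), and the kernel bodies are stubbed out.

```cuda
// Minimal sketch of the iterative solver loop, assuming one thread per
// particle inside each kernel; this is our reading, not the authors' code.
struct DeviceData { /* particle and grid buffers ... */ };

__global__ void findNeighbors(DeviceData d)           { /* uniform-grid search */ }
__global__ void initForces(DeviceData d)              { /* ext./compr./tangent init */ }
__global__ void warmStartForces(DeviceData d)         { /* one gather pass */ }
__global__ void predictPositionVelocity(DeviceData d) { /* per-iteration predict */ }
__global__ void updateIterationForces(DeviceData d)   { /* accumulated + compression */ }
__global__ void updateBonds(DeviceData d)             { /* bond status */ }
__global__ void updateCompression(DeviceData d)       { /* compression update */ }
__global__ void integrateParticles(DeviceData d)      { /* final v/x + oscillation control */ }

float reduceMaxOverlapError(DeviceData d) { return 0.0f; /* device reduction + copy-back */ }

void simulationStep(DeviceData d, int blocks, int threads, float maxError, int maxIters) {
    findNeighbors<<<blocks, threads>>>(d);
    initForces<<<blocks, threads>>>(d);
    warmStartForces<<<blocks, threads>>>(d);
    for (int it = 0; it < maxIters; ++it) {
        predictPositionVelocity<<<blocks, threads>>>(d);
        updateIterationForces<<<blocks, threads>>>(d);
        if (reduceMaxOverlapError(d) < maxError) break;  // converged below threshold
    }
    updateBonds<<<blocks, threads>>>(d);
    updateCompression<<<blocks, threads>>>(d);
    integrateParticles<<<blocks, threads>>>(d);
}
```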

Our method
Since the purpose of our approach is to accelerate the animation, we have chosen the iterative DEM-based solver [Goswami et al. 2022] as the base for our implementation. It is an efficient, easy-to-implement, GPU-based parallel method (implemented using CUDA) that produces an interactive to real-time snow simulation in some scenarios. Our adaptive approach is likewise mostly parallel and implemented on the GPU using CUDA. The higher-level approach to achieving spatial and temporal adaptivity is outlined in Algorithm 2. Our adaptive algorithm is carried out after executing the physics loop on the existing particles from Algorithm 1 (line 2 in Algorithm 2).

Activity
The original simulation, containing all initialized particles of the same radius and mass, forms our base reference. This is the simulation state with the highest spatial resolution (red particles in Figure 2). To leverage the computational benefit while simulating relatively inactive or visibly less important regions, particles in these areas are incrementally combined to form larger particles. In what follows, we employ the term level to refer to the spatial activity level of particles. All particles from the original simulation belong to the same level. Any pair of particles from this simulation, when combined, produces a resultant particle of the next level, and so forth: two particles belonging to the same level in the combination hierarchy can be combined to yield a larger particle. Starting from the base resolution, up to 8 neighboring particles from a level can be merged with this approach in three consecutive steps.

Algorithm 1: Base iterative DEM snow solver [Goswami et al. 2022] (full pseudocode omitted; the loop runs over all snow particles every simulation step).

At the beginning of the animation sequence, where no activity takes place, the base particles are merged to produce a simulation containing only the largest possible particles. When activity is detected, particles are split; conversely, the split particles are combined again when the motion disappears. The particles can continue combining until they reach the maximal size at which the simulation started. Throughout this process of merging and splitting, we keep track of the spatial activity level of each particle. To capture both spatial and temporal adaptivity, two types of activity are measured and stored in two different grids, referred to as the spatial-temporal and temporal cell activities. The purpose of the former is to detect common motion across a number of particles, and both adaptive methods use it. The latter detects activity on the particles and is used by the temporal method only. For the sake of efficiency and simplicity, all particle activity is computed at the grid scale (Figure 3) and then passed on to the particles contained in each cell. This optimization not only reduces the cost of computation but also yields more coherent regions and contiguous transitions between adjoining activity levels. In the following sections, we provide a detailed exposition of the various steps of activity computation and propagation.
4.1.1 Spatial-temporal activity. A high (or low) spatial-temporal activity is determined based on the velocity of particles within the containing grid cell. To account for activity arising from objects interacting with the snow mass, sampled boundary particles in motion, such as a moving obstacle, are assigned an active status; the remaining static boundary particles keep the default inactive status. To minimize the memory footprint and computational cost, the grid maintaining this activity is coarsened by a factor of 4 compared to the standard neighborhood grid while covering the same simulation space. The activity $a_c$ of a grid cell (line 4 in Algorithm 2) is determined by the weighted sum of the velocities of all particles in that cell divided by the particle count $n$, i.e., $a_c = \frac{\lVert \sum_i w_i \vec{v}_i \rVert}{n}$. The weight $w_i$ is set to 1 for object-sampled particles and to a lower value (0.5) for snow particles. This reduces the contribution of snow particles in comparison to boundary objects and therefore hinders snow from causing the splitting of particles at the same rate as moving objects. Furthermore, summing velocities (not their magnitudes) ensures that random particle movements cancel each other out, a necessary measure since splitting and merging can cause unstable particle movements. The next step after global spatial-temporal activity determination is to propagate it, as explained below. Similar to [Goswami and Batty 2014], the computed and propagated grid activity is quantized to pre-selected levels based on $a_c$; see also Figure 2.
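A minimal sketch of this cell-activity measure follows. The weights (1.0 for boundary, 0.5 for snow) and the coarsened grid follow the text; the two-kernel atomic-accumulation scheme and the threshold-based quantization loop are our assumptions.

```cuda
// Pass 1: weighted vector sum of particle velocities per coarse cell.
__global__ void accumulateCellActivity(const float3* vel, const int* cellOf,
                                       const unsigned char* isBoundary,
                                       float3* cellVelSum, int* cellCount, int numParticles) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numParticles) return;
    float w = isBoundary[i] ? 1.0f : 0.5f;        // snow contributes less than obstacles
    int c = cellOf[i];                            // index into the 4x-coarsened grid
    atomicAdd(&cellVelSum[c].x, w * vel[i].x);    // vector sum: random jitter cancels
    atomicAdd(&cellVelSum[c].y, w * vel[i].y);
    atomicAdd(&cellVelSum[c].z, w * vel[i].z);
    atomicAdd(&cellCount[c], 1);
}

// Pass 2: normalize and quantize to the pre-selected activity levels.
__global__ void quantizeCellActivity(const float3* cellVelSum, const int* cellCount,
                                     int* cellLevel, const float* thresholds,
                                     int numLevels, int numCells) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= numCells) return;
    int n = max(cellCount[c], 1);
    float3 s = cellVelSum[c];
    float a = sqrtf(s.x * s.x + s.y * s.y + s.z * s.z) / n;   // a_c from the text
    int level = 0;                                 // 0 = coarsest particles allowed
    for (int l = 0; l < numLevels; ++l)
        if (a > thresholds[l]) level = l + 1;      // higher activity -> finer particles
    cellLevel[c] = level;
}
```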
4.1.2 Activity propagation. Activity propagation to the regions of the simulation about to encounter a hike in activity is essential to allow them sufficient time to split. To handle this efficiently, the raised activity values on the spatial-temporal activity grid are propagated out to the neighboring cells in the direction of movement using the fast marching method (FMM) [Sethian 1999]. The FMM creates a distance field over a grid from at least one seed point. The identified red grid cells are considered the source of activity and are treated as seed points for activity propagation to the neighboring grid cells (Figure 2). Similar to Dijkstra's shortest-path algorithm, FMM uses a priority queue of nodes (cells) to gradually map out the arrival time (or distance) from the seed points, making the algorithm serialized in its simple version. To better anticipate movement, the direction of the cells' particles is taken into consideration, implying that an object in motion activates cells farther in its direction of travel than to its sides.
In addition to lowering the activity grid resolution, we make further optimizations to lower the propagation cost. Firstly, the algorithm proceeds only up to a set maximum distance from each seed cell during activity propagation. Secondly, we limit the frequency of our algorithm to once every few physics frames: whenever spatial adaptivity is enabled, the interval is 100 iterations, but the propagation extends somewhat farther spatially; with only temporal adaptivity enabled, the propagation is shorter and less expensive but runs every 50 iterations instead. Finally, since the resulting process is computationally inexpensive on our coarsened grid, we implemented it on the CPU for the sake of simplicity. It could be implemented on the GPU for a speedup in the case of larger or finer grids [Mirebeau et al. 2023].
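The following CPU-side sketch shows the propagation as a Dijkstra-style priority-queue sweep (the serial variant the text mentions). The directional bias is modeled by making edges aligned with the motion cheaper; the cost constants, the single global motion direction, and the grid indexing are our assumptions.

```cuda
#include <algorithm>
#include <cmath>
#include <functional>
#include <queue>
#include <vector>

// One propagation pass on the coarsened grid: dist[c] becomes the biased
// distance from the nearest seed; cells beyond maxDist keep INF and stay
// at their previous activity level.
void propagateActivity(std::vector<float>& dist, const std::vector<int>& seeds,
                       const float seedDir[3],   // dominant motion direction (simplified)
                       int nx, int ny, int nz, float maxDist) {
    const float INF = 1e30f;
    std::fill(dist.begin(), dist.end(), INF);
    using Node = std::pair<float, int>;           // (distance, cell index)
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> pq;
    for (int s : seeds) { dist[s] = 0.0f; pq.push({0.0f, s}); }
    const int step[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    while (!pq.empty()) {
        auto [d, c] = pq.top(); pq.pop();
        if (d > dist[c] || d > maxDist) continue; // stale entry or range cap reached
        int x = c % nx, y = (c / nx) % ny, z = c / (nx * ny);
        for (const auto& st : step) {
            int X = x + st[0], Y = y + st[1], Z = z + st[2];
            if (X < 0 || X >= nx || Y < 0 || Y >= ny || Z < 0 || Z >= nz) continue;
            // Edges aligned with the motion are cheaper, so activity reaches
            // farther ahead of a moving object than to its sides.
            float align = st[0]*seedDir[0] + st[1]*seedDir[1] + st[2]*seedDir[2];
            float cost = 1.0f - 0.5f * fmaxf(align, 0.0f);
            int nIdx = X + nx * (Y + ny * Z);
            if (d + cost < dist[nIdx]) { dist[nIdx] = d + cost; pq.push({dist[nIdx], nIdx}); }
        }
    }
}
```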

Spatial adaptivity
The primary objective of spatial adaptivity is to optimize computation by reducing the overall particle count in the less active regions while maintaining a comparable perceived shape and behavior of the snow. Particles in the original non-adaptive simulation can vary in radius based on their compression or state, from snow $r_{snow}$ to ice $r_{ice}$; the compression or radius of a particle determines its physical state $\lambda$ based on the initial reference radii. However, these particles are all assumed to have a uniform mass, meaning they all still belong to the same level in terms of spatial adaptivity. In our spatially adaptive simulation, larger particles are created by merging the base particles. It is, therefore, no longer possible to account for the physical state of a merged particle purely based on its radius or mass. We thus create an additional variable to track the spatial adaptivity level of each particle.

Fig. 3. The figure illustrates an overview of the merging and splitting process in our approach: a) particles before the spatial adaptation process, with colors representing their current size; b) background cells visualize the computed desired activity level, and hence particle size, given by the spatial-temporal grid; c) identification of particles whose size disagrees with the activity of their containing grid cell, with identified merge pairs visualized by a connecting arc; d)-f) interpolation process wherein pairs identified for merging or splitting are grouped by a black border. The splitting particle generates a virtual particle, represented by the black dot in d), which is initialized by mass transfer from its parent particle.
As the simulation proceeds, particles identified as belonging to regions of lower visual importance can move to regions that are now visually more prominent, for example, closer to moving objects. The reverse can also happen: a low-priority zone experiencing minor or no movement can become a zone of high importance. In such cases, the combined particles are split again to provide a higher resolution in those simulation regions. After spatial-temporal activity identification and propagation to all grid cells (Figure 2), each particle is assigned the activity level of the grid cell that contains it. Our general idea on this front is inspired by the regional time stepping of Goswami and Batty [2014]. In the following, we explain how the merging and splitting of particles is achieved (see also Figure 3).

Merging.
In Algorithm 2, if a particle is smaller than the desired size identified by the activity level of its grid cell, a search is initiated to find another particle to merge with. This pertains to particles whose activity is lower now than before and which can hence be merged to reduce the overall particle count without compromising much of the visual quality in that region (Figure 3). It is often nearly impossible to find two exactly similar particles to merge; hence, we relax some constraints while maintaining others to keep the simulation physically consistent yet efficient. To identify potential pairs, particles are first searched within the same cell, followed by those in the neighboring cells. The algorithm's merging criterion dictates that particles cannot merge into a particle larger than the targeted size, and their velocities should not differ significantly. The targeted size, or mass, of the merged particle for a particular level is determined from the base simulation. Preference is given to merging two particles belonging to the same spatial adaptivity level in the splitting hierarchy; this restriction is, however, relaxed if no exact match is found. The merging particles cannot belong to different states, implying that a snow particle cannot be merged with a water particle. The algorithm favors merging pairs with a combined mass closer to the targeted mass for the level and stops searching once a sufficiently good match is found.
The smaller particle is considered the source from which to draw the mass, since this entails less transferred mass and hence creates less disturbance in the simulation. If one or both of the particles have partially melted to water, the merged particle also carries the combined water mass. The larger of the two particles is therefore considered the destination, to which the whole mass is eventually transferred. Each identified pair is added to a list using an atomic buffer count, which introduces some serialization to the process on the GPU. Since the pairing process is carried out for each particle, some particles may be paired multiple times; the pair list is therefore later culled to remove duplicates.
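A sketch of the pair test and the atomic append is shown below. The Particle layout (reused by later sketches), field names, and the velocity threshold are our assumptions, not the authors' data structures.

```cuda
// Illustrative particle record; "S" marks it as a sketch type.
struct ParticleS {
    float3 pos, vel;
    float  mass, radius, temperature;
    int    level;            // spatial adaptivity level
    int    state;            // e.g. snow / ice / water
};

__device__ bool canMerge(const ParticleS& a, const ParticleS& b,
                         float targetMass, float maxVelDiff) {
    if (a.state != b.state) return false;              // snow cannot merge with water
    if (a.mass + b.mass > targetMass) return false;    // never exceed the target size
    float3 dv = make_float3(a.vel.x - b.vel.x, a.vel.y - b.vel.y, a.vel.z - b.vel.z);
    return dv.x*dv.x + dv.y*dv.y + dv.z*dv.z < maxVelDiff * maxVelDiff;
}

// Pairs are appended through one atomic counter: this is the serialization
// point mentioned in the text; duplicates are culled in a later pass.
__device__ void appendPair(int2* pairs, int* pairCount, int maxPairs, int src, int dst) {
    int idx = atomicAdd(pairCount, 1);
    if (idx < maxPairs) pairs[idx] = make_int2(src, dst);
}
```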
Interpolation: After identifying the pairs of particles to be merged, the actual merging is done incrementally through interpolation over $I$ physics iterations so as to avoid sudden changes in the simulation. The interpolation process is synchronized globally in our implementation. This is easy to achieve since the particles are checked for their spatial activity every 100 iterations; during these iterations, the particles identified in the most recent pass for merging or splitting complete their interpolation. Both the source and destination particles enter the iterative simulation loop as soon as their creation starts. The basis of this step is to ensure a gradual transfer of mass and other attributes from the existing particles to the newly created unified particle, because sudden transitions in particle size or density can create overlaps with surrounding particles, leading to unintended DEM overlaps and resulting in particle instabilities or explosions. A small amount of mass is transferred from the source to the destination particle in each iteration (as visualized in Figure 3). Along with the mass, several other physical attributes are transferred in proportion to the transferred mass, including thermal energy, size, and compression. Helper functions convert between the volume and radius of a sphere during this transfer. $I$ is set to a value of 100 in our implementation.
The interpolation process for merging a pair of identified particles is explained in Algorithm 3. During each iteration, the amount of transferred mass $m_t$ for a particle varies and depends on its current source mass (lines 4-7 and 17-20). We incorporate this using a dynamically adjusting variable, which starts by transferring smaller fractions of the source mass to the destination particle and gradually increases as the process progresses. The motivation behind the variable transfer rate is to make the transition between the simulation states gradual by letting it adjust to both the growing destination and the depleting source particle during the interpolation period. At the beginning of the transfer phase, the newly created particle exhibits a high rate of growth in radius when additional volume is added, while the source particle still possesses a much larger mass. Our variable transfer mechanism counteracts both of these challenges by transferring mass gradually, in a more consistent manner.
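One possible reading of this variable-rate transfer is sketched below: each step moves a fraction of the remaining source mass, and that fraction ramps up across the $I$ iterations so early steps are gentle while the final step empties the source exactly. The linear ramp shape is our assumption; attributes travel in proportion to the moved mass, as the text describes.

```cuda
// One merge-interpolation step (sketch of Algorithm 3's transfer logic).
__device__ void mergeTransferStep(float& srcMass, float& dstMass,
                                  float& srcHeat, float& dstHeat,
                                  float& srcVol,  float& dstVol,
                                  int it, int totalIters /* I = 100 */) {
    if (srcMass <= 0.0f) return;
    float ramp = (it + 1) / (float)totalIters;                 // grows linearly to 1
    float dm = ramp * srcMass / (float)(totalIters - it);      // last step moves all remaining mass
    float frac = dm / srcMass;
    float dHeat = srcHeat * frac, dVol = srcVol * frac;        // attributes follow the mass
    dstHeat += dHeat;  srcHeat -= dHeat;
    dstVol  += dVol;   srcVol  -= dVol;
    dstMass += dm;     srcMass -= dm;
}

// Sphere volume/radius conversion used when updating the particle size.
__device__ float radiusFromVolume(float v) {
    return cbrtf(3.0f * v / (4.0f * 3.14159265f));             // inverse of V = 4/3 * pi * r^3
}
```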
In addition to conserving mass and maintaining the physical stability of the simulation, other particle attributes also need to be preserved as far as possible during merging and are handled within the interpolation process. We discuss these in the following. Thermodynamics: To maintain thermodynamic consistency while merging, heat is extracted and transferred together with the transferred mass to the destination particle (lines 10-14 in Algorithm 3). Melting particles are partly ice/snow and partly water in their constitution; hence, it is important to account for the presence of this water in the thermodynamic exchange. The heat transfer is carried out separately for the part of the transferred mass that has already converted into water. After this, the actual thermodynamic calculation is deferred to the physics stage of the original loop.
Compression: Since the merging particles have a more or less consistent state, we do not explicitly adapt the density of the transferred mass. Instead, the volume of the transferred snow is added to the destination, and the combined mass implicitly adapts to the compression (lines 17-20). This is achieved by computing the compression factor of both particles at the beginning of the interpolation; when the particles have completed merging, the average compression factor is assigned to the combined mass.

Bonds:
The union of the neighbor sets of the merging particles decides the bond status of the merged particle. Consistent with the base simulation, the bond threshold for different sizes (or spatial adaptivity levels) of particles is scaled based on the size of the merged particle: for every doubling of the particle radius, the bond threshold is scaled by a factor of 2/3 relative to the previous level. It is worth mentioning that we use the same value of Young's modulus $E$ as in the base simulation for all particles, whereas the normal cohesion $c_N$ is scaled inversely to the radius growth.
Position offset: As the destination particle grows, the source particle has to be steadily moved away from it. This is crucial to avoid sudden, large repulsive forces. The position is offset based on the growth of the destination's radius; the offset is based on the destination since its radius changes less rapidly than the source's (lines 28-31).
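A small sketch of this offset follows: the source is pushed along the line between the two centers by the amount the destination radius grew in this iteration, so no new overlap (and repulsive spike) appears. The direction convention and epsilon guard are our assumptions.

```cuda
// Move the source away from the growing destination by drDest, the radius
// growth of the destination in the current interpolation step.
__device__ float3 offsetSource(float3 srcPos, float3 dstPos, float drDest) {
    float3 d = make_float3(srcPos.x - dstPos.x, srcPos.y - dstPos.y, srcPos.z - dstPos.z);
    float len = sqrtf(d.x*d.x + d.y*d.y + d.z*d.z) + 1e-6f;  // guard against coincident centers
    float s = drDest / len;
    return make_float3(srcPos.x + d.x * s, srcPos.y + d.y * s, srcPos.z + d.z * s);
}
```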

Splitting.
In Algorithm 2, if a particle is identified as larger than the desired size based on the activity assigned to its containing grid cell, it is marked for splitting. This pertains to particles whose activity is higher now than before and which, hence, should be split to provide a higher simulation resolution in that region. Each splitting particle gives birth to two particles, each containing half of the mass of the marked particle (Figure 3). This is achieved by pairing each marked particle (source) with a currently nonexistent, virtual (destination) particle that is instantiated at a later stage. In the splitting process, the newly created particle is marked as the destination, while the splitting particle is set as the source. In essence, the virtual particle is placed at the corner of the source particle, with mass and velocity set to zero and temperature and bond status inherited from the source. The mass to transfer in splitting is set as $m_t = \lfloor m_s/2 \rfloor$ to distribute the source mass $m_s$ as evenly as possible while keeping integer masses; the integer masses are derived from the base simulation at the start, wherein all particles belong to the base level and have the same mass. If the particle has partially melted, the liquid water mass is distributed evenly between the split particles. The major steps of our particle splitting process are outlined in Algorithm 4.
Interpolation: Similar to merging, after identifying the particles to be split, the actual splitting is done gradually through interpolation so as to avoid sudden changes or instability in the simulation. In principle, the interpolation behaves similarly to merging, barring two key differences. Firstly, the process requires an extra buffer (line 9 in Algorithm 4) to keep track of how much mass is left to be transferred for each source particle. Secondly, special consideration must be given to the newly created particle as it grows. The new particle, even while its mass is small, has a high chance of overlapping with other particles in the neighborhood (see Figure 4). The repulsive forces generated by these overlaps can cause drastic accelerations, thereby destabilizing the DEM simulation. To mitigate this, the destination particle (starting as a virtual particle) is glued to the source particle until it reaches a minimum mass threshold, set to roughly 40% of the targeted particle mass (lines 31-39). To further address this challenge, collision physics is exempted between these small particles and their neighbors until they reach the minimum mass threshold, which allows them sufficient time to settle into the simulation.
As in the case of merging, the newly generated particle is gradually moved away from the source particle. Contrary to the merging interpolation, however, we execute the splitting interpolation loop for the full $I$ iterations even if there is no further mass left to transfer from the source particle. This is crucial for the gluing logic, which keeps the newly created particle attached to its parent while it is still below the mass threshold, thereby maintaining overall stability. The two created particles inherit the compression and bond attributes of their parent, while the heat transfer follows a logic similar to that of merging.

Temporal adaptivity
In addition to the spatial adaptivity, we apply temporal adaptivity to optimize computation further by selectively reducing the physics update frequency in dormant or inactive regions of the simulation. This is similar to the concept of particle sleeping or freezing in other particle-based simulations [Goswami and Pajarola 2011]. The temporal activity (line 9 in Algorithm 2) is determined at the grid scale by a combination of the spatial-temporal activity and an additional (also coarsened) activity grid. All grid cells are marked as either active or inactive regarding their temporal activity (see Figure 2). To this end, all cells containing moving particles in the spatial-temporal grid are marked as active in the temporal grid as well. The need for the additional activity grid arises from special cases, such as a lone particle in a grid cell, or particles thrown into the air that have zero velocity just before reversing their direction. To prevent temporarily slow-moving particles from being detected as inactive or frozen, the temporal grid utilizes the maximum individual particle velocity in a grid cell as well as acceleration. We also enforce, through the temporal grid, that the high-resolution spatial areas never freeze.
The activity of a grid cell determines the update frequency of its contained particles. In our implementation, active particles update in every physics iteration, while inactive particles do so once every 10 iterations. The inactive particles hence skip most of the physics routines except for certain checks, such as collision and boundary checks. We experimented with the update frequency of the inactive particles and found that skipping more than 10 iterations did not yield any further noticeable gain in performance.
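The temporal gate can be expressed as an early-out at the top of the physics kernel, as sketched below. The update frequencies (every iteration vs. every 10th) follow the text; the kernel structure and the placeholder for the always-on collision/boundary checks are our assumptions.

```cuda
// Placeholder for the checks that inactive particles still perform.
__device__ void resolveCollisionsAndBoundaries(ParticleS* p, int i) { /* stub */ }

__global__ void gatedPhysicsUpdate(ParticleS* p, const unsigned char* cellActive,
                                   const int* cellOfParticle, int iter, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    resolveCollisionsAndBoundaries(p, i);                 // always executed
    bool active = cellActive[cellOfParticle[i]] != 0;     // temporal grid lookup
    if (!active && (iter % 10 != 0)) return;              // frozen particle: skip the rest
    // ... full DEM force computation and integration for active particles ...
}
```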

Utilizing the GPU
A quick inspection of Algorithms 1, 3, and 4 shows that the merging and splitting algorithms share the parallel nature of the main algorithm. Each of the lower-level routines listed in Algorithm 2 corresponds to a CUDA kernel. For the launched CUDA kernels, each thread assigns itself to a particle in the buffer to handle the activity computation, splitting, merging, or physics computation. When determining and propagating the block activity, the CUDA kernels are instead launched over the blocks, with each thread computing for a single grid cell; owing to the coarsened grid and hence fewer grid cells, fewer threads are needed for the grid-level computations than for the particles. Like the base method (Algorithm 1), most of our spatiotemporal adaptive method is completely parallel and implemented on the GPU using CUDA. There are only two exceptions where functions had to be implemented on the CPU due to their serial nature. The first is the spatial-temporal activity propagation using FMM, which operates on the coarsened grid to track activity levels. The second deals with culling duplicates where a particle is paired with multiple partners in the merging pair list. Both are lightweight functions and do not noticeably alter the overall frame rates. The extra operations in the splitting or merging process entail bookkeeping of the merged and split particles. To this end, we already employ ping-pong buffers to store particle positions and other attributes. Newly created particles originating from a split are written atomically to the end of the other buffer; similarly, two merging particles produce a single new particle there and are themselves omitted from the new buffer.
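The ping-pong bookkeeping can be sketched as a compaction kernel like the one below: surviving and newly created particles are appended to the "other" buffer through one atomic counter, and merge sources are simply not copied. The keep-flag encoding is our assumption.

```cuda
// Compact the surviving particles of the current frame into the other
// ping-pong buffer; split children and merged results are appended to the
// end of the same buffer by their own kernels, using the same counter.
__global__ void compactParticles(const ParticleS* src, ParticleS* dst,
                                 const unsigned char* keep, int* dstCount, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (!keep[i]) return;                  // deleted merge source: not copied over
    int slot = atomicAdd(dstCount, 1);     // atomic append position
    dst[slot] = src[i];
}
```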

Results
Our method was implemented and tested using C++, CUDA, and GLSL shaders on an Intel Core i7-13700K 5600 MHz CPU with 32GB DDR5-6400 CL32 RAM and an NVIDIA GeForce RTX 3080 10GB 1905 MHz GPU. We implemented four different scenes (Table 1) to study the performance benefits and the visual impact of our spatiotemporal approach. Four different modes were tested in each scene while varying the number of snow particles: no adaptivity, only temporal, only spatial, and both spatial and temporal adaptivity combined. All tests were run for 1.8 seconds of simulation time, equivalent to 3000 physics iterations, since a fixed simulation time step ($\Delta t$) of 0.6 milliseconds was used. For each combination of settings and particle counts, five tests were run and the results averaged. As stated before, all simulations were started at the finest allowed resolution, i.e., the maximum particle count, and particles merged and split in various regions as the simulation proceeded. In all scenes, therefore, a large percentage of particles were combined in the first few frames due to large inactive or dormant regions. For each scene, Table 1 gives the maximum number of particles created, the average number of particles in the adaptive variants, and the speed-up obtained compared to the base, non-adaptive variant. In two of the chosen scenes, a significant speed-up of around 3x or more was achieved with the spatiotemporal approach when compared with the original simulation on the GPU. In addition to the nature of the scene itself, the speed-up also depends on the number of particles employed; higher particle counts typically result in higher acceleration. It can be noticed that while the speed-up due to temporal adaptivity was consistent but limited, the performance gain due to spatial adaptivity varied depending on factors like the scene and particle count.
The graphs in Figure 6 (left) compare simulation times for various scenes and particle counts. The execution time represents the total simulation time, which includes both CPU and GPU time, with the GPU time overlaid on the CPU time. Computing the overall time is essential since all CUDA kernels are launched from the CPU, and we also needed to delegate some lightweight functions to the CPU due to their sequential nature. For the same scenes, the graphs in Figure 6 (right) show the times in milliseconds for each update iteration of one iterative step, comparing different adaptive settings. The graphs depict frame-time plots where each dot represents one iteration (x-axis) of the simulation and how long that iteration took in milliseconds (y-axis). The isolated singular dots are outliers where certain iterations took significantly longer due to various factors related to maintaining the adaptive algorithms. It can be inferred from the graphs that the time required for each update iteration varies somewhat between scenes as the simulation proceeds but remains stable even for large particle counts. For instance, in the Wall scene graph in Figure 6 (right), at around iteration 750 the wall hits the snow, and a sharp increase in update times can be seen, especially for the versions without spatial adaptivity. After this, the update times slowly increase until the end of the test, with a steeper increase for the spatial versions; around iteration 1500, they become slower than the version using only temporal adaptivity. An estimate of the breakdown of the total GPU time spent on major kernels is provided in Figure 7. The presented tests were benchmarked without rendering the visual output. The category Setup comprises various kernels related to preparing the data for the physics simulation, including particle index hashing, sorting, and resetting arrays between updates. The GLBuffers category primarily comprises the Map and Unmap functions used for communicating data between the CPU and GPU; in our simulation, GLBuffers are needed to expose the buffer to the CPU, where it is required for computations. The Physics category includes all kernels involved in the physics calculations, which primarily consist of checking for collisions against other particles and calculating the resulting forces and movement. Finally, the Adaptive category includes the kernels that enable the adaptive techniques. Whereas the overhead of the spatiotemporal adaptivity itself is not high, the computational cost of the other major kernels (GLBuffers, Setup) could not be reduced significantly.

Table 1. Results from a selection of scenarios using various modes (temporal, spatial, spatiotemporal) on the stated hardware, showing the execution time for each adaptive setting and the speed-up compared to no adaptivity. The table also lists the average particle count and the corresponding percentage with respect to the original (maximal) particle count for the entire benchmark pass.

Discussion and limitations
In the provided visual comparison between the different modes for the chosen scenes, it is apparent that the temporal-only mode produces a simulation quality reasonably similar to the original, non-adaptive DEM simulation, albeit with a limited performance gain. On the other hand, the spatial and spatiotemporal adaptivity modes produce some visual effects that differ from the base simulation. Such effects typically originate in and around the regions of higher activity and can be attributed to the splitting or merging of particles, which can cause extra perturbation in these regions while preparing for oncoming, or winding down after finished, higher activity, especially in the case of particle splitting. We observed that this effect is much less pronounced in scenes dealing with the macro behavior of a snow mass or large-scale obstacles (wall, sheet) and more prominent in scenes containing small-scale activity (shoes); see also Figure 8. The visible difference can also be attributed to the strong DEM forces, which become more prominent when the simulation profile changes during particle splitting or merging. Here, the smaller smoothing radius of DEM simulations (compared to SPH) might also be a contributor, although it otherwise works in our favor by reducing the computational overhead. Our approach is therefore more suitable for scenarios involving macro-scale behavior, or where the interaction of larger objects with a mass of snow is sought. This is a common scenario in many CG applications, like computer games, where capturing approximate behavior is often good enough. Additionally, the proposed approach could also be of use where micro-interactions are not viewed from close-up angles or where such activity is not pronounced.
A noteworthy observation is that the shared buffer between OpenGL and CUDA proves expensive in terms of computational time (Figure 7). This time could be reduced drastically by using modern APIs like Vulkan or DirectX 12. Another direction for performance improvement could be to explore adaptive grids or other sophisticated data structures for the neighborhood determination of particles. In the base solver, the radius of different particles could vary by a factor of 2 depending on their physical state; in the current implementation, the particle sizes can vary many times more due to the spatial adaptation, making the uniform grid size a limiting factor on the performance front.

Conclusions
In this paper, we presented an efficient, parallel approach for applying spatial and temporal adaptivity to DEM-based snow simulation. Whereas spatial and temporal adaptivity have been researched separately in the context of SPH simulations in CG, we present a new adaptive method that handles particle splitting and merging efficiently on the GPU. Our approach further identifies the advantages and challenges encountered when introducing adaptive spatial and temporal activity in the case of snow and DEM simulations. Like the base DEM solver, our adaptive technique is predominantly GPU-based. Our results show that a maximal speedup of more than 3x was achieved using our method with 3 million particles. We also identified the benefits, disadvantages, and sweet spots of our adaptive technique with the help of the presented visual examples. In the future, the splitting and merging processes could be extended to reduce the difference between such zones and the original simulation. Finally, we believe that our adaptive approach could be extended to other materials as well as to different base solvers (for instance, SPH).

Fig. 2 .
Fig. 2. Particle size and activity adaptation as a wall moves against accumulated snow (top left). Starting with a scene containing all particles at the largest possible size, spatial-temporal grid activity (top right) and temporal grid activity (bottom left) are identified for regions within the scene. The actual values are obtained by transferring these activities to the particles (bottom right).
Fig. 4. The figure visualizes the overlap a sudden split of a particle can generate.The green particle in a) is marked to split.In b) the green particle has split into two.The red highlights the overlap that the split could result in.

Fig. 5 .
Fig. 5. The spatial and temporal activity caused by snowballs falling on an accumulated sheet of snow, leading to a total particle count of nearly 3 million.

Fig. 6 .
Fig. 6. Graphs comparing (left) simulation times for various scenes and particle counts, with total execution time overlaid with the GPU time, and (right) times in milliseconds for each update iteration of one benchmark pass, comparing different adaptive settings with colored dots. Isolated dots are outlying parts of the respective curves and depict a much longer update time.

Fig. 7 .
Fig. 7. Breakdown of GPU time spent in major kernels for the base and adaptive simulations, shown for two different scenes and particle counts.

Fig. 8 .
Fig. 8. Visual comparison of the simulations obtained using the base (default), temporally adaptive only, spatially adaptive only, and spatiotemporally adaptive simulations for the (top) gears and (bottom) shoes scenes.