On the Usage of the Sparse Fourier Transform in Ultrasound Propagation Simulation

The Fourier transform is an algorithm for transforming the signal from the space/time domain into the frequency domain. This algorithm is essential for applications like image processing, communication, medicine, differential equations solvers, and many others. In some of these applications, most of the Fourier coefficients are small or equal to zero. This property of the signals is used by the Sparse Fourier transform which estimates significant coefficients of the signal with a lower time complexity than the Fourier transform. The goal of this paper is to evaluate available implementations of the Sparse Fourier transform on a set of benchmarks solving the ultrasound wave propagation in 1D, 2D, and 3D heterogeneous media. The results show that the fastest available implementation in 1D domains is MSFFT, however, it is not possible to use it in our implementation of the 2D Sparse Fourier transform. Thus the AAFFT 0.9 is selected for our implementation of the 2D Sparse Fourier transform as the most stable and acceptably fast implementation. The results on 3D simulation data show, that by using the SpFFT library it is possible to reduce the computation time of the Fourier transform in ultrasound wave propagation simulation.


INTRODUCTION
The Fourier transform is a widely used algorithm for transforming signals from time to frequency domain and analyzing their spectral representation.Applications of the Fourier transform take place in multiple areas including medicine applications like magnetic resonance [20], ultrasound imaging [25], solving of differential equations [23], or numerical calculation of derivation [26].Since the Fourier transform allows calculation of the derivation more precisely than other methods, such as localized finite difference methods [13], thus it makes it suitable for usage in the ultrasound wave propagation simulation.
The time complexity of the Fourier transform algorithm is  ( 2 ), which is not suitable for long-time series.To reduce the time complexity of the Fourier transform, the Fast Fourier transform algorithm (FFT) [7] was introduced.This algorithm reduces computational time complexity from  ( 2 ) to  ( log  ), which is a significant improvement against the original algorithm.Although the Fast Fourier transform algorithm significantly reduced the time necessary to compute the Fourier transform, it is not always necessary to compute all Fourier coefficients, especially if from the nature of the problem there are only a few nonzero coefficients in the frequency domain.One example of such a problem is GPS synchronization [11].The synchronization process produces only a single spike in the time domain.To find this spike the Sparse Inverse Fourier transform was used.Another application using the sparse property of the input signal is a Time-Based Analog-to-Digital converter [29] working for signals with 3% sparse frequency domain.
The sparsity of the signal means, that the signal has only at most  significant coefficients, where  <<  for the signal of length  .The rest of the coefficients are usually zero (for exactly sparse signals) or negligible (for approximately sparse signals) representing noise in the processed signal.A good example of a sparse signal in the frequency domain is monochromatic light or a single-frequency transmitter.This signal property is exploited by the Sparse Fourier transform (SFT) algorithms.The Sparse Fourier transform can find significant coefficients with lower than  ( log  ) time complexity.The algorithm usually consists of three main steps frequency bucketization, frequency estimation, and collision resolution [24].Multiple algorithms are performing the SFT with different techniques and time complexity [18], for example, sFFT 2.0 [12], AAFFT [15], DMSFT [22] and many others which varies in noise robustness, time complexity and accuracy.
In this paper, some of the available implementations of the SFT will be selected and used to compute the Sparse Fourier transform in the last step of the ultrasound wave propagation simulation.In the simulations, we focus on a spatial pressure distribution so the last simulation step was selected to get more dense input samples.The number of coefficients of the signal in a frequency domain increases during simulation time due to signal decay, absorption, dispersion, and reflection.During this process, physical phenomena like reflection, refraction, and diffraction occur.
In the following sections, the experiments over the 1D signals concerning computation accuracy and execution time will be described.After the evaluation of the 1D implementations, one of them will be chosen and used in an implementation of the 2D SFT to evaluate possible usage of the Sparse Fourier transform on 2D signals with a focus on computation accuracy.At the end the SpFFT [2] (3D SFT library) will be evaluated on 3D simulation samples.

PROBLEM SPECIFICATION
Ultrasound simulations are used in many areas from industry to medicine.One of the rapidly emerging and critically deployed is preoperative treatment planning of ultrasound surgery.To run the simulation, the k-Wave toolbox [28] based on the following governing equations is used [28].
Equation ( 1) can be written in a discrete form using the k-space pseudospectral method [27].This equation is part of the spatial gradient calculations based on the Fourier collocation spectral method.

𝜕 𝜕𝜉 𝑝
In Eq. (2) for the Cartesian direction  =  in  1 ,  = ,  in  2 , F and F −1 denote the forward and inverse spatial Fourier transform ,  is the imaginary unit,   represents the wave numbers in the  direction, and  is the k-space operator defined as  =  (   Δ/2), where    is a scalar reference sound speed.
There are several approaches to solve mentioned differential equations such as pseudo-spectral method [28], finite-difference time-domain method [14], etc.
The k-Wave toolbox uses the pseudo-spectral method with the Fourier basis function, which means that a significant portion of the simulation is spent on the computation of the Fourier transform.This makes it a suitable tool for reducing simulation time by using the SFT algorithm.
The idea of pseudo-spectral methods is to transform the solution of the differential equation into a sum of a certain basis function.In the finite-difference time-domain methods, the gradient is computed based on the function values at the neighbor points.The more points are used, the more accurate the derivative estimation is.In spectral methods, the solution depends on the entire domain which makes them more accurate than local methods [10].In k-Wave, the Fourier collocation spectral method is used [28].In this method the derivative is much more accurate, thus the finer sampling around 3-5 grid points as opposed to 15-20 for finite difference methods can be used.This reduce the memory requirements 100 times in 3D domain.Additionally, thanks to the k-space correction, a relatively long step can be chosen, with a CFL (Courant-Friedrichs-Lewy) number of 0.3.This means that only three steps are needed to move the wave by one grid point.
In the current implementation, the Fast Fourier transform algorithm is used to find the signal's coefficients in the spectral domain.Computation of the ultrasound propagation simulation consists of many simulation time steps requiring 14 FFTs to be computed in each time step, which contributes to 60% of the simulation time spent by computing the Fourier transform [17].This means that the Fourier transform takes a significant part of the simulation time.
When the transmitter with a single frequency is used, the signal will spread across the medium and new frequencies will accrue during simulation time.Primarily on the edges of different media due to the change of propagation speed and reflection on the edges of two media.
The assumption is that it should be possible to decrease the computation time by using the Sparse Fourier transform algorithm.
To answer the question about the possibility of the usage of the SFT, the experiments on the one, two, and three-dimensional domains will be performed and evaluated in the following sections.

ONE-DIMENSIONAL SPARSE FOURIER TRANSFORM
At the beginning of this section, freely available one-dimensional SFT algorithm implementations will be shortly introduced followed by the experiments and their evaluation.All of the following implementations have two common input parameters.The length of the signal  and the number of significant coefficients .This means that number of significant coefficients must be known before the transform execution.

Ann Arbor Fast Fourier Transform (AAFFT)
The Ann Arbor Fast Fourier Transform algorithm [15] comes in two variants.AAFFT 0.5, the implementation of the FADFT-1 [8] with the time complexity  ( 2 • ( )) and AAFFT 0.9 that is implementation of FADTF-2 [9] with the time complexity  ( • ( )).The difference is that FADTF-2 uses the unequally spaced Fourier transform meaning multiplication of some  ×  submatrix of the  ×  Fourier matrix by a length-k vector [9].
In experiments, AAFFT 0.9 was used.The implementation relies on twenty user-defined parameters thus, its usage requires a lot of parameter tuning.These parameters affect the accuracy and computation time of the algorithm.It is impossible to run experiments with all combinations of these twenty input parameters.Therefore, the parameters with the highest impact on the execution time and precision were selected and changed during our experiments, the other parameters were set to a default value.The changing parameters affect the number of coefficient estimation cycles, the number of Taylor series coefficients, and the number of input samples used in the estimation phase of the algorithm.

Gopher Fast Fourier Transform (GFFT)
The Gopher Fast Fourier Transform [5] implementation comes in four variants with different algorithms [5] [16] for finding and estimating coefficients of the input signal.All variants take advantage of aliasing phenomena.
A detailed explanation of differences between all mentioned variants can be found in [16].Table 1 shows the time complexity of different GFFT implementations.Tested implementation requires only one input parameter controlling the accuracy of the Monte-Carlo algorithm to be set by the user.

Discrete Michigan State Fourier Transform algorithm (DMSFT)
The Discrete Michigan State Fourier Transform algorithm [22] was derived from the previously mentioned GFFT algorithm.It's time complexity is  ( 2 • log 11 2  ).This algorithm uses the multiscale error correction to estimate coefficients in noise signals [19].The implementation of this algorithm requires three user-defined parameters.First gives the number of points used in convolution, second the first main prime number in the sampling scheme, and third controls the number of sets of prime numbers in the sampling scheme.

Multiscale Sub-linear Time Fourier Transform (MSFFT)
The Multiscale Sub-linear Time Fourier algorithm [6] runs in a time complexity  ( • log  • log   ) on average.This implementation has four input parameters.The idea of the MSFFT algorithm is the alike principle of analog-to-digital conversion where a value of the signal can be estimated with high precision by using coarse binary quantization.The authors recommended suitable values for these parameters to achieve a good balance between speed and accuracy.

Experiments
To evaluate the libraries, the k-Wave toolbox [28] to simulate ultrasound propagation in a 1D grid was used.Simulations with a different number of media were run to get inputs with various numbers of significant coefficients.The transmitter was set to a single frequency to get the simplest simulation.The input signals are shown in Fig. 1.To reduce the amount of noise in the input signal, all coefficients smaller than -50dB were filtered out before the benchmark execution.In real simulations, the threshold of the noise is given by the parameters of the perfectly matched layer (PML) that gives us computation accuracy.The number of media in real simulations is around 30 for segmented data and 256 for data from computed tomography [21].As mentioned before, all implementations require the number of most significant coefficients to search for as an input parameter.To get this value, the Fourier transform was applied to the filtered input signal to get the number of the most significant coefficients .The number of coefficients in each input signal is the following: one for homogeneous media, 13 for two media, 16 for three media, and 18 for four media.
The field of our interest was the computation time and accuracy of the result in form of  2 and   error that are defined as follow: As the reference library, FFTW3 [1], which is a de-facto standard in FFT calculation Fast Fourier transform algorithm was used.The sizes of the input domain were  = 2 20 , 2 22 , 2 24 , 2 26 , 2 28 with the input file size from 8MB up to 2048MB.As mentioned before, all these implementations have some parameters, that influence the execution time and accuracy.We run multiple simulations to select the parameters that have the best balance between execution time and accuracy.For each parameter, an interval was selected, and the step by which the parameter is increased within the interval (eg. =< 0, 1 > and  = 0.2 results in a vector of parameter  = {0, 0.2, 0.4, 0.6, 0.8, 1}).Each implementation was then run with all combinations of these parameters over each input signal.The run with the smallest error and the best execution time was selected as the final value.The table of parameters is attached in appendix A. All simulations were computed on a single compute node of the Barbora supercomputer [3].This node is equipped with two Intel Cascade Lake 6240, 2.6GHz, and 192GB of RAM.
Before we look at the results, it is important to mention that all graphs with the results are in the logarithmic scale and the number of coefficients in the signals is extremely low compared to the sizes of the input signal.
The results are shown in Figure 2. First, the MSFFT implementation outperforms all of the other implementations over all given inputs.Second, the AAFFT is the most stable implementation concerning the signal size.With the increasing size of the signal, this implementation's execution time is almost constant, which probably depends on how efficiently the given implementation estimates the coefficients concerning the number of used signal samples.Finally, almost all of the measured implementations outperform FFTW3 on large input domains except GFFT-Superdet.Measured  2 and   errors of the SFT implementations are shown in Fig. 3.We can see that all implementations have an acceptable level of error (except of the AAFT on the second simulation), which is in the case of ultrasound propagation simulation   <= 10 −5 .All computations were made with double precision values.
We can notice that there are no values for GFFT in the error charts and the execution time measurement is not complete for this library in heterogeneous media.The reason is the initialization process of the GFFT library.To execute the benchmark over the SFT algorithm implementation, it is necessary to load the signal into the structures of the SFT algorithm implementation.The implementation of the GFFT algorithm does not allow an easy way to load the whole input signal because the sampling and other operations are performed during the initialization process.The GFFT source code allows benchmarking where the user selects a number of coefficients and level of noise and GFFT creates only samples necessary for its computation.Thus processing the whole input of this size leads to a long initialization process in the case of heterogeneous media so we were not able to measure the results of this library on bigger inputs due to the time-consuming initialization process.
The   error of almost all of the evaluated implementations is on the values, that allow their usage in ultrasound propagation simulation.Even if the error of each step were summed up, the simulation   error would be lower than 10 −5 in 1000 simulation time steps.The speedup against FFTW [1] is in the case of MSFFT [6] on the order of 10.000x, and in the case of AAFFT 0.9 [15] 82x for a signal with 4 media and the size of 2 28 .

TWO-DIMENSIONAL SPARSE FOURIER TRANSFORM
The process of computation of 2D SFT is schematically described in Figure 4.The algorithm of the two-dimensional Fourier transform uses a one-dimension Fourier transform in the following way.In the first step, the one-dimensional Fourier transform is applied to each row in the 2D domain.After the first step, each row contains the results of the one-dimensional Fourier transforms.In the second step, the Fourier transform algorithm is called on the results from the first step but in a column direction.For the implementation of the two-dimensional SFT, we needed a 1D SFT implementation, that meets some requirements.The implementation should be able to fill  most significant coefficients with zeros in case the selected  for a given dimension is larger  The number of coefficients in the spectral domain is variable during the simulation, where the signal may spread across a media with a different sound speed and density.For the usage of the Sparse Fourier transform in ultrasound propagation simulation, it will be necessary to use some iterative approach to estimate an unknown number of coefficients in each row/column of the media to achieve a two-dimensional Fourier transform with a given accuracy.

THREE-DIMENSIONAL SPARSE FOURIER TRANSFORM
For a three-dimensional SFT, we were able to find the SpFFT[2] library.SpFFT implements a three-dimensional Sparse Fourier transform with the support of OpenMP, MPI, CUDA, and ROCm.There is support for both single and double precision implementation.Three-dimensional simulation is the main simulation used for treatment planning, thus the experiments were executed over unfiltered real 3D simulation data (last simulation step) to get as close to the real conditions as possible.First, two simulations in the domain with a cutout of the human head were taken.The size of these simulations is 262 3 and 472 3 with input sizes 1GB and 4 GB.Let's denote them as P262 and P472 respectively.The second two simulations with the domain containing a full human head were taken.The simulation domain sizes are 502 3 and 922 3 with input sizes 1.25GB and 6.5GB.Let's denote them as F502 and F922 respectively.As the reference the FFTW3 [1] library was used.The measurement was executed at Karolina cluster [4] on one up to eight computation nodes, where each of them is equipped with two AMD Zen 2 EPIC 7H12, 2.6GHz, and 256GB of RAM.
The results in Table 2 show that in double precision the SpFFT library meets the required precision unlike for single precision.When we look at the computation speedup in Table 3 for forward transform and in Table 4 for backward transform it can be observed, that in some cases this library is more than two times faster for forward transform and almost two times faster for backward transform than FFTW3 that is currently used in ultrasound wave propagation simulation.

CONCLUSION
The performance and accuracy of several SFT algorithm implementations on the last step of one-dimensional ultrasound propagation simulation were measured.After the evaluation of the selected 1D SFT implementations, AAFFT 0.9 was chosen as a suitable 1D SFT implementation for our experimental two-dimensional SFT.In the final section, the SpFFT implementation of the three-dimensional SFT was measured on unfiltered real simulation data.The results have shown that MSFFT is in given cases the fastest implementation with time around 10 −4 seconds and   error around 10 −11 .Unfortunately, MSFFT was not able to fill  coefficients with zeros which eliminates it from being used in 2D SFT implementation.The results have also shown that the accuracy and computation time of each library is highly dependent on the values of the input parameters.
In the 2D variant of SFT, the sparsity of the signal in each row/column of the 2D domain is different, thus the results with differently filtered input signals were provided.The measurements have shown the 2D SFT can be used on the real simulation data while reaching   error below 10 −10 .In the case of the 3D input data, the results show that SpFFT is more than two times faster for forward transform and almost two times faster for backward transform while holding the required computation precision.
The results indicate that using the Sparse Fourier transform in k-Wave [28] ultrasound propagation simulation should be possible while holding a given computation accuracy.This may lead to lower simulation time, thus reducing the time of the treatment planning.
However, there are some goals to achieve.First, remove the dependency of the 1D SFT on the knowledge of the number of coefficients in the spectral domain of the input signal and expensive parameter tuning before transform execution.Second, the 2D SFT implementation using existing 1D SFT is currently fully sequential, which leads to a long computation time that does not allow us to run full simulations with the use of this 2D SFT implementation.This paper provides an answer which concludes with the statement that it is theoretically possible to use SFT in the ultrasound propagation simulation.This paper should be the starting point for the future acceleration of the ultrasound wave propagation simulation in the k-Wave toolbox.

Figure 1 :
Figure 1: Last step of the linear wave propagation simulation performed by the k-Wave toolbox.The X axis represents grid points(distance) and Y axis represents amplitude.

Figure 3 :
Figure 3:  2 and   error of different SFT implementations on the 1D signal with 2 20 samples

Figure 4 :
Figure 4: Usage of one-dimensional FFT to create two-dimensional FFT.

Figure 8 :
Figure 8: Execution time of the 2D SFT implementation

Figure 9 :
Figure 9: Execution time of the SpFFT for the different number of computation nodes.

Table 1 :
Time complexity of different GFFT implementations

Table 2 :
2 and   error of the 3D SFT implementation in Single precision (SP) and Double precision (DP) on the real data samples after forward and backward transformation.error DP 1.60 −10 1.46 −18 1.89 −18 3.12−18

Table 3 :
Forward transformation speedup of the 3D SFT against 3D FFTW on the real data samples.

Table 4 :
Backward transformation speedup of the 3D SFT against 3D FFTW on the real data samples.