Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs

Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently available routines lack precision or fail for certain input ranges, such as when the order v is large, and GPU-specific implementations are limited. We address the precision limitations of current numerical implementations while dramatically improving the runtime. We propose two novel algorithms for computing the logarithm of modified Bessel functions of the first and second kinds by computing intermediate values on a logarithmic scale. Our algorithms are robust and never have issues with underflow or overflow, while having relative errors on the order of machine precision, even for inputs where existing libraries fail. In C++/CUDA, our algorithms have median and maximum speedups of 45x and 6150x for GPU and 17x and 3403x for CPU, respectively, over the ranges of inputs and third-party libraries tested. Compared to SciPy, the algorithms have median and maximum speedups of 77x and 300x for GPU and 35x and 98x for CPU, respectively, over the ranges of inputs tested. The ability to robustly compute a solution and the low relative errors allow us to fit von Mises-Fisher (vMF) distributions to high-dimensional neural network features. This is, e.g., relevant for uncertainty quantification in metric learning. We obtain image feature data by processing CIFAR10 training images with the convolutional layers of a pre-trained ResNet50. We successfully fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data using our algorithms. Our approach provides fast and accurate results, while existing implementations in the Python libraries SciPy and mpmath fail to fit successfully. Our approach is readily implementable on GPUs, and we provide a fast open-source implementation alongside this paper.

This paper focuses on the modified Bessel functions of the first and second kinds, denoted, respectively, I_v(x) and K_v(x), where v represents the order and x the argument. These functions will be formally defined in Section 3.
In statistics, the modified Bessel functions are central to distributions such as the von Mises-Fisher distribution and the K-distribution Watson (1983); Mardia, Jupp, and Mardia (2000); Kotz, Balakrishnan, and Johnson (2004). Despite their widespread use, there remain significant challenges with their numerical computation, particularly in terms of robustness when dealing with large order values and over different input ranges, as they easily underflow or overflow. Although early work has been done on algorithms that optimize accuracy and reduce computational time, they still have significant problems Bremer (2019); Lin (2016); Biagetti et al. (2016); Tumakov (2019).
In Bayesian deep learning, the limitations of current methods for computing modified Bessel functions have been noted, with existing techniques often lacking precision or being prone to overflow or underflow, making them unusable Oh, Adamczewski, and Park (2020). To circumvent these problems, Oh, Adamczewski, and Park (2020) resort to bounds on the ratio I_v(x)/I_{v−1}(x) when v takes large values, e.g., v in the thousands or tens of thousands. Figure 1 demonstrates these limitations using SciPy's implementation, which struggles beyond v = 128 and exhibits longer runtimes than our library.
To address the accuracy limitations of current implementations, we focus on computing the logarithm of modified Bessel functions efficiently and effectively. Some methods for computing Bessel functions focus on scaling the Bessel functions to avoid numerical problems such as underflow or overflow. In these cases, the Bessel function B(x) is scaled by a function S_B(x), which for I_v(x) and K_v(x) is given by Equations (1) and (2), respectively. In this way, the scaling functions compensate for the exponential increase and decrease of the functions. SciPy uses these scaling functions to extend the range of inputs for which it can compute the Bessel functions Virtanen et al. (2020).
S_I(x) = exp(−|x|), (1)
S_K(x) = exp(|x|). (2)

Instead of scaling the functions with exponentials, we propose to compute the logarithms of the modified Bessel functions, log I_v(x) and log K_v(x), by computing intermediate values on a logarithmic scale to ensure the numerical stability of the results.
In addition to the lack of robustness, the existing implementations are slow and CPU-bound. Therefore, we also develop a GPU version, since the CUDA Math Library only has implementations for v = 0 and v = 1, while applications need log I_v(x) for any v > 0.
This paper is part of a project that seeks to create a special function library that can run on GPUs. This avoids GPU-to-CPU and CPU-to-GPU memory transfers when performing machine learning with special functions, as these memory transfers are expensive Gregg and Hazelwood (2011). The functions should be precise and have runtimes comparable to existing solutions when using a GPU.¹

¹The code can be found at https://lab.compute.dtu.dk/cusf/cusf

Related Work

Applications of Bessel Functions. The modified Bessel function of the first kind appears in the probability density function of the von Mises-Fisher distribution. This models data on the unit hypersphere, S^{p−1} = {x ∈ R^p | ∥x∥₂ = 1}, and is a cornerstone of directional statistics Mardia, Jupp, and Mardia (2000); the von Mises-Fisher distribution in R^p uses I_v(x) for v = p/2 and v = p/2 − 1 (Section 6.3). We review a few applications next.
Beik-Mohammadi et al. (2021) create a generative model with the von Mises-Fisher (vMF) distribution to capture end-effector orientations in a robot control setting.
The vMF distribution is also used in metric learning, which aims for similar data to be near each other, while dissimilar data should be far away Musgrave, Belongie, and Lim (2020). Warburg et al. (2024) propose a Bayesian method to capture the inherent uncertainty of model predictions. Their pipeline embeds images into a neural network feature space, which is normalized to unit norm. Using a Laplace approximation to capture weight uncertainty, they approximate the predicted feature distribution with a vMF distribution, which requires high-order Bessel functions. Boomsma et al. (2008) use the vMF distribution to model local protein structure. Banerjee et al. (2005) use the vMF distribution to cluster high-dimensional data and give methods to fit vMF distributions without having to evaluate the modified Bessel function. However, their approximation has errors around 1% even for low-dimensional data, p ∈ [10, 100].
Oh, Adamczewski, and Park (2020) use the vMF distribution for Bayesian uncertainty quantification and observe the inherent instability in modern libraries. Therefore, they create their own approximation to estimate the ratio between the modified Bessel functions of the first kind of orders p/2 and p/2 − 1. This ratio is required when estimating the parameters of the vMF distribution using the statistical estimation method presented by Sra (2012). In each of these cases, whenever the modified Bessel functions were encountered, the researchers had to come up with approximations because they did not have a numerically stable method for computing the functions; providing such a method is the core contribution of our work.
Calculating Bessel Functions. Many software libraries that handle special functions incorporate algorithms to compute Bessel functions, often based on the approach described by Amos (1986). An example is the Boost C++ library, which implements the unscaled Bessel functions. The documentation of the Boost library contains tests of the precision of their implementations; they report the relative errors in units of epsilon (machine precision), using Mathematica for the reference solutions. The tests are done in double precision, but no input range is given. The relative errors they report are around machine precision for doubles (≈ 2·10⁻¹⁶), but we experimentally found (Section 6) that the relative errors can be several orders of magnitude larger.
Calculating the Logarithm of Bessel Functions. Research has already been conducted on computing the logarithm of Bessel functions. For example, Rothwell (2008) presents algorithms for the Bessel functions of both the first and second kinds, using recursion in areas where the conventional approach introduced by Amos (1986) faces numerical challenges. However, it is important to recognize that the recursion means that the runtimes for computing the logarithms grow linearly with the order v. In addition, these methods involve complex numbers in the computation because the Bessel functions can be negative. The high runtime when v is large would hurt performance on GPUs, as we wish to process large arrays of numbers and not just single values. Additionally, complex numbers increase the memory load and register usage, thus limiting the maximum number of concurrent kernels. Similarly, Cuingnet (2023) proposes an approach for the logarithm of the modified Bessel function of the second kind, but the work provides neither code nor comparative information, making any comparison difficult. Bremer (2019) provides algorithms to evaluate the logarithm of the Bessel functions of the first and second kinds, with errors ranging from 10⁻¹² to 10⁻⁹ and a runtime of approximately 3 to 4 seconds for 10M input values; errors are estimated using Mathematica for the reference solutions. Our implementation does much better for the logarithm of the modified Bessel functions of the first and second kinds: we get errors around machine precision (10⁻¹⁶), while the runtime is in most cases 1 to 2 orders of magnitude better.

Theory
This section covers the theory necessary to develop the algorithms and introduces the asymptotic expressions we use. Watson (1922) gives a thorough treatment of the Bessel functions.
Defining Bessel Functions. The Bessel functions are defined as the solutions, y(z), to Bessel's differential equation shown in Equation (3), where z ∈ C is the argument and v is the order Watson (1922):

z² y″ + z y′ + (z² − v²) y = 0. (3)

Meanwhile, the modified Bessel functions are the solutions to Bessel's modified differential equation seen in Equation (4), where x ∈ R is now the argument (Abramowitz and Stegun 1972, p. 374):

x² y″ + x y′ − (x² + v²) y = 0. (4)
The Bessel functions of the first and second kind are the solutions to Equation (3) where y(z) is, respectively, finite or divergent at z = 0; the solutions are denoted J_v(z) and Y_v(z). The modified Bessel functions of the first and second kind, I_v(x) and K_v(x), are obtained by restricting J_v(z) and Y_v(z) to purely imaginary arguments z; see, for example, Olver et al. (2023).
Logarithmic computations. When computing logarithms of Bessel functions, a transformation involving logarithms of sums is essential. It is the "logarithm-of-a-sum" trick for a sequence of N positive numbers a_1, ..., a_N, providing a formula to efficiently compute the logarithm of their sum. This technique improves numerical accuracy by working with the logarithms of the terms rather than the terms themselves. The transformation is given as

log Σ_{k=1}^{N} a_k = log a_i + log( Σ_{k=1}^{N} exp(log a_k − log a_i) ). (5)

For the sake of numerical accuracy, the optimal a_i to select is the largest term of the series, i.e., i = arg max_k a_k. This prevents the exponentiation operations from overflowing.
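To make this concrete, the following is a minimal C++ sketch of Equation (5), a standard log-sum-exp; the function name and interface are ours for illustration, not the paper's library API.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// log(sum_k exp(log_a[k])) computed stably per Equation (5):
// shift by the largest log-term so every exponentiation is <= exp(0) = 1.
double log_sum_exp(const std::vector<double>& log_a) {
    const double m = *std::max_element(log_a.begin(), log_a.end());
    if (std::isinf(m)) return m;  // all terms zero (or infinite)
    double s = 0.0;
    for (double la : log_a) s += std::exp(la - m);
    return m + std::log(s);
}
```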

Modified Bessel function of the first kind
First, we will consider the modified Bessel function of the first kind, I_v(x), for the inputs v ≥ 0 and x ≥ 0. This function produces positive outputs, so the logarithm is well-defined. An infinite series can be used to evaluate the function for all inputs (Abramowitz and Stegun 1972, p. 375, Eq. 9.6.10). However, it is advantageous to use asymptotic expressions for certain ranges of input values to speed up the computations and improve the numerical accuracy. The same techniques and some of the expressions used for I_v(x) can be applied with minor modifications to the modified Bessel function of the second kind, K_v(x). I_v(x) is given by the infinite series below Olver et al. (2023):

I_v(x) = Σ_{k=0}^{∞} (x/2)^{2k+v} / (k! Γ(k + v + 1)). (6)
To compute log I_v(x), we truncate the series after a finite number of terms. For this, we have the following corollary.

Corollary 1. The series in Equation (6) converges absolutely for all inputs.
This follows from the fact that the factorial function grows much faster than the exponential function.
Corollary 1 implies that, in practice, the series can be truncated after N terms.
Recurrence relation. Given the sum presented above, we can simplify the calculation of the terms by introducing a recurrence relation. In the series in Equation (6), the terms are given by Equation (7) and can be calculated recursively via Equations (8) and (9):

a_k = (x/2)^{2k+v} / (k! Γ(k + v + 1)), (7)
a_0 = (x/2)^v / Γ(v + 1), (8)
a_{k+1} = a_k · x² / (4(k + 1)(k + v + 1)). (9)

This is relevant for efficient implementations, as the previous term can be used to quickly calculate the next term.
Taking the logarithm of I_v(x) and applying the "logarithm-of-a-sum" trick gives

log I_v(x) ≈ log a_i + log( Σ_{k=0}^{N−1} exp(log a_k − log a_i) ), (10)

where a_i is the largest of the terms. Thus, to evaluate log I_v(x), we only need the logarithms of the terms a_k. Due to the multiplication in the recurrence relation for a_k, Equation (9), the logarithms of the terms can be efficiently evaluated without the risk of underflow or overflow using a similar recursion:

log a_0 = v log(x/2) − log Γ(v + 1), (11)
log a_{k+1} = log a_k + 2 log x − log 4 − log(k + 1) − log(k + v + 1). (12)

With these equations, a computational routine can be developed to compute log I_v(x). Given the input parameters v and x, the procedure involves the following steps:
1. Initialize the value of N to ensure sufficient accuracy.
2. Compute Equation (12) for the a_k terms.
3. Calculate log I_v(x) using Equation (10).
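Below is a minimal C++ sketch of these three steps, assuming a fixed term count N supplied by the caller; it streams the recursion of Equation (12) and rescales on the fly instead of locating the maximum term in a separate pass. The function name and interface are ours, not the paper's API.

```cpp
#include <cmath>

// Sketch: log I_v(x) from the truncated series, Equations (10)-(12).
double log_iv_series(double v, double x, int N) {
    if (x == 0.0) return (v == 0.0) ? 0.0 : -INFINITY;  // I_0(0) = 1, I_v(0) = 0
    // log a_0 = v log(x/2) - log Gamma(v + 1)              Equation (11)
    double log_ak = v * std::log(0.5 * x) - std::lgamma(v + 1.0);
    double m = log_ak;  // running maximum of log a_k
    double s = 1.0;     // running sum of exp(log a_k - m)
    for (int k = 0; k + 1 < N; ++k) {
        // Equation (12): advance to log a_{k+1}
        log_ak += 2.0 * std::log(x) - std::log(4.0)
                - std::log(k + 1.0) - std::log(k + v + 1.0);
        if (log_ak > m) {  // rescale so the largest term seen maps to exp(0)
            s = s * std::exp(m - log_ak) + 1.0;
            m = log_ak;
        } else {
            s += std::exp(log_ak - m);
        }
    }
    return m + std::log(s);  // Equation (10)
}
```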
This leaves an unanswered question: what value of N should be used to accurately calculate log I_v(x)?
Using the recurrence relation for log a_k as given in Equation (12), the value of log a_k changes by

log a_{k+1} − log a_k = 2 log x − log 4 − log(k + 1) − log(k + v + 1).

Thus, the value of log a_k increases for all k < K, where K is the unique value that satisfies 2 log x − log 4 = log(K) + log(K + v), and decreases for all k > K.
Solving the equation for K gives Equation (13),

K = (−v + √(v² + x²)) / 2, (13)

with the approximation holding when x ≫ v, x ≪ v, and x ≈ v. Thus, the location of the peak term is approximately linear in x.
The computations are done in finite precision, so when performing the summation in Equation (5), terms with log a_k − log a_K < log ε, where ε is the machine precision, have no influence on the sum. The terms grow (almost) exponentially up to the peak, followed by a super-exponential (c/k!) decrease, so many of the terms do not affect the summation. Therefore, it is not necessary to compute all the terms, as only the terms above machine precision in Equation (5) affect the output; terms less than machine precision, relative to the maximum value, are ignored. From empirical experiments, we observe that the number of relevant terms grows as 9.2√x, which is still large, so it is still necessary to evaluate many terms. Therefore, we will use asymptotic expressions when the inputs are large.
Asymptotic expressions. We use asymptotic expressions that hold when v or x is large to avoid evaluating many terms of the summation presented above. We found two asymptotic expressions that are fast to evaluate and give accurate results where the series definition of log I_v(x) is lacking. Numerical tests against MATLAB's Bessel functions are used to determine in which regions the expressions work well; we will focus on this in Section 4.
The first expression is denoted the "µ_K expression" and is given below for log I_v(x), with µ = 4v² Olver et al. (2023):

log I_v(x) ≈ x − (1/2) log(2πx) + log | 1 + Σ_{k=1}^{K} (−1)^k ( Π_{j=1}^{k} (µ − (2j−1)²) ) / (k! (8x)^k) |. (14)

This expression works well for large arguments x provided the order v is small. Numerical tests have shown that it is not necessary to evaluate more than about 20 terms of the infinite series. The main reason for using 20 terms instead of, say, 5 is that the expression then works for slightly smaller inputs; however, this effect plateaus after 20 terms. The K in µ_K indicates that the first K terms are used to calculate the result. There is an absolute value around the truncated series, as numerical errors can cause it to become negative. In addition, as with the log a_k recursion, the terms in the series can be calculated recursively to improve the runtime, as sketched below.
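A hedged C++ sketch of Equation (14) with the recursive term evaluation; the 20-term default follows the discussion above, while the function name and interface are ours.

```cpp
#include <cmath>

// Sketch: the mu_K expression, Equation (14), with terms computed recursively.
double log_iv_mu(double v, double x, int K = 20) {
    const double kPi = 3.14159265358979323846;
    const double mu = 4.0 * v * v;
    double term = 1.0;    // current term of the truncated series (k = 0)
    double series = 1.0;  // running sum of the series
    for (int k = 0; k < K; ++k) {
        // ratio of consecutive terms: -(mu - (2k+1)^2) / (8 x (k+1))
        term *= -(mu - (2.0 * k + 1.0) * (2.0 * k + 1.0))
              / (8.0 * x * (k + 1.0));
        series += term;
    }
    // absolute value: numerical errors can push the series slightly negative
    return x - 0.5 * std::log(2.0 * kPi * x) + std::log(std::fabs(series));
}
```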
The second expression is denoted the "U_K expression" and is given below for log I_v(x) Olver et al. (2023):

log I_v(x) ≈ vη − (1/2) log(2πv) − (1/4) log(1 + x′²) + log | Σ_{k=0}^{K} u_k(t)/v^k |, (15)

where x′ = x/v, t = 1/√(1 + x′²), and η = √(1 + x′²) + log( x′ / (1 + √(1 + x′²)) ).

The infinite series is cut off after a few terms, and K again indicates how many terms are kept. This method can only be used when v ≠ 0 due to the definition of x′, and it does not work well if 0 < v ≪ 1. The functions u_k(t) are polynomials given by the recursive expression Olver et al. (2023):

u_0(t) = 1, (16)
u_{k+1}(t) = (1/2) t² (1 − t²) u_k′(t) + (1/8) ∫_0^t (1 − 5s²) u_k(s) ds. (17)
We have only been able to find the polynomials u_k(t) for k = 1, ..., 6 in the literature (Olver et al. 2023, 10.41(ii)). Using Mathematica, we computed u_k(t) for k = 7, ..., 13. As in Equation (14), the truncated series in Equation (15) is wrapped in an absolute value, which is only needed to limit numerical issues.
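For completeness, here is a hedged sketch of generating the u_k(t) coefficients numerically from the recursion in Equations (16) and (17), as an alternative to the symbolic computation used in the paper; double-precision coefficient arithmetic will eventually limit accuracy for large k.

```cpp
#include <cstddef>
#include <vector>

// Coefficient representation: u[i] is the coefficient of t^i.
using Poly = std::vector<double>;

// One step of Equation (17): compute u_{k+1} from u_k.
Poly next_u(const Poly& u) {
    Poly out(u.size() + 3, 0.0);  // the degree grows by 3 each step
    // (1/2) t^2 (1 - t^2) u_k'(t)
    for (std::size_t i = 1; i < u.size(); ++i) {
        const double d = static_cast<double>(i) * u[i];  // u_k' coeff of t^{i-1}
        out[i + 1] += 0.5 * d;
        out[i + 3] -= 0.5 * d;
    }
    // (1/8) * integral_0^t (1 - 5 s^2) u_k(s) ds
    for (std::size_t i = 0; i < u.size(); ++i) {
        out[i + 1] += u[i] / (8.0 * (i + 1));
        out[i + 3] -= 5.0 * u[i] / (8.0 * (i + 3));
    }
    return out;
}
// Starting from u_0(t) = 1, i.e. Poly{1.0}, one step reproduces
// u_1(t) = t/8 - 5 t^3 / 24 (Olver et al. 2023, 10.41(ii)).
```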
Based on the input values, we then have three available methods to calculate log I_v(x): the sum definition from Equation (6), the µ_K expression from Equation (14), and the U_K expression from Equation (15). In Section 4, we determine the input ranges for each method.

Modified Bessel function of the second kind
This subsection focuses on the elements necessary to calculate the logarithm of the modified Bessel function of the second kind, log K_v(x), for input argument x and order v.
Starting with the asymptotic expressions, µ_K and U_K can still be used with minor changes, as indicated below. The "µ_K expression" for log K_v(x), again with µ = 4v², is Olver et al. (2023):

log K_v(x) ≈ −x + (1/2) log(π/(2x)) + log | 1 + Σ_{k=1}^{K} ( Π_{j=1}^{k} (µ − (2j−1)²) ) / (k! (8x)^k) |. (18)

Here, only some of the leading coefficients have changed, and the series is no longer alternating. Therefore, the considerations for the µ_K expression of log I_v(x) still hold; specifically, the same number of terms and the same input ranges work well.
The "U K expression" for log K v (x) is similar to before Olver et al. (2023).
with u k (t) being as in Equations ( 16) and ( 17).
Integral definition. The asymptotic expressions only hold for large inputs, so we need an expression for small values. For small inputs, we can use the integral expression in Equation (20) (Rothwell 2008, p. 241, Eqs. (26)-(27)).
To simplify expressions based on the integral, we define the functions f (u), g(u), and h(u) as given below.
We need to evaluate the integral numerically, which we will do using Simpson's 1/3 composite rule Solomon (2015).
Here, N is constant, and numerical tests have shown that N = 600 gives acceptable results, balancing runtime and accuracy, for all inputs tested. To simplify the sums, we define the Simpson weights w_k, with w_0 = w_N = 1, w_k = 4 for odd k, and w_k = 2 for even k with 0 < k < N. Combining the weights with the definitions of g and h, and applying the "logarithm-of-a-sum" trick, the logarithm of the integral term in Equation (20) can be rewritten in terms of two log-scale sums, log G and log H. Note that f(0) = 0.
The integral can then be computed quickly from log G and log H. We once again apply the "logarithm-of-a-sum" trick; however, we precompute the maximum values before evaluating the sums, as we would otherwise have to store all terms in memory and loop through the sums twice. To simplify, we focus solely on max g(u) and max h(u) and ignore the weights w_k.
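As an illustration, here is a hedged two-pass C++ sketch of composite Simpson integration in log space; the heuristic maxima described next are what let the actual implementation skip the term storage and the second pass. The interface is ours.

```cpp
#include <cmath>
#include <functional>
#include <vector>

// Sketch: log of integral_0^1 exp(log_f(u)) du via Simpson's 1/3 rule
// (N even), combined with the logarithm-of-a-sum trick over the samples.
double log_simpson01(const std::function<double(double)>& log_f, int N = 600) {
    const double h = 1.0 / N;
    std::vector<double> terms(N + 1);
    double m = -INFINITY;  // maximum of log w_k + log_f(u_k)
    for (int k = 0; k <= N; ++k) {
        const double w = (k == 0 || k == N) ? 1.0 : ((k % 2 != 0) ? 4.0 : 2.0);
        terms[k] = std::log(w) + log_f(k * h);
        if (terms[k] > m) m = terms[k];
    }
    double s = 0.0;
    for (double t : terms) s += std::exp(t - m);  // logarithm-of-a-sum trick
    return std::log(h / 3.0) + m + std::log(s);
}
```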
Finding heuristics for maxima. The maximum value of a function, here g(u) and h(u), is attained either where the derivative is zero or at the boundaries of the input range. Based on the integral in Equation (20), we have u ∈ [0, 1]; therefore, the maximum can also be at u = 0 or u = 1.
We start by considering g(u), and we apply the logarithm, as this does not change the location of the maximum; we then take the derivative with respect to u. Setting the derivative to zero, substituting x = u^β, and solving for x gives a quadratic equation with the solutions given in Equation (21).
If there exists a real solution x to the above equation such that u* = x^{1/β} ∈ [0, 1], then u* is the argument that maximizes g. However, it is difficult to determine for which input ranges such solutions exist. If u* exists, numerical experiments have shown u* ∈ [0.95, 1], with the maximum only slightly larger than g(1). Furthermore, in most cases there is no solution u* ∈ [0, 1], and the maximum is at u = 1. So, the heuristic u* = 1 is valid in most cases, and when it is not, max_u g(u) − g(1) is small. Thus, there are no numerical problems when using this heuristic.
We then consider log h(u) and its derivative; we likewise set the derivative to zero and solve for u. The resulting expression can be rewritten as a quadratic equation, and the part under the square root of its solutions is non-negative for all inputs x and v. Thus, a solution always exists, and numerical tests have shown it to be the maximum in all cases. Moreover, a simple closed-form heuristic has been shown to approximate the solution u* = arg max h(u) very well.
Thus, we can obtain good approximations of the maxima of g(u) and h(u) without evaluating complex expressions.
We can now calculate log K_v(x) using the three available methods: the integral definition (20), the µ_K expression (18), and the U_K expression (19). In Section 4, we determine suitable input ranges for each method.

Algorithms
The expressions presented above must be combined into algorithms for calculating log I_v(x) and log K_v(x). To do this, we need to determine which expression to use for a given input.
We conducted tests of the runtimes in MATLAB 2020b and found that the µ_K expression is the fastest to compute for all K ≤ 20. The second fastest is the U_K expression for all K ≤ 13. Finally, the infinite series and integral expressions, for log I_v(x) and log K_v(x) respectively, are the slowest.

Choosing the right expressions
We have different approximations of log I_v(x) and log K_v(x) and use numerical experiments to determine which is the most accurate for different input ranges. The tests show that the same input ranges are valid for the µ_K and U_K expressions when calculating both log I_v(x) and log K_v(x). The input ranges can be seen in Table 1.
The order of the expressions in the table gives their order of priority, with the series and integral definitions as the fallback cases. Thus, if for an input (v, x) the µ_20 expression is valid, it will be used regardless of whether any of the U_K expressions could be used. Additionally, to simplify the code, we do not use all µ_K or U_K expressions for K = 3, ..., 20 and K = 4, ..., 13, respectively. While the µ_4 expression is faster than the µ_20 expression, the difference is not very large, so we chose a select few expressions based on numerical tests. The main runtime sinks are currently the sum and integral expressions; the other expressions are much faster.

Pseudo code
Using the expressions presented previously and the input ranges for which they are valid, we can write a routine to calculate log I_v(x) and log K_v(x) for a given input (v, x). The pseudocode can be found in Algorithm 1; it evaluates the expressions using the equations given in the theory section when running on a CPU. When running on a GPU, the branches for the µ_3, U_4, U_6, and U_9 expressions are removed to reduce warp divergence. A sketch of the control flow is given below.
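This sketch shows the dispatch structure only; the range predicates stand in for the Table 1 conditions, and the numeric thresholds below are placeholders rather than the paper's values. log_iv_mu and log_iv_series refer to the sketches above, and log_iv_u would implement Equation (15).

```cpp
// Hypothetical range tests standing in for Table 1 (thresholds are placeholders).
bool mu20_valid(double v, double x) { return x > 100.0 && v < 25.0; }
bool u13_valid(double v, double x)  { return v >= 1.0 && x > 10.0; }

double log_iv_series(double v, double x, int N);  // series sketch above
double log_iv_mu(double v, double x, int K);      // mu_K sketch above
double log_iv_u(double v, double x, int K);       // Equation (15), not shown

// Algorithm 1 (CPU flavor): try the fastest valid expression, then fall back.
double log_iv(double v, double x) {
    if (mu20_valid(v, x)) return log_iv_mu(v, x, 20);
    if (u13_valid(v, x))  return log_iv_u(v, x, 13);
    return log_iv_series(v, x, /*N=*/4096);  // placeholder term count
}
```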
Using these algorithms, we can compare against the solutions available in other software packages, both by directly comparing runtimes and accuracies and by testing on a real-world use case.

GPU specific optimizations
The CPU code uses naive parallelization by parallelizing the loop over the elements using OpenMP. We make some modifications to optimize the code for GPUs. The code is written for CUDA, where threads are grouped into blocks; 256 threads per block gave good results in our case. The threads in a block are grouped into warps of 32 threads, and each warp executes in the Single Instruction Multiple Threads (SIMT) model. When naively parallelized on GPUs, the i'th thread computes log I_v(x) or log K_v(x) for the i'th input value (v_i, x_i), as on CPUs. However, the threads in a warp should work on the same expression. For example, if the i'th and (i+1)'th elements are computed using the µ_20 and U_9 expressions, respectively, and they run in the same warp, this causes warp divergence. To avoid this, and to balance the load for an entire block of threads, we sort the input elements based on which expression is used for each element. This means that if there are k elements computed using the µ_20 expression, then ⌈k/256⌉ blocks will work entirely on the µ_20 expression, resulting in high utilization for these blocks. Measurements of the runtime include the sorting operations, which account for ≈ 33% of the runtime; however, without the sorting, the code is overall 3 to 4 times slower. Furthermore, as noted in Algorithm 1, the GPU version omits some of the expressions. This is an additional measure to improve utilization and thus performance: the faster expressions that cover smaller input ranges have been removed, which was found to improve the overall runtime on GPUs.

Table 2: Number of reference solutions for testing the precision when calculating log I_v(x), log K_v(x), and log I_0(x).
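A hedged Thrust sketch of the sort step described above; Classify stands in for the per-element expression selection of Algorithm 1 (its thresholds are placeholders for the Table 1 conditions), and inputs are stored as double2 with v in .x and x in .y.

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

// Map each (v, x) pair to the index of the expression Algorithm 1 would
// pick for it. The thresholds are placeholders, not the paper's values.
struct Classify {
    __host__ __device__ int operator()(const double2 in) const {
        const double v = in.x, x = in.y;
        if (x > 100.0 && v < 25.0) return 0;  // mu_20
        if (v >= 1.0 && x > 10.0)  return 1;  // U_13
        return 2;                             // series fallback
    }
};

void group_inputs_by_expression(thrust::device_vector<double2>& inputs) {
    thrust::device_vector<int> keys(inputs.size());
    thrust::transform(inputs.begin(), inputs.end(), keys.begin(), Classify());
    // After this sort, consecutive threads (and hence whole warps)
    // overwhelmingly evaluate the same expression, avoiding divergence.
    thrust::sort_by_key(keys.begin(), keys.end(), inputs.begin());
}
```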

Data and experimental setup for numerical experiments
We benchmark our library against the corresponding functions for log I_v(x) and log K_v(x) in the standard library (std), the GNU Scientific Library (GSL), and the Boost library Cpp (2020); Galassi et al. (2009); Boost Community (2024). For GSL, we can use their scaled functions, which reduce the risk of numerical underflow and overflow. We take the logarithm and add or subtract the argument to undo the scaling functions shown in Equations (1) and (2), while for all others, we take the logarithm of the function outputs. Moreover, GSL, Boost, and the CUDA Math Library (CUDA in tables) contain functions specifically for orders v = 0 and v = 1. Therefore, we also compare our library's general log I_v(x) function against these special-purpose functions.
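For the GSL comparison, undoing the scaling in Equations (1) and (2) amounts to the following small sketch, using GSL's scaled functions gsl_sf_bessel_Inu_scaled and gsl_sf_bessel_Knu_scaled, which return exp(−|x|)·I_v(x) and exp(x)·K_v(x), respectively; the wrapper names are ours.

```cpp
#include <cmath>
#include <gsl/gsl_sf_bessel.h>

// log I_v(x): log of the scaled value, then add back |x| (Equation (1)).
double log_iv_gsl(double v, double x) {
    return std::log(gsl_sf_bessel_Inu_scaled(v, x)) + std::fabs(x);
}

// log K_v(x): log of the scaled value, then subtract x (Equation (2)).
double log_kv_gsl(double v, double x) {
    return std::log(gsl_sf_bessel_Knu_scaled(v, x)) - x;
}
```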

Test Regions
We evaluated the performance of log I_v(x) and log K_v(x) in two distinct regions of (v, x): a Small region and a Large region (with the range limited to [150, 4000] for log K_v(x)). When testing the special case v = 0, we use the Small and Large regions for the input argument x. The Small region is chosen so that third-party libraries, specifically Boost, can compute values for log I_v(x) without underflow or overflow errors or returning Not a Number (NaN). The Large region was limited by the range of inputs for which it was feasible to compute reference values using Mathematica.

Test Points Sampling
For the performance evaluation of log I_v(x) and log K_v(x), we sample 10M points uniformly in the specified regions, while for v = 0, we sample 100M points uniformly in both regions.
For precision testing, we uniformly sample 1M points in the Small region and 1K points in the Large region for the fractional order.We sample 1M points in the Small region and 10K points in the Large region for the special case v = 0.

Reference Solutions
To establish reference solutions for the precision tests, we used Mathematica 13.3 to perform accurate calculations for each sampled point and stored up to 16 decimal digits to ensure that the reference solutions are precise up to machine error for doubles. Any points that were incorrectly evaluated in Mathematica were filtered out; such inaccuracies were identified by Mathematica not providing a numeric answer but simply stating "Indeterminate". However, it should be noted that even when Mathematica provides a numeric output, it may not be accurate in some cases, particularly when evaluating values where v ≈ 100 and x ≈ 0.1. In such cases, Mathematica raises a warning indicating that there is a loss of precision. To obtain more accurate results, we suggest using Wolfram|Alpha. However, due to its computational heaviness and time-consuming nature, we only used Wolfram|Alpha for a few dozen input points, which will be discussed later.
The reference solutions provide a baseline for evaluating the accuracy of the implemented functions on both CPU and GPU.

Numerical results
This section presents the results of experiments performed using the data presented in the previous section to test the different libraries that compute log I_v(x) and log K_v(x). The tests were run on an NVIDIA RTX 2080 Ti GPU and an Intel Xeon Silver 4208 CPU using all 16 cores. We later repeated the tests on newer hardware (an NVIDIA A100 80GB and an AMD EPYC 7742 CPU using all 64 cores), and the relative differences in runtimes between libraries and the precision were unchanged. Tests are performed using double-precision floating-point numbers, which reduces the throughput of arithmetic instructions on the GPU by a factor of 32 compared to single precision (NVIDIA 2023, 5.4.1. Arithmetic Instructions). We are more concerned with numerical accuracy than runtime, so the potential 32-fold throughput improvement does not offset the reduced accuracy of single precision.

Precision
We split the results for the CPU and GPU functions in our library and denote the computing device explicitly. Precision is measured as the absolute value of the relative error with respect to the reference solution. We report the median and maximum statistics, excluding Not-a-Number (NaN) and ±infinity values. In cases where all values are NaN or ±∞, the median and maximum are denoted as Not Available (N/A).
We define robustness as the fraction of test points for which the methods were able to produce an answer that was not NaN or ±∞.Robustness is the key metric when comparing libraries, and we break ties with the maximum statistic.
The key observation from Table 3 is that our library never fails to compute a value, i.e., the overall robustness is 100%. In addition, the median errors are at machine precision; however, the maximum relative errors are sometimes higher than those of other libraries, for example, compared to Boost in the Small region for log I_v(x). Therefore, we cannot determine from these statistics alone which library is the most accurate. As an extension, we plot the cumulative distribution of relative errors in Figure 2, which shows that our library has a higher proportion of values with small errors for the Small region.
Table 3 indicates some outputs from our library with high relative errors compared to the reference values of Mathematica. These are values where v ≈ 100 and x ≈ 0.1, for which Mathematica warns about loss of precision. Therefore, we extracted the 35 values with the highest relative errors for our library (relative errors of ≈ 10⁻⁶ or higher) and recalculated the reference values using Wolfram|Alpha (Table 4). We see that the most accurate library is now ours, giving answers that are 12 orders of magnitude closer to the reference values than the other libraries (errors near machine precision). GSL could not calculate an answer for any of the 35 values, while std and Boost had high median and maximum errors.
These results suggest a discrepancy between Mathematica and Wolfram|Alpha. We were unable to find any information about this difference, but we note that the Wolfram Language supports arbitrary-precision arithmetic Wol (2003). However, the Wolfram Language is used by both, so this alone would not explain why Mathematica has numerical problems. It should be noted that Mathematica raised a warning about loss of precision due to numerical underflow, which supports Wolfram|Alpha being the better option for reference values. This suggests that Mathematica may not be the optimal source for numerical tests, as is current practice Boost Community (2024). In summary, our library has similar or better precision than existing libraries when considering the discrepancy between Mathematica and Wolfram|Alpha. More importantly, our library is always able to return a successful value.

Table 3: Precision metrics for different libraries in the Small and Large regions when calculating log I_v(x) and log K_v(x). The errors are the absolute relative errors compared to the reference solutions calculated using Mathematica. The robustness value in bold indicates the library with the highest robustness for a given function and region, where the maximum relative error is used to break ties. When computing log I_v(x) for the Small region, notice that the Boost library has a much lower maximum error than the other libraries when using Mathematica as the reference solution.
Additionally, we can compare the libraries, along with the CUDA Math Library, by computing the modified Bessel function of the first kind for the special cases v ∈ {0, 1}; these special cases allow for specialized solutions. Our library does not have functions specifically designed for these cases, so the generic function for log I_v(x) is used. The metrics can be seen in Table 5 and show that std is the most precise for the values in the Small region, with our library only slightly less precise. For the Large region, GSL and our library are the most precise, as both can compute results for all input values with relative errors at the level of numerical error, while the others lack robustness and accuracy.

Performance
Next, we look at the runtimes of the different libraries to compare their computational efficiency. For the fractional-order methods, we get the runtimes shown in Table 6. From the table, we can see that our library is the fastest at computing log I_v(x) in both regions, but for the Small region, the functions from the std and GSL libraries are faster at computing log K_v(x). Except for the Small region when computing log K_v(x), the runtime of our library is about 20 to 200 ms for 10M input values. This is one to two orders of magnitude faster than the results presented by Bremer (2019) for computing log J_v(x) and log Y_v(x).
We can also compare the runtimes for the special case v ∈ {0, 1}, where the CUDA Math Library is available for a GPU comparison; for these special cases, there exist simplified expressions. The results of these tests are shown in Table 7, where it is clear that the CUDA Math Library is much faster in both regions. However, the CPU function in our library is comparable to the other CPU libraries, and for the Large region, we are much faster than the other CPU libraries.

We now want to show that our library can be used to solve a real-world problem using a setup similar to Warburg et al. (2024). Here, we fit von Mises-Fisher (vMF) distributions to high-dimensional features from image processing. We choose this distribution because it models data that lie on the unit hypersphere in p-dimensional space, S^{p−1} = {x ∈ R^p | ∥x∥₂ = 1}. This is exactly the type of data that results from l₂-normalization of the extracted features Musgrave, Belongie, and Lim (2020).

Metric learning
The images are first resized to a resolution of D × D. We then use the convolutional layers of a pre-trained image classification model to extract high-dimensional features. This pipeline is shown in Figure 3. The dimensionality of the extracted features is adjusted by changing the resolution D.
We use the CIFAR10 training dataset of 32 × 32 color images, which consists of 50k images in ten categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck (see Krizhevsky, Hinton et al. (2009) for details). Example images are shown in Figure 4. The images are resized to control the dimensionality of the extracted features to which we fit a vMF distribution. We bilinearly resize the images to 32×32, 64×64, and 128×128. To extract features from the images, we use a convolutional model, as it is agnostic to the input dimensions and produces output dimensions that scale with the input dimensions. For this project, we treat the convolutional model as a black-box feature extractor. We use the convolutional layers of the ResNet50 model with pre-trained weights, as this is a widely used baseline computer vision model He et al. (2016); Vryniotis (2021). Given the image sizes used, the extracted features have dimensions 2048, 8192, and 32768.
As mentioned earlier, the von Mises-Fisher (vMF) distribution models data on the sphere, where it generalizes the normal distribution Mardia, Jupp, and Mardia (2000); Gatto (2010). The distribution is given by a mean direction, µ, and concentration parameter, κ, and has density

f(x; µ, κ) = C_p(κ) exp(κ µᵀx), with C_p(κ) = κ^{p/2−1} / ((2π)^{p/2} I_{p/2−1}(κ)).

We see that we need I_v(x) for v = p/2 − 1 to compute the probability density function of the vMF distribution. There are explicit expressions to fit the parameters µ and κ to a dataset by maximizing the log-likelihood. Suppose that we have the data x_1, ..., x_n ∈ S^{p−1}. The mean direction is given by taking the mean of the data,

µ̂ = Σ_{i=1}^{n} x_i / ∥Σ_{i=1}^{n} x_i∥₂. (22)

The concentration parameter κ is more difficult to estimate. However, Sra (2012) gave an approximation, κ_0, which can be further improved by performing one or two Newton steps,

κ_0 = R̄(p − R̄²)/(1 − R̄²), κ_{i+1} = κ_i − (A_p(κ_i) − R̄) / (1 − A_p(κ_i)² − ((p − 1)/κ_i) A_p(κ_i)), (23)

where R̄ = ∥(1/n) Σ_i x_i∥₂ and A_p(κ) = I_{p/2}(κ)/I_{p/2−1}(κ).

Table 7: Mean runtime in milliseconds over five runs when computing log I_0(x) and log I_1(x). The ± indicates the standard deviation over the five runs. The GNU Scientific Library (GSL), Boost, and CUDA use special functions for log I_0(x) and log I_1(x); our library uses the general log I_v(x). The fastest library is CUDA, and our GPU version is second. Our CPU version is comparable to the other CPU libraries. There is only a small difference between v = 0 and v = 1.
Numerical evaluations in their paper show a relative error for κ_2 between 10⁻¹² and 10⁻¹¹ over dimensions p ∈ [100, 100000] (Sra 2012, Table 2). We focus on the feasibility of fitting the vMF distributions to evaluate our library. We fit the mean direction parameter using Equation (22) to the data from the data pipeline. We then maximize the log-likelihood over the concentration parameter κ given the mean direction and compare the resulting estimate to the approximations κ_i, i = 0, 1, 2. The log-likelihood of the data is given by

log L(κ) = n ( (p/2 − 1) log κ − (p/2) log(2π) − log I_{p/2−1}(κ) + κ µ̂ᵀ x̄ ),

where x̄ = (1/n) Σ_i x_i. We maximize the log-likelihood by minimizing the negative log-likelihood using minimize in SciPy with the L-BFGS-B method, bounding κ to be non-negative. The optimization is performed with and without analytic gradients. The results of the optimizations can be seen in Table 8. For the gradient-free method, we see that the estimated κ matches the first six, four, and two digits of κ_2 for the 2048-, 8192-, and 32768-dimensional features, respectively.
The optimizer stops because the gradient is estimated to be zero, which is caused by numerical errors, since the negative log-likelihood is still decreasing, with the minimum close to κ_2. However, even with these numerical inaccuracies, the estimates are still within 0.2% of the best available estimate. With the gradient, the optimizer estimates κ much better, with relative errors ≤ 3.87·10⁻¹¹. Thus, with the gradient, the relative errors match the errors reported by Sra (2012) for their method, so we cannot rule out that our remaining errors are caused by inaccuracies in κ_2.
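To make the estimation procedure concrete, here is a hedged C++ sketch of Equation (23) built on a log I_v(x) routine such as the series sketch given earlier; evaluating the Bessel ratio A_p in log space is what keeps the procedure stable for p in the tens of thousands. Names and the term count are ours.

```cpp
#include <cmath>

double log_iv_series(double v, double x, int N);  // series sketch from earlier

// Bessel ratio A_p(kappa) = I_{p/2}(kappa) / I_{p/2-1}(kappa), in log space.
double bessel_ratio(double p, double kappa) {
    return std::exp(log_iv_series(0.5 * p, kappa, 4096)
                  - log_iv_series(0.5 * p - 1.0, kappa, 4096));
}

// Sra's estimate kappa_0 plus Newton refinements, Equation (23).
// Rbar is the norm of the sample mean of the unit-norm data.
double fit_kappa(double p, double Rbar, int newton_steps = 2) {
    double k = Rbar * (p - Rbar * Rbar) / (1.0 - Rbar * Rbar);  // kappa_0
    for (int i = 0; i < newton_steps; ++i) {
        const double a = bessel_ratio(p, k);
        k -= (a - Rbar) / (1.0 - a * a - (p - 1.0) / k * a);    // Newton step
    }
    return k;
}
```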
We performed the same test using SciPy and mpmath to compute the Bessel functions; however, the optimizer could not converge in any of the tests. With mpmath, it aborted due to loss of precision, and with SciPy, it diverged. Thus, only with our library can we estimate the parameters of a vMF distribution by optimizing the log-likelihood on a dataset. This demonstrates that our approach opens up tasks involving the modified Bessel functions of the first and second kinds that were previously not realistic.

Conclusion
This paper presents new algorithms for computing the logarithm of the modified Bessel functions of the first and second kinds that address critical precision limitations and the underflow and overflow problems that make existing libraries unsuitable for many practical applications. Our algorithms consistently produce numerically stable results without underflow or overflow, with precision equal to or better than current C++ libraries and significantly faster runtimes; in most cases, the runtime is one or two orders of magnitude faster. These results have implications for the use of the modified Bessel functions in practical applications. We present an example use case by successfully fitting the von Mises-Fisher distribution to high-dimensional data by numerically optimizing the log-likelihood of the data. This demonstrates that our library enables the use of modified Bessel functions for high-dimensional data, which was previously not possible with available libraries.
Our methods significantly outperform existing libraries that compute exponentially scaled functions, especially in terms of robustness and computational efficiency. The only exception is the modified Bessel function of the second kind for small inputs, where our library currently lags behind existing solutions in speed and accuracy; however, it is still more robust. Future research could explore developing functions specialized for certain inputs, such as integer orders. In addition, implementing derivatives of these functions would facilitate gradient-based optimization techniques, further extending the utility of our library. There is also potential to adapt our approach to other special functions that face similar computational challenges.
Stability comparison for computing log I_v(x): SciPy versus our library. The left panel shows SciPy's underflow regions (in red), indicating limited stability. In contrast, the right panel demonstrates that our library consistently returns finite values (in green), reflecting its high robustness. The color scheme highlights the computational reliability of our library over the tested domain.

Figure 1 :
Figure 1: Comparison of the robustness and runtime of SciPy's scaled implementation and our library when calculating the logarithm of the modified Bessel function of the first kind.

Figure 2 :
Figure 2: Cumulative distribution of relative errors for our library and the three third-party libraries when computing log I_v(x) for the Small region. The plot illustrates the comparative performance in terms of numerical precision. Our library has a steeper curve, indicating a higher proportion of outputs with minimal relative errors, demonstrating our library's higher precision in evaluating the logarithm of the modified Bessel function of the first kind over the entire range tested.

Figure 3 :
Figure 3: Metric learning pipeline. Images from CIFAR10 (Figure 4) are resized to three different sizes, D = 32, D = 64, and D = 128, and passed through the convolutional layers of ResNet50 to extract image features. We fit a vMF distribution to the l₂-normalized features. The extracted features have 2048, 8192, and 32768 dimensions, respectively.

Figure 4 :
Figure 4: Example images from the CIFAR10 dataset.

Table 1 :
Input ranges where the asymptotic expressions for the modified Bessel functions of the first and second kind are applicable.Ordered top-to-bottom from fastest to slowest.

Table 4 :
Precision metrics for different libraries for 35 selected values in the Small region when calculating log I_v(x). The robustness value in bold indicates the library with the highest robustness, where the maximum relative error is used to break ties. The reference values are computed using Wolfram|Alpha. It can be seen that our library is much more precise when calculating log I_v(x) than the other libraries when using Wolfram|Alpha for the reference solutions. GSL does not return any values, and thus the median and maximum are not available (N/A).

Table 5 :
Precision metrics for different libraries in the Small and Large regions when calculating log I_0(x). The errors are the relative errors compared to the reference solutions calculated with Mathematica. The robustness value in bold indicates the library with the highest robustness, where the maximum relative error is used to break ties. We can see that std, Boost, and CUDA all lack robustness in the Large region. Results for v = 1 are omitted as they are similar.

Table 6 :
Mean runtime in milliseconds over five runs for computing the modified Bessel functions of the first kind, log I_v(x), and the second kind, log K_v(x), for fractional orders in the two regions Small and Large. The ± indicates the standard deviation over the five runs. The fastest method for each function and region combination is highlighted in bold. Overall, our library is much faster than the other libraries, except for log K_v(x) in the Small region.

Table 8 :
Results for fitting the vMF distribution to assess our implementation of log I_v(x) in a real-world use case. The table presents estimates obtained by fitting the mean direction parameter according to Equation (22) and maximizing the log-likelihood over the concentration parameter κ. The estimates are compared with the approximations κ_i, i = 0, 1, 2 from Equation (23). The gradient-free estimates have relative errors of 2.39·10⁻⁶, 1.71·10⁻⁴, and 1.99·10⁻³ for the 2048-, 8192-, and 32768-dimensional features, respectively, compared to κ_2. With the gradient, the relative errors drop to 3.87·10⁻¹¹, 2.13·10⁻¹¹, and 1.72·10⁻¹¹, respectively.